A guide to building the resume for your dream job with Resumize (my Rust & RAG side project)

Thanaphoom Babparn
8 min read · May 1, 2024

Introduction

Hello everyone, nice to meet you again! This article is a record of my side project related to RAG and LLM. This time, I chose to use Rust for the entire project. I hope everyone will enjoy reading it, or if you’re interested enough to try it yourself, I hope you have fun doing it! 🙇‍♂️

Moreover, I’ve attached various pieces of knowledge related to the topic. It’s suitable for beginners who are even newer to this than I am. I hope it will be useful to you, even if just a little bit. 😁

Resumize or เรซูมั้ย?

Resumize or เรซูมั้ย? (a Thai pun, roughly “Resume, or not?”) This is a project I didn’t intend to make big. I just wanted something to guide me in writing or adjusting resumes, without having to search the internet only to find resume coaches who just want to collect money. So, I decided to use an LLM. That’s the concept, haha.

I want it to help me improve my resume without having to spend money.

P.S. For those who have more creativity than me, or more computing power (such as a GPU), feel free to take this further. I don’t have the money, haha.

Repository

Demo

Feel free to skip this if it feels too long.

Tech Stack

How it’s designed

My simple diagram to explain what is happening in the background

Use-cases

  1. Upload my own experience (as JSON)
  2. Upload a job description from a recruiter, copied from a job board
  3. Create resume content based on the job description, with suggestions

Prerequisite knowledge

Before we start building, I want to introduce some terms first: the terms you’ll encounter later, what they are, and briefly what they do, to reduce the knowledge gap before we explore what I’ve done.

LLM (Large Language Models)

An AI model trained on a large dataset to reason about and generate ideas/content like a human. Each model has its own strengths, depending on the purpose of training and the dataset used. Some are meant for general contexts, while others are specialized, e.g. for coding.

Well-known examples

  • GPT-3/4 (OpenAI)
  • Llama 3 (Meta)

Llama 3

It’s an open-source LLM from Meta, designed for use cases of generative AI ideas for developers, researchers, or other general purposes. You can read more about it here.

Prompt Engineering

Prompt Engineering is a process of crafting input to guide a generative AI to produce output that aligns with a specific problem and achieves the expected results.

Prompt

This is the name of the input that we use in the world of generative AI to make it create a response for us. It could be a short word or a long sentence. The model uses this to create a related response.

System Message, User Message

System Message — This is a term used to assign a role to the LLM to control the results to be more within the scope we want.

User Message — This is the input that we type in, such as what we type into ChatGPT, for example.

If they are mixed together, the result will look like this.
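To make this concrete, here is a small sketch in Rust of how a system message and a user message might be combined into one prompt. The template below is a simplified version of the Llama 3 chat format, for illustration only; real chat templates vary per model.

```rust
// Sketch: combining a system message and a user message into a single
// chat prompt. This is a simplified Llama-3-style template, not the
// exact one any particular runtime uses.
fn build_prompt(system: &str, user: &str) -> String {
    format!(
        "<|start_header_id|>system<|end_header_id|>\n{system}<|eot_id|>\n\
         <|start_header_id|>user<|end_header_id|>\n{user}<|eot_id|>\n\
         <|start_header_id|>assistant<|end_header_id|>\n"
    )
}

fn main() {
    let prompt = build_prompt(
        "You are a resume coach. Answer concisely.",
        "Improve this bullet point: 'did backend stuff'.",
    );
    println!("{prompt}");
}
```

In practice, an OpenAI-compatible API server applies this template for you; you only send the messages with their roles.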

RAG (Retrieval-Augmented Generation)

Retrieval-Augmented Generation (RAG) is a process for optimizing the output of an LLM by adding a knowledge base as extra context, on top of the trained data, before the model produces its response/answer.

  • Generation means the LLM generates a new result from the user query, based on the LLM’s own knowledge.
  • But what if that knowledge is too general, out of date, or you need data specific to your business?
  • So? Retrieval-Augmented means we place a data source somewhere else and call its contents documents.
  • Then, before the LLM creates the response, we retrieve the most similar documents, attach them to the prompt, and send everything to the LLM so it can take them into account in its answer.
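The steps above can be sketched in a few lines of Rust. Note that the “retrieval” here is naive keyword overlap purely to show the shape of the flow; a real RAG pipeline uses embedding vectors and similarity search instead.

```rust
// Toy RAG sketch: retrieve the documents most related to the query,
// then attach them to the prompt before asking the LLM.
// "Retrieval" here is naive keyword overlap, NOT real vector search.
fn retrieve<'a>(query: &str, documents: &[&'a str], top_k: usize) -> Vec<&'a str> {
    let mut scored: Vec<(usize, &str)> = documents
        .iter()
        .map(|doc| {
            let score = query
                .split_whitespace()
                .filter(|word| doc.to_lowercase().contains(&word.to_lowercase()))
                .count();
            (score, *doc)
        })
        .collect();
    scored.sort_by(|a, b| b.0.cmp(&a.0)); // highest overlap first
    scored.into_iter().take(top_k).map(|(_, d)| d).collect()
}

fn main() {
    let docs = [
        "3 years of Rust backend experience with Actix Web",
        "Hobby: playing guitar",
    ];
    // Attach the retrieved documents to the prompt before asking the LLM.
    let context = retrieve("Rust backend engineer", &docs, 1).join("\n");
    let prompt = format!("Context:\n{context}\n\nQuestion: Suggest a resume bullet.");
    println!("{prompt}");
}
```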

For more information about RAG, please check here. I think this is the best one.

Vector Database

A Vector Database is a database designed to store data as vectors (series of numbers).

  • Inserting => Transform a document/chunk of text into a vector and store it in the database
  • Retrieving => Search for data by similarity/context and return it (similarity search)
Source: What is a Vector Database?
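The core of similarity search is a distance metric over those vectors; cosine similarity is a common choice. A minimal sketch in Rust (the toy “embeddings” below are made-up numbers, not real model output):

```rust
// Sketch of the similarity-search idea: documents and queries become
// vectors, and "similar" means a high cosine similarity score.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // Pretend these are embeddings of a query and two stored documents.
    let query = [1.0, 0.0, 1.0];
    let doc_related = [0.9, 0.1, 0.8]; // points in nearly the same direction
    let doc_unrelated = [0.0, 1.0, 0.0]; // orthogonal to the query
    println!("{:.3}", cosine_similarity(&query, &doc_related));
    println!("{:.3}", cosine_similarity(&query, &doc_unrelated));
}
```

A vector database such as Qdrant does essentially this, but with indexes that make the search fast over millions of vectors.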

Qdrant

Qdrant is a vector database used for similarity search, written in Rust 🦀.

Actix Web

Actix Web is a web framework for the Rust language; in other words, it lets you write your backend in Rust. It allows developers to create a backend for Web/Mobile/Desktop frontends.

Dioxus

Dioxus is a library that helps us create applications on desktop, web, and mobile (as they say), inspired by React.

LlamaEdge

They say it’s the easiest, smallest, and fastest local LLM runtime and API server. That is, we use it to run a local LLM. LlamaEdge also provides an API server that exposes the model through OpenAI-compatible REST APIs.
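Because the server is OpenAI-compatible, the request body is the familiar chat-completions shape. For illustration, a request might look like this (the exact endpoint path and default port depend on how you start the server; the model name should match the one you register with the API server):

```json
{
  "model": "Llama-3-8B",
  "messages": [
    { "role": "system", "content": "You are a resume coach." },
    { "role": "user", "content": "Improve this bullet point: 'did backend stuff'." }
  ]
}
```

This is what makes LlamaEdge easy to swap in: any client library that speaks the OpenAI API can talk to your local model.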

Let’s begin!

Run LlamaEdge locally

Please follow this article on how to run WASM locally: Getting Started with Llama-3-8B.

Below are only the steps required for this project.

  1. Install WasmEdge via the following command line.
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml

2. Download the Llama-3-8B model GGUF file.

curl -LO https://huggingface.co/second-state/Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf

3. Download an API server app. It is also a cross-platform portable Wasm app that can run on many CPU and GPU devices. This API server is OpenAI-compatible.

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm

4. Download the chatbot web UI to interact with the model with a chatbot UI.

curl -LO https://github.com/LlamaEdge/chatbot-ui/releases/latest/download/chatbot-ui.tar.gz
tar xzf chatbot-ui.tar.gz
rm chatbot-ui.tar.gz

5. Start an API server for the model.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:Meta-Llama-3-8B-Instruct-Q5_K_M.gguf \
  llama-api-server.wasm \
  --prompt-template llama-3-chat \
  --ctx-size 4096 \
  --model-name Llama-3-8B

Run Qdrant by container approach

Whatever your approach, I would suggest using container technology to speed up building your playground.

Standalone container

docker run -p 6333:6333 -p 6334:6334 \
-v $(pwd)/qdrant_storage:/qdrant/storage:z \
qdrant/qdrant

Compose file (I used this approach)

podman compose up -d

Here my compose.yml
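My actual compose file lives in the repository; a minimal equivalent of the standalone `docker run` command above might look something like this:

```yaml
services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"   # REST API
      - "6334:6334"   # gRPC API
    volumes:
      - ./qdrant_storage:/qdrant/storage:z
```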

Run Actix Web

Run the command below to start the dev server locally.

cargo watch -x "run --bin server"

Code

Here’s a brief explanation of the code

  1. When starting the backend application, it loads the configuration into AppState and creates a collection for storing documents.
  2. The exposed port depends on the value in Setting.toml.
  3. The API for uploading documents accepts Multipart, then converts the content into a vector via langchain-rust and puts it in the database.
  4. The API for resume suggestions accepts a Path as the job title and the Body as the job description (Content-Type: text/plain). It searches for experiences related to the job title and uses the job description to ask Llama 3.
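Step 4’s flow, stripped of the Actix Web and Qdrant plumbing, can be sketched like this. The function names are my own for illustration, not the project’s actual code; in the real service the search is a vector similarity query and the final prompt goes to LlamaEdge over HTTP.

```rust
// Sketch of the resume-suggestion flow: find experiences related to the
// job title, then build one prompt from title, description, and matches.
// Matching here is naive substring search, standing in for vector search.
fn find_related_experiences<'a>(job_title: &str, experiences: &[&'a str]) -> Vec<&'a str> {
    experiences
        .iter()
        .filter(|e| {
            job_title
                .split_whitespace()
                .any(|w| e.to_lowercase().contains(&w.to_lowercase()))
        })
        .copied()
        .collect()
}

fn build_suggestion_prompt(job_title: &str, job_description: &str, experiences: &[&str]) -> String {
    format!(
        "Job title: {job_title}\nJob description: {job_description}\n\
         Relevant experiences:\n{}\n\
         Suggest resume content tailored to this job.",
        experiences.join("\n")
    )
}

fn main() {
    let experiences = ["Built REST APIs in Rust", "Organized a charity run"];
    let related = find_related_experiences("Rust Backend Engineer", &experiences);
    let prompt = build_suggestion_prompt(
        "Rust Backend Engineer",
        "We need an Actix Web expert.",
        &related,
    );
    println!("{prompt}");
}
```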

Example of backend response — resume suggestion


Run Dioxus desktop application

We want to have a frontend to interact with so we can see the complete flow of work. You can run it with this command.

dx serve --hot-reload --platform desktop

Note — In this case, it’s not necessary to run Tailwind because I’ve committed all the CSS. But if you want to modify Tailwind, follow these instructions instead.

Simple desktop application with 2 pages

Code

The way to make an enhancement

Here are some suggestions for enhancements, divided into different flows

General

Upload experience enhancement suggestion

  • If you use PDF, it could cover many use-cases. On the backend, you could switch to using PDFLoader.
  • From this design, consider making it asynchronous. For example: the user uploads a file => the file is forwarded elsewhere => the API responds immediately => another workload upserts the document into the vector database.
Enhancement upload flow to asynchronous approach by message queue
  • You can use user identity to separate related documents; we can filter through metadata, right? And if users consent to sharing their experience data, we could give better resume suggestions in the tips section.

Generate new resume content enhancement

  • The current result is in Markdown, but Dioxus doesn’t support Markdown on desktop yet. So, if anyone builds a web frontend, there should be an easy way to display Markdown, and it would probably look better.
  • You can also make it an asynchronous approach. It could be like, we submit a request and then notify the result when it’s done. For example, send an Email or Push notification and have the result in the Inbox in our app.
  • I think there could be a prompt template that makes the result better.

Conclusions

How’s it going with this project? Honestly, it was built with the RAG mindset as the main focus. But since I’ve already started it, why not do the whole thing, not just the Backend?

The results are:

  • I’ve practiced writing Rust more.
  • I’ve reviewed Actix Web.
  • I’ve tried playing with Qdrant.
  • I’ve tried using Dioxus.
  • I’ve thought that it might be time to buy a new machine, just kidding! 😜

Anyway, if you got some benefit from this article, thank you very much. But if the project isn’t that interesting, I apologize.

Thank you for reading. See you in the next article. I wish everyone happiness 🙇‍♂️.

Facebook: Thanaphoom Babparn
FB Page: TP Coder
LinkedIn: Thanaphoom Babparn
Website: TP Coder — Portfolio
