A guide to building the resume for your dream job with Resumize (my Rust & RAG side project)

Thanaphoom Babparn
8 min read · May 1, 2024

Introduction

Hello everyone, nice to meet you again! This article is a record of my side project related to RAG and LLM. This time, I chose to use Rust for the entire project. I hope everyone will enjoy reading it, or if you’re interested enough to try it yourself, I hope you have fun doing it! 🙇‍♂️

Moreover, I’ve attached various pieces of knowledge related to the topic. It’s suitable for beginners who are even newer to this than I am. I hope it will be useful to you, even if just a little bit. 😁

Resumize or เรซูมั้ย?

Resumize or เรซูมั้ย? (a Thai pun, roughly “Resume, or not?”) This is a project I didn’t intend to make big. I just wanted something to guide me in writing or adjusting resumes, without having to search the internet only to find resume coaches who just want to collect money. So, I decided to use an LLM. That’s the concept, haha.

I want it to help me improve my resume without having to spend money.

P.S. For those who have more creativity than me, or more computing power (such as a GPU), feel free to take this further. I don’t have the money, haha.

Repository

Demo

Feel free to skip this if it feels too long.

Tech Stack

How it’s designed

My simple diagram to explain what is happening in the background

Use-cases

  1. Upload my own experience (as JSON)
  2. Upload a job description from a recruiter, copied from a job board
  3. Create resume content based on the job description, with suggestions

Prerequisite knowledge

Before we start building, I want to introduce some terms first: the terms you’ll encounter later, what they are, and briefly what they do, to reduce the knowledge gap before we explore what I’ve done.

LLM (Large Language Models)

An AI model trained on a large dataset to reason about and generate ideas/content like a human. Each model has its own strengths, depending on the purpose of training and the dataset used. Some are meant for general contexts, while others are specialized, e.g. for coding.

Well-known examples

  • GPT-3/4 (OpenAI)
  • Llama 3 (Meta)

Llama 3

It’s an open-source LLM from Meta, designed for use cases of generative AI ideas for developers, researchers, or other general purposes. You can read more about it here.

Prompt Engineering

Prompt Engineering is a process of crafting input to guide a generative AI to produce output that aligns with a specific problem and achieves the expected results.

Prompt

This is the name of the input that we use in the world of generative AI to make it create a response for us. It could be a short word or a long sentence. The model uses this to create a related response.

System Message, User Message

System Message — This is a term used to assign a role to the LLM to control the results to be more within the scope we want.

User Message — This is the input that we type in, such as what we type into ChatGPT, for example.

If they are mixed together, the result will look like this.
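To make this concrete, here is a small sketch in Rust of how a system message and a user message might be combined into one prompt. The template below is a simplified version of the Llama 3 chat format, for illustration only; real chat templates vary per model.

```rust
// Sketch: combining a system message and a user message into a single
// chat prompt. This is a simplified Llama-3-style template, not the
// exact one any particular runtime uses.
fn build_prompt(system: &str, user: &str) -> String {
    format!(
        "<|start_header_id|>system<|end_header_id|>\n{system}<|eot_id|>\n\
         <|start_header_id|>user<|end_header_id|>\n{user}<|eot_id|>\n\
         <|start_header_id|>assistant<|end_header_id|>\n"
    )
}

fn main() {
    let prompt = build_prompt(
        "You are a resume coach. Answer concisely.",
        "Improve this bullet point: 'did backend stuff'.",
    );
    println!("{prompt}");
}
```

In practice, an OpenAI-compatible API server applies this template for you; you only send the messages with their roles.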

RAG (Retrieval-Augmented Generation)

Retrieval-Augmented Generation (RAG) is a process for optimizing the output of an LLM by adding a knowledge base as extra context, on top of the trained data, before the model produces its response/answer.

  • Generation means the LLM generates a new result from the user query, based on the LLM’s own knowledge.
  • But what if that knowledge is too general, out of date, or you need data specific to your business?
  • So? Retrieval-Augmented means we place a data source somewhere else and call its contents documents.
  • Then, before the LLM creates the response, we retrieve the most similar documents, attach them to the prompt, and send everything to the LLM so it can take them into account in its answer.
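The steps above can be sketched in a few lines of Rust. Note that the “retrieval” here is naive keyword overlap purely to show the shape of the flow; a real RAG pipeline uses embedding vectors and similarity search instead.

```rust
// Toy RAG sketch: retrieve the documents most related to the query,
// then attach them to the prompt before asking the LLM.
// "Retrieval" here is naive keyword overlap, NOT real vector search.
fn retrieve<'a>(query: &str, documents: &[&'a str], top_k: usize) -> Vec<&'a str> {
    let mut scored: Vec<(usize, &str)> = documents
        .iter()
        .map(|doc| {
            let score = query
                .split_whitespace()
                .filter(|word| doc.to_lowercase().contains(&word.to_lowercase()))
                .count();
            (score, *doc)
        })
        .collect();
    scored.sort_by(|a, b| b.0.cmp(&a.0)); // highest overlap first
    scored.into_iter().take(top_k).map(|(_, d)| d).collect()
}

fn main() {
    let docs = [
        "3 years of Rust backend experience with Actix Web",
        "Hobby: playing guitar",
    ];
    // Attach the retrieved documents to the prompt before asking the LLM.
    let context = retrieve("Rust backend engineer", &docs, 1).join("\n");
    let prompt = format!("Context:\n{context}\n\nQuestion: Suggest a resume bullet.");
    println!("{prompt}");
}
```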

For more information about RAG, please check here. I think this is the best one.

Vector Database

A Vector Database is a database designed to store data as vectors (series of numbers).

  • Inserting => Transform a document/chunk of text into a vector and store it in the database
  • Retrieving => Search for data by similarity/context and return it (similarity search)
Source: What is a Vector Database?
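The core of similarity search is a distance metric over those vectors; cosine similarity is a common choice. A minimal sketch in Rust (the toy “embeddings” below are made-up numbers, not real model output):

```rust
// Sketch of the similarity-search idea: documents and queries become
// vectors, and "similar" means a high cosine similarity score.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

fn main() {
    // Pretend these are embeddings of a query and two stored documents.
    let query = [1.0, 0.0, 1.0];
    let doc_related = [0.9, 0.1, 0.8]; // points in nearly the same direction
    let doc_unrelated = [0.0, 1.0, 0.0]; // orthogonal to the query
    println!("{:.3}", cosine_similarity(&query, &doc_related));
    println!("{:.3}", cosine_similarity(&query, &doc_unrelated));
}
```

A vector database such as Qdrant does essentially this, but with indexes that make the search fast over millions of vectors.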

Qdrant

Qdrant is a vector database used for similarity search, written in Rust 🦀.

Actix Web

Actix Web is a web framework for the Rust language; in other words, it lets you write your backend in Rust. It allows developers to create a backend for Web/Mobile/Desktop frontends.

Dioxus

Dioxus is a library that helps us create applications on desktop, web, and mobile (as they say), inspired by React.

LlamaEdge

They say it’s the easiest, smallest, and fastest local LLM runtime and API server. That is, we use it to run a local LLM. LlamaEdge also provides an API server that exposes the model through OpenAI-compatible REST APIs.
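Because the server is OpenAI-compatible, the request body is the familiar chat-completions shape. For illustration, a request might look like this (the exact endpoint path and default port depend on how you start the server; the model name should match the one you register with the API server):

```json
{
  "model": "Llama-3-8B",
  "messages": [
    { "role": "system", "content": "You are a resume coach." },
    { "role": "user", "content": "Improve this bullet point: 'did backend stuff'." }
  ]
}
```

This is what makes LlamaEdge easy to swap in: any client library that speaks the OpenAI API can talk to your local model.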

Let’s begin!

Run LlamaEdge locally

Please follow this article on how to run WASM locally: Getting Started with Llama-3-8B.

Below are only the steps required for this project.

  1. Install WasmEdge via the following command line.
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml

2. Download the Llama-3-8B model GGUF file.

curl -LO https://huggingface.co/second-state/Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q5_K_M.gguf

3. Download an API server app. It is also a cross-platform portable Wasm app that can run on many CPU and GPU devices. This API server is OpenAI-compatible.

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm

4. Download the chatbot web UI to interact with the model with a chatbot UI.

curl -LO https://github.com/LlamaEdge/chatbot-ui/releases/latest/download/chatbot-ui.tar.gz
tar xzf chatbot-ui.tar.gz
rm chatbot-ui.tar.gz

5. Start an API server for the model.

wasmedge --dir .:. --nn-preload default:GGML:AUTO:Meta-Llama-3-8B-Instruct-Q5_K_M.gguf \
  llama-api-server.wasm \
  --prompt-template llama-3-chat \
  --ctx-size 4096 \
  --model-name Llama-3-8B

Run Qdrant by container approach

Whatever your approach, I would suggest using container technology to speed up building your playground.

Standalone container

docker run -p 6333:6333 -p 6334:6334 \
-v $(pwd)/qdrant_storage:/qdrant/storage:z \
qdrant/qdrant

Compose file (I used this approach)

podman compose up -d

Here my compose.yml
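My actual compose file lives in the repository; a minimal equivalent of the standalone `docker run` command above might look something like this:

```yaml
services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - "6333:6333"   # REST API
      - "6334:6334"   # gRPC API
    volumes:
      - ./qdrant_storage:/qdrant/storage:z
```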

Run Actix Web

Run the command below to start the dev server locally.

cargo watch -x "run --bin server"

Code

Here’s a brief explanation of the code

  1. When starting the backend application, it loads the configuration into AppState and creates a collection for storing documents.
  2. The exposed port depends on the value in Setting.toml.
  3. The API for uploading documents accepts Multipart, then converts the content into a vector via langchain-rust and puts it in the database.
  4. The API for resume suggestions accepts a Path as the job title and the Body as the job description (Content-Type: text/plain). It searches for experiences related to the job title and uses the job description to ask Llama 3.
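Step 4’s flow, stripped of the Actix Web and Qdrant plumbing, can be sketched like this. The function names are my own for illustration, not the project’s actual code; in the real service the search is a vector similarity query and the final prompt goes to LlamaEdge over HTTP.

```rust
// Sketch of the resume-suggestion flow: find experiences related to the
// job title, then build one prompt from title, description, and matches.
// Matching here is naive substring search, standing in for vector search.
fn find_related_experiences<'a>(job_title: &str, experiences: &[&'a str]) -> Vec<&'a str> {
    experiences
        .iter()
        .filter(|e| {
            job_title
                .split_whitespace()
                .any(|w| e.to_lowercase().contains(&w.to_lowercase()))
        })
        .copied()
        .collect()
}

fn build_suggestion_prompt(job_title: &str, job_description: &str, experiences: &[&str]) -> String {
    format!(
        "Job title: {job_title}\nJob description: {job_description}\n\
         Relevant experiences:\n{}\n\
         Suggest resume content tailored to this job.",
        experiences.join("\n")
    )
}

fn main() {
    let experiences = ["Built REST APIs in Rust", "Organized a charity run"];
    let related = find_related_experiences("Rust Backend Engineer", &experiences);
    let prompt = build_suggestion_prompt(
        "Rust Backend Engineer",
        "We need an Actix Web expert.",
        &related,
    );
    println!("{prompt}");
}
```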

Example of backend response — resume suggestion


Run Dioxus desktop application

We want to have a frontend to interact with so we can see the complete flow of work. You can run it with this command.

dx serve --hot-reload --platform desktop

Note — In this case, it’s not necessary to run Tailwind because I’ve committed all the CSS. But if you want to modify Tailwind, follow these instructions instead.

Simple desktop application with 2 pages

Code

The way to make an enhancement

Here are some suggestions for enhancements, divided into different flows

General

Upload experience enhancement suggestion

  • If you use PDF, it could cover many use-cases. On the backend, you could switch to using PDFLoader.
  • From this design, consider making it asynchronous. For example: the user uploads a file => the file is forwarded elsewhere => the API responds immediately => another workload upserts the document into the vector database.
Enhancement upload flow to asynchronous approach by message queue
  • You can use user identity to separate related documents; we can filter through metadata, right? And if users consent to sharing their experience data, we could give better resume suggestions in the tips section.

Generate new resume content enhancement

  • The current result is in Markdown, but Dioxus doesn’t support Markdown on desktop yet. So, if anyone builds a web frontend, there should be an easy way to display Markdown, and it would probably look better.
  • You can also make it an asynchronous approach. It could be like, we submit a request and then notify the result when it’s done. For example, send an Email or Push notification and have the result in the Inbox in our app.
  • I think there could be a prompt template that makes the result better.

Conclusions

How’s it going with this project? Honestly, it was built with the RAG mindset as the main focus. But since I’ve already started it, why not do the whole thing, not just the Backend?

The results are:

  • I’ve practiced writing Rust more.
  • I’ve reviewed Actix Web.
  • I’ve tried playing with Qdrant.
  • I’ve tried using Dioxus.
  • I’ve thought that it might be time to buy a new machine, just kidding! 😜

Anyway, if you got some benefit from this article, thank you very much. But if the project isn’t that interesting, I apologize.

Thank you for reading. See you in the next article. I wish everyone happiness 🙇‍♂️.

Facebook: Thanaphoom Babparn
FB Page: TP Coder
LinkedIn: Thanaphoom Babparn
Website: TP Coder — Portfolio
