LLMs are incredibly powerful. They can generate human-like text, answer questions from their vast knowledge base, and process complex language, which lets them hold conversations and produce quality content. LLMs, however, have one major limitation: they only have access to the data on which they were trained.

This limits how much a language model can be customized with specific information. An LLM can’t provide accurate information about events that happened after it was trained, and it can’t speak to niche topics that weren’t covered in its training data. And if you want the model to answer from specific, vetted sources, a general pre-trained language model has no way to restrict itself to them; it draws on the full breadth of whatever it has seen.

For example, suppose we ask an LLM for the score of last weekend’s UK football game; that information falls outside the model’s knowledge. Asking LLaMA resulted in the following:

“I’m happy to help! However, I need to clarify that the University of Kentucky does not have a football team. The University of Kentucky’s athletic teams are known as the Wildcats, and they compete in the Southeastern Conference (SEC) in sports such as basketball, baseball, and soccer. But they do not have a football team.

If you’re looking for information on a specific college football game, I’d be happy to help you with that! Please let me know the teams that played and I can try to find the score for you.”

The response above highlights another problem that comes with limited training information: hallucinations. Hallucinations occur when a language model generates information that is false, fabricated, or not grounded in any real-world data. They happen because the model predicts likely text sequences based on patterns in its training data rather than verifying facts. Essentially, the model doesn’t “know” about UK’s football team, so it can’t respond with accurate information about it.

So how can we get around this? We need to give the model access to information beyond what it was trained on. Ideally, we would be able to specify which information in particular the model should look at when answering a question. With RAG, this is possible.

Retrieval-Augmented Generation (RAG) allows us to store vast quantities of data in a vector database. Through a process called vectorization, each text document is converted into a long list of floating-point numbers: a vector. These vectors are then collected together into a database. While real vectors often have hundreds of dimensions (768 is a common size), think of them as having 2 dimensions and appearing on a coordinate plane. Suppose we vectorize our set of documents and it creates this graph:
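
To make that vectorization step concrete, here is a minimal sketch in Python. The library (sentence-transformers) and model name are illustrative choices of ours, not tools named in this post; any embedding model that maps text to fixed-length vectors plays the same role.

```python
# Minimal sketch of vectorization: each document becomes a fixed-length
# vector of floating-point numbers. The library and model are assumptions
# made for illustration, not the specific tools used in our projects.
from sentence_transformers import SentenceTransformer

documents = [
    "Kentucky's football schedule for the upcoming season.",
    "A guide to soil testing from the College of Agriculture.",
    "Basketball season ticket information for students.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # produces 384-dimensional vectors

doc_vectors = model.encode(documents)  # numpy array of shape (3, 384)
print(doc_vectors.shape)
```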

Then, when we want to query our database, we take our query, or a piece of our query, and pass it through the same vectorization process performed on the documents. We can then compare it with the rest of the database:

Next, we find the closest neighboring documents using a simple distance calculation:
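
Sticking with the two-dimensional picture, the retrieval step can be sketched with plain NumPy. The vectors and labels below are invented for illustration; in practice the query is embedded with the same model used for the documents, and the arithmetic is identical in higher dimensions.

```python
# Toy retrieval: compare the vectorized query against every document vector
# and keep the nearest neighbors. The 2-D vectors are made up for illustration.
import numpy as np

doc_vectors = np.array([
    [0.9, 0.1],   # e.g. "football schedule"
    [0.8, 0.2],   # e.g. "game day parking"
    [0.1, 0.9],   # e.g. "soil testing guide"
    [0.2, 0.8],   # e.g. "crop rotation tips"
])
query_vector = np.array([0.85, 0.15])  # the vectorized user question

# Simple Euclidean distance from the query to every document vector.
distances = np.linalg.norm(doc_vectors - query_vector, axis=1)

# The smallest distances mark the most semantically related documents.
top_k = np.argsort(distances)[:2]
print(top_k, distances[top_k])
```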

And now, we’ve found the 4 most semantically related documents for our query. We can inject these documents into the context window of our LLM to provide it with the background information it needs to respond to our query. We have taken our large database of information and narrowed it down to only what is most relevant to the user’s input.
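
The injection step itself usually amounts to building a prompt. The layout below is one reasonable way to do it, not a prescribed format; the retrieved snippets and the commented-out llm.generate call are hypothetical placeholders.

```python
# Sketch of context injection: retrieved documents are placed in the prompt
# ahead of the user's question. The snippets and the llm.generate call are
# hypothetical, shown only to illustrate the idea.
retrieved_docs = [
    "Recap: the Wildcats won last Saturday's home game at Kroger Field.",
    "Kickoff for next week's game is scheduled for 7:30 PM.",
]
user_question = "What was the score of the UK football game last weekend?"

context = "\n\n".join(retrieved_docs)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {user_question}"
)

# response = llm.generate(prompt)  # hypothetical LLM client call
print(prompt)
```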

This comes with the benefit of not filling the context window with unnecessary information, which could confuse the LLM and increase cost as more tokens are processed. It also allows us to maintain vast stores of knowledge that could never all fit in the model’s context window. These vector databases are not limited in size (except by an increasing search time), so we can store an expansive information set that never needs to be trained into the model. We can also easily add and remove elements from our vector databases as new information becomes relevant or old information becomes stale.
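
Keeping the store current is mostly a matter of inserting and deleting vectors. The in-memory dictionary below stands in for a real vector database (dedicated tools such as FAISS or Chroma expose similar add and delete operations), and the embed() stub is a placeholder rather than a real model.

```python
# Sketch of maintaining the vector store: add documents as they become
# relevant and delete them once they are stale, with no retraining involved.
# The dictionary stands in for a real vector database; embed() is a stub.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for a real embedding model; returns a fake 2-D vector."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(2)

store: dict[str, np.ndarray] = {}

# Add new documents as they become relevant.
store["2025-football-schedule.pdf"] = embed("2025 football schedule")
store["2019-dining-menu.pdf"] = embed("2019 dining hall menu")

# Remove documents once their information is out of date.
del store["2019-dining-menu.pdf"]

print(list(store))
```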


Summary

CAAI uses Retrieval-Augmented Generation techniques across a variety of projects. In one case, RAG is used in a chatbot designed for the KY Agriculture Extension Network to provide resources from the university’s College of Agriculture webpages and its database of PDF resources. In another project, RAG is implemented to ensure accurate nutritional information is given to users, empowering them to make healthier decisions about food.

RAG is a powerful technique that allows us to retrieve highly relevant information from external sources, whether they be databases or documents, and supplement generated responses with that data. The result is a model that produces more accurate and contextually aware responses. Having RAG in our toolkit allows us to turn around projects with such requirements in a quick, straightforward manner. If you’re interested in collaborating or want to learn more about these tools, please reach out.