Building RAG Applications with Autogen and LlamaIndex: A Beginner's Guide
Have you ever wondered how to create a chatbot that can answer questions based on specific documents or data? Enter the world of RAG (Retrieval-Augmented Generation) applications! In this tutorial, I'll walk you through building a RAG chatbot using Autogen and LlamaIndex, two powerful tools in the AI developer's toolkit. I have previously written about building chatbots with Autogen; you can read about it here.
If you are in a rush and just want to see the code, head over to GitHub.
What is RAG?
Before we dive into the code, let's understand what RAG is. Retrieval-Augmented Generation is a technique that combines the power of large language models with the ability to retrieve specific information from a knowledge base. This means our chatbot can provide answers based on both its general knowledge and the specific data we provide.
We store our data in a vector store; in this tutorial we'll use ChromaDB. When a user makes a query, we search the vector store for similar documents and send them, together with the user's question, in the prompt. This gives the LLM the ability to incorporate our data into its answer.
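Conceptually, the flow looks like this (a pseudocode-level sketch with illustrative names, not code from any specific library):

def answer_with_rag(user_question, vector_store, llm):
    # 1. Retrieve: find stored documents similar to the question
    context_docs = vector_store.similarity_search(user_question)
    # 2. Augment: combine the retrieved context with the question
    prompt = f"Context: {context_docs}\n\nQuestion: {user_question}"
    # 3. Generate: the LLM answers using both its training and our data
    return llm.generate(prompt)

The rest of this tutorial builds exactly this loop with real libraries.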
Setting Up Our Environment
Before we start building the RAG chatbot, let's set up our development environment.
Python Installation
If you don't have Python installed, visit the official Python website and follow the installation instructions for your operating system.
Installing the dependencies
We'll use PyAutoGen, the Python package for AutoGen, along with other dependencies that will make developing our chatbot easier:
pip install pyautogen groq llama-index chromadb python-dotenv llama-index-vector-stores-chroma
Getting the OPENAI_API_KEY
By default LlamaIndex uses text-embedding-ada-002, OpenAI's default embedding model. We need an OPENAI_API_KEY to generate the embeddings that will be stored in the ChromaDB vector database. Head over to https://platform.openai.com/api-keys and grab an API key. If you don't have an account, you will need to register first. You may need to buy $5 worth of credits to get started, and if your account is new you might qualify for some free credits.
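If you prefer to be explicit rather than rely on the default, LlamaIndex lets you pin the embedding model globally via its Settings object. A minimal sketch, assuming the llama-index-embeddings-openai integration that ships with the llama-index package:

from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding

# Explicitly pin the embedding model used for indexing and querying
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")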
Getting the GROQ_API_KEY
Groq will provide our language model. Follow these steps:
- Go to https://console.groq.com/ and create an account
- Navigate to the API keys section
- Create a new API key for the project
With these steps completed, your environment should be ready for building our RAG chatbot!
Building our RAG chatbot
Let's create our RAG chatbot step by step. Create a project directory named basic-rag-llamaindex-autogen. Inside this directory, create a .env file to store our API keys:
GROQ_API_KEY=gsk_secret_key_xxxxxxxxxxxxxxxxxxx
OPENAI_API_KEY=sk-proj-secret-key-xxxxxxxxxxxxx
Import Required Libraries
Create a file named rag-chatbot.py in your project directory. We'll build our chatbot in this file.
First, let's import the necessary libraries:
import os

from dotenv import load_dotenv
from autogen import ConversableAgent
import chromadb
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore

# Load API keys from the .env file into the environment
load_dotenv()
These imports set us up with Autogen for creating our chatbot, LlamaIndex for managing our document index, and ChromaDB for our vector store. The load_dotenv() call loads our environment variables from the .env file using the dotenv library.
Adding documents
Create a folder named documents in the root of your project directory. This is where you will place the documents you want added to the vector database index. SimpleDirectoryReader can read many formats, including Markdown, PDFs, Word documents, PowerPoint decks, images, audio, and video. We will configure the directory name in the next section.
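If you want to restrict which files get picked up, SimpleDirectoryReader accepts a required_exts filter. A small sketch (the extensions shown are just an example):

from llama_index.core import SimpleDirectoryReader

# Only load .md and .pdf files from the documents folder
reader = SimpleDirectoryReader("./documents", required_exts=[".md", ".pdf"])
documents = reader.load_data()
print(f"Loaded {len(documents)} documents")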
Creating Our Document Index
The heart of our RAG application is the document index. This is where we store and retrieve information. Let's break down the initialize_index() function:
def initialize_index():
    # Persist vectors to disk so we don't re-embed documents on every run
    db = chromadb.PersistentClient(path="./chroma_db")
    chroma_collection = db.get_or_create_collection("my-docs-collection")
    vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)

    if chroma_collection.count() > 0:
        # The collection already holds embeddings, so reuse them
        print("Loading existing index...")
        return VectorStoreIndex.from_vector_store(
            vector_store, storage_context=storage_context
        )
    else:
        # First run: read the documents folder and embed its contents
        print("Creating new index...")
        documents = SimpleDirectoryReader("./documents").load_data()
        return VectorStoreIndex.from_documents(
            documents, storage_context=storage_context
        )

index = initialize_index()
query_engine = index.as_query_engine()
This function does a few key things:
- It sets up a ChromaDB client to store our vectors persistently.
- It checks if we have an existing index. If so, it loads it; if not, it creates a new one from the files in our documents directory.
- It returns a VectorStoreIndex, which we'll use to query our documents. A quick way to test it is shown below.
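Before wiring the index into the chatbot, you can sanity-check it by querying the query engine directly (the question here is a placeholder for something your documents actually cover):

# Quick sanity check: ask the index a question about your documents
response = query_engine.query("What topics do my documents cover?")
print(response)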
Generating the prompt
The create_prompt() function is where the magic happens. It takes a user's input, queries our document index, and creates a prompt for our chatbot:
def create_prompt(user_input):
    # Retrieve relevant context from the document index
    result = query_engine.query(user_input)

    prompt = f"""
Your Task: Provide a concise and informative response to the user's query, drawing on the provided context.
Context: {result}
User Query: {user_input}
Guidelines:
1. Relevance: Focus directly on the user's question.
2. Conciseness: Avoid unnecessary details.
3. Accuracy: Ensure factual correctness.
4. Clarity: Use clear language.
5. Contextual Awareness: Use general knowledge if context is insufficient.
6. Honesty: State if you lack information.
Response Format:
- Direct answer
- Brief explanation (if necessary)
- Citation (if relevant)
- Conclusion
"""
    return prompt
This function queries our index with the user's input, then constructs a prompt that includes the retrieved information and guidelines for how to respond.
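If you want to see which chunks were retrieved for a given question (useful for the citation guideline in the prompt), the Response object returned by query_engine.query() exposes them via source_nodes. A minimal sketch, with a placeholder question:

# Inspect the chunks LlamaIndex retrieved, with similarity scores and file metadata
result = query_engine.query("What topics do my documents cover?")
for source_node in result.source_nodes:
    print(source_node.score, source_node.node.metadata.get("file_name"))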
Setting Up Our Chatbot
We use Autogen's ConversableAgent to create our chatbot:
llm_config = {
    "config_list": [
        {
            # Llama 3.1 8B served by Groq
            "model": "llama-3.1-8b-instant",
            "api_key": os.getenv("GROQ_API_KEY"),
            "api_type": "groq",
        }
    ]
}

rag_agent = ConversableAgent(
    name="RAGbot",
    system_message="You are a RAG chatbot",
    llm_config=llm_config,
    code_execution_config=False,  # this agent only chats, it never runs code
    human_input_mode="NEVER",     # replies are generated without human input
)
Here, we're using the Llama 3.1 model via the Groq API. You'll need to set up your API key in a .env file for this to work.
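You can give the agent a quick smoke test before building the full loop, using the same generate_reply call we'll use later (a sketch; depending on your Autogen version, the return value may be a string or a dict):

# One-off test of the agent outside the chat loop
reply = rag_agent.generate_reply(
    messages=[{"content": "Say hello in one sentence.", "role": "user"}]
)
print(reply)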
Running Our Chatbot
Finally, we have our main loop:
def main():
    print("Welcome to RAGbot! Type 'exit', 'quit', or 'bye' to end the conversation.")
    while True:
        user_input = input("\nUser: ")
        if user_input.lower() in ["exit", "quit", "bye"]:
            print("Goodbye! Have a great day!!")
            break
        # Build a context-augmented prompt and let the agent answer it
        prompt = create_prompt(user_input)
        reply = rag_agent.generate_reply(messages=[{"content": prompt, "role": "user"}])
        print(f"\nRAGbot: {reply['content']}")

if __name__ == "__main__":
    main()
This sets up a conversation loop where the user can input questions, and our RAGbot will respond based on the information in our document index and its general knowledge.
Talking to your RAG chatbot
To start the chatbot, navigate to your project directory in the terminal and run:
python rag-chatbot.py
Conclusion
We've built a RAG application that can answer questions based on specific documents and general knowledge. This is just the beginning – you can expand on this by adding more documents to your index, fine-tuning the prompt, or even adding multiple agents for more complex interactions. The power of RAG lies in its ability to combine specific knowledge with the general capabilities of large language models. This makes it an incredibly versatile tool for building knowledgeable AI assistants, customized chatbots, and much more.
Remember, the code is available on GitHub. If you have any questions, you can reach out to me on X (formerly Twitter) or LinkedIn. Happy coding!