Building RAG Applications with Autogen and LlamaIndex: A Beginner's Guide
Have you ever wondered how to create a chatbot that can answer questions based on specific documents or data? Enter the world of RAG (Retrieval-Augmented Generation) applications! In this tutorial, I'll walk you through building a RAG chatbot using Autogen and LlamaIndex, two powerful tools in the AI developer's toolkit. I have previously written about building chatbots with Autogen; you can read about it here.
If you are in a rush and just want to see the code, head over to GitHub.
What is RAG?
Before we dive into the code, let's understand what RAG is. Retrieval-Augmented Generation is a technique that combines the power of large language models with the ability to retrieve specific information from a knowledge base. This means our chatbot can provide answers based on both its general knowledge and the specific data we provide.
We store our data in a vector store; in this tutorial we'll use ChromaDB. When a user makes a query, we search the vector store for similar documents and send them, together with the user's question, in the prompt. This gives the LLM the ability to incorporate our data into its answer.
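Conceptually, the flow looks like this (a pseudocode-level sketch with illustrative names, not code from any specific library):

def answer_with_rag(user_question, vector_store, llm):
    # 1. Retrieve: find stored documents similar to the question
    context_docs = vector_store.similarity_search(user_question)
    # 2. Augment: combine the retrieved context with the question
    prompt = f"Context: {context_docs}\n\nQuestion: {user_question}"
    # 3. Generate: the LLM answers using both its training and our data
    return llm.generate(prompt)

The rest of this tutorial builds exactly this loop with real libraries.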
Setting Up Our Environment
Before we start building the RAG chatbot, let's set up our development environment.
Python Installation
If you don't have Python installed, visit the official Python website and follow the installation instructions for your operating system.
Installing the dependencies
We'll use PyAutoGen, the Python package for AutoGen, along with other dependencies that will make developing our chatbot easier:
pip install pyautogen groq llama-index chromadb python-dotenv llama-index-vector-stores-chroma
Getting the OPENAI_API_KEY
By default LlamaIndex uses text-embedding-ada-002, OpenAI's default embedding model. We need an OPENAI_API_KEY to generate the embeddings that will be stored in the ChromaDB vector database. Head over to https://platform.openai.com/api-keys and grab an API key. If you don't have an account, you will need to register first. You may need to buy $5 worth of credits to get started, and if your account is new you might qualify for some free credits.
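If you prefer to be explicit rather than rely on the default, LlamaIndex lets you pin the embedding model globally via its Settings object. A minimal sketch, assuming the llama-index-embeddings-openai integration that ships with the llama-index package:

from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding

# Explicitly pin the embedding model used for indexing and querying
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")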
Getting the GROQ_API_KEY
Groq will provide our language model. Follow these steps:
- Go to https://console.groq.com/ and create an account
- Navigate to the API keys section
- Create a new API key for the project
With these steps completed, your environment should be ready for building our RAG chatbot!
Building our RAG chatbot
Let's create our RAG chatbot step by step. Create a project directory named basic-rag-llamaindex-autogen. Inside this directory, create a .env file to store our API keys:
GROQ_API_KEY=gsk_secret_key_xxxxxxxxxxxxxxxxxxx
OPENAI_API_KEY=sk-proj-secret-key-xxxxxxxxxxxxx
Import Required Libraries
Create a file named rag-chatbot.py in your project directory. We'll build our chatbot in this file.
First, let's import the necessary libraries:
import os

from dotenv import load_dotenv
from autogen import ConversableAgent
import chromadb
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore

# Load API keys from the .env file into the environment
load_dotenv()
These imports set us up with Autogen for creating our chatbot, LlamaIndex for managing our document index, and ChromaDB for our vector store. The load_dotenv() call loads our environment variables from the .env file using the dotenv library.
Adding documents
Create a folder named documents in the root of your project directory. This is where you will place the documents you want added to the vector database index. SimpleDirectoryReader can read many formats, including Markdown, PDFs, Word documents, PowerPoint decks, images, audio, and video. We will configure the directory name in the next section.
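If you want to restrict which files get picked up, SimpleDirectoryReader accepts a required_exts filter. A small sketch (the extensions shown are just an example):

from llama_index.core import SimpleDirectoryReader

# Only load .md and .pdf files from the documents folder
reader = SimpleDirectoryReader("./documents", required_exts=[".md", ".pdf"])
documents = reader.load_data()
print(f"Loaded {len(documents)} documents")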
Creating Our Document Index
The heart of our RAG application is the document index. This is where we store and retrieve information. Let's break down the initialize_index() function:
def initialize_index():
    # Persist vectors to disk so we don't re-embed documents on every run
    db = chromadb.PersistentClient(path="./chroma_db")
    chroma_collection = db.get_or_create_collection("my-docs-collection")
    vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)

    if chroma_collection.count() > 0:
        # The collection already holds embeddings, so reuse them
        print("Loading existing index...")
        return VectorStoreIndex.from_vector_store(
            vector_store, storage_context=storage_context
        )
    else:
        # First run: read the documents folder and embed its contents
        print("Creating new index...")
        documents = SimpleDirectoryReader("./documents").load_data()
        return VectorStoreIndex.from_documents(
            documents, storage_context=storage_context
        )

index = initialize_index()
query_engine = index.as_query_engine()
This function does a few key things:
- It sets up a ChromaDB client to store our vectors persistently.
- It checks if we have an existing index. If so, it loads it; if not, it creates a new one from the files in our documents directory.
- It returns a VectorStoreIndex, which we'll use to query our documents. A quick way to test it is shown below.
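Before wiring the index into the chatbot, you can sanity-check it by querying the query engine directly (the question here is a placeholder for something your documents actually cover):

# Quick sanity check: ask the index a question about your documents
response = query_engine.query("What topics do my documents cover?")
print(response)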
Generating the prompt
The create_prompt() function is where the magic happens. It takes a user's input, queries our document index, and creates a prompt for our chatbot:
def create_prompt(user_input):
    # Retrieve relevant context from the document index
    result = query_engine.query(user_input)

    prompt = f"""
Your Task: Provide a concise and informative response to the user's query, drawing on the provided context.
Context: {result}
User Query: {user_input}
Guidelines:
1. Relevance: Focus directly on the user's question.
2. Conciseness: Avoid unnecessary details.
3. Accuracy: Ensure factual correctness.
4. Clarity: Use clear language.
5. Contextual Awareness: Use general knowledge if context is insufficient.
6. Honesty: State if you lack information.
Response Format:
- Direct answer
- Brief explanation (if necessary)
- Citation (if relevant)
- Conclusion
"""
    return prompt
This function queries our index with the user's input, then constructs a prompt that includes the retrieved information and guidelines for how to respond.
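If you want to see which chunks were retrieved for a given question (useful for the citation guideline in the prompt), the Response object returned by query_engine.query() exposes them via source_nodes. A minimal sketch, with a placeholder question:

# Inspect the chunks LlamaIndex retrieved, with similarity scores and file metadata
result = query_engine.query("What topics do my documents cover?")
for source_node in result.source_nodes:
    print(source_node.score, source_node.node.metadata.get("file_name"))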
Setting Up Our Chatbot
We use Autogen's ConversableAgent to create our chatbot:
llm_config = {
    "config_list": [
        {
            # Llama 3.1 8B served by Groq
            "model": "llama-3.1-8b-instant",
            "api_key": os.getenv("GROQ_API_KEY"),
            "api_type": "groq",
        }
    ]
}

rag_agent = ConversableAgent(
    name="RAGbot",
    system_message="You are a RAG chatbot",
    llm_config=llm_config,
    code_execution_config=False,  # this agent only chats, it never runs code
    human_input_mode="NEVER",     # replies are generated without human input
)
Here, we're using the Llama 3.1 model via the Groq API. You'll need to set up your API key in a .env file for this to work.
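You can give the agent a quick smoke test before building the full loop, using the same generate_reply call we'll use later (a sketch; depending on your Autogen version, the return value may be a string or a dict):

# One-off test of the agent outside the chat loop
reply = rag_agent.generate_reply(
    messages=[{"content": "Say hello in one sentence.", "role": "user"}]
)
print(reply)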
Running Our Chatbot
Finally, we have our main loop:
def main():
    print("Welcome to RAGbot! Type 'exit', 'quit', or 'bye' to end the conversation.")
    while True:
        user_input = input("\nUser: ")
        if user_input.lower() in ["exit", "quit", "bye"]:
            print("Goodbye! Have a great day!!")
            break
        # Build a context-augmented prompt and let the agent answer it
        prompt = create_prompt(user_input)
        reply = rag_agent.generate_reply(messages=[{"content": prompt, "role": "user"}])
        print(f"\nRAGbot: {reply['content']}")

if __name__ == "__main__":
    main()
This sets up a conversation loop where the user can input questions, and our RAGbot will respond based on the information in our document index and its general knowledge.
Talking to your RAG chatbot
To start the chatbot, navigate to your project directory in the terminal and run:
python rag-chatbot.py
Conclusion
We've built a RAG application that can answer questions based on specific documents and general knowledge. This is just the beginning – you can expand on this by adding more documents to your index, fine-tuning the prompt, or even adding multiple agents for more complex interactions. The power of RAG lies in its ability to combine specific knowledge with the general capabilities of large language models. This makes it an incredibly versatile tool for building knowledgeable AI assistants, customized chatbots, and much more.
Remember, the code is available on GitHub. If you have any questions, you can reach out to me on X (formerly Twitter) or LinkedIn. Happy coding!