A Beginner's Guide to LlamaIndex Workflows
LLM-powered applications and AI agents have taken the world by storm in the last two years. At the heart of many of them is Retrieval-Augmented Generation (RAG), a powerful technique for enhancing a model's capabilities by providing contextually relevant information from external sources. In my last post, I explained how to create a RAG application using LlamaIndex and AutoGen. In this post, I will introduce the concept of workflows in LlamaIndex and refactor that application (which you can find on GitHub) to use them.
If you are in a rush and just want to see the code, head over to GitHub.
What are Workflows in LlamaIndex?
A workflow is essentially an event-driven sequence of steps executed in a predefined order to achieve a task. In the context of LlamaIndex, workflows allow you to define a structured process that spans several steps—each step being responsible for a particular aspect of the task (such as setting up a database or generating a response from a model).
This approach helps in organizing your application logic, making it easier to extend and maintain, particularly when working with multiple steps such as:
- Retrieving relevant documents
- Generating a prompt for the LLM
- Producing a final reply to the user.
With workflows, you can define the flow of your RAG application in a modular, easy-to-follow manner. You might be wondering why use workflows in the first place. The main reason is that as LLM-powered applications grow in complexity, it becomes harder to manage the flow of data and control how the application executes. Workflows provide a mechanism to break complex applications down into smaller, more manageable steps.
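To make this concrete, here is a minimal sketch of a two-step workflow. The event and step names are illustrative only; the real application we build below follows the same pattern with RAG-specific steps:
from llama_index.core.workflow import Workflow, step, StartEvent, StopEvent, Event

class GreetEvent(Event):
    name: str

class HelloFlow(Workflow):
    @step
    async def receive(self, ev: StartEvent) -> GreetEvent:
        # StartEvent carries the keyword arguments passed to run()
        return GreetEvent(name=ev.name)

    @step
    async def greet(self, ev: GreetEvent) -> StopEvent:
        # Returning a StopEvent ends the workflow; its result is what run() returns
        return StopEvent(result=f"Hello, {ev.name}!")
Each step declares which event it consumes and which event it emits, and LlamaIndex wires the steps together from those type hints. Calling await HelloFlow().run(name="Ada") would return "Hello, Ada!".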
Setting Up Our Environment
If you are new to Python and LLM applications, you need to do a few things before getting started. You can follow the instructions in this previous post. Essentially, you need to get two API keys: OPENAI_API_KEY and GROQ_API_KEY.
I advise you to go through the linked blog post so that you get a true appreciation of how workflows work and why we need them.
Installing the dependencies
We'll use PyAutoGen, the Python package for AutoGen, along with other dependencies that will make the development of our chatbot easier (asyncio ships with the Python standard library, so it does not need to be installed separately):
pip install pyautogen groq llama-index chromadb python-dotenv llama-index-vector-stores-chroma
Building the Workflow application
Let's create our RAG chatbot step by step. Create a project directory named workflow-rag-llamaindex-autogen. Inside this directory, create a .env file to store our API keys:
GROQ_API_KEY=gsk_secret_key_xxxxxxxxxxxxxxxxxxx
OPENAI_API_KEY=sk-proj-secret-key-xxxxxxxxxxxxx
Import Required Libraries
Create a file named rag-chatbot.py in your project directory. We'll build our chatbot in this file.
First, let's import the necessary libraries:
import os
from dotenv import load_dotenv
from chromadb import Collection, PersistentClient
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.core.workflow import (
    StartEvent,
    StopEvent,
    Workflow,
    step,
    Event,
    Context,
)
from llama_index.vector_stores.chroma import ChromaVectorStore
from autogen import ConversableAgent
load_dotenv()
This code block sets up the environment and imports the libraries needed for our RAG system.
These lines import the modules and classes needed for the RAG system:
- os: for interacting with the operating system
- load_dotenv: to load environment variables from a .env file
- chromadb: a vector database for storing and retrieving embeddings
- llama_index: a framework for building RAG applications
- autogen: the LLM agent provider
load_dotenv()
This line loads environment variables from a .env file into the system's environment variables. This is a common practice for managing configuration and secrets.
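If you want to confirm the keys were picked up, a quick optional check after load_dotenv() looks like this:
import os

# Fail early if either key is missing from the environment
for key in ("GROQ_API_KEY", "OPENAI_API_KEY"):
    if not os.getenv(key):
        raise RuntimeError(f"Missing {key} - check your .env file")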
Adding documents
Create a folder named documents in the root of your project directory. This is where you will place the documents you want added to the vector database index. SimpleDirectoryReader can read many formats, including Markdown, PDFs, Word documents, PowerPoint decks, images, audio, and video. We will reference this directory when we define the workflow's setup step.
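As an aside, if you only want certain formats indexed, SimpleDirectoryReader can filter by file extension; the extensions below are just an illustration:
from llama_index.core import SimpleDirectoryReader

# Load only Markdown and PDF files from the documents folder
documents = SimpleDirectoryReader(
    "./documents", required_exts=[".md", ".pdf"]
).load_data()
print(f"Loaded {len(documents)} documents")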
Setting Up Our Agent
We use AutoGen's ConversableAgent to create our chatbot:
llm_config = {
    "config_list": [
        {
            "model": "llama-3.1-8b-instant",
            "api_key": os.getenv("GROQ_API_KEY"),
            "api_type": "groq",
        }
    ]
}

rag_agent = ConversableAgent(
    name="RAGbot",
    system_message="You are a RAG chatbot",
    llm_config=llm_config,
    code_execution_config=False,
    human_input_mode="NEVER",
)
Here, we're using the Llama 3.1 model via the Groq API. You'll need to set up your API key in a .env file for this to work.
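You can sanity-check the agent on its own before wiring it into the workflow, for example with a one-off call:
# A quick test to confirm the Groq-backed agent responds
test_reply = rag_agent.generate_reply(
    messages=[{"content": "Say hello in one sentence.", "role": "user"}]
)
print(test_reply)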
Setting up ChromaDB
For the vector store, we use Chroma, a vector database that stores the embeddings for document retrieval:
db = PersistentClient(path="./chroma_db")
chroma_collection: Collection = db.get_or_create_collection("my-docs-collection")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
This creates a connection to Chroma's persistent storage, where we can store and retrieve vectors representing document embeddings. The vector embeddings are created using OpenAI's text-embedding-ada-002 model, which is why you need an OPENAI_API_KEY.
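LlamaIndex uses this OpenAI embedding model by default, but you can swap it out through the Settings object if you prefer a different one. A minimal sketch, assuming the llama-index-embeddings-openai package that ships with llama-index:
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding

# Override the default embedding model (optional)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Keep in mind that documents and queries must be embedded with the same model, so changing it after documents have been indexed means rebuilding the index.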
Creating Custom Events
The custom events in this workflow define the points of interaction within the flow:
class SetupEvent(Event):
    query: str

class CreatePromptEvent(Event):
    query: str

class GenerateReplyEvent(Event):
    query: str
These events enable the passing of information between different steps in the workflow.
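Events are just typed data classes, so a step can attach whatever extra fields a downstream step needs. For instance, a hypothetical variation (not used in this tutorial) could carry the retrieved context alongside the query instead of stashing it in the workflow Context:
# Hypothetical richer event - not part of the workflow we build here
class RetrievedContextEvent(Event):
    query: str
    context_text: str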
Defining the RAG Workflow
This RAGFlow class defines a workflow for handling user queries using a RAG approach. It combines document indexing, information retrieval, and language model generation to provide informative and context-aware responses to user queries.
Here’s the heart of the application—defining the workflow steps:
class RAGFlow(Workflow):
    @step
    async def start(
        self, ctx: Context, ev: StartEvent
    ) -> SetupEvent | CreatePromptEvent:
        if chroma_collection.count() < 1:
            return SetupEvent(query=ev.query)
        print("Loading existing index...")
        index = VectorStoreIndex.from_vector_store(
            vector_store, storage_context=storage_context
        )
        await ctx.set("index", index)
        return CreatePromptEvent(query=ev.query)
This is the initial step of the workflow. It checks if there's an existing index:
- If the index is empty, it returns a SetupEvent to create a new index.
- If an index exists, it loads it and returns a CreatePromptEvent.
    @step
    async def setup(self, ctx: Context, ev: SetupEvent) -> StartEvent:
        print("Creating new index...")
        documents = SimpleDirectoryReader("./documents").load_data()
        index = VectorStoreIndex.from_documents(
            documents, storage_context=storage_context
        )
        await ctx.set("index", index)
        return StartEvent(query=ev.query)
In the setup step, we create a new index using the documents found in a local directory. This index allows us to perform searches and retrieve relevant information.
    @step
    async def create_prompt(
        self, ctx: Context, ev: CreatePromptEvent
    ) -> GenerateReplyEvent:
        index = await ctx.get("index")
        query_engine = index.as_query_engine()
        user_input = ev.query
        result = query_engine.query(user_input)
        prompt = f"""
        Your Task: Provide a concise and informative response to the user's query, drawing on the provided context.
        Context: {result}
        User Query: {user_input}
        Guidelines:
        1. Relevance: Focus directly on the user's question.
        2. Conciseness: Avoid unnecessary details.
        3. Accuracy: Ensure factual correctness.
        4. Clarity: Use clear language.
        5. Contextual Awareness: Use general knowledge if context is insufficient.
        6. Honesty: State if you lack information.
        Response Format:
        - Direct answer
        - Brief explanation (if necessary)
        - Citation (if relevant)
        - Conclusion
        """
        await ctx.set("prompt", prompt)
        return GenerateReplyEvent(query=ev.query)
The create_prompt step builds a detailed prompt for the LLM, ensuring that the model provides a response based on the context retrieved from the vector store.
This step:
- Retrieves the index
- Creates a query engine
- Queries the index with the user's input
- Constructs a detailed prompt for the language model

The prompt includes the context from the query result and guidelines for generating a response.
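The query engine is created with default settings here, but as_query_engine accepts parameters if you want to tune retrieval. For example, you could control how many chunks are pulled into the context (the value below is only an illustration):
# Retrieve the top 3 most similar chunks instead of the default
query_engine = index.as_query_engine(similarity_top_k=3)
result = query_engine.query(user_input)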
    @step
    async def generate_reply(self, ctx: Context, ev: GenerateReplyEvent) -> StopEvent:
        prompt = await ctx.get("prompt")
        reply = rag_agent.generate_reply(messages=[{"content": prompt, "role": "user"}])
        return StopEvent(result=reply["content"])
Finally, in the generate_reply step, the LLM generates a response based on the constructed prompt. The result is then passed back as a StopEvent, which marks the end of the workflow.
This final step:
- Retrieves the prompt created in the previous step
- Uses the rag_agent to generate a reply
- Returns a StopEvent with the generated content
Running the Workflow
main() is an asynchronous function that contains the main logic of the program. It creates a new RAGFlow object for each user input, setting a timeout of 10 seconds and enabling verbose mode, and then runs the RAG workflow asynchronously with the user's input as the query. The await keyword is used because this is an asynchronous operation. The main() function runs the workflow in a loop, continuously accepting user input until the conversation ends:
async def main():
    print("Welcome to RAGbot! Type 'exit', 'quit', or 'bye' to end the conversation.")
    while True:
        user_input = input("\nUser: ")
        if user_input.lower() in ["exit", "quit", "bye"]:
            print("Goodbye! Have a great day!!")
            break
        workflow = RAGFlow(timeout=10, verbose=True)
        reply = await workflow.run(query=user_input)
        print(f"\nRAGbot: {reply}")
This interactive loop allows users to query the bot in real time, and the workflow handles document retrieval, prompt creation, and response generation behind the scenes.
if __name__ == "__main__":
import asyncio
asyncio.run(main())
This block checks if the script is being run directly (not imported). If so, it imports the asyncio module and uses it to run the main() function asynchronously.
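If you ever want to call the workflow from other code rather than through the interactive loop, a single query can be run like this (the ask_once helper is just an illustration):
import asyncio

async def ask_once(question: str) -> str:
    # Each run creates a fresh workflow instance, just as main() does
    workflow = RAGFlow(timeout=10, verbose=False)
    return await workflow.run(query=question)

print(asyncio.run(ask_once("What topics do my documents cover?")))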
Talking to your RAG chatbot
To start the chatbot, navigate to your project directory in the terminal and run:
python rag-chatbot.py
Use Cases for Workflows
Workflows offer a structured, scalable way to develop applications that rely on multiple steps, such as retrieval and generation. Here are some key use cases:
- Document Retrieval Systems: Search and answer systems based on large document collections.
- Customer Support: Bots that can answer questions based on FAQs or internal knowledge bases.
- Research Assistants: Tools that can fetch relevant articles, papers, or research materials and summarize key insights.
By modularizing tasks into workflows, developers can build more robust, maintainable, and efficient RAG applications.
Conclusion
With LlamaIndex Workflows, you can break down complex tasks into manageable steps and organize your logic clearly. In this tutorial, we built a RAG-based chatbot that pulls data from a vector store and provides contextual, LLM-generated replies. By leveraging workflows, this process becomes intuitive and easy to follow.
The next steps for you might include adding more steps to the workflow to refine it, playing with the prompt to see if you can get better responses, and cleaning up the code to make it more reusable. Remember, the code is available on GitHub. If you have any questions, you can reach out to me on X (formerly Twitter) or LinkedIn. Happy coding!