A Beginner's Guide to LlamaIndex Workflows
LLM-powered applications and AI agents have taken the world by storm in the last two years. At the heart of many of them is Retrieval-Augmented Generation (RAG), a powerful technique for enhancing a model's capabilities by providing contextually relevant information from external sources. In my last post, I explained how to create a RAG application using LlamaIndex and AutoGen. In this post, I will introduce the concept of workflows in LlamaIndex and refactor that application (which you can find on GitHub) to use them.
If you are in a rush and just want to see the code, head over to GitHub.
What are Workflows in LlamaIndex?
A workflow is essentially an event-driven sequence of steps executed in a predefined order to achieve a task. In the context of LlamaIndex, workflows allow you to define a structured process that spans several steps—each step being responsible for a particular aspect of the task (such as setting up a database or generating a response from a model).
This approach helps in organizing your application logic, making it easier to extend and maintain, particularly when working with multiple steps such as:
- Retrieving relevant documents
- Generating a prompt for the LLM
- Producing a final reply to the user.
With workflows, you can define the flow of your RAG application in a modular, easy-to-follow manner. You might be wondering why use workflows in the first place. The main reason is that as LLM-powered applications grow in complexity, it becomes harder to manage the flow of data and control how the application executes. Workflows provide a mechanism to break complex applications down into smaller, more manageable steps.
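To make this concrete, here is a minimal sketch of a two-step workflow. The event and step names are illustrative only; the real application we build below follows the same pattern with RAG-specific steps:
from llama_index.core.workflow import Workflow, step, StartEvent, StopEvent, Event

class GreetEvent(Event):
    name: str

class HelloFlow(Workflow):
    @step
    async def receive(self, ev: StartEvent) -> GreetEvent:
        # StartEvent carries the keyword arguments passed to run()
        return GreetEvent(name=ev.name)

    @step
    async def greet(self, ev: GreetEvent) -> StopEvent:
        # Returning a StopEvent ends the workflow; its result is what run() returns
        return StopEvent(result=f"Hello, {ev.name}!")
Each step declares which event it consumes and which event it emits, and LlamaIndex wires the steps together from those type hints. Calling await HelloFlow().run(name="Ada") would return "Hello, Ada!".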
Setting Up Our Environment
If you are new to Python and LLM applications, you need to do a few things before getting started. You can follow the instructions in this previous post. Essentially, you need to get two API keys: OPENAI_API_KEY and GROQ_API_KEY.
I advise you to go through the linked blog post so that you get a true appreciation of how workflows work and why we need them.
Installing the dependencies
We'll use PyAutoGen, the Python package for AutoGen, along with other dependencies that will make the development of our chatbot easier (asyncio ships with the Python standard library, so it does not need to be installed separately):
pip install pyautogen groq llama-index chromadb python-dotenv llama-index-vector-stores-chroma
Building the Workflow application
Let's create our RAG chatbot step by step. Create a project directory named workflow-rag-llamaindex-autogen. Inside this directory, create a .env file to store our API keys:
GROQ_API_KEY=gsk_secret_key_xxxxxxxxxxxxxxxxxxx
OPENAI_API_KEY=sk-proj-secret-key-xxxxxxxxxxxxx
Import Required Libraries
Create a file named rag-chatbot.py in your project directory. We'll build our chatbot in this file.
First, let's import the necessary libraries:
import os
from dotenv import load_dotenv
from chromadb import Collection, PersistentClient
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.core.workflow import (
    StartEvent,
    StopEvent,
    Workflow,
    step,
    Event,
    Context,
)
from llama_index.vector_stores.chroma import ChromaVectorStore
from autogen import ConversableAgent
load_dotenv()
This code block sets up the environment and imports the libraries needed for our RAG system.
These lines import the modules and classes needed for the RAG system:
- os: for interacting with the operating system
- load_dotenv: to load environment variables from a .env file
- chromadb: a vector database for storing and retrieving embeddings
- llama_index: a framework for building RAG applications
- autogen: the LLM agent provider
load_dotenv()
This line loads environment variables from a .env file into the system's environment variables. This is a common practice for managing configuration and secrets.
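If you want to confirm the keys were picked up, a quick optional check after load_dotenv() looks like this:
import os

# Fail early if either key is missing from the environment
for key in ("GROQ_API_KEY", "OPENAI_API_KEY"):
    if not os.getenv(key):
        raise RuntimeError(f"Missing {key} - check your .env file")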
Adding documents
Create a folder named documents in the root of your project directory. This is where you will place the documents you want added to the vector database index. SimpleDirectoryReader can read many formats, including Markdown, PDFs, Word documents, PowerPoint decks, images, audio, and video. We will reference this directory when we define the workflow's setup step.
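As an aside, if you only want certain formats indexed, SimpleDirectoryReader can filter by file extension; the extensions below are just an illustration:
from llama_index.core import SimpleDirectoryReader

# Load only Markdown and PDF files from the documents folder
documents = SimpleDirectoryReader(
    "./documents", required_exts=[".md", ".pdf"]
).load_data()
print(f"Loaded {len(documents)} documents")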
Setting Up Our Agent
We use AutoGen's ConversableAgent to create our chatbot:
llm_config = {
    "config_list": [
        {
            "model": "llama-3.1-8b-instant",
            "api_key": os.getenv("GROQ_API_KEY"),
            "api_type": "groq",
        }
    ]
}

rag_agent = ConversableAgent(
    name="RAGbot",
    system_message="You are a RAG chatbot",
    llm_config=llm_config,
    code_execution_config=False,
    human_input_mode="NEVER",
)
Here, we're using the Llama 3.1 model via the Groq API. You'll need to set up your API key in a .env file for this to work.
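You can sanity-check the agent on its own before wiring it into the workflow, for example with a one-off call:
# A quick test to confirm the Groq-backed agent responds
test_reply = rag_agent.generate_reply(
    messages=[{"content": "Say hello in one sentence.", "role": "user"}]
)
print(test_reply)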
Setting up ChromaDB
For the vector store, we use Chroma, a vector database that stores the embeddings for document retrieval:
db = PersistentClient(path="./chroma_db")
chroma_collection: Collection = db.get_or_create_collection("my-docs-collection")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
This creates a connection to Chroma's persistent storage, where we can store and retrieve vectors representing document embeddings. The vector embeddings are created using OpenAI's text-embedding-ada-002 model, which is why you need an OPENAI_API_KEY.
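LlamaIndex uses this OpenAI embedding model by default, but you can swap it out through the Settings object if you prefer a different one. A minimal sketch, assuming the llama-index-embeddings-openai package that ships with llama-index:
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding

# Override the default embedding model (optional)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Keep in mind that documents and queries must be embedded with the same model, so changing it after documents have been indexed means rebuilding the index.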
Creating Custom Events
The custom events in this workflow define the points of interaction within the flow:
class SetupEvent(Event):
    query: str

class CreatePromptEvent(Event):
    query: str

class GenerateReplyEvent(Event):
    query: str
These events enable the passing of information between different steps in the workflow.
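Events are just typed data classes, so a step can attach whatever extra fields a downstream step needs. For instance, a hypothetical variation (not used in this tutorial) could carry the retrieved context alongside the query instead of stashing it in the workflow Context:
# Hypothetical richer event - not part of the workflow we build here
class RetrievedContextEvent(Event):
    query: str
    context_text: str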
Defining the RAG Workflow
This RAGFlow class defines a workflow for handling user queries using a RAG approach. It combines document indexing, information retrieval, and language model generation to provide informative and context-aware responses to user queries.
Here’s the heart of the application—defining the workflow steps:
class RAGFlow(Workflow):
    @step
    async def start(
        self, ctx: Context, ev: StartEvent
    ) -> SetupEvent | CreatePromptEvent:
        if chroma_collection.count() < 1:
            return SetupEvent(query=ev.query)
        print("Loading existing index...")
        index = VectorStoreIndex.from_vector_store(
            vector_store, storage_context=storage_context
        )
        await ctx.set("index", index)
        return CreatePromptEvent(query=ev.query)
This is the initial step of the workflow. It checks if there's an existing index:
- If the index is empty, it returns a SetupEvent to create a new index.
- If an index exists, it loads it and returns a CreatePromptEvent.
    @step
    async def setup(self, ctx: Context, ev: SetupEvent) -> StartEvent:
        print("Creating new index...")
        documents = SimpleDirectoryReader("./documents").load_data()
        index = VectorStoreIndex.from_documents(
            documents, storage_context=storage_context
        )
        await ctx.set("index", index)
        return StartEvent(query=ev.query)
In the setup step, we create a new index using the documents found in a local directory. This index allows us to perform searches and retrieve relevant information.
    @step
    async def create_prompt(
        self, ctx: Context, ev: CreatePromptEvent
    ) -> GenerateReplyEvent:
        index = await ctx.get("index")
        query_engine = index.as_query_engine()
        user_input = ev.query
        result = query_engine.query(user_input)
        prompt = f"""
        Your Task: Provide a concise and informative response to the user's query, drawing on the provided context.
        Context: {result}
        User Query: {user_input}
        Guidelines:
        1. Relevance: Focus directly on the user's question.
        2. Conciseness: Avoid unnecessary details.
        3. Accuracy: Ensure factual correctness.
        4. Clarity: Use clear language.
        5. Contextual Awareness: Use general knowledge if context is insufficient.
        6. Honesty: State if you lack information.
        Response Format:
        - Direct answer
        - Brief explanation (if necessary)
        - Citation (if relevant)
        - Conclusion
        """
        await ctx.set("prompt", prompt)
        return GenerateReplyEvent(query=ev.query)
The create_prompt step builds a detailed prompt for the LLM, ensuring that the model provides a response based on the context retrieved from the vector store.
This step:
- Retrieves the index
- Creates a query engine
- Queries the index with the user's input
- Constructs a detailed prompt for the language model

The prompt includes the context from the query result and guidelines for generating a response.
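The query engine is created with default settings here, but as_query_engine accepts parameters if you want to tune retrieval. For example, you could control how many chunks are pulled into the context (the value below is only an illustration):
# Retrieve the top 3 most similar chunks instead of the default
query_engine = index.as_query_engine(similarity_top_k=3)
result = query_engine.query(user_input)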
    @step
    async def generate_reply(self, ctx: Context, ev: GenerateReplyEvent) -> StopEvent:
        prompt = await ctx.get("prompt")
        reply = rag_agent.generate_reply(messages=[{"content": prompt, "role": "user"}])
        return StopEvent(result=reply["content"])
Finally, in the generate_reply step, the LLM generates a response based on the constructed prompt. The result is then passed back as a StopEvent, which marks the end of the workflow.
This final step:
- Retrieves the prompt created in the previous step
- Uses the rag_agent to generate a reply
- Returns a StopEvent with the generated content
Running the Workflow
main() is an asynchronous function that contains the main logic of the program. It creates a new RAGFlow object for each user input, setting a timeout of 10 seconds and enabling verbose mode, and then runs the RAG workflow asynchronously with the user's input as the query. The await keyword is used because this is an asynchronous operation. The main() function runs the workflow in a loop, continuously accepting user input until the conversation ends:
async def main():
    print("Welcome to RAGbot! Type 'exit', 'quit', or 'bye' to end the conversation.")
    while True:
        user_input = input("\nUser: ")
        if user_input.lower() in ["exit", "quit", "bye"]:
            print("Goodbye! Have a great day!!")
            break
        workflow = RAGFlow(timeout=10, verbose=True)
        reply = await workflow.run(query=user_input)
        print(f"\nRAGbot: {reply}")
This interactive loop allows users to query the bot in real time, and the workflow handles document retrieval, prompt creation, and response generation behind the scenes.
if __name__ == "__main__":
import asyncio
asyncio.run(main())
This block checks if the script is being run directly (not imported). If so, it imports the asyncio module and uses it to run the main() function asynchronously.
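If you ever want to call the workflow from other code rather than through the interactive loop, a single query can be run like this (the ask_once helper is just an illustration):
import asyncio

async def ask_once(question: str) -> str:
    # Each run creates a fresh workflow instance, just as main() does
    workflow = RAGFlow(timeout=10, verbose=False)
    return await workflow.run(query=question)

print(asyncio.run(ask_once("What topics do my documents cover?")))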
Talking to your RAG chatbot
To start the chatbot, navigate to your project directory in the terminal and run:
python rag-chatbot.py
Use Cases for Workflows
Workflows offer a structured, scalable way to develop applications that rely on multiple steps, such as retrieval and generation. Here are some key use cases:
- Document Retrieval Systems: Search and answer systems based on large document collections.
- Customer Support: Bots that can answer questions based on FAQs or internal knowledge bases.
- Research Assistants: Tools that can fetch relevant articles, papers, or research materials and summarize key insights.
By modularizing tasks into workflows, developers can build more robust, maintainable, and efficient RAG applications.
Conclusion
With LlamaIndex Workflows, you can break down complex tasks into manageable steps and organize your logic clearly. In this tutorial, we built a RAG-based chatbot that pulls data from a vector store and provides contextual, LLM-generated replies. By leveraging workflows, this process becomes intuitive and easy to follow.
The next steps for you might include adding more steps to the workflow to refine it, playing with the prompt to see if you can get better responses, and cleaning up the code to make it more reusable. Remember, the code is available on GitHub. If you have any questions, you can reach out to me on X (formerly Twitter) or LinkedIn. Happy coding!