Self-RAG (Self-Reflective Retrieval-Augmented Generation) is a framework that combines the benefits of retrieval-augmented generation (RAG) with self-reflection, allowing large language models (LLMs) to adaptively retrieve passages on-demand and generate more accurate responses.
Langgraph recently published a blog post on Self-RAG with their own implementation.
I really like the idea of validating the retrieved information. In the past I have noticed that, while vector databases will usually return documents that match the input query string, some may contain irrelevant content. Having the ability to validate and correct this is a great feature for any RAG setup.
Initial implementation of the Langchain blog post
Here’s my take on implementing a simple Self-RAG agent using Langchain and Langgraph.
My use case was the following: I wanted to use Self-RAG to provide answers based on our internal documentation first; then, if the vector database does not return any relevant documents, fall back to searching the web.
So my first step was to implement the code provided by Langchain in their blog post and make it work with my local LLM and local vector database (PGVector).
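For reference, here is a minimal sketch of that setup. I'm assuming an Ollama-served model and embedding here purely for illustration; the connection string, collection name, and model names are placeholders for my actual configuration.

```python
# Minimal setup sketch: local embeddings + PGVector + a local chat model.
# Model names and the connection string are placeholders, not my real config.
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores.pgvector import PGVector

embeddings = OllamaEmbeddings(model="nomic-embed-text")

vectorstore = PGVector(
    connection_string="postgresql+psycopg2://user:pass@localhost:5432/rag",
    collection_name="internal_docs",
    embedding_function=embeddings,
)
retriever = vectorstore.as_retriever()

# Temperature 0 keeps the grading prompts deterministic.
llm = ChatOllama(model="mistral", temperature=0)
```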
Graph
```mermaid
%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
    __start__([__start__]):::first
    retrieve(retrieve)
    grade_documents(grade_documents)
    generate(generate)
    transform_query(transform_query)
    __end__([__end__]):::last
    __start__ --> retrieve;
    retrieve --> grade_documents;
    grade_documents -.-> transform_query;
    grade_documents -.-> generate;
    transform_query -.-> retrieve;
    generate -. useful .-> __end__;
    generate -. not useful .-> transform_query;
    generate -. not supported .-> generate;
    classDef default fill:#0059B3FF,line-height:1.2,color:#FFFFFF
    classDef first fill:#0A470AFF
    classDef last fill:#0A470AFF
    linkStyle default stroke:#FFFFFF
```
Here is what happens in this graph:

- When we send the question to the `retrieve` agent, it retrieves relevant documents from PGVector.
- The `grade_documents` agent then grades those documents based on their relevance to the question.
  - If the documents are useful for answering the question, the graph generates an answer directly.
  - If not, it transforms the query and retrieves more documents until it finds a useful one.
- After it finds a useful document, it generates an answer from that document using the `generate` agent.
- Using a `hallucination_grader` agent, it checks whether the generated answer is supported by a set of facts.
  - If it's not supported, it regenerates the answer and tries again.
  - If it's supported, it grades the answer.
- Using an `answer_grader` agent, it determines whether the generated answer is useful for answering the question.
  - If it is useful, the process ends and the final answer is returned.
  - If it is not useful, it transforms the query again and tries to find a more relevant document.
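To make the flow concrete, here is a condensed sketch of how this first graph can be wired with Langgraph. The node functions and the two decision helpers are assumed to follow the ones from the Langchain blog post; the state shape is a minimal assumption of my own.

```python
from typing import List, TypedDict

from langgraph.graph import END, StateGraph

class GraphState(TypedDict):
    question: str
    documents: List[str]
    generation: str

workflow = StateGraph(GraphState)
workflow.add_node("retrieve", retrieve)                # query PGVector
workflow.add_node("grade_documents", grade_documents)  # keep relevant docs only
workflow.add_node("generate", generate)                # answer from documents
workflow.add_node("transform_query", transform_query)  # rewrite the question

workflow.set_entry_point("retrieve")
workflow.add_edge("retrieve", "grade_documents")
workflow.add_edge("transform_query", "retrieve")
workflow.add_conditional_edges(
    "grade_documents",
    decide_to_generate,  # "generate" if any document passed, else transform
    {"generate": "generate", "transform_query": "transform_query"},
)
workflow.add_conditional_edges(
    "generate",
    grade_generation,  # hallucination check, then answer usefulness check
    {"useful": END, "not useful": "transform_query", "not supported": "generate"},
)
app = workflow.compile()
```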
Next step: Web search
After setting up the Self-RAG implementation, I encountered a first major challenge: infinite loops. In some cases, no matter how many times I transformed the question, I couldn’t retrieve relevant documents from the vector database.
To address this issue, I decided to limit the number of query transformations to 2 per attempt. If, after these 2 retries (3 attempts in total), the retrieved documents still didn't meet expectations, it was time to fall back on a web search using DuckDuckGo.

So I implemented a retry counter. When the current question fails to yield useful results for 3 consecutive attempts, the graph resets the current question to the original question and starts over from that point.
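In code, this amounts to incrementing a counter in the state on every transformation and routing on it afterwards. A minimal sketch, assuming the state tracks a `retries` count and that `question_rewriter` is the query-rewriting chain from the blog post (all names here are illustrative, not the exact ones from my code):

```python
MAX_RETRIES = 2  # two query transformations, i.e. three attempts in total

def transform_query(state):
    """Rewrite the question and bump the retry counter in the state."""
    better_question = question_rewriter.invoke({"question": state["question"]})
    return {"question": better_question, "retries": state["retries"] + 1}

def route_after_transform(state):
    """Retry the vector store while we have budget, then fall back."""
    if state["retries"] <= MAX_RETRIES:
        return "retrieve"
    # Out of retries: reset to the original question and switch to web search.
    return "reset"
```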
After resetting the question, the graph stops using the `retrieve` agent and instead uses the `web_search` agent. This agent executes a web search using DuckDuckGo and then passes the resulting documents through the `grade_documents` agent. If this process yields useful documents, the `generate` agent produces an answer from them.

However, even with these precautions in place, I still needed a retry limit for web searching: 3 attempts before conceding defeat. At the end of the process, there are two possible outcomes: either I'd successfully generated an answer and reached the `result` node, or no useful information was available after multiple attempts, at which point I'd reach the `no_result` node.
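Here is a sketch of what the `web_search` node can look like, using the DuckDuckGo tool that ships with `langchain_community` (the state keys match the earlier `GraphState` sketch):

```python
from langchain_community.tools import DuckDuckGoSearchResults
from langchain_core.documents import Document

search = DuckDuckGoSearchResults()

def web_search(state):
    """Search the web and hand the results to the grade_documents agent."""
    results = search.run(state["question"])
    # Wrap the raw result string as a Document so the grading step can treat
    # web results and PGVector results identically.
    return {"documents": [Document(page_content=results)],
            "question": state["question"]}
```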
Graph
Here is the final graph:
```mermaid
%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
    __start__([__start__]):::first
    init(init)
    reset(reset)
    retrieve(retrieve)
    web_search(web_search)
    grade_documents(grade_documents)
    generate(generate)
    transform_query(transform_query)
    no_result(no_result)
    result(result)
    __end__([__end__]):::last
    __start__ --> init;
    init --> retrieve;
    no_result --> __end__;
    reset --> web_search;
    result --> __end__;
    retrieve --> grade_documents;
    web_search --> grade_documents;
    grade_documents -.-> transform_query;
    grade_documents -.-> generate;
    transform_query -.-> reset;
    transform_query -.-> web_search;
    transform_query -.-> retrieve;
    transform_query -.-> no_result;
    generate -. useful .-> result;
    generate -. not useful .-> transform_query;
    generate -. not supported .-> generate;
    classDef default fill:#0059B3FF,line-height:1.2,color:#FFFFFF
    classDef first fill:#0A470AFF
    classDef last fill:#0A470AFF
    linkStyle default stroke:#FFFFFF
```
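Wiring the extra nodes is mostly a matter of extending the earlier sketch. A few things change relative to the first version: the entry point moves to `init`, the plain `transform_query → retrieve` edge becomes the conditional routing below, and the `useful` edge out of `generate` now targets `result` instead of ending the graph directly. As before, this is a sketch: `init`, `reset`, `no_result`, and `result` are assumed to be plain state-manipulating functions whose bodies are omitted, and `route_after_transform` from the retry section now also handles the web-search and give-up branches.

```python
workflow.add_node("init", init)        # remember the original question
workflow.add_node("reset", reset)      # restore it and flag the web fallback
workflow.add_node("web_search", web_search)
workflow.add_node("no_result", no_result)
workflow.add_node("result", result)

workflow.set_entry_point("init")
workflow.add_edge("init", "retrieve")
workflow.add_edge("reset", "web_search")
workflow.add_edge("web_search", "grade_documents")
workflow.add_conditional_edges(
    "transform_query",
    route_after_transform,  # retrieve / web_search / reset / no_result
    {
        "retrieve": "retrieve",
        "web_search": "web_search",
        "reset": "reset",
        "no_result": "no_result",
    },
)
workflow.add_edge("result", END)
workflow.add_edge("no_result", END)
```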
Demo
Of course, I wrapped this in a nice React + MUI web app. Here is a small demo of how it works:
Returning results from the vector database
Returning web search results
Conclusion
In this journey, I've had the opportunity to implement Self-RAG using Langchain and Langgraph, and I must say it's been an exciting experience. While my implementation is opinionated and doesn't leave much room for extension, it gave me valuable insights into how to use these libraries effectively.
However, as with any solution, there are limitations to this approach. To overcome these challenges, I plan to work on creating more reusable components that can be easily integrated into future implementations.
This has been a fun project to work on.
You can find the code for this project in this repo.