Self-RAG (Self-Reflective Retrieval-Augmented Generation) is a framework that combines the benefits of retrieval-augmented generation (RAG) with self-reflection, allowing large language models (LLMs) to adaptively retrieve passages on-demand and generate more accurate responses.

The LangChain team recently published a blog post on Self-RAG with their own LangGraph implementation.

I really like the idea of validating the retrieved information. In the past I have noticed that, while vector databases will usually return documents that match the input query string, some may contain irrelevant content. Being able to validate and correct this is a great feature for any RAG setup.

Initial implementation of the Langchain blog post

Here’s my take on implementing a simple Self-RAG agent using Langchain and Langgraph.

My use case was the following: I wanted to use Self-RAG to provide answers based on our internal documentation first, and fall back to a web search if the application's vector database does not return any relevant documents.

So my first step was to implement the code provided by Langchain in their blog post and make it work with my local LLM and my local vector database (PGVector).

Graph

%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
  __start__([__start__]):::first
  retrieve(retrieve)
  grade_documents(grade_documents)
  generate(generate)
  transform_query(transform_query)
  __end__([__end__]):::last
  __start__ --> retrieve;
  retrieve --> grade_documents;
  grade_documents -.-> transform_query;
  grade_documents -.-> generate;
  transform_query -.-> retrieve;
  generate -. useful .-> __end__;
  generate -. not useful .-> transform_query;
  generate -. not supported .-> generate;
  classDef default fill:#0059B3FF,line-height:1.2,color:#FFFFFF
  classDef first fill:#0A470AFF
  classDef last fill:#0A470AFF
  linkStyle default stroke:#FFFFFF

Here is what happens in this graph:

  • When we send the question to our retrieve agent, it will retrieve relevant documents from PGVector.
  • Then the grade_documents agent will grade those documents based on their relevance to the question.
    • If the document is useful for answering the question, it generates an answer directly.
    • If not, it transforms the query and retrieves more documents until it finds a useful one.
  • After it finds a useful document, it generates an answer from that document using the generate agent.
  • Using a hallucination_grader agent, it will check whether the generated answer is supported by the retrieved documents.
    • If it’s not supported, it will regenerate the answer and try again.
    • If it’s supported, it will grade the answer.
  • Using an answer_grader agent, it will determine if the generated answer is useful or not for answering the question.
    • If it is useful, it ends the process and outputs the final answer.
    • If it is not useful, it transforms the query again and tries to find a more relevant document.
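The steps above can be sketched as a plain-Python control loop. This is a framework-agnostic sketch, not the actual LangGraph wiring: every callable here is a hypothetical stand-in for an LLM-backed agent, passed in as a stub.

```python
def self_rag(question, retrieve, grade_document, generate,
             is_grounded, answers_question, transform_query):
    """Framework-agnostic sketch of the Self-RAG loop.

    All callables are hypothetical stand-ins for LLM-backed agents:
    - retrieve(q)            -> documents from the vector store
    - grade_document(q, d)   -> True if document d is relevant to q
    - generate(q, docs)      -> candidate answer
    - is_grounded(a, docs)   -> True if the answer is supported by docs
    - answers_question(q, a) -> True if the answer is actually useful
    - transform_query(q)     -> rephrased question
    """
    while True:
        # Keep only the documents the grader deems relevant.
        docs = [d for d in retrieve(question) if grade_document(question, d)]
        if not docs:
            # Nothing relevant: rephrase the question and retry.
            question = transform_query(question)
            continue
        answer = generate(question, docs)
        while not is_grounded(answer, docs):
            # "not supported": regenerate until the answer is grounded.
            answer = generate(question, docs)
        if answers_question(question, answer):
            return answer  # "useful": done
        # "not useful": rephrase and go back to retrieval.
        question = transform_query(question)
```

Note that this naive loop has no retry limit, which is exactly the problem described next.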

After setting up the Self-RAG implementation, I encountered a first major challenge: infinite loops. In some cases, no matter how many times I transformed the question, I couldn’t retrieve relevant documents from the vector database.

To address this issue, I decided to limit the number of query transformations to 2. If, after these 2 retries (3 attempts in total), the retrieved documents still didn’t meet expectations, it was time to fall back on a web search using DuckDuckGo.

So I implemented a retry counter. When the current question fails to yield useful results for 3 consecutive attempts, the graph resets the current question to the original question and starts over from that point.
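This retry logic boils down to a small routing function that decides which node runs next after transform_query. The sketch below uses a plain dict to mimic the graph state; the key names and the function itself are hypothetical, not taken from the actual implementation.

```python
MAX_RETRIES = 3  # original question + 2 transformations


def route_after_transform(state):
    """Decide where the graph goes after transform_query.

    `state` is a plain dict mimicking the graph state (keys are hypothetical):
    - attempts:   failed attempts with the current search strategy
    - web_search: whether we already fell back to the web
    """
    if state["attempts"] < MAX_RETRIES:
        # Keep trying the current strategy with the rewritten question.
        return "web_search" if state["web_search"] else "retrieve"
    if not state["web_search"]:
        # Vector store exhausted: reset the question, switch to the web.
        return "reset"
    # Web search exhausted too: concede defeat.
    return "no_result"
```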

After resetting the question, it will stop using the retrieve agent and instead use the web_search agent. This agent will execute a web search using DuckDuckGo and then pass the documents through the grade_documents agent. If this process yields useful documents, it will use the generate agent to produce an answer from them.

However, even with these precautions in place, I still had a retry limit for web searching: 3 attempts before conceding defeat. At the end of the process, there are two possible outcomes: either I’d successfully generated an answer and reached the result node, or I’d encountered a scenario where no useful information was available after multiple attempts, at which point I’d reach the no_result node.
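Putting the two graders together, the edge out of the generate node picks one of three targets. Again, this is a hypothetical sketch of that conditional edge, with made-up state keys standing in for the graders' verdicts.

```python
def route_after_generate(state):
    """Three-way conditional edge out of the generate node.

    `state` is a plain dict with hypothetical keys:
    - grounded: hallucination_grader verdict (answer supported by the docs)
    - useful:   answer_grader verdict (answer addresses the question)
    """
    if not state["grounded"]:
        return "generate"         # "not supported": regenerate
    if state["useful"]:
        return "result"           # "useful": emit the final answer
    return "transform_query"      # "not useful": rephrase and retry
```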

Graph

Here is the final graph:

%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
  __start__([__start__]):::first
  init(init)
  reset(reset)
  retrieve(retrieve)
  web_search(web_search)
  grade_documents(grade_documents)
  generate(generate)
  transform_query(transform_query)
  no_result(no_result)
  result(result)
  __end__([__end__]):::last
  __start__ --> init;
  init --> retrieve;
  no_result --> __end__;
  reset --> web_search;
  result --> __end__;
  retrieve --> grade_documents;
  web_search --> grade_documents;
  grade_documents -.-> transform_query;
  grade_documents -.-> generate;
  transform_query -.-> reset;
  transform_query -.-> web_search;
  transform_query -.-> retrieve;
  transform_query -.-> no_result;
  generate -. useful .-> result;
  generate -. not useful .-> transform_query;
  generate -. not supported .-> generate;
  classDef default fill:#0059B3FF,line-height:1.2,color:#FFFFFF
  classDef first fill:#0A470AFF
  classDef last fill:#0A470AFF
  linkStyle default stroke:#FFFFFF

Demo

Of course, I wrapped this in a nice React + MUI web app, and here is a small demo of how it works:

Returning results for the vector database

Returning web search results

Conclusion

In this journey, I’ve had the opportunity to implement Self-RAG using Langchain and Langgraph, and I must say it’s been an exciting experience. While my implementation is opinionated and doesn’t leave much room for extension, it has given me valuable insights into how to use these libraries effectively.

However, as with any solution, there are limitations to this approach. To overcome these challenges, I plan to work on creating more reusable components that can be easily integrated into future implementations.

This has been a fun project to work on.

You can find the code for this project on this repo.