07.05.2025 | Blog

RAG is dead - or is it?

Retrieval-Augmented Generation (RAG) was introduced to address the limitations of LLMs. But with larger context windows and improved training methods, some claim RAG is becoming obsolete. In this blog post, we explore the latest developments and explain why RAG is far from dead.

Retrieval-Augmented Generation (RAG), which integrates a large generative language model (LLM) with search, has been in use for several years. The concept was first introduced in 2020 [1]. The release of ChatGPT by OpenAI in December 2022 marked a significant breakthrough. While generative language models had existed for some time, instruction tuning and reinforcement learning made them truly practical. ChatGPT not only engaged in conversation but also answered a wide range of questions. Beyond its strong language capabilities, it leveraged extensive general knowledge from its training data.

So why was RAG introduced in the first place?

LLMs have several limitations that can be addressed by integrating them with search. They lack access to information contained in private documents or databases that weren't included in their training data and have no knowledge of events that occurred after their training. Moreover, they struggle to distinguish between factual information and plausible guesses, which sometimes leads to hallucinations. While this problem has been reduced significantly over the years, there was a time when LLMs were jokingly referred to as "random bullshit generators", and there is still some truth to that.

RAG effectively addresses these challenges. By leveraging search - typically dense vector search or hybrid search - it retrieves relevant documents and passages containing information useful for answering the original query. This enables access to non-public documents through the search system, retrieval of publicly available information generated after the LLM’s training, and the ability to provide references that support and ground responses. As a result, RAG systems significantly reduce hallucinations.
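
To make these two steps concrete, here is a minimal, illustrative sketch of such a pipeline in Python. The `embed` and `generate` functions are placeholders assumed for this sketch, standing in for a real embedding model and a real LLM API; a production system would use a proper vector index instead of brute-force similarity over all passages.

```python
# Minimal RAG sketch (illustrative only): retrieve the top-k passages by
# vector similarity, then build a grounded prompt for the LLM.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding: swap in a real sentence-embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

def generate(prompt: str) -> str:
    """Placeholder LLM call: swap in a real model or API here."""
    return f"[LLM answer based on a prompt of {len(prompt)} characters]"

def rag_answer(question: str, passages: list[str], k: int = 3) -> str:
    # 1. Retrieve: rank all passages by cosine similarity to the question.
    q = embed(question)
    index = np.stack([embed(p) for p in passages])
    top = [passages[i] for i in np.argsort(index @ q)[::-1][:k]]
    # 2. Generate: ask the LLM to answer using only the retrieved context.
    context = "\n\n".join(top)
    prompt = ("Answer the question using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
    return generate(prompt)
```

Because the prompt is restricted to the retrieved passages, the answer can be traced back to concrete sources, which is what grounds the response and reduces hallucinations.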

I think RAG works much like how people answer questions. We usually have a lot of knowledge in our heads (similar to the factual knowledge of an LLM), so we can answer many questions right away. But if we want to be sure, we often look things up in a book or online to double-check or to find the answer. RAG does something similar - it pulls in extra information when it needs to.

What makes some pundits claim that RAG is dead?

Honestly, I believe their main argument boils down to personal preference - they simply don’t like it. They would rather have a single, seamless AI black box (LLM) that magically solves all their problems. It would be easier and more convenient, wouldn’t it? Currently, two LLM developments are raising hopes of replacing RAG. 

1. Improvements in training
First, with recent improvements in training (for example, the Chinese model DeepSeek, which was trained at a much lower cost than other models and produces competitive results), it may be possible to simply fine-tune an LLM with specific data, such as company information, and use it to answer questions directly based on that data. The problem with this approach is that existing LLMs, trained on nearly all publicly available data, still can't answer all questions related to that data and struggle to avoid hallucinations. In fact, ChatGPT itself is a form of RAG now: it uses Bing to retrieve relevant documents to answer questions, even if those documents (like Wikipedia) are part of its training data. The new "Deep Research" feature is heavily built on the RAG concept. Perplexity is essentially a RAG system that combines internet search with LLMs.

2. Growing size of context windows
Second, with the growing size of context windows, it may become possible to include all the content a RAG system would otherwise retrieve directly in the prompt and answer questions based on that content. This concept is known as Cache-Augmented Generation (CAG). In fact, some LLMs now support context windows of several million tokens, meaning it could, in theory, be possible to put an entire text, like the Bible, into one prompt. However, there are two major caveats. First, LLMs with such large context windows often don't utilize them reliably; they tend to overlook parts of the content, which can result in incorrect answers [2]. Second, context size will always be finite, so this approach will always have limits, and large context windows require significant computational resources, which quickly becomes expensive.

LLMs with large context windows could potentially replace RAG for smaller applications, such as answering questions based on a few FAQ pages. Traditional RAG systems rely on dense vector search, which is great for semantic search but struggles when exact keyword matches are required, e.g. for specific error or product codes. In such small-scale settings, CAG could replace them, especially since it also sidesteps some further limitations of dense vector retrieval, such as issues caused by poor chunking. However, CAG is unlikely to be effective for larger applications.
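
For contrast, here is an equally minimal sketch of the CAG idea: pack the whole corpus into a single prompt. The `generate` function and the token budget are again placeholders assumed for illustration; the hard limit check is exactly where the finite context window becomes the bottleneck.

```python
# Minimal Cache-Augmented Generation (CAG) sketch: no retrieval step,
# the entire corpus goes into one (large) prompt.
MAX_CONTEXT_TOKENS = 1_000_000  # assumed context window of a long-context model

def generate(prompt: str) -> str:
    """Placeholder LLM call: swap in a real long-context model or API here."""
    return f"[LLM answer based on {len(prompt)} characters of context]"

def rough_token_count(text: str) -> int:
    return len(text) // 4  # crude heuristic: roughly 4 characters per token

def cag_answer(question: str, corpus: list[str]) -> str:
    full_context = "\n\n".join(corpus)
    if rough_token_count(full_context) > MAX_CONTEXT_TOKENS:
        # The context window is finite: beyond this point CAG no longer works
        # and a retrieval step (RAG) is needed to select what goes into the prompt.
        raise ValueError("Corpus exceeds the context window - fall back to RAG.")
    prompt = ("Answer the question using the documents below.\n\n"
              + full_context + "\n\nQuestion: " + question + "\nAnswer:")
    return generate(prompt)
```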

The future of RAG

We believe that RAG will continue to be one of the most important, if not the most important, applications of LLMs. 

However, there is a need for improved retrieval systems that scale to big data. Dense vector search could be the solution because of its semantic retrieval capabilities. Yet creating and storing embeddings for massive corpora takes time and resources, and updating indexes frequently or in real time can be trickier than with traditional search. For large corpora, combining dense and sparse retrieval (hybrid search) often gives the best performance, balancing precision, recall, and speed. Therefore, besides using dense vector search, we're also actively working on optimizing classical retrieval for RAG and will publish a follow-up blog post on this topic. So I would also strongly argue that traditional search is not dead yet.
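
As a small illustration of the hybrid idea, the sketch below fuses a sparse (keyword) ranking and a dense (vector) ranking with Reciprocal Rank Fusion. The input rankings and document ids are made up for the example, and RRF is just one common fusion strategy, not necessarily the one used in our products.

```python
# Illustrative hybrid-search sketch: merge a sparse (keyword) ranking and a
# dense (vector) ranking with Reciprocal Rank Fusion (RRF). The two input
# rankings are assumed to come from an existing BM25 index and a vector index.

def rrf_fuse(sparse_ranking: list[str], dense_ranking: list[str],
             k: int = 60) -> list[str]:
    """Merge two ranked lists of document ids using RRF scores."""
    scores: dict[str, float] = {}
    for ranking in (sparse_ranking, dense_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: keyword search finds the exact error code, vector search finds
# semantically related passages; the fused list benefits from both.
print(rrf_fuse(["doc_errorcode", "doc_manual"], ["doc_faq", "doc_manual"]))
```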

With the advent of reasoning models, much will change. LLMs will take a more proactive role in planning and controlling the retrieval process, performing multiple retrieval steps and utilizing additional tools to gather all the information necessary to answer a question. This is another area of our research at IntraFind. Moreover, with large context windows, passage retrieval may become less critical: we could send all relevant documents from the search engine directly to the LLM and let it handle the selection of the relevant parts or passages.
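
The following sketch illustrates what such an LLM-controlled retrieval loop could look like in principle; it is an assumption about the general pattern, not a description of our implementation. The `search` and `llm` callables stand in for a retrieval backend and a reasoning model.

```python
# Sketch of an agentic retrieval loop: the LLM decides after each retrieval
# step whether it has enough evidence or needs to search again.
from typing import Callable

def agentic_answer(question: str,
                   search: Callable[[str], list[str]],
                   llm: Callable[[str], str],
                   max_steps: int = 3) -> str:
    evidence: list[str] = []
    query = question
    for _ in range(max_steps):
        evidence.extend(search(query))  # tool call: one retrieval step
        decision = llm(
            "Question: " + question
            + "\nEvidence so far:\n" + "\n".join(evidence)
            + "\nIf the evidence is sufficient, reply 'ANSWER: <answer>'."
            + "\nOtherwise reply 'SEARCH: <refined query>'.")
        if decision.startswith("ANSWER:"):
            return decision.removeprefix("ANSWER:").strip()
        if decision.startswith("SEARCH:"):
            query = decision.removeprefix("SEARCH:").strip()  # refine and retry
        else:
            break  # unexpected reply: stop and answer with what we have
    return llm("Answer the question using the evidence below.\nQuestion: "
               + question + "\nEvidence:\n" + "\n".join(evidence))
```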

Conclusion

RAG is not dead - it’s evolving. While new approaches like CAG and fine-tuned LLMs may replace RAG in specific use cases, it remains the most effective solution for accessing large and dynamic knowledge sources.

IntraFind develops AI-powered search technology for Enterprise Search, where fast and reliable access to all the company's information is essential. Given that this information typically comes from large and constantly evolving knowledge sources, we began exploring RAG technology a few years ago - and we're continuing to actively develop it. The next natural evolution is toward reasoning models and agent-based Enterprise Search.

References:

[1] Patrick Lewis et al.: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Advances in Neural Information Processing Systems, 2020.

[2] Yuri Kuratov et al.: BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack. 2024.


The author

Dr. Christoph Goller
Head of Research
Christoph Goller holds a PhD in computer science from the Technical University of Munich, with research in Deep Learning. He is an Apache Lucene committer and has more than 20 years of experience in Information Retrieval, Natural Language Processing, and AI. Since 2002, he has been head of IntraFind's research department.