Retrieval-Augmented Generation Makes AI Smarter
Real-time augmented-data retrieval can significantly boost the accuracy and performance of generative AI. But getting it right can be difficult.
A core problem with artificial intelligence is that it’s, well, artificial. Generative AI systems and large language models (LLMs) rely on statistical methods rather than intrinsic knowledge to predict text outcomes. As a result, they sometimes generate lies, errors, and hallucinations.
This lack of real-world knowledge has repercussions that extend across domains and industries. The problems can be particularly painful in areas such as finance, healthcare, law, and customer service. Bad results can lead to bad business decisions, irate customers, and wasted money.
As a result, organizations are turning to retrieval-augmented generation (RAG). According to a Deloitte report, upwards of 70% of enterprises are now deploying the framework to augment LLMs. “It is essential for realizing the full benefits of AI and managing costs,” says Jatin Dave, managing director of AI and data at Deloitte.
RAG’s appeal is that it supports faster and more reliable decision-making. It also dials up transparency and energy savings. As competition intensifies and AI becomes a way for organizations to differentiate themselves, RAG is emerging as an important weapon in the AI arsenal.
Says Scott Likens, US and Global Chief AI Engineering Officer at PwC: “RAG is revolutionizing AI by combining the precision of retrieval models with the creativity of generative models.”
RAG Matters
What makes RAG so powerful is that it combines a trained generative AI system with real-time information, typically from a separate database. “This synergy enhances everything from customer support to content personalization, providing more accurate and context-aware interactions,” Likens explains.
RAG increases the odds that results are accurate and up to date by checking external sources before serving up a response to a query. It also introduces greater transparency to models by generating links that a human can check for accuracy. Then there’s the fact that RAG can trim the time required to obtain information, reduce compute overhead and conserve energy.
“RAG enables searches through a very large number of documents without the need to connect to the LLM during the search process,” Dave points out. “A RAG search is also faster than an LLM processing tokens. This leads to faster response times from the AI system.”
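To make Dave’s point concrete, the sketch below isolates the retrieval half of RAG: documents are converted to vectors once, at indexing time, and a query is matched against them with cosine similarity, with no LLM call anywhere in the loop. It is a minimal illustration under stated assumptions, not how any particular product works. The `embed` function is a hypothetical stand-in that uses simple word counts, where a production system would use a learned embedding model and a vector database, and the document IDs and texts are invented.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Rank pre-embedded documents by similarity to the query; no LLM is called."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


# Invented example corpus.
docs = {
    "returns-policy": "Customers may return items within 30 days for a refund.",
    "shipping-faq": "Delayed shipments are tracked through our logistics partner.",
    "warranty": "The warranty covers manufacturing defects for one year.",
}
# Embed once at indexing time; each query only touches the precomputed index.
index = [(doc_id, embed(text)) for doc_id, text in docs.items()]
print(retrieve("where is my delayed shipment", index, k=1))
```

Only the top-ranked passages would then be handed to the LLM, which is why the search step itself can stay fast and cheap.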
This makes RAG particularly valuable for handling diverse types of data from different sources, including product catalogs, technical images, call transcripts, policy documents, marketing data, and legal contracts. What’s more, the technology is evolving rapidly, Dave says. RAG is increasingly equipped to manage larger datasets and operate within complex cloud frameworks.
For example, RAG can combine generalized medical or epidemiological data held in an LLM with specific patient information to deliver more accurate and targeted recommendations. It can connect a customer using a chatbot with an inventory system or third-party logistics and delivery data to provide an immediate update about a delayed shipment. RAG can also personalize marketing and product recommendations, based on past clicks or purchases.
The result is a higher level of personalization and contextualization. “RAG can tailor language model outputs to specific enterprise knowledge and enhance the LLM’s core capabilities,” Likens says. Yet all of this doesn’t come without a string attached. “RAG adds complexity to knowledge management. It requires dealing with data lineage, multiple versions of the same source, and the spread of data across different business units and applications,” he adds.
Beyond the Chatbot
Designing an effective RAG framework can prove challenging. Likens says that on the technology side, several components are foundational. These include vector databases, orchestration, a document processing tool, and a scaled data processing pipeline.
It’s also important to adopt tools that streamline RAG development and improve the accuracy of information, Likens says. These include hybrid retrieval solutions, experiment tracking, and data annotation tooling. More advanced tools, such as LLMs, vector databases, and data pipeline and compute workflow tools, are typically available through hyperscalers and SaaS providers.
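Hybrid retrieval usually means running a keyword search and a vector search side by side and then merging their results. One widely used merging technique is reciprocal rank fusion (RRF), sketched here; the article does not say which method Likens has in mind, so treat this choice as an illustrative assumption. Each document earns 1/(k + rank) from every ranked list it appears in, with k = 60 as a commonly cited default, and the document IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one fused ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Higher positions (smaller rank) contribute more; k damps the gap.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Hypothetical results from a keyword index and a vector index.
keyword_hits = ["contract-17", "policy-04", "faq-02"]
vector_hits = ["policy-04", "transcript-9", "contract-17"]
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
```

Because `policy-04` places near the top of both lists, it outranks `contract-17`, which leads the keyword results but trails in the vector results. Rewarding agreement between the two signals is the main appeal of the hybrid approach.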
“There is not a one-size-fits-all RAG pipeline, so there will always be a need to tailor the technology to the specific use case,” Likens says.
Equally important is mapping out a data and information pipeline. Chunking -- breaking data into smaller strings that an LLM can process -- is essential. There’s also a need to fine-tune the language model so that it can contextualize the RAG data, and it’s important to adapt a model’s weights during post-training processes.
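A minimal chunking sketch, assuming whitespace-separated words as a rough proxy for the tokens an LLM actually counts: the text is cut into fixed-size windows, and consecutive chunks overlap slightly so that text near a boundary keeps some of its surrounding context. The function name and the default `size` and `overlap` values are hypothetical; real pipelines often split with the model’s own tokenizer or along document structure such as headings and paragraphs.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the last."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
print(chunk_words(sample, size=4, overlap=1))
```

Each chunk would then be embedded and stored individually, so chunk size trades off retrieval precision (smaller chunks) against context per hit (larger chunks).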
“People typically focus on the LLM model, but it’s the database that often causes the most problems because, unlike humans, LLMs aren’t good with domain knowledge,” explains Bern Elliot, a research vice president at Gartner. “A person reads something and knows it makes sense without understanding every detail.”
Elliot says that focusing on metadata and keeping humans in the loop are critical. Typically, this involves tasks like rank ordering and grounding that anchor a system in the real world -- and increase the odds that AI outputs are meaningful and contextually relevant. Although there’s no way to hit 100% accuracy with RAG, the right mix of technology and processes -- including footnotes that let humans review the output -- boosts the odds that an LLM will deliver value.
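Rank ordering and footnoting can be combined at the moment the prompt is assembled. The sketch below is one hypothetical way to do it: retrieved passages are sorted by a relevance score, the top few are numbered in the prompt, the model is told to answer only from those sources and cite them, and a matching footnote list is returned for a human reviewer. The `Passage` class, the prompt wording, and the example.com URLs are all assumptions for illustration, and the actual LLM call is left out.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    source: str   # hypothetical URL or document ID a reviewer can open
    score: float  # relevance score from the retriever or a reranker


def build_grounded_prompt(question: str, passages: list[Passage],
                          top_n: int = 3) -> tuple[str, list[str]]:
    """Rank passages, number the top ones, and return (prompt, footnotes)."""
    ranked = sorted(passages, key=lambda p: p.score, reverse=True)[:top_n]
    context = "\n".join(f"[{i}] {p.text}" for i, p in enumerate(ranked, start=1))
    footnotes = [f"[{i}] {p.source}" for i, p in enumerate(ranked, start=1)]
    prompt = (
        "Answer using only the numbered sources below and cite them like [1]. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
    return prompt, footnotes


passages = [
    Passage("Refunds are issued within 30 days.", "https://example.com/policy/refunds", 0.4),
    Passage("Delayed orders ship within 5 business days.", "https://example.com/faq/shipping", 0.9),
    Passage("Tracking numbers are emailed at dispatch.", "https://example.com/faq/tracking", 0.7),
]
prompt, footnotes = build_grounded_prompt("When will my delayed order ship?", passages, top_n=2)
print(prompt)
print(footnotes)
```

Keeping the footnote numbers identical to the numbers in the prompt is what lets a reviewer trace each cited claim back to its source document.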
Designs on Data
There’s no single way to approach RAG. It’s important to experiment because a system might not initially generate the right information or response for an appropriate reason, Likens says. It’s also wise to pay close attention to data biases and ethical considerations, including data privacy. Unstructured data magnifies the risks. “It may contain personally identifiable information (PII) or other sensitive information,” he notes.
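As a precaution against the PII risk Likens describes, some teams scrub text before it is chunked and indexed. The sketch below uses a few regular expressions for email addresses, US-style phone numbers, and US Social Security numbers. The patterns and placeholder labels are assumptions, and regexes like these miss many forms of PII (names and street addresses, for instance), so production systems typically rely on dedicated detection tools plus human review.

```python
import re

# Hypothetical, deliberately simple patterns; real PII detection is much broader.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace each pattern match with a bracketed label before indexing."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact("Contact jane.doe@example.com or 555-123-4567 about SSN 123-45-6789."))
```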
Organizations that get the equation right take LLMs to a more functional and viable level. They’re able to achieve more with fewer resources. This translates into a more agile and flexible Gen AI framework that requires less fine-tuning. “RAG equals the playing field between ultra-large language models that exceed 100 billion parameters and more compact models of 8 [billion] to 70 billion parameters … organizations can achieve similar results with very little tradeoff in performance.”
Of course, RAG isn’t a savior. It can’t turn a mediocre LLM into a transformative powerhouse, Dave says. What’s more, many aspects of business expertise aren’t embedded in digital documents, and an overreliance on the technology could prove problematic. Still, “semantic search is very powerful,” he concludes. In the coming years, “RAG-based constructs will become a key component of the technology stack in every enterprise.”