How Retrieval Augmented Generation (RAG) improves GenAI solutions
Retrieval Augmented Generation (RAG) combines large language models (LLMs) with external data sources to deliver more relevant and up-to-date results. This article explains how RAG works and why compound AI systems are important.
Learn how contextual information improves the quality of LLM output and how your business benefits.
The limitations of LLMs and how RAG overcomes them
LLMs have revolutionized the way we interact with AI. They can understand and generate human language, answer complex questions, and perform a wide range of tasks. Yet they face limitations when they lack the knowledge needed to complete a task or answer a question. This often results in so-called hallucinations – that is, an LLM generates inaccurate information or invents it outright.
What causes these limitations?
- Lack of up-to-date knowledge: LLMs are trained on data collected up to a certain cutoff date. Events or information from after that cutoff are unknown to these models.
- Lack of specialization: Although they have broad knowledge, LLMs often lack deep technical and domain-specific knowledge. Furthermore, the models have no access to internal company data or knowledge of internal company processes.
How RAG reduces these limitations
RAG systems expand the capabilities of LLMs by incorporating additional data sources, which reduces the probability of errors and hallucinations. Drawing on these sources creates the following added value:
- Current answers: RAG provides access to current information from the internet or other real-time data sources, so that answers stay up to date.
- Higher quality answers: By accessing internal databases, documents or knowledge bases, RAG systems can incorporate company-specific knowledge into the generation of answers.
Examples of RAG systems in use
RAG systems are already in use today in numerous applications:
Customer service chatbots
A company implements a chatbot that answers customer queries not only with general knowledge but also with specific information from internal databases. This provides customers with up-to-date and precise answers to their individual questions.
Internal knowledge management
Employees use a RAG system to quickly access internal guidelines, technical documentation or project information. This promotes knowledge sharing and increases efficiency within the company.
Personalized content creation
Marketing teams generate content based on current data and specific customer segments. RAG systems enable the creation of customized campaigns that respond more precisely to the needs of the target audience.
What is the architecture behind RAG systems?
The core components of a RAG system
A RAG system consists of at least the following two central components, which work together and are connected via an integration layer:
- Large Language Model (LLM): The generative AI model creates texts based on input data.
- Data sources: Internal or external databases, document archives, knowledge databases or APIs that provide specific information.
How does a RAG system work?
- User query: The user asks a question or makes a request of the system.
- Information retrieval: The system searches the internal or external data sources for relevant information that will help answer the question or request.
- Contextualization: The retrieved information is provided to the LLM as context.
- Generating an answer: The LLM generates an answer that takes into account both the query and the additional context.
- Output: The user receives the answer generated by the LLM in response to their query.
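The flow above can be condensed into a few lines of Python. The following is a minimal sketch, assuming hypothetical `retrieve` and `llm_complete` placeholders for the search backend and model API; it is not a specific library's interface.

```python
def answer_query(query: str) -> str:
    # Information retrieval: fetch document sections relevant to the query
    # from the configured data sources (vector store, keyword index, ...).
    chunks = retrieve(query, top_k=5)  # hypothetical search backend

    # Contextualization: provide the retrieved sections to the LLM as context.
    context = "\n\n".join(chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

    # Generating an answer: the LLM considers both the query and the context.
    return llm_complete(prompt)  # hypothetical model API call
```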
How a secure, production-ready RAG system is structured
The architecture described below is that of a mature RAG system designed for production use.

The RAG system consists of six components:
Management layer: This layer comprises services and storage that are not core to the RAG process but provide supporting functions, including logging and monitoring, user chat histories, and access rights management. These functions are accessed at various points throughout the entire RAG process.
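To illustrate how such supporting functions cut across the whole pipeline, the sketch below wraps individual RAG stages in a monitoring decorator. The names are illustrative and not taken from any specific framework.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("rag")

def monitored(stage: str):
    """Log the duration of one RAG stage (retrieval, routing, generation, ...)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                log.info("%s took %.3f s", stage, time.perf_counter() - start)
        return wrapper
    return decorator

@monitored("retrieval")
def retrieve(query: str, top_k: int = 5) -> list[str]:
    ...  # stub: search the document indexes
```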
Document preprocessing: In this step, the raw documents are converted into a format that the RAG system can search efficiently and feed into the LLM. First, text is extracted from the raw documents (e.g., Word or PDF files). Various techniques then divide the texts into shorter sections. This so-called chunking makes large amounts of data easier to search. In complex scenarios, an LLM itself can be useful for this step.
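Chunking itself can be very simple. The sketch below splits extracted text into overlapping fixed-size sections; the sizes are illustrative, and real systems often split along sentence or section boundaries instead.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split extracted text into overlapping, fixed-size chunks.

    The overlap keeps information that straddles a chunk boundary
    retrievable from both neighboring chunks.
    """
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():
            chunks.append(chunk)
    return chunks
```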
Document storage: The preprocessed documents are stored in searchable indexes. Larger RAG systems in particular distribute the data across multiple databases, for example by topic or domain.
The data itself determines the type of database. For classic text data, vector databases are suitable, often complemented by an index for keyword searches. For structured, tabular data, SQL databases are traditionally used.
Depending on the data at hand, it may also be useful to store it as a knowledge graph in a graph database. The routing component, described below, determines which database is accessed for a given user query.
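To make the storage layer concrete, here is a minimal in-memory vector index using cosine similarity. It assumes a hypothetical `embed()` function that maps text to a vector; a production system would use a dedicated vector database, typically alongside a keyword index.

```python
import numpy as np

class VectorIndex:
    """Minimal in-memory stand-in for a vector database."""

    def __init__(self):
        self.vectors: list[np.ndarray] = []
        self.chunks: list[str] = []

    def add(self, chunk: str) -> None:
        self.vectors.append(embed(chunk))  # embed() is a hypothetical embedding model
        self.chunks.append(chunk)

    def search(self, query: str, top_k: int = 5) -> list[str]:
        q = embed(query)
        matrix = np.stack(self.vectors)
        # Cosine similarity between the query vector and every stored chunk.
        scores = matrix @ q / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(q))
        best = np.argsort(scores)[::-1][:top_k]
        return [self.chunks[i] for i in best]
```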
Data retrieval: A user makes a query. The relevant document sections are retrieved from the configured indexes and returned. Various state-of-the-art techniques such as query expansion and hypothetical document embeddings (HyDE) further improve the search results. The techniques used depend on the respective indexes.
To search an SQL database or to traverse a graph in a graph database, an additional LLM call generates the corresponding database queries in an intermediate step. Likewise, to retrieve data from external APIs, an additional LLM call generates the necessary parameters from the user request.
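As an example of one such technique, HyDE can be sketched in a few lines: instead of embedding the raw query, the system asks the LLM to draft a hypothetical answer passage and searches with that, since its embedding tends to lie closer to real answer passages. `llm_complete` and `VectorIndex` are the placeholders from the sketches above.

```python
def hyde_search(query: str, index: VectorIndex, top_k: int = 5) -> list[str]:
    # Ask the LLM for a hypothetical passage that would answer the query.
    hypothetical = llm_complete(
        f"Write a short passage that plausibly answers: {query}"
    )
    # Search with the hypothetical passage instead of the raw query.
    return index.search(hypothetical, top_k=top_k)
```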
Routing: Based on the user request, this component determines which of the indexes should be searched for relevant information. An LLM call classifies the intent behind the request and selects the appropriate indexes.
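Routing can be implemented as a single classification-style LLM call, as in the sketch below. The index names and the prompt are illustrative.

```python
INDEXES = {
    "hr_policies": "internal HR guidelines and policies",
    "products": "product documentation and data sheets",
    "finance": "financial reports stored in an SQL database",
}

def route(query: str) -> str:
    """Ask the LLM which index best matches the intent of the query."""
    options = "\n".join(f"- {name}: {desc}" for name, desc in INDEXES.items())
    choice = llm_complete(
        "Pick the single best index for this query.\n"
        f"Indexes:\n{options}\n\n"
        f"Query: {query}\n"
        "Answer with the index name only."
    ).strip()
    # Fall back to a default if the model answers outside the vocabulary.
    return choice if choice in INDEXES else "products"
```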
Context building, security and final response:
This component is the interface for communicating with the RAG system. It receives user queries and generates responses via calls to the model APIs, using the document sections supplied by the RAG system as context.
The final quality and security check then takes place: the response is checked for accuracy, relevance, and compliance with company guidelines, and the system ensures that no sensitive information is disclosed. The user receives the checked and secure response. Optionally, the user can provide feedback on the response; this feedback can subsequently improve the system.
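One common way to implement this check is a second, validating LLM call. The sketch below again uses the hypothetical `llm_complete` placeholder; the review prompt is illustrative and would need tuning for real compliance requirements.

```python
def check_response(query: str, context: str, answer: str) -> bool:
    """Have a separate LLM call review the answer before it is released."""
    verdict = llm_complete(
        "You are a reviewer. Reply with PASS or FAIL only.\n"
        "Reply FAIL if the answer is not supported by the context, is "
        "irrelevant to the question, or discloses sensitive information.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n\n"
        f"Answer: {answer}"
    )
    return verdict.strip().upper().startswith("PASS")
```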
The role of compound AI systems (CAS) – what flexible systems can do
RAG systems are a type of compound AI system. CAS are modular in structure, meaning that they consist of several specialized components. These components perform clearly defined tasks. The modularity of CAS offers various advantages:
- Flexibility: Individual components can be developed, adapted or exchanged independently of one another (see the sketch after this list).
- Scalability: It is easy to expand the system to handle additional functions or higher loads.
- Adaptability: New technologies or data sources can be easily integrated.
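This modularity can be expressed directly in code: if each component implements a narrow interface, a retriever or validator can be swapped without touching the rest of the system. The interfaces below are an illustrative design, not a standard.

```python
from typing import Protocol

class Retriever(Protocol):
    def retrieve(self, query: str, top_k: int) -> list[str]: ...

class Validator(Protocol):
    def validate(self, answer: str) -> bool: ...

def answer_with(retriever: Retriever, validator: Validator, query: str) -> str:
    """Any components matching these interfaces can be plugged in."""
    context = "\n\n".join(retriever.retrieve(query, top_k=5))
    answer = llm_complete(  # hypothetical model API call, as above
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return answer if validator.validate(answer) else "Answer failed validation."
```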
Advanced components ensure quality and security:
In addition to the main components, further modules can be integrated into a CAS to increase quality and security.
- Validation models: Additional AI models check the generated responses for correctness and plausibility.
- Rule-based filtering: Business rules or compliance guidelines are implemented to identify unwanted or incorrect content.
- Human review (human-in-the-loop): Experts can review critical responses before they go to the user.
- Authentication and authorization: Ensure that only authorized users have access to certain functions or data.
- Anonymization and encryption: Sensitive data is protected by anonymizing or encrypting it while it is being processed (a minimal sketch follows this list).
- Compliance checks: Compliance with legal requirements and internal guidelines is verified.
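A minimal sketch of anonymization during processing, as referenced in the list above: sensitive fields are replaced by placeholders before the text reaches the LLM. The two patterns are illustrative only; real deployments need far more thorough, jurisdiction-specific rules.

```python
import re

def anonymize(text: str) -> str:
    """Replace illustrative PII patterns with placeholders before LLM calls."""
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "[EMAIL]", text)  # email addresses
    text = re.sub(r"\+?\d[\d\s/-]{6,13}\d", "[PHONE]", text)     # phone-like numbers
    return text
```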
In addition, CAS architectures offer the option of integrating further functions:
- Feedback mechanisms: These enable users to provide feedback to improve the system.
- Analytics and monitoring: Usage data is collected to monitor and optimize system performance.
- Multilingual support: The system is extended to process and output multiple languages.
The advantages of RAG and CAS systems for banks and insurance companies
Banks and insurance companies in particular hold immense amounts of internal data, for example customer and product information or knowledge of their own processes. They therefore benefit especially from RAG and CAS systems. However, a high level of security is mandatory.
Key advantages:
- Internal knowledge: A RAG system makes it possible to efficiently use and make available existing company knowledge.
- Accuracy and relevance: Combining LLMs with specific data sources makes answers more precise and relevant.
- Security and compliance: Additional security modules ensure that sensitive data is protected and that regulations are adhered to.
- Scalability and flexibility: The modular CAS architecture makes it possible to adapt the system to increasing demands.
Compound AI systems enable flexible and secure AI solutions – a conclusion
RAG systems extend the capabilities of LLMs by incorporating specific knowledge and up-to-date information into the generated answers. With a CAS architecture, companies can develop flexible, scalable, and secure AI solutions. Integrating additional components for quality assurance and security increases the reliability of these systems and makes the AI solutions easier for users to accept.
Understanding RAG and CAS is essential for any organization looking to improve its GenAI solutions and meet increasing demands.
With internal knowledge, better answers, and high security and compliance, your company can gain a decisive competitive advantage.