You may have seen headlines about AI systems recommending that users eat rocks or put glue on pizza—classic examples of hallucinations. Retrieval-Augmented Generation (RAG) is an architectural approach designed to reduce these failures by grounding model outputs in reliable, external knowledge.
At its core, RAG combines two components: a retriever (typically backed by a vector database) that fetches relevant context, and a generator (an LLM) that uses that context to produce the final response. The concept is simple on paper, but building and deploying RAG systems in production introduces a range of real-world challenges: data quality, retrieval accuracy, latency, evaluation, and operational complexity.
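To make the two components concrete, here is a minimal, self-contained sketch of the retrieve-then-generate loop. Everything in it is illustrative: the corpus, the word-overlap scoring (standing in for embedding similarity in a vector database), and the prompt template (standing in for an actual LLM call) are all assumptions, not a production design.

```python
# Toy RAG pipeline: retriever + generator, no external dependencies.
CORPUS = [
    "RAG grounds LLM outputs in retrieved documents to reduce hallucinations.",
    "A vector database stores embeddings and returns nearest-neighbor matches.",
    "Pizza dough should rest before baking.",
]

def retrieve(query: str, corpus: list[str], k: int = 1) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query.
    A real system would use embedding similarity over a vector index."""
    q_words = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def generate(query: str, context: list[str]) -> str:
    """Stand-in for the LLM call: build the grounded prompt it would receive."""
    ctx = "\n".join(f"- {doc}" for doc in context)
    return f"Answer using ONLY this context:\n{ctx}\n\nQuestion: {query}"

query = "How does RAG reduce hallucinations?"
prompt = generate(query, retrieve(query, CORPUS))
print(prompt)
```

The key design point survives even in this toy version: the generator only ever sees the retrieved context, which is what grounds its output, and which is also why retrieval quality becomes the dominant failure mode in practice.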
In this talk, you’ll explore the most common failure modes of production RAG systems and learn practical techniques to mitigate, and in some cases eliminate, them, enabling more reliable and trustworthy AI applications.