Abstract:
Generative language models suffer from "hallucinations": responses that appear plausible in context but are factually incorrect. In this paper, we present an Optimised Retrieval-Augmented Generation (RAG) pipeline that seeks to eliminate hallucinations by dynamically grounding responses in external, retrievable knowledge. The system leverages: (1) LangChain for orchestration; (2) LangGraph for workflow management; (3) Crew AI for tier-3 agent-to-agent coordination; (4) ChromaDB as the vector database; and (5) the Gemini API for generative outputs. Key architectural distinctions include intelligent semantic chunking, adaptive retrieval strategies, and structured multi-stage prompts, all validated under a rigorous evaluation framework. Our experimental evaluation demonstrates statistically significant improvements over baseline LLMs: factual accuracy of 93.4% versus 76.2% for the baseline; a reduction in hallucinated responses from 18.3% to under 4.1% of all responses; and citation accuracy of 96.2%, achieved with a response time of 3.4 seconds. The system's modular architecture allows adaptation across domains by simply updating the knowledge base, without retraining any models. This work establishes a practical, scalable framework for safely deploying factually grounded, trustworthy generative AI, and demonstrates how organisations with limited machine-learning resources can access sophisticated RAG capabilities.