Conflict-Aware Graph RAG: Reducing Hallucination in Multi-Hop QA


Overview

Large language models (LLMs) are powerful but prone to hallucination — especially on multi-hop questions that require reasoning across multiple pieces of evidence. Retrieval-Augmented Generation (RAG) helps ground model outputs in retrieved documents, but flat vector-based retrieval struggles to capture structured relationships between concepts.

This project proposes a Conflict-Aware Graph RAG pipeline that addresses this limitation by combining knowledge graph construction with entropy-based conflict resolution.

How It Works

Step 1 — Knowledge Graph Construction

Unstructured text is parsed by an LLM (GPT-4o / LLaMA 3) to extract entities and their relationships as structured triples:

(Subject, Predicate, Object)
e.g. (Eesh Saxena, interned_at, IIT Tirupati)

These triples are stored in a Neo4j graph database, enabling efficient multi-hop traversal during retrieval.
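
As a minimal sketch of the triple representation, the snippet below indexes triples by subject in plain Python so that outgoing edges can be looked up per entity. It is an in-memory stand-in for the Neo4j store, not the project's actual ingestion code: the names Triple and build_graph are hypothetical, and the second triple is invented purely for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One extracted fact: (subject, predicate, object)."""
    subject: str
    predicate: str
    obj: str


def build_graph(triples):
    """Index triples by subject so each entity maps to its outgoing edges."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.subject].append((t.predicate, t.obj))
    return dict(graph)


triples = [
    Triple("Eesh Saxena", "interned_at", "IIT Tirupati"),
    # Hypothetical second triple, added only to give the graph a second hop.
    Triple("IIT Tirupati", "located_in", "Tirupati"),
]
graph = build_graph(triples)
print(graph["Eesh Saxena"])
```

In a Neo4j-backed version, the same upsert would presumably be expressed with Cypher MERGE clauses so that repeated extractions of an entity do not create duplicate nodes.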

Step 2 — Graph-Guided Retrieval

At query time, rather than performing flat cosine similarity over document chunks, we traverse the knowledge graph from the query's seed entities — collecting supporting evidence paths up to a configurable hop depth.

This structured traversal surfaces chains of reasoning that flat retrieval would miss.
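
The traversal can be sketched as a bounded breadth-first search. The example below assumes the graph is available as a plain dictionary mapping each entity to its (predicate, object) edges; collect_paths and max_hops are hypothetical names, and a production version would likely issue a variable-length path query against Neo4j instead.

```python
from collections import deque


def collect_paths(graph, seeds, max_hops=2):
    """Return every path of 1..max_hops edges that starts at a seed entity.

    Each path is a tuple of (subject, predicate, object) triples.
    """
    paths = []
    queue = deque((seed, ()) for seed in seeds)
    while queue:
        node, path = queue.popleft()
        if len(path) >= max_hops:
            continue  # hop budget exhausted for this branch
        # Nodes already on this path: the current node plus every earlier subject.
        on_path = {node} | {s for s, _, _ in path}
        for predicate, obj in graph.get(node, []):
            if obj in on_path:
                continue  # do not walk around cycles
            new_path = path + ((node, predicate, obj),)
            paths.append(new_path)
            queue.append((obj, new_path))
    return paths


# Toy graph; the second edge is a hypothetical illustration.
graph = {
    "Eesh Saxena": [("interned_at", "IIT Tirupati")],
    "IIT Tirupati": [("located_in", "Tirupati")],
}
paths = collect_paths(graph, ["Eesh Saxena"], max_hops=2)
for p in paths:
    print(" -> ".join(f"({s}, {pr}, {o})" for s, pr, o in p))
```

With max_hops=1 only the direct edge is returned; raising the depth adds the two-hop chain, which is the kind of evidence a single-chunk similarity lookup has no direct way to assemble.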

Step 3 — Conflict Detection

Multiple retrieved paths may contain contradictory information. At each node, we compute an entropy score over the competing triples, which measures how evenly the retrieved evidence is split between them:

Low entropy → high agreement → high-confidence passage
High entropy → conflicting evidence → flagged for explicit resolution or abstention

This prevents the model from confidently hallucinating when evidence is ambiguous.
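
One way to make the entropy score concrete is Shannon entropy, H = -Σ p_i log2 p_i, over the objects asserted for the same (subject, predicate) pair. The sketch below rests on several assumptions not stated above: the grouping by (subject, predicate), the use of raw counts as probabilities, and the 0.5-bit threshold are all illustrative choices, and the example facts are made up.

```python
import math
from collections import Counter, defaultdict


def conflict_entropy(objects):
    """Shannon entropy in bits of the empirical distribution over objects."""
    counts = Counter(objects)
    total = sum(counts.values())
    # p * log2(1/p) summed over distinct values; 0.0 when all values agree.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def score_conflicts(triples, threshold=0.5):
    """Group triples by (subject, predicate) and flag high-entropy groups."""
    by_slot = defaultdict(list)
    for s, p, o in triples:
        by_slot[(s, p)].append(o)
    report = {}
    for slot, objs in by_slot.items():
        h = conflict_entropy(objs)
        report[slot] = {"entropy": h, "flagged": h > threshold}
    return report


# Hypothetical retrieved triples: two sources disagree on one attribute.
retrieved = [
    ("Person X", "born_in", "City A"),
    ("Person X", "born_in", "City A"),
    ("Person X", "born_in", "City B"),
    ("Person X", "works_at", "Org C"),
]
for slot, info in score_conflicts(retrieved).items():
    print(slot, round(info["entropy"], 3), info["flagged"])
```

Here the born_in slot is flagged because its evidence is split two to one, while works_at has a single asserted value, zero entropy, and passes through unflagged.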

Step 4 — Answer Generation

The filtered, conflict-resolved evidence paths are provided to the LLM as structured context. The model generates an answer grounded in the graph's structured knowledge rather than parametric memory alone.
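
A rough sketch of how evidence paths might be serialized into structured context is shown below. The prompt wording, the format_context name, and the path layout are assumptions made for illustration; the actual pipeline may build its prompts differently, for example through LangChain prompt templates.

```python
def format_context(paths, question):
    """Render evidence paths as numbered lines followed by the question."""
    lines = []
    for i, path in enumerate(paths, start=1):
        chain = " ; ".join(f"({s}, {p}, {o})" for s, p, o in path)
        lines.append(f"Path {i}: {chain}")
    evidence = "\n".join(lines) if lines else "(no evidence retained)"
    return (
        "Answer using only the evidence paths below. "
        "If they are insufficient, say so.\n\n"
        f"Evidence:\n{evidence}\n\n"
        f"Question: {question}"
    )


# Hypothetical filtered paths and question, for illustration only.
paths = [
    (("Eesh Saxena", "interned_at", "IIT Tirupati"),),
    (
        ("Eesh Saxena", "interned_at", "IIT Tirupati"),
        ("IIT Tirupati", "located_in", "Tirupati"),
    ),
]
prompt = format_context(paths, "In which city did Eesh Saxena intern?")
print(prompt)
```

Keeping each path on its own line preserves the hop order, and the explicit instruction to admit insufficient evidence gives the model an abstention route when the conflict filter has removed most paths.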

Tech Stack

LLM: GPT-4o, LLaMA 3 via Ollama
Graph DB: Neo4j
Orchestration: LangChain
Embeddings: HuggingFace sentence-transformers
Backend: Python, FastAPI

Results

On multi-hop QA benchmarks (HotpotQA, MuSiQue), the conflict-aware graph retrieval approach showed a measurable reduction in contradictory answers compared to flat vector RAG baselines, while maintaining competitive F1 scores on answerable questions.

Key Takeaways

For questions requiring multi-step reasoning, graph-structured retrieval surfaces evidence chains that flat chunk retrieval tends to miss. The entropy-based conflict module adds a lightweight safety layer that reduces confident hallucination, a critical requirement for deployment in high-stakes domains such as healthcare or legal QA.