<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>RAG Evaluation on Welcome</title>
    <link>https://tengma137.github.io/tags/rag-evaluation/</link>
    <description>Recent content in RAG Evaluation on Welcome</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Sun, 26 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://tengma137.github.io/tags/rag-evaluation/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Evaluating RAG Pipelines: What Actually Helped</title>
      <link>https://tengma137.github.io/posts/2026-4-26-agentsystem3/</link>
      <pubDate>Sun, 26 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://tengma137.github.io/posts/2026-4-26-agentsystem3/</guid>
      <description>&lt;p&gt;In the previous posts, I wrote about agent systems from the architecture side: how to choose frameworks, how to structure agent loops, how to think about context, and how my local research agent uses a shared RAG layer.&lt;/p&gt;&#xA;&lt;p&gt;This post is about the next step: evaluation.&lt;/p&gt;&#xA;&lt;p&gt;I wanted to answer a simple question:&lt;/p&gt;&#xA;&lt;p&gt;&lt;strong&gt;Which retrieval pipeline actually works best for paper question answering?&lt;/strong&gt;&lt;/p&gt;&#xA;&lt;p&gt;Not which one feels more elegant. Not which one is more fashionable. Not which one gives a nice demo on a single document. I wanted a comparison across lexical retrieval, vector retrieval, hybrid retrieval, and my hierarchical Zoom retrieval pipeline.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
