LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs - No Silver Bullet for LC or RAG Routing

Kuan Li, Liwen Zhang, Yong Jiang, Pengjun Xie, Fei Huang +2 more
2/14/2026

Abstract

Effectively incorporating external knowledge into Large Language Models (LLMs) is crucial for enhancing their capabilities and addressing real-world needs. Retrieval-Augmented Generation (RAG) offers an effective method for achieving this by retrieving the most relevant fragments and incorporating them into the LLM's context. However, advancements in the context window size of LLMs offer an alternative approach, raising the question of whether RAG remains necessary for effectively handling external knowledge. Several existing studies provide inconclusive comparisons between RAG and long-context (LC) LLMs, largely due to limitations in benchmark design. In this paper, we present LaRA, a novel benchmark specifically designed to rigorously compare RAG and LC LLMs. LaRA encompasses 2,326 test cases across four practical QA task categories and three types of naturally occurring long texts. Through systematic evaluation of seven open-source and four proprietary LLMs, we find that the optimal choice between RAG and LC depends on a complex interplay of factors, including the model's parameter size, long-text capabilities, context length, task type, and the characteristics of the retrieved chunks. Our findings provide actionable guidelines for practitioners to effectively leverage both RAG and LC approaches in developing and deploying LLM applications. Our code and dataset are available at https://github.com/Alibaba-NLP/LaRA.
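The RAG approach the abstract describes, retrieving the most relevant fragments and placing only those in the model's context, can be sketched in a few lines. This is an illustrative toy, not the paper's method: it scores chunks by word overlap instead of a real embedding model, and the final prompt would be sent to an actual LLM.

```python
def chunk(text, size=50):
    """Split a long document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query, chunk_text):
    """Toy relevance score: count of query words appearing in the chunk."""
    q = set(query.lower().split())
    c = set(chunk_text.lower().split())
    return len(q & c)

def retrieve(query, document, k=3, size=50):
    """Return the top-k chunks most relevant to the query."""
    chunks = chunk(document, size)
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

def build_rag_prompt(query, document, k=3):
    """Assemble a prompt containing only the retrieved fragments,
    rather than the full document (the LC alternative)."""
    context = "\n---\n".join(retrieve(query, document, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The LC alternative the paper compares against would simply place the entire document in the prompt, trading retrieval precision for completeness.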


Code Implementations (9)

Alibaba-NLP/LaRA (Official, 100% match)

The code for LaRA Benchmark

473 stars · Shell, Python · Mar 5, 2025 · 10 months ago · MIT

LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.

13,073 stars · 1,497 forks · Jul 22, 2025 · 2 months ago · NOASSERTION
Tags: agent, agentic, ai, chatbot, chatbots, +15 more

Demystify RAG by building it from scratch. Local LLMs, no black boxes - real understanding of embeddings, vector search, retrieval, and context-augmented generation.

1,183 stars · 122 forks · Oct 27, 2025 · 4 months ago · MIT
Tags: agents, ai-agents, educational, llm, node-llama-cpp, +5 more
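The building blocks this repository names, embeddings and vector search, reduce to a nearest-neighbor lookup over vectors. A dependency-free sketch under loud assumptions: the "embeddings" here are toy bag-of-words counts over a fixed vocabulary, standing in for a real embedding model's output.

```python
import math

def embed(text, vocab):
    """Toy bag-of-words embedding: word counts over a fixed vocabulary."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def vector_search(query, docs, vocab, k=1):
    """Return the k documents whose embeddings are closest to the query."""
    qv = embed(query, vocab)
    ranked = sorted(docs, key=lambda d: cosine(qv, embed(d, vocab)), reverse=True)
    return ranked[:k]
```

Real systems swap the bag-of-words vectors for learned dense embeddings and the linear scan for an approximate nearest-neighbor index, but the retrieval contract is the same.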

An agentic retrieval-augmented generation (RAG) system for querying multiple documents using LLMs and intelligent routing.

10 stars · Feb 4, 2026 · 2 months ago

Opti-Oignon is a comprehensive optimization framework for local LLMs running on Ollama. It maximizes the performance of your local models through intelligent task routing based on a custom benchmark, RAG (Retrieval-Augmented Generation), and multi-model orchestration.

40 stars · Dec 21, 2025 · 3 months ago · MIT
Tags: ai, gradio, llm, llm-optimization, local-llm, +4 more

Open-source AI orchestration framework for building context-engineered, production-ready LLM applications. Design modular pipelines and agent workflows with explicit control over retrieval, routing, memory, and generation. Built for scalable agents, RAG, multimodal applications, semantic search, and conversational systems.

24,262 stars · 2,612 forks · Nov 14, 2019 · 1 month ago · Apache-2.0
Tags: agent, agents, ai, gemini, generative-ai, +15 more

RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with agent capabilities to create a superior context layer for LLMs.

71,222 stars · 7,788 forks · Dec 12, 2023 · 3 months ago · Apache-2.0
Tags: agent, agentic, agentic-ai, agentic-workflow, ai, +15 more

Context7 MCP Server -- Up-to-date code documentation for LLMs and AI code editors

42,156 stars · 2,047 forks · Mar 26, 2025 · 3 months ago · MIT
Tags: llm, mcp, mcp-server, vibe-coding

Technical LLM System - RAG Core: a stack-aware, routing-ready reasoning core for modular RAG (Retrieval-Augmented Generation) systems.

10 stars · Jan 25, 2026 · 2 months ago · Apache-2.0
