PP-OCRv5: A Specialized 5M-Parameter Model Rivaling Billion-Parameter Vision-Language Models on OCR Tasks

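Cheng Cui, Yubo Zhang, Ting Sun, Xueqing Wang, Hongen Liu, Manhui Lin, Yue Zhang, Tingquan Gao, Changda Zhou, Jiaxuan Liu, Zelun Zhang, Jing Zhang, Jun Zhang, Yi Liu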
3/25/2026
cs.CV

Abstract

The advent of "OCR 2.0" and large-scale vision-language models (VLMs) has set new benchmarks in text recognition. However, these unified architectures often come with significant computational demands, challenges in precise text localization within complex layouts, and a propensity for textual hallucinations. Revisiting the prevailing notion that model scale is the sole path to high accuracy, this paper introduces PP-OCRv5, a meticulously optimized, lightweight OCR system with merely 5 million parameters. We demonstrate that PP-OCRv5 achieves performance competitive with many billion-parameter VLMs on standard OCR benchmarks, while offering superior localization precision and reduced hallucinations. The cornerstone of our success lies not in architectural expansion but in a data-centric investigation. We systematically dissect the role of training data by quantifying three critical dimensions: data difficulty, data accuracy, and data diversity. Our extensive experiments reveal that with a sufficient volume of high-quality, accurately labeled, and diverse data, the performance ceiling for traditional, efficient two-stage OCR pipelines is far higher than commonly assumed. This work provides compelling evidence for the viability of lightweight, specialized models in the large-model era and offers practical insights into data curation for OCR. The source code and models are publicly available at https://github.com/PaddlePaddle/PaddleOCR.
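As a rough illustration of the "two-stage OCR pipeline" the abstract refers to, the sketch below separates text detection (stage 1, which produces boxes) from text recognition (stage 2, which reads each box). It is a toy under stated assumptions, not PP-OCRv5 or the PaddleOCR API: the "image" is a list of strings, and `Box`, `toy_detect`, `toy_recognize`, and `run_two_stage` are hypothetical names introduced only for this example. It shows the structural property that every recognized string is tied to an explicit location from stage 1.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Toy "image": a list of strings; non-space characters stand in for ink.
Image = List[str]


@dataclass(frozen=True)
class Box:
    """Hypothetical text region: one row, half-open column span."""
    row: int
    col_start: int  # inclusive
    col_end: int    # exclusive


def toy_detect(image: Image) -> List[Box]:
    """Stage 1 stand-in: return one box per run of non-space characters."""
    boxes: List[Box] = []
    for r, line in enumerate(image):
        start = None
        # A trailing sentinel space closes any run that reaches the row end.
        for c, ch in enumerate(line + " "):
            if ch != " " and start is None:
                start = c
            elif ch == " " and start is not None:
                boxes.append(Box(r, start, c))
                start = None
    return boxes


def toy_recognize(image: Image, box: Box) -> str:
    """Stage 2 stand-in: 'read' the cropped region by slicing it out."""
    return image[box.row][box.col_start:box.col_end]


def run_two_stage(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[Image, Box], str],
) -> List[Tuple[Box, str]]:
    """Run detection first, then recognition on each detected box."""
    return [(box, recognize(image, box)) for box in detect(image)]


page = ["PP-OCRv5   5M", "   params    "]
results = run_two_stage(page, toy_detect, toy_recognize)
for box, text in results:
    print(box, repr(text))
```

Because recognition only ever sees a detected crop, the output cannot contain text without a corresponding box. That is one intuition for why a two-stage design localizes text explicitly, whereas a real detector and recognizer would of course be learned models rather than string slicing.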


Code Implementations (1)

Apache-2.0

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.


Cite this paper

@article{cui2026ppocrv,
  title  = {PP-OCRv5: A Specialized 5M-Parameter Model Rivaling Billion-Parameter Vision-Language Models on OCR Tasks},
  author = {Cheng Cui and Yubo Zhang and Ting Sun and Xueqing Wang and Hongen Liu and Manhui Lin and Yue Zhang and Tingquan Gao and Changda Zhou and Jiaxuan Liu and Zelun Zhang and Jing Zhang and Jun Zhang and Yi Liu},
  year   = {2026},
  eprint = {2603.24373},
  archivePrefix = {arXiv},
  url    = {http://arxiv.org/abs/2603.24373v1}
}
