paper · arXiv · Trust 82 · Primary · Published 5d ago · Live · 3d ago

Can OCR-VLMs Read Devanagari? A Stress-Test Benchmark and Post-Correction Study

OCR systems, ranging from classical engines to specialised OCR vision-language models (OCR-VLMs) and frontier multimodal LLMs, report strong results on English and Chinese document benchmarks, yet their behaviour on Indic scripts is largely uncharacterised. We benchmark ten systems on Devanagari (Hindi): classical EasyOCR; open VLMs (Qwen2.5-VL-3B, Qwen3-VL-8B, olmOCR-7B); specialised OCR-VLMs (DeepSeek-OCR, Unlimited-OCR); and frontier closed models (Gemini 2.5 Flash, Claude Opus 4.7, GPT-5.5, Mistral OCR), across four synthetic degradation conditions and 300 real printed scans. We report fou

Lineage graph

Paper → model → repo connections mined from source citations (Tier-1 exact match).

Covers

news · Find the best open-source OCR models in one place at Papers with Code [P]
news · Is Qwen3-VL-2B the only viable VLM for JSON extraction on a "potato"?
news · PP-OCRv6 on Hugging Face: 50-Language OCR from 1.5M to 34.5M Parameters

Covers (incoming)

news · TurboOCR v3 — high-speed document OCR server (C++/CUDA), ~520 img/s on RTX 5090

Implements (incoming)

repo · Ufonik88/invoice-ocr-app

Related across the graph

repo · Ufonik88/invoice-ocr-app
news · Is Qwen3-VL-2B the only viable VLM for JSON extraction on a "potato"?
news · PP-OCRv6 on Hugging Face: 50-Language OCR from 1.5M to 34.5M Parameters
news · Find the best open-source OCR models in one place at Papers with Code [P]
news · TurboOCR v3 — high-speed document OCR server (C++/CUDA), ~520 img/s on RTX 5090

Topics

cs.CV