Quick Brief
- Sarvam Akshar launched in February 2026 as India’s first document intelligence workbench for 22 languages
- Outperforms Gemini 3 Pro, GPT-5.2, and Claude Opus 4.5 on Indic script recognition
- Visual grounding pinpoints exact text coordinates with automated error correction loops
- Processes 19th-century Tamil and Gujarati manuscripts 100x faster than manual transcription
Sarvam AI has fundamentally redefined document intelligence for India, and Akshar proves it. Released on February 14, 2026, this workbench doesn’t just extract text; it reasons over complex layouts, grounds visual elements, and automates proofreading for documents ranging from 1800s manuscripts to modern multi-script forms. Built atop the 3-billion-parameter Sarvam Vision model, Akshar addresses what legacy OCR and frontier LLMs consistently fail at: accurate knowledge extraction from low-quality Indian language documents.
What Sarvam Akshar Actually Solves
Traditional OCR engines (Tesseract, EasyOCR, Google Cloud Vision) operate on a fatal assumption: documents follow predictable, single-script, high-resolution patterns. That assumption collapses when confronted with Indian realities: mixed Devanagari-Latin scripts, hand-filled KYC forms, 150-year-old newspaper scans with complex conjuncts and vowel signs (matras), and multi-column layouts that cannot be read linearly.
Multimodal LLMs like Gemini and GPT made progress on reading order and key-value extraction, but introduced new problems. Their probabilistic outputs lack auditability, hallucinate modern spellings for archaic fonts, and require constant prompt tuning. A linguist validating a raw API output must compare text against images line-by-line, a process that takes hours per page.
Sarvam Akshar eliminates this bottleneck by coupling Sarvam Vision with an agent loop. The workbench identifies script uncertainties, allowing experts to validate hundreds of pages in the time previously required for one. This isn’t passive extraction; it’s active reasoning that understands semantic relationships between document elements.
Visual Grounding: The Technical Breakthrough
Akshar’s defining capability is visual grounding, pinpointing exact coordinates of text and structural elements within documents. When processing a 19th-century Gujarati manuscript, the system doesn’t just output transcribed text. It maps each word to its pixel location, preserves nested table hierarchies, and differentiates body text from marginalia.
This spatial awareness enables three critical functions legacy tools cannot deliver:
- Layout-aware extraction that maintains reading order across multi-column pages without linear corruption
- Block-level semantic understanding that distinguishes headers, footnotes, captions, and paragraph structure
- Human-in-the-loop validation where experts click uncertain blocks to verify, not re-transcribe entire pages
In one demonstration on a 1947 newspaper, the system detected 94 distinct blocks, categorizing each as header, headline, paragraph, image, or footnote with contextual accuracy.
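The idea of grounded blocks and layout-aware reading order can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `Block` fields, the fixed two-column split, and the top-left pixel origin are hypothetical and do not describe Sarvam Vision’s actual output or method.

```python
from dataclasses import dataclass

@dataclass
class Block:
    """Hypothetical grounded block: a semantic label, text, and a pixel bounding box."""
    kind: str   # e.g. "header", "paragraph", "footnote"
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

def reading_order(blocks, page_width, n_columns=2):
    """Order blocks column by column, then top to bottom inside each column.

    Assumes equal-width columns and a top-left origin; full-width blocks
    such as mastheads would need separate handling.
    """
    col_width = page_width / n_columns

    def key(b):
        center_x = (b.x0 + b.x1) / 2
        column = min(int(center_x // col_width), n_columns - 1)
        return (column, b.y0)

    return sorted(blocks, key=key)

def block_at(blocks, x, y):
    """Return the first block whose bounding box contains (x, y), or None."""
    for b in blocks:
        if b.x0 <= x <= b.x1 and b.y0 <= y <= b.y1:
            return b
    return None

page = [
    Block("paragraph", "right column, top", 320, 100, 580, 300),
    Block("paragraph", "left column, bottom", 20, 320, 280, 500),
    Block("paragraph", "left column, top", 20, 100, 280, 300),
]
ordered = [b.text for b in reading_order(page, page_width=600)]
clicked = block_at(page, 400, 200)
```

A purely linear top-to-bottom sort would interleave the two columns; keying on the column first keeps each column’s text contiguous. The `block_at` lookup is the kind of hit test a click-to-verify interface could rely on.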
Performance That Eclipses Frontier Models
Sarvam Vision, the foundation model powering Akshar, achieves state-of-the-art results on three benchmarks: olmOCR-Bench, OmniDocBench (English), and the proprietary Sarvam Indic OCR Bench.
Sarvam AI’s Indic language accuracy vs. competitors (2026 data):
| Language | Sarvam Vision | Gemini 3 Pro | GPT-5.2 | Claude Opus 4.5 | Google Cloud Vision |
|---|---|---|---|---|---|
| Hindi | 95.91% | 95.12% | 84.86% | 93.08% | 90.94% |
| Bengali | 92.61% | 90.79% | 70.52% | 83.76% | 88.23% |
| Tamil | 93.42% | 92.73% | 61.87% | 89.62% | 89.69% |
| Telugu | 87.70% | 85.32% | 35.70% | 71.28% | 82.58% |
| Kannada | 89.89% | 87.36% | 26.49% | 77.41% | 85.54% |
| Malayalam | 91.60% | 87.10% | 56.66% | 82.88% | 88.30% |
| Odia | 81.95% | 75.39% | 10.53% | 57.22% | 82.20% |
| Santhali | 80.32% | 64.02% | 27.44% | 36.62% | 54.79% |
The gap widens dramatically for low-resource languages. GPT-5.2 scores negative accuracy on Kashmiri (-0.60%) and Sanskrit (-21.22%), while Sarvam Vision maintains 55.93% and 81.65% respectively. This 15-20% accuracy advantage over global providers stems from dedicated training on each of India’s 22 official languages, not transfer learning from English models.
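Readers who want to check the margins for themselves can recompute them from the table. The sketch below copies the eight rows shown above and reports each model’s mean accuracy plus Sarvam Vision’s lead, in percentage points, over the best competing model per language. The function names are illustrative, and the results cover only the eight tabulated languages, not all 22.

```python
# Accuracy (%) copied from the table above, in the table's column order.
MODELS = ["Sarvam Vision", "Gemini 3 Pro", "GPT-5.2", "Claude Opus 4.5", "Google Cloud Vision"]
SCORES = {
    "Hindi":     [95.91, 95.12, 84.86, 93.08, 90.94],
    "Bengali":   [92.61, 90.79, 70.52, 83.76, 88.23],
    "Tamil":     [93.42, 92.73, 61.87, 89.62, 89.69],
    "Telugu":    [87.70, 85.32, 35.70, 71.28, 82.58],
    "Kannada":   [89.89, 87.36, 26.49, 77.41, 85.54],
    "Malayalam": [91.60, 87.10, 56.66, 82.88, 88.30],
    "Odia":      [81.95, 75.39, 10.53, 57.22, 82.20],
    "Santhali":  [80.32, 64.02, 27.44, 36.62, 54.79],
}

def mean_by_model(scores):
    """Average each model's accuracy over the listed languages."""
    n = len(scores)
    return {m: sum(row[i] for row in scores.values()) / n for i, m in enumerate(MODELS)}

def lead_over_best_rival(scores):
    """Percentage-point lead of the first column over the best other model, per language."""
    return {lang: round(row[0] - max(row[1:]), 2) for lang, row in scores.items()}

means = mean_by_model(SCORES)
leads = lead_over_best_rival(SCORES)

for lang, lead in leads.items():
    print(f"{lang:<10} {lead:+.2f} pp")
```

Expressing the gap in percentage points per language, rather than as a single headline figure, makes it easier to see where the advantage is concentrated.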
What accuracy gains mean in practice:
A 10-percentage-point accuracy improvement on a 10,000-word document (for example, from 85% to 95%) reduces manual corrections from 1,500 errors to 500. For institutions digitizing historical archives (libraries, courts, government records), Akshar’s precision translates to roughly 67% less human verification time.
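The arithmetic behind that estimate can be worked through in a few lines. The 85% and 95% accuracy levels are assumptions implied by the 1,500 and 500 error counts; the article does not state them directly.

```python
def expected_errors(word_count, accuracy):
    """Number of words expected to need correction at a given word accuracy (0..1)."""
    return round(word_count * (1 - accuracy))

WORDS = 10_000
before = expected_errors(WORDS, 0.85)   # assumed baseline accuracy
after = expected_errors(WORDS, 0.95)    # assumed accuracy after a 10-point gain

reduction = (before - after) / before   # fraction of corrections avoided
print(f"{before} -> {after} errors, {reduction:.0%} fewer corrections")
```

The same 10-point gain yields a different relative saving at other baselines (from 70% to 80% it would be one third), so the 67% figure holds only for this starting point.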
The Knowledge Extraction Workbench Architecture
Akshar operates as an intelligence layer, not a standalone API. Users upload documents through the web interface, and the system returns structured JSON with four core components:
- Extracted blocks (text content organized by semantic type)
- Visual coordinates (bounding boxes for each text element)
- Confidence scores (per-word uncertainty metrics)
- Layout graph (spatial relationships between blocks)
The interface displays the source document on the left with clickable blocks. Selecting a block scrolls to its corresponding text output on the right, enabling rapid verification workflows. Blocks flagged with low confidence scores appear highlighted, directing expert attention only where the model indicates uncertainty.
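To make the verification workflow concrete, the sketch below parses a response with the four components listed above and picks out the blocks an expert would be asked to check. The JSON shape, field names, and 0.80 threshold are all hypothetical; Sarvam’s actual schema is not described in this article.

```python
import json

# Hypothetical response: field names are illustrative, not Sarvam's schema.
SAMPLE_RESPONSE = """
{
  "blocks": [
    {"id": "b1", "type": "headline", "bbox": [40, 30, 560, 80],
     "words": [{"text": "Independence", "confidence": 0.99},
               {"text": "Number", "confidence": 0.97}]},
    {"id": "b2", "type": "paragraph", "bbox": [40, 100, 290, 400],
     "words": [{"text": "faded", "confidence": 0.58},
               {"text": "line", "confidence": 0.91}]},
    {"id": "b3", "type": "footnote", "bbox": [40, 420, 290, 450],
     "words": [{"text": "note", "confidence": 0.93}]}
  ],
  "layout": [{"from": "b1", "to": "b2", "relation": "above"},
             {"from": "b2", "to": "b3", "relation": "above"}]
}
"""

def blocks_needing_review(doc, threshold=0.80):
    """Return (id, bbox, lowest word confidence) for blocks below the threshold."""
    flagged = []
    for block in doc["blocks"]:
        # A block is only as trustworthy as its least certain word.
        lowest = min((w["confidence"] for w in block["words"]), default=1.0)
        if lowest < threshold:
            flagged.append((block["id"], block["bbox"], lowest))
    return flagged

doc = json.loads(SAMPLE_RESPONSE)
flagged = blocks_needing_review(doc)
review_share = len(flagged) / len(doc["blocks"])
```

Flagging by the minimum word confidence is a conservative choice: one doubtful word is enough to send the block to a reviewer, and the bounding box tells the interface which region to highlight.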
This design philosophy, agents augmenting humans rather than replacing them, addresses the “last-mile problem” in document digitization. Full automation isn’t feasible for historical archives with damaged scans, but Akshar’s targeted verification reduces the cognitive load from 100% manual transcription to 5-10% spot-checking.
Real-World Application: Digitizing 19th-Century Manuscripts
The workbench was stress-tested on documents dating to the 1800s: Gujarati and Tamil manuscripts with archaic fonts, complex conjuncts, and degraded paper quality. Legacy OCR systems hallucinate modern spellings when encountering historical orthography, producing outputs that require complete revalidation.
Akshar’s approach differs fundamentally. The model flags uncertain regions (faded ink, ambiguous matras, overlapping text) and presents them to linguists for targeted review. A single expert can now validate 200-300 pages per day versus 2-3 pages with traditional OCR-plus-manual-checking workflows.
The Amrita Bazar Patrika Independence Number (August 15, 1947) serves as a demonstration case. This historical newspaper contains multi-column English text, vintage Bengali script, embedded images, and nested footnotes. Akshar correctly identified 94 distinct blocks, categorized image captions, preserved reading order across columns, and extracted photo descriptions that legacy OCR misread as body text.
How Akshar Compares to Document AI Alternatives
vs. Google Document AI Workbench:
Google’s platform excels at custom processor training and generative AI-powered extraction. It requires as few as 10 sample documents for fine-tuning and offers a 99.9% uptime SLA. However, it was primarily trained on English and European languages. Sarvam Akshar achieves 15-20% higher accuracy on Indian language documents without requiring fine-tuning datasets.
vs. AWS Textract:
Amazon’s service provides excellent general-purpose OCR with table extraction and form parsing. Like Google Cloud Vision, it struggles with regional script variations and low-resource languages such as Santhali, Bodo, and Kashmiri. Textract wasn’t designed for the mixed-script, low-resolution scans prevalent in Indian institutional workflows.
vs. Legacy OCR (Tesseract, EasyOCR):
Open-source engines use bottom-up character recognition without semantic context. They fail catastrophically on multi-column layouts, reading across the page linearly and producing discontinuous text. Indic script support remains weak, with frequent misinterpretation of matras and diacritics. Sarvam Akshar’s top-down, layout-aware approach eliminates these architectural limitations.
vs. Multimodal LLMs (Gemini, GPT, Claude):
Frontier models demonstrate strong general document understanding but lack three critical features for production use: (1) deterministic outputs with auditability trails, (2) visual grounding for human verification, and (3) specialized training on low-resource Indic scripts. GPT-5.2’s 26.49% accuracy on Kannada versus Akshar’s 89.89% illustrates the specialization gap.
Supported Languages and Script Coverage
Sarvam Akshar processes all 22 official Indian languages defined in the Constitution’s Eighth Schedule:
Devanagari script: Hindi, Marathi, Nepali, Sanskrit, Konkani, Maithili, Dogri, Bodo
Bengali-Assamese script: Bengali, Assamese, Manipuri
Dravidian scripts: Tamil, Telugu, Kannada, Malayalam
Perso-Arabic script: Urdu, Kashmiri
Other scripts: Gujarati, Odia, Punjabi (Gurmukhi), Sindhi (Perso-Arabic or Devanagari), Santhali (Ol Chiki)
The system handles handwritten text across all languages, though accuracy decreases by 5-8% for highly stylized handwriting compared to printed text. It automatically detects language switches within mixed-script documents, which is critical for Indian forms that combine English headers with regional language responses.
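What a script switch looks like at the character level can be shown with a simple sketch based on Unicode character names. This is not how Akshar detects languages (the article does not describe its method), and it identifies scripts rather than languages: Hindi and Marathi, for instance, would both be labeled Devanagari.

```python
import unicodedata

def script_of(ch):
    """Coarse script label taken from the first word of the Unicode name.

    Returns None for neutral characters (spaces, digits, punctuation).
    Scripts with multi-word names, such as Ol Chiki, yield only the first word ("OL").
    """
    is_letter_or_mark = ch.isalpha() or unicodedata.category(ch).startswith("M")
    if not is_letter_or_mark:
        return None
    try:
        return unicodedata.name(ch).split(" ")[0]
    except ValueError:  # character has no Unicode name
        return None

def split_by_script(text):
    """Split text into (script, run) pairs, attaching neutral characters to the current run."""
    runs = []
    for ch in text:
        script = script_of(ch)
        if script is None:
            if runs:
                runs[-1][1] += ch
            continue
        if runs and runs[-1][0] == script:
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    return [(script, run.strip()) for script, run in runs]

runs = split_by_script("Name: राम Kumar")
for script, run in runs:
    print(script, "|", run)
```

Combining marks such as vowel signs are treated as part of their script, so a word is not split between a consonant and its matra. A production system would also need language identification within a shared script, which this sketch does not attempt.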
Limitations and Considerations
Sarvam Akshar’s accuracy on low-resource languages (Kashmiri at 55.93%, Maithili at 81.95%) lags behind high-resource languages like Hindi and Bengali. The Sarvam Indic OCR Bench contains 20,267 test samples, but the distribution is uneven: some languages have fewer than 500 samples.
The workbench requires internet connectivity for cloud-based processing. Organizations with data sovereignty requirements must evaluate whether routing sensitive documents through external APIs meets compliance standards.
Highly degraded scans (water damage, severe fading, torn pages) still require manual intervention. Akshar reduces validation time but doesn’t eliminate the need for domain experts when processing archival materials.
When does Sarvam Akshar deliver maximum ROI?
Organizations digitizing large-scale Indian language document collections (government archives, legal case files, medical records, historical newspapers) see 10-50x productivity improvements. Single-document processing or English-only workflows may not justify the platform investment compared to general-purpose OCR.
Accessing Sarvam Akshar in February 2026
Sarvam AI operates Akshar as part of its sovereign AI platform hosted on Indian compute infrastructure. The company offers API access and a web-based workbench interface. Pricing details were not disclosed in the February 2026 launch announcement.
Developers can access Sarvam Vision’s underlying model through the company’s API for integration into custom workflows. The Sarvam Indic OCR Bench is available for researchers evaluating document intelligence models on Indian languages.
India’s document digitization challenge spans 1.4 billion people, 22 official languages, and centuries of paper records stored in sub-optimal conditions. Sarvam Akshar represents the first production-ready solution architected specifically for this complexity. The 87% average accuracy across all Indic scripts isn’t just a benchmark win; it’s the threshold where automated document processing becomes economically viable at population scale.
Frequently Asked Questions (FAQs)
What makes Sarvam Akshar different from Google Cloud Vision OCR?
Sarvam Akshar was built specifically for Indian languages, with dedicated training on each of the 22 official languages. This results in 15-20% higher accuracy on Indic documents compared to Google Cloud Vision, which was primarily trained on English and European languages.
Can Sarvam Akshar process handwritten Indian language documents?
Yes. Sarvam Vision has been specifically trained on handwritten text across all 22 Indian languages. Accuracy is typically 5-8% lower for handwritten content compared to printed text, but significantly outperforms general-purpose OCR tools on Indian language handwriting.
How does visual grounding work in the Akshar workbench?
Visual grounding pinpoints exact pixel coordinates for each text element and structural component in documents. Users can click any block in the interface to see its corresponding location in the source image, enabling rapid verification workflows without full manual transcription.
Which Indian languages does Sarvam Akshar support?
Akshar processes all 22 official Indian languages: Hindi, Bengali, Tamil, Telugu, Marathi, Malayalam, Kannada, Odia, Punjabi, Gujarati, Urdu, Sindhi, Santhali, Sanskrit, Nepali, Manipuri, Maithili, Konkani, Kashmiri, Dogri, Bodo, and Assamese.
What accuracy does Sarvam Akshar achieve on Telugu and Kannada documents?
Sarvam Vision achieves 87.70% word accuracy on Telugu and 89.89% on Kannada based on the Sarvam Indic OCR Bench. This outperforms GPT-5.2 (35.70% Telugu, 26.49% Kannada) and Claude Opus 4.5 (71.28% Telugu, 77.41% Kannada) by significant margins.
Can Akshar handle documents with multiple languages on the same page?
Yes. Sarvam Akshar automatically detects language switches and maintains context across different scripts within a single document. This is particularly useful for Indian government forms, academic papers, and business correspondence that mix English with regional languages.
How long does it take to process a 100-page historical document?
Processing speed depends on document quality and complexity. For 19th-century manuscripts with archaic fonts, Akshar enables experts to validate 200-300 pages per day versus 2-3 pages with traditional OCR workflows, a 100x productivity improvement.
Is Sarvam Akshar suitable for small businesses or individual researchers?
Akshar delivers maximum ROI for organizations processing large-scale document collections (10,000+ pages). Single-document workflows or English-only content may not justify the platform investment compared to general-purpose OCR tools.

