Key Takeaways
- Cohere Tiny Aya packs 3.35 billion parameters and supports 70+ languages including Hindi, Tamil, Bengali, and Marathi
- Five variants (Base, Global, Earth, Fire, Water), each optimized for a specific regional or deployment context
- Runs entirely offline on standard laptops and phones; no internet or cloud subscription required
- Outperforms Gemma 3-4B and Qwen 3-4B on low-resource language benchmarks (GlobalMGSM)
Most AI models demand English fluency, stable broadband, and a cloud server to function. Cohere’s Tiny Aya removes all three requirements at once. Launched February 17, 2026, at the India AI Summit, this open-weight model handles 70+ languages fully offline on a standard laptop, with no internet required. What the specs, benchmark results, and real deployment data actually show is more consequential than the announcement itself.
What Is Cohere Tiny Aya?
Cohere Labs, Cohere’s dedicated research division, released Tiny Aya on February 17, 2026, as a family of open-weight small language models. The base model contains 3.35 billion parameters and handles more than 70 languages, with particular strength in underserved regional languages across Africa, South Asia, Asia-Pacific, West Asia, and Europe. The term “open-weight” means the model’s underlying code and weights are publicly available for anyone to use and modify.
Tiny Aya was built specifically to run on consumer hardware without an internet connection. Cohere engineered the underlying software to require less computing power than most comparable models, making offline translation and local language applications viable even on everyday devices. A technical report detailing the full training methodology is forthcoming from Cohere.
Five Tiny Aya Variants and What Each Does
Cohere released five models in the Tiny Aya family, each built for a different deployment scope. The core design principle across all variants is consistent: strong regional linguistic grounding while retaining broad multilingual coverage, making every variant a flexible starting point for further adaptation and research.
Cohere explained the rationale directly: “This approach allows each model to develop stronger linguistic grounding and cultural nuance, creating systems that feel more natural and reliable for the communities they are meant to serve. At the same time, all Tiny Aya models retain broad multilingual coverage, making them flexible starting points for further adaptation and research.”
| Variant | Regional Focus | Primary Use Case |
|---|---|---|
| Tiny Aya Base | All 70+ languages (pretrained foundation) | Fine-tuning, research, custom app development |
| Tiny Aya Global | Balanced across all supported languages, instruction-tuned | Apps requiring broad cross-language command-following |
| TinyAya-Earth | Africa | African language apps and offline deployments |
| TinyAya-Fire | South Asia: Hindi, Bengali, Punjabi, Urdu, Gujarati, Tamil, Telugu, Marathi | India-focused apps and vernacular services |
| TinyAya-Water | Asia-Pacific, West Asia, and Europe | Regional deployments across APAC, Middle East, Europe |
All five variants retain full cross-lingual capability; regional specialization sharpens local performance without removing universal coverage.
Architecture: What Cohere Has Confirmed
Cohere has confirmed a limited set of architecture and training details, with a full technical report still pending. The verified facts are:
- Total parameters: 3.35 billion
- Training hardware: Single cluster of 64 Nvidia H100 GPUs
- Training scale: Described as “relatively modest computing resources” by Cohere
- On-device optimization: Software built specifically to require less computing power than most comparable models
No further architectural specifications (layer counts, vocabulary sizes, attention mechanisms, or pretraining token counts) have been verified from primary sources as of this publication. This article will be updated when the official technical report is released.
How was Cohere Tiny Aya trained?
Cohere trained Tiny Aya on a single cluster of 64 Nvidia H100 GPUs, which the company describes as relatively modest computing resources for a model of its capability. The full training methodology, including data and architecture details, will be covered in a forthcoming technical report from Cohere.
Tiny Aya vs. Gemma 3: Where It Wins and Where It Falls Short
Tiny Aya and Gemma 3 target different priorities: Tiny Aya optimizes for multilingual depth and offline accessibility, while Gemma 3 prioritizes context length and multimodal capability.
| Feature | Tiny Aya | Google Gemma 3-4B |
|---|---|---|
| Parameters | 3.35B | 4B |
| Context window | 8K tokens | 128K tokens |
| Languages | 70+ (deep regional coverage) | 140+ (broad) |
| Low-resource language performance | Best on GlobalMGSM | Moderate |
| Regional variants | Yes (Earth/Fire/Water) | No |
| Multimodal (image + text) | No | Yes |
| Offline capable | Yes | Yes |
| Open-weight | Yes | Yes |
| Release | February 2026 | March 2025 |
On the GlobalMGSM benchmark, which tests translation and mathematical reasoning across diverse languages, Tiny Aya outperforms both Gemma 3-4B and Qwen 3-4B specifically on low-resource African and West Asian languages. Gemma 3 retains a significant lead on context length (128K vs. 8K tokens) and is the stronger option for tasks involving image-text inputs or long documents.
On-Device Performance: Real Hardware Results
Cohere designed Tiny Aya’s software from the ground up to minimize compute requirements for on-device inference. Independent testing by Futurum Group recorded Tiny Aya achieving 32 tokens per second on an iPhone 17 Pro, a benchmark that confirms practical usability on current consumer mobile hardware.
This performance figure is significant for India specifically. A developer building a Hindi or Tamil language app can run TinyAya-Fire locally on a mid-range device without GPU acceleration, cloud API costs, or dependence on consistent broadband access. Cohere highlighted India’s linguistically diverse infrastructure as a primary motivator for the model’s offline-first architecture.
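Throughput figures like this are easy to sanity-check on your own hardware. Below is a rough measurement sketch using the Hugging Face `transformers` library; note that the model ID is a placeholder assumption (check the Cohere Labs organization on Hugging Face for the actual Tiny Aya repository names), and results will vary with precision, quantization, and thermal state.

```python
# Rough tokens-per-second measurement sketch; results vary with hardware,
# precision, and thermal throttling.
# NOTE: the model ID below is a placeholder assumption; check the Cohere Labs
# organization on Hugging Face for the real Tiny Aya repository names.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "CohereLabs/tiny-aya-fire"  # hypothetical repo ID

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,  # half precision keeps a ~3.35B model laptop-sized
    device_map="auto",          # falls back to CPU if no GPU is present
)

prompt = "Translate to Tamil: The library opens at nine."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
elapsed = time.perf_counter() - start

new_tokens = outputs.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```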
Why the India AI Summit Launch Matters
Cohere chose the India AI Summit as Tiny Aya’s global launch venue, a deliberate signal about the model’s target market. TinyAya-Fire directly addresses eight of India’s most widely spoken languages: Bengali, Hindi, Punjabi, Urdu, Gujarati, Tamil, Telugu, and Marathi. No prior small language model had covered this combination with offline capability at this parameter scale.
Cohere’s business trajectory reinforces the strategic focus. The company posted $240 million in annual recurring revenue at the end of 2025, with 50% quarter-over-quarter growth, and CEO Aidan Gomez has confirmed plans for a public offering. Tiny Aya positions Cohere as the leading multilingual AI infrastructure provider for sovereign and emerging market deployments ahead of that IPO.
How to Access and Deploy Tiny Aya
Tiny Aya is available across four platforms, with no paywall for model weights:
- Hugging Face: Download model weights and access training and evaluation datasets
- Kaggle: Local deployment and notebook-based experimentation
- Ollama: Single-command local setup for offline use on personal hardware
- Cohere Platform: Managed API access for production-scale enterprise integrations
Developers can use any of the regional variants as a starting point for fine-tuning toward more specific tasks or narrower language domains.
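For the offline workflow specifically, a minimal download-once, run-offline sketch might look like the following. The repository ID is an assumption for illustration only; substitute the actual Tiny Aya name from the Cohere Labs page on Hugging Face.

```python
# Download-once, run-offline sketch using huggingface_hub + transformers.
# NOTE: the repository ID is a placeholder assumption; substitute the real
# Tiny Aya name from the Cohere Labs page on Hugging Face.
from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "CohereLabs/tiny-aya-global"  # hypothetical repo ID

# One-time download while online; weights are cached locally afterwards.
local_dir = snapshot_download(MODEL_ID)

# All later loads can run with no network connection at all.
tokenizer = AutoTokenizer.from_pretrained(local_dir, local_files_only=True)
model = AutoModelForCausalLM.from_pretrained(local_dir, local_files_only=True)

prompt = "Translate to Hindi: Where is the nearest train station?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Passing `local_files_only=True` makes the load fail fast if the weights are not already cached, rather than silently reaching for the network.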
Limitations to Know Before Deploying
- 8K context window: Unsuitable for long-document tasks (legal texts, extended reports, or multi-turn conversations beyond roughly 6,000 words); a simple token-budget guard is sketched after this list
- Text-only: No multimodal capability; cannot process images, audio, or video
- Reasoning ceiling: Optimized for edge inference; complex multi-step logical reasoning favors larger models such as Cohere’s own Command R+
- Language depth variance: 70+ languages does not mean uniform quality; major Indic and African languages are primary targets, while smaller dialects may perform inconsistently
- Pending technical report: Full architecture, training data, and benchmark methodology details are not yet publicly available; performance claims beyond GlobalMGSM should be treated as provisional
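The context-window limit is the easiest of these to guard against in code. The sketch below assumes the 8K window equals 8,192 tokens (adjust to the published limit once confirmed) and reuses the same hypothetical model ID as the earlier examples.

```python
# Minimal context-window guard sketch.
# Assumes the 8K window is 8,192 tokens; adjust to the published limit.
# NOTE: the model ID is a placeholder assumption, as in the earlier sketches.
from transformers import AutoTokenizer

MODEL_ID = "CohereLabs/tiny-aya-global"  # hypothetical repo ID
CONTEXT_WINDOW = 8192                    # assumed token limit
MAX_NEW_TOKENS = 512                     # room reserved for the model's reply

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

def fit_prompt(text: str) -> str:
    """Truncate a prompt so prompt + reply stays inside the context window."""
    budget = CONTEXT_WINDOW - MAX_NEW_TOKENS
    ids = tokenizer(text, truncation=True, max_length=budget)["input_ids"]
    return tokenizer.decode(ids, skip_special_tokens=True)

long_document = "report text " * 10000  # stand-in for an oversized input
trimmed = fit_prompt(long_document)
print(len(tokenizer(trimmed)["input_ids"]))  # stays at or under 7,680 tokens
```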
Frequently Asked Questions (FAQs)
What is Cohere Tiny Aya?
Cohere Tiny Aya is a family of open-weight small language models released by Cohere Labs on February 17, 2026. With 3.35 billion parameters, it supports over 70 languages and runs fully offline on everyday devices like laptops and phones, with no internet connection required.
How many languages does Tiny Aya support?
Tiny Aya supports over 70 languages across African, South Asian, Asia-Pacific, West Asian, and European regions. The TinyAya-Fire regional variant covers eight South Asian languages: Hindi, Bengali, Punjabi, Urdu, Gujarati, Tamil, Telugu, and Marathi.
Can Tiny Aya run offline on a smartphone?
Yes. Cohere built Tiny Aya’s underlying software specifically for on-device use, requiring less computing power than most comparable models. Independent testing recorded 32 tokens per second on an iPhone 17 Pro, confirming practical offline performance on current consumer mobile hardware.
What are the five Tiny Aya model variants?
The five variants are: Tiny Aya Base (open pretrained multilingual foundation), Tiny Aya Global (instruction-tuned for broad cross-language use), TinyAya-Earth (African languages), TinyAya-Fire (South Asian languages including all eight major Indian languages), and TinyAya-Water (Asia-Pacific, West Asia, and Europe).
How does Tiny Aya compare to Gemma 3-4B?
Tiny Aya outperforms Gemma 3-4B on GlobalMGSM benchmark scores for low-resource African and West Asian languages. Gemma 3 leads on context length at 128K tokens versus Tiny Aya’s 8K, and supports multimodal image-text inputs that Tiny Aya does not.
Is Cohere Tiny Aya free to use?
Tiny Aya is open-weight and freely available on Hugging Face, Kaggle, and Ollama. The model’s underlying code and weights are publicly available for anyone to use and modify. Enterprise API access via the Cohere Platform is also available for production deployments.
Why was Tiny Aya launched at the India AI Summit?
Cohere selected the India AI Summit as the launch venue because Tiny Aya directly addresses India’s scale of linguistic diversity. TinyAya-Fire covers eight major Indian languages with full offline capability, enabling developers in connectivity-limited areas to build native-language applications without cloud access.
Where can I download Cohere Tiny Aya?
Cohere Tiny Aya is free to download on Hugging Face, Kaggle, and Ollama for local offline deployment. It is also available via the Cohere Platform for API-based enterprise use. Training and evaluation datasets are published on Hugging Face alongside all five model weight variants.
How does Tiny Aya compare to Gemma 3?
Tiny Aya (3.35B) outperforms Gemma 3-4B on GlobalMGSM benchmark scores for low-resource African and West Asian languages. However, Gemma 3-4B leads on context length (128K tokens vs. Tiny Aya’s 8K) and supports multimodal image-text tasks that Tiny Aya does not offer. Both are open-weight and offline-capable.