
    Cohere’s Tiny Aya: The 3.35B-Parameter Model Bringing 70+ Languages to Your Pocket


    Key Takeaways

    • Cohere Tiny Aya packs 3.35 billion parameters and supports 70+ languages including Hindi, Tamil, Bengali, and Marathi
    • Five variants (Base, Global, Earth, Fire, Water), each optimized for a specific regional or deployment context
    • Runs entirely offline on standard laptops and phones; no internet or cloud subscription required
    • Outperforms Gemma 3-4B and Qwen 3-4B on low-resource language benchmarks (GlobalMGSM)

    Most AI models demand English fluency, stable broadband, and a cloud server to function. Cohere’s Tiny Aya removes all three requirements at once. Launched February 17, 2026, at the India AI Summit, this open-weight model handles 70+ languages fully offline on a standard laptop, no internet required. What the specs, benchmark results, and real deployment data actually show is more consequential than the announcement itself.

    What Is Cohere Tiny Aya?

    Cohere Labs, Cohere’s dedicated research division, released Tiny Aya on February 17, 2026, as a family of open-weight small language models. The base model contains 3.35 billion parameters and handles more than 70 languages, with particular strength in underserved regional languages across Africa, South Asia, Asia-Pacific, West Asia, and Europe. The term “open-weight” means the model’s underlying code and weights are publicly available for anyone to use and modify.

    Tiny Aya was built specifically to run on consumer hardware without an internet connection. Cohere engineered the underlying software to require less computing power than most comparable models, making offline translation and local language applications viable even on everyday devices. A technical report detailing the full training methodology is forthcoming from Cohere.

    Five Tiny Aya Variants and What Each Does

    Cohere released five models in the Tiny Aya family, each built for a different deployment scope. The core design principle across all variants is consistent: strong regional linguistic grounding while retaining broad multilingual coverage, making every variant a flexible starting point for further adaptation and research.

    Cohere explained the rationale directly: “This approach allows each model to develop stronger linguistic grounding and cultural nuance, creating systems that feel more natural and reliable for the communities they are meant to serve. At the same time, all Tiny Aya models retain broad multilingual coverage, making them flexible starting points for further adaptation and research.”

    Variant | Regional Focus | Primary Use Case
    Tiny Aya Base | All 70+ languages (pretrained foundation) | Fine-tuning, research, custom app development
    Tiny Aya Global | Balanced across all supported languages, instruction-tuned | Apps requiring broad cross-language command-following
    TinyAya-Earth | Africa | African language apps and offline deployments
    TinyAya-Fire | South Asia: Hindi, Bengali, Punjabi, Urdu, Gujarati, Tamil, Telugu, Marathi | India-focused apps and vernacular services
    TinyAya-Water | Asia-Pacific, West Asia, and Europe | Regional deployments across APAC, Middle East, Europe

    All five variants retain full cross-lingual capability; regional specialization sharpens local performance without removing universal coverage.

    Architecture: What Cohere Has Confirmed

    Cohere has confirmed a limited set of architecture and training details, with a full technical report still pending. The verified facts are:

    • Total parameters: 3.35 billion
    • Training hardware: Single cluster of 64 Nvidia H100 GPUs
    • Training scale: Described as “relatively modest computing resources” by Cohere
    • On-device optimization: Software built specifically to require less computing power than most comparable models

    No further architectural specifications, layer counts, vocabulary sizes, attention mechanisms, or pretraining token counts have been verified from primary sources as of this publication. This article will be updated when the official technical report is released.
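    The confirmed parameter count alone allows a rough estimate of the on-device memory footprint. The sketch below is a back-of-envelope calculation, not a Cohere specification; the bytes-per-parameter values are generic assumptions, since the company has not published precision or quantization details:

    ```python
    # Back-of-envelope weight-memory estimate for a 3.35B-parameter model.
    # Bytes-per-parameter values are generic assumptions, not Cohere specs.
    PARAMS = 3.35e9

    def weight_footprint_gb(bytes_per_param: float) -> float:
        """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
        return PARAMS * bytes_per_param / 1e9

    fp16 = weight_footprint_gb(2.0)   # ~6.7 GB: too large for most phones
    int4 = weight_footprint_gb(0.5)   # ~1.7 GB: a plausible on-device target

    print(f"fp16: {fp16:.1f} GB, int4: {int4:.1f} GB")
    ```

    Numbers like these explain why small models in the 3-4B range are the sweet spot for phones and laptops: at common quantization levels the weights fit comfortably in consumer RAM.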

    How was Cohere Tiny Aya trained?

    Cohere trained Tiny Aya on a single cluster of 64 Nvidia H100 GPUs, which the company describes as relatively modest computing resources for a model of its capability. The full training methodology, including data and architecture details, will be covered in a forthcoming technical report from Cohere.

    Tiny Aya vs. Gemma 3: Where It Wins and Where It Falls Short

    Tiny Aya and Gemma 3 target different priorities: Tiny Aya optimizes for multilingual depth and offline accessibility, while Gemma 3 prioritizes context length and multimodal capability.

    Feature | Tiny Aya | Google Gemma 3-4B
    Parameters | 3.35B | 4B
    Context window | 8K tokens | 128K tokens
    Languages | 70+ (deep regional coverage) | 140+ (broad)
    Low-resource language performance | Best on GlobalMGSM | Moderate
    Regional variants | Yes (Earth/Fire/Water) | No
    Multimodal (image + text) | No | Yes
    Offline capable | Yes | Yes
    Open-weight | Yes | Yes
    Release | February 2026 | March 2025

    On the GlobalMGSM benchmark, which tests translation and mathematical reasoning across diverse languages, Tiny Aya outperforms both Gemma 3-4B and Qwen 3-4B specifically on low-resource African and West Asian languages. Gemma 3 retains a significant lead on context length (128K vs. 8K tokens) and is the stronger option for tasks involving image-text inputs or long documents.

    On-Device Performance: Real Hardware Results

    Cohere designed Tiny Aya’s software from the ground up to minimize compute requirements for on-device inference. Independent testing by Futurum Group recorded Tiny Aya achieving 32 tokens per second on an iPhone 17 Pro, a benchmark that confirms practical usability on current consumer mobile hardware.

    This performance figure is significant for India specifically. A developer building a Hindi or Tamil language app can run TinyAya-Fire locally on a mid-range device without GPU acceleration, cloud API costs, or dependence on consistent broadband access. Cohere highlighted India’s linguistic diversity and uneven connectivity infrastructure as a primary motivator for the model’s offline-first architecture.
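    The measured 32 tokens per second translates directly into user-facing latency. A quick sketch of what that throughput means in practice (the reply lengths below are illustrative assumptions, not benchmark data):

    ```python
    # Latency estimate from the measured figure of 32 tokens/second.
    # Reply lengths are illustrative assumptions.
    TOKENS_PER_SECOND = 32

    def seconds_for(tokens: int) -> float:
        """Time to generate a reply of the given token length."""
        return tokens / TOKENS_PER_SECOND

    print(f"short answer (~50 tokens): {seconds_for(50):.2f} s")   # 1.56 s
    print(f"paragraph (~200 tokens):   {seconds_for(200):.2f} s")  # 6.25 s
    ```

    A one-sentence translation appears in under two seconds; a full paragraph takes roughly six. That is comfortably interactive for chat-style and translation apps, the workloads Tiny Aya targets.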

    Why the India AI Summit Launch Matters

    Cohere chose the India AI Summit as Tiny Aya’s global launch venue, a deliberate signal about the model’s target market. TinyAya-Fire directly addresses eight of India’s most widely spoken languages: Bengali, Hindi, Punjabi, Urdu, Gujarati, Tamil, Telugu, and Marathi. No prior small language model had covered this combination with offline capability at this parameter scale.

    Cohere’s business trajectory reinforces the strategic focus. The company posted $240 million in annual recurring revenue at the end of 2025, with 50% quarter-over-quarter growth, and CEO Aidan Gomez has confirmed plans for a public offering. Tiny Aya positions Cohere as the leading multilingual AI infrastructure provider for sovereign and emerging market deployments ahead of that IPO.

    How to Access and Deploy Tiny Aya

    Tiny Aya is available across four platforms, with no paywall for model weights:

    1. Hugging Face: Download model weights and access training and evaluation datasets
    2. Kaggle: Local deployment and notebook-based experimentation
    3. Ollama: Single-command local setup for offline use on personal hardware
    4. Cohere Platform: Managed API access for production-scale enterprise integrations

    Developers can use any of the regional variants as a starting point for fine-tuning toward more specific tasks or narrower language domains.
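    Picking a variant programmatically can be as simple as a region lookup before loading weights. A minimal sketch: the repo ids in the commented-out loading step are hypothetical placeholders, since official Hugging Face identifiers are not confirmed here:

    ```python
    # Map a deployment region to a Tiny Aya variant name, then (optionally)
    # load it with Hugging Face transformers. Repo ids are HYPOTHETICAL.
    VARIANT_BY_REGION = {
        "africa": "TinyAya-Earth",
        "south_asia": "TinyAya-Fire",
        "apac_west_asia_europe": "TinyAya-Water",
    }

    def pick_variant(region: str) -> str:
        """Fall back to the instruction-tuned Global variant elsewhere."""
        return VARIANT_BY_REGION.get(region, "Tiny Aya Global")

    # Loading (uncomment to run; downloads weights from Hugging Face):
    # from transformers import AutoModelForCausalLM, AutoTokenizer
    # repo = "CohereLabs/tiny-aya-fire"  # hypothetical repo id
    # tokenizer = AutoTokenizer.from_pretrained(repo)
    # model = AutoModelForCausalLM.from_pretrained(repo)

    print(pick_variant("south_asia"))  # TinyAya-Fire
    ```

    Starting from the regional variant closest to your target languages, rather than the Base model, means fine-tuning begins from stronger local grounding.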

    Limitations to Know Before Deploying

    • 8K context window: Unsuitable for long-document tasks: legal texts, extended reports, or multi-turn conversations beyond roughly 6,000 words
    • Text-only: No multimodal capability; cannot process images, audio, or video
    • Reasoning ceiling: Optimized for edge inference; complex multi-step logical reasoning favors larger models such as Cohere’s own Command R+
    • Language depth variance: 70+ languages does not mean uniform quality; major Indic and African languages are primary targets, while smaller dialects may perform inconsistently
    • Pending technical report: Full architecture, training data, and benchmark methodology details are not yet publicly available; performance claims beyond GlobalMGSM should be treated as provisional
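    The 8K ceiling is workable for long documents if input is chunked before inference. A minimal sketch, assuming an 8,000-token budget and a crude 4-characters-per-token heuristic (real tokenizer counts vary considerably by language and script):

    ```python
    # Split long text into chunks that fit an 8K-token context window.
    # The ~4 chars/token heuristic is an assumption; use the model's own
    # tokenizer for accurate counts, especially for non-Latin scripts.
    CONTEXT_TOKENS = 8000
    RESERVED_FOR_OUTPUT = 1000      # leave room for the model's reply
    CHARS_PER_TOKEN = 4             # rough heuristic, language-dependent

    def chunk_text(text: str) -> list[str]:
        """Greedily slice text into context-sized character chunks."""
        budget = (CONTEXT_TOKENS - RESERVED_FOR_OUTPUT) * CHARS_PER_TOKEN
        return [text[i:i + budget] for i in range(0, len(text), budget)]

    doc = "x" * 100_000             # a ~100k-character document
    print(len(chunk_text(doc)))     # 4 chunks of at most 28,000 chars
    ```

    Chunking sacrifices cross-chunk context, so summarization or legal review across chunk boundaries remains a genuine weakness relative to Gemma 3’s 128K window.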

    Frequently Asked Questions (FAQs)

    What is Cohere Tiny Aya?

    Cohere Tiny Aya is a family of open-weight small language models released by Cohere Labs on February 17, 2026. With 3.35 billion parameters, it supports over 70 languages and runs fully offline on everyday devices like laptops and phones, with no internet connection required.

    How many languages does Tiny Aya support?

    Tiny Aya supports over 70 languages across African, South Asian, Asia-Pacific, West Asian, and European regions. The TinyAya-Fire regional variant covers eight South Asian languages: Hindi, Bengali, Punjabi, Urdu, Gujarati, Tamil, Telugu, and Marathi.

    Can Tiny Aya run offline on a smartphone?

    Yes. Cohere built Tiny Aya’s underlying software specifically for on-device use, requiring less computing power than most comparable models. Independent testing recorded 32 tokens per second on an iPhone 17 Pro, confirming practical offline performance on current consumer mobile hardware.

    What are the five Tiny Aya model variants?

    The five variants are: Tiny Aya Base (open pretrained multilingual foundation), Tiny Aya Global (instruction-tuned for broad cross-language use), TinyAya-Earth (African languages), TinyAya-Fire (South Asian languages including all eight major Indian languages), and TinyAya-Water (Asia-Pacific, West Asia, and Europe).

    How does Tiny Aya compare to Gemma 3-4B?

    Tiny Aya outperforms Gemma 3-4B on GlobalMGSM benchmark scores for low-resource African and West Asian languages. Gemma 3 leads on context length at 128K tokens versus Tiny Aya’s 8K, and supports multimodal image-text inputs that Tiny Aya does not.

    Is Cohere Tiny Aya free to use?

    Tiny Aya is open-weight and freely available on Hugging Face, Kaggle, and Ollama. The model’s underlying code and weights are publicly available for anyone to use and modify. Enterprise API access via the Cohere Platform is also available for production deployments.

    Why was Tiny Aya launched at the India AI Summit?

    Cohere selected the India AI Summit as the launch venue because Tiny Aya directly addresses India’s scale of linguistic diversity. TinyAya-Fire covers eight major Indian languages with full offline capability, enabling developers in connectivity-limited areas to build native language applications without cloud access.

    Where can I download Cohere Tiny Aya?

    Cohere Tiny Aya is free to download on Hugging Face, Kaggle, and Ollama for local offline deployment. It is also available via the Cohere Platform for API-based enterprise use. Training and evaluation datasets are published on Hugging Face alongside all five model weight variants.

    How does Tiny Aya compare to Gemma 3?

    Tiny Aya (3.35B) outperforms Gemma 3-4B on GlobalMGSM benchmark scores for low-resource African and West Asian languages. However, Gemma 3-4B leads on context length (128K tokens vs. Tiny Aya’s 8K) and supports multimodal image-text tasks that Tiny Aya does not offer. Both are open-weight and offline-capable.

    Mohammad Kashif
    Senior Technology Analyst and Writer at AdwaitX, specializing in the convergence of Mobile Silicon, Generative AI, and Consumer Hardware. Moving beyond spec sheets, his reviews rigorously test "real-world" metrics analyzing sustained battery efficiency, camera sensor behavior, and long-term software support lifecycles. Kashif’s data-driven approach helps enthusiasts and professionals distinguish between genuine innovation and marketing hype, ensuring they invest in devices that offer lasting value.

