xAI’s Grok 4 Fast is a cheaper, faster variant of Grok 4 with a 2M-token context and unified model for both quick replies and deep reasoning.

Where does it rank on LMArena?

#1 on Search Arena (Elo 1163) and #8 on Text Arena at time of writing; rankings change over time.

Yes—limited time via OpenRouter and Vercel AI Gateway.

Grok 4 Fast: xAI’s 2M-Token Model With 98% Cost Cut & LMArena #1 (2025)

Q: How much does Grok 4 Fast cost?

Under 128k tokens per request: $0.20/M input, $0.50/M output, cached input $0.05/M. At ≥128k: $0.40/$1.00.

Q: Is Grok 4 Fast actually fast?

Independent testing shows ~296.8 tokens/second and low TTFT depending on provider.

xAI’s Grok 4 Fast is a cheaper, faster variant of Grok 4 with a massive 2M-token context window. It unifies “reasoning” and “non-reasoning” in one model, cuts thinking tokens by ~40%, and at current rates can match Grok 4’s benchmark level at ~98% lower price. It’s already #1 on LMArena’s Search Arena and #8 on Text. Pricing starts at $0.20/M input.

What is Grok 4 Fast?

Launched on September 19, 2025, Grok 4 Fast is xAI’s push for cost-efficient reasoning. You get Grok-class quality with a very large context window and lower token use so heavy research, long docs, and code bases become more doable without melting your budget. It ships in two variants (reasoning and non-reasoning) and runs in Grok.com, iOS/Android, and via API.

Why it matters

Teams in India juggling long PDFs, contracts, or multi-file repos often hit context limits or runaway costs. Grok 4 Fast’s 2M tokens plus low input rates are built to relieve both pressure points.

Core specs and what they mean in practice

2,000,000-token context: practical room for books, codebases, and multi-doc reviews. Plan on chunking strategy anyway, but you’ll do it less.
Unified model: one set of weights for both quick responses and deep “thinking,” steered by prompts. Fewer model swaps, simpler ops.
Speed: Independent testing shows ~296.8 tokens/sec with low TTFT (provider-dependent). Expect noticeably snappier output than Grok 4 for many flows.
Tool-use RL: End-to-end RL trains the model to decide when to browse, run code, or hop links useful for live research on X and the web.

Benchmarks & public evals (sanity check)

xAI reports strong math/reasoning scores-AIME 2025: 92%, HMMT 2025: 93.3%, GPQA Diamond: 85.7%-with ~40% fewer thinking tokens than Grok 4 on average. In crowdsourced testing, “grok-4-fast-search” (codename menlo) holds #1 on LMArena Search (Elo 1163), and “grok-4-fast” places #8 on the Text Arena. These are public, moving leaderboards-expect shifts.

Short Answer: Grok 4 Fast looks competitive with frontier models on math and search-style tasks while being much cheaper to operate than Grok 4. That combo quality + cost is the draw.

Pricing deep-dive (and realistic monthly bills)

Under 128k input tokens per request:

Input: $0.20/M
Output: $0.50/M
Cached input: $0.05/M

At ≥128k input tokens per request:

Input: $0.40/M
Output: $1.00/M

Both reasoning and non-reasoning SKUs support the full 2M context.

Example bill (dev team, India):

30 prompts/day, each 60k input tokens, 3k output tokens → ~1.8M input/day (~$0.36) + 90k output/day (~$0.045) → ≈ $12/month. If you reuse long context via caching, it drops further. (Numbers scale linearly; watch for occasional long outputs.)

Reality check: Grok 4 Fast’s blended price factoring typical 3:1 input:output sits well below most frontier “reasoning” models in independent dashboards.

Grok 4 Fast vs Grok 4

Feature	Grok 4 Fast	Grok 4 (0709)
Context window	2,000,000	256,000 (API)
Modes	Unified (reasoning/non-reasoning)	Reasoning only
Speed (independent tests)	~296.8 tok/s (provider-dependent)	~50-75 tok/s typical
Price (≤128k/request)	$0.20/M in, $0.50/M out, $0.05/M cached	$3.00/M in, $15.00/M out
LMArena presence	#1 Search, #8 Text (as of Sep 21)	Strong Text showing; Search variant behind Grok 4 Fast

When to choose which

Pick Grok 4 Fast for search/grounded tasks, long-context analysis, bulk inference, and cost-sensitive workloads.
Keep Grok 4 if your pipelines are tuned to its output style and you don’t hit context/cost ceilings.

Real-world fit (who benefits most)

Search/research agents: LMArena Search results suggest strong performance when grounded browsing matters (Elo lead over o3-search).
Support & content ops: 2M context helps keep long histories active—fewer truncation bugs, fewer “what did we talk about?” moments.
Coding/QA: Speed bump helps iterative prompts; still measure tool-use accuracy against your test suite.

How to try it today (fastest paths)

Grok.com / Apps: Available to all users; Fast/Auto modes route to Grok 4 Fast where it makes sense.
API (xAI): Two SKUs-grok-4-fast-reasoning and grok-4-fast-non-reasoning-both with 2M context.
OpenRouter & Vercel AI Gateway: Free for a limited time, handy for quick POCs or usage caps.

Mini How-To (OpenRouter quickstart)

Create an OpenRouter key and select Grok 4 Fast (free).
Point your OpenAI-compatible client at OpenRouter’s endpoint.
For deeper chains, toggle the reasoning variant and test cache hits on long, repeated context.

Limitations & considerations

Latency varies: Independent dashboards show low TTFT for Grok 4 Fast, but providers differ. Validate in your region.
Benchmarks ≠ your workload: xAI’s scores are solid, yet task-domain drift is real. Pilot on your own eval set.
Live Search costs extra in xAI’s API (per source). Factor this in if you lean on browsing.

FAQs

Does the 2M context apply to both SKUs?
Yes-reasoning and non-reasoning both support 2M tokens.

What’s the difference between reasoning and non-reasoning?
Same base model; prompts steer the “thinking” depth. Non-reasoning is faster/cheaper for simple tasks.

How does it compare to Grok 4 on cost?
xAI says ~40% fewer thinking tokens, plus lower per-token rates, yielding ~98% lower price to match Grok 4’s benchmark level.

Is browsing included?
The model can decide to browse, but Live Search metering applies on the API.

What speed should I expect in India?
Varies by provider/route. Use a short canary prompt to measure TTFT and throughput before rollout.

Where can I see public rankings?
LMArena’s Search/Text leaderboards update regularly.

Featured Answer boxes

What is Grok 4 Fast?

xAI’s Grok 4 Fast is a cheaper, faster variant of Grok 4 with a 2M-token context and a unified model for both quick replies and deep reasoning. It targets lower costs by using ~40% fewer thinking tokens while maintaining comparable benchmark scores.

How much does Grok 4 Fast cost?

For requests under 128k tokens, pricing starts at $0.20/M input and $0.50/M output, with cached input at $0.05/M. Above 128k, input/output double to $0.40/$1.00 per million tokens.

Is Grok 4 Fast actually fast?

As of Sep 21, 2025, grok-4-fast-search is #1 on LMArena’s Search Arena (Elo 1163), and grok-4-fast is #8 on the Text Arena. Rankings change over time.

Can I try it free?

Yes for a limited time via OpenRouter and Vercel AI Gateway. Availability may change; check the listings before you build.

Source: xAI | xAI Docs

Search for an article

Grok 4 Fast: What xAI’s 2M-Token, Ultra-Cheap Model Means for You

Table of Contents

What is Grok 4 Fast?

Why it matters

Core specs and what they mean in practice

Benchmarks & public evals (sanity check)

Pricing deep-dive (and realistic monthly bills)

Grok 4 Fast vs Grok 4

Real-world fit (who benefits most)

How to try it today (fastest paths)

Mini How-To (OpenRouter quickstart)

Limitations & considerations

FAQs

Featured Answer boxes

What is Grok 4 Fast?

How much does Grok 4 Fast cost?

Is Grok 4 Fast actually fast?

Can I try it free?

Latest articles

More like this