
    GPT-5.3-Codex-Spark: OpenAI’s Ultra-Fast AI Model That Rewrites Real-Time Coding


    Quick Brief

    • GPT-5.3-Codex-Spark generates code at 1,000+ tokens/second on Cerebras Wafer Scale Engine 3 hardware
    • Available now to ChatGPT Pro users in the Codex app, CLI, and VS Code extension as a research preview
    • Optimized for real-time collaboration: interrupts, redirects, and instant edits replace wait times
    • Infrastructure improvements reduce client/server roundtrip overhead by 80% and time-to-first-token by 50%

    OpenAI has fundamentally redefined real-time software development, and GPT-5.3-Codex-Spark proves it. Released February 12, 2026, this ultra-fast AI coding model represents the first milestone in OpenAI’s partnership with Cerebras, delivering over 1,000 tokens per second when served on Cerebras’ Wafer Scale Engine 3 hardware. Unlike previous AI coding assistants that force developers into batch-style workflows with multi-minute wait times, Codex-Spark enables instant feedback loops where you can interrupt, redirect, and iterate at the speed of thought.

    What Makes GPT-5.3-Codex-Spark Different From Standard Codex

    GPT-5.3-Codex-Spark is a smaller, speed-optimized version of OpenAI’s flagship GPT-5.3-Codex model, specifically engineered for latency-sensitive coding workflows. While GPT-5.3-Codex excels at long-running autonomous tasks spanning hours or days, Codex-Spark focuses on real-time collaboration: making targeted edits, reshaping logic, and refining interfaces with near-instant responses.

    The model launches with a 128k context window and text-only capabilities during the research preview phase. OpenAI optimized Codex-Spark’s default working style to be lightweight: it makes minimal, targeted edits and doesn’t automatically run tests unless explicitly requested. This design choice reflects the model’s core mission: keeping developers in a tight interactive loop rather than handling comprehensive autonomous tasks.

    Performance benchmarks demonstrate Codex-Spark’s dual strength: speed and capability. On SWE-Bench Pro and Terminal-Bench 2.0, two rigorous evaluations of agentic software engineering capability, GPT-5.3-Codex-Spark demonstrates strong performance while finishing tasks in a fraction of the time GPT-5.3-Codex requires. According to Cerebras, Codex-Spark produces more capable responses than GPT-5.1-Codex-mini while completing tasks at significantly higher speeds.

    The speed advantage is substantial: Codex-Spark operates at over 1,000 tokens per second compared to standard generation speeds of approximately 65-70 tokens per second, representing roughly 15x faster code generation. This performance leap transforms coding assistants from batch processing tools into real-time collaborative partners.
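
    As a sanity check on that multiplier, the arithmetic is simple. The short Python sketch below uses only the figures quoted above; the 2,000-token response length is an arbitrary illustration, not a benchmark.

        # Back-of-the-envelope check on the ~15x claim, using the quoted speeds.
        spark_tps = 1_000        # tokens/second on Cerebras WSE-3
        baseline_tps = 67.5      # midpoint of the quoted 65-70 tokens/second
        response_tokens = 2_000  # hypothetical response size, for illustration

        spark_s = response_tokens / spark_tps        # 2.0 seconds
        baseline_s = response_tokens / baseline_tps  # ~29.6 seconds
        print(f"Spark: {spark_s:.1f}s, baseline: {baseline_s:.1f}s, "
              f"speedup: {baseline_s / spark_s:.1f}x")  # ~14.8x

    At these speeds, a response that would take roughly half a minute to stream arrives in about two seconds.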

    What is GPT-5.3-Codex-Spark’s inference speed?

    GPT-5.3-Codex-Spark delivers over 1,000 tokens per second when served on Cerebras’ Wafer Scale Engine 3 hardware. This ultra-low latency enables near-instant feedback during live coding sessions, allowing developers to interrupt or redirect the model mid-task.

    Cerebras Partnership: The Hardware Advantage Behind Speed

    Codex-Spark runs exclusively on Cerebras’ Wafer Scale Engine 3 (WSE-3), a purpose-built AI accelerator that differs fundamentally from traditional GPU architectures. Unlike semiconductor manufacturers that dice wafers into small chips, Cerebras’ WSE technology transforms an entire wafer into a single massive chip. This architecture features the largest on-chip memory of any AI processor, enabling high-speed inference at thousands of tokens per second per user.

    OpenAI announced its multi-year partnership with Cerebras on January 14, 2026. Just four weeks later, GPT-5.3-Codex-Spark became the first integration from this collaboration. Sachin Katti, Head of Industrial Compute at OpenAI, described Cerebras as “a great engineering partner” and emphasized that bringing wafer-scale compute into production gives OpenAI “a new way to keep Codex responsive for latency-sensitive work”.

    The partnership complements rather than replaces OpenAI’s existing infrastructure. GPUs remain foundational across training and inference pipelines, delivering cost-effective tokens for broad usage. Cerebras excels at workflows demanding extremely low latency, and OpenAI notes that GPUs and Cerebras can be combined for single workloads to reach optimal performance.

    Sean Lie, CTO and Co-Founder of Cerebras, stated: “What excites us most about GPT-5.3-Codex-Spark is partnering with OpenAI and the developer community to discover what fast inference makes possible: new interaction patterns, new use cases, and a fundamentally different model experience. This preview is just the beginning”.

    Infrastructure Improvements Beyond Model Speed

    Model speed represents only part of OpenAI’s real-time collaboration equation. The company implemented end-to-end latency improvements across its entire serving infrastructure that will benefit all future models.

    OpenAI streamlined response streaming from client to server, rewrote key inference stack components, and reworked session initialization to cut time-to-first-token by 50%. By introducing a persistent WebSocket connection and targeted optimizations inside the Responses API, OpenAI reduced overhead per client/server roundtrip by 80% and per-token overhead by 30%.

    The WebSocket path is enabled for Codex-Spark by default and will soon become the standard for all models. These architectural improvements ensure Codex stays responsive as developers iterate, maintaining the tight feedback loop that defines real-time collaboration.
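
    To make the roundtrip point concrete, the minimal sketch below shows why a persistent WebSocket cuts per-request overhead: connection setup is paid once per session instead of once per exchange. The endpoint URL and message format here are illustrative assumptions, not OpenAI’s actual API.

        import asyncio
        import json

        import websockets  # third-party client: pip install websockets

        async def session(prompts):
            # One handshake for the whole session; each follow-up prompt skips
            # the connection setup a fresh HTTP request would pay per roundtrip.
            async with websockets.connect("wss://example.invalid/codex-spark") as ws:
                for prompt in prompts:
                    await ws.send(json.dumps({"input": prompt}))
                    async for raw in ws:  # tokens stream back as they are generated
                        event = json.loads(raw)
                        print(event.get("token", ""), end="", flush=True)
                        if event.get("done"):
                            break

        asyncio.run(session(["tighten this loop", "now rename the helper"]))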

    Performance Metric | Improvement | Impact
    Client/server roundtrip overhead | 80% reduction | Faster response streaming
    Per-token overhead | 30% reduction | Sustained high-speed generation
    Time-to-first-token | 50% reduction | Instant feedback start
    Token generation speed | 1,000+ tokens/second | Near-instant code delivery

    Availability and Access Strategy

    GPT-5.3-Codex-Spark is rolling out as a research preview exclusively to ChatGPT Pro users starting February 12, 2026. Access is available through the latest versions of the Codex app, CLI, and VS Code extension.

    Because Codex-Spark runs on specialized low-latency hardware, usage is governed by separate rate limits that may adjust based on demand during the research preview. OpenAI warns that during high-demand periods, users may experience limited access or temporary queuing as the company balances reliability across users.

    OpenAI is also making Codex-Spark available via API to a small set of design partners to understand how developers want to integrate the model into their products. Access will expand over the coming weeks as OpenAI continues tuning the integration under real workloads.

    During the research preview, Codex-Spark has its own rate limits and usage will not count towards standard rate limits. This separation ensures early adopters can experiment without impacting their primary Codex usage quotas.
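
    A client integrating against these preview limits would typically wrap its calls in retry logic. The sketch below shows a generic exponential-backoff pattern; request_fn and RateLimitError are hypothetical stand-ins, not names from OpenAI’s SDK.

        import random
        import time

        class RateLimitError(Exception):
            """Hypothetical stand-in for a 429-style rate-limit or queuing error."""

        def with_backoff(request_fn, max_attempts=5):
            # Exponential backoff plus jitter, so queued clients don't all
            # retry against limited preview capacity at the same instant.
            for attempt in range(max_attempts):
                try:
                    return request_fn()
                except RateLimitError:
                    time.sleep(2 ** attempt + random.random())  # 1s, 2s, 4s, ... plus jitter
            raise RuntimeError("still rate-limited after retries")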

    How can developers access GPT-5.3-Codex-Spark?

    ChatGPT Pro subscribers can access GPT-5.3-Codex-Spark through the Codex app, CLI, or VS Code extension starting February 12, 2026. A limited number of developers can also access it via API as design partners. Expansion to broader audiences will occur in the coming weeks.

    Real-World Use Cases and Workflow Integration

    Codex-Spark is optimized for interactive work where latency matters as much as intelligence. Developers can collaborate with the model in real time, interrupting or redirecting it as it works, and rapidly iterate with near-instant responses.
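
    That interrupt-and-redirect loop is easiest to picture as task cancellation over a token stream. In the sketch below, stream_tokens is a hypothetical stand-in for a model stream, paced at roughly 1 ms per token to mimic 1,000 tokens/second; it is not OpenAI’s API.

        import asyncio

        async def stream_tokens(prompt):
            # Hypothetical model stream; 1 ms per token mimics ~1,000 tokens/second.
            for token in f"[tokens answering {prompt!r}]".split():
                await asyncio.sleep(0.001)
                yield token

        async def run(prompt):
            async for token in stream_tokens(prompt):
                print(token, end=" ", flush=True)

        async def collaborate():
            task = asyncio.create_task(run("sketch the settings page"))
            await asyncio.sleep(0.002)  # early output arrives almost immediately...
            task.cancel()               # ...so the developer can interrupt mid-answer
            try:
                await task
            except asyncio.CancelledError:
                print("\n-- interrupted --")
            await run("use a two-column layout instead")  # and redirect on the spot

        asyncio.run(collaborate())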

    The model excels at making precise edits, revising plans, and answering contextual questions about codebases. It provides a fast way to visualize new layouts, refine styling, and test interface changes. This capability addresses a persistent frustration in AI-assisted coding: the wait time between asking a straightforward question and receiving a response.

    OpenAI envisions Codex evolving into a system with two complementary modes: longer-horizon reasoning and execution combined with real-time collaboration for rapid iteration. Over time, these modes will blend: Codex can keep developers in a tight interactive loop while delegating longer-running work to sub-agents in the background, or fanning out tasks to many models in parallel when breadth and speed are required.

    The 1,000 tokens per second capability transforms the coding assistant from a batch processing tool into a conversational partner that responds at human conversation speed.

    Future Roadmap and Capability Expansion

    Codex-Spark is the first in a family of ultra-fast models. As OpenAI learns more with the developer community about where fast models shine for coding, the company plans to introduce additional capabilities including larger models, longer context lengths, and multimodal input.

    Cerebras states that its broader goal is to accelerate a wide spectrum of AI workloads across both real-time and asynchronous use cases. The Wafer-Scale Engine architecture scales out to thousands of systems, extending fast memory capacity into the multi-terabyte domain to support trillion-parameter models for both training and inference. Cerebras expects to bring ultra-fast inference capability to OpenAI’s largest frontier models in 2026.

    OpenAI emphasizes that as models become more capable, interaction speed becomes a clear bottleneck. Ultra-fast inference tightens the feedback loop, making Codex feel more natural to use and expanding what’s possible for anyone turning ideas into working software.

    Safety and Security Evaluation

    Codex-Spark includes the same safety training as OpenAI’s mainline models, including cyber-relevant training. OpenAI evaluated Codex-Spark as part of its standard deployment process, which includes baseline evaluations for cyber and other capabilities.

    The company determined that Codex-Spark does not have a plausible chance of reaching OpenAI’s Preparedness Framework threshold for high capability in cybersecurity or biology. This evaluation provides assurance that the speed optimizations did not compromise the model’s safety guardrails.

    Limitations to Consider

    While Codex-Spark delivers impressive speed, several limitations exist during the research preview phase. The model is currently text-only with a 128k context window, lacking the multimodal capabilities available in other OpenAI models.

    OpenAI designed Codex-Spark for a specific use case: real-time, interactive coding workflows where speed and responsiveness matter most. The model makes minimal, targeted edits rather than handling the comprehensive autonomous tasks that GPT-5.3-Codex specializes in. This represents a deliberate trade-off: optimizing for rapid iteration in scenarios where immediate feedback enables better development flow.

    Access constraints during high-demand periods may frustrate early adopters. Separate rate limits and potential queuing reflect the specialized hardware requirements and limited initial capacity as Cerebras ramps up datacenter infrastructure.

    Frequently Asked Questions (FAQs)

    What is GPT-5.3-Codex-Spark?

    GPT-5.3-Codex-Spark is OpenAI’s ultra-fast AI coding model optimized for real-time software development. It delivers over 1,000 tokens per second on Cerebras Wafer Scale Engine 3 hardware, enabling instant feedback loops for targeted edits, logic adjustments, and interface refinements.

    How fast is Codex-Spark compared to standard generation speeds?

    Codex-Spark generates over 1,000 tokens per second compared to typical AI generation speeds of 65-70 tokens per second. This represents approximately 15 times faster code generation, enabling near-instant responses during live coding sessions.

    Who can access GPT-5.3-Codex-Spark?

    ChatGPT Pro subscribers can access Codex-Spark through the Codex app, CLI, and VS Code extension starting February 12, 2026. A limited number of design partners also have API access. OpenAI plans to expand availability in the coming weeks.

    What hardware powers Codex-Spark’s speed?

    Codex-Spark runs exclusively on Cerebras’ Wafer Scale Engine 3, which transforms an entire silicon wafer into a single massive chip. This architecture provides the largest on-chip memory of any AI processor, enabling ultra-low latency inference.

    Does Codex-Spark replace GPT-5.3-Codex?

    No. Codex-Spark complements GPT-5.3-Codex rather than replacing it. While GPT-5.3-Codex handles long-running autonomous tasks, Codex-Spark focuses on real-time collaboration and rapid iteration. OpenAI envisions both working together in blended workflows.

    What are Codex-Spark’s current limitations?

    During the research preview, Codex-Spark is text-only with a 128k context window and lacks multimodal capabilities. The model is optimized for interactive coding workflows with targeted edits rather than comprehensive autonomous tasks. Users may experience access limits or queuing during high-demand periods.

    When will Codex-Spark support larger context windows?

    OpenAI states that, as it learns from developer community feedback, it will introduce additional capabilities including larger models, longer context lengths, and multimodal input. No specific timeline was provided for these enhancements.

    How does OpenAI ensure Codex-Spark safety?

    Codex-Spark includes the same safety training as OpenAI’s mainline models, including cyber-relevant training. OpenAI evaluated it using its standard deployment process and determined it does not reach Preparedness Framework thresholds for high capability in cybersecurity or biology.

    Mohammad Kashif
    Senior Technology Analyst and Writer at AdwaitX, specializing in the convergence of Mobile Silicon, Generative AI, and Consumer Hardware. Moving beyond spec sheets, his reviews rigorously test "real-world" metrics, analyzing sustained battery efficiency, camera sensor behavior, and long-term software support lifecycles. Kashif’s data-driven approach helps enthusiasts and professionals distinguish between genuine innovation and marketing hype, ensuring they invest in devices that offer lasting value.
