Essential Points
- Cosmos Reason2-2B runs on the full Jetson lineup, including the Orin Nano 8GB Super, after W4A16 quantization
- Cosmos Reason2 was announced at CES 2026 with 2B and 8B model sizes and long-context support up to 256K tokens
- Jetson AGX Thor delivered a 3.5x increase in LLM token throughput five weeks after launch via FlashInfer and xFormers kernel optimization
- Leading physical AI organizations including 1X, Agility, Figure AI, Waabi, Foretellix, Uber, and XPENG adopted Cosmos at its platform launch
NVIDIA just demonstrated that physical AI inference no longer requires a data center. Cosmos world foundation models now run directly on Jetson edge hardware, from the AGX Thor down to the compact Orin Nano Super. This article breaks down exactly what that means for robotics developers, autonomous vehicle engineers, and anyone building AI systems that must reason about the physical world in real time.
What NVIDIA Cosmos Actually Does
Cosmos is not a single model. It is a unified platform of world foundation models (WFMs), tokenizers, and an accelerated data pipeline purpose-built for physical AI. Three distinct model families address different stages of the development pipeline:
- Cosmos Predict generates future world states as photorealistic video, enabling navigation and manipulation systems to anticipate environmental changes
- Cosmos Transfer converts 3D simulator output into photoreal video, directly bridging the sim-to-real gap that has blocked robotics progress for years
- Cosmos Reason applies chain-of-thought visual reasoning for scene annotation, object understanding, and spatial planning
Together, these models address the most persistent bottleneck in physical AI: the cost and scarcity of real-world training data. Cosmos generates synthetic data at scale that is both photorealistic and physics-consistent, with models available under an open model license on the NVIDIA NGC catalog and Hugging Face.
Cosmos Reason2: What Changed at CES 2026
Cosmos Reason1, released at CES 2025, introduced a two-dimensional ontology for embodied reasoning and topped the Hugging Face leaderboard for physical reasoning on video. NVIDIA used CES 2026 to announce Cosmos Reason2, its most advanced reasoning vision-language model for physical AI to date.
Cosmos Reason2 adds four concrete improvements over its predecessor:
- Enhanced physical reasoning and spatio-temporal understanding
- Flexible deployment with 2B and 8B model sizes
- Long-context understanding supporting up to 256K tokens
- Object detection with 2D/3D point localizations and trajectory data
The 2B size targets constrained edge hardware. The 8B size serves higher-compute deployments requiring deeper scene analysis and planning capability. NVIDIA CEO Jensen Huang has positioned this as the beginning of the “age of physical AI,” where agentic models drive robots and vehicles rather than just processing text.
Why the Jetson Deployment Changes Everything
Deploying Cosmos on cloud infrastructure is straightforward with sufficient GPU budget. Deploying it on a 40-watt edge device is a different problem. The Jetson family, spanning the AGX Thor, AGX Orin 64GB, Orin NX, and Orin Nano Super, is purpose-built for accelerated physical AI and robotics applications at the edge.
The critical milestone arrived in February 2026. Engineers quantized Cosmos Reason2-2B to W4A16 precision and optimized it to run across the entire Jetson lineup, including the Orin Nano 8GB Super with only 8GB of unified memory. The quantized model uses approximately 5.8GB of RAM at max_model_length=2048 and achieves approximately 16 to 17 tokens per second for text, image, and video inference on Orin Nano hardware.
That is not just a benchmark. It means a sub-$500 edge module can now reason about camera footage, identify objects, understand spatial relationships, and plan actions, all without a cloud connection.
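The ~5.8GB figure is easy to sanity-check with back-of-envelope arithmetic. The sketch below uses an illustrative decoder shape (layer count, KV heads, and head dimension are assumptions, not the published Cosmos Reason2-2B architecture) to show where the memory goes:

```python
# Back-of-envelope memory budget for a W4A16-quantized ~2B-parameter VLM.
# All architecture numbers below are illustrative assumptions, not the
# published Cosmos Reason2-2B configuration.

PARAMS = 2e9               # ~2B parameters
WEIGHT_BYTES = 0.5         # 4-bit weights -> 0.5 bytes per parameter

# Hypothetical decoder shape for the KV-cache estimate:
LAYERS, KV_HEADS, HEAD_DIM = 28, 4, 128
KV_BYTES = 2               # 16-bit KV entries (the "A16" in W4A16)
MAX_LEN = 2048             # max model length used on Orin Nano

weights_gb = PARAMS * WEIGHT_BYTES / 1e9
kv_cache_gb = 2 * LAYERS * KV_HEADS * HEAD_DIM * KV_BYTES * MAX_LEN / 1e9

print(f"weights: ~{weights_gb:.1f} GB")
print(f"KV cache @ {MAX_LEN} tokens: ~{kv_cache_gb:.2f} GB")
```

Under these assumptions the weights take roughly 1GB and the full-length KV cache barely 0.12GB; the rest of the observed ~5.8GB goes to the vision encoder, fp16 activations, CUDA context, and framework overhead, which is why capping the context length matters so much on an 8GB device.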
Cosmos Reason2 on Jetson Orin: Real Numbers
The quantized Cosmos Reason2-2B configuration on Jetson hardware reflects genuine engineering trade-offs:
| Device | Memory Footprint | Throughput | Notes |
|---|---|---|---|
| Jetson AGX Thor | Supports full precision | High | Official NVIDIA target; benefits from 3.5x vLLM gains |
| Jetson AGX Orin 64GB | High | High | Parallel dual-board configs tested |
| Jetson Orin NX 16GB | Moderate | Moderate | Recommended for 8-frame video inference |
| Jetson Orin Nano Super 8GB | ~5.8GB | 16 to 17 tok/s | W4A16 quantized, max-length 2048 |
For the AGX Thor, the vLLM container update delivered up to 3.5x improvement in LLM token throughput via FlashInfer and xFormers kernel optimization, five weeks after the platform launched in September 2025. NVIDIA has confirmed developers can expect similar gains across other models, including Cosmos workloads, with continued software optimization.
The Physical AI Stack: From Simulation to Edge
The deployment workflow follows a structured pipeline that connects cloud-scale simulation to edge inference:
1. Access Cosmos models via the NVIDIA API catalog or download from NVIDIA NGC or Hugging Face
2. Generate synthetic world simulation scenarios using NVIDIA Omniverse assets and Isaac Sim
3. Curate and evaluate outputs using NeMo Curator
4. Iterate on hyperparameters and distill models for edge deployment
5. Deploy optimized models on Jetson hardware via containerized Docker images
The Docker containerization is central to usability. Both Cosmos and the Transformer Engine are fully containerized for plug-and-play deployment on Jetson AGX Orin, eliminating the dependency conflicts that historically made edge AI setup painful.
Cosmos Reason vs. Competing VLMs on Jetson
Jetson AGX Thor benchmarks provide a useful baseline for comparing model performance across the VLM landscape:
| Model | Device | Tokens/Sec |
|---|---|---|
| Qwen2.5-VL 7B | AGX Thor | 45 tok/s |
| LLaMA 3.2 11B Vision | AGX Thor | 26.31 tok/s |
| Cosmos Reason2-2B (W4A16) | Orin Nano | ~16 to 17 tok/s |
Cosmos Reason2-2B is a smaller model optimized for constrained hardware, so lower token throughput on Orin Nano is expected. The relevant comparison is not raw speed but capability at the hardware tier. Cosmos Reason2 leads Hugging Face’s Physical AI Bench leaderboard, a benchmark specifically designed for physical-world reasoning, which general-purpose VLMs do not target.
Who Is Actually Using This
The adoption signal from industry is concrete. At the Cosmos platform launch at CES 2025, the confirmed first-wave adopters included 1X, Agile Robots, Agility, Figure AI, Foretellix, Fourier, Galbot, Hillbot, IntBot, NEURA Robotics, Skild AI, Virtual Incision, Waabi, XPENG, and ridesharing giant Uber. At CES 2026, Boston Dynamics, Caterpillar, Franka, LG Electronics, and NEURA Robotics unveiled additional next-generation robots built on NVIDIA technology.
These organizations are using Cosmos alongside NVIDIA Isaac for robotics and NVIDIA Omniverse for simulation, building end-to-end pipelines from synthetic data generation through to deployed edge models. Uber specifically is combining its rich driving datasets with the Cosmos platform and NVIDIA DGX Cloud to help autonomous vehicle partners build stronger AI models.
Practical Deployment Challenges
Running Cosmos on memory-constrained Jetson hardware involves real trade-offs that developers should understand before committing to the platform.
Slow inference on Orin: Cosmos Reason2-2B on Orin hardware prioritizes fitting in memory over inference speed. Reducing max_tokens in the WebUI configuration reduces latency at the cost of response length.
RAM management: Configurations on Orin Nano require setting GPU memory utilization to approximately 0.60 and limiting max model length to 2048 tokens to avoid memory exhaustion.
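Translated into launcher settings, and assuming a vLLM-style engine (the model ID and the quantization backend string below are assumptions; check the model card for the exact values), the Orin Nano constraints look like this:

```python
# Conservative engine settings matching the Orin Nano constraints above.
# The model ID and quantization backend are assumptions, not confirmed
# values from NVIDIA documentation.
def orin_nano_engine_args(model_id: str) -> dict:
    """Settings sized for an 8GB unified-memory Jetson."""
    return {
        "model": model_id,
        "max_model_len": 2048,            # cap context to bound the KV cache
        "gpu_memory_utilization": 0.60,   # leave headroom for OS and display
        "dtype": "float16",
        "quantization": "compressed-tensors",  # assumed W4A16 backend
    }

# Hypothetical model ID for illustration:
args = orin_nano_engine_args("nvidia/Cosmos-Reason2-2B")
print(args["max_model_len"], args["gpu_memory_utilization"])
```

Raising either `gpu_memory_utilization` or `max_model_len` beyond these values risks out-of-memory failures on the 8GB device, since the display stack and OS share the same unified memory pool.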
Storage requirements: VLM containers require approximately 50GB of storage, with the container consuming around 20GB and the default model adding 32.3GB. External M.2 SSD installation is recommended before deployment.
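The storage figures above add up past the 50GB guideline before any additional models or datasets are installed, which is why an external SSD is recommended up front:

```python
# Storage budget from the figures above, in GB.
container_gb = 20.0    # VLM container image
model_gb = 32.3        # default model download
total_gb = container_gb + model_gb
print(f"total: {total_gb:.1f} GB")  # already above the ~50 GB guideline
```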
Model limitations: NVIDIA’s own documentation acknowledges that Cosmos models can exhibit object impermanence artifacts and implausible physics behaviors, particularly in scenarios involving gravity and inertia. These are active areas of model improvement.
The Broader Physical AI Context
NVIDIA announced a new wave of open physical AI models and frameworks at CES 2026 on January 4, 2026, alongside partner announcements of next-generation robots across industry sectors. Cosmos Reason2 sits at the center of this initiative, providing the world-model reasoning foundation that downstream robotics and autonomous vehicle systems build on.
The platform’s integration with Omniverse creates a comprehensive development ecosystem: synthetic data generation in simulation, curated training pipelines via NeMo, and edge deployment via Jetson. NVIDIA has also collaborated with Hugging Face to integrate its Cosmos models into the LeRobot open-source robotics framework, expanding the accessible developer base significantly.
For developers in India and the US building on Jetson hardware, the practical takeaway is direct. A Jetson Orin Nano Super, the most affordable entry point in the lineup, can now run a quantized Cosmos Reason2 model capable of processing video from a robot camera and generating reasoned responses about what it observes. That capability was not available on sub-$500 hardware before the W4A16 quantization work completed in February 2026.
Considerations
Cosmos on constrained Jetson hardware involves genuine limitations. Inference speed on Orin Nano (16 to 17 tok/s) is adequate for non-real-time analysis but insufficient for latency-sensitive robotics control loops requiring sub-100ms response cycles. Full-precision Cosmos models still require AGX Thor or DGX-class hardware. Quantization improves accessibility but introduces accuracy trade-offs that require validation against each specific deployment scenario before production use.
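The control-loop limitation is easy to quantify. Even a short structured action reply (the 10-token size below is an illustrative assumption) blows well past a 100ms budget at the measured decode rate:

```python
# Why ~16-17 tok/s rules out tight control loops: time to emit a short
# action message at the measured Orin Nano decode rate.
TOK_PER_SEC = 16.5     # midpoint of the measured 16-17 tok/s
ACTION_TOKENS = 10     # illustrative size of a minimal action/JSON reply

latency_ms = ACTION_TOKENS / TOK_PER_SEC * 1000
print(f"{latency_ms:.0f} ms")  # roughly 6x over a 100 ms control budget
```

This is why the practical near-term role for on-device Cosmos Reason2 is perception and planning at a slower cadence, with a separate low-latency controller closing the real-time loop.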
Frequently Asked Questions (FAQs)
What Jetson devices can run NVIDIA Cosmos Reason2 in 2026?
The full Jetson lineup supports Cosmos Reason2-2B after W4A16 quantization, including the Orin Nano 8GB Super. Full-precision Cosmos models target the AGX Thor and higher-tier Orin devices. AGX Thor provides the highest throughput, benefiting from up to 3.5x vLLM performance improvements deployed in late 2025.
When was Cosmos Reason2 announced and what is new in it?
Cosmos Reason2 was announced at CES 2026 on January 4, 2026. It adds enhanced physical reasoning, spatio-temporal understanding, 2B and 8B model size options, long-context support up to 256K tokens, and object detection with 2D and 3D point localizations. It tops the Physical AI Bench leaderboard on Hugging Face.
How fast is Cosmos Reason2-2B on Jetson Orin Nano Super?
Running the W4A16 quantized version on the Orin Nano Super 8GB, the model achieves approximately 16 to 17 tokens per second across text, image, and video inference tasks. This configuration uses approximately 5.8GB of the device’s unified memory with max model length set to 2048 tokens.
What are the three Cosmos model types and what does each do?
Cosmos Predict generates future world states as video for navigation and manipulation planning. Cosmos Transfer converts 3D simulation output into photorealistic video to close the sim-to-real gap. Cosmos Reason applies chain-of-thought visual reasoning for scene understanding, object annotation, and action planning. All three are available on Hugging Face under an open model license.
Which companies have adopted NVIDIA Cosmos?
Confirmed first-wave adopters at CES 2025 include 1X, Agile Robots, Agility, Figure AI, Foretellix, Fourier, Galbot, Hillbot, IntBot, NEURA Robotics, Skild AI, Virtual Incision, Waabi, XPENG, and Uber. At CES 2026, Boston Dynamics, Caterpillar, Franka, and LG Electronics unveiled robots built on NVIDIA technology.
How does Cosmos address the training data problem for physical AI?
Cosmos generates photorealistic, physics-consistent synthetic data at scale using NVIDIA Omniverse and Isaac Sim, eliminating dependence on expensive real-world data collection. This synthetic data pipeline connects directly to NeMo Curator for curation and validation before being used in model training pipelines.
What are the main limitations of running Cosmos on edge hardware?
Inference at 16 to 17 tok/s on Orin Nano is insufficient for real-time robotics control loops. Full storage requirements exceed 50GB. GPU memory utilization must be capped at 0.60 on Orin Nano to avoid crashes. Object impermanence artifacts and occasional physics inconsistencies remain known issues in current model versions.