Quick Brief
- The Infrastructure: OpenAI supports 800 million ChatGPT users with one Azure PostgreSQL primary instance and 50 read replicas handling millions of queries per second (QPS)
- The Challenge: Database load increased 10x in 12 months, requiring extensive optimizations to avoid cascading failures during traffic spikes
- The Impact: Demonstrates PostgreSQL can power hyperscale applications with five-nines availability and low double-digit millisecond p99 latency
- Strategic Shift: OpenAI is migrating write-heavy workloads to Azure Cosmos DB while keeping PostgreSQL for read-heavy operations
OpenAI revealed on January 22, 2026, that its PostgreSQL infrastructure now powers 800 million ChatGPT users through a single primary Azure PostgreSQL flexible server instance paired with nearly 50 geo-distributed read replicas. Bohan Zhang, Member of the Technical Staff at OpenAI, disclosed that the architecture sustains millions of queries per second while maintaining five-nines availability, despite database load growing more than 10x over the past year.
Architecture: Single-Primary PostgreSQL at Hyperscale
OpenAI’s production database architecture contradicts conventional wisdom about distributed systems scalability. The company operates a single primary Azure PostgreSQL flexible server instance that handles all write operations, while approximately 50 read replicas distributed across multiple geographic regions serve the vast majority of read traffic. This configuration supports ChatGPT and OpenAI’s API platform with consistent low double-digit millisecond p99 client-side latency.
The system achieves this performance through aggressive read offloading. OpenAI engineers moved even critical requests that previously ran on the primary onto replicas, shrinking the single point of failure. During a primary outage, write operations fail, but the majority of user-facing requests keep working, downgrading potential SEV0 incidents to lower severity levels.
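The split described above can be sketched as a simple query router; this is a hypothetical illustration, not OpenAI's actual code, and the `PostgresRouter` class and DSN names are invented for the example:

```python
# Hypothetical sketch of primary/replica routing, not OpenAI's implementation.
# Writes go to the single primary; reads are spread across the replica pool.
import itertools

class PostgresRouter:
    def __init__(self, primary_dsn, replica_dsns):
        self.primary_dsn = primary_dsn
        # Round-robin over the replicas for read traffic.
        self._replicas = itertools.cycle(replica_dsns)

    def dsn_for(self, query: str) -> str:
        # Crude classification: anything that mutates state goes to the primary.
        verb = query.lstrip().split(None, 1)[0].upper()
        if verb in {"INSERT", "UPDATE", "DELETE", "ALTER", "CREATE", "DROP"}:
            return self.primary_dsn
        return next(self._replicas)

router = PostgresRouter("primary", ["replica-1", "replica-2", "replica-3"])
print(router.dsn_for("SELECT * FROM users"))      # one of the replicas
print(router.dsn_for("UPDATE users SET name=''"))  # the primary
```

In production the routing decision is usually made at the application or proxy layer rather than by inspecting SQL text, but the principle is the same: only the primary ever sees writes.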
The primary instance runs in High-Availability (HA) mode with a hot standby, a continuously synchronized replica ready for immediate promotion during failures or maintenance windows. Azure PostgreSQL’s team developed failover mechanisms that remain stable under extreme load conditions, according to OpenAI’s disclosure.
Engineering Challenges: 10x Load Growth and MVCC Limitations
OpenAI encountered multiple severe incidents (SEVs) following predictable patterns: upstream failures triggering cache misses, expensive multi-way joins saturating CPU, or write storms from new feature launches. These events caused resource utilization spikes, elevated query latency, and timeout-driven retry amplification that threatened ChatGPT and API service availability.
PostgreSQL’s multiversion concurrency control (MVCC) implementation emerged as a critical bottleneck for write-heavy workloads. The database copies entire rows when updating even a single field, creating new tuple versions that cause significant write and read amplification. Zhang and Carnegie Mellon University Professor Andy Pavlo previously documented these issues in their blog post “The Part of PostgreSQL We Hate the Most,” now cited in PostgreSQL’s Wikipedia page.
MVCC’s limitations manifest through table and index bloat, increased index maintenance overhead, and complex autovacuum tuning requirements. One particularly expensive query joining 12 tables was responsible for multiple high-severity SEVs before engineers decomposed it into application-layer logic.
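Decomposing an expensive join into application-layer logic can look like the following toy sketch; the tables, data, and aggregation are hypothetical and stand in for OpenAI's actual schema:

```python
# Hypothetical illustration of replacing one expensive multi-way SQL join
# with simple per-table lookups joined in the application layer.
# The dicts below stand in for cheap single-table queries against a replica.

users = {1: {"name": "ada"}, 2: {"name": "grace"}}
orders = [{"user_id": 1, "total": 30}, {"user_id": 1, "total": 12},
          {"user_id": 2, "total": 99}]

def order_totals_by_user():
    # Instead of `SELECT ... FROM users JOIN orders ...` on the database,
    # fetch each table separately and aggregate in memory.
    totals = {}
    for order in orders:
        totals[order["user_id"]] = totals.get(order["user_id"], 0) + order["total"]
    return {users[uid]["name"]: total for uid, total in totals.items()}

print(order_totals_by_user())  # {'ada': 42, 'grace': 99}
```

The trade-off is more round trips and more application code, in exchange for queries the planner cannot turn into a CPU-saturating multi-way join.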
Optimization Strategy: Eight Critical Interventions
| Challenge | Solution | Impact |
|---|---|---|
| Write Bottlenecks | Migrated shardable workloads to Azure CosmosDB; enforced strict rate limits on backfills | Reduced primary write pressure; enabled sufficient headroom |
| Expensive Queries | Eliminated 12-table joins; moved complex logic to application layer; ORM-generated SQL review | Prevented CPU saturation from query spikes |
| Connection Exhaustion | Deployed PgBouncer with transaction pooling; reduced connection time from 50ms to 5ms | Efficiently reused 5,000-connection limit |
| Cache Miss Storms | Implemented cache locking so a single reader fetches per key during misses | Protected database from redundant read surges |
| Replica Scaling Limits | Testing cascading replication with Azure to support 100+ replicas without overwhelming primary | Future-proofs read scaling architecture |
| Noisy Neighbor Problem | Isolated workloads into dedicated instances with high/low priority tiers | Prevented cross-product performance degradation |
| Schema Change Risks | Enforced a 5-second timeout; prohibited table rewrites; rate-limited backfills, even those running over a week | Avoided full table rewrites disrupting production |
| Traffic Spikes | Multi-layer rate limiting at application, pooler, proxy, and query levels; ORM-level query blocking | Enabled targeted load shedding during surges |
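The multi-layer rate limiting in the last row is commonly built from token buckets, one instance per layer (application, pooler, proxy, query class). A minimal sketch, not OpenAI's implementation:

```python
# Hypothetical token-bucket limiter; each layer would run its own instance
# with its own rate and burst size.
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # shed this request

bucket = TokenBucket(rate=100.0, capacity=10.0)
allowed = sum(bucket.allow() for _ in range(50))
print(allowed)  # roughly the burst capacity; the rest are shed
```

Stacking independent buckets at each layer is what makes the load shedding "targeted": an ORM-level block can stop one bad query pattern without throttling the whole application.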
The caching strategy proved particularly critical. When cache hit rates drop unexpectedly, only one request per missed key acquires a lock to fetch data from PostgreSQL, while others wait for cache updates rather than hammering the database simultaneously.
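The behavior described above is often called single-flight. A hypothetical in-process approximation follows; OpenAI's version presumably operates against a distributed cache, and the class and field names here are invented:

```python
# Hypothetical single-flight cache sketch: on a miss, only one thread per
# key fetches from the database; the others block and reuse its result.
import threading

class SingleFlightCache:
    def __init__(self):
        self._cache = {}
        self._locks = {}
        self._mu = threading.Lock()
        self.db_fetches = 0  # instrumentation for this example only

    def get(self, key, fetch):
        if key in self._cache:
            return self._cache[key]
        with self._mu:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:  # one fetcher per key; concurrent misses wait here
            if key not in self._cache:
                self.db_fetches += 1
                self._cache[key] = fetch(key)  # the single database read
            return self._cache[key]

cache = SingleFlightCache()
threads = [threading.Thread(target=cache.get, args=("user:1", lambda k: k.upper()))
           for _ in range(20)]
for t in threads: t.start()
for t in threads: t.join()
print(cache.db_fetches)  # 1: twenty concurrent misses, one database read
```

Without the per-key lock, all twenty misses would hit PostgreSQL simultaneously, which is exactly the cache-miss storm the mechanism exists to prevent.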
AdwaitX Analysis: Centralized vs. Distributed Database Economics
OpenAI’s decision to maintain a single-primary architecture rather than shard PostgreSQL reveals strategic infrastructure priorities. The company determined that sharding existing application workloads would require modifying hundreds of endpoints and consume months or years of engineering time. Since read-heavy operations dominate the workload profile, the current architecture provides an “ample runway” for continued growth without near-term sharding plans.
This approach challenges the distributed-by-default mentality prevalent in cloud-native architectures. While companies like Timescale promote read replica sets and horizontal scaling solutions for PostgreSQL, OpenAI demonstrates that vertical scaling combined with strategic read distribution can support applications at the upper boundary of global user bases.
The write-heavy workload migration to Azure CosmosDB represents a hybrid strategy leveraging sharded systems where horizontal partitioning makes sense while avoiding the complexity cost of sharding the core PostgreSQL deployment. OpenAI’s data indicates write-heavy workloads that are difficult to shard remain the primary technical debt requiring ongoing migration efforts.
Technical Performance Metrics and Future Roadmap
OpenAI’s PostgreSQL infrastructure consistently delivers five-nines availability (99.999% uptime) in production. The system maintains near-zero replication lag across nearly 50 read replicas despite the primary streaming Write Ahead Log (WAL) data to every replica instance.
Over the past 12 months, OpenAI experienced only one SEV-0 PostgreSQL incident during ChatGPT ImageGen’s viral launch when write traffic surged more than 10x as over 100 million new users registered within one week. This incident rate demonstrates the robustness of implemented optimizations despite supporting a user base that grew from 700 million in September 2025 to 800 million by early 2026.
The cascading replication architecture under development with Azure’s PostgreSQL team addresses the primary’s WAL streaming bottleneck. This topology allows intermediate replicas to relay WAL to downstream replicas, potentially supporting over 100 read replicas without overloading the primary. However, OpenAI acknowledges this introduces operational complexity, particularly around failover management, and requires extensive testing before production deployment.
Strategic Implications for Enterprise Database Planning
OpenAI’s disclosure provides a validated reference architecture for enterprises evaluating PostgreSQL at scale. The company’s willingness to maintain schema change restrictions including a strict 5-second timeout and prohibition of new tables in PostgreSQL demonstrates the trade-offs required for operational stability at hyperscale.
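Stock PostgreSQL can enforce a guardrail like the 5-second schema-change timeout with `lock_timeout`, which aborts DDL that cannot promptly acquire its lock instead of letting it stall traffic queued behind it. A hedged sketch of what a migration wrapper might emit; the `guarded_ddl` helper is hypothetical:

```python
# Hypothetical sketch of a 5-second schema-change guardrail using
# PostgreSQL's lock_timeout; returns the statements a migration tool
# might execute around a piece of DDL.

def guarded_ddl(ddl: str, timeout_ms: int = 5000) -> list[str]:
    # lock_timeout aborts the DDL if its lock isn't acquired in time,
    # so a blocked ALTER TABLE cannot pile up waiters behind it.
    return [
        f"SET lock_timeout = '{timeout_ms}ms';",
        ddl,
        "RESET lock_timeout;",
    ]

for stmt in guarded_ddl("ALTER TABLE users ADD COLUMN plan text;"):
    print(stmt)
```

The same pattern extends to `statement_timeout` for bounding the DDL's own runtime; either way, a failed migration is retried off-peak rather than allowed to disrupt production.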
The engineering team’s emphasis on ORM-generated SQL review highlights a persistent challenge in modern application development. Frameworks frequently generate inefficient queries, and OpenAI’s experience with 12-table joins causing SEVs underscores the importance of database query observability in production systems.
AdwaitX research indicates ChatGPT’s user base continues accelerating toward OpenAI’s projected 1 billion users in 2026. The company’s statement about “sufficient runway for current and future growth” suggests confidence in the current architecture supporting this expansion without fundamental redesign.
Frequently Asked Questions (FAQs)
How many users does ChatGPT currently support?
ChatGPT serves 800 million users globally as of January 2026, supported by OpenAI’s PostgreSQL infrastructure handling millions of queries per second.
What database architecture does OpenAI use for ChatGPT?
OpenAI operates one Azure PostgreSQL primary instance for writes and approximately 50 geo-distributed read replicas, achieving five-nines availability with low latency.
Why doesn’t OpenAI shard its PostgreSQL database?
Sharding would require modifying hundreds of application endpoints and take months to years, while read-heavy workloads perform well with the current architecture.
What caused OpenAI’s only PostgreSQL SEV-0 incident in the past year?
ChatGPT ImageGen’s viral launch triggered write traffic surging over 10x when more than 100 million users signed up within one week.
How does OpenAI prevent PostgreSQL connection exhaustion?
PgBouncer with transaction pooling reduces active connections and cuts connection setup time from 50 milliseconds to 5 milliseconds.

