Quick Brief
- The Infrastructure: OpenAI supports 800 million ChatGPT users with one Azure PostgreSQL primary instance and 50 read replicas handling millions of queries per second (QPS)
- The Challenge: Database load increased 10x in 12 months, requiring extensive optimizations to avoid cascading failures during traffic spikes
- The Impact: Demonstrates PostgreSQL can power hyperscale applications with five-nines availability and low double-digit millisecond p99 latency
- Strategic Shift: OpenAI is migrating write-heavy workloads to Azure Cosmos DB while keeping PostgreSQL for read-heavy operations
OpenAI revealed on January 22, 2026, that its PostgreSQL infrastructure now powers 800 million ChatGPT users through a single primary Azure PostgreSQL flexible server instance paired with nearly 50 geo-distributed read replicas. Bohan Zhang, Member of the Technical Staff at OpenAI, disclosed that the architecture sustains millions of queries per second while maintaining five-nines availability, despite database load growing more than 10x over the past year.
Architecture: Single-Primary PostgreSQL at Hyperscale
OpenAI’s production database architecture contradicts conventional wisdom about distributed systems scalability. The company operates a single primary Azure PostgreSQL flexible server instance that handles all write operations, while approximately 50 read replicas distributed across multiple geographic regions serve the vast majority of read traffic. This configuration supports ChatGPT and OpenAI’s API platform with consistent low double-digit millisecond p99 client-side latency.
The system achieves this performance through aggressive read offloading. OpenAI engineers moved even critical requests that previously ran on the primary onto replicas, shrinking the single point of failure. During a primary outage, write operations fail, but the majority of user-facing requests keep working, downgrading potential SEV0 incidents to lower severity levels.
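The split described above can be sketched as a simple query router; this is a hypothetical illustration, not OpenAI's actual code, and the `PostgresRouter` class and DSN names are invented for the example:

```python
# Hypothetical sketch of primary/replica routing, not OpenAI's implementation.
# Writes go to the single primary; reads are spread across the replica pool.
import itertools

class PostgresRouter:
    def __init__(self, primary_dsn, replica_dsns):
        self.primary_dsn = primary_dsn
        # Round-robin over the replicas for read traffic.
        self._replicas = itertools.cycle(replica_dsns)

    def dsn_for(self, query: str) -> str:
        # Crude classification: anything that mutates state goes to the primary.
        verb = query.lstrip().split(None, 1)[0].upper()
        if verb in {"INSERT", "UPDATE", "DELETE", "ALTER", "CREATE", "DROP"}:
            return self.primary_dsn
        return next(self._replicas)

router = PostgresRouter("primary", ["replica-1", "replica-2", "replica-3"])
print(router.dsn_for("SELECT * FROM users"))      # one of the replicas
print(router.dsn_for("UPDATE users SET name=''"))  # the primary
```

In production the routing decision is usually made at the application or proxy layer rather than by inspecting SQL text, but the principle is the same: only the primary ever sees writes.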
The primary instance runs in High-Availability (HA) mode with a hot standby, a continuously synchronized replica ready for immediate promotion during failures or maintenance windows. Azure PostgreSQL’s team developed failover mechanisms that remain stable under extreme load conditions, according to OpenAI’s disclosure.
Engineering Challenges: 10x Load Growth and MVCC Limitations
OpenAI encountered multiple severe incidents (SEVs) following predictable patterns: upstream failures triggering cache misses, expensive multi-way joins saturating CPU, or write storms from new feature launches. These events caused resource utilization spikes, elevated query latency, and timeout-driven retry amplification that threatened ChatGPT and API service availability.
PostgreSQL’s multiversion concurrency control (MVCC) implementation emerged as a critical bottleneck for write-heavy workloads. The database copies entire rows when updating even a single field, creating new tuple versions that cause significant write and read amplification. Zhang and Carnegie Mellon University Professor Andy Pavlo previously documented these issues in their blog post “The Part of PostgreSQL We Hate the Most,” now cited in PostgreSQL’s Wikipedia page.
MVCC’s limitations manifest through table and index bloat, increased index maintenance overhead, and complex autovacuum tuning requirements. One particularly expensive query joining 12 tables was responsible for multiple high-severity SEVs before engineers decomposed it into application-layer logic.
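Decomposing an expensive join into application-layer logic can look like the following toy sketch; the tables, data, and aggregation are hypothetical and stand in for OpenAI's actual schema:

```python
# Hypothetical illustration of replacing one expensive multi-way SQL join
# with simple per-table lookups joined in the application layer.
# The dicts below stand in for cheap single-table queries against a replica.

users = {1: {"name": "ada"}, 2: {"name": "grace"}}
orders = [{"user_id": 1, "total": 30}, {"user_id": 1, "total": 12},
          {"user_id": 2, "total": 99}]

def order_totals_by_user():
    # Instead of `SELECT ... FROM users JOIN orders ...` on the database,
    # fetch each table separately and aggregate in memory.
    totals = {}
    for order in orders:
        totals[order["user_id"]] = totals.get(order["user_id"], 0) + order["total"]
    return {users[uid]["name"]: total for uid, total in totals.items()}

print(order_totals_by_user())  # {'ada': 42, 'grace': 99}
```

The trade-off is more round trips and more application code, in exchange for queries the planner cannot turn into a CPU-saturating multi-way join.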
Optimization Strategy: Eight Critical Interventions
| Challenge | Solution | Impact |
|---|---|---|
| Write Bottlenecks | Migrated shardable workloads to Azure CosmosDB; enforced strict rate limits on backfills | Reduced primary write pressure; enabled sufficient headroom |
| Expensive Queries | Eliminated 12-table joins; moved complex logic to application layer; ORM-generated SQL review | Prevented CPU saturation from query spikes |
| Connection Exhaustion | Deployed PgBouncer with transaction pooling; reduced connection time from 50ms to 5ms | Efficiently reused 5,000-connection limit |
| Cache Miss Storms | Implemented cache locking so a single reader fetches per key during misses | Protected database from redundant read surges |
| Replica Scaling Limits | Testing cascading replication with Azure to support 100+ replicas without overwhelming primary | Future-proofs read scaling architecture |
| Noisy Neighbor Problem | Isolated workloads into dedicated instances with high/low priority tiers | Prevented cross-product performance degradation |
| Schema Change Risks | Enforced a 5-second timeout; prohibited table rewrites; rate-limited backfills, even those running over a week | Avoided full table rewrites disrupting production |
| Traffic Spikes | Multi-layer rate limiting at application, pooler, proxy, and query levels; ORM-level query blocking | Enabled targeted load shedding during surges |
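The multi-layer rate limiting in the last row is commonly built from token buckets, one instance per layer (application, pooler, proxy, query class). A minimal sketch, not OpenAI's implementation:

```python
# Hypothetical token-bucket limiter; each layer would run its own instance
# with its own rate and burst size.
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # shed this request

bucket = TokenBucket(rate=100.0, capacity=10.0)
allowed = sum(bucket.allow() for _ in range(50))
print(allowed)  # roughly the burst capacity; the rest are shed
```

Stacking independent buckets at each layer is what makes the load shedding "targeted": an ORM-level block can stop one bad query pattern without throttling the whole application.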
The caching strategy proved particularly critical. When cache hit rates drop unexpectedly, only one request per missed key acquires a lock to fetch data from PostgreSQL, while others wait for cache updates rather than hammering the database simultaneously.
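The behavior described above is often called single-flight. A hypothetical in-process approximation follows; OpenAI's version presumably operates against a distributed cache, and the class and field names here are invented:

```python
# Hypothetical single-flight cache sketch: on a miss, only one thread per
# key fetches from the database; the others block and reuse its result.
import threading

class SingleFlightCache:
    def __init__(self):
        self._cache = {}
        self._locks = {}
        self._mu = threading.Lock()
        self.db_fetches = 0  # instrumentation for this example only

    def get(self, key, fetch):
        if key in self._cache:
            return self._cache[key]
        with self._mu:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:  # one fetcher per key; concurrent misses wait here
            if key not in self._cache:
                self.db_fetches += 1
                self._cache[key] = fetch(key)  # the single database read
            return self._cache[key]

cache = SingleFlightCache()
threads = [threading.Thread(target=cache.get, args=("user:1", lambda k: k.upper()))
           for _ in range(20)]
for t in threads: t.start()
for t in threads: t.join()
print(cache.db_fetches)  # 1: twenty concurrent misses, one database read
```

Without the per-key lock, all twenty misses would hit PostgreSQL simultaneously, which is exactly the cache-miss storm the mechanism exists to prevent.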
AdwaitX Analysis: Centralized vs. Distributed Database Economics
OpenAI’s decision to maintain a single-primary architecture rather than shard PostgreSQL reveals strategic infrastructure priorities. The company determined that sharding existing application workloads would require modifying hundreds of endpoints and consume months or years of engineering time. Since read-heavy operations dominate the workload profile, the current architecture provides an “ample runway” for continued growth without near-term sharding plans.
This approach challenges the distributed-by-default mentality prevalent in cloud-native architectures. While companies like Timescale promote read replica sets and horizontal scaling solutions for PostgreSQL, OpenAI demonstrates that vertical scaling combined with strategic read distribution can support applications at the upper boundary of global user bases.
The write-heavy workload migration to Azure CosmosDB represents a hybrid strategy leveraging sharded systems where horizontal partitioning makes sense while avoiding the complexity cost of sharding the core PostgreSQL deployment. OpenAI’s data indicates write-heavy workloads that are difficult to shard remain the primary technical debt requiring ongoing migration efforts.
Technical Performance Metrics and Future Roadmap
OpenAI’s PostgreSQL infrastructure consistently delivers five-nines availability (99.999% uptime) in production. The system maintains near-zero replication lag across nearly 50 read replicas despite the primary streaming Write Ahead Log (WAL) data to every replica instance.
Over the past 12 months, OpenAI experienced only one SEV-0 PostgreSQL incident during ChatGPT ImageGen’s viral launch when write traffic surged more than 10x as over 100 million new users registered within one week. This incident rate demonstrates the robustness of implemented optimizations despite supporting a user base that grew from 700 million in September 2025 to 800 million by early 2026.
The cascading replication architecture under development with Azure’s PostgreSQL team addresses the primary’s WAL streaming bottleneck. This topology allows intermediate replicas to relay WAL to downstream replicas, potentially supporting over 100 read replicas without overloading the primary. However, OpenAI acknowledges this introduces operational complexity, particularly around failover management, and requires extensive testing before production deployment.
Strategic Implications for Enterprise Database Planning
OpenAI’s disclosure provides a validated reference architecture for enterprises evaluating PostgreSQL at scale. The company’s willingness to maintain schema change restrictions including a strict 5-second timeout and prohibition of new tables in PostgreSQL demonstrates the trade-offs required for operational stability at hyperscale.
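Stock PostgreSQL can enforce a guardrail like the 5-second schema-change timeout with `lock_timeout`, which aborts DDL that cannot promptly acquire its lock instead of letting it stall traffic queued behind it. A hedged sketch of what a migration wrapper might emit; the `guarded_ddl` helper is hypothetical:

```python
# Hypothetical sketch of a 5-second schema-change guardrail using
# PostgreSQL's lock_timeout; returns the statements a migration tool
# might execute around a piece of DDL.

def guarded_ddl(ddl: str, timeout_ms: int = 5000) -> list[str]:
    # lock_timeout aborts the DDL if its lock isn't acquired in time,
    # so a blocked ALTER TABLE cannot pile up waiters behind it.
    return [
        f"SET lock_timeout = '{timeout_ms}ms';",
        ddl,
        "RESET lock_timeout;",
    ]

for stmt in guarded_ddl("ALTER TABLE users ADD COLUMN plan text;"):
    print(stmt)
```

The same pattern extends to `statement_timeout` for bounding the DDL's own runtime; either way, a failed migration is retried off-peak rather than allowed to disrupt production.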
The engineering team’s emphasis on ORM-generated SQL review highlights a persistent challenge in modern application development. Frameworks frequently generate inefficient queries, and OpenAI’s experience with 12-table joins causing SEVs underscores the importance of database query observability in production systems.
AdwaitX research indicates ChatGPT’s user base continues accelerating toward OpenAI’s projected 1 billion users in 2026. The company’s statement about “sufficient runway for current and future growth” suggests confidence in the current architecture supporting this expansion without fundamental redesign.
Frequently Asked Questions (FAQs)
How many users does ChatGPT currently support?
ChatGPT serves 800 million users globally as of January 2026, supported by OpenAI’s PostgreSQL infrastructure handling millions of queries per second.
What database architecture does OpenAI use for ChatGPT?
OpenAI operates one Azure PostgreSQL primary instance for writes and approximately 50 geo-distributed read replicas, achieving five-nines availability with low latency.
Why doesn’t OpenAI shard its PostgreSQL database?
Sharding would require modifying hundreds of application endpoints and take months to years, while read-heavy workloads perform well with the current architecture.
What caused OpenAI’s only PostgreSQL SEV-0 incident in the past year?
ChatGPT ImageGen’s viral launch triggered write traffic surging over 10x when more than 100 million users signed up within one week.
How does OpenAI prevent PostgreSQL connection exhaustion?
PgBouncer with transaction pooling reduces active connections and cuts connection setup time from 50 milliseconds to 5 milliseconds.

