OpenAI's Jalapeño Chip: Custom AI Inference in Nine Months

SAN FRANCISCO – For years, OpenAI trained and deployed the world’s most powerful language models on chips manufactured by companies that also supplied its competitors. On Tuesday, the company took the first step toward changing that.

OpenAI and Broadcom jointly unveiled Jalapeño, the code name for OpenAI’s first custom AI chip, formally called the Intelligence Processor. It is a reticle-sized application-specific integrated circuit built for one task: running large language model inference at the scale OpenAI now operates, TechCrunch reported.

Jalapeño is not a GPU replacement. It is a purpose-built inference ASIC, meaning it cannot train models and was never designed to.

The chip targets three specific inefficiencies in how GPU clusters handle inference workloads: the cost of moving data between compute units and memory, the imbalance between computational throughput and memory bandwidth amid an industrywide DRAM shortage, and the overhead of networking chips together at rack scale. Solving those three problems for LLM inference specifically, rather than the broader range of tasks a GPU must handle, is the architecture’s core bet.
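The second of those inefficiencies, the gap between compute throughput and memory bandwidth, can be illustrated with a simple roofline estimate. The sketch below is illustrative only: the peak-compute and bandwidth figures are hypothetical placeholders, not Jalapeño specifications (none have been disclosed), and the model counts only weight reads, ignoring key-value cache traffic and activations.

```python
def decode_intensity(batch_size: int, bytes_per_param: float) -> float:
    """Approximate FLOPs performed per byte of weights read in one decode step.

    Each parameter costs about 2 FLOPs (multiply and add) per generated token,
    while the weights are streamed from memory once per step for the whole batch.
    """
    return 2.0 * batch_size / bytes_per_param


def attainable_flops(peak_flops: float, mem_bandwidth: float, intensity: float) -> float:
    """Roofline model: throughput is capped by compute or by memory traffic."""
    return min(peak_flops, mem_bandwidth * intensity)


# Hypothetical accelerator figures, chosen only for illustration.
PEAK_FLOPS = 1.0e15     # 1 PFLOP/s of peak compute (assumption)
MEM_BANDWIDTH = 4.0e12  # 4 TB/s of memory bandwidth (assumption)

if __name__ == "__main__":
    for batch in (1, 16, 256):
        intensity = decode_intensity(batch, bytes_per_param=1.0)  # 8-bit weights assumed
        perf = attainable_flops(PEAK_FLOPS, MEM_BANDWIDTH, intensity)
        bound = "memory" if perf < PEAK_FLOPS else "compute"
        print(f"batch={batch:4d}  intensity={intensity:6.1f} FLOP/B  "
              f"utilization={perf / PEAK_FLOPS:6.1%}  ({bound}-bound)")
```

Under these assumed numbers, a single-request decode step uses under one percent of peak compute, because the chip spends nearly all its time waiting on weight reads. Only large batches reach the compute ceiling. That is the imbalance an inference-specific design can target by trading general-purpose compute for memory bandwidth.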

OpenAI still depends on NVIDIA hardware for model training and has placed orders for NVIDIA’s next-generation Rubin-class GPUs. What changes is the inference layer, where the economics of running hundreds of millions of ChatGPT requests daily on GPU hardware have become increasingly difficult to justify at scale.

[Image: Bloomberg Tech reports on OpenAI’s Jalapeño chip development with Broadcom, June 2026. Source: Bloomberg]

The development timeline is what makes the Jalapeño announcement unusual. The chip went from initial design to completed tape-out in nine months, a pace that engineers familiar with high-performance semiconductor development described as possibly the fastest ASIC cycle ever recorded in that class, according to Tom’s Hardware.

OpenAI accelerated the design process using its own AI models, VentureBeat reported. The company did not specify which models were used or how AI was integrated into the chip design workflow, but the nine-month figure suggests an approach materially different from traditional ASIC development, which typically takes two to four years for comparable hardware.

A physical sample of the chip was delivered to OpenAI on June 25, one day after the public announcement. The compressed overall timeline, not the delivery itself, is what distinguishes this development cycle from anything the semiconductor industry has seen at this performance tier.

Broadcom designed Jalapeño’s silicon and is handling the networking fabric that connects chips together at rack scale. Celestica, the Canadian contract electronics manufacturer, is responsible for board-level integration and full rack assembly.

The division of labor mirrors the model hyperscalers have used for years: the AI company defines workload requirements, the chip design partner translates them into architecture, and a contract manufacturer assembles the system-level hardware. OpenAI has not disclosed which TSMC process node the chip uses or its transistor count.

The chip is reticle-sized, meaning it uses the largest die area achievable in a single lithography exposure, consistent with the memory bandwidth and compute density targets the companies described. It will require high-bandwidth memory from suppliers whose own production investments have reshaped the sector; SK Hynix, which controls roughly 60 percent of the HBM market, this week announced a $29 billion Wall Street listing to fund the next phase of that expansion.

The deployment plan is tied to OpenAI’s infrastructure partnership with Microsoft. Earlier in 2026, the two companies announced plans for gigawatt-scale AI data centers beginning this year, facilities that would represent the largest concentration of AI compute ever deployed on a single architecture.

Jalapeño is expected to be a core component of those deployments, with first production rollout targeted for the end of 2026. Microsoft’s Azure infrastructure would host the chips in purpose-built racks designed around the Jalapeño architecture.

The economic logic is straightforward. OpenAI processes hundreds of millions of ChatGPT requests daily; running those requests on NVIDIA GPUs carries a per-inference cost set by a hardware market OpenAI does not control. A purpose-built chip capturing even a portion of that inference volume shifts billions of dollars in annual economics in the company’s direction.

NVIDIA retains OpenAI’s training workloads, OpenAI’s orders for Rubin-class GPUs, and a broader market extending far beyond a single customer. What it loses at the margin is inference volume from one of the world’s highest-throughput AI deployments, as that volume migrates to dedicated hardware.

OpenAI now joins Google, Meta, Amazon, and Microsoft, all of which have built or are building custom inference hardware, among the AI companies that have concluded custom silicon justifies the engineering overhead. The difference with Jalapeño is the timeline: nine months, versus the years those companies spent on early iterations of their own chips.

The question OpenAI has not answered is whether Jalapeño will be made available to external customers through Azure or remain a dedicated internal accelerator. If the chip becomes a cloud product, the economics shift again, and NVIDIA’s position in the inference market faces a challenge from a direction it has not had to defend before.