SAN FRANCISCO – Every time someone types a question into ChatGPT, OpenAI pays for the answer. The electricity, the servers, the specialised chips – for three years, the cost of that compute has been Nvidia’s to set. A custom silicon chip called Jalapeño, announced Tuesday, is the most direct attempt yet by a major AI lab to change that equation.
OpenAI and semiconductor company Broadcom announced Jalapeño on June 24. It is OpenAI’s first application-specific integrated circuit (ASIC), designed from scratch for the single purpose of running large language models at scale. The chip will be manufactured by TSMC on its 3-nanometer process node, the same generation used in Apple’s latest mobile processors, with prototype deployments targeted for the end of 2026.
The architecture is a departure from how AI inference has worked until now. Nvidia’s GPUs – still the dominant compute substrate across the AI industry – were originally designed for graphics and adapted for machine learning over two decades of iteration. Jalapeño has no prior life as anything else. It features eight stacks of high-bandwidth memory arranged around a systolic array core optimised for the matrix multiplications large language models perform when generating text, with none of the overhead that comes from a general-purpose processor architecture carrying capabilities the workload does not need.
Broadcom CEO Hock Tan said the chip targets roughly 50 percent lower cost per inference token compared with current GPU alternatives. That claim warrants scrutiny. The 50 percent figure comes from internal benchmarking Broadcom and OpenAI conducted against their own existing GPU workloads – a comparison that may not hold against Nvidia’s latest Blackwell architecture or against the Vera Rubin generation Nvidia is planning for 2027. No independent verification of the cost figures has been published. Savings claimed for silicon that has not yet been deployed at production scale should be read as a projection, not a performance guarantee.
What is harder to dismiss is the timeline. OpenAI President Greg Brockman told CNBC that Jalapeño reached TSMC tape-out in nine months – a pace that chip engineers have called unprecedented for advanced ASIC silicon at the 3nm node. Brockman said OpenAI used its own AI models to accelerate parts of the chip design and verification process, embedding AI-assisted tooling into the development cycle itself. If that claim survives independent technical review, it is a meaningful signal of how quickly AI labs can now iterate on hardware, and of how fast the chip landscape could shift as a result.

The strategic logic runs deeper than cost reduction. OpenAI has described compute access as its primary operational constraint for years. Every token the company serves today passes through Nvidia hardware, which means Nvidia indirectly sets a floor on what inference costs, and what ChatGPT must charge to cover it. Jalapeño is a bet that custom silicon – designed around the specific workload OpenAI actually runs rather than the general-purpose workload Nvidia built for – can provide the same output at lower power and capital cost. Whether that bet holds at the scale OpenAI needs, gigawatts of inference capacity deployed across data centres in 2027 and 2028, is a question neither company is in a position to answer yet.
The chip’s memory architecture connects directly to a shortage that has restructured the economics of the AI industry. Demand for high-bandwidth memory – the specialised stacked chip package that lets processors keep pace with modern large language models – has outrun supply for two consecutive years, contributing to what Micron Technology described as the most profitable quarter in semiconductor memory history. Jalapeño’s eight HBM stacks are designed to extract more inference per dollar from memory that is already expensive and constrained.
OpenAI is not the first AI company to build its own silicon. Google has operated custom tensor processing units for years. Amazon’s Trainium chips run inference inside AWS. Meta has built accelerators for its own model serving. But those companies are primarily infrastructure providers that sell compute to others. OpenAI’s business is selling AI outputs directly to users and developers, and its dependence on outside silicon is correspondingly more direct. As the broader AI compute competition accelerates – with Google and Anthropic trading researchers and infrastructure investments at an increasing pace – OpenAI’s decision to build Jalapeño signals that the lab intends to compete on hardware as well as model quality.
For Broadcom, the partnership extends an existing custom chip business that already serves Google, Meta, and ByteDance with co-designed AI accelerators. Broadcom’s semiconductor design expertise and its relationships with TSMC provided the manufacturing path that OpenAI could not have built alone in nine months. Tan described the effort as “a fundamental commitment to scaling the physical infrastructure required for the next decade of AI” – language that positions Broadcom as a strategic infrastructure partner rather than a contract manufacturer, and suggests the company expects the OpenAI relationship to extend well beyond a single chip generation.
What Jalapeño will actually deliver in a production data centre, against real ChatGPT inference workloads, will not be known until well into 2027. The chip’s claimed advantages were measured against benchmarks designed by the same companies claiming the advantage. Production scale introduces thermal management challenges, yield variability, and software compatibility questions that do not appear in launch presentations. Nvidia’s software ecosystem – CUDA – has been hardened over a decade of widespread deployment. Jalapeño is starting from zero on that dimension, and TechCrunch noted that OpenAI has not yet described what the chip’s software stack looks like.
The announcement is significant. The proof will come from data centres, under load, after the press cycle has moved on.

