OpenAI and Broadcom Unveil Jalapeño Inference Chip

SAN FRANCISCO – Every time someone types a question into ChatGPT, OpenAI pays for the answer. The electricity, the servers, the specialised chips – for three years, the cost of that compute has been Nvidia’s to set. A custom silicon chip called Jalapeño, announced Tuesday, is the most direct attempt yet by a major AI lab to change that equation.

OpenAI and semiconductor company Broadcom announced Jalapeño on June 24 – OpenAI’s first application-specific integrated circuit, designed from scratch for the single purpose of running large language models at scale. The chip will be manufactured by TSMC on its 3-nanometer process node, the same generation used in Apple’s latest mobile processors, with prototype deployments targeted for the end of 2026.

We’ve designed and built our first AI chip: Jalapeño.

Designed from the ground up by OpenAI and brought to production with @Broadcom, Jalapeño is purpose-built for the LLM workloads powering ChatGPT, Codex, the API, and future agentic products.

Chips are foundational to the AI…
OpenAI (@OpenAI), June 24, 2026

The architecture is a departure from how AI inference has worked until now. Nvidia’s GPUs – still the dominant compute substrate across the AI industry – were originally designed for graphics and adapted for machine learning over two decades of iteration. Jalapeño has no prior life as anything else. It features eight stacks of high-bandwidth memory arranged around a systolic array core optimised for the matrix multiplications large language models perform when generating text, with none of the overhead that comes from a general-purpose processor architecture carrying capabilities the workload does not need.
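To make the term concrete, here is a minimal Python sketch of how a systolic array multiplies two matrices. It is a generic textbook illustration, not a description of Jalapeño's actual design, which neither company has detailed. The function name `systolic_matmul` and the "output-stationary" layout (each processing element keeps one output value while inputs stream past it) are assumptions chosen for clarity.

```python
def systolic_matmul(a, b):
    """Multiply a (n x k) by b (k x m) on a simulated n x m grid of
    processing elements (PEs). Hypothetical teaching model only."""
    n, k, m = len(a), len(b), len(b[0])
    acc = [[0] * m for _ in range(n)]    # one running sum per PE
    a_reg = [[0] * m for _ in range(n)]  # A values, flowing left to right
    b_reg = [[0] * m for _ in range(n)]  # B values, flowing top to bottom

    # n + m + k - 2 clock ticks are enough for every operand to cross the grid.
    for t in range(n + m + k - 2):
        # Each PE passes its operands to the neighbour on the right / below.
        # Iterating from the far edge means we always read the old value.
        for i in range(n):
            for j in range(m - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
        for j in range(m):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]

        # Feed the edges. Row i of A and column j of B are delayed by i and j
        # ticks so that matching operands meet at PE (i, j) on the same tick.
        for i in range(n):
            s = t - i
            a_reg[i][0] = a[i][s] if 0 <= s < k else 0
        for j in range(m):
            s = t - j
            b_reg[0][j] = b[s][j] if 0 <= s < k else 0

        # Every PE performs one multiply-accumulate per tick.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


result = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result)
```

The point of the layout is that each operand is fetched from memory once and then handed from neighbour to neighbour, so the grid spends its time on arithmetic rather than on moving data, which is the property that makes the design attractive for matrix-heavy inference.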

Broadcom CEO Hock Tan said the chip targets roughly 50 percent lower cost per inference token compared with current GPU alternatives. That claim warrants scrutiny. The 50 percent figure comes from internal benchmarking Broadcom and OpenAI conducted against their own existing GPU workloads – a comparison that may not hold against Nvidia’s latest Blackwell architecture or against the Vera Rubin generation Nvidia is planning for 2027. No independent verification of the cost figures has been published. Savings announced for silicon that has not yet been deployed at production scale should be read as a projection, not a performance guarantee.
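A short worked example shows why the choice of baseline matters so much. Every number below is a made-up placeholder, not a figure from OpenAI, Broadcom, or Nvidia, and the function names are hypothetical. The sketch assumes cost per token is simply hourly hardware cost divided by tokens served per hour.

```python
def cost_per_million_tokens(hourly_cost_usd, tokens_per_second):
    """Serving cost in USD per one million tokens for a single accelerator."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


def saving(baseline_cost, candidate_cost):
    """Fractional saving of the candidate relative to the baseline."""
    return 1 - candidate_cost / baseline_cost


# Placeholder inputs, chosen only to illustrate the arithmetic.
older_gpu = cost_per_million_tokens(hourly_cost_usd=4.00, tokens_per_second=2000)
newer_gpu = cost_per_million_tokens(hourly_cost_usd=6.00, tokens_per_second=4500)

# Suppose a custom chip really does halve the cost of the older GPU.
custom_chip = older_gpu * 0.5

print(f"saving vs older GPU: {saving(older_gpu, custom_chip):.0%}")
print(f"saving vs newer GPU: {saving(newer_gpu, custom_chip):.0%}")
```

With these placeholder inputs, a chip that halves cost against the older GPU saves only a quarter against the newer one. The headline percentage is a property of the comparison as much as of the chip.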

What is harder to dismiss is the timeline. OpenAI President Greg Brockman told CNBC that Jalapeño reached TSMC tape-out in nine months – a pace that chip engineers have called unprecedented for advanced ASIC silicon at the 3nm node. Brockman said OpenAI used its own AI models to accelerate parts of the chip design and verification process, embedding AI-assisted tooling into the development cycle itself. If that claim survives independent technical review, it is a meaningful signal about how quickly AI labs can now iterate on hardware, and what that speed implies for how fast the chip landscape could shift.

A 300mm silicon wafer of the type used to manufacture the Jalapeño chip at TSMC’s 3nm foundry. Broadcom CEO Hock Tan handed the first production wafer to OpenAI’s Sam Altman and Greg Brockman at the June 24 announcement. [Image Source: Wikimedia Commons / CC BY-SA]

The strategic logic runs deeper than cost reduction. OpenAI has described compute access as its primary operational constraint for years. Every token the company serves today passes through Nvidia hardware, which means Nvidia indirectly sets a floor on what inference costs, and what ChatGPT must charge to cover it. Jalapeño is a bet that custom silicon – designed around the specific workload OpenAI actually runs rather than the general-purpose workload Nvidia built for – can provide the same output at lower power and capital cost. Whether that bet holds at the scale OpenAI needs, gigawatts of inference capacity deployed across data centres in 2027 and 2028, is a question neither company is in a position to answer yet.

The chip’s memory architecture connects directly to a shortage that has restructured the economics of the AI industry. Demand for high-bandwidth memory – the specialised stacked chip package that lets processors keep pace with modern large language models – has outrun supply for two consecutive years, contributing to what Micron Technology described as the most profitable quarter in semiconductor memory history. Jalapeño’s eight HBM stacks are designed to extract more inference per dollar from memory that is already expensive and constrained.

OpenAI is not the first AI company to build its own silicon. Google has operated custom tensor processing units for years. Amazon’s Trainium chips run inference inside AWS. Meta has built accelerators for its own model serving. But Google and Amazon are also infrastructure providers that sell compute to others, and Meta serves models mainly inside its own products. OpenAI’s business is selling AI outputs directly to users and developers, which makes its dependence on outside silicon more direct. As the broader AI compute competition accelerates – with Google and Anthropic trading researchers and infrastructure investments at an increasing pace – OpenAI’s decision to build Jalapeño signals that the lab intends to compete on hardware as well as model quality.

For Broadcom, the partnership extends an existing custom chip business that already serves Google, Meta, and ByteDance with co-designed AI accelerators. Broadcom’s semiconductor design expertise and its relationships with TSMC provided the manufacturing path that OpenAI could not have built alone in nine months. Tan described the effort as “a fundamental commitment to scaling the physical infrastructure required for the next decade of AI” – language that positions Broadcom as a strategic infrastructure partner rather than a contract manufacturer, and suggests the company expects the OpenAI relationship to extend well beyond a single chip generation.

What Jalapeño will actually deliver in a production data centre, against real ChatGPT inference workloads, will not be known until well into 2027. The chip’s claimed advantages were measured against benchmarks designed by the same companies claiming the advantage. Production scale introduces thermal management challenges, yield variability, and software compatibility questions that do not appear in launch presentations. Nvidia’s software ecosystem – CUDA – has been hardened over a decade of widespread deployment. TechCrunch noted that OpenAI has not yet described what the software stack for Jalapeño looks like. Jalapeño is starting from zero on that dimension.

The announcement is significant. The proof will come from data centres, under load, after the press cycle has moved on.
