ai compute bottlenecks: chips, power, and deployment lag

see also: signal triage for macro and ai feeds · security posture for ai native teams · weekly market report 2026-w14

constraint stack

The AI rollout narrative sits on top of a physical infrastructure problem. Here’s the constraint hierarchy I track:

Silicon — NVIDIA H100/H200 supply has improved, but HBM memory remains the binding constraint. Existing allocation is increasingly absorbed by inference and large training runs, and the queue for new allocation is still measured in months. Custom silicon (Groq, Google’s TPUs, Amazon’s Trainium) is gaining share in inference specifically, but training remains H100-dominated.

Power — the most underappreciated constraint. A single H100 cluster at meaningful scale draws power comparable to a small data center. The grid capacity in key markets (Virginia, Texas, Ireland, Singapore) is becoming a genuine bottleneck. This isn’t theoretical — data center power procurement delays are already pushing out deployment timelines.
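
To make “comparable to a small data center” concrete, here’s a back-of-envelope sketch. The constants (700 W per H100 SXM, ~30% host and networking overhead, PUE of ~1.2) are public ballpark figures, and the cluster sizes are illustrative assumptions, not anyone’s actual deployment:

```python
# Back-of-envelope facility power for an H100 cluster.
# All constants are ballpark assumptions, not measured values.

GPU_TDP_KW = 0.7        # H100 SXM board power, ~700 W
HOST_OVERHEAD = 1.3     # CPUs, NICs, switches, storage: ~+30% (assumption)
PUE = 1.2               # cooling + power conversion overhead (assumption)

def cluster_power_mw(num_gpus: int) -> float:
    """Estimated facility-level draw in megawatts."""
    it_load_kw = num_gpus * GPU_TDP_KW * HOST_OVERHEAD
    return it_load_kw * PUE / 1000

for n in (1_024, 16_384):
    print(f"{n:>6} GPUs -> ~{cluster_power_mw(n):.1f} MW at the meter")
# ~1.1 MW at 1,024 GPUs; ~17.9 MW at 16,384
```

Even the smaller figure is a dedicated utility feed; the larger one needs grid interconnection agreements, which is exactly where the procurement delays bite.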

Interconnect — NVLink and InfiniBand are required for distributed training at scale. The alternative networking stacks (Ethernet-based RoCE) work but have performance penalties that make them cost-inefficient for the largest runs. This creates an NVDA-adjacent moat that gets less attention than it deserves.
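
The penalty is easiest to see in the bandwidth term of a ring all-reduce, the collective that dominates data-parallel training. A minimal cost-model sketch, with illustrative link speeds (per-GPU NVLink at ~900 GB/s, a single 400GbE NIC at ~50 GB/s; real clusters aggregate multiple NICs and rails, so treat both as assumptions):

```python
# Textbook ring all-reduce: each of N ranks moves 2*(N-1)/N of the
# gradient buffer over its link. Ignores latency terms and overlap.

def allreduce_seconds(grad_gb: float, n_ranks: int, link_gb_per_s: float) -> float:
    bytes_moved_gb = 2 * (n_ranks - 1) / n_ranks * grad_gb
    return bytes_moved_gb / link_gb_per_s

GRAD_GB = 140   # fp16 gradients for a ~70B-parameter model (assumption)
RANKS = 64

for name, bw in [("NVLink-class (~900 GB/s)", 900),
                 ("single 400GbE NIC (~50 GB/s)", 50)]:
    print(f"{name}: ~{allreduce_seconds(GRAD_GB, RANKS, bw):.2f} s per sync")
# ~0.31 s vs ~5.51 s -- paid on every gradient step
```

One caveat: at equal line rates, the RoCE-vs-InfiniBand gap comes from latency, congestion behavior, and tail effects, which this bandwidth-only model deliberately ignores; the sketch only shows why interconnect sits on the critical path at all.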

Cooling — liquid cooling adoption is accelerating because air cooling can’t handle the power density at rack level. This is a cost and timeline multiplier that feeds into data center design decisions.
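
The cooling pressure is plain rack arithmetic, reusing the per-GPU figures from the power sketch above plus an assumed air-cooling ceiling of ~35 kW per rack (practical ceilings vary by facility):

```python
# Rack power density vs. an assumed air-cooling ceiling.
SERVER_KW = 8 * 0.7 * 1.3   # 8 GPUs + ~30% host overhead, ~7.3 kW/server
AIR_CEILING_KW = 35         # assumed practical per-rack air limit

for servers_per_rack in (2, 4, 8):
    rack_kw = servers_per_rack * SERVER_KW
    verdict = "air OK" if rack_kw <= AIR_CEILING_KW else "needs liquid"
    print(f"{servers_per_rack} servers -> {rack_kw:.0f} kW/rack ({verdict})")
# 2 -> 15 kW (air OK), 4 -> 29 kW (air OK), 8 -> 58 kW (needs liquid)
```

Densifying racks to keep interconnect runs short is what pushes past the air ceiling, so liquid cooling arrives as a design constraint, not an option.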

Integration — the software stack (CUDA, NCCL, the training frameworks built on them) remains NVIDIA’s deepest moat. Alternative accelerators can match raw compute on paper but lose meaningful real-world performance without the years of kernel and library optimization the CUDA ecosystem has accumulated. This is why AMD has struggled despite competitive silicon.

leading and lagging indicators

Leading indicators (tell you what’s coming in 6–18 months):

  • HBM supply contracts and pricing trends
  • Data center power procurement announcements
  • Cloud provider capex guidance with infrastructure split
  • Custom silicon tape-out timelines

Lagging indicators (confirm what’s actually deployed):

  • MLCommons benchmark results with real-world training runs
  • Cloud compute pricing per token (trending down = supply is catching up)
  • Inference latency improvements in production systems

The gap between leading and lagging tells you whether deployment timelines are realistic.
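
One way to operationalize that gap, sketched with placeholder data (the indicator names and readings below are hypothetical, not a real feed):

```python
# Hypothetical leading-vs-lagging scoreboard. Each reading is a
# direction in {-1, 0, +1}: +1 means supply loosening, -1 tightening.
leading = {
    "hbm_contract_pricing": +1,
    "dc_power_procurement": -1,
    "hyperscaler_capex_infra": +1,
    "custom_silicon_tapeouts": 0,
}
lagging = {
    "mlcommons_training_results": 0,
    "price_per_token_trend": +1,
    "prod_inference_latency": 0,
}

def score(readings: dict[str, int]) -> float:
    """Average direction across a basket of indicators."""
    return sum(readings.values()) / len(readings)

gap = score(leading) - score(lagging)
# A persistently positive gap means loosening is being announced but
# not yet showing up in deployment; timelines hold only if lagging
# indicators follow within a couple of quarters.
print(f"leading={score(leading):+.2f} lagging={score(lagging):+.2f} gap={gap:+.2f}")
```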

impact on timelines and margins

Model release timelines — if compute is constrained, frontier model releases get delayed or throttled. The market currently prices in a faster cadence than the infrastructure supports.

Margins for hyperscalers — the ones with owned silicon (Google with TPUs, Amazon with Trainium) have better cost curves than pure NVIDIA buyers. This is a structural advantage that compounds over time.

Enterprise AI ROI — the cost per useful output is falling but not as fast as the narrative assumes. The bottleneck constraints mean the efficiency gains are unevenly distributed.
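
Mechanically, cost per useful output is price per token times tokens per completed task; the first term is falling fast, the second varies wildly by workload, and that’s where the uneven distribution shows up. An illustrative sketch (every number here is an assumption):

```python
# Cost per useful output = price per token * tokens per completed task.
price_old = 10.0 / 1e6        # $10 per 1M tokens (assumed)
price_new = 4.0 / 1e6         # 60% price decline (assumed)
tokens_old = 20_000           # tokens per completed task (assumed)
tokens_new = 40_000           # heavier workloads double token spend (assumed)

cost_old = price_old * tokens_old   # $0.20 per task
cost_new = price_new * tokens_new   # $0.16 per task
print(f"per-token price fell {1 - price_new/price_old:.0%}, "
      f"per-task cost fell only {1 - cost_new/cost_old:.0%}")
# per-token price fell 60%, per-task cost fell only 20%
```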

scenarios

Base case — HBM supply loosens through 2026, custom silicon gains 15–20% share in inference, power constraints slow hyperscaler expansion but don’t halt it. The AI rollout continues but at a slower cadence than the most optimistic projections.

Bullish case — TSMC N2 and CoWoS capacity relief accelerates, enabling a new compute cycle. Power infrastructure investment catches up faster than expected. Custom silicon closes the gap with NVIDIA for inference-heavy workloads.

Bearish case — power procurement delays compound, grid constraints in key markets worsen, and the capex cycle disappoints. Model companies that raised at peak multiples face a reckoning when deployment timelines slip and the cost curve doesn’t bend fast enough.

my take

The compute bottleneck narrative is real but incomplete. The constraint is multi-layered and the resolution path isn’t linear. Power is the underappreciated bottleneck for 2026 specifically — silicon gets more attention but power procurement decisions made today determine deployment capacity for the next 3–5 years.

linkage

  • [[signal triage for macro and ai feeds]]
  • [[security posture for ai native teams]]
  • [[market memo what changed in liquidity and risk appetite]]

ending questions

which compute constraint will release first and drive the next meaningful step-change in AI deployment capacity?