latency targets are now product promises not infra metrics
Interactive AI workflows are turning latency into an explicit user contract: slow or highly variable responses now read as broken behavior rather than as technical variance (Google Web Vitals).
see also: latency is becoming cultural not technical · queue aware batching improves gpu utilization stability
expectation reset
Users compare assistants to real-time interfaces, so perceived delay now influences trust as much as output quality.
operating consequences
- Product teams set latency SLOs by workflow criticality.
- Routing and cache policy are now UX levers.
- Tail-latency failures drive churn in recurring tasks, because users who repeat a task are the most likely to hit the slow path.
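
One way to make the first and third consequences concrete is to attach a latency budget to each workflow tier and check the tail as well as the median. The following is a minimal sketch, not a reference implementation: the workflow names (`interactive_chat`, `background_summary`), the budget values, and the helpers `percentile` and `check_slo` are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencySLO:
    """Latency budget for one workflow tier, in milliseconds."""
    p50_ms: float
    p99_ms: float


# Hypothetical tiers and budgets; real values depend on the product.
SLOS = {
    "interactive_chat": LatencySLO(p50_ms=800, p99_ms=2500),
    "background_summary": LatencySLO(p50_ms=5000, p99_ms=20000),
}


def percentile(samples, q):
    """Nearest-rank percentile of samples, for q in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def check_slo(workflow, samples_ms):
    """Compare observed p50 and p99 against the workflow's budget."""
    slo = SLOS[workflow]
    p50 = percentile(samples_ms, 50)
    p99 = percentile(samples_ms, 99)
    return {
        "p50_ms": p50,
        "p99_ms": p99,
        "p50_ok": p50 <= slo.p50_ms,
        "p99_ok": p99 <= slo.p99_ms,
    }


# 98 fast responses and 2 slow ones: the median looks fine, the tail does not.
samples = [500] * 98 + [4000] * 2
report = check_slo("interactive_chat", samples)
print(report)
```

With these made-up numbers the median passes the interactive budget while the p99 fails it, which is the case an average-only dashboard would hide. The same samples pass the looser `background_summary` budget, so criticality, not raw latency, decides what counts as a failure.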
my take
Latency discipline is now part of product truthfulness.
linkage
- [[latency is becoming cultural not technical]]
- [[queue aware batching improves gpu utilization stability]]
- [[prompt cache invalidation strategies reduce tail latency]]
ending questions
Which user journey should define the primary latency budget for an AI product?