HN: LLM Agents in Production - Real-World Deployments
Q1 2026 saw a shift from agent demos to production deployments, with the HN community sharing candid lessons.
The Agent Hype Cycle
| Year | Dominant Narrative |
|---|---|
| 2023 | “Agents are the future!” |
| 2024 | “Agents don’t work reliably” |
| 2025 | “Agents work for specific tasks” |
| 2026 | “Agents in production - here’s what broke” |
Production Use Cases
What Works in Production
| Use Case | Success Rate | Key Success Factor |
|---|---|---|
| Web research | 78% | Tool reliability |
| Data extraction | 85% | Structured output |
| Code review | 72% | Context quality |
| Customer support | 68% | Escalation paths |
| Code generation | 81% | Narrow scope |
What Still Breaks
HN practitioners reported consistent failure modes:
- Multi-step reasoning: Accuracy degrades past 10 steps
- Tool calling: API inconsistencies cause failure rates of 15-30%
- Context management: Memory truncation drops context the agent still needs
- Error recovery: Agents handle edge cases poorly
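One way to see why long chains degrade is compounding: if each step succeeds independently with probability p, an n-step chain succeeds with probability pⁿ. The sketch below assumes independent steps and a hypothetical 95% per-step accuracy; neither figure comes from the HN threads, and real agent steps are usually correlated.

```python
# Illustrative model only: assumes every step succeeds independently with the
# same probability. Real agent steps are correlated, so treat this as rough.

def chain_success(per_step: float, steps: int) -> float:
    """Probability that all `steps` steps succeed when each succeeds with `per_step`."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability in [0, 1]")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


if __name__ == "__main__":
    # 0.95 per step is a hypothetical figure, not a number reported on HN.
    for n in (1, 5, 10, 20):
        print(f"{n:>2} steps: {chain_success(0.95, n):.1%}")
```

Under these assumptions a 10-step chain completes about 60% of the time and a 20-step chain about 36%, even though each individual step looks reliable.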
Architecture Patterns
Successful Production Architectures
┌─────────────────────────────────────────┐
│           Orchestrator Agent            │
├─────────────────────────────────────────┤
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  │
│  │ Tool 1  │  │ Tool 2  │  │ Tool N  │  │
│  └─────────┘  └─────────┘  └─────────┘  │
├─────────────────────────────────────────┤
│          Memory / State Store           │
├─────────────────────────────────────────┤
│      Human-in-the-Loop Checkpoints      │
└─────────────────────────────────────────┘
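The layered diagram can be sketched as a small orchestrator that dispatches to a tool registry, appends every call to a state store, and pauses at a human checkpoint before irreversible tools. All names here (`Tool`, `Orchestrator`, `approve`) are hypothetical and not taken from any particular agent framework; a real state store would be persistent rather than an in-memory list.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical sketch: Tool, Orchestrator, and approve are illustrative names,
# not the API of any specific agent framework.


@dataclass
class Tool:
    name: str
    fn: Callable[..., Any]
    irreversible: bool = False  # irreversible tools require human approval


@dataclass
class Orchestrator:
    tools: Dict[str, Tool]
    approve: Callable[[str, dict], bool]  # human-in-the-loop checkpoint
    state: List[dict] = field(default_factory=list)  # in-memory stand-in for a state store

    def call(self, tool_name: str, **kwargs: Any) -> Any:
        tool = self.tools.get(tool_name)
        if tool is None:
            raise KeyError(f"unknown tool: {tool_name}")
        # Checkpoint: irreversible actions run only after a human approves them.
        if tool.irreversible and not self.approve(tool_name, kwargs):
            self.state.append({"tool": tool_name, "args": kwargs, "status": "rejected"})
            return None
        result = tool.fn(**kwargs)
        self.state.append({"tool": tool_name, "args": kwargs, "status": "ok", "result": result})
        return result


if __name__ == "__main__":
    orch = Orchestrator(
        tools={
            "search": Tool("search", lambda query: f"results for {query}"),
            "send_email": Tool("send_email", lambda to: f"sent to {to}", irreversible=True),
        },
        approve=lambda name, args: False,  # stand-in for asking a person
    )
    print(orch.call("search", query="agent failures"))  # results for agent failures
    print(orch.call("send_email", to="ops@example.com"))  # None (not approved)
    print([entry["status"] for entry in orch.state])  # ['ok', 'rejected']
```

Keeping the approval check inside `call` means no tool can bypass the checkpoint, which is the main point of placing human review in the orchestration layer rather than in individual tools.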
Key Patterns
- Task decomposition: Break complex tasks into agent-sized chunks
- Checkpointing: Human approval for irreversible actions
- Fallback agents: When the primary agent fails, escalate to a human
- Verification loops: Test outputs before proceeding
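The verification-loop and escalation patterns combine naturally: retry a step until its output passes a check, and hand it to a human once retries run out. The sketch below is one possible shape under assumed names (`run_with_verification`, `attempt`, `verify`, `escalate`) and an assumed limit of three attempts; none of these come from the HN discussion.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_verification(
    attempt: Callable[[int], T],
    verify: Callable[[T], bool],
    escalate: Callable[[Optional[T]], T],
    max_attempts: int = 3,
) -> T:
    """Retry an agent step until `verify` passes, then fall back to a human.

    attempt(i) runs the agent on attempt number i (0-based); verify checks the
    output before the pipeline proceeds; escalate is the human fallback and
    receives the final attempt's output (None if that attempt raised).
    """
    last: Optional[T] = None
    for i in range(max_attempts):
        try:
            last = attempt(i)
        except Exception:
            last = None  # treat a crashed step like a failed verification
            continue
        if verify(last):
            return last
    return escalate(last)


if __name__ == "__main__":
    outputs = ["not json", '{"total": 42}']
    result = run_with_verification(
        attempt=lambda i: outputs[min(i, len(outputs) - 1)],
        verify=lambda out: out.startswith("{"),
        escalate=lambda last: "NEEDS_HUMAN_REVIEW",
    )
    print(result)  # {"total": 42}
```

The `verify` callback is where task-specific checks belong (schema validation, running tests on generated code); the loop itself stays generic.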
Cost Analysis
Production agent costs (median reported):
| Task Type | Tokens Used | Cost/Task | Human Time Saved |
|---|---|---|---|
| Research | 150K | $0.45 | 45 min |
| Coding | 80K | $0.24 | 30 min |
| Writing | 40K | $0.12 | 20 min |
| Analysis | 200K | $0.60 | 60 min |
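The table's numbers are internally consistent with a single blended rate: 150,000 tokens × $3.00 / 1,000,000 = $0.45, and every other row works out the same way. The snippet below derives that rate and the cost per hour of human time saved from the table rows; the $3.00 per million figure is inferred from the table, not a reported price.

```python
# Rows from the cost table: (task, tokens per task, cost per task in USD,
# human minutes saved). Values are the reported medians.
ROWS = [
    ("Research", 150_000, 0.45, 45),
    ("Coding", 80_000, 0.24, 30),
    ("Writing", 40_000, 0.12, 20),
    ("Analysis", 200_000, 0.60, 60),
]


def implied_rate_per_million(tokens: int, cost_usd: float) -> float:
    """Blended price per million tokens implied by one table row."""
    return cost_usd / tokens * 1_000_000


def cost_per_hour_saved(cost_usd: float, minutes_saved: float) -> float:
    """Agent spend per hour of human time the task replaces."""
    return cost_usd / (minutes_saved / 60)


if __name__ == "__main__":
    for task, tokens, cost, minutes in ROWS:
        print(
            f"{task:<9} ${implied_rate_per_million(tokens, cost):.2f}/M tokens, "
            f"${cost_per_hour_saved(cost, minutes):.2f} per human hour saved"
        )
```

On these medians, every task type costs between $0.36 and $0.60 per hour of human time saved.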
Lessons Learned
Top HN recommendations:
- Start narrow: “Our best agent does one thing excellently”
- Measure everything: “We didn’t know agents failed 20% of the time until we logged it”
- Human escalation: “The agent knows when it doesn’t know - use that”
- Cost monitoring: “Agents can be surprisingly expensive at scale”
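“Measure everything” and “cost monitoring” can start as simply as logging each run’s outcome and token count. The `RunLog` class below is a hypothetical minimal sketch; the $3.00-per-million-token price is carried over from the rate implied by the cost table, and the budget is an assumed placeholder you would replace with real values.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical run log for illustration. The price and budget defaults are
# assumptions; substitute your provider's real pricing and your own limits.


@dataclass
class RunLog:
    price_per_million: float = 3.00
    budget_usd: float = 100.0
    runs: List[Tuple[bool, int]] = field(default_factory=list)  # (succeeded, tokens)

    def record(self, succeeded: bool, tokens: int) -> None:
        self.runs.append((succeeded, tokens))

    def failure_rate(self) -> float:
        if not self.runs:
            return 0.0
        failures = sum(1 for ok, _ in self.runs if not ok)
        return failures / len(self.runs)

    def total_cost(self) -> float:
        tokens = sum(t for _, t in self.runs)
        return tokens * self.price_per_million / 1_000_000

    def over_budget(self) -> bool:
        return self.total_cost() > self.budget_usd


if __name__ == "__main__":
    log = RunLog(budget_usd=1.00)
    for ok in (True, True, False, True):
        log.record(succeeded=ok, tokens=150_000)
    print(f"failure rate: {log.failure_rate():.0%}")  # failure rate: 25%
    print(f"cost: ${log.total_cost():.2f}")  # cost: $1.80
    print("over budget:", log.over_budget())  # over budget: True
```

Failed runs still consume tokens, so tracking both numbers in one place shows how much spend goes to work that never succeeded.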