Reliability arithmetic · interactive

Reliability compounds. Every step counts.

Agent pipelines chain tool calls. Each call has a per-step reliability, and the end-to-end success rate is the product of those reliabilities. It collapses faster than most teams expect. Adjust the sliders, or pick a real-world preset, to see the curve.

Single-shot success
35.8%
0.95020. Most attempts fail somewhere in the chain.
Pass⁸ (8 runs in a row)
0.02%
Probability the same task succeeds 8 times in a row. τ-bench's passk metric.
Success rate across pipeline length
At the current per-step reliability. Dashed line shows pass⁸.
Single attempt
Pass⁸
At 95% per-step reliability, a 20-step pipeline succeeds 35.8% of the time. End-to-end success and consistent reliability measure different things.