Your Agent Knows Less Than You Think: Sierra's tau-knowledge Benchmark Exposes the Retrieval Gap
Agentic AI · Benchmark Analysis

Sierra's tau-knowledge benchmark tests agents on messy, evolving knowledge bases. Even the best frontier model passes only 37% of tasks on the first try. That gap is already in production.

By Shashi Bellamkonda · May 17, 2026

37.4% Best Pass^1 score: GPT-5.5 xhigh reaso…