Agent burns $80 in 3 hours searching for a deleted Slack message

An autonomous research agent spent 3,400 iterations re-querying the same search tool with slightly mutated keywords because the answer it was looking for had been deleted. No stop condition fired.

What happened

A daily-summary agent was asked to find a specific Slack message containing a deployment URL from "earlier this week." The message had been deleted. The agent's search tool returned an empty array. Instead of concluding "not found," the agent rephrased the query and tried again. And again. For three hours.

The trace pattern

Every iteration looked structurally identical:

llm.completions  in:1,247  out:118
  └ tool: search_slack { query: "deployment URL"        } → []
llm.completions  in:1,389  out:124
  └ tool: search_slack { query: "deploy URL last week" } → []
llm.completions  in:1,532  out:131
  └ tool: search_slack { query: "production URL"       } → []
... 3,397 more iterations ...

Token count grew linearly each turn (prior tool results retained in context). By iteration 3,400 the system prompt + history was at 290K tokens per call.

Diagnosis

The agent had three failure modes stacked:

1. No iteration cap. The orchestration loop only checked for done: true from the model, never a hard limit on tool calls.
2. No diversification rule. The model wasn't told "if a query returned zero results twice, stop searching with that strategy."
3. No graceful failure tool. The model had search_slack and report_finding tools but no report_not_found tool. From the model's perspective, returning empty-handed wasn't a valid action.

The fix

const result = await runAgent({
    tools: [searchSlack, reportFinding],
+   maxIterations: 25,
+   maxToolCalls: 50,
  });

Plus a system-prompt addition:

If the same search returns no results twice with similar queries,
call report_not_found with what you tried. Do not invent new
phrasings indefinitely.

Plus a third tool:

{
  name: "report_not_found",
  description: "Report that the requested information could not be located after a reasonable search.",
  input_schema: { type: "object", properties: { attempts: { type: "array", items: { type: "string" } } } },
}

After the fix: same task fails fast at 6 iterations, $0.04 cost.

Takeaway

Always bound autonomous loops with two caps: a step count (the agent can see and reason about) and a hard wall-clock or call-count limit (enforced by the orchestrator regardless of model state). And give the model a way to say "I couldn't do this" — otherwise it will hallucinate progress.