When to Use AI Agents (and When Not To)

THE HYPE PROBLEM

Every third client conversation right now includes the phrase "we want to build an AI agent." Sometimes they mean a sophisticated multi-step reasoning system. Sometimes they mean a chatbot with a database lookup. Sometimes they're not sure — they just know agents are what everyone's talking about.

The hype is real. So is the capability. LLMs in 2025 can genuinely execute multi-step tasks, use tools, reason about partial results, and recover from errors in ways that weren't viable two years ago. Agents aren't a gimmick.

But agents are also expensive, slow, unpredictable, and hard to test. Many of the problems people reach for agents to solve are better served by a single well-crafted API call. And building the wrong architecture for a problem is one of the most expensive mistakes a software project can make.

So before we write any code, we work through four questions. If the answer to any of them is "no," we go simpler.

QUESTION 1: IS THE TASK GENUINELY MULTI-STEP?

An agent earns its cost when the task can't be decomposed into a fixed sequence of steps in advance — when the model needs to decide what to do next based on what it just learned.

If your task is "summarize this document," that's one call. If it's "extract all action items, look up the assignees in the CRM, check their current task loads, and draft a reassignment plan if anyone is overloaded" — that's a workflow where each step depends on the previous result. That's where an agent architecture makes sense.

A useful diagnostic: can you draw the full flowchart before you start? If yes, you probably don't need an agent — you need a pipeline. Pipelines are cheaper, faster, and more predictable.
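The flowchart test can be made concrete in code. The sketch below contrasts the two shapes under simple assumptions: a pipeline is a fixed list of steps that always runs in order, while an agent loop asks a decision function what to do next after every step. `choose_next` is a hypothetical stand-in for the LLM's decision, and the tool names and data (`look_up_load`, `draft_plan`, a task load of 8) are made up to mirror the reassignment example above.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, object]

def run_pipeline(text: str, steps: List[Callable[[str], str]]) -> str:
    """Pipeline: the flowchart is fixed up front, so every step runs in the same order."""
    for step in steps:
        text = step(text)
    return text

def run_agent(
    state: State,
    tools: Dict[str, Callable[[State], State]],
    choose_next: Callable[[State], Optional[str]],
    max_steps: int = 10,
) -> State:
    """Agent loop: the next step is chosen at runtime from the state so far."""
    for _ in range(max_steps):  # hard cap so a confused model can't loop forever
        action = choose_next(state)  # stands in for the LLM picking a tool
        if action is None:  # the model judged it has enough information to stop
            return state
        state = tools[action](state)
    raise RuntimeError(f"no answer after {max_steps} steps")

# Hypothetical tools mirroring the reassignment example.
def look_up_load(state: State) -> State:
    return {**state, "load": 8}  # pretend CRM lookup

def draft_plan(state: State) -> State:
    return {**state, "plan": "move 3 tasks"}

def choose_next(state: State) -> Optional[str]:
    # Hypothetical stand-in for a model call: branch on what the last step returned.
    if "load" not in state:
        return "look_up_load"
    if state["load"] > 5 and "plan" not in state:
        return "draft_plan"
    return None

# Placeholder pipeline steps; in practice each would be one fixed LLM call.
print(run_pipeline("  Q3 notes ", [str.strip, str.lower]))  # q3 notes
print(run_agent({"assignee": "dana"},
                {"look_up_load": look_up_load, "draft_plan": draft_plan},
                choose_next))
```

If you can write `choose_next` as a fixed sequence with no branching on intermediate results, you have a pipeline and don't need the loop.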

QUESTION 2: DOES THE OUTCOME JUSTIFY THE COST?

Agentic calls are expensive in time and money. A single agent run can make dozens of LLM calls, each with its own latency and token cost. If you're processing 10,000 items a day and each one triggers a 20-call agent loop, the economics need to work.
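To make the economics concrete, here is the arithmetic for that example. The per-call token count (2,000) and the price ($3 per million tokens) are illustrative placeholders, not quotes for any particular model.

```latex
\begin{aligned}
\text{calls/day}  &= 10{,}000 \text{ items} \times 20 \text{ calls/item} = 200{,}000 \\
\text{tokens/day} &= 200{,}000 \times 2{,}000 = 4 \times 10^{8} \\
\text{cost/day}   &= \frac{4 \times 10^{8}}{10^{6}} \times \$3 = \$1{,}200
\end{aligned}
```

Under the same assumptions, a two-call prompt chain comes to 20,000 calls and $120 a day.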

This is especially true for async background tasks where no user is waiting on the result: there's no UX reason to pay for an expensive agent when a cheaper pipeline produces the same result. We've seen clients burn through API budgets on agent architectures that a well-structured prompt chain would have handled for a tenth of the cost.

Ask: what is this task worth? If the value of the outcome is high and the volume is low, agents make sense. If you're doing high-volume, lower-stakes processing, engineer the simpler path first.
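One way to turn "what is this task worth?" into a check is to compare per-item cost with per-item value, and total daily cost with a budget. The sketch below assumes a flat price per million tokens and an average token count per call; the `Architecture` type, the prices, values, and budgets are all hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class Architecture:
    name: str
    calls_per_item: int   # LLM calls per processed item
    tokens_per_call: int  # average input + output tokens per call

def cost_per_item(arch: Architecture, usd_per_million_tokens: float) -> float:
    return arch.calls_per_item * arch.tokens_per_call * usd_per_million_tokens / 1_000_000

def worth_it(arch: Architecture, items_per_day: int, value_per_item: float,
             daily_budget: float, usd_per_million_tokens: float) -> bool:
    """True only if each item is worth more than it costs AND the daily total fits the budget."""
    per_item = cost_per_item(arch, usd_per_million_tokens)
    return per_item < value_per_item and per_item * items_per_day <= daily_budget

# Illustrative placeholder numbers, not real pricing.
agent = Architecture("agent", calls_per_item=20, tokens_per_call=2_000)
chain = Architecture("prompt chain", calls_per_item=2, tokens_per_call=2_000)
PRICE = 3.0  # USD per million tokens (placeholder)

# High volume, low value per item: the agent fails the check, the chain passes.
print(worth_it(agent, 10_000, value_per_item=0.05, daily_budget=500, usd_per_million_tokens=PRICE))  # False
print(worth_it(chain, 10_000, value_per_item=0.05, daily_budget=500, usd_per_million_tokens=PRICE))  # True
# Low volume, high value per item: the agent is affordable.
print(worth_it(agent, 50, value_per_item=5.0, daily_budget=500, usd_per_million_tokens=PRICE))  # True
```

Real pricing usually differs for input and output tokens, and agent runs vary in length, so treat this as a first-pass filter rather than a forecast.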

QUESTION 3: IS THE MODEL CAPABLE AT THIS TASK?

This is the question people skip most often. Agent frameworks make it easy to wire up tool calls and prompts — but the underlying model still has to be able to reason correctly about the domain.

We've seen agent architectures fail not because the architecture was wrong, but because the model consistently made bad decisions at a particular step — hallucinating field names, misreading ambiguous inputs, or failing to recognize when it had enough information to stop. No amount of retry logic fixes a reasoning gap.

Before building a full agent, run evals on the individual steps. Can the model reliably extract the right entity from this document type? Can it correctly interpret this API response? Validate the components before you compose them.
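A per-step eval doesn't need a framework to start. The sketch below scores one step against a small labeled set using exact-match accuracy, then lists any steps that fall below a threshold. `extract_invoice_id` is a hypothetical stand-in for a model call, and the cases and the 0.95 threshold are made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

def eval_step(step: Callable[[str], str], cases: List[Tuple[str, str]]) -> float:
    """Fraction of labeled cases where the step's output exactly matches the expected value."""
    if not cases:
        raise ValueError("need at least one labeled case")
    passed = sum(1 for inp, expected in cases if step(inp) == expected)
    return passed / len(cases)

def failing_steps(scores: Dict[str, float], threshold: float) -> List[str]:
    """Steps below the bar; compose them into an agent only when this list is empty."""
    return [name for name, score in scores.items() if score < threshold]

def extract_invoice_id(text: str) -> str:
    # Hypothetical stand-in for a model call: returns the word after "Invoice".
    words = text.split()
    if "Invoice" in words:
        i = words.index("Invoice")
        if i + 1 < len(words):
            return words[i + 1]
    return ""

cases = [
    ("Invoice 123 due Friday", "123"),
    ("Re: invoice 456", "456"),  # the stub is case-sensitive, so this one fails
    ("Invoice A-9 attached", "A-9"),
]
scores = {"extract_invoice_id": eval_step(extract_invoice_id, cases)}
print(scores, failing_steps(scores, threshold=0.95))
```

Exact match is the simplest scoring rule; steps that produce free text usually need a looser comparison or a graded rubric.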

QUESTION 4: CAN ERRORS BE CAUGHT AND RECOVERED FROM?

Agents take actions. Some of those actions are reversible (reading data, drafting text); some aren't (sending emails, deleting records, submitting forms). The blast radius of a failure matters.

For high-stakes irreversible actions, we build approval gates — points where a human reviews the agent's proposed action before it executes. For lower-stakes or fully reversible actions, we build with automatic rollback or idempotent operations where possible.
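Both patterns can be sketched in a few lines. Below, each action is tagged as reversible or not; irreversible ones only run if an `approve` callback (a placeholder for a human review step) says yes. `run_once` shows the idempotent-operation idea with an idempotency key so a retried step doesn't repeat its side effect. The names are hypothetical, and the completed-key set lives in memory only; a real system would persist it.

```python
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Action:
    name: str
    reversible: bool
    run: Callable[[], str]

def execute(action: Action, approve: Callable[[Action], bool]) -> str:
    """Run reversible actions directly; hold irreversible ones until a human approves."""
    if not action.reversible and not approve(action):
        return f"blocked: {action.name} awaiting approval"
    return action.run()

_completed: Set[str] = set()  # in-memory for the sketch; persist this in practice

def run_once(key: str, fn: Callable[[], str]) -> str:
    """Skip work already done under this key, so retries don't repeat side effects."""
    if key in _completed:
        return f"skipped: {key} already done"
    result = fn()
    _completed.add(key)
    return result

send = Action("send_email", reversible=False, run=lambda: "sent")
draft = Action("draft_reply", reversible=True, run=lambda: "drafted")
print(execute(send, approve=lambda a: False))   # blocked: send_email awaiting approval
print(execute(draft, approve=lambda a: False))  # drafted
print(run_once("email-42", lambda: "sent"))     # sent
print(run_once("email-42", lambda: "sent"))     # skipped: email-42 already done
```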

The test is: if the agent makes the worst plausible mistake, what happens? If the answer is "nothing we can't fix," proceed. If the answer involves data loss, financial transactions, or customer-facing consequences — design the error handling before you design the agent.

THE DEFAULT POSITION

Our default is always the simplest architecture that solves the problem. A single LLM call. Then a prompt chain. Then a workflow with tools. Then, only if the task genuinely requires open-ended reasoning across unpredictable states, an agent.

The most common mistake isn't failing to use agents — it's using them too early. Start simple, measure, and add complexity only when simplicity demonstrably fails.
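"Start simple, measure, and add complexity only when simplicity demonstrably fails" can be read as a loop over tiers, cheapest first, stopping at the first one that clears a quality bar. In the sketch below each tier's score is computed lazily, so the more complex tiers are never built or evaluated if a simpler one passes. `fake_eval` and its scores are hypothetical placeholders for running each tier against a real eval set.

```python
from typing import Callable, List, Optional, Tuple

def pick_simplest(tiers: List[Tuple[str, Callable[[], float]]], bar: float) -> Optional[str]:
    """Walk tiers from simplest to most complex; return the first whose score clears the bar."""
    for name, measure in tiers:
        if measure() >= bar:
            return name
    return None  # nothing clears the bar: revisit the task or the model

measured: List[str] = []

def fake_eval(name: str, score: float) -> Callable[[], float]:
    # Hypothetical: in practice, run the tier against a labeled eval set.
    def measure() -> float:
        measured.append(name)
        return score
    return measure

tiers = [
    ("single call", fake_eval("single call", 0.72)),
    ("prompt chain", fake_eval("prompt chain", 0.91)),
    ("workflow with tools", fake_eval("workflow with tools", 0.95)),
    ("agent", fake_eval("agent", 0.97)),
]
print(pick_simplest(tiers, bar=0.90))  # prompt chain
print(measured)  # ['single call', 'prompt chain']: the agent was never evaluated
```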

Agents are a power tool. Use them for the jobs that actually need them.