Agent Loop

Intro

The agent loop is the execution cycle that turns an LLM from a single-shot text generator into an autonomous problem solver. Instead of one prompt producing one response, the model runs in a loop: reason about the current state, call a tool, observe the result, then decide whether to continue or stop. This is the ReAct pattern (Reasoning + Acting), introduced by Yao et al. (ICLR 2023), and many production agent frameworks — Semantic Kernel, LangChain, AutoGen — use ReAct-like think-act-observe loops as their core execution model.

The loop works in four steps:

  1. Think — the model receives the conversation history (including any prior tool results) and generates a reasoning trace. It decides what information it still needs or what action to take next.
  2. Act — the model emits a structured tool call — a function name plus arguments — choosing from the tools provided in its schema.
  3. Observe — the runtime executes the tool and appends the result to the conversation history as a tool-role message.
  4. Repeat — the updated history goes back to the model. It either reasons and acts again (back to step 1) or returns a final text response to the user.

The same cycle as a state diagram:

stateDiagram-v2
    [*] --> Think: User message arrives
    Think --> Act: Model emits tool call
    Think --> Done: Model returns text
    Act --> Observe: Runtime executes tool
    Observe --> Think: Result appended to history
    Done --> [*]

The critical insight from the ReAct paper: interleaving reasoning traces with tool use outperforms either approach alone. Chain-of-thought without tools hallucinates facts — the model fabricates answers from parametric memory when it should look them up. Tool use without reasoning cannot decompose complex goals into subgoals. On ALFWorld decision-making tasks, ReAct outperformed imitation learning and reinforcement learning baselines by 34% absolute success rate. The loop lets reasoning guide tool selection while tool outputs ground the reasoning in reality.

For the broader context of when an agent loop is the right choice versus simpler workflow patterns, see the Agents hub page.

How It Works in Practice

The loop maps directly to the chat completions API contract. Here is what happens at each step, concretely:

Think. The runtime sends the full chat history — system prompt, user messages, prior assistant messages, and tool results — plus the JSON schemas of all available tools. The model processes everything and generates a response.

Act. If the model's response contains tool_calls instead of plain text, the runtime extracts the function name and arguments. The model has decided it needs external data or a side effect before it can answer.

Observe. The runtime invokes the function, serializes the result, and appends it as a tool-role message in the chat history. The model never executes tools directly — it only sees the serialized result.

Repeat. The runtime sends the updated history back to the model. If the model responds with text and no tool calls, the loop terminates and the response reaches the user. If it responds with more tool calls, the cycle continues.
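Concretely, after one full Think-Act-Observe iteration the history holds four kinds of messages. A sketch of their shapes (field names follow the OpenAI chat completions format; the `get_weather` tool and its arguments are illustrative placeholders, not from the text):

```python
# History after one Think-Act-Observe iteration, before the final Think.
# The "get_weather" tool is a made-up example for illustration.
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Should I bring an umbrella to Seattle today?"},
    # Act: the assistant message carries tool_calls instead of text content
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city": "Seattle"}'},
        }],
    },
    # Observe: the runtime appends the serialized result, linked by tool_call_id
    {"role": "tool", "tool_call_id": "call_1", "content": '{"forecast": "rain"}'},
]

# The tool result must reference the id of the call it answers
assert history[-1]["tool_call_id"] == history[2]["tool_calls"][0]["id"]
```

The `tool_call_id` link is what lets the model match each result to the call that produced it when several tools run in one iteration.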

In Semantic Kernel (.NET), the loop is automated behind FunctionChoiceBehavior.Auto():

var builder = Kernel.CreateBuilder()
    .AddAzureOpenAIChatCompletion(deploymentName, endpoint, apiKey);
builder.Plugins.AddFromType<WeatherPlugin>("Weather");
var kernel = builder.Build();

var settings = new OpenAIPromptExecutionSettings
{
    FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
};

var history = new ChatHistory();
history.AddUserMessage("Should I bring an umbrella to Seattle today?");

// This single call runs the full Think-Act-Observe-Repeat loop internally
var response = await kernel.GetRequiredService<IChatCompletionService>()
    .GetChatMessageContentAsync(history, settings, kernel);

Behind GetChatMessageContentAsync, Semantic Kernel serializes all [KernelFunction] methods into JSON tool schemas, sends them with the chat history, invokes any requested functions, appends results, and repeats until the model returns a text response or hits the iteration cap.

The equivalent raw loop without framework abstraction:

import json  # tool results are serialized into the history as JSON strings

# Assumes an initialized OpenAI `client`, a `tools` list of JSON schemas,
# and an `execute_tool` dispatcher that runs the real implementations.
messages = [{"role": "user", "content": "Should I bring an umbrella to Seattle today?"}]

while True:
    response = client.chat.completions.create(
        model="gpt-4.1", messages=messages, tools=tools
    )
    choice = response.choices[0]

    if choice.finish_reason == "stop":
        print(choice.message.content)  # Final answer — loop ends
        break

    # Model wants to call tools — execute each one
    messages.append(choice.message)
    for tool_call in choice.message.tool_calls:
        result = execute_tool(tool_call.function.name, tool_call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result),
        })
    # Loop back — send updated history to model

Both examples implement the same four-step cycle. The framework version hides the loop; the raw version makes every step explicit.

Pitfalls

Infinite Loops and Tool Spam

The model enters a cycle where it repeatedly calls the same tool or alternates between two tools without making progress toward an answer. A production case documented by Hugo Nogueira: one agent run made 369 tool calls, consumed 9.7M tokens, and cost $2.74 — without ever reaching an answer. The model kept searching for information it had already retrieved because its reasoning trace lost track of prior observations.

Why it happens: as the context window fills with tool results, earlier reasoning traces fall outside the model's effective attention window. The model "forgets" what it already tried. Models also have no intrinsic sense of diminishing returns — they will keep trying if the prompt does not define a stop condition.

Mitigation: set a hard cap on loop iterations (Semantic Kernel's auto function-calling settings accept a maximum iteration count; LangGraph exposes recursion_limit). Add explicit stop instructions in the system prompt: "If you have called the same tool twice with the same arguments, stop and answer with what you have." Monitor tool call counts per request and alert on outliers.
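A minimal guard for a hand-rolled loop can enforce both mitigations without any SDK dependency. This is a sketch under the assumption that the runtime checks the guard before each iteration and before each tool execution; the class and its thresholds are illustrative, not part of any framework:

```python
# Sketch of a loop guard: a hard iteration cap plus detection of the
# same tool being called repeatedly with identical arguments.
class LoopGuard:
    def __init__(self, max_iterations=10, max_repeats=2):
        self.max_iterations = max_iterations
        self.max_repeats = max_repeats
        self.iterations = 0
        self.call_counts = {}

    def should_stop(self):
        """Hard cap: terminate the loop regardless of model behavior."""
        self.iterations += 1
        return self.iterations > self.max_iterations

    def is_spamming(self, name, arguments):
        """True once the same tool is called with identical args too often."""
        key = (name, arguments)  # arguments as the raw JSON string
        self.call_counts[key] = self.call_counts.get(key, 0) + 1
        return self.call_counts[key] > self.max_repeats

guard = LoopGuard(max_iterations=3, max_repeats=2)
assert not guard.is_spamming("search", '{"q": "weather"}')
assert not guard.is_spamming("search", '{"q": "weather"}')
assert guard.is_spamming("search", '{"q": "weather"}')  # third identical call
```

When `is_spamming` fires, instead of executing the tool the runtime can append a tool-role message such as "You already called this tool with these arguments; answer with what you have," which gives the model the stop condition it lacks on its own.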

Token Explosion

Each loop iteration appends messages to the conversation history — the model's reasoning, the tool call, and the tool result. After 5–10 iterations, the accumulated context can reach thousands of tokens. Large tool responses — full API payloads, search results with multiple documents — accelerate this. Eventually the context window fills, and the model either truncates critical information or the API rejects the request.

Why it happens: tool results are appended verbatim without summarization. Developers return entire database records or full web pages when the model only needs a few fields.

Mitigation: keep tool return values compact — return only the fields the model needs. Set max_tokens on the model response to limit reasoning verbosity. For long-running agents, summarize or truncate older tool results before re-sending. Track cumulative token usage per loop iteration and terminate early if approaching the context limit.
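One way to sketch the "summarize or truncate older tool results" mitigation, assuming tool results dominate history growth. A real implementation would count tokens with the model's tokenizer; here character length stands in as a rough proxy, and the function and thresholds are illustrative:

```python
# Sketch: truncate tool results older than the most recent `keep_last`,
# leaving all other message roles untouched. Character length is used as
# a crude stand-in for token count.
def trim_tool_results(messages, keep_last=2, max_chars=200):
    """Return a copy of the history with old, oversized tool results cut."""
    tool_indexes = [i for i, m in enumerate(messages) if m.get("role") == "tool"]
    recent = set(tool_indexes[-keep_last:])
    trimmed = []
    for i, msg in enumerate(messages):
        if msg.get("role") == "tool" and i not in recent and len(msg["content"]) > max_chars:
            msg = {**msg, "content": msg["content"][:max_chars] + "...[truncated]"}
        trimmed.append(msg)
    return trimmed

history = [
    {"role": "user", "content": "q"},
    {"role": "tool", "tool_call_id": "a", "content": "x" * 1000},  # old, large
    {"role": "tool", "tool_call_id": "b", "content": "recent result"},
]
slim = trim_tool_results(history, keep_last=1)
assert slim[1]["content"].endswith("[truncated]")
assert slim[2]["content"] == "recent result"
```

Keeping the most recent results intact matters because the model often needs the last observation verbatim to produce its final answer; only the stale middle of the history is safe to compress.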

Hallucinated Tool Calls

The model invokes a function that does not exist, passes arguments that do not match the schema, or fabricates parameter values. This is especially common when tool schemas are ambiguous or when the model confuses similar function names across plugins.

Why it happens: tool schemas are sent as JSON descriptions — the model is predicting the next token, not reading documentation. Vague function names (process, handle) or under-specified parameter descriptions increase confusion. Complex nested schemas degrade argument accuracy.

Mitigation: use self-explanatory function names and explicit parameter descriptions. Validate tool call arguments against the schema before execution — reject and return a clear error message so the model can self-correct on the next iteration. Keep schemas flat and simple. Anthropic's SWE-bench agent team found that switching from relative to absolute file paths in tool parameters eliminated an entire class of hallucinated arguments.
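The validate-then-reject pattern can be sketched as a pre-execution check. This assumes tool schemas are kept as plain JSON Schema fragments alongside the definitions sent to the model; the `get_weather` schema and the helper are illustrative, not a library API:

```python
# Sketch: validate a tool call against its schema before execution, and
# return a readable error so the model can self-correct next iteration.
import json

TOOL_SCHEMAS = {
    "get_weather": {"required": ["city"], "properties": {"city": {"type": "string"}}},
}

def validate_tool_call(name, raw_arguments):
    """Return (ok, payload): parsed args on success, an error message on failure."""
    if name not in TOOL_SCHEMAS:
        return False, f"Unknown tool '{name}'. Available: {sorted(TOOL_SCHEMAS)}"
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError:
        return False, "Arguments were not valid JSON."
    missing = [p for p in TOOL_SCHEMAS[name]["required"] if p not in args]
    if missing:
        return False, f"Missing required parameters: {missing}"
    return True, args

ok, payload = validate_tool_call("get_wether", "{}")  # hallucinated name
assert not ok and "Unknown tool" in payload
ok, payload = validate_tool_call("get_weather", '{"city": "Seattle"}')
assert ok and payload == {"city": "Seattle"}
```

On failure, the error string goes back into the history as the tool-role message instead of a real result; listing the available tools in the error is what gives the model enough signal to pick the right name on the next iteration.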

Questions

References


What's next