Long-Horizon Agents

Large context, evidence by reference, context policies, memory, skills, and optimization.

The full harness: bulky context that never bloats the prompt, runs that stay resumable, memory and skills loaded on demand, and behavior you can tune offline. Reach for this tier when the actor must inspect intermediate results, keep executable state alive, recover from failures, or answer many questions over the same large material.

Illustrative: generated-package equivalent. Prefer checked-in package examples for copy/paste runnable code.
C++
#include <axllm/axllm.hpp>

auto qa = axllm::ax("question:string -> answer:string");

Runtime-As-REPL

The defining move at this tier is RLM: the model does not emit one answer. It writes small runtime steps, Ax executes them in a persistent session, and the next turn sees compact evidence plus live variable state.

RLM executor loop

An actor turn should be one observable step: inspect, call a tool, log a result, or finish. Successful runtime values stay alive in the session even when older prompt replay is summarized.

sequenceDiagram
  participant M as Model actor
  participant R as Runtime session
  participant L as Action log
  M->>R: const docs = await kb.search(...)
  R->>L: console.log(docs.length)
  L->>M: compact evidence + live variables
  M->>R: await final("write answer", { docs })
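
The loop in the diagram can be sketched in plain C++ without the Ax runtime. This is a minimal illustration, not the library's implementation: `Session`, `Step`, `runLoop`, and `exampleSteps` are hypothetical names, and the two scripted steps stand in for code a model would write.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a persistent runtime session: variables survive
// across turns, and each step appends one line of evidence to the action log.
struct Session {
  std::map<std::string, std::vector<std::string>> vars;  // live variable state
  std::vector<std::string> actionLog;                    // compact evidence
  bool finished = false;
};

// One actor turn is one observable step: it mutates the session and
// returns a short evidence string.
using Step = std::function<std::string(Session&)>;

// Runs steps in order until one finishes the run; returns the turns used.
inline int runLoop(Session& session, const std::vector<Step>& steps) {
  int turns = 0;
  for (const auto& step : steps) {
    if (session.finished) break;
    session.actionLog.push_back(step(session));
    ++turns;
  }
  return turns;
}

// Two scripted turns standing in for model-written steps: search, then finish.
inline std::vector<Step> exampleSteps() {
  return {
      [](Session& s) {
        s.vars["docs"] = {"doc-a", "doc-b", "doc-c"};  // made-up search result
        return "docs.length = " + std::to_string(s.vars["docs"].size());
      },
      [](Session& s) {
        s.finished = true;  // the final step ends the run
        return "final: answer from " + std::to_string(s.vars["docs"].size()) +
               " docs";
      },
  };
}
```

The second step reads `docs` from the session rather than from the log, which is the point of the design: the log carries evidence, the session carries state.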

Context Fields: Data The Model Computes On, Not Reads

Declare bulky inputs as contextFields and they stay in the runtime session instead of the prompt. The distiller narrows them with code; its evidence passes to the executor by reference in the shared session, while the executor’s prompt carries only a compact shape summary — real field names included, so the actor writes t.amountCents, not a guess. The prompt does not grow with your data, at any data size. (The shared-session execution path ships in the TypeScript runtime today; the generated language ports currently hand the distilled evidence to the executor directly.)
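
To make the "compact shape summary" concrete, here is a rough sketch of how such a summary could be derived. It assumes rows are simple string maps; `Row` and `shapeSummary` are hypothetical names, and the output format is invented for illustration rather than taken from Ax.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical row type: one record of a bulky context field.
using Row = std::map<std::string, std::string>;

// Builds a summary whose size depends on the number of distinct field names,
// not on the number of rows. The rows themselves would stay in the runtime
// session; only this string would reach the prompt.
inline std::string shapeSummary(const std::string& name,
                                const std::vector<Row>& rows) {
  std::set<std::string> fields;  // sorted, de-duplicated field names
  for (const auto& row : rows)
    for (const auto& kv : row) fields.insert(kv.first);

  std::string out = name + ": " + std::to_string(rows.size()) + " rows {";
  bool first = true;
  for (const auto& f : fields) {
    if (!first) out += ", ";
    out += f;
    first = false;
  }
  return out + "}";
}
```

For a 250-row ledger with `amountCents` and `payee` columns, this sketch yields `ledger: 250 rows {amountCents, payee}`, the same length it would have for 250,000 rows.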

You don’t have to catch every case by hand. autoUpgrade (ON by default) keeps any oversized input value runtime-only even when you forget to declare it — the prompt gets a truncated preview plus a shape summary while the full value stays live as inputs.<field>. Declare a field in contextFields when you want a specific inline policy, and set autoUpgrade: false to turn the automatic behavior off. (TypeScript today; port parity is the follow-up.)
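
The automatic behavior described above amounts to a size check per input value. The following sketch shows that idea only: `PromptView`, `viewFor`, the character threshold, and the preview format are all assumptions made for illustration, not the thresholds or formats Ax uses.

```cpp
#include <cstddef>
#include <string>

// What the prompt would carry for one input value in this sketch.
struct PromptView {
  std::string text;  // the inline value, or a truncated preview
  bool runtimeOnly;  // true when the full value is kept out of the prompt
};

// Hypothetical policy: values longer than `maxInlineChars` are previewed
// instead of inlined. Passing autoUpgrade = false inlines everything.
inline PromptView viewFor(const std::string& value, std::size_t maxInlineChars,
                          std::size_t previewChars, bool autoUpgrade = true) {
  if (!autoUpgrade || value.size() <= maxInlineChars) return {value, false};
  return {value.substr(0, previewChars) + "... [" +
              std::to_string(value.size()) + " chars total]",
          true};
}
```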

This is the property the grounded-audit example demonstrates end to end: a 250-row ledger the model never sees in its prompt, audited exactly. Measurements are on the Performance page.

Context Policy: Long Runs Under Control

Within one run, the action log would grow without bound. The context policy decides what gets replayed into the prompt each turn — without erasing runtime state.

| Preset | When to use |
| --- | --- |
| full | Short tasks, debugging, weaker models that need exact replay |
| checkpointed | General default for real multi-turn agent work |
| adaptive | Summarize older successful work sooner |
| lean | Very long runs with strong models and tight prompt pressure |

Illustrative: generated-package equivalent. Prefer checked-in package examples for copy/paste runnable code.
C++
auto assistant = axllm::agent(
  "question:string -> answer:string",
  axllm::object({{"contextFields", axllm::array()}})
);
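
The core idea of a replay policy, deciding what goes back into the prompt without touching runtime state, can be sketched as follows. This is an assumption-laden simplification: `replayForPrompt` and its `keepRecent` parameter are hypothetical, and the actual presets may use different criteria for what to keep or summarize.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decides what is replayed into the prompt: the newest `keepRecent` log
// entries verbatim, with everything older collapsed into one summary line.
// The log itself is taken by const reference and is never modified.
inline std::vector<std::string> replayForPrompt(
    const std::vector<std::string>& actionLog, std::size_t keepRecent) {
  if (actionLog.size() <= keepRecent) return actionLog;  // nothing to collapse

  const std::size_t older = actionLog.size() - keepRecent;
  std::vector<std::string> replay;
  replay.push_back("[summary of " + std::to_string(older) + " earlier steps]");
  replay.insert(replay.end(),
                actionLog.begin() + static_cast<std::ptrdiff_t>(older),
                actionLog.end());
  return replay;
}
```

In this sketch, a very large `keepRecent` behaves like exact replay, while a small one trades replay fidelity for a shorter prompt.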

Context maps are the complement for repeated runs: a persistent orientation cache over the same corpus (a repo, a document set, a system), so every new task starts oriented instead of exploring from scratch. Use a policy for one long run, a map for many runs over the same material.
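
A context map behaves like a build-once cache keyed by corpus. The sketch below keeps the cache in memory for brevity, whereas a real map would be persisted between runs; `ContextMapCache`, `orient`, and the `explore` callback are hypothetical names, not Ax API.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical orientation cache: one stored overview per corpus identifier.
class ContextMapCache {
 public:
  // Returns the cached overview, running `explore` only on the first request
  // for a given corpus.
  const std::string& orient(const std::string& corpusId,
                            const std::function<std::string()>& explore) {
    auto it = maps_.find(corpusId);
    if (it == maps_.end()) {
      ++explorations_;
      it = maps_.emplace(corpusId, explore()).first;
    }
    return it->second;
  }

  // How many times a corpus had to be explored from scratch.
  int explorations() const { return explorations_; }

 private:
  std::map<std::string, std::string> maps_;
  int explorations_ = 0;
};
```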

Memory And Skills

Memory and skill search let the actor load only what a task needs: memories are facts from an external store, recalled with await recall([...]); skills are procedural guides and runbooks, loaded with await discover({ skills: [...] }). Loaded/used callbacks tell you what the agent pulled in and what it claims it relied on.

Illustrative: generated-package equivalent. Prefer checked-in package examples for copy/paste runnable code.
C++
auto assistant = axllm::agent(
  "question:string -> answer:string",
  axllm::object({{"contextFields", axllm::array()}})
);

flowchart TB
  A[Agent task] --> B[recall searches]
  B --> C[Memory store]
  C --> D[inputs.memories next turn]
  A --> E[discover skills]
  E --> F[Skill store]
  F --> G[Loaded Skills prompt section]
  D --> H[Executor]
  G --> H
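
The recall branch of the flowchart can be approximated with a small filter plus a callback. This is a sketch under stated assumptions: `MemoryStore`, this `recall` function, and substring matching are stand-ins invented for illustration. A real store would likely use semantic search, and the real callbacks may have a different shape.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical external memory store: plain facts searched by substring.
struct MemoryStore {
  std::vector<std::string> facts;
};

// Returns only the facts matching at least one query and reports each loaded
// fact through `onLoaded`, in the spirit of the loaded callback above.
inline std::vector<std::string> recall(
    const MemoryStore& store, const std::vector<std::string>& queries,
    const std::function<void(const std::string&)>& onLoaded) {
  std::vector<std::string> hits;
  for (const auto& fact : store.facts) {
    for (const auto& q : queries) {
      if (fact.find(q) != std::string::npos) {
        hits.push_back(fact);
        if (onLoaded) onLoaded(fact);
        break;  // report each fact once, even if several queries match
      }
    }
  }
  return hits;
}
```

Only the hits would be handed to the next turn, so the prompt cost scales with what the task asked for rather than with the size of the store.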

Optimizing Agents

Long-horizon behavior is tunable offline. agent.optimize(...) evolves the actor’s instructions against your metric on realistic task records — tool use, clarification behavior, delegation, and final quality all improve against examples that expose those tradeoffs.

Illustrative: generated-package equivalent. Prefer checked-in package examples for copy/paste runnable code.
C++
auto assistant = axllm::agent(
  "question:string -> answer:string",
  axllm::object({{"contextFields", axllm::array()}})
);
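
Offline optimization boils down to scoring candidate instructions against a metric on task records and keeping what scores best. The sketch below shows only that selection step with an exact-match metric. It does not generate or evolve candidates, and `TaskRecord`, `RunFn`, `score`, and `pickBest` are hypothetical names rather than the `agent.optimize(...)` API.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical task record: a question plus the answer the metric expects.
struct TaskRecord {
  std::string question;
  std::string expected;
};

// Runs the agent under a given instruction; supplied by the caller here.
using RunFn = std::function<std::string(const std::string& instruction,
                                        const std::string& question)>;

// Fraction of records answered exactly right under one instruction.
inline double score(const std::string& instruction,
                    const std::vector<TaskRecord>& records, const RunFn& run) {
  if (records.empty()) return 0.0;
  int correct = 0;
  for (const auto& r : records)
    if (run(instruction, r.question) == r.expected) ++correct;
  return static_cast<double>(correct) / static_cast<double>(records.size());
}

// Keeps the best-scoring candidate; on a tie the earlier candidate wins.
inline std::string pickBest(const std::vector<std::string>& candidates,
                            const std::vector<TaskRecord>& records,
                            const RunFn& run) {
  std::string best;
  double bestScore = -1.0;
  for (const auto& c : candidates) {
    const double s = score(c, records, run);
    if (s > bestScore) {
      bestScore = s;
      best = c;
    }
  }
  return best;
}
```

The quality of the result depends on the records: if none of them exercises tool use or clarification, the metric cannot distinguish candidates on those behaviors.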

Observability

Everything above is observable: actor turns, tool calls, discovery, recalls, skill loads, child-agent calls, context pressure events, token usage, and costs. Long-horizon agents are production workflows; treat them like it. See Telemetry.

Runnable code: long-horizon examples. How it works inside: Internals. What we measured: Performance.
