
Agent Internals

AxAgent is a small, opinionated runtime built from a handful of research ideas. This page explains how it works inside: the three-stage pipeline, the distinct context objects it manages, how trajectory compaction decides what the model sees again, and the papers each part comes from. For usage and patterns, start at Agents; this page covers why the runtime is shaped this way.

The Three Stages

Every forward() runs three programs in sequence:

```mermaid
flowchart LR
  A[Original typed inputs] --> B[Distiller]
  B -->|executorRequest + distilledContext| C[Executor]
  C -->|evidence + runtime state + final envelope| D[Responder]
  D --> E[Typed output]
```
  • Distiller normalizes the task and compresses large inputs into the exact request the executor needs.
  • Executor owns the runtime session: tool calls, discovery, memory recall, child agents, and the final envelope.
  • Responder turns the executor’s evidence into the declared output signature.

The handoff between stages is deliberately narrow. The completion primitive is final(task, context?) — exactly two arguments. Gathered evidence rides inside that optional context object (surfaced downstream as distilledContext); there is no separate side channel. Keeping the envelope to two positional arguments is what lets the same protocol run identically across every Ax language backend.
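The narrow handoff can be illustrated with a toy model. This is a minimal sketch, not the Ax implementation: the `Envelope` type, the stage functions, and the string-slicing "compression" are all hypothetical, and it assumes for illustration that both handoffs reuse the same two-argument envelope.

```typescript
// Hypothetical toy model of the three-stage handoff. None of these names are the Ax API.
type Envelope = { task: string; context?: Record<string, unknown> };

// The completion primitive: exactly two arguments, the second one optional.
function final(task: string, context?: Record<string, unknown>): Envelope {
  return context === undefined ? { task } : { task, context };
}

type Inputs = { question: string; document: string };

// Distiller: normalize the task and compress large inputs into a small request.
function distiller(inputs: Inputs): Envelope {
  const task = inputs.question.trim();
  const excerpt = inputs.document.slice(0, 40); // stand-in for real compression
  return final(task, { excerpt });
}

// Executor: gather evidence and hand it on inside the optional context object.
function executor(request: Envelope): Envelope {
  const excerpt = String(request.context?.["excerpt"] ?? "");
  const evidence = [`excerpt has ${excerpt.length} chars`];
  return final(request.task, { evidence });
}

// Responder: turn the evidence into the declared output shape.
function responder(result: Envelope): { answer: string } {
  const evidence = (result.context?.["evidence"] as string[] | undefined) ?? [];
  return { answer: `${result.task} -> ${evidence.join("; ")}` };
}

// One forward() call runs the three stages in sequence.
function forward(inputs: Inputs): { answer: string } {
  return responder(executor(distiller(inputs)));
}

const pipelineOutput = forward({ question: "  How long is it? ", document: "x".repeat(100) });
console.log(pipelineOutput.answer);
```

Because each stage only sees a task string and one optional context object, a stage can be swapped out without touching the others.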

The Context Objects

The most important internal idea is that “context” is not one thing. AxAgent manages four distinct objects, each with its own scope, lifetime, and mechanism, and each grounded in a different paper, alongside per-turn retrieval of facts and guides. Confusing them is the usual source of context-management mistakes.

| Object | Scope | Mechanism | Lineage | Ax surface |
| --- | --- | --- | --- | --- |
| Context map | a recurring external corpus (a repo, doc set, dataset) | persistent orientation cache, curated by distill → cartograph → evict | PEEK | contextMap |
| Trajectory compaction | one run’s action log | hindsight ranking, tombstones, and checkpoints under a budget | RLM | contextPolicy presets |
| Strategy playbook | a task, learned offline | generation → reflection → curation into reusable bullets | ACE | agent.optimize(...) |
| Instruction text | a program, learned offline | reflective Pareto evolution of prompts | GEPA | agent.optimize(...) |
| Retrieved facts & guides | a turn | searched on demand, injected for one turn | | recall / skills |

Two of these live at runtime (context map, trajectory compaction), two are offline optimizers (playbook, instructions), and one is per-turn retrieval. They compose rather than compete:

```mermaid
flowchart TB
  Corpus[Recurring corpus] -->|contextMap · PEEK| EX[Executor]
  Trajectory[This run's turns] -->|contextPolicy · hindsight| EX
  Strategy[Task strategy] -->|optimize · ACE + GEPA| EX
  Retrieval[Facts and guides] -->|recall · skills| EX
  EX --> Out[Typed output]
```

A useful rule of thumb: use a context map when many tasks ask different questions over the same large material; use a context policy when one long run needs its own history kept under control; use optimization to improve strategy or instructions offline; use recall/skills to pull in only what a single turn needs.
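The rule of thumb can be written down as a small decision helper. This is only a sketch: the `Workload` fields and the `pickMechanisms` function are hypothetical names invented for illustration, and the returned strings are labels for the surfaces in the table above rather than real configuration values.

```typescript
// Hypothetical description of a workload. Field names are illustrative only.
type Workload = {
  recurringCorpus: boolean; // many tasks ask different questions over the same large material
  longSingleRun: boolean;   // one long run needs its own history kept under control
  improveOffline: boolean;  // strategy or instructions should be tuned offline
  perTurnLookup: boolean;   // a single turn needs specific facts or guides
};

type Mechanism = "contextMap" | "contextPolicy" | "optimize" | "recall/skills";

// The mechanisms compose, so the helper returns every one that applies.
function pickMechanisms(w: Workload): Mechanism[] {
  const picks: Mechanism[] = [];
  if (w.recurringCorpus) picks.push("contextMap");
  if (w.longSingleRun) picks.push("contextPolicy");
  if (w.improveOffline) picks.push("optimize");
  if (w.perTurnLookup) picks.push("recall/skills");
  return picks;
}

const repoAgent = pickMechanisms({
  recurringCorpus: true,
  longSingleRun: true,
  improveOffline: false,
  perTurnLookup: false,
});
console.log(repoAgent.join(", "));
```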

Trajectory Compaction (Internals)

Within a single run, the executor’s action log would grow without bound. The context policy decides what is replayed into the prompt each turn — without erasing runtime state, which stays alive in the session. The decision is made by hindsight: each step is scored after it runs, and the resulting plan keeps what still matters and compacts the rest.

  • Hindsight ranking classifies each completed step (foundational, pivot, dead-end, or superseded) and ranks it from 0 to 5. Low-ranked steps become eligible for compaction.
  • Tombstones replace a resolved error (and its failed attempts) with one compact line, so the model remembers the lesson without re-reading the failure.
  • Checkpoints summarize older trajectory into a structured ledger once the prompt crosses a budget threshold, while keeping recent working state verbatim.
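The three mechanisms above can be sketched as one replay-planning function. This is a simplified illustration under stated assumptions, not the Ax algorithm: the `Step` shape, the fixed token costs, the `keepRecent` window, and the `minRank` cutoff are all hypothetical, and ranks are supplied by hand rather than scored by a model.

```typescript
type StepKind = "foundational" | "pivot" | "dead-end" | "superseded";

// Hypothetical shape of one completed step in the action log.
interface Step {
  kind: StepKind;         // hindsight classification, kept here for readability
  rank: number;           // hindsight rank, 0 (noise) to 5 (essential)
  text: string;
  tokens: number;
  resolvedError?: string; // lesson learned, set when this failure was later fixed
}

interface ReplayPlan { replay: string[]; tokens: number }

const TOMBSTONE_TOKENS = 5;   // assumed cost of a one-line tombstone
const CHECKPOINT_TOKENS = 10; // assumed cost of a one-line ledger entry

function compact(steps: Step[], budget: number, keepRecent = 2, minRank = 3): ReplayPlan {
  const cutoff = Math.max(0, steps.length - keepRecent);
  const lines: { text: string; tokens: number; recent: boolean }[] = [];
  const seenErrors = new Set<string>();

  steps.forEach((step, i) => {
    const recent = i >= cutoff;
    if (step.resolvedError !== undefined) {
      // Tombstone: one compact line per resolved error, however many attempts failed.
      if (!seenErrors.has(step.resolvedError)) {
        seenErrors.add(step.resolvedError);
        lines.push({ text: `[tombstone] ${step.resolvedError}`, tokens: TOMBSTONE_TOKENS, recent });
      }
      return;
    }
    // Hindsight ranking: low-ranked older steps are left out of the replay.
    if (!recent && step.rank < minRank) return;
    lines.push({ text: step.text, tokens: step.tokens, recent });
  });

  const total = lines.reduce((sum, l) => sum + l.tokens, 0);
  const older = lines.filter((l) => !l.recent);
  if (total <= budget || older.length === 0) {
    return { replay: lines.map((l) => l.text), tokens: total };
  }

  // Checkpoint: over budget, so fold older lines into one ledger entry
  // and keep the recent working state verbatim.
  const recentLines = lines.filter((l) => l.recent);
  const ledger = `[checkpoint] ${older.length} earlier steps summarized`;
  const tokens = CHECKPOINT_TOKENS + recentLines.reduce((sum, l) => sum + l.tokens, 0);
  return { replay: [ledger, ...recentLines.map((l) => l.text)], tokens };
}

const log: Step[] = [
  { kind: "foundational", rank: 5, text: "read schema", tokens: 50 },
  { kind: "dead-end", rank: 1, text: "tried regex parse", tokens: 80 },
  { kind: "superseded", rank: 2, text: "call API v1", tokens: 120, resolvedError: "API v1 is deprecated; use v2" },
  { kind: "superseded", rank: 2, text: "retry API v1", tokens: 120, resolvedError: "API v1 is deprecated; use v2" },
  { kind: "pivot", rank: 4, text: "call API v2", tokens: 60 },
  { kind: "foundational", rank: 5, text: "parse result", tokens: 40 },
];

const roomy = compact(log, 200); // under budget: tombstone and ranking only
const tight = compact(log, 120); // over budget: older lines fold into a checkpoint
console.log(roomy.replay, roomy.tokens);
console.log(tight.replay, tight.tokens);
```

Note that the function only decides what is replayed into the prompt; in this sketch the original `log` array is never mutated, mirroring the point that runtime state stays alive in the session.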

Presets tune how aggressive this is:

| Preset | When to use |
| --- | --- |
| full | Short tasks, debugging, weaker models that need exact replay |
| checkpointed | General default for real multi-turn agent work |
| adaptive | Summarize older successful work sooner |
| lean | Very long runs with strong models and tight prompt pressure |

All of this is observable. The onContextEvent callback emits a budget_check every turn (with the live mutable prompt size and pressure level) plus action_compacted, checkpoint_created, and tombstone_created events. Aggregating that stream gives the headline numbers worth tracking: peak prompt size, compaction ratio, and cumulative tokens. Sweeping the presets on a long-horizon task is the cleanest way to see the tradeoff — raw replay keeps the most context but costs the most tokens, while the trimming presets cut peak size and token cost at some risk to answer completeness.
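Aggregating the event stream might look like the following sketch. The four event names come from the text above, but the payload fields (`promptTokens`, `pressure`, `tokensBefore`, `tokensAfter`) are assumptions made for illustration and may not match the real event shapes. The compaction ratio is defined here, also as an assumption, as tokens after compaction divided by tokens before.

```typescript
// Event names follow the text above; payload fields are hypothetical.
type ContextEvent =
  | { type: "budget_check"; promptTokens: number; pressure: "low" | "medium" | "high" }
  | { type: "action_compacted"; tokensBefore: number; tokensAfter: number }
  | { type: "checkpoint_created" }
  | { type: "tombstone_created" };

interface RunMetrics {
  peakPromptTokens: number;
  cumulativeTokens: number;
  compactionRatio: number; // tokensAfter / tokensBefore; 1 means nothing was compacted
  checkpoints: number;
  tombstones: number;
}

function createCollector() {
  let peak = 0;
  let cumulative = 0;
  let before = 0;
  let after = 0;
  let checkpoints = 0;
  let tombstones = 0;

  return {
    // Pass this function wherever the agent accepts an onContextEvent callback.
    onContextEvent(event: ContextEvent): void {
      switch (event.type) {
        case "budget_check":
          peak = Math.max(peak, event.promptTokens);
          cumulative += event.promptTokens;
          break;
        case "action_compacted":
          before += event.tokensBefore;
          after += event.tokensAfter;
          break;
        case "checkpoint_created":
          checkpoints += 1;
          break;
        case "tombstone_created":
          tombstones += 1;
          break;
      }
    },
    summary(): RunMetrics {
      return {
        peakPromptTokens: peak,
        cumulativeTokens: cumulative,
        compactionRatio: before === 0 ? 1 : after / before,
        checkpoints,
        tombstones,
      };
    },
  };
}

// Simulated stream standing in for a real run.
const collector = createCollector();
const simulated: ContextEvent[] = [
  { type: "budget_check", promptTokens: 1000, pressure: "low" },
  { type: "budget_check", promptTokens: 3000, pressure: "high" },
  { type: "action_compacted", tokensBefore: 800, tokensAfter: 200 },
  { type: "checkpoint_created" },
  { type: "budget_check", promptTokens: 1500, pressure: "medium" },
  { type: "tombstone_created" },
];
simulated.forEach((e) => collector.onContextEvent(e));
const metrics = collector.summary();
console.log(metrics);
```

Running one collector per preset on the same task gives directly comparable numbers for a preset sweep.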

Lineage

  • DSPy — declarative, typed model programs; the foundation for signatures and optimization.
  • RLM (Recursive Language Models) — treat a long context as an external environment the model inspects through bounded, tool-mediated turns; the basis for runtime state and small-context turns.
  • PEEK — context maps as persistent orientation knowledge about a recurring corpus.
  • ACE — evolving context playbooks via generation, reflection, and curation; shipped as an optimizer alongside GEPA.
  • GEPA — reflective prompt evolution over a Pareto frontier.

See the full Research Map for papers and how each maps to Ax.
