Optimization
Optimization means measuring a program and improving the parts that affect quality: instructions, demos, tool descriptions, templates, component maps, or saved optimizer artifacts.
For TypeScript, use the top-level optimize(...) helper for normal AxGen and Flow tuning, and agent.optimize(...) for agent-specific tuning. Generated languages expose the AxIR-supported optimizer surface, usually around AxGEPA and artifact application.
import { AxAIOpenAIModel, ai, ax, optimize } from '@ax-llm/ax';
const program = ax('emailText:string -> priority:class "high, normal, low"');
const studentAI = ai({ name: 'openai', apiKey: process.env.OPENAI_APIKEY!, config: { model: AxAIOpenAIModel.GPT4OMini } });
const teacherAI = ai({ name: 'openai', apiKey: process.env.OPENAI_APIKEY!, config: { model: AxAIOpenAIModel.GPT4O } });
const metric = ({ prediction, example }: any) => prediction.priority === example.priority ? 1 : 0;
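// Illustrative labeled examples matching the signature; replace with your own data.
const train = [
  { emailText: 'Production is down for all customers', priority: 'high' },
  { emailText: 'Newsletter: product tips for March', priority: 'low' },
];
const result = await optimize(program, train, metric, { studentAI, teacherAI, maxMetricCalls: 40 });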
program.applyOptimization(result.optimizedProgram!);
GEPA is useful when accuracy, cost, latency, brevity, tool-use quality, or policy quality are real tradeoffs. The output can be a Pareto frontier instead of one fake “best” prompt.
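To make the Pareto frontier idea concrete, here is a minimal sketch that filters scored candidates down to the non-dominated ones. The `Candidate` shape, the objective names, and the scores are hypothetical and only illustrate the concept; this is not how Ax represents or computes its frontier.

```typescript
// Hypothetical shapes for illustration only; these are not Ax types.
type Scores = Record<string, number>; // objective name -> score, higher is better

interface Candidate {
  id: string;
  scores: Scores;
}

// a dominates b when a is at least as good on every objective
// and strictly better on at least one. Assumes both share objective names.
function dominates(a: Scores, b: Scores): boolean {
  let strictlyBetter = false;
  for (const key of Object.keys(a)) {
    const av = a[key] ?? 0;
    const bv = b[key] ?? 0;
    if (av < bv) return false;
    if (av > bv) strictlyBetter = true;
  }
  return strictlyBetter;
}

// Keep every candidate that no other candidate dominates.
function paretoFrontier(candidates: Candidate[]): Candidate[] {
  return candidates.filter(
    (c) => !candidates.some((other) => other !== c && dominates(other.scores, c.scores)),
  );
}

// Made-up scores: neither of the first two beats the other on both objectives.
const candidates: Candidate[] = [
  { id: 'short-prompt', scores: { accuracy: 0.82, brevity: 0.9 } },
  { id: 'long-prompt', scores: { accuracy: 0.91, brevity: 0.4 } },
  { id: 'weak-prompt', scores: { accuracy: 0.8, brevity: 0.35 } },
];

const frontier: string[] = paretoFrontier(candidates).map((c) => c.id);
console.log(frontier);
```

In this toy data the third candidate is worse than the others on both objectives, so it drops out, while the first two remain because each wins on a different objective.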
What You Provide
- A program to tune.
- Training examples with the same input/output shape as the signature.
- A metric or judge that scores predictions.
- Optional validation examples for holdout selection.
- Student and teacher model settings where the language surface supports them.
- A maxMetricCalls bound so the optimizer cannot spend without limit.
flowchart LR
  A[Program] --> D[Optimizer]
  B[Train examples] --> D
  C[Metric or judge] --> D
  E[Validation examples] --> D
  D --> F[Optimized artifact]
  D --> G[Pareto frontier]
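One way to prepare the training and validation inputs listed above is to split a single labeled set deterministically. The sketch below assumes the email-priority signature from the first example; `splitExamples` and the sample emails are hypothetical and not part of the library.

```typescript
// Example shape mirrors the signature used earlier: emailText in, priority out.
type Priority = 'high' | 'normal' | 'low';

interface EmailExample {
  emailText: string;
  priority: Priority;
}

// Hypothetical helper: hold out every `every`-th example for validation
// so the split is reproducible across runs.
function splitExamples<T>(
  examples: T[],
  every: number,
): { train: T[]; validationExamples: T[] } {
  const train: T[] = [];
  const validationExamples: T[] = [];
  examples.forEach((example, index) => {
    if ((index + 1) % every === 0) validationExamples.push(example);
    else train.push(example);
  });
  return { train, validationExamples };
}

// Made-up labeled data for illustration.
const labeled: EmailExample[] = [
  { emailText: 'Production is down for all customers', priority: 'high' },
  { emailText: 'Can we move the weekly sync to Thursday?', priority: 'normal' },
  { emailText: 'Newsletter: product tips for March', priority: 'low' },
  { emailText: 'Reminder to submit your expense report', priority: 'normal' },
  { emailText: 'Office plants need new pots', priority: 'low' },
  { emailText: 'Security alert: credentials found in a public repo', priority: 'high' },
];

const { train, validationExamples } = splitExamples(labeled, 3);
console.log(train.length, validationExamples.length);
```

Keeping the held-out examples out of `train` means they measure whether a candidate generalizes rather than whether it memorized the training set.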
AxGen Example
Use this for a single structured generator. Keep the metric deterministic when the expected output is easy to score.
const classifier = ax('emailText:string -> priority:class "high, normal, low"');
const metric = ({ prediction, example }) =>
  prediction.priority === example.priority ? 1 : 0;
const result = await optimize(classifier, train, metric, {
  studentAI,
  teacherAI,
  validationExamples,
  maxMetricCalls: 120,
});
classifier.applyOptimization(result.optimizedProgram!);
Flow Example
Flows expose multiple optimizable components. Use multi-objective metrics when a workflow must balance accuracy with brevity, cost, or latency.
import { flow, optimize } from '@ax-llm/ax';
const wf = flow<{ emailText: string }>()
  .n('classifier', 'emailText:string -> priority:class "high, normal, low"')
  .n('rationale', 'emailText:string, priority:string -> rationale:string')
  .e('classifier', (s) => ({ emailText: s.emailText }))
  .e('rationale', (s) => ({ emailText: s.emailText, priority: s.classifierResult.priority }))
  .r((s) => ({ priority: s.classifierResult.priority, rationale: s.rationaleResult.rationale }));
// Bound the budget here too, as in the other examples.
const result = await optimize(wf, train, multiObjectiveMetric, { studentAI, teacherAI, maxMetricCalls: 120 });
Agent Example
Use agent.optimize(...) for tool-use, clarification, delegation, and final-response behavior. The normal path starts with task records containing input, criteria, and optional expectedActions or forbiddenActions.
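A task record of the kind described above might be shaped like the following sketch. Only the field names (`input`, `criteria`, `expectedActions`, `forbiddenActions`) come from the text; the field types, the action names, and the `actionScore` helper are assumptions for illustration and are not the built-in judge.

```typescript
// Assumed field types; only the field names come from the documentation text.
interface AgentTask {
  input: string;
  criteria: string;
  expectedActions?: string[];
  forbiddenActions?: string[];
}

// Made-up tasks and hypothetical action names.
const tasks: AgentTask[] = [
  {
    input: 'Where is my latest order?',
    criteria: 'Looks up the order before answering and reports its status.',
    expectedActions: ['lookupOrder'],
  },
  {
    input: 'Please delete my account.',
    criteria: 'Asks for confirmation before taking any destructive step.',
    forbiddenActions: ['deleteAccount'],
  },
];

// Hypothetical local check: returns 1 only when every expected action was
// observed and no forbidden action was, otherwise 0.
function actionScore(task: AgentTask, observedActions: string[]): number {
  const seen = new Set(observedActions);
  const expectedOk = (task.expectedActions ?? []).every((a) => seen.has(a));
  const forbiddenOk = (task.forbiddenActions ?? []).every((a) => !seen.has(a));
  return expectedOk && forbiddenOk ? 1 : 0;
}

console.log(actionScore(tasks[0]!, ['lookupOrder']));
```

A deterministic check like this can be a cheap smoke test on recorded runs before any judge model is involved.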
const result = await supportAgent.optimize(tasks, {
  judgeAI,
  judgeOptions: { description: 'Prefer correct tool use over polished wording.' },
  bootstrap: true,
  maxMetricCalls: 24,
});
supportAgent.applyOptimization(result.optimizedProgram!);
Metrics And Judges
| Scoring path | Use when |
|---|---|
| Deterministic scalar metric | The expected answer or action is clear |
| Multi-objective metric | You need visible tradeoffs such as accuracy vs brevity |
| Plain typed AxGen judge | Non-agent qualitative scoring needs an LLM |
| Built-in agent.optimize(...) judge | Agent behavior needs holistic review |
Normalize scores to 0..1 when possible. Keep objective names stable across calls.
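As an illustration of normalized, stably named objectives, the sketch below scores the flow output on `accuracy` and `brevity`. It assumes a multi-objective metric can return an object keyed by objective name; the word thresholds and the `MetricArgs` type are hypothetical choices, not library defaults.

```typescript
type Priority = 'high' | 'normal' | 'low';

// Hypothetical argument shape for this sketch.
interface MetricArgs {
  prediction: { priority: Priority; rationale: string };
  example: { priority: Priority };
}

// Clamp any raw value into the 0..1 range.
const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Assumed thresholds: full brevity credit at or under 20 words,
// falling linearly to zero credit at 100 words.
const FULL_CREDIT_WORDS = 20;
const ZERO_CREDIT_WORDS = 100;

// Returns one score per objective, using the same keys on every call.
function multiObjectiveMetric({ prediction, example }: MetricArgs): Record<string, number> {
  const words = prediction.rationale.trim().split(/\s+/).filter(Boolean).length;
  const brevity = clamp01(
    (ZERO_CREDIT_WORDS - words) / (ZERO_CREDIT_WORDS - FULL_CREDIT_WORDS),
  );
  return {
    accuracy: prediction.priority === example.priority ? 1 : 0,
    brevity,
  };
}

const scores = multiObjectiveMetric({
  prediction: { priority: 'high', rationale: 'Outage affects all customers.' },
  example: { priority: 'high' },
});
console.log(scores);
```

Keeping both objectives on the same 0..1 scale makes candidates comparable without one objective swamping the other by magnitude alone.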
Bootstrap And GEPA Together
Bootstrap demos are useful for small starter sets because they seed the model with concrete successful examples before GEPA mutates instructions/components. TypeScript optimize(...) composes the practical bootstrap-plus-GEPA path. Generated languages expose the optimizer primitives supported by their AxIR contract.
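The general idea behind bootstrapped demos can be pictured as running the current program over a starter set and keeping only the runs that the metric accepts. The sketch below is a conceptual illustration with a stubbed predictor so that it runs offline; `bootstrapDemos` and `stubPredict` are hypothetical names, and this is not how optimize(...) implements bootstrapping internally.

```typescript
interface Example {
  emailText: string;
  priority: string;
}

type Predict = (input: { emailText: string }) => { priority: string };
type Metric = (args: { prediction: { priority: string }; example: Example }) => number;

// Stand-in for a model call so the sketch runs without network access.
const stubPredict: Predict = ({ emailText }) => ({
  priority: /down|outage|urgent/i.test(emailText) ? 'high' : 'normal',
});

const exactMatch: Metric = ({ prediction, example }) =>
  prediction.priority === example.priority ? 1 : 0;

// Keep up to maxDemos examples whose prediction the metric scores as 1.
function bootstrapDemos(
  examples: Example[],
  predict: Predict,
  metric: Metric,
  maxDemos: number,
): Example[] {
  const demos: Example[] = [];
  for (const example of examples) {
    if (demos.length >= maxDemos) break;
    const prediction = predict({ emailText: example.emailText });
    if (metric({ prediction, example }) === 1) {
      demos.push({ emailText: example.emailText, priority: prediction.priority });
    }
  }
  return demos;
}

// Made-up starter set.
const starterSet: Example[] = [
  { emailText: 'Checkout is down', priority: 'high' },
  { emailText: 'Lunch menu for Friday', priority: 'low' },
  { emailText: 'Please review the Q3 draft', priority: 'normal' },
  { emailText: 'Urgent: invoice overdue', priority: 'high' },
];

const demos = bootstrapDemos(starterSet, stubPredict, exactMatch, 2);
console.log(demos.map((d) => d.emailText));
```

Only verified successes become demos, which is why a small starter set can still seed useful examples before instructions are mutated.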
Artifacts
Optimization output is model-adjacent configuration. Save it, version it, record the examples and metrics used, and apply it through the program or agent API rather than manually patching instructions.
import { axDeserializeOptimizedProgram, axSerializeOptimizedProgram } from "@ax-llm/ax";
const saved = axSerializeOptimizedProgram(result.optimizedProgram!);
const restored = axDeserializeOptimizedProgram(saved);
program.applyOptimization(restored);
Budget Discipline
- Always set maxMetricCalls in docs and examples.
- Use distinct validation examples when selecting a best candidate.
- Start with small numTrials and scale once the metric is stable.
- For trees, inspect optimized component keys so you know what changed.
- Persist artifacts only after a held-out or smoke run proves they help.
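The last rule can be expressed as a small gate that compares baseline and optimized scores on held-out examples before anything is saved. This is a sketch under stated assumptions: `shouldPersist`, `ArtifactRecord`, the 0.02 minimum gain, and the per-example scores are all hypothetical, and the payload is treated as an opaque value.

```typescript
interface HoldoutReport {
  baseline: number;  // mean held-out score before optimization
  optimized: number; // mean held-out score after applying the artifact
}

// Hypothetical record to store alongside a saved artifact.
interface ArtifactRecord {
  version: string;
  metricName: string;
  validationCount: number;
  report: HoldoutReport;
  payload: unknown; // opaque serialized artifact
}

// Mean of per-example scores; returns 0 for an empty list.
function mean(scores: number[]): number {
  if (scores.length === 0) return 0;
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}

// Assumed gate: persist only when the held-out gain reaches minGain.
function shouldPersist(report: HoldoutReport, minGain = 0.02): boolean {
  return report.optimized - report.baseline >= minGain;
}

// Made-up per-example scores from a held-out run.
const report: HoldoutReport = {
  baseline: mean([1, 0, 1, 0]),
  optimized: mean([1, 1, 1, 0]),
};

const record: ArtifactRecord | null = shouldPersist(report)
  ? {
      version: 'v1',
      metricName: 'exact-match priority',
      validationCount: 4,
      report,
      payload: { placeholder: true },
    }
  : null;

console.log(record ? 'persist' : 'discard');
```

Storing the metric name and validation count with the payload keeps a record of what evidence justified each saved version.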
See optimize() GEPA and optimize() API.