GEPA Optimization Codegen Rules (@ax-llm/ax)
This skill helps an LLM generate correct AxGEPA optimization code using @ax-llm/ax. Use when the user asks about AxGEPA, GEPA, Pareto optimization, multi-objective prompt tuning, reflective prompt evolution, validationExamples, maxMetricCalls, or optimizing a generator, flow, or agent tree.
Install
Install only this skill for TypeScript:
npx skills add https://ax-llm.github.io/ax/typescript/ --skill 'ax-gepa'Published skill file: ax-gepa/SKILL.md.
Source
- Source: src/ax/skills/ax-gepa.md
- Version:
22.0.3
Skill Instructions
Use this skill to generate GEPA optimization code. Prefer the top-level optimize(...) helper for normal code, and use direct AxGEPA / AxBootstrapFewShot only when the user needs low-level optimizer control.
Use These Defaults
- Use
optimize(program, train, metric, { studentAI, teacherAI, ... })for normal generator and flow tuning. - Prefer
ai(),ax(), andflow()for new code. - Use a strong
teacherAIand a cheaperstudentAI. - Pass
validationExampleswhen you have a holdout set. - Set
maxMetricCallsto bound optimizer cost;optimize(...)defaults it to100. - Use scalar metrics for one objective and object metrics for Pareto optimization.
- Apply results with
program.applyOptimization(result.optimizedProgram!). - For tree-wide runs, expect
optimizedProgram.componentMap. - Persist artifacts with
axSerializeOptimizedProgram(...)and restore them withaxDeserializeOptimizedProgram(...)so the same flow works in browsers and Node. optimize(...)runsAxBootstrapFewShot -> AxGEPAfor small starter sets by default, preserving the demos inresult.optimizedProgram.demos.
Critical Rules
optimize(...)andAxGEPA.compile()work for a single generator and for tree-aware roots such as flows or agents with registered optimizable descendants.- There is no separate flow-only GEPA optimizer. Use
AxGEPAfor flows too. - The metric may return either
numberorRecord<string, number>. - Keep metrics deterministic and cheap by default.
- Avoid extra LLM calls inside the metric unless the user explicitly wants judge-based evaluation.
- If the user needs LLM-as-judge scoring for a non-agent GEPA run, prefer a plain typed
AxGenevaluator instead of writing a custom judge abstraction. maxMetricCallsmust be large enough to cover the initial validation pass overvalidationExamples.- GEPA optimizes generic string components exposed by
getOptimizableComponents(). If a tree exposes no components, optimization will fail. - Use held-out validation examples for selection. Do not reuse the training set as
validationExamples. result.optimizedProgramis the easy-to-apply best candidate.result.paretoFrontis the full trade-off set for multi-objective runs.- Direct
AxGEPAstill has its ownbootstrapoption, but top-leveloptimize(...)composes the existingAxBootstrapFewShotoptimizer before GEPA instead.
Metric Selection
Choose the evaluation path deliberately:
- Prefer a deterministic metric when correctness can be read directly from
predictionandexample. - Prefer a deterministic metric when cost, latency, recursion depth, or tool count matters.
- Use a plain typed
AxGenevaluator only when the task is genuinely qualitative and hard to score exactly. - For
agent.optimize(...), prefer the built-in judge path instead of manually wrapping a judge metric. Normal agent users usually do not need to settargetormetricat all.
Rule of thumb:
optimize(...)onAxGenor flow: use a metric first, optionally a plain typedAxGenevaluator if needed.agent.optimize(...): use custommetricfor crisp scoring, otherwise let the built-in judge handle scoring. AddjudgeAIplusjudgeOptionsonly when you want a stronger or separate judge model.
Canonical Scalar Pattern
import { ai, ax, optimize, AxAIOpenAIModel } from '@ax-llm/ax';
const student = ai({
name: 'openai',
apiKey: process.env.OPENAI_APIKEY!,
config: { model: AxAIOpenAIModel.GPT4OMini },
});
const teacher = ai({
name: 'openai',
apiKey: process.env.OPENAI_APIKEY!,
config: { model: AxAIOpenAIModel.GPT4O },
});
const classifier = ax(
'emailText:string -> priority:class "high, normal, low", rationale:string'
);
const train = [
{ emailText: 'URGENT: Server down!', priority: 'high' },
{ emailText: 'Weekly newsletter', priority: 'low' },
];
const validation = [
{ emailText: 'Invoice overdue', priority: 'high' },
{ emailText: 'Lunch plans?', priority: 'low' },
];
const metric = ({ prediction, example }: { prediction: any; example: any }) =>
prediction?.priority === example?.priority ? 1 : 0;
const result = await optimize(classifier, train, metric, {
studentAI: student,
teacherAI: teacher,
numTrials: 12,
minibatch: true,
minibatchSize: 4,
earlyStoppingTrials: 4,
sampleCount: 1,
validationExamples: validation,
maxMetricCalls: 120,
});
classifier.applyOptimization(result.optimizedProgram!);
console.log(result.bestScore);Canonical Pareto Pattern
import { ai, flow, optimize, AxAIOpenAIModel } from '@ax-llm/ax';
const student = ai({
name: 'openai',
apiKey: process.env.OPENAI_APIKEY!,
config: { model: AxAIOpenAIModel.GPT4OMini },
});
const teacher = ai({
name: 'openai',
apiKey: process.env.OPENAI_APIKEY!,
config: { model: AxAIOpenAIModel.GPT4O },
});
const wf = flow<{ emailText: string }>()
.n('classifier', 'emailText:string -> priority:class "high, normal, low"')
.n(
'rationale',
'emailText:string, priority:string -> rationale:string "One concise sentence"'
)
.e('classifier', (state) => ({ emailText: state.emailText }))
.e('rationale', (state) => ({
emailText: state.emailText,
priority: state.classifierResult.priority,
}))
.r((state) => ({
priority: state.classifierResult.priority,
rationale: state.rationaleResult.rationale,
}));
const train = [
{ emailText: 'URGENT: Server down!', priority: 'high' },
{ emailText: 'Weekly newsletter', priority: 'low' },
];
const validation = [
{ emailText: 'Invoice overdue', priority: 'high' },
{ emailText: 'Lunch plans?', priority: 'low' },
];
const metric = ({ prediction, example }: { prediction: any; example: any }) => {
const accuracy = prediction?.priority === example?.priority ? 1 : 0;
const rationale = typeof prediction?.rationale === 'string'
? prediction.rationale
: '';
const brevity = rationale.length <= 40 ? 1 : rationale.length <= 80 ? 0.5 : 0.1;
return { accuracy, brevity };
};
const result = await optimize(wf, train, metric, {
studentAI: student,
teacherAI: teacher,
numTrials: 16,
minibatch: true,
minibatchSize: 6,
earlyStoppingTrials: 5,
sampleCount: 1,
validationExamples: validation,
maxMetricCalls: 240,
});
for (const point of result.paretoFront) {
console.log(point.scores, point.configuration);
}
wf.applyOptimization(result.optimizedProgram!);
console.log(result.optimizedProgram?.componentMap);Metric Patterns
// Scalar objective
const scalarMetric = ({ prediction, example }) =>
prediction.answer === example.answer ? 1 : 0;
// Multi-objective
const multiMetric = ({ prediction, example }) => ({
accuracy: prediction.answer === example.answer ? 1 : 0,
brevity:
typeof prediction?.reasoning === 'string' &&
prediction.reasoning.length < 120
? 1
: 0.2,
});- Return plain numbers or plain object literals.
- Keep objective names stable across calls.
- Prefer normalized scores such as
0..1so trade-offs are easy to reason about.
Result Handling
const { optimizedProgram, paretoFront } = result;
program.applyOptimization(optimizedProgram!);
// Save for later
const saved = JSON.stringify(optimizedProgram);
// Load later and re-apply
const loaded = JSON.parse(saved);
program.applyOptimization(loaded);- Single-target runs usually populate both
optimizedProgram.instructionandoptimizedProgram.componentMap. - Tree-wide runs rely on
componentMap, keyed by full component key. - Pareto points expose candidate configs under
point.configuration.componentMap.
Useful Options
const optimizer = new AxGEPA({
studentAI,
teacherAI,
numTrials: 20,
minibatch: true,
minibatchSize: 5,
minibatchFullEvalSteps: 5,
earlyStoppingTrials: 5,
minImprovementThreshold: 0,
sampleCount: 1,
seed: 42,
verbose: true,
});numTrials: number of reflection/evolution rounds.minibatch: reduce per-round evaluation cost.minibatchSize: examples per minibatch.earlyStoppingTrials: stop after repeated non-improvement.minImprovementThreshold: reject tiny gains below this threshold.seed: stabilize sampling during demos and tests.
Budgeting and Validation
- Always create distinct
trainandvalidationExamplesarrays. - Size
maxMetricCallsfor at least one full validation pass plus several rounds. - If the user wants a strict budget, say so explicitly and set
maxMetricCalls. - For expensive trees, start with
auto: 'light'or fewernumTrials, then scale up. - GEPA selects among exposed components using measured accept/reject history, not LLM-generated numeric scores. The LLM proposes component text; metrics decide whether to keep it.
- Function/tool trace reflection is keyed by stable component IDs where available, so function renames do not break saved candidate maps.
Troubleshooting
- Error about
maxMetricCallsbeing too small: increase it until the initial validation pass fits. - Empty or poor Pareto front: verify the metric returns numbers for every example.
- No tree optimization effect: ensure child programs are registered under the root and expose optimizable components.
- Saved optimization applies only partly: use
program.applyOptimization(...), not justsetInstruction(...), socomponentMapreaches the full tree. - Agent target seems too broad: when using
agent.optimize(...), settarget: 'actor','responder','all', or explicit program IDs. The wrapper filters GEPA components to the selected target.
Good Example Targets
/Users/vr/src/ax/src/examples/optimize.ts/Users/vr/src/ax/src/examples/gepa.ts/Users/vr/src/ax/src/examples/gepa-flow.ts/Users/vr/src/ax/src/examples/gepa-train-inference.ts/Users/vr/src/ax/src/examples/gepa-quality-vs-speed-optimization.ts/Users/vr/src/ax/src/examples/axagent-gepa-optimization.ts