Ax Refine And BestOfN

Use this skill when writing or reviewing Ax bestOfN/refine code, reward functions, thresholds, native sample selection, serial attempts, generated advice, and attempt diagnostics.

Install

Install only this skill for TypeScript:

Shell

npx skills add https://ax-llm.github.io/ax/typescript/ --skill 'ax-refine'

Published skill file: ax-refine/SKILL.md.

Source

Source: src/ax/skills/ax-refine.md
Version: 23.0.5

Skill Instructions

Use bestOfN(...) when you can score complete outputs independently. Use refine(...) when failed rounds should produce feedback that changes the next attempt.

Validation And Assertions

Keep schema validation, hard assertions (whole-output and streaming), and reward scoring separate. Each has its own job:

Use schema validation for shape, types, and field-level constraints.
Use addAssert(...) for whole-output hard invariants. Failed assertions feed correction text into the normal retry loop.
Use addStreamingAssert(...) for partial streaming hard invariants. It aborts the current stream attempt as soon as the partial field fails, then feeds correction text into the normal retry loop.
Use bestOfN(...) for complete-candidate selection.
Use refine(...) for reward-scored retry rounds with generated feedback.

APIs

TypeScript

import { bestOfN, refine } from '@ax-llm/ax';

// `program` is an existing Ax program; `score` stands in for your own scoring function.
const selected = bestOfN(program, {
  n: 4,
  threshold: 0.8,
  rewardFn: ({ input, prediction, traces, chatLog }) => score(prediction),
});

const improved = refine(program, {
  rounds: 3,
  samplesPerRound: 2,
  threshold: 0.85,
  rewardDescription: 'Prefer complete, grounded, concise answers.',
  rewardFn: ({ prediction }) => score(prediction),
});

Rules:

forward(...) returns the selected prediction.
streamingForward(...) is unsupported; score complete outputs instead.
getUsage() aggregates usage across attempts.
getTraces() and getChatLog() return the selected attempt’s diagnostics.
getAttempts() returns all attempt metadata, including reward, errors, and advice application.
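The selection behavior described by these rules can be pictured with a small local stand-in. This is a minimal sketch, not Ax's implementation: `pickBestSerial`, `Attempt`, and the canned `candidates` are hypothetical names, the loop is synchronous for brevity, and it assumes a candidate meets the threshold when its reward is greater than or equal to it.

```typescript
// Illustrative only: a local stand-in for a serial best-of-N loop, not Ax's code.
type Attempt<T> = { index: number; prediction?: T; reward?: number; error?: string };

function pickBestSerial<T>(
  generate: (attemptIndex: number) => T,
  rewardFn: (prediction: T) => number,
  n: number,
  threshold?: number,
): { selected: T; attempts: Attempt<T>[] } {
  const attempts: Attempt<T>[] = [];
  let best: { prediction: T; reward: number } | undefined;

  for (let index = 0; index < n; index++) {
    try {
      const prediction = generate(index);
      const reward = rewardFn(prediction);
      attempts.push({ index, prediction, reward });
      if (best === undefined || reward > best.reward) {
        best = { prediction, reward };
      }
      // Assumption: "good enough" means reward >= threshold, so stop early.
      if (threshold !== undefined && reward >= threshold) break;
    } catch (err) {
      // A failed attempt is recorded but does not end the run.
      attempts.push({ index, error: err instanceof Error ? err.message : String(err) });
    }
  }

  if (best === undefined) throw new Error('All attempts failed');
  return { selected: best.prediction, attempts };
}

// Canned candidates take the place of model calls.
const candidates = ['Lyon', 'Paris', 'Paris, France'];
const result = pickBestSerial(
  (i) => candidates[i] ?? '',
  (answer) => (answer === 'Paris' ? 1 : 0),
  3,
  0.8,
);
console.log(result.selected, result.attempts.length); // Paris 2
```

The second attempt meets the threshold, so the third is never run, while the attempt list still records the rejected first candidate.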

Reward Functions

Reward functions return a number. Higher is better. A threshold marks a good-enough candidate and can stop serial attempts early.

TypeScript

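// Maximum reward is 1.2: an exact match (1) plus a conciseness bonus (0.2).
const rewardFn = ({ prediction }: { prediction: { answer: string } }): number => {
  const exact = prediction.answer === 'Paris' ? 1 : 0;
  const concise = prediction.answer.length < 80 ? 0.2 : 0;
  return exact + concise;
};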

Use strategy: "serial" when the reward needs traces, chat logs, tool calls, or full flow behavior, because native samples expose only candidate outputs.
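When a reward combines several checks, normalizing it to the range 0 to 1 can make a threshold easier to reason about. The following is one possible sketch, not an Ax API: `weightedReward`, `Criterion`, and `Answer` are hypothetical names introduced here.

```typescript
// Hypothetical helper, not part of Ax: combine weighted checks into a 0..1 reward.
type Criterion<T> = { weight: number; check: (prediction: T) => boolean };

function weightedReward<T>(criteria: Criterion<T>[]): (prediction: T) => number {
  const total = criteria.reduce((sum, c) => sum + c.weight, 0);
  if (total <= 0) throw new Error('Criteria weights must sum to a positive number');
  return (prediction) => {
    // Sum the weights of every check that passes, then normalize.
    const earned = criteria.reduce(
      (sum, c) => sum + (c.check(prediction) ? c.weight : 0),
      0,
    );
    return earned / total;
  };
}

type Answer = { answer: string };

const scoreAnswer = weightedReward<Answer>([
  { weight: 4, check: (p) => p.answer === 'Paris' }, // correctness dominates
  { weight: 1, check: (p) => p.answer.length < 80 }, // small conciseness bonus
]);

console.log(scoreAnswer({ answer: 'Paris' })); // 1
console.log(scoreAnswer({ answer: 'Lyon' })); // 0.2
```

A function like this could be passed as rewardFn: ({ prediction }) => scoreAnswer(prediction). With these weights, a concise but wrong answer scores 0.2 and cannot reach a threshold of 0.8.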

Strategies

strategy: "auto" uses native samples for AxGen and serial attempts for composite programs.
strategy: "native-samples" uses sampleCount and a reward-backed resultPicker; candidate context includes outputs, not full per-candidate traces.
strategy: "serial" runs isolated full-program attempts with fresh memory/session IDs.
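A reward-backed picker over native samples is essentially an argmax over candidate outputs. The sketch below illustrates that idea only: `Sample` and `pickByReward` are hypothetical, and the shape Ax actually passes to a resultPicker may differ.

```typescript
// Hypothetical shape: one entry per native sample, carrying only its output
// (no per-candidate traces or chat log).
type Sample<T> = { index: number; output: T };

function pickByReward<T>(samples: Sample<T>[], rewardFn: (output: T) => number): number {
  let best: { index: number; reward: number } | undefined;
  for (const sample of samples) {
    const reward = rewardFn(sample.output);
    // Strict ">" keeps the earliest sample when rewards tie.
    if (best === undefined || reward > best.reward) {
      best = { index: sample.index, reward };
    }
  }
  if (best === undefined) throw new Error('No samples to pick from');
  return best.index;
}

const samples: Sample<{ answer: string }>[] = [
  { index: 0, output: { answer: 'It might be Lyon' } },
  { index: 1, output: { answer: 'Paris' } },
  { index: 2, output: { answer: 'Paris' } },
];

const pickedIndex = pickByReward(samples, (o) => (o.answer === 'Paris' ? 1 : 0));
console.log(pickedIndex); // 1
```

Because the picker sees outputs only, a reward that inspects traces or tool calls cannot be computed here.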

Refine Advice

refine(...) generates advice after a below-threshold round. Advice is appended temporarily to matching kind: "instruction" components exposed by getOptimizableComponents() and applied through applyOptimizedComponents().

Rules:

Original instruction values are restored in a finally block, on both success and error.
Programs without instruction components continue as best-of-N rounds and mark adviceApplied: false.
Do not add DSPy-style hint_ signature fields; Ax uses instruction-component advice.
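The append-then-restore rule can be sketched as follows. This is a minimal illustration under assumptions: `InstructionComponent` and `withTemporaryAdvice` are hypothetical, the advice wording is invented, and Ax's real component types may differ.

```typescript
// Hypothetical component shape for illustration; Ax's real types may differ.
type InstructionComponent = { id: string; kind: 'instruction'; value: string };

function withTemporaryAdvice<R>(
  components: InstructionComponent[],
  advice: string,
  run: () => R,
): R {
  // Snapshot the originals so advice never leaks into later runs.
  const snapshot = components.map((component) => ({ component, original: component.value }));
  for (const { component, original } of snapshot) {
    component.value = original + '\n\nAdvice from the previous round: ' + advice;
  }
  try {
    return run();
  } finally {
    // Runs whether run() returned or threw.
    for (const { component, original } of snapshot) {
      component.value = original;
    }
  }
}

const instruction: InstructionComponent = {
  id: 'root',
  kind: 'instruction',
  value: 'Answer the question.',
};

let seenDuringRun = '';
try {
  withTemporaryAdvice([instruction], 'Cite a source.', () => {
    seenDuringRun = instruction.value; // advice is visible during the attempt
    throw new Error('attempt failed');
  });
} catch {
  // The error still reaches the caller; the instruction is already restored.
}
console.log(instruction.value); // Answer the question.
```

Restoring in finally means a thrown attempt cannot leave stale advice on the program.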

Streaming

Do not use refine(...) for streaming. For partial-output safety, use addStreamingAssert(fieldName, fn, message?) on AxGen. Streaming assertions fail fast within the current attempt with AxStreamingAssertionError, then retry with correction feedback when retries remain.
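Fail-fast checking of a partial field can be simulated locally. This is a sketch only: `StreamingCheck` and `consumeWithCheck` are hypothetical, the predicate shape (partial text in, boolean out) is an assumption rather than the documented callback signature, and it returns a result object where Ax raises AxStreamingAssertionError.

```typescript
// Assumed predicate shape: receives the partial field text accumulated so far.
type StreamingCheck = (partial: string) => boolean;

type StreamResult =
  | { ok: true; text: string }
  | { ok: false; message: string; chunksConsumed: number };

function consumeWithCheck(
  chunks: string[],
  check: StreamingCheck,
  message: string,
): StreamResult {
  let partial = '';
  let consumed = 0;
  for (const chunk of chunks) {
    partial += chunk;
    consumed++;
    // Abort as soon as the partial output violates the invariant.
    if (!check(partial)) return { ok: false, message, chunksConsumed: consumed };
  }
  return { ok: true, text: partial };
}

// Invariant for this example: the field must never contain a '#' character.
const noHeadingMarker: StreamingCheck = (partial) => partial.indexOf('#') === -1;

const outcome = consumeWithCheck(
  ['Paris is ', '# the capital', ' of France'],
  noHeadingMarker,
  'Do not use headings.',
);
console.log(outcome);
```

The third chunk is never consumed: the check fails on the second chunk, and the message becomes the correction text for the next attempt.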