LLMs

The ai() layer owns provider clients and model traffic. It keeps Ax programs focused on signatures while one provider surface handles chat, streaming, embeddings, media, usage normalization, thinking controls, routing, balancing, tracing, and provider-specific behavior.

Java
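// Read the key from the environment; the variable name here is an assumption.
String apiKey = System.getenv("OPENAI_API_KEY");
AxAIClient client = Ax.openAICompatible(Map.of(
  "api_key", apiKey,
  "model", "gpt-4.1-mini"
));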

Figure: Provider router map

Provider Setup

Create provider clients near the application boundary, keep keys in environment variables, and pass the client into forward(), agents, flows, or optimizers.
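
A minimal sketch of that boundary, assuming the key lives in an environment variable named OPENAI_API_KEY. The variable name and the ProviderSettings helper are illustrative and not part of Ax. The helper builds the same kind of option map used with Ax.openAICompatible and fails fast when the key is missing.

```java
import java.util.Map;

// Hypothetical helper, not part of Ax: builds provider options from an environment map.
final class ProviderSettings {
    private ProviderSettings() {}

    // The environment is passed in so the lookup can be exercised without real credentials.
    static Map<String, String> fromEnv(Map<String, String> env, String model) {
        String apiKey = env.get("OPENAI_API_KEY"); // assumed variable name
        if (apiKey == null || apiKey.isBlank()) {
            throw new IllegalStateException("OPENAI_API_KEY is not set");
        }
        return Map.of("api_key", apiKey, "model", model);
    }
}
```

At the application boundary this would be called once as ProviderSettings.fromEnv(System.getenv(), "gpt-4.1-mini"), and the resulting map handed to the client factory. Taking the environment as a parameter keeps the key out of source code and makes the missing-key path easy to test.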

Generated Package Provider Path

The Java package exposes the provider surface that AxIR supports. The current generated examples use OpenAI-compatible clients, plus provider-mapping tests that need no API key, so provider normalization can be checked without credentials.

Illustrative: generated-package equivalent. Prefer checked-in package examples for copy/paste runnable code.
Java
AxAIClient client = Ax.openAICompatible(Map.of(
  "api_key", apiKey,
  "model", "gpt-4.1-mini"
));

Use the generated package examples for exact provider API runs, stream mapping, Responses audio mapping, and realtime event folding for this language.

Model Catalog

Use the model catalog before runtime when a UI or router needs model choices, costs, and capabilities. It can filter for text, code, embedding, and audio models.

Illustrative: generated-package equivalent. Prefer checked-in package examples for copy/paste runnable code.
Java
// The bundled model catalog helper is exposed by the TypeScript package.
// Generated packages, including this Java one, publish capability metadata in axir-capabilities.json.

flowchart LR
  A[Model catalog] --> B[Capability filter]
  B --> C[Text]
  B --> D[Embeddings]
  B --> E[Audio]
  C --> F[Route or select model]
  D --> F
  E --> F
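
The capability filter in the diagram can be sketched as follows. The ModelEntry type, its field names, and the cost unit are hypothetical; the real catalog shape and the schema of axir-capabilities.json are not shown in this page, so treat this as an illustration of the filtering step only.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical catalog entry; field names are illustrative, not the Ax catalog schema.
final class ModelEntry {
    final String name;
    final Set<String> capabilities;     // e.g. "text", "code", "embedding", "audio"
    final double costPerMillionTokens;  // placeholder unit for this sketch

    ModelEntry(String name, Set<String> capabilities, double costPerMillionTokens) {
        this.name = name;
        this.capabilities = Set.copyOf(capabilities);
        this.costPerMillionTokens = costPerMillionTokens;
    }
}

final class CatalogFilter {
    private CatalogFilter() {}

    // Returns the entries that support the given capability, cheapest first.
    static List<ModelEntry> withCapability(List<ModelEntry> catalog, String capability) {
        return catalog.stream()
            .filter(m -> m.capabilities.contains(capability))
            .sorted(Comparator.comparingDouble((ModelEntry m) -> m.costPerMillionTokens))
            .collect(Collectors.toList());
    }
}
```

A UI could call withCapability(catalog, "embedding") to populate a picker, while a router could take the first element of the "text" result as its default choice.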

Routing And Balancing

Routers choose a provider by capability, model key, or app policy. Balancers retry across services while preserving the Ax request shape. Use them when latency, quota, cost, rate limits, or provider outages matter.
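
The balancing idea can be sketched with plain functions standing in for provider services. FailoverBalancer is a hypothetical class written for this page, not the Ax balancer; it only shows the core behavior of trying services in order while passing the same request to each one.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of a balancer: try each service in order with the same request.
final class FailoverBalancer<Req, Res> {
    private final List<Function<Req, Res>> services;

    FailoverBalancer(List<Function<Req, Res>> services) {
        if (services.isEmpty()) {
            throw new IllegalArgumentException("at least one service is required");
        }
        this.services = List.copyOf(services);
    }

    Res call(Req request) {
        RuntimeException last = null;
        for (Function<Req, Res> service : services) {
            try {
                // The request object is passed through unchanged to every service.
                return service.apply(request);
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next service
            }
        }
        throw new IllegalStateException("all services failed", last);
    }
}
```

A production balancer would typically add backoff, distinguish retryable errors such as rate limits from permanent ones, and record which service answered; those concerns are left out here to keep the sketch short.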

Embeddings

Embeddings live on the same provider client surface. Use them for retrieval indexes, memory search, context lookup, and similarity workflows while keeping embedding model selection separate from generation model selection.

Illustrative: generated-package equivalent. Prefer checked-in package examples for copy/paste runnable code.
Java
// Implement embedding calls through the generated AxAI client surface when present.
// Use package conformance coverage to confirm current support for this language.
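
Once vectors come back from an embedding call, the similarity workflows mentioned above usually reduce to comparing vectors. The sketch below shows cosine similarity and a linear nearest-neighbor scan over plain double arrays; EmbeddingSearch is a hypothetical helper, and how the vectors are obtained from the client is deliberately left out.

```java
import java.util.List;

// Hypothetical helper for comparing embedding vectors; independent of any provider client.
final class EmbeddingSearch {
    private EmbeddingSearch() {}

    // Cosine similarity of two vectors of equal length; returns 0.0 if either is all zeros.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Index of the stored vector most similar to the query, or -1 for an empty index.
    static int nearest(double[] query, List<double[]> index) {
        int best = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < index.size(); i++) {
            double score = cosine(query, index.get(i));
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }
}
```

A linear scan is fine for small memory stores; larger retrieval indexes would normally use a vector database or an approximate nearest-neighbor structure instead.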

Audio, Realtime, And Responses

Ax maps batch transcription, batch speech, conversational audio, OpenAI Responses audio, and realtime event folding where supported. Direct ax(...) programs can pass media to compatible models; agents usually transcribe audio before planner/executor/responder stages.

Illustrative: generated-package equivalent. Prefer checked-in package examples for copy/paste runnable code.
Java
// See audio_responses_mapping and realtime_audio_events examples for this package.
// They show batch audio mapping and realtime event normalization.

Thinking And Context Caching

Thinking controls expose provider-specific reasoning budgets through one Ax option. Context caching marks stable prompt regions so providers with prefix caching can reuse expensive context.

Illustrative: generated-package equivalent. Prefer checked-in package examples for copy/paste runnable code.
Java
// Thinking budgets are provider-specific runtime options.
// Trace usage and provider metadata before relying on a budget in production.

flowchart TB
  A[Stable context field] --> B[Cache breakpoint]
  C[User query] --> D[Generation]
  B --> D
  E[thinkingTokenBudget] --> D
  D --> F[Usage + trace]
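
Prefix caching can only reuse the leading part of a prompt that is identical across requests, which is why the stable context sits before the cache breakpoint and the user query after it. The sketch below makes that concrete by measuring the shared prefix of two prompts. PromptPrefix is a hypothetical helper for illustration; it does not call any provider and does not reflect how Ax marks cache breakpoints internally.

```java
// Hypothetical helper that illustrates why stable content should come first in a prompt.
final class PromptPrefix {
    private PromptPrefix() {}

    // Length of the leading run of characters shared by both prompts.
    static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    // Stable context first, volatile user query last.
    static String stableFirst(String stableContext, String userQuery) {
        return stableContext + "\n\n" + userQuery;
    }

    // The opposite ordering, shown only for comparison.
    static String queryFirst(String stableContext, String userQuery) {
        return userQuery + "\n\n" + stableContext;
    }
}
```

With the same stable context and two different queries, the stableFirst prompts share at least the full length of the stable context, while the queryFirst prompts diverge as soon as the queries differ, leaving little or nothing for a prefix cache to reuse.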

Production Notes

  • Keep provider keys outside source code.
  • Prefer model aliases like fast, smart, or cheap when app callers should not know provider model IDs.
  • Trace request latency, retries, token usage, cost, route choice, media mode, and model key.
  • Keep provider-api examples separate from no-key examples.
  • Use OpenAI-compatible clients for generated-language package examples when that is the supported provider path.
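
The alias recommendation above can be sketched as a small lookup that callers go through instead of naming provider model IDs directly. ModelAliases is a hypothetical class, and the model IDs passed to it are whatever the application configures; nothing here is an Ax API.

```java
import java.util.Map;

// Hypothetical alias table: callers ask for "fast" or "smart", never a provider model ID.
final class ModelAliases {
    private final Map<String, String> aliasToModel;

    ModelAliases(Map<String, String> aliasToModel) {
        this.aliasToModel = Map.copyOf(aliasToModel);
    }

    // Resolves an alias to its configured model ID, rejecting unknown aliases loudly.
    String resolve(String alias) {
        String model = aliasToModel.get(alias);
        if (model == null) {
            throw new IllegalArgumentException("unknown model alias: " + alias);
        }
        return model;
    }
}
```

Keeping the table in configuration means a model swap changes one entry rather than every call site, and the resolved model key can still be recorded in traces alongside the alias.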

See ai() LLM models and ai() API.
