ai() LLM Models

Use ai() to create provider clients and keep model traffic behind one Ax request shape.

Python

import os

from axllm import OpenAICompatibleClient, ax

api_key = os.environ["OPENAI_APIKEY"]  # read the key from the environment

client = OpenAICompatibleClient(api_key=api_key, model="gpt-4.1-mini")
program = ax("question:string -> answer:string")
out = program.forward(client, {"question": "What is Ax?"})

What It Does

ai() selects a provider implementation from configuration and returns a client that Ax programs can call. The client handles chat, streaming, embeddings, media where supported, usage normalization, provider options, model keys, routing hooks, tracing, and runtime defaults.

flowchart LR
  A["Model key or alias"] --> B["Model catalog"]
  B --> C["Capability filter"]
  C --> D["Provider client"]
  D --> E["Request mapping"]
  E --> F["Provider API"]
  F --> G["Response normalization"]
  G --> H["Usage + trace"]

Core Call Shape

Create the client once near the application boundary, then pass it into forward(), streamingForward(), agents, flows, or optimizers.

text

client = ai(provider options)
result = program.forward(client, inputs)

Common Patterns

Use a provider name and environment-backed API key.
Set a default model in provider config when the app has one obvious model.
Define model aliases when callers should choose fast, smart, or cheap instead of provider model IDs.
Use OpenAI-compatible apiURL for compatible providers.
Use model catalog helpers before runtime when the UI needs provider/model selectors.
Use routers or balancers when provider fallback is part of the product.
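The alias pattern in the list above can be sketched as a small lookup table. This is an illustration in plain Python, not the Ax alias configuration: `ALIASES` and `resolve_model` are hypothetical names, and every model ID except `gpt-4.1-mini` is a placeholder.

```python
# Hypothetical alias table: callers choose a logical name, never a provider model ID.
ALIASES = {
    "fast": "gpt-4.1-mini",
    "smart": "placeholder-large-model",   # placeholder ID, not a real model
    "cheap": "placeholder-small-model",   # placeholder ID, not a real model
}


def resolve_model(alias: str) -> str:
    """Map a logical alias to a provider model ID, failing loudly on typos."""
    try:
        return ALIASES[alias]
    except KeyError:
        known = ", ".join(sorted(ALIASES))
        raise ValueError(f"unknown model alias {alias!r}; known aliases: {known}") from None


model_id = resolve_model("fast")
```

With this shape, swapping the model behind `fast` is a one-line configuration change and no caller needs to be touched.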

Adaptive balancing

AxBalancer keeps its existing ordered failover behavior by default. Set strategy.type to adaptive to rank equivalent providers per chat request using learned reliability, successful latency, a deadline, and estimated cost. Configure badOutcomeCost in the same currency or unit as the route cost estimate.
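One way to see why badOutcomeCost must share a unit with the route cost estimate is a toy expected-cost ranking. The sketch below is an assumption for illustration only and is not AxBalancer's scoring: `RouteStats`, `expected_cost`, and `rank_routes` are hypothetical, and the deadline handling is deliberately crude.

```python
from dataclasses import dataclass


@dataclass
class RouteStats:
    key: str
    successes: int
    failures: int
    mean_success_latency_ms: float
    est_cost: float  # per-request estimate, same unit as bad_outcome_cost


def success_probability(s: RouteStats) -> float:
    # Laplace smoothing: a route with no history starts at 0.5.
    return (s.successes + 1) / (s.successes + s.failures + 2)


def expected_cost(s: RouteStats, deadline_ms: float, bad_outcome_cost: float) -> float:
    p = success_probability(s)
    if s.mean_success_latency_ms > deadline_ms:
        p = 0.0  # crude assumption: a route that usually misses the deadline counts as a bad outcome
    return s.est_cost + (1.0 - p) * bad_outcome_cost


def rank_routes(routes, deadline_ms, bad_outcome_cost):
    """Return routes ordered from lowest to highest expected cost."""
    return sorted(routes, key=lambda s: expected_cost(s, deadline_ms, bad_outcome_cost))


routes = [
    RouteStats("a", successes=98, failures=2, mean_success_latency_ms=800, est_cost=0.010),
    RouteStats("b", successes=80, failures=20, mean_success_latency_ms=300, est_cost=0.002),
    RouteStats("c", successes=50, failures=0, mean_success_latency_ms=5000, est_cost=0.001),
]
ranked = [r.key for r in rank_routes(routes, deadline_ms=2000, bad_outcome_cost=0.05)]
```

In this toy model, a bad-outcome cost of 0.05 ranks the reliable route `a` first, while lowering it to 0.01 puts the cheaper but flakier route `b` ahead. A value expressed in the wrong unit would skew the ranking in the same way.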

Use the native stats-store option for authoritative decision state. The built-in in-memory store can be shared by balancers in one process; multi-process applications can implement AxBalancerStatsStore with an atomic Redis or database update. The routing-event hook is best-effort telemetry, not routing state. Stable route keys are required with a shared store, and namespace plus slice keep unrelated traffic from learning from each other.
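The roles of route key, namespace, and slice can be illustrated with a small thread-safe counter store. This is a hypothetical stand-in and does not reproduce the AxBalancerStatsStore interface: the class name, method names, and key layout are assumptions. A multi-process version would replace the lock with an atomic Redis or database update.

```python
import threading
from collections import defaultdict


class InMemoryOutcomeStore:
    """Hypothetical store: counts outcomes per (namespace, slice, route key)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = defaultdict(lambda: {"ok": 0, "fail": 0})

    @staticmethod
    def make_key(namespace: str, slice_name: str, route_key: str) -> str:
        # The route key must be stable across restarts, or history is lost.
        return f"{namespace}:{slice_name}:{route_key}"

    def record(self, namespace: str, slice_name: str, route_key: str, ok: bool) -> None:
        key = self.make_key(namespace, slice_name, route_key)
        with self._lock:  # read-modify-write happens as one atomic step
            self._counts[key]["ok" if ok else "fail"] += 1

    def snapshot(self, namespace: str, slice_name: str, route_key: str) -> dict:
        key = self.make_key(namespace, slice_name, route_key)
        with self._lock:
            return dict(self._counts.get(key, {"ok": 0, "fail": 0}))


store = InMemoryOutcomeStore()
store.record("checkout", "chat", "openai-primary", ok=True)
store.record("search", "chat", "openai-primary", ok=False)
```

Because the namespace is part of the key, the failure recorded for `search` does not affect what `checkout` has learned about the same route.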

Adaptive balancing does not inspect prompt meaning or decide which model is best for a task. The application defines acceptable substitutes through shared logical aliases.

Provider clients

Generated Package Provider Path

The Python package exposes the AxIR-supported provider surface. Public examples use OpenAI-compatible clients, while internal fixtures cover provider normalization without credentials.

Illustrative: generated-package equivalent. Prefer checked-in package examples for copy/paste runnable code.

Python

import os

from axllm import OpenAICompatibleClient, ax

api_key = os.environ["OPENAI_APIKEY"]  # read the key from the environment

client = OpenAICompatibleClient(api_key=api_key, model="gpt-4.1-mini")
program = ax("question:string -> answer:string")
out = program.forward(client, {"question": "What is Ax?"})

Use the generated package examples for exact provider API runs, stream mapping, Responses audio mapping, and realtime event folding for this language.

Embeddings and audio

Illustrative: generated-package equivalent. Prefer checked-in package examples for copy/paste runnable code.

Python

# Implement embedding calls through the generated AxAI client surface when present.
# Use package conformance coverage to confirm current support for this language.

Illustrative: generated-package equivalent. Prefer checked-in package examples for copy/paste runnable code.

Python

import os

from axllm import OpenAIResponsesClient  # import path assumed; confirm against the package examples

# Realtime audio over WebSocket: pip install axllm[realtime]
client = OpenAIResponsesClient(model="gpt-realtime-2", api_key=os.environ["OPENAI_APIKEY"])
request = {"model": "gpt-realtime-2", "chat_prompt": [{"role": "user", "content": "Say hello."}], "audio": {"output": {"voice": "alloy"}}}
final = client.realtime_chat(request)  # one merged turn: transcript + base64 PCM audio
# Realtime models also route transparently through chat(), and chat() accepts input_audio parts.
# transcribe() and speak() do batch STT and TTS.

Practical Notes

Prefer provider factories over direct provider classes in new code.
Use model catalog and provider-scoring helpers when choosing between providers.
Use a multi-service router to dispatch caller-selected model keys; use a balancer for fallback or adaptive operational routing across equivalent services.
Keep public provider examples separate from internal conformance fixtures.
Trace provider requests, token usage, estimated cost, and routing decisions in production.
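The router and balancer roles in the notes above differ in who chooses the service. The sketch below contrasts them in plain Python with stand-in callables. It is an illustration under stated assumptions, not the Ax router or AxBalancer implementation: `route_by_key`, `call_with_failover`, and the fake services are hypothetical.

```python
from typing import Callable, Dict, List

Service = Callable[[str], str]  # stand-in for a provider client call


def route_by_key(services: Dict[str, Service], model_key: str, prompt: str) -> str:
    """Router: the caller picks the service through a model key."""
    if model_key not in services:
        raise KeyError(f"no service registered for model key {model_key!r}")
    return services[model_key](prompt)


def call_with_failover(services: List[Service], prompt: str) -> str:
    """Balancer with ordered failover: try equivalent services in order."""
    errors = []
    for service in services:
        try:
            return service(prompt)
        except Exception as exc:  # a real balancer would only retry retryable errors
            errors.append(exc)
    last = errors[-1] if errors else None
    raise RuntimeError(f"all {len(errors)} services failed") from last


def flaky(prompt: str) -> str:
    raise TimeoutError("primary timed out")


def backup(prompt: str) -> str:
    return f"backup answered: {prompt}"


answer = call_with_failover([flaky, backup], "What is Ax?")
routed = route_by_key({"fast": backup}, "fast", "What is Ax?")
```

Failover only makes sense when the services are acceptable substitutes for each other, which is why the two roles are kept separate.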

See ai() API.