Models

ostk is model-agnostic. Six production providers, one local runtime, one on-device path. Every provider speaks through the same CpuDriver trait — but they don't all support the same features. This page tells you what works where.

Anthropic FULL
MODELS
claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5
CONTEXT / WIRE
200k tokens — Native
FEATURES
Tool use, extended thinking, streaming, prompt caching, vision, file upload, batch API, token counting, model listing, speed mode, citations
API KEY
ANTHROPIC_API_KEY
SOURCE
src/cpu/anthropic.rs:257–297
Google Gemini FULL
MODELS
gemini-2.5-pro, gemini-2.0-flash, gemini-3.1-pro, gemini-3-flash
CONTEXT / WIRE
1M tokens — OpenAI-compat
FEATURES
Tool use, thinking (auto for 3.x models, thinkingLevel: HIGH), streaming. No client-side prompt caching — server-side only.
API KEY
GEMINI_API_KEY (or GOOGLE_API_KEY)
SOURCE
src/cpu/gemini.rs:32–70, 299–319
Mistral FULL
MODELS
mistral-large-latest, codestral-latest, devstral-small, devstral-medium-2507, magistral-*
CONTEXT / WIRE
256k tokens — OpenAI-compat
FEATURES
Tool use, streaming. No thinking mode. ostk handles Mistral-specific quirks: tool name injection (required by API), tool ID remapping (9 alphanumeric chars).
API KEY
MISTRAL_API_KEY
SOURCE
src/cpu/mistral.rs:24–45, 145–220
OpenAI PRODUCTION
MODELS
gpt-4o, o1-*, o3-pro, o4-*
CONTEXT / WIRE
128k–200k tokens — OpenAI-compat
FEATURES
Tool use, streaming. No prompt caching, no token counting. Reasoning models (o1/o3/o4) have built-in thinking — not controlled by ostk.
API KEY
OPENAI_API_KEY
SOURCE
src/cpu/openrouter.rs (shared driver), model_registry.rs:84–117
OpenRouter GATEWAY
MODELS
Any provider/model (meta-llama/*, deepseek/*, etc.)
CONTEXT / WIRE
Model-dependent — OpenAI-compat
FEATURES
Tool use, streaming. Fallback for any model not matched by a dedicated driver. Sends HTTP-Referer: https://ostk.ai.
API KEY
OPENROUTER_API_KEY
SOURCE
src/cpu/openrouter.rs:12–51
Ollama (local) LOCAL
MODELS
local/codestral:22b, local/qwen2.5-coder:32b, local/llama3.3:70b, local/deepseek-r1:70b, local/gemma3:27b
CONTEXT / WIRE
Model-dependent — OpenAI-compat
FEATURES
Tool use, streaming. No prompt caching, no token counting. Models with ":" separator auto-route here.
API KEY
None required (OLLAMA_HOST to override endpoint)
SOURCE
src/cpu/mod.rs:437–454, providers.rs:62–68
Apple on-device EXPERIMENTAL
MODELS
apple/default
CONTEXT / WIRE
4k tokens — OpenAI-compat
FEATURES
Streaming only. macOS only. Routes to the olleh local service on port 11941. Suitable for preprocessing, not primary agent work.
API KEY
None (APPLE_MODEL_HOST to override)
SOURCE
src/cpu/mod.rs:439–446, model_registry.rs:196–201
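
The routing hints in the cards above (the apple/ and local/ prefixes, the colon rule for Ollama, OpenRouter as the catch-all) can be pictured as a single name-matching function. This is only a sketch under stated assumptions: the Provider enum, the route function, and the prefix lists are hypothetical and are not taken from src/cpu/mod.rs or model_registry.rs. The real registry may match names differently or in a different order.

```rust
/// Hypothetical provider tags for this sketch; not ostk's real types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    Anthropic,
    Gemini,
    Mistral,
    OpenAi,
    OpenRouter,
    Ollama,
    Apple,
}

/// Guess a provider from a model name using only the hints on this page.
/// Prefix matching is an assumption made for illustration.
pub fn route(model: &str) -> Provider {
    // The on-device path uses the `apple/` prefix (e.g. `apple/default`).
    if model.starts_with("apple/") {
        return Provider::Apple;
    }
    // A `local/` prefix or an Ollama-style `name:tag` both mean Ollama.
    if model.starts_with("local/") || model.contains(':') {
        return Provider::Ollama;
    }
    if model.starts_with("claude-") {
        return Provider::Anthropic;
    }
    if model.starts_with("gemini-") {
        return Provider::Gemini;
    }
    // Assumed prefixes, derived from the model names listed in the cards.
    const MISTRAL: [&str; 4] = ["mistral-", "codestral-", "devstral-", "magistral-"];
    if MISTRAL.iter().any(|p| model.starts_with(*p)) {
        return Provider::Mistral;
    }
    const OPENAI: [&str; 4] = ["gpt-", "o1-", "o3-", "o4-"];
    if OPENAI.iter().any(|p| model.starts_with(*p)) {
        return Provider::OpenAi;
    }
    // Anything not matched by a dedicated driver falls through to the gateway.
    Provider::OpenRouter
}
```

Under these assumptions, route("codestral:22b") yields Provider::Ollama while route("codestral-latest") yields Provider::Mistral. The colon check runs before the vendor prefixes, so a tagged name is always treated as local.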

How FROM auto Picks a Model

When an Agentfile says FROM auto (or no FROM at all), ostk scores available models based on which API keys are present. Source: src/commands/run.rs:89–197.

01 Runtime override: staging/preferred_model (set by :model in TUI)
02 HUMANFILE MODEL directive (highest-priority static config)
03 HUMANFILE FALLBACK directive (secondary static config)
04 Environment scoring: scan for API keys and rank by capability. claude-opus-4-6 + ANTHROPIC_API_KEY scores 100; claude-sonnet-4-6 scores 90; gpt-4o + OPENAI_API_KEY scores 60; gemini-2.0-flash + GEMINI_API_KEY scores 50.
05 Default fallback: claude-sonnet-4-6 (run.rs:196)

OSTK_MODEL env var overrides everything — if set, it's used regardless of HUMANFILE or FROM. Source: src/cpu/context.rs:116.
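
The resolution order above, with OSTK_MODEL checked first, can be sketched as a small function. This is a hedged illustration, not the code in src/commands/run.rs: the Inputs struct, its field names, and resolve_auto are hypothetical. The sketch also assumes that HUMANFILE FALLBACK is consulted only when MODEL is absent, which this page does not state. The scores are the four quoted in step 04.

```rust
use std::collections::HashSet;

/// Inputs to the sketch. Every field name here is hypothetical.
#[derive(Default)]
pub struct Inputs<'a> {
    pub ostk_model: Option<&'a str>,         // OSTK_MODEL env var
    pub preferred_model: Option<&'a str>,    // staging/preferred_model
    pub humanfile_model: Option<&'a str>,    // HUMANFILE MODEL
    pub humanfile_fallback: Option<&'a str>, // HUMANFILE FALLBACK
    pub env_keys: HashSet<&'a str>,          // names of API-key env vars that are set
}

/// (model, required key, score), copied from the scores quoted in step 04.
const SCORED: [(&str, &str, u32); 4] = [
    ("claude-opus-4-6", "ANTHROPIC_API_KEY", 100),
    ("claude-sonnet-4-6", "ANTHROPIC_API_KEY", 90),
    ("gpt-4o", "OPENAI_API_KEY", 60),
    ("gemini-2.0-flash", "GEMINI_API_KEY", 50),
];

/// Resolve `FROM auto` to a model name.
pub fn resolve_auto(i: &Inputs<'_>) -> String {
    // Explicit settings win, in the order described on this page.
    let explicit = i
        .ostk_model
        .or(i.preferred_model)
        .or(i.humanfile_model)
        .or(i.humanfile_fallback); // assumption: used only if MODEL is absent
    if let Some(model) = explicit {
        return model.to_string();
    }
    // Otherwise pick the highest-scoring model whose API key is present,
    // falling back to the documented default when no key matches.
    SCORED
        .iter()
        .filter(|(_, key, _)| i.env_keys.contains(*key))
        .max_by_key(|(_, _, score)| *score)
        .map(|(model, _, _)| model.to_string())
        .unwrap_or_else(|| "claude-sonnet-4-6".to_string())
}
```

With nothing configured and only OPENAI_API_KEY and GEMINI_API_KEY present, this sketch returns gpt-4o; adding ANTHROPIC_API_KEY moves the result to claude-opus-4-6.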

Four places to set a model, in priority order:

01 OSTK_MODEL env var: Session-wide override. Ignores everything else. Example: OSTK_MODEL=gemini-2.5-pro ostk run agents/worker.af
02 FROM <model> in Agentfile: Per-agent. The agent always runs this model. Example: FROM claude-sonnet-4-6
03 HUMANFILE MODEL directive: Project-wide default. Applies when FROM is auto or absent. Example: MODEL claude-opus-4-6
04 --model flag on ostk kernel spawn: Per-spawn override. Takes precedence over Agentfile FROM auto. Example: ostk kernel spawn worker --model gemini-2.5-pro "task"

Not every provider supports every feature. The CpuDriver trait provides a common surface, but the underlying APIs vary. This matrix shows what actually works per provider at the driver level.

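| Feature | Anthropic | Gemini | Mistral | OpenAI | Ollama |
|---|---|---|---|---|---|
| Tool use | ✓ | ✓ | ✓ | ✓ | ✓ |
| Streaming | ✓ | ✓ | ✓ | ✓ | ✓ |
| Extended thinking | ✓ | ✓ (3.x) | ✗ | built-in* | |
| Prompt caching | ✓ | server | | ✗ | ✗ |
| Vision/images | ✓ | | | | model |
| File upload API | ✓ | | | | |
| Token counting | ✓ | | | ✗ | ✗ |
| Batch API | ✓ | | | | |
| Speed mode | ✓ | | | | |
| Citations | ✓ | | | | |

✓ = ostk driver implements it. ✗ = not available in the driver. server = handled server-side, no client config. built-in* = reasoning is a model behavior, not a driver feature (o1/o3/o4 always think). model = depends on the specific model loaded in Ollama.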

Gemini 3.x thinking: When the model name contains "3.1" or "3-", the driver automatically sends thinkingConfig.includeThoughts: true and thinkingLevel: "HIGH". Gemini 2.x models get standard config only. Source: gemini.rs:105–121.
Mistral tool ID remapping: Mistral requires tool call IDs to be exactly 9 alphanumeric characters. Anthropic emits longer IDs with a toolu_ prefix, so ostk remaps them transparently. Source: mistral.rs:191–220.
Ollama auto-detection: Any model name containing a colon (e.g., codestral:22b) is auto-routed to Ollama, even when the local/ prefix is omitted. Source: model_registry.rs:251–253.
OpenAI via OpenRouter: If OPENAI_API_KEY is not set but OPENROUTER_API_KEY is, OpenAI models (gpt-4o, o1-*, o3-*) route through OpenRouter automatically. Source: providers.rs:76–81.
DeepSeek R1: Reasoning model routed via OpenRouter with special format mapping at model_registry.rs:391–398. Handles the reasoning_content field in responses.
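
Two of the driver quirks above are mechanical enough to sketch: the Gemini 3.x name check and the Mistral tool ID remapping. The code below is an illustration only. gemini_wants_thinking and ToolIdMap are hypothetical names, and the zero-padded counter is an assumed ID scheme chosen because it trivially satisfies the 9-alphanumeric-character rule. The scheme actually used in mistral.rs is not described on this page.

```rust
use std::collections::HashMap;

/// Gemini 3.x detection as described above: the name contains "3.1" or "3-".
/// Meant to run only inside the Gemini driver; a name like "o3-pro" would
/// also match, so it is not a general-purpose check.
pub fn gemini_wants_thinking(model: &str) -> bool {
    model.contains("3.1") || model.contains("3-")
}

/// Hypothetical remapper: hands out 9-character alphanumeric IDs and
/// remembers which original ID each one stands for.
#[derive(Default)]
pub struct ToolIdMap {
    to_short: HashMap<String, String>,
    to_original: HashMap<String, String>,
}

impl ToolIdMap {
    /// Return the short ID for `original`, creating one on first use.
    pub fn shorten(&mut self, original: &str) -> String {
        if let Some(short) = self.to_short.get(original) {
            return short.clone();
        }
        // Zero-padded counter: exactly 9 ASCII digits for the first 10^9 IDs.
        let short = format!("{:09}", self.to_short.len());
        self.to_short.insert(original.to_string(), short.clone());
        self.to_original.insert(short.clone(), original.to_string());
        short
    }

    /// Map a short ID seen in a Mistral response back to the original ID.
    pub fn restore(&self, short: &str) -> Option<&str> {
        self.to_original.get(short).map(String::as_str)
    }
}
```

Keeping both directions in the map means the same original ID always shortens to the same value, and a tool result coming back from Mistral can be matched to the call that produced it.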
HUMANFILE
MODEL claude-sonnet-4-6
FALLBACK gemini-2.5-pro
agents/cheap-lint.af
FROM local/codestral:22b
PROMPT Run eslint, fix warnings.
TOOL shell
LIMIT budget_usd 0

The HUMANFILE sets the project default (Sonnet) with a fallback (Gemini). The lint agent pins a local model via FROM — free, fast, and sufficient for the task. The budget is $0 because local models have no API cost. Model mixing is the normal case, not an edge case.