Context & State Management

LLM context is a finite, expensive resource. The ostk kernel manages it through a multi-layered pipeline: compressing command outputs, caching stable prefixes, providing peripheral awareness via digest envelopes, injecting inline diagnostics, and declaring modular capabilities.

The Context Lifecycle Loop

Every agent turn passes through six distinct stages. Removing any stage degrades cache performance and accelerates context growth.

01 PRELOAD_RENDER
The kernel compiles the base prompt, identity (.language), registers, and Agentfile context. These blocks are byte-stable across turns to maximize prompt cache hits. Volatile elements are appended after the cache boundaries.

02 TOOL_CALL
The model invokes a tool. Command output passes through the Squasher pipeline before entering context. File reads pass through elision logic, and the peripheral digest envelope is appended.

03 OUTPUT_COMPRESSION
ANSI progress indicators are stripped, lines are classified as hazard, outcome, or noise by grammar rules, and repetitive output is collapsed. Output shrinks by about 30% on average.

04 DIGEST_INJECTION
A 5-line envelope is appended to the tool response containing [procs], [presence], [files] (stale only), [loadavg], and [meminfo]. It tells the agent whether files need re-reading without re-sending file content.

05 304_ELISION
If the agent re-reads a file that the digest indicates is unchanged, the kernel checks the generation table and returns a short "[304] path:gen=N (current)" message instead of the full content.

06 DRAIN_SNAPSHOT
At the turn boundary, session messages, token counts, and configuration are persisted to `.ostk/drain/<lineage_id>.json`. This enables crash recovery and hot-rehydration.

Prompt Caching & Fleet Economics

81% Cost Reduction

Anthropic's prompt cache is content-addressed and scoped per organization. Multiple ostk agent sessions sharing the same API key reuse cache entries automatically, which sharply reduces the cost of parallel workers. Fleet statistics aggregated from real-world codebase runs (over 1.18 billion tokens) show an 81.0% cost reduction, sustained by high cache read rates.

CONTENT-ADDRESSED

The cache key is the hash of the byte content up to the breakpoint. Identical base system blocks yield instant cache hits.

FLEET ECONOMICS

One process pays the initial `cache_create` write fee; subsequent reader processes access the warm prefix at 10% of the normal input cost.

5-MINUTE TTL

Cache entries expire after 5 minutes of inactivity. Periodic turns from parallel workers keep the cache warm indefinitely.
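The fleet arithmetic above can be sketched as a relative-cost calculation. The 0.10 read multiplier comes from the text; the function name is hypothetical, and ignoring the one-time `cache_create` write fee is a simplifying assumption of this sketch.

```rust
/// Fraction of the uncached input price paid when `cached` tokens are read
/// from a warm prefix and `fresh` tokens are billed at the normal rate.
/// Assumption: the one-time cache write fee is ignored.
fn relative_input_cost(fresh: u64, cached: u64) -> f64 {
    const CACHE_READ_MULTIPLIER: f64 = 0.10; // "10% of the normal input cost"
    let total = (fresh + cached) as f64;
    if total == 0.0 {
        return 0.0;
    }
    (fresh as f64 + CACHE_READ_MULTIPLIER * cached as f64) / total
}
```

Under this simplified model, a turn with 100 fresh tokens and 900 cached tokens costs 0.19 of the uncached price. The fleet's actual read rate is not stated here, so this is an illustration of the mechanism and not a derivation of the reported figure.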

The kernel positions up to four cache_control breakpoints: base system prompt, preload context (identity, registers), tool definitions, and message history watermark. Volatile data is placed at the end to keep the cache prefix stable.
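A minimal sketch of why byte-stable prefixes matter, assuming the cache key behaves like a running hash over everything up to each breakpoint. The provider's real hash function is not documented here; `DefaultHasher` and the `prefix_keys` name are stand-ins for illustration only.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Return one key per breakpoint: the hash of all bytes up to and including
/// that block. A change in any block alters its key and every later key,
/// but never an earlier one.
fn prefix_keys(blocks: &[&str]) -> Vec<u64> {
    let mut hasher = DefaultHasher::new();
    let mut keys = Vec::with_capacity(blocks.len());
    for block in blocks {
        hasher.write(block.as_bytes());
        keys.push(hasher.finish()); // finish() does not reset the hasher
    }
    keys
}
```

Two turns that differ only in their final, volatile block share every earlier key, which is the reason volatile data is kept at the end.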

Output Compression (The Squasher)

The Squasher pipeline intercepts shell and subprocess output. It strips progress indicators, classifies lines into priority buckets, collapses repeated sequences, and appends diagnostics.

Prompt Compression Pipeline

Stdout vs. Stderr Stream Routing

The Squasher handles streams according to their content type. Stderr is preserved with high fidelity, bypassing aggressive collapsing to protect warnings and compiler panics. Stdout is condensed aggressively using Levenshtein distance matching. On non-zero exit codes, the compression threshold relaxes to provide sufficient surrounding lines for debugging.

Three-Tier Classification

Lines are evaluated against rules and grouped into:

  • HAZARD (High Priority): Deprecations, lock timeouts, and warnings. Always preserved.
  • OUTCOME (Medium Priority): Build metrics, test counts, and exit states. Preserved verbatim.
  • NOISE (Low Priority): Iterative logs, progress bars, and dividers. Subject to immediate Levenshtein collapsing.
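The three tiers can be sketched as a first-match classifier. The keyword lists below are hypothetical placeholders chosen for illustration; the actual rule set is not specified in this document.

```rust
#[derive(Debug, PartialEq)]
enum Tier {
    Hazard,
    Outcome,
    Noise,
}

/// Classify one output line. Hazard rules are checked first so that a line
/// matching both a hazard and an outcome keyword is never downgraded.
/// The keywords are assumptions, not the kernel's real rules.
fn classify(line: &str) -> Tier {
    const HAZARD: [&str; 3] = ["warning", "deprecat", "timeout"];
    const OUTCOME: [&str; 3] = ["finished", "passed", "exit code"];
    let lower = line.to_lowercase();
    if HAZARD.iter().any(|k| lower.contains(*k)) {
        Tier::Hazard
    } else if OUTCOME.iter().any(|k| lower.contains(*k)) {
        Tier::Outcome
    } else {
        Tier::Noise
    }
}
```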

Levenshtein Line Merging (Similarity = 0.85)

Adjacent lines with matching structural shape are merged using a sliding window Levenshtein algorithm. Dynamic tokens (versions, hashes, paths) are generalized into tags like {hex}, {path}, and {ver}. If similarity exceeds 0.85, they collapse into [⋯ N similar lines].
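The merging step can be sketched as follows. This is a simplified illustration: it omits the token generalization step ({hex}, {path}, {ver}), compares each line against the first line of the current run, and normalizes distance by the longer line's length. All three choices, and the function names, are assumptions.

```rust
/// Classic two-row Levenshtein edit distance over Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let best = (prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1);
            cur.push(best);
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Similarity in [0, 1]: 1.0 means identical lines.
fn similarity(a: &str, b: &str) -> f64 {
    let longest = a.chars().count().max(b.chars().count());
    if longest == 0 {
        return 1.0;
    }
    1.0 - levenshtein(a, b) as f64 / longest as f64
}

/// Keep the first line of each run and replace the rest with a marker.
fn collapse(lines: &[&str], threshold: f64) -> Vec<String> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < lines.len() {
        out.push(lines[i].to_string());
        let mut j = i + 1;
        while j < lines.len() && similarity(lines[i], lines[j]) > threshold {
            j += 1;
        }
        let merged = j - i - 1;
        if merged > 0 {
            out.push(format!("[⋯ {} similar lines]", merged));
        }
        i = j;
    }
    out
}
```

Without generalization, lines such as "Compiling tokio v1.38.0" and "Compiling serde v1.0.203" differ too much to pass a 0.85 threshold on raw text, which is why the tag substitution described above runs before the comparison.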

| Mode | Purpose | Example commands | Behavior |
|---|---|---|---|
| CONDENSE | Verbose output → summary | cargo build, npm install | Implicit dedup + hazard/outcome filtering |
| NARRATE | Silent commands → execution narration | cp, mv, mkdir | "→ cp: a.txt → b.txt (ok)" |
| PASSTHROUGH | Verbatim preservation | cat, grep, ls, jq | Original output + metadata footer |
| STRUCTURED | Machine-readable → formatted summary | git status, docker ps | Parse into branches, containers, etc. |
| DANGEROUS | Destructive ops | rm -rf, git push --force | CRITICAL/WARNING severity with context |

Semantic Deduplication via Potion-Base

Optional GPU-accelerated semantic deduplication uses the potion-base-8M embedding model (256 dimensions) to identify lines that are semantically equivalent. Install the model with `ostk embeddings download`.

Runs on Metal (Apple Silicon), with a CPU fallback via wgpu. Enabled with --features embeddings.

RAW TERMINAL STREAM
Compiling ostk v3.0.0
Compiling tokio v1.38.0
Compiling serde v1.0.203
... 47 more crate compilations
warning: unused import `std::io`
warning: 2 warnings generated
Finished release in 42.3s

COMPRESSED CONTEXT
Compiling ostk v3.0.0
[⋯ 49 similar lines]
warning: unused import `std::io`
warning: 2 warnings generated
Finished release in 42.3s

Digest Envelopes & Read Prevention

To prevent agents from repeatedly reading files to check for external updates, the kernel appends a 5-line status envelope to every tool response.

EXAMPLE_DIGEST
[procs] builder:active:2m:45% reviewer:stale:5m:78%
[presence] :arrived(builder)
[files] src/main.rs:gen=12:reviewer:3m
[loadavg] needles=3 p0=1 fleet=2/2 nudges=0
[meminfo] ctx=45% used=360k/800k buffers=2 calls=14
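A minimal sketch of how a consumer might read the `key=value` lines of the envelope, such as [loadavg] and [meminfo]. The function name is hypothetical, and the colon-delimited lines ([procs], [presence], [files]) would need their own handling, which is not shown.

```rust
use std::collections::BTreeMap;

/// Split one envelope line such as `[meminfo] ctx=45% used=360k/800k`
/// into its tag and its `key=value` fields. Tokens without `=` are skipped.
/// Returns None when the line does not start with a `[tag]`.
fn parse_digest_line(line: &str) -> Option<(String, BTreeMap<String, String>)> {
    let rest = line.strip_prefix('[')?;
    let (tag, body) = rest.split_once(']')?;
    let fields: BTreeMap<String, String> = body
        .split_whitespace()
        .filter_map(|token| token.split_once('='))
        .map(|(key, value)| (key.to_string(), value.to_string()))
        .collect();
    Some((tag.to_string(), fields))
}
```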

Layer 1: Digest Suppression

If files haven't changed, they are omitted from the [files] block. Seeing no stale entries, the agent has no reason to issue a read command, avoiding the lookup entirely.

Layer 2: 304 Elision

If the agent attempts to read a file anyway, the kernel queries the generation table. If no writes have occurred since the agent's last read, the kernel overrides the read and returns [304] path:gen=N (current).
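The second layer can be sketched with a generation counter per path and a record of the generation each agent last saw. The struct layout and method names are assumptions for illustration; only the `[304] path:gen=N (current)` message format comes from the text.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct GenerationTable {
    current: HashMap<String, u64>,             // path -> latest generation
    last_read: HashMap<(String, String), u64>, // (agent, path) -> generation seen
}

impl GenerationTable {
    /// Every write bumps the file's generation.
    fn record_write(&mut self, path: &str) {
        *self.current.entry(path.to_string()).or_insert(0) += 1;
    }

    /// Return the content on a first or stale read, or the short 304
    /// message when nothing was written since this agent's last read.
    fn read(&mut self, agent: &str, path: &str, content: &str) -> String {
        let generation = self.current.get(path).copied().unwrap_or(0);
        let key = (agent.to_string(), path.to_string());
        if self.last_read.get(&key) == Some(&generation) {
            return format!("[304] {}:gen={} (current)", path, generation);
        }
        self.last_read.insert(key, generation);
        content.to_string()
    }
}
```

Tracking the last-seen generation per agent means a write by one agent invalidates the other agents' views without touching file contents.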

Driver Enrichment Hooks & Diagnostics

Registered FCP drivers (such as fcp-rust wrapping LSP) intercept file operations to inject compilation diagnostics, outline symbols, and manage type-safe multi-file refactoring.

Inline Diagnostic Injection

Diagnostics are injected directly as virtual code comments inside the file read response. The agent receives compiler errors inline with the source code, eliminating the need to compile manually to find syntax errors.

enriched read response
fn main() {
    let x = 5;  // [error] unused variable `x` (E0001)
    println!("hello");
}

Enrichment level by read type:

  • 304 (unchanged): None. The model already possesses the file state.
  • First read: Errors only (severity >= error). Hints and warnings are excluded to conserve tokens.
  • Explicit enrich=full: Full diagnostic set, symbol outlines, structural annotations, and reference trees.
  • Post-edit checks: Always enriched. The kernel calls drivers immediately after an edit to check for breakages.
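Inline injection with a severity floor can be sketched as below. The comment format mirrors the enriched read response shown above; the `Diagnostic` fields, the `annotate` name, and the use of `//` as the comment marker for every file type are assumptions.

```rust
#[derive(Clone, Copy, PartialEq, PartialOrd)]
enum Severity {
    Hint,
    Warning,
    Error,
}

struct Diagnostic {
    line: usize, // 1-based line number
    severity: Severity,
    message: String,
    code: String,
}

/// Append each diagnostic at or above `min` as a trailing comment on its line.
fn annotate(source: &str, diags: &[Diagnostic], min: Severity) -> String {
    source
        .lines()
        .enumerate()
        .map(|(idx, text)| {
            let mut out = text.to_string();
            for d in diags.iter().filter(|d| d.line == idx + 1 && d.severity >= min) {
                let label = match d.severity {
                    Severity::Hint => "hint",
                    Severity::Warning => "warning",
                    Severity::Error => "error",
                };
                out.push_str(&format!("  // [{}] {} ({})", label, d.message, d.code));
            }
            out
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

A first read would call this with `Severity::Error` as the floor, while a full enrichment would pass `Severity::Hint`.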

Type-Safe Refactoring via LSP

Drivers expose symbol graphs to support complex, multi-file refactoring verbs. When a refactoring runs, the driver computes all edits and the kernel applies them atomically under optimistic concurrency control (OCC) with compare-and-swap (CAS) checks.

RENAME

Updates a symbol and all of its references across the codebase, avoiding the false matches of regex search-and-replace.

EXTRACT_FUNCTION

Selects code, extracts it, and computes parameters and return structures.

INLINE

Inlines a function or variable, validating that visibility and scope are preserved.

| Signal | Embeddings (Breadth) | Drivers (Precision) |
|---|---|---|
| Related function | ~0.75 cosine similarity | Exact call graph mapping |
| Relevant file | Shared vocabulary/topics | Direct import dependency |
| Dead code detection | Cannot determine | Zero reference symbols |
| Call chain path | Co-occurrence heuristics | Exact static call-stack traversal |
| Test coverage scope | Cannot determine | Test target reference mapping |

  • Diagnostic Limit: Diagnostic messages are truncated to 256 characters; binary codes are stripped.
  • Path Sandbox: Drivers are restricted to authorized workspace paths; leaks outside root are blocked.
  • Circuit Breaker: Three consecutive driver timeouts trigger a 5-minute cooldown period.
  • Timeout (Warm / Cold): 100ms for warm calls (falls back to raw read); 2000ms for cold start.
  • Refactor Timeout: 10s. Fails explicitly instead of falling back to raw edits.
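The circuit breaker rule above can be sketched as a small state holder. The thresholds (three consecutive timeouts, 5-minute cooldown) come from the text; the type and method names are hypothetical, and resetting the counter on any success is an assumption.

```rust
use std::time::{Duration, Instant};

const MAX_CONSECUTIVE_TIMEOUTS: u32 = 3;
const COOLDOWN: Duration = Duration::from_secs(5 * 60);

#[derive(Default)]
struct CircuitBreaker {
    consecutive_timeouts: u32,
    open_until: Option<Instant>,
}

impl CircuitBreaker {
    /// True when driver calls are allowed at `now`.
    fn allows(&self, now: Instant) -> bool {
        self.open_until.map_or(true, |until| now >= until)
    }

    /// A successful call breaks the streak of timeouts.
    fn record_success(&mut self) {
        self.consecutive_timeouts = 0;
    }

    /// The third consecutive timeout opens the breaker for the cooldown.
    fn record_timeout(&mut self, now: Instant) {
        self.consecutive_timeouts += 1;
        if self.consecutive_timeouts >= MAX_CONSECUTIVE_TIMEOUTS {
            self.open_until = Some(now + COOLDOWN);
            self.consecutive_timeouts = 0;
        }
    }
}
```

Passing `now` in as a parameter keeps the logic testable without sleeping; a caller would typically supply `Instant::now()`.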

Capabilities & the SKILL Directive

The SKILL directive declares named capability bundles within an Agentfile. The parser extracts these into a simple vector, which is resolved at spawn time.

Format: SKILL <bundle_name>. Multiple declarations compile into Agentfile.skills: Vec<String>. Missing skill arguments trigger a ParseError::MissingArgument error.

At spawn time, the harness maps these identifiers to skill packages (e.g. resolving skills/<name>/SKILL.md) to append system prompt instructions, configure required tools, and establish style conventions.

agents/fixer.af
FROM claude-sonnet-4-6
PROMPT You fix bugs and write tests.
SKILL tdd
SKILL commit
TOOL shell
TOOL file:edit
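A minimal sketch of how SKILL lines could be collected into a `Vec<String>`. It handles only the SKILL directive and ignores every other line; the `parse_skills` name and the line-number payload on `MissingArgument` are assumptions, since the real `ParseError` shape is not shown here.

```rust
#[derive(Debug, PartialEq)]
enum ParseError {
    /// 1-based line number of the offending directive (payload is assumed).
    MissingArgument(usize),
}

/// Collect the bundle name from every `SKILL <bundle_name>` line, in order.
fn parse_skills(agentfile: &str) -> Result<Vec<String>, ParseError> {
    let mut skills = Vec::new();
    for (idx, line) in agentfile.lines().enumerate() {
        let mut parts = line.split_whitespace();
        if parts.next() != Some("SKILL") {
            continue; // other directives are out of scope for this sketch
        }
        match parts.next() {
            Some(name) => skills.push(name.to_string()),
            None => return Err(ParseError::MissingArgument(idx + 1)),
        }
    }
    Ok(skills)
}
```

Applied to the agents/fixer.af example above, this yields the two entries `tdd` and `commit`.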

Task Eligibility & the WORK Directive

Rather than a push-based routing engine, ostk implements a pull-based task architecture. Agents declare task eligibility using affinity masks defined via the WORK directive.

Format: WORK <expr> [<expr>...]. Multiple expressions on a single line are space-separated (evaluated as logical AND). An Agentfile can contain at most one WORK directive.

  • Parsed into WorkFilter containing a list of match expressions.
  • If omitted, the agent defaults to work: None, indicating it is eligible to pull any task.
  • Declaring multiple WORK directives triggers a ParseError::MultipleWork error.

| Operator | Example | Meaning |
|---|---|---|
| = | priority=P0 | Exact match constraint. |
| >= | priority>=P1 | Lower bound mapping (P0 < P1 < P2 < P3). |
| <= | priority<=P2 | Upper bound mapping. |
| =a,b | tags=rust,bugfix | Comma-separated list (matches if the task has ANY listed tag). |

agents/rust-worker.af
FROM claude-sonnet-4-6
WORK tags=rust,bugfix priority>=P1
TOOL shell
TOOL file:edit
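The affinity mask can be sketched as a list of expressions that must all match. This sketch covers only the `priority` and `tags` fields shown in the examples; the `Expr`, `Task`, `parse_work`, and `eligible` names are hypothetical, and the `MultipleWork` check is left out.

```rust
#[derive(Debug, PartialEq)]
enum Expr {
    PriorityEq(u8),
    PriorityGe(u8),
    PriorityLe(u8),
    TagsAny(Vec<String>),
}

struct Task {
    priority: u8, // 0 for P0 through 3 for P3
    tags: Vec<String>,
}

/// Parse `P0`..`P3` into 0..3.
fn parse_priority(s: &str) -> Option<u8> {
    let n: u8 = s.strip_prefix('P')?.parse().ok()?;
    (n <= 3).then_some(n)
}

fn parse_expr(token: &str) -> Option<Expr> {
    if let Some(v) = token.strip_prefix("priority>=") {
        parse_priority(v).map(Expr::PriorityGe)
    } else if let Some(v) = token.strip_prefix("priority<=") {
        parse_priority(v).map(Expr::PriorityLe)
    } else if let Some(v) = token.strip_prefix("priority=") {
        parse_priority(v).map(Expr::PriorityEq)
    } else if let Some(v) = token.strip_prefix("tags=") {
        Some(Expr::TagsAny(v.split(',').map(String::from).collect()))
    } else {
        None
    }
}

/// Parse the arguments of one WORK line; None if any expression is invalid.
fn parse_work(args: &str) -> Option<Vec<Expr>> {
    args.split_whitespace().map(parse_expr).collect()
}

/// Expressions are ANDed; an empty filter accepts every task.
fn eligible(filter: &[Expr], task: &Task) -> bool {
    filter.iter().all(|expr| match expr {
        Expr::PriorityEq(p) => task.priority == *p,
        Expr::PriorityGe(p) => task.priority >= *p,
        Expr::PriorityLe(p) => task.priority <= *p,
        Expr::TagsAny(tags) => tags.iter().any(|t| task.tags.contains(t)),
    })
}
```

With the agents/rust-worker.af filter, a P2 task tagged `rust` is eligible, while a P0 task with the same tag is not, because P0 falls below the P1 lower bound in the stated ordering.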

Context Pressure & Successor Handoff

To preserve prompt caching, the kernel avoids in-turn context compaction. Instead, when context thresholds are crossed, the agent initiates a clean, structured handoff to a fresh successor process.

70% Threshold (AGING)

The kernel signals AGING state. Pre-computations begin to compile the handoff registry, while the current task loop runs unhindered.

90% Threshold (DYING)

State advances to DYING. Further tool calls are blocked and handoff payloads are finalized. A sudden jump in token usage can skip AGING and enter DYING directly.

Handoff (DRAINING/DEAD)

A single finalization turn (DRAINING) commits the handoff to disk, transitioning the session to DEAD. A fresh successor rehydrates the handoff state.
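The lifecycle above can be sketched as a small state machine. The 70% and 90% thresholds and the AGING, DYING, DRAINING, and DEAD states come from the text; the `Active` starting state, the function names, the inclusive threshold comparison, and advancing the handoff at turn boundaries are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Lifecycle {
    Active, // assumed name for the state before any threshold is crossed
    Aging,
    Dying,
    Draining,
    Dead,
}

/// Advance on context pressure (fraction of the window in use).
/// States never move backwards, and a large jump skips AGING.
fn on_pressure(state: Lifecycle, ctx_used: f64) -> Lifecycle {
    match state {
        Lifecycle::Active | Lifecycle::Aging if ctx_used >= 0.90 => Lifecycle::Dying,
        Lifecycle::Active if ctx_used >= 0.70 => Lifecycle::Aging,
        other => other,
    }
}

/// Tool calls are blocked from DYING onward.
fn tool_calls_allowed(state: Lifecycle) -> bool {
    matches!(state, Lifecycle::Active | Lifecycle::Aging)
}

/// Handoff progression: one finalization turn, then the session ends.
fn on_turn_boundary(state: Lifecycle) -> Lifecycle {
    match state {
        Lifecycle::Dying => Lifecycle::Draining,
        Lifecycle::Draining => Lifecycle::Dead,
        other => other,
    }
}
```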