
Agent Lifecycle & Topology

Sessions outlive everything that drives them: clients disconnect, daemons crash, workers hang. State lives in the filesystem and the audit log. The daemon is only an ephemeral cache of active execution; the on-disk journal is the source of truth.

Transient Clients & Daemon Isolation

Clients are transient by design. The TUI, CLI, MCP bridges, and IDE extensions attach and detach over the Unix domain socket at .ostk/ostk.sock without affecting underlying execution. The daemon (the anchor) is the long-lived process: it manages agent lifecycles in memory and routes events to subscribed clients.

Transient Connections

If your TUI or terminal emulator disconnects mid-task, the agent does not stop. The anchor keeps running. Upon reconnecting, the client issues a client/attach JSON-RPC request to resume streaming logs and state updates.

Source: src/serve/server.rs, src/serve/client.rs
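As a rough illustration of the reattach step, the sketch below builds a JSON-RPC 2.0 request for the client/attach method and sends it over the daemon socket. Only the method name and the socket path come from the text above. The `session` parameter, the newline-delimited framing, and the helper names `attach_request`, `escape_json`, and `attach` are assumptions made for this example and may not match the real protocol.

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

/// Escape a string for embedding inside a JSON string literal.
pub fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a JSON-RPC 2.0 `client/attach` request.
/// ASSUMPTION: the `session` parameter and the trailing newline framing are
/// hypothetical; the source only names the method.
pub fn attach_request(id: u64, session: &str) -> String {
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{},\"method\":\"client/attach\",\"params\":{{\"session\":\"{}\"}}}}\n",
        id,
        escape_json(session)
    )
}

/// Connect to the daemon socket, send the attach request, and read one reply line.
/// `socket_path` would typically be ".ostk/ostk.sock".
pub fn attach(socket_path: &str, id: u64, session: &str) -> std::io::Result<String> {
    let mut stream = UnixStream::connect(socket_path)?;
    stream.write_all(attach_request(id, session).as_bytes())?;
    let mut reply = String::new();
    BufReader::new(stream).read_line(&mut reply)?;
    Ok(reply)
}
```

Because the agent keeps running while no client is attached, a client built this way can call `attach` again after any disconnect without coordinating with the agent itself.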

Sub-Stack Isolation

Sub-stacks isolate work scopes. Under .ostk/stacks/<name>/, a sub-stack maintains its own scoped journal, drain snapshots, and nudge inbox. The parent anchor communicates with it solely through these bounded IPC channels, limiting blast radius.

Source: src/kernel/sub_stack.rs, src/kernel/sandbox.rs

The Five-State Execution Lifecycle

Once an agent advances state, it cannot regress. State transitions are monitored by the kernel loop on every turn.

Agent Lifecycle Flow
HEALTHY
Trigger: Default state at spawn. Context usage < 70%.
Behavior: Normal execution. All permitted tools are available with minimal monitoring overhead.

AGING
Trigger: Context usage ≥ 70% (AGING_THRESHOLD_PCT).
Behavior: The kernel prepares a handoff record for the successor. Context is not mutated mid-turn, which preserves the prompt cache.

DYING
Trigger: Context usage ≥ 90% (DYING_THRESHOLD_PCT). A jump across both thresholds (e.g. 60% → 92%) skips AGING and goes directly to DYING.
Behavior: Fencing begins. No new tool calls are allowed except within the single finalization turn.

DRAINING
Trigger: Entered from DYING for the final turn.
Behavior: A single final tool call is allowed (typically writing the handoff document). The agent advances to DEAD once it completes.

DEAD
Trigger: Finalization turn complete or context limit exceeded.
Behavior: All tool calls are fenced. The agent subprocess persists as a zombie until the daemon reaps it.

Transitions are guarded by LifecycleState::can_transition_to() in src/kernel/lifecycle.rs. They are evaluated during command dispatch via lifecycle.evaluate(context_pct) inside the CPU agent loop.
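The forward-only rule can be sketched as a small state machine. The names LifecycleState, can_transition_to, evaluate, AGING_THRESHOLD_PCT, and DYING_THRESHOLD_PCT appear in the text; everything else here is an assumption, including the choice to model the guard as "any strictly later state is reachable" and to hang `evaluate` directly on the enum. The real code in src/kernel/lifecycle.rs may differ.

```rust
/// Thresholds taken from the state descriptions above.
pub const AGING_THRESHOLD_PCT: u8 = 70;
pub const DYING_THRESHOLD_PCT: u8 = 90;

/// Variants are declared in lifecycle order, so the derived `Ord`
/// doubles as "how far along the lifecycle is this state".
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum LifecycleState {
    Healthy,
    Aging,
    Dying,
    Draining,
    Dead,
}

impl LifecycleState {
    /// Forward-only guard: a state may advance (possibly skipping states)
    /// but never regress or re-enter itself.
    /// ASSUMPTION: the real guard may allow fewer edges than this.
    pub fn can_transition_to(self, next: LifecycleState) -> bool {
        next > self
    }

    /// Map context usage to the state it implies, without moving backwards.
    /// DRAINING and DEAD are not threshold-driven, so they are never
    /// produced here; they would be entered by the finalization logic.
    pub fn evaluate(self, context_pct: u8) -> LifecycleState {
        let implied = if context_pct >= DYING_THRESHOLD_PCT {
            LifecycleState::Dying
        } else if context_pct >= AGING_THRESHOLD_PCT {
            LifecycleState::Aging
        } else {
            LifecycleState::Healthy
        };
        if self.can_transition_to(implied) {
            implied
        } else {
            self
        }
    }
}
```

Under these assumptions, a HEALTHY agent evaluated at 92% lands directly in DYING, and a DYING agent evaluated at 10% stays in DYING, matching the "cannot regress" rule.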

Spawning and Process Boundaries

Unlike systems running agents in-process or inside green threads, the ostk daemon spawns agents as OS subprocesses with distinct isolation boundaries.

01 Spawn Request

The operator runs `ostk kernel spawn <name> --model <model>` or invokes the spawn programmatically. A new session is initialized.

02 OS Fork-Exec

The daemon fork-execs the agent as an independent child process. The agent gets its own PID and an isolated environment.

03 Metadata Registration

The agent writes its configuration and PID to `.ostk/agents/<name>.meta` for process tracking.

04 Local IPC Listener

The child process opens a dedicated socket listener at `.ostk/agents/<name>.sock` to route agent-specific client traffic.

05 Lineage Bind

The agent is bound to a Lineage ID: the persistent, logical identifier that the daemon tracks across restarts.

Worker Hang Detection & Recovery

To maintain the Bounded Wait scheduler invariant, each active session writes a periodic heartbeat timestamp. If a worker process hangs or stops responding, the scheduler tick loop detects the failure and reclaims the resource.

ACTIVE

Heartbeat updated < 30 seconds ago. Process is executing normally.

STALE

30 to 90 seconds since the last heartbeat. The process is assumed idle; the daemon keeps monitoring it.

CRASHED

More than 90 seconds since the last heartbeat. The tick loop probes the PID; if the process is hung, the loop reaps it and triggers hot-rehydration.

Heartbeats are written to the global registry at .ostk/agents.jsonl. A secondary fallback file is maintained per agent at .ostk/.heartbeat.<alias> to prevent serialization contention on the shared registry.

Source: src/kernel/heartbeat.rs, src/kernel/scheduler.rs
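The three heartbeat bands above reduce to a pure function of heartbeat age. The sketch below is illustrative only: the `Liveness` enum, the constant names, and `classify` are hypothetical, and treating exactly 30 and exactly 90 seconds as STALE follows the wording of the bands rather than any confirmed behavior in src/kernel/heartbeat.rs.

```rust
use std::time::Duration;

/// Liveness bands described in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Liveness {
    Active,
    Stale,
    Crashed,
}

/// Band boundaries from the text (hypothetical constant names).
pub const STALE_AFTER: Duration = Duration::from_secs(30);
pub const CRASHED_AFTER: Duration = Duration::from_secs(90);

/// Classify a session by the time elapsed since its last heartbeat.
pub fn classify(age: Duration) -> Liveness {
    if age < STALE_AFTER {
        Liveness::Active
    } else if age <= CRASHED_AFTER {
        Liveness::Stale
    } else {
        Liveness::Crashed
    }
}
```

A scheduler tick could call `classify` for each registry entry and only probe PIDs for entries that come back as `Crashed`, which keeps the common path free of system calls.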

Daemon Crash Recovery & Revival

When the daemon is killed or crashes, no committed state is lost. At every turn boundary, the daemon commits a snapshot of the execution state to disk, so only work inside an unfinished turn has to be redone. On restart, the daemon identifies active lineages without corresponding processes, marks them Orphaned, and resolves them according to the revival policy.

REVIVAL_POLICIES
LIMIT revival_policy revive   # Rehydrate and resume from last turn (default)
LIMIT revival_policy reap     # Discard session immediately on daemon start
LIMIT revival_policy ask      # Block lineage; wait for manual resolution

Ask-pending lineages can be resolved via the CLI: ostk lineage resolve <id> with either --revive or --reap. Anchor Exclusivity invariants guarantee that multiple running daemons cannot collide or double-rehydrate the same lineage.

Source: src/kernel/drain.rs, src/kernel/anchor.rs
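To make the three policies concrete, here is a minimal sketch that parses a `LIMIT revival_policy <value>` directive and maps the policy to what happens to an orphaned lineage. The directive syntax and the default (revive) come from the listing above. The type and function names (`RevivalPolicy`, `OrphanAction`, `parse_revival_policy`, `resolve_orphan`) are hypothetical, and the real parser may accept more or less than this one.

```rust
/// The three policies listed under REVIVAL_POLICIES.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RevivalPolicy {
    Revive,
    Reap,
    Ask,
}

impl Default for RevivalPolicy {
    /// `revive` is documented as the default.
    fn default() -> Self {
        RevivalPolicy::Revive
    }
}

/// What the daemon does with an orphaned lineage (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OrphanAction {
    Rehydrate,
    Discard,
    AwaitOperator,
}

/// Parse one `LIMIT revival_policy <value>` line, ignoring a trailing `# comment`.
/// Returns `None` for anything that is not a well-formed directive.
pub fn parse_revival_policy(line: &str) -> Option<RevivalPolicy> {
    let code = line.split('#').next().unwrap_or("");
    let mut parts = code.split_whitespace();
    match (parts.next(), parts.next(), parts.next(), parts.next()) {
        (Some("LIMIT"), Some("revival_policy"), Some(value), None) => match value {
            "revive" => Some(RevivalPolicy::Revive),
            "reap" => Some(RevivalPolicy::Reap),
            "ask" => Some(RevivalPolicy::Ask),
            _ => None,
        },
        _ => None,
    }
}

/// Map a policy to the action taken for a lineage marked Orphaned.
pub fn resolve_orphan(policy: RevivalPolicy) -> OrphanAction {
    match policy {
        RevivalPolicy::Revive => OrphanAction::Rehydrate,
        RevivalPolicy::Reap => OrphanAction::Discard,
        RevivalPolicy::Ask => OrphanAction::AwaitOperator,
    }
}
```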

Turn-Boundary Drain Snapshots

A snapshot is written to .ostk/drain/<lineage_id>.json on every completed turn. Snapshots contain all parameters required to re-establish the assistant context from the exact same boundary.

Committed Fields

  • lineage_id & anchor_id
  • written_at ISO8601 timestamp
  • Active LLM configuration and Model string
  • Cumulative token usage accounting
  • Full structured conversation messages
  • Current LoopConfig (tools, permissions, limits)

Deliberately Ephemeral Fields

  • root / directory pointers (regenerated at boot)
  • pending_images (discarded across turns)
  • runtime_allowed approvals (rebuilt per run)
  • Tokio task handles, cancel flags, and IPC channels
  • Mid-turn outbox events (rehydration resumes from turn boundaries only)

Source: src/kernel/drain.rs (V2 upgrade path supported via upgrade_from_v1)
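The split between committed and ephemeral fields can be pictured as two structs: one that is persisted at every turn boundary and one that is rebuilt on every start. This is a sketch under stated assumptions, not the real schema. The field names lineage_id, anchor_id, written_at, pending_images, and runtime_allowed and the path .ostk/drain/<lineage_id>.json come from the text; the concrete types, the contents of `Message` and `LoopConfig`, and the helpers `snapshot_path` and `rehydrate` are hypothetical.

```rust
use std::path::{Path, PathBuf};

/// One conversation message (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Loop settings carried across restarts (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
pub struct LoopConfig {
    pub tools: Vec<String>,
    pub max_turns: u32,
}

/// Committed fields: written to disk on every completed turn.
#[derive(Debug, Clone, PartialEq)]
pub struct DrainSnapshot {
    pub lineage_id: String,
    pub anchor_id: String,
    pub written_at: String, // ISO 8601 timestamp
    pub model: String,
    pub tokens_used: u64,
    pub messages: Vec<Message>,
    pub loop_config: LoopConfig,
}

/// Deliberately ephemeral fields: never serialized, rebuilt at start.
#[derive(Debug)]
pub struct RuntimeState {
    pub root: PathBuf,
    pub pending_images: Vec<Vec<u8>>,
    pub runtime_allowed: Vec<String>,
}

/// A live session is the committed snapshot plus fresh runtime state.
#[derive(Debug)]
pub struct Session {
    pub snapshot: DrainSnapshot,
    pub runtime: RuntimeState,
}

/// Location of a lineage's snapshot under the project root.
pub fn snapshot_path(root: &Path, lineage_id: &str) -> PathBuf {
    root.join(".ostk")
        .join("drain")
        .join(format!("{}.json", lineage_id))
}

/// Rebuild a session: committed data is kept as-is, ephemeral data starts empty.
pub fn rehydrate(snapshot: DrainSnapshot, root: PathBuf) -> Session {
    Session {
        snapshot,
        runtime: RuntimeState {
            root,
            pending_images: Vec::new(),
            runtime_allowed: Vec::new(),
        },
    }
}
```

Keeping the ephemeral state in its own struct makes it hard to persist a task handle or approval by accident, since only `DrainSnapshot` would ever be handed to a serializer.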

Kill & Reap Protocol

Process termination distinguishes between active termination (Kill) and post-mortem state synchronization (Reap).

Kill Sequence

  1. Send SIGTERM to the process group (negative PID).
  2. Allow a 5-second grace period for a clean exit.
  3. Fall back to SIGKILL if the process fails to terminate.

Kill does not trigger a final drain; it is managed by the session process table.
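The escalation in the kill sequence can be sketched independently of the actual system calls. In the example below, signal delivery and the liveness check are passed in as closures so the control flow is visible and testable; in a real implementation they would presumably wrap kill(-pgid, sig) and a process-table lookup. The function name `kill_with_grace`, the `KillOutcome` enum, and the polling interval are all assumptions for illustration.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Conventional signal numbers on Linux and macOS.
pub const SIGTERM: i32 = 15;
pub const SIGKILL: i32 = 9;

/// How the target ended up terminating.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KillOutcome {
    ExitedCleanly,
    ForceKilled,
}

/// SIGTERM, wait up to `grace`, then SIGKILL if the target is still alive.
/// `signal` delivers a signal to the process group; `is_alive` reports
/// whether the group still has a live process. Both are injected here
/// (hypothetical design) rather than calling the OS directly.
pub fn kill_with_grace<S, A>(mut signal: S, mut is_alive: A, grace: Duration, poll: Duration) -> KillOutcome
where
    S: FnMut(i32),
    A: FnMut() -> bool,
{
    signal(SIGTERM);
    let deadline = Instant::now() + grace;
    while Instant::now() < deadline {
        if !is_alive() {
            return KillOutcome::ExitedCleanly;
        }
        thread::sleep(poll);
    }
    // One last check so a process that exited at the deadline is not SIGKILLed.
    if !is_alive() {
        return KillOutcome::ExitedCleanly;
    }
    signal(SIGKILL);
    KillOutcome::ForceKilled
}
```

With the documented 5-second grace period, a caller would pass `Duration::from_secs(5)` for `grace` and some short polling interval, for example 100 ms, for `poll`.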

Reap Process

  1. Sweep the active table in agents.jsonl.
  2. Probe active entries using kill(pid, 0).
  3. For dead processes, update status to inactive and prune metadata.

Reap is triggered periodically or via ostk kernel reap; it also removes heartbeat locks.
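The reap sweep is essentially a filter over the active table. The sketch below takes the liveness probe as a closure; in practice that probe would presumably be the kill(pid, 0) check named in step 2, which tests for existence without delivering a signal. The `AgentEntry` and `Status` types and the `reap` function are hypothetical, and reading or rewriting agents.jsonl is left out.

```rust
/// Status recorded for an agent in the registry (hypothetical shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Active,
    Inactive,
}

/// One row of the active table (hypothetical fields).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AgentEntry {
    pub name: String,
    pub pid: i32,
    pub status: Status,
}

/// Sweep the table: every active entry whose PID no longer exists is
/// marked inactive. Returns the names of reaped agents so the caller can
/// prune their metadata and heartbeat files.
/// `probe(pid)` should return true while the process exists.
pub fn reap<F: Fn(i32) -> bool>(entries: &mut [AgentEntry], probe: F) -> Vec<String> {
    let mut reaped = Vec::new();
    for entry in entries.iter_mut() {
        if entry.status == Status::Active && !probe(entry.pid) {
            entry.status = Status::Inactive;
            reaped.push(entry.name.clone());
        }
    }
    reaped
}
```

Entries that are already inactive are skipped without probing, so a recycled PID belonging to an unrelated process cannot bring a dead agent back to life in the table.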

The Ephemeral Invariants

"Agents are ephemeral" is the second of the Five Foundational Laws. The concrete runtime constraints enforced by the kernel include:

State resets on daemon restart

Preload contexts, temporary tools, and in-memory session structures are generated fresh at daemon start. Pinning to local RAM state across restarts is forbidden.

Token accounting resets on rehydrate

The token budget for the running agent process is tracked in process memory. If the process is rehydrated from a snapshot, its token accounting starts fresh from that point.

Task handles are tokio-bound

Tokio futures, file stream handles, and socket event loop primitives cannot be serialized. Rehydrated processes are initialized with fresh handles and resume from the last messages committed to disk.

"State is held in ServerState and resets on daemon restart. Arrivals are not themselves persistent kernel state — the persistent record is the audit row stream, which recall @arrived can query as the canonical arrival record." — src/kernel/presence.rs