Agent Lifecycle & Topology
Sessions outlive everything that drives them — clients disconnect, daemons crash, workers hang. State lives in the filesystem and the audit log. The daemon acts as an ephemeral cache of active execution; the journal on disk is the ultimate truth.
Transient Clients & Daemon Isolation
Clients are transient by design. The TUI, CLI, MCP bridges, and IDE extensions attach and detach over the Unix domain socket at .ostk/ostk.sock without affecting underlying execution. The daemon (anchor) manages agent lifecycles in memory, acting as a long-lived process while routing events to subscribed clients.
Transient Connections
If your TUI or terminal emulator disconnects mid-task, the agent does not stop. The anchor keeps running. Upon reconnecting, the client issues a client/attach JSON-RPC request to resume streaming logs and state updates.
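The reattach step can be sketched as a small request builder. This is a minimal illustration, not the actual client code: the `session` parameter name, the request `id` handling, and the newline framing are all assumptions, since only the method name `client/attach` is given here.

```rust
use std::io::{self, Write};

/// Build a JSON-RPC 2.0 request for the `client/attach` method.
/// The `session` parameter is hypothetical; the real params are not documented here.
pub fn attach_request(id: u64, session: &str) -> String {
    // Escape backslashes and quotes so the session name stays a valid JSON string.
    let escaped = session.replace('\\', "\\\\").replace('"', "\\\"");
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"client/attach","params":{{"session":"{}"}}}}"#,
        id, escaped
    )
}

/// Write one request followed by a newline (newline framing is an assumption).
pub fn send_attach<W: Write>(mut out: W, id: u64, session: &str) -> io::Result<()> {
    out.write_all(attach_request(id, session).as_bytes())?;
    out.write_all(b"\n")?;
    out.flush()
}
```

Because `send_attach` accepts any `Write`, a client could pass the stream returned by `std::os::unix::net::UnixStream::connect(".ostk/ostk.sock")`, while tests can pass a `Vec<u8>`.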
src/serve/server.rs, src/serve/client.rs
Sub-Stack Isolation
Sub-stacks isolate work scopes. Under .ostk/stacks/<name>/, a sub-stack maintains its own scoped journal, drain snapshots, and nudge inbox. The parent anchor communicates with it solely through these bounded IPC channels, limiting blast radius.
src/kernel/sub_stack.rs, src/kernel/sandbox.rs
The Five-State Execution Lifecycle
Once an agent advances state, it cannot regress. State transitions are monitored by the kernel loop on every turn.
Transitions are guarded by LifecycleState::can_transition_to() in src/kernel/lifecycle.rs and are evaluated during command dispatch via lifecycle.evaluate(context_pct) inside the CPU agent loop.
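A forward-only guard of this kind could look like the sketch below. The variant names `Stage1` through `Stage5` are placeholders, because the five real state names are not listed in this section, and allowing a jump over intermediate states is an assumption; only the name `can_transition_to` comes from the text.

```rust
/// Placeholder variants: the five real state names are not given in this section.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum LifecycleState {
    Stage1,
    Stage2,
    Stage3,
    Stage4,
    Stage5,
}

impl LifecycleState {
    /// Forward-only guard: staying put or advancing is allowed, regressing is not.
    /// The derived ordering follows the declaration order of the variants.
    pub fn can_transition_to(self, next: LifecycleState) -> bool {
        next >= self
    }
}

/// Apply a requested transition, keeping the current state if it would regress.
pub fn advance(current: LifecycleState, requested: LifecycleState) -> LifecycleState {
    if current.can_transition_to(requested) {
        requested
    } else {
        current
    }
}
```

Deriving `Ord` keeps the guard to a single comparison, so adding or renaming states does not require touching the transition logic.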
Spawning and Process Boundaries
Unlike systems running agents in-process or inside green threads, the ostk daemon spawns agents as OS subprocesses with distinct isolation boundaries.
Spawn Request
Operator runs `ostk kernel spawn <name> --model <model>` or invokes it programmatically. A new session is initialized.
OS Fork-Exec
The daemon fork-execs the agent as an independent child process. The agent gets its own PID and isolated environment.
Metadata Registration
The agent writes its configuration and PID to `.ostk/agents/<name>.meta` for process tracking.
Local IPC Listener
The child process opens a dedicated socket listener at `.ostk/agents/<name>.sock` to route agent-specific client traffic.
Lineage Bind
The agent is bound to a Lineage ID—the persistent, logical identifier tracked by the daemon across restarts.
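The per-agent files named in the steps above can be derived from the agent name. The sketch below assumes a simple key=value metadata record; the real `.meta` format is not shown in this section, and `AgentPaths`, `agent_paths`, and `meta_record` are hypothetical names.

```rust
use std::path::{Path, PathBuf};

/// Per-agent files named in the spawn steps, relative to the `.ostk` directory.
#[derive(Debug, PartialEq)]
pub struct AgentPaths {
    pub meta: PathBuf,
    pub sock: PathBuf,
}

/// Derive `.ostk/agents/<name>.meta` and `.ostk/agents/<name>.sock`.
pub fn agent_paths(ostk_dir: &Path, name: &str) -> AgentPaths {
    let agents = ostk_dir.join("agents");
    AgentPaths {
        meta: agents.join(format!("{}.meta", name)),
        sock: agents.join(format!("{}.sock", name)),
    }
}

/// Hypothetical metadata record: one key=value pair per line.
pub fn meta_record(name: &str, pid: u32, model: &str, lineage_id: &str) -> String {
    format!(
        "name={}\npid={}\nmodel={}\nlineage_id={}\n",
        name, pid, model, lineage_id
    )
}
```

In a real spawn path, the PID would come from the child handle returned by `std::process::Command::spawn` (its `id()` method), and the record would be written to `AgentPaths::meta`.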
Worker Hang Detection & Recovery
To maintain the Bounded Wait scheduler invariant, each active session writes a periodic heartbeat timestamp. If a worker process hangs or stops responding, the scheduler tick loop detects the failure and reclaims the resource.
- < 30 seconds since last heartbeat: the process is executing normally.
- 30 to 90 seconds since last heartbeat: the process is assumed idle; the daemon keeps monitoring.
- > 90 seconds since last heartbeat: the tick loop probes the PID; if the process is hung, it reaps the process and triggers hot-rehydration.
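The three heartbeat bands map naturally onto a small classifier. This is a sketch under the thresholds stated above; the variant names `Active`, `Idle`, and `Stale` are hypothetical labels, and treating exactly 30 and exactly 90 seconds as idle is an assumption about the boundary cases.

```rust
use std::time::Duration;

/// Hypothetical labels for the three heartbeat bands.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HeartbeatStatus {
    Active, // heartbeat younger than 30 seconds
    Idle,   // 30 to 90 seconds since the last heartbeat
    Stale,  // older than 90 seconds: probe the PID
}

/// Classify a session by the age of its last heartbeat.
pub fn classify(age: Duration) -> HeartbeatStatus {
    if age < Duration::from_secs(30) {
        HeartbeatStatus::Active
    } else if age <= Duration::from_secs(90) {
        HeartbeatStatus::Idle
    } else {
        HeartbeatStatus::Stale
    }
}

/// Only stale sessions are probed by the tick loop.
pub fn needs_probe(age: Duration) -> bool {
    classify(age) == HeartbeatStatus::Stale
}
```

Comparing `Duration` values directly, rather than truncated whole seconds, keeps a heartbeat that is 90.5 seconds old on the stale side of the boundary.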
Heartbeats are written to the global registry at .ostk/agents.jsonl. A secondary fallback file is maintained per agent at .ostk/.heartbeat.<alias> to prevent serialization contention.
src/kernel/heartbeat.rs, src/kernel/scheduler.rs
Daemon Crash Recovery & Revival
When the daemon is killed or crashes, no committed state is lost: at every turn boundary, the daemon commits a snapshot of the execution state to disk. On restart, the daemon identifies active lineages without corresponding processes, marks them Orphaned, and resolves them according to the revival policy.
LIMIT revival_policy revive   # Rehydrate and resume from last turn (default)
LIMIT revival_policy reap     # Discard session immediately on daemon start
LIMIT revival_policy ask      # Block lineage; wait for manual resolution
Ask-pending lineages can be resolved via CLI: ostk lineage resolve <id> --revive or --reap. Anchor Exclusivity invariants guarantee that multiple running daemons cannot collide or double-rehydrate the same lineage.
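How a daemon might read the policy and decide what to do with an orphaned lineage can be sketched as follows. The parser and the `OrphanAction` type are hypothetical; only the three policy values and their described effects come from the text above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RevivalPolicy {
    Revive,
    Reap,
    Ask,
}

impl Default for RevivalPolicy {
    /// `revive` is documented as the default policy.
    fn default() -> Self {
        RevivalPolicy::Revive
    }
}

/// Parse one `LIMIT revival_policy <value>` line, ignoring a trailing `#` comment.
pub fn parse_revival_policy(line: &str) -> Option<RevivalPolicy> {
    let code = line.split('#').next().unwrap_or("");
    let mut words = code.split_whitespace();
    if words.next()? != "LIMIT" || words.next()? != "revival_policy" {
        return None;
    }
    match words.next()? {
        "revive" => Some(RevivalPolicy::Revive),
        "reap" => Some(RevivalPolicy::Reap),
        "ask" => Some(RevivalPolicy::Ask),
        _ => None,
    }
}

/// Hypothetical action taken for an Orphaned lineage at daemon start.
#[derive(Debug, PartialEq, Eq)]
pub enum OrphanAction {
    Rehydrate,     // resume from the last turn boundary
    Discard,       // drop the session immediately
    AwaitOperator, // block until resolved manually
}

pub fn resolve_orphan(policy: RevivalPolicy) -> OrphanAction {
    match policy {
        RevivalPolicy::Revive => OrphanAction::Rehydrate,
        RevivalPolicy::Reap => OrphanAction::Discard,
        RevivalPolicy::Ask => OrphanAction::AwaitOperator,
    }
}
```

Keeping the policy and the resulting action as separate enums makes the `ask` case explicit: the lineage stays blocked until an operator resolves it.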
src/kernel/drain.rs, src/kernel/anchor.rs
Turn-Boundary Drain Snapshots
A snapshot is written to .ostk/drain/<lineage_id>.json on every completed turn. Snapshots contain all parameters required to re-establish the assistant context from the exact same boundary.
Committed Fields
- lineage_id & anchor_id
- written_at ISO8601 timestamp
- Active LLM configuration and model string
- Cumulative token usage accounting
- Full structured conversation messages
- Current LoopConfig (tools, permissions, limits)
Deliberately Ephemeral Fields
- root/ directory pointers (regenerated at boot)
- pending_images (discarded across turns)
- runtime_allowed approvals (rebuilt per run)
- Tokio task handles, cancel flags, and IPC channels
- Mid-turn outbox events (rehydration resumes from turn boundaries only)
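The split between committed and ephemeral fields can be made concrete with two structs. This is an illustrative sketch only: the field types are assumptions, `EphemeralState`, `Session`, and `rehydrate` are hypothetical names, and the actual snapshot schema is not shown in this section.

```rust
use std::path::{Path, PathBuf};

/// Fields committed at a turn boundary (types are illustrative).
#[derive(Debug, Clone, PartialEq)]
pub struct DrainSnapshot {
    pub lineage_id: String,
    pub anchor_id: String,
    pub written_at: String, // ISO 8601 timestamp
    pub model: String,
    pub tokens_used: u64,
    pub messages: Vec<String>,
    pub loop_config: String, // stands in for LoopConfig (tools, permissions, limits)
}

/// State that is never written to disk and is rebuilt on rehydration.
#[derive(Debug, Default)]
pub struct EphemeralState {
    pub pending_images: Vec<Vec<u8>>,
    pub runtime_allowed: Vec<String>,
    pub cancelled: bool,
}

pub struct Session {
    pub snapshot: DrainSnapshot,
    pub ephemeral: EphemeralState,
}

/// Derive `.ostk/drain/<lineage_id>.json`.
pub fn snapshot_path(ostk_dir: &Path, lineage_id: &str) -> PathBuf {
    ostk_dir.join("drain").join(format!("{}.json", lineage_id))
}

/// Rehydrate from a snapshot: committed fields are restored as-is,
/// ephemeral fields start empty.
pub fn rehydrate(snapshot: DrainSnapshot) -> Session {
    Session {
        snapshot,
        ephemeral: EphemeralState::default(),
    }
}
```

Keeping the ephemeral fields in a separate type means they cannot be serialized by accident, since only `DrainSnapshot` would ever be handed to the writer.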
src/kernel/drain.rs (V2 upgrade path supported at upgrade_from_v1)
Kill & Reap Protocol
Process termination distinguishes between active termination (Kill) and post-mortem state synchronization (Reap).
Kill Sequence
- Send SIGTERM to the process group (negative PID).
- Initiate a 5-second grace period for clean exit.
- Fall back to SIGKILL if the process fails to terminate.
Kill does not trigger a final drain; it is managed by the session process table.
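The escalation from SIGTERM to SIGKILL can be sketched with the signal delivery and liveness check abstracted behind closures. `Signal`, `KillOutcome`, and `kill_sequence` are hypothetical names, and polling for exit at a fixed interval is an assumption; in a real implementation the `send` closure would signal the process group (negative PID) as described above.

```rust
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Signal {
    Term,
    Kill,
}

#[derive(Debug, PartialEq, Eq)]
pub enum KillOutcome {
    ExitedCleanly,
    ForceKilled,
}

/// Send Term, wait up to `grace` for the process to exit, then fall back to Kill.
pub fn kill_sequence<S, A>(
    mut send: S,
    mut is_alive: A,
    grace: Duration,
    poll: Duration,
) -> KillOutcome
where
    S: FnMut(Signal),
    A: FnMut() -> bool,
{
    send(Signal::Term);
    let deadline = Instant::now() + grace;
    loop {
        if !is_alive() {
            return KillOutcome::ExitedCleanly;
        }
        if Instant::now() >= deadline {
            break;
        }
        thread::sleep(poll);
    }
    // Grace period expired and the process is still alive.
    send(Signal::Kill);
    KillOutcome::ForceKilled
}
```

With the documented 5-second grace period, a caller would pass `Duration::from_secs(5)` as `grace`; the closures keep the timing logic testable without signalling real processes.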
Reap Process
- Sweep the active table in agents.jsonl.
- Probe active entries using kill(pid, 0).
- For dead processes, update status to inactive and prune metadata.
Reap is triggered periodically or via ostk kernel reap, and it removes heartbeat locks.
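The reap sweep can be sketched over an in-memory table. `AgentEntry`, `Status`, and `reap` are hypothetical names, and the liveness probe is passed in as a closure; a real probe would wrap `kill(pid, 0)` as described above. Pruning metadata files and heartbeat locks is left out of this sketch.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Status {
    Active,
    Inactive,
}

/// Hypothetical in-memory view of one row in the agent table.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AgentEntry {
    pub name: String,
    pub pid: i32,
    pub status: Status,
}

/// Sweep active entries, mark dead ones inactive, and return their names.
pub fn reap<P: Fn(i32) -> bool>(table: &mut [AgentEntry], is_alive: P) -> Vec<String> {
    let mut reaped = Vec::new();
    for entry in table.iter_mut().filter(|e| e.status == Status::Active) {
        if !is_alive(entry.pid) {
            entry.status = Status::Inactive;
            reaped.push(entry.name.clone());
        }
    }
    reaped
}
```

Entries that are already inactive are skipped rather than probed again, so a recycled PID cannot bring a dead entry back to life.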
The Ephemeral Invariants
"Agents are ephemeral" is the second of the Five Foundational Laws. The concrete runtime constraints enforced by the kernel include:
State resets on daemon restart
Preload contexts, temporary tools, and in-memory session structures are generated fresh at boot. Pinning to local RAM state across restarts is forbidden.
Token accounting resets on rehydrate
The token budget for the running agent process is tracked locally in the process memory. If the process is rehydrated from a snapshot, token accounting is initialized clean from that point.
Task handles are tokio-bound
Tokio futures, file stream handles, and socket event loop primitives cannot be serialized. Rehydrated processes are initialized with fresh handles, resuming from the last messages on disk.
"State is held in ServerState and resets on daemon restart. Arrivals are not themselves persistent kernel state — the persistent record is the audit row stream, which recall @arrived can query as the canonical arrival record." — src/kernel/presence.rs