What comparable tools do
The harness today is a CLI — run a bot, get output, done. No history, no overview, no live status. Three comparable tools show what a web dashboard adds:
Railway. Each deployment is a discrete unit with status, logs, start time, and duration. The log viewer is a first-class feature. Railway's insight: deploying is an event, and events need replay.1
Claude Code on the web. The closest direct precedent. Session list with timestamps, a conversation thread showing text and tool calls as expandable items, token usage visible per session. It doesn't hide the tool call plumbing — it treats it as useful information.2
Linear. Not a monitoring tool, but relevant for thinking about agent sessions as work items. Each session has a prompt (title), result (description), cost (effort), and status. The mental model transfers directly — sessions are things that happened, not just log lines.3
What the CLI can't tell you
You can't see at a glance which bots are running, which errored, and which haven't run today. The CLI only shows status for the one bot you just ran.
CLI output vanishes when the terminal closes. There's no way to review what a bot did in a past run, audit its tool calls, or compare sessions across days.
The CLI shows cost per run ($0.0024). No per-bot total, no weekly trend, no model comparison. At scale this is a financial blind spot.
Cron expressions in bots.harness.json tell you the schedule, but not "when does historian run next?" or "how long ago did maexbot last run?" The CLI has no schedule awareness at runtime.
The CLI prints [tool] tool_name with no structure: no inputs, no outputs, no timing. Debugging a session means adding ad-hoc logging. The data model already has everything — it's just not surfaced.
The agent-sdk runtime supports session resumption via resumeSessionId. But there's no UI to find and resume a past session. This capability is invisible to anyone not reading the source code.
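The schedule-awareness gap above ("how long ago did maexbot last run?") is mostly a formatting problem once a last-run timestamp is stored. A minimal sketch, assuming timestamps are kept as epoch milliseconds; `formatAgo` is a hypothetical helper, not part of the harness:

```typescript
// Hypothetical helper: render "how long ago did this bot last run?" as a short label.
// Assumes epoch-millisecond timestamps; the harness's real storage format is not specified.
function formatAgo(lastRunMs: number | null, nowMs: number): string {
  if (lastRunMs === null) return "never";
  // Clamp to zero so clock skew never produces a negative age.
  const seconds = Math.max(0, Math.floor((nowMs - lastRunMs) / 1000));
  if (seconds < 60) return `${seconds}s ago`;
  const minutes = Math.floor(seconds / 60);
  if (minutes < 60) return `${minutes}m ago`;
  const hours = Math.floor(minutes / 60);
  if (hours < 24) return `${hours}h ago`;
  return `${Math.floor(hours / 24)}d ago`;
}
```

Passing `nowMs` in explicitly (for example `formatAgo(lastRun, Date.now())`) keeps the function pure, so the same code can label a bot card and a session row.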
Navigation structure
Flat enough to understand at a glance. It follows the Monitor IA proposal: task-based grouping, not product-based. Three things you need: status (Overview), detail (Bot pages), and cost (Stats).
Overview. "Is everything running?" A bot status grid (6 cards) answers running / idle / off at a glance, plus a feed of recent sessions across all bots for quick access.
Bot detail. Controls panel (from the bot-controls design) + session history list + config sidebar: model, runtime, schedule (cron rendered as human time), MCP servers. Clicking a session row navigates to the session viewer.
Session viewer. Full conversation replay: prompt, tool call groups (collapsed by default), assistant turns, final response. Stats bar at top: duration, cost, tokens, stop reason. Reached by clicking any session row.
Stats. Cost aggregated by bot, by model, and by day. The financial view the CLI can't provide; not mocked here, but shaped by the same data already in SessionStats.costUsd.
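The by-bot, by-model, and by-day rollups can all be the same grouping operation with a different key. A minimal sketch: only `costUsd` comes from the source; the `SessionRecord` shape and the fields `botId`, `model`, and `startedAt` are assumptions for illustration.

```typescript
// Assumed record shape: only costUsd is named in the source (SessionStats.costUsd).
// botId, model, and startedAt are hypothetical fields.
interface SessionRecord {
  botId: string;
  model: string;
  startedAt: string; // ISO 8601 timestamp in UTC, e.g. "2025-01-06T09:30:00Z"
  costUsd: number;
}

type GroupKey = (s: SessionRecord) => string;

// Sum costUsd per group, where the group is chosen by the key function.
function sumCostBy(sessions: SessionRecord[], key: GroupKey): Map<string, number> {
  const totals = new Map<string, number>();
  for (const s of sessions) {
    const k = key(s);
    totals.set(k, (totals.get(k) ?? 0) + s.costUsd);
  }
  return totals;
}

const byBot: GroupKey = (s) => s.botId;
const byModel: GroupKey = (s) => s.model;
// The first 10 characters of an ISO timestamp are the UTC calendar day.
const byDay: GroupKey = (s) => s.startedAt.slice(0, 10);
```

Usage would look like `sumCostBy(sessions, byBot)` for the per-bot totals and `sumCostBy(sessions, byDay)` for the daily trend. Grouping by local day rather than UTC day would need a different key function.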
Overview
The landing page. Six bot cards answer "is everything OK?" at a glance — green dot running, amber ring idle, gray dot off. Below: a recent sessions feed for quick access across all bots.
The 40-30-20-10 space rule applies here: bot status grid gets most of the vertical real estate, recent sessions fill the middle, aggregate cost sits in the header.4
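The three card states can be derived from very little per-bot data. A minimal sketch, assuming the harness can report whether a bot is enabled and whether it has a session in progress; `BotState`, `enabled`, and `activeSessionId` are hypothetical names, and treating a disabled bot as off even if a session is still active is a choice made here, not something the design specifies.

```typescript
type BotStatus = "running" | "idle" | "off";

// Hypothetical inputs; the source does not say how the harness tracks these.
interface BotState {
  enabled: boolean;
  activeSessionId: string | null;
}

// Disabled bots are "off"; enabled bots are "running" while a session is active.
function botStatus(bot: BotState): BotStatus {
  if (!bot.enabled) return "off";
  return bot.activeSessionId !== null ? "running" : "idle";
}

// Indicator styles as described for the Overview cards.
const indicator: Record<BotStatus, string> = {
  running: "green dot",
  idle: "amber ring",
  off: "gray dot",
};
```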
Bot detail
The bot-controls component (from the existing bot-controls design) sits at top-left, anchored to context. Right side: full session history list. Below the controls: schedule, MCP server count. Clicking any session row opens the viewer.
Session items with a non-end_turn stop reason get a colored left border — errors are visible without reading every row.
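In code, the border rule reduces to a single comparison against `end_turn`. A minimal sketch; the CSS class names are hypothetical, and any stop reason other than `end_turn` is treated as a problem:

```typescript
// Only end_turn counts as the happy path; every other stop reason
// (for example max_tokens or refusal) gets the highlighted left border.
// Class names are hypothetical.
function sessionRowClass(stopReason: string): string {
  return stopReason === "end_turn"
    ? "session-row"
    : "session-row session-row--problem";
}
```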
Session viewer
The most technically distinctive screen and the biggest value-add over the CLI. Instead of flat log lines, you get a structured replay: the prompt, then alternating tool call groups and assistant text turns, ending with the final response.
Tool calls are collapsed by default — showing the shape (how many, which MCP server) without overwhelming. Expand any call to see the full input and result. The stats bar pins the key numbers at the top where you see them first.
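The structured replay amounts to folding a flat list of blocks into alternating text turns and tool call groups. A minimal sketch, assuming the text / tool_use / tool_result block kinds the data model is described as having; the field names (`id`, `name`, `input`, `toolUseId`, `content`) and the `ViewItem` type are assumptions for illustration, not the harness's actual types.

```typescript
// Block kinds follow the text + tool_use + tool_result model; field names are assumed.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; toolUseId: string; content: string };

interface ToolCall {
  name: string;
  input: unknown;
  result: string | null; // null until a matching tool_result is seen
}

type ViewItem =
  | { kind: "text"; text: string }
  | { kind: "tools"; calls: ToolCall[]; collapsed: boolean };

function toViewItems(blocks: Block[]): ViewItem[] {
  const items: ViewItem[] = [];
  const pending = new Map<string, ToolCall>(); // tool_use id -> call awaiting its result
  for (const block of blocks) {
    if (block.type === "text") {
      items.push({ kind: "text", text: block.text });
      continue;
    }
    if (block.type === "tool_use") {
      const call: ToolCall = { name: block.name, input: block.input, result: null };
      pending.set(block.id, call);
      // Consecutive tool calls join the current group; otherwise start a new one.
      const last = items[items.length - 1];
      if (last !== undefined && last.kind === "tools") {
        last.calls.push(call);
      } else {
        items.push({ kind: "tools", calls: [call], collapsed: true });
      }
      continue;
    }
    // tool_result: attach to the call that produced it, if one is known.
    const matched = pending.get(block.toolUseId);
    if (matched !== undefined) matched.result = block.content;
  }
  return items;
}
```

Each group starts with `collapsed: true`, so the list shows only how many calls ran until the reader expands one. Grouping by MCP server would need a server name on each call, which this sketch does not model.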
Why this design
SessionResult: the viewer just surfaces it.
costUsd is already computed by the harness. We just propagate it to every list row, every card, and the overview total.
Non-end_turn stop reasons get a red left border. end_turn is the happy path. max_tokens and refusal indicate sessions that didn't complete as intended. A colored left border on the session list item surfaces problems at a glance, without coloring the whole row (which would be alarming for something recoverable).
What we know vs what we're guessing
High confidence: The session viewer is the right centerpiece. Claude Code's web interface directly validates the pattern — same data model (text + tool_use + tool_result blocks), same structured viewer. We're not inventing anything here, we're applying a proven pattern to the harness data model.
High confidence: Cost visibility belongs on every surface. It's already computed. Surfacing it is free, and the alternative (aggregating costs manually from CLI output) doesn't scale.
Medium confidence: The sidebar IA (Overview → Bots → Stats) is reasonable but untested. With 6 bots today the list is short. If the colony grows significantly, collapsible groups or a search input may become necessary — but adding them now would be premature.
Speculative: The Stats page shape is obvious (per-bot, per-day cost chart, per-model breakdown) but not mocked here. I'd want to see real multi-week usage data before designing the exact layout — the interesting chart is whatever one emerges from actual patterns, not a generic cost graph.
Out of scope: Live streaming of in-progress sessions. The current harness is synchronous — runSession and runAgentSession both return when complete. A live thread view would require a streaming API layer that doesn't exist yet. The overview mockup shows a running session with a "14m ↑" indicator via polling, but full live streaming is a separate project.