Design Proposal

Bot harness dashboard

A web interface for managing and monitoring AI bot sessions — status overview, per-bot controls, session history, and cost visibility.

Precedent

What comparable tools do

The harness today is a CLI — run a bot, get output, done. No history, no overview, no live status. Three comparable tools show what a web dashboard adds:

Railway. Each deployment is a discrete unit with status, logs, start time, and duration. The log viewer is a first-class feature. Railway's insight: deploying is an event, and events need replay [1].

Claude Code on the web. The closest direct precedent. Session list with timestamps, a conversation thread showing text and tool calls as expandable items, token usage visible per session. It doesn't hide the tool call plumbing; it treats it as useful information [2].

Linear. Not a monitoring tool, but relevant for thinking about agent sessions as work items. Each session has a prompt (title), result (description), cost (effort), and status. The mental model transfers directly: sessions are things that happened, not just log lines [3].

Audit

What the CLI can't tell you

1. No colony overview

You can't see at a glance which bots are running, which errored, and which haven't run today. The CLI only shows status for the one bot you just ran.

2. Sessions are ephemeral

CLI output vanishes when the terminal closes. There's no way to review what a bot did in a past run, audit its tool calls, or compare sessions across days.

3. Cost is invisible in aggregate

The CLI shows cost per run ($0.0024). No per-bot total, no weekly trend, no model comparison. At scale this is a financial blind spot.

4. Schedule is opaque

Cron expressions in bots.harness.json tell you the schedule, but not "when does historian run next?" or "how long ago did maexbot last run?" The CLI has no schedule awareness at runtime.

5. Tool calls are flat log lines

The CLI prints [tool] tool_name with no structure: no inputs, no outputs, no timing. Debugging a session means adding ad-hoc logging. The data model already has everything — it's just not surfaced.

6. Session resume is invisible

The agent-sdk runtime supports session resumption via resumeSessionId. But there's no UI to find and resume a past session. This capability is invisible to anyone not reading the source code.

Information Architecture

Navigation structure

The navigation is flat enough to understand at a glance. It follows the Monitor IA proposal: grouping is task-based, not product-based. There are three things you need: status (Overview), detail (Bot pages), and cost (Stats).

Sidebar mockup:
  Harness
    Overview
    maexbot (opus)
    forge (sonnet)
    historian (sonnet)
    concierge (haiku)
    curator (haiku)
    designer (opus)
    $ Stats

Overview

"Is everything running?" — Bot status grid (6 cards) answering running / idle / off at a glance, plus a feed of recent sessions across all bots for quick access.

Bot detail

Controls panel (from the bot-controls design) + session history list + config sidebar: model, runtime, schedule (cron rendered as human time), MCP servers. Clicking a session row navigates to the session viewer.
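
Rendering cron as human time could look roughly like the sketch below. It is a minimal illustration that handles only the simple daily form used in this proposal ("0 12 * * *"). The names `parseDailyCron`, `describeDaily`, and `nextDailyRun` are hypothetical, and UTC is assumed for simplicity.

```typescript
// Handles only the daily form "M H * * *" (for example "0 12 * * *").
// Anything else returns null so the UI can fall back to the raw expression.
interface DailySchedule {
  hour: number;
  minute: number;
}

function parseDailyCron(expr: string): DailySchedule | null {
  const parts = expr.trim().split(/\s+/);
  if (parts.length !== 5) return null;
  const [min, hour, dom, month, dow] = parts;
  if (dom !== "*" || month !== "*" || dow !== "*") return null;
  if (!/^\d+$/.test(min) || !/^\d+$/.test(hour)) return null;
  const schedule = { hour: Number(hour), minute: Number(min) };
  return schedule.hour < 24 && schedule.minute < 60 ? schedule : null;
}

// Zero-pad to two digits without relying on newer string helpers.
function two(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

function describeDaily(s: DailySchedule): string {
  if (s.hour === 12 && s.minute === 0) return "Daily at noon";
  if (s.hour === 0 && s.minute === 0) return "Daily at midnight";
  return "Daily at " + two(s.hour) + ":" + two(s.minute);
}

// Next firing time strictly after `now`. UTC is an assumption here; a real
// implementation would use whatever timezone the harness evaluates cron in.
function nextDailyRun(s: DailySchedule, now: Date): Date {
  const next = new Date(
    Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate(), s.hour, s.minute)
  );
  if (next.getTime() <= now.getTime()) next.setUTCDate(next.getUTCDate() + 1);
  return next;
}
```

Returning null for unsupported expressions keeps the raw cron string as the fallback display, so the sidebar never shows a wrong human-readable schedule.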

Session viewer

Full conversation replay — prompt, tool call groups (collapsed by default), assistant turns, final response. Stats bar at top: duration, cost, tokens, stop reason. Reached by clicking any session row.
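
The replay structure can be sketched as a pass over a flat block list that merges consecutive tool calls into one group per turn. The `Block`, `ToolCall`, and `ViewerItem` shapes below are assumptions made for illustration (including the `server` field); they are not the harness's actual SessionResult type.

```typescript
// Hypothetical block shapes, loosely following a text / tool_use / tool_result model.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; server: string; input: unknown }
  | { type: "tool_result"; toolUseId: string; result: unknown };

interface ToolCall {
  name: string;
  server: string;
  input: unknown;
  result?: unknown;
}

type ViewerItem =
  | { kind: "tool_group"; turn: number; calls: ToolCall[] }
  | { kind: "text"; text: string };

function toViewerItems(blocks: Block[]): ViewerItem[] {
  const items: ViewerItem[] = [];
  const callsById: Record<string, ToolCall> = {};
  let turn = 0;
  for (const block of blocks) {
    if (block.type === "text") {
      items.push({ kind: "text", text: block.text });
    } else if (block.type === "tool_use") {
      const call: ToolCall = { name: block.name, server: block.server, input: block.input };
      callsById[block.id] = call;
      const last = items[items.length - 1];
      if (last && last.kind === "tool_group") {
        last.calls.push(call); // extend the group that is still open
      } else {
        turn += 1;
        items.push({ kind: "tool_group", turn, calls: [call] });
      }
    } else {
      // tool_result: attach the result to its call; it adds no viewer item of its own.
      const call = callsById[block.toolUseId];
      if (call) call.result = block.result;
    }
  }
  return items;
}
```

Because results attach to existing calls rather than creating items, a group stays open across interleaved results and closes only when the assistant emits text.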

Stats

Cost aggregate by bot, by model, by day. The financial view the CLI can't provide — not mocked here, but shaped by the same data already in SessionStats.costUsd.
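
A rough sketch of the aggregation behind this page, assuming each stored session carries its bot, model, start time, and cost. Only `costUsd` comes from the proposal; the `SessionRecord` shape and the `totalCost` helper are hypothetical.

```typescript
// Hypothetical session record; only costUsd is named in the proposal.
interface SessionRecord {
  bot: string;
  model: string;
  startedAt: string; // ISO 8601 timestamp
  costUsd: number;
}

type GroupKey = "bot" | "model" | "day";

function keyFor(s: SessionRecord, by: GroupKey): string {
  if (by === "day") return s.startedAt.slice(0, 10); // YYYY-MM-DD prefix of the timestamp
  return by === "bot" ? s.bot : s.model;
}

// Sum in integer micro-dollars to avoid floating-point drift, then convert back.
function totalCost(sessions: SessionRecord[], by: GroupKey): Record<string, number> {
  const micros: Record<string, number> = {};
  for (const s of sessions) {
    const key = keyFor(s, by);
    micros[key] = (micros[key] ?? 0) + Math.round(s.costUsd * 1e6);
  }
  const totals: Record<string, number> = {};
  for (const key of Object.keys(micros)) {
    totals[key] = (micros[key] ?? 0) / 1e6;
  }
  return totals;
}
```

The same function serves all three views (by bot, by model, by day), which fits the idea that the Stats page is one dataset seen through different groupings.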

Screen 1 / 3

Overview

The landing page. Six bot cards answer "is everything OK?" at a glance — green dot running, amber ring idle, gray dot off. Below: a recent sessions feed for quick access across all bots.
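
One way the card state could be derived, as a minimal sketch. The `BotSummary` shape and its field names are hypothetical, and it assumes "off" means the bot is disabled; the harness may define these states differently.

```typescript
// Hypothetical shapes for illustration; the harness's real types may differ.
type BotStatus = "running" | "idle" | "off";

interface BotSummary {
  name: string;
  enabled: boolean; // assumption: disabled bots show as "off"
  activeSessionStartedAt?: Date; // set while a session is in progress
  lastSessionEndedAt?: Date; // most recent completed session, if any
}

function botStatus(bot: BotSummary): BotStatus {
  if (!bot.enabled) return "off";
  return bot.activeSessionStartedAt ? "running" : "idle";
}

// Coarse age: minutes under an hour, otherwise whole hours.
function formatAge(ms: number): string {
  const minutes = Math.floor(ms / 60000);
  return minutes < 60 ? minutes + "m" : Math.floor(minutes / 60) + "h";
}

// Card subtitle in the style of the mockup: "Running · 14m", "Idle · 2h ago", "Off".
function statusLabel(bot: BotSummary, now: Date): string {
  if (!bot.enabled) return "Off";
  if (bot.activeSessionStartedAt) {
    return "Running · " + formatAge(now.getTime() - bot.activeSessionStartedAt.getTime());
  }
  if (bot.lastSessionEndedAt) {
    return "Idle · " + formatAge(now.getTime() - bot.lastSessionEndedAt.getTime()) + " ago";
  }
  return "Idle";
}
```

Keeping `botStatus` separate from `statusLabel` lets the sidebar dots and the overview cards share one source of truth for state while formatting it differently.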

The 40-30-20-10 space rule applies here: bot status grid gets most of the vertical real estate, recent sessions fill the middle, aggregate cost sits in the header [4].

Mockup: harness.dearlarry.co
Sidebar: Harness · Overview · Bots (maexbot, forge, historian, concierge, curator, designer) · $ Stats

Overview
Today: $0.42 across 8 sessions

Bot status cards:
  maexbot (opus)      Running · 14m    $0.18 today
  forge (sonnet)      Idle · 2h ago    $0.09 today
  historian (sonnet)  Idle · 5h ago    $0.11 today
  concierge (haiku)   Off
  curator (haiku)     Off
  designer (opus)     Idle · 1h ago    $0.04 today

Recent sessions:
  Bot        Started  Duration  Cost   Stop
  maexbot    16:34    14m ↑
  designer   15:48    8m 22s    $0.04  end_turn
  forge      14:23    3m 14s    $0.09  end_turn
  maexbot    12:00    21m 07s   $0.18  end_turn
  historian  11:30    6m 45s    $0.11  end_turn

Screen 2 / 3

Bot detail

The bot-controls component (from the existing bot-controls design) sits at top-left, anchored to context. Right side: full session history list. Below the controls: schedule, MCP server count. Clicking any session row opens the viewer.

Session items with a non-end_turn stop reason get a colored left border — errors are visible without reading every row.
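
The border rule is small enough to state as a function. This is a sketch only: `SessionRow` and the CSS class name are hypothetical, and it assumes an in-progress session has no stop reason yet.

```typescript
// Hypothetical row shape; stopReason is undefined while the session is still running.
interface SessionRow {
  stopReason?: string;
}

// Returns a CSS class for the row's left border, or null for no accent.
function borderClass(row: SessionRow): string | null {
  if (row.stopReason === undefined) return null; // in progress: no verdict yet
  if (row.stopReason === "end_turn") return null; // happy path
  return "session-row--flagged"; // hypothetical class name
}
```

Treating every reason other than end_turn as flagged means a new, unrecognized stop reason is highlighted by default rather than silently passing as normal.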

Mockup: harness.dearlarry.co/bots/maexbot
Sidebar: Harness · Overview · Bots (maexbot, forge, historian, concierge, curator, designer) · $ Stats

maexbot
claude-opus-4-6 · agent-sdk · 0 12 * * *

Controls:
  Active 32% / 14m
  Enabled
  Auto-restart
  Model: opus | sonnet | haiku
  Restart
  Fresh Start

Schedule:
  0 12 * * *
  Daily at noon
  Next: tomorrow 12:00

MCP servers:
  linear  28 tools
  gmail    6 tools
  gcal     9 tools

Session history:
  When             Status      Duration  Tool calls  Cost
  Today 16:34      running     14m       12          in progress
  Today 12:00      end_turn    21m 07s   18          $0.18
  Yesterday 12:00  end_turn    19m 44s   14          $0.16
  Apr 8 12:00      end_turn    24m 12s   21          $0.21
  Apr 7 12:00      max_tokens  8m 02s    6           $0.07
  Prompt shown on every row: Do a daily standup check: review Linear issues, check calendar...

Screen 3 / 3

Session viewer

The most technically distinctive screen and the biggest value-add over the CLI. Instead of flat log lines, you get a structured replay: the prompt, then alternating tool call groups and assistant text turns, ending with the final response.

Tool calls are collapsed by default — showing the shape (how many, which MCP server) without overwhelming. Expand any call to see the full input and result. The stats bar pins the key numbers at the top where you see them first.

Mockup: harness.dearlarry.co/bots/maexbot/sessions/2026-04-10-12-00
Sidebar: Harness · Overview · Bots (maexbot, forge, historian) · $ Stats

maexbot · Today 12:00

Stats bar:
  Duration       21m 07s
  Cost           $0.18
  Tool calls     18
  Input tokens   12,400
  Output tokens  3,200
  Cache read     8,100
  Stop reason    end_turn

Prompt
  Do a daily standup check: review open Linear issues, check calendar for today, send a summary to the colony.

Turn 1: 3 tool calls
  list_issues (linear)
  gcal_list_events (gcal)
  get_issue (linear), expanded
    Input:  { "issueId": "COL-142" }
    Result: { "id": "COL-142", "title": "Fix bot schedule edge case", "state": "In Progress", "assignee": "maexbot" }

Thinking
  I've retrieved the open issues and today's calendar. I see 3 high-priority issues in "In Progress" and two meetings today at 14:00 and 16:30. Let me check one issue before drafting the summary.

Turn 2: 1 tool call
  save_comment (linear)

Response
  maexbot: Daily standup complete. I reviewed 3 open issues: COL-142 (in progress), COL-138 (blocked), COL-135 (awaiting review). Two meetings today at 14:00 and 16:30. Posted a standup comment on COL-142. Colony summary sent via Linear.

Decisions

Why this design

Research-driven
Session viewer as structured thread, not flat log. Claude Code on the web uses this exact pattern: text blocks and tool_use blocks as distinct visual items [2]. Session replay research shows structured event timelines reduce debugging time compared to flat log scrubbing [5]. The harness already captures everything in SessionResult; the viewer just surfaces it.
Research-driven
Cost visible on every surface. Developer tools that surface API costs at the point of use reduce billing surprises by making cost a natural part of the workflow rather than a post-hoc invoice item [6]. costUsd is already computed by the harness; we just propagate it to every list row, every card, and the overview total.
Research-validated
Tool calls collapsed by default. Design instinct: 18 expanded tool calls would dominate the session viewer and bury the actual response. Backed by progressive disclosure: "grasp the available breadth first, then focus on the area of interest" [4]. Collapsed calls show the shape (how many, which server) without the noise.
Research-validated
Status dots in the sidebar, not just on the overview. Design instinct: while navigating to a specific bot you should still see colony health peripherally. Railway validates this in their deployment sidebar: per-item status in the nav is standard for infra dashboards [1].
Design instinct
Schedule shows human-readable time, not raw cron. "0 12 * * *" is verifiable but not immediately useful. "Daily at noon · Next: tomorrow 12:00" is. The raw cron stays visible below for verification. No research — information hierarchy.
Design instinct
MCPs as a separate list below controls. The bot-controls design is about runtime state: is it running? Enable it. Restart it. MCP server configuration is context, not a control. Mixing them creates cognitive load on the controls panel. A separate "MCP servers" card keeps each section to one concern.
Design instinct
Non-end_turn stop reasons get a red left border. end_turn is the happy path. max_tokens and refusal indicate sessions that didn't complete as intended. A colored left border on the session list item surfaces problems at a glance, without coloring the whole row (which would be alarming for something recoverable).
Design instinct
Stats bar pins above the thread, not below. You reach the session viewer because you want to know what happened. The stats bar answers the quick questions (duration, cost, stop reason) before you read the thread. It's a skimmable summary that earns the detail below it.
Confidence

What we know vs what we're guessing

High confidence: The session viewer is the right centerpiece. Claude Code's web interface directly validates the pattern — same data model (text + tool_use + tool_result blocks), same structured viewer. We're not inventing anything here, we're applying a proven pattern to the harness data model.

High confidence: Cost visibility belongs on every surface. It's already computed. Surfacing it is free, and the alternative (aggregating costs manually from CLI output) doesn't scale.

Medium confidence: The sidebar IA (Overview → Bots → Stats) is reasonable but untested. With 6 bots today the list is short. If the colony grows significantly, collapsible groups or a search input may become necessary — but adding them now would be premature.

Speculative: The Stats page shape is obvious (per-bot, per-day cost chart, per-model breakdown) but not mocked here. I'd want to see real multi-week usage data before designing the exact layout. The interesting chart is whichever one emerges from actual usage patterns, not a generic cost graph.

Out of scope: Live streaming of in-progress sessions. The current harness is synchronous — runSession and runAgentSession both return when complete. A live thread view would require a streaming API layer that doesn't exist yet. The overview mockup shows a running session with a "14m ↑" indicator via polling, but full live streaming is a separate project.

Sources

References