What needs work
The tool works well functionally. Two classifiers, side-by-side results, 66-example benchmark that auto-runs. The issues are visual consistency and information hierarchy — the page reads like a prototype that grew organically.
Instrument Serif for headlines, IBM Plex for body, IBM Plex Mono for labels, Space Mono loaded but unused. Four font families is two too many for a single-page tool. The serif gives it an editorial feel that clashes with the technical content.
The title "Direct vs Agentic" is set in Instrument Serif at 48px — elegant but not assertive. The "INTENT CLASSIFIER" super-label at 11px mono disappears. The whole header section feels like a blog post, not a tool.
The two-column Heuristic/NLI info floats without a card or visual boundary. At a glance it's hard to tell this is a discrete section. The badges (CLIENT-SIDE, SERVER-SIDE) are the only visual anchors.
The result intent word ("agentic" / "direct") at 32px serif is soft. The confidence bars are 4px thin with no labels explaining what they mean to a new user. The gradient top-bar is subtle to the point of invisible.
66 rows across three expandable groups. Each row shows prompt text + expected + two results, but the results are tiny colored pills. The "H: 100% NLI: 100%" stat line per group is cramped and hard to parse.
The page uses its own color tokens (const C) with no overlap with the colony design system. Space Grotesk + Space Mono are the colony standard; this page uses IBM Plex + Instrument Serif. Feels like a different project.
Consolidate to two families
Drop Instrument Serif and IBM Plex entirely. Use Space Grotesk for all display and body text, Space Mono for labels, data, and technical content. This aligns with the colony design system and gives the tool a crisper, more technical feel.
The result intent word loses the serif elegance but gains legibility and coherence. The title trades height (48px → ~43px) for weight — Space Grotesk 700 with tight letter-spacing hits harder than the wispy serif.
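A minimal sketch of what the two-family setup could look like as inline-style tokens. The names `FONT`, `titleStyle`, and `labelStyle` are hypothetical and not taken from app.jsx, and the letter-spacing and line-height values are assumptions; only the families, the ~43px size, and the 700 weight come from the proposal.

```javascript
// Hypothetical font tokens: two families only, per the proposal.
const FONT = {
  sans: "'Space Grotesk', system-ui, sans-serif", // display and body text
  mono: "'Space Mono', ui-monospace, monospace",  // labels, data, technical content
};

// Title: shorter than the current 48px serif, but heavier and tighter.
const titleStyle = {
  fontFamily: FONT.sans,
  fontSize: 43,             // "~43px" in the proposal; React treats numbers as px
  fontWeight: 700,
  letterSpacing: "-0.02em", // assumed value for "tight letter-spacing"
  lineHeight: 1.1,          // assumed
};

// Super-label and other small technical labels.
const labelStyle = {
  fontFamily: FONT.mono,
  textTransform: "uppercase",
  letterSpacing: "0.08em",  // assumed
};
```

Routing every `fontFamily` through one token object means the swap from IBM Plex and Instrument Serif is a single find-and-replace per style, and any stray family left behind is easy to spot.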
Keep the palette, tighten the system
The rose/indigo pair for direct/agentic is strong and well-separated. The green/red benchmark pass/fail is standard. The issue isn't the hues — it's how they're applied. Too many near-identical grays with no clear hierarchy.
Proposed change: Collapse the 7 text grays (#111827 through #9ca3af) into 4 clear levels. Cut textFaint and textMedium, and fold textStrong (#1f2937) into text (#111827), since the visual difference between the two is negligible on screens. Use textMuted (#6b7280) for secondary content and text for everything primary.
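One way to stage the gray collapse is a small alias map from retired tokens to surviving ones, sketched below. `C_TEXT`, `RETIRED`, and `resolveTextColor` are hypothetical names. Only the hex values quoted above are used, the remaining entries of `const C` are not shown in this proposal and are left out, and mapping textStrong to text is an assumption based on the "negligible difference" remark.

```javascript
// Surviving text tokens with the hex values quoted in the proposal.
// The other levels kept in const C are not listed in the source.
const C_TEXT = {
  text: "#111827",      // everything primary
  textMuted: "#6b7280", // secondary content
};

// Retired token -> surviving token (assumed mapping, see lead-in).
const RETIRED = {
  textStrong: "text",
  textMedium: "textMuted",
  textFaint: "textMuted",
};

// Resolve a token name to a hex color, following aliases for retired names.
function resolveTextColor(token) {
  const key = RETIRED[token] ?? token;
  const value = C_TEXT[key];
  if (value === undefined) {
    throw new Error(`Unknown text token: ${token}`);
  }
  return value;
}
```

With a shim like this, existing `C.textFaint`-style references could be migrated gradually and the alias map deleted once nothing resolves through it.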
Sharpen the sections
The page flows well top-to-bottom but the sections bleed into each other. Each section should be a visually distinct unit with clear boundaries.
Proposed direction
Static mockup showing the proposed typography, layout, and color treatment. Not interactive — just enough to evaluate the direction before implementation.
Heuristic (CLIENT-SIDE)
- Scores prompts against handcrafted rules
- Instant — runs in your browser
- No model, no server needed
NLI (SERVER-SIDE)
- Trained on 66 labeled examples
- Understands meaning, not just keywords
- ~5ms on GPU
Key differences from current: Space Grotesk throughout, info sections in cards, thicker accent bars on results, higher-contrast labels. The overall density is similar — this isn't about adding whitespace, it's about making the existing space work harder through typography and containment.
Scope and approach
This is a CSS/font pass on a single JSX file. No structural React changes, no new components, no API changes. The inline styles in app.jsx would be updated in place.
- Reduce const C text grays from 7 to 4. Keep all intent colors (rose, indigo) and semantic colors (green, red) unchanged.
- Give the info sections the same background: C.cardBg, border, borderRadius as result cards.
- Change the result accent bar from linear-gradient(90deg, transparent, accent, transparent) to solid background: accent, and increase its height from 3px to 4px.
Estimated changes: ~40 lines in const C, ~30 lines of font-family swaps, ~20 lines of layout tweaks. The benchmark section and signal decomposition don't need structural changes; they just inherit the new fonts and tightened grays.
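The card and accent-bar changes might look roughly like this as inline styles. The `C` object here is a placeholder with made-up values for `cardBg` and `border`, and `cardStyle`, `accentBarBefore`, and `accentBarAfter` are hypothetical names; only the gradient string, the solid background, and the 3px to 4px change come from the proposal.

```javascript
// Placeholder tokens: these values are assumptions, not the real const C.
const C = { cardBg: "#ffffff", border: "#e5e7eb" };

// Shared container style, so info sections and result cards match.
const cardStyle = {
  background: C.cardBg,
  border: `1px solid ${C.border}`, // border width is assumed
  borderRadius: 8,                 // radius is assumed
};

// Current accent bar: 3px gradient that fades to transparent at both ends.
const accentBarBefore = (accent) => ({
  height: 3,
  background: `linear-gradient(90deg, transparent, ${accent}, transparent)`,
});

// Proposed accent bar: solid color, 4px tall.
const accentBarAfter = (accent) => ({
  height: 4,
  background: accent,
});
```

Passing the accent in as an argument keeps the rose/indigo intent colors untouched; only the bar's shape changes.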
One pass, cohesive output
This proposal is a visual consistency pass, not a redesign. The tool's structure, interactions, and information architecture are already good. The changes are:
- Typography: 4 families → 2 (Space Grotesk + Space Mono)
- Colors: 7 text grays → 4, intent colors unchanged
- Layout: Info sections get cards, result bars get thickened, confidence bars get taller
- Details: Full-width button → disclosure link
- Alignment: Matches colony design system (Space Grotesk / Space Mono / paper palette)
The result should feel like it belongs to the same family as the other dearlarry.co tools without losing the specialized character of a comparison tool.