Claude Model Face-Off
Watch Haiku vs Sonnet race in real time — same prompt, simultaneous streams
Simultaneous dual streams
Real-time TTFT measurement
Tokens/second display
Winner comparison banner
01 The Problem
Model selection is usually a judgment call backed by vibes. Which is faster? Which is smarter? By how much? The tradeoffs between Haiku (fast, cheap) and Sonnet (powerful, slower) are often described in benchmarks, but benchmarks feel abstract. I wanted to make the tradeoffs visible and interactive — same prompt, two models, race them head to head.
02 The Approach
The client fires two fetch requests simultaneously — one to /api/stream?model=haiku, one to /api/stream?model=sonnet. Both API routes proxy SSE from the Anthropic API using raw fetch (not the SDK, which has Vercel streaming issues). Timing is measured client-side: TTFT is the ms from fetch start to the first text delta event; total time is ms to the done event. Token counts come from Anthropic's message_start and message_delta events.
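The timing loop described above can be sketched as follows. This is an assumed shape, not the demo's exact code: the stream is abstracted as an async iterable of simplified events, so the TTFT and total-time measurements are visible without any network plumbing.

```typescript
// Sketch of the client-side timing loop (assumed shape, not the exact
// demo code). TTFT is the gap between stream start and the first "text"
// event; total time runs to the "done" event.
async function timeStream(events: AsyncIterable<{ type: string }>) {
  const start = Date.now();
  let ttftMs: number | null = null;
  let totalMs = 0;
  for await (const ev of events) {
    // First text delta fixes TTFT; later text events leave it untouched.
    if (ev.type === "text" && ttftMs === null) ttftMs = Date.now() - start;
    if (ev.type === "done") totalMs = Date.now() - start;
  }
  return { ttftMs, totalMs };
}
```

In the real demo the iterable would wrap the `ReadableStream` returned by `fetch`, but the measurement logic is the same.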
03 Architecture Decisions
Parallel client-side streams
Both model streams start at exactly the same time from the client — a Promise.all over two fetch calls. This means timing reflects real API behavior: if Haiku gets its first token in 180ms and Sonnet in 450ms, you see both numbers simultaneously. There's no sequential bias from one request blocking another.
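A minimal sketch of the parallel-start pattern (names are illustrative, and the streams are simulated with timers): `Promise.all` creates both promises before awaiting either, so the slower model never delays the faster one.

```typescript
type Result = { model: string; ttftMs: number };

// Simulated stream: resolves with a TTFT measured against a shared start time.
function fakeStream(model: string, firstTokenDelay: number, start: number): Promise<Result> {
  return new Promise((resolve) =>
    setTimeout(() => resolve({ model, ttftMs: Date.now() - start }), firstTokenDelay)
  );
}

async function raceModels(): Promise<Result[]> {
  const start = Date.now();
  // Both timers start here, before any await — no sequential bias.
  return Promise.all([
    fakeStream("haiku", 50, start),
    fakeStream("sonnet", 120, start),
  ]);
}
```

Total wall time is roughly max(50, 120) ms rather than their sum, which is the whole point of starting both from the client in the same tick.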
Raw fetch SSE proxy — not the Anthropic SDK
The Anthropic SDK v0.37.0 has a compatibility issue with Vercel's serverless streaming (authentication resolves incorrectly at runtime). Raw fetch avoids this entirely: POST to api.anthropic.com/v1/messages with stream:true, pipe the SSE response through a ReadableStream, parse content_block_delta events, emit our own simplified events (text, done, error) to the client.
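The event-mapping step can be sketched as a pure function over one SSE `data:` line. The Anthropic event names (`content_block_delta`, `text_delta`, `message_stop`) are as documented for the Messages streaming API; the simplified output shape (`{ type, text }`) is this demo's own convention, assumed here.

```typescript
type ClientEvent = { type: "text"; text: string } | { type: "done" } | null;

// Map one upstream SSE "data:" line to the simplified event the client consumes.
// Lines that aren't data payloads, or events we don't relay, map to null.
function mapSseLine(line: string): ClientEvent {
  if (!line.startsWith("data:")) return null;
  const payload = JSON.parse(line.slice(5).trim());
  if (payload.type === "content_block_delta" && payload.delta?.type === "text_delta") {
    return { type: "text", text: payload.delta.text };
  }
  if (payload.type === "message_stop") return { type: "done" };
  return null;
}
```

In the proxy route this would run inside the `ReadableStream` pump, with the returned events re-serialized to the client; error handling is omitted for brevity.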
Client-side timing with TTFT
Time to first token (TTFT) is measured from the fetch start to the first text event, client-side. This captures real perceived latency including network time. Total time is tracked to the done event. Tokens-per-second is derived: output_tokens / (totalTimeMs / 1000). All three metrics are displayed in real time in each panel's stats bar.
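The derivation is trivial but worth pinning down; a sketch with illustrative field names:

```typescript
// Derive the three displayed metrics from raw measurements.
// outputTokens comes from Anthropic's message_delta usage data;
// the timing values are measured client-side.
function deriveMetrics(ttftMs: number, totalMs: number, outputTokens: number) {
  return {
    ttftMs,
    totalMs,
    tokensPerSecond: outputTokens / (totalMs / 1000),
  };
}
```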
Winner banner with qualitative framing
When both streams complete, a winner banner appears comparing the results. But raw speed isn't the only axis — if Sonnet generated significantly more output tokens for the same prompt, the banner notes that depth tradeoff. The goal is insight, not just a leaderboard.
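The banner logic might look like the following hypothetical sketch: pick the winner on total time, but flag the depth tradeoff when the slower model produced meaningfully more output (the 1.5× threshold is an assumption, not the demo's actual cutoff).

```typescript
type ModelRun = { model: string; totalMs: number; outputTokens: number };

// Compare two completed runs: fastest total time wins, but note when
// the slower model traded latency for substantially more output.
function compareRuns(a: ModelRun, b: ModelRun) {
  const [fast, slow] = a.totalMs <= b.totalMs ? [a, b] : [b, a];
  const depthNote =
    slow.outputTokens > fast.outputTokens * 1.5
      ? `${slow.model} generated ${slow.outputTokens - fast.outputTokens} more tokens`
      : null;
  return { winner: fast.model, marginMs: slow.totalMs - fast.totalMs, depthNote };
}
```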
04 Key Insight
TTFT is the most important production metric for user-facing AI features, but most people don't think about it explicitly. When Haiku streams its first token in 180ms and Sonnet takes 450ms, the user's experience is dramatically different — one feels instant, one feels like loading. But for a task requiring nuanced reasoning, Sonnet's extra 270ms may be worth it. Making this tradeoff visible changes how engineers think about model selection.
05 Why It Matters
A portfolio demo that's also a genuine engineering tool — useful for anyone deciding between Claude models for a new feature. Demonstrates streaming architecture, parallel async patterns, SSE proxying, and client-side timing instrumentation. The kind of demo that earns a second look from an engineering interviewer.