
Voice AI Demo

A voice AI assistant that proves browsers are more powerful than you think

Built in one session (March 2, 2026)
AI · Voice · Web Speech API · Streaming · Browser APIs · Claude · Next.js

<1s perceived latency

Browser-native STT + TTS

Full conversation context

Streaming sentence-by-sentence synthesis

01. The Problem

Voice AI in the browser is underexplored — most developers reach for Deepgram or Whisper immediately. I wanted to understand what the native browser APIs actually give you, build the full voice pipeline end-to-end, and understand where the real latency bottlenecks are (spoiler: not the LLM).

02. The Approach

The Web Speech API handles both capture (SpeechRecognition) and playback (SpeechSynthesis) entirely in the browser. Claude Haiku, called via a streaming fetch, handles generation. The key insight: start TTS synthesis as each sentence arrives, not after the full response, to reach sub-one-second perceived latency. The LLM call took 30 minutes to build; the voice pipeline took 4 hours.

03. Architecture Decisions

Pipeline streaming for low latency

The response streams from Claude → sentence boundary detection → SpeechSynthesis queue. Users hear the first sentence while Claude is still generating the second. Perceived response time drops from 2-3 seconds (waiting for the full response) to under 1 second (time to first sentence).
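The streaming split can be sketched as a small buffer loop: text chunks accumulate, and whenever a sentence-ending punctuation mark followed by whitespace appears, the complete sentence is handed off immediately. This is an illustrative sketch, not the demo's actual code; `streamSentences` and its boundary regex are assumptions, and in the browser `speak` would enqueue a `SpeechSynthesisUtterance`.

```typescript
// Hypothetical sketch: split a streaming LLM response into sentences
// and hand each complete sentence to speak() as soon as it arrives.
async function streamSentences(
  chunks: AsyncIterable<string> | Iterable<string>,
  speak: (sentence: string) => void,
): Promise<void> {
  let buffer = "";
  // A sentence ends at ., !, or ? (optionally followed by a closing
  // quote/bracket) and then whitespace.
  const boundary = /([.!?]["')\]]?)\s+/;
  for await (const chunk of chunks) {
    buffer += chunk;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(buffer)) !== null) {
      const end = match.index + match[1].length;
      speak(buffer.slice(0, end).trim()); // flush the finished sentence
      buffer = buffer.slice(end).trimStart(); // keep the partial remainder
    }
  }
  const rest = buffer.trim();
  if (rest) speak(rest); // flush the trailing partial sentence
}
```

Because the boundary check runs on every chunk, the first sentence is released as soon as its closing punctuation streams in, regardless of how much of the response is still pending.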

System prompt tuned for speech

Standard LLM responses are optimized for reading — long paragraphs, bullet points, markdown formatting. The system prompt constrains responses to at most three sentences of conversational language. This single constraint changes the character of responses from 'chatbot writing' to 'person talking'.
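A prompt in that spirit might look like the following. The wording here is illustrative, not the demo's actual prompt text:

```typescript
// Hypothetical system prompt tuned for spoken output: short replies,
// no formatting that a TTS voice would read aloud awkwardly.
const SPEECH_SYSTEM_PROMPT = [
  "You are a voice assistant. Your replies are spoken aloud, not read.",
  "Answer in at most three short sentences.",
  "Use plain conversational language: no markdown, no bullet points,",
  "no headings, and no code blocks.",
].join(" ");
```

Keeping the prompt about the output channel ("spoken aloud, not read") rather than just a length cap is what pushes the model toward conversational phrasing instead of compressed chatbot prose.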

Graceful browser compatibility handling

SpeechRecognition is only reliably available in Chromium browsers (Chrome, Edge). The demo detects API availability on load and shows a clear compatibility warning on unsupported browsers instead of failing silently.
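The load-time check can be a one-line capability probe. A minimal sketch, assuming the function name; Chromium exposes the API under the `webkitSpeechRecognition` prefix, so both names are probed, and the check returns false outside a browser (e.g. during Next.js server rendering):

```typescript
// Probe for the Speech Recognition API on whatever global object
// exists (window in the browser, globalThis elsewhere).
function speechRecognitionSupported(): boolean {
  const g = globalThis as unknown as Record<string, unknown>;
  return Boolean(g["SpeechRecognition"] ?? g["webkitSpeechRecognition"]);
}
```

Running the probe in a mount-time effect (rather than at module scope) keeps it compatible with server-side rendering, where neither global is defined.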

04. Key Insight

The bottleneck in voice AI isn't the LLM — it's the voice pipeline. Silence detection timing, sentence boundary detection for streaming TTS, connection recovery when recognition cuts out, and voice selection UX all took longer than the Claude integration. Building voice AI reveals where the real complexity lives.

05. Why It Matters

Demonstrates the full voice AI stack — the same problem space companies like Giga ($61M Series A, Vancouver) are solving at enterprise scale. Shows understanding of where the hard engineering problems are: latency, voice tuning, and browser compatibility — not just LLM integration.