I added an AI chat to my portfolio. You can ask it anything about my background. Here's exactly how it works — and what I learned about building streamed AI responses in Next.js.
Why an AI chat on a portfolio site
Most portfolio sites are one-directional. You read what I decided to write. If you want to know something specific — do I have experience with a particular API, how did I approach a specific problem, why did I make a certain architectural decision — you'd have to email me and wait.
An AI chat solves this. I've given it full context: every project, the technical decisions behind each one, my research background, what roles I'm looking for. A recruiter or hiring engineer can ask the specific question they have, right now, without the round-trip.
It also demonstrates exactly the thing it claims about me — that I build working AI features, not just demos.
The architecture
Three components: a system prompt, an API route, and a streaming chat UI.
The system prompt
The system prompt is a compressed version of my background. Not my resume — something more useful. It covers:
- Background narrative (sports science → AI engineering)
- Every project with its tech stack and key decisions
- Research publications with titles and PubMed IDs
- Current work status and what roles I'm looking for
- Response guidelines (be direct, go technical if asked, don't make things up)
The guidelines section matters. Without them, the AI would happily invent impressive-sounding project details. With them, it stays grounded. If someone asks about something not in the context, it says it doesn't know and suggests they email me.
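To make the shape concrete, here's a minimal sketch of how such a prompt might be assembled. The section names and wording are illustrative — this is not the actual prompt, just the structure it describes:

```typescript
// Illustrative structure only; the real prompt is specific to the
// portfolio and much longer.
const SYSTEM_PROMPT = `
You are an assistant on Harrison's portfolio site. Answer questions
about his background using ONLY the context below.

## Background
Sports science -> AI engineering. ...

## Projects
- Project A: Next.js + Anthropic API. Key decision: streaming over polling.

## Guidelines
- Be direct; go technical if asked.
- If the answer is not in this context, say you don't know and
  suggest emailing Harrison instead of guessing.
`.trim();
```

Keeping the guidelines as the final section also tends to help: instructions near the end of a long prompt are less likely to be drowned out by the factual content above them.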
The API route
The server-side route is a Next.js API handler that streams the response from Claude Haiku:
```ts
// app/api/chat/route.ts
import Anthropic from "@anthropic-ai/sdk";

export async function POST(request: Request) {
  const { messages } = await request.json();
  const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
  const encoder = new TextEncoder();

  const stream = new ReadableStream({
    async start(controller) {
      const anthropicStream = await client.messages.stream({
        model: "claude-haiku-4-5",
        max_tokens: 1024,
        system: SYSTEM_PROMPT,
        messages,
      });

      for await (const chunk of anthropicStream) {
        // Only text deltas carry a .text payload
        if (chunk.type === "content_block_delta" && chunk.delta.type === "text_delta") {
          controller.enqueue(encoder.encode(chunk.delta.text));
        }
      }
      controller.close();
    },
  });

  return new Response(stream, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
```

The key is returning a ReadableStream as the response body, not a JSON object. The client reads it incrementally using body.getReader(), and each chunk updates the UI in real time.
The streaming client
On the client, the loop looks like this:
```ts
// Send the full conversation so far; nextMessages is the chat state
// including the user's newest message.
const res = await fetch("/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ messages: nextMessages }),
});

const reader = res.body!.getReader();
const decoder = new TextDecoder();
let accumulated = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  accumulated += decoder.decode(value, { stream: true });
  setStreamingContent(accumulated); // re-renders on each chunk
}

// Finalize message
setMessages(prev => [...prev, { role: "assistant", content: accumulated }]);
setStreamingContent("");
```

The trick is maintaining two pieces of state: streamingContent for the in-progress message, and messages for completed ones. The streaming message renders with a blinking cursor; when the stream ends, it moves to the messages array and the cursor disappears.
Model choice: Haiku over Sonnet
Claude Haiku is 25x cheaper than Sonnet and meaningfully faster. For a portfolio chat where the questions are about career history and projects (not complex reasoning), Haiku is the right call. The context window is sufficient; the quality is more than adequate. Sonnet would add latency and cost without visible benefit for this use case.
The lesson generalizes: model choice should match task complexity. Haiku for extraction and summarization, Sonnet for reasoning under uncertainty, Opus for genuinely difficult tasks. Using the largest model by default is a tax on latency and cost that usually doesn't buy anything.
Conversation history
The messages array grows with each turn. On submit, the full conversation history is sent to the API, which passes it to Claude as the messages array. This is how multi-turn chat works with stateless LLMs — the client holds the state, the server sees only what it's given.
There's a subtle implication: as conversations get longer, API calls get more expensive and slower. For a portfolio site this doesn't matter (conversations rarely go beyond 6-8 turns), but for a production chat interface you'd want a windowing strategy — keep the last N messages, or summarize the earlier ones.
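A sketch of the "keep the last N messages" option — the function name and types are mine, not from the actual codebase. One wrinkle worth handling: the Anthropic API expects the conversation to start with a user message, so if the cut lands on an assistant message, drop it:

```typescript
type ChatMessage = { role: "user" | "assistant"; content: string };

// Keep only the most recent maxTurns messages. The system prompt lives
// server-side and is passed separately, so it never needs trimming here.
function windowMessages(history: ChatMessage[], maxTurns: number): ChatMessage[] {
  let windowed = history.slice(-maxTurns);
  // The API expects the first message to come from the user; if the
  // window starts mid-exchange on an assistant turn, drop that turn.
  while (windowed.length > 0 && windowed[0].role === "assistant") {
    windowed = windowed.slice(1);
  }
  return windowed;
}
```

Summarizing the dropped turns instead of discarding them preserves more context, but costs an extra model call per trim.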
What I'd do differently
Add rate limiting. Right now there's nothing stopping someone from running hundreds of queries and burning through API credits. A simple IP-based rate limit in the API route (or at the edge) would fix this.
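A minimal sketch of what that in-route limiter could look like — a naive in-memory sliding window, which works for a single server instance but not across multiple serverless instances (that's where Redis or an edge rule comes in):

```typescript
// Timestamps of recent requests, keyed by client IP.
const hits = new Map<string, number[]>();

// Allow at most `limit` requests per `windowMs` per IP.
function allowRequest(ip: string, limit = 10, windowMs = 60_000): boolean {
  const now = Date.now();
  // Drop timestamps that have aged out of the window
  const recent = (hits.get(ip) ?? []).filter(t => now - t < windowMs);
  if (recent.length >= limit) {
    hits.set(ip, recent);
    return false;
  }
  recent.push(now);
  hits.set(ip, recent);
  return true;
}
```

In the route handler, you'd read the IP from the x-forwarded-for header and return a 429 before ever touching the Anthropic client.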
Log conversations. I have no idea what questions people are actually asking. Even a basic log to a database would let me see what information is missing from the system prompt, which questions the AI gets wrong, and whether the feature is being used at all.
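The record itself can be tiny. A sketch of one possible shape (field names are assumptions; where it gets written — Postgres, a KV store, a log drain — is a separate choice):

```typescript
type LogEntry = {
  timestamp: string;
  question: string;
  answer: string;
  turnCount: number;
};

// Build a log record from a finished exchange.
function buildLogEntry(messages: { role: string; content: string }[]): LogEntry {
  const lastUser = [...messages].reverse().find(m => m.role === "user");
  const lastAssistant = [...messages].reverse().find(m => m.role === "assistant");
  return {
    timestamp: new Date().toISOString(),
    question: lastUser?.content ?? "",
    answer: lastAssistant?.content ?? "",
    turnCount: messages.length,
  };
}
```

Writing it fire-and-forget after the stream closes keeps logging off the response path.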
Structured retrieval. Right now, the system prompt is a flat text document. For a larger knowledge base, you'd want embeddings + vector search — retrieve the most relevant chunks for each query rather than stuffing everything into context. Overkill for a portfolio site; essential for anything at scale.
The prompt engineering part
The most important few lines of the system prompt aren't the factual content — they're the behavioral guidelines at the end:
- Be friendly but professional
- Be honest about what you don't know
- For technical questions, go deep — Harrison appreciates it
- If someone asks for a resume, explain the portfolio covers his work and suggest they email
Without these, the AI would answer every question with maximum confidence, whether or not it had reliable information. The guidelines define the failure mode — what the AI does when it's uncertain — which is arguably more important than what it does when it's confident.
Try it
The chat is live at /ask. Ask it something specific — what I'm working on, how I built a particular project, what I'm looking for in a role. It'll tell you, and if it doesn't know, it'll say so.