
I Built a Visible Autonomous Research Agent. Here's What Agentic Systems Actually Require.


ai · agents · engineering · orchestration · streaming · architecture

Most AI demos are stateless: one prompt, one response, done. Real agentic systems don't work like that. They plan, decompose a task into steps, execute those steps (sometimes in parallel), handle partial failures, and synthesize results across multiple context windows. I built Research Canvas to make this entire pipeline visible — and to understand from the inside what building it actually requires.

The gap in "agentic" demos

There are two kinds of "agentic AI" demos. The first kind wraps a single complex prompt in a loading spinner and calls it an agent. The second kind actually decomposes a task, makes multiple LLM calls, manages state between them, and shows you what's happening at each step. The first kind is a lie. The second kind is what production agentic systems look like.

Research Canvas is the second kind. The user enters a research topic. The system:

  1. Calls the planning API to generate a structured research outline (3-5 sections)
  2. Streams research for each section in turn, feeding completed sections into the next call as context
  3. Accumulates each section's content as its stream completes
  4. Calls the synthesis API, passing all section content, to produce a coherent final report

Each step is visible in the UI as it happens. The research plan appears first, section by section. The research log shows which section is being processed and streams its content live. The final report panel assembles as sections complete. By the time the synthesis streams in, you've already seen everything that's going into it.

The architecture: client-orchestrated pipeline

The key architectural decision was client-side orchestration rather than a server-side pipeline. Three separate Next.js API routes:

  • /api/research/plan — generates the research outline as structured JSON
  • /api/research/section — researches a single section with streaming SSE output
  • /api/research/synthesize — takes all section content and produces a final report with streaming SSE
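
The plan route's JSON contract is worth pinning down before the client starts firing section calls. Here's a sketch of the shape plus a guard for it (the type and function names are illustrative, not from the actual codebase):

```typescript
// Hypothetical shape of the /api/research/plan response (illustrative names)
interface PlanSection {
  id: string;
  title: string;
  description: string;
}

interface ResearchPlan {
  topic: string;
  sections: PlanSection[];
}

// Guard against malformed model output before the client starts the
// pipeline: a valid plan has 3-5 well-formed sections.
function isValidPlan(value: unknown): value is ResearchPlan {
  const plan = value as ResearchPlan;
  return (
    typeof plan?.topic === 'string' &&
    Array.isArray(plan?.sections) &&
    plan.sections.length >= 3 &&
    plan.sections.length <= 5 &&
    plan.sections.every(
      (s) =>
        typeof s?.id === 'string' &&
        typeof s?.title === 'string' &&
        typeof s?.description === 'string'
    )
  );
}
```

Validating at this boundary matters because everything downstream trusts the plan's structure.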

The client manages the pipeline state: it calls /plan, waits for the JSON, then calls /section for each section in turn, accumulating each section's streamed content, and calls /synthesize once all sections are done.

This is very different from a monolithic server-side pipeline that calls all three stages internally and returns one final result. Client-side orchestration means: the pipeline state is inspectable, partial failures are recoverable, and each API route is independently testable.

// Simplified client orchestration (context threading elided here)
const planRes = await fetch('/api/research/plan', {
  method: 'POST',
  body: JSON.stringify({ topic }),
});
const { sections } = await planRes.json();

// Research each section in turn, streaming its content into the UI
const completedSections = [];
for (const section of sections) {
  const response = await fetch('/api/research/section', {
    method: 'POST',
    body: JSON.stringify({ topic, section }),
  });
  // Stream content into the UI as it arrives, accumulate the full text
  const content = await streamSection(response, section.id);
  completedSections.push({ title: section.title, content });
}

// Synthesize once all sections complete
await fetch('/api/research/synthesize', {
  method: 'POST',
  body: JSON.stringify({ topic, sections: completedSections }),
});

Sequential streaming with context threading

Sections are researched sequentially, not in parallel. This is a deliberate design choice: each section call receives previous sections as context, so section 3 can reference findings from sections 1 and 2. Running in parallel would mean every section starts with zero context — you'd get disconnected fragments rather than a coherent research progression.

const completedSections: { title: string; content: string }[] = [];

for (const section of plan.sections) {
  const response = await fetch('/api/research/section', {
    method: 'POST',
    body: JSON.stringify({
      topic,
      sectionTitle: section.title,
      sectionDescription: section.description,
      // Each section gets full context of what's already been researched
      previousSections: completedSections,
    })
  });

  // Stream section content, accumulate for next section's context
  const content = await streamSection(response, section.id);
  completedSections.push({ title: section.title, content });
}
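
The `streamSection` helper does the real streaming work. Here's a minimal sketch, assuming the route emits text over SSE `data:` lines (an assumption about the wire format; the real code passes a section id and updates UI state, while this version takes a chunk callback for illustration):

```typescript
// Sketch of a streamSection-style helper (assumed, not the actual
// implementation): reads the SSE body, extracts `data:` payloads, and
// reports each chunk so the UI can render progress live.
async function streamSection(
  response: { body: ReadableStream<Uint8Array> | null },
  onChunk: (text: string) => void
): Promise<string> {
  if (!response.body) return '';
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffered = '';
  let content = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffered += decoder.decode(value, { stream: true });
    // SSE events are separated by blank lines; keep the last partial event
    const events = buffered.split('\n\n');
    buffered = events.pop() ?? '';
    for (const event of events) {
      for (const line of event.split('\n')) {
        if (line.startsWith('data: ')) {
          const text = line.slice(6);
          content += text;
          onChunk(text);
        }
      }
    }
  }
  return content;
}
```

The buffering is the subtle part: network chunks don't align with SSE event boundaries, so a partial event must be held back until its terminating blank line arrives.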

The previous sections are truncated to 500 characters each before being included in the prompt — enough context for the model to avoid repetition and build on prior findings, without blowing up the token count. This is a real token management decision: more context is better up to a point, then it becomes noise and cost.
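
That truncation is a one-line decision worth making explicit. A sketch, assuming a markdown-ish context format (the function name and formatting are mine; the 500-character cap is the one described above):

```typescript
interface Section {
  title: string;
  content: string;
}

// Cap each prior section at 500 characters before it enters the next
// section's prompt: enough context to prevent repetition and enable
// build-on, without letting the prompt grow with every section.
const MAX_CONTEXT_CHARS = 500;

function buildPreviousContext(sections: Section[]): string {
  return sections
    .map((s) => `## ${s.title}\n${s.content.slice(0, MAX_CONTEXT_CHARS)}`)
    .join('\n\n');
}
```

Note the cost asymmetry this creates: the last section's prompt carries up to ~2,000 characters of prior context, while the first carries none.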

The result is a research pipeline where each section meaningfully builds on the previous. Section 3 ("Practical Applications") can reference specific mechanisms discussed in Section 2 ("Technical Architecture"). The final synthesis gets the same full context, which is why the synthesized report reads as coherent analysis rather than stitched-together paragraphs.

Dual model selection: Haiku for sections, Sonnet for synthesis

Section research uses Claude Haiku. Each section gets a narrow, focused prompt: "Research the following aspect of [topic]. Write 200-300 words covering key points, relevant context, and important considerations." Haiku handles this well: the scope is constrained, the output format is simple prose, and speed matters because sections run back to back, so every call's latency adds directly to the total run time.

Final synthesis uses Claude Sonnet. The synthesis prompt passes all section content and asks Claude to produce a coherent, well-structured research report that integrates the insights from each section into flowing prose. This requires understanding relationships between sections, avoiding repetition, and maintaining a consistent analytical voice across the full report. Sonnet handles this noticeably better than Haiku.

The cost breakdown: ~3-5 Haiku calls (sections) + 1 Sonnet call (synthesis). Total cost per research run is roughly $0.02-0.04 — cheap enough to be irrelevant, expensive enough to be real.
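
That estimate is easy to sanity-check with back-of-envelope arithmetic. All numbers below are illustrative assumptions (rough token counts, per-million-token prices that change over time), not measurements:

```typescript
// Back-of-envelope cost check. Token counts are rough guesses and
// prices (USD per million tokens) are illustrative assumptions.
const PRICE = {
  haiku: { input: 0.25, output: 1.25 },
  sonnet: { input: 3.0, output: 15.0 },
};

function callCost(
  model: keyof typeof PRICE,
  inTokens: number,
  outTokens: number
): number {
  return (inTokens * PRICE[model].input + outTokens * PRICE[model].output) / 1_000_000;
}

// 4 section calls on Haiku plus 1 synthesis call on Sonnet
const total = 4 * callCost('haiku', 500, 400) + callCost('sonnet', 2500, 1200);
```

Under these assumptions the total lands around $0.03, consistent with the $0.02-0.04 range above, and the synthesis call dominates: Sonnet's output tokens alone cost more than all the Haiku calls combined.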

Error handling in multi-step pipelines

Production agentic systems need to handle partial failures gracefully. In Research Canvas, if one section's research call fails (timeout, API error, rate limit), the pipeline shouldn't abort entirely. The design: sections that fail get a placeholder ("Research unavailable for this section") and the pipeline continues. Synthesis receives whatever sections completed successfully.

This is a meaningful difference from single-prompt AI demos. When your pipeline has 5 network calls, you need to think about what happens when call 3 fails. The answer in most cases is: continue with partial results, surface the failure visibly, and don't pretend everything worked.

What building this taught me

The most useful realization from building Research Canvas: the complexity in agentic systems isn't the AI calls — it's the orchestration layer.

Each individual API call is simple: prompt in, text out. But managing the state across multiple simultaneous calls, surfacing partial progress to the user, handling failures gracefully, and composing a coherent final output from distributed partial results — that's the actual engineering challenge.
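
Concretely, the state the client tracks per section can be sketched as a small status machine (names are illustrative, not from the actual codebase); every transition maps to something visible in the UI:

```typescript
// Per-section orchestration state the client keeps (illustrative sketch).
type SectionStatus = 'pending' | 'researching' | 'complete' | 'failed';

interface SectionState {
  id: string;
  title: string;
  status: SectionStatus;
  content: string; // accumulates as chunks stream in
}

// Append a streamed chunk and mark the section as in progress.
function applyChunk(state: SectionState, chunk: string): SectionState {
  return { ...state, status: 'researching', content: state.content + chunk };
}

// Settle the section once its stream ends or its call fails.
function markDone(state: SectionState, ok: boolean): SectionState {
  return { ...state, status: ok ? 'complete' : 'failed' };
}
```

Pure transition functions like these keep the pipeline state inspectable, which is exactly the property client-side orchestration was chosen for.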

This is why "wrapper" agents that just call GPT with a longer prompt aren't agentic systems. The pipeline infrastructure — state management, error recovery, partial result composition, user-facing progress visibility — is the product, not the API call.

The same pattern underlies every production agentic system I've studied: autonomous coding assistants, enterprise document processors, research tools like Perplexity. The AI capability is table stakes. The orchestration is the differentiator.