
What LLM Pipelines Actually Look Like — A Visible Implementation

Enterprise AI runs on multi-step LLM pipelines. I built a visible one: 4 stages, 4 specialized prompts, context accumulating forward. Each stage shows timing, token usage, and intermediate output. Here's the architecture and what building it revealed.

ai · llm orchestration · enterprise ai · engineering · system design


The invisible pipeline problem

When enterprise software teams talk about “AI features,” they usually mean multi-step LLM pipelines. Not a single prompt — a workflow: extract facts from a document, classify them, summarize for a stakeholder, route to the right team. These pipelines run invisibly — inputs go in, processed outputs come out, the steps between are opaque to the user and often to most of the engineering team.

I wanted to build one where every step is visible — where you can watch the pipeline execute, see each stage's output, check the token counts and timing, and understand why the final action items look the way they do. The result is Pipeline Demo.

The four stages

The pipeline runs any document through four sequential LLM calls:

Stage 1: Extract. A “precise document parser” returns structured JSON — parties, dates, amounts, obligations, document type. Pure extraction, no interpretation. Output: machine-readable facts.

Stage 2: Analyze. A “sharp analytical reviewer” receives the document plus the structured extraction from Stage 1. Identifies key risks, notable obligations for each party, and open questions that need clarification. Output: opinionated risk assessment.

Stage 3: Synthesize. A “senior advisor writing for a busy executive” receives all prior outputs and produces a 2-3 sentence summary in plain English, three key takeaways, and a recommendation — approve, review further, or reject — with justification. Output: stakeholder-ready summary.

Stage 4: Action Items. A “project coordinator” receives the full context and produces exactly five concrete next steps: each with a verb, an owner role, and a timeline. Output: actionable tasks.
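Concretely, the four stages can be expressed as data: a persona and an objective each. This is an illustrative sketch — the names and prompt text below paraphrase the descriptions above and are not the app's actual source:

```typescript
// Hypothetical stage definitions for the four-stage pipeline.
interface Stage {
  name: string;
  persona: string;
  instruction: string;
}

const STAGES: Stage[] = [
  {
    name: "Extract",
    persona: "a precise document parser",
    instruction:
      "Return structured JSON: parties, dates, amounts, obligations, " +
      "document type. Pure extraction, no interpretation.",
  },
  {
    name: "Analyze",
    persona: "a sharp analytical reviewer",
    instruction:
      "Identify key risks, notable obligations for each party, and open " +
      "questions that need clarification.",
  },
  {
    name: "Synthesize",
    persona: "a senior advisor writing for a busy executive",
    instruction:
      "Write a 2-3 sentence plain-English summary, three key takeaways, " +
      "and a recommendation (approve, review further, or reject) with justification.",
  },
  {
    name: "Action Items",
    persona: "a project coordinator",
    instruction:
      "Produce exactly five concrete next steps, each with a verb, an " +
      "owner role, and a timeline.",
  },
];
```

Keeping the stage definitions as data rather than four hardcoded call sites makes the accumulation loop in the next section a simple iteration.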

Context accumulation

The key architectural feature: context accumulates forward. Each stage receives not just the original document but the outputs of all prior stages.

This changes what's possible. The action item generator doesn't need to re-extract parties from the document; it already has them structured. The synthesizer doesn't need to identify risks; the analysis stage already did. Later stages can be more opinionated because earlier stages have done the groundwork.

The alternative — running each stage independently against the raw document — would produce worse results. The synthesizer would be doing extraction work. The action generator would be reasoning about risks it hadn't formally identified. Decomposing the problem lets each stage focus on one thing.
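A minimal sketch of the accumulation pattern, with a stubbed `runStage()` standing in for the real model call: each stage's prompt is the original document plus every prior stage's output.

```typescript
// Result of one pipeline stage.
type StageResult = { stage: string; output: string };

// Placeholder for a real LLM call — in the actual app this would hit
// the model with the stage's persona and instruction.
async function runStage(stage: string, _prompt: string): Promise<StageResult> {
  return { stage, output: `(${stage} output)` };
}

async function runPipeline(
  document: string,
  stages: string[]
): Promise<StageResult[]> {
  const results: StageResult[] = [];
  for (const stage of stages) {
    // Context accumulates forward: the original document plus the
    // output of every stage that has already run.
    const context = [
      `DOCUMENT:\n${document}`,
      ...results.map((r) => `${r.stage.toUpperCase()} OUTPUT:\n${r.output}`),
    ].join("\n\n");
    results.push(await runStage(stage, context));
  }
  return results;
}
```

By the final stage, the prompt contains the document, the structured extraction, the risk analysis, and the summary — which is why the action item generator can skip straight to writing tasks.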

Specialization vs a general prompt

The most important lesson from building this: specialization dramatically improves output quality.

I ran the naive version first: a single prompt that asked Claude to extract facts, analyze risks, summarize, and generate action items all at once. The output was mediocre across all four dimensions — decent extraction, vague risk analysis, generic summary, useless action items.

When I split it into four specialized prompts — each with a clear persona, objective, output format, and voice — the quality improved significantly on every dimension. The extractor returned clean JSON. The analyzer was pointed and specific. The synthesizer made a real recommendation instead of hedging. The action items had owners and timelines.

The intuition: a model playing a specific role outperforms a model trying to be all roles simultaneously. “You are a sharp analytical reviewer. Here are the facts. Identify risks” is a more tractable task than “analyze this document from all angles.”

The instrumentation

Every stage reports input tokens, output tokens, and wall-clock time. Stage 1 (extraction with JSON output) typically runs fastest. Stage 2 (analysis, more open-ended prose) uses more output tokens. The synthesis stage often takes the longest — it has the most context to process.

This instrumentation is standard in production AI systems. If you're paying per token and serving many users, you need to know where the cost is concentrated. If latency matters, you need to know which stage to optimize. Making it visible in the demo is partly for demonstration purposes, partly because I think about production implications when I build.
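A sketch of how per-stage instrumentation might be wrapped around each call. The `input_tokens`/`output_tokens` field names mirror the usage object Anthropic's Messages API returns, but the model call here is a stub supplied by the caller:

```typescript
// Per-stage metrics: token counts plus wall-clock time.
interface StageMetrics {
  stage: string;
  inputTokens: number;
  outputTokens: number;
  elapsedMs: number;
}

// Wraps a stage call and records usage and timing. The `call` argument
// stands in for whatever actually invokes the model.
async function withMetrics(
  stage: string,
  call: () => Promise<{ input_tokens: number; output_tokens: number }>
): Promise<StageMetrics> {
  const start = Date.now();
  const usage = await call();
  return {
    stage,
    inputTokens: usage.input_tokens,
    outputTokens: usage.output_tokens,
    elapsedMs: Date.now() - start,
  };
}
```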

Client-side orchestration

The pipeline is orchestrated from the browser. Each stage is a separate POST request to the same API route with a different stage index and the accumulated results from prior stages.

This is the simpler choice, not the production choice. Server-side orchestration would be faster — no round-trip latency between stages — and would allow streaming. Client-side makes the flow visible: the user watches each stage complete before the next begins. For a demonstration of the pattern, visibility wins. In production, you'd move orchestration server-side and stream the outputs.
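The browser-side loop can be sketched like this; the `/api/pipeline` route name and payload shape are assumptions for illustration, not the app's actual API:

```typescript
// Client-side orchestration: one POST per stage to the same route,
// carrying the stage index and all accumulated prior results.
async function orchestrate(
  document: string,
  stageCount: number
): Promise<string[]> {
  const priorResults: string[] = [];
  for (let stage = 0; stage < stageCount; stage++) {
    const res = await fetch("/api/pipeline", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ stage, document, priorResults }),
    });
    const { output } = await res.json();
    // Each completed stage is rendered before the next request fires,
    // which is what makes the pipeline visible to the user.
    priorResults.push(output);
  }
  return priorResults;
}
```

The cost of this visibility is a full network round-trip between stages — exactly the latency a server-side orchestrator would eliminate.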

What it's for

Pipeline Demo is a portfolio artifact. I built it to have a concrete, runnable demonstration of LLM orchestration patterns — the kind of architecture enterprise AI teams build for document processing, customer support routing, legal review workflows.

It's also immediately useful. Pasting an employment contract and watching it decompose into extraction → analysis → summary → action items in about 10 seconds is genuinely faster than reading the contract myself. That's the bar for useful AI tooling: it should save real time on a real task.

The code is on GitHub. The live app is at pipeline-demo-beta.vercel.app.