Pipeline Demo
A 4-stage LLM pipeline where you can see every step
4-stage sequential pipeline
Token counting per stage
Wall-clock timing display
Context accumulation pattern
01 The Problem
Enterprise AI systems run multi-step LLM pipelines — extract, classify, summarize, route — but the architecture is usually invisible. I wanted to build a visible implementation: a pipeline where every stage is explicit, every model call is separate, and every output flows into the next stage. The goal was both to demonstrate the orchestration pattern and to think through the real production tradeoffs.
02 The Approach
Four sequential Claude Haiku calls, each with a specialized system prompt. Stage 1 (Extract) returns structured JSON — entities, dates, amounts, obligations. Stage 2 (Analyze) receives the document + extracted facts and identifies risks, obligations, open questions. Stage 3 (Synthesize) receives all prior outputs and writes an executive summary with a recommendation (approve / review / reject). Stage 4 (Actions) produces 5 concrete, owner-assigned action items with timelines. Context accumulates forward — later stages benefit from upstream processing.
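As a sketch, the four-stage flow can be expressed as a list of stage definitions driven by one loop. The `callModel` helper is a hypothetical stand-in for the actual Claude API call, and the prompt wording is paraphrased from the descriptions above, not the project's exact prompts:

```typescript
// One record per stage: a name and its specialized system prompt.
interface Stage {
  name: string;
  system: string;
}

// Prompt wording is illustrative, paraphrased from the write-up.
const STAGES: Stage[] = [
  { name: "Extract", system: "You are a precise document parser. Return structured JSON only." },
  { name: "Analyze", system: "You are a sharp analytical reviewer. Identify risks, obligations, open questions." },
  { name: "Synthesize", system: "You are a senior advisor. Summarize and recommend approve, review, or reject." },
  { name: "Actions", system: "You are a project coordinator. Produce owner-assigned action items with timelines." },
];

// Hypothetical stand-in for the Claude API call.
type ModelCall = (system: string, user: string) => Promise<string>;

// Run the stages in order; each stage sees the document plus all prior outputs.
async function runPipeline(document: string, callModel: ModelCall): Promise<string[]> {
  const outputs: string[] = [];
  for (const stage of STAGES) {
    const context = [document, ...outputs].join("\n\n---\n\n");
    outputs.push(await callModel(stage.system, context));
  }
  return outputs;
}
```

The loop is the whole orchestrator: four calls, one accumulating context string, no hidden control flow.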
03 Architecture Decisions
Sequential context accumulation
Each stage receives not just the original document but the outputs of all preceding stages. This is the core orchestration pattern: downstream stages benefit from upstream processing. The synthesizer doesn't need to re-extract facts — it already has them structured. The action item generator has the full analysis and recommendation to draw from.
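A minimal sketch of the accumulation step for one stage (the section labels are illustrative, not the project's actual format):

```typescript
// Build one stage's user message: the original document followed by
// every completed stage's output, each under a labeled heading.
function buildStageInput(
  document: string,
  priorOutputs: Record<string, string>,
): string {
  const sections = [`DOCUMENT:\n${document}`];
  for (const [stageName, output] of Object.entries(priorOutputs)) {
    sections.push(`${stageName.toUpperCase()} OUTPUT:\n${output}`);
  }
  return sections.join("\n\n");
}
```

The synthesizer, for example, receives the document plus the Extract and Analyze sections, so it never has to re-derive the structured facts.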
Specialized system prompts per stage
Each stage has a different objective and voice. The extractor is a 'precise document parser' — it returns JSON, no prose. The analyzer is a 'sharp analytical reviewer' — direct, specific, opinionated. The synthesizer is a 'senior advisor' — executive-level language, concrete recommendation. The action generator is a 'project coordinator' — verbs, owners, timelines. Specialization over a single general prompt produces dramatically better output quality.
Per-stage instrumentation
Every stage reports input tokens, output tokens, and wall-clock time. This is standard in production AI systems — you need to know where latency and cost come from. A 1.2s extract stage followed by a 2.8s analysis stage tells you exactly where to optimize.
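A sketch of an instrumentation wrapper, assuming the stage runner returns token usage alongside the text (the Anthropic API reports usage per response; the `run` signature here is an assumption):

```typescript
// Per-stage metrics: token counts and wall-clock latency.
interface StageMetrics {
  stage: string;
  inputTokens: number;
  outputTokens: number;
  ms: number;
}

// Wrap a stage call with timing. Token counts come from the model response;
// wall-clock time is measured around the whole call.
async function instrument(
  stage: string,
  run: () => Promise<{ text: string; inputTokens: number; outputTokens: number }>,
): Promise<{ text: string; metrics: StageMetrics }> {
  const start = Date.now();
  const result = await run();
  return {
    text: result.text,
    metrics: {
      stage,
      inputTokens: result.inputTokens,
      outputTokens: result.outputTokens,
      ms: Date.now() - start,
    },
  };
}
```

Logging one `StageMetrics` record per stage is enough to spot which stage dominates latency or cost.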
Client-driven orchestration (visible by design)
The pipeline is orchestrated client-side: the browser calls the API once per stage, accumulates results, and passes them forward. This is simpler than server-side chaining and makes the flow visible — the user watches each stage complete before the next begins. For a demonstration of the pattern, visibility wins. In production, you'd move orchestration server-side and stream the outputs.
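The client-side loop might look like the following sketch; `/api/stage` is a hypothetical endpoint name, and the fetch function is injected so the loop can be exercised without a network:

```typescript
// Client-side orchestration: one request per stage, with results kept in the
// browser and forwarded to the next stage.
async function orchestrate(
  document: string,
  stageNames: string[],
  fetchFn: (url: string, body: object) => Promise<string>,
): Promise<Record<string, string>> {
  const results: Record<string, string> = {};
  for (const name of stageNames) {
    // Each request carries the document plus every completed stage's output,
    // so the server stays stateless and the flow stays visible to the user.
    results[name] = await fetchFn("/api/stage", {
      stage: name,
      document,
      prior: { ...results },
    });
    // A real UI would render this stage's result here, before the next call.
  }
  return results;
}
```

Because the browser holds all intermediate state, each stage's completion can be rendered immediately, which is exactly the visibility the demo is built around.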
04 Key Insight
Specialization dramatically improves output quality. A single 'analyze this document and give me everything' prompt produces mediocre results across all dimensions. Four specialized prompts — each with a clear role, voice, and scope — produce much sharper outputs. The overhead of running four model calls is small compared to the quality improvement.
05 Why It Matters
This is the clearest demonstration I could build of the LLM orchestration patterns used in enterprise AI systems — document processing pipelines, customer support routing, legal review workflows. It is directly relevant to Clio Enterprise AI, legal tech, and knowledge management use cases.