
The System Prompt That Makes Document AI Actually Trustworthy

I built a document Q&A tool in four hours. The code was simple — the interesting part was the system prompt design that prevents hallucination and requires exact citations. Why 'don't speculate' is the most important instruction in document intelligence.

ai · document intelligence · system design · legal tech · prompt engineering

I built a document Q&A tool in four hours. But the interesting part wasn't the code — it was the system prompt design that makes it actually trustworthy for legal and professional use cases.

The problem with document AI

Ask ChatGPT “what does section 7 of this employment contract say?” and there's a reasonable chance it will give you a confident answer that's partially made up — drawing on its training data about “typical” employment contracts rather than the specific document in front of it. In a casual context that's annoying. In a legal context, it's dangerous.

The failure mode isn't hallucination in the traditional sense — it's context mixing. The model correctly retrieves information from its context window, but supplements it with related knowledge from training when the document doesn't contain a clear answer. For document Q&A, that behavior is specifically what you need to eliminate.

The tool

DocIQ is a document intelligence demo: paste any document (legal agreements, research papers, product specs, meeting notes), ask questions in plain language, and get streaming answers with exact citations from the source text.

Three sample documents load immediately: an employment contract, a research paper abstract, and a product requirements document. Suggested questions guide users toward what the tool does well. Documents are never stored server-side — they're passed in each request and never persisted.
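Statelessness falls out of the request shape: the document travels with every request and lives only for the request's duration. A minimal sketch of what such a request body might look like — the field names and the `isChatRequest` guard are my own illustration, not DocIQ's actual API:

```typescript
// The document is part of every request body — nothing is
// written to a database or filesystem between requests.
interface ChatRequest {
  docText: string;   // full pasted document, re-sent each time
  question: string;  // the user's current question
  history?: { role: "user" | "assistant"; content: string }[]; // prior turns, kept client-side
}

// Narrow an unknown request body to ChatRequest before use.
function isChatRequest(body: unknown): body is ChatRequest {
  const b = body as Record<string, unknown> | null;
  return typeof b?.docText === "string" && typeof b?.question === "string";
}
```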

The underlying architecture is deliberately simple. But the system prompt took longer to write than the streaming handler.

System prompt design for document faithfulness

The most important instruction in DocIQ's system prompt is also the one that looks most obvious:

If the document doesn't contain the answer, say so clearly — 
don't speculate.

It looks obvious until you see what happens without it. Models are trained to be helpful. When a question isn't answerable from the document, the default behavior is to synthesize a plausible answer from general knowledge — the model “tries to help” by filling the gap. For document Q&A, that's the worst possible behavior. A user who asks “what is the notice period for termination?” and gets back a synthesized answer about “typical” notice periods has no way of knowing the document doesn't specify one.

The full citation instruction matters just as much:

Always quote specific text from the document when relevant 
(use "..." for exact quotes). Cite location when possible: 
"In the [section/paragraph]..." or "According to the document..."

Requiring exact quotes creates a verifiable audit trail. If Claude claims the non-compete clause lasts 12 months, the quoted text either exists in the document or it doesn't. The user can check. This is the difference between a useful legal tool and a plausible-sounding liability.
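That audit trail can even be checked mechanically. A rough sketch of a quote checker — the `verifyQuotes` helper and its normalization rules are my own illustration, not part of DocIQ:

```typescript
// Extract "..."-style quotes from an answer and check each one
// against the source document (hypothetical helper; whitespace-
// and case-insensitive so formatting differences don't matter).
function verifyQuotes(
  answer: string,
  docText: string
): { quote: string; found: boolean }[] {
  const normalize = (s: string) => s.replace(/\s+/g, " ").trim().toLowerCase();
  const straight = answer.replace(/[“”]/g, '"'); // tolerate curly quotes
  const doc = normalize(docText);
  const quotes = [...straight.matchAll(/"([^"]{10,})"/g)].map((m) => m[1]);
  return quotes.map((quote) => ({
    quote,
    found: doc.includes(normalize(quote)),
  }));
}
```

A quote that comes back `found: false` doesn't always mean fabrication (paraphrase inside quotation marks also fails), but it flags exactly the answers a human should double-check.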

Full context vs. RAG

DocIQ uses the “full context window” approach: the entire document is passed in the system prompt for every request. This is the opposite of Retrieval-Augmented Generation (RAG), where the document is chunked, embedded, and the most relevant chunks are retrieved for each query.

For a demo handling documents up to ~100,000 characters, full context is actually the right choice. Here's why:

  • No missed clauses. RAG retrieves the top-k chunks by semantic similarity. If the user asks a question that's phrased differently from how the relevant clause is worded, the right chunk might not be in the top-5. Full context sees everything.
  • Cross-document reasoning. Answering “what are the consequences if the non-compete is violated?” might require synthesizing information from sections 7 (non-compete) and 10 (breach remedies). RAG can miss these cross-section connections if the chunks are retrieved independently.
  • No retrieval infrastructure. No Pinecone account, no embedding API costs, no chunking logic. For a demo tool, simplicity is a feature.

The tradeoff: full context doesn't scale to 500-page documents. For production at scale — entire contract databases, legal due diligence repositories — RAG with careful chunking and hybrid search is necessary. DocIQ is honest about this in its 100,000-character truncation notice.
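The truncation itself is trivial; what matters is surfacing it to the user instead of silently answering from a partial document. A sketch of how that might look (the function and field names are my assumptions, not DocIQ's code):

```typescript
const MAX_DOC_CHARS = 100_000;

// Truncate oversized documents and say so explicitly, rather
// than letting the model answer from an invisibly clipped text.
function prepareDocument(docText: string): {
  text: string;
  truncated: boolean;
  notice?: string;
} {
  if (docText.length <= MAX_DOC_CHARS) {
    return { text: docText, truncated: false };
  }
  return {
    text: docText.slice(0, MAX_DOC_CHARS),
    truncated: true,
    notice: `Document truncated to ${MAX_DOC_CHARS.toLocaleString()} characters; answers may miss later sections.`,
  };
}
```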

The architecture in 50 lines

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment
const encoder = new TextEncoder();

// The full system prompt template
const systemPrompt = `You are DocIQ — a document intelligence assistant.

## Document Content:
<document>
${docText.slice(0, 100000)}
</document>

## Instructions:
- Answer questions ONLY based on the document above
- Always quote specific text from the document (use "...")
- Cite location: "In the [section/paragraph]..."
- If the document doesn't contain the answer, say so — don't speculate
- For lists or comparisons, use bullet points
`;

// Streaming response
const stream = await client.messages.stream({
  model: "claude-haiku-4-5",
  max_tokens: 1500,
  system: systemPrompt,
  messages: formattedMessages,
});

// Pipe each text delta into a ReadableStream for the HTTP response
const readable = new ReadableStream({
  async start(controller) {
    for await (const chunk of stream) {
      if (chunk.type === "content_block_delta" && chunk.delta.type === "text_delta") {
        controller.enqueue(encoder.encode(chunk.delta.text));
      }
    }
    controller.close();
  },
});

The entire streaming handler is ~40 lines. The value is in the prompt constraints, not the plumbing.
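On the client, consuming that stream is just as small. A sketch of the reading loop, assuming the handler above is served at a /api/chat route (the route name and request fields are my assumptions):

```typescript
// Read the streamed answer chunk by chunk and hand each text
// delta to the UI as it arrives.
async function askDocument(
  docText: string,
  question: string,
  onDelta: (text: string) => void
): Promise<void> {
  const res = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ docText, question }),
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    onDelta(decoder.decode(value, { stream: true }));
  }
}
```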

Why document AI matters for enterprise

The legal tech company Clio estimates attorneys spend 40% of their time on administrative tasks. A significant portion of that is document review — finding the clause, checking the provision, answering the client's question about what their own contract says.

Document Q&A is the “hello world” of enterprise AI. The pattern appears everywhere:

  • Legal: Ask questions about contracts without reading 50 pages
  • Finance: Summarize earnings reports for non-analysts
  • Engineering: Query technical specs and RFCs
  • HR: Answer policy questions from the employee handbook
  • Sales: Pull relevant terms from customer contracts before a negotiation

The core loop is always the same: document + question → cited answer. Getting that loop right — faithful, transparent, fast — is what separates a useful enterprise tool from a liability.


Try it: doc-iq-one.vercel.app · Source: github.com/matua-agent/doc-iq