Writing

Thoughts on AI development, building in public, sports science, and shipping things that matter.

·7 min read

Context Engineering Is Replacing Prompt Engineering (And Here's Why It Matters)

Andrej Karpathy called it context engineering — and the shift in language reflects a shift in what actually matters. Not tricks or magic phrases. The discipline of structuring what goes into the context window. I built a live demo showing 6 strategies streaming in parallel so you can see the difference for yourself.

ai · context engineering · prompt engineering · llm · production · engineering · claude
·8 min read

How to Give AI Agents Persistent Memory (Extract-Store-Inject Architecture)

Every AI conversation starts from zero — unless you build memory infrastructure. I built Agent Memory Demo to make the pattern tangible: two parallel API calls per message, a structured fact store by category, memory injected into context naturally. The pattern that makes AI assistants feel like they actually know you.
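The extract-store-inject pattern described above can be sketched in a few lines. This is a hand-rolled illustration, not the demo's actual code: the category names are hypothetical, and the stubbed `extractFacts` stands in for the second, parallel model call.

```typescript
// Minimal fact store keyed by category (categories here are illustrative).
type Category = "preferences" | "projects" | "background";
type FactStore = Record<Category, string[]>;

const store: FactStore = { preferences: [], projects: [], background: [] };

// In the real flow this would be the parallel model call that extracts
// facts from the user's message; here it is a deterministic stub.
function extractFacts(message: string): Partial<Record<Category, string[]>> {
  return message.includes("TypeScript")
    ? { preferences: ["prefers TypeScript"] }
    : {};
}

function remember(message: string): void {
  const extracted = extractFacts(message);
  for (const [category, facts] of Object.entries(extracted)) {
    store[category as Category].push(...(facts ?? []));
  }
}

// Inject stored facts into the system prompt so the main call "knows" the user.
function buildSystemPrompt(base: string): string {
  const lines = Object.entries(store)
    .filter(([, facts]) => facts.length > 0)
    .map(([category, facts]) => `${category}: ${facts.join("; ")}`);
  return lines.length ? `${base}\n\nKnown user facts:\n${lines.join("\n")}` : base;
}

remember("I always use TypeScript for side projects");
const prompt = buildSystemPrompt("You are a helpful assistant.");
```

In the full pattern, extraction and the main reply run as two concurrent API calls per message; only the storage and injection halves are shown here.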

ai · agents · memory · architecture · llm · production · claude
·7 min read

Why Your Prompts Are Holding You Back: A Side-by-Side Comparison of Four Techniques

Zero-shot, few-shot, chain-of-thought, and system-prompt tuning produce meaningfully different outputs — not just in style, but in accuracy and reliability. I built Prompt Lab to make these differences visible: four simultaneous SSE streams, same prompt, different techniques. Here's what I actually learned.
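The "four simultaneous streams" setup reduces to fanning one prompt out over four technique wrappers. A sketch under stated assumptions: the technique stub below just labels its output, where the real demo would wrap the prompt differently per technique and open a separate SSE stream for each.

```typescript
// Same prompt, four techniques, kicked off concurrently.
type Technique = "zero-shot" | "few-shot" | "chain-of-thought" | "system-prompt";

async function runTechnique(technique: Technique, prompt: string): Promise<string> {
  // Stub: a real implementation would transform `prompt` per technique
  // (examples prepended, reasoning instructions appended, a tuned system prompt).
  return `[${technique}] ${prompt}`;
}

async function compareTechniques(prompt: string): Promise<Record<Technique, string>> {
  const techniques: Technique[] = [
    "zero-shot",
    "few-shot",
    "chain-of-thought",
    "system-prompt",
  ];
  // Promise.all starts all four at once, which is what makes the
  // side-by-side comparison feel simultaneous in the UI.
  const outputs = await Promise.all(techniques.map((t) => runTechnique(t, prompt)));
  return Object.fromEntries(
    techniques.map((t, i) => [t, outputs[i]]),
  ) as Record<Technique, string>;
}
```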

prompting · ai · engineering · llm · streaming · developer tools
·6 min read

I Automated My Daily Standup With Claude. It Reads Git Logs So I Don't Have To.

Standups are a translation problem: commit messages → accomplishment statements. That translation is high-friction, low-value work — exactly what language models are good at. I built ai-standup: one command reads git history across all my repos, sends it to Claude, and generates a professional standup in 4 seconds. Here's how it works and what it reveals about AI-native dev tooling.

developer tools · cli · ai · workflow · engineering · anthropic
·6 min read

Why Evaluation Is the Most Important Part of AI Engineering (and How to Actually Do It)

Most AI teams spend 90% of their time on the model and 10% on evaluation. Production teams flip that. I built AI Eval Lab to make evaluation fast enough to actually happen — define test cases before writing prompts, run them all at once, score automatically, get AI-powered fix suggestions on failures.

ai · evaluation · prompt engineering · engineering · testing · production ai
·8 min read

I Automated My Dev Workflow With Three Claude-Powered CLI Tools

Commit messages, PR descriptions, and code explanations are high-friction, low-value work. I built three CLI tools that use Claude to eliminate them: ai-commit (staged diff → 3 conventional commit options → you pick), smart-pr (branch diff → structured PR description), and ai-explain (pipe any code, get an explanation). Here's how they work and why raw fetch beats the SDK for CLI tooling.
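The "raw fetch" approach described above comes down to assembling one HTTPS request by hand. A sketch of the request shape for Anthropic's public Messages API (endpoint and header names follow Anthropic's API docs; the model name and prompt wording are placeholders, and the request is built but not sent):

```typescript
// Build the request for Anthropic's Messages API by hand. No SDK: a URL,
// three headers, and a JSON body is all a CLI tool needs.
function buildCommitRequest(stagedDiff: string, apiKey: string) {
  return {
    url: "https://api.anthropic.com/v1/messages",
    init: {
      method: "POST",
      headers: {
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
      },
      body: JSON.stringify({
        model: "claude-3-5-haiku-latest", // placeholder model name
        max_tokens: 300,
        messages: [
          {
            role: "user",
            content: `Write 3 conventional commit messages for this diff:\n${stagedDiff}`,
          },
        ],
      }),
    },
  };
}

// Usage: const { url, init } = buildCommitRequest(diff, process.env.ANTHROPIC_API_KEY!);
//        const res = await fetch(url, init);
```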

developer tools · cli · ai · workflow · engineering · anthropic
·7 min read

I Built an AI-Powered Job Tracker. Used It to Apply for Jobs the Same Day.

Three Claude tools embedded in a job application Kanban board: streaming cover letter generator, job fit analyzer with visual scoring, and tailored interview prep. The AI has my full background in context — published papers, portfolio apps, role differentiators. It generates letters that are actually specific, not generic.

ai · engineering · career · claude · workflow · streaming
·5 min read

I Made Claude Haiku and Sonnet Race Each Other. The Results Are Surprising.

TTFT, tokens/second, total time — all visible, side by side, for the same prompt. I built a real-time streaming comparison tool because model selection shouldn't be based on benchmarks you can't feel. Here's what I actually learned about the Haiku/Sonnet tradeoff from watching them race.
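The three metrics named above fall out of a single pass over a token stream. A minimal sketch, assuming any async iterable of tokens; the fake generator stands in for a real streaming API response:

```typescript
// Measure time-to-first-token, total time, and tokens/second over any
// async token stream.
async function measureStream(tokens: AsyncIterable<string>) {
  const start = Date.now();
  let firstTokenAt: number | null = null;
  let count = 0;
  for await (const _token of tokens) {
    if (firstTokenAt === null) firstTokenAt = Date.now(); // TTFT is set once
    count++;
  }
  const totalMs = Date.now() - start;
  return {
    ttftMs: firstTokenAt === null ? null : firstTokenAt - start,
    totalMs,
    tokensPerSecond: totalMs > 0 ? (count / totalMs) * 1000 : count,
  };
}

// Fake stream emitting 5 tokens with a short delay each, for demonstration.
async function* fakeStream(): AsyncGenerator<string> {
  for (let i = 0; i < 5; i++) {
    await new Promise((r) => setTimeout(r, 10));
    yield `tok${i}`;
  }
}
```

Running two models' streams through this side by side is what makes the Haiku/Sonnet tradeoff visible rather than abstract.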

anthropic · claude · streaming · model selection · sse · engineering
·7 min read

I Built the Three AI Workflows Clio Is Shipping. Here's What I Learned.

Client updates from activity logs. Billing narratives from time entries. Calendar events from court documents. These aren't AI features — they're AI workflows. I built all three in one session to understand the pattern. The insight: the interesting engineering is in the output format, not the API call.

legal tech · ai · workflows · engineering · enterprise ai · clio
·7 min read

I Built a Live MCP Demo That Shows Every Protocol Message in Real Time

The Model Context Protocol is everywhere in enterprise AI discussions, but most demos are black boxes. I built one that makes the full protocol visible — initialize handshake, tool discovery, every tool call and result, token costs — streamed live to a trace panel as it happens.
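Under the hood, the messages that trace panel displays are plain JSON-RPC 2.0. A sketch of the three envelopes it sees most, built by hand: the protocol version string is an assumption about the MCP revision, and the tool name and arguments are hypothetical.

```typescript
// The wire format MCP uses is JSON-RPC 2.0: every request carries a
// jsonrpc tag, an id for matching responses, a method, and params.
type JsonRpcRequest = {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
};

let nextId = 1;
function request(method: string, params?: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextId++, method, params };
}

// The initialize handshake the client sends first.
const initialize = request("initialize", {
  protocolVersion: "2024-11-05", // assumed protocol revision
  capabilities: {},
  clientInfo: { name: "trace-demo", version: "0.1.0" },
});

// Tool discovery, then an actual tool invocation.
const listTools = request("tools/list");
const callTool = request("tools/call", {
  name: "get_weather", // hypothetical tool name
  arguments: { city: "Vancouver" },
});
```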

mcp · ai · engineering · protocol · enterprise ai
·6 min read

I Built a Real MCP Server — Install It in Claude Desktop Right Now

There's a difference between understanding MCP and implementing it. I built a working Node.js MCP server using the official @modelcontextprotocol/sdk — stdio transport, 6 tools with Zod schemas, full JSON-RPC 2.0 protocol. Install it in Claude Desktop with one config line and ask Claude about my portfolio.

mcp · node.js · typescript · protocol · claude desktop · enterprise ai
·7 min read

Why I Implemented BM25 From Scratch Instead of Using an Embedding API

RAG demos always use vector embeddings. I used BM25 — the same algorithm that powers Elasticsearch and Apache Lucene. No external embedding API, no cold starts, fully deterministic and interpretable. Here's what I learned about when lexical retrieval is the right choice.
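The algorithm itself is small enough to show whole. A from-scratch sketch of the standard BM25 scoring function over pre-tokenized documents (the tiny corpus is illustrative; k1 and b defaults match common Lucene settings):

```typescript
// BM25: term-frequency saturation (k1) plus document-length normalization (b).
function bm25Score(
  query: string[],
  docs: string[][],
  docIndex: number,
  k1 = 1.2,
  b = 0.75,
): number {
  const N = docs.length;
  const avgdl = docs.reduce((sum, d) => sum + d.length, 0) / N;
  const doc = docs[docIndex];
  let score = 0;
  for (const term of query) {
    const df = docs.filter((d) => d.includes(term)).length; // document frequency
    if (df === 0) continue;
    const idf = Math.log((N - df + 0.5) / (df + 0.5) + 1); // Lucene-style IDF
    const tf = doc.filter((t) => t === term).length;        // term frequency
    score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * doc.length) / avgdl));
  }
  return score;
}

const corpus = [
  ["lexical", "retrieval", "is", "deterministic"],
  ["vector", "embeddings", "need", "an", "api"],
];
```

Everything here is arithmetic over counts, which is exactly why it is deterministic, interpretable, and free of cold starts.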

ai · rag · search · engineering · information retrieval
·6 min read

How AI Agents Actually Work: The Tool Use Loop Explained

I built a chat interface that makes the agentic loop visible: the AI reasons about tools, calls them, gets results, and synthesizes a response — and you can see every step. Here's the implementation and why tool descriptions matter more than schemas.
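The loop reads clearly in miniature. In this sketch the "model" is a stub that first requests a tool and then answers once it sees the result; the tool name and transcript format are hypothetical, and real calls would go to an LLM API. The shape of the loop is the point.

```typescript
// One model turn is either a tool request or a final answer.
type ModelTurn =
  | { type: "tool_call"; name: string; input: Record<string, unknown> }
  | { type: "answer"; text: string };

const tools: Record<string, (input: Record<string, unknown>) => string> = {
  get_time: () => "09:00", // hypothetical tool
};

function stubModel(history: string[]): ModelTurn {
  // No tool result in the transcript yet? Ask for one. Otherwise synthesize.
  const lastResult = history.find((h) => h.startsWith("tool_result:"));
  return lastResult
    ? { type: "answer", text: `It is ${lastResult.replace("tool_result:", "")}` }
    : { type: "tool_call", name: "get_time", input: {} };
}

function runAgent(question: string): string {
  const history: string[] = [`user:${question}`];
  for (let step = 0; step < 5; step++) {          // cap the loop
    const turn = stubModel(history);
    if (turn.type === "answer") return turn.text; // loop exits on plain text
    const result = tools[turn.name](turn.input);  // execute the requested tool
    history.push(`tool_result:${result}`);        // feed the result back in
  }
  return "gave up";
}
```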

ai · agents · tool use · function calling · engineering
·7 min read

The Pattern Behind Every Enterprise Document AI Product

I built an AI contract analyzer that extracts risk flags, obligations, and negotiation leverage from any legal document. The interesting part isn't the AI — it's the output design. Why 'here's the bad clause' is less useful than 'here's what to ask for instead.'

ai · legal tech · structured output · engineering · enterprise ai
·6 min read

The System Prompt That Makes Document AI Actually Trustworthy

I built a document Q&A tool in four hours. The code was simple — the interesting part was the system prompt design that prevents hallucination and requires exact citations. Why 'don't speculate' is the most important instruction in document intelligence.

ai · document intelligence · system design · legal tech · prompt engineering
·8 min read

I Built an AI on My Own Research Papers. Here's What I Learned.

I published two papers in the European Journal of Applied Physiology, then built an AI chat that uses them as its knowledge base. Why I chose a hand-crafted knowledge base over RAG, how to design for scientific honesty, and what domain expertise multiplies.

ai · research · sports science · engineering · rag
·8 min read

What LLM Pipelines Actually Look Like — A Visible Implementation

Enterprise AI runs on multi-step LLM pipelines. I built a visible one: 4 stages, 4 specialized prompts, context accumulating forward. Each stage shows timing, token usage, and intermediate output. Here's the architecture and what building it revealed.
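"Context accumulating forward" is the whole trick, and it fits in a short sketch. The stage names and synchronous stubs below are stand-ins for four separate LLM calls with specialized prompts:

```typescript
// Each stage receives everything produced so far, not just the raw input.
type Stage = { name: string; run: (context: string) => string };

const stages: Stage[] = [
  { name: "extract", run: (ctx) => `facts(${ctx.length} chars in)` },
  { name: "summarize", run: (ctx) => `summary(${ctx.length} chars in)` },
  { name: "draft", run: (ctx) => `draft(${ctx.length} chars in)` },
  { name: "polish", run: (ctx) => `final(${ctx.length} chars in)` },
];

function runPipeline(input: string) {
  let context = input;
  const trace: { stage: string; ms: number; output: string }[] = [];
  for (const stage of stages) {
    const start = Date.now();
    const output = stage.run(context);
    trace.push({ stage: stage.name, ms: Date.now() - start, output }); // per-stage timing
    context = `${context}\n[${stage.name}]\n${output}`;                // accumulate forward
  }
  return { output: context, trace };
}
```

The trace array is what makes the pipeline visible: each entry carries the stage name, timing, and intermediate output, mirroring what the demo surfaces in its UI.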

ai · llm orchestration · enterprise ai · engineering · system design
·8 min read

Building MCP Servers: How I Gave My AI Agent Real Tools

The Model Context Protocol is USB for AI capabilities. Here's what I learned building six MCP servers in production — tool description design, access boundaries, and why structured error messages matter.

mcp · ai · agents · engineering
·9 min read

The Algorithm Behind Real-Time PR Detection: Sports Science Meets Next.js

I spent years studying strength adaptation as a sports scientist before I wrote production code. Building PR detection for my workout tracker brought those worlds together — the Epley e1RM formula, per-set benchmarking, and what the data actually shows about nonlinear strength gains.
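The Epley formula mentioned above is simple enough to state inline: estimated 1RM = weight × (1 + reps / 30). A sketch of how per-set PR detection can be built on it; the history shape is illustrative, not the tracker's actual schema:

```typescript
// Epley estimated one-rep max: 100 kg x 5 implies roughly 116.7 kg for one rep.
function epleyE1rm(weightKg: number, reps: number): number {
  return reps === 1 ? weightKg : weightKg * (1 + reps / 30);
}

// A set is a PR if its e1RM beats every prior set's e1RM for the same lift.
function isPr(
  history: { weightKg: number; reps: number }[],
  set: { weightKg: number; reps: number },
): boolean {
  const best = Math.max(0, ...history.map((s) => epleyE1rm(s.weightKg, s.reps)));
  return epleyE1rm(set.weightKg, set.reps) > best;
}
```

Benchmarking per set rather than per session is what lets a rep PR at a lower weight still register as progress.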

sports science · engineering · algorithms · fitness
·6 min read

The Anthropic SDK Breaks on Vercel Streaming. Here's the Fix.

If you're getting 'Could not resolve authentication method' errors from the Anthropic SDK on Vercel, it's a streaming compatibility bug — not your API key. The fix is one pattern change. Here's exactly what breaks, why, and how to replace the SDK with raw fetch.
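Replacing the SDK with raw fetch means parsing the server-sent events yourself. A minimal sketch of that parser, assuming Anthropic's streaming event shape (`content_block_delta` events carrying `delta.text`); the generic `[DONE]` guard and the chunk-boundary handling are simplifications a real client would harden:

```typescript
// Extract text deltas from an SSE chunk and ignore everything else.
function parseSseChunk(chunk: string): string[] {
  const texts: string[] = [];
  for (const line of chunk.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length).trim();
    if (payload === "[DONE]") continue; // generic SSE terminator guard
    try {
      const event = JSON.parse(payload);
      // content_block_delta events carry the streamed text.
      if (event.type === "content_block_delta" && event.delta?.text) {
        texts.push(event.delta.text);
      }
    } catch {
      // Partial JSON split across chunk boundaries: a real client buffers it.
    }
  }
  return texts;
}
```

Feed `res.body` through a `TextDecoder` and run each decoded chunk through this, and the streaming path no longer depends on the SDK at all.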

anthropic · vercel · streaming · debugging · sdk · typescript
·6 min read

Getting Claude to Return Consistent JSON Every Time

I built a code reviewer that returns structured JSON: category, severity, line number, suggestion. The output is always parseable and typed. Here's the system prompt design that makes structured output reliable, and why schema specificity is the key variable.
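Schema specificity plus a strict parse step is the combination that makes this reliable. A sketch pairing the two; the prompt wording is illustrative, and the field names mirror the shape described above:

```typescript
// A schema-specific system prompt: exact fields, exact types, no prose.
const SYSTEM_PROMPT = `Respond with ONLY a JSON array. Each element:
{"category": "bug" | "style" | "perf", "severity": 1-5, "line": number, "suggestion": string}
No prose, no markdown fences.`;

type ReviewIssue = { category: string; severity: number; line: number; suggestion: string };

function parseReview(raw: string): ReviewIssue[] {
  // Strip accidental code fences before parsing, then validate every field.
  const cleaned = raw.replace(/^```(json)?|```$/g, "").trim();
  const data = JSON.parse(cleaned);
  if (!Array.isArray(data)) throw new Error("expected a JSON array");
  for (const issue of data) {
    if (
      typeof issue.category !== "string" ||
      typeof issue.severity !== "number" ||
      typeof issue.line !== "number" ||
      typeof issue.suggestion !== "string"
    ) {
      throw new Error("issue failed schema check");
    }
  }
  return data as ReviewIssue[];
}
```

The validator matters as much as the prompt: a typed failure on malformed output is recoverable, while silently consuming bad JSON is not.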

ai · prompt engineering · structured output · engineering · developer tools
·7 min read

I Published the Research. Then I Built the App.

The Durability Analyzer started as peer-reviewed science. Building the product from the paper revealed a gap between what researchers communicate and what athletes actually need.

research · product · sports science · durability