Test-Driven Writing
A framework for writing with AI that turns you from editor into cultivator
My path to this method came from my special power: dyslexia. Writing has always been hard for me. In the seventies, dyslexia went undiagnosed, and my teachers graded me down for spelling errors, mistaking my dyslexia for a lack of care and commitment to the work. They couldn't see the ideas underneath the errors in my writing. When the word processor arrived in college, I could finally revise without retyping the whole thing. Then spell check. Then grammar check. Each tool let the computer handle what my brain could not. Now the LLMs. A writing tool where my ideas can take shape faster than the mechanics can fight against them.
But for the first year with Claude, it was slower for me to write with AI than without it. I was editing everything Claude produced. I caught missed nuance, wrong tone, buried leads. The AI drafted; I fixed. It worked, and my writing got way better, but it took a very long time. Every piece ran through my line-by-line attention, just as it always had. The AI speed advantage vanished into revision after revision after revision.
Then something shifted. I realized Claude could capture the edits I was making so it could be the editor next time, instead of me. Not prose guidelines or style suggestions. Criteria. Pass/fail assertions. Does the lead appear in the first two paragraphs? Are claims grounded in specifics? Does the ending land? I was building a test suite for my writing. The assertions could do the work that my slow rounds of editing had been doing.
The parallel is direct. In test-driven development, programmers write the tests before they write the code—define what success looks like first, then build until the tests pass. In code, you design tests; AI iterates until they pass. In writing, you design criteria; AI iterates until they pass. You stop being the editor and become the cultivator. Instead of fixing "it's important to note" yourself, you add it to your banned-phrases list and the system catches it next time.
This is asymmetric pair programming—human and AI working together, each with different strengths. The human cultivates, which includes designing the criteria. The AI implements, drafting and iterating against those criteria. The human refines, reviewing what the critique catches and updating criteria when they miss something.
The job changes from checking every line to articulating what you want. Harder in some ways—you have to make implicit knowledge explicit. But it scales. Your criteria mean AI can make the fixes. You're not rewriting every sentence anymore.
The separation between tests and guidance matters. In test-driven development, the test suite contains only tests—binary pass/fail. Style guidance lives elsewhere: linters, style guides, code review. The same applies to writing. Criteria are the test suite, pass/fail requirements. Patterns are techniques and guidance, tools in the toolbox. But here's where writing differs from code: in programming, more tests are usually better—they verify output without constraining how you write. Criteria constrain the LLM directly. Too many hard rules and the prose gets boxed in. Keep criteria minimal. Most of what you know about good writing belongs in patterns, where it guides without blocking.
My criteria include things like "lead not buried," "no AI-isms," "claims grounded in specifics," "no throat-clearing," "so what is clear." These are checkable. Either the lead appears in the first two paragraphs or it does not. Either the piece contains "delve" and "it's important to note" or it does not. Either assertions have evidence or they do not.
Patterns are different. Techniques like "short sentences for emphasis," "contrast for punch," "practitioner voice." These are guidance—they help but they do not block publication.
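The criteria/pattern split can be made concrete in code. Here is a minimal sketch of what such a checker might look like: criteria return failures that block, patterns return warnings that merely inform. The phrase list, the lead-term check, and the sentence-length threshold are illustrative assumptions, not the actual TDW configuration.

```python
# Sketch of a writing test suite: hard pass/fail criteria plus soft
# pattern warnings. Phrase lists and thresholds are illustrative only.

BANNED_PHRASES = ("delve", "it's important to note")  # assumed AI-ism list

def split_paragraphs(text):
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def check_criteria(text, lead_terms):
    """Hard checks: any failure blocks publication.
    lead_terms is a hypothetical stand-in for 'lead not buried' --
    words the first two paragraphs must contain."""
    failures = []
    first_two = " ".join(split_paragraphs(text)[:2]).lower()
    if not any(term.lower() in first_two for term in lead_terms):
        failures.append("lead buried: thesis terms absent from first two paragraphs")
    lowered = text.lower()
    failures += [f"AI-ism: {p!r}" for p in BANNED_PHRASES if p in lowered]
    return failures

def check_patterns(text):
    """Soft guidance: warnings inform revision but never block."""
    warnings = []
    words = len(text.split())
    sentences = max(text.count(". ") + text.count(".\n"), 1)
    if words / sentences > 25:
        warnings.append("long average sentence; consider short sentences for emphasis")
    return warnings

draft = "This piece will delve into testing.\n\nMore body text here."
print(check_criteria(draft, ["test-driven"]))
```

The asymmetry is the point: `check_criteria` gates publication, `check_patterns` only advises, so most writing knowledge can live on the soft side without boxing in the prose.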
The workflow uses multiple agents—research, outline, draft, critique, revise—but the human enters at one key point: reviewing the critique, not the draft. "Is this a valid criticism?" is higher leverage than "is every sentence right?" You spend attention on meta-level judgment instead of line-level editing. And when you do make direct edits, those edits become new criteria. Every change you make teaches the system what you care about.
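The shape of that loop can be sketched in a few lines. The agent functions below are placeholders standing in for LLM calls, and the names mirror the stages above; this is a structural illustration, not the actual system. The key is where the human sits: between critique and revision.

```python
# Structural sketch of the TDW loop. Each agent is a stub; in the real
# workflow these would be LLM calls against the criteria files.

def draft_agent(outline):
    # Placeholder: a real draft agent writes prose from the outline.
    return f"Draft based on: {outline}"

def critique_agent(draft, criteria):
    # Placeholder: a real critique agent checks the draft against each
    # criterion and returns specific, cited objections.
    return [f"check: {c}" for c in criteria]

def human_review(critiques):
    # The human's leverage point: "is this a valid criticism?"
    # Approves everything here; in practice this step is interactive.
    return critiques

def revise_agent(draft, approved):
    # Placeholder: a real revise agent rewrites only against approved critiques.
    return draft + f" [revised for {len(approved)} critiques]"

criteria = ["lead not buried", "no AI-isms"]
draft = draft_agent("TDW outline")
approved = human_review(critique_agent(draft, criteria))
print(revise_agent(draft, approved))
```

Note that `human_review` never touches the draft text: the human judges critiques, and only approved critiques drive revision.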
Frankenstein writing emerges when too many agents touch the prose. Echo chambers form when Claude critiquing Claude misses what a human would catch. Over-processed prose loses punch. Voice gets lost. The mitigations are structural: the human reviews critiques before they become revisions, the human has final judgment on voice. You'll still edit manually—but each edit teaches the system. Over time, you'll need to edit less.
TDW catches what's wrong — jargon, repetition, buried leads. Sometimes it fixes it, offering options when the problem is structural. But it rarely sees what a passage could become. I wrote "I gave up and relaxed" in a scene set in a bathtub. It passed every criterion. But the scene had cooling water, a recording I'd shut off, muscles letting go, a mind letting go — and none of it was doing work. I had to add those details myself. The best the system can do is flag the question: is this scene using what's available? The creative answer is still yours.
This shift is real. You stop reviewing prose and start reviewing outcomes. You stop being the person who checks every line and become the person who defines what "checked" means. For someone like me, dyslexic, always fighting with the mechanics of writing, this changes everything. The mechanics become the AI's job. My job becomes knowing what good looks like. Twenty-five years of reading, editing, and knowing when something lands becomes the input to a system that can check faster than I can read. The hard-won judgment transfers. The mechanical struggle does not.
Test-driven writing.
The TDW framework files are shared freely. Get them on GitHub.