The 7 Patterns Behind High-Performance Claude Prompts (After Testing 2,392 of Them)

The Premise

I built a library of 2,392 Claude prompts and tested every one against real workloads. Some were ones I wrote myself. Many were variations users submitted, asked for, or improved on. A meaningful fraction were prompts I scraped from public skill catalogs and rewrote.

After all that testing, the prompts that consistently produced high-quality output had a structure. The ones that produced mediocre output also had a structure, but a different one. And the ones that completely failed were almost always failing one of three specific things.

This post is the structure. Seven patterns the working prompts share. Three failure modes the broken ones share. One 60-second self-test you can run against any prompt sitting in your CLAUDE.md or your skills folder right now.

No hype. No 'prompts that hit different.' Just what 2,392 data points said.

The Three Failure Modes (Why Most Claude Prompts Do Not Work)

Before the patterns, the failure modes. Almost every weak prompt in the catalog failed because of one of these three:

1. Under-specification. The prompt asks Claude to do something without naming the inputs, the outputs, or the boundary of the task. 'Write a code review of this PR' leaves Claude guessing about depth, format, audience, and what counts as done. Output drifts wildly across runs.

2. Over-instruction. The opposite problem. The prompt has 17 bullet points of rules, 4 nested example structures, and 6 'do not' clauses. Claude tries to satisfy all of them and produces stilted, hedge-heavy output that reads like a compliance form. The model is trying so hard to obey every instruction that it stops thinking.

3. Role confusion. The prompt switches between addressing Claude as the assistant, addressing Claude as the user, and giving Claude instructions about a third party ('tell the user that...'). Claude ends up unsure which voice to speak in, and the output reads as if multiple people wrote different paragraphs.

The seven patterns below all exist because they cleanly avoid one or more of these three failures.

The 7 Patterns Behind High-Performance Prompts

Pattern 1: Role Anchoring

Start the prompt by establishing what Claude is, not what Claude should do. This sounds like a small distinction. It is not.

Bad: 'Review this code and tell me what is wrong.'

Good: 'You are a senior backend engineer reviewing a junior dev pull request. The codebase is Python plus FastAPI plus Postgres. Your reviews are direct, specific, and prioritize correctness over style.'

The first form leaves Claude guessing about depth, voice, and what counts as a finding. The second form anchors all of that in the first sentence. Every downstream answer Claude gives is filtered through 'would a senior backend engineer say this?'

Why it works: role establishes the prior probability distribution over what good output looks like. Without it, Claude defaults to a vague 'helpful assistant' prior that produces vague helpful-assistant output.

Pattern 2: Context Fences

Explicitly declare what Claude has and what Claude does not have. The single most common reason Claude hallucinates is that it assumes it has information it does not.

Bad: 'Suggest improvements to my SaaS pricing.'

Good: 'You have: the pricing page text I am about to paste below. You do not have: revenue numbers, customer feedback, churn data, or competitor pricing. If your suggestion requires any of those, name what you would need and stop.'

The 'you do not have' line is the single highest-leverage sentence you can add to most prompts. It gives Claude permission to say 'I need more' instead of inventing.

Why it works: hallucination is what happens when a model has to produce output despite missing context. Explicitly fencing the missing context turns hallucination into a graceful request for more.

Pattern 3: Output Contract

Name the exact format Claude must return. Do not suggest. Do not request. Specify.

Bad: 'Give me your analysis in a clear format.'

Good: 'Return a JSON object with three fields: findings (array of strings, each a single finding), severity (string, one of: critical, warning, info), recommendation (string, one sentence). No prose outside the JSON.'

For non-structured outputs, the contract still applies, just in prose: 'Return three sections in this order: 1) one-paragraph summary, 2) bulleted list of risks (max 5), 3) single-sentence final verdict.'

Why it works: an output contract removes the largest source of variance across Claude runs. If you cannot predict the shape of the output, you cannot reliably consume it downstream. Defining the shape eliminates that uncertainty.

Pattern 4: Failure Mode Declaration

Tell Claude what not to do, with one specific example of the bad behavior you are preventing.

Bad: 'Be helpful and accurate.'

Good: 'Do not suggest dependencies the user did not already mention. Example of bad output: You should also consider Redis here. Example of good output: stick to what is in scope, and if something is missing, name it as a missing input.'

Failure modes work because they teach Claude the contour of the negative space around the right answer. Showing one specific bad behavior is worth 10 abstract rules.

Why it works: language models are exceptional at pattern matching against examples. A single 'do not write output like this' sample shifts the entire output distribution away from that pattern.

Pattern 5: Reasoning Scaffolding

Decide before you write the prompt whether you want plan-first reasoning or answer-first reasoning. Then explicitly request the one you want.

Plan-first (use when accuracy matters more than speed):

'Before writing the answer, briefly outline the steps you will take. Then execute the steps.'

Answer-first (use when speed matters and you can verify quickly):

'Give the answer first. After the answer, briefly explain your reasoning in 2-3 lines.'

The most common mistake here is leaving it implicit. Claude will default to one or the other based on phrasing cues, and that default is rarely what you actually want.

Why it works: Claude quality on complex tasks scales meaningfully with how much it thinks before speaking. Forcing the plan-first or answer-first mode lets you control the latency-versus-quality tradeoff explicitly.

Pattern 6: Verification Handles

Build the prompt so it produces output the prompt can self-check against.

Bad: 'Refactor this function to be more efficient.'

Good: 'Refactor this function to be more efficient. After the refactor, return a 2-3 line proof that the refactored version preserves the original behavior: name one input the function processes, what the original output is, what your refactored output is. They should match.'

The verification handle is the bit at the end that gives you (and Claude) a way to grade the work without re-running it. Even when the verification is shallow, it forces Claude to actually think about correctness instead of just style.

Why it works: 'explain your work' is older than language models. It is the same reason teachers ask students to show their working: the act of producing the proof catches errors that the original answer would have hidden.

Pattern 7: Iteration Anchors

Leave named hooks in the prompt that the user can later reference and edit. This is the pattern people skip the most often, and the one that matters most for prompts that get used more than once.

Example of an iteration anchor:

You are reviewing a PR.

[VOICE: senior backend engineer, direct, prioritizes correctness over style]
[DEPTH: line-by-line if the diff is under 100 lines, otherwise file-by-file]
[OUTPUT: markdown with a section per file]

The bracketed handles (VOICE, DEPTH, OUTPUT) are the iteration anchors. When the prompt is not doing exactly what you want, you do not rewrite the whole thing. You edit one anchor. Change 'line-by-line' to 'function-by-function.' Change 'senior backend engineer' to 'security reviewer.' The structure stays. The behavior shifts.

Why it works: prompts are software. Software you cannot easily modify is software you stop using. Iteration anchors are the equivalent of named constants in code: they make the prompt maintainable.

A Worked Example

Here is a prompt rewritten using all seven patterns.

Original (broken in 5 ways):

'Review this code and tell me if it has any bugs.'

Rewritten with the 7 patterns:

You are a senior backend engineer reviewing a junior developer pull request.
[STACK: Python plus FastAPI plus Postgres]
[VOICE: direct, specific, prioritizes correctness over style]
[DEPTH: line-by-line]

You have: the diff I am about to paste below.
You do not have: the rest of the codebase, the test suite, the deploy environment, or the PR description.
If a finding requires any of those, name the missing context and stop on that finding.

Return your review in this order:
1. A one-paragraph summary of the change (your understanding of what the PR does)
2. A bulleted list of findings, each tagged [CRITICAL], [WARNING], or [INFO]
3. One sentence: 'Ship' or 'Block, fix the [CRITICAL] items first'

Do not suggest dependencies the developer did not already use. Example of bad output: You should add Redis here for caching. Example of good output: name the issue, leave the solution to the developer judgment.

Before writing the review, briefly outline which files you will review in which order. Then execute.

After the review, in a separate section labeled 'Self-check,' name one assumption you made that could change your verdict if it turned out to be wrong.

This prompt has role anchoring (Pattern 1), context fences (Pattern 2), an output contract (Pattern 3), failure mode declaration (Pattern 4), reasoning scaffolding (Pattern 5), a verification handle (Pattern 6), and iteration anchors (Pattern 7: STACK, VOICE, DEPTH).

The quality difference between this and the original is not subtle. Run them side-by-side once and you will keep writing prompts in the second style for the rest of your Claude career.

The 60-Second Self-Test

Take any prompt you have written in the last 30 days. CLAUDE.md, a custom slash command, a skill file, a prompt you paste into chat. Run it through these seven questions. Score 1 if the answer is yes, 0 if no.

Does it open by telling Claude what role to embody, not just what task to do? (Role anchoring)
Does it explicitly name what Claude has and what Claude does not have? (Context fences)
Does it specify the exact output format Claude must return? (Output contract)
Does it name at least one behavior Claude should NOT produce, with an example? (Failure mode declaration)
Does it tell Claude whether to plan first or answer first? (Reasoning scaffolding)
Does it ask Claude to produce a check, proof, or self-grade alongside the answer? (Verification handles)
Does it leave at least one named, easily-editable handle for future iteration? (Iteration anchors)

Scoring:

0-2: the prompt is probably underperforming by a wide margin
3-4: the prompt works but with high variance across runs
5-6: the prompt is solid; one missing pattern is dragging it
7: the prompt is at the ceiling of what raw prompting can do; remaining gaps are model limits, not prompt limits

Do this once. Most people land at 2 or 3 the first time. The lift from 3 to 6 takes about 15 minutes of editing per prompt and pays back across every future invocation.

Where the 2,392 Prompts Live

The full library is at clskillshub.com. Each skill in the catalog has been through the same seven-pattern lens at some point in its life. If you want ready-made working prompts instead of writing your own from scratch, the library is the fastest path.

There is also a free 115-page guide on prompting and Claude Code workflows at clskillshub.com/guide. No email, no signup wall. Read it if you want the deeper version of what this post covers.

Closing

The seven patterns are not novel. Most of them have shown up in prompt engineering writeups one at a time. The contribution of this post is collecting them in one place, with the underlying failure modes that each one addresses, and the self-test that lets you grade your own prompts in 60 seconds.

If you are going to remember one thing: the difference between a prompt that scores 2 out of 7 and one that scores 6 out of 7 is fifteen minutes of work, and the output quality difference is enormous. The work is editing, not generation. Run the self-test on the three prompts you use most often this week. Edit them. Notice the difference.