Claude Opus 4.7 shipped on April 16, 2026. It's the most powerful generally available model right now — 87.6% on SWE-bench Verified, 64.3% on SWE-bench Pro, 91% on CharXiv visual reasoning with tools. Anthropic's own customers reported a 3x lift in production task completion and 66% fewer tool-calling errors.
But here's what I've noticed in the two weeks since it dropped: most people are still prompting it like Opus 4.6. They're leaving 30% of the model's performance on the table, paying 20%+ more for the same output, and running into weird edge cases they don't understand. The culprit isn't the model — it's the prompts, the effort settings, and a handful of small behavioral changes that nobody told you about.
This is the field guide. Twelve tips, the tokenizer math, the effort decision tree, and — honestly — when you should just use Sonnet 4.6 instead.
The Mindset Shift Nobody Is Talking About
If you only read one thing from this post, read this section. Every other tip is downstream of it.
Opus 4.7's biggest change isn't raw intelligence. It's literalism. Opus 4.6 would interpret vague prompts charitably — you'd say "clean this up" and it would use its judgment to make reasonable changes. Opus 4.7 does exactly what you tell it. Nothing more. Nothing less.
If you tell 4.7 to "fix the bug in the login function," it will fix that one bug and stop. It will not notice the three adjacent bugs. It will not generalize "fix the bug" to "fix all bugs in this file." It treats your instruction as a literal specification.
This is a feature, not a bug. Teams were burned by 4.6 making "helpful" changes they didn't ask for. 4.7 fixes that. But it means your prompting habits from the last six months are now actively costing you performance.
The tips below are all about adapting to this. If you're building anything serious — coding agents, marketing automations, content workflows — this is the move.
Tip 1: Write Task Descriptions, Not Prompts
The single highest-leverage change you can make. Stop writing prompts like conversational instructions. Start writing them like engineering tickets.
A prompt is "write a landing page for a dog food subscription." A task description states the same intent up front, then pins down the constraints and the definition of done:
Constraints:
- Hero headline ≤ 10 words, outcome-focused (not feature-focused)
- One primary CTA: "Start 14-day trial — no credit card"
- Include social proof block with 3 named testimonials
- No jargon, no "revolutionize" or "seamless" or "elevate"
- Mobile-first layout order
Acceptance criteria:
- Headline passes the 5-second test (reviewer can name the product after 5s)
- Page answers: what, who for, why now, what happens if I click
- Copy reads like a specific person wrote it, not a brand
Files: Write to app/(marketing)/puppy-food/page.tsx. Use existing components in components/landing/.
This isn't overkill. This is what Opus 4.7 is asking for. The model will analyze the intent, calibrate how much thinking it needs, and execute against a clear definition of done. Skip the acceptance criteria and 4.7 will invent its own — and it almost never matches yours.
The three elements that matter most: intent (what outcome do you want), constraints (what can't it do), and acceptance criteria (how do we know it's done). If your prompt has all three, you're ahead of 95% of users.
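If you produce many of these, it can help to generate the task description from structured fields so none of the three elements gets dropped. A minimal sketch in Python; the `TaskSpec` name and layout are my own convention, not anything Anthropic prescribes:

```python
from dataclasses import dataclass, field

@dataclass
class TaskSpec:
    """Assembles an engineering-ticket-style task description.
    Field names (intent/constraints/acceptance/files) follow Tip 1,
    not any official schema."""
    intent: str
    constraints: list = field(default_factory=list)
    acceptance: list = field(default_factory=list)
    files: str = ""

    def render(self) -> str:
        parts = [self.intent]
        if self.constraints:
            parts.append("Constraints:\n" + "\n".join(f"- {c}" for c in self.constraints))
        if self.acceptance:
            parts.append("Acceptance criteria:\n" + "\n".join(f"- {a}" for a in self.acceptance))
        if self.files:
            parts.append(f"Files: {self.files}")
        return "\n\n".join(parts)

prompt = TaskSpec(
    intent="Write a landing page for a dog food subscription.",
    constraints=["Hero headline <= 10 words, outcome-focused"],
    acceptance=["Headline passes the 5-second test"],
    files="Write to app/(marketing)/puppy-food/page.tsx",
).render()
```

The point isn't the helper; it's that intent, constraints, and acceptance criteria become required fields instead of things you remember to type.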
Tip 2: Strip Out Your Old Scaffolding
Go look at your prompts from the last six months. How many of them say something like "double-check your work before responding" or "make sure the output is valid JSON" or "walk through your reasoning step-by-step before giving the final answer"?
Delete all of that.
Opus 4.7 has native self-verification. It audits its own outputs before reporting back — especially for structured work like .docx redlines or .pptx edits. It emits progress updates natively. When you add "double-check your work," you're not getting additional verification. You're confusing the model about whether there are two verification passes or one, and you're burning tokens on instructions the model would execute anyway.
The scaffolding patterns to remove:
| Old pattern (4.6 era) | What to do in 4.7 |
|---|---|
| "Think step-by-step before responding" | Remove. Adaptive thinking handles this. |
| "Double-check your output" | Remove. 4.7 self-verifies. |
| "Provide a progress update before each step" | Remove. 4.7 emits these natively. |
| "If you're unsure, ask a clarifying question" | Keep — but pair it with "otherwise proceed with best judgment." |
| "Respond ONLY with valid JSON" | Keep — literalism makes this more reliable, but still worth stating. |
Pruned prompts aren't just faster. They're more accurate. Every instruction in your prompt is a potential source of conflict or misinterpretation. 4.7's literalism means unnecessary instructions have real cost.
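If you have a prompt library to migrate, you can flag the obsolete scaffolding mechanically before deleting anything by hand. A rough sketch; the pattern list is just the table above turned into regexes, not an official deny-list:

```python
import re

# Scaffolding phrases that 4.7 handles natively, per the table above.
# Illustrative patterns only; extend with your own prompt library's tics.
OBSOLETE_PATTERNS = [
    r"think step[- ]by[- ]step",
    r"double[- ]check your (work|output)",
    r"progress update",
]

def find_obsolete_scaffolding(prompt: str) -> list[str]:
    """Return the obsolete patterns found in a prompt (case-insensitive)."""
    return [p for p in OBSOLETE_PATTERNS
            if re.search(p, prompt, flags=re.IGNORECASE)]

hits = find_obsolete_scaffolding(
    "Think step-by-step before responding, and double-check your work."
)
# hits contains two matched patterns
```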
Tip 3: Master the Effort Dial
Opus 4.7 has five effort levels: low, medium, high, xhigh, max. This is the single biggest lever for cost vs. capability, and most people use the default without thinking about it.
Claude Code defaults to xhigh on every plan tier. That's the right default for coding. It's overkill for most other things.
| Effort | When to use | Examples |
|---|---|---|
| low | Latency-sensitive lookups, simple extractions | Summarize a paragraph, extract entities, classify intent |
| medium | Default for most content tasks | Write an email, draft a blog outline, answer a factual question |
| high | Analysis, multi-step reasoning | Competitive analysis, financial modeling, PR review |
| xhigh | Agentic coding, schema design, long-horizon work | Refactoring, migrating legacy code, building agents |
| max | Hardest, highest-stakes problems only | Novel algorithm design, subtle bug hunts, research synthesis |
The practical rule: if you're observing shallow reasoning on complex problems, raise effort before you touch the prompt. If you're paying too much for straightforward outputs, lower it. Don't try to compensate for the wrong effort level by restructuring your prompt — that's fighting the model instead of tuning it.
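If effort is set programmatically, encoding the table as a lookup keeps the decision out of individual call sites. A sketch; the task-category names are illustrative, not an Anthropic taxonomy:

```python
# Maps task categories from the table above to effort levels.
# Category names are my own; only the effort levels come from the docs.
EFFORT_BY_TASK = {
    "extraction": "low",
    "classification": "low",
    "content": "medium",
    "analysis": "high",
    "coding": "xhigh",
    "frontier_research": "max",
}

def pick_effort(task_category: str) -> str:
    """Fall back to medium for unknown categories, per the table's default."""
    return EFFORT_BY_TASK.get(task_category, "medium")
```

When a task underperforms, change its row in the table; don't rewrite the prompt first.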
Tip 4: Account for the Tokenizer Tax
This one is being quietly passed over by most coverage, and it's costing teams real money.
Opus 4.7 uses a new tokenizer. For the same text, it produces 1x to 1.35x as many tokens as Opus 4.6. Your per-token prices are unchanged ($5/M input, $25/M output) — but your actual token count per request is higher. For a workload that cost $10,000/month on Opus 4.6, expect to pay somewhere between $10,000 and $13,500 on 4.7 for the same volume of work.
This isn't a price hike. It's a tokenization change that happens to increase bills. But you need to budget for it.
Three things you can do about it:
1. Measure before you migrate. Run a representative slice of your workload through 4.7 and compare input/output tokens to 4.6. Don't guess. The multiplier varies by language, code vs. prose, and structured vs. unstructured data.
2. Lean harder on prompt caching. Cache reads are 90% cheaper than fresh reads. If you're passing the same system prompt or codebase context every call, caching it turns the tokenizer tax into a rounding error.
3. Use batch for non-real-time work. Batch processing is 50% off standard rates. Offline analytics, content generation queues, backfills — push them through the Batch API.
Between prompt caching and batching, a well-engineered workload can be cheaper on 4.7 than it was on 4.6, despite the tokenizer change.
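The budgeting arithmetic is simple enough to write down. A sketch using the figures in this post (a 1.0 to 1.35x measured multiplier, 90% cache discount, 50% batch discount); treating the discounts as applying to the whole blended spend is a deliberate simplification:

```python
def project_monthly_cost(cost_46: float, token_multiplier: float,
                         cached_fraction: float = 0.0,
                         batched_fraction: float = 0.0) -> float:
    """Rough 4.7 cost projection from a measured 4.6 baseline.

    cost_46          -- monthly spend on Opus 4.6
    token_multiplier -- measured 4.7/4.6 token ratio (1.0 to ~1.35)
    cached_fraction  -- share of spend served from cache (90% off)
    batched_fraction -- share of spend moved to the Batch API (50% off)
    Simplification: discounts apply to the blended total, sequentially.
    """
    base = cost_46 * token_multiplier
    base -= base * cached_fraction * 0.90
    base -= base * batched_fraction * 0.50
    return round(base, 2)

# Worst-case multiplier, no mitigation: $10,000 -> $13,500
project_monthly_cost(10_000, 1.35)  # -> 13500.0
```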
Tip 5: Use Task Budgets to Stop Runaway Loops
New in 4.7: task budgets. You give the model a rough token ceiling for the full agentic loop — thinking, tool calls, tool results, final output — and it sees a running countdown. The model uses that countdown to prioritize work and wrap up gracefully as the budget is consumed.
The minimum is 20,000 tokens. Budgets are advisory, not enforced — the model treats them as guidance, not a hard cut-off. And critically, budgets carry forward across compaction cycles, which makes them genuinely useful for long-running agents.
When to actually set one:
- Autonomous agents that run without supervision (customer support bots, CI/CD agents, scheduled workflows). Without a budget, a bad query can spiral into hundreds of tool calls.
- Research tasks where the answer might live in 3 searches or 30, and you want 4.7 to pick the right stopping point.
- User-facing experiences where you're happy with "good in 30 seconds" over "perfect in 3 minutes."
When not to bother: one-shot requests where the output size is predictable. Setting a task budget on a "summarize this article" call is just ceremony.
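Because budgets are advisory, a hard guard on your side of the loop is still worth having for unsupervised agents. A minimal client-side sketch; the class and loop shape are illustrative, not SDK API:

```python
class BudgetGuard:
    """Hard client-side ceiling to back up the model's advisory budget."""

    def __init__(self, budget_tokens: int):
        # Mirrors the documented 20K floor for task budgets.
        assert budget_tokens >= 20_000, "task budgets have a 20K minimum"
        self.budget = budget_tokens
        self.used = 0

    def record(self, tokens_this_step: int) -> None:
        self.used += tokens_this_step

    @property
    def exhausted(self) -> bool:
        return self.used >= self.budget

guard = BudgetGuard(50_000)
for step_tokens in [18_000, 22_000, 15_000]:  # simulated agent steps
    if guard.exhausted:
        break
    guard.record(step_tokens)
# guard.exhausted is now True; a fourth step would not run.
```

The model's own budget handles graceful wrap-up; the guard handles the pathological case where it doesn't.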
Tip 6: Pair the 1M Context Window With Prompt Caching
Opus 4.7 ships with a 1M token context window at standard pricing — no long-context premium. That's a big unlock, but the way to actually use it well is to combine it with prompt caching.
Here's the pattern: for a long-running agent (coding assistant, research agent, customer support) you load the full context — the entire codebase, the product documentation, the user's history — into the system prompt. You mark it cacheable. On the first call, you pay full price to process it. On every subsequent call within the cache window, you pay 10% of that price to read it.
For a codebase that's 400K tokens (roughly the size of a mid-sized SaaS app), fresh processing costs ~$2 per call. Cached reads cost ~$0.20. Over 100 calls in a work session, that's the difference between $200 and $20.
If you're building anything multi-turn, this is table stakes. Long context is the headline. Caching is what makes it economically viable.
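The session math from the example above, made explicit (prices from this post: $5/M input, cache reads at 10% of fresh):

```python
INPUT_PRICE_PER_M = 5.00   # $/M input tokens, per this post
CACHE_READ_FACTOR = 0.10   # cache reads cost 10% of fresh input

def session_input_cost(context_tokens: int, calls: int, cached: bool) -> float:
    """Input-side cost of re-sending one large context over a session."""
    per_call = context_tokens / 1_000_000 * INPUT_PRICE_PER_M
    if cached:
        per_call *= CACHE_READ_FACTOR
    return round(per_call * calls, 2)

session_input_cost(400_000, 100, cached=False)  # -> 200.0
session_input_cost(400_000, 100, cached=True)   # -> 20.0
```

(This ignores the first call, which pays the fresh price to populate the cache; it's a one-time cost amortized over the session.)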
Tip 7: Push Vision Up to 2,576px
Opus 4.7 accepts images up to 2,576 pixels on the long edge, up from 1,568 in every prior Claude model. For a 16:9 capture that's roughly 3.7 megapixels, nearly triple the pixel count of the old limit.
Most integrations are still downsampling to the old 1568px limit "to be safe." Stop doing that. You're throwing away information. The new resolution unlocks:
- Dense screenshots (dashboards, admin UIs, analytics pages) where small text was unreadable at 1568px
- Technical diagrams — architecture diagrams, circuit schematics, flowcharts
- Data-heavy interfaces — Bloomberg terminal-style pages, monitoring dashboards, spreadsheets
- Full-page landing page captures where the fold stretches 2000px+ vertically
Opus 4.7 also improved low-level perception (pointing, measuring, counting) and image localization (bounding-box detection). Vision accuracy in production workloads hit 98.5% in Anthropic's customer tests — the kind of number that actually lets you trust vision outputs in automated pipelines, not just as user-facing demos.
If you're building anything that analyzes web pages, documents, or UIs, upgrade your image pipeline first. This is where the 4.7 upgrade pays for itself fastest.
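If your pipeline still clamps to the old limit, the fix is usually one constant plus some resize math. A sketch (pure geometry, no API calls):

```python
MAX_LONG_EDGE = 2576  # Opus 4.7 ceiling, up from 1568

def fit_to_long_edge(width: int, height: int,
                     max_edge: int = MAX_LONG_EDGE) -> tuple[int, int]:
    """Downscale only if the long edge exceeds the ceiling, keeping aspect."""
    long_edge = max(width, height)
    if long_edge <= max_edge:
        return width, height  # already within limits; don't touch it
    scale = max_edge / long_edge
    return round(width * scale), round(height * scale)

fit_to_long_edge(3840, 2160)  # 4K screenshot -> (2576, 1449)
fit_to_long_edge(1920, 1080)  # under the ceiling -> (1920, 1080)
```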
Tip 8: Let Adaptive Thinking Run
In Opus 4.7, adaptive thinking is the only supported thinking mode. The old `thinking: { type: "enabled", budget_tokens: N }` parameter is gone. Don't try to bring it back.
Adaptive thinking lets the model decide, per request, whether to think and how much. Simple queries skip thinking entirely. Hard queries get as many thinking tokens as they need. Over an agentic loop, this compounds into meaningful latency and cost savings — you're not paying for thinking on the 80% of steps that don't need it.
You still have indirect levers:
- To encourage more thinking: "Think carefully and step-by-step before responding; this problem is harder than it looks."
- To encourage less thinking: "Prioritize responding quickly rather than thinking deeply. When in doubt, respond directly."
- Effort level is still the biggest dial — higher effort means the model is more willing to spend thinking tokens.
Thinking blocks now appear in the response stream with an empty thinking field by default. If you want to see the actual reasoning trace, you have to explicitly opt in. For most production apps, this is what you want — you're not exposing internal reasoning to end users anyway, and you save tokens by not streaming it.
Tip 9: Be Explicit About Subagent Fanout
Subtle but important: Opus 4.7 spawns fewer subagents by default than 4.6. If you're running a multi-agent system, you'll notice the main agent trying to handle more in one response instead of delegating.
This is steerable. You just have to be explicit.
The prompt that works: "Spawn a specialist subagent for each of: frontend review, backend review, database review, security review."
The prompt that doesn't: "Don't try to handle all of this in one response."
Literalism again. "Don't do X" is weaker than "Do Y." Tell the model what you want, not what you don't want. Give explicit fanout instructions with a clear list of subtasks, and 4.7 will fan out. Leave it vague and it'll try to do it all in-context.
A good heuristic for when to force fanout:
- Fan out when the subtask would flood the main context with search results, logs, or file contents you won't need again.
- Don't fan out for work you can complete in a single response (refactoring a function you can already see).
- Always fan out in parallel when the subtasks are independent — single turn, multiple subagents.
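A small helper makes the explicit "Do Y" form the default in your agent harness. Sketch; the helper is mine, and the phrasing simply mirrors the working prompt above:

```python
def fanout_instruction(subtasks: list[str]) -> str:
    """Build a positive fanout instruction instead of a 'don't do X' one."""
    listed = ", ".join(subtasks)
    return f"Spawn a specialist subagent for each of: {listed}."

fanout_instruction(
    ["frontend review", "backend review", "database review", "security review"]
)
# -> "Spawn a specialist subagent for each of: frontend review, ..."
```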
Tip 10: Use /ultrareview for Serious Code Review
New Claude Code command: /ultrareview. It spawns a dedicated review session that flags bugs, design issues, and subtle problems the original implementation might have missed.
This is different from asking Claude to "review my code." Ultrareview is a structured pass — it looks for the specific categories of issues that senior engineers flag in PRs. Logic errors, edge cases, naming consistency, security implications, dead code, mismatched abstractions.
The workflow that works:
- Complete your implementation at xhigh effort
- Run /ultrareview before you commit
- Fix the issues it flags
- Run /ultrareview again if you made substantial changes
This is a real productivity multiplier for solo devs — you effectively get a second pair of eyes on every change without tagging a human reviewer. For teams, it cuts the back-and-forth on PR reviews because obvious issues are already caught.
Tip 11: Know When Sonnet 4.6 Is Actually the Right Call
Here's the honest truth most Claude tutorials won't tell you: Opus 4.7 is overkill for a huge chunk of real work.
Sonnet 4.6 scores 79.6% on SWE-bench Verified — only 1.2 points behind Opus 4.6. It's 5x cheaper per token. It's 2–3x faster. And for the vast majority of tasks, you can't tell the difference in output quality.
Route to Opus 4.7 when the task is:
- Long-horizon coding (hours of autonomous work)
- Schema design, API architecture, migrations
- Dense vision tasks (dashboards, technical diagrams)
- Research synthesis across 1M tokens of context
- Anything where a small quality gap has 10x business impact
Stick with Sonnet 4.6 for:
- Single-file bug fixes
- Content generation at volume (blog posts, emails)
- Classification, extraction, summarization
- Computer-use workloads (5x cheaper, same quality)
- Anything you're doing thousands of times per day
The smart architecture: default to Sonnet 4.6. Route to Opus 4.7 only when the task actually benefits from the extra capability. For a typical product workload (mix of simple extraction, moderate analysis, occasional deep reasoning), you'll see 40–60% cost savings with zero quality regression.
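The routing layer itself can be tiny. A sketch; the model ID strings and the boolean flags are placeholders for illustration, not confirmed API identifiers:

```python
# Model IDs below are illustrative placeholders, not confirmed identifiers.
SONNET = "claude-sonnet-4-6"
OPUS = "claude-opus-4-7"

def pick_model(long_horizon: bool = False, dense_vision: bool = False,
               huge_context: bool = False, high_stakes: bool = False) -> str:
    """Default to Sonnet 4.6; escalate to Opus 4.7 only when a task earns it."""
    if long_horizon or dense_vision or huge_context or high_stakes:
        return OPUS
    return SONNET

pick_model()                   # bulk extraction, classification -> Sonnet
pick_model(long_horizon=True)  # hours of autonomous coding -> Opus
```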
We've written about when to use AI for landing page copy — the same principle applies. Model selection is an engineering decision, not an identity. Use the right tool for the task.
Tip 12: Batch Your Questions, Minimize Turns
Every user turn in a multi-turn conversation adds reasoning overhead. 4.7 has to re-ingest the prior context, re-calibrate its understanding of the task, and resume. That's real latency and real tokens.
The fix: batch your questions. Instead of asking "what do you think of this headline?" and then "can you rewrite it?" and then "give me three more variations," ask all three at once:
1. Give a frank critique (what's working, what isn't)
2. Rewrite it to be outcome-focused
3. Generate three alternative variations using different hooks (problem-led, outcome-led, contrarian)
You get all three outputs in one turn. The model sees the full scope of what you want, calibrates once, and executes all three together. Fewer turns, less latency, less cost, and honestly — better outputs, because the model understands how the parts relate.
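In code, batching is just assembly before you send. A sketch; the helper name is mine:

```python
def batch_questions(subject: str, questions: list[str]) -> str:
    """Merge several follow-ups into one numbered single-turn request."""
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, 1))
    return f"{subject}\n\n{numbered}"

msg = batch_questions(
    "Here is my headline: 'Ship faster with less review pain.'",
    ["Give a frank critique (what's working, what isn't)",
     "Rewrite it to be outcome-focused",
     "Generate three alternative variations using different hooks"],
)
```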
The 5 Traps That Are Still Tripping Teams Up
Two weeks in, the patterns are clear. The most common mistakes I'm seeing, even from experienced Claude users, are the inverses of the tips above:
- Running 4.6-era scaffolding prompts ("think step-by-step," "double-check your work") that now conflict with native behavior (Tip 2)
- Leaving every call at the default effort level instead of matching effort to the task (Tip 3)
- Migrating without measuring the tokenizer multiplier, then being surprised by the bill (Tip 4)
- Reprocessing huge contexts on every call instead of caching them (Tip 6)
- Routing everything to Opus 4.7 when Sonnet 4.6 would do (Tip 11)
The Patterns That Win
If you only take five things from this post:
- Write specifications, not prompts. Intent, constraints, acceptance criteria, file locations.
- Match effort to task. xhigh is the right default for hard work. Drop it for simple work. Reserve max for the edge cases.
- Cache your long context. 1M tokens is only economical if you're not reprocessing it every call.
- Set task budgets on autonomous agents. 20K token minimum. Saves you from runaway loops.
- Use Sonnet 4.6 by default; route to Opus 4.7 only when the task earns it.
The teams that will get the most out of Opus 4.7 are the ones that adapt their prompting to the model's new behavior — not the ones that run their old prompts and complain that 4.7 "isn't as smart as it should be." It is. You just have to ask it differently.
How We're Using Opus 4.7 at roast.page
Since this is a landing-page blog, one concrete application: Opus 4.7's vision upgrade is a significant quality jump for page analysis.
At roast.page, we screenshot every submitted landing page — both viewport and full-page — then pass those images plus scraped HTML plus PageSpeed data into Claude for analysis across 8 dimensions (hero, copy, CTAs, trust signals, visual design, structure, technical/SEO, differentiation). The 2576px vision ceiling means we can now analyze dense pricing tables, packed feature grids, and long-scroll pages without losing detail in the downsampling.
The other upgrade we've felt is the self-verification. Previously, we'd occasionally get a dimension score that didn't match the reasoning in the "bad" or "fix" fields: the model had written one thing and scored another. That's essentially gone in 4.7. The scores and the explanations are consistent with each other, every time. It's a small thing, but it's the kind of thing users notice.
If you're curious what Opus 4.7's vision + reasoning can actually see on a real landing page, run yours through roast.page. You'll get an 8-dimension breakdown with a priority-ordered list of fixes in about 30 seconds. No signup.
For more on how AI changes what landing pages need to do, see our guides on ranking in AI search, the AI optimization workflow, and whether AI search engines would cite your page. And if you want the data on what's actually working in 2026, our State of Landing Pages 2026 report has the numbers.