Opus 4 retired. Opus 4.8 costs 67% less — mind the tokenizer

Sonnet 4 and Opus 4 retired June 15 — exact model IDs, breaking changes, and the Opus 4.8 tokenizer caveat explained.

Jun 16, 2026

If your code still pins claude-opus-4-20250514 or claude-sonnet-4-20250514, those calls started failing on June 15, 2026, with no grace period and no automatic fallback. Here is exactly what broke and what to swap in.

What Stopped Working on June 15 — and Exactly Which IDs

Anthropic retired Claude Sonnet 4 and Claude Opus 4 on its operated platforms on June 15, 2026, with no grace period: the Model deprecations page defines a retired model as one that is "no longer available for use" and states that "requests to retired models will fail". The two affected IDs are claude-sonnet-4-20250514 and claude-opus-4-20250514. Both were deprecated on April 14, 2026 with the June 15 retirement date, which satisfies Anthropic's 60-day notice policy (the actual gap was 62 days).

"Requests to retired models will fail." — Anthropic, Model deprecations documentation (source: platform.claude.com)

It is not only the dated snapshots that break. The pre-4.6 convenience aliases (claude-opus-4, claude-opus-4-0, claude-sonnet-4, and claude-sonnet-4-0) all resolved to the same 20250514 snapshots, so they error out too. If you assumed a bare alias would float forward to a live model, it did not.

The recommended replacements are claude-sonnet-4-6 for Sonnet 4 and claude-opus-4-8 for Opus 4. One semantic change to internalize: starting with the 4.6 generation, Anthropic uses a dateless format like claude-sonnet-4-6, but these are not evergreen pointers. Each maps to one fixed snapshot, so the ID will not shift underneath you, and future upgrades will require another explicit model-ID change.
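One way to keep the swap in a single place is a lookup table from every retired ID or alias to its replacement. This is a minimal sketch: the IDs come from this article, while the names REPLACEMENTS and resolve_model are hypothetical.

```python
# Retired dated IDs and pre-4.6 aliases named above, mapped to the
# recommended replacements. The table and helper names are illustrative.
REPLACEMENTS = {
    "claude-opus-4-20250514": "claude-opus-4-8",
    "claude-opus-4": "claude-opus-4-8",
    "claude-opus-4-0": "claude-opus-4-8",
    "claude-sonnet-4-20250514": "claude-sonnet-4-6",
    "claude-sonnet-4": "claude-sonnet-4-6",
    "claude-sonnet-4-0": "claude-sonnet-4-6",
}


def resolve_model(model_id: str) -> str:
    """Return the replacement for a retired ID, or the ID unchanged."""
    return REPLACEMENTS.get(model_id, model_id)


print(resolve_model("claude-opus-4"))      # claude-opus-4-8
print(resolve_model("claude-sonnet-4-6"))  # claude-sonnet-4-6
```

Because the dateless IDs are pinned snapshots, the next upgrade would mean adding rows to this table rather than hunting for strings across the codebase.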

To find lingering usage, run a two-step audit. First, grep your codebase and configs for 20250514 and each of the four convenience aliases above. Second, export a usage CSV from the Anthropic Console Usage page, broken down by API key and model, to catch calls that aren't hardcoded: pipelines, notebooks, or third-party integrations that reference the legacy IDs at runtime.
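The grep step can be sketched in Python so that the bare aliases match without also flagging live IDs such as claude-opus-4-8. The pattern, the function names, and the list of file suffixes are assumptions for illustration, not an official tool.

```python
import re
from pathlib import Path

# Match the snapshot date anywhere, or a bare pre-4.6 alias that is not
# followed by another "-" segment (so claude-opus-4-8 is left alone).
RETIRED = re.compile(
    r"20250514|(?<![\w-])claude-(?:opus|sonnet)-4(?:-0)?(?![\w-])"
)


def find_retired_ids(text: str) -> list:
    """Return (line_number, matched_string) pairs for one file's text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in RETIRED.finditer(line):
            hits.append((line_no, match.group(0)))
    return hits


def scan_tree(root: str, suffixes=(".py", ".ts", ".json", ".yaml", ".yml", ".toml")):
    """Yield (path, line_number, match) for matching files under root."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            for line_no, hit in find_retired_ids(path.read_text(errors="ignore")):
                yield path, line_no, hit
```

Calling scan_tree(".") from a repository root would list each hit with its file and line; extend the suffix tuple to cover whatever config formats your project uses.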

The sections below break down platform-by-platform timing, the breaking API changes for each replacement, and why the Opus 4.8 tokenizer can change your token math even after the swap.

Anthropic API vs. Bedrock vs. Vertex AI: Who Got Cut Off When

The June 15 hard retirement applies only to Anthropic-operated platforms (the Claude API, Claude Platform on AWS, and Microsoft Foundry), where calls to claude-sonnet-4-20250514 and claude-opus-4-20250514 now fail with no grace period. Partner-operated platforms run their own lifecycle schedules, so the same model ID can be retired in one place and still callable in another. If you deploy through Amazon Bedrock or Google Vertex AI, the June 15 date is not automatically yours; you have to read each partner's lifecycle table.

As of mid-June 2026, the partner tables tell a split story. Sonnet 4 remains available (in a deprecated state) on both Bedrock and Vertex AI. Opus 4 diverges between the two: it is retired on Bedrock but still available (deprecated) on Vertex AI. "Deprecated" means the model still serves requests but is on a countdown; "retired" means requests fail outright.

Platform	Sonnet 4 (`claude-sonnet-4-20250514`)	Opus 4 (`claude-opus-4-20250514`)
Claude API (Anthropic)	Retired June 15, 2026 — calls fail	Retired June 15, 2026 — calls fail
Claude Platform on AWS	Retired June 15, 2026 — calls fail	Retired June 15, 2026 — calls fail
Microsoft Foundry	Retired June 15, 2026 — calls fail	Retired June 15, 2026 — calls fail
Amazon Bedrock	Available (deprecated)	Retired
Google Vertex AI	Available (deprecated)	Available (deprecated)

The practical takeaway: treat the platform tables as the source of truth, not a blog summary, because these states shift on partner timelines and the divergence above is a snapshot of mid-June 2026. A Bedrock Opus 4 caller is already broken; a Vertex Opus 4 caller has runway but should migrate before that table flips. Anthropic also does not publish a precise UTC cutover time for the June 15 date, only the calendar day, so build a buffer rather than scheduling a cutover for the final hours.

Consumer surfaces are out of scope. Claude.ai and Claude Code auto-select models and are not governed by these pinned-ID retirements, so there is no user-facing breakage: the failures are confined to API code that hardcodes the legacy dated IDs or their pre-4.6 aliases. If your only Claude usage is through the apps, nothing changed; if you ship API calls, the platform you route through decides whether June 15 already hit you.

Migrating to Sonnet 4.6: Six Breaking API Changes

Swapping claude-sonnet-4-20250514 for claude-sonnet-4-6 is a one-line change, but six API-level differences can break working code or quietly degrade it. Sonnet 4.6 holds the same price point ($3 per million input tokens and $15 per million output, with a 1M-token context window and 64k max output), so this is a behavior migration, not a cost one. Anthropic documents each change in its migration guide; here is what actually requires code edits.

1. Assistant-message prefill is gone (HTTP 400). If you seed an assistant turn to force a response shape, a common trick for constraining JSON output, Sonnet 4.6 rejects it with a 400 error. Replace prefill with structured outputs via output_config.format, or move the constraint into a system-prompt instruction. This is the change most likely to take down an existing pipeline without advance warning, because prefill patterns rarely carry their own tests and the failure only appears at runtime.

2. effort now defaults to high. Sonnet 4 and 4.5 had no effort parameter; Sonnet 4.6 introduces one and defaults it to high, so leaving it unset raises latency relative to your old baseline. For a response profile comparable to Sonnet 4.5 without extended thinking, set effort: low with thinking disabled. Agentic coding workloads should start around medium. Set it explicitly rather than inheriting the default.

"Test the replacement models using your standard evaluation suite before deploying to production," — Anthropic, Model deprecations documentation (source: platform.claude.com).

3. Handle the new refusal stop reason. Response parsing must account for stop_reason: "refusal", which code written against Sonnet 4 may not handle. Code that branches on stop reasons, or assumes every completion is usable text, needs a path for refusals.

4. Extended thinking still works, but is deprecated. Extended thinking with budget_tokens continues to function on Sonnet 4.6, yet it is deprecated in favor of adaptive thinking through the effort parameter. You do not have to migrate it on day one, but new code should standardize on effort.

5. Drop the GA'd beta header and rename output_format. The fine-grained-tool-streaming-2025-05-14 beta header is now generally available, so remove it from your requests. Separately, the deprecated output_format parameter migrates to output_config.format. Both are cleanups, not behavioral risks, but the renamed parameter is easy to miss.

6. Audit custom tool-call JSON parsing. JSON string escaping in tool-call parameters may differ on Sonnet 4.6. Standard JSON parsers handle the new escaping cleanly, so most callers are fine; the exposure is code that parses tool arguments as raw strings rather than deserializing them properly. If you wrote a custom string-based parser, audit it before shipping.
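Changes 1, 2, and 5 above are mechanical enough to sketch as a rewrite of a request dictionary. This assumes a plain-dict payload with a string system prompt, string prefill content, and a hypothetical betas list holding beta flags; the helper name migrate_sonnet_request is illustrative, and the field layout should be checked against the migration guide before use.

```python
import copy

GA_BETA = "fine-grained-tool-streaming-2025-05-14"


def migrate_sonnet_request(request: dict, effort: str = "medium") -> dict:
    """Return a copy of a Sonnet 4 request dict adjusted for claude-sonnet-4-6."""
    new = copy.deepcopy(request)
    new["model"] = "claude-sonnet-4-6"

    # Change 1: a trailing assistant message is a prefill. Drop it and move
    # the constraint into the system prompt (string content assumed).
    messages = new.get("messages", [])
    if messages and messages[-1].get("role") == "assistant":
        prefill = messages.pop()["content"]
        hint = f"Begin your response with: {prefill}"
        new["system"] = f"{new['system']}\n{hint}" if new.get("system") else hint

    # Change 5: rename output_format and drop the now-GA beta flag.
    output_config = new.setdefault("output_config", {})
    if "output_format" in new:
        output_config["format"] = new.pop("output_format")
    if "betas" in new:
        new["betas"] = [b for b in new["betas"] if b != GA_BETA]

    # Change 2: set effort explicitly instead of inheriting the high default.
    output_config.setdefault("effort", effort)
    return new
```

Turning a prefill into a system-prompt hint is only one possible translation; where the prefill existed to force JSON, structured outputs through output_config.format are likely the better target.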

None of these is large in isolation, but together they argue for routing the swap through your own eval suite rather than trusting that a newer Sonnet preserves the old output shape. The prefill and effort changes are the two that most often surprise teams in production.

Migrating to Opus 4.8: Two Generations of Breaking Changes

Opus 4 → Opus 4.8 is a harder cutover than Sonnet, because Opus 4 predates the 4.7 generation, so you inherit the cumulative Opus 4.7 breaking changes on top of the model-ID swap. Swap the ID to claude-opus-4-8, then work through three server-side rejections that the SDK will not catch for you. None of them throws at type-check time; they all surface as a 400 at runtime.

Quick Answer: Migrating from Opus 4 to claude-opus-4-8 means absorbing Opus 4.7's breaking changes too: non-default temperature/top_p/top_k, manual extended thinking, and assistant prefill all return a 400. Steer via prompting and adaptive thinking instead, and set effort explicitly since it now defaults to high.

The first trap is sampling parameters. Non-default temperature, top_p, and top_k are rejected with a 400 on Opus 4.7 and later, even though the SDK still type-checks these fields as valid. The API enforces the rule server-side, so code that compiles cleanly still fails in production. Remove these fields and steer behavior through prompting.

The second is thinking and prefill. Manual extended thinking (thinking: {type: "enabled", budget_tokens: N}) and assistant prefill both return a 400 on Opus 4.7+. Migrate to adaptive thinking via output_config.effort. And because effort defaults to high on Opus 4.8, set it explicitly: an unset value can raise latency and cost on workloads that do not need maximum reasoning.
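Since these rejections only appear at runtime, a pre-flight check on the request payload can flag them in tests first. The sketch below assumes a plain-dict request; opus_48_violations is a hypothetical helper, and it flags any explicit sampling field, which is stricter than the non-default rule described above.

```python
def opus_48_violations(request: dict) -> list:
    """List the ways a request dict would trip the Opus 4.7+ rules above."""
    problems = []

    # Sampling parameters: flag any explicit value, which is stricter than
    # the server's "non-default" rule but simpler to reason about.
    for field in ("temperature", "top_p", "top_k"):
        if field in request:
            problems.append(f"{field} is set; remove it and steer via prompting")

    # Manual extended thinking with a token budget.
    thinking = request.get("thinking") or {}
    if thinking.get("type") == "enabled":
        problems.append("manual thinking is set; switch to adaptive thinking")

    # A trailing assistant message is a prefill.
    messages = request.get("messages") or []
    if messages and messages[-1].get("role") == "assistant":
        problems.append("assistant prefill is present; move it into the prompt")

    # Not a 400, but worth flagging: effort silently defaults to high.
    if "effort" not in (request.get("output_config") or {}):
        problems.append("effort is unset; it defaults to high")

    return problems
```

Running this over recorded request fixtures in CI gives a list of payloads to fix before any traffic reaches claude-opus-4-8.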

Then update response parsing. Handle stop_reason: "refusal" and read stop_details.category, and recheck any code that treats tool-call JSON as raw strings rather than deserializing it.
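A minimal sketch of both parsing updates, assuming the response has already been converted to a plain dict and that tool arguments arrive as streamed JSON fragments. The function names and the returned dictionary shape are hypothetical.

```python
import json


def handle_response(response: dict) -> dict:
    """Branch on the refusal stop reason before treating content as usable text."""
    if response.get("stop_reason") == "refusal":
        details = response.get("stop_details") or {}
        return {"status": "refused", "category": details.get("category")}
    text = "".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"
    )
    return {"status": "ok", "text": text}


def parse_tool_input(partial_json_chunks: list) -> dict:
    """Join streamed argument fragments and deserialize with a real JSON parser."""
    return json.loads("".join(partial_json_chunks))


# An escaped slash decodes the same way as a literal one, which is the
# kind of difference a string-matching parser would get wrong.
print(parse_tool_input(['{"path": "a\\u002fb', '.txt"}']))  # {'path': 'a/b.txt'}
```

The design point is that json.loads normalizes whatever escaping the model chooses, so downstream code compares decoded values instead of raw substrings.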

Opus 4.8 also ships nonbreaking features that ease the transition: mid-conversation system messages, publicly documented refusal stop_details, a lower 1,024-token prompt-cache minimum, an opt-in fallbacks beta parameter that auto-retries refused requests on another model, and a Fast Mode research preview offering up to 2.5x higher output tokens per second at premium pricing. The fallbacks param in particular pairs well with the new refusal handling.

One deployment-specific gotcha: the context window is not uniform. Opus 4.8 supports a 1M-token context window and 128k max output by default on the Claude API, Amazon Bedrock, and Vertex AI, but only 200k context on Microsoft Foundry. If you run multi-cloud, do not assume your Foundry path can hold the same prompt your Claude API path does.
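For multi-cloud routing, the per-platform limits can be encoded as a guard before a request is dispatched. The platform keys and the fits_context helper are hypothetical, and the check conservatively assumes the prompt plus the requested max_tokens must fit inside the window.

```python
# Context windows for Opus 4.8 as described above; the keys are
# illustrative labels, not official platform identifiers.
CONTEXT_LIMITS = {
    "claude_api": 1_000_000,
    "amazon_bedrock": 1_000_000,
    "vertex_ai": 1_000_000,
    "microsoft_foundry": 200_000,
}


def fits_context(platform: str, prompt_tokens: int, max_tokens: int) -> bool:
    """Conservative check: prompt plus requested output must fit the window."""
    return prompt_tokens + max_tokens <= CONTEXT_LIMITS[platform]


print(fits_context("claude_api", 400_000, 64_000))         # True
print(fits_context("microsoft_foundry", 400_000, 64_000))  # False
```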

Change	Status	Action
Non-default `temperature`/`top_p`/`top_k`	Breaking (400)	Remove; steer via prompting
Manual `thinking` + assistant prefill	Breaking (400)	Use adaptive thinking via `output_config.effort`
`effort` default	Now `high`	Set explicitly to control cost/latency
Context window	1M / 128k output (API, Bedrock, Vertex)	200k on Microsoft Foundry — plan accordingly

The Tokenizer Trap: Why Opus 4.8 Can Inflate Your Bills

The most easily missed change in the Opus 4 → 4.8 migration is the tokenizer, not any parameter. Starting with Opus 4.7, Anthropic switched to a tokenizer that can produce roughly 1x to 1.35x as many tokens for the same input text, which is up to about 35% more tokens on identical prompts. Nothing in your code throws an error, so the effect surfaces only in usage metrics and invoices unless you re-baseline ahead of time.

Because token counts move, several downstream assumptions move with them. Re-baseline each of the following against Opus 4.8 before you trust production numbers:

max_tokens budgets — a prompt that fit comfortably under a ceiling on Opus 4 may now run closer to it, or clip earlier than expected.
Client-side token estimation — any local counter or heuristic calibrated to the old tokenizer will under-report, skewing pre-flight checks and routing logic.
Cost projections — per-request spend is a function of tokens, not characters, so your forecasts need fresh inputs.
Latency targets — more tokens per request means more to generate and stream; SLAs tuned to the old counts can slip.
Compaction trigger thresholds — any logic that summarizes or truncates context at a token boundary will fire at a different point in the conversation.
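Until you have measured counts, a worst-case adjustment can keep budgets and triggers safe. This sketch applies the 1.35x upper bound with integer arithmetic to avoid floating-point rounding surprises; both helper names are hypothetical.

```python
WORST_CASE_PERCENT = 135  # upper end of the 1x to 1.35x range quoted above


def worst_case_tokens(old_tokens: int, percent: int = WORST_CASE_PERCENT) -> int:
    """Round up an old-tokenizer count to its worst-case new-tokenizer count."""
    return (old_tokens * percent + 99) // 100


def safe_estimate_limit(real_limit: int, percent: int = WORST_CASE_PERCENT) -> int:
    """Largest old-tokenizer estimate that still fits real_limit in the worst case."""
    return real_limit * 100 // percent


print(worst_case_tokens(10_000))      # 13500
print(safe_estimate_limit(200_000))   # 148148
```

A compaction trigger driven by an old-tokenizer estimator would fire at safe_estimate_limit(limit) rather than at limit until the estimator itself is recalibrated.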

The net cost impact is not automatic, and this is where the migration math gets counterintuitive. Opus 4.8 cuts the per-token price sharply, from $15 / $75 per million input/output tokens on the original Opus 4 to $5 / $25 on Opus 4.8. But a prompt that tokenizes 35% heavier partially eats into that cut. The headline price drop and the token inflation pull in opposite directions, so your actual savings depend on your prompt content and length distribution rather than the sticker price alone. Run the arithmetic on your own traffic before assuming the full reduction lands on your bill.
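The arithmetic can be sketched with the list prices quoted above. The simplifying assumption, which your own traffic may not satisfy, is that one multiplier applies equally to input and output tokens; the function names are hypothetical.

```python
OPUS_4 = {"input": 15.00, "output": 75.00}    # USD per million tokens
OPUS_4_8 = {"input": 5.00, "output": 25.00}   # USD per million tokens


def request_cost(prices: dict, input_tokens: float, output_tokens: float) -> float:
    """Cost in USD for one request at the given per-million-token prices."""
    return (input_tokens * prices["input"] + output_tokens * prices["output"]) / 1_000_000


def effective_saving(old_input: int, old_output: int, multiplier: float) -> float:
    """Fractional saving after scaling old token counts by the tokenizer multiplier."""
    old_cost = request_cost(OPUS_4, old_input, old_output)
    new_cost = request_cost(OPUS_4_8, old_input * multiplier, old_output * multiplier)
    return 1 - new_cost / old_cost


for m in (1.0, 1.15, 1.35):
    print(m, round(effective_saving(10_000, 2_000, m), 4))
# 1.0 0.6667
# 1.15 0.6167
# 1.35 0.55
```

Because both rates fall to one third, the saving under this assumption reduces to 1 - multiplier / 3, so the worst case in the quoted range is about 55% rather than 67%.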

Crucially, the inflation is not uniform across input types. Character-dense payloads — source code, JSON, XML, minified config — can tokenize very differently from natural-language prose, and the 1x–1.35x range is wide enough that where you fall inside it matters. An agentic coding workload pushing large diffs and structured tool output may see a different multiplier than a chat-style summarization task. Measure with your actual payloads, not synthetic benchmarks: pull representative requests from your logs, count them under Opus 4.8, and compare against your historical Opus 4 counts.
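One way to organize that comparison is to group paired counts by payload type and compute a multiplier per group. The sample tuples below are made-up placeholders, not measurements; in practice the old counts would come from your historical logs and the new counts from Opus 4.8 usage figures for the same requests.

```python
from collections import defaultdict


def multipliers_by_kind(samples) -> dict:
    """samples: iterable of (kind, old_tokens, new_tokens) for identical payloads."""
    totals = defaultdict(lambda: [0, 0])
    for kind, old_tokens, new_tokens in samples:
        totals[kind][0] += old_tokens
        totals[kind][1] += new_tokens
    # Ratio of summed counts, so long requests weigh more than short ones.
    return {kind: new / old for kind, (old, new) in totals.items() if old}


# Placeholder numbers for illustration only.
samples = [
    ("code", 1_000, 1_300),
    ("code", 1_000, 1_200),
    ("prose", 2_000, 2_100),
]
print(multipliers_by_kind(samples))  # {'code': 1.25, 'prose': 1.05}
```

Summing before dividing is a deliberate choice: it matches how billing aggregates tokens, whereas averaging per-request ratios would overweight small requests.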

Anthropic's documented guidance is to test replacement models against your own evals before production precisely because tokenization and API constraints changed together. Treat token re-baselining as part of that eval pass, not a follow-up cleanup.

Pricing Snapshot: What the 67% Cut Actually Means

Migrating Opus workloads is a price cut, not just a parameter cleanup. The original Opus 4 tier cost $15 per million input tokens and $75 per million output tokens; Opus 4.8 (claude-opus-4-8) charges $5 and $25 respectively, a 67% reduction on both input and output, with lower cache-write and cache-hit prices on top. The catch from the previous section still applies: that headline cut is computed per token, and the new Opus tokenizer can emit up to ~35% more tokens for the same text, so your effective saving can land below 67%. At the worst-case 1.35x multiplier, the same text costs about 0.45x what it did on Opus 4, a saving of roughly 55%; only measurement on your own traffic tells you where you fall.

Quick Answer: Opus 4.8 cuts list pricing from $15/$75 to $5/$25 per million input/output tokens, which is 67% off both rates, plus cheaper cache reads and writes. Sonnet 4.6 holds the Sonnet tier at $3/$15, matching Sonnet 4 exactly.

Sonnet's economics are unchanged. Sonnet 4.6 (claude-sonnet-4-6) holds the Sonnet price point at $3 per million input tokens and $15 per million output tokens, the same rate Sonnet 4 carried, so for Sonnet callers this is a capability and API migration with no list-price delta. Both replacement models share a 1M-token context window on the Claude API, but their output ceilings differ: Opus 4.8 supports 128k max output tokens versus 64k on Sonnet 4.6, which matters if you generate long structured responses in a single turn.

Spec	Opus 4 (retired)	Opus 4.8	Sonnet 4 (retired)	Sonnet 4.6
Input / M tokens	$15	$5	$3	$3
Output / M tokens	$75	$25	$15	$15
Max output	—	128k	—	64k
Context window (Claude API)	—	1M	—	1M

One semantic change outlasts this migration. Starting at the 4.6 generation, Anthropic uses dateless IDs like claude-sonnet-4-6 and claude-opus-4-8, but these are pinned snapshots, each mapping to one fixed model, not rolling pointers that track the latest release. The ID will not shift underneath you, which is good for reproducibility, but it also means the next Opus or Sonnet upgrade requires another explicit model-ID change and another migration pass. Budget for that recurring swap rather than assuming a dateless string future-proofs your code.

Validating the Migration: Evals, the Claude Code Helper, and the Next Deadline

Validate every migration against your own evals before promoting to production: published benchmarks are baselines, not guarantees that output style or token counts survived the jump. Anthropic ships a migration helper inside Claude Code: run /claude-api migrate this project to claude-opus-4-8 (or claude-sonnet-4-6) and the skill swaps model IDs, applies the breaking parameter changes covered earlier, and outputs a manual-verification checklist. It detects Bedrock, Vertex AI, Microsoft Foundry, and Claude Platform on AWS client formats, so mixed-deployment codebases get the right edits per target.

Treat the original Claude 4 benchmarks as regression reference points, not promises. Anthropic reported SWE-bench scores of 72.7% for Sonnet 4 and 72.5% for Opus 4 at the May 22, 2025 launch. Those numbers tell you what the retired models could do; they say nothing about whether 4.6 or 4.8 reproduces your prompts' exact formatting, tool-call shapes, or token economics. The official guidance is explicit on this point:

"Always test replacement models against your own evals before production, since behavior, tokenization, and API constraints changed," — Anthropic, Model migration guide (source: platform.claude.com).

Build domain evals that assert on structure (JSON schemas, stop reasons, output length distributions) rather than eyeballing a few sample completions. The tokenizer change alone means your old max_tokens ceilings and cost projections need re-baselining against real traffic.
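A structural assertion of this kind might look like the following sketch, which checks the stop reason, output length, and required JSON keys for one completion. The response is assumed to be a plain dict; check_completion and its thresholds are hypothetical and would be tuned per task.

```python
import json


def check_completion(response: dict, required_keys: set, max_chars: int) -> list:
    """Return a list of structural failures for one completion (empty means pass)."""
    failures = []
    stop_reason = response.get("stop_reason")
    if stop_reason != "end_turn":
        failures.append(f"unexpected stop_reason: {stop_reason}")

    text = "".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"
    )
    if len(text) > max_chars:
        failures.append(f"output too long: {len(text)} chars")

    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        failures.append("output is not valid JSON")
        return failures

    if not isinstance(data, dict):
        failures.append("output is not a JSON object")
    elif required_keys - set(data):
        failures.append(f"missing keys: {sorted(required_keys - set(data))}")
    return failures
```

Running the same check over identical prompts on the old and new models, where the old one is still reachable, turns "does the output shape survive" into a pass/fail count instead of a judgment call.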

If you only call Opus 4.1, you have a slightly longer runway. The legacy ID claude-opus-4-1-20250805 was deprecated June 5, 2026 with an August 5, 2026 retirement date, and it shares the same claude-opus-4-8 replacement. Don't let the extra weeks lull you: the migration work is identical to the Opus 4 path, so doing both in one pass is the cleaner play.
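The dates in this article can be kept in a small schedule so that a CI job or dashboard reports remaining runway. This is a sketch: the dates are the ones quoted here, and the table and helper names are hypothetical.

```python
from datetime import date

# (deprecated_on, retired_on) for the legacy IDs discussed in this article.
RETIREMENTS = {
    "claude-sonnet-4-20250514": (date(2026, 4, 14), date(2026, 6, 15)),
    "claude-opus-4-20250514": (date(2026, 4, 14), date(2026, 6, 15)),
    "claude-opus-4-1-20250805": (date(2026, 6, 5), date(2026, 8, 5)),
}


def days_until_retirement(model_id: str, today: date) -> int:
    """Days left before retirement; negative once the date has passed."""
    return (RETIREMENTS[model_id][1] - today).days


def notice_days(model_id: str) -> int:
    """Length of the deprecation notice window in days."""
    deprecated_on, retired_on = RETIREMENTS[model_id]
    return (retired_on - deprecated_on).days


print(days_until_retirement("claude-opus-4-1-20250805", date(2026, 6, 16)))  # 50
print(notice_days("claude-opus-4-20250514"))                                 # 62
```

Since no UTC cutover time is published, treating the last few days as already gone is a safer reading of the countdown than scheduling work for the final date.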

One reassurance for teams worried about losing access to known-good behavior: retirement means API unavailability, not deletion. Anthropic has committed to preserving the weights of publicly released models for at least the lifetime of the company. The snapshots still exist; you simply can't route production traffic to claude-opus-4-20250514 or claude-sonnet-4-20250514 anymore.

The concrete takeaway: run the Claude Code migration skill to mechanize the ID and parameter swaps, then gate the result behind your own evals before shipping. Fold the August 5 Opus 4.1 cutover into the same sweep, re-measure token usage and cost on the new tokenizer, and write the next migration into your roadmap — because the dateless IDs pin you to a snapshot, not to the latest model.

Last updated: 2026-06-16. Reviewed against Anthropic's model-deprecation and migration documentation as published on the retirement date.

Frequently asked questions

Do I need to update anything if I use Claude.ai or Claude Code directly?

No. Consumer surfaces like Claude.ai and Claude Code auto-select models and are not governed by the pinned-ID retirements that took effect June 15, 2026. The breaking change is API-level only: it affects code that hardcodes a specific dated model ID such as claude-opus-4-20250514 or claude-sonnet-4-20250514, or their pre-4.6 convenience aliases. If you never call those IDs from your own API integration, there is nothing to migrate.

Is claude-sonnet-4-6 an evergreen alias that will auto-update to future Sonnet versions?

No. Starting with the 4.6 generation, Anthropic switched to a dateless format like claude-sonnet-4-6 and claude-opus-4-8, but these are pinned snapshots, not evergreen pointers: each maps to one fixed model. The benefit is that the model will not silently change underneath you; the trade-off is that every future upgrade requires another explicit model-ID swap and another round of eval testing. Plan migrations as a recurring task, not a one-time fix.

I use Amazon Bedrock or Vertex AI. Did the June 15 retirement apply to me?

Not automatically. The June 15 dates apply to Anthropic-operated platforms (the Claude API, Claude Platform on AWS, and Microsoft Foundry), while partner-operated platforms set their own lifecycle schedules. As of mid-June 2026 the partner tables showed Sonnet 4 still available (deprecated) on both Amazon Bedrock and Vertex AI, with Opus 4 retired on Bedrock but still available on Vertex AI. Check the platform-specific model lifecycle tables before you act.

How do I find all places in my codebase still using the retired IDs?

Use two complementary steps. First, export a usage CSV from the Anthropic Console Usage page, broken down by API key and model, to see which retired IDs are still receiving live traffic. Second, grep your codebase for the string 20250514 and for the convenience aliases that resolved to those snapshots: claude-opus-4, claude-sonnet-4, claude-opus-4-0, and claude-sonnet-4-0. The CSV catches runtime usage your static search might miss (config files, environment variables, third-party callers).
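Filtering the exported CSV can be sketched with the standard library. The column names api_key, model, and requests are assumptions for illustration; check the header row of your actual export and pass the real names in.

```python
import csv
import io
from collections import Counter

RETIRED_IDS = {
    "claude-opus-4-20250514", "claude-opus-4", "claude-opus-4-0",
    "claude-sonnet-4-20250514", "claude-sonnet-4", "claude-sonnet-4-0",
}


def retired_usage(csv_text: str, key_col: str = "api_key",
                  model_col: str = "model", count_col: str = "requests") -> dict:
    """Sum request counts per (API key, model) for retired IDs only."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[model_col] in RETIRED_IDS:
            counts[(row[key_col], row[model_col])] += int(row[count_col])
    return dict(counts)
```

Each key in the result identifies an API key that still sends traffic to a retired ID, which points you at the owning service even when the string never appears in your own repository.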

Opus 4.8 is 67% cheaper than Opus 4 — does the price cut mean reduced capability?

No. Anthropic positions Opus 4.8 as its most capable Opus-tier model for complex reasoning, long-horizon agentic coding, and high-autonomy work, while pricing dropped from $15/$75 per million input/output tokens on Opus 4 to $5/$25 on Opus 4.8. The reduction reflects efficiency gains in the newer model, not a downgrade. That said, the new tokenizer can produce up to roughly 35% more tokens for the same text, so re-measure real cost, and run your own evals to confirm capability holds for your specific workload before shipping.