Anthropic: Claude Opus 4.6 Adds 1M-Token Context Window and New Agentic Coding, API, and Office Tool Updates

By Amit Chowdhry ● Today at 6:57 PM

Anthropic has released Claude Opus 4.6, positioning the model as an upgrade to its Opus line with stronger coding performance, more deliberate planning, and improved reliability on longer, more agentic workflows. The company says Opus 4.6 is better at operating in large codebases, reviewing code, and debugging, including catching its own mistakes. It says those improvements also carry over to everyday knowledge work such as financial analysis, research, and building documents, spreadsheets, and presentations.

Claude Opus 4.6 includes a 1M-token context window in beta. Anthropic says the longer context meaningfully improves retrieval and reasoning across large bodies of text, with the aim of reducing the “context rot” users experience as conversations grow. The company points to needle-in-a-haystack testing and long-context benchmarks to argue that Opus 4.6 can track and use information over hundreds of thousands of tokens with less drift than prior models.

Anthropic also reports state-of-the-art results across several evaluations, citing strengths in agentic coding and complex reasoning tests, as well as strong performance on benchmarks designed to reflect economically valuable professional work. It additionally emphasizes online information-finding performance on search-oriented evaluations.

On safety, Anthropic says Opus 4.6 maintains an overall safety profile that is as good as or better than other frontier models, with low rates of misaligned behavior in internal safety evaluations. The company says it expanded its testing set for this release, including new evaluations tied to user well-being and cybersecurity, and added new probes intended to detect harmful cyber-related responses. Alongside safeguards, Anthropic is also pushing the model toward cyberdefensive use cases, such as identifying and patching vulnerabilities in open-source software.

The release comes with product and API updates designed to support longer-running agent workflows. In Claude Code, Anthropic is introducing agent teams in a research preview, enabling multiple agents to work in parallel and coordinate on larger tasks. On the API, Anthropic is adding context compaction in beta, allowing the model to summarize older context to extend task duration; adaptive thinking, where the model adjusts how much extended thinking it uses based on the task; and effort controls that let developers tune intelligence, speed, and cost. Anthropic also states that Opus 4.6 supports outputs up to 128k tokens and offers a US-only inference option at a pricing multiplier for workloads that must remain in the United States.

Anthropic says Claude Opus 4.6 is available on claude.ai, via the Claude API, and through major cloud platforms. Pricing is unchanged at $5 per million input tokens and $25 per million output tokens, with premium long-context pricing applied to prompts above 200k tokens.
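
For readers budgeting against those rates, the arithmetic can be sketched in a few lines of Python. This is an illustrative estimate only: the function name `estimate_base_cost` and the constants are hypothetical and not part of any Anthropic SDK. It covers only the base rates quoted above, because the article does not state the premium long-context rate or the US-only inference multiplier.

```python
# Base rates quoted in the article (USD per million tokens).
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00

# Prompts above this size fall under premium long-context pricing,
# whose rate the article does not give, so it is not modeled here.
LONG_CONTEXT_THRESHOLD = 200_000


def estimate_base_cost(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimate the USD cost of one request at base rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        raise ValueError(
            "prompt exceeds 200k tokens; premium long-context pricing "
            "applies and is not modeled in this sketch"
        )
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # A 100k-token prompt with a 10k-token response.
    print(f"${estimate_base_cost(100_000, 10_000):.2f}")
```

Under these assumptions, a 100k-token prompt with a 10k-token response comes to $0.50 for input plus $0.25 for output.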

KEY QUOTES

“Claude Opus 4.6 is the strongest model Anthropic has shipped. It takes complicated requests and actually follows through, breaking them into concrete steps, executing, and producing polished work even when the task is ambitious. For Notion users, it feels less like a tool and more like a capable collaborator.”

Sarah Sachs, AI Lead at Notion

“Early testing shows Claude Opus 4.6 delivering on the complex, multi-step coding work developers face every day—especially agentic workflows that demand planning and tool calling. This starts unlocking long-horizon tasks at the frontier.”

Mario Rodriguez, Chief Product Officer at GitHub

“Claude Opus 4.6 is a huge leap for agentic planning. It breaks complex tasks into independent subtasks, runs tools and subagents in parallel, and identifies blockers with real precision.”

Michele Catasta, President of Replit

“Claude Opus 4.6 is the best model we’ve tested yet. Its reasoning and planning capabilities have been exceptional at powering our AI Teammates. It’s also a fantastic coding model – its ability to navigate a large codebase and identify the right changes to make is state of the art.”

Amritansh Raghav, Interim CTO at Asana

“Claude Opus 4.6 reasons through complex problems at a level we haven’t seen before. It considers edge cases that other models miss and consistently lands on more elegant, well-considered solutions. We’re particularly impressed with Opus 4.6 in Devin Review, where it’s increased our bug catching rates.”

Scott Wu, CEO of Cognition

“Claude Opus 4.6 feels noticeably better than Opus 4.5 in Windsurf, especially on tasks that require careful exploration like debugging and understanding unfamiliar codebases. We’ve noticed Opus 4.6 thinks longer, which pays off when deeper reasoning is needed.”

Jeff Wang, CEO of Windsurf

“Claude Opus 4.6 represents a meaningful leap in long-context performance. In our testing, we saw it handle much larger bodies of information with a level of consistency that strengthens how we design and deploy complex research workflows. Progress in this area gives us more powerful building blocks to deliver truly expert-grade systems professionals can trust.”

Joel Hron, CTO at Thomson Reuters

“Across 40 cybersecurity investigations, Claude Opus 4.6 produced the best results 38 of 40 times in a blind ranking against Claude 4.5 models. Each model ran end to end on the same agentic harness with up to 9 subagents and 100+ tool calls.”

Stian Kirkeberg, Head of AI & ML, NBIM

“Claude Opus 4.6 is the new frontier on long-running tasks from our internal benchmarks and testing. It’s also been highly effective at reviewing code.”

Michael Truell, Co-founder & CEO, Cursor

“Claude Opus 4.6 achieved the highest BigLaw Bench score of any Claude model at 90.2%. With 40% perfect scores and 84% above 0.8, it’s remarkably capable for legal reasoning.”

Niko Grupen, Head of AI Research, Harvey

“Claude Opus 4.6 autonomously closed 13 issues and assigned 12 issues to the right team members in a single day, managing a ~50-person organization across 6 repositories. It handled both product and organizational decisions while synthesizing context across multiple domains, and it knew when to escalate to a human.”

Yusuke Kaji, General Manager, AI, Rakuten

“Claude Opus 4.6 is an uplift in design quality. It works beautifully with our design systems and it’s more autonomous, which is core to Lovable’s values. People should be creating things that matter, not micromanaging AI.”

Fabian Hedin, Co-founder, Lovable

“Claude Opus 4.6 excels in high-reasoning tasks like multi-source analysis across legal, financial, and technical content. Box’s eval showed a 10% lift in performance, reaching 68% vs. a 58% baseline, and near-perfect scores in technical domains.”

Yashodha Bhavnani, Head of AI, Box

“Claude Opus 4.6 generates complex, interactive apps and prototypes in Figma Make with an impressive creative range. The model translates detailed designs and multi-layered tasks into code on the first try, making it a powerful starting point for teams to explore and build ideas.”

Loredana Crisan, Chief Design Officer, Figma

“Claude Opus 4.6 is the best Anthropic model we’ve tested. It understands intent with minimal prompting and went above and beyond, exploring and creating details I didn’t even know I wanted until I saw them. It felt like I was working with the model, not waiting on it.”

Paulo Arruda, Staff Engineer, Shopify

“Both hands-on testing and evals show Claude Opus 4.6 is a meaningful improvement for design systems and large codebases, use cases that drive enormous enterprise value. It also one-shotted a fully functional physics engine, handling a large multi-scope task in a single pass.”

Eric Simons, CEO, Bolt.new

“Claude Opus 4.6 is the biggest leap I’ve seen in months. I’m more comfortable giving it a sequence of tasks across the stack and letting it run. It’s smart enough to use subagents for the individual pieces.”

Jerry Tsui, Staff Software Engineer, Ramp

“Claude Opus 4.6 handled a multi-million-line codebase migration like a senior engineer. It planned up front, adapted its strategy as it learned, and finished in half the time.”

Gregor Stewart, Chief AI Officer, SentinelOne

“We only ship models in v0 when developers will genuinely feel the difference. Claude Opus 4.6 passed that bar with ease. Its frontier-level reasoning, especially with edge cases, helps v0 to deliver on our number-one aim: to let anyone elevate their ideas from prototype to production.”

Zeb Hermann, General Manager, v0, Vercel

“The performance jump with Claude Opus 4.6 feels almost unbelievable. Real-world tasks that were challenging for Opus [4.5] suddenly became easy. This feels like a watershed moment for spreadsheet agents on Shortcut.”

Nico Christie, Co-founder & CTO, Shortcut.ai