Introducing Anthropic’s Claude Opus 4.5: Cheaper AI, Infinite Chat, and Superhuman Coding Skills



Anthropic on Monday released its most capable artificial intelligence model to date, cutting the price by two-thirds while touting state-of-the-art performance on software engineering tasks. The release is a strategic gambit that intensifies the startup's competition with well-funded rivals OpenAI and Google.

The new model, Claude Opus 4.5, scored higher on Anthropic’s most difficult internal engineering assessment than any human candidate in the company’s history, according to documents reviewed by VentureBeat. The results highlight the rapidly advancing capabilities of AI systems and growing questions about how this technology will reshape white-collar professions.

The Amazon-backed company is pricing Claude Opus 4.5 at $5 per million input tokens and $25 per million output tokens, a steep reduction from the $15 and $75 its predecessor, Claude Opus 4.1, commanded when it launched earlier this year. The cut puts frontier AI capabilities within reach of a wider range of developers and businesses, and it pressures competitors to match both performance and price.
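To put the cut in concrete terms, here is a rough back-of-the-envelope comparison at the published per-million-token prices. The workload size (20 million input tokens, 5 million output tokens) is a hypothetical example, not a figure from the announcement.

```python
# Back-of-the-envelope cost comparison at the published per-million-token prices.
# The 20M input / 5M output workload is an illustrative assumption.
OLD = {"input": 15.00, "output": 75.00}   # Claude Opus 4.1, $ per million tokens
NEW = {"input": 5.00, "output": 25.00}    # Claude Opus 4.5, $ per million tokens

input_mtok, output_mtok = 20, 5           # millions of tokens in the example workload

old_cost = input_mtok * OLD["input"] + output_mtok * OLD["output"]
new_cost = input_mtok * NEW["input"] + output_mtok * NEW["output"]
print(f"Opus 4.1: ${old_cost:,.2f}  Opus 4.5: ${new_cost:,.2f}  "
      f"savings: {1 - new_cost / old_cost:.0%}")
# Opus 4.1: $675.00  Opus 4.5: $225.00  savings: 67%
```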

"We want to make sure this really works for those who want to use these models." Alex Albert, head of developer relations at Anthropic, said in an exclusive interview with VentureBeat: "That’s really our focus. How can you help Claude do better at things you don’t necessarily want to do at work?"

The announcement comes as Anthropic races to maintain its position in an increasingly crowded field. OpenAI recently released GPT-5.1 and a specialized coding model, Codex Max, that can operate autonomously for extended periods. Google announced Gemini 3 just last week, a release that has reportedly raised concerns inside OpenAI about the search giant’s progress, according to a recent report in The Information.

Opus 4.5 demonstrates improved judgment for real-world tasks, developers say

Anthropic’s internal testing points to what the company calls a qualitative leap in Claude Opus 4.5’s reasoning ability. According to the company’s data, the model achieved 80.9% accuracy on SWE-bench Verified, a benchmark that measures real-world software engineering tasks, outperforming OpenAI’s GPT-5.1-Codex-Max (77.9%), Anthropic’s own Sonnet 4.5 (77.2%), and Google’s Gemini 3 Pro (76.2%). That puts it meaningfully ahead of OpenAI’s state-of-the-art coding model, which was released just five days earlier.

But technical benchmarks are only part of the story. Albert said employee testers consistently report that the model shows improved judgment and intuition across a variety of tasks, a change he described as the model developing a sense of what matters in real-world situations.

"The model somehow understands this, but" Albert said. "This type of intuition and judgment about many things in the real world has developed, and it feels like a significant qualitative improvement from past models."

He cited his own workflow as an example. Previously, Albert said, he would ask AI models to gather information but hesitated to trust them to synthesize or prioritize it. With Opus 4.5, he delegates more complete tasks, connecting the model to Slack and internal documents to produce coherent summaries that reflect his priorities.

Opus 4.5 outperformed every human candidate on Anthropic’s most rigorous engineering exam

The model’s performance on Anthropic’s internal engineering evaluation marks a notable milestone. The take-home exam is given to prospective performance engineering candidates and is designed to assess technical ability and judgment under time pressure, with a prescribed two-hour limit.

The company says Opus 4.5 outscored every human test taker by using a technique called parallel test-time compute, which aggregates multiple attempts from the model and selects the best result. Without the time limit, and running inside Anthropic’s coding environment, Claude Code, the model matched the performance of the best human candidates in the company’s history.
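In spirit, parallel test-time compute is a best-of-N strategy: run many independent attempts and keep whichever one scores highest under some selection criterion. Below is a minimal sketch of that idea, not Anthropic's actual harness; `attempt` and `score` are hypothetical callables standing in for the model call and the grading step.

```python
from concurrent.futures import ThreadPoolExecutor

def best_of_n(problem, attempt, score, n=16):
    """Run n independent attempts in parallel and return the highest-scoring one.

    A minimal sketch of parallel test-time compute. `attempt` and `score` are
    hypothetical callables (e.g. a model call and a grader or test harness).
    """
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda _: attempt(problem), range(n)))
    return max(candidates, key=lambda c: score(problem, c))
```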

The company acknowledged that the test does not measure other important professional skills, such as collaboration, communication, or the intuition developed over years of experience. Still, Anthropic said the results "raise questions about how AI will change engineering as a profession."

Albert emphasized the significance of the finding. "I think this is probably an indication of how useful these models actually are in work contexts and in the work we do," he said. "Of course, this is an engineering job, and models are relatively advanced in engineering compared to other fields, but I think it’s a very important signal to pay attention to."

Dramatic efficiency improvements reduce token usage by up to 76% on key benchmarks

Beyond raw performance, Anthropic is betting that efficiency will differentiate Claude Opus 4.5 in the market. According to the company, the model uses dramatically fewer tokens (the units of text an AI system processes) to achieve similar or better results than previous models.

According to Anthropic, at a medium effort level Opus 4.5 matches the previous Sonnet 4.5 model’s best score on SWE-bench Verified while using 76% fewer output tokens. At the highest effort level, Opus 4.5 outperforms Sonnet 4.5 by 4.3 percentage points while still using 48% fewer tokens.

To give developers more control, Anthropic added an "effort" parameter that lets users adjust how much computational work the model applies to each task, balancing performance against latency and cost.
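As a rough illustration, a request to Anthropic's Messages API that dials down effort for a routine task might look like the sketch below. The endpoint and headers are the standard Messages API ones, but the model identifier and the "effort" field shown here are assumptions, not confirmed API details; check the API reference for the exact parameter.

```python
import os
import requests

# Minimal sketch of a Messages API call that lowers effort for a routine task.
# NOTE: "claude-opus-4-5" and the "effort" field are illustrative assumptions;
# consult Anthropic's API reference for the exact model ID and parameter name.
response = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json={
        "model": "claude-opus-4-5",   # assumed model identifier
        "max_tokens": 1024,
        "effort": "medium",           # assumed parameter: trade accuracy for latency/cost
        "messages": [
            {"role": "user", "content": "Summarize this changelog in three bullets."}
        ],
    },
    timeout=60,
)
print(response.json())
```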

Early business customers backed up the efficiency claims. "Opus 4.5 outperforms Sonnet 4.5 and its competitors while using fewer tokens to solve the same problem," Michele Catasta, president of the cloud-based coding platform Replit, said in a statement to VentureBeat. "At scale, it becomes even more efficient."

Mario Rodriguez, GitHub’s chief product officer, said early testing shows Opus 4.5 "outperforms on our internal coding benchmarks while cutting token usage in half, making it especially well suited to tasks like code migration and code refactoring."

Early customers report that AI agents are learning from experience and honing their skills

One of the most notable capabilities demonstrated by early customers involves what Anthropic calls "self-improving agents": AI systems that can improve their own performance through iterative learning.

Rakuten, the Japanese e-commerce and internet company, tested Claude Opus 4.5 on automating office tasks. "Our agent was able to autonomously improve its own capabilities and reached its best performance after four iterations, whereas other models could not match that quality even after more than 10 iterations," said Yusuke Kaji, general manager of AI for business at Rakuten.

Albert explained that the model is not updating its weights (the underlying parameters that define an AI system’s behavior), but rather iteratively improving the tools and approaches it uses to solve problems. "It’s iteratively refining its skills for a task and optimizing them to improve its performance so it can accomplish that task," he said.
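Conceptually, the loop Albert describes looks something like the sketch below: the model's weights stay fixed, and the agent instead revises a reusable "skill" (a prompt, script, or tool) based on feedback from an evaluation step. The `run_agent`, `evaluate`, and `revise` helpers are hypothetical placeholders, not Anthropic APIs.

```python
def improve_skill(task, initial_skill, run_agent, evaluate, revise, max_iters=10):
    """Iteratively refine a reusable 'skill' (prompt, script, or tool) for a task.

    The model's weights never change; only the skill artifact does. `run_agent`,
    `evaluate`, and `revise` are hypothetical callables supplied by the caller,
    e.g. wrappers around model calls and a test harness.
    """
    best_skill = initial_skill
    best_score = evaluate(task, run_agent(task, best_skill))
    for _ in range(max_iters):
        # Ask the model to critique the current skill and propose a revision.
        candidate = revise(task, best_skill, best_score)
        score = evaluate(task, run_agent(task, candidate))
        if score > best_score:            # keep the revision only if it helps
            best_skill, best_score = candidate, score
    return best_skill
```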

This capability extends beyond coding. Albert said Anthropic has seen significant improvements in creating professional documents, spreadsheets, and presentations. "People are saying it’s the biggest change they’ve seen between model generations," Albert said. "Going from Sonnet 4.5 to Opus 4.5 is a bigger leap than between the previous two back-to-back models."

Financial modeling firm Fundamental Research Labs reported that on its internal evaluations the model was "20% more accurate, 15% more efficient, and able to accomplish complex tasks that once seemed out of reach," according to co-founder Nico Christie.

New features target Excel users and Chrome workflows, and eliminate chat length limits

Alongside the model release, Anthropic rolled out a series of product updates aimed at enterprise users. Claude for Excel is now generally available to Max, Team, and Enterprise users, with new support for pivot tables, charts, and file uploads. The Chrome browser extension is now available to all Max users.

Perhaps most importantly, Anthropic introduced "infinite chat," a feature that removes context-window limits by automatically summarizing the earlier portion of a conversation as it grows long. "Within Claude, within the product itself, the compression and some of the memory processing we’re doing effectively gives us this kind of infinite context window," Albert explained.
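A minimal sketch of the general idea, not Anthropic's implementation: when the running transcript approaches a token budget, older turns get replaced with a model-written summary so the conversation can continue indefinitely. The `count_tokens` and `summarize` helpers here are hypothetical.

```python
def compact_history(messages, count_tokens, summarize, budget=150_000, keep_recent=20):
    """Keep a conversation under a token budget by summarizing older turns.

    A rough sketch of context compaction, not Anthropic's implementation.
    `count_tokens` and `summarize` are hypothetical helpers, e.g. wrappers
    around a tokenizer and a summarization call to the model.
    """
    if count_tokens(messages) <= budget:
        return messages                       # still under budget, nothing to do

    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(old)                  # model-written recap of the older turns
    recap = {"role": "user", "content": f"Summary of earlier conversation: {summary}"}
    return [recap] + recent
```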

For developers, Anthropic released programmatic tool calling, which lets Claude write and run code that invokes tools directly. Claude Code gained an updated "plan mode," and a desktop research preview now lets developers run multiple AI agent sessions in parallel.
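As a conceptual sketch of what programmatic tool calling enables, the snippet below shows the kind of orchestration code a model could write and execute in one pass, invoking tools directly instead of emitting a separate tool-call round trip for each step. The `search_orders` and `refund_order` tools are hypothetical.

```python
# Conceptual sketch only: code a model might generate and run under programmatic
# tool calling. `search_orders` and `refund_order` are hypothetical tool functions
# provided by the developer, not part of any Anthropic API.

def refund_late_orders(search_orders, refund_order, max_days_late=14):
    """Find overdue orders and refund them in a single scripted pass."""
    late = [o for o in search_orders(status="shipped") if o["days_late"] > max_days_late]
    results = []
    for order in late:
        results.append(refund_order(order["id"]))   # direct tool invocation, no model round trip
    return results
```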

The market heats up as OpenAI and Google compete on performance and price

Anthropic reached a $2 billion annualized revenue run rate in the first quarter of 2025, more than doubling from $1 billion in the prior period. The number of customers spending more than $100,000 annually grew eightfold year over year.

The quick release of Opus 4.5, just weeks after Haiku 4.5 in October and Sonnet 4.5 in September, reflects broader industry trends. OpenAI released multiple GPT-5 variants throughout 2025, including a specialized Codex Max model in November that can operate autonomously for up to 24 hours. Google shipped Gemini 3 in mid-November after several months of development.

Albert attributes Anthropic’s accelerated pace partly to using Claude to speed up the company’s own development. "Both on the actual product-building side and on the model research side, there’s a great deal of support and speed-up coming from Claude itself," he said.

While Opus 4.5’s price reduction could expand its addressable market, it could also depress profits. "I hope more startups will start incorporating this into their products to make them stand out," Albert said.

Major AI labs have invested heavily in computing infrastructure and research talent, yet profitability remains elusive. The AI market is projected to exceed $1 trillion in revenue within a decade, but no single provider has established a dominant position, even as models reach a threshold where they can meaningfully automate complex knowledge work.

Michael Truell, CEO of the AI-powered code editor Cursor, called Opus 4.5 "a significant improvement over the previous Claude models inside Cursor, providing better pricing and intelligence for difficult coding tasks." Scott Wu, CEO of the AI coding startup Cognition, said the model delivered "stronger results on our most stringent evaluations and consistent performance across 30-minute autonomous coding sessions."

For companies and developers, this competition means rapidly increasing functionality at lower prices. But as AI performance in technical tasks approaches and even exceeds the level of human experts, the impact of technology on professional work becomes more than theoretical.

When asked about the engineering exam results and what they imply about the trajectory of AI, Albert was candid: "I think this is a very important signal to heed."


