Towards AI

The leading AI community and content platform focused on making AI accessible to all. Check out our…

Follow publication

You're reading for free via Berika Varol Malkoçoğlu's Friend Link. Become a member to access the best of Medium.

Member-only story

The First Hybrid Reasoning Model: Claude 3.7 Sonnet

For readers who can’t see the full story, click here.

Anthropic announced in February 2025 that they had developed the first hybrid reasoning model in the literature.

Claude 3.7 Sonnet

They also introduced Claude Code, a command line tool for developers.

This is a super development, but we already have reasoning models like OpenAI o3 mini, DeepSeek-R1, Gemini 2.0 Flash Thinking. This is not a new technology.

But Anthropic’s claim is that it has a different philosophy than other reasoning models on the market.

We’ve developed Claude 3.7 Sonnet with a different philosophy from other reasoning models on the market.

The philosophy of the hybrid approach of the Claude 3.7 Sonnet is based on the fact that the act of reasoning takes place in a single version, rather than in different versions like other models.

They argue that reasoning should be an integrated capability into existing LLM models, just as humans use a single brain for quick reactions and deep thinking. They argue that this hybrid approach will improve the user experience.

They are right, it is more useful to go through a single model instead of different models according to the type of question.

Users can also control how long the model has to think. It is possible to choose whether the model responds normally or thinks more extensive before responding. If the model is used as an API, it is also possible to limit the users’ thinking budget.

Claude 3.7 Sonnet example chat

In normal mode, Claude 3.7 Sonnet represents an upgraded version of Claude 3.5 Sonnet. In extended thinking mode, it can successfully perform math, physics, instruction following, coding or many other tasks by thinking before responding.

However, in developing the model, they focused on real-world tasks that better reflect how they use LLMs rather than competing problems in mathematics and computer science.

How is the cost-performance balance?

Early test results show that the Claude 3.7 Sonnet outperforms existing models in some areas.

Outperforms popular peers in real-world coding tasks

Claude 3.7 Sonnet Software Engineering performance

It also outperformed the OpenAI o1 model in TAU-bench, a framework that tests AI Agent in complex real-world tasks.

Claude 3.7 Sonnet Agentic tool performance

However, Claude 3.7 Sonnet did not surpass the OpenAI o3 mini model in some areas such as general reasoning, extended thinking and mathematics.

Claude 3.7 Sonnet performance

Overall, the Claude 3.7 Sonnet model is quite successful, outperforming all models except the OpenAI o3 mini model in almost every area and improving the user experience with a hybrid approach.

Currently, people with Anthropic’s premium plan can access the reasoning section of this model. So what are the prices, is the cost-performance balance good compared to its peers?

Claude 3.7 Sonnet cost:

  • 1 million input tokens: $3
  • 1 million output tokens: $15

OpenAI o1 cost:

  • 1 million input tokens: $15
  • 1 million output tokens: $60

OpenAI o3 mini cost:

  • 1 million input tokens: $1.1
  • 1 million output tokens: $4.4

DeepSeek-R1 cost:

  • 1 million input tokens: $0.55
  • 1 million output tokens: $2.19

The popular LLM models and Claude 3.7 Sonnet compared in terms of performance-cost;

  • The OpenAI o1 model performed better in some areas, but at a very high cost.
  • The OpenAI o3 mini has outperformed in some areas and very low cost.
  • DeepSeek-R1 underperformed in all areas except math, but at a very low cost.

Conclusion

There are head-to-head performance results in the comparison. The most critical point that can separate these 4 models, which produce similar results in most cases, will be the cost and the hybrid structure, which is a new feature. For example; if you have a structure that uses a model according to the incoming question, instead of installing OpenAI 4o for normal questions and OpenAI o3 mini model for reasoning and processing them separately, the Claude 3.7 Sonnet model that performs both can be preferred. Or, if a reasoning-oriented structure is to be built, the OpenAI o3 mini model can be preferred because the cost is lower and the performance is at the Claude 3.7 Sonnet level (sometimes more).

In summary; Claude 3.7 Sonnet outperforms many other models in terms of performance. It has a small performance difference with OpenAI o3 mini. But when the differences in cost are big enough to make developers think, it will probably be inevitable to continue with existing models.

In order for Anthropic to compete with its competitors, it is very important to offer affordable costs in addition to performance.

Finally:

  • If you liked this article, I’m waiting for your claps 👏
  • To stay updated on more content, you can follow me👀

Published in Towards AI

The leading AI community and content platform focused on making AI accessible to all. Check out our new course platform: https://academy.towardsai.net/courses/beginner-to-advanced-llm-dev

Written by Berika Varol Malkoçoğlu

PhD | Data Scientist | Lecturer | AI Researcher

No responses yet

Write a response

Recommended from Medium

Lists

See more recommendations