You're reading for free via Berika Varol Malkoçoğlu's Friend Link. Become a member to access the best of Medium.

Member-only story

The First Hybrid Reasoning Model: Claude 3.7 Sonnet

Berika Varol Malkoçoğlu

Published in

Towards AI

4 min readFeb 26, 2025

For readers who can’t see the full story, click here.

Anthropic announced in February 2025 that they had developed the first hybrid reasoning model in the literature.

Claude 3.7 Sonnet

They also introduced Claude Code, a command line tool for developers.

This is a super development, but we already have reasoning models like OpenAI o3 mini, DeepSeek-R1, Gemini 2.0 Flash Thinking. This is not a new technology.

But Anthropic’s claim is that it has a different philosophy than other reasoning models on the market.

We’ve developed Claude 3.7 Sonnet with a different philosophy from other reasoning models on the market.

The philosophy of the hybrid approach of the Claude 3.7 Sonnet is based on the fact that the act of reasoning takes place in a single version, rather than in different versions like other models.

They argue that reasoning should be an integrated capability into existing LLM models, just as humans use a single brain for quick reactions and deep thinking. They argue that this hybrid approach will improve the user experience.

They are right, it is more useful to go through a single model instead of different models according to the type of question.

Users can also control how long the model has to think. It is possible to choose whether the model responds normally or thinks more extensive before responding. If the model is used as an API, it is also possible to limit the users’ thinking budget.

In normal mode, Claude 3.7 Sonnet represents an upgraded version of Claude 3.5 Sonnet. In extended thinking mode, it can successfully perform math, physics, instruction following, coding or many other tasks by thinking before responding.

However, in developing the model, they focused on real-world tasks that better reflect how they use LLMs rather than competing problems in mathematics and computer science.

How is the cost-performance balance?

Early test results show that the Claude 3.7 Sonnet outperforms existing models in some areas.

Outperforms popular peers in real-world coding tasks

Claude 3.7 Sonnet Software Engineering performance

It also outperformed the OpenAI o1 model in TAU-bench, a framework that tests AI Agent in complex real-world tasks.

Claude 3.7 Sonnet Agentic tool performance

However, Claude 3.7 Sonnet did not surpass the OpenAI o3 mini model in some areas such as general reasoning, extended thinking and mathematics.

Overall, the Claude 3.7 Sonnet model is quite successful, outperforming all models except the OpenAI o3 mini model in almost every area and improving the user experience with a hybrid approach.

Currently, people with Anthropic’s premium plan can access the reasoning section of this model. So what are the prices, is the cost-performance balance good compared to its peers?

Claude 3.7 Sonnet cost:

1 million input tokens: $3
1 million output tokens: $15

OpenAI o1 cost:

1 million input tokens: $15
1 million output tokens: $60

OpenAI o3 mini cost:

1 million input tokens: $1.1
1 million output tokens: $4.4

DeepSeek-R1 cost:

1 million input tokens: $0.55
1 million output tokens: $2.19

DeepSeek-R1 is biased?

China’s newest artificial intelligence model, DeepSeek-R1, has upset the balance and its results have raised questions.

pub.towardsai.net

The popular LLM models and Claude 3.7 Sonnet compared in terms of performance-cost;

The OpenAI o1 model performed better in some areas, but at a very high cost.
The OpenAI o3 mini has outperformed in some areas and very low cost.
DeepSeek-R1 underperformed in all areas except math, but at a very low cost.

Conclusion

There are head-to-head performance results in the comparison. The most critical point that can separate these 4 models, which produce similar results in most cases, will be the cost and the hybrid structure, which is a new feature. For example; if you have a structure that uses a model according to the incoming question, instead of installing OpenAI 4o for normal questions and OpenAI o3 mini model for reasoning and processing them separately, the Claude 3.7 Sonnet model that performs both can be preferred. Or, if a reasoning-oriented structure is to be built, the OpenAI o3 mini model can be preferred because the cost is lower and the performance is at the Claude 3.7 Sonnet level (sometimes more).

In summary; Claude 3.7 Sonnet outperforms many other models in terms of performance. It has a small performance difference with OpenAI o3 mini. But when the differences in cost are big enough to make developers think, it will probably be inevitable to continue with existing models.

In order for Anthropic to compete with its competitors, it is very important to offer affordable costs in addition to performance.

Finally:

If you liked this article, I’m waiting for your claps 👏
To stay updated on more content, you can follow me👀

Towards AI

The First Hybrid Reasoning Model: Claude 3.7 Sonnet

How is the cost-performance balance?

DeepSeek-R1 is biased?

China’s newest artificial intelligence model, DeepSeek-R1, has upset the balance and its results have raised questions.

Conclusion

Finally:

Published in Towards AI

Written by Berika Varol Malkoçoğlu

No responses yet

More from Berika Varol Malkoçoğlu and Towards AI

YOLOv11 Architecture

Previously we mentioned its younger brother YOLOv10. Today we continue with YOLOv11, the newest in the series. YOLO is an almost unrivaled…

Explaining Transformers as Simple as Possible through a Small Language Model

And understanding Vector Transformations and Vectorizations

Langchain (Upgraded) + DeepSeek-R1 + RAG Just Revolutionized AI Forever

Last week, I made a video about DeepSeek-V3, and it caused a huge stir in the global AI community.

YOLOv11 Mimarisi

Daha önce küçük kardeşi YOLOv10'dan bahsetmiştik. Bugün ise serinin en yenisi YOLOv11 ile devam ediyoruz. YOLO nesne tespit alanında çok…

Recommended from Medium

I hacked Copilot AI’s system prompt and it reveals the chatbot secretly builds a dossier about you

AI is profiling you — and I have the system instructions to prove it

Hard-Earned Lessons from a Year of Building AI Agents

Lessons from building AI agents for both developers and everyday users — including the successes, challenges, and unexpected learnings.

Lists

Predictive Modeling w/ Python

Natural Language Processing

Practical Guides to Machine Learning

ChatGPT prompts

How to build your own AI desktop app… by yourself

No magic wands, no Cursor AI: in 4 steps you can have a Mistral Small 22B powered chat-bot portable application. And here is how

All About Claude 3.7 In One Article

I took some time to do research and have tried my best to cover all the points related to Claude 3.7.

Google just confirmed the AI reality many programmers are desperately trying to deny

AI is slowly taking over coding but many programmers are still sticking their head in the sand about what’s coming…

I “vibe-coded” over 160,000 lines of code. It IS real.

When I was getting my MS from CMU and coding up the algorithmic trading platform NextTrade, I wrote every single goddamn line of code…