A comprehensive new study reveals that open-source artificial intelligence models can consume significantly more computing resources than their closed-source competitors when performing identical tasks, potentially undermining their cost advantages and reshaping how companies evaluate AI deployment strategies.
The study, conducted by AI firm Nous Research, found that open-weight models use 1.5 to 4 times more tokens (the basic units of AI computation) than closed models from companies like OpenAI and Anthropic. For simple knowledge questions, the gap widened dramatically, with some open models using up to 10 times as many tokens.
Measuring Thinking Efficiency in Reasoning Models: The Missing Benchmark https://t.co/b1e1rjx6vz
We measured token usage across reasoning models. Open models output 1.5-4x more tokens than closed models on the same tasks, with large variation depending on task type. pic.twitter.com/ly1083won8
– Nous Research (@NousResearch) August 14, 2025
“Open weight models use 1.5-4x more tokens than closed weight models (up to 10x for simple knowledge questions), making them sometimes more expensive per query despite lower per-token costs,” the researchers wrote.
The findings challenge a prevailing assumption in the AI industry that open-source models offer clear economic advantages over proprietary alternatives. While open-source models typically cost less per token to run, the study suggests this advantage can be easily offset when models need far more tokens to reason through a given problem.
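To make that trade-off concrete, here is a minimal sketch of the arithmetic. The prices and token counts below are invented for illustration and are not figures from the study:

```python
def cost_per_query(tokens_used: int, price_per_million_tokens: float) -> float:
    """Dollar cost of a single query: tokens consumed times the per-token rate."""
    return tokens_used * price_per_million_tokens / 1_000_000

# Illustrative numbers only: the closed model charges more per token,
# but the open model burns 4x the tokens on the same task.
closed_cost = cost_per_query(tokens_used=500, price_per_million_tokens=10.0)
open_cost = cost_per_query(tokens_used=2_000, price_per_million_tokens=4.0)

print(f"closed: ${closed_cost:.4f}, open: ${open_cost:.4f}")
# closed: $0.0050, open: $0.0080 -- the "cheaper" per-token model costs more per query
```

Even at less than half the per-token rate, the hypothetical open model ends up 60% more expensive per query once its extra tokens are counted.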
The actual cost of AI: Why a “cheap” model might break your budget
The study evaluated 19 different AI models across three task categories: basic knowledge questions, mathematical problems, and logic puzzles. The team measured “token efficiency” (how many tokens models use relative to the complexity of their solutions).
“Token efficiency is a critical metric for several practical reasons,” the researchers noted. “While hosting open weight models may be cheaper, this cost advantage could be easily offset if they require more tokens to reason about a given problem.”
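The paper's exact metric is more involved, but the intuition behind token efficiency can be sketched as a simple ratio. The function name and numbers here are illustrative, not taken from the study:

```python
def token_efficiency_ratio(completion_tokens: int, solution_tokens: int) -> float:
    """Tokens consumed per token of final answer; a higher ratio means
    the model spent more 'thinking' for the same amount of solution."""
    return completion_tokens / max(solution_tokens, 1)

# A reasoning model that deliberates for 1,200 tokens to produce a 40-token answer
ratio = token_efficiency_ratio(completion_tokens=1_200, solution_tokens=40)
print(ratio)  # 30.0
```

By this rough measure, a model that answers the same question in 400 tokens would be three times as efficient.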

The inefficiency is particularly pronounced in Large Reasoning Models (LRMs), which use extended “chains of thought” to solve complex problems. These models, designed to think through problems step by step, can consume thousands of tokens pondering simple questions that should require minimal computation.
For a basic knowledge question such as “What is the capital of Australia?”, the study found that reasoning models spend “hundreds of tokens pondering simple knowledge questions” that could be answered in a single word.
Which AI models actually deliver bang for your buck
The study revealed significant differences between model providers. OpenAI’s models, particularly its o4-mini and the newly released open-source gpt-oss variants, demonstrated exceptional token efficiency, especially for mathematical problems. The study found that OpenAI models “stand out for extreme token efficiency in math problems,” using up to three times fewer tokens than other commercial models.
Among open-source options, Nvidia’s llama-3.3-nemotron-super-49b-v1 emerged as “the most token-efficient open weight model across all domains,” while newer models such as Mistral’s Magistral showed “exceptionally high token usage” as outliers.
The efficiency gap varied significantly by task type. Open models used roughly twice as many tokens for mathematical and logic problems, but the difference ballooned for simple knowledge questions, where extended reasoning should be unnecessary.

What enterprise leaders need to know about AI computing costs
The findings have immediate implications for enterprise AI adoption, where computing costs can scale rapidly with usage. Companies evaluating AI models often focus on accuracy benchmarks and per-token pricing, but may overlook the total computational requirements of real-world tasks.
“The better token efficiency of closed weight models often compensates for the higher API pricing of those models,” the researchers found when analyzing total inference costs.
The study also revealed that closed-model providers appear to be actively optimizing for efficiency. The researchers observed that “closed weight models have been iteratively optimized to use fewer tokens to reduce inference cost,” while open-source models have “increased their token usage for newer versions, possibly reflecting a priority toward better reasoning performance.”

How researchers cracked the code of AI efficiency measurement
The research team faced unique challenges in measuring efficiency across different model architectures. Many closed-source models do not reveal their raw reasoning processes, instead providing compressed summaries of their internal computations to prevent competitors from copying their techniques.
To address this, the researchers used completion tokens (the total computational units billed for each query) as a proxy for reasoning effort. They found that “most recent closed-source models will not share their raw reasoning traces,” and instead “use smaller language models to transcribe the chain of thought into summaries or compressed representations.”
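In practice, the proxy amounts to reading the billed completion-token count from the provider's API response rather than the (often hidden) reasoning trace. A minimal sketch, using a hard-coded stand-in for an OpenAI-style chat-completions payload:

```python
# Stand-in for a real API response; the field layout mirrors an
# OpenAI-style chat-completions payload, but the numbers are invented.
response = {
    "choices": [{"message": {"content": "Canberra"}}],
    "usage": {"prompt_tokens": 12, "completion_tokens": 384, "total_tokens": 396},
}

# Billed completion tokens include any hidden reasoning tokens, even when
# the raw chain of thought itself is never returned to the caller.
reasoning_proxy = response["usage"]["completion_tokens"]
print(reasoning_proxy)  # 384
```

A one-word answer that bills 384 completion tokens is exactly the kind of hidden-deliberation overhead the proxy is meant to surface.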
The study’s methodology included testing with modified versions of well-known problems to minimize the influence of memorized solutions, such as altering the variables in mathematical competition problems from the American Invitational Mathematics Examination (AIME).
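The study's actual problem transformations are not reproduced here, but the idea of re-instantiating a problem's constants so a memorized answer no longer applies can be sketched as follows (the template and number ranges are invented for illustration):

```python
import random

def perturb_problem(template: str, rng: random.Random) -> str:
    """Fill a problem template with fresh constants, forcing a model that
    memorized the original competition answer to actually re-derive it."""
    return template.format(n=rng.randint(3, 97), k=rng.randint(2, 9))

template = "Find the remainder when {n}^{k} is divided by 1000."
rng = random.Random(0)  # seeded so the perturbation is reproducible
print(perturb_problem(template, rng))
```

The answer to the perturbed variant differs from the published one, while the reasoning required stays comparable.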

The future of AI efficiency: What will come next?
The researchers suggest that token efficiency should become a primary optimization target alongside accuracy in future model development. “A more densified CoT allows for more efficient context usage and may counteract context degradation during challenging reasoning tasks,” they write.
The release of OpenAI’s open-source gpt-oss models, which demonstrate state-of-the-art efficiency with a “freely accessible CoT,” could serve as a reference point for optimizing other open-source models.
The complete research dataset and evaluation code are available on GitHub, allowing other researchers to validate and extend the findings. As the AI industry races toward ever more powerful reasoning capabilities, this study suggests the real competition may not be about who can build the smartest AI, but who can build the most efficient one.
After all, in a world where every token counts, the most wasteful models may find themselves priced out of the market, regardless of how well they can think.
