That “cheap” open source AI model is actually burning through your compute budget


A comprehensive new study finds that open-source artificial intelligence models can consume significantly more computing resources than their closed-source competitors when performing identical tasks, potentially undermining their cost advantages and reshaping how companies evaluate AI deployment strategies.

The study, conducted by AI firm Nous Research, found that open-weight models use 1.5 to 4 times more tokens (the basic units of AI computation) than closed models from companies like OpenAI and Anthropic. For simple knowledge questions, the gap widened dramatically, with some open models using up to 10 times as many tokens.

“Open-weight models use 1.5-4x more tokens than closed ones (up to 10x more for simple knowledge questions), which can make them more expensive per query despite lower per-token costs,” the researchers wrote.

The findings challenge a widespread assumption in the AI industry: that open-source models offer clear economic advantages over proprietary alternatives. While open-source models typically cost less per token to run, the study suggests this advantage can be quickly erased if the model needs far more tokens to reason about a given problem.
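To see how a lower per-token price can be wiped out by higher token usage, here is a minimal back-of-the-envelope sketch in Python; the prices and token counts are illustrative assumptions, not figures from the study.

```python
# Hypothetical per-token prices and per-query token counts (illustrative only,
# not numbers from the Nous Research study).
PRICE_PER_MILLION_TOKENS = {
    "open_weight_model": 0.50,  # cheaper per token
    "closed_model": 2.00,       # 4x the per-token price
}
TOKENS_PER_QUERY = {
    "open_weight_model": 4_000,  # assumes ~4x more tokens per query
    "closed_model": 1_000,
}

def cost_per_query(model: str) -> float:
    """Cost of a single query = tokens used * price per token."""
    price_per_token = PRICE_PER_MILLION_TOKENS[model] / 1_000_000
    return TOKENS_PER_QUERY[model] * price_per_token

for model in PRICE_PER_MILLION_TOKENS:
    print(f"{model}: ${cost_per_query(model):.4f} per query")
# With these assumed numbers, the 4x token overhead exactly cancels the 4x
# per-token discount, so the "cheap" open model costs the same per query.
```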


The actual cost of AI: Why a “cheap” model might break your budget

The study tested 19 different AI models across three categories of tasks: basic knowledge questions, mathematical problems, and logic puzzles. The team measured “token efficiency,” meaning how many computational units (tokens) a model uses relative to the complexity of its solution.

“Token efficiency is an important metric for several practical reasons,” the researchers noted. “While open weight models may be cheaper to host, this cost advantage can easily be offset if they require more tokens to reason about a given problem.”
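The article does not give the exact formula, but one plausible way to operationalize the token-efficiency metric described above is to compare the completion tokens a model bills against the length of a minimal reference solution; the helper below is a hypothetical sketch, not the study’s actual code.

```python
def token_efficiency_ratio(completion_tokens: int, reference_solution_tokens: int) -> float:
    """Hypothetical ratio: tokens billed per token of a minimal reference solution.
    Lower is better; 1.0 would mean the model used no more tokens than a
    bare written-out answer requires."""
    return completion_tokens / max(reference_solution_tokens, 1)

# Example: a reasoning model billing 1,200 completion tokens for a question
# whose reference answer is about 10 tokens long scores 120x.
print(token_efficiency_ratio(1_200, 10))  # 120.0
```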

Open source AI models use up to 12 times more computational resources than the most efficient closed models for basic knowledge questions. (Credit: Nous Research)

The inefficiency is particularly pronounced in large reasoning models (LRMs), which use extended “chains of thought” to work through complex problems. These models, designed to think through problems step by step, can consume thousands of tokens pondering simple questions that should require minimal computation.

For basic knowledge questions such as “What is the capital of Australia?”, the study found that reasoning models spend “hundreds of tokens pondering simple knowledge questions” that need only a one-word answer.

Which AI models actually deliver bang for your buck

The study revealed significant differences between model providers. OpenAI’s models, particularly o4-mini and the newly released open-weight gpt-oss variants, exhibited exceptional token efficiency, especially for mathematical problems. The study found that OpenAI models “stand out for their extreme token efficiency in mathematical problems,” using up to three times fewer tokens than other commercial models.

Among the open-source options, Nvidia’s Llama-3.3-Nemotron-Super-49B-V1 emerged as “the most token-efficient open-weight model in all domains,” while newer models such as Mistral’s Magistral stood out as outliers with “very high token use.”

Efficiency gaps varied widely by task type. Open models used roughly twice as many tokens for mathematical and logic problems, but the difference ballooned on simple knowledge questions, where extended reasoning should be unnecessary.

OpenAI’s latest models achieve the lowest cost for simple questions, while some open-source alternatives can be significantly more expensive despite lower per-token prices. (Credit: Nous Research)

What enterprise leaders need to know about AI computing costs

The findings have immediate implications for enterprise AI adoption, where computing costs can scale rapidly with usage. Companies evaluating AI models often focus on accuracy benchmarks and per-token pricing, but may overlook the total compute required for real-world tasks.

“The better token efficiency of closed-weight models often compensates for the higher API pricing of those models,” the researchers found when analyzing total inference costs.

The study also revealed that closed-model providers appear to be actively optimizing for efficiency. Closed-weight models “have been iteratively optimized to use fewer tokens to reduce inference cost,” while open-source models “have increased their token usage for newer versions, perhaps reflecting a priority toward better reasoning performance.”

Computational overhead varies dramatically between AI providers, with some models using more than 1,000 tokens of internal reasoning for simple tasks. (Credit: Nous Research)

How researchers cracked the code on measuring AI efficiency

The research team faced unique challenges in measuring efficiency across different model architectures. Many closed-source models do not reveal their raw reasoning processes, instead providing compressed summaries of their internal computations to prevent competitors from copying their techniques.

To address this, the researchers used completion tokens (the total computational units billed for each query) as a proxy for reasoning effort. They found that “most recent closed-source models will not share their raw reasoning traces” and instead “use smaller language models to transcribe the chain of thought into summaries or compressed representations.”
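As an illustration of what that proxy looks like in practice, the sketch below reads the billed completion-token count from a chat completion response using the OpenAI Python SDK; the model name and question are placeholders, and this is not the study’s evaluation harness.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "What is the capital of Australia?"}],
)

# Completion tokens are billed whether or not the raw reasoning trace is
# exposed, which is what makes them usable as a proxy for reasoning effort.
print("answer:", response.choices[0].message.content)
print("completion tokens billed:", response.usage.completion_tokens)
```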

The study’s methodology included testing with modified versions of well-known problems to minimize the influence of memorized solutions, such as changing the variables in math competition problems from the American Invitational Mathematics Examination (AIME).
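The article does not describe the exact perturbation scheme, but the idea of swapping variables to defeat memorization can be sketched as a simple template substitution; the problem wording and number ranges below are invented for illustration.

```python
import random

# Hypothetical AIME-style problem template; the wording and constants are
# made up for illustration, not taken from the actual benchmark.
TEMPLATE = ("Find the number of positive integers n <= {limit} "
            "that are divisible by {a} or by {b}.")

def perturbed_problem(seed: int) -> str:
    """Generate a variant with fresh constants so a memorized answer no longer fits."""
    rng = random.Random(seed)
    return TEMPLATE.format(
        limit=rng.randint(500, 2000),
        a=rng.choice([3, 4, 6, 7]),
        b=rng.choice([5, 9, 11, 13]),
    )

print(perturbed_problem(seed=42))
```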

Different AI models show different relationships between compute and output, with some providers compressing their reasoning traces and others providing full details. (Credit: Nous Research)

The future of AI efficiency: What comes next?

The researchers suggest that token efficiency should become a primary optimization target alongside accuracy in future model development. “Higher-density CoTs allow for more efficient use of context and may counteract context degradation during challenging reasoning tasks,” they write.

The release of OpenAI’s open-weight gpt-oss model, which demonstrates cutting-edge efficiency with a “freely accessible CoT,” could serve as a reference point for optimizing other open-source models.

The complete research dataset and evaluation code are available on GitHub, allowing other researchers to validate and extend the findings. As the AI industry races toward ever more powerful reasoning capabilities, this study suggests the real competition may not be about who can build the smartest AI, but who can build the most efficient one.

After all, in a world where every token counts, the most wasteful models may find themselves priced out of the market, no matter how well they can think.


