Weibo’s new open source AI model VibeThinker-1.5B outperforms DeepSeek-R1 on a $7,800 post-training budget



Another day in late 2025, another impressive achievement from a Chinese open source artificial intelligence company.

The AI arm of Chinese social networking company Weibo recently released the open source VibeThinker-1.5B, a 1.5-billion-parameter large language model (LLM) fine-tuned from rival Chinese technology company Alibaba’s Qwen2.5-Math-1.5B.

Researchers and corporate developers can now download and use it for free, even for commercial purposes, under the permissive MIT license, with weights hosted on Hugging Face, GitHub, and ModelScope and an accompanying technical report posted on the open-access publishing site arxiv.org.

Furthermore, despite its compact size, VibeThinker-1.5B achieves benchmark-leading reasoning performance on math and code tasks, matching or exceeding models hundreds of times its size, and even outperforms Chinese rival DeepSeek’s famous R1 (a 671-billion-parameter model that made headlines earlier this year) on formal reasoning benchmarks.

Additionally, it outperforms Mistral AI’s Magistral Medium and is comparable to Anthropic’s Claude Opus 4 and OpenAI’s gpt-oss-20B Medium, but requires less infrastructure and investment.

Moreover, post-training was completed on a budget of only $7,800 in compute (roughly 3,900 GPU hours on Nvidia H800 hardware). That is far less than the tens or even hundreds of thousands of dollars typically required to fine-tune models of similar or larger scale.

However, note that this is not the total cost of the model’s development. LLMs are trained in stages. First comes pre-training, in which the model learns basic language structure and general knowledge by predicting the next word across vast amounts of text from the internet, books, and articles. This gives the model fluency, but not the ability to follow instructions or hold a conversation.

Post-training then follows, using a much smaller, higher-quality dataset (usually a collection of example questions, prompts, and expert-written answers) to teach the model how to respond helpfully, reason through problems, and meet human expectations. Even so, the post-training cost-efficiency of Weibo’s VibeThinker-1.5B is noteworthy.
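To make the idea concrete, here is a purely illustrative sketch of what a single post-training (supervised fine-tuning) record might look like. The field names and content are hypothetical and are not taken from Weibo’s actual training data.

```python
# Illustrative sketch of one supervised fine-tuning (SFT) record: a prompt paired
# with an expert-style worked answer. Field names and content are hypothetical.
sft_example = {
    "prompt": "A train travels 120 km in 1.5 hours. What is its average speed in km/h?",
    "response": (
        "Average speed = distance / time = 120 km / 1.5 h = 80 km/h. "
        "So the train's average speed is 80 km/h."
    ),
}

# A post-training dataset is essentially a (much smaller, higher-quality) list of
# such prompt/response pairs that the model learns to imitate.
sft_dataset = [sft_example]
```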

This open-source release challenges assumptions about the parameter scale, computational intensity, and minimum viable size of high-performance LLMs.

Alternative training approach: spectrum to signal

VibeThinker-1.5B’s performance is due not to scale but to the training framework behind it, the Spectrum-to-Signal Principle (SSP).

Rather than optimizing a model purely for single-answer correctness (Pass@1), the SSP framework separates supervised fine-tuning (SFT) and reinforcement learning (RL) into two distinct phases with different goals.

  • SFT (“spectral phase”): The model is trained to maximize the diversity across potential correct answers, improving Pass@K scores. This builds a variety of reasonable solutions.

  • RL (“signal phase”): A second-stage reinforcement learning system, called MaxEnt-Guided Policy Optimization (MGPO), is used to identify and amplify the most correct paths from this diverse pool of solutions. MGPO focuses learning using entropy-based weighting to prioritize problems for which the model is most uncertain.

The authors argue that this separation allows small-scale models to more effectively explore the inference space and achieve signal amplification without relying on a huge number of parameters.
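To illustrate the intuition behind the signal phase, the minimal Python sketch below shows one way entropy-guided problem weighting can work. This is not the authors’ MGPO implementation; the rollout count, the 0/1 reward format, and the binary-entropy weighting function are assumptions chosen to show how problems with a pass rate near 50% (maximum uncertainty) receive the most training signal.

```python
# Minimal sketch of entropy-guided problem weighting (assumed formulation, not the
# authors' MGPO code): problems where the model's empirical pass rate over K sampled
# rollouts is closest to 0.5 get the largest weight during reinforcement learning.
import math

def pass_rate(rewards: list[int]) -> float:
    """Fraction of sampled rollouts that solved the problem (rewards are 0/1)."""
    return sum(rewards) / len(rewards)

def entropy_weight(p: float, eps: float = 1e-6) -> float:
    """Binary entropy H(p): peaks at p = 0.5, vanishes when the model always succeeds or always fails."""
    p = min(max(p, eps), 1.0 - eps)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

# Hypothetical batch: per-problem pass/fail outcomes from K = 8 sampled solutions.
batch = {
    "already_mastered": [1, 1, 1, 1, 1, 1, 1, 1],  # p = 1.0 -> weight ~ 0
    "on_the_frontier":  [1, 0, 1, 0, 0, 1, 1, 0],  # p = 0.5 -> maximum weight
    "far_too_hard":     [0, 0, 0, 0, 0, 0, 0, 0],  # p = 0.0 -> weight ~ 0
}

for name, rewards in batch.items():
    p = pass_rate(rewards)
    print(f"{name}: pass rate {p:.2f}, entropy weight {entropy_weight(p):.3f}")
```

The design intuition is that already-solved and hopeless problems carry little learning signal, so training compute is concentrated on the frontier of the model’s current ability.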

VibeThinker-1.5B makes a convincing case that the industry’s reliance on parameter scaling as the only means of improving reasoning performance may be outdated.

By adopting a diversity-first training pipeline, WeiboAI has shown that smaller, more accessible models can match and even outperform billion-dollar systems on logic-intensive tasks.

Low resource usage is one of VibeThinker-1.5B’s most important attributes. At under $8,000, its post-training cost is 30 to 60 times lower than that of models like DeepSeek R1 and MiniMax-M1, which reportedly cost between $294,000 and $535,000 to train.

Overall domain performance

Despite its small size, VibeThinker-1.5B delivers cross-domain reasoning that outperforms many larger open source and commercial models.

| Model | AIME25 | LiveCodeBench v6 | GPQA Diamond |
|---|---|---|---|
| VibeThinker-1.5B | 74.4 | 51.1 | 46.7 |
| GPT-OSS-20B-Medium | 72.1 | 54.9 | 66.0 |
| Claude Opus 4 | 69.2 | 56.6 | 79.6 |
| MiniMax-M1 (456B) | 74.6 | 62.3 | 69.2 |
| DeepSeek R1 (671B) | 70.0 | 65.9 | 71.5 |
| Kimi K2 (1.09T) | 49.5 | 53.7 | 75.1 |

VibeThinker was benchmarked against both reasoning-centric models (Magistral, Claude, OpenAI o3-mini) and non-reasoning LLMs (GPT-4.1, Kimi K2, DeepSeek V3). Across structured reasoning benchmarks, the model consistently outperformed the non-reasoning models, regardless of their scale.

  • In AIME24 (math), it beat Kimi K2 (1.09T) by over 10 points (80.3 vs. 69.6).

  • In LiveCodeBench v6, it outperformed Claude Opus 4 (51.1 vs. 47.4).

  • On GPQA, it scored lower than GPT-4.1 and Claude, but still nearly tripled its base model’s score (from 16.4 to 46.7).

This supports the authors’ contention that scale is not the only path to reasoning ability: with the right training design, smaller models can reach or even exceed the performance of much larger systems on targeted tasks.

In particular, it achieves performance comparable to models hundreds of times larger on math and code, but lags behind on general-knowledge reasoning (GPQA), where larger models maintain an advantage.

This suggests a specialization trade-off: VibeThinker excels at structured logical tasks but lacks the broad encyclopedic recall of larger models, a known limitation of smaller architectures.

Guidance for enterprise implementation

This release includes the recommended inference settings (temperature = 0.6, top_p = 0.95, max tokens = 40960).
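For teams that want to try it, the sketch below shows one way to load the model with Hugging Face transformers and apply those settings. The repository id WeiboAI/VibeThinker-1.5B, the example prompt, and the chat-template usage are assumptions based on the public release; check the model card before relying on them.

```python
# Minimal sketch: loading VibeThinker-1.5B with Hugging Face transformers and the
# recommended sampling settings (temperature=0.6, top_p=0.95, up to 40,960 tokens).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "WeiboAI/VibeThinker-1.5B"  # assumed repo id; verify on the model card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many positive divisors does 360 have?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.6,       # recommended by the release
    top_p=0.95,            # recommended by the release
    max_new_tokens=40960,  # generous budget for long reasoning traces
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```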

The model is small enough to be deployed on edge devices such as mobile phones and in-vehicle embedded systems, and its inference costs are estimated to be 20 to 70 times lower than those of larger models.

This positions VibeThinker-1.5B not just as a research achievement, but as a potential foundation for a cost-effective, locally deployable inference system.

Weibo strategy and market position

Launched by Sina Corporation in 2009, Weibo remains a cornerstone of China’s social media ecosystem. The platform, also known as China’s version of X (formerly Twitter), blends the features of microblogging, multimedia content, and trending topics with a regulatory environment shaped by intense government oversight.

Despite counting 600 million monthly active users, the platform has faced mounting competitive pressure from short-video rivals and a softening advertising market in recent years.

In response, Weibo focused on monetizing the creator economy, live streaming, and vertical video, and added tools for influencer engagement, e-commerce integration, and richer analytics for brands.

Because the platform serves as a digital public square, it is also subject to regulatory scrutiny. Chinese authorities continue to apply pressure on issues ranging from content governance to data security. In September 2025, Weibo was one of the platforms mentioned in an official warning, highlighting that it continues to be exposed to policy risks.

Weibo’s AI R&D efforts, exemplified by the release of VibeThinker-1.5B, signal a shift in ambition. Beyond being a media platform, Weibo leverages its capital reserves, user behavior data, and in-house research capabilities to pursue adjacent technology areas, establishing itself as a player in the next stage of China’s AI development.

Implications for enterprise technology decision makers

For engineering leaders and enterprise AI teams, the release of VibeThinker brings practical implications to everything from orchestration pipelines to cost modeling.

A 1.5B-parameter model that outperforms models 100 times its size on math and programming tasks not only saves compute, it changes the architectural calculus. It enables LLM reasoning on constrained infrastructure, reduces latency at the edge, and lowers the barrier to entry for applications that would otherwise require API access to closed frontier-scale models.

This matters for enterprise ML leaders looking to deploy reasoning-capable agents within their existing systems, and for platform owners tasked with integrating LLMs into automated workflows.

It also applies to those running reinforcement learning from human feedback (RLHF) pipelines or managing inference optimization across hybrid cloud environments.

This model’s post-training methodology, particularly the entropy-targeted reinforcement learning approach, provides a roadmap for teams looking to refine smaller checkpoints rather than relying on extensive pre-training.

VibeThinker’s benchmark transparency and data decontamination procedures also address another emerging priority in enterprise AI: auditability. Although its performance on general knowledge tests still lags behind large-scale frontier models, its task-specific reliability makes it an attractive candidate in controlled environments where accuracy is more important than coverage.

In other words, VibeThinker-1.5B is not just a research milestone but a strong candidate for real-world enterprise deployment and experimentation. It suggests that a new class of compact, reasoning-optimized models is viable for enterprise use cases that were previously the domain of much larger systems. For organizations looking to balance cost, latency, interpretability, and control, it is a compelling addition to the long and ever-growing list of Chinese open source options.


