
Just hours after OpenAI updated its flagship foundation model GPT-5 to GPT-5.1, promising lower overall token usage and a more pleasant personality with more preset options, Chinese search giant Baidu announced its next-generation foundation model, ERNIE 5.0, along with a series of AI product upgrades and strategic international expansions.
The goal is to position Baidu as a global contender in the increasingly competitive enterprise AI market.
Announced at the company’s Baidu World 2025 event, ERNIE 5.0 is a natively omnimodal model designed to jointly process and generate content across text, images, audio, and video.
Unlike Baidu’s recently released ERNIE-4.5-VL-28B-A3B-Thinking, which is open source under the enterprise-friendly, permissive Apache 2.0 license, ERNIE 5.0 is a proprietary model. It is available only via Baidu’s ERNIE Bot website (where it must be selected manually from the model picker dropdown) and, for enterprise customers, the Qianfan cloud platform application programming interface (API).
Alongside the model launch, Baidu introduced significant updates to its digital human platform, no-code tools, and general-purpose AI agents, all aimed at expanding its AI footprint beyond China.
The company also introduced ERNIE 5.0 Preview 1022, a variant optimized for text-centric tasks, alongside the general preview model, which balances performance across modalities.
Baidu emphasized that ERNIE 5.0 represents a change in how intelligence is deployed at scale, with CEO Robin Li stating, “Once we embed AI inside, it becomes a native capability, transforming intelligence from a cost to a source of productivity.”
Where ERNIE 5.0 outperforms GPT-5 and Gemini 2.5 Pro
ERNIE 5.0’s benchmark results suggest that Baidu has achieved parity or near-parity with top Western foundation models across a wide range of tasks.
In public benchmark slides shared during the Baidu World 2025 event, ERNIE 5.0 Preview outperformed or matched OpenAI’s GPT-5-High and Google’s Gemini 2.5 Pro in multimodal reasoning, document understanding, and image-based QA, while also demonstrating strong language modeling and code execution abilities.
The company emphasized the model’s ability to process joint inputs and outputs across modalities, rather than relying on post-hoc modality fusion, and positioned this as a technical differentiator.
For visual tasks, ERNIE 5.0 achieved the highest scores in OCRBench, DocVQA, and ChartQA, three benchmarks that test document recognition, comprehension, and structured data reasoning.
Baidu claims the model outperformed both GPT-5-High and Gemini 2.5 Pro on these document- and chart-based benchmarks, areas considered core to enterprise applications such as automated document processing and financial analysis.
According to Baidu’s internal GenEval-based evaluation, ERNIE 5.0 matches or exceeds Google’s Veo3 in image generation across categories such as semantic alignment and image quality. Baidu claimed that the model’s multimodal integration allows it to generate and interpret visual content with greater context awareness than models that rely on modality-specific encoders.
For audio and speech tasks, ERNIE 5.0 showed competitive results on the MM-AU and TUT2017 audio understanding benchmarks, as well as in question answering from spoken-language input. Its audio performance, while not as emphasized as vision or text, suggests a broad functional footprint aimed at supporting full-spectrum multimodal applications.
In language tasks, the model showed strong results in instruction following, factual question answering, and mathematical reasoning, core areas that define the usefulness of large language models in the enterprise.
The Preview 1022 variant of ERNIE 5.0, tuned for text performance, showed even stronger language-specific results in early developer access. Although Baidu does not claim across-the-board superiority in general language reasoning, its internal evaluations suggest that ERNIE 5.0 Preview 1022 closes the gap with top English-language models and outperforms them on Chinese-language tasks.
Although Baidu has not publicly released full benchmark details or raw scores, its performance positioning suggests a deliberate attempt to frame ERNIE 5.0 not as a niche multimodal system, but as a flagship model that competes with the largest closed models in general-purpose reasoning.
Baidu claims clear advantages in structured document understanding, visual chart reasoning, and the integration of multiple modalities into a single native modeling architecture. Although independent validation of these results is still pending, the breadth of claimed capabilities positions ERNIE 5.0 as a serious alternative in the multimodal foundation model landscape.
Enterprise pricing strategy
ERNIE 5.0 sits at the premium end of Baidu’s model pricing structure. The company announced specific pricing for API usage on its Qianfan platform, aligning costs with other top-tier services from Chinese competitors such as Alibaba.
| Model | Input cost (per 1,000 tokens) | Output cost (per 1,000 tokens) | Source |
|---|---|---|---|
| ERNIE 5.0 | $0.00085 (0.006 yuan) | $0.0034 (0.024 yuan) | Qianfan |
| ERNIE 4.5 Turbo (example) | $0.00011 (0.0008 yuan) | $0.00045 (0.0032 yuan) | Qianfan |
| Qwen3 (Coder example) | $0.00085 (0.006 yuan) | $0.0034 (0.024 yuan) | Qianfan |
The cost contrast between ERNIE 5.0 and earlier models such as ERNIE 4.5 Turbo highlights Baidu’s strategy of differentiating between high-volume, low-cost models and high-performance models designed for complex tasks and multimodal reasoning.
Compared with US alternatives, ERNIE 5.0’s pricing remains mid-range.
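As a quick arithmetic check on that contrast, the sketch below converts the per-1,000-token dollar prices quoted above into per-million-token prices and computes how many times more ERNIE 5.0 costs than ERNIE 4.5 Turbo. The dictionary and function names are illustrative only and are not part of any Baidu SDK; the figures are the rounded dollar prices from the table.

```python
# Per-1,000-token USD prices as quoted in the Qianfan pricing table (rounded).
PRICES_PER_1K = {
    "ERNIE 5.0": {"input": 0.00085, "output": 0.0034},
    "ERNIE 4.5 Turbo": {"input": 0.00011, "output": 0.00045},
}


def per_million(price_per_1k: float) -> float:
    """Convert a per-1,000-token price into a per-1,000,000-token price."""
    return price_per_1k * 1000


def price_ratio(model_a: str, model_b: str, kind: str) -> float:
    """Return how many times more model_a costs than model_b for `kind` tokens."""
    return PRICES_PER_1K[model_a][kind] / PRICES_PER_1K[model_b][kind]


if __name__ == "__main__":
    for name, price in PRICES_PER_1K.items():
        print(
            f"{name}: ${per_million(price['input']):.2f} input / "
            f"${per_million(price['output']):.2f} output per 1M tokens"
        )
    for kind in ("input", "output"):
        ratio = price_ratio("ERNIE 5.0", "ERNIE 4.5 Turbo", kind)
        print(f"{kind} price ratio: {ratio:.1f}x")
```

On these rounded figures, ERNIE 5.0 costs roughly 7.5 to 7.7 times as much as ERNIE 4.5 Turbo per token.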
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Source |
|---|---|---|---|
| GPT-5.1 | $1.25 | $10.00 | OpenAI |
| ERNIE 5.0 | $0.85 | $3.40 | Qianfan |
| ERNIE 4.5 Turbo (example) | $0.11 | $0.45 | Qianfan |
| Claude Opus 4.1 | $15.00 | $75.00 | Anthropic |
| Gemini 2.5 Pro | $1.25 (≤200,000) / $2.50 (>200,000) | $10.00 (≤200,000) / $15.00 (>200,000) | Google Vertex AI pricing |
| Grok 4 (grok-4-0709) | $3.00 | $15.00 | xAI API |
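To make the comparison concrete, here is a minimal sketch that prices a single workload against the per-million-token rates listed above. The workload size (2 million input tokens, 500,000 output tokens) is a hypothetical chosen for illustration, the function names are not from any vendor SDK, and Gemini 2.5 Pro is simplified to its ≤200,000-token tier.

```python
# (input, output) USD prices per 1M tokens, copied from the comparison table.
# Assumption: Gemini 2.5 Pro is priced at its <=200,000-token tier only.
PRICES_PER_1M = {
    "GPT-5.1": (1.25, 10.00),
    "ERNIE 5.0": (0.85, 3.40),
    "ERNIE 4.5 Turbo": (0.11, 0.45),
    "Claude Opus 4.1": (15.00, 75.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Grok 4": (3.00, 15.00),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one workload at the listed token prices."""
    input_price, output_price = PRICES_PER_1M[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def rank_by_cost(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = {
        model: workload_cost(model, input_tokens, output_tokens)
        for model in PRICES_PER_1M
    }
    return sorted(costs.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500K output tokens.
    for model, cost in rank_by_cost(2_000_000, 500_000):
        print(f"{model:<16} ${cost:,.2f}")
```

For that hypothetical workload, ERNIE 5.0 comes to about $3.40, against about $7.50 for GPT-5.1 and $67.50 for Claude Opus 4.1. The gap widens as the share of output tokens grows, since output pricing is where the models differ most.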
Global Expansion: Products and Platforms
In parallel with the model release, Baidu is expanding internationally.
- GenFlow 3.0, now with over 20 million users, is the company’s largest general-purpose AI agent, with enhanced memory and multimodal task processing.
- Famou is a self-evolving agent that can dynamically solve complex problems and is currently offered commercially by invitation.
- MeDo, the international version of Baidu’s no-code builder Miaoda, is available worldwide via medo.dev.
- Oreate is a productivity workspace that supports documents, slides, images, videos, and podcasts, used by over 1.2 million users worldwide.
Baidu’s Digital Human Platform, already rolled out in Brazil, is also part of the global push. According to the company’s data, during the “Double 11” shopping event held in China this year, 83% of live streamers used Baidu’s digital human technology, contributing to a 91% increase in GMV.
Meanwhile, Baidu’s self-driving ride-hailing service, Apollo Go, has surpassed 17 million rides, operates driverless vehicles in 22 cities, and lays claim to the title of the world’s largest robotaxi network.
Open source visual language model captures industry attention
Two days before the flagship ERNIE 5.0 event, Baidu also released an open source multimodal model, ERNIE-4.5-VL-28B-A3B-Thinking, under the Apache 2.0 license.
As my colleague Michael Nuñez at VentureBeat reported, the model activates just 3 billion parameters out of a total of 28 billion, using a Mixture-of-Experts (MoE) architecture for efficient inference.
Key innovations include:
- “Thinking with Images,” which enables dynamic zoom-based visual analysis
- Support for chart interpretation, document understanding, visual grounding, and temporal awareness in video
- The ability to run on a single 80 GB GPU, making it accessible to mid-sized organizations
- Full compatibility with Transformers, vLLM, and Baidu’s FastDeploy toolkit
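The single-GPU claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes weights stored at 2 bytes per parameter (as in 16-bit formats such as bf16), which is an assumption for illustration and not a figure published by Baidu. It ignores activations, the KV cache, and framework overhead, all of which need additional headroom. The constant and function names are hypothetical.

```python
TOTAL_PARAMS = 28e9    # total parameters reported for the model
ACTIVE_PARAMS = 3e9    # parameters activated per token via MoE routing
GPU_MEMORY_GB = 80     # single-GPU memory budget cited above


def weight_memory_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Estimate weight storage in decimal gigabytes.

    Assumption: 2 bytes per parameter (16-bit weights). Real deployments
    may use a different precision or quantization scheme.
    """
    return num_params * bytes_per_param / 1e9


def active_fraction(active_params: float, total_params: float) -> float:
    """Return the fraction of parameters used per token in the MoE model."""
    return active_params / total_params


if __name__ == "__main__":
    weights_gb = weight_memory_gb(TOTAL_PARAMS)
    headroom_gb = GPU_MEMORY_GB - weights_gb
    print(f"Estimated weight memory: {weights_gb:.0f} GB")
    print(f"Headroom on an {GPU_MEMORY_GB} GB GPU: {headroom_gb:.0f} GB")
    print(f"Active fraction: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
```

Under that assumption the weights take about 56 GB, leaving roughly 24 GB for everything else, and only about 10.7% of the parameters are active for any given token.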
This release increases pressure on closed source competitors. The Apache 2.0 license makes ERNIE-4.5-VL-28B-A3B-Thinking a viable foundation model for commercial applications without restrictive licensing terms, something few high-performance models in this class offer.
Community feedback and Baidu response
After the release of ERNIE 5.0, developer and AI evaluator Lisan al Gaib (@scaling01) posted a mixed review on X. Although initially impressed by the model’s benchmark performance, Lisan reported a persistent problem: ERNIE 5.0 repeatedly invoked tools during SVG generation tasks, even when explicitly instructed not to.
“The ERNIE 5.0 benchmark looked insane until I tested it…unfortunately RL either has brain damage or has serious issues with the chat platform/system prompts,” Lisan wrote.
Within a few hours, Baidu’s developer support account @ErnieforDevs replied:
“Thank you for your feedback. This is a known bug. Certain syntax can cause this bug at any time. We are currently working on a fix. At this time, you can rephrase or change the prompt to work around this bug.”
This quick response reflects Baidu’s increasing emphasis on communicating with developers, especially as it courts users around the world with both proprietary and open source products.
Outlook for Baidu and its ERNIE-based LLM family
Baidu’s ERNIE 5.0 marks a strategic escalation in the global foundation model competition. By claiming performance on par with cutting-edge systems from OpenAI and Google, and combining premium pricing and open access alternatives, Baidu is demonstrating its ambition to become not only a domestic AI leader, but also a trusted global infrastructure provider.
At a time when enterprise AI users increasingly demand multimodal performance, flexible licensing, and deployment efficiency, Baidu’s two-track approach (premium hosted APIs and open source releases) has the potential to broaden its appeal to both the enterprise and developer communities.
It remains to be seen whether the company’s performance claims stand up to third-party testing. But in a landscape shaped by rising costs, model complexity, and computing bottlenecks, ERNIE 5.0 and its supporting ecosystem give Baidu a competitive position in the next wave of AI adoption.
