The trick is called "multi-dimensional elastic pre-training." Instead of building ERNIE 5.1 from scratch, Baidu extracted an optimized sub-network from its existing ERNIE 5.0 architecture—which it released in January 2026—and compressed it down. Total parameters dropped to about one-third of the original. Active parameters (the ones actually doing work during a conversation) were cut in half. The result is a leaner model that inherited the knowledge base of its larger parent without repeating the full training bill.
The post-training pipeline is also worth noting. Baidu built a four-stage reinforcement learning system it calls MOPD (Multi-Teacher On-Policy Distillation). Rather than trying to teach every skill at once—which tends to cause "seesaw effects" where, for example, improving math performance tanks creative writing—Baidu trained specialist expert models in parallel for code, reasoning, and agentic tasks, then distilled all of them into a single unified model. A final online reinforcement learning stage handled open-ended conversations and creative output, preserving what the distillation process couldn't capture well.

In theory, this should leave the model's skills at a comparable level of proficiency, with none improved at the expense of another.
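
For readers who want a feel for the distillation idea, here is a toy sketch in Python. It is not Baidu's implementation: the function names, the domain labels, and the choice of reverse KL divergence (a common option in on-policy distillation) are all assumptions made for illustration. It only shows the general shape of the approach, where one student is scored against a different specialist teacher in each domain and the per-domain penalties are averaged so that no single skill dominates.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def multi_teacher_loss(student_logits_by_domain, teacher_logits_by_domain):
    """Hypothetical helper: average reverse KL between one student and
    a separate specialist teacher per domain (assumed setup, not MOPD itself)."""
    losses = {}
    for domain, s_logits in student_logits_by_domain.items():
        student = softmax(s_logits)
        teacher = softmax(teacher_logits_by_domain[domain])
        # Reverse KL, KL(student || teacher): penalizes the student for
        # putting probability where the specialist teacher would not.
        losses[domain] = kl_divergence(student, teacher)
    mean_loss = sum(losses.values()) / len(losses)
    return mean_loss, losses

# Made-up next-token scores for a three-token vocabulary.
student = {"code": [1.0, 2.0, 3.0], "reasoning": [0.0, 0.0, 0.0]}
teachers = {"code": [1.0, 2.0, 3.0], "reasoning": [2.0, 0.0, 0.0]}
mean_loss, per_domain = multi_teacher_loss(student, teachers)
print(per_domain)  # the "code" loss is zero; the "reasoning" loss is positive
```

In this toy run the student already matches the code teacher, so only the reasoning term contributes. Averaging across domains is one simple way to keep any single teacher from pulling the student too far, which is the "seesaw" problem the pipeline is meant to avoid.
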
On GPQA (Graduate-Level Google-Proof Q&A, a benchmark measuring whether a model can answer expert-level science questions that can't be Googled), ERNIE 5.1 approaches the performance of leading Western closed-source models. On AIME26, a benchmark built from the 2026 American Invitational Mathematics Examination that tests advanced problem-solving under competition conditions, the model scored 99.6% when using tool-assisted reasoning, trailing only Gemini 3.1 Pro.
Baidu is hosting its annual Create 2026 developer conference on May 13–14 in Beijing, where it plans to showcase ERNIE's latest industrial applications. That event will be the next data point on how aggressively the company intends to push the model into enterprise and global markets.