Xiaomi just launched a new AI model family. Again.
The V2-Pro was text-and-code only. Multimodal capability existed in its sibling model, MiMo-V2-Omni, but that was a separate product with lower benchmark scores. MiMo-V2.5 collapses all of that into one model: faster, more capable, and with native image, video, and audio understanding baked in from the start.
That matters more than it might sound for regular users. For example, now you can upload a photo of your fridge and ask it to suggest dinner recipes. Drop in a video tutorial and get a step-by-step summary. Record a meeting and have it pull out action items. All in one place, without juggling separate tools and separate models with different pricing strategies.
Xiaomi claims MiMo-V2.5-Pro represents "a major leap from MiMo-V2-Pro in general agentic capabilities, complex software engineering, and long-horizon tasks," and says it now matches frontier models like Claude Opus 4.6 and GPT-5.4 across most coding and agent benchmarks. The numbers largely back that up—with some gaps still visible on harder reasoning tasks.
The base and pro models serve different purposes. MiMo-V2.5-Pro is the heavy lifter. Xiaomi says it can "autonomously complete professional tasks involving 1,000+ tool calls, work that would take human experts days." That's for developers running complex, multi-step automated workflows. It runs at 60–80 tokens per second and costs $1.00 input / $3.00 output per million tokens.
MiMo-V2.5 is the everyday version. It is faster (100–150 tokens per second), cheaper ($0.40 input / $2.00 output per million tokens), and supports all the modalities (image, audio, and video) that the Pro tier skips. Both models carry a 1M-token context window, meaning they can hold roughly 750,000 words in a single conversation.
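To make the price and speed trade-off concrete, here is a minimal Python sketch built on the per-million-token prices and throughput ranges quoted above. The workload size (2M input tokens, 500K output tokens) and the helper names are hypothetical, chosen only for illustration; real billing and real-world throughput may differ.

```python
# Prices (USD per million tokens) and throughput ranges (tokens per second)
# are the figures quoted in the article; everything else is illustrative.
MODELS = {
    "MiMo-V2.5-Pro": {"input": 1.00, "output": 3.00, "tps": (60, 80)},
    "MiMo-V2.5": {"input": 0.40, "output": 2.00, "tps": (100, 150)},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a workload for the given model."""
    price = MODELS[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def generation_seconds(model: str, output_tokens: int) -> tuple[float, float]:
    """Return (best case, worst case) seconds to generate output_tokens."""
    slow, fast = MODELS[model]["tps"]
    return output_tokens / fast, output_tokens / slow


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500K output tokens.
    for name in MODELS:
        cost = estimate_cost(name, 2_000_000, 500_000)
        best, worst = generation_seconds(name, 500_000)
        print(f"{name}: ${cost:.2f}, {best / 60:.0f}-{worst / 60:.0f} min of generation")
```

Under these assumptions the same workload comes to $3.50 on the Pro model and $1.80 on the base model, so the cheaper tier roughly halves the bill when its capabilities are enough.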
On SWE-bench Pro (a coding benchmark where models fix real bugs in actual startup codebases, scored as a pass rate out of 100), MiMo-V2.5-Pro resolves 57.2% of tasks. That's near the top of the field; the average model manages around 25%. The story is similar on τ3-bench and ClawEval, where it lands within a few points of Claude Opus 4.6 and GPT-5.4. The gap opens up on Humanity's Last Exam, a gauntlet of graduate-level problems across dozens of academic fields: MiMo scores 48.0% versus GPT-5.4's 58.7%, a deficit of nearly 11 points that's hard to paper over.
Where it genuinely stands out is token efficiency. Xiaomi says MiMo-V2.5-Pro uses 42% fewer tokens than Kimi K2.6 at equivalent benchmark scores, and MiMo-V2.5 uses nearly half the tokens of Muse Spark for similar results. For anyone running these at scale—developers processing thousands of requests daily—that difference is real money.
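As a rough illustration of how token efficiency turns into money, the sketch below applies the claimed 42% reduction to a hypothetical workload. The baseline volume of 100M tokens per day is invented for the example, and it assumes both models bill at the same per-token price, which the article does not state (Kimi K2.6's pricing is not given).

```python
def tokens_after_reduction(baseline_tokens: float, reduction: float) -> float:
    """Tokens needed after cutting the baseline by a fractional reduction."""
    return baseline_tokens * (1.0 - reduction)


def output_cost(tokens: float, price_per_million: float) -> float:
    """USD cost of a token volume at a given price per million tokens."""
    return tokens * price_per_million / 1_000_000


if __name__ == "__main__":
    baseline = 100_000_000  # hypothetical daily tokens for the less efficient model
    reduced = tokens_after_reduction(baseline, 0.42)  # 42% is Xiaomi's claim
    price = 3.00  # MiMo-V2.5-Pro output price; assumed equal for both models
    saving = output_cost(baseline, price) - output_cost(reduced, price)
    print(f"Tokens: {baseline:,.0f} -> {reduced:,.0f}")
    print(f"Daily saving at ${price:.2f}/M tokens: ${saving:,.2f}")
```

The point of the sketch is only that savings scale linearly with volume: the larger the daily token count, the more a fixed percentage reduction is worth.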
On multimodal tasks, MiMo-V2.5 posts scores on par with GPT-5.4 and Gemini 3.1 Pro, and quite close to Claude Opus 4.6.


This was probably due to the boom of the agentic AI tool Hermes and its arrangement with Xiaomi, which gave users free access to MiMo-V2-Pro for a limited time. That window has already closed, but the hype was enough to put Xiaomi in the game.
Those who want to keep using Hermes for free can now test the new Step 3.5 Flash through the Nous API, or use OpenRouter's free models, which come with tighter usage limits.
Token plan pricing also got a refresh. MiMo-V2.5 runs at a 1x credit rate; MiMo-V2.5-Pro at 2x. Xiaomi is no longer charging an extra multiplier for using the full 1 million-token context window, which makes long-document analysis noticeably cheaper. Existing users also get a full credit reset as a launch bonus.
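The credit rates can be sketched in a few lines of Python. The 1x and 2x rates come from the article; the base unit of one credit per token and the long-context multiplier of 2.0 used for comparison are placeholders, since the article does not say how credits map to tokens or what the old surcharge was.

```python
# Credit rates per model, as stated in the article.
CREDIT_RATES = {"MiMo-V2.5": 1, "MiMo-V2.5-Pro": 2}


def credits_used(
    model: str,
    tokens: int,
    long_context_multiplier: float = 1.0,  # 1.0 = no surcharge (current plan)
    base_credits_per_token: float = 1.0,   # hypothetical unit, not from the article
) -> float:
    """Credits consumed, assuming credits scale linearly with tokens."""
    rate = CREDIT_RATES[model]
    return tokens * base_credits_per_token * rate * long_context_multiplier


if __name__ == "__main__":
    tokens = 800_000  # hypothetical long-document request
    now = credits_used("MiMo-V2.5-Pro", tokens)
    # 2.0 is an invented surcharge, used only to show the shape of the change.
    before = credits_used("MiMo-V2.5-Pro", tokens, long_context_multiplier=2.0)
    print(f"Without surcharge: {now:,.0f} credits; with a 2x surcharge: {before:,.0f}")
```

Whatever the old multiplier actually was, removing it means a long-context request now costs the same per token as a short one.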
The company says it's already training the next generation, with "deeper reasoning, tighter tool integration, and richer real-world grounding." At the rate Xiaomi is moving, that announcement is probably closer than you'd expect.