Quick explainer for the non-developers in the room: When you use ChatGPT or Claude in a browser, you're paying a flat subscription—or nothing. When a company builds a product on top of an AI model, they pay per token, where a token is roughly three-quarters of a word. Every message sent, every reply generated, every document processed: all of it adds up at a rate measured in millions of tokens.
An API is the raw pipe that makes this possible: it lets an app, an agent, a website, or any other product use the model in its own environment. So token pricing determines whether an AI-powered product is economically viable or a money pit.
Token plans are a subscription wrapper on top of that. You buy credits upfront; the model eats through them. Xiaomi's billing upgrade gives users 5 to 8 times more tokens at the same price. The Max plan at $100 now gets you 82 billion tokens, up from 1.6 billion.
For context, 82 billion tokens is more than 60 billion words.
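As a quick check on that conversion, here is a minimal sketch using the article's rule of thumb of roughly three-quarters of a word per token. The constant and function names are illustrative, and the ratio is only an approximation that varies by language and tokenizer.

```python
# Rough rule of thumb from the article: one token is about 0.75 words.
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: float) -> float:
    """Approximate word count for a given number of tokens."""
    return tokens * WORDS_PER_TOKEN


# Figures quoted in the article for the $100 Max plan.
max_plan_tokens = 82e9
max_plan_price_usd = 100.0

words = tokens_to_words(max_plan_tokens)
# Effective price if every token in the plan were actually used.
usd_per_million = max_plan_price_usd / (max_plan_tokens / 1e6)

print(f"{words / 1e9:.1f} billion words")             # 61.5 billion words
print(f"${usd_per_million:.4f} per million tokens")   # $0.0012 per million tokens
```

The second figure is the effective per-million-token price implied by the plan, assuming the whole allowance is consumed.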
Why the cuts are real, not marketing
Fuli Luo, head of Xiaomi's MiMo team and a former core DeepSeek developer who co-built DeepSeek-V2, published a technical explanation on X. The biggest savings come from a smarter way of storing and reusing information the AI has already processed. Instead of repeatedly doing the same work, Xiaomi’s system can remember much more data at once, about five times more than before. That means the AI needs far less computing power, cutting storage and processing costs by around 80%.
Behind the MiMo API Price Reduction: The deepest price cut, up to 99%, is for Input (Cache Hit). The core reason is our inference framework now supports hierarchical KV cache optimization for SWA. Production inference engine tests show this optimization increases cached token…
“Operating at these newly reduced API prices, our production inference engine is running at near full capacity, and we can still essentially break even,” Luo wrote. “If more architectures that save compute and KV [Key-Value] cache emerge, along with better inference Infra to drive down API costs, this will form an excellent virtuous cycle in the industry.”
The result is a model 98% cheaper than GPT-5.5 Pro with competitive performance.
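To make the idea of reusing already-processed information concrete, here is a toy sketch of prefix caching in general. It is not Xiaomi's hierarchical KV cache, and `ToyPrefixCache` is a hypothetical name: a real inference engine stores attention key/value tensors, while this version only counts how many tokens of a request could be served from cache.

```python
from typing import List, Set, Tuple


class ToyPrefixCache:
    """Hypothetical toy: remembers which token prefixes were already processed."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[int, ...]] = set()

    def process(self, tokens: List[int]) -> Tuple[int, int]:
        """Return (cache_hit_tokens, newly_computed_tokens) for one request."""
        # Find the longest prefix of this request that was processed before.
        hit = 0
        for n in range(len(tokens), 0, -1):
            if tuple(tokens[:n]) in self._seen:
                hit = n
                break
        # "Compute" the rest and remember each new prefix for later reuse.
        for n in range(hit + 1, len(tokens) + 1):
            self._seen.add(tuple(tokens[:n]))
        return hit, len(tokens) - hit


cache = ToyPrefixCache()
system_prompt = list(range(100))  # stand-in for a shared 100-token system prompt

first = cache.process(system_prompt + [5001, 5002])
second = cache.process(system_prompt + [7001, 7002, 7003])

print(first)   # (0, 102): nothing cached yet, everything is computed
print(second)  # (100, 3): the shared prompt is reused, only 3 new tokens
```

The second request pays full price only for its three new tokens, which is why a larger cache translates into lower cost for workloads that repeat the same context.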
Silicon Valley’s bet
DeepSeek V4-Pro is a 1.6 trillion parameter model that gives you the knowledge base of a massive model at a fraction of the compute cost. It now permanently runs at $0.435 input and $0.87 output per million tokens. That's a model that scored 80.6% on SWE-bench Verified against Claude Opus 4.6's 80.8%, on a benchmark measuring real GitHub issue resolution, not cherry-picked demos. The pricing gap between models with essentially the same coding score: 34x on output.
DeepSeek and Xiaomi aren't alone
Kimi K2.5 from Moonshot AI, with 76.8% on SWE-bench Verified, runs $0.60 input and $2.50 output per million tokens. GLM-5.1 from Z.AI beat Claude Opus 4.6 on a key coding benchmark earlier this quarter. Four Chinese frontier models shipped in a 12-day window in early May, all under one-third of Opus 4.7's per-token cost.
For a visual comparison, this chart shows how Chinese models stack up against the three most popular American AI providers (Anthropic, OpenAI, and Meta) in terms of price-to-quality ratio.
Image: Artificialanalysis.ai
The Q2 2026 gap between Chinese and American frontier models sits at 15–30x, depending on which models you compare, and that's the baseline, before any cache discounts.
What this week's cuts do is collapse that gap further for the specific workloads that actually run in production: agent pipelines with stable system prompts, document processors, retrieval tools, things that hit cache constantly. At $0.003625 per million cached input tokens, DeepSeek V4-Pro's cost for repeated context is functionally a rounding error.
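A rough sketch of what that means for a cache-heavy workload, using the DeepSeek V4-Pro prices quoted above. The workload numbers (prompt size, message sizes, call count) are hypothetical, and the sketch assumes every call hits the cache for the full system prompt, ignoring the initial cache miss.

```python
# Prices in USD per million tokens, as quoted in the article for DeepSeek V4-Pro.
INPUT_PER_M = 0.435
CACHED_INPUT_PER_M = 0.003625
OUTPUT_PER_M = 0.87


def request_cost(cached_in: int, fresh_in: int, out: int) -> float:
    """USD cost of one request, split by how each token is billed."""
    total = (cached_in * CACHED_INPUT_PER_M
             + fresh_in * INPUT_PER_M
             + out * OUTPUT_PER_M)
    return total / 1_000_000


# Hypothetical agent workload (assumed numbers, not from the article).
SYSTEM_PROMPT_TOKENS = 40_000   # stable context repeated on every call
USER_TOKENS = 1_000             # new input per call
REPLY_TOKENS = 1_000            # generated output per call
CALLS = 10_000

with_cache = CALLS * request_cost(SYSTEM_PROMPT_TOKENS, USER_TOKENS, REPLY_TOKENS)
without_cache = CALLS * request_cost(0, SYSTEM_PROMPT_TOKENS + USER_TOKENS, REPLY_TOKENS)

print(f"cached input is {INPUT_PER_M / CACHED_INPUT_PER_M:.0f}x cheaper")  # 120x
print(f"with cache:    ${with_cache:.2f}")     # $14.50
print(f"without cache: ${without_cache:.2f}")  # $187.05
```

Under these assumptions the repeated 40,000-token prompt accounts for only $1.45 of the $14.50 total, so nearly all of the remaining cost is new input and output.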


















