Jensen Huang walked onto the Computex stage in Taipei on Sunday, leather jacket on, and unveiled Nemotron 3 Ultra—Nvidia's largest open AI model ever and, at least for now, the smartest open-weight model built in America. It's good. It's just not good enough to beat China.
The model packs roughly 550 billion total parameters but uses only 55 billion of them for any given token, thanks to a design called mixture-of-experts that routes each input to a small subset of the network. Parameters are the learned values that determine an AI model's breadth of knowledge, and a larger count generally means a more capable model.
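To make the total-versus-active distinction concrete, here is a minimal sketch of top-k expert routing. The expert count, the value of k, and the function names are hypothetical choices for illustration; Nvidia's actual routing configuration is not described here, and in real models some layers are always active regardless of routing. Only the 550 billion and 55 billion figures come from the announcement.

```python
import random

def route_top_k(scores, k):
    """Return the indices of the k highest-scoring experts for one token."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def active_fraction(total_params_b, active_params_b):
    """Share of the model's parameters used for a single token."""
    return active_params_b / total_params_b

# Figures quoted in the article (billions of parameters).
TOTAL_B = 550
ACTIVE_B = 55

# Hypothetical toy layer: 10 experts, 1 chosen per token.
# This is an assumption for illustration, not Nvidia's real configuration.
NUM_EXPERTS = 10
TOP_K = 1

rng = random.Random(0)  # fixed seed so the toy router is repeatable
token_scores = [rng.random() for _ in range(NUM_EXPERTS)]
chosen = route_top_k(token_scores, TOP_K)

print(f"experts used for this token: {chosen} of {NUM_EXPERTS}")
print(f"active share of parameters: {active_fraction(TOTAL_B, ACTIVE_B):.0%}")
```

The second line of output works out to 10%: each token touches about a tenth of the model, which is where the compute savings come from.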
Its score of 48 on Artificial Analysis's intelligence index makes it the top U.S. open-weight model by a comfortable margin. The next closest American options are Google's Gemma 4 31B at 39, Nvidia's own Nemotron 3 Super at 36, and OpenAI's gpt-oss-120b at 33.
The gap over its own predecessor is striking. Nemotron 3 Super, released in March 2026 at 120 billion parameters, was already considered a solid open model for autonomous agents. Ultra jumps 12 index points above it, which in this benchmarking landscape is a big leap.
What the Nemotron family is
Nvidia has been in the model business longer than most people realize. The first Nemotron-branded model dropped in November 2023, with the third generation announced in December 2025.
The family comes in three sizes: Nano for lightweight tasks, Super for mid-range enterprise applications, and Ultra for complex reasoning workloads. All three share the same hybrid architecture combining Mamba-2 layers, standard Transformer attention, and mixture-of-experts routing.
Mamba-2 is an alternative to standard attention that processes long sequences at a fraction of the cost—relevant when you want a model capable of holding a million tokens in memory at once. Nemotron 3 Ultra supports a 1-million-token context window, meaning an agent can, in theory, have an entire large codebase or hundreds of research documents in view simultaneously.
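A back-of-the-envelope sketch shows why that matters at a million tokens. It assumes full self-attention compares every token with every other token, while a state-space scan of the kind Mamba-2 uses does one state update per token. Constant factors, layer counts, and the fact that Nemotron 3 still includes some attention layers are all ignored, and the function names are illustrative only.

```python
def attention_pair_count(n_tokens):
    """Token-to-token comparisons in full self-attention (grows quadratically)."""
    return n_tokens * n_tokens

def linear_scan_steps(n_tokens):
    """State updates in a recurrent or state-space style scan (grows linearly)."""
    return n_tokens

for n in (1_000, 100_000, 1_000_000):
    pairs = attention_pair_count(n)
    steps = linear_scan_steps(n)
    print(
        f"{n:>9,} tokens: attention ~{pairs:.1e} pairs, "
        f"scan ~{steps:.1e} steps, ratio {pairs // steps:,}x"
    )
```

Under these simplifications the gap grows with the context itself: at one million tokens, the quadratic term is a million times larger than the linear one.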
The Ultra's weights are public and its training recipes are being released. Do you need a supercomputer to run it? Essentially, yes—a 550-billion-parameter model lives in datacenter territory. But you can access it through Nvidia's API or cloud providers without owning the hardware yourself, the same way anyone already uses GPT or Claude through a browser.
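A rough estimate of the weight storage alone shows why. The sketch below assumes common numeric precisions (16, 8, and 4 bits per parameter); the precision of the released weights is not stated here, and the figures exclude activations and any long-context cache. In a mixture-of-experts model, all experts generally still have to be loaded even though only some run per token, so the total parameter count is what drives memory.

```python
def weight_memory_gb(params_billions, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

TOTAL_PARAMS_B = 550  # total parameters in billions, as quoted in the article

# Assumed precisions for illustration; not confirmed for this release.
for bits in (16, 8, 4):
    gb = weight_memory_gb(TOTAL_PARAMS_B, bits)
    print(f"{bits:>2}-bit weights: ~{gb:,.0f} GB")
```

Even the most aggressive assumption here lands in the hundreds of gigabytes, well beyond a single consumer GPU.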
Fast model, slower brain
But raw speed doesn't settle the intelligence contest. The chart Artificial Analysis published tells the story plainly. On the vertical axis, which measures intelligence, Nemotron 3 Ultra sits at a respectable 48, but China's Kimi K2.6 from Moonshot AI sits at 54. That six-point gap on the index is a meaningful difference: Kimi K2.6, released in April 2026, currently ranks fourth among all AI models globally, closed or open, sitting only three points behind the proprietary flagships from Anthropic, Google, and OpenAI, all tied at 57.
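The scores quoted in this article can be lined up to check the gaps. This is a small sketch using only the numbers reported here; the dictionary layout and the helper name are illustrative choices, and the three proprietary flagships are grouped into one entry because the article gives them a single tied score.

```python
# Index scores as quoted in the article.
scores = {
    "Proprietary flagships (Anthropic, Google, OpenAI)": 57,
    "Kimi K2.6": 54,
    "Nemotron 3 Ultra": 48,
    "Gemma 4 31B": 39,
    "Nemotron 3 Super": 36,
    "gpt-oss-120b": 33,
}

def gap(a, b):
    """Index points by which model a leads model b (negative if it trails)."""
    return scores[a] - scores[b]

ULTRA = "Nemotron 3 Ultra"

# Print the ranking from highest to lowest, with each gap relative to Ultra.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<50} {score:>3}  ({gap(name, ULTRA):+d} vs Ultra)")
```

The output reproduces the figures in the text: Ultra leads Nemotron 3 Super by 12 points, trails Kimi K2.6 by 6, and Kimi K2.6 in turn trails the proprietary flagships by 3.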
Nemotron 3 Ultra is the most visible result of that bet so far. Nvidia also announced that Nemotron 4, the next generation, is already in development through the Nemotron Coalition, a group of eight AI labs including Mistral AI and Perplexity that Nvidia assembled in March 2026 to co-develop open frontier models on DGX Cloud infrastructure. Nemotron 3 Ultra ships June 4.