Rio 3.5’s launch timing was perfect: Brazil was playing its World Cup opener, and social media was already on fire. Comments about it rapidly spread from Brazil to beyond.
But just as quickly as it gained attention, there was a dispute over who exactly created the model.
The original model card described Rio 3.5 as a post-train of Qwen 3.5 397B, Alibaba's open-base model, with a new reasoning layer called SwiReasoning added on top. The development cost was reported at R$500,000 (Rio didn’t confirm this), or nearly $100,000 USD—roughly 30 times cheaper than equivalent off-the-shelf AI systems.
The architecture is Mixture-of-Experts, which means only around 17 billion of the 397 billion parameters fire on any given token. That makes inference cheaper than the headline size suggests. The model also supports vision and text, handles over a dozen languages, and ships under a fully open MIT license.
SwiReasoning is the technical centerpiece. It's a training-free inference framework that switches dynamically between two modes. When the model is confident about a next word—low entropy in the probability distribution—it reasons in plain language. When uncertain, it shifts to latent reasoning, thinking in hidden internal states without emitting tokens. IplanRIO said Rio 3.5 was specifically trained to exploit this, and that the gains show up in the benchmark numbers.

The self-reported numbers were eye-catching. Terminal-Bench 2.1—which measures autonomous terminal command execution, scored as percentage of tasks passed—came in at 70.8% for Rio 3.5, edging out Qwen 3.7 Plus at 70.3% and the powerful DeepSeek v4 Pro at 67.9%.
On IMOAnswerBench, a math olympiad benchmark scored as percentage correct, Rio 3.5 hit 89.5%. On HLE—Humanity's Last Exam, a near-unsolvable multi-domain expert battery scored as a percentage—Rio 3.5 landed at 36.5%, ahead of Qwen 3.7 Plus's 34.7%.
A municipal government beating the most important flagship models on the most meaningful quality benchmarks: That's the headline that spread, especially after the Mayor of Rio de Janeiro tweeted about it.
“An open AI model trained in Rio and publicly funded over the last year by [the Municipality of Rio] has just surpassed all other models,” Eduardo Cavaliere wrote. “Today, the world is talking about an open AI model trained in Rio.”
Then Nex showed up“Trained in Rio” proved to be not entirely accurate.
Nex-AGI, a Shanghai-based open-source AI alliance, posted on X days after the release. The opener: "The Rio 3.5 model broke the internet this week. The plot twist? It's essentially our open-source model, Nex N2 Pro, wearing a different hat."
The Rio 3.5 model broke the internet this week. The plot twist? It’s essentially our open-source model, Nex N2 Pro, wearing a different hat.
🤯 We analyzed the weights, and the recipe is exact: Rio 3.5 ≈ 0.6 * Nex N2 Pro + 0.4 * Qwen 3.5
The evidence came in two parts.
First, behavioral. Nex stripped the hardcoded "You are Rio" system prompt from the deployed model and sent it 120 identity questions. Without the mask, Nex reports the model called itself "Nex, from Nex-AGI" 79.2% of the time. It called itself "Rio" exactly 0% of the time. Nex said the model also recited the company’s specific backstory verbatim, mentioning the "Shanghai Innovation Institute" and "a large-model ecosystem alliance." That's Nex's own training data, surfacing in someone else's model.
Second, mathematical. In a genuine weight merge, every parameter in the new model sits on a straight line between the two source models. Nex measured this collinearity across all 60 layers. The result came back at 0.993. Two unrelated models in the same parameter space scored near-zero by chance. Hitting 0.993 across every single layer isn't a coincidence. The mixing ratio held at α ≈ 0.571, stable to three decimal places.
Basically, it was nearly 60% Nex, with the rest being the base Qwen model.
"Every weight tensor in Rio is, to thousands of standard deviations, the same 0.6/0.4 blend of Nex and Qwen—across all 60 layers and every component of the network," Nex wrote. "There is no innocent explanation."
Source: Nex EcosystemThe numbers also told a quieter story. Nex N2 Pro, released just days before Rio 3.5, scores 75.3% on Terminal-Bench 2.1—higher than Rio's 70.8%. On GDPval, an economic forecasting benchmark scored as an Elo-style rating, Nex sits at 1,585 against Rio's 1,533. If Rio is 60% Nex, then you'd expect it to score below Nex on Nex's own benchmarks. It does.
Source: Nex EcosystemIplanRIO responds IplanRIO updated the Hugging Face model card—the benchmark table came down and the attribution changed.
No other public statement from IplanRIO has come out. Nex is now credited.
The "incorrect upload" explanation is the key claim. IplanRIO says the intended release was a distilled version of the merged base—not the raw merge itself. On-policy distillation means a stronger teacher model generates outputs, and the student trains on those while also generating its own. It's more expensive than a raw merge, but still cheaper than training from scratch. If that step was real, then it would represent at least some original work on top of the merge.
What actually shipped, per IplanRIO, was the merged base with nothing on top.
about the Rio 3.5 situation
merging two ~400B-class models and then applying policy distillation isn’t trivial
that said, they made two mistakes:
- a technical error (probably caused by a lack of attention to detail)
- and a communication one (we can debate the integrity of…
Developer and AI YouTuber Lucas Montano noted that "merging two ~400B-class models and then applying policy distillation isn't trivial"—while acknowledging both a technical error and a communication failure.
Legal? Yes. Ethical? Well...Model merging is completely legal. Nex N2 Pro is Apache 2.0—you can use it, modify it, and redistribute it, as long as you credit it. Qwen 3.5 is openly licensed too. Nobody's going to court. here.
The problem was presenting the output as independently developed work without naming all the source models. The open-source community has seen this before. Earlier this year, Cursor's Composer 2 was found to be built on Moonshot's Kimi K2.5 without disclosure. The backlash was fast and reputational—no lawyers, just screenshots.
was messing with the OpenAI base URL in Cursor and caught this
accounts/anysphere/models/kimi-k2p5-rl-0317-s515-fast
IplanRIO is working to upload the corrected, distilled model with full attribution in place. When that lands, the same checks will run again—and the community will find out whether the distillation actually changed anything, or whether it's still mostly Nex with a different system prompt.


















