US Government Says China's Best AI Models Lag Behind. Experts Aren't So Sure

By Decrypt
May 5, 2026

A U.S. government institute published its verdict on China's most powerful AI: eight months behind, and the more time passes, the wider the gap gets. The internet read the methodology and started asking questions.

CAISI (the Center for AI Standards and Innovation) also calls DeepSeek V4 Pro the most capable Chinese AI model it has evaluated to date.

The scoring system

CAISI doesn't average benchmark scores like most evaluators do. Instead, it applies Item Response Theory—a statistical method from standardized testing—to estimate each model's latent capability by tracking which problems it solves and which it doesn't, across nine benchmarks in five domains: cybersecurity, software engineering, natural sciences, abstract reasoning, and math.
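To make the idea concrete, here is a minimal sketch of how an IRT-style ability estimate can differ from a raw count of solved problems. It assumes a two-parameter logistic (2PL) model with made-up item parameters. CAISI has not published its full methodology, so the item values, function names, and fitting approach below are illustrative assumptions, not CAISI's procedure.

```python
import math

# Hypothetical items as (discrimination a, difficulty b).
# These values are invented for illustration; they are not CAISI's parameters.
ITEMS = [(0.5, -1.0), (1.0, 0.0), (2.0, 1.0)]


def p_solve(theta, a, b):
    """2PL probability that a model with ability theta solves an item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def score_function(theta, responses, items=ITEMS):
    """Derivative of the log-likelihood with respect to theta.

    It is strictly decreasing in theta, so its root is the
    maximum-likelihood ability estimate.
    """
    return sum(a * (x - p_solve(theta, a, b)) for x, (a, b) in zip(responses, items))


def estimate_ability(responses, items=ITEMS, lo=-10.0, hi=10.0, iters=60):
    """Find the maximum-likelihood ability by bisection on the score function."""
    if all(responses) or not any(responses):
        raise ValueError("ability is unbounded for all-correct or all-wrong patterns")
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if score_function(mid, responses, items) > 0:
            lo = mid  # likelihood still rising: true ability is higher
        else:
            hi = mid
    return (lo + hi) / 2.0


# Two hypothetical models, each solving 2 of 3 items (same raw percentage).
model_x = [1, 1, 0]  # solves the two easier items
model_y = [0, 1, 1]  # solves the two harder, more discriminating items

theta_x = estimate_ability(model_x)
theta_y = estimate_ability(model_y)
print(f"model_x ability: {theta_x:.2f}")
print(f"model_y ability: {theta_y:.2f}")
```

Both toy models have the same raw score, but the one that solves the harder, more discriminating items receives the higher ability estimate. In a real evaluation the item parameters would be estimated from the response data as well, rather than fixed in advance as they are here.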

The IRT-estimated Elo scores: GPT-5.5 at 1,260 points, Anthropic's Claude Opus 4.6 at 999. DeepSeek V4 Pro scores around 800 (±28), which is very close to GPT-5.4 mini at 749. In CAISI's system, DeepSeek sits closer to the old generation of GPT mini than to Opus.

The points system scores models the way standardized tests score students: not by raw percentage correct, but by weighting which problems they solve and which they miss. The result is a points estimate that only means something relative to other models in the same evaluation. More points means a more capable model overall, and the top model’s score serves as the reference point for judging the rest.
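Since the points only carry meaning relative to other models, the reported figures are easiest to read as gaps. The short sketch below uses only the scores quoted in this article (DeepSeek's is reported as "around 800", so the gaps are approximate); the variable and function names are illustrative.

```python
# Scores as reported in the article (DeepSeek V4 Pro is "around 800", +/-28).
scores = {
    "GPT-5.5": 1260,
    "Claude Opus 4.6": 999,
    "DeepSeek V4 Pro": 800,
    "GPT-5.4 mini": 749,
}


def gap(model_a, model_b):
    """Absolute point difference between two models on the same scale."""
    return abs(scores[model_a] - scores[model_b])


gap_to_mini = gap("DeepSeek V4 Pro", "GPT-5.4 mini")
gap_to_opus = gap("DeepSeek V4 Pro", "Claude Opus 4.6")
gap_to_top = gap("DeepSeek V4 Pro", "GPT-5.5")

print(f"DeepSeek vs GPT-5.4 mini:    {gap_to_mini} points")
print(f"DeepSeek vs Claude Opus 4.6: {gap_to_opus} points")
print(f"DeepSeek vs GPT-5.5:         {gap_to_top} points")
```

On these numbers DeepSeek sits about 51 points above GPT-5.4 mini and about 199 points below Opus 4.6, which is the basis for the claim that it is closer to the mini model than to Opus.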

CAISI’s results cannot be reproduced independently because two of the nine benchmarks are non-public, and those two are where the gap is widest. For example, GPT-5.5 scored 71% on CTF-Archive-Diamond, one of CAISI’s cybersecurity tests, while DeepSeek registered around 32%.

On public benchmarks, the picture shifts. GPQA-Diamond—PhD-level science reasoning, scored as percentage correct—placed DeepSeek at 90%, one point behind Opus 4.6's 91%. Math olympiad benchmarks (OTIS-AIME-2025, PUMaC 2024, SMT 2025) put DeepSeek at 97%, 96%, and 96%. On SWE-Bench Verified—real GitHub bug fixes, scored as percentage resolved—DeepSeek scored 74% to GPT-5.5's 81%. DeepSeek's own technical report claims V4 Pro matches Opus 4.6 and GPT-5.4.

For cost comparison, CAISI filtered out any U.S. model that performed significantly worse or cost significantly more per token than DeepSeek. Only one model cleared the bar: GPT-5.4 mini. That's the entire U.S. frontier, filtered to a single entry.

DeepSeek came out cheaper on 5 of 7 benchmarks, undercutting even OpenAI’s tiniest and least capable AI model.

The counterargument: Is the gap bigger or smaller?

The Artificial Analysis Intelligence Index v4.0—a rating system tracking frontier model intelligence across 10 evaluations—shows OpenAI near 60 points and DeepSeek in the low 50s as of May 2026, compressed far tighter than a year ago.

Based on standardized benchmarks, that methodology shows the gap is actually getting smaller.

CAISI plans to release a fuller IRT methodology write-up in the near future.

