Google Found a Way to Make Local AI Up to 3x Faster—No New Hardware Required

By Decrypt
May 7, 2026

Running an AI model on your own computer is great—until it isn't.

The promise is privacy, no subscription fees, and no data leaving your machine. The reality, for most people, is watching a cursor blink for five seconds between sentences.

That bottleneck has a name: inference speed. And it has nothing to do with how smart the model is. It's a hardware problem. Standard AI models generate text one word fragment—called a token—at a time. The hardware has to shuttle billions of parameters from memory to its compute units just to produce each single token. It's slow by design. On consumer hardware, it's painful.
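To see why that shuttling dominates, here is a rough back-of-the-envelope sketch. It assumes every generated token requires reading all of the model's weights from memory once, and it ignores caches, batching and compute time. The 500 GB/s bandwidth figure and the 2-bytes-per-parameter precision are hypothetical inputs chosen for illustration, not measurements from Google or from any specific device.

```python
def max_tokens_per_second(num_params: float,
                          bytes_per_param: float,
                          bandwidth_gb_s: float) -> float:
    """Rough upper bound on decode speed for a memory-bound model.

    Assumes each token requires streaming every weight from memory once,
    so speed is capped at (memory bandwidth) / (model size in bytes).
    """
    model_bytes = num_params * bytes_per_param
    bandwidth_bytes_per_s = bandwidth_gb_s * 1e9
    return bandwidth_bytes_per_s / model_bytes


# Hypothetical setup: a 31-billion-parameter model stored at 16 bits
# (2 bytes) per weight, on hardware with 500 GB/s of memory bandwidth.
rate = max_tokens_per_second(31e9, 2.0, 500.0)
print(f"{rate:.1f} tokens/s at best")
```

Under these assumptions the ceiling is set by memory bandwidth, not by how much arithmetic the chip can do, which is why a method that produces more tokens per weight read can help.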

Google's fix builds on an approach called speculative decoding, which has been around as a concept for years. Google researchers published the foundational paper back in 2022. The idea didn't go mainstream until now because it required the right architecture to make it work at scale.

Here's the short version of how it works. Instead of making the big, powerful model do all the work alone, you pair it with a tiny "drafter" model. The drafter is fast and cheap: it predicts several tokens at once in less time than the main model would take to produce just one. Then the big model checks all of those guesses in a single pass. If the guesses are right, you get the whole sequence for the price of one forward pass. If a guess is wrong, the big model keeps the tokens up to that point, substitutes its own token, and drafting resumes from there.
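The draft-then-verify loop can be sketched in a few lines. This is a toy illustration, not Google's implementation: `target_model` and `draft_model` are hypothetical stand-ins that map a list of token IDs to the next token, and verification uses simple greedy matching (production systems that sample tokens use a probabilistic acceptance rule instead). The target is called in a Python loop here; a real system would score all drafted positions in one batched forward pass.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]  # context -> greedy next token


def target_model(context: List[int]) -> int:
    """Toy 'large' model: next token is the sum of the last two, mod 10."""
    return (context[-1] + context[-2]) % 10


def draft_model(context: List[int]) -> int:
    """Toy drafter: mimics the target but always guesses 0 after a 7."""
    if context[-1] == 7:
        return 0
    return (context[-1] + context[-2]) % 10


def speculative_step(context: List[int], target: Model,
                     drafter: Model, k: int) -> List[int]:
    """Draft k tokens, then keep the prefix the target agrees with."""
    draft: List[int] = []
    for _ in range(k):
        draft.append(drafter(context + draft))

    accepted: List[int] = []
    for i in range(k):
        expected = target(context + draft[:i])
        if draft[i] != expected:
            accepted.append(expected)  # target's correction replaces the bad guess
            return accepted
        accepted.append(draft[i])
    # Every guess matched: the verification pass also yields one extra token.
    accepted.append(target(context + draft))
    return accepted


def generate(prompt: List[int], target: Model, drafter: Model,
             num_tokens: int, k: int = 4) -> Tuple[List[int], int]:
    """Generate num_tokens tokens; also return the number of verify passes."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < num_tokens:
        out.extend(speculative_step(out, target, drafter, k))
        passes += 1
    return out[:len(prompt) + num_tokens], passes


def generate_baseline(prompt: List[int], target: Model,
                      num_tokens: int) -> List[int]:
    """Ordinary decoding: one target call per token."""
    out = list(prompt)
    for _ in range(num_tokens):
        out.append(target(out))
    return out


tokens, passes = generate([1, 1], target_model, draft_model, num_tokens=20)
baseline = generate_baseline([1, 1], target_model, num_tokens=20)
print("identical output:", tokens == baseline)
print("verification passes:", passes, "vs. 20 baseline steps")
```

Because every kept token is one the target itself would have chosen, the output matches ordinary decoding exactly; the only thing that changes is how many target passes it takes to get there.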

Nothing is sacrificed: The large model (Gemma 4's 31B dense version, for example) still verifies every token, and the output quality is identical. You're just exploiting compute power that would otherwise sit idle while the hardware waits on memory.

Google says the drafter models share the target model's KV cache—a memory structure that stores already-processed context—so they don't waste time recalculating things the larger model already knows. For the smaller edge models designed for phones and Raspberry Pi devices, the team even built an efficient clustering technique to further cut generation time.

This isn't the only attempt the AI world has made at parallelizing text generation. Diffusion-based language models—like Mercury from Inception Labs—tried a completely different approach: Instead of predicting one token at a time, they start with noise and iteratively refine the entire output. That’s fast on paper, but diffusion LLMs have struggled to match the quality of traditional transformer models, leaving them more of a research curiosity than a practical tool.

Speculative decoding is different because it doesn't change the underlying model at all. It's a serving optimization, not an architecture replacement. The same Gemma 4 you'd already run gets faster.

The practical upside is real. A Gemma 4 26B model running on an Nvidia RTX Pro 6000 desktop GPU gets roughly twice the tokens per second with the MTP drafter enabled, according to Google's own benchmarks. On Apple Silicon, batch sizes of 4 to 8 requests unlock around 2.2x speedups. Not quite the 3x ceiling in every scenario, but still a meaningful difference between "barely usable" and "actually fast enough to work with."
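Why the gain varies by setup can be illustrated with the standard simplified model from the speculative decoding literature. It assumes each drafted token is accepted independently with the same probability, and that the drafter runs once per drafted token at a fixed fraction of the large model's cost. The acceptance rates, draft length and cost ratio below are hypothetical values for illustration, not Google's measurements.

```python
def expected_tokens_per_pass(accept_rate: float, draft_len: int) -> float:
    """Expected tokens produced per verification pass of the large model.

    Geometric-series result: (1 - a^(k+1)) / (1 - a) for acceptance
    rate a and draft length k, assuming independent acceptances.
    """
    if accept_rate >= 1.0:
        return float(draft_len + 1)
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


def estimated_speedup(accept_rate: float, draft_len: int,
                      draft_cost_ratio: float) -> float:
    """Estimated speedup over one-token-at-a-time decoding.

    draft_cost_ratio is the cost of one drafter step relative to one
    large-model pass (hypothetical input, e.g. 0.05 means 5 percent).
    """
    cost_per_cycle = 1.0 + draft_len * draft_cost_ratio
    return expected_tokens_per_pass(accept_rate, draft_len) / cost_per_cycle


# Hypothetical settings: 4 drafted tokens, drafter costs 5% of a target pass.
for rate in (0.6, 0.8, 0.9):
    print(f"acceptance {rate:.0%}: ~{estimated_speedup(rate, 4, 0.05):.2f}x")
```

In this model the speedup depends heavily on how often the drafter guesses right, which is one plausible reason measured gains differ across hardware and workloads.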

Google says the drafter unlocks "improved responsiveness: drastically reduce latency for near real-time chat, immersive voice applications and agentic workflows"—the kind of tasks that demand low latency to feel useful at all.

Use cases snap into focus quickly: A local coding assistant that doesn't lag; a voice interface that responds before you've forgotten what you asked; an agentic workflow that doesn't make you wait three seconds between steps. All of this, on hardware you already own.
