Perplexity Wants Your Laptop to Do Part of the AI Work

Perplexity Wants Your Laptop to Do Part of the AI Work—So It Doesn't Have To

Jun 4, 2026

"The right goal for an AI system is to deliver the most token value per watt, for each user," Perplexity wrote in the official announcement. Three competing pressures make that hard: accuracy demands the most capable models, privacy demands some data never leaves your machine, and cost demands you don't spend a frontier model's computing resources on a task a smaller one can handle.

The solution Perplexity calls "hybrid agentic inference" addresses all three at once. A compact model runs locally on your device and acts as a traffic cop—figuring out which information is sensitive enough to stay local and which tasks need the full power of a cloud-based frontier model.

"Hybrid agentic inference is for work that includes sensitive data but needs powerful AI. Things like financial records, health information, and personal files," the company explained. "The compact model runs locally on your device to determine when sensitive data should also be kept locally. Meanwhile, work that needs a frontier model's full capability runs on the server."

Should you care about it?

Inference—the process of running a trained AI model to generate a response—is the computational work that happens every time you send a prompt to a chatbot. Right now, almost all of it happens on remote servers owned by AI companies. That means your financial documents, health queries, and private notes travel to someone else's computer before you get an answer back.

Serving every prompt with a frontier model is expensive, which is why you see “Auto” or “low thinking” modes on your chatbot. AI companies have a strong incentive to route each interaction to the cheapest mode they can.

Perplexity CEO Aravind Srinivas has been direct about this. In a Bloomberg Television interview at Computex, he said the quiet part out loud: "You don't want all your compute centralized in servers and everything running through the largest models. Some people are spending half a billion dollars per month. What you actually want is efficient value per watt per user." Offloading inference work to user hardware reduces those bills for Perplexity.

Local inference suits those companies because it cuts much of that cost, but it also has a major advantage for users: it keeps their data on their own machine. The tradeoff has always been power: smaller models that run locally are less capable than the large ones living in data centers.

Perplexity's orchestrator tries to get both. Simple tasks—summarizing a document you've already written, formatting text, lightweight classification—run locally. Complex reasoning gets routed to the cloud, ideally without the sensitive parts of your task attached. The company says this happens automatically, mid-task, invisible to the user. Whether the routing is as reliable in practice as it sounds in a Computex demo is a question the July rollout will answer.
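Perplexity has not published how its orchestrator makes these decisions, so the following is only a minimal sketch of the general idea, not the company's implementation. It assumes a hypothetical list of "simple" task types and uses two regular expressions as a stand-in for whatever the local model would flag as sensitive. Every name here (`LOCAL_TASKS`, `SENSITIVE_PATTERNS`, `redact`, `route`) is invented for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns standing in for a local model's sensitivity check.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

# Hypothetical task types assumed simple enough for a compact local model.
LOCAL_TASKS = {"summarize", "format", "classify"}


@dataclass
class Route:
    target: str   # "local" or "cloud"
    payload: str  # the text that would be handed to that target


def redact(text: str) -> str:
    """Replace each sensitive match with a placeholder such as [SSN]."""
    for label, pattern in SENSITIVE_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text


def route(task_type: str, text: str) -> Route:
    """Keep simple tasks local; send the rest to the cloud, redacted."""
    if task_type in LOCAL_TASKS:
        return Route("local", text)
    return Route("cloud", redact(text))


if __name__ == "__main__":
    print(route("summarize", "My SSN is 123-45-6789"))
    print(route("reason", "My SSN is 123-45-6789, email jane@example.com"))
```

In this sketch the simple task keeps its full text on the device, while the complex one goes to the cloud with placeholders in place of the matched values. A real system would need far more than pattern matching to decide what counts as sensitive, which is exactly the reliability question the rollout has to answer.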

Who else is doing this

Every major player in AI is pushing toward on-device or hybrid inference right now. Apple Intelligence runs its most sensitive processing locally on M-series chips. Microsoft's Foundry Local reached general availability in April 2026, enabling full AI inference on Windows, macOS, and Linux without cloud dependency.

Perplexity's differentiation is the orchestration layer. Rather than asking users to pick local or cloud up front, the system decides per task, in real time. Srinivas said the approach is "chip agnostic"—the Computex demo ran on Intel Core Ultra Series 3, but Nvidia processors are also supported. The feature is currently exclusive to the Perplexity for Windows PC app, with a broader rollout timeline not yet confirmed.
