CES 2026: The On-Device AI Push and the Economics of Actually Shipping It
Key takeaways
- CES 2026 shifts from concept to production—Intel, AMD, Qualcomm shipping 50-80 TOPS NPU platforms with firm dates, 200+ PC designs committed
- Vendor battery claims (27hr to multi-day) reflect burst workloads under optimal conditions; sustained AI inference power draw remains the critical unanswered question
- Always-on AI requires continuous NPU operation—gap between 10-second demos and 10-hour runtime exposes thermal and power constraints absent from marketing materials
- 2026-27 products will split workloads: local for latency-sensitive tasks (autocomplete, voice), cloud for compute-intensive (image gen, current queries)—pure on-device AI still 1-2 generations out
Las Vegas, January 2026 – Three days into CES, and the vibe is different this year. Companies are calling out actual ship dates instead of the usual “coming soon” dodge. Intel says January 27. Qualcomm says first half. AMD’s already shipping. The on-device AI conversation finally moved from slide decks to purchase orders.
What Got Announced
Intel’s Core Ultra Series 3 is their first AI PC platform on the 18A process. Top SKUs hit around 50 NPU TOPS with 16 CPU cores. They’re claiming 27-hour battery, though that’s under their test conditions, not yours. Over 200 PC designs are supposedly in the pipeline. First units ship by the end of the month.
AMD brought Ryzen AI 400 Series. Up to 60 TOPS NPU, 12 high-performance cores, Radeon 800M graphics baked in. Marketing deck says multi-day battery for typical use: video playback, productivity, that kind of thing.
Qualcomm expanded Snapdragon X2 with the Plus variant. NPU performance around 80 TOPS for mid-range PCs. They’re pushing efficiency: 35% faster single-core, lower power draw versus last gen. Major OEMs lined up for H1 launches.
A few things jumped out:
- NPUs all land between 50 and 80 TOPS, regardless of vendor.
- Battery claims range from full-day to multi-day, depending on who you ask.
- Everyone is talking about running 7B-10B parameter models locally, provided you've got the RAM and do the optimization work (rough memory math in the sketch below).
- AI acceleration is standard now, not a premium-tier feature.
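How much RAM "enough" means is easy to estimate. A back-of-envelope sketch, where the bytes-per-weight figures are standard arithmetic and everything else (KV cache, activations, runtime overhead) is deliberately left out:

```python
# Back-of-envelope RAM check for running an N-parameter model locally.
# Weights only; real usage adds KV cache, activations, and runtime overhead.

def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory for model weights alone, in GiB."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

for params in (7, 10):
    for bits in (16, 8, 4):
        print(f"{params}B @ {bits}-bit: ~{weights_gb(params, bits):.1f} GiB weights")
```

A 7B model is roughly 13 GiB at fp16 but about 3.3 GiB at 4-bit, which is why "do the optimization work" is doing so much lifting in the vendor pitch.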
Outside PCs, Qualcomm showed Dragonwing IQ10 for industrial robots and humanoids. TransAI demoed an on-device meeting notetaker. Every booth’s message: inference is moving off the cloud, onto your hardware.
This Time It’s Not Vaporware
Intel cites hundreds of committed designs. Qualcomm's got OEMs queued for H1. AMD is already shipping in volume. These aren't concept units. The production timelines are locked.
Lenovo brought an updated ThinkPad X1S with new Space Frame internals for better thermals and sustained 30W power delivery. ASUS rolled out a full Copilot+ range from ultralight Zenbooks to beefed-up ProArt creator boxes. Real products, not showcase one-offs.
Microsoft’s Copilot+ spec gave everyone a target. Silicon vendors hit it. OEMs built systems. The ecosystem pieces exist.
But Here’s What Nobody’s Saying on Stage
When someone says “runs 10B-parameter models locally,” they’re talking about whether it’s physically possible. Not what it costs in watts to keep that running.
Intel’s 27-hour number? MobileMark testing. AMD’s multi-day? General use scenarios. Qualcomm’s efficiency gains? Compared to their own previous chips. These numbers tell you something. They don’t tell you what happens when you actually run sustained inference.
Burst vs. Always-On
Walking the demo floor, all you see are burst workloads. Generate one image. Transcribe one meeting. Summarize one doc. NPU spins up for a few seconds, does its thing, and goes back to sleep. That fits fine within current power budgets.
But the pitch for on-device AI isn't occasional party tricks. It's always-on intelligence: an assistant that tracks your context all day, vision processing that runs continuously, voice interfaces that listen around the clock. Those workloads behave differently. Average power draw matters. Thermal limits matter. The gap between "works for 10 minutes" and "works for 10 hours" is where most architectures hit the wall; the arithmetic below shows why.
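A minimal runtime sketch makes the gap concrete. Every number here is an assumption picked for illustration, not a measured figure from any of the platforms above:

```python
# Rough always-on runtime estimate: battery capacity divided by average draw.
# All constants are illustrative assumptions, not vendor measurements.

BATTERY_WH = 60.0   # typical thin-laptop battery (assumed)
BASELINE_W = 4.0    # display, SoC idle, radios (assumed)

def runtime_hours(npu_avg_w: float) -> float:
    return BATTERY_WH / (BASELINE_W + npu_avg_w)

# Burst workload: the NPU averages near zero over the day.
print(f"burst (~0.2 W avg NPU): {runtime_hours(0.2):.1f} h")
# Always-on inference: even a modest sustained draw dominates the budget.
print(f"always-on (~6 W avg NPU): {runtime_hours(6.0):.1f} h")
```

Under these assumptions the burst case barely dents the baseline (about 14 hours), while the always-on case cuts the day to roughly 6 hours.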
The Missing Data
A 50 TOPS NPU sounds impressive, but what's the sustained power draw when inference runs continuously? 60 TOPS looks great on paper, but how hard does it throttle in a thin chassis after 30 minutes of load?
Efficiency improvements are real, and drawing 43% less power than the previous generation matters. But if the baseline was already too high for continuous operation, a 43% cut just moves you closer to viable, not all the way there: trim the 6 W always-on figure in the sketch above by 43% and you get about 3.4 W, which still costs you several hours of runtime against the burst case.
What you can’t find in the press kits: detailed power curves under realistic loads. Continuous 7B model execution for assistant features. Always-on vision processing in wearables. The stuff that determines whether features can actually stay enabled during a workday.
Local vs. Cloud Math
Local inference has obvious wins: lower latency, better privacy, no network dependency, no API charges piling up. The costs are just as real: more expensive hardware, extra RAM, better thermal design, battery constraints on duty cycle, and you own the model-update and optimization problem.
For quick one-off tasks, cloud APIs cost almost nothing, mostly fractions of a cent. At thousands of daily inferences, on-device starts making economic sense, if the power budget allows it. With current silicon, that boundary is still narrow and workload-specific.
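A toy break-even calculation shows the shape of that boundary. Every constant below is an assumption chosen for illustration, not a quoted price:

```python
# Toy break-even between cloud API calls and local inference.
# Every constant is an assumption for illustration, not a quoted price.

CLOUD_COST_PER_CALL = 0.002   # $ per inference (assumed fraction of a cent)
HW_PREMIUM = 150.0            # $ extra for NPU/RAM on the device (assumed)
DEVICE_LIFE_DAYS = 3 * 365    # three-year service life (assumed)
ENERGY_PER_CALL_WH = 0.05     # ~3 W NPU for ~60 s of work (assumed)
ELECTRICITY_PER_KWH = 0.15    # $ per kWh (assumed)

def local_cost_per_day(calls: int) -> float:
    amortized_hw = HW_PREMIUM / DEVICE_LIFE_DAYS
    energy = calls * ENERGY_PER_CALL_WH / 1000 * ELECTRICITY_PER_KWH
    return amortized_hw + energy

def cloud_cost_per_day(calls: int) -> float:
    return calls * CLOUD_COST_PER_CALL

for calls in (10, 100, 1000):
    print(f"{calls:>5} calls/day: cloud ${cloud_cost_per_day(calls):.2f} "
          f"vs local ${local_cost_per_day(calls):.2f}")
```

Under these made-up numbers the crossover sits around 70 calls a day. Shift the hardware premium or the per-call price and it moves, which is exactly why the boundary stays workload-specific.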
What’ll Actually Ship
Most systems targeting 2026-27 will split workloads not by choice, but because they have to.
Frequent, latency-sensitive stuff runs on device: autocomplete, basic voice, real-time translation, contextual hints. Heavy or occasional tasks hit the cloud: complex image generation, long-form content, anything requiring current information, models too big for device memory.
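In code, that split is just a routing policy. A hypothetical sketch, where the task fields, thresholds, and function names are all invented for illustration rather than drawn from any vendor's SDK:

```python
# Hypothetical routing policy for a hybrid assistant. Names and thresholds
# are invented for illustration, not any vendor's API.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    latency_sensitive: bool   # needs a near-instant response?
    model_params_b: float     # model size the task needs, in billions
    needs_fresh_data: bool    # requires current information from the network?

DEVICE_MAX_PARAMS_B = 10.0    # what fits in local RAM (assumed)

def route(task: Task) -> str:
    # Anything needing fresh data or an oversized model must go to the cloud.
    if task.needs_fresh_data or task.model_params_b > DEVICE_MAX_PARAMS_B:
        return "cloud"
    # Latency-sensitive work that fits locally stays on device.
    if task.latency_sensitive:
        return "device"
    return "cloud"  # heavy or occasional work defaults to the cloud

for t in (Task("autocomplete", True, 3, False),
          Task("image generation", False, 30, False),
          Task("news summary", False, 7, True)):
    print(f"{t.name}: {route(t)}")
```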
This isn’t a lack of ambition. The hardware works. Economics forces selective execution.
What This Means If You’re Building
Design for sustained performance, not peak specs. Your power budget needs to reflect what the device can maintain over hours, not what the NPU can theoretically do for 10 seconds.
Decide early what runs where. Clear criteria for local versus cloud. Model optimization isn’t optional: quantization, pruning, and memory access patterns determine whether your feature runs all day or drains battery in two hours.
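On the optimization point, quantization is the first lever because the memory math is so stark. A minimal symmetric int8 example; real toolchains add per-channel scales and calibration data, so treat this as the mechanics only:

```python
# Minimal symmetric int8 weight quantization, to show the mechanics only.
# Production toolchains use per-channel scales and calibration data.
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)
q, scale = quantize_int8(w)
print(f"fp32: {w.nbytes / 2**20:.0f} MiB -> int8: {q.nbytes / 2**20:.0f} MiB")
print(f"max abs reconstruction error: {np.abs(w - dequantize(q, scale)).max():.4f}")
```

A single 4096x4096 fp32 layer drops from 64 MiB to 16 MiB, at the cost of a small reconstruction error. Multiply across a whole model and that's the difference between fitting in device RAM and not.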
Test with real usage patterns, not vendor benchmarks. Battery behavior under sustained load tells you whether your product works after launch. And be transparent about when you're running locally versus in the cloud: users care about the privacy and connectivity implications.
Where We Actually Are
The progress at CES 2026 is real. You can run mid-sized models locally on mainstream hardware with proper optimization. NPU performance enables actual on-device features.
What hasn’t changed: the energy cost of keeping those features running continuously. For always-on agents, battery is still the limiting factor. That’ll improve as silicon gets more efficient and model architectures evolve. Today, it defines what ships reliably versus what stays in the demo booth.
Bottom Line
CES 2026 shows that on-device AI has moved past the prototype phase. Platforms are shipping, the ecosystem is aligning, and purchase orders are real.
The question now is economic, not technical. Sustained power consumption, real-world battery behavior, and the boundary between local and cloud determine which products succeed outside the convention center.
Products that work in 2026 will be the ones designed around current limits, not the ones that pretend those limits don’t exist.