Issue #32 · 2026-05-06

AI Engineering Signal #32

Multi-token prediction (MTP) drafters become practical for local models

Signals

Multi-token prediction (MTP) drafters become practical for local models

2.5× faster inference on Qwen 3.6 27B shifts what runs on-device without a GPU.

Web

US national security AI testing agreements signed

formalizes pre-release safety evaluations for DeepMind, Microsoft, and xAI, likely adding friction to future rollouts.

Web

Computer-use agents cost 45× more than structured APIs

driving a GUI through a computer-use agent remains uneconomical for production at scale when a structured API is available.

Web

Cloudflare lets agents create accounts, buy domains, and deploy

autonomous infra management moves from demo to production.

Web

Apple iOS 27 to offer third-party model selection

could shift inference volume from closed APIs to on-device and open models.

TechCrunch

China builds first coal-based battery

novel anode material promises cheaper energy storage, with long-term implications for data center economics.

Web

The Take

Intelligence cost is bifurcating. MTP and quantization make local inference fast enough to move agentic coding off the cloud, while cloud-based computer-use agents remain too expensive for broad deployment. Consumer model choice and infrastructure self-provisioning squeeze closed API margins from both sides, just as governments formalize what frontier labs must prove before shipping.
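
For readers new to drafters, here is a minimal sketch of the draft-and-verify loop behind the MTP speedup. Everything in it is a hypothetical stand-in: `target_next` and `draft_next` are toy deterministic rules rather than real language models, and `k` is an illustrative draft length. The sketch only shows why accepted draft tokens cut the number of expensive target-model passes while leaving greedy output unchanged.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]  # maps a token context to the next token


def target_next(ctx: List[int]) -> int:
    """Hypothetical 'large' model: a deterministic toy rule, not a real LM."""
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_next(ctx: List[int]) -> int:
    """Hypothetical drafter: agrees with the target unless len(ctx) % 4 == 0."""
    tok = target_next(ctx)
    return (tok + 1) % 50 if len(ctx) % 4 == 0 else tok


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: Model, drafter: Model, prompt: List[int], n_new: int, k: int = 3
) -> Tuple[List[int], int]:
    """Greedy draft-and-verify. Returns (tokens, number of verification rounds)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(out) < goal:
        # 1. The cheap drafter proposes k tokens ahead.
        draft: List[int] = []
        ctx = list(out)
        for _ in range(k):
            tok = drafter(ctx)
            draft.append(tok)
            ctx.append(tok)

        # 2. The target checks the proposals. A real engine scores all
        #    positions in one batched forward pass, so each round is
        #    counted as a single target pass here (an assumption of this toy).
        target_passes += 1
        ctx = list(out)
        for tok in draft:
            expected = target(ctx)
            ctx.append(expected)  # always keep the target's own token
            if expected != tok:
                break  # first mismatch: discard the rest of the draft
        else:
            ctx.append(target(ctx))  # every draft accepted: one bonus token
        out = ctx[:goal]
    return out, target_passes


prompt = [1, 2, 3]
baseline = greedy_decode(target_next, prompt, 12)
tokens, passes = speculative_decode(target_next, draft_next, prompt, 12, k=3)
print(tokens == baseline, passes)
```

Because every kept token is the target's own choice, the output matches plain greedy decoding; the saving comes only from needing fewer target rounds. Real-world gains depend on how often the drafter agrees with the target and how cheap the drafter is, so this toy does not reproduce the 2.5× figure reported above.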
