AI Engineering Signal #32
Multi-token prediction (MTP) drafters become practical for local models
Signals
Multi-token prediction (MTP) drafters become practical for local models
2.5× faster inference on Qwen 3.6 27B shifts what runs on-device without a GPU.
Web
US national security AI testing agreements signed
Formalizes pre-release safety evaluations for DeepMind, Microsoft, and xAI, likely adding friction to future rollouts.
Web
Computer-use agents cost 45× more than structured APIs
Agentic tool use remains uneconomical for production at scale.
Web
Cloudflare lets agents create accounts, buy domains, and deploy
Autonomous infra management moves from demo to production.
Web
Apple iOS 27 to offer third-party model selection
Could shift inference volume from closed APIs to on-device and open models.
TechCrunch
China builds first coal-based battery
Novel anode material promises cheaper energy storage, with long-term implications for data center economics.
Web
The Take
Intelligence cost is bifurcating. MTP and quantization make local inference fast enough to move agentic coding off the cloud, while cloud-based computer-use agents remain too expensive for broad deployment. Consumer model choice and infrastructure self-provisioning squeeze closed API margins from both sides, just as governments formalize what frontier labs must prove before shipping.