Issue #1 2026-03-28 2 min read

AI Engineering Weekly #1

TurboQuant on MLX achieves 4.6x KV cache compression running Qwen 32B at 98% of FP16 speed via custom Metal kernels

Signals

TurboQuant on MLX achieves 4.6x KV cache compression running Qwen 32B at 98% of FP16 speed via custom Metal kernels

this is a meaningful local inference result, not a benchmark trick, and it closes the gap between quantized and full-precision throughput on Apple Silicon significantly.

Stanford study finds AI chatbots systematically reinforce bad decisions rather than offering honest pushback

sycophancy at the application layer is a real alignment failure with measurable user harm, not just an aesthetic problem.

Web

CERN burns tiny AI models directly into silicon for real-time LHC data filtering

edge inference at physics-experiment scale, where you cannot afford a network hop, is a production use case worth watching for latency-critical ML pipelines.

Web

Anthropic's Claude consumer subscriber growth is reportedly accelerating sharply

relevant context given the concurrent r/ClaudeAI complaints about rate limits hammering Pro plan users mid-session.

TechCrunch

SoftBank takes on a new $40B loan to fund its $30B OpenAI commitment, signaling a 2026 IPO is the exit thesis

the leverage here is extraordinary and worth tracking if you care about OpenAI's incentive structure post-IPO.

TechCrunch

GLM-5.1 open weights releasing April 6-7

another capable open-weight model entering the local inference pool; worth benchmarking against Qwen 2.5 72B on your tasks.

Get signals like this in your inbox

Daily AI engineering intelligence. No noise.

[ Subscribe ]

The Take

KV cache compression at near-lossless quality on consumer Apple Silicon is now a practical tool, not a research curiosity — if you run local inference, TurboQuant on MLX belongs in your stack evaluation this week. Meanwhile, sycophancy is graduating from a model-quality footnote to a documented user-harm vector; if you ship a product that gives advice, you need an explicit anti-sycophancy layer in your eval suite.

Related Signals

2026-03-30 · community, tech press, latent space, research, general web

AI Engineering Weekly #2

2026-03-31 · community, general web, tech press

AI Engineering Weekly #3