AI Engineering Weekly #3
Google's TurboQuant memory market impact
Signals
Google's TurboQuant memory market impact
a technical clarification thread on r/LocalLLaMA breaks down the TurboQuant/RaBitQ quantization claims; separately, The Register reports that memory-maker shares dropped and some RAM prices eased, with Google's quantization work cited as a contributing factor; if aggressive model compression is already moving hardware markets, your infrastructure cost assumptions from six months ago are stale.
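To see why bit width matters for repricing, here is a back-of-the-envelope sketch. It is generic arithmetic and not TurboQuant's actual method or its reported ratios: the 7-billion value count and the bit widths are illustrative assumptions, and real schemes add overhead for scales and metadata.

```python
def tensor_memory_gb(num_values: float, bits_per_value: float) -> float:
    """Raw storage for `num_values` numbers at a given bit width, in GB (10^9 bytes).

    Ignores quantization metadata (scales, codebooks), so treat it as a lower bound.
    """
    return num_values * bits_per_value / 8 / 1e9


# Hypothetical example: 7 billion stored values (weights or cached activations).
NUM_VALUES = 7e9

for bits in (16, 8, 4):
    print(f"{bits:>2} bits/value: {tensor_memory_gb(NUM_VALUES, bits):.1f} GB")
# 16 bits/value: 14.0 GB
#  8 bits/value: 7.0 GB
#  4 bits/value: 3.5 GB
```

Rerunning a budget spreadsheet with the bit widths you can actually deploy is a quick way to check whether last year's memory sizing still holds.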
Axios npm package compromised
malicious versions were pushing a remote access trojan via one of the most-downloaded JS packages in existence; audit your supply chain now, especially any CI/CD that pulls Axios transitively.
Web
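A minimal sketch of the transitive audit, assuming an npm lockfile in the lockfileVersion 2 or 3 layout (a top-level `packages` map keyed by install path). The helper names are hypothetical, and the set of bad versions is deliberately left for you to fill in from the official advisory.

```python
import json
from pathlib import Path


def find_package_versions(lock: dict, name: str) -> list[tuple[str, str]]:
    """Return (install path, version) for every copy of `name` in the lockfile.

    Nested copies such as "node_modules/foo/node_modules/axios" are how
    transitive dependencies show up, so they are included.
    """
    hits = []
    for path, meta in lock.get("packages", {}).items():
        # The text after the last "node_modules/" is the installed package name.
        if path.rpartition("node_modules/")[2] == name:
            hits.append((path, meta.get("version", "unknown")))
    return sorted(hits)


def flag_suspect(hits: list[tuple[str, str]], suspect_versions: set[str]) -> list[tuple[str, str]]:
    """Keep only the hits whose version is in a caller-supplied suspect set."""
    return [(path, version) for path, version in hits if version in suspect_versions]


if __name__ == "__main__":
    lock_path = Path("package-lock.json")  # assumed location: current directory
    if lock_path.exists():
        lock = json.loads(lock_path.read_text(encoding="utf-8"))
        for path, version in find_package_versions(lock, "axios"):
            print(f"{version}\t{path}")
```

Run it in each repository and CI image that installs from a lockfile, then pass the versions named in the advisory to `flag_suspect`.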
Ollama now powered by MLX on Apple Silicon (preview)
native MLX backend means local inference on Mac gets a meaningful speed and efficiency bump without any user-side changes; worth testing if you're running dev workflows on Apple hardware.
Web
Google multi-agent study: 180 setups tested, multi-agent degraded performance
a community post summarizing Google's internal findings claims multi-agent architectures made outcomes worse in the majority of configurations tested; matches what practitioners building in production have been reporting for a year.
Qwen 3.5-Omni results published by Alibaba, Qwen 3.6 spotted in model registry
two data points in 24 hours signal Alibaba is shipping fast; if you're evaluating open-weight multimodal options, the Qwen family is moving faster than most Western alternatives right now.
Web
LiteLLM drops Delve as a vendor
the most widely used open-source LLM gateway cut ties with a controversial observability startup; if you're using LiteLLM in prod, check whether any Delve integration is in your stack and what replaces it.
TechCrunch
Claude Code source partially leaked via npm map file
a source map left in the published npm package exposed Claude Code's internal structure; Anthropic hasn't commented publicly, but this is a reminder that shipping CLI tools via npm with source maps enabled is an opsec failure mode worth auditing in your own tooling.
Web
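One way to audit your own tooling is to scan the directory you are about to publish (for example, an unpacked tarball from `npm pack`) for source maps. The sketch below is an illustrative check with a hypothetical function name, not a description of how the Claude Code package was built or inspected.

```python
from pathlib import Path

# File types that commonly carry a trailing "//# sourceMappingURL=..." comment.
JS_SUFFIXES = {".js", ".mjs", ".cjs"}


def find_source_map_exposure(package_dir: str) -> dict[str, list[str]]:
    """List shipped .map files and JS files that reference a source map."""
    root = Path(package_dir)
    map_files = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*.map") if p.is_file()
    )
    references = []
    for p in root.rglob("*"):
        if p.is_file() and p.suffix in JS_SUFFIXES:
            text = p.read_text(encoding="utf-8", errors="ignore")
            if "sourceMappingURL=" in text:
                references.append(p.relative_to(root).as_posix())
    return {"map_files": map_files, "references": sorted(references)}
```

A non-empty `map_files` list means the maps themselves would ship. A non-empty `references` list with no map files usually just means a dangling comment, which is harmless but worth cleaning up.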
The Take
Supply chain (Axios), quantization economics (TurboQuant), and multi-agent reliability are all converging on the same message: the defaults you set up last year are wrong. Audit your npm dependencies, reprice your inference infrastructure assumptions, and stop defaulting to multi-agent architectures until you have evidence they help your specific workload.