AI Engineering Weekly #6
Gemma 4 drops across 1B/4B/12B/27B dense and 124B MoE
Signals
Gemma 4 drops across 1B/4B/12B/27B dense and 124B MoE
Google's open-weight push directly targets Qwen 2.5 and DeepSeek-V3 on the efficiency curve. Community benchmarks show meaningful gains over Gemma 3, every size is multimodal, and the 124B MoE is already confirmed open-weight. Safety mitigations were jailbroken within 90 minutes of release, which is par for the course but worth noting before you deploy it in anything customer-facing.
Simon Willison
Qwen3.6-Plus targets real-world agentic use
Alibaba positions this as an agent-optimized release; worth benchmarking against Llama 4 Scout for tool-use tasks before committing to a stack.
Web
Microsoft releases three new in-house models covering speech and image modalities
Signals that Microsoft is reducing its dependency on OpenAI for non-text workloads. No independent evals yet, so treat it as "announced, not proven."
TechCrunch
Bankai: post-training adaptation method for true 1-bit LLMs
If it holds up under scrutiny, this is a path to running capable models on edge hardware without the quality loss that quantization usually brings; check the GitHub repo before the paper gets cited everywhere.
GitHub
ArXiv: Scaling Reasoning Tokens via RL and Parallel Thinking on competitive programming
Evidence that RL-driven parallel reasoning chains improve performance on hard coding tasks; relevant if you're building code-gen agents and haven't explored parallel chain sampling.
ArXiv
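For readers who haven't tried parallel chain sampling, here is a minimal sketch of the simplest version: sample several independent chains concurrently and take a majority vote over the final answers. This is plain self-consistency voting, not the RL method from the paper. The `sampler` callable and its `(problem, chain_index)` signature are hypothetical placeholders for whatever model client you use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def parallel_vote(
    problem: str,
    sampler: Callable[[str, int], str],  # hypothetical: (problem, chain index) -> final answer
    n_chains: int = 8,
    max_workers: int = 4,
) -> tuple[str, float]:
    """Run n_chains independent samples concurrently and majority-vote the answers.

    Returns the winning answer and the share of chains that agreed with it.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each chain gets its own index so the sampler can vary seed or temperature.
        answers = list(pool.map(lambda i: sampler(problem, i), range(n_chains)))
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / n_chains


if __name__ == "__main__":
    # Stub sampler standing in for a model call: every third chain disagrees.
    stub = lambda problem, i: "B" if i % 3 == 0 else "A"
    print(parallel_vote("toy problem", stub, n_chains=9))
```

The agreement share doubles as a rough confidence signal: a low share suggests the chains disagree and the task may need more samples or a verifier step.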
r/AI_Agents: $12k/month on agents where 80% are just talking to each other
A practitioner found that most of their agentic spend is inter-agent chatter with no external action. A useful calibration check for anyone running multi-agent pipelines.
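To run the same check on your own pipeline, a rough sketch: given a log of calls with a sender, a receiver, and a cost, compute the share of spend where both ends are agents. The `Call` record and its field names are assumptions for illustration; map them to whatever your tracing or billing export actually provides.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class Call:
    # Hypothetical log record; adapt field names to your own tracing export.
    sender: str
    receiver: str
    cost_usd: float


def chatter_share(calls: Iterable[Call], agents: Set[str]) -> float:
    """Fraction of total spend on calls where both sender and receiver are agents."""
    calls = list(calls)
    total = sum(c.cost_usd for c in calls)
    if total == 0:
        return 0.0
    internal = sum(
        c.cost_usd for c in calls if c.sender in agents and c.receiver in agents
    )
    return internal / total


if __name__ == "__main__":
    agents = {"planner", "coder"}
    log = [
        Call("planner", "coder", 8.0),     # agent-to-agent
        Call("coder", "github_api", 2.0),  # external action
    ]
    print(f"inter-agent share of spend: {chatter_share(log, agents):.0%}")
```

A high share is not automatically waste, but it is a prompt to ask which of those exchanges actually change an external outcome.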
AMD Lemonade: fast open-source local LLM server using both GPU and NPU
Targets on-device inference with NPU offload on AMD hardware; worth watching for edge deployments where CUDA isn't an option.
Web
The Take
Gemma 4 and Qwen3.6-Plus arriving within the same news cycle means the open-weight tier is now genuinely competitive with hosted APIs for most production workloads, and the cost argument for proprietary APIs just got harder to defend. Run your eval suite against both before your next model contract renewal.
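If you don't have an eval suite yet, a side-by-side comparison can start very small. The sketch below assumes each model is wrapped as a plain `prompt -> text` callable and scores exact-match accuracy on your own cases. The model names, stub functions, and the exact-match metric are all placeholders; swap in your real clients and whatever scoring fits your workload.

```python
from typing import Callable, Dict, List, Tuple

EvalCase = Tuple[str, str]  # (prompt, expected answer)


def accuracy(model: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Exact-match accuracy of one model over the eval cases."""
    if not cases:
        return 0.0
    hits = sum(model(prompt).strip() == expected for prompt, expected in cases)
    return hits / len(cases)


def compare(
    models: Dict[str, Callable[[str], str]], cases: List[EvalCase]
) -> Dict[str, float]:
    """Score every model on the same cases so the numbers are directly comparable."""
    return {name: accuracy(fn, cases) for name, fn in models.items()}


if __name__ == "__main__":
    cases = [("2+2=", "4"), ("capital of France?", "Paris")]
    # Stub callables standing in for real model clients.
    models = {
        "model_a": lambda p: "4" if "2+2" in p else "Paris",
        "model_b": lambda p: "4",
    }
    for name, score in compare(models, cases).items():
        print(f"{name}: {score:.0%}")
```

Running every candidate on the same fixed cases matters more than the metric itself, since that is what makes the comparison hold up in a renewal conversation.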