Issue #6 2026-04-03 2 min read

AI Engineering Weekly #6

Gemma 4 drops across 1B/4B/12B/27B dense and 124B MoE

Signals

Gemma 4 drops across 1B/4B/12B/27B dense and 124B MoE

Google's open-weight push directly targets Qwen 2.5 and DeepSeek-V3 on the efficiency curve, with community benchmarks showing meaningful gains over Gemma 3, multimodal capability across all sizes, and the 124B MoE already confirmed open-weight. Safety mitigations were shredded via jailbreak within 90 minutes of release, which is par for the course but worth noting before you deploy it in anything customer-facing.

Simon Willison

Qwen3.6-Plus targets real-world agentic use

Alibaba positions this as an agent-optimized release; worth benchmarking against Llama 4 Scout for tool-use tasks before committing to a stack.

Web

Microsoft releases three new in-house models covering speech and image modalities

This signals that Microsoft is reducing its dependency on OpenAI for non-text workloads. There are no independent evals yet, so treat it as "announced, not proven."

TechCrunch

Bankai: post-training adaptation method for true 1-bit LLMs

If it holds up under scrutiny, this is the path to running capable models on edge hardware without the quality loss that quantization brings. Check the GitHub repo before the paper gets cited everywhere.

GitHub

ArXiv: Scaling Reasoning Tokens via RL and Parallel Thinking on competitive programming

Offers evidence that RL-driven parallel reasoning chains improve performance on hard coding tasks. Relevant if you're building code-gen agents and haven't explored parallel chain sampling.

ArXiv
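
For readers who haven't tried parallel chain sampling, here is a minimal sketch of the simplest variant: sample several independent chains concurrently and keep the answer most of them agree on. This is plain majority voting, not the RL method from the paper. The `generate(prompt, seed)` callable is a hypothetical stand-in for whatever model client you use, and `fake_generate` is a toy stub so the example runs on its own.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def sample_parallel(generate: Callable[[str, int], str], prompt: str, n: int = 8) -> list[str]:
    """Run n independent generations concurrently, one per seed.

    `generate` is a hypothetical model call, not a real library API.
    """
    with ThreadPoolExecutor(max_workers=n) as pool:
        # pool.map preserves the order of the seeds in its results
        return list(pool.map(lambda seed: generate(prompt, seed), range(n)))


def majority_answer(answers: list[str]) -> tuple[str, float]:
    """Return the most common answer and the fraction of chains that agreed on it."""
    counts = Counter(a.strip() for a in answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


def fake_generate(prompt: str, seed: int) -> str:
    """Toy stub: every fourth chain (seeds 0, 4, ...) returns a different answer."""
    return "42" if seed % 4 else "41"


answers = sample_parallel(fake_generate, "What is 6 * 7?", n=8)
best, agreement = majority_answer(answers)
print(best, agreement)
```

Voting only works when the final answer can be compared directly. For code generation, a more natural selector is to run each candidate against tests and keep one that passes.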

r/AI_Agents: $12k/month on agents where 80% are just talking to each other

A practitioner found that their agentic spend is mostly inter-agent chatter with no external action. It's a useful calibration check for anyone running multi-agent pipelines.
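
To run the same check on your own pipeline, one rough approach is to tag every billed call with its source and target, then compute the share of spend where both ends are agents. The sketch below assumes a hypothetical log schema (`Call` with `source`, `target`, `cost_usd`) and made-up agent names; adapt it to whatever your tracing or billing export provides.

```python
from dataclasses import dataclass


@dataclass
class Call:
    source: str      # who initiated the call, e.g. "planner" (hypothetical name)
    target: str      # another agent, or an external tool / user
    cost_usd: float  # billed cost attributed to this call


def inter_agent_share(calls: list[Call], agents: set[str]) -> float:
    """Fraction of total spend on calls where both source and target are agents."""
    total = sum(c.cost_usd for c in calls)
    if total == 0:
        return 0.0
    internal = sum(
        c.cost_usd for c in calls if c.source in agents and c.target in agents
    )
    return internal / total


# Illustrative data only, not taken from the Reddit post
agents = {"planner", "critic", "writer"}
calls = [
    Call("planner", "critic", 4.0),      # agent to agent
    Call("critic", "writer", 4.0),       # agent to agent
    Call("writer", "crm_api_tool", 2.0), # agent to external tool
]
print(f"{inter_agent_share(calls, agents):.0%} of spend is inter-agent")
```

A high share is not automatically waste, but it is a prompt to ask which of those exchanges actually change the final external action.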

AMD Lemonade: fast open-source local LLM server using both GPU and NPU

Targets on-device inference with NPU offload on AMD hardware. Worth watching for edge deployment use cases where CUDA isn't an option.

Web

The Take

Gemma 4 and Qwen3.6-Plus arriving within the same news cycle means the open-weight tier is now genuinely competitive with hosted APIs for most production workloads, and the cost argument for proprietary APIs just got harder to defend. Run your eval suite against both before your next model contract renewal.
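
If you don't have an eval suite wired up yet, a side-by-side comparison can start very small. The sketch below assumes each model is wrapped in a hypothetical `prompt -> completion` callable and scores by substring match, which is a crude stand-in for a real grader. The two toy models are placeholders so the example runs without any API.

```python
from typing import Callable

Model = Callable[[str], str]  # hypothetical wrapper: prompt in, completion out


def pass_rate(model: Model, cases: list[tuple[str, str]]) -> float:
    """Share of (prompt, expected) cases where the output contains `expected`."""
    if not cases:
        return 0.0
    passed = sum(1 for prompt, expected in cases if expected in model(prompt))
    return passed / len(cases)


def compare(models: dict[str, Model], cases: list[tuple[str, str]]) -> dict[str, float]:
    """Run every model over the same cases so the scores are directly comparable."""
    return {name: pass_rate(model, cases) for name, model in models.items()}


# Toy stand-ins; replace with real client calls for the models you are evaluating
def toy_model_a(prompt: str) -> str:
    return prompt.upper()


def toy_model_b(prompt: str) -> str:
    return "yes OK"


cases = [("ok", "OK"), ("no", "yes")]
scores = compare({"model_a": toy_model_a, "model_b": toy_model_b}, cases)
print(scores)
```

The point of sharing one `cases` list is that both candidates see identical prompts, so any difference in score comes from the model and not from the harness.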
