AI Engineering Signal #21
Signals
OpenAI ships GPT-5.5, positioning it as a step toward a unified "super app"
benchmark results are circulating but independent evals are not yet in, so treat capability claims as provisional until third-party numbers land.
Web
DeepSeek V4 drops, Flash variant priced aggressively
near-frontier performance at a fraction of API cost; worth benchmarking against your current stack this week.
Simon Willison
Qwen 3.6 27B ties Claude Sonnet 4.6 on agentic evals
a locally runnable open-weight model matching a top hosted model on agentic benchmarks is a meaningful threshold.
Qwen 3.6 27B running at 85 TPS on a single RTX 3090
125K context and vision on consumer hardware change the local deployment calculus significantly.
Web
Claude Code post-mortem published by its creator
public acknowledgment of quality regressions in an agentic coding tool is rare; read it before deploying Claude Code in CI.
Simon Willison
Google reports 75% of new code is now AI-generated
up from roughly 50% in 2025 and 25% in 2024; the velocity of adoption inside a major engineering org is the signal, not the number itself.
FairyFuse: multiplication-free LLM inference on CPUs via fused ternary kernels
if this holds up, it shifts the floor on what hardware can run inference without a GPU.
ArXiv
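For readers unfamiliar with the term, "multiplication-free" follows from the weights being ternary. The sketch below is not FairyFuse's kernel, which this item does not describe. It assumes only that ternary means each weight is -1, 0, or +1, and shows why a dot product then needs nothing but additions and subtractions. The function names are illustrative.

```python
from typing import List, Sequence


def ternary_dot(weights: Sequence[int], activations: Sequence[float]) -> float:
    """Dot product for weights restricted to -1, 0, or +1.

    Each term is either added, subtracted, or skipped, so no
    multiplication is ever performed.
    """
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    total = 0.0
    for w, x in zip(weights, activations):
        if w == 1:
            total += x
        elif w == -1:
            total -= x
        elif w != 0:
            raise ValueError(f"non-ternary weight: {w}")
    return total


def ternary_matvec(matrix: Sequence[Sequence[int]],
                   activations: Sequence[float]) -> List[float]:
    """Apply ternary_dot to every row of a ternary weight matrix."""
    return [ternary_dot(row, activations) for row in matrix]
```

A real CPU kernel would pack the weights into bits and vectorize the adds; this version only illustrates the arithmetic.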
The Take
The open-weight tier is closing the cost and capability gap with hosted models faster than most production roadmaps assumed, while hosted providers respond with new releases that lack independent evals at launch. The teams that benchmark DeepSeek V4 and Qwen 3.6 against their actual workloads this week will have real data; everyone else is guessing.
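Benchmarking against your own workload can start small. The harness below is a minimal sketch, not a recommended methodology: it assumes each model is wrapped in a plain Python callable that takes a prompt and returns a reply, and that each test case carries its own pass/fail check. The names `run_benchmark`, `models`, and `cases` are hypothetical, and no vendor API is called.

```python
import statistics
import time
from typing import Callable, Dict, List, Tuple

# A case pairs a prompt with a check applied to the model's reply.
Case = Tuple[str, Callable[[str], bool]]


def run_benchmark(models: Dict[str, Callable[[str], str]],
                  cases: List[Case]) -> Dict[str, Dict[str, float]]:
    """Run every case against every model; report pass rate and median latency."""
    if not cases:
        raise ValueError("at least one case is required")
    results: Dict[str, Dict[str, float]] = {}
    for name, generate in models.items():
        latencies: List[float] = []
        passed = 0
        for prompt, check in cases:
            start = time.perf_counter()
            reply = generate(prompt)
            latencies.append(time.perf_counter() - start)
            if check(reply):
                passed += 1
        results[name] = {
            "pass_rate": passed / len(cases),
            "median_latency_s": statistics.median(latencies),
        }
    return results
```

Swap the stub callables for thin wrappers around whichever clients you already use, and draw the cases from real traffic rather than public benchmark prompts.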