AI Engineering Signal #16
Claude Opus 4.7 ships 50% more expensive and immediately benchmarks worse than Opus 4.6 on long-context tasks
Signals
Claude Opus 4.7 ships 50% more expensive and immediately benchmarks worse than Opus 4.6 on long-context tasks
the Thematic Generalization Benchmark dropped from 80.6 to 72.8, and MRCR long-context performance regressed noticeably, with power users reporting consistent quality drops across coding and reasoning tasks.
Qwen3.6-35B-A3B released, beats Opus 4.7 on image tasks
35B active params from 109B total, runs locally, outdraws frontier models on at least one creative benchmark.
Web
OpenAI Codex expanded to "almost everything"
agentic desktop control now in Codex; hacked a Samsung TV and wrote a Chrome exploit in documented tests, raising real security surface questions.
Web
Physical Intelligence claims robot brain generalizes to untaught tasks
if the claim holds under scrutiny, this is the embodied-AI generalization milestone people have been waiting for.
TechCrunch
Claude identity verification now requires passport or facial scan
driving local-model adoption; Anthropic is tightening access controls at the API edge.
Web
Researchers reveal new method for tuning superconductivity
materials-level control over superconducting state has direct implications for future compute substrate economics.
Web
Cloudflare launches inference-native AI platform for agents
purpose-built inference layer with agent-aware routing; worth evaluating if you're running multi-step agent workloads at scale.
Web
The Take
Anthropic shipped a regression at a price increase on the same week an open-weight local model beat it on a creative benchmark — the gap between frontier API cost and open-weight capability is closing faster than the labs' release cadence can justify. If your stack is Opus-dependent, benchmark before you upgrade.
Subscribe
Related Signals