AI Engineering Signal #12
Anthropic's Claude Mythos "thousands of zero-days" claim deflated
Signals
Anthropic's Claude Mythos "thousands of zero-days" claim deflated
the security benchmark cited relies on just 198 manual reviews, making the headline numbers statistically thin and the White House push for banks to adopt it for vulnerability scanning look premature.
Web
Anthropic cache TTL silently dropped from 1h to 5min
undocumented regression in Claude Code since March 6th is inflating token costs for anyone relying on prompt caching.
GitHub
Gemma 4 speculative decoding hits +29% avg throughput, +50% on code
meaningful local inference speedup for 31B; audio processing also landed in llama.cpp and llama-server.
Berkeley RDI: prominent AI agent benchmarks are exploitable
adversarial inputs can game top leaderboard scores, undermining eval credibility across the field.
Web
Universal surface-growth law confirmed in 2D after 40 years
KPZ universality class experimentally validated; foundational for stochastic models used in physics-informed ML.
Web
Orbital compute cluster now open
first commercial satellite-based compute offering live; latency and workload constraints still unclear but the substrate is real.
TechCrunch
Bryan Cantrill on laziness as a lost engineering virtue
argues that friction-free AI tooling is eroding the productive laziness that drove good abstractions; worth reading before dismissing.
Web
The Take
Benchmark credibility is collapsing from two directions at once — agent evals are gameable and security capability claims rest on sample sizes too small to generalize. Shipping AI into critical infrastructure on the back of either is a bet you should price accordingly.
Subscribe
Related Signals