Issue #12 2 min read

AI Engineering Signal #12

Anthropic's Claude Mythos "thousands of zero-days" claim deflated

Share

Signals

Anthropic's Claude Mythos "thousands of zero-days" claim deflated

the security benchmark cited relies on just 198 manual reviews, making the headline numbers statistically thin and the White House push for banks to adopt it for vulnerability scanning look premature.

Web

Anthropic cache TTL silently dropped from 1h to 5min

undocumented regression in Claude Code since March 6th is inflating token costs for anyone relying on prompt caching.

GitHub

Gemma 4 speculative decoding hits +29% avg throughput, +50% on code

meaningful local inference speedup for 31B; audio processing also landed in llama.cpp and llama-server.

Reddit

Berkeley RDI: prominent AI agent benchmarks are exploitable

adversarial inputs can game top leaderboard scores, undermining eval credibility across the field.

Web

Universal surface-growth law confirmed in 2D after 40 years

KPZ universality class experimentally validated; foundational for stochastic models used in physics-informed ML.

Web

Orbital compute cluster now open

first commercial satellite-based compute offering live; latency and workload constraints still unclear but the substrate is real.

TechCrunch

Bryan Cantrill on laziness as a lost engineering virtue

argues that friction-free AI tooling is eroding the productive laziness that drove good abstractions; worth reading before dismissing.

Web

Get signals like this in your inbox

Daily AI engineering intelligence. No noise.

[ Subscribe ]

The Take

Benchmark credibility is collapsing from two directions at once — agent evals are gameable and security capability claims rest on sample sizes too small to generalize. Shipping AI into critical infrastructure on the back of either is a bet you should price accordingly.

Subscribe

Unsubscribe any time.

Related Signals