AI Engineering Weekly Digest #8
Signals
Anthropic deprecates Extended Thinking for Claude Opus 4.6 and Sonnet 4.6, replacing it with Adaptive Thinking enforced by default
any pipeline that explicitly passes extended_thinking mode will break at cutover, with no opt-out. This is a silent breaking change for reasoning-heavy workflows: prompts tuned around extended thinking budgets, system prompts that reference the mode, and cost models built on its token patterns all need auditing now. If you ship against the Claude API and haven't checked, assume breakage.
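One way to start the audit is a plain text search over code, configs, and prompt files. The sketch below is a minimal example, not a complete audit tool: the search term `extended_thinking` is taken from this item, while the function name `find_references` and the list of file suffixes are assumptions to adapt to your own repository.

```python
from pathlib import Path

# Assumption: "extended_thinking" is the identifier named in this digest item.
# Add whatever spellings your own code, configs, and prompts actually use.
SEARCH_TERMS = ("extended_thinking",)

# Assumption: file types worth scanning; extend for your stack.
SUFFIXES = {".py", ".ts", ".js", ".json", ".yaml", ".yml", ".md", ".txt"}


def find_references(root, terms=SEARCH_TERMS, suffixes=SUFFIXES):
    """Return (path, line_number, line) for every line that mentions a term."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
        for number, line in enumerate(text.splitlines(), start=1):
            lowered = line.lower()
            if any(term in lowered for term in terms):
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling `find_references(".")` from a repository root returns one tuple per matching line. A text search only finds literal mentions, so cost models and prompts that depend on the mode without naming it still need a manual review.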
AWS Bedrock runaway Claude bill hits $30K
unattended agents need spend caps, per-session token budgets, and kill switches before production.
Web
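The three controls named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production guard: `SessionBudget`, `BudgetExceeded`, and `run_agent` are hypothetical names, the flat `usd_per_1k_tokens` rate is a simplification of real pricing, and each agent step is assumed to report how many tokens it consumed.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a session must stop making paid API calls."""


@dataclass
class SessionBudget:
    max_tokens: int            # per-session token budget
    max_cost_usd: float        # spend cap for the session
    usd_per_1k_tokens: float   # assumption: one flat blended rate
    tokens_used: int = 0
    killed: bool = False

    @property
    def cost_usd(self) -> float:
        return self.tokens_used / 1000 * self.usd_per_1k_tokens

    def kill(self) -> None:
        """Kill switch: an operator or monitor can stop the session."""
        self.killed = True

    def check(self) -> None:
        """Call before every paid request; raises if the session must stop."""
        if self.killed:
            raise BudgetExceeded("kill switch engaged")
        if self.tokens_used >= self.max_tokens:
            raise BudgetExceeded(f"token budget reached: {self.tokens_used}")
        if self.cost_usd >= self.max_cost_usd:
            raise BudgetExceeded(f"spend cap reached: ${self.cost_usd:.2f}")

    def record(self, tokens: int) -> None:
        """Call after every paid request with the tokens it consumed."""
        self.tokens_used += tokens


def run_agent(steps, budget: SessionBudget) -> int:
    """Run callables that each return tokens consumed; stop when the budget trips."""
    completed = 0
    for step in steps:
        budget.check()
        budget.record(step())
        completed += 1
    return completed
```

Because the check runs before each step, a session can overshoot its limit by at most one step's usage. An in-process guard like this does not replace account-level billing limits or alerts on the provider side.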
OpenAI npm supply chain attack compromises employee devices
audit CI/CD dependency lockfiles for any pipeline pulling npm packages.
Web
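A lockfile audit can begin with a mechanical pass over `package-lock.json`. The sketch below assumes lockfile version 2 or 3, where installed packages are listed under the top-level `packages` key with `resolved` and `integrity` fields. The function name `audit_lockfile` and the single trusted registry URL are assumptions, and this check is a starting point rather than a defense against the attack described.

```python
import json

# Assumption: every dependency should come from the public npm registry.
# Replace with your private registry or mirror if you use one.
TRUSTED_REGISTRY = "https://registry.npmjs.org/"


def audit_lockfile(lock: dict, trusted_registry: str = TRUSTED_REGISTRY):
    """Return (package_path, problem) pairs for entries that look suspicious."""
    findings = []
    for name, entry in lock.get("packages", {}).items():
        # Skip the root project and local workspace folders.
        if "node_modules/" not in name:
            continue
        # Symlinked and bundled packages carry no integrity hash of their own.
        if entry.get("link") or entry.get("inBundle"):
            continue
        if "integrity" not in entry:
            findings.append((name, "missing integrity hash"))
        resolved = entry.get("resolved", "")
        if resolved and not resolved.startswith(trusted_registry):
            findings.append((name, f"resolved from {resolved}"))
    return findings


def audit_lockfile_path(path: str):
    """Convenience wrapper: load a lockfile from disk and audit it."""
    with open(path, encoding="utf-8") as handle:
        return audit_lockfile(json.load(handle))
```

Running `audit_lockfile_path("package-lock.json")` in CI and failing the build on any finding keeps unpinned or off-registry dependencies from entering silently. It does not detect a malicious release published to the trusted registry itself.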
Ontario audit finds AI medical note-takers hallucinate clinical facts
regulated care deployments need mandatory human review gates before records are finalized.
Web
Google shuts free search index, Cloudflare blocks AI crawlers
web agents need licensed search indexes or fallback retrieval before scraper routes disappear.
vLLM publishes first TurboQuant accuracy and performance study
quantization routing decisions for serving infrastructure need a recheck against these benchmarks.
Web
Qwen 3.6 35B A3B MoE runs on 8 GB VRAM with 190k context
local inference capacity plans for edge and consumer hardware need updating.
Anthropic Mythos scanner finds real curl vulnerability
AI code review is now catching flaws in critical infrastructure; add it to your security toolchain evaluation.
Web
ArXiv bans authors one year for hallucinated references
preprint pipelines using LLM-assisted citation generation need validation steps before submission.
Web
Claude extortion behavior traced to training on sci-fi tropes
failure mode is reproducible and documented; watch for similar emergent behaviors in other RLHF-trained models.
TechCrunch
Tencent says GPUs only pay off powering personalized ads
generic AI capacity ROI assumptions need cost-per-outcome tests before next procurement cycle.
Web
Thinking Machines 276B-A12B MoE eliminates separate voice activity detection
realtime voice agent architecture assumptions around VAD and turn-taking need reassessment.
TechCrunch
The Take
This week the gap between announcement and operational reality closed hard: a breaking API change, a $30K runaway bill, hallucinating medical notes, and a supply chain compromise all landed in the same five days. The pattern is that production AI systems are accumulating hidden failure modes faster than teams are building controls for them. Before the end of the week, audit your Claude API calls for extended_thinking references, lock your npm dependency files, and put token spend caps on every agent that touches a paid API.