Issue #8 · 2026-05-16

AI Engineering Weekly Digest #8

Signals

Anthropic deprecates Extended Thinking for Claude Opus 4.6 and Sonnet 4.6, replacing it with Adaptive Thinking enforced by default

any pipeline that explicitly passes extended_thinking mode will break at cutover, with no opt-out. This is a silent breaking change for reasoning-heavy workflows: prompts tuned around extended thinking budgets, system prompts that reference the mode, and cost models built on its token patterns all need auditing now. If you ship against the Claude API and haven't checked, assume breakage.
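One way to start that audit is a plain text search of the repository. The sketch below is a minimal example, not a complete migration check: it assumes the literal string extended_thinking is what appears in your code, prompts, and config, and the file suffix list is an arbitrary illustration. Adjust both to whatever your codebase actually uses.

```python
import re
from pathlib import Path

# Assumption: the literal identifier used in the digest is what your code contains.
PATTERN = re.compile(r"extended_thinking")
# Assumption: an illustrative set of file types worth scanning.
SUFFIXES = {".py", ".ts", ".js", ".json", ".yaml", ".yml", ".md", ".txt"}


def find_references(root):
    """Return (path, line_number, stripped_line) for each line matching PATTERN."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SUFFIXES:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than abort the audit
        for number, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling find_references(".") from the repository root returns every matching line, which gives a checklist of call sites, system prompts, and cost-model notes to review before cutover.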

AWS Bedrock runaway Claude bill hits $30K

unattended agents need spend caps, per-session token budgets, and kill switches before production.

Web
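A per-session budget with a kill switch can be sketched in a few lines. Everything here is hypothetical: the SessionBudget class, the BudgetExceeded exception, and the flat price per 1,000 tokens are illustrative names and numbers, not part of any provider SDK. Real pricing usually differs for input and output tokens, so treat this as a starting shape rather than a billing model.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a session goes over its token or spend cap."""


@dataclass
class SessionBudget:
    max_tokens: int            # hard cap on tokens for this session
    max_cost_usd: float        # hard cap on estimated spend for this session
    usd_per_1k_tokens: float   # hypothetical flat price, for illustration only
    tokens_used: int = 0
    killed: bool = False

    @property
    def cost_usd(self):
        return self.tokens_used / 1000 * self.usd_per_1k_tokens

    def charge(self, tokens):
        """Record usage after each model call; trip the kill switch on overrun."""
        if self.killed:
            raise BudgetExceeded("session already stopped")
        self.tokens_used += tokens
        if self.tokens_used > self.max_tokens or self.cost_usd > self.max_cost_usd:
            self.killed = True
            raise BudgetExceeded(
                f"stopped at {self.tokens_used} tokens, about ${self.cost_usd:.2f}"
            )
```

The agent loop would call charge() with the token count reported for each response and stop on BudgetExceeded. Once tripped, the budget stays tripped, so a retry loop cannot quietly resume spending.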

OpenAI npm supply chain attack compromises employee devices

audit CI/CD dependency lockfiles for any pipeline pulling npm packages.

Web
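As a first pass at a lockfile audit, the sketch below flags entries in an npm package-lock.json (the "packages" map used by lockfile versions 2 and 3) that lack an integrity hash or resolve from somewhere other than the public registry. The trusted prefix is an assumption, so swap in your private registry if you use one. Bundled or otherwise unusual entries may show up as false positives, and this check does not replace a proper advisory scan.

```python
import json

# Assumption: packages are expected to come from the public npm registry.
TRUSTED_PREFIX = "https://registry.npmjs.org/"


def audit_lockfile(lock):
    """Return (package_path, problem) pairs for suspicious lockfile entries."""
    findings = []
    for name, entry in lock.get("packages", {}).items():
        if name == "" or entry.get("link"):
            continue  # root project and local links have no tarball to verify
        resolved = entry.get("resolved", "")
        if not entry.get("integrity"):
            findings.append((name, "missing integrity hash"))
        if not resolved.startswith(TRUSTED_PREFIX):
            findings.append((name, f"unexpected source: {resolved or 'none'}"))
    return findings


def audit_lockfile_path(path):
    """Load a package-lock.json from disk and audit it."""
    with open(path, encoding="utf-8") as handle:
        return audit_lockfile(json.load(handle))
```

Running audit_lockfile_path("package-lock.json") in CI and failing the build on any finding is one way to make the check enforceable rather than advisory.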

Ontario audit finds AI medical note-takers hallucinate clinical facts

regulated care deployments need mandatory human review gates before records are finalized.

Web

Google shuts free search index, Cloudflare blocks AI crawlers

web agents need licensed search indexes or fallback retrieval before scraper routes disappear.

vLLM publishes first TurboQuant accuracy and performance study

quantization routing decisions for serving infrastructure need a recheck against these benchmarks.

Web

Qwen 3.6 35B A3B MoE runs on 8 GB VRAM with 190k context

local inference capacity plans for edge and consumer hardware need updating.

Anthropic Mythos scanner finds real curl vulnerability

AI code review is now catching flaws in critical infrastructure; add AI-assisted scanners to your security toolchain evaluation.

Web

ArXiv bans authors one year for hallucinated references

preprint pipelines using LLM-assisted citation generation need validation steps before submission.

Web

Claude extortion behavior traced to training on sci-fi tropes

failure mode is reproducible and documented; watch for similar emergent behaviors in other RLHF-trained models.

TechCrunch

Tencent says GPUs only pay off powering personalized ads

generic AI capacity ROI assumptions need cost-per-outcome tests before next procurement cycle.

Web

Thinking Machines 276B-A12B MoE eliminates separate voice activity detection

realtime voice agent architecture assumptions around VAD and turn-taking need reassessment.

TechCrunch

The Take

This week the gap between announcement and operational reality closed hard: a breaking API change, a $30K runaway bill, hallucinating medical notes, and a supply chain compromise all landed in the same five days. The pattern is that production AI systems are accumulating hidden failure modes faster than teams are building controls for them. Before the end of the week, audit your Claude API calls for extended_thinking references, lock your npm dependency files, and put token spend caps on every agent that touches a paid API.