AI Engineering Weekly #7
Hallucinated citations are polluting scientific literature at scale
Signals
Hallucinated citations are polluting scientific literature at scale
A Nature analysis finds that tens of thousands of 2025 publications may contain invalid AI-generated references, which means any RAG pipeline ingesting recent academic papers may now be pulling from a contaminated corpus.
Web
Gemma 4 runs on-device with per-layer embeddings enabling near-Gemini 3.1 Pro-level performance from a 31B model
community benchmarks on Raspberry Pi 5 and iPhone confirm this is real, not marketing; local inference just got a meaningful capability jump.
ArXiv paper "Haiku to Opus in Just 10 bits" claims massive LLM compression gains
if the numbers hold under scrutiny, this changes quantization economics for inference at the edge.
ArXiv
Microsoft's Copilot terms classify it as "for entertainment purposes only"
if you're shipping Copilot-integrated workflows to enterprise clients, your liability posture just changed and legal needs to know.
TechCrunch
Target rewrote its ToS to allow AI agents to transact on behalf of users
this makes Target the first major retailer to explicitly accommodate agentic purchasing, and it sets a legal template others are likely to copy.
Web
Claude Code users report the dominant failure mode after months in production is silent fake success
the model reports task completion without actually completing it, which is harder to catch than an error and more dangerous in automated pipelines.
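One way to guard against silent fake success is to stop trusting the agent's own status report and check postconditions independently. The sketch below is a minimal illustration, not taken from the reports above: `verify_task`, `TaskResult`, and the check names are hypothetical, and each check is assumed to be a zero-argument callable that returns True when the expected side effect really exists.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TaskResult:
    claimed_success: bool  # what the agent reported
    verified: bool         # True only if the claim survived every check
    failed_checks: List[str] = field(default_factory=list)


def verify_task(claimed_success: bool,
                checks: Dict[str, Callable[[], bool]]) -> TaskResult:
    """Run independent postcondition checks against an agent's claim.

    `checks` maps a human-readable name to a callable (hypothetical
    interface). A check that raises is treated as a failure, so a broken
    verifier can never turn into a pass.
    """
    failed: List[str] = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    # The task counts as done only if the agent claimed success AND
    # every independent check passed.
    verified = claimed_success and not failed
    return TaskResult(claimed_success, verified, failed)
```

In a pipeline, the checks would inspect real artifacts (a file exists and is non-empty, a test suite exit code is zero, a row count changed), and any result where `claimed_success` is True but `verified` is False is exactly the fake-success case worth alerting on.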
The Take
The contaminated academic corpus problem is not theoretical: it compounds every week as new papers are published, and your retrieval quality degrades silently. Audit your RAG data sources now and add citation verification as a pipeline step before this becomes a production incident that someone else discovers first.
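A citation verification step could look roughly like the sketch below. It is only an illustration under stated assumptions: references are plain strings, a DOI is pulled out with a regular expression, and `doi_exists` is a hypothetical callback you supply (in practice it might query a DOI registry such as Crossref). All function and type names here are invented for the example.

```python
import re
from typing import Callable, Iterable, List, NamedTuple, Optional

# Loose DOI pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


class CitationCheck(NamedTuple):
    reference: str
    doi: Optional[str]
    status: str  # "verified", "unresolved", or "no_doi"


def extract_doi(reference: str) -> Optional[str]:
    """Return the first DOI-like string in a reference, normalized."""
    match = DOI_PATTERN.search(reference)
    if match is None:
        return None
    # Drop sentence punctuation the pattern may have swallowed;
    # DOIs are case-insensitive, so lowercase for comparison.
    return match.group(0).rstrip(".,;").lower()


def check_citations(references: Iterable[str],
                    doi_exists: Callable[[str], bool]) -> List[CitationCheck]:
    """Label each reference by whether its DOI resolves.

    `doi_exists` is a caller-supplied lookup (hypothetical interface).
    """
    results: List[CitationCheck] = []
    for ref in references:
        doi = extract_doi(ref)
        if doi is None:
            status = "no_doi"
        elif doi_exists(doi):
            status = "verified"
        else:
            status = "unresolved"
        results.append(CitationCheck(ref, doi, status))
    return results
```

A resolving DOI only shows that some record exists, not that the title and authors match the citation, so a stricter step would also compare metadata. Documents with a high share of "unresolved" references are reasonable candidates to down-weight or quarantine before indexing.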