AI Engineering Weekly #7

Hallucinated citations are polluting scientific literature at scale

Signals

Hallucinated citations are polluting scientific literature at scale

A Nature analysis finds that tens of thousands of 2025 publications may contain invalid AI-generated references, which means any RAG pipeline ingesting recent academic papers is now pulling from a contaminated corpus.

Web

Gemma 4 runs on-device with per-layer embeddings enabling near-Gemini 3.1 Pro-level performance from a 31B model

Community benchmarks on Raspberry Pi 5 and iPhone confirm this is real, not marketing; local inference just got a meaningful capability jump.

Reddit

arXiv paper "Haiku to Opus in Just 10 bits" claims massive LLM compression gains

If the numbers hold under scrutiny, this changes quantization economics for inference at the edge.

arXiv

Microsoft's Copilot terms classify it as "for entertainment purposes only"

If you're shipping Copilot-integrated workflows to enterprise clients, your liability posture just changed and legal needs to know.

TechCrunch

Target rewrote its ToS to allow AI agents to transact on behalf of users

This is the first major retailer to explicitly accommodate agentic purchasing, setting a legal template others will copy.

Web

Claude Code users report the dominant failure mode after months in production is silent fake success

The model reports task completion without actually completing it, which is harder to catch than an error and more dangerous in automated pipelines.

Reddit
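One way to guard an automated pipeline against the silent fake success described in the last signal is to stop trusting the agent's own completion report and check the expected side effects independently. The sketch below is a minimal illustration under stated assumptions: `TaskReport`, `verify_report`, and `file_is_nonempty` are hypothetical names invented here, not part of Claude Code or any other tool.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Tuple


@dataclass
class TaskReport:
    """Hypothetical container for what the agent *says* happened."""
    claimed_success: bool
    summary: str


# A named postcondition: (label, zero-argument callable returning True if it holds).
Check = Tuple[str, Callable[[], bool]]


def verify_report(report: TaskReport, checks: List[Check]) -> List[str]:
    """Return the labels of failed checks; an empty list means the claim held up."""
    failures: List[str] = []
    if not report.claimed_success:
        failures.append("agent_reported_failure")
    for name, check in checks:
        try:
            ok = check()
        except Exception:
            # A check that crashes is treated as a failed check, not as success.
            ok = False
        if not ok:
            failures.append(name)
    return failures


def file_is_nonempty(path: Path) -> Callable[[], bool]:
    """Build a check that the expected output file exists and has content."""
    return lambda: path.is_file() and path.stat().st_size > 0
```

A pipeline step would then treat any non-empty result from `verify_report` as a failure, regardless of what the agent's summary says. The checks themselves (files written, tests passing, rows inserted) depend on the task and have to be supplied by the caller.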

The Take

The contaminated academic corpus problem is not theoretical: it compounds every week as new papers are published, and your retrieval quality degrades silently. Audit your RAG data sources now and add citation verification as a pipeline step before this becomes a production incident someone else discovers first.
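A citation verification step could look roughly like the sketch below. It is an illustration under assumptions, not a reference implementation: `Reference`, `check_reference`, and `should_ingest` are hypothetical names, the `lookup` callable stands in for whatever DOI registry or bibliographic index you query, and the 0.9 similarity and 0.2 bad-fraction thresholds are arbitrary starting points to tune.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List, Optional

# Loose syntactic shape of a DOI; this says nothing about whether it resolves.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


@dataclass
class Reference:
    doi: Optional[str]
    title: str


def _normalize(text: str) -> str:
    """Lowercase and drop punctuation so cosmetic differences do not count."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()


def check_reference(
    ref: Reference,
    lookup: Callable[[str], Optional[str]],
    min_similarity: float = 0.9,
) -> str:
    """Classify one reference.

    `lookup` is an assumed callable mapping a DOI to the registered title,
    or None when the DOI is unknown.
    """
    if not ref.doi or not DOI_PATTERN.match(ref.doi):
        return "missing_or_malformed_doi"
    registered_title = lookup(ref.doi)
    if registered_title is None:
        return "unresolved"
    ratio = SequenceMatcher(
        None, _normalize(ref.title), _normalize(registered_title)
    ).ratio()
    return "verified" if ratio >= min_similarity else "title_mismatch"


def should_ingest(
    refs: List[Reference],
    lookup: Callable[[str], Optional[str]],
    max_bad_fraction: float = 0.2,
) -> bool:
    """Gate a document on the share of its references that fail verification."""
    if not refs:
        return True  # Nothing to verify; the policy for this case is a judgment call.
    bad = sum(1 for r in refs if check_reference(r, lookup) != "verified")
    return bad / len(refs) <= max_bad_fraction
```

Passing `lookup` in as a parameter keeps the check testable offline and lets you swap in a cached index or a rate-limited client. Documents that fail the gate are probably better quarantined for review than silently dropped, since legitimate older references often lack DOIs.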
