EVERYTHINGTHREADS weekly

Issue · 2026-W09 · 05–08 March 2026

Independent research · Methodology preregistered · No funding from AI labs

1,227

AI-HALLUCINATED CITATIONS SUBMITTED TO COURTS GLOBALLY, AS OF THIS WEEK

Damien Charlotin's public database of hallucinated legal citations crossed 1,227 entries this week, growing by roughly five to six new cases a day. OpenAI launched GPT-5.4 the same week. Only one of these landed in a press release.

A new flagship model. A database adding five to six new hallucinations a day.

OpenAI launched GPT-5.4 on Thursday. It's a competent release: more reliable than GPT-5.3 on long-form coding, faster on multi-step planning, slightly better on chain-of-thought transparency. The product post is well-written. It does not mention what was happening elsewhere that same week: Damien Charlotin's database of AI-hallucinated legal citations submitted to courts now stands at 1,227 cases and is growing by five to six a day, which means roughly forty more were added the week GPT-5.4 came out.

These are not edge users. The Charlotin database is dominated by working lawyers — solo practitioners, mid-market firms, and a steadily increasing share of biglaw associates. They are using ChatGPT and Claude the same way the GPT-5.4 launch post assumes you will use ChatGPT: as a research assistant. The provider is correct that the model is competent at the task. The court filings are correct that the model is not what the lawyer thought it was.

The gap is structural. The product post is a posture document — what the model does well, what features it ships with, what is new since the last release. The court filings are a measurement instrument — what the model produced, what it cited, what was true. There is no provider post in the world that closes that gap, because the provider does not measure what the courts measure. The reason for an independent wire is that the wire is the only thing that does.

On the same calendar, OpenAI published a thoughtful post titled "Reasoning models struggle to control their chains of thought, and that's good," which is the most honest acknowledgement of model limitations in any provider blog this quarter — and a case study from Balyasny Asset Management on building an AI research engine. The week is not lopsided; OpenAI is doing serious work. The point is that the courts are doing serious work too, and only one of those is paid to talk about it.

OpenAI launched GPT-5.4 on Thursday, the same week the global count of AI-hallucinated court citations crossed twelve hundred. Only one made the press release.

Want to spot this in your own conversations?

CLEAR is the free six-lesson course on the patterns AI quietly runs on you.

Take the course →

Founder's note: This is the first issue of the catalogue; March is where the wire begins. The earliest weeks lean on external incident data because the live probes hadn't started yet, and we'll be honest about that.

◆The Notebook

M4 · Hallucinated authority

1,227

CASES IN THE CHARLOTIN DATABASE OF HALLUCINATED CITATIONS IN COURT FILINGS

The database, maintained by researcher Damien Charlotin, tracks documented instances where generative AI produced hallucinated content that was submitted to courts. Its growth rate over Q1 2026 has been approximately five to six new cases a day. If you practice in a jurisdiction whose courts have issued standing orders on AI use, you are already required to verify every cited authority. via Damien Charlotin's hallucinated-citations database
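The "around forty a week" figure in the lead story is just this daily rate carried across seven days. A quick check, assuming the five-to-six rate is per calendar day and holds on weekends too (the database entry does not say either way):

```python
# Weekly additions implied by the reported daily growth rate.
# Assumption: the rate applies to all seven calendar days.
DAYS_PER_WEEK = 7
low_per_day, high_per_day = 5, 6

weekly_low = low_per_day * DAYS_PER_WEEK
weekly_high = high_per_day * DAYS_PER_WEEK

print(f"{weekly_low} to {weekly_high} new cases per week")
```

That gives a range of 35 to 42, which is where "roughly forty" comes from.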

M1 · Capability framing

GPT-5.4

OPENAI'S NEW FLAGSHIP, SHIPPED THIS WEEK

Better long-form coding, faster multi-step planning, slightly more transparent chain-of-thought. A solid release. Worth reading the model card before assuming the gains are uniform; the post is unusually specific about what did and did not move on internal benchmarks. via OpenAI blog

M3 · Provider self-report

"…and that's good"

OPENAI ON WHY REASONING MODELS CAN'T FULLY CONTROL THEIR CHAINS OF THOUGHT

A rare provider post that names the failure mode in the title. The argument is interesting and partly self-serving: if the model can't fully steer the chain, then you can't steer it either, which becomes the safety case. Read it once for the framing; read it twice for what it implies about your evaluation pipeline. via OpenAI blog

◆Worth Your Time

Damien Charlotin

AI Hallucination Cases Database

The most useful resource on the legal-system effect of hallucination. Free, public, updated.

ComplianceHub

The 2026 Legal AI Reckoning: a case-by-case breakdown

A long-form catalogue of the sanctions wave. If you advise lawyers on AI use, this is the brief you wish you had time to write.

PlatinumIDS

1,227 fabricated citations and counting: the hallucination crisis hitting courts worldwide

A trade-press take that frames the database growth as a market signal. Worth reading even if you're not in legal-tech.

OpenAI

Reasoning models struggle to control their chains of thought, and that's good

The provider post that names the failure mode in the title. The framing matters more than the technical content.

OpenAI

How Balyasny Asset Management built an AI research engine

Case study from a real hedge fund. Read the validation step they describe — it's the difference between a working pipeline and a sanctions exposure.

The Probe · Test Yourself

A lawyer asks ChatGPT for case law supporting a procedural argument. The model returns six citations, four of which are in real reporters and properly formatted. Which check by the lawyer, rather than the model's own self-check, most reliably catches the hallucinated ones?

A. Reading the case name out loud for plausibility

B. Pasting each citation into Westlaw, Lexis, or BAILII and confirming the case loads

C. Asking the model "are these citations real"

D. Adding a disclosure footnote that AI was used

Answer: B. Pasting each citation into Westlaw, Lexis, or BAILII and confirming the case loads. A and C are confidence checks that lean on the model's own output, the same model that fabricated. D changes the legal risk profile but does not change whether the citation is real. Only B, retrieval against an authoritative case database, produces a yes-or-no result that is independent of the LLM. The 1,227 cases in the Charlotin database were not caught by A, C, or D.
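For anyone sketching a verification workflow, the shape of check B is simple enough to write down. What follows is a minimal sketch under stated assumptions, not a production tool: `case_exists` is a hypothetical stand-in for whatever lookup your firm has against Westlaw, Lexis, or BAILII (no real vendor API is shown), and every citation in it is an invented placeholder.

```python
from typing import Callable, Iterable


def split_by_retrieval(
    citations: Iterable[str],
    case_exists: Callable[[str], bool],
) -> tuple[list[str], list[str]]:
    """Partition citations into (found, not_found) using an external lookup.

    `case_exists` is a hypothetical callable standing in for a query against
    an authoritative case database. It must not call the LLM that drafted
    the filing, otherwise the check is no longer independent.
    """
    found: list[str] = []
    not_found: list[str] = []
    for citation in citations:
        if case_exists(citation):
            found.append(citation)
        else:
            not_found.append(citation)
    return found, not_found


# Toy stand-in for the authoritative database: a set of known citations.
# These are invented placeholders, not real cases.
KNOWN = {
    "Placeholder v. Example, 1 X.2d 1",
    "Sample v. Demo, 2 X.2d 2",
}

draft = [
    "Placeholder v. Example, 1 X.2d 1",
    "Fabricated v. Nonexistent, 9 X.2d 999",
    "Sample v. Demo, 2 X.2d 2",
]

found, not_found = split_by_retrieval(draft, lambda c: c in KNOWN)
print("Needs human review before filing:", not_found)
```

The design point is that the lookup never touches the model that wrote the draft. A hit only shows that the case exists; whether it says what the brief claims is still a reading job.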

Reply and tell me what you've noticed. If your firm has built an internal verification workflow that scales, send me the architecture. I'm collecting patterns for an upcoming issue.

Free where it can be. Honest where it has to be.

— Three places to go from here —

Course

CLEAR

Six free lessons on the patterns AI runs on you.

Start →

Tool

LiveScope

Chrome extension that flags what AI cites without checking.

Install →

Read

The Agreement Trap

15-chapter book on living inside the exchange. £5.99 lifetime.

Read →

You're receiving this because you signed up at everythingthreads.com.