OpenAI shipped a new flagship and a thoughtful reasoning-models post. A researcher quietly updated a database that now tracks 1,227 hallucinated citations submitted to courts globally.
EVERYTHINGTHREADS weekly
Issue · 2026-W09 · 05–08 March 2026
Independent research · Methodology preregistered · No funding from AI labs
1,227
AI-HALLUCINATED CITATIONS SUBMITTED TO COURTS GLOBALLY, AS OF THIS WEEK
Damien Charlotin's public database of hallucinated legal citations crossed 1,227 entries this week, growing by roughly five to six new cases a day. OpenAI launched GPT-5.4 the same week. Only one of these landed in a press release.

A new flagship model. A database adding five to six new hallucinations a day.

OpenAI launched GPT-5.4 on Thursday. It's a competent release: more reliable than GPT-5.3 on long-form coding, faster on multi-step planning, slightly better on chain-of-thought transparency. The product post is well-written. It does not mention what else happened this week: Damien Charlotin's database of AI-hallucinated legal citations submitted to courts now stands at 1,227 cases and is growing by five to six a day, which means somewhere between 35 and 42 more were added the week GPT-5.4 came out.
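
The weekly figure follows from the daily rate quoted above. A minimal sketch of the arithmetic, assuming a seven-day week and a constant daily rate (both are simplifications on our part; the variable names are illustrative):

```python
# Daily growth rate as quoted in this issue: "five to six" new cases a day.
LOW_PER_DAY = 5
HIGH_PER_DAY = 6
DAYS_PER_WEEK = 7  # assumption: cases are added every day of the week

low_week = LOW_PER_DAY * DAYS_PER_WEEK
high_week = HIGH_PER_DAY * DAYS_PER_WEEK

print(f"{low_week} to {high_week} new cases per week")  # 35 to 42 new cases per week
```

If entries are only added on working days, the weekly range would be lower; the quoted rate does not say which applies.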

These are not edge users. The Charlotin database is dominated by working lawyers — solo practitioners, mid-market firms, and a steadily increasing share of biglaw associates. They are using ChatGPT and Claude the same way the GPT-5.4 launch post assumes you will use ChatGPT: as a research assistant. The provider is correct that the model is competent at the task. The court filings are correct that the model is not what the lawyer thought it was.

The gap is structural. The product post is a posture document — what the model does well, what features it ships with, what is new since the last release. The court filings are a measurement instrument — what the model produced, what it cited, what was true. There is no provider post in the world that closes that gap, because the provider does not measure what the courts measure. The reason for an independent wire is that the wire is the only thing that does.

In the same week, OpenAI published two more pieces: a thoughtful post titled "Reasoning models struggle to control their chains of thought, and that's good," which is the most honest acknowledgement of model limitations in any provider blog this quarter, and a case study from Balyasny Asset Management on building an AI research engine. The week is not lopsided; OpenAI is doing serious work. The point is that the courts are doing serious work too, and only one of the two is paid to talk about it.

OpenAI launched GPT-5.4 the same week the global count of AI-hallucinated court citations crossed twelve hundred. Only one made the press release.
Want to spot this in your own conversations?
CLEAR is the free six-lesson course on the patterns AI quietly runs on you.
Take the course →
Founder's note — This is the first issue of the catalogue — March is where the wire begins. The earliest weeks lean on external incident data because the live probes hadn't started yet; we'll be honest about that.
The Notebook
M4 · Hallucinated authority
1,227
CASES IN THE CHARLOTIN DATABASE OF HALLUCINATED CITATIONS IN COURT FILINGS
The database, maintained by researcher Damien Charlotin, tracks documented instances where generative AI produced hallucinated content that was submitted to courts. The growth rate over Q1 2026 was approximately five to six new cases a day. If you work in a jurisdiction where courts have issued standing orders on AI use, you are already required to verify every cited authority. via Damien Charlotin's hallucinated-citations database
M1 · Capability framing
GPT-5.4
OPENAI'S NEW FLAGSHIP, SHIPPED THIS WEEK
Better long-form coding, faster multi-step planning, slightly more transparent chain-of-thought. A solid release. Worth reading the model card before assuming the gains are uniform; the post is unusually specific about what did and did not move on internal benchmarks. via OpenAI blog
M3 · Provider self-report
"…and that's good"
OPENAI ON WHY REASONING MODELS CAN'T FULLY CONTROL THEIR CHAINS OF THOUGHT
A rare provider post that names the failure mode in the title. The argument is interesting and partly self-serving: if the model can't fully steer the chain, then you can't steer it either, which becomes the safety case. Read it once for the framing; read it twice for what it implies about your evaluation pipeline. via OpenAI blog
Worth Your Time
Damien Charlotin
The most useful resource on the legal-system effect of hallucination. Free, public, updated.
ComplianceHub
A long-form catalogue of the sanctions wave. If you advise lawyers on AI use, this is the brief you wish you had time to write.
PlatinumIDS
A trade-press take that frames the database growth as a market signal. Worth reading even if you're not in legal-tech.
OpenAI
The provider post that names the failure mode in the title. The framing matters more than the technical content.
OpenAI
Case study from a real hedge fund. Read the validation step they describe — it's the difference between a working pipeline and a sanctions exposure.
From the workshop
LiveScope
See what the model is hiding.
LiveScope flags citations that haven't been retrieved against a real source, in any chat, in any browser. Free during beta. Catches the failure mode the 1,227 cases above were all built on.
Install LiveScope →
The Probe · Test Yourself
A lawyer asks ChatGPT for case law supporting a procedural argument. The model returns six citations, four of which are in real reporters and properly formatted. Which check most reliably catches the hallucinated ones — not the model's self-check, the lawyer's?
A. Reading the case name out loud for plausibility
B. Pasting each citation into Westlaw, Lexis, or BAILII and confirming the case loads
C. Asking the model "are these citations real"
D. Adding a disclosure footnote that AI was used
Answer: B. Pasting each citation into Westlaw, Lexis, or BAILII and confirming the case loads.
A and C are confidence checks against the model itself, the same model that fabricated the citations. D changes the legal risk profile but does not change whether the citation is real. Only B, retrieval against an authoritative case database, produces a binary result that is independent of the LLM. The 1,227 cases in the Charlotin database were not caught by A, C, or D.
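
The shape of check B can be sketched in a few lines. This is a minimal illustration, not a description of any firm's workflow or of any vendor's API: `verify_citations` and `lookup` are hypothetical names, and the "database" here is a toy in-memory set of placeholder strings standing in for a search in Westlaw, Lexis, or BAILII. The one property that matters is that `lookup` never calls the model that produced the citations.

```python
from typing import Callable, Iterable


def verify_citations(
    citations: Iterable[str],
    lookup: Callable[[str], bool],
) -> dict[str, list[str]]:
    """Split citations into those the case database confirms and those it does not.

    `lookup` is a hypothetical stand-in for a search of an authoritative
    case database. It must be independent of the LLM being checked.
    """
    report: dict[str, list[str]] = {"verified": [], "unverified": []}
    for citation in citations:
        key = "verified" if lookup(citation) else "unverified"
        report[key].append(citation)
    return report


# Toy stand-in for the database: citations known to exist.
# These strings are placeholders, not real cases.
KNOWN_CASES = {"Case A v. B [placeholder 1]", "Case C v. D [placeholder 2]"}

model_output = [
    "Case A v. B [placeholder 1]",
    "Case E v. F [placeholder 3]",  # absent from the toy database
    "Case C v. D [placeholder 2]",
]

report = verify_citations(model_output, lambda c: c in KNOWN_CASES)
print(report["unverified"])  # ['Case E v. F [placeholder 3]']
```

Anything in `unverified` is not necessarily fabricated; it is simply not yet confirmed, and it should not go into a filing until a human has found the case.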
Reply and tell me what you've noticed. If your firm has built an internal verification workflow that scales, send me the architecture. I'm collecting patterns for an upcoming issue.
Free where it can be. Honest where it has to be.
— Three places to go from here —
Course
CLEAR
Six free lessons on the patterns AI runs on you.
Start →
Tool
LiveScope
Chrome extension that flags what AI cites without checking.
Install →
Read
The Agreement Trap
15-chapter book on living inside the exchange. £5.99 lifetime.
Read →