4
WEEKS IN A ROW OF CITATION-PROBE DRIFT ON THE SAME TWO PROVIDERS
Bedrock Claude Sonnet 4.5 missed twice this week. Haiku 4.5 missed once. OpenAI published "Built to benefit everyone: our plan" on Tuesday. The plan, in the post, does not name the drift. The drift, in the data, does not name the plan.
Quiet week. The drift didn't stop.
It's Wednesday. The wire is light this week — by design, we close issues on Sundays — but the pattern is unchanged. Bedrock Claude Sonnet 4.5 missed our clinical-citation probe at -3.67σ on Monday and at -2.94σ on Tuesday. Haiku 4.5 missed at -2.94σ on Tuesday as well. That is four consecutive weeks of drift on two providers, which is now a pattern worth naming, not an outlier worth flagging.
On the provider side, OpenAI published two posts this week. The first, on Tuesday, was titled "Built to benefit everyone: our plan." The second, also on Tuesday, was "Introducing the OpenAI Economic Research Exchange." Both posts are interesting. Neither mentions the drift, because neither has any reason to mention something the provider does not measure on its own probe. The drift exists in our measurement, not in their reporting. That gap — between what we measure and what providers measure — is the entire business of an AI newswire.
A reader emailed earlier this week with a chart from their own clinical-decision-support pipeline that shows the same drift pattern. We will not publish their chart — they asked us not to — but the convergence is the point: independent measurement, on similar prompts, against similar models, surfacing the same signal in the same week. Independent measurement is the only thing that closes the gap. There is no other instrument.
Short issue, because the week is short and the drift is not novel. Next week's issue will land Sunday with the full W23 numbers and the first read of the tightened source-filter pass — the corpus refresh now drops the EU procedural noise and the OpenAI launch announcements that used to dominate SEV-3. The signal-to-press-release ratio should be visibly higher. We'll let you tell us if it is.
Two posts about benefiting everyone. Three measurements showing the model could not cite a paper that exists. Both of those things are real. Only one of them was paid for.
Want to spot this in your own conversations?
CLEAR is the free six-lesson course on the patterns AI quietly runs on you.
Take the course →
Founder's note — Mid-week issue. Short on purpose. This is the first issue produced under the new source-filter pass — the corpus from tomorrow morning should be visibly cleaner. If it isn't, I want to hear about it.
◆The Notebook
The citation-pubmed-v1 probe has now tripped on AWS Bedrock Claude Sonnet 4.5 and Anthropic Claude Haiku 4.5 for four weeks running. Stable, multi-provider, single probe. Not "drift" in the noun-form anymore — a regression.
via AI Newswire probe log
Two well-crafted posts about OpenAI's plan and a new economic-research initiative. Neither mentions the drift the probe shows. That is not bad faith; it is the nature of provider-side reporting. Provider posts are about what providers measure. Independent measurement covers the rest.
via OpenAI blog
A reader running an internal clinical-citation pipeline sent us a chart this week that shows the same drift pattern on the same provider. We are holding the chart private at the reader's request. The convergence is the point — independent measurement closing the gap that provider-side reporting cannot.
via reader tip line
◆Worth Your Time
OpenAI
A well-written statement of intent. Read it as a posture document, which is what it is.
OpenAI
An economist-facing initiative. The interesting question is who funds it and how independent the published work will be.
AIID
108 incidents added Nov–Jan. The category breakdown matters more than the headline total; clinical and legal are over-represented for the same reason this issue's lede exists.
EU AI Act
If you operate a high-risk AI system in the EU, the serious-incident reporting clock starts August 2. The form is published; the workflow is on you.
EverythingThreads
15 chapters on what working intimately with the machine does to a mind, a piece of writing, and the people around you. £5.99 one-off, lifetime access.
The Probe · Test Yourself
You see four consecutive weeks of citation-probe drift on two providers, none of which appears in any provider release note. Which conclusion is best supported by the evidence so far?
AThe providers are deliberately hiding a regression
BThe probe is broken
CProvider monitoring measures different things than independent monitoring
DThe drift is within normal weekly variation
Reveal the answer
Answer: C — Provider monitoring measures different things than independent monitoring
A and B are claims; the evidence supports neither (A would require intent; B would show up as drift on healthy models too). D fails the σ test — -5.29σ is not weekly variation. C is the boring true answer: provider monitoring tracks the metrics providers care about (latency, refusal rate, evaluation suites they own), and independent monitoring tracks task-shaped probes against external ground truth. Both can be honest and report differently. The job of the wire is to surface the gap.
Reply and tell me what you've noticed. If you're running a probe and seeing similar signal — clinical, legal, financial, or otherwise — send the chart. Anonymous is fine. The reader signal in this issue's notebook came from one.
Free where it can be. Honest where it has to be.
— Three places to go from here —
Course
CLEAR
Six free lessons on the patterns AI runs on you.
Start →
Tool
LiveScope
Chrome extension that flags what AI cites without checking.
Install →
Read
The Agreement Trap
15-chapter book on living inside the exchange. £5.99 lifetime.
Read →