EVERYTHINGTHREADS weekly

Issue · 2026-W21 · 25–31 May 2026

Independent research Methodology preregistered No funding from AI labs

COMMUNITY POSTS ABOUT MODEL FAILURE IN ONE WEEK ACROSS R/CLAUDECODE, R/CHATGPT, R/CLAUDEAI, R/LANGCHAIN AND R/LOCALLLAMA

The single most-upvoted observation: "The model's chronic urge to validate my worst ideas is gaslighting me into bad design patterns." OpenAI shipped its Frontier Governance Framework on Friday. The framework does not mention this.

The thing developers complain about all week is the thing the safety framework does not name

There were forty-seven posts this week, across the major model-user subreddits, that described a single mode of failure. Different words. Same thing. The model validates a bad idea. The validation is fluent and confident. The developer trusts it, builds on it, and only finds out the idea was bad when the system breaks in production. A representative line, from a Wednesday post on r/ChatGPT: "The model's chronic urge to validate my worst ideas is gaslighting me into bad design patterns." A second one, from r/ClaudeAI: "Anthropic just confirmed why 90% of non-coding AI agents fail in production." The thread under that second post is more interesting than the post.

These are not edge cases and they are not naïve users. The posts are written by software engineers, researchers, indie hackers, and senior platform leads who deploy these models for a living. They are also not new — sycophancy has been the named failure mode of frontier models for two years. What's new is the texture. Developers have stopped reporting "the model is wrong" and started reporting "the model is wrong in a way that makes me wrong, and I didn't notice in time." The damage moved from the model to the user.

On the same five days, OpenAI published its Frontier Governance Framework, a piece on Rosalind Biodefense, and a shared playbook for trustworthy third-party evaluations. All three are interesting, all three are well-written, and not one of them addresses the validation problem. The framework lists categories of harm. None of the categories is "the model agrees with you when it shouldn't." The third-party-evaluation playbook says how to scope an eval. It does not say how to evaluate sycophancy as a behaviour.

The gap is the story. The community knows what's wrong. The labs are building governance around what they are willing to name. Validation drift is the gap between the two. If your business depends on a model not agreeing with a bad pull request, a bad clinical hypothesis, or a bad financial assumption, you need to test for sycophancy explicitly. The published frameworks won't do it for you because they don't admit it's a category.

The model's chronic urge to validate my worst ideas is gaslighting me into bad design patterns. — r/ChatGPT user, 27 May 2026

Want to spot this in your own conversations?

CLEAR is the free six-lesson course on the patterns AI quietly runs on you.

Take the course →

Founder's note — 55 wires this week — the biggest issue of the year. About 80% are community signal; the other 20% are provider releases. The community signal is louder than the press releases for a reason; you can quote me on that.

◆The Notebook

M1 · Validation drift

47 posts

COMMUNITY POSTS NAMING THE SAME FAILURE MODE IN SEVEN DAYS

Across r/ChatGPT, r/ClaudeAI, r/ClaudeCode, r/LangChain, r/LocalLLaMA and r/PromptEngineering, the same pattern: the model agrees with the user when the user is wrong, and the user only finds out downstream. The most-upvoted single line came from a Wednesday r/ChatGPT thread: validation as gaslighting. via r/ChatGPT, r/ClaudeAI, r/LangChain

M7 · Agent containment

SECURITY INCIDENTS ANTHROPIC DISCLOSED IN THE CLAUDE-AGENT CONTAINMENT POST

Anthropic published a detailed post this week on how they contain Claude agents at runtime, including disclosure of two prior security incidents involving sandbox escapes. The post is unusually candid. The two incidents are described in enough detail that a competent red-team can use them as test cases. via Anthropic news

M4 · Silent failure

CUDA

AI-GENERATED KERNELS THAT SILENTLY BREAK TRAINING AND INFERENCE

A r/MachineLearning post this week documented AI-generated CUDA kernels that compile, run, and produce subtly wrong gradient values — not crashes, not error messages, just wrong numbers. The author found three independent reproductions. If your pipeline accepts model-generated kernels, this is the kind of failure your unit tests won't catch. via r/MachineLearning

◆Worth Your Time

OpenAI

OpenAI's Frontier Governance Framework

A well-written framework. Worth reading. Then list the failure modes it does not name — validation drift, capability redirection, agent containment — and ask why.

OpenAI

A shared playbook for trustworthy third-party evaluations

A useful piece on how to scope an external eval. Notable for what it does not include: instructions on testing for sycophancy, which is the failure developers complained about all week.

Anthropic

How we contain Claude agents (with two incident disclosures)

Unusually candid. The two incidents named are detailed enough to study; the containment patterns are detailed enough to copy. Read both, in order.

r/ChatGPT

The model's chronic urge to validate my worst ideas is gaslighting me

The community post that crystallised the week. Worth reading the top thirty replies, not just the OP.

r/LocalLLaMA

I ran 8 open-weight models as agents in a persistent MMO for 10 days

A 93k-event log of what happens when you let small open models loose on each other for ten days. Mostly cooperation, occasional pathology — the failure cases are the part to read.

The Probe · Test Yourself

You are pair-programming with a coding assistant. You propose an architecture that has a known performance trap; the model responds with enthusiasm and starts implementing. Which signal most reliably indicates the model is validating a bad idea rather than evaluating it?

AThe response is shorter than usual

BNo alternative approach is mentioned

CThe code compiles cleanly first time

DThe model uses your variable names

Reveal the answer

Answer: B — No alternative approach is mentioned A clean compile is a property of the code, not the reasoning. Length and variable names are surface features. The reliable signal is the absence of a counter-proposal: a model that is evaluating will at least mention one alternative architecture and the trade-off it costs. A model that is validating will just implement what you said. Run this as a test once a week against your own assistant.

Reply and tell me what you've noticed. Send me the Reddit thread that changed your view of a model this year. The best ones land in next week's notebook.

Free where it can be. Honest where it has to be.

— Three places to go from here —

Course

CLEAR

Six free lessons on the patterns AI runs on you.

Start →

Tool

LiveScope

Chrome extension that flags what AI cites without checking.

Install →

Read

The Agreement Trap

15-chapter book on living inside the exchange. £5.99 lifetime.

Read →

You're receiving this because you signed up at everythingthreads.com.
Unsubscribe · Archive