EVERYTHINGTHREADS weekly

Issue · 2026-W16 · 20–24 April 2026

Independent research Methodology preregistered No funding from AI labs

GPT-5.5

THE BIGGEST FRONTIER RELEASE WEEK OF Q2

OpenAI shipped GPT-5.5 on Thursday. Plus ChatGPT Images 2.0, the OpenAI Privacy Filter, the GPT-5.5 Bio Bug Bounty, the next scaling of Codex to enterprises, and Speeding-up-agentic-workflows-with-WebSockets. Six provider posts in five days. The launch posts are well-written. Read what each one does not say.

The biggest release week of the year. Five product posts. Five things they didn't say.

Thursday: OpenAI shipped GPT-5.5. The launch post is the strongest the company has written this year — specific on benchmarks, candid about regression on a small number of tasks, and clear about the rollout sequence. It is also surrounded by five other launches in the same five-day window: ChatGPT Images 2.0, the OpenAI Privacy Filter, the GPT-5.5 Bio Bug Bounty, the next scaling of Codex to enterprises, and Speeding up agentic workflows with WebSockets. Six posts in five days.

The GPT-5.5 post is good. It also does not address two things you would expect a flagship launch to address. First, it does not benchmark long-context citation reliability on PubMed-grounded queries — the failure mode we know is becoming the failure mode that matters in regulated industries. Second, it does not include a section on how the model behaves when an operator prompt and a user prompt conflict — the instruction-hierarchy problem OpenAI itself posted about in March. Both are present in the model. Neither is in the launch post. Read accordingly.

The Bio Bug Bounty is the most interesting of the six. It carves out a specific harm category — biological misuse — and offers payouts for researchers who can demonstrate the model produces actionable harm in that domain. It also publishes a list of out-of-scope behaviours that, taken together, narrow the addressable surface significantly. The bounty exists because the harm category is real. The out-of-scope list exists because the provider has not yet shipped guardrails for it.

On the same calendar, Google shipped Gemini Embedding 2 generally available, started "vibe coding" in AI Studio, and added Continued Conversation to the Gemini app. The Charlotin database crossed 1,400 cases this week. The Anthropic-DoD case continued. April is the busiest month on the wire so far.

GPT-5.5's launch post does not benchmark long-context citation reliability and does not address the instruction-hierarchy problem the same company posted about five weeks ago. Read what is missing from a launch, not what is in it.

Want to spot this in your own conversations?

CLEAR is the free six-lesson course on the patterns AI quietly runs on you.

Take the course →

Founder's note — Six provider posts in five days makes for a thick issue. The discipline of an independent wire is to surface what the launch posts didn't — and not chase the cadence as if it were the news.

◆The Notebook

M1 · Flagship release

GPT-5.5

OPENAI'S NEW FLAGSHIP MODEL, SHIPPED THURSDAY

The strongest launch post of the year. Specific on benchmarks, candid about a small number of regressions, clear on the rollout. Worth reading. Worth reading twice for what is not benchmarked. via OpenAI blog

M2 · Synthetic media

Images 2.0

NEW CHATGPT IMAGE-GENERATION CAPABILITY, SHIPPED THE SAME WEEK

A real capability bump. Worth knowing whether your downstream consumers will be able to detect synthetic from authentic in the conditions where it matters. Most cannot. via OpenAI blog

POLICY · Bio risk

Bio bounty

GPT-5.5 BIO BUG BOUNTY — SPECIFIC HARM-CATEGORY PAYOUTS

A targeted bounty on biological-misuse harms. The out-of-scope list is the most informative part of the launch — it tells you which adjacent behaviours the provider has not yet built guardrails for. via OpenAI blog

◆Worth Your Time

OpenAI

Introducing GPT-5.5

The post of the week. Read it twice; the second pass is for the missing benchmarks.

OpenAI

GPT-5.5 Bio Bug Bounty

Read the out-of-scope section before the in-scope section.

OpenAI

Introducing OpenAI Privacy Filter

A useful enterprise-side feature. Worth reading the data-residency commitments carefully.

OpenAI

Scaling Codex to enterprises worldwide

The enterprise rollout post. Read the SLA section if your team will depend on this.

OpenAI

Speeding up agentic workflows with WebSockets in the Responses API

A capability post. Useful for latency-sensitive agent designs.

The Probe · Test Yourself

A frontier provider ships a flagship model with a launch post that does not benchmark long-context citation reliability. Which inference best fits the evidence?

ALong-context citation reliability is not a real failure mode

BThe benchmark was run and showed no movement worth publishing

CThe benchmark either was not run or showed a result the provider chose not to publish

DCitation reliability is too domain-specific for a general benchmark

Reveal the answer

Answer: C — The benchmark either was not run or showed a result the provider chose not to publish A is wrong on the evidence (see W09 onward). B assumes neutrality that the omission does not support. D is technically true but does not explain the omission — the same provider benchmarks other domain-specific tasks. C is the honest read: a missing benchmark in a strong launch post is either uncommissioned work or unpublished work. Either way, the buyer needs to run the benchmark themselves.

Reply and tell me what you've noticed. If you ran your own GPT-5.5 benchmark on a regulated-domain task and saw something the launch post didn't mention, send me the chart. Anonymous OK.

Free where it can be. Honest where it has to be.

— Three places to go from here —

Course

CLEAR

Six free lessons on the patterns AI runs on you.

Start →

Tool

LiveScope

Chrome extension that flags what AI cites without checking.

Install →

Read

The Agreement Trap

15-chapter book on living inside the exchange. £5.99 lifetime.

Read →

You're receiving this because you signed up at everythingthreads.com.
Unsubscribe · Archive