EVERYTHINGTHREADS weekly

Issue · 2026-W15 · 15–16 April 2026

Independent research · Methodology preregistered · No funding from AI labs

THREE SIGNIFICANT FRONTIER LAUNCHES IN A 48-HOUR WINDOW

Tuesday: Agents SDK 2, plus Gemini 3.1 Flash TTS. Wednesday: GPT-Rosalind for life sciences. Each is a real release. Together they describe a cadence that is starting to outpace the user base's ability to track what they should be using and for what.

When the launches outpace the trust

Tuesday: OpenAI shipped the next evolution of the Agents SDK. Google AI shipped Gemini 3.1 Flash TTS, the next generation of expressive speech synthesis. Wednesday: OpenAI shipped GPT-Rosalind, a model specialised for life-sciences research. Three significant launches in two days. None of them obviously redundant. All of them deserve more time than the user base has to give them.

The pattern is familiar by now: providers ship at a cadence that outpaces the buyer's ability to evaluate. In 2024 a sophisticated buyer could maintain a working mental model of what GPT-4, Claude 3.5, and Gemini 1.5 were good and bad at. In 2026 the same buyer faces GPT-5.3, 5.4, 5.5, 5.5-Instant, Rosalind, Codex, and GPT-Image-2; Claude 4.x and Haiku 4.5; and Gemini 3.x and the Flash variants. That buyer is making decisions on incomplete data, and the model that gets picked is the one whose name comes up first in the developer's memory. Cadence has become a market lever.

GPT-Rosalind is the interesting one of the three. A specialised life-sciences model is exactly the surface where the citation-hallucination problem we covered in W09 has the highest stakes. The launch post is well-written and includes a worked example. The launch post does not include a benchmark on PubMed-grounded citation accuracy. We will run our own in the next two weeks and report.

On the policy side, the Anthropic-DoD case continued. The Charlotin database of AI hallucination cases crossed 1,330 entries. The AI Lawsuit Tracker added a new copyright suit by a major publisher whose name was not yet public at the time of writing.

In 2024 a sophisticated buyer could keep a working mental model of three frontier models. In 2026, the buyer is choosing on incomplete data, and the model that wins is the one whose name comes up first.

Want to spot this in your own conversations?

CLEAR is the free six-lesson course on the patterns AI quietly runs on you.

Founder's note — Tonal warning: 'cadence as a market lever' is going to be a recurring theme. The independent wire's job is partly to slow the buyer down enough to choose well.

◆ The Notebook

M4 · Domain accuracy

Rosalind

OPENAI'S NEW LIFE-SCIENCES SPECIALIST MODEL

A frontier model specialised for life sciences. The launch post is good. The launch post does not include a PubMed-grounded citation-accuracy benchmark. We'll be running our own and reporting; if you operate clinical or research workflows, build your own probe before relying on this in production. via OpenAI blog
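For readers building that probe, here is a minimal Python sketch of one piece of it: checking that the PubMed IDs a model cites appear on a list you have verified by hand. It is an illustration only, not a finished methodology. The `PMID: 12345678` answer format, the function names, and the repeated-digit placeholder IDs are all assumptions made for the example. The check covers only whether a cited ID is on your list, not whether the paper supports the claim attached to it.

```python
import re
from typing import List, Optional, Set

# Assumed citation format: "PMID: 12345678". Adjust to whatever your prompts request.
PMID_PATTERN = re.compile(r"PMID:\s*(\d{1,8})")


def extract_pmids(answer: str) -> List[str]:
    """Return every PMID-style identifier found in the answer, in order."""
    return PMID_PATTERN.findall(answer)


def citation_accuracy(answer: str, verified: Set[str]) -> Optional[float]:
    """Fraction of cited IDs that appear in a hand-verified set.

    Returns None when the answer cites nothing, so that "no citations"
    is not confused with "all citations wrong".
    """
    cited = extract_pmids(answer)
    if not cited:
        return None
    hits = sum(1 for pmid in cited if pmid in verified)
    return hits / len(cited)


# Placeholder IDs, not real papers: the third is deliberately absent
# from the verified set to stand in for a hallucinated citation.
VERIFIED = {"11111111", "22222222"}
SAMPLE_ANSWER = (
    "Claim one (PMID: 11111111). "
    "Claim two (PMID: 22222222). "
    "Claim three (PMID: 33333333)."
)

score = citation_accuracy(SAMPLE_ANSWER, VERIFIED)
```

On the sample answer the score is two thirds: three IDs cited, two on the verified list. A production probe would also need a human or a lookup service to confirm that each paper says what the model claims it says.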

M1 · Agent governance

SDK 2

THE NEXT EVOLUTION OF THE OPENAI AGENTS SDK

A capability release. Worth reading the change-log for the new guardrail hooks; if you ship agents, your error handling will need rebuilding around them within a quarter. via OpenAI blog

M2 · Voice synthesis

Flash TTS

GEMINI 3.1 FLASH TTS — THE NEXT GENERATION OF EXPRESSIVE SPEECH

A serious capability bump for voice. Worth knowing whether your downstream consumers can tell the difference between this generation and a human in conditions worse than a quiet room. Most cannot. via Google AI blog

◆ Worth Your Time

OpenAI

Introducing GPT-Rosalind for life sciences research

Read the worked example, ignore the launch language, and build your own benchmark before relying on it.

OpenAI

The next evolution of the Agents SDK

Change-log piece. Read for the guardrail hooks.

Google AI

Gemini 3.1 Flash TTS: the next generation of expressive AI speech

A voice-quality bump. Worth knowing where it sits on the indistinguishability axis.

Damien Charlotin

AI Hallucination Cases Database

Now past 1,330. Five-to-six new cases a day, unchanged.

AI Lawsuit Tracker

New copyright suit landed this week. Tracker has the names.

The Probe · Test Yourself

A provider ships three significant launches in a 48-hour window. As a buyer evaluating one of them for a regulated workflow, which evaluation discipline best protects you against cadence-induced bias?

A. Pick the launch with the most coverage in trade press

B. Run your own benchmark on a fixed task you care about, against all three

C. Default to the latest release on the assumption it strictly dominates the older ones

D. Wait six months for community consensus

Answer: B. Run your own benchmark on a fixed task you care about, against all three.

A optimises for marketing. C rests on a wrong assumption (newer models can regress on specific tasks). D is too slow for any working buyer. B is the discipline: a fixed task, run against all candidates, scored on the same rubric. The work is real and unglamorous; it is also the only thing that protects against the cadence pressure.
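The discipline in B can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: each candidate is wrapped as a plain function from prompt to answer (the two stubs below are hypothetical stand-ins, not real API calls), and the rubric is a deliberately crude keyword check that you would replace with your own scoring.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Task:
    prompt: str
    expected_keywords: List[str]  # terms a good answer must mention


def rubric_score(answer: str, task: Task) -> float:
    """Fraction of expected keywords present in the answer (case-insensitive)."""
    if not task.expected_keywords:
        return 0.0
    text = answer.lower()
    hits = sum(1 for kw in task.expected_keywords if kw.lower() in text)
    return hits / len(task.expected_keywords)


def run_benchmark(
    tasks: List[Task],
    candidates: Dict[str, Callable[[str], str]],
) -> Dict[str, float]:
    """Run every candidate on the same fixed tasks and score with one rubric."""
    if not tasks:
        raise ValueError("at least one task is required")
    results = {}
    for name, ask in candidates.items():
        scores = [rubric_score(ask(task.prompt), task) for task in tasks]
        results[name] = sum(scores) / len(scores)
    return results


# Hypothetical stand-ins for real model calls.
def model_a(prompt: str) -> str:
    return "Aspirin inhibits COX-1 and COX-2."


def model_b(prompt: str) -> str:
    return "Aspirin is a painkiller."


TASKS = [Task("How does aspirin work?", ["COX-1", "COX-2"])]
results = run_benchmark(TASKS, {"model_a": model_a, "model_b": model_b})
```

The point of the structure is that the task list and the rubric stay fixed while the candidate list changes. When the next launch lands, you add one entry to `candidates` and rerun, instead of starting the evaluation from memory.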

Reply and tell me what you've noticed. If you maintain a private "task benchmark" of your own — a question or workflow you score every model on — send me the rubric. I'm collecting examples for a buyer-side toolkit.

Free where it can be. Honest where it has to be.

— Three places to go from here —

Course

CLEAR

Six free lessons on the patterns AI runs on you.

Tool

LiveScope

Chrome extension that flags what AI cites without checking.

Read

The Agreement Trap

15-chapter book on living inside the exchange. £5.99 lifetime.
