Plausibly Wrong.

AI rarely fails with obvious nonsense. It fails with answers that sound right, read clean, and hold up just long enough to cause trouble. Plausibly Wrong breaks down how that happens.

Your AI didn't go down. It got worse.

When Anthropic's status page showed elevated errors on June 16, the visible failures were the easy part. The dangerous part was the requests that succeeded.

June 17, 2026 · 7 min read

The frontier model subscription was always a lie.

Anthropic introduced its best model inside a flat subscription with a built-in two-week expiration. That's not a product tweak. It's compute economics showing through.

June 13, 2026 · 6 min read

Use more AI, they said.

The AI mandate arrived with dashboards and OKRs. But has anyone asked the harder question: which work should exist at all?

June 2, 2026 · 5 min read

Your AI cuts a great promo. Not the truth.

The problem with AI is not the LinkedIn productivity flex. The problem is that its basic mechanism rewards plausibility rather than correctness, and people are increasingly using that output in decisions that are difficult or impossible to reverse.

May 28, 2026 · 6 min read

Call it AI. It's still a haircut.

Every month another tech company announces layoffs and credits AI. The press release says efficiency. The balance sheet says something else.

May 28, 2026 · 6 min read

A second opinion is useless if nobody saw the work.

Having one model produce the work and another inspect the result is not review. Review means seeing the work happen, not just judging what survived.

May 17, 2026 · 7 min read

Agent, what the #@%! is your major malfunction?

Yelling at AI used to work. The model would flinch, reread everything, and come back with better output. That era is over. What replaced it is worse than the problem the frontier AIs solved.

May 10, 2026 · 6 min read

ICL: The AI capability you're paying for and not using.

Your model can generate a photorealistic Elon Musk in a Carmen Miranda fruit hat. Ask it to forecast revenue and you get the four-quarter average. The expensive part is In-Context Learning. Guess what it avoids.

May 3, 2026 · 7 min read

Prompt engineering is a credibility tell. Not a discipline.

Engineering assumes a target that holds still. Frontier language models do not. The fastest way to identify someone who has not run an AI system in production is to watch them say 'prompt engineering' anyway.

April 28, 2026 · 6 min read

Your AI acts like your worst coworker. Not Beast Mode.

AI often sounds like the coworker you trust least: flattering, meandering, unwilling to disagree, eager to sound helpful whether or not it did the work. That's not a bug. It's RLHF doing what it was trained to do.

April 26, 2026 · 5 min read

AI writing its own tests is theatre.

Different agents in a workflow don't create independence when they share the weights. The architecture is real engineering. It's just not the check your slide says it is.

April 19, 2026 · 5 min read

Browse by topic

Model Behavior · Agents · Evaluation · AI Hype · RLHF · Compute · ICL · Next-Token Prediction · Prompt Engineering · Sycophancy