Anthropic spent about 48 hours as the AI industry's villain of the week before blinking.
"Invisible safeguards can be targeted more narrowly, allowing us to ship quickly with very few false positives. We went with invisible safeguards for this reason, and that was the wrong tradeoff," the company posted on X. "You should have visibility into the safeguards we have in place, and why."
"We're sorry for not getting the balance right."
Starting this week, flagged requests will visibly route to Claude Opus 4.8, a less capable model, instead of silently delivering degraded Fable output. API users will receive a stated reason when a request gets refused. Anthropic says server-side fallback notifications will roll out in the next few days.
What was actually happening
For non-technical readers, here's what the controversy was actually about. Claude Fable 5 already had visible safeguards for cybersecurity and biology research: if you asked something that tripped those filters, you'd get a notification that your request was being rerouted to the older Opus 4.8 model. You knew something had changed. You could adjust your prompt or use a different tool.
Some bio researchers noted that even these visible safeguards were too extreme.
The LLM-development safeguard, however, worked differently. If Fable 5 detected you were working on things like pretraining AI systems, building distributed training infrastructure, or designing machine learning chips, the model would silently alter its own behavior—through prompt modification, steering vectors, or parameter tweaks—to give you a worse answer without telling you. You'd get a response. It just wouldn't be from the Fable 5 you paid for.
The problem was that the classifier wasn't that precise. AI research firm SemiAnalysis was among the first to publicly call Anthropic out after seeing its own GPU inference research get flagged.
The catch in the fix
Anthropic's reversal comes with a direct admission of the tradeoff it's accepting. Making safeguards visible makes them easier to bypass, which means the classifier has to cast a wider net to remain effective.
More false positives—legitimate machine-learning work that gets caught and rerouted—are coming while the company tunes its systems. Anthropic said it's working to reduce false positives "as fast as possible" but offered no timeline.
The company is also applying the same cleanup to its biology and cybersecurity classifiers, which had drawn their own complaints about flagging harmless research prompts.
That said, the remaining concern is that Anthropic isn't dropping this category of restrictions; it's only making them visible. For those who believe the restrictions themselves are wrong, Thursday's apology is a partial fix.
Fable 5 remains free on Pro, Max, Team, and Enterprise plans until June 22, after which it shifts to API usage credits only.