What Misinformation Research Taught Me About AI Safety Discourse
Misinformation research shows how important fields can drift toward hype over evidence. AI safety is showing early signs of the same pattern—but catching it now could strengthen the field.

AI debates suffer from the same structural flaw as misinformation debates: attention-grabbing claims crowd out the nuanced, boring insights.
The most important research findings tend to be tedious: things like "this survey shows the landscape is changing, but can't tell us how much" or "the direction of AI might depend on whether it can learn well on tasks that can't be auto-graded." These insights get buried under headlines about armies of fake news bots or distorted AI research studies.
Let me explain why this pattern should worry anyone serious about AI risks.
Scary Headlines Drown Out Big Problems
When I first started researching misinformation, Google served me terrifying stories about fake news armies churning out thousands of articles and websites. Digging into actual research from disinterested policy institutions, I discovered that most fake news reaches less than 1% of people. The scary version gets clicks; the boring truth gets ignored.
AI is following a similar script. A recent study comparing AI's ability to persuade turned into the headline "AI can persuade and misinform," even though that's not really what it showed. The measured perspectives don't break through.
This might seem like a problem for AI skeptics, but I think it's worse for people who take AI risks seriously. Hype-driven discourse makes it harder to identify the best evidence for problems and to communicate them credibly. In a calmer discourse, it would likely be easier to direct attention toward the subtle signs of gradual disempowerment, a prime existential-security concern for big names in AI safety like Andrew Critch and David Krueger.
The Filtering Effect
I once met a misinformation researcher considering leaving his field because his work kept showing people simply aren't that interested in fake news. Meanwhile, journals devoted to misinformation studies would find his conclusions outrageous.
The best analysis of misinformation I found came from Dan Williams, a cognitive scientist with no professional investment in misinformation being important. Critics dismissed him precisely because he wasn't a specialist. But you can't build sustainable research around people who think their field isn't urgent.
This creates a selection bias: researchers unconvinced of a topic's importance leave, while those who stay have career incentives to maintain its relevance. The result is perpetual hype, even when evidence is mixed.
AI safety may face the same dynamic. Not because the risks aren't real—they could be humanity's biggest challenge—but because the field selects for people already convinced of a subset of risks. This means the community might not be as good at shifting its priority risks, pushing past hype that works in its favor, or continually providing the most persuasive evidence to policymakers.
When Outsiders Break Through
When I realized credible institutions disagreed with misinformation experts, I started looking for alternative perspectives. I found academics who, when new studies claimed Facebook made people angrier, would point out that high-quality experiments "don't support the idea that such platforms have big effects on political attitudes or behaviors." Facebook might make you angrier in directions you already lean, but it doesn't make you call your representative. This shift in focus led to much more interesting debates about underlying causes—lower public trust, changing political party composition, and deeper cultural factors that actually drive political behavior.
In AI, I notice a similar pattern from the work of Arvind Narayanan and Sayash Kapoor, authors of the "AI Snake Oil" book and the “AI As a Normal Technology” essay. Many in AI safety circles dislike the book because it undermines urgency arguments. But some safety researchers admit it articulates problems they've noticed in the discourse.
The AI safety researchers who appreciate these critiques tend not to publicly support or spread "AI-as-normal-tech" perspectives, perhaps because they don't agree with every single conclusion. This might make the AI safety community less aware of its own diversity of opinion. I've also found a few safety researchers who haven't engaged deeply with opposing views, which puzzles me—rigorous fields should actively seek out their strongest critics.
Many people at AI-safety-focused think tanks struggle with Snake Oil's takes because they challenge funding narratives and risk scenarios. Nonetheless, this tiny Snake Oil team broke through the debate with clear, grounded arguments, earned a spot among Nature's books of the year, influenced national security people who might otherwise dismiss AI concerns as motivated reasoning, and proposed the same recommendations as AI safety circles: whistleblower protection, transparency requirements, product registration, and safe harbor for red teaming.
The Churn Problem
I've met people who were deeply worried about misinformation, then dove into research from disinterested institutions—and stopped worrying. In AI, I notice technical safety researchers who doubt alignment's urgency struggle to stay in the field. Policy people uncertain about preventing acute AI harms also filter themselves out from working in AI policy.
The people who remain find the urgency compelling. This isn't necessarily wrong—some problems really are urgent, and speculative risks have been right before! But it creates a directional blind spot that really matters.
What This Means for Safety Research
This pattern isn't hopeless, but it highlights why building the kinds of arguments that convince generalist, disinterested institutions matters, especially in specialized fields.
I tell people wanting to work on AI safety to aim to convince the average smart person, not just allies. Someone can be persuaded of speculative risks—but not through overconfident claims that ignore uncertainty. When your arguments can be easily demolished by one good critique (like AI Snake Oil's), you lose credibility with exactly the people you need to convince. Once people leave your conversation and encounter opposing arguments, they'll remember you as someone who oversold your case.
One example of a helpful direction: AI takeover and economic-impact research would improve by getting painfully specific about the composition of what we currently treat as single "skills," like politics, human intelligence, and "computer work." Take labor-impact estimates: current systems can do tasks lasting a few minutes but show significant performance degradation over longer time horizons. The difference between two-minute tasks and hour-long projects might not scale simply, since longer tasks need sustained context maintenance and continuous decision-making. Most valuable work operates in these extended timeframes, where current systems struggle with attention and coherence.
Before critiquing overhyped AI studies, I always clarify: "To spend time on the right AI questions, we need to stop spending time on the wrong ones." When we push back against weak AI risk arguments, we're not undermining the cause; we're protecting it from the credibility damage that comes from overselling uncertain claims.
We want to stay rigorously accurate and win through strength of ideas. In misinformation research, truth is slowly emerging despite the hype. We should demand the same for AI—especially if the stakes end up as high as we think.

Curious what you think of this take:
There are a number of priors that lead me to expect much of the current AI safety research to be low quality:
1. A lot of science is low quality. That's the default expectation for a research field.
2. It's pre-paradigmatic. Norms haven't been established yet for what works in the real world, which methods are reliable, what counts as p-hacking, etc. This makes it not only difficult to produce good work; it also makes it hard to recognize bad work and to get properly calibrated about how much work is bad, the way we are in established research fields.
3. It's subject to selection effects by non-experts. It gets amplified by advocates, journalists, policy groups, and the general public. This incentivizes hype and spin over rigor.
4. It's a very ideological field, because there's not a lot of empirical evidence to go on, many people's opinions were formed before LLMs exploded, and people's emotions about the topic are (rightly) strong.
5. I'm part of the in-group, and I identify with (and sometimes even know) the people doing the research. All tribal biases apply.
Now, some of this may be attenuated by the field being inspired by LessWrong and therefore having some norms like research integrity, open discussion & high criticism, but I don't think those forces are strong enough to counteract the other ones.
If you believe "AI safety is fundamentally much harder than capabilities, and therefore we're in danger", you should also believe "AI safety is fundamentally much harder than capabilities, and therefore there's a lot of invalid and unreliable claims".
Also, this will vary for different subfields. Those with tighter connection to real-world outcomes, like interpretability, I would expect to be less bad. But I'm not familiar enough with the subfields to say more about specific ones.
Thanks for writing this; it's an important point. I agree that the same structural flaws (selection bias and rewards for sensationalism) apply to both cases. I agree that this creates a bubble which puts upward pressure on the AI safety community's average p(doom) and downward pressure on its timelines. And I agree that some in that community should publicly and meticulously engage with good-faith critics like AI Snake Oil, while a larger number spend time reading those debates, in part so the community's valid concerns are taken more seriously by the smartest outsiders. (In case you haven't already seen it, you may enjoy this exchange between Arvind and Ajeya Cotra of OP: https://asteriskmag.com/issues/10/does-ai-progress-have-a-speed-limit. The post you linked from Helen Toner was another great example.)
I also think that long-termist values are an important distinguishing factor between the AI safety and misinformation communities, which helps explain the former's muted response to more technical critics. If you believe that avoiding X-risk swamps all other concerns, then once your p(doom) gets high enough, marginal updates in the face of new research stop having as much impact on what you think is necessary now, at least on the policy side. The difference between an 80% chance that misinformation swings elections and a 20% chance it swings elections is enormous for, say, debates over free speech on social media. But if you think there's a 20% or even 10% chance that AI will kill everyone within 10 years, the policy implications are arguably similar to if it were 80%! It takes a truly knock-out blow to change the calculus, and "tedious" research findings rarely deliver that.
Rather than subconscious or self-interested aversion to anything "challenging funding narratives and risk scenarios," I think a lot of AI safety people see a "drift toward hype" as helpfully compensating for normal people's undervaluation of the long-term future. This doesn't necessarily make the drift good, but it does make it more defensible, and I'm more open to the possibility that it's good than I was for overhyped misinformation research. It's possible the hype will eventually backfire by causing credibility damage (e.g., if the risk is real, but timelines wind up being substantially longer than anticipated, so everyone sharing AI 2027 looks dumb by 2030). But it's also possible that AI will change so much so fast that people get scared shitless, and the side waving their arms about it now will gain credibility for predicting what a big deal it will be, even if their most alarmist predictions don't pan out.
In other words, hype should be proportionate to both the probability and the severity of a possibility, and keeping it proportionate is arguably more important to political outcomes than the wonky nerd-fights we both enjoy.