Discussion about this post

Siebe Rozendal:

Curious what you think of this take:

There are a number of priors that lead me to expect much of current AI safety research to be low quality:

1. A lot of science is low quality. It's the default expectation for a research field.

2. It's pre-paradigmatic. Norms haven't been established yet for what works in the real world, what counts as a reliable method, what is p-hacking, and so on. This not only makes it difficult to produce good work; it also makes it hard to recognize bad work and hard to get properly calibrated about how much of the work is bad, the way we are in established research fields.

3. It's subject to selection effects by non-experts. It gets amplified by advocates, journalists, policy groups, and the general public, which incentivizes hype and spin over rigor.

4. It's a very ideological field: there's not a lot of empirical evidence to go on, a lot of people's opinions were formed before LLMs exploded, and people's emotions are (rightly) strong about the topic.

5. I'm part of the in-group and I identify with - sometimes even know - the people doing the research. All the usual tribal biases apply.

Now, some of this may be attenuated by the field being inspired by LessWrong and therefore having norms like research integrity, open discussion, and a high tolerance for criticism, but I don't think those forces are strong enough to counteract the others.

If you believe "AI safety is fundamentally much harder than capabilities, and therefore we're in danger", you should also believe "AI safety is fundamentally much harder than capabilities, and therefore there are a lot of invalid and unreliable claims".

Also, this will vary across subfields. Those with a tighter connection to real-world outcomes, like interpretability, I would expect to be less bad. But I'm not familiar enough with the subfields to say much about specific ones.

Andrew Doris:

Thanks for writing this, it’s an important point. I agree that the same structural flaws (selection bias and rewards for sensationalism) apply to both cases. I agree that this creates a bubble which puts upward pressure on the AI safety community's average p(doom) and downward pressure on its timelines. And I agree that some in that community should publicly and meticulously engage with good-faith critics like AI Snake Oil, while a larger number spend time reading those debates, in part so the community's valid concerns are taken more seriously by the smartest outsiders. (In case you haven't already seen it, you may enjoy this exchange between Arvind and Ajeya Cotra of OP: https://asteriskmag.com/issues/10/does-ai-progress-have-a-speed-limit. The post you linked from Helen Toner was another great example.)

I also think that long-termist values are an important distinguishing factor between the AI safety and misinformation communities, which helps explain the former's muted response to more technical critics. If you believe that avoiding X-risk swamps all other concerns, then once your p(doom) gets high enough, marginal updates in the face of new research stop having as much impact on what you think is necessary now, at least on the policy side. The difference between an 80% chance that misinformation swings elections and a 20% chance it swings elections is enormous for, say, free speech on social media debates. But if you think there's a 20% or even 10% chance that AI will kill everyone within 10 years, the policy implications are arguably similar to if it were 80%! It takes a truly knock-out blow to change the calculus, and “tedious” research findings rarely deliver that.

Rather than subconscious or self-interested aversion to anything "challenging funding narratives and risk scenarios," I think a lot of AI safety people see a "drift toward hype" as helpfully compensating for normal people's undervaluation of the long-term future. This doesn’t necessarily make the drift good, but it does make it more defensible, and I’m more open to the possibility that it’s good than I was for overhyped misinformation research. It's possible the hype will eventually backfire by causing credibility damage (ex: if the risk is real, but timelines wind up being substantially longer than anticipated, so everyone sharing AI 2027 looks dumb by 2030). But it’s also possible that AI will change so much so fast that people get scared shitless, and the side waving their arms about it now will gain credibility for predicting what a big deal it will be, even if their most alarmist predictions don’t pan out.

In other words, hype should be proportionate to both the probability and the severity of a possibility, and keeping it proportionate is arguably more important to political outcomes than the wonky nerd-fights we both enjoy.

