The point isn't that the current models are dangerous. My favorite bit from the GPT-4 safety paper was when they asked it how to kill the most people for a dollar and it suggested buying a lottery ticket. (I also wonder how many of the 'safety' concerns about current models are just mislabeled dark humor absorbed from places like Reddit.)
The point is to invest in safety work now, while the stakes are so much lower.
And of everyone I've seen talk about it, I actually think Ilya has one of the better takes on the topic, framing alignment in terms of long-term strategy rather than short-term rules.
So it's less "if we don't spend 8 months on safety alignment this new model will kill us all" and more "if we don't spend 8 months working on safety alignment for this current model we'll be unprepared to work on safety alignment when there really is a model that can kill us all."
Especially because best practices for safety alignment are almost certainly going to shift with each new generation of models.
So it's mostly about using the available runway to test things out and build expertise on the topic before it's actually needed.