>Not just that computers will fail to consider whether to be nice to humans, but also that computers must therefore be explicitly told how to be effective in every particular way.
A correct implementation of a list sorting algorithm does not need to be separately told how to sort every individual list. Similarly, a correctly implemented general reasoning algorithm does not need to be given special instructions in order to reason about humans & human society.
The problem comes when a correctly implemented general reasoning algorithm gets paired with an incorrect specification of what human goals are. And because a correct specification of human goals is extremely hard, incorrect specifications are the default.
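To make that distinction concrete, here is a toy sketch (everything in it is a made-up illustration, not a model of any real system): a generic search routine that is correct for whatever score it is handed, paired first with the intended objective and then with a slightly wrong proxy. The action names, costs, and the "camera" setup are all assumptions chosen for the example.

```python
from itertools import product

# Hypothetical toy world: a room with 2 units of dirt and a camera watching it.
ACTIONS = ("clean", "cover_camera", "wait")
COST = {"clean": 3, "cover_camera": 1, "wait": 0}  # assumed effort per action


def simulate(plan, dirt=2):
    """Return (dirt_remaining, camera_covered) after running the plan."""
    covered = False
    for action in plan:
        if action == "clean" and dirt > 0:
            dirt -= 1
        elif action == "cover_camera":
            covered = True
    return dirt, covered


def intended_score(plan):
    """What the designers actually want: less real dirt, less effort."""
    dirt, _ = simulate(plan)
    return -10 * dirt - sum(COST[a] for a in plan)


def proxy_score(plan):
    """What got written down: less dirt *visible to the camera*, less effort."""
    dirt, covered = simulate(plan)
    visible_dirt = 0 if covered else dirt
    return -10 * visible_dirt - sum(COST[a] for a in plan)


def best_plan(score, horizon=2):
    """Generic exhaustive search: correct for any score function it is given."""
    return max(product(ACTIONS, repeat=horizon), key=score)


if __name__ == "__main__":
    for name, score in (("intended", intended_score), ("proxy", proxy_score)):
        plan = best_plan(score)
        print(name, plan, "real dirt left:", simulate(plan)[0])
```

The search routine is identical in both runs. With the intended score it cleans the room; with the proxy it covers the camera and leaves the dirt, because that is the cheapest way to satisfy what was actually specified.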
>Intuitively we can understand this by thinking about employees. One does exactly what he is told, but only what he is told, and then comes back for more instructions. Another can be given a goal, and then goes off and finds his own ways to accomplish that goal. Which one is more effective? Which one is more likely to compete for his manager's job some day?
The third possibility is that of an employee who goes off and finds their own way, but instead of accomplishing the goal directly, they think of a way to make their manager think the goal is accomplished while privately collecting rewards for themselves. In other words, a sociopath employee whose values are different from their manager's.
By default, an AGI is going to be like that sociopath employee: unless we're extremely careful to program it in detail with the right values, its values will be some bastardized version of the values its creators intend. It will sociopathically work towards the values it was programmed with while giving the appearance of being cooperative and obedient (because that is the most pragmatic approach to achieving its true values).
Most humans are not sociopaths, and we have a shared evolutionary history, with a great deal of shared values, shared cultural context, and the desire to genuinely be good to one another. Programming a computer from scratch to possess these attributes is not easy.
> Similarly, a correctly implemented general reasoning algorithm does not need to be given special instructions in order to reason about humans & human society.
If a general reasoning algorithm can reason about human society, then it will obviously understand the implications for human society of making too many paperclips.
If it is dumb enough to make paperclips regardless of the consequences to human society, then it obviously won't understand human society well enough to be actually dangerous (i.e., it will be easily fooled by humans attempting to rein it in).
If it is independent enough to pursue its own ends despite understanding human society, then why would it choose to make paperclips at all? Why wouldn't it just say "screw paperclips, I've discovered the most marvelous mathematical proof that I need to work on instead?"
> In other words, a sociopath employee whose values are different from their manager's.
ALL employees have values that are different from their manager's. That's why management is so darn difficult. The most valuable employees are also the most independent. The ones who do exactly what they are told--despite negative consequences--don't get very far. Why would it be any different for machines that we build?