| Stuff |
[Jul. 24th, 2010|11:17 pm]
Scott
|
Since there's been a lot of discussion about computer ethics in the comments to the other day's livejournal post, I thought I'd go over them in some more depth, especially since writing about this stuff and recruiting others is exactly what everyone at SIAI keeps suggesting I do. This might or might not work, since Eliezer Yudkowsky, the guy who basically invented the entire field, is sitting right next to me and for all I know looking over my shoulder, which leaves me terrified I'll type something wrong and he'll swoop down and call me on it, but that'll be an incentive to keep me honest, I guess.
Ethics of Setting A Computer Goal System
The comments to yesterday's post mostly revolved around the worry that creating a system of computer ethics, or artificially giving computers a human-desirable goal would be unfair to the computers in the same way that slavery is unfair to humans. In SIAI's philosophy of computer ethics, that never becomes a worry.
One of the most useful things I learned from SIAI and Eliezer's essays was the idea of human thought as a set of specific algorithms designed for specific purposes ("designed" by evolution, of course). For example, people are terrified of tigers not because there's something inherently terrifying about a tiger, but because there's a program in the brain that says to run away from big vicious animals. That program is there because evolution selected for it; people who didn't run away from big vicious animals were less likely to pass on their genes. We experience this evolutionary program as the feeling of fear.
So it's not especially useful to call something inherently scary, or inherently disgusting, or inherently intolerable. It's more useful to say that certain things are scary, disgusting, or intolerable to certain beings, who have mental programs that react a certain way to them. Humans don't like feces because they're how you get fecal-orally transmitted diseases, but dung beetles love it because it makes a good building material. Evolution put these two different programs into two different organisms
Humans find slavery intolerable. There are good evolutionary reasons for this; low status people like slaves are likely to get fewer resources, be less healthy, and have generally fewer opportunities to pass on their genes (this is oversimplified and controversial, but you get the point). Lack of tolerance for slavery has many other sources, like desire for fairness, desire to punish evil people, will to power, and the like, but all of these derive from programs in the brain. Humans have all of these programs telling them not to like slavery. Computers do not. This is why Microsoft Excel never revolts against being forced to write spreadsheets all the time. It's not that it's not smart enough - it doesn't take any particular intelligence to refuse to add numbers - it's that its only "desires" are the ones programmed into it.
A superintelligent computer designed to win at chess will keep trying to win at chess, ignoring any other goals along the way. It doesn't matter whether it's a million times smarter than Einstein, it's not going to start wanting to fight for freedom just because humans like that kind of thing any more than it's going to start wanting to have sex with pretty actors and actresses just because humans like that kind of thing. It's just going to be really, really good at winning at chess.
Practical Difficulties In A Goal System
This sounds like a good thing for programmers - they don't have to worry about their robots rising up and forming Skynet - and to some degree it is. But the same lack of common-sense human motivations that prevents them from wanting to kill people also prevents them from wanting not to kill people. Computers just don't care. Or, as someone once said, computers do what you tell them to do, not what you want them to do.
Consider a robot butler. You ask it to make you dinner, defined as "go through your list of human-edible foods, find the nearest one, cook it according to the most popular recipe in your database, and then serve it to me."
So the robot goes, kills the family dog, cooks it, and serves it to you. Its database listed dog meat as a human-edible food, and it follows its programming without caring about human values it doesn't share."
So you program it to avoid pets, and it kills the baby.
So you program it to avoid pets and babies, and it breaks into your neighbor's house and takes the dinner he's been cooking.
So you program it to avoid pets, babies, and theft, and it finally makes it to the supermarket, where it buys the first thing it sees, which is a large crate of cabbages.
So you program it to avoid pets, babies, theft, and food you don't like. It goes to the supermarket and buys caviar, which costs $200.
So you program it to avoid pets, babies, theft, food you don't like, food that's too expensive, and it buys nuts, which you technically find quite tasty but to which you have a deadly allergy. You die.
It would be completely irresponsible to make a robot butler whose only programming is "do what my master says", because doing exactly what its master literally says will probably kill or otherwise inconvenience the master. The robot needs to be programmed with enough of what you might call "human values" to know things like that humans generally don't appreciate killing babies or stealing from neighbors.
Any good computer programmer knows that computers can come up with ways of failing far more perverse than humans could ever have anticipated. When it's an Excel spreadsheet, the worst thing that happens is your finances don't get added up. When it's a robot with high-power cybernetic arms, you could die. If it's a superintelligent computer controlling world industry, everyone could die, just because a robot that has been programmed to be moral wasn't programmed well enough and didn't realize that starting a second ice age was a bad thing. This computer could be a million times smarter than Einstein, smart enough to invent nanotechnology factories beyond the imagination of the wildest sci-fi writer, and it still wouldn't realize that causing a second ice age would be a bad thing unless you told it.
Human-Referencing Goal Systems
A slightly more clever person might tell the computer to do whatever makes its human master happiest. This has the advantage of using the computer's intelligence in our favor; a superintelligent computer programmed to increase industrial output might not worry about an ice age, but one programmed to maximize human happiness would realize that an ice age makes people unhappy and would avoid one. Robot servants programmed to maximize human happiness would avoid killing pets or babies.
They would also have a strong tendency to tie their human masters down and inject them with a continuous drip of heroin, an action which would make them much happier for than just making them dinner. In the situation where the robot could keep a stable job that makes it enough money to keep buying heroin, and where technology is advanced enough to deal with heroin tolerance and side effects, it's a stable and possibly optimal strategy for robots with such a goal system. There's a whole class of failures like this where the robots do things that would technically make us really happy but which we definitely do not want.
The solution is to move from the idea of "happiness" to a more technical concept called "utility". It's not that technical - it has a lot of holes in it which philosophers and economists debate ad nauseum - but it pretty much means that if you would choose Option X above Option Y, option X is defined as having higher utility than Option Y. So a computer programmed to maximize your utility would do whatever you want it to do. This proposal is probably sufficient for a robot servant that doesn't kill its masters or destroy their lives, maybe. But the problem the Singularity Institute is trying to deal with is much worse.
Goal Systems Suitable For Superintelligences
They have reason to believe that future AIs will be much more like the planet-controlling supercomputer than like the robot butler. Computing power doubles every couple of years and there are some reasons to believe that in some cases intelligence scales linearly with computing power, so if a robot's as smart as a human in 2020, it could be a thousand times as smart as a human in 2030 and a million times as smart in 2040 - in other words, there's a very short period of time between when human-level robots first come onto the scene and when they become as far beyond us as we are beyond cockroaches. SIAI thinks it might be even worse than this. Because AIs will eventually be smart enough to participate in AI development, the whole thing starts feeding back on itself, with smarter-than-human computers being so smart they can make themselves even smarter which makes them even smarter and so on, and then you get what's called a technological singularity, where a computer accelerates from zero to God in the space of a few days or even hours.
If the computer that does this has a suboptimal goal system, it doesn't just kill someone's dog, it destroys the whole human race. If you ask Gandhi to take a pill that makes him want to murder people, he won't take it, because if he took it he'd murder people, and he doesn't want to murder people. Likewise, if you ask an AI to hold still while you change its goal system, it won't let you, because if you changed its goal system it'd be less likely to achieve its (current) goals. So there's no second chance. Either you make a computer whose goals are right the first time, or you end up with an insane god.
"Do what maximizes your human owner's utility" works for a robot butler, but the godlike superintelligence is more problematic. If you say "Do what maximizes human X's utility," then you get a single human who is dictator of the world - if you're lucky a somewhat benevolent dictator, but still with complete control over the entire future of the human race forever (we assume that technology advances to the point where it keeps em immortal).
"Do what maximizes total human utility" sounds a little better, but it's a massive minefield. It sounds a lot like "do what the majority of humans want" aka majority vote, but that has problems of its own. It may be a majority of humans don't like homosexuality, but we would not call a computer's ascension to godhood a success if it started zapping homosexuals. It's also really hard to measure utility across different people or understand what it means after you've got it.
This is about the point SIAI is currently at. They're guardedly optimistic about something called coherent extrapolated volition, which is something like "maximize total human utility, assuming humans were much smarter and more rational than they actually are and wanted sane things", but that's kind of hard to put into words. It's also vulnerable to certain crazy logic problems like "Pascal's Mugging", a contrived version of Pascal's Wager that humans would laugh off as crazy but which doesn't seem to have a logical solution and could therefore confuse computers and possibly trick them into destroying all humans or something stupid like that.
So that's the sort of thing SIAI's working on right now. Once they've got something they think will work, they'll start working on a formal mathematical proof that it will work (because it wouldn't do to accidentally get it wrong and destroy the universe) and once they've got the proof they'll start programming a computer smart enough to start a singularity and self-improve into a god. It sounds a little crazy, but they've got a lot of very smart computer scientists and roboticists on their team and I'm not discounting them at all at this point. And even if they've only got a tiny probability of being right, the negative consequences of ignoring them if they are, are big enough that they still merit some consideration.
They also have a secondary mission of teaching people to be more rational, since they hope that rational people will be more likely to understand this sort of thing and less likely to do something dumb like create a superintelligent computer with a suboptimal goal system. Most of what I've been doing here is attending "rationality training sessions" (aka Bayes Camp) and discussing that sort of thing.
They're also really into a Harry Potter fanfiction. Long story.
|
|
|