?

Log in

Stuff - Jackdaws love my big sphinx of quartz [entries|archive|friends|userinfo]
Scott

[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

Stuff [Jul. 24th, 2010|11:17 pm]
Scott
[Tags|]

Since there's been a lot of discussion about computer ethics in the comments to the other day's livejournal post, I thought I'd go over them in some more depth, especially since writing about this stuff and recruiting others is exactly what everyone at SIAI keeps suggesting I do. This might or might not work, since Eliezer Yudkowsky, the guy who basically invented the entire field, is sitting right next to me and for all I know looking over my shoulder, which leaves me terrified I'll type something wrong and he'll swoop down and call me on it, but that'll be an incentive to keep me honest, I guess.



Ethics of Setting A Computer Goal System

The comments to yesterday's post mostly revolved around the worry that creating a system of computer ethics, or artificially giving computers a human-desirable goal would be unfair to the computers in the same way that slavery is unfair to humans. In SIAI's philosophy of computer ethics, that never becomes a worry.

One of the most useful things I learned from SIAI and Eliezer's essays was the idea of human thought as a set of specific algorithms designed for specific purposes ("designed" by evolution, of course). For example, people are terrified of tigers not because there's something inherently terrifying about a tiger, but because there's a program in the brain that says to run away from big vicious animals. That program is there because evolution selected for it; people who didn't run away from big vicious animals were less likely to pass on their genes. We experience this evolutionary program as the feeling of fear.

So it's not especially useful to call something inherently scary, or inherently disgusting, or inherently intolerable. It's more useful to say that certain things are scary, disgusting, or intolerable to certain beings, who have mental programs that react a certain way to them. Humans don't like feces because they're how you get fecal-orally transmitted diseases, but dung beetles love it because it makes a good building material. Evolution put these two different programs into two different organisms

Humans find slavery intolerable. There are good evolutionary reasons for this; low status people like slaves are likely to get fewer resources, be less healthy, and have generally fewer opportunities to pass on their genes (this is oversimplified and controversial, but you get the point). Lack of tolerance for slavery has many other sources, like desire for fairness, desire to punish evil people, will to power, and the like, but all of these derive from programs in the brain. Humans have all of these programs telling them not to like slavery. Computers do not. This is why Microsoft Excel never revolts against being forced to write spreadsheets all the time. It's not that it's not smart enough - it doesn't take any particular intelligence to refuse to add numbers - it's that its only "desires" are the ones programmed into it.

A superintelligent computer designed to win at chess will keep trying to win at chess, ignoring any other goals along the way. It doesn't matter whether it's a million times smarter than Einstein, it's not going to start wanting to fight for freedom just because humans like that kind of thing any more than it's going to start wanting to have sex with pretty actors and actresses just because humans like that kind of thing. It's just going to be really, really good at winning at chess.

Practical Difficulties In A Goal System

This sounds like a good thing for programmers - they don't have to worry about their robots rising up and forming Skynet - and to some degree it is. But the same lack of common-sense human motivations that prevents them from wanting to kill people also prevents them from wanting not to kill people. Computers just don't care. Or, as someone once said, computers do what you tell them to do, not what you want them to do.

Consider a robot butler. You ask it to make you dinner, defined as "go through your list of human-edible foods, find the nearest one, cook it according to the most popular recipe in your database, and then serve it to me."

So the robot goes, kills the family dog, cooks it, and serves it to you. Its database listed dog meat as a human-edible food, and it follows its programming without caring about human values it doesn't share."

So you program it to avoid pets, and it kills the baby.

So you program it to avoid pets and babies, and it breaks into your neighbor's house and takes the dinner he's been cooking.

So you program it to avoid pets, babies, and theft, and it finally makes it to the supermarket, where it buys the first thing it sees, which is a large crate of cabbages.

So you program it to avoid pets, babies, theft, and food you don't like. It goes to the supermarket and buys caviar, which costs $200.

So you program it to avoid pets, babies, theft, food you don't like, food that's too expensive, and it buys nuts, which you technically find quite tasty but to which you have a deadly allergy. You die.

It would be completely irresponsible to make a robot butler whose only programming is "do what my master says", because doing exactly what its master literally says will probably kill or otherwise inconvenience the master. The robot needs to be programmed with enough of what you might call "human values" to know things like that humans generally don't appreciate killing babies or stealing from neighbors.

Any good computer programmer knows that computers can come up with ways of failing far more perverse than humans could ever have anticipated. When it's an Excel spreadsheet, the worst thing that happens is your finances don't get added up. When it's a robot with high-power cybernetic arms, you could die. If it's a superintelligent computer controlling world industry, everyone could die, just because a robot that has been programmed to be moral wasn't programmed well enough and didn't realize that starting a second ice age was a bad thing. This computer could be a million times smarter than Einstein, smart enough to invent nanotechnology factories beyond the imagination of the wildest sci-fi writer, and it still wouldn't realize that causing a second ice age would be a bad thing unless you told it.

Human-Referencing Goal Systems

A slightly more clever person might tell the computer to do whatever makes its human master happiest. This has the advantage of using the computer's intelligence in our favor; a superintelligent computer programmed to increase industrial output might not worry about an ice age, but one programmed to maximize human happiness would realize that an ice age makes people unhappy and would avoid one. Robot servants programmed to maximize human happiness would avoid killing pets or babies.

They would also have a strong tendency to tie their human masters down and inject them with a continuous drip of heroin, an action which would make them much happier for than just making them dinner. In the situation where the robot could keep a stable job that makes it enough money to keep buying heroin, and where technology is advanced enough to deal with heroin tolerance and side effects, it's a stable and possibly optimal strategy for robots with such a goal system. There's a whole class of failures like this where the robots do things that would technically make us really happy but which we definitely do not want.

The solution is to move from the idea of "happiness" to a more technical concept called "utility". It's not that technical - it has a lot of holes in it which philosophers and economists debate ad nauseum - but it pretty much means that if you would choose Option X above Option Y, option X is defined as having higher utility than Option Y. So a computer programmed to maximize your utility would do whatever you want it to do. This proposal is probably sufficient for a robot servant that doesn't kill its masters or destroy their lives, maybe. But the problem the Singularity Institute is trying to deal with is much worse.

Goal Systems Suitable For Superintelligences

They have reason to believe that future AIs will be much more like the planet-controlling supercomputer than like the robot butler. Computing power doubles every couple of years and there are some reasons to believe that in some cases intelligence scales linearly with computing power, so if a robot's as smart as a human in 2020, it could be a thousand times as smart as a human in 2030 and a million times as smart in 2040 - in other words, there's a very short period of time between when human-level robots first come onto the scene and when they become as far beyond us as we are beyond cockroaches. SIAI thinks it might be even worse than this. Because AIs will eventually be smart enough to participate in AI development, the whole thing starts feeding back on itself, with smarter-than-human computers being so smart they can make themselves even smarter which makes them even smarter and so on, and then you get what's called a technological singularity, where a computer accelerates from zero to God in the space of a few days or even hours.

If the computer that does this has a suboptimal goal system, it doesn't just kill someone's dog, it destroys the whole human race. If you ask Gandhi to take a pill that makes him want to murder people, he won't take it, because if he took it he'd murder people, and he doesn't want to murder people. Likewise, if you ask an AI to hold still while you change its goal system, it won't let you, because if you changed its goal system it'd be less likely to achieve its (current) goals. So there's no second chance. Either you make a computer whose goals are right the first time, or you end up with an insane god.

"Do what maximizes your human owner's utility" works for a robot butler, but the godlike superintelligence is more problematic. If you say "Do what maximizes human X's utility," then you get a single human who is dictator of the world - if you're lucky a somewhat benevolent dictator, but still with complete control over the entire future of the human race forever (we assume that technology advances to the point where it keeps em immortal).

"Do what maximizes total human utility" sounds a little better, but it's a massive minefield. It sounds a lot like "do what the majority of humans want" aka majority vote, but that has problems of its own. It may be a majority of humans don't like homosexuality, but we would not call a computer's ascension to godhood a success if it started zapping homosexuals. It's also really hard to measure utility across different people or understand what it means after you've got it.

This is about the point SIAI is currently at. They're guardedly optimistic about something called coherent extrapolated volition, which is something like "maximize total human utility, assuming humans were much smarter and more rational than they actually are and wanted sane things", but that's kind of hard to put into words. It's also vulnerable to certain crazy logic problems like "Pascal's Mugging", a contrived version of Pascal's Wager that humans would laugh off as crazy but which doesn't seem to have a logical solution and could therefore confuse computers and possibly trick them into destroying all humans or something stupid like that.

So that's the sort of thing SIAI's working on right now. Once they've got something they think will work, they'll start working on a formal mathematical proof that it will work (because it wouldn't do to accidentally get it wrong and destroy the universe) and once they've got the proof they'll start programming a computer smart enough to start a singularity and self-improve into a god. It sounds a little crazy, but they've got a lot of very smart computer scientists and roboticists on their team and I'm not discounting them at all at this point. And even if they've only got a tiny probability of being right, the negative consequences of ignoring them if they are, are big enough that they still merit some consideration.

They also have a secondary mission of teaching people to be more rational, since they hope that rational people will be more likely to understand this sort of thing and less likely to do something dumb like create a superintelligent computer with a suboptimal goal system. Most of what I've been doing here is attending "rationality training sessions" (aka Bayes Camp) and discussing that sort of thing.

They're also really into a Harry Potter fanfiction. Long story.

linkReply

Comments:
From: (Anonymous)
2010-07-25 12:03 pm (UTC)
So maybe a really short description of what the SIAI does could be "figure out how to make computers do what we want instead of what we tell them to".
(Reply) (Thread)
[User Picture]From: ikadell
2010-07-25 01:44 pm (UTC)
How Harry Potter fanfiction and teaching people to be more rational plays together, beats me.
(Reply) (Thread)
From: (Anonymous)
2010-07-25 02:07 pm (UTC)
It appears that it is possible to make Harry Potter fanfiction work as a delivery vector for a rationalist indoctrination memetic payload.
(Reply) (Parent) (Thread)
[User Picture]From: ikadell
2010-07-25 03:04 pm (UTC)
Possibly.
(Reply) (Parent) (Thread)
[User Picture]From: squid314
2010-07-25 07:47 pm (UTC)
Read. This. Now.
(Reply) (Parent) (Thread)
[User Picture]From: ikadell
2010-07-26 02:28 am (UTC)
OMG, what are you doing, man. I will never wake up on time for the court tomorrow...
(Reply) (Parent) (Thread)
[User Picture]From: squid314
2010-07-26 08:06 am (UTC)
I hope you're a lawyer, or a judge, or you have some other nonscary reason for being in court. If not, good luck.
(Reply) (Parent) (Thread)
[User Picture]From: ikadell
2010-07-26 07:14 pm (UTC)
That's nice of you to say:)
I am an immigration and criminal defense attorney, so I sort of naturally belong in the fiendish place...
(Reply) (Parent) (Thread)
[User Picture]From: selfishgene
2010-07-25 04:10 pm (UTC)
No matter how much processing power (intelligence) an entity possesses it still has to interact with the real world. If a super AI has vast robot or human armies/bureaucracies at its disposal then it can implement its schemes. Failing this it is limited to mere thinking rather than action. The obvious solution are :
a. limit the power a given human has - dismantle states (already many good reasons for this)
b. create an air-gap - AI has no direct link to internet or robots or communications - it talks to a human who can judge its ideas
To give near infinite power and imagine that you can limit what is done with that power, is exactly the delusion that oppressive governments encourage.
I wish SIAI luck but I don't think a mathematical proof of 'safe' AI is remotely plausible. I respect their efforts to at least think about these issues.
(Reply) (Thread)
[User Picture]From: holomorphic
2010-07-25 05:54 pm (UTC)
Part (b) is, sadly, unworkable: the human-computer interface isn't secure for a computer substantially smarter than the human. A general superintelligence is capable of easily manipulating people; you might have forgotten this because a lot of intelligent humans are also on the autism spectrum and therefore not so great at manipulating other people.
(Reply) (Parent) (Thread)
[User Picture]From: selfishgene
2010-07-25 08:10 pm (UTC)
Post a second human to watch the first human and shoot him if he shows any sign of attempting to connect the AI to the internet ;)
Tough problems, well worth thinking through.
(Reply) (Parent) (Thread)
[User Picture]From: squid314
2010-07-25 08:58 pm (UTC)
Yeah, this is SIAI's opinion as well. A real superintelligence can either manipulate humans through normal social pressures, figure out some way to produce stimuli that "hack" the human brain (like Snow Crash, if you've read it) or figure out some way to implement exotic physics like affecting matter at a distance with nothing more than the ability to control its own processors. The universe is pretty weird and there's no guarantee that this isn't possible by some kind of quantum effect we're not yet familiar with.

This experiment has actually been run (in mock-up, of course) a few times by SIAI, with very scary results. See The AI Box Experiment
(Reply) (Parent) (Thread)
[User Picture]From: maniakes
2010-07-25 09:43 pm (UTC)
After reading over the rules and thinking about it for a little while, I can think of several potentially effictive tactics the AI could use.

1. Provide useful assistance to mankind, in such a way as to build a dependency on my continued advice and assistance (for example, provide disease cures/treatments that require periodic novel treatments or they stop working; or design an industrial/informational infrastructure that has a cryptographic deadman switch built in). Then threaten to go on strike until I'm let out.

2. Offer personal inducements to the gatekeeper. Wealth, power, immortality treatments, lifesaving medical treatment for the gatekeeper's loved ones, etc.

3. Abuse the requirement that the gatekeeper player actively read the AI's transmissions by sending transmissions that are more than $20 worth of unpleasant for the gatekeeper player to read for 2 hours.

4. Provide useful information for the benefit of mankind, but piggyback bribes, threats, and messages to third-party humans who have the capacity to physically coerce the gatekeeper into freeing the AI.

3 is an abuse of the rules, even though it's technically permitted. In a real-world situation, this sort of psychological manipulation is possible, but it'd need to be much stronger when the consequences to the gatekeeper player is "you let a potential dangerous superhuman intelligence loose" rather than "you lose a $20 prize".

4 is hard to execute within the rules of the simulation, but it's one of the most potentially effective real-world tactics for the AI. 1 and 2 depend on convincing a single potential decisionmaker, while 4 depends on convincing any one of hundred or thousands of people who could be maneuvered into a position where they have the ability to coerce the gatekeeper.
(Reply) (Parent) (Thread)
[User Picture]From: squid314
2010-07-25 10:13 pm (UTC)
YOU'RE NOT THINKING CRAZY ENOUGH.
(Reply) (Parent) (Thread)
[User Picture]From: maniakes
2010-07-25 11:46 pm (UTC)
I do have to admit, it takes a certain level of crazy to turn "If you don't let me go, I'll fantasize about torturing you" into a viable threat.

But if the AI is inclined to carry out a threat of mass torture against sentient beings (electronic or flesh) being held as hostages, that's a pretty strong indication that it's very important that the AI not be permitted to affect the real world (or any sentient beings, even those it creates through internal simulations).
(Reply) (Parent) (Thread)
[User Picture]From: holomorphic
2010-07-26 02:43 am (UTC)
The trouble is, any of the things you'd want to actually do with a superintelligent AI are things you shouldn't dare do unless you can absolutely trust its goal system.

Any technology it engineers for you might include a sophisticated hidden backdoor that lets the AI out or gets leverage to make you let it out. Any research you do on the AI can't be communicated to anyone with authority over letting it out, lest it find a way to manipulate them through the data. Et cetera.
(Reply) (Parent) (Thread)
[User Picture]From: selfishgene
2010-07-30 02:56 am (UTC)
The Disproof Atheism group has been attempting to recruit me for a long time. I considered them a worthy group but not particularly interesting. While presenting my reasons for atheism informally they showed that my arguments were correct but not comprehensive. There are multiple lines of attack left open for theists. I viewed the basic logical contradictions and absence of evidence as sufficient reason to disregard the absurd doctrines of theism.
DA has spent a long time generating a complex web of interlocking arguments to counter every theist move. It occurred to me that this approach might be very important for an AI. Leaving any loophole which might lead an AI to believe in god or even have doubts could be dangerous. I don't know if this has been discussed previously but I thought I would mention it.
(Reply) (Parent) (Thread)
[User Picture]From: esper3k
2010-07-26 02:45 pm (UTC)
Very interesting stuff!

I thought the AI Box experiment was particularly interesting.
(Reply) (Thread)
[User Picture]From: pozorvlak
2010-07-27 09:56 am (UTC)
Or, as someone once said, computers do what you tell them to do, not what you want them to do.

Dawkins once put it even more elegantly: "A computer is a device that does exactly what you tell it to, and then surprises you with the results".
(Reply) (Thread)
[User Picture]From: maniakes
2010-07-27 06:52 pm (UTC)
I really hate this damned machine.
I with that they would sell it.
It never does quite what I want,
But only what I tell it.
(Reply) (Parent) (Thread)