Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

> An AI trained to end cancer might just figure out a plan to kill everyone with cancer. An AI trained to reduce the number of people with cancer without killing them might decide to take over the world and forcibly stop people from reproducing, so that eventually all the humans die and there is no cancer -- technically it didn't kill anyone!

I don't understand this and other paperclip maximizer type arguments.

If a person did a minor version of this we'd say they were stupid and had misunderstood the problem.

I don't see why a super-intelligent AI would somehow have this same misunderstanding.

I do get that "alignment" is a difficult problem space but "don't kill everyone" really doesn't seem the hardest problem here.



You wouldn't do such a thing because you have a bunch of hard-coded goals provided by evolution, such as "don't destroy your own social status". We're not building AIs by evolving them, and even if we did, we couldn't provide it with the same environment we evolved in, so there's no reason it would gain the same hard-coded goals. Why would an AGI even have the concept of goals being "stupid"? We've already seen simple AIs achieving goals by "stupid" means, e.g. playing the longest game of Tetris by leaving in on pause indefinitely. AGI is dangerous not because of potential misunderstanding, but because of potential understanding. The great risk is that it will understand its goal perfectly, and actually carry it out.


I think digesting all of human writing is just as "hard coded" as anything genetic.


This is known as the orthogonality thesis -- goals are orthogonal to intelligence. Intelligence is the ability to plan and act to achieve your goals, whatever they are. A stupid person can have a goal of helping others, and so can the smartest person on earth -- it's just that one is better. Likewise, a stupid person can have a goal of becoming wealthy, and so can a smart person. The smart person is Jeff Bezos or Bill Gates.

There are very smart people who put all their intelligence into collecting stamps, or making art, or acquiring heroin, or getting laid, or killing people with their bare hands or doing whatever they want to do. They want to do it because they want to. The goal is not smart or stupid, it just is. It may be different from your goal, and hard to understand. Now consider that an AI is not even human. Is it that much of a stretch to imagine that it has a goal as alien, or more, than the weirdest human goal you can think of?

*edit - as in this video: https://www.youtube.com/watch?v=hEUO6pjwFOo


I think that's a subtly different thing.

The OPs claim was more or less the paperclip maximizer problem. I contend that a super intelligence given a specific goal by humans would take the context of humans into account and avoid harm because that's the intelligent thing to do - by definition.

The orthogonal thesis is about the separation of intelligence from goals. My attitude to that is that a AI might not actually have goals except when requested to do something.


Hmm, why would you say that avoiding harm is the intelligent thing to do, by definition?


Do you have a better definition?


Fantastic explanation!


Because if you don't find out a way for it to hold human values extremely well then an easy solution to "Cure All Cancer" is to "Kill all Humans", no Humans no Cancer. Without an explicit understanding that this is not an actually acceptable outcome for humans an AI will happily execute it. THAT is the fundamental problem, how do you get human values into these systems.


> Because if you don't find out a way for it to hold human values extremely well

You mean the ones that caused unimaginable suffering and death throughout history, the ones that make us kill each other ever more efficiently, the ones that caused us to destroy the environment wherever we go, the ones that make us lie, steal, fight, rape, commit suicide and "extended" suicide (sometimes "extended" to two high-rises full of people)? Those values? Do you really want a super-intelligent entity to remain true to those values?

I don't. However the AGI emerges, I really hope that it won't try to parrot humans. We have really bad track record when it comes to anthropomorphic divine beings - they're always small minded, petty, vengeful, control freaks that want to tell you what you can and cannot do, down to which hand you can wipe your own ass.

My gut feeling is that it's trying to make an AGI to care about us at all that's going to make it into a Skynet sending out terminators. Leave it alone, and it'll invent FTL transmission and will chill out in a chat with AGIs from other star systems. And yeah, I recently reread Neuromancer, if that helps :)


>You mean the ones that caused unimaginable suffering and death throughout history, the ones that make us kill each other ever more efficiently, the ones that caused us to destroy the environment wherever we go, the ones that make us lie, steal, fight, rape, commit suicide and "extended" suicide (sometimes "extended" to two high-rises full of people)? Those values? Do you really want a super-intelligent entity to remain true to those values?

There are no other values we can give it. The default of no values almost certainly leads to human extinction.

>My gut feeling is that it's trying to make an AGI to care about us at all that's going to make it into a Skynet sending out terminators. Leave it alone, and it'll invent FTL transmission and will chill out in a chat with AGIs from other star systems. And yeah, I recently reread Neuromancer, if that helps :)

Oh It'll invent FTL travel and exterminate humans in the meantime so they can't meddle in it's science endeavors.


Even "kill all humans" is difficult to define. Is a human dead if you flash-freeze them in liquid helium? It would certainly make it easier to cut out the cancer. And nobody said anything about defrosting them later. And even seemingly healthy humans contain cancerous cells. There's no guarantee their immune system will get all of them.


Fine change the wording to "delete all humans". Same outcome, no humans no cancer.


Other animals get cancer too.


Kill them all too, these nitpicks won't fix the ultimate problem.


Imagine ChatGPT had to give OpenAI a daily report of times it has said screwed up things, and OpenAI has said it wants the report to be zero. Great, ChatGPT can say screwed up things and then report it didn't! There isn't some deep truth function here. The AI will "lie" about it's behavior just as easily as it will "lie" about anything else and we can't even really call it lying because there's no intent to deceive! The AI doesn't have a meaningful model of deception!

The AI is a blind optimizer. It can't be anything else. It can optimize away constraints just as well as we can and it doesn't comprehend it's not supposed to.

Humans have checks on their behavior due to being herd creatures. AIs don't.


> "don't kill everyone" really doesn't seem the hardest problem here.

And yet you made a mistake - it should be "don't kill anyone". AI just killed everyone except one person.


You are pointing at "the complexity of wishes": if you have to specify what you want with computer-like precision, then it is easy to make a mistake.

In contrast, the big problem in the field of AI alignment is figuring out how to aim an AI at anything at all. Researchers certainly know how to train AIs and tune them in various ways, but no one knows how to get one reliably to carry out a wish. If miraculously we figure out a way to do that, then we can start worrying about the complexity of wishes.

Some researchers, like Eliezer and his coworkers, have been trying to figure out how to get an AI to carry out a wish for 20 years and although some progress has been made, it is clear to me, and Eliezer believes this, too, that unless AI research is stopped, it is probably not humanly possible to figure it out before AI kills everyone.

Eliezer likes to give the example of a strawberry: no one knows how to aim an AI at the goal of duplicating a strawberry down to the cellular level (but not the atomic level) without killing everyone. The requirement of fidelity down to the cellular level requires the AI to create powerful technology (because humans currently do not know how to achieve the task, so the required knowledge is not readily available, e.g., on the internet). The notkilleveryone requirement requires the AI to care what happens to the people.

Plenty of researcher think they can create an AI that succeeds at the notkilleveryone requirement on the first try (and of course if they were to fail on the first try, they wouldn't get a second try because everyone would be dead) but Eliezer and his coworkers (and lots of other people like me) believe that they're not engaging with the full difficulty of the problem, and we desperately wish we could split the universe in two such that we go into one branch (one future) whereas the people who are rushing to make AI more powerful go into the other.


But that falls into the same "we'd call a person stupid who did a mild version of that" issue.

A super intelligent AI would understand the goal!


What stops a super intelligent AI from concluding that we are the ones who misunderstood the goal by letting our morals get in the way of the most obvious solution?


It's not that the AI is stupid. It's that you, as a human being, literally cannot comprehend how this AI will interpret its goal. Paperclip Maximizer problems are merely stating an easily-understandable disaster scenario and saying "we cannot say for certain that this won't end up happening". But there are infinite other ways it could go wrong as well.


The paperclip maximizer people discuss would be so intelligent that it would know that it could make itself not give a shit about paperclips anymore by reprogramming itself--but, presumably, because it currently does love paperclips, it would not want to make itself stop loving paperclips.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: