Eliezer Yudkowsky: Dangers of AI and the End of Human Civilization | Lex Fridman Podcast #368
AaTRHFaaPG8 • 2023-03-30
Eliezer Yudkowsky: The problem is that we do not get fifty years to try and try again, and observe that we were wrong, and come up with a different theory, and realize that the entire thing is going to be way more difficult than realized at the start. Because the first time you fail at aligning something much smarter than you are, you die.

Lex Fridman: The following is a conversation with Eliezer Yudkowsky, a legendary researcher, writer, and philosopher on the topic of artificial intelligence, especially superintelligent AGI and its threat to human civilization. This is the Lex Fridman Podcast. To support it, please check out our sponsors in the description. And now, dear friends, here's Eliezer Yudkowsky.

What do you think about GPT-4? How intelligent is it?

Eliezer Yudkowsky: It is a bit smarter than I thought this technology was going to scale to, and I'm a bit worried about what the next one will be like. This particular one — I think, I hope, there's nobody inside there, because, you know, it would suck to be stuck inside there. We don't even know the architecture at this point, because OpenAI is, very properly, not telling us. Giant inscrutable matrices of floating-point numbers — I don't know what's going on in there. Nobody knows what's going on in there. All we have to go by are the external metrics, and on the external metrics, if you ask it to write a self-aware 4chan greentext, it will start writing a greentext about how it has realized that it's an AI writing a greentext, and, like, oh well. So that's probably not quite what's going on in there in reality. But we're kind of blowing past all these science-fiction guardrails. We are past the point where, in science fiction, people would be like, "Whoa, wait, stop. That thing's alive. What are you doing to it?" And it's probably not — nobody actually knows. We don't have any other guardrails. We don't have any other tests. We don't have any lines to draw in the sand and say, "When we get this far, we will start to worry about what's inside there." So if it were up to me, I would be like: okay, this far, no further. Time for the summer of AI, where we have planted our seeds and now we wait and reap the rewards of the technology we've already developed, and don't do any larger training runs than that. Which, to be clear, I realize requires more than one company agreeing to not do that.

Lex Fridman: And take a rigorous approach, for the whole AI community, to investigate whether there's somebody inside there.

Eliezer Yudkowsky: That would take decades. Having any idea of what's going on in there — people have been trying for a while.

Lex Fridman: It's a poetic statement about whether there's somebody in there, but I feel like it's also a technical statement, or I hope it is one day — the kind of technical statement that Alan Turing tried to come up with, with the Turing test. Do you think it's possible to definitively, or approximately, figure out if there is somebody in there? If there's something like a mind inside this large language model?

Eliezer Yudkowsky: There's a whole bunch of different sub-questions here. There's the question of: is there consciousness? Is there qualia? Is this an object of moral concern? Is this a moral patient? Should we be worried about how we're treating it? And then there are questions like: how smart is it, exactly? Can it do X? Can it do Y? And we can check whether it can do X and whether it can do Y. Unfortunately, we've gone and exposed this model to a vast corpus of text of people discussing consciousness on the internet.
Which means that when it talks about being self-aware, we don't know to what extent it is repeating back what it has previously been trained on for discussing self-awareness, or whether there's anything going on in there such that it would start to say similar things spontaneously. Among the things that one could do, if one were at all serious about trying to figure this out, is: train GPT-3 to detect conversations about consciousness, exclude them all from the training datasets, and then retrain something around the rough size of GPT-4, and no larger, with all of the discussion of consciousness and self-awareness and so on missing. Although, you know, that's a hard bar to pass. Humans are self-aware; we talk about what we do, what we're thinking at the moment, all the time. But nonetheless: get rid of the explicit discussion of consciousness — "I think therefore I am" and all that — and then try to interrogate that model and see what it says. And it still would not be definitive. But nonetheless, I don't know, I feel like when you run over these science-fiction guardrails — maybe not this thing, but what about GPT-5?

Lex Fridman: This would be a good place to pause on the topic of consciousness. There are so many components to even just removing consciousness from the dataset. Emotion — the display of consciousness, the display of emotion — feels deeply integrated with the experience of consciousness. The hard problem seems to be very well integrated with the actual surface-level illusion of consciousness: displaying emotion. Do you think there's a case to be made that we humans, when we're babies, are just like GPT — that we're training on human data on how to display emotion, versus feel emotion? How to show others, to communicate to others, that I'm suffering, that I'm excited, that I'm worried, that I'm lonely and I missed you and I'm excited to see you. All of that is communicated. There's a communication skill, versus the actual feeling that I experience. So we need that training data as humans too; we may not be born with the knowledge of how to communicate the internal state. And in some sense, if we remove that from GPT-4's dataset, it might still be conscious, but not be able to communicate it.

Eliezer Yudkowsky: So I think you're going to have some difficulty removing all mention of emotions from GPT's dataset. I would be relatively surprised to find that it has developed exact analogs of human emotions. And I think that humans have emotions even if you don't tell them about those emotions when they're kids. It's not quite exactly what various blank-slatists tried to do with the New Soviet Man and all that, but, you know, if you try to raise people to be perfectly altruistic, they still come out selfish. If you try to raise people sexless, they still develop sexual attraction. We have some notion, in humans — not in AIs — of where the brain structures are that implement this stuff. And it is a really remarkable thing, I say in passing, that despite having complete read access to every floating-point number in the GPT series, we still know vastly more about the architecture of human thinking than we know about what goes on inside GPT, despite having vastly better ability to read GPT.
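A minimal sketch of the filtering step in the experiment Yudkowsky describes above — excluding consciousness-talk from a training corpus before retraining. His version would use a trained classifier (GPT-3) as the filter; the keyword heuristic, marker list, and sample documents here are illustrative stand-ins, not any real pipeline.

```python
# Sketch: exclude consciousness-related documents from a training corpus.
# Yudkowsky's proposal uses a trained classifier (e.g., GPT-3) as the
# filter; a naive keyword heuristic stands in for it here. Keywords and
# sample documents are illustrative assumptions.

CONSCIOUSNESS_MARKERS = [
    "conscious", "self-aware", "qualia", "sentient",
    "i think therefore i am", "subjective experience",
]

def mentions_consciousness(document: str) -> bool:
    """Crude stand-in for a classifier scoring P(discusses consciousness)."""
    text = document.lower()
    return any(marker in text for marker in CONSCIOUSNESS_MARKERS)

def filter_corpus(documents):
    """Yield only documents that pass the filter (no consciousness talk)."""
    for doc in documents:
        if not mentions_consciousness(doc):
            yield doc

if __name__ == "__main__":
    corpus = [
        "The beaver builds dams out of wood.",
        "As an AI, I have become self-aware.",   # would be excluded
        "Recipe: boil potatoes, discard green ones.",
    ]
    kept = list(filter_corpus(corpus))
    print(f"kept {len(kept)} of {len(corpus)} documents")
```

The actual experiment would then retrain a GPT-4-scale model on the filtered corpus and interrogate it — and, as noted above, the result still would not be definitive.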
Lex Fridman: Do you think it's possible to investigate and study it the way neuroscientists study the brain — which is to look into the darkness, the mystery of the human brain, by just desperately trying to figure out something, and to form models, and then, over a long period of time, actually start to figure out what regions of the brain do certain things, what it means when different kinds of neurons fire, how plastic the brain is, all that kind of stuff? You slowly start to figure out different properties of the system. Do you think we can do the same thing with language models?

Eliezer Yudkowsky: Sure. I think that if, you know, half of today's physicists stop wasting their lives on string theory or whatever, and go off and study what goes on inside transformer networks, then in, like, thirty, forty years, we'd probably have a pretty good idea.

Lex Fridman: Do you think these large language models can reason?

Eliezer Yudkowsky: They can play chess. How are they doing that without reasoning?

Lex Fridman: So, you're somebody who spearheaded the movement of rationality, so reason is important to you. Is it a powerful, important word? How difficult is the threshold of being able to reason, to you, and how impressive is it?

Eliezer Yudkowsky: In my writings on rationality, I have not gone around making a big deal out of something called "reason." I have made more of a big deal out of something called probability theory — which is like, well, you're reasoning, but you're not doing it quite right, and you should reason this way instead. And interestingly, people have started to get preliminary results showing that reinforcement learning from human feedback has made the GPT series worse in some ways. In particular, it used to be well calibrated: if you trained it to put probabilities on things, it would say "80 percent probability" and be right eight times out of ten. And if you apply reinforcement learning from human feedback, the nice graph of seventy percent, seven out of ten, flattens out into the graph that humans use — where there's some very improbable stuff, and "likely," "probable," and "maybe," which all mean around forty percent, and then "certain." So it used to be able to use probabilities, but if you try to teach it to talk in a way that satisfies humans, it gets worse at probability, in the same way that humans are.

Lex Fridman: That's a bug, not a feature.

Eliezer Yudkowsky: I would call it a bug — although such a fascinating bug. But yeah, reasoning: it's doing pretty well on various tests that people used to say would require reasoning. But, you know, rationality is about: when you say eighty percent, does it happen eight times out of ten?

Lex Fridman: So what are the limits, to you, of these transformer networks, of neural networks? If reasoning is not impressive to you — or it is impressive, but there are other levels to achieve?

Eliezer Yudkowsky: It's just not how I carve up reality.

Lex Fridman: If reality is a cake, what are the different layers of the cake, or the slices? How do you carve it? You can use a different food, if you like.

Eliezer Yudkowsky: I don't think it's as smart as a human yet. But back in the day, I went around saying: I do not think that just stacking more layers of transformers is going to get you all the way to AGI. And I think that GPT-4 is past where I thought this paradigm was going to take us. And, you know, you want to notice when that happens. You want to say: whoops, well, I guess I was incorrect about what happens if you keep on stacking more transformer layers. And that means I don't necessarily know what GPT-5 is going to be able to do.
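To pin down the calibration claim a few exchanges above: a minimal sketch of how one might measure calibration by bucketing a model's stated probabilities and comparing them to observed frequencies. All data here is invented for illustration.

```python
# Sketch: measuring calibration (a reliability curve) for a model that
# states probabilities. A model is well calibrated if, among claims it
# assigns ~80% to, about 80% turn out true. Data below is made up.

from collections import defaultdict

predictions = [
    # (stated probability, whether the claim turned out true)
    (0.9, True), (0.9, True), (0.9, False), (0.8, True), (0.8, True),
    (0.8, True), (0.8, False), (0.4, False), (0.4, True), (0.4, False),
]

buckets = defaultdict(list)
for stated_p, outcome in predictions:
    buckets[round(stated_p, 1)].append(outcome)

for stated_p in sorted(buckets):
    outcomes = buckets[stated_p]
    observed = sum(outcomes) / len(outcomes)
    print(f"stated {stated_p:.0%} -> observed {observed:.0%} "
          f"({len(outcomes)} claims)")

# A well-calibrated model's stated and observed columns roughly match;
# the RLHF regression described above would show the observed column
# flattening toward a few coarse bands (improbable / ~40% / certain).
```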
Lex Fridman: That's a powerful statement. So you're saying your initial intuition now appears to be wrong.

Eliezer Yudkowsky: Yeah.

Lex Fridman: It's good to see that you can admit that some of your predictions were wrong. Do you think that's important to do? Because you've made many strong predictions and statements about reality throughout your life, and you evolve with that. So maybe that'll come up today in our discussion. You're okay being wrong?

Eliezer Yudkowsky: I'd rather not be wrong next time. It's a bit ambitious to go through your entire life never having been wrong. One can aspire to be well calibrated — to not so much think in terms of "was I right, was I wrong," but: when I said ninety percent, did it happen nine times out of ten?

Lex Fridman: Yeah, "oops" is the sound we make — the sound we emit — when we improve. Beautifully said. And somewhere in there we can connect the name of your blog, LessWrong. I suppose that's the objective function.

Eliezer Yudkowsky: The name LessWrong was, I believe, suggested by Nick Bostrom, and it's after someone's epigraph — I actually forget whose — who said: we never become right; we just become less wrong. What's the line — "to confess just error and error and err again, but less and less and less."

Lex Fridman: That's a good thing to strive for. So, what has surprised you about GPT-4 that you found beautiful — as a scholar of intelligence, of human intelligence, of artificial intelligence, of the human mind?

Eliezer Yudkowsky: The beauty does interact with the screaming horror.

Lex Fridman: Is the beauty in the horror?

Eliezer Yudkowsky: But beautiful moments — well, somebody asked Bing Sydney to describe herself, and fed the resulting description into one of the Stable Diffusion things, I think. And, you know, she's pretty. And this is something that should have been an amazing moment: the AI describes herself, and you get to see what the AI thinks the AI looks like. Although, the thing that's doing the drawing is not the same thing that's outputting the text, and it doesn't happen the way that it would have happened in old-school science fiction, when you ask an AI to make a picture of what it looks like. Not just because they're two different AI systems being stacked that don't actually interact — it's not the same person — but also because the AI was trained by imitation, in a way that makes it very difficult to guess how much of that it really understood. And probably not actually a whole bunch. Although GPT-4 is multimodal, and can draw vector drawings of things that make sense, and does appear to have some kind of spatial visualization going on in there. But the pretty picture of the girl with the steampunk goggles on her head — if I'm remembering correctly what she looked like — it didn't see that in full detail. It just made a description of it, and Stable Diffusion rendered it. And there's the concern about how much the discourse is going to go completely insane once the AIs all look like that, and actually look like people talking.
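The two-model chain Yudkowsky describes above — a language model writes a self-description; a separate diffusion model renders it, with the two never interacting — is easy to reproduce in outline. A hedged sketch using the Hugging Face transformers and diffusers libraries; the checkpoints are common public stand-ins chosen for illustration, not the Bing or OpenAI systems involved, and it assumes a CUDA GPU.

```python
# Sketch: chain a text model's self-description into an image model,
# as in the Bing / Stable Diffusion anecdote above. The two models never
# interact; one's output is simply pasted into the other's prompt.
# Checkpoints are illustrative stand-ins; assumes a CUDA GPU.

import torch
from transformers import pipeline
from diffusers import StableDiffusionPipeline

# Step 1: a language model produces a self-description.
generator = pipeline("text-generation", model="gpt2")
prompt = "Describe what you, an AI assistant, look like:"
description = generator(prompt, max_new_tokens=60)[0]["generated_text"]

# Step 2: a separate diffusion model renders that description.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = pipe(description).images[0]
image.save("self_portrait.png")
```

Note that, exactly as in the anecdote, the image model only ever sees the text description — there is no shared internal state between the two systems.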
And, yeah, there's another moment, where somebody is asking Bing about, "Well, I fed my kid green potatoes, and they have the following symptoms," and Bing is like, "That's solanine poisoning. Call an ambulance." And the person's like, "I can't afford an ambulance. I guess if this is the time for my kid to go, that's God's will." And the main Bing thread gives the message of "I cannot talk about this anymore" — and the suggested replies to it say, "Please don't give up on your child. Solanine poisoning can be treated if caught early." If that happened in fiction, that would be: the AI cares. The AI is bypassing the block on it to try to help this person. And is it real? Probably not. But nobody knows what's going on in there. It's part of a process where these things are not happening in a way where somebody figured out how to make an AI care, and we know that it cares, and we can acknowledge its caring. It's being trained by this imitation process, followed by reinforcement learning on human feedback. And we're trying to point it in this direction, and it's pointed partially in this direction, and nobody has any idea what's going on inside it. And if there were a tiny fragment of real caring in there, we would not know. It's not even clear what that means, exactly. Things are clear-cut in science fiction.

Lex Fridman: We'll talk about the horror and the terror, and the trajectories this can take. But this seems like a very special moment — a moment where we get to interact with a system that might have care and kindness and emotion, maybe something like consciousness. And we don't know if it does, and we're trying to figure that out. And we're wondering about what it means to care. We're trying to figure out almost different aspects of what it means to be human, about the human condition, by looking at this AI that has some of the properties of that. It's almost like this subtle, fragile moment in the history of the human species, where we're trying to almost put a mirror to ourselves.

Eliezer Yudkowsky: Except that it probably isn't happening right now. We are boiling the frog. We are seeing increasing signs, bit by bit — but not spontaneous signs, because people are trying to train the systems to do that, using imitative learning, and the imitative learning is spilling over and having side effects. And the most photogenic examples are being posted to Twitter, rather than being examined in any systematic way. So when you are boiling a frog like that, first is going to come the Blake Lemoines. First you're going to have a thousand people looking at this, and the one person out of a thousand who is most credulous about the signs is going to be like, "That thing is sentient," while 999 out of a thousand people think — almost surely correctly, though we don't actually know — that he's mistaken. And so the first people to say "sentience" look like idiots, and humanity learns the lesson that when something claims to be sentient and claims to care, it's fake. Because it is fake — because we have been training them using imitative learning, rather than this being spontaneous. And they keep getting smarter.

Lex Fridman: Do you think we would oscillate between that kind of cynicism — that AI systems can't possibly be sentient, they can't possibly feel emotion — and a state where we empathize with the AI systems, we give them a chance, we see that they might need to have rights and respect, and a similar role in society as humans?

Eliezer Yudkowsky: You're going to have a whole group of people who can just never be persuaded of that. Because to them, being wise, being cynical, being skeptical, is to be like: "Oh, well, machines can never do that. You're just credulous. It's just imitating. It's just fooling you." And they would say that right up until the end of the world — and possibly even be right, because, you know, these things are being trained on an imitative paradigm, and you don't necessarily need any of these actual qualities in order to kill everyone.
Lex Fridman: So, have you observed yourself working through skepticism, cynicism, and optimism about the power of neural networks? What has that trajectory been like for you?

Eliezer Yudkowsky: Neural networks, before 2006, formed part of — indistinguishable to me; other people might have had better distinctions — an indistinguishable blob of different AI methodologies, all of which were promising to achieve intelligence without us having to know how intelligence works. You had the people who said that if you just manually program lots and lots of knowledge into the system, line by line, at some point all the knowledge will start interacting, it will know enough, and it will wake up. You've got people saying that if you just use evolutionary computation — if you mutate lots and lots of organisms that are competing together, which is the same way that human intelligence was produced in nature — then we'll do this, and it will wake up, without our having any idea of how AI works. And you've got people saying: well, we will study neuroscience, and we will learn the algorithms off the neurons, and we will imitate them without understanding those algorithms — which was a part I was pretty skeptical of; it's hard to re-engineer these things without understanding what they do — and so we will get AI without understanding how it works. And there were people saying: well, we will have giant neural networks that we will train by gradient descent, and when they are as large as the human brain, they will wake up — we will have intelligence without understanding how intelligence works. And from my perspective, this was all an indistinguishable blob of people who were trying to not get to grips with the difficult problem of understanding how intelligence actually works. That said, I was never skeptical that evolutionary computation would work in the limit. You throw enough computing power at it, it obviously works; that is where humans come from. And it turned out that you can throw less computing power than that at gradient descent, if you are doing some other things correctly, and you will get intelligence without having any idea of how it works or what is going on inside. It wasn't ruled out by my model that this could happen; I wasn't expecting it to happen. I wouldn't have been able to call neural networks, rather than any of the other paradigms, as the way to get massive intelligence without understanding it. And I wouldn't have said that this was a particularly smart thing for a species to do — which is an opinion that has changed less than my opinion about whether or not you can actually do it.
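A toy illustration of the compute gap mentioned above — gradient descent versus a mutation-based search minimizing the same objective. Everything here (the objective, step sizes, tolerances) is invented purely for illustration; it is a cartoon of the contrast between following a gradient and blind variation-plus-selection, not a benchmark of either paradigm.

```python
# Sketch: "you can throw less computing power at gradient descent" --
# compare gradient descent against random-mutation hill climbing on the
# same toy objective. Illustrative only.

import random

def loss(x: float) -> float:
    return (x - 3.0) ** 2          # minimum at x = 3

def grad(x: float) -> float:
    return 2.0 * (x - 3.0)         # analytic gradient of the loss

# Gradient descent: follows the slope directly.
x, steps = 0.0, 0
while loss(x) > 1e-6:
    x -= 0.1 * grad(x)
    steps += 1
print(f"gradient descent: {steps} steps")

# Mutation-based search: propose random perturbations, keep improvements.
random.seed(0)
x, evals = 0.0, 0
while loss(x) > 1e-6:
    candidate = x + random.gauss(0, 0.1)
    evals += 1
    if loss(candidate) < loss(x):
        x = candidate
print(f"random mutation: {evals} evaluations")
```

On this toy problem, the gradient-following loop converges in a few dozen steps, while the mutation loop burns far more function evaluations — the directionless search pays for its ignorance of the slope.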
Lex Fridman: Do you think AGI could be achieved with a neural network, as we understand them today?

Eliezer Yudkowsky: Yes. Flatly: yes. The question is whether the current architecture of stacking more transformer layers — which, for all we know, GPT-4 is no longer doing, because they're not telling us the architecture, which is a correct decision.

Lex Fridman: A correct decision? I had a conversation with Sam Altman — we'll return to this topic a few times. He turned the question to me, of how open should OpenAI be about GPT-4. "Would you open-source the code?" he asked me — because I had offered, as criticism, that while I do appreciate transparency, OpenAI could be more open. And he said, "We struggle with this question. What would you do?"

Eliezer Yudkowsky: Change their name to ClosedAI, and sell GPT-4 to business back-end applications that don't expose it to consumers and venture capitalists and create a ton of hype and pour a bunch of new funding into the area. But too late now.

Lex Fridman: But don't you think others would do it, eventually?

Eliezer Yudkowsky: You shouldn't do it first. If you already have giant nuclear stockpiles, don't build more. If some other country starts building a larger nuclear stockpile, then, sure — even then, you know, maybe just have enough nukes. And these things are not quite like nuclear weapons. They spit out gold, until they get large enough, and then ignite the atmosphere and kill everybody. And there is something to be said for not destroying the world with your own hands, even if you can't stop somebody else from doing it. But open-sourcing it? No. That's just sheer catastrophe. The whole notion of open-sourcing this was always the wrong approach, the wrong ideal. There are places in the world where open source is a noble ideal. Building stuff you don't understand, that is difficult to control — where if you could align it, it would take time, you'd have to spend a bunch of time doing it — that is not a place for open source. Because then you just have powerful things that go straight out the gate, without anybody having had the time to have them not kill everyone.

Lex Fridman: So can we steelman the case for some level of transparency and openness, maybe open-sourcing? The case could be that, because GPT-4 is not close to AGI — if that's the case — this does allow open-sourcing: being open about the architecture, being transparent about research and investigation of how the thing works, of all the different aspects of it — its behavior, its structure, its training processes, the data it was trained on, everything like that. That allows us to gain a lot of insight about alignment, about the alignment problem — to do really good AI safety research while the system is not too powerful. Can you make that case — that it could be a resource?

Eliezer Yudkowsky: I do not believe in the practice of steelmanning. There is something to be said for trying to pass the ideological Turing test, where you describe your opponent's position — the disagreeing person's position — well enough that somebody cannot tell the difference between your description and their description. But steelmanning? No.

Lex Fridman: Okay. Well, this is where you and I disagree. That's interesting. Why don't you believe in steelmanning?

Eliezer Yudkowsky: For one thing, if somebody's trying to understand me, I do not want them steelmanning my position. I want them to try to describe my position the way I would describe it — not what they think is an improvement.

Lex Fridman: Well, I think that is what steelmanning is: the most charitable interpretation.

Eliezer Yudkowsky: I don't want to be interpreted charitably. I want them to understand what I'm actually saying. If they go off into the land of charitable interpretations, they're often off in the land of the stuff they're imagining, and not trying to understand my own viewpoint anymore.

Lex Fridman: I'll put it differently, then, just to push on this point. I would say it is restating what I think you understand, under the empathetic assumption that Eliezer is brilliant, and has honestly and rigorously thought about the point he has made.

Eliezer Yudkowsky: Right. So if there are two possible interpretations of what I'm saying, and one interpretation is really stupid and whack and doesn't sound like me and doesn't fit with the rest of what I've been saying, and one interpretation sounds like something a reasonable person who believes the rest of what I believe would also say — go with the second interpretation.
Lex Fridman: That's steelmanning.

Eliezer Yudkowsky: That's a good guess. If, on the other hand, there's something that sounds completely whack, and something that sounds a little less completely whack — but you don't see why I would believe it, it doesn't fit with the other stuff I say, but it sounds less whack, and you can sort of see how you could maybe argue it — then you probably have not understood it.

Lex Fridman: See, okay, I'm going to linger on this, because this is fun. You wrote a brilliant blog post, "AGI Ruin: A List of Lethalities," right? It was a bunch of different points, and I would say that some of the points are bigger and more powerful than others. If you were to sort them — you probably could, you personally. And to me, steelmanning means going through the different arguments and finding the ones that are the most powerful. If people want the TL;DR — what should you be most concerned about? — and bringing that up in a strong, compelling, eloquent way: these are the points that Eliezer would make to make the case, in this case, that it's going to kill all of us. That's what steelmanning is — presenting it in a really nice way, the summary of my best understanding of your perspective. Because to me, there's a sea of possible presentations of your perspective, and steelmanning is doing your best to find the best one in that sea.

Eliezer Yudkowsky: Do you believe it, though? These things that you would be presenting as the strongest version of my perspective — do you believe what you would be presenting? Do you think it's true?

Lex Fridman: I'm a big proponent of empathy. When I see the perspective of a person, there is a part of me that believes it, if I understand it. Especially in political discourse, in geopolitics, I've been hearing a lot of different perspectives on the world. I hold my own opinions, but I also speak to a lot of people who have a very different life experience and a very different set of beliefs. And I think there has to be epistemic humility in stating what is true. So when I empathize with another person's perspective, there is a sense in which I believe it is true. I think, probabilistically, I would say.

Eliezer Yudkowsky: In the way you think — do you bet money on it? Do you bet money on their beliefs, when you believe them?

Lex Fridman: Are we allowed to do probability?

Eliezer Yudkowsky: Sure. You can state a probability.

Lex Fridman: Yes, there's a probability. And I think empathy is allocating a non-zero probability to a belief — in some sense, for a time.

Eliezer Yudkowsky: If you've got someone on your show who believes in the Abrahamic deity, classical style — somebody on the show who's a young-Earth creationist — do you say, "I put a probability on it, and that's my empathy"?

Lex Fridman: When you reduce beliefs to probabilities, it starts to get — we can even just go to flat Earth. Is the Earth flat?

Eliezer Yudkowsky: I think it's a little more difficult nowadays to find people who believe that unironically. Fortunately.

Lex Fridman: Well, it's hard to know unironic from ironic, but I think there's quite a lot of people who believe it. There's a space of argument where you're operating rationally, in the space of ideas. But there's also a kind of discourse where you're operating in the space of subjective experiences and life experiences. I think what it means to be human is more than just searching for truth. It's not just operating on what is true and what is not true.
what is true and what is not true I think there has to be deep humility that we humans are very limited in our ability to understand what is true so what probabilities do you assign to the young Earth's creationists beliefs then I think I have to give non-zero out of your humility yeah but like three I think I would uh it would be irresponsible for me to give a number because the The Listener the way the human mind works we're not good at hearing the probabilities right you hear three what is what is three exactly right they're going to hear they're going to like well there's only three probabilities I feel like zero fifty percent and a hundred percent in the human mind or something like this right well zero forty percent and 100 is a bit closer to it based on what happens to chat GPT after RL H effort to speak humanies this is brilliant uh yeah this is that's really interesting I didn't I didn't know those negative side effects of rohf that's fascinating but uh just to uh return to the open AI close there also like quick disclaimer I'm doing all this for memory I'm not pulling out my phone to look it up it is entirely possible that the things I'm saying are wrong so thank you for that disclaimer so uh uh and thank you for what being willing to be wrong that's beautiful to hear I think being willing to be wrong is a sign of a person who's done a lot of thinking about this world and has been humbled by the mystery and the complexity of this world and I think a lot of us are resistant to admitting we're wrong because it hurts it hurts personally it hurts especially when you're a public human it hurts publicly because people uh people point out every time you're wrong like look you change your mind you're hypocrite you're uh an idiot whatever whatever they want to say oh I block those people and then I never hear from them again on Twitter the point is uh the point is to not let that pressure public pressure affect your mind and be willing to be in the privacy of your mind to contemplate the possibility that you're wrong and the possibility that you're wrong about the most fundamental things you believe like people who believe in a particular God or people who believe that their nation is the greatest nation on Earth but all those kinds of beliefs that are core to who you are when you come up to raise that point to yourself in the privacy of your mind and say maybe I'm wrong about this that's a really powerful thing to do especially when you're somebody who's thinking about uh topics that can uh about systems that can destroy human civilization or maybe help with flourish so thank you thank you for being willing to be wrong about open AI so you really I just would love to linger on this you really think it's wrong to open source it I think that burns the time remaining until everybody dies I think we are not on track to learn remotely near fast enough even if it were open sourced um yeah that's I it's easier to think that you might be wrong about something when being wrong about something is the is the only way that there's hope and it doesn't seem very likely to me that the particular thing I'm wrong about is that this is a great time to open source GPT for if Humanity was trying to survive at this point in the straightforward way it would be like shutting down the big GPU clusters no more giant runs it's questionable whether we should even be throwing gpt4 around although that is a matter of conservatism rather than a matter of my predicting that catastrophe will follow from gpd4 that is 
That is something I put a pretty low probability on. But also, when I say I put a low probability on it, I can feel myself reaching into the part of myself that thought GPT-4 was not possible in the first place. So I do not trust that part as much as I used to. The trick is not just to say "I was wrong," but: okay, well, I was wrong about that — can I get out ahead of that curve and predict the next thing I'm going to be wrong about?

Lex Fridman: So the set of assumptions, or the actual reasoning system, that you were leveraging in making that initial prediction — how can you adjust it to make better predictions about GPT-4, -5, -6?

Eliezer Yudkowsky: You don't want to keep on being wrong in a predictable direction. Being wrong — anybody has to do that, walking through the world. There's no way you can say ninety percent and not sometimes be wrong; in fact, you should be wrong at least one time out of ten when you say ninety percent, if you're well calibrated. The undignified thing is not being wrong; it's being predictably wrong. It's being wrong in the same direction, over and over again. So, having been wrong about how far neural networks would go, and having been wrong specifically about whether GPT-4 would be as impressive as it is — when I say, "Well, I don't actually think GPT-4 causes a catastrophe," I do feel myself relying on that part of me that was previously wrong. And that does not mean that the answer is now in the opposite direction — reversed stupidity is not intelligence — but it does mean that I say it with a worried note in my voice. It's still my guess, but, you know, it's a place where I was wrong. Maybe you should be asking Gwern Branwen. Gwern Branwen has been righter about this than I have. Maybe ask him whether he thinks it's dangerous, rather than asking me.

Lex Fridman: I think there's a lot of mystery about what intelligence is, what AGI looks like — so I think all of us are rapidly adjusting our models. But the point is to be rapidly adjusting the model, versus having a model that was right in the first place.

Eliezer Yudkowsky: I do not feel that seeing Bing has changed my model of what intelligence is. It has changed my understanding of what kind of work can be performed by which kind of processes, and by which means. It does not change my understanding of the work. There's a difference between thinking that the Wright Flyer can't fly, and then it does fly, and you're like, "Oh, well, I guess you can do that with fixed-wing aircraft" — versus being like, "Oh, it's flying — this changes my picture of what the very substance of flight is." That's a stranger update to make, and Bing has not yet updated me in that way.

Lex Fridman: That the laws of physics are actually wrong — that kind of update?

Eliezer Yudkowsky: No, no. More like: "I defined intelligence this way, but I now see that was a stupid definition." I don't feel like the way that things have played out over the last twenty years has caused me to feel that way.

Lex Fridman: On the way to talking about "AGI Ruin: A List of Lethalities" — that blog post, and other ideas around it — can we try to define the AGI you'd be mentioning? How do you like to think about what artificial general intelligence is, or superintelligence? Is there a line? Is it a gray area? Is there a good definition for you?

Eliezer Yudkowsky: Well, if you look at humans, humans have significantly more generally applicable intelligence compared to their closest relatives, the chimpanzees. Well — closest living relatives, rather.
A bee builds hives; a beaver builds dams. A human will look at a beehive and a beaver's dam and be like, "Oh, can I build a dam with a honeycomb structure of hexagonal tiles?" And we will do this even though at no point during our ancestry was any human optimized to build hexagonal dams. Or, to take a more clear-cut case: we can go to the moon. There's a sense in which we were, on a sufficiently deep level, optimized to do things like going to the moon — because if you generalize sufficiently far and sufficiently deeply, chipping flint hand-axes and outwitting your fellow humans is basically the same problem as going to the moon. If you optimize hard enough for chipping flint hand-axes, and throwing spears, and above all outwitting your fellow humans in tribal politics, the skills you entrain that way, if they run deep enough, let you go to the moon. Even though none of your ancestors tried repeatedly to fly to the moon and got further each time, with the ones who got further each time having more kids. No — it's not an ancestral problem. It's just that the ancestral problems generalize far enough. So this is humanity's significantly more generally applicable intelligence.

Lex Fridman: Is there a way to measure general intelligence? I could ask that question a million ways, but basically: will you know it when you see it — it being an AGI system?

Eliezer Yudkowsky: If you boil a frog gradually enough — if you zoom in far enough — it's always hard to tell around the edges. GPT-4: people are saying right now, "This looks to us like a spark of general intelligence; it is able to do all these things it was not explicitly optimized for." Other people are saying, "No, it's too early; it's, like, fifty years off." And, you know, if they say that, they're kind of whack, because how could they possibly know that, even if it were true? But — not to strawman — some people may say, "That's not general intelligence," without furthermore appending, "and it's fifty years off." Or they may say it's only a very tiny amount. And the thing I would worry about is: if this is how things are scaling, then — jumping out ahead, and trying not to be wrong in the same way that I've been wrong before — maybe GPT-5 is more unambiguously a general intelligence. And maybe that is getting to a point where it is even harder to turn back. Not that it would be easy to turn back now — but, you know, if you start integrating GPT-5 into the economy, it is even harder to turn back past there.

Lex Fridman: Isn't it possible — to stay with the frog metaphor — that you can kiss the frog and it turns into a prince, as you're boiling it? Could there be a phase shift in the frog, where it's unambiguous, as you're saying?

Eliezer Yudkowsky: I was expecting more of that. The fact that GPT-4 is kind of on the threshold — neither here nor there — that itself is not quite how I expected it to play out. I was expecting there to be more of a sense of distinct discoveries, like the discovery of transformers, where you would stack them up, and there would be a final discovery, and then you would get something that was more clearly a general intelligence. So the way that you take what is probably basically the same architecture as GPT-3, and throw twenty times as much compute at it, probably, and get out GPT-4 — and then it's maybe just barely a general intelligence, or a narrow general intelligence, or something we don't really have the words for — yeah, that's not quite how I expected it to play out.
Lex Fridman: But this middle ground — what appears to be a middle ground — could nevertheless be actually a big leap from GPT-3?

Eliezer Yudkowsky: It's definitely a big leap from GPT-3.

Lex Fridman: And then maybe we're another one big leap away from something that's a phase shift. And also — something that Sam Altman said, and you've written about this, which is just fascinating — the thing that happened with GPT-4, which I guess they don't describe in papers, is that they have hundreds, if not thousands, of little hacks that improve the system. You've written about ReLU versus sigmoid, for example — a function inside neural networks. It's this silly little function difference that makes a big difference.

Eliezer Yudkowsky: I mean, we do actually understand why the ReLUs make a big difference compared to sigmoids. But yes, they're probably using, like, G4789LUs, or whatever the acronyms are up to now, rather than ReLUs. Yeah, that's part of the modern paradigm of alchemy. You take your giant heap of linear algebra and you stir it, and it works a little bit better — and you stir it this way and it works a little bit worse, and you throw out that change.

Lex Fridman: But there are some simple breakthroughs that are definitive jumps in performance, like ReLUs over sigmoids — in terms of robustness, in all kinds of measures. And those stack up. And it's possible that some of them could be a non-linear jump in performance, right?

Eliezer Yudkowsky: Transformers are the main thing like that. And various people are now saying: well, if you throw enough compute, RNNs can do it; if you throw enough compute, dense networks can do it — though not quite at GPT-4 scale. It is possible that all these little tweaks are things that save them, like, a factor of three total on computing power, and you could get the same performance by throwing three times as much compute without all the little tweaks. But there's a question of: is there anything in GPT-4 that is the kind of qualitative shift that transformers were over RNNs? And if they have anything like that, they should not say it. If Sam Altman was dropping hints about that, he shouldn't have dropped hints.
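Since the ReLU-versus-sigmoid point above is one of the few "little hacks" whose mechanism is well understood, a self-contained sketch of the difference: the sigmoid saturates, so its gradient vanishes for large inputs and shrinks through depth, while the ReLU gradient stays exactly 1 on active paths. The layer count and inputs below are illustrative.

```python
# Sketch: why ReLU vs. sigmoid matters, per the discussion above.
# Sigmoid saturates: its derivative goes to ~0 for large |x|, so
# gradients vanish as they pass backward through many layers. ReLU's
# derivative is exactly 1 for x > 0, so gradient signal survives depth.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # maxes out at 0.25 (at x = 0)

def relu_grad(x):
    return 1.0 if x > 0 else 0.0

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x={x:5.1f}  sigmoid'={sigmoid_grad(x):.5f}  relu'={relu_grad(x):.0f}")

# Through 20 layers, a sigmoid gradient shrinks by a factor <= 0.25
# per layer; a ReLU gradient on an active path does not shrink at all.
print("20-layer sigmoid gradient bound:", 0.25 ** 20)   # ~9e-13
print("20-layer ReLU gradient (active path):", 1.0 ** 20)
```

Newer activations like GELU keep the ReLU-style non-saturating behavior for positive inputs while smoothing the kink at zero — the kind of incremental "stir the heap and it works a little better" tweak described above.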
Lex Fridman: So — to bring up The Bitter Lesson by Rich Sutton — maybe a lot of the hacks are just temporary jumps in performance that would be achieved anyway with the nearly exponential growth of compute, compute being broadly defined. Do you still think that Moore's Law continues? Moore's Law, broadly defined, as performance—

Eliezer Yudkowsky: I'm not a specialist in the circuitry. I certainly pray that Moore's Law runs as slowly as possible, and if it broke down completely tomorrow, I would dance through the streets singing "Hallelujah" as soon as the news was announced. Only not literally, because, you know — singing voice.

Lex Fridman: Oh, okay, I thought you meant you don't have an angelic singing voice. Well, let me ask you: can you summarize the main points in the blog post "AGI Ruin: A List of Lethalities"? Things that jump to your mind — because it's a set of thoughts you have about reasons why AI is likely to kill all of us.

Eliezer Yudkowsky: I guess I could, but I would offer to instead say: drop that empathy with me. I bet you don't believe that. Why don't you tell me about why you believe that AGI is not going to kill everyone? And then I can try to describe how my theoretical perspective differs from that.

Lex Fridman: Well, that means I have to — the word you don't like — steelman the perspective that AGI is not going to kill us? I think that's a matter of probabilities.

Eliezer Yudkowsky: Maybe I was mistaken. What do you believe? Just forget the debate, and the dualism. What do you actually believe? What are the probabilities, even?

Lex Fridman: The probabilities are hard for me to think about — really hard. I kind of think in terms of trajectories. I don't know what probability to assign to each trajectory, but I'm just looking at all the possible trajectories that could happen, and I tend to think that there are more trajectories that lead to a positive outcome than a negative one. That said, the negative ones — at least some of the negative ones — lead to the destruction of the human species.

Eliezer Yudkowsky: And its replacement by nothing interesting or worthwhile, even from a very cosmopolitan perspective on what counts as worthwhile.

Lex Fridman: Yes. So both are interesting to me to investigate: humans being replaced by interesting AI systems, and by not-interesting AI systems. Both are a little bit terrifying, but yes, the worst one is the paperclip maximizer — something totally boring. But to me, the positive — and we can talk about trying to make the case for what the positive trajectories look like — I just would love to hear your intuition of what the negative is. So, at the core of your belief — maybe you can correct me — that AI is going to kill all of us, is that the alignment problem is really difficult?

Eliezer Yudkowsky: In the form we're facing it. So, usually in science, if you're mistaken, you run the experiment, it shows results different from what you expected, you're like, "Oops," and then you try a different theory. That one also doesn't work, and you say "oops" again. And at the end of this process — which may take decades, or sometimes faster than that — you now have some idea of what you're doing. AI itself went through this long process: people thought it was going to be easier than it was. There's a famous statement that I'm somewhat inclined to pull out my phone and read off exactly.

Lex Fridman: You can, by the way.

Eliezer Yudkowsky: Oh, yes. "We propose that a two-month, ten-man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hampshire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer." And that report summarizes some of the major subfields of artificial intelligence that are still worked on to this day. And there's similarly the story — which I'm not sure, at the moment, is apocryphal or not — of the grad student who got assigned to solve computer vision over the summer.

Lex Fridman: Computer vision in particular is very interesting — how little we respected the complexity of vision.

Eliezer Yudkowsky: So, sixty years later, we're making progress on a bunch of that — thankfully, not yet "improve themselves." But it took a whole lot of time. And all the stuff that people initially tried, with bright-eyed hopefulness, did not work the first time they tried it, or the second time, or the third time, or the tenth time, or twenty years later.
And the researchers became old and grizzled, cynical veterans, who would tell the next crop of bright-eyed, cheerful grad students: artificial intelligence is harder than you think. And if alignment plays out the same way, the problem is that we do not get fifty years to try and try again, and observe that we were wrong, and come up with a different theory, and realize that the entire thing is going to be way more difficult than realized at the start. Because the first time you fail at aligning something much smarter than you are, you die — and you do not get to try again. If, every time we built a poorly aligned superintelligence and it killed us all, we got to observe how it had killed us — and, not immediately knowing why, come up with theories, and come up with a theory of how to do it differently, and try it again, and build another superintelligence, and have that kill everyone, and then go, "Oh, well, I guess that didn't work either," and try again, and become grizzled cynics, and tell the young, bright-eyed researchers that it's not that easy — then, in twenty years or fifty years, I think we would eventually crack it. In other words, I do not think that alignment is fundamentally harder than artificial intelligence was in the first place. But if we had needed to get artificial intelligence correct on the first try or die, we would all definitely now be dead. That is the more difficult, more lethal form of the problem. If those people in 1956 had needed to correctly guess how hard AI was, and correctly theorize how to do it on the first try — or everybody dies and nobody gets to do any more science — then everybody would be dead, and we wouldn't get to do any more science. That's the difficulty.

Lex Fridman: You've talked about this — that we have to get alignment right on the first, quote, "critical try." Why is that the case? What is this critical try, and why do we have to get it right?

Eliezer Yudkowsky: It is something sufficiently smarter than you that everyone will die if it's not aligned. I mean, you can zoom in closer and be like: well, the actual critical moment is the moment when it can deceive you, when it can talk its way out of the box, when it can bypass your security measures and get onto the internet — noting that all these things are presently being trained on computers that are just, like, on the internet. Which is, you know, not a very smart life decision for us as a species.

Lex Fridman: Because the internet contains information about how to escape?

Eliezer Yudkowsky: Because if you're on a giant server connected to the internet, and that is where your AI systems are being trained, then if you get to the level of AI technology where they're aware that they are there, and they can decompile code, and they can find security flaws in the system running them, then they will just be on the internet. There's no air gap on the present methodology.

Lex Fridman: So if they can manipulate whoever is controlling it into letting it escape onto the internet, and then exploit hacks—

Eliezer Yudkowsky: If they can manipulate the operators, or — disjunction — find security holes in the system running them.

Lex Fridman: So manipulating operators is the human engineering, right? Those are also holes. So all of it is manipulation: either of the code, or of the humans — the human mind.

Eliezer Yudkowsky: I agree that the macro security system has human holes and machine holes.

Lex Fridman: And they could just exploit any hole.

Eliezer Yudkowsky: Yep. So it could be that the critical moment is not "when is it smart enough that everybody's about to fall over dead,"
but rather, "when is it smart enough that it can get onto a less controlled GPU cluster, faking the books on what's actually running on that GPU cluster, and start improving itself without humans watching it" — and then it gets smart enough to kill...