François Chollet: Measures of Intelligence | Lex Fridman Podcast #120
PUAdj3w3wO4 • 2020-08-31
The following is a conversation with François Chollet, his second time on the podcast. He's both a world-class engineer and a philosopher in the realm of deep learning and artificial intelligence. This time we talk a lot about his paper titled "On the Measure of Intelligence," which discusses how we might define and measure general intelligence in our computing machinery.

Quick summary of the sponsors: Babbel, MasterClass, and Cash App. Click the sponsor links in the description to get a discount and to support this podcast.

As a side note, let me say that the serious, rigorous, scientific study of artificial general intelligence is a rare thing. The mainstream machine learning community works on very narrow AI with very narrow benchmarks. This is very good for incremental, and sometimes big incremental, progress. On the other hand, the outside-the-mainstream, renegade, you could say, AGI community works on approaches that verge on the philosophical and even the literary, without big public benchmarks. Walking the line between the two worlds is a rare breed, but it doesn't have to be. I ran the AGI series at MIT as an attempt to inspire more people to walk this line. DeepMind and OpenAI, for a time, and still on occasion, walk this line. François Chollet does as well. I hope to also. It's a beautiful dream to work towards and to make real one day.

If you enjoy this thing, subscribe on YouTube, review it with five stars on Apple Podcasts, follow on Spotify, support on Patreon, or connect with me on Twitter at lexfridman. As usual, I'll do a few minutes of ads now and no ads in the middle. I try to make these interesting, but I give you timestamps so you can skip. But still, please do check out the sponsors by clicking the links in the description. It's the best way to support this podcast.

This show is sponsored by Babbel, an app and website that gets you speaking in a new language within weeks. Go to babbel.com and use code LEX to get three months free. They offer 14 languages, including Spanish, French, Italian, German, and yes, Russian. Daily lessons are 10 to 15 minutes, super easy, effective, designed by over 100 language experts. Let me read a few lines from a Russian poem by Alexander Blok that you'll start to understand if you sign up to Babbel. Now, I say that you'll start to understand this poem because Russian starts with the language and ends with the vodka. The latter part is definitely not endorsed or provided by Babbel, and it will probably lose me this sponsorship, although it hasn't yet. But once you graduate with Babbel, you can enroll in my advanced course of late-night Russian conversation over vodka. No app for that yet. So get started by visiting babbel.com and use code LEX to get three months free.

This show is also sponsored by MasterClass. Sign up at masterclass.com/lex to get a discount and to support this podcast. When I first heard about MasterClass, I thought it was too good to be true. I still think it's too good to be true. For $180 a year, you get an all-access pass to watch courses from, to list some of my favorites: Chris Hadfield on space exploration, hope to have him on this podcast one day; Neil deGrasse Tyson on scientific thinking and communication, Neil too; Will Wright, creator of SimCity and The Sims, on game design; Carlos Santana on guitar; Garry Kasparov on chess; Daniel Negreanu on poker; and many more. Chris Hadfield explaining how rockets work and the experience of being launched into space alone is worth the money. By the way, you can watch it on basically any device. Once again, sign up at masterclass.com/lex to get a discount and to support this podcast.
This show, finally, is presented by Cash App, the number one finance app in the App Store. When you get it, use code LEXPODCAST. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as one dollar. Since Cash App allows you to send and receive money digitally, let me mention a surprising fact related to physical money: of all the currency in the world, roughly 8 percent of it is actually physical money; the other 92 percent of the money only exists digitally, and that's only going to increase. So again, if you get Cash App from the App Store or Google Play and use code LEXPODCAST, you get ten bucks, and Cash App will also donate ten dollars to FIRST, an organization that is helping to advance robotics and STEM education for young people around the world.

And now, here's my conversation with François Chollet.

What philosophers, thinkers, or ideas had a big impact on you, growing up and today?

So, one author that had a big impact on me when I read his books as a teenager was Jean Piaget, a Swiss psychologist who is considered to be the father of developmental psychology. He has a large body of work about, basically, how intelligence develops in children. It's really old work; most of it is from the 1930s and 1940s, so it's not quite up to date. It's actually been superseded by many newer developments in developmental psychology. But to me it was very interesting, very striking, and it actually shaped the early ways in which I started thinking about the mind and the development of intelligence as a teenager.

His actual ideas, or the way he thought about it, or just the fact that you could think about the developing mind at all?

I guess both. Piaget is the author that introduced me to the notion that intelligence and the mind are something that you construct throughout your life, and that children construct it in stages. And I thought that was a very interesting idea, which is, of course, very relevant to AI, to building artificial minds.

Another book that I read around the same time that had a big impact on me, and there was actually a little bit of overlap with Piaget as well, is Jeff Hawkins' "On Intelligence," which is a classic. He has this vision of the mind as a multi-scale hierarchy of temporal prediction modules. These ideas really resonated with me, the notion of a modular hierarchy of, potentially, compression functions or prediction functions. I thought it was really interesting, and it reshaped the way I started thinking about how to build minds.

The hierarchical nature, which aspect? Also, he's a neuroscientist, so he was thinking about how our minds actually work.

Yes, he's basically talking about how our mind works. The notion that cognition is prediction was an idea that was kind of new to me at the time, and that I really loved. And the notion that there are multiple scales of processing in the brain.

The hierarchy, yes. This is before deep learning. These ideas of hierarchies...

They've been around for a long time, even before "On Intelligence." They've been around since the 1980s. And yes, that was before deep learning, but of course I think these ideas really found their practical implementation in deep learning.

What about the memory side of things? I think he's also talking about knowledge representation. Do you think about memory a lot?
One way you can think of neural networks is as a kind of memory: you're memorizing things, but it doesn't seem to be the kind of memory that's in our brains, or it doesn't have the same rich complexity, long-term nature, that's in our brains.

Yes, the brain is more like sparse access memory, so that you can actually retrieve very precisely bits of your experience.

The retrieval aspect: you can introspect, you can ask yourself questions.

Yes, you can program your own memory, and language is actually the tool you use to do that. I think language is a kind of operating system for the mind, and one of the uses of language is as a query that you run over your own memory. You use words as keys to retrieve specific experiences, specific concepts, specific thoughts. Language is the way you store thoughts, not just in writing, in the physical world, but also in your own mind, and it's also how you retrieve them. Imagine if you didn't have language: then you would not really have a self-internally-triggered way of retrieving past thoughts. You would have to rely on external experiences. For instance, you see a specific sight, you smell a specific smell, and it brings up memories. But you would not have a way to deliberately access these memories without language.

Well, the interesting thing you mentioned is you can also program the memory. You can change it, probably with language.

Yeah, using language, yes.
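The "words as keys into memory" metaphor can be made concrete with a toy sketch. Everything below is invented for illustration; it's a picture of the metaphor, not a claim about how the brain implements it:

```python
# Toy illustration of "words as keys to retrieve specific experiences".
# The memory contents here are made up for the example.
memory = {
    "beach": ["the smell of salt water", "building a sandcastle as a kid"],
    "chess": ["losing a game to my father", "studying endgames late at night"],
}

def recall(word):
    """Run a word as a query over memory, like a key into a store."""
    return memory.get(word, [])

def store(word, experience):
    """'Programming' your memory: language lets you write, not just read."""
    memory.setdefault(word, []).append(experience)

store("chess", "watching Kasparov versus Deep Blue")
print(recall("chess"))   # deliberate, internally triggered retrieval
```

Without the keys, retrieval in this sketch could only happen by stumbling onto a matching value, which mirrors the point above about relying on external sights and smells to trigger memories.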
Well, let me ask you a Chomsky question, which is, first of all, do you think language is fundamental? There are turtles; what's at the bottom of the turtles? It can't be turtles all the way down. Is language at the bottom of cognition, of everything? Is language the fundamental aspect of what it means to be a thinking thing?

No, I don't think so. I think language...

You disagree with Noam Chomsky?

Yes. Language is a layer on top of cognition. It is fundamental to cognition in the sense that, to use a computing metaphor, I see language as the operating system of the brain, of the human mind. And the operating system is a layer on top of the computer. The computer exists before the operating system, but the operating system is how you make it truly useful.

And the operating system is most likely Windows, not Linux, because language is messy.

Yeah, it's messy, and it's pretty difficult to inspect it, introspect it.

How do you think about language? We use human-interpretable language, but is there something deeper, closer to logical types of statements? What is the nature of language? Is there something deeper than the syntactic rules we construct, something that doesn't require utterances or writing and so on?

Are you asking about the possibility that there could exist languages for thinking that are not made of words?

Yeah.

I think so. The mind is layers, right? And language is almost like the outermost, the uppermost layer. But before we think in words, I think we think in terms of emotion, in space, and we think in terms of physical actions. And I think babies, in particular, probably express their thoughts in terms of the actions that they've seen or that they can perform, and in terms of the motions of objects in their environment, before they start thinking in terms of words.

It's amazing to think about that as the building blocks of language: the kinds of actions and ways the babies see the world as more fundamental than the beautiful Shakespearean language you construct on top of it. And we probably don't have any idea what that looks like, right? Which is important for trying to engineer it into AI systems.

I think visual analogies and motion are a fundamental building block of the mind, and you actually see it reflected in language. Language is full of spatial metaphors. And when you think about things, and I consider myself very much a visual thinker, you often express your thoughts by visualizing concepts in 2D space, or you solve problems by imagining yourself navigating a concept space. I don't know if you have this sort of experience.

You said visualizing concept space. I certainly visualize mathematical concepts, but you mean in concept space, visually, you're embedding ideas into a three-dimensional space you can explore with your mind, essentially?

Yeah, 2D.

2D. You're a flatlander. Okay. No, I do not. Before I jump from concept to concept, I have to put it back down on paper. It has to be on paper; I can only travel on 2D paper, not inside my mind.

You're able to move inside your mind? But even if you're writing a paper, for instance, don't you have a spatial representation of your paper? Like, you visualize where ideas lie topologically in relationship to other ideas, kind of like a subway map of the ideas in your paper?

Yeah, that's true. I mean, in papers, I don't know about you, but it feels like there's a destination. There's a key idea that you want to arrive at, and a lot of it is in the fog, and you're trying to... It's almost like, what's that called, when you do a path-planning search from both directions, from the start and from the end, and then you find where they join? You do shortest path. In game playing, you do this with, like, A* from both sides, and you see where they join. So you kind of do that, at least for me. First of all, just exploring from the start, from first principles: what do I know, what can I start proving from that? And then, from the destination, you start backtracking: if I want to show some kind of set of ideas, what would it take to show them? And you kind of backtrack. But yeah, I don't think I'm doing all that in my mind, though. I'm putting it down on paper.
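The search strategy described here, exploring forward from first principles while backtracking from the destination, is essentially bidirectional search. A minimal sketch, using plain breadth-first search from both ends rather than A* (A* would add a heuristic to guide each frontier); the "ideas" graph at the bottom is made up:

```python
from collections import deque

def bidirectional_search(graph, start, goal):
    """BFS from both ends; stop when the two frontiers meet.
    graph: dict mapping node -> list of neighbor nodes."""
    if start == goal:
        return [start]
    # parent maps for path reconstruction, one per direction
    fwd_parent, bwd_parent = {start: None}, {goal: None}
    fwd_queue, bwd_queue = deque([start]), deque([goal])

    def expand(queue, parents, other_parents):
        node = queue.popleft()
        for nbr in graph.get(node, []):
            if nbr not in parents:
                parents[nbr] = node
                if nbr in other_parents:   # the frontiers have joined
                    return nbr
                queue.append(nbr)
        return None

    while fwd_queue and bwd_queue:
        meet = expand(fwd_queue, fwd_parent, bwd_parent) or \
               expand(bwd_queue, bwd_parent, fwd_parent)
        if meet:
            # reconstruct: start -> meet, then meet -> goal
            path, n = [], meet
            while n is not None:
                path.append(n)
                n = fwd_parent[n]
            path.reverse()
            n = bwd_parent[meet]
            while n is not None:
                path.append(n)
                n = bwd_parent[n]
            return path
    return None

# A made-up graph of ideas in a paper, with edges made symmetric.
ideas = {"premise": ["lemma"], "lemma": ["claim"], "claim": ["theorem"]}
for a, bs in list(ideas.items()):
    for b in bs:
        ideas.setdefault(b, []).append(a)
print(bidirectional_search(ideas, "premise", "theorem"))
# ['premise', 'lemma', 'claim', 'theorem']
```

The appeal is exactly what's described above: each frontier only has to cover about half the depth before they join, which is far less work than searching the whole way from one side alone.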
Do you use mind maps to organize your ideas?

Yeah, I like mind maps.

Let's get into this. I've been so jealous of people; I haven't really tried it. I've been jealous of people that seem to get this fire of passion in their eyes, because everything starts making sense. It's like Tom Cruise in the movie, moving stuff around. Some of the most brilliant people I know use mind maps. I haven't really tried. Can you explain what the hell a mind map is?

I guess a mind map is a way to take the connected mess inside your mind and just put it on paper, so that you gain more control over it. It's a way to organize things on paper, and, as a kind of consequence of organizing things on paper, it starts being more organized inside your own mind.

What does that look like? Do you have an example? What's the first thing you write on paper? What's the second thing you write?

Typically, you draw a mind map to organize the way you think about a topic. So you would start by writing down the key concept of that topic. You would write "intelligence," or something, and then you would start adding associative connections: what do you think about when you think about intelligence? What do you think are the key elements of intelligence? So maybe you would have language, for instance, or motion, and you would start drawing nodes with these things. And then you would ask, what do you think about when you think about motion, and so on, and you would go like that, like a tree.

It's a tree? Or is it mostly a tree, or a graph?

Oh, it's more of a graph than a tree. And it's not limited to just writing down words; you can also draw things. And it's not supposed to be purely hierarchical, right? The point is that once you start writing it down, you can start reorganizing it so that it makes more sense, so that it's connected in a more effective way.

See, but I'm so OCD that you just mentioned intelligence, and language, and motion: I would start becoming paranoid that the categorization isn't perfect, that I'd become paralyzed with the mind map. Even though you're just doing associative kinds of connections, there's an implied hierarchy that's emerging, and I would start becoming paranoid that it's not the proper hierarchy. So one way to see mind maps is you're putting thoughts on paper, like a stream of consciousness, but then you can also start getting paranoid: is this the right hierarchy?

Sure. It's a mind map. It's your mind map. You're free to draw anything you want, you're free to draw any connection you want, and you can just make a different mind map if you think the central node is not the right node.

Yeah, so I suppose there's a fear of being wrong.

If you want to organize your ideas by writing down what you think, which I think is very effective (how do you know what you think about something if you don't write it down, right?), the thing is that prose imposes a much more syntactic structure over your ideas, which is not required with a mind map. So a mind map is kind of a lower-level, more freehand way of organizing your thoughts, and once you've drawn it, then you can start actually voicing your thoughts in terms of paragraphs.

There's a two-dimensional aspect of layout, too, right? It's kind of a flower, I guess; you usually want to start with a central concept?

Yes. Typically it ends up more like a subway map, so it ends up more like a graph, a topological graph, without a root node. Like in a subway map, there are some nodes that are more connected than others, and there are some nodes that are more important than others: there are destinations. But it's not going to be purely like a tree, for instance.

Yeah, it's fascinating to think that there might be something to that about the way our mind thinks. By the way, I just remembered an obvious thing: I have probably thousands of documents in Google Docs at this point that are bullet-point lists. Can you map a mind map to a bullet-point list? Is it the same?

No, it's not. A bullet-point list is a tree.

It's a tree, yeah. So I create trees, but they also don't have the visual element. I guess I'm comfortable with the structure; it feels like the narrowness, the constraints, feel more comforting.
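The tree-versus-graph distinction here is easy to see in code. A hypothetical sketch: a bullet-point list nests, so every idea has exactly one parent, while a mind map is an adjacency structure where ideas can cross-link freely:

```python
# A bullet-point list is a tree: each node has exactly one parent.
bullet_list = {
    "intelligence": {
        "language": {"syntax": {}},
        "motion": {},
    }
}

# A mind map is a general graph, like a subway map: no single root,
# and cross-connections a tree cannot express.
mind_map = {
    "intelligence": ["language", "motion"],
    "language": ["intelligence", "motion"],  # language <-> motion cross-link
    "motion": ["intelligence", "language"],
}
```

The cross-link is the whole point: in the tree, "language" and "motion" can only relate through their shared parent, while the graph lets you connect them directly and reorganize later.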
If you have thousands of documents with your own thoughts in Google Docs, why don't you write some kind of search engine, maybe a piece of mind-mapping software, where you write down a concept and then it gives you sentences or paragraphs from your thousands of Google Docs documents that match this concept?

The problem is it's so deeply, unlike mind maps, rooted in natural language, so it's not semantically searchable, I would say. Because the categories... You mentioned intelligence, language, and motion: they're very strong semantic categories. It feels like the mind map forces you to be semantically clear and specific. The bullet-point lists I have are sparse, disparate thoughts that poetically represent a category, like motion, as opposed to saying "motion."

So, unfortunately, that's the same problem with the internet. That's why the idea of the semantic web is difficult to get to work. Most language on the internet is a giant mess of natural language that's hard to interpret.

So do you think there's something to mind maps, as you actually originally brought up as we were talking about cognition and language? Do you think there's something to mind maps about how our brain actually thinks and reasons about things?

It's possible. I think it's reasonable to assume that there is some level of topological processing in the brain, that the brain is very associative in nature. And I also believe that a topological space is a better medium to encode thoughts than a geometric space.

What's the difference between a topological and a geometric space?

Well, if you're talking about topologies, then points are either connected or not, so a topology is more like a subway map, and geometry is when you're interested in the distance between things. In subway maps, you don't really have the concept of distance; you only have the concept of whether there is a train going from station A to station B. And what we do in deep learning is that we're actually dealing with geometric spaces. We're dealing with concept vectors, word vectors, that have a distance between them, expressed in terms of a dot product. We are not really building topological models, usually.

I think you're absolutely right. Distance is of fundamental importance in deep learning. I mean, it's the continuous aspect of it.

Yes, because everything is a vector, and everything has to be a vector because everything has to be differentiable. If your space is discrete, it's no longer differentiable; you cannot do deep learning in it anymore. Well, you could, but you could only do it by embedding it in a bigger continuous space. So if you do topology in the context of deep learning, you have to do it by embedding your topology in a geometry.
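To make the contrast concrete, here is a small sketch with invented vectors: the geometric view gives you graded, differentiable similarity; the topological view gives you only discrete connectivity:

```python
import numpy as np

# Geometric space (what deep learning uses): concepts are vectors, and
# closeness is a continuous quantity such as a dot product.
cat = np.array([0.9, 0.1, 0.3])   # made-up embedding
dog = np.array([0.8, 0.2, 0.4])   # made-up embedding
similarity = float(np.dot(cat, dog))  # graded and differentiable

# Topological space (the subway map): only connectivity exists.
# Station A links to station B or it doesn't; there is no distance.
edges = {("cat", "mammal"), ("dog", "mammal")}
connected = ("cat", "mammal") in edges  # discrete, non-differentiable
```

The last line is the crux of the point above: membership in a discrete edge set has no gradient, which is why topology has to be embedded into geometry before deep learning can operate on it.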
Right. Well, let me zoom out for a second. Let's get into your paper, "On the Measure of Intelligence," that you put out in 2019.

Yes, November 2019.

Yeah, I remember. It feels like a different time, a different world. You could travel; you could actually go outside and see friends.

Let me ask the most absurd question. I think there's some non-zero probability there'll be a textbook one day, like 200 years from now, on artificial intelligence, or it'll be called just "intelligence," because humans will already be gone. It'll be your picture with a quote: one of the early biological systems that considered the nature of intelligence. And there'll be a definition of how they thought about intelligence, which is one of the things you do in your paper on the measure of intelligence: to ask, well, what is intelligence, and how to test for intelligence, and so on. So is there a spiffy quote about what is intelligence? What is the definition of intelligence, according to François Chollet?

Yeah. So do you think the superintelligent AIs of the future will want to remember us the way we remember humans from the past? And do you think they won't be ashamed of having a biological origin?

No, I think it would be a niche topic. It won't be that interesting. It'll be like the people that study, in certain contexts, historical civilizations that no longer exist, the Aztecs and so on. That's how it'll be seen. And it'll be studied in the context of social media: there will be hashtags about the atrocities committed to human beings when the robots finally got rid of them. It was a mistake. It'll be seen as a giant mistake, but ultimately in the name of progress, and it created a better world, because humans were overconsuming the resources, and they were not very rational, and were destructive in the end, in terms of productivity and putting more love in the world. And so, within that context, there'll be a chapter about these biological systems.

You seem to have a very detailed vision of that future. You should write a sci-fi novel about it.

I'm working on a sci-fi novel currently, yes. Self-published, yeah.

The definition of intelligence...

So, intelligence is the efficiency with which you acquire new skills at tasks that you did not previously know about, that you did not prepare for. So intelligence is not skill itself. It's not what you know, it's not what you can do; it's how well and how efficiently you can learn new things.

New things. The idea of newness there seems to be fundamentally important.

Yes. So you would see intelligence on display, for instance, whenever you see a human being, or an AI creature, adapt to a new environment that it has not seen before, that its creators did not anticipate. When you see adaptation, when you see improvisation, when you see generalization, that's intelligence. In reverse, if you have a system that, when you put it in a slightly new environment, cannot adapt, cannot improvise, cannot deviate from what it's hardcoded to do, or what it has been trained to do, that is a system that is not intelligent. There's actually a quote from Einstein that captures this idea, which is, "The measure of intelligence is the ability to change." I like that quote. I think it captures at least part of this idea.

You know, there might be something interesting about the difference between your definition and Einstein's. I mean, he's just being Einstein and clever. But: acquisition of the ability to deal with new things, versus the ability to just change. What's the difference between those two things? Do you think there's something to just being able to change?

Yes, being able to adapt. So, not change for its own sake, but certainly changing in a direction: being able to adapt yourself to your environment, whatever the environment. That's a big part of intelligence, yes. And intelligence is, more precisely, how efficiently you're able to adapt, how efficiently you're able to basically master your environment, how efficiently you can acquire new skills.
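One schematic way to write this definition down (a paraphrase for intuition only; the paper's actual formalism is developed with algorithmic information theory and is considerably more careful than this):

```latex
\[
\text{Intelligence} \;\approx\;
  \frac{\text{skill attained on previously unknown tasks}}
       {\text{priors} \,+\, \text{experience}}
\]
```

The ratio captures the efficiency theme: the same final skill counts as more intelligence when it was reached from fewer built-in priors and less experience with the task.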
And I think there's a big distinction to be drawn between intelligence, which is a process, and the output of that process, which is skill. So, for instance, if you have a very smart human programmer that considers the game of chess and writes down a static program that can play chess, then the intelligence is the process of developing that program. The program itself is just encoding the output artifact of that process. The program itself is not intelligent, and the way you tell it's not intelligent is that if you put it in a different context, you ask it to play Go or something, it's not going to be able to perform well without human involvement, because the source of intelligence, the entity that is capable of that process, is the human programmer. So we should be able to tell the difference between the process and its output. We should not confuse the output and the process. It's the same as, you know, do not confuse a road-building company and one specific road, because one specific road takes you from point A to point B, but a road-building company can make a path from anywhere to anywhere else.

Yeah, that's beautifully put. But, to play devil's advocate a little bit, it's possible that there's something more fundamental than us humans. So you kind of said the programmer creates the difference between the acquirer of the skill and the skill itself. You could argue the universe is more intelligent: the base intelligence that we should be trying to measure is something that created humans. We should be measuring God, or the source, the universe, as opposed to... There could be a deeper intelligence.

Sure. There's always a deeper intelligence; you can argue that. But that does not take anything away from the fact that humans are intelligent, and you can tell that because they are capable of adaptation and generality. You see that in particular in the fact that humans are capable of handling situations and tasks that are quite different from anything that any of our evolutionary ancestors has ever encountered. So we are capable of generalizing very much out of distribution, if you consider our evolutionary history as being, in a way, our training data.

Of course, evolutionary biologists would argue that we're not going too far out of the distribution; we're mapping the skills we've learned previously and desperately trying to jam them into these new situations.

I mean, there's definitely a little bit of that, but it's pretty clear to me that most of the things we do on any given day in our modern civilization are things that are very, very different from what our ancestors a million years ago would have been doing on a given day, and our environment is very different. So I agree that everything we do, we do with cognitive building blocks that we acquired over the course of evolution, and that anchors our cognition to a certain context, which is the human condition, very much. But still, our mind is capable of a pretty remarkable degree of generality, far beyond anything we can create in artificial systems today. The degree to which the mind can generalize away from its evolutionary history is much greater than the degree to which a deep learning system today can generalize away from its training data.

And the key point you're making, which I think is quite beautiful, is that we shouldn't measure, if we talk about measurement, we shouldn't measure the skill; we should measure the creation of the new skill,
the ability to create that new skill.

Yes. But it's tempting... It's weird, because the skill is a little bit of a small window into the system, so whenever a system has a lot of skills, it's tempting to measure the skills.

Yes. I mean, the skill is the only thing you can objectively measure. But the thing to keep in mind is that when you see skill in a human, it gives you a strong signal that that human is intelligent, because you know they weren't born with that skill, typically. Like, you see a very strong chess player; maybe you're a very strong chess player yourself, I think...

You're saying that because I'm Russian, and now you're prejudiced.

Yeah, it's just bias.

Biased, yeah.

So if you see a very strong chess player, you know they weren't born knowing how to play chess, so they had to acquire that skill with their limited resources, with their limited lifetime. And they did that because they are generally intelligent, and so they may as well have acquired any other skill. You know they have this potential. On the other hand, if you see a computer playing chess, you cannot make the same assumptions, because you cannot just assume the computer is generally intelligent. The computer may be born knowing how to play chess, in the sense that it may have been programmed by a human that has understood chess for the computer, and that has just encoded the output of that understanding in a static program. And that program is not intelligent.

So let's zoom out just for a second and ask: what is the goal of the "On the Measure of Intelligence" paper? What do you hope to achieve with it?

So the goal of the paper is to clear up some long-standing misunderstandings about the way we've been conceptualizing intelligence in the AI community, and the way we've been evaluating progress in AI. There's been a lot of progress recently in machine learning, and people are extrapolating from that progress that we're about to solve general intelligence. If you want to be able to evaluate these statements, you need to precisely define what you're talking about when you're talking about general intelligence, and you need a formal, reliable way to measure how much intelligence, how much general intelligence, a system possesses. And ideally this measure of intelligence should be actionable. It should not just describe what intelligence is; it should not just be a binary indicator that tells you the system is intelligent or it isn't. It should be actionable, it should have explanatory power, so you could use it as a feedback signal. It would show you the way towards building more intelligent systems.

So at the first level, you draw a distinction between two divergent views of intelligence, as we just talked about: intelligence as a collection of task-specific skills, and as a general learning ability. What's the difference between this memorization of skills and a general learning ability? We've talked about it a little bit, but can you linger on this topic for a bit?

Yeah. So the first part of the paper is an assessment of the different ways we've been thinking about intelligence and the different ways we've been evaluating progress in AI. The history of cognitive sciences has been shaped by two views of the human mind. One view is the evolutionary psychology view, in which the mind is a collection of fairly static, special-purpose, ad hoc mechanisms that
have been hardcoded by evolution over our history as a species, over a very long time. And early AI researchers, people like Marvin Minsky, for instance, clearly subscribed to this view. They saw the mind as a kind of collection of static programs, similar to the programs they would run on mainframe computers. In fact, I think they very much understood the mind through the metaphor of the mainframe computer, because that was the tool they were working with, right? And so you had these static programs, this collection of very different static programs, operating over a database-like memory. In this picture, learning was not very important; learning was considered to be just memorization. And in fact, learning is basically not featured in AI textbooks until the 1980s, with the rise of machine learning.

It's kind of fun to think about, that learning was the outcast, like the weird people were learning. The mainstream AI world was, I don't know what the best term is, but it was non-learning. It was seen as reasoning.

Yes, it would not be learning-based. It was considered that the mind was a collection of programs that were primarily logical in nature, and all you needed to do to create a mind was to write down these programs, and they would operate over your knowledge, which would be stored in some kind of database. As long as your database encompassed everything about the world, and your logical rules were comprehensive, then you would have a mind.

So the other view of the mind is the brain as a sort of blank slate, right? This is a very old idea; you find it in John Locke's writings. This is the tabula rasa. This is the idea that the mind is some kind of information sponge that starts empty, starts blank, and absorbs knowledge and skills from experience. So it's a sponge that reflects the complexity of the world, the complexity of your life experience, essentially. Everything you know and everything you can do is a reflection of something you found in the outside world, essentially. So this is an idea that's very old, that was not very popular, for instance, in the 1970s, but that has gained a lot of vitality recently with the rise of connectionism, in particular deep learning. So today, deep learning is the dominant paradigm in AI, and I feel like lots of AI researchers are conceptualizing the mind via a deep learning metaphor: they see the mind as a kind of randomly initialized neural network that starts blank when you're born, and then gets trained via exposure to training data, that acquires knowledge and skills via exposure to training data.

By the way, a small tangent: I feel like people who are seriously thinking about intelligence are not conceptualizing it that way. I actually haven't met too many people who believe that a neural network will be able to reason, who seriously, rigorously think that. Because I think it's actually an interesting worldview, and we'll talk about it more. It's been impressive what neural networks have been able to accomplish, and, to me, I don't know, you might disagree, but it's an open question whether scaling size eventually might lead to incredible results that to us mere humans will appear as if it's general.

I mean, if you ask people who are seriously thinking about intelligence, they will definitely not say that all you need is... that the mind is just a neural network. However, it's actually
a view that's very popular, I think, in the deep learning community, that many people are kind of conceptually, you know, intellectually lazy about.

Right, but that's exactly what I'm saying: I haven't met many people, and I think it would be interesting to meet a person who is not intellectually lazy about this particular topic and still believes that neural networks will go all the way. I think [name unclear] is probably closest to that.

There are definitely people who argue that current deep learning techniques are already the way to general artificial intelligence, and that all you need to do is to scale it up to all the available training data. And if you look at the waves that OpenAI's GPT-3 model has made, you see echoes of this idea.

So on that topic, GPT-3, similar to GPT-2 actually, has captivated some part of the imagination of the public. There's just a bunch of hype, of different kinds, and I would say it's emergent; it's not artificially manufactured. People just get excited, for some strange reason, in the case of GPT-3. Which is funny: there was, I believe, a couple of months' delay from release to hype. Maybe I'm not historically correct on that, but it feels like there was a little bit of a lack of hype, and then there was a phase shift into hype. But nevertheless, there's a bunch of cool applications that seem to captivate the imagination of the public about what this language model, trained in an unsupervised way, without any fine-tuning, is able to achieve. So what do you make of that? What are your thoughts about GPT-3?

Yeah, so I think what's interesting about GPT-3 is the idea that it may be able to learn new tasks after just being shown a few examples. I think if it's actually capable of doing that, that's novel, and that's very interesting, and that's something we should investigate. That said, I must say, I'm not entirely convinced that we have shown it's capable of doing that. It's very likely, given the amount of data that the model is trained on, that what it's actually doing is pattern-matching a new task you give it with a task that it's been exposed to in its training data. It's just recognizing the task, instead of developing a model of the task.

Right, but, sorry to interrupt, there are parallels to what you said before: it's possible to see GPT-3, the prompt it's given, as a kind of SQL query into this thing that it's learned, similar to what you said before about language being used to query the memory.

Yes.

So is it possible that a neural network is a giant memorization thing, but then, if it gets sufficiently giant, it'll memorize sufficiently large amounts of things in the world to where intelligence becomes a querying machine?

I think it's possible that a significant chunk of intelligence is this giant associative memory. I definitely don't believe that intelligence is just a giant associative memory, but it may well be a big component.
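For readers who haven't seen it, this is roughly what "learning a task from a few examples" looks like in practice: the task demonstrations live inside the prompt itself, and no weight updates happen. The strings below are illustrative, in the style of the translation examples from the GPT-3 paper:

```python
# A few-shot prompt: the "training examples" sit in the context window,
# and the model is asked to continue the pattern. No gradient updates.
prompt = """Translate English to French.

sea otter => loutre de mer
cheese => fromage
plush giraffe => girafe en peluche
mint =>"""

# completion = model.generate(prompt)   # hypothetical API call
# Chollet's question: if the completion is "menthe", did the model just
# learn translation from three examples, or did it recognize a task it
# has already seen countless times in its web-scale training data?
```

The two readings are observationally hard to separate, which is why he says it's something we should investigate rather than something that's been shown.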
So do you think GPT-3, 4, 5, GPT-10 will eventually... What do you think? Where's the ceiling? Do you think it'll be able to reason?

No. "What is the ceiling?" is the better question. How well is it going to scale? How good is GPT-N going to be? So I believe GPT-N is going to improve on the strength of GPT-2 and 3, which is that it will be able to generate ever more plausible text in context.

Just monotonically improving performance.

Yes. If you're training a bigger model on more data, then your text will be increasingly more context-aware and increasingly more plausible, in the same way that GPT-3 is much better at generating plausible text compared to GPT-2. But that said, I don't think just scaling up the model to more transformer layers and more training data is going to address the flaw of GPT-3, which is that it can generate plausible text, but that text is not constrained by anything other than plausibility. In particular, it's not constrained by factualness, or even consistency, which is why it's very easy to get GPT-3 to generate statements that are factually untrue, or to generate statements that are even self-contradictory. Because its only goal is plausibility, and it has no other constraints. It's not constrained to be self-consistent, for instance. And for this reason, one thing that I thought was very interesting with GPT-3 is that you can predetermine the answer it will give you by asking the question in a specific way, because it's very responsive to the way you ask the question, since it has no understanding of the content of the question. If you ask the same question in two different ways that are basically adversarially engineered to produce certain answers, you will get two different, contradictory answers.

It's very susceptible to adversarial attacks, essentially.

Potentially, yes. So, in general, the problem with these generative models is that they are very good at generating plausible text, but that's just not enough, right? I think one avenue that would be very interesting for making progress is to make it possible to write programs over the latent space that these models operate on. You would rely on these self-supervised models to generate a sort of pool of knowledge and concepts and common sense, and then you would be able to write explicit reasoning programs over it. Because the current problem with GPT-3 is that it can be quite difficult to get it to do what you want it to do. If you want to turn GPT-3 into products, you need to put constraints on it. You need to force it to obey certain rules. So you need a way to program it explicitly.

Yeah, so if you look at its ability to do program synthesis, it generates, like you said, something that's plausible.

Yes. If you try to make it generate programs, it will perform well for any program that it has seen in its training data. But because program space is not interpolative, it's not going to be able to generalize to problems it hasn't seen before.
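What "writing programs over the latent space" might look like is an open design question; here is one hypothetical reading of it, where a pretrained encoder supplies the pool of knowledge and an explicit program imposes the constraints. `embed` is a stand-in for any pretrained model, not a real API:

```python
import numpy as np

def embed(text):
    # Placeholder encoder: a real system would call a pretrained
    # self-supervised model here to get a semantic vector.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=64)
    return v / np.linalg.norm(v)

def most_similar(query, candidates):
    """Explicit program logic running over the latent space."""
    q = embed(query)
    return max(candidates, key=lambda c: float(np.dot(q, embed(c))))

# The explicit program retrieves from a curated fact store, so answers
# are constrained by the store's contents, not just by plausibility.
facts = {"Paris is the capital of France",
         "Water boils at 100 C at sea level"}
answer = most_similar("capital of France?", facts)
```

The design point is the constraint: in a setup like this, the output can only come from the curated fact store, so plausibility alone can no longer produce a factually untrue statement.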
Now, do you think, sort of an absurd but I think useful intuition builder: GPT-3 has 175 billion parameters; the human brain has about a thousand times that, or more, in terms of the number of synapses. Obviously very different kinds of things, but there is some degree of similarity. What do you think GPT will look like when it has 100 trillion parameters? Do you think our conversation might be different in nature? Because you've criticized GPT-3 very effectively; now, do you think...

No, I don't think so. To begin with, the bottleneck with scaling up GPT models, generative pre-trained transformer models, is not going to be the size of the model or how long it takes to train it. The bottleneck is going to be the training data, because OpenAI is already training GPT-3 on a crawl of basically the entire web, and that's a lot of data. So you could imagine training on more data than that, and Google could train on more data than that, but it would still be only incrementally more data. I don't recall exactly how much more data GPT-3 was trained on compared to GPT-2, but it's probably at least 100x, maybe even 1,000x; I don't have the exact number. You're not going to be able to train a model on 100x more data than what you're already doing.

So that's brilliant. It's easier to think of compute as the bottleneck, and then argue about whether we can remove that bottleneck.

We can remove the compute bottleneck; I don't think it's a big problem. If you look at the pace at which we've improved the efficiency of deep learning models in the past few years, I'm not worried about training-time bottlenecks or model-size bottlenecks. The bottleneck, in the case of these generative transformer models, is absolutely the training data.

What about the quality of the data?

So, yeah, the quality of the data is an interesting point. The thing is, if you're going to want to use these models in real products, then you want to feed them data that's as high-quality, as factual, and I would say as unbiased as possible. Although, you know, there's not really such a thing as unbiased data in the first place. But you probably don't want to train it on Reddit, for instance; that sounds like a bad plan. From my personal experience working with large-scale deep learning models: at some point I was working on a model at Google that was trained on 150 million labeled images. It's an image classification model. That's a lot of images, probably most of the publicly available images on the web at the time. And it was a very noisy dataset, because the labels were not originally annotated by hand, by humans. They were automatically derived from things like tags on social media, or just keywords on the same page the image was found on, and so on. And it turned out that you could easily get a better model, not just by training on more of the noisy data. If you train on more of the noisy data, you get an incrementally better model, but you very quickly hit diminishing returns. On the other hand, if you train on a smaller dataset with higher-quality annotations, annotations that are actually made by humans, you get a better model, and it also takes less time to train it.

Yeah, that's fascinating. With self-supervised learning, is there a way to get better at doing the automated labeling?

Yeah, so you can enrich or refine your labels in an automated way. That's correct.

Do you have hope for, I don't know if you're familiar with it, the idea of the semantic web? Just for people who are not familiar: the semantic web is the idea of attaching semantic meaning to the words on the internet, the sentences, the paragraphs; being able to convert the information on the internet, or some fraction of it, into something that's interpretable by machines. That was kind of a dream, I think, in the semantic web papers in the '90s. It's the dream that the internet is full of rich, exciting information, even just looking at Wikipedia, and we should be able to use that as data for machines. But the information is not really in a format that's available to machines.

So, no, I don't think the semantic web will ever work, simply because it would be a lot of work to provide that information in structured
form, and there is not really any incentive for anyone to provide that work. So I think the way forward to make the knowledge on the web available to machines is actually something closer to unsupervised deep learning. GPT-3 is actually a bigger step in the direction of making the knowledge of the web available to machines than the semantic web was.

Yeah. Perhaps in a human-centric sense, it feels like GPT-3 hasn't learned anything that could be used to reason, but that might be just the early days.

Yeah, I think that's correct. I think the forms of reasoning that you see it perform are basically just reproducing patterns that it has seen in its training data. So of course, if you're trained on the entire web, then you can produce an illusion of reasoning in many different situations, but it will break down if it's presented with a novel situation.

That's the open question: between the illusion of reasoning and actual reasoning.

Yes, the power to adapt to something that is genuinely new. Because the thing is, even if you could train on every bit of data ever generated in the history of humanity, that model would be capable of anticipating many different possible situations, but it remains that the future is going to be something different. For instance, if you train a GPT-3 model on data from the year 2002, and then use it today, it's going to be missing many things. It's going to be missing many common-sense facts about the world; it's even going to be missing vocabulary, and so on.

Yeah, it's interesting that GPT-3 doesn't even have, I think, any information about the coronavirus.

Yes. Which is why, you know, you tell that a system is intelligent when it's capable of adapting. So intelligence is going to require some amount of continuous learning. It's also going to require some amount of improvisation. It's not enough to assume that what you're going to be asked to do is something that you've seen before, or something that is a simple interpolation of things you've seen before. In fact, that model breaks down for even tasks that look relatively simple from a distance, like L5 self-driving, for instance. Google had a paper a couple of years back showing that something like 30 million different road situations were actually completely insufficient to train a driving model; it wasn't even L2, right? And that's a lot of data. That's a lot more data than the 20 or 30 hours of driving that a human needs to learn to drive, given the knowledge they've already accumulated.

Well, let me ask you on that topic: Elon Musk, Tesla Autopilot, one of the only companies, I believe, that is really pushing for a learning-based approach. Are you skeptical that that kind of network can achieve Level 4?

L4 is probably achievable; L5 is probably not.

What's the distinction there? L5 is completely... you can just fall asleep.

Yeah, L5 is basically human-level.

Well, you have to be careful saying human-level, because...

Yeah, that's the clearest example of how cars will most likely be much safer than humans in many situations where humans fail, and it's the vice versa, too.

So I'll tell you, the thing is, the amount of training data you would need to anticipate pretty much every possible situation you'll encounter in the real world is such that it's not entirely unrealistic to think that at some point in the future we'll develop a system that's trained on
enough data, especially provided that we can simulate a lot of that data; we don't necessarily need actual cars on the road for everything. But it's a massive effort. And it turns out you can create a system that's much more adaptive, that can gen