François Chollet: Keras, Deep Learning, and the Progress of AI | Lex Fridman Podcast #38
Bo8MY4JpiXE • 2019-09-14
The following is a conversation with François Chollet. He's the creator of Keras, an open-source deep learning library designed to enable fast, user-friendly experimentation with deep neural networks. It serves as an interface to several deep learning libraries, the most popular of which is TensorFlow, and it was integrated into the TensorFlow main codebase a while ago, meaning that if you want to create, train, and use neural networks, probably the easiest and most popular option is to use Keras inside TensorFlow. Aside from creating an exceptionally useful and popular library, François is also a world-class AI researcher and software engineer at Google, and he's definitely an outspoken, if not controversial, personality in the AI world, especially in the realm of ideas around the future of artificial intelligence. This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, give it five stars on iTunes, support it on Patreon, or simply connect with me on Twitter at lexfridman, spelled F-R-I-D-M-A-N.
And now, here's my conversation with François Chollet.

You're known for not sugarcoating your opinions and speaking your mind about ideas in AI, especially on Twitter. It's one of my favorite Twitter accounts. So what's one of the more controversial ideas you've expressed online and gotten some heat for? How do you pick?

Yeah, I think if you go through the trouble of maintaining a Twitter account, you might as well speak your mind, you know? Otherwise, what's even the point of having a Twitter account? It's like having a nice car and just leaving it in the garage. So, one thing for which I got a lot of pushback: perhaps that time I wrote something about the idea of an intelligence explosion, and I was questioning the idea and the reasoning behind it. I got a lot of flak for it.

So, intelligence explosion — I'm sure you're familiar with the idea, but it's the idea that if you were to build general AI problem-solving algorithms, well, the problem of building such an AI is itself a problem that could be solved by your AI, and maybe it could be solved better than what humans can do. So your AI could start tweaking its own algorithm, could start being a better version of itself, and so on, iteratively, in a recursive fashion, and so you would end up with an AI with exponentially increasing intelligence. I was basically questioning this idea, first of all because the notion of an intelligence explosion uses an implicit definition of intelligence that doesn't sound quite right to me. It considers intelligence as a property of a brain that you can consider in isolation — like the height of a building, for instance. But that's not really what intelligence is. Intelligence emerges from the interaction between a brain, a body — embodied intelligence — and an environment, and if you're missing one of these pieces, then you cannot actually define intelligence. So just tweaking a brain to make it smarter and smarter doesn't actually make any sense to me.

So, first of all, you're crushing the dreams of many people, right? There are a lot of physicists — Max Tegmark, people who think the universe is an information-processing system, our brain is a kind of information-processing system — so what's the theoretical limit? It seems naive to think that our own brain is somehow the limit of the capabilities of this information-processing system — I'm playing devil's advocate here — and that if you're able to build something that's on par with the brain, the process that built it just continues and it will improve exponentially. That's the logic that's used, actually, by almost everybody that is worried about superhuman intelligence. Most people who are skeptical of that — their thought process is: this doesn't feel right. That's true for me as well. The whole thing is shrouded in mystery, where you can't really say anything concrete, but you could say this doesn't feel like how the brain works. And you're trying, with your blog post, to make it a little more explicit. So one idea is that the brain doesn't exist alone — it exists within the environment — so you can't just exponentially improve the brain; you have to somehow improve the environment and the brain together in order to create something that's much smarter, in some sense. Of course, we don't have a definition of intelligence.

That's correct. If you look at very smart people today — even humans, not even talking about AIs — I don't think their brain, the performance of their brain, is the bottleneck to their expressed intelligence, to their achievements. You cannot just tweak one part of this system, of this brain-body-environment system, and expect the capabilities that emerge out of it to just explode exponentially, because any time you improve one part of a system with many interdependencies like this, there's a new bottleneck that arises. And I don't think even today, for very smart people, their brain is the bottleneck to the sort of problems they can solve. In fact, many very smart people today are not actually solving any big scientific problems — they're like Einstein, but in the patent clerk days. Einstein became Einstein because it was a meeting of a genius with a great problem at the right time, but that meeting could have never happened, and then Einstein would have just been a patent clerk. And in fact, many people today are probably genius-level smart, but you wouldn't know, because they're not really expressing any of it.

That's brilliant. So we can think of the world — Earth, but also the universe — as a space of problems. All these problems and tasks of various difficulty are roaming it, and there are agents, creatures like ourselves and animals and so on, that are also roaming it, and then you get coupled with a problem and you solve it. But without that coupling, you can't demonstrate your quote-unquote intelligence.

Exactly. Intelligence is the meeting of great problem-solving capabilities with a great problem, and if you don't have the problem, you don't really express any intelligence. All you're left with is potential intelligence — like the performance of your brain, or your IQ, which in itself is just a number.

So you mentioned problem-solving capacity. What do you think of as problem-solving? Can you try to define intelligence? What does it mean to be more or less intelligent? Is it completely coupled to a particular problem, or is there something a little bit more universal?

Yeah, I do believe all intelligence is specialized
intelligence. Even human intelligence has some degree of generality — well, all intelligent systems have some degree of generality — but they're always specialized in one category of problems. So human intelligence is specialized in the human experience, and that shows at various levels. It shows in some prior knowledge that's innate, that we have at birth: knowledge about things like agents, goal-driven behavior, visual priors about what makes an object, priors about time, and so on. It shows also in the way we learn. For instance, it's very, very easy for us to pick up language; it's very, very easy for us to learn certain things, because we are basically hard-coded to learn them. And we are specialized in solving certain kinds of problems, and we are quite useless when it comes to other kinds of problems. For instance, we are not really designed to handle very long-term problems. We have no capability of seeing the very long term — we don't have that much working memory, you know?

So how do you think about long-term planning? We're talking about a scale of years, millennia — what do you mean by long term, that we're not very good at?

Well, human intelligence is specialized in the human experience, and the human experience is very short — one lifetime is short. Even within one lifetime, we have a very hard time envisioning things on a scale of years. It's very difficult to project yourself at the scale of five years, at the scale of ten years, and so on. We can solve only fairly narrowly scoped problems. So when it comes to solving bigger problems, larger-scale problems, we are not actually doing it on an individual level. It's not actually our brain doing it. We have this thing called civilization, right, which is itself a sort of problem-solving system, a sort of artificially intelligent system, and it's not running on one brain — it's running on a network of brains. In fact, it's running on much more than a network of brains; it's running on a lot of infrastructure, like books and computers and the internet and human institutions and so on. And that is capable of handling problems on a much greater scale than any individual human. If you look at computer science, for instance — that's an institution that solves problems, and it is superhuman. It operates on a greater scale; it can solve much bigger problems than an individual human could. And science itself — science as a system, as an institution — is a kind of artificially intelligent problem-solving algorithm that is superhuman.

Yes — computer science is like a theorem prover at a scale of thousands, maybe hundreds of thousands, of human beings. At that scale, what do you think is an intelligent agent? There's us humans at the individual level; there are millions, maybe billions, of bacteria on our skin — that's at the smaller scale; you can even go to the particle level, as systems that behave, you could say, intelligently in some ways. And then you can look at Earth as a single organism, you can look at our galaxy, and even the universe as a single organism. How do you think about scale in defining intelligent systems? And we're here at Google — there are millions of devices doing computation in a distributed way. How do you think about intelligence at different scales?

You can always characterize anything as a system. I think people who talk about things like intelligence explosion tend to focus on one agent, which is basically one brain — one brain considered in isolation, like a brain in a jar that's controlling a body in a very top-to-bottom kind of fashion, and that body is acting in an environment. So it's a very hierarchical view: you have the brain at the top of the pyramid, then you have the body, just dutifully receiving orders, and then the body is manipulating objects in the environment, and so on. So everything is subordinate to this one thing, this epicenter, which is the brain. But in real life, intelligent agents don't really work like this. There is no strong delimitation between the brain and the body to start with. You have to look not just at the brain but at the nervous system — but then the nervous system and the body are not really separable either, so you have to look at an entire animal as one agent. But then you start realizing, as you observe an animal over any length of time, that a lot of the intelligence of an animal is actually externalized. That's especially true for humans: a lot of our intelligence is externalized. When you write down some notes, that is externalized intelligence. When you write a computer program, you are externalizing cognition. So intelligence is externalized in books, it's externalized in computers, the internet, in other humans; it's externalized in language, and so on. So there is no hard delimitation of what makes an intelligent agent — it's all about context.

Okay, but AlphaGo is better at Go than the best human player. There are levels of skill here. So do you think there is such a thing — such a concept — as an intelligence explosion in a specific task? Do you think it's possible to have a category of tasks on which you do have something like an exponential growth of the ability to solve that particular problem?

I think if you consider a specific vertical, it's probably possible to some extent. I also don't think we have to speculate about it, because we have real-world examples of recursively self-improving intelligent systems. For instance, science is a problem-solving system, a knowledge-generation system — a system that experiences the world in some sense, and then gradually understands it and can act on it — and that system is superhuman, and it is clearly recursively self-improving, because science feeds into technology; technology can be used to build better tools — better computers, better instrumentation, and so on — which in turn can make science faster. So science is probably the closest thing we have today to a recursively self-improving superhuman AI. And you can just
observe scientific progress and ask whether it is exploding — which is an interesting question — and you can use that as a basis to try to understand what will happen with a superhuman AI that has science-like behavior.

Let me linger on that a little bit more. What is your intuition for why an intelligence explosion is not possible? Taking all the scientific revolutions — why can't we slightly accelerate that process?

So, you can absolutely accelerate any problem-solving process, so recursive self-improvement is absolutely a real thing. But what happens with a recursively self-improving system is typically not an explosion, because no system exists in isolation, and so tweaking one part of the system means that suddenly another part of the system becomes a bottleneck. If you look at science, for instance — which is clearly a recursively self-improving, clearly a problem-solving system — scientific progress is not actually exploding. If you look at science, what you see is the picture of a system that is consuming an exponentially increasing amount of resources, but that has a linear output in terms of scientific progress. And maybe that will seem like a very strong claim. Many people are actually saying that scientific progress is exponential, but when they're claiming this, they're actually looking at indicators of resource consumption by science — the number of papers being published, the number of patents being filed, and so on — which are completely correlated with how many people are working on science today. So it's actually an indicator of resource consumption. But what you should look at is the output: progress in terms of the knowledge that science generates, in terms of the scope and significance of the problems that we solve. And some people have actually been trying to measure that — like Michael Nielsen, for instance. He had a very nice paper, I think it was last year, about it. His approach to measuring scientific progress was to look at the timeline of scientific discoveries over the past hundred, hundred-fifty years, and for each major discovery, ask a panel of experts to rate the significance of the discovery. And if the output of science as an institution were exponential, you would expect the temporal density of significance to go up exponentially — maybe because there's a faster rate of discoveries, maybe because the discoveries are increasingly more important. And what actually happens, if you plot this temporal density of significance measured in this way, is that you see very much a flat graph. You see a flat graph across all disciplines — across physics, biology, medicine, and so on. And it actually makes a lot of sense if you think about it, because think about the progress of physics a hundred and ten years ago: it was a time of crazy change. Think about the progress of technology a hundred and sixty years ago, when we started replacing horses with cars, when we started having electricity and so on — it was a time of incredible change. And today is also a time of very fast change, but it would be an unfair characterization to say that today technology and science are moving way faster than they did fifty years ago, a hundred years ago. If you do try to rigorously plot the temporal density of significance, you do see very flat curves. And you can check out the paper that Michael Nielsen had about this idea.

And so the way I interpret it is: as you make progress in a given field, or in a given subfield of science, it becomes exponentially more difficult to make further progress — like the very first person to work on information theory. If you enter a new field in the very early years, there's a lot of low-hanging fruit you can pick.

That's right, yeah.

But the next generation of researchers is going to have to dig much harder to make smaller discoveries — probably a larger number of smaller discoveries — and to achieve the same amount of impact, you're going to need a much greater headcount. And that's exactly the picture you're seeing with science: the number of scientists and engineers is in fact increasing exponentially, the amount of computational resources available to science is increasing exponentially, and so on. So the resource consumption of science is exponential, but the output in terms of progress, in terms of significance, is linear. And the reason why is because — even though science is recursively self-improving, meaning that scientific progress turns into technological progress, which in turn helps science — if you look at computers, for instance, they're a product of science, and computers are tremendously useful in speeding up science. The internet, same thing: the internet is a technology made possible by recent scientific advances, and itself, because it enables scientists to network, to communicate, to exchange papers and ideas much faster, it is a way to speed up scientific progress. So even though you're looking at a recursively self-improving system, it is consuming exponentially more resources to produce the same amount of problem-solving.

That's a fascinating way to paint it, and certainly it holds for the deep learning community. If you look at the temporal — what did you call it? — the temporal density of significant ideas, if you look at it in deep learning — I'd have to think about that, but if you really look at significant ideas in deep learning, they might even be decreasing.

So I do believe the per-paper significance is decreasing, but the amount of papers is still today exponentially increasing. So I think if you look at it in aggregate — if you were to sum the significance of all papers — my guess is that you would see roughly linear progress. And in my opinion, it is not a coincidence that you're seeing linear progress in science despite exponential resource consumption. I think the resource consumption is dynamically adjusting itself to maintain linear progress, because we as a community expect linear progress, meaning that if we start investing less and seeing less progress, it means that suddenly there are some low-hanging fruits that become available, and someone's going to step up and pick them. So it's very much like a market for discoveries and ideas.

But there's another fundamental part which you're highlighting, which is a hypothesis about science — or, like, the space of ideas: any one path you travel down, it gets exponentially more difficult to develop new ideas.

Yes.

And your sense is that's going to hold across our mysterious universe?

Yes. Exponential progress triggers exponential friction, so that if you tweak one part of a system, suddenly some other part becomes a bottleneck. For instance, let's say you develop some device that measures its own acceleration, and it has an engine, and it outputs even more acceleration in proportion to its acceleration, and you drop it somewhere. It's not going to reach infinite speed, because it exists in a certain context: the air around it is going to generate friction, and it's going to block it at some top speed. And even if you were to consider the broader context and lift the bottleneck there — like the bottleneck of air friction — then some other part of the system would start stepping in and creating exponential friction: maybe the speed of light, or whatever. And this is definitely true when you look at the problem-solving algorithm that is being run by science as an institution, science as a system. As you make more and more progress, despite having this recursive self-improvement component, you are encountering exponential friction — like, the more researchers you have working on different ideas, the more overhead you have in communication across researchers.
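The falling-device analogy can be made concrete with a tiny simulation — purely illustrative, with hypothetical numbers that are not from the conversation: a system whose thrust grows with its own speed (a positive-feedback, self-improvement term), but whose drag grows faster, saturates at a finite top speed instead of exploding.

```python
# Illustrative only: hypothetical parameter values, not from the conversation.
# The engine adds thrust in proportion to the current speed (positive
# feedback, the "recursive self-improvement" term), while quadratic drag
# (the "exponential friction") grows faster and eventually dominates.

def terminal_speed(a0=1.0, feedback=0.5, drag=0.1, dt=0.01, steps=20_000):
    """Euler-integrate dv/dt = a0 + feedback*v - drag*v**2, starting from rest."""
    v = 0.0
    for _ in range(steps):
        v += (a0 + feedback * v - drag * v * v) * dt
    return v

# Despite the positive feedback, speed saturates where drag balances thrust,
# at v* = (feedback + (feedback**2 + 4*drag*a0)**0.5) / (2*drag).
print(terminal_speed())  # settles near 6.53, not infinity
```

The point of the sketch: a self-reinforcing growth term alone is not enough for an explosion; whether the system diverges depends on how fast the friction term grows relative to it.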
If you look at quantum mechanics, which you mentioned: if you want to start making significant discoveries today, significant progress in quantum mechanics, there is an amount of knowledge you have to ingest which is huge. So there is a very large overhead to even start to contribute, there is a large amount of overhead to synchronize across researchers, and so on. And of course, the significant practical experiments are going to require exponentially expensive equipment, because the easier ones have already been run.

So, in your sense, there's no way of escaping this kind of friction with artificial intelligence systems?

Yeah, I think science is a very good way to model what would happen with a superhuman, recursively self-improving AI.

That's my intuition too. It's not a mathematical proof of anything.

That's not my point — I'm not trying to prove anything. I'm just trying to make an argument to question the narrative of intelligence explosion, which is quite a dominant narrative, and you do get a lot of pushback if you go against it. Because, for many people, AI is not just a subfield of computer science — it's more like a belief system: the belief that the world is headed towards an event, the singularity, past which AI will go exponential, and the world will be transformed, and humans will become obsolete. And if you go against this narrative — because it is not really a scientific argument but more of a belief system, it is part of the identity of many people — if you go against it, it's like you're attacking the identity of the people who believe in it. It's almost like saying God doesn't exist, or something. So you do get a lot of pushback if you try to question these ideas.

First of all, I believe most people — they might not be as eloquent or explicit as you're being, but most people in computer science, and most people who have actually built anything that you could call AI, quote-unquote, would agree with you. They might not be describing it in the same kind of way. It's more that the pushback you're getting is from people who get attached to the narrative not from a place of science but from a place of imagination.

That's correct.

So why do you think that's so appealing? Because the dreams that people have when they imagine creating a superintelligent system past a singularity — they somehow always imagine it as destructive. If you were to put on your psychology hat: why is it so appealing to imagine the ways that all of human civilization will be destroyed?

I think it's a good story, you know? It's a good story, and very interestingly, it mirrors religious stories — religious mythology. If you look at the mythology of most civilizations, it's about the world being headed towards some final event in which the world will be destroyed and some new world order will arise, which will be mostly spiritual — like the apocalypse followed by a paradise, probably. It's a very appealing story on a fundamental level, and we all need stories. We need stories to structure the way we see the world, especially at timescales that are beyond our ability to make predictions.

So, on a more serious, non-exponential-explosion question: do you think there will be a time when we'll create something like human-level intelligence, or intelligent systems that will make you sit back and be just surprised at, damn, how smart this thing is? That doesn't require exponential growth or exponential improvement — but what's your sense of the timeline, and so on, where you'll be really surprised at certain capabilities? And we'll talk about the limitations of deep learning. So when do you think, in your lifetime, you'll be really damn surprised?

Around 2013, 2014, I was many times surprised by the capabilities of deep learning, actually. That was before we had assessed exactly what deep learning could do and could not do, and it felt like a time of immense potential. And then we started narrowing it down, but I was very surprised — so it has already happened.

Was there a moment — there must have been a day in there — where your surprise was almost bordering on the belief in the narrative that we just discussed? Because you've written quite eloquently about the limits of deep learning — was there a moment where you thought that maybe deep learning is limitless?

No, I don't think I've ever believed this. What was really shocking is that it worked — that it worked at all. But there's a big jump between being able to do really good computer vision and human-level intelligence. So I don't think at any point I was under the impression that the results we got in computer vision meant that we were very close to human-level intelligence. I don't think we're very close to human-level intelligence. I do believe that there's no reason why we won't achieve it at some point. I also believe that the problem with talking about human-level intelligence is that implicitly you're considering an axis of intelligence with different levels, but that's not really how intelligence works. Intelligence is very multidimensional. So there's the question of capabilities, but there's also the question of being human-like, and those are two very different things. You can build potentially very advanced intelligent agents that are not human-like at all, and you can also build very human-like agents — two very different things.

Right. Let's go from the philosophical to the practical. Can you give me a history of Keras and all the major deep learning frameworks that you kind of remember in relation to Keras — TensorFlow, Theano, the old days? Can you give a brief, Wikipedia-style overview of that history and your role in it, before we return to AGI discussions?

Yeah, that's a broad topic. So I started working on Keras
— though the name, Keras, I actually picked just the day I was going to release it. I started working on it in February 2015, and at the time there weren't too many people working on deep learning — maybe fewer than 10,000. The software tooling was not really developed. The main deep learning library was Caffe, which was mostly C++. Why do I say Caffe was the main one? Caffe was vastly more popular than Theano in late 2014, early 2015. Caffe was the one library that everyone was using for computer vision, and computer vision was the most popular problem in deep learning at the time — convnets were, like, the subfield of deep learning that everyone was working on.

So myself, in late 2014, I was actually interested in RNNs, in recurrent neural networks, which was a very niche topic at the time — it really took off around 2016. So I was looking for good tools. I had used Torch 7, I had used Theano — I used Theano a lot in Kaggle competitions — I had used Caffe, and there was no good solution for RNNs at the time. There was no reusable open-source implementation of an LSTM, for instance. So I decided to build my own, and at first the pitch was that it was going to be mostly around LSTMs, recurrent neural networks, and it was going to be in Python. An important decision at the time, which was not obvious, was that the models would be defined via Python code, which was kind of going against the mainstream at the time, because Caffe, PyLearn2, and so on — all the big libraries were actually going with the approach of having static configuration files, in YAML, to define models. Some libraries were using code to define models, like Torch 7, but obviously that was not Python. There was also Lasagne, a Theano-based, very early library — developed, I don't remember exactly, probably late 2014 — and that was Python as well, on top of Theano.

So I started working on something, and the value proposition at the time was that not only was it what I think was the first reusable open-source implementation of an LSTM — you could also combine RNNs and convnets with the same library, which was not really possible before (Caffe was only doing convnets) — but it was also easy to use. Before Theano, I was actually using scikit-learn, and I loved scikit-learn for its usability, so I drew a lot of inspiration from scikit-learn when I made Keras. It's almost like scikit-learn for neural networks.

Yeah, the fit function.

Exactly, the fit function — reducing a complex training loop to a single function call. And of course, some people will say this is hiding a lot of details, but that's exactly the point. The magic is the point.

So it's magical, but in a good way — magical in the sense that it's delightful.

Yeah. I'm actually quite surprised — I didn't know that it was born out of a desire to implement RNNs and LSTMs. That's fascinating. So you were actually one of the first people to really attempt to get the major architectures together. And it's also interesting — you made me realize that it was a design decision at all to define the model in code. Just putting myself in your shoes: whether to go with YAML, especially since Caffe was the most popular library — if I were you, I don't know. I didn't like the YAML thing, but on the face of it it makes sense that you would put the definition of a model in a configuration file. That's an interesting, gutsy move: to stick with defining it in code.

If you look back, other libraries were doing it as well, but it was definitely the more niche option.

Yeah. Okay, so: Keras, and then what?

So I released Keras in March 2015, and it got users pretty much from the start. The deep learning community was very small at the time, and lots of people were starting to be interested in LSTMs, so it was released at the right time, because it was offering an easy-to-use LSTM implementation exactly at the time when lots of people were starting to be intrigued by the capabilities of RNNs for NLP.
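As an aside, the "fit function" idea — collapsing an explicit training loop into a single call — can be sketched with a toy one-parameter model. This is an illustrative stdlib-only sketch of the pattern, not the real Keras implementation:

```python
# A toy sketch of the pattern behind a one-call fit() API -- NOT the real
# Keras implementation. The explicit loop below is exactly the kind of
# boilerplate that a library like Keras hides behind model.fit(x, y).

def fit(w, xs, ys, epochs=100, lr=0.01):
    """Gradient descent on mean squared error for the model y = w * x."""
    for _ in range(epochs):
        preds = [w * x for x in xs]                        # forward pass
        grad = sum(2 * (p - y) * x                         # dMSE/dw
                   for p, y, x in zip(preds, ys, xs)) / len(xs)
        w -= lr * grad                                     # parameter update
    return w

# One call replaces the whole loop, just as model.fit() does in Keras.
xs, ys = [0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0]       # data for y = 2x
w = fit(0.0, xs, ys, epochs=200, lr=0.05)
print(round(w, 4))  # learns a weight close to 2.0
```

Users who need the details — custom losses, callbacks, per-batch logic — can always drop down a level; the design choice is that the common case should be one call.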
where lots of yours started to be intrigued by the capabilities of onin on ins one LP so it it grew from there then I joined Google about six months later and that was actually completely unrelated to took care us actually joined a research team working on image classification mostly like computer vision so I was doing computer vision research at Google initially and immediately when I joined Google I was exposed to the early internal version of tensorflow and the way to appeal to me at the time and that was definitely the way it was at the time is that this was an improved version of Tiano so I immediately knew I had to port cars to this new tensorflow thing and I was actually very busy as as as a noogler as a new Googler so I had not time to work on that but then in November I think twist November 2015 tensorflow got released and it was kind of like my my wake-up call at hey to actually you know go and make it happen so in December I I putted cars to run on two of tensorflow but it was not exactly port it was more like a refactoring where I was abstracting away all the backend functionality into one module then the same codebase could run on top of multiple backends right so on top of things fluor Theano and for the next year yeah no you know stayed as the default option it was you know it was easier to use somewhat let's begin it was much faster especially when he came to Orleans but eventually you know a tensorflow overtook it right and test of all the early tests for similar architectural decisions there's the arrow yeah so what is there was a natural as a natural transition yeah absolutely so what I mean that still carries is the side almost fun project right yeah so it it was not my job assignment it's not I was doing it on the side that so I'm and even though it's great to have you know a lot of uses for a deepening library at the time like throughout 2016 but I wasn't doing it as my main job so things solid changing in I think it's mustard maybe October 
2016, so one year later. Rajat, who was the lead on TensorFlow, basically showed up one day in our building while I was doing, so, I was doing computer vision research, and also collaborations with Christian Szegedy on deep learning for theorem proving. It was a really interesting research topic. And Rajat was saying, hey, we saw Keras, we like it, we saw that you're at Google, why don't you come over for, like, a quarter and work with us? And I was like, yeah, that sounds like a great opportunity, let's do it. And so I started working on integrating the Keras API into TensorFlow more tightly. What followed is a sort of temporary TensorFlow-only version of Keras that was in tf.contrib for a while, and that finally moved to TensorFlow core. And, you know, I've never actually gotten back to my old team doing research. Well, it's kind of funny that somebody like you, who dreams of, or at least sees the power of, AI systems that reason, and theorem proving, which we'll talk about, has also created a system that makes the most basic kind of Lego building, that is deep learning, super accessible, super easy, so beautifully so. There's a funny irony that you're responsible for both things. But so, TensorFlow 2.0: there's a sprint, I don't know how long it'll take, but there's a sprint towards the finish. What are you working on these days? What are you excited about in 2.0? I mean, eager execution, there are so many things that just make it a lot easier to work with. What are you excited about, and what's also really hard? What are the problems you have to kind of solve? So I've spent the past year and a half working on TensorFlow 2.0, and it's been a long journey. I'm actually extremely excited about it. I think it's a great product. It's a delightful product compared to TensorFlow 1.0. We've made huge progress. So on the Keras side, what I'm really excited about is that, you know, previously
Keras has been this very easy-to-use, high-level interface to do deep learning. But if you wanted a lot of flexibility, the Keras framework was probably not the optimal way to do things, compared to just writing everything from scratch. So in some ways the framework was getting in the way. And in TensorFlow 2.0, you don't have this at all. You have the usability of the high-level interface, but you have the flexibility of the lower-level interface, and you have this spectrum of workflows where you can get more or less usability and flexibility trade-offs depending on your needs. You can write everything from scratch, and you get a lot of help doing so, by subclassing models and writing custom training loops using eager execution. It's very flexible, it's very easy to debug, it's very powerful, but all of this integrates seamlessly with higher-level features, up to the classic Keras workflows, which are very scikit-learn-like and are ideal for a data scientist or machine learning engineer type of profile. So now you have one framework offering one set of APIs that enables a spectrum of workflows, more or less low-level, more or less high-level, suitable for profiles ranging from researchers to data scientists and everything in between. Yeah, so that's super exciting. And it's not just that; it's connected to all kinds of tooling. You can go on mobile with TensorFlow Lite, you can go in the cloud with serving, and so on. It's all connected together now. Some of the best software ever written is often done by one person, sometimes two. But with Google, you're now seeing Keras having to be integrated into TensorFlow; I'm sure there's a ton of engineers working on it, and a lot of tricky design decisions to be made. How does that process usually happen, from at least your perspective? What are the debates like? Is there a lot of thinking, considering different options?
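The spectrum of workflows described here, low-level flexibility under a high-level convenience layer, can be illustrated with a toy model. This is a sketch of the pattern only; SpectrumModel and its methods are hypothetical, not the TensorFlow 2.0 API:

```python
class SpectrumModel:
    """One set of APIs, two levels: a low-level train_step() for flexibility,
    and a high-level fit() built on top of it for convenience.
    (A toy sketch of the idea, not the TensorFlow 2.0 API itself.)"""

    def __init__(self, lr=0.1):
        self.w, self.lr = 0.0, lr

    def train_step(self, xs, ys):
        # Low level: one explicit gradient update the user can call directly,
        # e.g. inside a custom loop with logging or early stopping.
        grad = sum((self.w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        self.w -= self.lr * grad
        return grad

    def fit(self, xs, ys, epochs=300):
        # High level: the same step wrapped in a loop, Keras-style.
        for _ in range(epochs):
            self.train_step(xs, ys)
        return self

xs, ys = [1.0, 2.0], [3.0, 6.0]          # data from y = 3x

# "Researcher" workflow: custom loop, stop when the gradient vanishes.
low = SpectrumModel()
while abs(low.train_step(xs, ys)) > 1e-9:
    pass

# "Data scientist" workflow: one call, same machinery underneath.
high = SpectrumModel().fit(xs, ys)
print(round(low.w, 3), round(high.w, 3))  # → 3.0 3.0
```

Both workflows reach the same answer; the design choice being illustrated is that fit() is just a loop over the same train_step() the power user can drive directly.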
Yes. So a lot of the time I spend at Google is actually on design discussions: writing design docs, participating in design review meetings, and so on. This is, you know, as important as actually writing the code. Right, so there's a lot of thought, and a lot of care, that is taken in coming up with these decisions, and in taking into account all of our users, because TensorFlow has this extremely diverse user base. It's not just one user segment where everyone has the same needs. We have small-scale production users, large-scale production users, we have startups, we have researchers; it's all over the place, and we have to cater to all of their needs. If I just look at the design debates around languages like C++ or Python, there's some heated debate. Do you have those at Google? I mean, they're not heated in terms of emotion, but there are probably multiple ways to do it, right? So how do you arrive, through those design meetings, at the best way to do it, especially in deep learning, where the field is evolving as you're doing it? Is there some magic to it, some magic to the process? I don't know if there's magic to the process, but there definitely is a process. So making design decisions is about satisfying a set of constraints, but also trying to do so in the simplest way possible, because that is what can be maintained and what can be expanded in the future. So you don't want to naively satisfy the constraints by, you know, for each capability you need, just adding one new argument, one new idea, and so on. You want to design APIs that are modular and hierarchical, so that they have an API surface that is as small as possible, right. And you want this modular, hierarchical architecture to reflect the way that domain experts think about the problem. Because as a domain expert, when you're reading about a new API, when you're reading a tutorial or some docs pages, you already have a way that you're
thinking about the problem. You already have certain concepts in mind, and you're thinking about how they relate together. And when you're reading docs, you're trying to build, as quickly as possible, a mapping between the concepts featured in the new API and the concepts in your mind. So you're trying to map your mental model as a domain expert to the way things work in the API. So you need an API, and an underlying implementation, that reflect the way people think about these things. So, minimizing the time it takes to do this mapping? Yes, minimizing the time, the cognitive load there is in ingesting this new knowledge about your API. An API should not be self-referential or referring to implementation details; it should only be referring to domain-specific concepts that people already understand. Brilliant. So what does the future of Keras and TensorFlow look like? What does TensorFlow 3.0 look like? So, that's kind of too far in the future for me to answer, especially since I'm now not even the one making these decisions. Okay. But from my perspective, which is, you know, just one perspective among many different perspectives on the TensorFlow team, I'm really excited by developing even higher-level APIs, higher-level than Keras. I'm really excited by hyperparameter tuning, by automated machine learning, AutoML. I think the future is not just, you know, defining a model like you're assembling Lego blocks and then clicking fit on it. It's more like an automagical model that will just look at your data and optimize the objective you're after, right. So that's what I'm looking towards. Yeah, so you put the baby in the room with the problem, and come back a few hours later with a fully solved problem? Exactly. It's not like a box of Legos; it's more like the combination of a kid that's pretty good at Legos and a box of Legos. It's just building the thing on its own. Very nice. So that's an exciting future, and I think there's a huge amount of applications and revolutions to be had.
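The hyperparameter-tuning idea mentioned above can be sketched as a bare-bones random search over a made-up objective. Real tools like KerasTuner are far more sophisticated; validation_score and the search ranges here are hypothetical stand-ins:

```python
import random

def validation_score(lr, depth):
    # Hypothetical stand-in for "train a model and measure it";
    # the best settings here are lr = 0.01 and depth = 3 by construction.
    return -(lr - 0.01) ** 2 - (depth - 3) ** 2

def random_search(trials=2000, seed=0):
    """Try random hyperparameter configurations, keep the best one."""
    rng = random.Random(seed)
    best_cfg, best = None, float("-inf")
    for _ in range(trials):
        cfg = (rng.uniform(0.0001, 0.1), rng.randint(1, 8))  # sample lr, depth
        score = validation_score(*cfg)
        if score > best:
            best_cfg, best = cfg, score
    return best_cfg

lr, depth = random_search()
print(depth)   # → 3, the discrete optimum is found easily
```

The search knows nothing about the objective's structure; it just samples and keeps the winner, which is the bare minimum version of "look at your data and optimize the objective for you".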
So, under the constraints of the discussion we previously had: what do you think are the current limits of deep learning? If we look specifically at these function approximators that try to generalize from data, you've talked about local versus extreme generalization. You've mentioned that neural networks don't generalize well and humans do, so there's this gap. And you've also mentioned that extreme generalization requires something like reasoning to fill those gaps. So how can we start trying to build systems like that? Right, yes. So this is by design. Deep learning models are like huge parametric models, differentiable, so continuous, that go from an input space to an output space, and they're trained with gradient descent, so they're trained pretty much point by point. They are learning a continuous geometric morphing from an input vector space to an output vector space, right. And because this is done point by point, a deep neural network can only make sense of points in experience space that are very close to things that it has already seen in the training data. At best, it can do interpolation across points. But that means that in order to train your network you need a dense sampling of the input-cross-output space, almost a point-by-point sampling, which can be very expensive if you're dealing with complex real-world problems, like autonomous driving, for instance, or robotics. It's doable if you're looking at a subset of the visual space, but even then it's still fairly expensive: you still need millions of examples. And it's only going to be able to make sense of things that are very close to what it has seen before. In contrast to that, well, of course we have human intelligence. But even if you're not looking at human intelligence, you can look at very simple rules, algorithms. If you have a symbolic rule, it can actually apply to a very, very large set of inputs, because it is abstract. It is not obtained by doing a point-by-point mapping.
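The interpolation point can be made concrete with a toy experiment: a purely point-by-point predictor is accurate near its dense training sample and badly wrong outside it. The nearest-neighbor lookup here is just a crude stand-in for a learned continuous mapping:

```python
import math

# Dense sampling of sin(x), but only on the interval [0, 3].
train = [(i * 3.0 / 199, math.sin(i * 3.0 / 199)) for i in range(200)]

def predict(x):
    # Point-by-point: answer with the training sample nearest to the query,
    # a crude stand-in for a model that only interpolates its training data.
    return min(train, key=lambda p: abs(p[0] - x))[1]

inside = abs(predict(1.5) - math.sin(1.5))    # query close to the training data
outside = abs(predict(6.0) - math.sin(6.0))   # query far outside the sampled region
print(inside < 1e-2, outside > 0.3)           # → True True
```

Near the sampled region, the dense point-by-point mapping is nearly perfect; at x = 6 it can only repeat the closest thing it has seen, while the abstract rule "it's the sine function" would apply everywhere.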
For instance, if you try to learn a sorting algorithm using a deep neural network, well, you're very much limited to learning, point by point, what the sorted representation of a specific list looks like. But instead, you could have a very simple sorting algorithm written in a few lines, maybe it's just two nested loops, and it can process any list at all, because it is abstract, because it is a set of rules. So deep learning is really like point-by-point geometric morphings, trained with gradient descent, and meanwhile abstract rules can generalize much better. And I think the future is to combine the two. So how do we, do you think, combine the two? How do we combine good point-by-point functions with programs, which is what symbolic AI type systems are? At which level does the combination happen? And, you know, obviously we're jumping into a realm where there are no good answers, just kind of ideas and intuitions and so on. Well, if you look at the really successful AI systems today, I think they are already hybrid systems that are combining symbolic AI with deep learning. For instance, successful robotics systems are already mostly model-based, rule-based things, like planning algorithms and so on. At the same time, they're using deep learning as perception modules; sometimes they're using deep learning as a way to inject fuzzy intuition into a rule-based process. If you look at a system like a self-driving car, it's not just one big end-to-end neural network; that wouldn't work at all, precisely because in order to train that you would need a dense sampling of experience space when it comes to driving, which is completely unrealistic, obviously. Instead, the self-driving car is mostly symbolic: it's software, it's programmed by hand, it's mostly based on explicit models, in this case mostly 3D models of the environment around the car, but it's interfacing with the real world using deep learning modules. Right, right. So the deep learning there serves as the way to convert the raw sensory information into something usable by symbolic systems.
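The "few lines, maybe just two nested loops" sorting algorithm contrasted here with point-by-point learning is essentially bubble sort. Being an abstract rule, it handles any list, of any length, including inputs it has never seen:

```python
def bubble_sort(xs):
    """Two nested loops: a rule-based program that generalizes to any input list."""
    xs = list(xs)                       # copy, so the input is left untouched
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:       # the single abstract rule: swap if out of order
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

print(bubble_sort([3, 1, 2]))           # → [1, 2, 3]
print(bubble_sort([10, -5, 7, 0, 7]))   # → [-5, 0, 7, 7, 10], never "seen" in training
```

A network trained point by point on 3-element lists would have no basis for sorting a 5-element list; the two nested loops cover both for free, which is exactly the generalization gap being described.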
Okay, well, let's linger on that a little more. So, dense sampling from input to output: you said it's obviously very difficult. Is it possible in the case of driving? You mean, let's say, self-driving itself? Self-driving is difficult for many people, but let's not even talk about self-driving, let's talk about steering. So, staying inside the lane, lane following. Yeah, it's definitely a problem you can solve with an end-to-end deep learning model, but that's like one small subset. Hold on a second, I don't like how you're jumping from the extremes so easily, because I disagree with you on that. I think, well, it's not obvious to me that you can solve lane following. No, it's not obvious. I think it's doable. I think in general, you know, there are no hard limitations to what you can learn with a deep neural network, as long as the search space is rich enough, is flexible enough, and as long as you have this dense sampling of the input-cross-output space. The problem is that this dense sampling could mean anything from 10,000 examples to, like, trillions and trillions. So that's my question. So what's your intuition? And if you could just give it a chance and think: what kind of problems can be solved by getting huge amounts of data and thereby creating a dense mapping? So let's think about natural language dialogue, the Turing test. Do you think the Turing test can be passed with a neural network alone? Well, the Turing test is all about tricking people into believing they're talking to a human. I don't think that's actually very difficult, because it's more about exploiting human perception, and not so much about intelligence. There's a big difference between mimicking intelligent behavior and actual intelligent behavior. So, okay, let's look at, maybe, the Alexa Prize and so on: the different formulations of natural language conversation that are less about mimicking and more about maintaining a fun conversation that lasts for 20 minutes.
Mm-hmm. That's a little less about mimicking, and that's more about, I mean, it's still mimicking, but it's more about being able to carry forward a conversation with all the tangents that happen in dialogue and so on. Do you think that problem is learnable with this kind of neural network that does the point-to-point mapping? So, I think it would be very, very challenging to do this with deep learning. I don't think it's out of the question either; I wouldn't rule it out. So, the space of problems that can be solved with a large neural network: what's your sense about the space of those problems, useful problems for us? In theory, it's infinite, right? You can solve any problem. In practice, well, deep learning is a great fit for perception problems, and in general, any problem which is not amenable to explicit handcrafted rules, or to rules that you can generate via exhaustive search over some program space. So, perception, artificial intuition, as long as you have a sufficient training dataset. And that's the question. I mean, in perception there's interpretation and understanding of the scene, which seems to be outside the reach of current perception systems. So do you think larger networks will be able to start to understand the physics, the physics of the scene, the three-dimensional structure and relationships of objects in the scene, and so on? Or is that really where symbolic AI has to step in? Well, it's always possible to solve these problems with deep learning; it's just extremely inefficient. An explicit, rule-based, abstract model would be a far more efficient, far better and more compressed representation of physics than learning just this mapping of 'in this situation, this thing happens; if you change the situation slightly, then this other thing happens', and so on. Do you think it's possible to automatically generate the programs that would require that kind of reasoning? Or does it have to... So, the way expert systems failed: there were so many facts about the world that had to be
hand-coded. Do you think it's possible to learn those logical statements that are true about the world, and their relationships? I mean, that's kind of what theorem proving at a basic level is trying to do, right? Yeah, except it's much harder to formulate statements about the world compared to formulating mathematical statements. Statements about the world, you know, tend to be subjective. So, can you learn rule-based models? Yes, definitely. That's the field of program synthesis. However, today we just don't really know how to do it. So it's very much a research problem, and we are limited to, you know, the sort of discrete search algorithms that we have today. Personally, I think genetic algorithms are very promising. So, almost like genetic programming? Genetic programming, exactly. Can you discuss the field of program synthesis? Like, how many people are working and thinking about it? Where are we in the history of program synthesis, and what are your hopes for...
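A minimal genetic algorithm of the kind mentioned above, evolving candidates toward a target by selection and mutation, might look like this. The bit-string target and the rates are arbitrary toy choices, not a real program-synthesis system:

```python
import random

TARGET = [1, 0, 1, 1, 0, 0, 1, 0]   # toy "program": a bit string to synthesize

def fitness(candidate):
    # Number of positions that match the target.
    return sum(c == t for c, t in zip(candidate, TARGET))

def evolve(pop_size=30, generations=500, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) == len(TARGET):
            break                              # perfect candidate found
        parents = pop[: pop_size // 2]         # selection: keep the fitter half
        children = [[1 - g if rng.random() < mutation_rate else g for g in p]
                    for p in parents]          # mutation: flip bits at random
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(fitness(best))   # → 8, a perfect match
```

Real program synthesis searches over discrete program structures rather than bit strings, which is what makes it so much harder; this only shows the select-and-mutate loop at the core of the genetic approach.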