Kind: captions Language: en

The following is a conversation with Vladimir Vapnik, part 2, the second time we spoke on the podcast. He's the co-inventor of support vector machines, support vector clustering, VC theory, and many foundational ideas in statistical learning. He was born in the Soviet Union, worked at the Institute of Control Sciences in Moscow, then in the U.S. worked at AT&T, NEC Labs, Facebook AI Research, and now is a professor at Columbia University. His work has been cited over 200,000 times. The first time we spoke on the podcast was just over a year ago, one of the early episodes. This time we spoke after a lecture he gave titled "Complete Statistical Theory of Learning" as part of the MIT series of lectures on deep learning and AI that I organized. I'll release the video of the lecture in the next few days. This podcast and lecture are independent from each other, so you don't need one to understand the other. The lecture is quite technical and math heavy, so if you do watch both, I recommend listening to this podcast first, since the podcast is probably a bit more accessible.

This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, give it five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter at lexfridman, spelled F-R-I-D-M-A-N. As usual, I'll do one or two minutes of ads now and never any ads in the middle that can break the flow of the conversation. I hope that works for you and doesn't hurt the listening experience.

This show is presented by Cash App, the number one finance app in the App Store. When you get it, use code LexPodcast. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as $1. Brokerage services are provided by Cash App Investing, a subsidiary of Square and member SIPC. Since Cash App allows you to send and receive money digitally, peer-to-peer, and security in all digital transactions is very important, let me mention the PCI Data Security Standard, PCI DSS Level 1, that Cash
App is compliant with. I'm a big fan of standards for safety and security, and PCI DSS is a good example of that, where a bunch of competitors got together and agreed that there needs to be a global standard around the security of transactions. Now we just need to do the same for autonomous vehicles and AI systems in general. So again, if you get Cash App from the App Store or Google Play and use the code LexPodcast, you get $10, and Cash App will also donate $10 to FIRST, one of my favorite organizations that is helping to advance robotics and STEM education for young people around the world. And now, here's my conversation with Vladimir Vapnik.

You and I talked about Alan Turing yesterday a little bit, and that he, as the father of artificial intelligence, may have instilled in our field an ethic of engineering and not science, seeking more to build intelligence rather than to understand it. What do you think is the difference between these two paths of engineering intelligence and the science of intelligence?

It's a completely different story. Engineering is imitation of human activity. You have to make a device which behaves as a human behaves, which has all the functions of a human. It does not matter how you do it. But to understand what intelligence is, that is quite a different problem. I believe that it's somehow related to the predicates we talked about yesterday. Because look at Vladimir Propp's idea. He just found 31 predicates, he called them units, which can explain human behavior, at least in Russian tales. He looked at Russian tales and derived them from that, and then people realized that it is wider than Russian tales: it is in TV, in movie serials, and so on.

So you're talking about Vladimir Propp, who in 1928 published a book, Morphology of the Folktale, describing 31 predicates that have this kind of sequential structure that a lot of the stories, narratives follow in Russian folklore and in other content. We'll talk about it. I'd like to talk about predicates in a
focused way, but if you allow me, let me stay zoomed out on our friend Alan Turing. He inspired a generation with the imitation game.

Yes.

If we can linger on it a little bit longer, do you think learning to imitate intelligence can get us closer to the science of understanding intelligence? Why do you think imitation is so far from understanding?

I think it is different because you have different goals. Your goal is to create something useful, and that is great. You can see how many things were done, and I believe that even more will be done: self-driving cars and all this business. It is great, and it was inspired by Turing's vision. But understanding is very difficult. It is more or less a philosophical category: what does it mean to understand the world? I believe in the scheme which starts from Plato, that there exists a world of ideas. I believe that intelligence is a world of ideas, but it is a world of pure ideas. And when you combine them with real things, it creates, as in my case, invariants, which are very specific. And I believe that the combination of ideas in a way that constructs invariants is intelligence. But first of all, predicates. If you know predicates, and hopefully not too many predicates exist. For example, 31 predicates for human behavior is not a lot.

Vladimir Propp used 31, you can even call them predicates, 31 predicates to describe stories, narratives. Do you think human behavior, how much of human behavior, how much of our world, our universe, all the things that matter in our existence, can be summarized in predicates of the kind that Propp was working with?

I think that we have a lot of forms of behavior, but I think the predicates are far fewer. Because even in the examples which I gave you yesterday, you saw that one predicate can construct many different invariants depending on your data. They are applied to different data and they give different invariants. But pure ideas,
maybe not so many.

Not so many?

I don't know about that, but that is my guess, my hope. That is why the challenge is about digit recognition: how much do you need?

I think we'll talk about computer vision and 2D images a little bit in your challenge.

That's exactly about intelligence.

That's exactly... that hopes to be exactly about the spirit of intelligence in the simplest possible way.

Absolutely. You should start with a simple story to do well.

Well, there's an open question whether starting at MNIST digit recognition is a step towards intelligence or it's an entirely different thing.

I think that to beat records using a hundred, two hundred times fewer examples, you need intelligence.

You need intelligence. So, because you used this term, and it would be nice, I'd like to ask simple, maybe even dumb questions. Let's start with a predicate. In terms of terms and how you think about it, what is a predicate?

I don't know. I have a feeling that formally they exist. But I believe that for 2D images, one of the predicates is symmetry.

Hold on a second. Sorry to interrupt and pull you back. At the simplest level, we're not being profound currently. A predicate is a statement of something that is true?

Yes.

Do you think of predicates as somehow probabilistic in nature, or is this binary, these are truly constraints, logical statements about the world?

In my definition, the simplest predicate is a function. A function, and you can use this function to make an inner product. That is a predicate.

What's the input and what's the output of the function?

The input is something which is input in reality. If you consider digit recognition, it is pixel space. So it is a function in pixel space, but it can be any function from pixel space, and you choose. I believe that there are several functions which are important for understanding images. One of them is symmetry. It's not so simple a construction as I described, with the Lie derivative and all this stuff. But another, I
believe, I don't know how many, is how well structurized the picture is.

Structurized?

Yeah. What I mean by structurized, it is a formal definition: say, something heavy in the left corner, not so heavy in the middle, and so on. You describe in general the concept of what you see.

Concepts, some kind of universal concepts?

Yeah, but I don't know how to formalize this.

So this is the thing. There's a million ways we can talk about this, I'll keep bringing it up, but we humans have such concepts when we look at digits, but it's hard to put them, just like you're saying now, it's hard to put them into words.

You know this example: when critics in music try to describe music, they use predicates, and not too many predicates, but in different combinations. They have some special words for describing music. And the same should hold for images. Maybe there are critics who understand the essence of what these images are about.

Do you think there exist critics who can summarize the essence of images? Human beings?

I hope so, yes.

But explicitly state them on paper? The fundamental question I'm asking is, do you think there exists a small set of predicates that will summarize images? It feels to our mind like it does, that the concept of what makes a two and a three and a four...

No, no, it's not on this level. It should not describe two, three, four. It describes some construction which allows you to create invariants.

And invariants, sorry to stick on this, but terminology. Invariants?

It is a property of your image. Say, looking at my image, I can say it is more or less symmetric, and I can give you a value of symmetry, say, a level of symmetry, using this function which I gave yesterday. Then you can say that your image has these characteristics, exactly in the way musical critics describe music. But this is an invariant applied to specific data, to specific music, to something. I strongly believe in this Plato idea, that there exists a world of predicates and a world of
reality, and predicates and reality are somehow connected, and you have to know that.

Let's talk about Plato a little bit. So you draw a line from Plato to Hegel to Wigner to today.

Yes.

So Plato has forms, the theory of forms. There's a world of ideas and a world of things, as you talk about, and there's a connection. And presumably the world of ideas is very small and the world of things is arbitrarily big, but they're all what Plato calls, like, it's a shadow. The real world is a shadow from the world of forms.

Yeah, you have a projection, a projection of the world of ideas.

Yes. Right.

And in reality you can realize this projection using invariants, because it is a projection onto specific examples, which creates specific features of specific objects.

So the essence of intelligence is, while only being able to observe the world of things, to try to come up with the world of ideas.

Exactly. Like in this music story: intelligent musical critics know all these words and have a feeling about what they mean.

I feel like that's a contradiction, intelligent music critics. But I think music is to be enjoyed in all its forms. The notion of a critic, like a food critic...

No, I don't want to touch emotion.

That's an interesting question. There's emotion, there are certain elements of human psychology, of the human experience, which seem to almost contradict intelligence and reason, like emotion, like fear, like love. All those things, are those not connected in any way to the space of ideas?

That I don't know. I just want to concentrate on a very simple story, on digit recognition.

So you don't think you have to love and fear death in order to recognize digits?

I don't know, because it's so complicated. It involves a lot of stuff which I never considered. But I know about digit recognition, and I know that for digit recognition, to get records from a small number of observations, you need predicates, but not special predicates for this problem, rather universal predicates which understand the world of images, of
visual information. Yes, but on the first step they understand, say, the world of handwritten digits or characters or something simple.

So like you said, symmetry is an interesting one.

That's what I think: one of the predicates is related to symmetry, the level of symmetry.

Okay, degree of symmetry. So do you think symmetry at the bottom is a universal notion and there are degrees of a single kind of symmetry, or are there many kinds of symmetries?

Many kinds of symmetries. There is symmetry and antisymmetry. Say, the letter S: it has vertical antisymmetry. And it could be diagonal symmetry, vertical symmetry.

So when you cut the letter S vertically, then the upper part and lower part go in different directions...

Along the y axis, yeah.

But that's just one example of symmetry, isn't it?

Right, but there is a degree of symmetry. If you apply all this Lie derivative stuff to do tangent distance, whatever I described, you can have a degree of symmetry. And that describes a property of the image. It is the same as when you describe this image: saying about the letter S, it has antisymmetry; the digit three is symmetric, more or less. Look for symmetry.

Do you think such concepts like symmetry, predicates like symmetry, is it a hierarchical set of concepts, or are these independent, distinct predicates that we want to discover as some set?

No, there is the idea of symmetry, and you can make this idea of symmetry very general, like degree of symmetry. The degree of symmetry can be zero, no symmetry at all, or, say, more or less symmetrical. But you have one such description. And symmetry can be different, as I told you: horizontal, vertical, diagonal. And antisymmetry is also a concept of symmetry.

What about shape in general? I mean, symmetry is a fascinating notion, but...

No, I'm talking about digits. I would like to concentrate on digits. I would like to know predicates for digit recognition.

Yes, but symmetry is not enough for digit recognition, right?

It is not necessarily for digit recognition. It helps to
create an invariant which you can use when you have examples for digit recognition. You have the regular problem of digit recognition: you have examples of the first class, the second class. Plus, you know that there exists the concept of symmetry, and when you are looking for the decision rule, you apply the concept of symmetry, of this level of symmetry, which you estimate from the data. Everything comes from weak convergence.

What is convergence? What is weak convergence? What is strong convergence? Sorry, I'm going to do this here. What are we converging from and to?

You would like to have a function, say, an indicator function, which indicates your digit five, for example.

A classification task. Let's talk only about classification. So classification means you will say whether this is a five or not, or say which of the ten digits it is.

Right. I would like to have this function. Then I have some examples. I can consider a property of these examples, say, symmetry, and I can measure the level of symmetry for every digit. Then I can take the average over my training data. And I will consider only functions of conditional probability, which is what I am looking for as my decision rule, which, applied to the digits, give me the same average as I observe on the training data. So actually this is a different level of description of what you want. You show not one digit; you show this predicate, a general property of all digits which you have in mind. If you have in mind the digit three, it gives you a property of the digit three, and you select as the admissible set of functions only functions which keep this property. You will not consider other functions. So you are immediately looking at a smaller subset of functions.

That's what you mean by admissible functions.

Admissible functions, exactly.

Which is still a pretty large set for the number three.

It is pretty large, if you have one predicate. But according to the theory, there is strong and weak convergence. Strong
convergence is convergence in functions. You're looking for a function f0, and you consider a sequence of functions, and the squared difference from it should be small. If you take the difference at any point, square it, and take the integral, it should be small. That is convergence in functions. Suppose you have some function, any function. I would say that a sequence of functions converges to this function if the integral of the squared difference between them is small.

That's the definition of strong convergence.

That is the definition of strong convergence.

Of two functions: the integral of the difference is small.

Yes, it is convergence in functions. But you have a different convergence, in functionals. You take any function, some function phi, and take the inner product of this function with the function f0 which you want to find, and that gives you some value. So you say that a set of functions converges in inner product to this function if this value of the inner product converges to the value for f0. That is for one phi. But weak convergence requires that it converges for any function of Hilbert space. If it converges for any function of Hilbert space, then you say that this is weak convergence. You can think of it this way: when you take the integral, that is a property, an integral property of the function. For example, if you take sine or cosine, it is a coefficient of, say, the Fourier expansion. So if it converges for all coefficients of the Fourier expansion, then under some conditions it converges to the function you're looking for. But weak convergence means any property converges: not pointwise, but an integral property of the function. So weak convergence means integral properties of functions. When I am talking about predicates, I would like to formulate which integral properties I would like to have for convergence. And if I take one predicate, a function with which I measure a property, if I use one predicate and say I will consider only functions which give me the same value for this predicate, I am selecting a set of functions which is admissible in
the sense that the function I am looking for is in this set of functions, because I am checking on the training data that it gives the same value.

Yes, it always has to be connected to the training data in terms of...

Yeah, but the property you can know independently of the training data. And this guy Propp...

Yeah.

So there is a formal property, 31 properties...

Of a fairy tale, a Russian fairy tale.

Right. But the Russian fairy tale is not so interesting. More interesting is that people apply this to movies, to theater, to different things, and the same works. They are universal.

Well, I would argue that there's a little bit of a difference between the kinds of things that were applied to, which are essentially stories, and digit recognition.

It is the same story.

You're saying digits, there's a story within the digit?

Yeah. But my point is why I hope that it is possible to beat the record using not 60,000 examples but a hundred times fewer: because instead of that, you give predicates, and you select the decision rule not from a wide set of functions, but from a set of functions which keeps these predicates. But a predicate is not related just to digit recognition.

Right. So, like in Plato's space, do you think it's possible to automatically discover the predicates? You basically said that the essence of intelligence is the discovery of good predicates.

Yeah.

Now, the natural question is, you know, that's what Einstein was good at doing in physics. Can we make machines do these kinds of discovery of good predicates, or is this ultimately a human endeavor?

That I don't know. I don't think that a machine can do it, because according to the theory of weak convergence, any function from Hilbert space can be a predicate. So you have an infinite number of predicates, and beforehand you don't know which predicate is good and which is not. But what Propp showed, and why people call it a breakthrough, is that there are not too many predicates which cover most of the situations that happen in the world.

So there's a sea of predicates, and only a small amount are useful for the
kinds of things that happen in the world?

I think that, I would say, only a small part of predicates is very useful. Useful, all of them.

Only very few are what we should, let's call them, good predicates.

Very good predicates.

Very good predicates. So can we linger on it? What's your intuition, why is it hard for a machine to discover good predicates?

Even in my talk I described how to find a new predicate. I'm not sure that it is very good.

What did you propose in your talk?

In my talk I gave an example for diabetes. When we achieve some percent, we then look for the area where some predicate which I formulate does not keep the invariant. If it doesn't keep it, I retrain on my data; I select only functions which keep this invariant. And when I did it, I improved my performance. I can look for this predicate. I know technically how to do that, and you can, of course, do it using a machine. But I'm not sure that we will construct the smartest predicate.

Well, allow me to linger on it, because that's the essence, that's the challenge, that is artificial... that's the human-level intelligence that we seek: the discovery of these good predicates. You've talked about deep learning as a way to... the predicates they use and the functions are mediocre, so you can find better ones.

Let's talk about deep learning.

Sure, let's do it.

I know only Yann LeCun's convolutional network. And what else? I don't know. It's a very simple convolution.

There's not much else to know.

Right, yes. I can do it like that with one predicate.

Convolution is a single predicate?

It's a single predicate.

Yes, but...

You know exactly: you take the derivative for translation, and the predicate should be kept.

So that's a single predicate, but humans discovered that one, or at least...

That is the thing: not too many predicates. And that is the big story, because Yann did it 25 years ago, and nothing so clear was added to deep networks. And then I don't understand why we should talk about
deep networks instead of talking about piecewise linear functions which keep this predicate.

Well, a counterargument is that maybe the amount of predicates necessary to solve general intelligence, say in the space of images, doing efficient recognition of handwritten digits, is very small, and so we shouldn't be so obsessed about finding... we'll find other good predicates, like convolution, for example. There have been other advancements, like if you look at the work with attention, there are attention mechanisms, especially used in natural language, focusing the network's ability to learn which part of the input to look at. The thing is, there are other things besides predicates that are important for the actual engineering mechanism of showing how much you can really do given these predicates. I mean, that's essentially the work of deep learning: constructing architectures that are able, given the training data, to converge towards a function that can generalize well. It's an engineering problem.

Yeah, I understand. But let's talk not on an emotional level, but on a mathematical level. You have a set of piecewise linear functions. It is all possible neural networks. It's just piecewise linear functions, with many, many pieces.

A large number of pieces.

Exactly.

But very large.

Very large, almost... But it is still simpler than, say, a reproducing kernel Hilbert space, which is a Hilbert set of functions.

What's Hilbert space?

It is a space with an infinite number of coordinates, say, a function's expansion or something like that. So it's much richer. And when I talk about a closed-form solution, I am talking about this set of functions, not the piecewise linear set, which is a particular case, a small part.

So neural networks are a small part of the space you're talking about, of functions.

A small set of functions, let me say that. But it is fine. I don't want to discuss small or big. We take
one. So you have some set of functions. Now, when you're trying to create an architecture, you would like to create an admissible set of functions, with all your tricks, to use not all functions but some subset of this set of functions. Say, when you introduce a convolutional net, it is a way to make this subset useful for you. But from my point of view, convolution is something where you want to keep some invariants, say, translation invariance. But now, if you understand this, and you cannot explain on the level of ideas what a neural network does, you should agree that it is much better to have a set of functions of which I say: this set of functions should be admissible, it must keep this invariant, this invariant, and that invariant. You know that as soon as you incorporate a new invariant, the set of functions becomes smaller and smaller and smaller.

But all the invariants are specified by you, the human.

Yeah, but what I hope is that there are standard predicates, like Propp showed. That is what I want to find for digit recognition. If we start, it is a completely new area: what intelligence is about, on the level starting from Plato's idea, what the world of ideas is. And I believe that there are not too many. But it is amusing that mathematicians are doing something with neural networks, with general functions, while people from literature, from art, use them all the time.

That's right.

Invariants. Say, it is great how people describe music. We should learn from that, something on this level. So why did Vladimir Propp, who was just a theoretician, who studied theoretical literature, find that?

Let me throw that right back at you, because there's a little bit of a... that's less mathematical and more emotional, philosophical. Vladimir Propp, I mean, he wasn't doing math.

No.

And you just said another emotional statement, which is you believe that this Plato world of ideas is small.

I hope.

I hope. What's your intuition, though, if we can linger on it?

You know, it is not just small or big. I know exactly: then,
when I introduce some predicate, I decrease the set of functions. But my goal is to decrease the set of functions by as much as possible.

By as much as possible.

A good predicate is one which does this. Then I should choose the next predicate, which decreases the set as much as possible. So a set of good predicates is one that decreases this amount of admissible functions.

So if each good predicate significantly reduces the set of admissible functions, then there naturally should not be that many good predicates.

No, but if you reduce it very well, the VC dimension of the admissible set of functions is small, and you do not need too much training data to do well.

And VC dimension, by the way, is a measure of the capacity of this set of functions.

Right. Roughly speaking, how many functions are in this set. So you are decreasing, decreasing, and it makes it easy for you to find the function you're looking for. That is the most important part: to create a good admissible set of functions. And there are probably many ways, but good predicates are such that they can do that. So for this duck, you should know a little bit about the duck.

What are the three fundamental laws of ducks? Looks like a duck, swims like a duck, and quacks like a duck.

You should know something about ducks.

Not necessarily. Looks like, say, a horse. It's also good.

So it generalizes from the duck.

Yes. Looks like, and makes a sound like a horse or something, and runs like a horse, and moves like a horse. It is a general predicate that is applied to the duck. But for the duck, you can say "plays chess like a duck".

You cannot say "plays chess"?

Why not?

So you're saying you can, but that would not be a good...

No, you will not reduce a lot of functions.

Yeah, you would not reduce the set of functions.

So the story is a formal story, a mathematical story: you can use any function you want as a predicate, but some of them are good and some of them are not, because some of them remove a lot of functions from the admissible set, and some of them do not.

But the
question is, I'll probably keep asking this question, but how do we find such predicates? What's your intuition? For handwritten recognition, how do we find the answer to your challenge?

Yeah, I understand it like that. I understand what it means, a new predicate. Like a guy who understands music can say the words which he uses to describe it when he listens to music. He understands music. He uses not too many different words. Or you can do like Propp: you can make a collection of what you're talking about in music, about this, about that. It's not too many different situations he describes.

Because we mentioned Vladimir Propp a bunch, let me just mention: there's a sequence of 31 structural notions that are common in stories, and I think you called them units.

Units.

And I think they resonate. I mean, it starts, just to give an example: absentation, a member of the hero's community or family leaves the security of the home environment. Then it goes to the interdiction: a forbidding edict or command is passed upon the hero, don't go there, don't do this, the hero is warned against some action. Then step three, violation of interdiction: break the rules, break out on your own. Then reconnaissance: the villain makes an effort to attain knowledge needed to fulfill their plot. It goes on like this and ends in a wedding, number 31, happily ever after.

He just gave a description of all situations. He understands this world of folktales.

Yeah, not folktales...

Stories. And these stories are not just in folktales. These stories are in detective serials as well.

And probably in our lives, we probably live...

But the essence is that this set of predicates is good for different situations: for movies, for theater.

By the way, there's also criticism, right? There's another way to interpret narratives, from Claude Lévi-Strauss.

I am not in this business.

I know, it's theoretical literature, but looking at it broadly, it's always the philosophy too.

Yeah, but at
least there are units. It's not too many units that can describe it. But this guy probably gives other units, or another way...

Exactly, another set of units.

Another set of predicates. It does not matter how, but they exist, probably.

My question is whether, given those units, without our human brains to interpret these units, they would still hold as much power as they have. Meaning, are those units enough when we give them to an alien species?

Let me ask you: do you understand digit images?

No, I don't.

No? When you can recognize these digit images, it means that you understand. You understand characters, you understand...

No, no, no. It's the imitation versus understanding question, because I don't understand the mechanism by which I understand.

No, I'm not talking about that. I'm talking about predicates. You understand that it involves symmetry, maybe structure, maybe something else I cannot formulate. I was just able to find symmetries, the degree of symmetry.

That's really good. So this is a good line. I feel like I understand the basic elements of what makes a good handwritten digit recognition system, my own. Like symmetry connects with me. It seems like that's a very powerful predicate. My question is, is there a lot more going on that we're not able to introspect? Maybe I need to be able to understand a huge amount in the world of ideas, thousands of predicates, millions of predicates, in order to do handwritten recognition.

I don't think so.

So both your hope and your intuition are that a few are enough.

You're using digits, you're using examples as well. Theory says that if you use all possible functions from Hilbert space, all possible predicates, you don't need training data. You will just have an admissible set of functions which contains one function.

Yes. So the trade-off is, when you're not using all predicates, you're only using a few good predicates, you need to have some training data.

Yes, exactly.

The more good predicates you have, the less training data you need.

Exactly.
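
The trade-off described here, where each added predicate shrinks the admissible set, can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not taken from the lecture: the 1D data, the candidate set of interval indicators, the two predicates `psi_ones` and `psi_position`, and the form of the invariant, which follows the averaging description earlier in the conversation (a candidate is kept only if its predicate-weighted average over the training data matches that of the labels).

```python
from itertools import combinations

# Toy training set (hypothetical): 20 points on [0, 1], label 1.0 when x > 0.5.
X = [(i + 0.5) / 20 for i in range(20)]
y = [1.0 if x > 0.5 else 0.0 for x in X]

# Hypothetical candidate set: indicator functions of intervals (a, b]
# with endpoints on a grid of 21 values, giving 210 candidates.
grid = [k / 20 for k in range(21)]
candidates = list(combinations(grid, 2))


def predict(candidate, x):
    """Indicator of the interval (a, b]."""
    a, b = candidate
    return 1.0 if a < x <= b else 0.0


def weighted_mean(psi, values):
    """Average of psi(x_i) * values_i over the training points."""
    return sum(psi(x) * v for x, v in zip(X, values)) / len(X)


def keeps_invariant(candidate, psi, tol=1e-9):
    """True if the candidate reproduces the predicate-weighted average of the labels."""
    outputs = [predict(candidate, x) for x in X]
    return abs(weighted_mean(psi, outputs) - weighted_mean(psi, y)) <= tol


def admissible(predicates):
    """Candidates that keep the invariant for every predicate in the list."""
    return [c for c in candidates if all(keeps_invariant(c, p) for p in predicates)]


def psi_ones(x):
    return 1.0  # invariant: same fraction of positives as in the labels


def psi_position(x):
    return x  # invariant: same first moment of the positives


sizes = [len(admissible(ps)) for ps in ([], [psi_ones], [psi_ones, psi_position])]
print(sizes)
```

In this toy setup the admissible set shrinks from 210 candidates with no predicates, to 11 with one, to a single function with two. It only illustrates the counting argument; it says nothing about which predicates would be good for real images.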
That is intelligence.

But still, okay, I'm going to keep asking the same dumb question. Handwritten recognition: to solve the challenge. You kind of propose a challenge that says we should be able to get state-of-the-art MNIST error rates by using very few, 60, maybe fewer, examples per digit. What kind of predicates do you think it will take?

That is the challenge. So people who solve this problem will answer that.

Do you think they'll be able to answer it in a human-explainable way?

They just need to write a function, that's it.

But so can that function be written, I guess, by an automated reasoning system? Whether we're talking about a neural network learning a particular function or another mechanism?

No, I'm not against neural networks. I am against the admissible set of functions which the neural network creates. You did it by hand. You don't do it by invariants, by predicates, by reason.

But neural networks can then do the reverse step of helping you find a function. The task of a neural network is to find a disentangled representation, for example, as they call it, to find that one predicate function that really captures some kind of essence, not the entire essence, but one very useful essence of this particular visual space. Do you think that's possible? Listen, I'm grasping, hoping there's an automated way to find good predicates. So the question is, what are the mechanisms of finding good predicates, ideas that you think we should pursue? A young grad student listening right now.

I gave an example. Find a situation where the predicates which you are suggesting don't create an invariant. It's like in physics: first find a situation where the existing theory cannot explain it.

Find a situation where the existing theory cannot explain it. So you're finding contradictions.

Find a contradiction, and then remove this contradiction. But in my case, what contradiction means is: you find a function which, if you use this function, does not keep the invariant.

This is really the process of discovering
contradictions yeah it is like in physics find situation where you have contradiction for one of the property for one of the predicate then include this predicate making invariants and solve again this problem now you don't have contradiction but it is not the best way probably I don't know to looking for predicates that's just one way okay that one it was brute force way in the brute force way what about the ideas of some what big umbrella term of symbolic AI this what in the 80s with expert systems sort of logic reasoning based systems is there hope there to find some through sort of deductive reasoning to find good predicates I don't think so I think that just logic is not enough it's kind of a compelling notion though you know that when smart people sit in a room and reason through things it seems compelling and making our machines do the same is also compelling so everything is very simple when you have infinite number of predicates you can choose the function you want you have invariants and you can choose the function you want but you have to have not too many invariants to solve the problem so you have from infinite number of function to select finite number and hopefully small finite number of functions which is good enough to extract small set of admissible functions so they will be admissible it's for sure because every function just decrease set of function and leaving admissible but it will be small but why do you think logic based systems don't can't help intuition not because you you should know you should know life this guy like Propp he knows something and he tried to put in invariant his understanding that's the human yeah see you're putting too much value in to Vladimir Propp knowing something no it is my decision what means you know life what elements you know common sense no no you know something common sense it is some rules you think so common sense is simply rules common sense is every it's mortality it's no it's it's fear of death
it's love it's spirituality it's happiness and sadness all of it is tied up into understanding gravity which is what we think of as common sense they don't really discuss so bright I want to discuss understand digit recognition you never bring up love and death you bring it back to digit recognition okay no you know it is doable because there is a challenge yeah which I see how to solve it if I will have a student concentrate on this work I will suggest something to solve so you mean handwritten recognition yeah it's a beautifully simple elegant yet I think that I know invariants which will solve this do I sing some meanness but it is not universal it is maybe I want some universal invariants which are good not only for digit recognition for image understanding so let me ask how hard do you think is 2d image understanding so if we can kind of intuit handwritten recognition how big of a step leap journey is it from that if I gave you good I solved your challenge for handwritten recognition how long would my journey then be from that to understanding more general natural images immediately you will understand this as soon as you make a record because it is not for free as soon as you will create several invariants which will help you to get the same performance that the best neural net did using hundred ten maybe more than hundred times less examples you have to have something smart to do that and you're saying that represent Mario it is predicate because you should put some idea how to do that but okay let me just pause maybe it's a turning point maybe not but handwritten recognition feels like a 2d two-dimensional problem and it seems like how much more complicated is the fact that most images are projection of a three-dimensional world onto a 2d plane it feels like for a three-dimensional world who still we need to start understanding common sense in order to understand an image it's no longer visual shape and symmetry it's having to start to understand concepts of it
understand life yeah yes yes you're you're you're talking that there are different invariants different predicates yeah and potentially much larger number you know might be but let's start from simple well yeah but you said that you know I I cannot think about things which I don't understand this I understand but I'm sure that I don't understand everything's there yeah as Einstein said do as simple as possible but not simpler and that is exact case with handwritten recognition yeah but no that's the difference between you and I I welcome and enjoy thinking about things I completely don't understand because to me it's a natural extension without having solved handwritten recognition to wonder how how difficult is the the the next step of understanding 2d 3d images because ultimately while the science of intelligence is fascinating it's also fascinating to see how that maps to the engineering of intelligence and recognizing handwritten digits is not doesn't help you it might it may not help you with the problem of general intelligence we don't know it'll help you a little bit unclear it's unclear yeah but I would like to make a remark yes I start not from very primitive problem like a challenge problem I start with very general problem with Plato so you understand and it comes from Plato to digit recognition so so you basically took Plato and the world of forms and ideas and mapped and projected it into the clearest simplest formulation of that big world and you know I will say that I did not understand Plato until recently and until I consider weak convergence and then predicate and you know this is what Plato told so linger on that like why how do you think about this world of ideas and world of things in Plato no it is metaphor it is it's a metaphor for sure yeah compelling it's a poetic and a beautiful metaphor but it is the way of you you should try to understand how attack ideas in the world so from my point of view it is
very clear but it is line all the time people looking for that say Plato then Hegel whatever reasonable it exists whatever exist it is reasonable I don't know what he have in mind reasonable right there's philosophers again no no no no it is it is next step of Wigner that mathematics understand something of reality it is the same Plato line and then it comes suddenly so Vladimir Propp look 31 ideas 31 units describes everything there's abstractions ideas that represent our world and we should always try to reach into that yeah but what you should make a projection on reality but understanding is it is abstract ideas you have in your mind several abstract ideas which you can apply to reality and reality in this case so if you look at machine learning that is examples it is data okay let me let me put you put this on you because I'm an emotional creature I'm not a mathematical creature like you I find compelling the idea forget this the space the sea of functions there's also a sea of data in the world and I find compelling that there might be like you said teacher small examples of data that are most useful for discovering good whether it's predicates or good functions that the selection of data may be a powerful journey a useful mechanism you know coming up with a mechanism for selecting good data might be useful too do you find this idea of finding the right data set interesting at all or do you kind of take the data set as a given I think that it is yeah you know my scheme is very simple you have huge set of functions if you will apply and you have not too many data if you pick up function which describes this data you will do not very well you know randomly yeah you'll overfit yeah it will be overfitting so you should decrease set of function from which you picking up one so you should go somehow to admissible set of function and this what about weak convergence so but from another point of view to to make admissible set of function you need just a DG
just function which you will take in inner product which you will measure property of your function and that is how it works no I get it I get understand that but do you that the reality is let's let's look this car let's think about examples you have huge set of function if you have several examples if you just trying to keep the take function which satisfies these examples you still do overfit you need decrease you need admissible set of function yeah absolutely but what say you have more data than functions so sort of consider though I mean maybe not more data than functions because that's unfortunately impossible but what I was trying to be poetic for a second I mean you have a huge amount of data a huge amount of examples but the set of functions is still even bigger I understand there's always there's a larger full Hilbert space I get it but okay but you don't you don't find the world of data to be an interesting optimization space like the the optimization should be in a space of functions in creating admissible set of functions no you know even from the classical VC theory from structural risk minimization you should or you should organize function in the way that they will be useful for you right and that is the way you're thinking about useful is you're given a small small small set of functions which contain function that I'm looking for yeah I'm looking for based on the empirical set of small examples yeah but that is another story I don't touch it because I I believe I believe that this small examples it's not too small say sixty per class law of large numbers works I don't need uniform law the story is that in statistics there are two law law of large numbers and uniform law of large numbers so I want to be in situation where I use law of large numbers but not uniform law of large numbers right so 60 is large it's large enough I hope no it still need some evaluation some bounds but idea is the following that if you trust that say this
average gives you something close to expectations so you can talk about that about this predicate and that is basis of human intelligence right good predicates is the discovery of good predicate is the basis of it is discovery of you of your understanding world of your methodology of this type of understanding world because you have several function which you will apply to reality okay can you say that again so you're you have several functions predicate but the abstract yes then you will apply them to reality to your data and you will create in this way predicate which is useful for your task but predicates are not related specifically to your task to this task it is abstract functions which being applied apply to many tasks that you might be interested it might be many tasks freedom or different tasks well they should be many tasks yeah it's like like in Propp case it was for fairy tales but such happened everywhere okay so we talked about images a little bit can we talk about Noam Chomsky for a second verify I don't know him personally what not personally I don't know his ideas his ideas well let me just say do you think language human language is essential to expressing ideas as Noam Chomsky believed so like language is at the core of our formation of predicates the human language language and all the story of language is very complicated I don't understand this and I am not I thought about nobody I'm not ready to work on that because it's so huge it is not for me and I believe not for our century it's a 21st century not for 21st century so you should learn something a lot of stuff from simple tasks like digit recognition so you think you think digit recognition 2d image what how would you more abstractly define a digit recognition it's 2d image symbol recognition essentially I mean I'd like I'm trying to get a sense sort of thinking about it now having worked with MNIST forever how could how small of a subset is this of the general vision
recognition problem and the general intelligence problem is it yeah is it a giant subset is it not and how far away is language you know let me refer to Einstein take the simplest problem as simple as possible but not simpler and this is challenge is simple problem but it's simple by idea but not simple to to get it when you will do this you will find some predicate which helps you oh yeah I mean with Einstein you can you you look at general relativity but that doesn't help you with quantum mechanics that's another story you don't have any universal instrument yes so I'm trying to wonder which space we're in whether the whether handwritten recognition is like general relativity and then language is like quantum mechanics so you're still going to have to do a lot of math to to universalize it but I'm trying to see one so what's your intuition why handwritten recognition is easier than language just I think a lot of people would agree with that but if you could elucidate sort of the the intuition of why I don't know no I don't think in this direction I just think in direction that this problem which if we will solve it well we will create some abstract understanding of images maybe not all images I would like to talk to guys who doing real images in Columbia University what kind of images real it's a real image really yeah what there is real predicate what can be predicate I think still symmetry will play role in real life images in any real life images 2d images let's talk about 2d image because that's what we know a neural network was created for 2d images so the people I know in vision science for example the people study human vision you know that they usually go to the world of symbols and like handwritten recognition but not really it's other kinds of symbols to study our visual perception system as far as I know not much predicate type of thinking is understood about our vision system so they do not think in this direction they don't yeah they but how do you
even begin to think in that direction that's a sorry I'd like to discuss with them yeah because if we will be able to show that it is what working and surely it's caused him it's not so bad so the the unfortunate so if we compare to language language has like letters finite set of letters and a finite set of ways you can put together those letters so it feels more amenable to kind of analysis with natural images there is so many pixels no no no letter language is much much more complicated it's involved a lot of different stuff it's not just understanding of very simple class of tasks I would like to see lists of tasks where language involved yes so there's a there's a lot of nice benchmarks now on natural language processing from the very trivial like understanding the elements of a sentence to question answering to more much more complicated where you talk about open domain dialogue the natural question is with handwriting recognition is really the first step yeah of understanding visual information all right but not but but even our records show that we go in the wrong direction because we need sixty thousand digits so even this first step so forget about talking about the full journey this first step should be taken in the right or wrong direction because 60,000 is unacceptable no I'm saying it should be taken in in the right direction because 60,000 is not acceptable because you can talk great half percent of error and hopefully the step from doing handwritten recognition using very few examples the step towards what babies do when they crawl and understand that I know babies will do from very small examples yeah you will find principles that will show the difference from what we using it now and so theoretically it's more or less clear that means that you here you'll use weak convergence not just strong convergence do you think these principles are will naturally be human interpretable oh yeah so like when we will be able to explain them and have a nice
presentation to show what those principles are or are they very going to be very kind of abstract kinds of functions for example I talked yesterday about symmetry yes and I gave very simple examples the same will be like you gave like a predicate of a basic four for symmetries yes four different symmetries and you have for degree of symmetry that is important not just symmetry exists or doesn't exist the degree of symmetry yeah for handwritten recognition no it's not for handwritten it's for any images but I would like apply to handwritten right it's in theory it's more general okay okay so a lot of things we've been talking about falls we've been talking about philosophy a little bit but also about mathematics and statistics a lot of it falls into this idea a universal idea of statistical theory of learning what is the most beautiful and sort of powerful or essential idea you've come across even just for yourself personally in in the world of statistics or statistical theory of learning probably uniform convergence which we did with Alexey Chervonenkis can you describe uniform convergence you have law of large numbers so for any function expectation of function average of function converges to expectation but if you have set of functions for any function it is true but it should converge simultaneously for all set of functions and for learning you need uniform convergence just convergence is not enough because when you pick up one which gives minimum you can pick up one function which does not converge and it will give you the best answer for for this function so you need uniform convergence to guarantee learning so learning does not rely on trivial law of large numbers it rely on uniform but idea of weak convergence exists in statistics for a long time but it is interesting that as I think about myself how stupid I was fifty years I did not see weak convergence I work on the on strong convergence but now I think that most powerful is weak convergence because it
makes admissible set of functions and even in old proverbs when people try to understand recognition about duck looks like a duck and so on they use weak convergence people in language they understand this but when they're trying to create artificial intelligence if you want present in different way we just consider strong convergence arguments so reducing the set of admissible functions you think there should be effort put into understanding the properties of weak convergence you know in classical mathematics in Hilbert space there are only two ways two forms of convergence strong and weak now we can use both that means that we did everything and it so happened that when we use Hilbert space which is very rich space space of continuous functions which has an integral in square so we can apply weak and strong convergence for learning and have closed form solution so for can be computationally simple for me it's a sign that it is right way because you don't need any heuristic yes know whatever you want but now the only what left it is concept of what is predicate but it is not statistics by the way I like the fact that you think the heuristics are mess that should be removed from the system so closed form solution is the ultimate no it's equipment than when you're using right instrument you have closed form solution do you think intelligence human level intelligence when we create it will will have something like a closed form solution you know I know I'm looking on bounds which I gave bounds for convergence when I looking for bounds I thinking what is the most appropriate kernel for this bound would be so you know that in say all our businesses we use radial basis function but looking on the bound I think that I start to understand that maybe we need to make corrections to radial basis function to be closer to work better for these bounds so I'm again trying to understand what type of kernel best approximation no not approximation best fit to these bounds sure
so there's a lot of interesting work that could be done in discovering better functions than radial basis functions for your bounds but it still comes from you you're looking to math and trying to understand what from your own mind looking at the yeah but I don't know then I trying to understand what will be good for that yeah but to me there's still a beauty again maybe I'm a descendant of Alan Turing to heuristics to me ultimately intelligence would be a mess of heuristics and that's the engineering answer absolutely when when you're doing say self-driving cars the great guy who will do this it does not matter what theory behind that who has a better feeling how to apply it but by the way it is the same story about predicates because you cannot create a rule for situation is much more than you have rule for that but maybe you can have more abstract rule then it will be less rules it is the same story about ideas and ideas applied to the specific cases but still you should you cannot avoid this yes of course but you should still reach for the ideas to understand science yeah let me kind of ask do you think neural networks or functions can be made to reason sort of what do you think we've been talking about intelligence but this idea of reasoning as a is an element of sequentially disassembling interpreting the the images so when you think of handwritten recognition we kind of think that there will be a single there's an input and output there's not a recurrence yeah what do you think about sort of the idea of recurrence of going back to memory and thinking through this sort of sequentially mangling the different representations over and over until you arrive at a conclusion or is ultimately all that can be wrapped up into a function you you suggesting that let us use this type of algorithm when I starting thinking I first of all starting to understand what I want can I write down what I want and then I trying to formalize and when I do that I think you have to
solve this problem and until now I did not see a situation where you need recurrence very good but do you observe human beings yeah do you try to it's the imitation question right it seems that human beings reason this kind of sequentially so does that inspire in your thought that we need to add that into our intelligence systems you're saying okay I mean you've kind of answered saying until now I haven't seen a need for it and so because of that you don't see a reason to think about it you know most of things I don't understand in reasoning human it is for me too complicated for me the most difficult part is to ask questions to good questions how it works how people asking questions I don't know you said the machine learning is not only about technical things speaking of questions but it's also about philosophy so what role does philosophy play in machine learning we talked about Plato but generally thinking in this philosophical way does it have how does philosophy and math fit together in your mind so there is ideas and then their implementation it's like predicates like say admissible set of functions it comes together and we think because the first iteration of theory was done fifty years ago with all that necessary everything's there if you have data you can and your set of function has not big capacity so low VC dimension you can do that you can make structural risk minimization control capacity but we was not able to make admissible set of function good now when suddenly realized that we did not use another idea of convergence which is weak convergence everything comes together but those are mathematical notions philosophy plays a role of simply saying that we should be swimming in the space of ideas let's let's talk what is philosophy philosophy means understanding of life so understanding of life say people like Plato they understand on very high abstract level of life so then whatever I doing it just implementation of my
understanding of life but every new step it is very difficult for example to find this idea that we need weak convergence was not simple for me so that required thinking about life a little bit hard to hard to trace but there was some thought process you know I working I'm thinking about the same problem for 50 years or more and again and again again I trying to be understand that is a very important not to be very enthusiastic yeah but concentrate on whatever we was not able to achieve relation to me and understand why and now I understand that because I believe in math I believed it in business idea but now when I see that there are only two way of convergence and we using both that means that we must do as well as people doing but now exactly in philosophy and what we know about predicate how we understand life can be described as a predicate I thought about that and that is more or less obvious level of symmetry but next I have a feeling it's something about structures but I don't know how to formulate how to measure and measure structure and all the stuff and guy who will solve this challenge problem then when we will be looking how he did it probably just only symmetries not enough but something like symmetry will be there oh absolutely symmetry will be there and level of symmetry will be there and level of symmetry antisymmetry diagonal vertical and I even don't know how you can use in different direction idea of symmetry that's very general but it will be there I think the people very sensitive to radial symmetry but there are several ideas like symmetry as I would like to learn but you cannot learn just thinking about that you should do challenging problems and then analyzing why why it was we was able to solve them and then we will see simple things it's not easy to find even with talking about this every time he had about you I was surprised I try to understand these people describe in language strong convergence mechanism for learning I did not
see I don't know but weak convergence this duck story and story like that when you will explain to kid you will use weak convergence argument it looks like a duck it does like a duck but when you try to formalize you just ignoring this why why fifty years from start of machine learning and after all flus I think I I think that might be I don't know maybe this is also we should blame for that because empirical risk minimization and all the stuff and if you read now textbooks they are just about bound about empirical risk minimization they don't looking for another problem like admissible set but on the topic of life perhaps we you could talk in Russian for a little bit what's your favorite memory from childhood like I'll actually be my apologies gesture Oh music how about can you try to answer in Russian musica but below oceans door overcome de la musica cause she's gonna make noise my competitor it's natural to believe I'll gie working below Jillian detect motion app atomic at the poem Walker yeah Bob's a friend a petition statute what this avoid Edom after switch to it / Jakarta cheapest doctoral of Bahia now Kenya Newton werster moved on prostitute States offense dr. 
Janna doom Western Arizona aluminum to his knee senior s Dillon me it's mostly pretty cot feudal structure of rupees doctora surely she had machine instructor would would come tonight see what they say what was so clear on the political data bah he even just you know now that we're talking about Bach let's switch back to English cuz I like Beethoven and Chopin so well Chopin it's another music story I was but Bach if we talk about predicates Bach probably has the most sort of well-defined predicates and I you know it is very interesting to read what critics writing about Bach which words they are using they trying to describe predicates and and and then Chopin it is very different vocabulary very different predicates and I think that if you will make collection of that so maybe from this you can describe predicates for digit recognition well from Bach and Chopin no no not from Bach and Chopin from the critic interpretation of the music yeah but they trying to explain you music what they use they use they describe high level ideas of of Plato's ideas what behind this music that's brilliant so art is not self-explanatory in some sense so you have to try to convert it into ideas it is ill-posed problems when when you go from ideas to to the representation it is easy way but when you're trying to go back it is ill-posed problems but nevertheless I believe that when you're looking from that even from art you will be able to find predicates for digit recognition it's such a fascinating and powerful notion do you ponder your own mortality do you think about it do you fear it do you draw insight from it about mortality oh yeah are you afraid of death not too much not too much it is pity that I will not be able to do something which I think I have a feeling to do that for example I will be very happy to work with theoreticians from music to write this collection of description how how they describe music what they use for predicate and from art as well then take what
is in common and try to understand predicates which is absolute for everything and then use that for visual recognition exactly other there's still time we got time it takes years and years well see you've got the patient mathematician's mind I think it could be done very quickly and very beautifully I think it's a really elegant idea yeah also some of many yes you know the most time it is not to make this collection but to understand what is the common to think about that once again and again and again and again again but I think sometimes especially just when you say this idea now even just putting together the collection and looking at the different sets of data language trying to interpret music criticize music and images I think there will be sparks of ideas that'll come of course again again you'll come up with better ideas but even just that notion you know is a beautiful notion I will even give some example so I have friend who specialist in Russian poetry she is professor of Russian poetry she did not write poems but she know a lot of stuff she makes book several books and one of them is a collection of Russian poetry she has images of Russian poetry she collected all images of Russian poets and I asked her to do following you have NIST digit recognition and we get hundred digits or less than a hundred I don't remember maybe fifty digits and try from poetical point of view describe every image we see using only words of images of Russian poetry and she did it and then we tried to I call it learning using privileged information I call it privileged information you have two languages one language is just image of digit and another language poetic description of this image and this is privileged information there is a algorithm when you are working using privileged information you're doing well better much better so there's something there something there and there is a in any theme she unfortunately direct the collection of digits in poetic descriptions of
these digits there is some something there in that poetic description but I think that there is a abstract ideas on the Plato level of ideas yes yeah that there there that could be discovered and music seems to be a good entry but as soon as you start this is this challenge problem the challenge from nine it immediately connected to all the stuff especially with your talk and this podcast and I'll do whatever I can to advertise it's such a clean beautiful Einstein like formulation of the challenge before us right let me ask another absurd question we talked about mortality we talked about philosophy of life what do you think is the meaning of life what's the predicate for mysterious existence here on earth I don't know it's very interesting how we in Russia I don't know you know the guys Strugatsky they are I think they thinking about human what what's going on and they have idea that there are developing two type of people common people and very smart people they just started and these two branches of people will go in different direction very soon so that's what they thinking about life so the purpose of life is to create two paths of human societies yes simple people and more complicated which do you like best the simple people or the complicated ones you know the little he's just his fantasy but you know every week we have guy who is just writer and also theoretician of literature and he explained how he understand literature and human relationship how he see life and I understood that I'm just small kid comparing to him he is very smart guy in understanding life he knows this predicate he he knows big blocks of life I am I am amazed every time when I listen to him and he just talking about literature and I think that I was surprised so the managers in big companies most of them are guys who study English language and English literature so why because they understand life they understand models and among them maybe many talented critics is just
analyzing this and this is big science like Propp did this is this blocks it amazes me that you are and continue to be humbled by the brilliance of others I'm very modest about myself I see so smart guys all around well let me be immodest for you you're one of the greatest mathematicians statisticians of our time it's truly an honor and making your job ok ok let's talk it is not yeah yeah I know my limits let's let's talk again when your challenge is taken on and solved by a grad student especially he brokered me when they using scripting maybe music will be involved Vladimir thank you so much it's been thank you very much thanks for listening to this conversation with Vladimir Vapnik and thank you to our presenting sponsor Cash App download it use code LexPodcast you'll get ten dollars and ten dollars will go to FIRST an organization that inspires and educates young minds to become science and technology innovators of tomorrow if you enjoy this podcast subscribe on YouTube give it five stars on Apple Podcast support it on Patreon or simply connect with me on Twitter at Lex Fridman and now let me leave you with some words from Vladimir Vapnik when solving a problem of interest do not solve a more general problem as an intermediate step thank you for listening I hope to see you next time