Kind: captions
Language: en
the following is a conversation with
Vladimir Vapnik part 2 the second time
we spoke on the podcast
he's the co-inventor of support vector
machines support vector clustering VC
theory and many foundational ideas in
statistical learning he was born in the
Soviet Union worked at the Institute of
control sciences in Moscow then in the
u.s. worked at AT&T NEC labs Facebook AI
research and now is a professor at
Columbia University his work has been
cited over 200,000 times the first time
we spoke on the podcast was just over a
year ago one of the early episodes this
time we spoke after a lecture he gave
titled complete statistical theory of
learning as part of the MIT series of
lectures on deep learning and AI that I
organized I'll release the video of the
lecture in the next few days this
podcast and lecture are independent from
each other so you don't need one to
understand the other the lecture is
quite technical and math heavy so if you
do watch both I recommend listening to
this podcast first since the podcast is
probably a bit more accessible this is
the artificial intelligence podcast if
you enjoy it subscribe on YouTube give
it five stars on Apple podcast support it
on patreon or simply connect with me on
Twitter at Lex Fridman spelled F-R-I-D-M-A-N
as usual I'll do one or two minutes
of ads now and never any ads in the
middle that can break the flow of the
conversation I hope that works for you
and doesn't hurt the listening
experience this show is presented by Cash App
the number one finance app in the App
Store when you get it use code LexPodcast
Cash App lets you send money to
friends buy Bitcoin and invest in the
stock market with as little as $1
brokerage services are provided by Cash
App Investing a subsidiary of Square and
member SIPC since Cash App allows you
to send and receive money digitally
peer-to-peer and security in all digital
transactions is very important let me
mention the PCI data security standard
PCI DSS level 1 that Cash App is compliant
with
I'm a big fan of standards for safety
and security and PCI DSS is a good
example of that where a bunch of
competitors got together and agreed that
there needs to be a global standard
around the security of transactions now
we just need to do the same for
autonomous vehicles and the AI systems
in general
so again if you get Cash App from the
App Store or Google Play and use the
code LexPodcast you get ten dollars and
Cash App will also donate ten dollars
to FIRST one of my favorite
organizations that is helping to advance
robotics and STEM education for young
people around the world and now here's
my conversation with Vladimir Vapnik you
and I talked about Alan Turing yesterday
a little bit and that he as the father
of artificial intelligence may have
instilled in our field an ethic of
engineering and not science seeking more
to build intelligence rather than to
understand it what do you think is the
difference between these two paths of
engineering intelligence and the science
of intelligence it's a completely
different story engineering is
imitation of human activity you have to
make device which behaves as human
behaves have all the functions of human it
does not matter how you do it but to
understand what is intelligence about is
quite different problem so I think I
believe that it's somehow related to
predicate we talked yesterday about
because look at the Vladimir Propp's idea
he just found 31 predicates
he called it units which can explain
human behavior at least in the Russian
tales he looked at Russian tales and
derived from that
and then people realized that it is more
wide than in Russian tales it is in TV in
movie serials and so on and so on so you're
talking about Vladimir Propp all right who
in 1928 published a book Morphology of
the Folktale describing 31 predicates
that have this kind of sequential
structure that a lot of the stories
narratives follow in Russian folklore
and in other content we'll talk about it
I'd like to talk about predicates in a
focused way but let me if you allow me
to stay zoomed out on our friend Alan
Turing and you know he inspired a
generation with the the imitation game
yes do you think if we can linger in a
little bit longer do you think we can
learn do you think learning to imitate
intelligence can get us closer to the
science of understanding intelligence so
why do you think imitation is so far
from understanding I think that it is
different between you have different
goals so your goal is to create
something something useful and that is
great and you can see how much things
was done and I believe that it will be
done even more it's self-driving cars and
also this business it is great and it
was inspired by Turing vision but
understanding is very difficult it's more
or less philosophical category
what means understand the world I believe
in scheme which starts from Plato that
there exists world of ideas I believe
that intelligence it is world of
ideas but it is world of pure ideas and
when you combine them with
reality things it creates as in my case
invariants which is very specific and
that I believe the combination of ideas
in way to constructing invariant is
intelligence but first of all predicates
if you know predicates and hopefully
there are not too many predicates exist
for example 31 predicates for human
behavior is not a lot
Vladimir Propp used 31 you can even call
them predicates 31 predicates to describe
stories narratives what do you think
human behavior how much of human
behavior how much of our world our
universe all the things that matter in
our existence can be summarized in
predicates of the kind that Propp was
working with I think that we have a
lot of form of behavior but I think the
predicate is much less because even in
these examples which I gave you
yesterday you saw that predicates can be
can construct one predicate can
construct many different invariants
depending on your data they're
applying to different data and they give
different invariants so but pure ideas
maybe not so much not so many I
don't know about that but my guess I
hope that's why challenge about digit
recognition how much you need I think
we'll talk about computer vision and 2d
images a little bit in your challenge
that's exactly about intelligence that's
exactly that's exactly about know that
hopes to be exactly about the spirit of
intelligence in the simplest possible
way absolutely you should start with
simple story otherwise you will not be able to do it
well there's an open question whether
starting at the MNIST digit
recognition is a step towards
intelligence or it's an entirely
different thing I think that to beat
records using a hundred two hundred
times less examples you need
intelligence you need intelligence so
let's because you used this term and
it'll be nice and I'd like to ask simple
maybe even dumb questions let's start
with a predicate in terms of terms and
how you think about it what is a
predicate I don't know I have a feeling
formally they exist but I believe
that predicate for 2d images one of them
is symmetry hold on a second sorry sorry
to interrupt and pull you back at the
simplest level we're not even we're not
being profound currently a predicate is
a statement of something that is true
yes do you think of predicates as
somehow probabilistic in nature or is
this binary this is truly constraints of
logical statements about the world in my
definition the simplest predicate is
function function and you can use this
function to make inner product that is
predicate what's the input and what's the
output of the function input is
something which is input in reality so
if you consider digit recognition it is
pixel space yes input but it is
function which in pixel space but it can
be any function from pixel space and you
choose and I believe that there are
several functions which is important for
understanding of images one of them is
symmetry it's not so simple construction
as I described with derivatives and all
this stuff but another I believe I
don't know how many
is how well structurized is picture
structurized yeah what I mean by
structurized it is formal definition
say something heavy on the left
corner not so heavy in the middle and so
on you describe in general concept of
what you assume concepts some kind of
universal concepts yeah but I don't know
how to formalize this do you so this is
the thing there's a million ways we can
talk about this I'll keep bringing it up
but we humans have such concepts when we
look at digits but it's hard to put them
just like you're saying now it's hard to
put them into words you know that this
example when critics in music trying to
describe music they use predicates and
not too many predicates but in different
combination but they have some special
words for describing music and the same
should be for images but maybe there are
critics who understand essence of what
this image is about do you think there
exists critics who can summarize the
essence of images human beings I
hope so yes but that explicitly
state them on paper this is the
fundamental question I'm asking is do
you do you think there exists a small
set of predicates that will summarize
images it feels to our mind like it does
that the concept of what makes a two and
a three and a four
no no it's not on this level what it
should not describe two three four it
describes some construction which allow
you to create invariants
invariants sorry to stick on this but
terminology invariants it is it is
property of your image say I can say
looking on my image it is more or less
symmetric and I can give you a value of
symmetry say level of symmetry using
this function which I gave yesterday
then you can describe that your image
have these characteristics exactly in
the way of musical critics described
music so but this is invariant applied
to specific data to specific music
to something I strongly believe in
this Plato ideas that there exists world of
predicate and world of reality and
predicate and reality is somehow
connected and you have to know that
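A minimal sketch of the "level of symmetry" idea described above, as a predicate that maps an image to a single number. This is an illustrative stand-in only: the function name `symmetry_degree` and the scoring formula are assumptions, and it is not the tangent-distance based construction mentioned in the conversation.

```python
def symmetry_degree(image):
    """Left-right mirror symmetry score in [0, 1] for a 2D list of pixel values.

    1.0 means the image equals its mirror image; lower values mean less symmetry.
    Hypothetical helper for illustration, not the function from the lecture.
    """
    diff = 0.0   # accumulated |pixel - mirrored pixel|
    total = 0.0  # accumulated |pixel| + |mirrored pixel|, used for normalization
    for row in image:
        mirrored = row[::-1]
        for a, b in zip(row, mirrored):
            diff += abs(a - b)
            total += abs(a) + abs(b)
    # An all-zero image is treated as perfectly symmetric.
    return 1.0 if total == 0 else 1.0 - diff / total


# A plus-shaped image is mirror symmetric; a bar on the left edge is not.
plus_shape = [[0, 1, 0],
              [1, 1, 1],
              [0, 1, 0]]
left_bar = [[1, 0, 0],
            [1, 0, 0],
            [1, 0, 0]]
print(symmetry_degree(plus_shape), symmetry_degree(left_bar))
```

The same predicate can be applied to any image, and the value it returns on a particular image is what the conversation calls an invariant for that data.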
let's talk about Plato a little bit so
you draw a line from Plato to Hegel to
Wigner to today yes so Plato has forms
the the theory of forms there's a world
of ideas
yeah world of things as you talk
about and there's a connection and
presumably the world of ideas is very
small
and the world of things is arbitrarily
big but they're all what Plato calls
them like the it's a shadow the real
world is a shadow from the world of yeah
you have projection projection
of world of ideas yes right and in reality
you can realize this projection
using these invariants because it is
projection on specific examples
which create specific features of
specific objects so so the essence of
intelligence is while only being able to
observe the world of things try to come
up with
the world of ideas exactly like in this
music story intelligent musical critics
knows all these words and have a
feeling about what they mean I feel like
that's a contradiction intelligent music
critics but I think I think music is to
be enjoyed in all its forms the notion
of critic like a food critic no I don't
want to touch emotion that's an interesting
question
there's emotion there's a certain
elements of the human psychology of the
human experience which seem to almost
contradict intelligence and reason like
emotion like fear like like a love all
those things are those not connected in
any way to the space of ideas that I
don't know I I just want to be
concentrate on a very simple story on
digit recognition so you don't think you
have to love and fear death in order to
recognize digits I don't know because
it's so complicated it is it involves a
lot of stuff which I never consider but
I know about digit recognition and I know
that for digit recognition to get
records from small number of
observations you need predicate but not
special predicate for this problem but
Universal predicate which understand
world of images of visual and visual yes
but on the first step they understand
say world of handwritten digits or
characters or something simple so like
you said symmetry is an interesting notion
that's what I think one of the
predicates is related to symmetry but
the level of symmetry ok degree of
symmetry so you know you think symmetry
at the bottom as a universal notion and
there's the
there's degrees of a single kind of
symmetry or is there many kinds of
symmetries many kinds of symmetries
there is a symmetry anti symmetry say
letter s so it has vertical anti
symmetry and it could be diagonal
symmetry vertical symmetry so when you
when you cut vertically the letter S
yeah then the upper part in lower part
in different directions along the y axis
yeah but that's just like one example
symmetry isn't there like right but
there is a degree of symmetry if you
play all this derivative stuff to
do tangent distance whatever I
described you can do you can have a
degree of symmetry and that is
describing property of image it is the
same as you will describe this image
saying about digit S it has anti
symmetry digit three is symmetric more or less
look for symmetry do you think such
concepts like symmetry predicates like
symmetry is it a hierarchical set of
concepts or are these independent
distinct predicates that we want to
discover as some set no there is idea of
symmetry and you can this idea of
symmetry make very general like degree
of symmetry the degree of symmetry can
be zero no symmetry at all degree of
symmetry say more or less symmetrical
but you have one of this description and
symmetry can be different as I told
horizontal vertical diagonal and anti
symmetry is also concept of symmetry
what about shape in general I mean
symmetry is a fascinating notion but you
know I'm talking about digit I would
like to concentrate on all I would like
to know predicates for digit recognition
yes but symmetry is not enough for digit
recognition right it is not necessarily
for digit recognition it helps to create
invariant which you can use
when you will have examples for
digit recognition you have regular problem
of digit recognition you have
examples of the first class second class
plus you know that there exists
concept of symmetry and you apply when
you looking for decision rule you will
apply concept of symmetry of this level
of symmetry which you estimate from so
let's talk everything comes
from weak convergence
what is convergence what is weak
convergence what is strong convergence
so sorry I'm going to do this here
what are we converging from and to you
converging you would like to have a
function the function which say
indicator function which indicate your
digit 5 for example a classification
task
let's talk only about classification so
classification means you will say
whether this is a 5 or not or say which
of the ten digits it is all right right
I would like to have these functions
then I have some examples
I can consider property of these
examples say symmetry and I can measure
a level of symmetry for every digit and
then I can take average from
my training data and I will consider
only functions of conditional
probability which I am looking for my
decision rule which applying to
digits will give me the same average as
I observe on training data so actually
this is different level of description
of what you want you want not just your
so not one digit you show this
predicate shows general property of all
digits which you have in mind if you
have in mind digit three it gives you
property of digit three and you select
as admissible set of function only
function which keeps this property you
will not consider other functions so you
immediately looking for smaller subset
of function that's what I mean by
admissible functions you mean admissible
functions yeah which is still a pretty
large for the number three it is
pretty large but if you have one
predicate but according to theory there is
strong and weak convergence strong
convergence is convergence in function
you're looking on one
function and you're looking on another
function and square difference from them
should be small if you take difference
in any points make a square make an
integral and it should be small
that is convergence in function suppose
you have some function any function so I
would say I say that some function
converge to this function if integral
from squared difference between them is
small that's the definition of strong
convergence that is definition for
functions integral of the difference is small
it is convergence in functions yeah but
you have different convergence in
functionals you take any function you
take some function phi and take inner
product of this function with f function f
0 function which you want to find and
that gives you some value so you say
that set of functions converge in inner
product to this function if this value
of inner product converge to value for f 0
that is for one phi but weak convergence
requires that it converge for any
function of Hilbert space if it converge
for any function of Hilbert space then
you will say that this is weak
convergence you can think that when you
take integral that is property
integral property of function for
example if you will take sine or cosine
it is coefficient of say Fourier
expansion so if it converge for all
coefficients of Fourier expansion so under
some condition it converge to
function you're looking for but weak
convergence means any property convergence
not point wise but integral property of
function
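The two notions of convergence described above can be written out as follows. The notation is an assumption made for this sketch: f_ℓ is a sequence of candidate functions, f_0 is the function being sought, and φ is an arbitrary function from the Hilbert space (here taken to be L_2) acting as the predicate.

```latex
% Strong convergence (convergence in functions):
% the integral of the squared difference goes to zero.
\lim_{\ell \to \infty} \int \bigl( f_\ell(x) - f_0(x) \bigr)^2 \, dx = 0

% Weak convergence (convergence in functionals):
% every inner product with a fixed \phi converges.
\lim_{\ell \to \infty} \int \phi(x)\, f_\ell(x) \, dx
  = \int \phi(x)\, f_0(x) \, dx
  \qquad \text{for all } \phi \in L_2

% Example: choosing \phi(x) = \sin(kx) or \phi(x) = \cos(kx) makes each
% inner product a Fourier coefficient, so weak convergence implies that
% every Fourier coefficient of f_\ell converges to that of f_0.
```

Each choice of φ checks one integral property of the function, which is the sense in which a predicate selects a property to be preserved.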
so weak convergence means integral
property of functions when I talking
about predicate I would like to
formulate which integral properties I
would like to have for convergence so
and if I will take one predicate
predicate is function which I measure
property if I will use one predicate and
say I will consider only function which
give me the same value as this
predicate I selecting set of functions
from functions which is admissible in
the sense that function which I
looking for is in this set of functions
because I checking in training data it
gives the same yes it's always has to be
connected to the training data in terms
of yeah but property you can know
independent on training data and this
guy Propp yeah so there is formal
property 31 property and you've married
a Russian fairy tale all right but a
Russian fairy tale is not so interesting
more interesting that people apply this
to movies to theater to
different things and the same works they're
universal well so I would argue that
there's a little bit of a difference
between the kind of things that were
applied to which are essentially stories
and digit recognition it was the same
story you're saying digits there's a
story within the digit yeah so but my my
point is why I hope that it is possible to
beat record using not 60,000 but a
hundred times less because instead
you will give predicates and you will
select
decision not from wide set of functions
but from set of function which keeps this
predicate but predicate is not related
just to digit recognition right so like
in Plato's case do you think it's
possible to automatically discover the
predicates this so you basically said
that the essence of intelligence is the
discovery of good predicates yeah now
the natural question is you know that's
what Einstein was good at doing in
physics can we make machines do these
kinds of discovery of good predicates or
is this ultimately a human endeavor yes
I don't know I don't think that machine
can do because according to theory about
weak convergence any function from
Hilbert space can be predicate so you
have infinite number of predicates
and before you don't know which
predicate is good and which but whatever
Propp show and why people call it
breakthrough that there is not too many
predicates which cover most of situation
happens in the world so there's a sea of
predicates and most of the only a small
amount are useful for the kinds of
things that happen in the world I think
that I would say only small part of
predicates very useful useful all of all
of them only very few are what we should
let's call them good predicates very
good predicates very good predicates so
can we linger on it what's your
intuition why is it hard for a machine
to discover good predicates I even in my
talk described how to look
how to find new predicate I'm not sure
that it is very good what did you
propose in your talk no in my talk I gave
example for diabetes they belong m1 when
we achieve some percent so then we're
looking for area where some sort of
predicate which I formulate does not
keeps invariant so if it doesn't keep I
retrain my data I select only function
which keeps this invariant and when I
did it I improve my performance I can
looking for this predicate I know
technically how to do that and you can
of course do it using machine but I am
not sure that we will construct the
smartest predicate but this is the allow
me linger on it because that's the
essence that's the challenge that is
artificial that's that's the human level
intelligence that we seek is the
discovery of these good predicates
you've talked about deep learning as a
way to the predicates they use and the
functions are mediocre so you can find
better ones let's talk about deep
learning sure let's do it I know only
Yann LeCun's convolutional network and
what else
I don't know energy very simple
convolution there's not much else eleven
right yes I can do it like that when
this one predicate it is convolution is
a single predicate it's single it's it's
single predict yes because you know
exactly you take the derivative for
translational and predicate this should
be kept so that's a single predicate but
humans discovered that one or least note
that is every stick not too many
predicates and that is big story
because Yann did it 25 years ago and
nothing so clear was added to the
network and then I don't understand why
we should talk about deep network
instead of talking about piecewise
linear functions which keeps this
predicate well you know a counter
argument is that maybe the amount of
predicates necessary to solve general
intelligence say in space of images
doing efficient recognition of
handwritten digits is very small and so
we shouldn't be so obsessed about
finding we'll find other good predicates
like convolution for example you know
there's there has been other
advancements like if you look at the
work with attention
there's attentional mechanisms in
especially used in natural language
focusing the the network's ability to to
learn at which part of the input to look
at the thing is there's other things
besides predicates that are important
for the actual engineering mechanism of
showing how much you can really do given
such these predicates I I mean that's
essentially the work of deep learning is
constructing architectures that are able
to be given the training data to be able
to converge towards a function they can
approximate that can generalize well
this is an engineering problem oh yeah I
understand but let's talk not on
emotional level but on a mathematical
level you have set of piecewise linear
functions it is all possible neural
networks it's just piecewise linear
functions it is many many pieces large
large number to specify exactly but very
large very large almost but it is
still simpler than say
convolution than reproducing kernel Hilbert
space which have Hilbert set of function
what's Hilbert space it's space with
infinite number of coordinates a
function for expansion something so it's
much richer so and when I talk about
closed form solution I am talking
about this set of function not piecewise
linear set which is particular case
and it's small so the neural networks is a
small part of the space you're talking of
functions is a small small say a small
set of functions let me take that
but it is fine it is fine I don't
want to discuss a small or big we take
one so you have some set of functions so
now when you're trying to create
architecture you would like to create
admissible set of function with all
your tricks to use not all functions but
some subset of the set of functions say
when you introducing convolutional net
it is way to make this subset useful for
you but from my point of view
convolutional it is something you want
to keep some invariants say translation
invariance but now if you understand
this and you cannot explain on the level
of ideas what neural network does you
should agree that it is much better to
have a set of functions and to say this
set of functions should be admissible it
must keep this invariant this invariant and
that invariant you know that as soon as you
incorporate new invariant set of
function becomes smaller and smaller and
smaller but all the invariants are
specified by you the human yeah
what I am hope that there is a standard
predicate like Propp show that
what I want to find for digit
recognition if we start it is
completely new area what is intelligence
about on the level
starting from Plato's idea what is
world of ideas so and I believe that is
not too many yeah but you know it is
amusing that mathematician doing
something with neural network in
general function but people from
literature from art they use this all the
time
that's right invariants say it is
great how people describe music we
should learn from that and something on
this level but so why Vladimir Propp
who was just theoretical who study
theoretical literature he found that you
know let me throw that right back at you
because there's a little bit of a that's
less mathematical and more emotional
philosophical Vladimir Propp I mean he
wasn't doing math no and you just said
an another emotional statement which is
you believe that this Plato world of
ideas is small I hope I hope do you do
what's your intuition though if we can
linger on it you know it is not just
small or big I know
exactly then when I introducing some
predicates I decrease set of functions
but my goal to decrease set of function
by as much as possible by as much as
possible
good predicate which does this
then I should choose next predicate
which decrease set as much as
possible so set of
good predicate it is such that decrease
this amount of admissible function so if
each good predicate significantly
reduces the set of admissible functions
that there naturally should not be
that many good predicates no but but
if you reduce very well the VC dimension
of the function of admissible set of
function is small and you need not too
much training data to do well and VC
dimension by the way is a measure of
capacity of the set of function right
roughly speaking how many function in
this set so you're decreasing decreasing
and it makes it easy for you to find
function you're looking for that's the
most important part to create good
admissible set of functions and it
probably there are many ways but the
good predicates such that can do
that so that for this duck you
should know a little bit about duck
because what are the three
fundamental laws of ducks looks like a
duck swims like a duck and quacks like a duck
you should know something about ducks to
be able to not necessarily looks like say horse
it's also good so it generalizes yes
from the duck looks like a duck and make
sound like horse or something and run
like horse and moves like horse
it is general it is general predicate
that is applied to duck but for duck
you can say play chess like duck you
cannot say play chess why not so you're
saying you can but that would not be
a good no you will not reduce a lot of
functions you would not reduce the set of
functions so the story is formal
story a mathematical story is that
you can use any function you want as a
predicate but some of them are good
some of them are not because some of
them reduce a lot of functions to
admissible set and some of them don't but the
question is I'll probably keep asking
this question but how do we find such
predicates what's your intuition with
handwritten recognition how do
we find the answer to your challenge
yeah yeah I understand it's like that I
understand what what do you find what
it means I know predicate yeah like
guy who understand music can say this
words which he described when he
listened to music he understand music he
use not too many different or you can do
like Propp you can make collection what
you're talking about music about this
about that it's not too many
different situation he described because
we mentioned Vladimir Propp a bunch let me
just mention there's a sequence of 31
structural notions they're common in
stories and I think you called them units
units and I think they resonate I mean
it starts just to give an example
absentation a member of the hero's
community or family leaves the security
of the home environment then it goes to
the interdiction a forbidding edict or
command is passed upon the hero don't go
there don't do this the hero is warned
against some action then step three
violation of interdiction
you know break the rules break out on
your own then reconnaissance the villain
makes an effort to attain knowledge
needed to fulfill their plot so on it
goes on like this ends ends in a wedding
number 31
happily ever after no he just gave
description of all situation he
understands this world of folk tales yeah
not folk tales but
stories and this stories not in just
folk tales the stories in detective
serials as well and probably in our
lives we probably live but is this znz
is a
they're all set this predicate is good
for different situation from movie from
what for movie for theater by the way
there's also criticism right there's an
other way to interpret narratives from
claude lévi-strauss
I am not in this business and I know
it's theoretical literature but looking
in at it it's always the
philosophy yeah yeah but at least
there is a units it's not too many units
that can describe but this guy probably
gives another units or in other way
exactly another another set of units
another set of predicates it does not
matter how but they exist
probably my my question is whether given
those units whether without our human
brains to interpret these units they
would still hold as much power as they
have meaning are those units enough when
we give them to the alien species let me
ask you do you understand digit
images
no I don't know no or when you
can recognize this digit images does it mean that you
understand you understand characters you
understand no no no no I I it's it's the
imitation versus understanding question
because I don't understand the mechanism
by which I don't know no I'm not talking
about I'm talking about predicates
you understand that it involves symmetry
maybe structure maybe something else I
cannot formulate I just was able to find
symmetries like anti symmetries
that's really good so this is a good
line I feel like I understand the basic
elements of what makes a good handwritten
recognition system my own like symmetry
connects with me it seems like that's a
very powerful predicate my question is
is there a lot more going on that we're
not able to introspect maybe I need to
be able to understand a huge amount in
the world of ideas
thousands of predicates millions of
predicates in order to do handwritten
recognition I don't think so
say you're you know both your hope and
your intuition nicely clean enough
you're using digits you're using
examples as well theory says that if you
will use all possible functions from
Hilbert space all possible predicates
you don't need training data you just
will have admissible set of functions
which contain one function yes so the
trade-off is when you're not using all
predicates you're only using a few good
predicates you need to have some training
data yes because the more the more
good predicates you have the less
training data exactly that is
intelligence still okay I'm gonna
keep asking the same dumb question
handwritten recognition to solve the
challenge you kind of propose a
challenge that says we should be able to
get state of the art MNIST error rates
by using very few sixty maybe fewer
examples per digit what kind of predicates
do you think it will that is the challenge so
people who will solve this problem they
will answer and do you think
they'll be able to answer it in a human
explainable way they just need to
write function that's it but so can that
function be written I guess by an
automated reasoning system whether we're
talking about a neural network learning
a particular function or another
mechanism no no I'm not against neural
network I am against admissible set of
function which creates neural network
you did it by hand you don't you don't
do it by invariants by predicate
by reason but neural networks can then
do the reverse step of helping you
find a function just as the task of a
neural network is to find a disentangled
representation for example what they
call is just define that one predicate
function as really captures some kind of
essence one not the entire essence but
one very useful essence of this
particular visual space do you think
that's possible like um listen I'm
grasping hoping there's an automated way
to find good predicates right so the
question is what are the mechanisms of
finding good predicates ideas that you
think we should pursue a young
grad student listening right now
I gave example so find situation where
predicates which you are suggesting don't
create invariant it's like in physics
first find situation where existing
theory cannot just explain it
find situation where the existing theory
cannot explain this so you're finding
contradictions find contradiction and
then remove this contradiction but in my
case
what means contradiction you find
function which if you will use this
function you do not keep invariants
this is really the process of
discovering contradictions yeah it is
like in physics find situation where you
have contradiction for one of the
property for one of the predicate then
include this predicate making
invariants and solve again this
problem now you don't have contradiction
but it is
not the best way probably I don't know
of looking for predicates that's just one
way okay that no it is brute force
way in the brute force way what about
the ideas of some what big umbrella term
of symbolic AI these what in 80s with
expert systems sort of logic reasoning
based systems is there hope there to
find some through sort of deductive
reasoning to find good predicates
I don't think so I think that just
logic is not enough it's kind of a
compelling notion now you know that when
smart people sit in a room and reason
through things it seems compelling and
making our machines do the same is also
compelling so everything is very simple
when you have infinite number of
predicates you can choose the function
you want you have invariance and you can
choose the function you want but you
have to have not too many
invariants to solve the problem so you
have from infinite number of functions to
select finite number and hopefully small
finite number of functions which is good
enough to extract small set of
admissible functions
so they will be admissible that's for sure
because every function just decreases
set of functions and leaves it admissible
but it will be small but why do you
think logic based systems don't can't
help intuition not because you you
should know
you should know life this guy like Propp
he knows something and he tried to put
in invariant his understanding that's
the human yeah see you're putting too
much value in to Vladimir Propp
knowing something no it is my decision
what means you know life what elements
you know common sense
no no you know something common sense it
is some rules you think so
common sense is simply rules common
sense is every its mortality it's no
it's it's fear of death it's love it's
spirituality it's happiness and sadness
all of it is tied up into understanding
gravity which is what we think of as
common sense they don't really discuss
so broad I want to discuss understanding
digit recognition
you never bring up love and death you
bring it back to digit recognition okay
no you know it is doable because there
is a challenge yeah which I see how to
solve it if I would have a student
concentrate on this work I will suggest
something to solve so you mean handwritten
recognition yeah it's a beautifully
simple elegant yet I think that I know
invariants which will solve this you do I
think so
but it is not universal it is maybe I
want some universal invariants which are
good not only for digit recognition for
image understanding so let me ask how
hard do you think is 2d image
understanding so if we can kind of
Intuit handwritten recognition how big
of a step leap journey is it from that
if I gave you good
I solved your challenge for handwritten
recognition how long would my journey
then be from that to understanding more
general natural images immediately you
will understand this as soon as you make a
record because it is not for free as
soon as you will create several
invariance which will help you to get
the same performance that the best
neural net did using hundred times maybe
more than hundred times fewer examples
you have to have something smart to do
that and you're saying that's an
invariant
it is predicate because you should put
some idea how to do that but okay let me
just pause maybe it's a turning point
maybe not but handwritten recognition
feels like a 2d two-dimensional problem
and it seems like how much more complicated
is the fact that most images are
projection of a three-dimensional world
onto a 2d plane it feels like for a
three-dimensional world we
need to start understanding common sense
in order to understand an image it's no
longer visual shape and symmetry it's
having to start to understand concepts
of it understand life yeah yes yes
you're talking that there are
different invariants different
predicates yeah and potentially much larger
number you know might be but let's start
from simple
well yeah but you said that you know I I
cannot think about things which I
don't understand this I understand but
I'm sure that I don't understand
everything there yeah it's like
Einstein said do as simple as possible
but not simpler and that is exact case
with handwritten recognition yeah but
no that's the difference between you and
I
I welcome and enjoy thinking about
things I completely don't understand
because to me it's a natural extension
without having solved handwritten
recognition to wonder how difficult
is the the the next step of
understanding 2d 3d images because
ultimately while the science of
intelligence is fascinating it's also
fascinating to see how that maps to the
engineering of intelligence and
recognizing handwritten digits is not
doesn't help you it might it may not
help you with the problem of general
intelligence we don't know it'll help
you a little bit unclear it's unclear
yeah but I would like to make a remark
yes I start not from very primitive
problem like a challenge problem I start
with very general problem with Plato
so you understand and it comes from
Plato to digit recognition so so you
basically took Plato and the world of
forms and ideas and mapped and
projecting into the clearest simplest
formulation of that big world and you
know I will say that I did not
understand Plato until recently and
until I considered weak convergence and
then predicate and you know this is what
Plato told so linger on that like why
how do you think about this world of
ideas and world of things in Plato
no it is metaphor it is it's a
metaphor for sure yeah compelling it's
a poetic and a beautiful metaphor
yeah but it is the way you should
try to understand how to attack ideas
in the world so from my point of view
it is very clear but it is line all the
time people looking for that
say Plato then Hegel whatever
reasonable it exists whatever exist it
is reasonable I don't know what he have
in mind reasonable right there's
philosophers again no no no no it is it
is next step Wigner that mathematics
understands something in reality it
is the same Plato line and then it
comes suddenly to Vladimir Propp look 31
ideas 31 units describe everything
there's abstractions ideas that
represent our world and we should always
try to reach into that yeah but what you
should make a projection on reality but
understanding is it is abstract ideas
you have in your mind
several abstract ideas which you can
apply to reality and reality in this
case if you look at machine learning
is data examples data okay let me let
me put you put this on you because I'm
an emotional creature I'm not a
mathematical creature like you I find
compelling the idea forget this the
space the sea of functions there's also
a sea of data in the world and I find
compelling that there might be like you
said teacher small examples of data that
are most useful for discovering good
whether it's predicates or good
functions that the selection of data may
be a powerful journey a useful mechanism you
know coming up with a mechanism for
selecting good data might be useful to
do you find this idea of finding the
right data set interesting at all or do
you kind of take the data set as a given
I think that it is yeah you know my
scheme is very simple you have huge set
of functions
if you will apply and you have
not too many data if you pick up function
which describes this data you will do
not very well you know randomly yeah
it will overfit yeah
it will be overfitting so you
should decrease set of functions from
which you are picking up one so you should
go somehow to admissible set of
functions and this is what weak
convergence is about but from another point of
view to make admissible set of
functions you need just an idea just
function with which you will take inner
product with which you will measure property
of your function and that is how it
works
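A minimal sketch of the idea just described, in Python. It assumes a toy set of three candidate functions and two hypothetical predicates (a constant and the input itself). A candidate stays in the admissible set only if its inner product with each predicate matches that of the labels on the data, within an assumed tolerance `tol`. None of these names or numbers come from the conversation.

```python
import numpy as np

def invariant_gap(f_values, y, psi_values):
    """|mean(psi * f) - mean(psi * y)|: how far f is from the data on the
    property measured by the predicate psi."""
    return abs(np.mean(psi_values * f_values) - np.mean(psi_values * y))

def admissible(candidates, x, y, predicates, tol):
    """Keep the candidates whose gap is within tol for every predicate."""
    kept = []
    for f in candidates:
        fx = f(x)
        if all(invariant_gap(fx, y, psi(x)) <= tol for psi in predicates):
            kept.append(f)
    return kept

# Toy data (assumed): 21 points on [-1, 1], label 1 when x > 0.
x = np.linspace(-1.0, 1.0, 21)
y = (x > 0).astype(float)

candidates = [
    lambda t: (t > 0).astype(float),   # agrees with the labels
    lambda t: np.full_like(t, 0.5),    # constant guess
    lambda t: (t < 0).astype(float),   # labels flipped
]

# Hypothetical predicates: a constant (class balance) and the input itself.
predicates = [lambda t: np.ones_like(t), lambda t: t]

kept = admissible(candidates, x, y, predicates, tol=0.05)
print(len(kept), "of", len(candidates), "candidates remain admissible")
```

In this toy setup the constant predicate alone cannot separate the candidates, since all three have nearly the same average value; the second predicate is what removes the two wrong ones.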
no I get it I get understand that but do
you that the reality is let's let's look
this car let's think about examples you
have huge set of function if you have
several examples if you just trying to
keep the take function which satisfies
these examples you still do overfit you
need to decrease you need admissible set
of function yeah absolutely but what say
you have more data than functions so
sort of consider though I mean maybe not
more data than functions because that's
unfortunately impossible but what I was
trying to be poetic for a second I mean
you have a huge amount of data a huge
amount of examples but the set of
functions is even bigger
I understand there's always there's
the whole Hilbert space I get
it but okay
but you don't you don't find the world
of data to be an interesting
optimization space like the the
optimization should be in a space of
functions in creating admissible set of
functions no you know even from
the classical VC theory from structural
risk minimization you should
should organize function in the way that
they will be useful for you right and
that is the way you're thinking about
useful is you're given a small small
small set of functions which contain
function I am looking for yeah I'm looking
for based on the empirical set of small
examples yeah but that is another story
I don't touch it because I I believe I
believe that this small examples it's
not too small say sixty per class law of
large numbers works
I don't need uniform law the story is
that in statistics there are two laws law
of large numbers and uniform law of large
numbers so I want to be in situation
where I use law of large numbers no but
not uniform law of large numbers right
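A small numerical sketch of the distinction, in Python. It assumes a stand-in "set of functions" whose 0/1 values on 60 examples are fair coin flips, so every true expectation is 0.5. The set size of 2000 and the coin-flip model are illustrative assumptions, not anything stated in the conversation. The point is only that the average for one fixed function sits close to its expectation, while the worst deviation over the whole set is much larger.

```python
import numpy as np

rng = np.random.default_rng(0)
n_examples = 60      # "sixty per class", the number used in the conversation
n_functions = 2000   # assumed size of the function set, for illustration only

# Assumption: each row holds the 0/1 values of one function on 60 examples,
# drawn as fair coin flips, so the true expectation of every function is 0.5.
values = rng.integers(0, 2, size=(n_functions, n_examples))
deviations = np.abs(values.mean(axis=1) - 0.5)

single_dev = deviations[0]      # law of large numbers: one fixed function
uniform_dev = deviations.max()  # uniform law: worst function in the set

print(f"one fixed function: |average - expectation| = {single_dev:.3f}")
print(f"worst of {n_functions} functions:              {uniform_dev:.3f}")
```

Picking the function that looks best on the sample is exactly a choice of the worst-case row, which is why a small admissible set matters when there are only 60 examples.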
so 60 is it's large enough I hope
no it still needs some evaluation some
bounds so the idea is the
following that if you trust that say
this average gives you something close
to expectation so you can talk about
that about this predicate and that is
basis of human intelligence right good
predicates is the discovery of good
predicate is the basis of it is
discovery of you of your understanding
world of your methodology or this type
of understanding world because you have
several function which you will apply to
reality
okay can you say that again so you're
you have several functions predicate but
the abstract yes
then you will apply them to reality to
your data and you will create in this
very predicate which is useful for your
task but predicates are not related
specifically to your task to this
task it is abstract functions which
being applied apply to many tasks
that you might be interested in it might be
many tasks or different tasks
well they should be many tasks yeah
just like in Propp's case it was for
fairy tales but such happens
everywhere okay so we talked about
images a little bit can we talk about
Noam Chomsky for a second I don't
know him personally well not personally
I don't know his ideas his ideas well
let me just say do you think language
human language is essential to
expressing ideas as Noam Chomsky
believes so like language is at the core
of our formation of predicates the human
language language and all the story of
language is very complicated I don't
understand this and I am not I thought
about it but I'm not ready to work on
that because it's so huge it is not for
me and I believe not for our century
it's a 21st century not for 21st century
so you should learn something a lot of
stuff from simple tasks like digit
recognition so you think you think
digit recognition 2d image what
how would you
more abstractly define a digit
recognition it's 2d image symbol
recognition
essentially I mean I'd like I'm trying
to get a sense sort of thinking about it
now having worked with MNIST forever
how could how small of a subset is this
of the general vision recognition
problem and the general intelligence
problem is it yeah
is it a giant subset is it not and how
far away is language you know let me
refer to Einstein take the simplest
problem as simple as possible but not
simpler and this is challenge is simple
problem but it's simple by idea but
not simple to to get it when you will do
this you will find some predicate
without you oh yeah I mean with I what
Einstein you can you you look at general
relativity but that doesn't help you
with quantum mechanics that's another
story you don't have any universal
instrument yes so I'm trying to wonder
which space we're in whether
handwritten recognition is like
general relativity and then language is
like quantum mechanics so you're still
going to have to do a lot of mess to
universalize it but I'm trying to see
one so what's your intuition why
handwritten recognition is easier than
language just I think a lot of people
would agree with that but if you could
elucidate sort of the the intuition of
why I don't know no I don't think in
this direction I just think in
this direction that this problem which if
we will solve it well we will create
some abstract understanding of images
maybe not all images I would like to
talk to guys who doing real images in
Columbia University what kind of images
real it's a real image really yeah
what is there real predicate what
can be predicate I think still symmetry will
play role in real life images in any
real life images 2d images let's talk
about 2d images because that's what
we know a neural network was created for
2d images so the people I know in vision
science for example the people study
human vision you know that they usually
go to the world of symbols and like
handwritten recognition but not really
it's other kinds of symbols to study our
visual perception system as far as I
know not much predicate type of thinking
is understood about our vision system so
they do not think in this direction they
don't yeah they but how do you even
begin to think in that direction that's
a sorry I'd like to discuss with them
yeah because if we will be able to show
that it is what working and surely it's
caused him it's not so bad so the the
unfortunate so if we compare the
language language has like letters
finite set of letters and a finite set
of ways you can put together those
letters so it feels more amenable to
kind of analysis with natural images
there are so many pixels no no no
language is much much more complicated
it involves a lot of different stuff
it's not just understanding of very
simple class of tasks I would like to
see lists of tasks where language
involved yes so there's a there's a lot
of nice benchmarks now on natural
language processing
from the very trivial like understanding
the elements of a sentence to question
answering to much more complicated
where you talk about open domain
dialogue the natural question is whether
handwritten recognition is really the
first step
yeah of understanding visual information
all right but not but but even our
records shows that we go in the wrong
direction because we need sixty
thousand digits so even this first step
so forget about talking about the full
journey this first step should be taken
in the right direction no wrong direction because
60,000 is unacceptable no I'm saying
it should be taken in in the right
direction because 60,000 is not
acceptable because you can talk great
half percent of error and hopefully the
step from doing handwritten recognition using
very few examples the step towards what
babies do when they crawl and understand
that I know babies will do from very
small examples yeah you will find
principles that will show the difference
from what we are using now and
it's more or less clear that means
that here you'll use weak convergence
not just strong convergence do you think
these principles are will naturally be
human interpretable oh yeah so like when
we will be able to explain them and have
a nice presentation to show what those
principles are or are they going to
be very kind of abstract kinds of
functions for example I talked yesterday
about symmetry yes
and I gave very simple examples the
same will be like you gave like a
predicate of a basic four for symmetries
yes four different symmetries in you
have four degree of symmetry that this
is important not just symmetry exists or
doesn't exist the degree of symmetry
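One possible way to turn "degree of symmetry" into a number, sketched in Python. The conversation mentions four symmetries without naming them here, so the four flips below (left-right, up-down, and the two diagonals) are an assumption, as is the choice of normalized correlation as the score. The toy image is a 5x5 vertical bar, not a real digit.

```python
import numpy as np

def symmetry_degree(image, transform):
    """Normalized correlation between an image and its transformed copy.
    1.0 means the image is unchanged by the transform; values near 0 mean
    almost no overlap. This scoring rule is an illustrative assumption."""
    a = np.asarray(image, dtype=float).ravel()
    b = np.asarray(transform(image), dtype=float).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Assumed set of four symmetries for a square image.
TRANSFORMS = {
    "left_right": np.fliplr,                         # mirror across vertical axis
    "up_down": np.flipud,                            # mirror across horizontal axis
    "diagonal": np.transpose,                        # mirror across main diagonal
    "anti_diagonal": lambda img: img[::-1, ::-1].T,  # mirror across anti-diagonal
}

# Toy image: a vertical bar in the middle column, loosely like the digit 1.
image = np.zeros((5, 5))
image[:, 2] = 1.0

degrees = {name: symmetry_degree(image, t) for name, t in TRANSFORMS.items()}
for name, value in degrees.items():
    print(f"{name:14s} {value:.2f}")
```

The bar scores as fully symmetric under the two mirror flips and only weakly symmetric under the diagonal ones, which is the kind of graded answer, rather than a yes or no, that the passage asks for.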
yeah for handwritten recognition no it's not
for handwritten it's for any images but I
would like to apply it to handwritten right it's in
theory it's more general okay okay so a
lot of things we've been talking about
Falls we've been talking about
philosophy a little bit but also about
mathematics and statistics a lot of it
falls into this idea a universal idea of
statistical theory of learning what is
the most beautiful and sort of powerful
or essential idea you've come across
even just for yourself personally in in
the world of statistics or statistical
theory of learning probably uniform
convergence which we did with Alexey
Chervonenkis can you describe uniform
convergence you have law of large
numbers so for any function expectation
of function average of function
converges to expectation but if you have
set of functions for any function it
is true but it should converge
simultaneously for all set of functions
and for learning you need uniform
convergence just convergence is not
enough because when you pick up one
which gives minimum you can pick up one
function which does not converge and
it will give you the best answer for
this function so you need uniform
convergence to guarantee learning so
learning does not rely on trivial law of
large numbers it relies on uniform but
idea of weak
convergence exists in statistics for a
long time but it is interesting that as
I think about myself how stupid I was
fifty years I did not see weak
convergence I worked only on strong
convergence but now I think that most
powerful is weak convergence because it
makes admissible set of functions and
even in old proverbs when
people try to understand recognition
like duck law looks like a duck and so
on they use weak convergence people in
language they understand this but when
we are trying to create artificial
intelligence we went in
different way we just consider strong
convergence arguments so reducing set of
admissible functions you think there
should be effort put into understanding
the properties of weak convergence you
know in classical mathematics in Hilbert
space there are only two forms of
convergence strong and weak now we can
use both that means that we did
everything and it so happened that when
we use Hilbert space which is very rich
space space of continuous functions
which have integral of square so we
can apply weak and strong convergence for
learning and have closed form solution
so it can be computationally simple for
me it's a sign that it is right way
because you don't need any heuristic
yes
you know whatever you want
but now the only what is left is concept
of what is predicate but
it is not statistics by the way I like
the fact that you think the heuristics
are mess that should be removed from the
system so closed-form solution is the
ultimate no it so happens that when
you're using right instrument you have
closed form solution do you think
intelligence human level intelligence
when we create it will will have
something like a closed form solution
you know now I'm looking on bounds
which I gave bounds for convergence when I
looking for bounds I am thinking what is the
most appropriate kernel for this bound
would be so you know that in say all
our businesses we use radial basis
function but looking on the bound I
think that I start to understand that
maybe we need to make corrections to
radial basis function to be closer to
work better for these bounds so I'm again
trying to understand what type of kernel
has best approximation
no not approximation best fit to these bounds
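For reference, the radial basis function kernel mentioned here is usually written K(x, x') = exp(-gamma * ||x - x'||^2). A short Python sketch follows. The value of `gamma` and the three toy points are assumptions for illustration, and the corrections to the kernel alluded to in the conversation are not specified, so nothing of that kind is attempted.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian radial basis function kernel between the rows of X and Y:
    K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

# Toy points (assumed): the origin and two points at distance 1 and 2 from it.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_kernel(X, X, gamma=0.5)
print(np.round(K, 3))
```

The kernel depends only on the distance between points, so the matrix is symmetric with ones on the diagonal and decays as points move apart.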
sure so there's a lot of interesting
work that could be done in discovering
better functions than radial basis
functions for your bounds but it
still comes from you you're looking at the
math and trying to understand what from
your own mind looking at the yeah but I
don't know then I trying to understand
what would you be good
that yet but to me there's still a
beauty again maybe I'm a descendant
of Alan Turing to heuristics to me
ultimately intelligence will be a mess
of heuristics and that's the engineering
and so absolutely when when you're doing
say self-driving cars the great guy who
will do this it does not matter what
theory behind that who has a better
feeling how to apply it but by the way it
is the same story about predicates
because you cannot create a rule for
situations there are much more than you have
rules for that but maybe you can have
more abstract rule then there will be fewer
rules it is the same story about
ideas and ideas applied to the specific
cases but still you cannot avoid
this yes of course but you should still
reach for the ideas to understand
science yeah let me kind of ask do you
think neural networks or functions can
be made to reason sort of what do you
think we've been talking about
intelligence but this idea of reasoning
as a is an element of sequentially
disassembling interpreting the the
images so when you think of handwritten
recognition we kind of think that there
will be a single there's an input and
output there's not a recurrence
what do you think about sort of the idea
of recurrence of going back to memory
and thinking through this sort of
sequentially mangling the different
representations over and over until you
arrive at a conclusion or is ultimately
all that can be wrapped up into a
function
you are suggesting that let us use this
type of algorithm when I am starting
thinking I first of all am starting to
understand what I want can I write down
what I want and then I am trying to
formalize and when I do that I think how
to solve this problem
and till now I did not see a situation
where you need recurrence very good but
do you observe human beings yeah do you
try to it's the imitation question right
it seems that human beings reason this
kind of sequentially so does that
inspire in your thought that we need to
add that into our intelligence systems
you're saying okay I mean you've kind of
answered saying until now I haven't seen a
need for it and so because of that you
don't see a reason to think about it you
know most of things I don't understand
in reasoning of human it is for me too
complicated for me the most difficult
part is to ask questions good
questions how it works how people
are asking questions I don't know you said
the machine learning is not only about
technical things speaking of questions
but it's also about philosophy so what
role does philosophy play in machine
learning we talked about Plato but
generally thinking in this philosophical
way does it have how does philosophy
and math fit together in your mind so
there are ideas and then their implementation
it's like predicates like say admissible
set of functions it comes together
everything because the first iteration of
theory was done fifty years ago
all that theory
everything's there if you have data you
can and your set of
functions has not big
capacity so low VC dimension you can
do that you can make structural risk
minimization control capacity but you were
not able to make admissible set of
functions good now when suddenly we realized
that we did not use another idea of
convergence which we can everything
comes together but those are
mathematical notions philosophy plays a
role of simply saying that we should be
swimming in the space of ideas let's
let's talk what is philosophy philosophy
means understanding of life so
understanding of life say people like
Plato they understand on very high
abstract level of life so then whatever
I am doing it is just implementation of my
understanding of life but every new step
it is very difficult for example to find
this idea that we need weak convergence
was not simple for me
so there how are you thinking about life
a little bit hard to hard to trace
but there was some thought process you
know I was working thinking about the
same problem for 50 years or more and
again and again again I am trying to
understand that it is very important not
to be very enthusiastic yeah
but concentrate on whatever we were not
able to achieve
and understand why
and now I understand that because I
believe in math I believe in
Wigner's idea but now when I see that
there are only two ways of convergence
and we are using both that means that we
must do as well as people are doing but
now exactly in philosophy and what we
know about predicate how we
understand life can be described as a
predicate I thought about that and that
is more or less obvious level of
symmetry but next I have a feeling it's
something about structures but I don't
know how to formulate how to measure and
measure structure and all the stuff and
guy who will solve this challenge
problem then when we will be looking how he
did it
probably just only symmetry is not enough
but something like symmetry will be there
oh absolutely symmetry will be there and
level of symmetry will be there and level
of symmetry antisymmetry diagonal
vertical and I even don't know how you
can use in different direction idea of
symmetry that's very general but it will
be there I think the people very
sensitive to radial symmetry but there
are several ideas like symmetry which I
would like to learn but you cannot learn
just thinking about that you should do
challenging problems and then analyze
why it was that we were able to solve them
and then we will see
simple things it's not easy to find even
with talking about this every time
I am surprised I tried to
understand do people describe in
language strong convergence mechanism
for learning
I did not see I don't know but weak
convergence this duck story and stories
like that when you will explain to kid
you will use weak convergence argument it
looks like a duck it does like a duck but
when you try to formalize you are just
ignoring this why why fifty years from
start of machine learning after all I
think I I think that might be I don't
know maybe this is also we should blame
for that because empirical risk
minimization and all this stuff if you read
now textbooks they are just about bounds about
empirical risk minimization they are not
looking for another problem like
admissible set but on the topic of life
perhaps we you could talk in Russian for
a little bit what's your favorite memory
from childhood [speaks Russian]
music how about can you try to answer in
Russian [answers in Russian about music]
you know now that we're
talking about Bach let's switch back to
english
cuz I like Beethoven and Chopin so
Chopin it's another music story
but Bach if we talk about predicates
Bach probably has the most sort of
well-defined predicates and you know
it is very interesting to read what
critics are writing about Bach which words
they are using they are trying to describe
predicates and then Chopin it is
very different vocabulary very different
predicates and I think that if you will
make collection of that so maybe from this
you can describe predicates for digit
recognition well from Bach and Chopin no
no not from Bach and Chopin from the
critic interpretation of the music yeah
but they are trying to explain you music
what words they use they describe
high level ideas of Plato's ideas
what is behind this music that's brilliant so
art is not self-explanatory in some
sense so you have to try to convert it
into ideas it is ill-posed problem
when you go from ideas to the
representation it is easy way but when
you're trying to go back it is
ill-posed problem but nevertheless I believe
that when you're looking from that even
from art you will be able to find
predicates for digit recognition it's
such a fascinating and powerful notion
do you ponder your own mortality do you
think about it do you fear it do you
draw insight from it
my mortality oh yeah are you afraid of
that not too much not too much
it is pity that I will not be able to do
something which
I have a feeling to do for example
I would be very happy to work with
guys theoreticians from music to write
this collection of descriptions
how they describe music how they use
predicates and from art as well then take
what is in common and try to understand
predicates which are absolute for
everything and then use that for visual
recognition exactly ah there's still
time we got time it's take years and
years well see you've got the patient
mathematic mathematicians mind I think
it could be done very quickly and very
beautifully I think it's a really
elegant idea yeah also some of many yes
you know the most time it is not to make
this collection but to understand what is
the common to think about that once
again and again and again and again
again but I think sometimes especially
just when you say this idea now even
just putting together the collection and
looking at the different sets of data
language trying to interpret music
criticize music and images I think there
will be sparks of ideas that will come of
course again again you'll come up with
better ideas but even just that notion
you know is a beautiful notion I can even
give some example so I have a friend
who is a
specialist in Russian poetry she is
a professor of Russian poetry she
did not write poems but she knows a lot
of stuff she made books several books and
one of them is a collection of Russian
poetry she has images of Russian poetry
she collected all images of Russian poetry and
I asked her to do the following you have
MNIST digit recognition and we get
a hundred digits less than a hundred I
don't remember maybe fifty digits and
try from a poetical point of view to
describe every image you see using only
words of images of Russian poetry and she
did it and then we tried to I call it
learning using privileged information
I call it privileged information you
have two languages one language is
just the image of the digit and another language is
the poetic description of this image and
this is privileged information there is
an algorithm when you are working using
privileged information you're doing well
better much better
so there's something there something
there and there is in NEC she
unfortunately died the collection of
digits and poetic descriptions of these
digits there is something there in
that poetic description but I think that
there are abstract ideas
on the Plato level of ideas yes yeah
that are there that could be
discovered and music seems to be a good
entry point but as soon as you start this
this challenge problem the challenge
problem it is immediately connected to
all this stuff especially with
your talk and this podcast and I'll do
whatever I can to advertise it's such a
clean beautiful Einstein like
formulation of the challenge before us
right let me ask another absurd question
we talked about mortality we talked
about philosophy of life what do you
think is the meaning of life
what's the predicate for mysterious
existence here on earth I don't know
it's very interesting how we have in Russia I
don't know if you know the guys Strugatsky
they are I think they are thinking about
humans what's going on and they have an
idea that there are developing two types
of people common people and very smart
people they just started and these two
branches of people will go in different
direction very soon so that's what they're
thinking about life so the purpose of
life is to create two paths of human
societies
yes simple people and more complicated
which do you like best
the simple people or the complicated ones
you know it's just his
fantasy but you know every week we have a
guy who is just a writer and also
a theorist of literature and he explains
how he understands literature and
human relationships how he sees life
and I understood that I'm just a small
kid
compared to him he is very smart
in understanding life he knows this
predicate he knows big blocks of life
I am amazed every time when I listen
to him and he is just talking about literature
and I think that I was surprised so the
managers in big companies most of them
are guys who studied English language and
English literature so why because they
understand life
they understand models and among them
maybe many talented critics just
analyzing this and this is big science
like Propp did this is these blocks it
amazes me that you are and continue to
be humbled by the brilliance of others
I'm very modest about myself when I see such
smart guys around well let me be
immodest for you you're one of the
greatest mathematicians and statisticians of
our time it's truly an honor and making
your job ok ok let's talk it is not yeah
yeah I know my limits let's let's talk
again when your challenge is taken on
and solved by a grad student especially
he brokered me when they using scripting
maybe music will be involved Vladimir
thank you so much it's been an honor thank you very
much
thanks for listening to this
conversation with Vladimir Vapnik and
thank you to our presenting sponsor Cash
App download it use code LexPodcast
you'll get ten dollars and ten dollars will
go to FIRST an organization that
inspires and educates young minds to
become science and technology innovators
of tomorrow if you enjoy this podcast
subscribe on youtube give it five stars
on Apple Podcast support it on Patreon or
simply connect with me on Twitter at
lexfridman and now let me leave you
with some words from Vladimir Vapnik when
solving a problem of interest do not
solve a more general problem as an
intermediate step thank you for
listening I hope to see you next time