Transcript
Bo8MY4JpiXE • François Chollet: Keras, Deep Learning, and the Progress of AI | Lex Fridman Podcast #38
Kind: captions
Language: en
The following is a conversation with François Chollet. He's the creator of Keras, which is an open-source deep learning library that is designed to enable fast, user-friendly experimentation with deep neural networks. It serves as an interface to several deep learning libraries, the most popular of which is TensorFlow, and it was integrated into the TensorFlow main codebase a while ago, meaning that if you want to create, train, and use neural networks, probably the easiest and most popular option is to use Keras inside TensorFlow. Aside from creating an exceptionally useful and popular library, François is also a world-class AI researcher and software engineer at Google, and he's definitely an outspoken, if not controversial, personality in the AI world, especially in the realm of ideas around the future of artificial intelligence. This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, give it five stars on iTunes, support it on Patreon, or simply connect with me on Twitter at lexfridman, spelled F-R-I-D-M-A-N. And now, here's my conversation with François Chollet.

You're known for not
sugarcoating your opinions and speaking your mind about ideas in AI, especially on Twitter. It's one of my favorite Twitter accounts. So what's one of the more controversial ideas you've expressed online and gotten some heat for? How do you pick?

Yeah, no, I think if you go through the trouble of maintaining a Twitter account, you might as well speak your mind, you know. Otherwise, what's even the point of having a Twitter account? It's like getting a nice car and just leaving it in the garage. Yeah, so what's one thing for which I got a lot of pushback? Perhaps, you know, that time I wrote something about the idea of intelligence explosion, and I was questioning the idea and the reasoning behind this idea, and I got a lot of pushback on that, I got a lot of flak for it. So yeah, so intelligence explosion,
I'm sure you're familiar with the idea, but it's the idea that if you were to build general AI problem-solving algorithms, well, the problem of building such an AI, that itself is a problem that could be solved by your AI, and maybe it could be solved better than what humans can do. So your AI could start tweaking its own algorithm, could start being a better version of itself, and so on, iteratively, in a recursive fashion, and so you would end up with an AI with exponentially increasing intelligence, right? And I was basically questioning this idea, first of all, because the notion of intelligence explosion uses an implicit definition of intelligence that doesn't sound quite right to me. It considers intelligence as a property of a brain that you can consider in isolation, like the height of a building, for instance, right? But that's not really what intelligence is. Intelligence emerges from the interaction between a brain, a body, like embodied intelligence, and an environment, and if you're missing one of these pieces, then you cannot actually define intelligence anymore. So just tweaking a brain to make it smarter and smarter doesn't actually make any sense to me.

So first
of all, you're crushing the dreams of many people, right? So there's a little bit, like, say, Sam Harris, a lot of physicists, Max Tegmark, people who think, you know, the universe is an information-processing system, our brain is kind of an information-processing system, so what's the theoretical limit? Like, it doesn't make sense that there should be one. It seems naive to think that our own brain is somehow the limit of the capabilities, and this, I'm playing devil's advocate here, this information-processing system, and then if you just scale it, if you're able to build something that's on par with the brain, the process that builds it just continues, and it will improve exponentially. So that's the logic that's used, actually, by almost everybody that is worried about superhuman intelligence. Yeah. So
you're trying to make, so most people who are skeptical of that, their thought process is kind of like, this doesn't feel right. Like, that's for me as well. So I'm more like, the whole thing is shrouded in mystery, where you can't really say anything concrete, but you could say, this doesn't feel right, this doesn't feel like that's how the brain works. And you're trying to, with your blog post, make it a little more explicit. So one idea is that the brain doesn't exist alone, it exists within the environment, so you can't exponentially, you have to somehow exponentially improve the environment and the brain together, almost, in order to create something that's much smarter, in some kind of, of course we don't have a definition of intelligence.

That's right, that's correct.
I don't think, if you look at very smart people today, even humans, not even talking about AIs, I don't think their brain, and the capabilities of their brain, is the bottleneck to their actually expressed intelligence, to their achievements. You cannot just tweak one part of this system, this part of this brain-body-environment system, and expect capabilities, like what emerges out of this system, to just, you know, explode exponentially. Because any time you improve one part of a system with many interdependencies like this, there's a new bottleneck that arises, right? And I don't think, even today, for very smart people, their brain is the bottleneck to the sort of problems they can solve, right? In fact, many very smart people today, you know, are not actually solving any big scientific problems. They're, in a sense, like Einstein in the patent clerk days. Like, Einstein became Einstein because this was a meeting of a genius with a big problem, at the right time, right? But maybe this meeting could have never happened, and then Einstein would have just been a patent clerk. And in fact, many people today are probably, like, genius-level smart, but you wouldn't know, because they're not really expressing any of that.

That's brilliant. So
we can think of the world, Earth, but also the universe, as just the space of problems. So all these problems and tasks are roaming it, of various difficulty, and there are agents, creatures like ourselves, and animals, and so on, that are also roaming it, and then you get coupled with a problem, and then you solve it. But without that coupling, you can't demonstrate your quote-unquote intelligence.

Exactly. Intelligence is the meeting of great problem-solving capabilities with a great problem, and if you don't have the problem, you don't really express any intelligence. All you're left with is potential intelligence, like the performance of your brain, or, you know, your IQ, which in itself is just a number.
So you mentioned problem-solving capacity. Yeah. What do you think of as problem-solving? Can you try to define intelligence? Like, what does it mean to be more or less intelligent? Is it completely coupled to a particular problem, or is there something a little bit more universal?

Yeah, I do believe all intelligence is specialized intelligence. Even human intelligence has some degree of generality. Well, all intelligent systems have some degree of generality, but they're always specialized in one category of problems. So human intelligence is specialized in the human experience, and that shows at various levels. That shows in some prior knowledge that's innate, that we have at birth: knowledge about things like agents, goal-driven behavior, visual priors about what makes an object, priors about time, and so on. That shows also in the way we learn. For instance, it's very, very easy for us to pick up language, it's very, very easy for us to learn certain things, because we are basically hard-coded to learn them. And we are specialized in solving certain kinds of problems, and we are quite useless when it comes to other kinds of problems. For instance, we are not really designed to handle very long-term problems. We have no capability of seeing the very long term. We don't have that much working memory, you know.

So how
do you think about long term? You were talking about long-term planning. We're talking about a scale of years, millennia; what do you mean by long term we're not very good at?

Well, human intelligence is specialized in the human experience, and the human experience is very short. Like, one lifetime is short. Even within one lifetime, we have a very hard time envisioning, you know, things on a scale of years. Like, it's very difficult to project yourself at the scale of five years, at the scale of ten years, and so on, right? We can solve only fairly narrowly scoped problems. So when it comes to solving bigger problems, larger-scale problems, we are not actually doing it on an individual level. So it's not actually our brain doing it. We have this thing called civilization, right, which is itself a sort of problem-solving system, a sort of artificially intelligent system, right? And it's not running on one brain, it's running on a network of brains. In fact, it's running on much more than a network of brains. It's running on a lot of infrastructure, like books and computers and the internet and human institutions, and so on. And that is capable of handling problems on a much greater scale than any individual human. If you look at computer science, for instance, that's an institution that solves problems, and it is superhuman, right? It operates on a greater scale, it can solve much bigger problems than an individual human could. And science itself, science as a system, as an institution, is a kind of artificial intelligence, a problem-solving algorithm that is superhuman.

Yes, computer science is like a theorem prover at a scale of thousands, maybe hundreds of thousands, of human beings. At that scale, what do you think is an
intelligent agent? So there's us humans at the individual level, there are millions, maybe billions, of bacteria on our skin; that's at the smaller scale. You can even go to the particle level, as systems that behave, you could say, intelligently in some ways. And then you can look at Earth as a single organism, you can look at our galaxy, and even the universe, as a single organism. How do you think about scale in defining intelligent systems? And we're here at Google, where there are millions of devices doing computation in a distributed way. How do you think about intelligence at scale?

You can always
characterize anything as a system. I think people who talk about things like intelligence explosion tend to focus on one agent, which is basically one brain, like one brain considered in isolation, like a brain in a jar that's controlling your body in a very, like, top-to-bottom kind of fashion, and that body is moving around in an environment. So it's a very hierarchical view: you have the brain at the top of the pyramid, then you have the body, just plainly receiving orders, and then the body is manipulating objects in the environment, and so on. So everything is subordinate to this one thing, this epicenter, which is the brain. But in real life, intelligent agents don't really work like this, right? There is no strong delimitation between the brain and the body, to start with. You have to look not just at the brain, but at the nervous system. But then the nervous system and the body are not really two separate entities, so you have to look at an entire animal as one agent. But then you start realizing, as you observe an animal over any length of time, that a lot of the intelligence of an animal is actually externalized. That's especially true for humans. A lot of our intelligence is externalized. When you write down some notes, that is externalized intelligence. When you write a computer program, you are externalizing cognition. So intelligence is externalized in books, it's externalized in computers, the internet, in other humans. It's externalized in language, and so on. So there is no hard delimitation of what makes an intelligent agent. It's all about context.

Okay, but AlphaGo is
better at Go than the best human player. You know, there are levels of skill here. So do you think there is such an ability, such a concept, as an intelligence explosion in a specific task? And then, well, yeah, do you think it's possible to have a category of tasks on which you do have something like an exponential growth of ability to solve that particular problem?

I think if you consider a specific vertical, it's probably possible to some extent. I also don't think we have to speculate about it, because we have real-world examples of recursively self-improving intelligent systems, for instance science. Science is a problem-solving system, a knowledge-generation system, like a system that experiences the world in some sense, and then gradually understands it, and can act on it. And that system is superhuman, and it is clearly recursively self-improving, because science feeds into technology. Technology can be used to build better tools, better computers, better instrumentation, and so on, which in turn can make science go faster, right? So science is probably the closest thing we have today to a recursively self-improving superhuman AI. And you can just observe, you know: is scientific progress today exploding? Which, you know, is an interesting question. You can use that as a basis to try to understand what will happen with a superhuman AI that has science-like behavior.

Let me linger
it a little bit more
what is your intuition why an
intelligence explosion is not possible
like taking the scientific all the
semantic revolutions why can't we
slightly accelerate that process so you
you can absolutely accelerates any
problem solving process so recursively
as recursive self-improvement is
absolutely a real thing but what happens
with recursively seven boring system
it's typically not explosion because no
system exists in isolation and so
tweaking one part of the system means
that suddenly another pollow system
becomes a bottleneck and if you look at
science for instance which is clearly a
recursively self-improving clearly a
problem-solving system scientific
progress is not actually exploding if
you look at science what you see is the
picture of a system that is consuming an
exponentially increasing amount of
resources but it's having a linear
output in terms of scientific progress
and maybe that that will seem like a
very strong claim many people are
actually saying that you know scientific
progress is exponential but when they
are claiming this they are actually
looking at indicators of resource
consumption resource consumption by
science
the number of papers being published the
number of parents being filed and so on
which are just just completely credited
with how many people are working on
science today yeah right so it's
actually an indicator of resource consumption. But what you should look at is the output: its progress in terms of the knowledge that science generates, in terms of the scope and significance of the problems that we solve. And some people have actually been trying to measure that, like Michael Nielsen, for instance. He had a very nice paper, I think that was last year, about it. So his approach to measuring scientific progress was to look at the timeline of scientific discoveries over the past, you know, 100, 150 years, and for each major discovery, ask a panel of experts to rate the significance of the discovery. And if the output of science as an institution were exponential, you would expect the temporal density of significance to go up exponentially, maybe because there's a faster rate of discoveries, maybe because the discoveries are, you know, increasingly more important. And what actually happens, if you plot this temporal density of significance measured in this way, is that you see very much a flat graph. You see a flat graph across all disciplines, across physics, biology, medicine, and so
on. And it actually makes a lot of sense if you think about it, because think about the progress of physics 110 years ago, right? It was a time of crazy change. Think about the progress of technology, you know, 160 years ago, when we started, you know, replacing horses with cars, when we saw the advent of electricity, and so on. It was a time of incredible change. And today is also a time of very fast change, but it would be an unfair characterization to say that today technology and science are moving way faster than they did 50 years ago, 100 years ago. And if you do try to rigorously plot the temporal density of significance, you do see very flat curves.

That's fascinating.

And you can check out the paper that Michael Nielsen had about this idea. And so the way I interpret this is, as
you make progress in a given field, in any given subfield of science, it becomes exponentially more difficult to make further progress. Like, the very first person to work on information theory, if you enter a new field, and it's still the very early years, there's a lot of low-hanging fruit you can pick.

That's right, yeah.

But the next generation of researchers is going to have to dig much harder, actually, to make smaller discoveries, a probably larger number of smaller discoveries, and to achieve the same amount of impact, you're going to need a much greater headcount.
And that's exactly the picture you're seeing with science: the number of scientists and engineers is in fact increasing exponentially, the amount of computational resources available to science is increasing exponentially, and so on. So the resource consumption of science is exponential, but the output, in terms of progress, in terms of significance, is linear. And the reason why is because, even though science is recursively self-improving, meaning that scientific progress turns into technological progress, which in turn helps science. If you look at computers, for instance, they're a product of science, and computers are tremendously useful in speeding up science. The internet, same thing: the internet is a technology that's made possible by recent scientific advances, and itself, because it enables, you know, scientists to network, to communicate, to exchange papers and ideas much faster, it is a way to speed up scientific progress. So even though you're looking at a recursively self-improving system, it is consuming exponentially more resources to produce the same amount of problem-solving.
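The exponential-resources-to-linear-output claim can be illustrated with a toy model (my own sketch, not something from the conversation; the constants `a` and `b` are arbitrary): let resources grow as e^(a·t), and let the marginal cost of one unit of progress grow as e^(b·p) with accumulated progress p, so dp/dt = e^(a·t − b·p). Cumulative progress then settles onto a straight line with slope a/b:

```python
import math

def simulate(a=0.06, b=0.03, dt=0.01, years=400):
    """Euler-integrate dp/dt = exp(a*t - b*p): exponentially growing
    resources divided by exponentially growing difficulty.
    Returns cumulative progress sampled once per simulated year."""
    p, yearly = 0.0, []
    steps_per_year = int(round(1 / dt))
    for step in range(years * steps_per_year):
        t = step * dt
        p += dt * math.exp(a * t - b * p)
        if (step + 1) % steps_per_year == 0:
            yearly.append(p)
    return yearly

progress = simulate()
increments = [progress[i] - progress[i - 1] for i in range(350, 400)]
# Despite resources growing by a factor of e^(0.06*400), roughly 2.6e10,
# the yearly output increments flatten out near the constant a/b = 2.0.
```

In this toy model, linear output is the equilibrium: investment growing at rate a exactly offsets difficulty growing at rate b per unit of progress.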
So that's a fascinating way to paint it, and certainly that holds for the deep learning community, right? If you look at the temporal, what did you call it, the temporal density of significant ideas, if you look at that in deep learning, I think, I'd have to think about that, but if you really look at significant ideas in deep learning, they might even be decreasing.

So I do believe the per-paper significance is decreasing, but the amount of papers is still today exponentially increasing, so if you look at the aggregate, my guess is that you would see linear progress. If you were to sum the significance of all papers, you would see roughly linear progress. And in my opinion, it is not a coincidence that you're seeing linear progress in science despite exponential resource consumption. I think the resource consumption is dynamically adjusting itself to maintain linear progress, because we, as a community, expect linear progress, meaning that if we start investing less and seeing less progress, it means that suddenly there are some low-hanging fruits that become available, and someone is going to step up and pick them, right?

Right, so it's very much like a market, right, for discoveries and ideas. But there's another fundamental part which you're highlighting, which is a hypothesis: that science, or like the space of ideas, any one path you travel down, it gets exponentially more difficult to develop new ideas. Yes. And your sense is that's going to hold
across our mysterious universe?

Yes: exponential progress triggers exponential friction, so that if you tweak one part of a system, suddenly some other part becomes a bottleneck. For instance, let's say you develop some device that measures its own acceleration, and then it has some engine, and it outputs even more acceleration in proportion to its own acceleration, and you drop it somewhere. It's not going to reach infinite speed, because it exists in a certain context, so the air around it is going to generate friction; it's going to, you know, block it at some top speed. And even if you were to consider the broader context and lift the bottleneck there, like the bottleneck of air friction, then some other part of the system would start stepping in and creating exponential friction, maybe the speed of light, or, you know, whatever.
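That device thought experiment can be simulated numerically (a hypothetical model: I take the thrust to grow in proportion to the device's current speed, with quadratic air drag as the friction term):

```python
def terminal_speed(gain=1.0, drag=0.1, v0=0.1, dt=0.001, t_max=60.0):
    """Self-amplifying device: thrust grows with current speed (gain),
    while quadratic air friction (drag) pushes back. Despite the
    positive feedback loop, speed saturates at gain/drag."""
    v = v0
    for _ in range(int(t_max / dt)):
        v += dt * (gain * v - drag * v * v)
    return v

# The feedback loop does not run away: lifting one bottleneck (halving
# drag) just doubles the top speed, it does not make it infinite.
v1 = terminal_speed()             # converges to about 10.0 (= 1.0 / 0.1)
v2 = terminal_speed(drag=0.05)    # converges to about 20.0
```

The positive feedback term alone would give exponential growth; the friction term guarantees a finite equilibrium, which is the point being made.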
It's definitely also true when you look at the problem-solving algorithm that is being run by science as an institution, science as a system: as you make more and more progress, despite this recursive self-improvement component, you are encountering exponential friction. Like, the more researchers you have working on different ideas, the more overhead you have in communication across researchers. If you look at, you were mentioning quantum mechanics, right? Well, if you want to start making significant discoveries today, significant progress in quantum mechanics, there is an amount of knowledge you have to ingest which is huge. So there is a very large overhead to even start to contribute. There is a large amount of overhead to synchronize across researchers, and so on. And of course, the significant practical experiments are going to require exponentially expensive equipment, because the easy ones have already been run, right?

So in your sense, there is no way of escaping this kind of friction with artificial intelligence systems?

Yeah, no, I think science is a very good way to model what would happen with a superhuman recursively self-improving AI. Yeah, I mean,
that's my intuition too. It's not like a mathematical proof of anything; that's not my point. Like, I'm not trying to prove anything. I'm just trying to make an argument to question the narrative of intelligence explosion, which is quite a dominant narrative, and you do get a lot of pushback if you go against it. Because, for many people, right, AI is not just a subfield of computer science. It's more like a belief system, this belief that the world is headed towards an event, the singularity, past which, you know, AI will go exponential, and the world will be transformed, and humans will become obsolete. And if you go against this narrative, because it is not really a scientific argument but more of a belief system, it is part of the identity of many people. If you go against this narrative, it's like you're attacking the identity of people who believe in it. It's almost like saying God doesn't exist, or something. So you do get a lot of pushback if you try to question these ideas.
First of all, I believe most people, they might not be as eloquent or explicit as you're being, but most people in computer science, and most people who actually have built anything that you could call AI, quote-unquote, would agree with you. They might not be describing it in the same kind of way. It's more, so the pushback you're getting is from people who get attached to the narrative, not from a place of science, but from a place of imagination.

Yes, correct, that's correct.

So why do you think that's so appealing? Because the usual dreams that people have, when you create a superintelligent system past a singularity, the way people imagine it is somehow always destructive. If you were to put on your psychology hat, why is it so appealing to imagine the ways that all of human civilization will be destroyed?

I think it's a good story, you know. It's a good story, and very interestingly, it mirrors religious stories, right, religious mythology. If you look at the mythology of most civilizations, it's about the world being headed towards some final event, in which the world will be destroyed, and some new world order will arise, that will be mostly spiritual, like the apocalypse followed by a paradise, probably, right? It's a very appealing story on a fundamental level, and we all need stories. We need stories to structure the way we see the world, especially at timescales that are beyond our ability to make predictions, right.

So on a more serious,
non-exponential-explosion question: do you think there will be a time when we'll create something like human-level intelligence, or intelligent systems that will make you sit back and be just surprised at, damn, how smart this thing is? That doesn't require exponential growth or exponential improvement. But what's your sense of the timeline, and so on, where you'll be really surprised at certain capabilities? And we'll talk about limitations in deep learning. So when do you think, in your lifetime, you'll be really damn surprised?

Around 2013, 2014, I was many times surprised by the capabilities of deep learning, actually. That was before we had assessed exactly what deep learning could do and could not do, and it felt like a time of immense potential. And then we started, you know, narrowing it down, but I was very surprised, so it has already happened.

Was there a moment,
there must have been a day in there, where your surprise was almost bordering on the belief of the narrative that we just discussed? Was there a moment, because you've written quite eloquently about the limits of deep learning, was there a moment that you thought that maybe deep learning is limitless?

No, I don't think I've ever believed this. What was really shocking is that it worked at all, yes. But there's a big jump between being able to do really good computer vision and human-level intelligence. So I don't think at any point I was under the impression that the results we got in computer vision meant that we were very close to human-level intelligence. I don't think we're very close to human-level intelligence. I do believe that there's no reason why we won't achieve it at some point. I also believe that, you know, the problem with talking about human-level intelligence is that implicitly you are considering, like, an axis of intelligence with different levels, but that's not really how intelligence works. Intelligence is very multi-dimensional, and so there's the question of capabilities, but there's also the question of being human-like, and those are two very different things. Like, you can build potentially very advanced intelligent agents that are not human-like at all, and you can also build very human-like agents, and those are two very different things, right.

Right,
let's go from the philosophical to the practical. Can you give me a history of Keras and all the major deep learning frameworks that you kind of remember, in relation to Keras, and in general, TensorFlow, Theano, the old days? Can you give a brief, Wikipedia-style overview of that history, and your role in it, before we return to AGI discussions?

Yeah, that's a broad topic. So I started working on Keras, the name Keras, I actually picked the name just the day I was going to release it. So I started working on it in February 2015, and at the time there weren't too many people working on deep learning, maybe fewer than 10,000. The software tooling was not really developed. So the main deep learning library was Caffe, which was mostly C++. Why do I say Caffe was the main one? Caffe was vastly more popular than Theano in late 2014, early 2015.
Caffe was the one library that everyone was using for computer vision, and computer vision was the most popular problem at the time. Absolutely. Like, convnets were the subfield of deep learning that everyone was working on. So myself, in late 2014, I was actually interested in RNNs, in recurrent neural networks, which was a very niche topic at the time, right; it really took off after around 2016. And so I was looking for good tools. I had used Torch 7, I knew about Theano, had used Theano a lot in Kaggle competitions, I had used Caffe, and there was no good solution for RNNs at the time. Like, there was no reusable open-source implementation of an LSTM, for instance. So I decided to build my own, and at first, the pitch for that was, it was going to be mostly around LSTMs, recurrent neural networks. It was going to be in Python. An important decision at the time, that was kind of not obvious, is that the models would be defined via Python code, which was kind of like going against the mainstream at the time, because Caffe and so on, like all the big libraries, were actually going with the approach of having static configuration files in YAML to define models. So some libraries were using code to define models, like Torch 7, obviously, but that was not Python. Lasagne was a Theano-based, very early library, that was, I think, developed, I don't remember exactly, probably late 2014.
In Python as well?

It was Python as well, it was built on top of Theano. And so I started working on something, and the value proposition at the time was that, not only was it what I think was the first reusable open-source implementation of LSTM, you could also combine RNNs and convnets with the same library, which was not really possible before; like, Caffe was only doing convnets. And it was kind of easy to use, because, before Keras, I was actually using scikit-learn, and I loved scikit-learn for its usability, so I drew a lot of inspiration from scikit-learn when I made Keras. It's almost like scikit-learn for neural networks.

Yeah, the fit function.

Exactly, the fit function: reducing a complex training loop to a single function call, right. And of course, you know, some people will say this is hiding a lot of details, but that's exactly the point. The magic is the point.

All right, so it's magical, but in a good way; it's magical in the sense that it's delightful. Yeah, right. Yeah, I'm actually quite surprised, I didn't know that it was born out of a desire to implement RNNs and LSTMs. That's fascinating.
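The design idea being discussed, hiding the whole epoch-and-batch training loop behind one fit() call, scikit-learn style, can be sketched in plain NumPy (a hypothetical minimal estimator, not Keras's actual implementation):

```python
import numpy as np

class TinyModel:
    """Logistic regression with a Keras-like fit() interface."""

    def __init__(self, n_features, lr=0.5):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return 1.0 / (1.0 + np.exp(-(x @ self.w + self.b)))

    def fit(self, x, y, epochs=100, batch_size=32):
        # The "complex training loop" lives here, out of the user's way.
        for _ in range(epochs):
            for i in range(0, len(x), batch_size):
                xb, yb = x[i:i + batch_size], y[i:i + batch_size]
                err = self.predict(xb) - yb  # gradient of the log-loss
                self.w -= self.lr * xb.T @ err / len(xb)
                self.b -= self.lr * err.mean()
        return self

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 4))
y = (x[:, 0] + x[:, 1] > 0).astype(float)

# One call replaces the explicit epoch/batch/update loop.
model = TinyModel(4).fit(x, y)
accuracy = ((model.predict(x) > 0.5) == y).mean()
```

The user-facing surface is two methods, fit and predict; everything else is an implementation detail, which is the "hiding details is the point" argument in miniature.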
So you were actually one of the first people to really attempt to get the major architectures together. And it's also interesting, you made me realize that that was a design decision at all: defining the model in code. Just, I'm putting myself in your shoes, whether to go with the YAML, especially since Caffe was the most popular, it was the most popular by far. If I were you, I don't know, I didn't like the YAML thing, but it makes sense that you would put the definition of a model in a configuration file. That's an interesting, gutsy move, to stick with defining it in code.

Just, if you look back, other libraries were doing it as well, but it was definitely the more niche option.

Yeah. Okay, Keras, and then...

So I released Keras in March 2015, and it got users pretty much from the start. The deep learning community was very small at the time. Lots of people were starting to be interested in LSTMs, so it was released at the right time, because it was offering an easy-to-use LSTM implementation exactly at the time when lots of people started to be intrigued by the capabilities of RNNs for NLP. So it grew from there.
then I joined Google
about six months later and that was
actually completely unrelated to took
care us actually joined a research team
working on image classification mostly
like computer vision so I was doing
computer vision research at Google
initially and immediately when I joined
Google I was exposed to the early
internal version of tensorflow
and the way to appeal to me at the time
and that was definitely the way it was
at the time is that this was an improved
version of Tiano
so I immediately knew I had to port cars
to this new tensorflow thing and I was
actually very busy as as as a noogler as
a new Googler so I had not time to work
on that but then in November I think
twist November 2015
tensorflow got released and it was kind
of like my my wake-up call at hey to
actually you know go and make it happen
so in December I I putted cars to run on
two of tensorflow but it was not exactly
port it was more like a refactoring
where I was abstracting away all the
backend functionality into one module
then the same codebase could run on top
of multiple backends right so on top of
things fluor Theano and for the next
year yeah no you know stayed as the
default option it was you know it was
easier to use somewhat let's begin it
was much faster especially when he came
to Orleans but eventually you know a
tensorflow
overtook it right and test of all the
early tests for similar
architectural decisions there's the
arrow yeah so what is there was a
natural as a natural transition yeah
absolutely
So at this point, Keras is kind of a side, almost fun project, right? Yeah, so it was not my job assignment, I was doing it on the side. And even though it grew to have, you know, a lot of users for a deep learning library at the time, like throughout 2016, I wasn't doing it as my main job. So things started changing in, I think it must have been maybe October 2016, so one year later. So Rajat, who was the lead on TensorFlow, basically showed up one day in our building while I was doing, like, so I was doing research on things like, so I did a lot of computer vision research, also collaborations with Christian Szegedy, and deep learning for theorem proving. It was a really interesting research topic. And so Rajat was saying, hey, we saw Keras, we like it, we saw that you're at Google, why don't you come over for, like, a quarter and work with us? I was like, yeah, that sounds like a great opportunity, let's do it. And so I started working on integrating the Keras API into TensorFlow more tightly. So what followed up is a sort of temporary TensorFlow-only version of Keras that was in tf.contrib for a while, and it finally moved to TensorFlow core. And, you know, I've never actually gotten back to my old team doing research.
Well, it's kind of funny that somebody like you, who dreams of, or at least sees the power of, AI systems that reason, and theorem proving we'll talk about, has also created a system that makes the most basic kind of Lego-block deep learning super accessible, super easy, so beautifully so. It's a funny irony that you're responsible for both things. But so, TensorFlow 2.0: there's a sprint, I don't know how long it'll take, but there's a sprint towards the finish. What are you working on these days? What are you excited about in 2.0? I mean, eager execution, there's so many things that just make it a lot easier to work. What are you excited about, and what's also really hard? What are the problems you have to kind of solve?
So I've spent the past year and a half working on TF 2, and it's been a long journey. I'm actually extremely excited about it. I think it's a great product, it's a delightful product. Compared to TF 1, we've made huge progress. So on the Keras side, what I'm really excited about is that, you know, previously Keras has been this very easy-to-use, high-level interface to do deep learning, but if you wanted a lot of flexibility, the Keras framework, you know, was probably not the optimal way to do things, compared to just writing everything from scratch. So in some way, the framework was getting in the way. And in TF 2, you don't have this at all. You have the usability of the high-level interface, but you have the flexibility of the lower-level interface, and you have this spectrum of workflows where you can get more or less usability and flexibility trade-offs depending on your needs. You can write everything from scratch, and you get a lot of help doing so by, you know, subclassing models and writing your own training loops using eager execution. It's very flexible, it's very easy to debug, it's very powerful, but all of this integrates seamlessly with higher-level features, up to, you know, the classic Keras workflows, which are very scikit-learn-like and, you know, are ideal for a data scientist, machine learning engineer type of profile. So now you can have the same framework offering the same set of APIs that enable a spectrum of workflows that are more or less low-level, more or less high-level, and that are suitable for, you know, profiles ranging from researchers to data scientists and everything in between.
Yeah, so that's super exciting. I mean, it's not just that, it's connected to all kinds of tooling. You can go on mobile with TF Lite, you can go in the cloud with serving, and so on, and it's all connected together. Now, some of the best software ever written is often done by one person, sometimes two. So with Google, you're now seeing Keras having to be integrated into TensorFlow, and I'm sure there's a ton of engineers working on it, and there are, I'm sure, a lot of tricky design decisions to be made. How does that process usually happen, from at least your perspective? What are the debates like? Is there a lot of thinking, considering different options, and so on? Yes. So a lot of the time I spend at Google is actually on design: discussing design, writing design docs, participating in design review meetings, and so on. This is, you know, as important as actually writing the code. Right, so there's a lot of thought. There's a lot of thought and a lot of care that is taken in coming up with these decisions, and taking into account all of our users, because TensorFlow has this extremely diverse user base, right? It's not like just one user segment where everyone has the same needs. We have small-scale production users, large-scale production users, we have startups, we have researchers, you know, it's all over the place, and we have to cater to all of their needs. If I just look at the design debates of C++ or Python, there are some heated debates. Do you have those at Google? I mean, they're not heated in terms of emotion, but there are probably multiple ways to do it, right? So how do you arrive, through those design meetings, at the best way to do it, especially in deep learning, where the field is evolving as you're doing it? Is there some magic to it? Is there a science to it? I don't know if there's magic to the process, but there definitely is a
process. So making design decisions is about satisfying a set of constraints, but also trying to do so in the simplest way possible, because this is what can be maintained, this is what can be extended in the future. So you don't want to naively satisfy the constraints by, for each capability you need, just coming up with one new argument in your API, and so on. You want to design APIs that are modular and hierarchical, so that they have an API surface that is as small as possible, right? And you want this modular, hierarchical architecture to reflect the way that domain experts think about the problem. Because as a domain expert, when you're reading about a new API, you're reading a tutorial or some docs pages, you already have a way that you're thinking about the problem. You already have, like, certain concepts in mind, and you're thinking about how they relate together. And when you're reading the docs, you're trying to build, as quickly as possible, a mapping between the concepts featured in the new API and the concepts in your mind. So you are trying to map your mental model as a domain expert to the way things work in the API. So you need an API and an underlying implementation that reflect the way people think about these things. So you're minimizing the time it takes to do this mapping? Yes, minimizing the time, the cognitive load there is in just ingesting this new knowledge about your API. An API should not be self-referential or referring to implementation details. It should only be referring to domain-specific concepts that people already understand.
Brilliant. So what does the future of Keras and TensorFlow look like? What does TensorFlow 3.0 look like? So that's kind of too far in the future for me to answer, especially since I'm not even the one making these decisions. But from my perspective, which is, you know, just one perspective among many different perspectives on the TensorFlow team, I'm really excited by developing even higher-level APIs, higher-level than Keras. I'm really excited by hyperparameter tuning, by automated machine learning, AutoML. I think the future is not just, you know, defining a model like assembling Lego blocks and then clicking fit on it. It's more like an automagical model that would just look at your data and optimize the objective you're after, right? So that's what I'm looking forward to. Yeah, so you put the machine in a room with the problem and come back a few hours later to a fully solved problem? Exactly. It's not like a box of Legos, right? It's more like the combination of a kid that's really good at Legos and a box of Legos, and it's just building the thing on its own.
Very nice. So that's an exciting future, and I think there's a huge amount of applications and revolutions to be had, under the constraints of the discussion we previously had. But what do you think are the current limits of deep learning, if we look specifically at these function approximators that try to generalize from data? You've talked about local versus extreme generalization. You've mentioned that neural networks don't generalize well and humans do, so there's this gap. And you've also mentioned that extreme generalization requires something like reasoning to fill those gaps. So how can we start trying to build systems like that? Right, yes. So this is by design, right? Deep learning models are like huge parametric models, differentiable, so continuous, that go from an input space to an output space, and they're trained with gradient descent. They're trained pretty much point by point. They are learning a continuous geometric morphing from an input vector space to an output vector space, right? And because this is done point by point, a deep neural network can only make sense of points in experience space that are very close to things that it has already seen in the training data. At best, it can do interpolation across points. But that means, you know, in order to train your network, you need a dense sampling of the input-cross-output space, almost a point-by-point sampling, which can be very expensive if you're dealing with complex real-world problems, like autonomous driving, for instance, or robotics. It's doable if you're looking at a subset of the visual space, but even then, it's still fairly expensive, you still need millions of examples, and it's only going to be able to make sense of things that are very close to what it has seen before. And in contrast to that, well, of course we have human intelligence. But even if you're not looking at human intelligence, you can look at very simple rules, algorithms. If you have a symbolic rule, it can actually apply to a very, very large set of inputs, because it is abstract. It is not obtained by doing a point-by-point mapping.
For instance, if you try to learn a sorting algorithm using a deep neural network, well, you're very much limited to learning, point by point, what the sorted representation of a specific list looks like. But instead, you could have a very simple sorting algorithm written in a few lines, maybe it's just, you know, two nested loops, and it can process any list at all, because it is abstract, because it is a set of rules. So deep learning is really like point-by-point geometric morphings, trained with gradient descent. Meanwhile, abstract rules can generalize much better, and I think the future is really to combine the two. So how do you think we combine the two? How do we combine good point-by-point functions with programs, which is what symbolic AI type systems offer? At which level does the combination happen? And, you know, obviously we're jumping into a realm where there are no good answers, just kind of ideas and intuitions and so on.
Well, if you look at the really successful AI systems today, I think they are already hybrid systems that are combining symbolic AI with deep learning. For instance, successful robotics systems are already mostly model-based, rule-based things like planning algorithms and so on. At the same time, they're using deep learning as perception modules. Sometimes they're using deep learning as a way to inject fuzzy intuition into a rule-based process. If you look at a system like a self-driving car, it's not just one big end-to-end neural network, you know, that wouldn't work at all, precisely because in order to train that, you would need a dense sampling of experience space when it comes to driving, which is completely unrealistic, obviously. Instead, the self-driving car is mostly symbolic, you know, it's software, it's programmed by hand, it's mostly based on explicit models, in this case mostly 3D models of the environment around the car, but it's interfacing with the real world using deep learning modules. Right. So the deep learning there serves as the way to convert the raw sensory information to something usable by symbolic systems.
Okay, let's linger on that a little more. So, dense sampling from input to output, you said it's obviously very difficult. Is it possible in the case of, say, self-driving? You mean, let's say, self-driving? Self-driving may be too big a problem for many people, but let's not even talk about self-driving, let's talk about steering, so staying inside the lane, lane following. Yeah, it's definitely a problem you can solve with an end-to-end deep learning model, but that's like one small subset. Hold on a second. Yeah, I don't like how you're jumping from the extreme so easily, because I disagree with you on that. I think, well, it's not obvious to me that you can solve lane following. No, it's not obvious. I think it's doable. I think in general, you know, there are no hard limitations to what you can learn with a deep neural network, as long as the search space is rich enough, is flexible enough, and as long as you have this dense sampling of the input-cross-output space. The problem is that, you know, this dense sampling could mean anything from 10,000 examples to, like, trillions and trillions. So that's my question. What's your intuition? And if you could just give it a chance and think, what kind of problems can be solved by getting a huge amount of data and thereby creating a dense mapping? So let's think about natural language dialogue, the Turing test. Do you think the Turing test can be solved with a neural network alone? Well, the Turing test is all about tricking people into believing that they're talking to a human.
I don't think that's actually very difficult, because it's more about exploiting human perception and not so much about intelligence. There's a big difference between mimicking intelligent behavior and actual intelligent behavior. So, okay, let's look at maybe the Alexa Prize and so on, the different formulations of natural language conversation that are less about mimicking and more about maintaining a fun conversation that lasts for 20 minutes. Mm-hmm. That's a little less about mimicking, and more about, I mean, it's still mimicking, but it's more about being able to carry forward a conversation with all the tangents that happen in dialogue and so on. Do you think that problem is learnable with this kind of, well, a neural network that does the point-to-point mapping? So I think it would be very, very challenging to do this with deep learning. I don't think it's out of the question either. I wouldn't rule it out. The space of problems that can be solved with a large neural network: what's your sense about that space of problems, so useful problems for us? In theory, it's infinite, right? You can solve any problem. In practice, well, deep learning is a great fit for perception problems, in general any problem which is not amenable to explicit handcrafted rules, or rules that you can generate via exhaustive search over some program space. So perception, artificial intuition, as long as you have sufficient training data. And that's the question. I mean, perception, there's interpretation and understanding of the scene, which seems to be outside the reach of current perception systems.
So do you think larger networks will be able to start to understand the physics of the scene, the three-dimensional structure and relationships of objects in the scene, and so on? Or is that really where symbolic AI has to step in? Well, it's always possible to solve these problems with deep learning; it's just extremely inefficient. An explicit rule-based abstract model would be a far more efficient, far better, more compressed representation of physics than learning just this mapping of, in this situation, this thing happens, and if you change the situation slightly, then this other thing happens, and so on. Do you think it's possible to automatically generate the programs that would require that kind of reasoning, or does it have to... so, the way expert systems failed, there were so many facts about the world that had to be hand-coded. Do you think it's possible to learn those logical statements that are true about the world and their relationships? I mean, that's kind of what theorem proving at a basic level is trying to do, right? Yeah, except it's much harder to formalize statements about the world compared to formalizing mathematical statements. Statements about the world, you know, tend to be subjective. So, can you learn rule-based models? Yes, definitely. This is the field of program synthesis. However, today we just don't really know how to do it, so it's very much a research problem. And so we are limited to, you know, the sort of very rudimentary discrete search algorithms that we have today. Personally, I think genetic algorithms are very promising. So it's almost like genetic programming. Genetic programming, exactly.
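The evolutionary idea can be sketched as a bare-bones mutation-plus-selection loop. This is a toy illustration only: real genetic programming evolves expression trees of a programming language, whereas here the "candidate program" is just a string, and the target is arbitrary.

```python
# Toy evolutionary loop: mutate candidates, keep the fittest. Stands in
# for genetic programming, where candidates would be program trees.

import random

TARGET = "sort"
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def fitness(candidate):
    # How many positions already match the desired behavior (here: string).
    return sum(a == b for a, b in zip(candidate, TARGET))

def mutate(candidate, rng):
    i = rng.randrange(len(candidate))
    return candidate[:i] + rng.choice(ALPHABET) + candidate[i + 1:]

def evolve(children=50, generations=300, seed=0):
    rng = random.Random(seed)
    best = "".join(rng.choice(ALPHABET) for _ in TARGET)
    for _ in range(generations):
        if fitness(best) == len(TARGET):
            break
        # Generate mutants, keep the fittest (elitist selection).
        mutants = [mutate(best, rng) for _ in range(children)]
        best = max(mutants + [best], key=fitness)
    return best

print(evolve())  # should converge to "sort"
```

The search has no gradient; it relies purely on variation and selection, which is why it can operate on discrete structures like programs where gradient descent cannot.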
Can you discuss the field of program synthesis? How many people are working and thinking about it? Where are we in the history of program synthesis, and what are your hopes for it? Well, if you compare it to deep learning, it's like the 90s, meaning that we already have existing solutions, and we are starting to have some basic understanding of what this is about, but it's still a field in its infancy. There are very few people working on it, and there are very few real-world applications. So the one real-world application I'm aware of is Flash Fill in Excel. It's a way to automatically learn very simple programs to format cells in an Excel spreadsheet from a few examples, for instance transforming a date, things like that. Oh, that's fascinating. You know, okay, that's a fascinating topic. I always wonder, when I provide a few samples to Excel, what it's able to figure out, like just giving it a few dates, what are you able to figure out from the pattern I just gave you?
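A Flash Fill-style synthesizer can be sketched as enumerative search over compositions of string primitives: try short programs until one is consistent with every user-provided example. This is a toy sketch, not Microsoft's actual algorithm; the primitive set below is made up for illustration.

```python
# Toy enumerative program synthesis in the Flash Fill spirit: find the
# shortest composition of string primitives matching all examples.

from itertools import product

PRIMITIVES = {
    "lower": str.lower,
    "upper": str.upper,
    "strip": str.strip,
    "first3": lambda s: s[:3],
}

def run(program, text):
    for name in program:
        text = PRIMITIVES[name](text)
    return text

def synthesize(examples, max_len=3):
    """Return the shortest op-sequence matching all (input, output) pairs."""
    for length in range(1, max_len + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, i) == o for i, o in examples):
                return program
    return None

examples = [("  January ", "JAN"), ("february", "FEB")]
prog = synthesize(examples)
print(prog)  # e.g. ('upper', 'strip', 'first3')
```

The learned program then generalizes to unseen rows: `run(prog, "  March ")` yields `"MAR"`. Real systems prune this exponential search aggressively using the structure of the examples.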
It's just a fascinating question, and it's fascinating whether those are learnable patterns, and you're saying they're working on that. Yeah. How big is the toolbox currently? Are we completely in the dark? So if you mean in terms of program synthesis, no. I would say maybe the 90s comparison is even too optimistic, because by the 90s, you know, we already understood backprop, we already understood, you know, the engine of deep learning, even though we couldn't realize its potential quite yet. Today, I don't think we've found the engine of program synthesis. So we're in the winter before backprop. Yeah. Anyway, yes, so I do believe program synthesis, and in general discrete search over rule-based models, is going to be a cornerstone of our research in the next century, right? And that doesn't mean we're going to drop deep learning. Deep learning is immensely useful. Like, being able to learn a very flexible, adaptable, parametric model with gradient descent is actually immensely useful. All it's doing is pattern recognition, but being good at pattern recognition, given lots of data, is just extremely powerful. So we are still going to be working on deep learning, we are going to be working on program synthesis, and we're going to be combining the two in increasingly automated ways. Mm-hmm.
So let's talk a little about data. You've tweeted: about 10,000 deep learning papers have been written about hard-coding priors about a specific task in a neural network architecture; it works better than a lack of a prior. Basically, summarizing all these efforts, they put a name to an architecture, but really what they're doing is hard-coding some priors that improve performance. Yes. But when you get straight to the point, it's probably true. And so you say that you can always buy performance, in quotes, performance, by either training on more data, better data, or by injecting task information into the architecture and the pre-processing; however, this is not informative about the generalization power of the technique used, the fundamental ability to generalize. Do you think we can go far by coming up with better methods for this kind of cheating, for better methods of large-scale annotation of data, so building better priors? Well, if you've made it, it's not cheating anymore, right? I'm talking about the cheating, but at large scale. So basically, I'm asking about something that hasn't, from my perspective, been researched too much: exponential improvement in annotation of data. Do you often think about... I mean, it's actually being researched quite a bit, you just don't see publications about it, because, you know, people who publish papers are going to publish about known benchmarks, and sometimes they'll introduce a new benchmark. People who actually have real-world, large-scale deep learning problems are going to spend a lot of resources on data annotation and good data annotation pipelines, but you don't see any papers about it. That's interesting. So you think there are those resources, but do you think there's innovation happening? Oh yeah. So let me clarify the point in the tweet. So machine
learning in general is the science of generalization. You want to generate knowledge that can be reused across different datasets, across different tasks. And if instead you are looking at one dataset, and then you are hard-coding knowledge about this task into your architecture, this is no more useful than training a network and then saying, oh, I found these weight values that perform well, right? So David Ha, I don't know if you know David, he had a paper the other day about weight-agnostic neural networks. This is a very interesting paper, because it really illustrates the fact that an architecture, even without weights, an architecture encodes knowledge about a task. It encodes knowledge. And when it comes to architectures that are handcrafted by researchers, in some cases it is very, very clear that all they are doing is artificially re-encoding the template that corresponds to the proper way to solve a task on a given dataset. For instance, I don't know if you've looked at the bAbI dataset, which is about natural language question answering. It is generated by an algorithm, so the question-answer pairs are generated by an algorithm, and the algorithm is following a certain template. It turns out, if you craft a network that literally encodes this template, you can solve this dataset with nearly 100% accuracy. But that doesn't actually tell you anything about how to solve question answering in general, which is the point.
You know, the question, just to linger on it, whether it's from the data side or from the size of the network: I don't know if you've read the blog post by Rich Sutton, The Bitter Lesson. Yeah. He says the biggest lesson that we can read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective. So as opposed to figuring out methods that can generalize effectively, do you think we can get pretty far by just having something that leverages computation, and the improvement of computation? Yeah, so I think Rich is making a very good point, which is that a lot of these papers, which are actually all about manually hard-coding prior knowledge about a task into some system (it doesn't have to be a deep learning architecture, but into some system, right), these papers are not actually making any impact. Instead, what's making real long-term impact is very simple, very general systems that are agnostic to all these tricks, because these tricks do not generalize. And of course, the one general and simple thing that you should focus on is that which leverages computation, because computation, the availability of large-scale computation, has been, you know, increasing exponentially, following Moore's law. So if your algorithm is all about exploiting this, then your algorithm is suddenly exponentially improving, right? So I think Rich is definitely right about the past 70 years, in regard to the past 70 years. I am not sure that this assessment will still hold true for the next 70 years. It might, to some extent. I suspect it will not, because the truth of his assessment is a function of the context, right, in which this research took place, and the context is changing. Like, Moore's law might not be applicable anymore, for instance, in the future. And I do believe that, you know, when you exploit one aspect of a system, some other aspect starts becoming the bottleneck. Let's say you have unlimited computation; well, then data is the bottleneck. And I think we are already starting to be in a regime where our systems are so large in scale and so data-hungry that today the quality of data and the scale of data is the bottleneck, and in this environment, the bitter lesson from Rich is not going to be true anymore, right? So I think we are going to move from a focus on computation scale to a focus on data efficiency. Data efficiency. So that's getting to the question of symbolic AI, but to linger on
the deep learning approaches: do you have hope for either unsupervised learning or reinforcement learning, which are ways of being more data efficient in terms of the amount of data that requires human annotation? So unsupervised learning and reinforcement learning are frameworks for learning, but they are not like any specific technique. Usually when people say reinforcement learning, what they really mean is deep reinforcement learning, which is, like, one approach which is actually very questionable. The question I was asking was about unsupervised learning with deep neural networks, and deep reinforcement learning. Well, these are not really data efficient, because you're still leveraging, you know, these huge parametric models trained point by point with gradient descent. They are more efficient in terms of the number of annotations, the density of annotations you need, the idea being to learn the latent space on which the data is organized, and then map the sparse annotations onto it. And sure, I mean, that's clearly a very good idea. It's not really a topic I would be working on, but it's a really good idea. So it would get us to solve some problems, it will get us incremental improvements in labeled data efficiency. Do you have concerns
about short-term or long-term threats from AI, from artificial intelligence? Yes, definitely, to some extent. And what is the shape of those concerns? This is actually something I've briefly written about, but the capabilities of deep learning technology can be used in many ways that are concerning, from, you know, mass surveillance, with things like facial recognition, in general, you know, tracking lots of data about everyone and then being able to make sense of this data to do identification, to do prediction. That's concerning. That's something that's being very aggressively pursued by totalitarian states like, you know, China. One thing I am very much concerned about is that, you know, our lives are increasingly online, are increasingly digital, made of information, made of information consumption and information production, our digital footprint, I would say. And if you absorb all of this data, and you are in control of where you consume information, you know, social networks and so on, recommendation engines, then you can build a sort of reinforcement loop for human behavior. You can observe the state of your mind at time t, you can predict how you would react to different pieces of content, how to get you to move your mind, you know, in a certain direction. And then you can feed each individual the specific piece of content that would move them in a specific direction. And you can do this at scale, you know, at scale in terms of doing it continuously in real time; you can also do it at scale in terms of scaling this to many, many people, to entire populations. So potentially, artificial intelligence, even in its current state, if you combine it with the internet, with the fact that all of our lives are moving to digital devices and digital information consumption and creation, what you get is the possibility to achieve mass manipulation of behavior and mass psychological control, and this is a very real possibility. Yeah, so you're talking about any kind of recommender system. Let's look at the YouTube algorithm, Facebook, anything that recommends content you should watch next.
Yeah, and it's fascinating to think that there are some aspects of human behavior where you can, you know, pose a problem of: does this person hold Republican beliefs or Democratic beliefs? And that's fairly trivial, that's an objective function you can optimize, and you can measure, and you can turn everybody into a Republican or everybody into a Democrat. Absolutely, yeah, I do believe it's true. So the human mind is very... if you look at the human mind as a kind of computer program, it has a very large exploit surface, right? It has many, many vulnerabilities, ways you can control it. For instance, when it comes to your political beliefs, this is very much tied to your identity. So for instance, if I'm in control of your newsfeed on your favorite social media platform, this is actually where you're getting your news from. And of course, I can choose to only show you news that will make you see the world in a specific way, right? But I can also, you know, create incentives for you to post about some political beliefs, and then when I get you to express a statement, if it's a statement that I, as the controller, want to reinforce, I can just show it to people who will agree, and they will like it, and that will reinforce the statement in your mind. If this is a statement I want you to abandon, I can, on the other hand, show it to opponents, right, who will attack you, and because they attack you, at the very least, next time you will think twice about posting it. But maybe you will even, you know, stop believing this, because you got pushback, right? So there are many ways in which social media platforms can potentially control your opinions. And today, all of these things are already being controlled by algorithms. These algorithms do not have any explicit political goal today. Well, potentially they could, like if some totalitarian government takes over, you know, social media platforms and decides that, you know, now we are going to use this not just for mass surveillance, but also for mass opinion control and behavior control, very bad things could happen. But what's really fascinating, and actually quite concerning, is that even without an explicit intent to manipulate, you're already seeing very dangerous dynamics in terms of how these content recommendation algorithms behave, because right now, the goal, the objective function of these algorithms is to maximize engagement, right? Which seems very innocuous at first, right? However, it is not, because content that will maximally engage people, you know, get people to react in an emotional way, get people to click on something, is very often content that, you know, is not healthy to public discourse. For instance, fake news are far more likely to get you to click on them than real news, simply because they are not constrained to reality, so they can be as outrageous, as surprising, as good stories as you want, because they're artificial. Right, yeah.
to me that's an exciting world because
so much good can come so there's an
opportunity to educate people you can
balance people's worldview with other
ideas. There are so many objective functions. The space of objective functions that create better civilizations is large, arguably infinite. But there's also a large space that creates division and destruction, civil war, a lot of bad stuff. And the worry is, naturally, probably that space is bigger, first of all. And if we don't
explicitly think about what kind of
effects are going to be observed from
different objective functions then we
can get into trouble. But the question is, how do we get into rooms and have discussions, inside Google, inside Facebook, inside Twitter, and think about, okay, how can we drive up engagement and at the same time create a good society? Is it even possible to have that kind of philosophical discussion? I
think you can definitely try. So from my perspective, I would feel rather uncomfortable with companies that are in control of these newsfeed algorithms making explicit decisions to manipulate people's opinions or behaviors, even if the intent is good, because that's a very totalitarian mindset. So instead, what I would like to see, it's probably never gonna happen because it's not super realistic, but that's actually something I care about, I would like all these algorithms to present configuration settings to their users, so that the users can actually make the decision about how they want to be impacted by these information recommendation, content recommendation algorithms. For instance, as a user
of something like YouTube or Twitter
maybe I want to maximize learning about a specific topic, right, so I want the algorithm to feed my curiosity, right, which is in itself a very interesting problem. So instead of maximizing my engagement, it will maximize how fast and how much I'm learning, and it will also take into account the accuracy, hopefully, of the information I'm learning. So yeah, the user should be
able to determine exactly how these
algorithms are affecting their lives. I don't actually want any entity making decisions about in which direction they're gonna try to manipulate me, right. I want technology. So AI, these algorithms, are increasingly going to be our interface to a world that is increasingly made of information, right, and I want everyone to be in control of this interface, to interface with the world on their own terms. So if someone wants these algorithms to serve their own personal growth goals, they should be able to configure these algorithms in such a way. Yeah, but so I
know it's painful to have explicit decisions, but there are underlying explicit decisions, which touch on some of the most beautiful, fundamental philosophy that we have before us, which is personal growth. If I want to watch videos from which I can learn, what does that mean? So if I check a box that says emphasize learning, there's still an algorithm with explicit decisions in it that would promote learning. What does that mean for me?
Like, for example, I watched a documentary on flat earth theory, I guess. It was very... I learned a lot, I'm really glad I watched it. A friend recommended it to me. Now, I don't have such an allergic reaction to crazy people as my fellow colleagues do, but it was very eye-opening. And for others it might not be; others might just get turned off by that. Same with Republican and Democrat content. It's a non-trivial problem. And first of all, if it's done well, I don't think it's something that wouldn't happen, that YouTube wouldn't be promoting, or Twitter wouldn't be. It's just a really difficult problem, how to give people control. Well,
it's mostly an interface design problem, right. The way I see it, you want to create technology that's like a mentor or a coach or an assistant, so that it's not your boss, right. You are in control of it, you are telling it what to do for you, and if you feel like it's manipulating you, it's not actually doing what you want, and you should be able to switch to a different algorithm, you know. So with that fine-tuned control, you kind of learn, you're trusting the human collaboration. That's how I see autonomous vehicles too: giving as much information as possible, and you learn that dance yourself. Yeah, Adobe,
I don't know if you use Adobe products, like Photoshop. Yeah. They're trying to see if they can inject YouTube into their interface, basically to show you all these videos, because everybody's confused about what to do with the features, so basically teach people by linking to videos. In that way it's an assistant that uses videos as a basic element of information. Yeah.
Okay, so what practically should people do to try to fight against abuses of these algorithms, or algorithms that manipulate us? It's a very, very difficult problem, because to start with, there is very little public awareness of these issues. Very few people would think there's anything wrong with their newsfeed algorithm, even though there is actually something wrong already, which is that it's trying to maximize engagement most of the time, which has very negative side effects, right. So
ideally, the very first thing is to stop trying to purely maximize engagement, to stop trying to propagate content based on popularity, right. Instead, take into account the goals and the profiles of each user. One example is, for instance, when I look at topic recommendations on Twitter, you know, they have this news tab with switch recommendations, it's always the worst garbage, because it's content that appeals to the smallest common denominator of all Twitter users, because they are purely trying to optimize popularity, they are purely trying to optimize engagement. But that's not what I want. So they should put me in control of some setting, so that I define what is the objective function that Twitter is going to be following to show me this content. And honestly,
this is all about interface design, and it's not realistic to give users control of a bunch of knobs that define the algorithm. Instead, we should put the user in charge of defining the objective function, like let the user tell us what they want to achieve, how they want this algorithm to impact their lives. So do you think it is that, or do they provide an individual, article-by-article reward structure, where you give a signal, I'm glad I saw this, or I'm glad I didn't? So like a Spotify-type, yeah, a
feedback mechanism. It works to some extent. I'm kind of skeptical about it, because the algorithm will attempt to relate your choices with the choices of everyone else, which, you know, if you have an average profile, works fine. I'm sure Spotify recommendations work fine if you just like mainstream stuff. If you don't, it's not optimal; the algorithm will be running an inefficient search for the part of the Spotify world that represents you. So it's a tough problem. But do note that even a feedback system like what Spotify has does not give me control over what the algorithm is trying to optimize for. Well,
public awareness which is what we're
doing now it's a good place to start
do you have concerns about long term
existential threats of artificial
intelligence? Well, as I was saying, our world is increasingly made of information. AI algorithms are increasingly going to be our interface to this world of information, and somebody will be in control of these algorithms, and that can put us in any kind of a bad situation, right. It has risks. It has risks coming from potentially large companies wanting to optimize their own goals, maybe profit, maybe something else, and also from governments who might want to use these algorithms as a means of control of populations. Do you think
there's existential threat that could
arise from that? What kind of existential threat? So maybe you're referring to the singularity narrative where robots just take over? Well, no, not Terminator robots, and I don't believe it has to be a singularity. We're just talking, just like you said, about the algorithms controlling masses of populations, the existential threat being that we hurt ourselves, much like a nuclear war would hurt ourselves, mm-hmm, that kind of thing. I don't think that requires a singularity; that requires a loss of control over AI algorithms. Yes, so I do agree, those are all concerning trends.
Honestly, I wouldn't want to make any long-term predictions. I don't think today we really have the capability to see what the dangers are going to be in 50 years, in 100 years. I do see that we are already faced with concrete and present dangers: the negative side effects of content recommendation systems, of newsfeed algorithms, concerning algorithmic bias as well. So we are delegating more and more decision processes to algorithms. Some of these algorithms are handcrafted, some are learned from data, but we are delegating control. Sometimes it's a good thing, sometimes not so much, and there is in general very little supervision of this process, right. So we're still in this period of very fast change, even chaos, where society is restructuring itself, turning into an information society, which itself is turning into an increasingly automated information-processing society. And, well, yeah, I think
the best we can do today is to try to raise awareness around some of these
issues and I think we're actually making
good progress if you if you look at
algorithmic bias for instance three
years ago even three years ago very very
few people were talking about it and now
all the big companies are talking about
it, often not in a very serious way, but at least it is part of the
public discourse you see people in
Congress talking about it so and it all
started
from raising awareness, right. So in terms of the alignment problem, as we allow algorithms, even just recommender systems on Twitter, to encode human values and morals, decisions that touch on ethics, how hard do you think that problem is? How do we have loss functions in neural networks that have some components, some fuzzy components, of human morals? Well, I
think this is really all about objective function engineering, which is probably going to be increasingly a topic of concern in the future. For now we are just using very naive loss functions, because the hard part is not actually what you're trying to minimize, it's everything else. But as the everything else is going to be increasingly automated, we're going to be focusing our human attention on increasingly high-level components, like what's actually driving the whole learning system, like the objective function. So loss function engineering... loss function engineer is probably going to be a job title in the future, you know. And then
the tooling you're creating with Keras essentially takes care of all the details underneath, and basically the human expert is needed for exactly that, loss function engineering. Keras is the interface between the data you're collecting and the business goals, and your job as an engineer is going to be to express your business goals and your understanding of your business or your product, your system, as a kind of loss function or a kind of set of constraints.
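The "loss function engineering" idea can be illustrated with a toy composite loss. This is a framework-free sketch with an invented penalty term, not anyone's actual production objective; in Keras, a function of this shape (taking targets and predictions, returning a scalar) is what you would pass to model.compile(loss=...).

```python
# Toy "loss function engineering": blend a standard objective (MSE) with a
# business-specific constraint. The penalty here -- discouraging
# over-prediction -- is a hypothetical example of encoding a business goal.

def engineered_loss(y_true, y_pred, penalty_weight=0.5):
    n = len(y_true)
    # Standard mean squared error term.
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    # Asymmetric penalty: only predictions ABOVE the target are penalized
    # (e.g. over-promising inventory), a made-up business constraint.
    over = sum(max(p - t, 0.0) for t, p in zip(y_true, y_pred)) / n
    return mse + penalty_weight * over

# Two prediction sets with identical MSE, but one systematically over-predicts:
y_true = [1.0, 2.0, 3.0]
under = [0.5, 1.5, 2.5]   # errors of -0.5 each
over = [1.5, 2.5, 3.5]    # errors of +0.5 each
print(engineered_loss(y_true, under))  # 0.25 (plain MSE, no penalty)
print(engineered_loss(y_true, over))   # 0.5  (MSE plus the penalty)
```

The design point is that both terms live in one scalar objective, so the same optimizer minimizes the business constraint alongside accuracy.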
Does the possibility of creating an AGI system excite you, or scare you, or bore you? So
intelligence can never be general. You know, at best it can have some degree of generality, like human intelligence. It also always has some specialization, in the same way that human intelligence is specialized in a certain category of problems, is specialized in the human experience. And when people talk about AGI, I'm never quite sure if they're talking about very, very smart AI, so smart that it's even smarter than humans, or they're talking about human-like intelligence, because these are different things. Let's
say, presumably I'm impressing you today with my humanness. So imagine that I was in fact a robot. What does that mean? I'm impressing you with natural language processing, maybe if you weren't able to see me, maybe this is a phone call. Yes, exactly. Okay, so that's very much about building human-like AI. And you're asking me, you know, is this an exciting perspective? Yes, I think
so, yes. Not so much because of what artificial human-like intelligence could do, but, you know, from an intellectual perspective, I think if you could build truly human-like intelligence, that means you could actually understand human intelligence, which is fascinating, right.
Yeah, human-like intelligence is gonna require emotions, it's gonna require consciousness, which are not things that would normally be required by an intelligent system. We were mentioning earlier science as a superhuman problem-solving agent or system; it does not have consciousness or emotions. In general, so, emotions, and I see consciousness as being on the same spectrum as emotions, it is a component of the subjective experience that is meant very much to guide behavior generation, right. It's meant to guide your behavior. In fact, human intelligence and animal intelligence has evolved for the purpose of behavior generation, right, including in a social context. So that's why we actually need emotions, that's why we need consciousness. An artificial intelligence system developed in a different context may well never need them, may well never be conscious, just
like science. At that point I would argue it's possible to imagine that there are echoes of consciousness in science, when viewed as an organism, that science is conscious. So, I mean, how would you go about testing this hypothesis? How do you probe the subjective experience of an abstract system like science? Well, the point is, probing any subjective experience is impossible, because I'm not science, I'm Lex. So I can't probe another entity's subjective experience, no more than I can probe the bacteria on my skin. You are Lex. I can ask you questions about your subjective experience, and you can answer me, and that's
how I know you're conscious. Yes, but that's because you speak the same language. Perhaps we have to speak the language of science. So I think consciousness, just like emotions of pain and pleasure, is not something that inevitably arises from any sort of sufficiently intelligent information processing. It is a feature of the mind, and if you've not implemented it explicitly, it is not there. So you think it's an emergent feature of a particular architecture. So do you think it's a feature in that same sense? So, again, the
subjective experience is all about guiding behavior. If the problems you're trying to solve don't really involve embodied agents, maybe in a social context, generating behavior and pursuing goals like this, and if you look at science, that's exactly what's happening, even though it is a form of artificial intelligence, in the sense that it is solving problems, it is accumulating knowledge, creating solutions and so on. So if you're not explicitly implementing a subjective experience, implementing certain emotions and implementing consciousness, it's not going to just spontaneously emerge. Yeah, but so for a
system like a human-like intelligence system, a system that has consciousness, yeah, do you think it needs to have a body? Yes, definitely. I mean, it doesn't have to be a physical body, right, and there's not that much difference between a realistic simulation and the real world. So there has to be something you perceive, that kind of thing? Yes. But human-like intelligence can only arise in a human-like context. You need intelligence in other humans, in order for you to demonstrate that you have human-like intelligence, essentially. Yes. So what kind of test and demonstration would be sufficient for you to demonstrate human-like
intelligence? Yeah, just out of curiosity, you've talked about, in terms of theorem proving and program synthesis, I think you've written about it, that there's no good benchmarks for this. Yeah, that's one of the problems. So let's talk about program synthesis. So what do you imagine is a good... I think it's a related question for human-like intelligence and for program synthesis. What's a good benchmark for either or both? Right. So
mean you're actually asking asking two
questions which is one is about
quantifying intelligence and comparing
the intelligence of an artificial system
to the intelligence of a human and the
other is about a degree to which this
intelligence is human right is actually
two different questions so if you look
at you mentioned earlier the Turing test
well I actually don't like the Turing
test because it's very lazy it's it's
all about completely bypassing the
problem of defining and measuring
intelligence right and instead
delegating to a human judge or panel of
human judges so it's it's it's at or
cop-out right if you want to measure how
human-like an agent is I think you have
to make it interact with other humans
maybe it's it's not necessarily good
idea to have these other humans be the
judges maybe you should just observe
behavior and comparison where the human
will actually have done
Then when it comes to measuring how smart, how clever an agent is, and comparing that to the degree of human intelligence, well, we're already talking about two things, right: the degree, kind of like the magnitude, of an intelligence, and its direction, right. Like the norm of a vector, right, and its direction. And the direction is like human-likeness, and the magnitude, the norm, is intelligence. You could call it intelligence, right. So the direction, in your sense, the space of directions that are human-like, is very narrow.
Yeah. So how would you measure the magnitude of intelligence in a system, in a way that also enables you to compare it to that of a human? Well, if you look at different benchmarks for intelligence today, they're all too focused on skill at a given task, like skill at playing chess, yeah, skill at playing Go, skill at playing Dota. And I think that's not the right way to go about it, because you can always beat a human at one specific task. The reason why our skill at playing Go, or juggling, or anything is impressive is because we are expressing this skill within a certain set of constraints. If you remove the constraints, the constraints that we have one lifetime, that we have this body, and so on, if you remove the context, if you have unlimited training data, if you can have access to, you know, for instance if you look at juggling, if you have no restriction on the hardware, then achieving arbitrary levels of skill is not very interesting, and says nothing about the amount of intelligence you've achieved. So if you want to measure intelligence, you need to rigorously define what intelligence is, which in itself is a very challenging
problem. And do you think that's possible? To define intelligence? Yes, absolutely. I mean, you can provide, many people have provided, you know, some definition. I have my own definition. Where does your definition begin, if it doesn't end? Well, I think intelligence is essentially the efficiency with which you turn experience into generalizable programs. So what that means is, it's the efficiency with which you turn a sampling of experience space into the ability to process a larger chunk of experience space.
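This definition, the efficiency of turning experience into generalizable programs, can be caricatured as a simple ratio. The toy scoring function below is my own sketch of the idea, with invented units and field names, not a formula from Chollet's work:

```python
# Toy rendering of "intelligence = efficiency of turning experience into
# generalizable programs": hold priors and experience fixed, then score agents
# by how well their learned program generalizes. All numbers are invented.

def intelligence_score(generalization, experience, priors):
    """Generalization performance achieved per unit of information consumed."""
    return generalization / (experience + priors)

# Two agents given the SAME priors and the SAME amount of experience:
agent_a = intelligence_score(generalization=0.9, experience=100, priors=10)
agent_b = intelligence_score(generalization=0.6, experience=100, priors=10)
print(agent_a > agent_b)  # True: A generalizes better from equal inputs
```

With experience and priors held equal, the ratio reduces to exactly the comparison described next: whichever agent learns the more generalizable program is the smarter one.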
So measuring skill, measuring skill at many different tasks, can be one proxy for measuring intelligence. But if you want to only measure skill, you should control for two things: you should control for the amount of experience that your system has and the priors that your system has. If you look at two agents, and you give them the same priors, and you give them the same amount of experience, then one of the agents is going to learn programs, representations, something like a model, that will perform well on the larger chunk of experience space, and that is the smarter agent. Yes, so if you have fixed the experience, whichever one generates better programs, better meaning more generalizable, that's really
interesting, and that's a very nice, clean definition. Oh, by the way, in this definition it is already very obvious that intelligence has to be specialized, because you're talking about experience space, and you're talking about segments of experience space, you're talking about priors, and you're talking about experience. All of these things define the context in which intelligence emerges, and you can never look at the totality of experience space, right. So intelligence has to be specialized. But it can be sufficiently large, the experience space, even though specialized; there's a certain point when the experience space is large enough to where it might as well be general. It feels general, it
well be general it feels general it
looks general sure I mean it's it's very
less developed for instance many people
would say human intelligence is general
in fact it is it is quite specialized
you know the we can definitely build
systems that start from the same innate
priors that's what humans have at Birth
because we already understand very well
what sort of priors we have as humans
like many people have worked on this
problem most notably as a bethe a spelke
from how about I know if you know her
his work the rotten and what she calls a
core knowledge and it is very much about
trying to determine and and and describe
what priors we are born with like
language skills and so on and all that
kind of stuff exactly
So we have some pretty good understanding of what priors we are born with. So I've actually been working on a benchmark for the past couple of years, you know, on and off, where the hope, at some point, is to measure the intelligence of systems by controlling for priors, controlling for the amount of experience, and by assuming the same priors as what humans are born with, so that you can actually compare these scores to human intelligence, and you can actually have humans pass the same test in a way that's fair. Yeah. And so, importantly, such a benchmark should be such that any amount of practicing does not increase your score. So try to picture a game where, no matter how much you play this game, it does not change your skill at the game. Can you picture that?
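This property, a score that no amount of practicing can inflate, can be sketched as an evaluation loop in which every task is freshly generated, so memorizing past tasks cannot help. The task family and names below are invented for illustration; this is not the actual benchmark being described:

```python
import random

def evaluate(agent, make_task, n_tasks=200, seed=0):
    """Score an agent on tasks it has never seen before. Because every task
    is freshly generated, practicing on past tasks cannot raise the score;
    only a general program for the whole task family can."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_tasks):
        task, solution = make_task(rng)
        if agent(task) == solution:
            correct += 1
    return correct / n_tasks

# A hypothetical task family built on one documented prior: sequence reversal.
def make_task(rng):
    seq = [rng.randint(0, 9) for _ in range(5)]
    return seq, list(reversed(seq))

general_agent = lambda seq: list(reversed(seq))  # has the general program
rote_agent = lambda seq: None                    # only knows memorized answers
print(evaluate(general_agent, make_task))  # 1.0
print(evaluate(rote_agent, make_task))     # 0.0
```

The rote agent scores zero no matter how many past tasks it has stored, which is exactly the "practice doesn't help" property the benchmark aims for.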
As a person who deeply appreciates practice, I cannot, actually. There's actually a very simple trick. So, in order to come up with a task, well, the only thing you can measure is skill at a task. Yes. All tasks are gonna involve priors. The trick is to know what they are and to describe that, and then you make sure that this is the same set of priors as what humans start with. So you create a task that assumes these priors, that exactly documents these priors, so that the priors are made explicit and there are no other priors involved, and then you generate a certain number of samples in experience space for this task, right. And this, for one task, assuming that the task is new for the agent passing it, that's one test of this definition of intelligence that we set up. And now you can scale that to many different tasks, where, you know, each task should be new to the agent, and should be human-interpretable and understandable, so that you can actually have a human pass the same test, and then you can compare the score of your machine and the score of your human. Which could be a lot of fun. You could even start with a task like MNIST, just as long as you start with the same set of priors. Yes, so the problem with MNIST is, humans are already trained to recognize digits, right. But
let's say we're considering objects that
are not digits
some completely arbitrary patterns, well, humans already come with visual priors about how to process that. Mm-hmm. So in order to make the game fair, you would have to isolate these priors and describe them, and then express them as computational rules. Having worked a lot with vision science people, that's an exceptionally difficult process. There's been a lot of good tests and basically attempts at reducing all of human vision into some good priors, and we're still probably far away from that, perfectly, but as a start for a benchmark, that's an exciting possibility. Yeah, so Elizabeth Spelke actually lists
objectness as one of the core knowledge priors. Objectness, cool. Yeah. So we have priors about objectness, like about the visual space, about time, about agents, about goal-oriented behavior. We have many different priors. But what's interesting is that, sure, we have this pretty diverse and rich set of priors, but it's also not that diverse, right. We are not born into this world with a ton of knowledge about the world, only with a small set of core knowledge. Yeah. It feels to us humans that that set is not that large. But just even the
nature of time, that we kind of integrate pretty effectively through all of our perception, all of our reasoning. Maybe, you know, do you have a sense of how easy it is to encode those priors? Maybe it requires building a universe, mm-hmm, and the human brain, in order to encode those priors. Or do you have a hope that it can all be listed out? I don't think so. So you have to keep in mind that any knowledge about the world that we are born with is something that has to have been encoded into our DNA by evolution at some point, right. And DNA
is a very, very low-bandwidth medium. Like, it's extremely long and expensive to encode anything into DNA, because first of all, you need some sort of evolutionary pressure to guide this writing process, and then, you know, the higher-level the information you're trying to write, the longer it's gonna take, and the thing in the environment that you are trying to encode knowledge about has to be stable over this duration. Yes. So you can only encode into DNA things that constitute an evolutionary advantage, so this is actually a very small subset of all possible knowledge about the world. You can only encode things that are stable, that are true, over a very, very long period of time, typically millions of years. For instance, we might have some visual prior about the shape of snakes, right, about what makes a face, what's the difference between a face and a non-face. But consider this
interesting question do we have any
innate sense of the visual difference
between a male face and a female face
What do you think? For humans? I mean, I would have to look back into evolutionary history, when the genders emerged, but, yeah, most... I mean, the faces of humans are quite different from the faces of great apes, right. Yeah, like, let's say you couldn't tell the face of a female chimpanzee from the face of a male chimpanzee, probably. Yeah, and that holds for us humans as well. So we do have innate knowledge of what makes a face, but it's actually impossible for us to have any DNA-encoded knowledge of the difference between a female human face and a male human face, because that knowledge, that information, came into the world actually very recently, if you look at the slowness of the process of encoding knowledge into DNA.
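The bandwidth point can be made concrete with rough back-of-envelope numbers; the figures below are common approximations, not numbers from the conversation:

```python
# Back-of-envelope: how little information the genome can carry.
base_pairs = 3.2e9            # approximate size of the human genome
bits = base_pairs * 2         # 4 possible bases -> 2 bits per base pair
total_mb = bits / 8 / 1e6     # total raw capacity in megabytes, uncompressed
coding_fraction = 0.015       # roughly 1-2% of the genome is protein-coding
coding_mb = total_mb * coding_fraction
print(round(total_mb), round(coding_mb))  # ~800 MB total, ~12 MB coding
```

So even the entire genome, uncompressed, is under a gigabyte, and the fraction plausibly available for specifying anything is orders of magnitude smaller, consistent with the "on the order of megabytes" point made a bit later in the conversation.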
Yeah, so that's interesting. That's a really powerful argument: DNA is low-bandwidth, and it takes a long time to encode; that naturally creates a very efficient encoding. Yeah, one important consequence of this is that, so, yes, we are born into this world with a bunch of knowledge, sometimes high-level knowledge about the world, like the rough shape of a snake, the rough shape of a face. But importantly, because this knowledge takes so long to write, almost all of this innate knowledge is shared with our cousins, with great apes, right. So it is not actually this innate knowledge that makes us special. But, to throw
it right back at you from earlier on in our discussion, that encoding might also include the entirety of the environment of Earth, to some extent. So it can include things that are important to survival and reproduction, for which there is some evolutionary pressure, and things that are stable, constant, over very, very, very long time periods. And honestly, it's not that much information. Besides the bandwidth constraints and the constraints of the writing process, there are also memory constraints. Like, the part of DNA that deals with the human brain is actually very small. It's, like, you know, on the order of megabytes, right. It's not that much high-level knowledge about the world you can encode.
That's quite brilliant, and hopeful, for the benchmark that you're referring to, of encoding priors. I actually look forward to it. I'm skeptical whether you can do it in a couple of years, but hopefully. I've been working on it. So honestly, it's a very simple benchmark, and it's not like a big breakthrough or anything. It's more like a fun side project, right. These fun... so was ImageNet. These fun side projects could launch entire groups of efforts towards creating reasoning systems and so on. And I think, yeah, that's the goal. It's trying to measure strong generalization, to measure the strength of abstraction of our minds, right, of our minds and of artificially intelligent machines. And if there's anything true about this science organism, it's that its individual cells love competition.
So benchmarks encourage competition. So that's, yeah, that's an exciting possibility. Do you think an AI winter is coming, and how do we prevent it? Not really. So an AI winter is something that would occur when there's a big mismatch between how we are selling the capabilities of AI and the actual capabilities of AI. And today, deep learning is creating a lot of value, and it will keep creating a lot of value, in the sense that these models are applicable to a very wide range of problems that are relevant today, and we are only just getting started with applying these algorithms to every problem they could be solving. So deep learning will keep creating a lot of value for the time being. What's
concerning, however, is that there's a lot of hype around deep learning and around AI. Lots of people are overselling the capabilities of these systems, not just the capabilities, but also overselling the fact that they might be more or less brain-like, like giving a kind of mystical aspect to these technologies, and also overstating the pace of progress, which, you know, might look fast in the sense that we have this exponentially increasing number of papers. But again, that's just a simple consequence of the fact that we have ever more people coming into the field. It doesn't mean the progress is actually exponentially fast.
Like, let's say you're trying to raise money for your startup or your research lab. You might want to tell, you know, grandiose stories to investors about how deep learning is just like the brain and how it can solve all these incredible problems, like self-driving and robotics and so on, and maybe you can tell them that the field is progressing so fast and we are gonna have AGI within 15 years or even 10 years, and none of this is true. And every time you're saying these things, and an investor or, you know, a decision-maker believes them, well, this is like the equivalent of taking on credit card debt, but for trust, right. And maybe this will be what enables you to raise a lot of money, but ultimately you are creating damage, you are damaging the field. That's the concern, that that's what happened with the other AI winters. Yeah, you actually tweeted about this, the so-called autonomous vehicles,
right. Almost every single company now has promised that they will have full autonomous vehicles by 2021, 2022. This is a good example of the consequences of overhyping the capabilities of AI and the pace of progress. Because I work, especially a lot recently, in this area, I have a deep concern of what happens when all these companies, after they've invested billions, have a meeting and say, how much do we actually have? First of all, do we have full autonomous vehicles? The answer will definitely be no. And second will be, wait a minute, we've invested one, two, three, four billion dollars into this and we made no profit. And the reaction to that may be going very hard in another direction, and that might impact even other industries. And that's what we call an AI winter, is when there is a backlash, where no one believes any of these promises anymore, because they've turned out to be big lies the first time
around yeah and this will definitely
happen to some extent for autonomous
vehicles because the public and decision
makers have been convinced that you know
around around 2015 they've been
convinced by these people who are trying
to raise money for a start-up and so on
that l5 driving was coming mean maybe
2016 maybe 2017 may 2018 now when 2019
was still waiting for it and so I I
don't believe we are going to have a
full-on AI winter because we have this
technologies that are producing a
tremendous amount of free all value
right but there is also too much hype so
there will be some backlash especially
there will be backlash against you know
some startups that are trying to sell the
dream of AGI right and the fact that AGI
is going to create infinite value like
AGI is like a free lunch like if you
can develop an AI system that
passes a certain threshold of IQ or
something then suddenly you have
infinite value yes and well there are
actually lots of investors buying into
this idea and you know they will wait
maybe 10 15 years and nothing will
happen and the next time around well
maybe there will be a new
generation of investors no one will care
you know human memory is very short
after all I don't know about you but
because I've spoken about AGI sometimes
poetically like I get a lot of
emails from people sending me
usually like large manifestos
where they say to me that they have
created an AGI system or they know how
to do it and there's a long write-up
of how to do it
they feel a little bit like they're
generated by an AI system actually but
there's usually no guidance
maybe it's a recursively self-improving
AGI sitting there exactly
so you have a transformer generating
crank papers about AGI yeah so the
question is because you've been
such a good you have a good radar for
crank papers how do we know they're not
onto something
so when you start to talk about
AGI or anything like the reasoning
benchmarks and so on something that
doesn't have a benchmark it's really
difficult to know I mean I talked to
Jeff Hawkins who's really looking at
neuroscience approaches to AGI
and there are echoes of really
interesting ideas at least in his case
so how do you usually
think about this like preventing
yourself from being too narrow-minded
and elitist about you know deep learning
it has to work on these particular
benchmarks otherwise it's trash well you
know the thing is intelligence does not
exist in the abstract intelligence has to
be applied so if you don't have a
benchmark if you have an improvement on
some benchmark maybe it's a new
benchmark all right maybe it's not
something people have been using before
but you do need a problem that you're
trying to solve you're not gonna come up
with a solution without a problem
so general intelligence I mean
you've clearly highlighted
generalization if you want to claim that
you have an intelligent system it should
come with a benchmark yes it
should display capabilities of some kind
it should show that it can
create some form of value even if it's a
very artificial form of value and that's
also the reason why you don't actually
need to care about telling which papers
actually have potential and which
do not
because if
there is a new technique that's
actually creating value you know it is
going to be brought to light very
quickly because it's actually making a
difference so it's the difference
between something that's ineffective and
something that is actually useful and
ultimately usefulness is our guide not
just in this field but if you look at
science in general maybe there are many
many people over the years that have had
some really interesting theories of
everything but they were just completely
useless and you don't actually need to
tell the interesting theories from the
useless theories all you need is to see
you know is this actually having an
effect on something else
you know is this actually useful is
this making an impact or not
beautifully put I mean the same applies
to quantum mechanics to string theory to
the holographic principle we are doing
deep learning because it works you know
that's like before it started working
people you know considered people
working on neural networks as
cranks very much like you know no one
was working on it anymore and now it's
working which is what makes it valuable
it's not about being right it's about
being effective and nevertheless the
individual agents of the scientific
mechanism just like Yoshua Bengio and
Yann LeCun they while being called
cranks stuck with it right yeah and so
as individual agents even if everyone's
laughing at us we just stick with it
because if you believe you have
something you should stick with it and
see it through that's a beautiful
inspirational message to end on first of
all thank you so much for talking today
that was amazing thank you