François Chollet: Keras, Deep Learning, and the Progress of AI | Lex Fridman Podcast #38
Bo8MY4JpiXE • 2019-09-14
The following is a conversation with François Chollet. He's the creator of Keras, which is an open-source deep learning library designed to enable fast, user-friendly experimentation with deep neural networks. It serves as an interface to several deep learning libraries, the most popular of which is TensorFlow, and it was integrated into the TensorFlow main codebase a while ago, meaning that if you want to create, train, and use neural networks, probably the easiest and most popular option is to use Keras inside TensorFlow.

Aside from creating an exceptionally useful and popular library, François is also a world-class AI researcher and software engineer at Google, and he's definitely an outspoken, if not controversial, personality in the AI world, especially in the realm of ideas around the future of artificial intelligence.

This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, give it five stars on iTunes, support it on Patreon, or simply connect with me on Twitter @lexfridman, spelled F-R-I-D-M-A-N. And now, here's my conversation with François Chollet.

You're known for not
sugarcoating your opinions and speaking your mind about ideas in AI, especially on Twitter. It's one of my favorite Twitter accounts. So what's one of the more controversial ideas you've expressed online and gotten some heat for?

How do I pick? Yeah, no, I think if you go through the trouble of maintaining a Twitter account, you might as well speak your mind, you know? Otherwise, what's even the point of having a Twitter account? It's like getting a nice car and just leaving it in the garage. So, what's one thing for which I got a lot of pushback? Perhaps, you know, that time I wrote something about the idea of intelligence explosion. I was questioning the idea and the reasoning behind it, and I guess I got a lot of pushback on that, more than I was expecting. So, yeah, intelligence explosion. I'm sure you're familiar with the idea, but it's
the idea that if you were to build general AI problem-solving algorithms, well, the problem of building such an AI is itself a problem that could be solved by your AI, and maybe it could be solved better than what humans can do. So your AI could start tweaking its own algorithm, could start being a better version of itself, and so on, iteratively, in a recursive fashion. And so you would end up with an AI with exponentially increasing intelligence, right? And I was basically questioning this idea, first of all, because the notion of intelligence explosion uses an implicit definition of intelligence that doesn't sound quite right to me. It considers intelligence as a property of a brain that you can consider in isolation, like the height of a building, for instance. But that's not really what intelligence is. Intelligence emerges from the interaction between a brain, a body, like embodied intelligence, and an environment. And if you're missing one of these pieces, then you cannot actually define intelligence. So just tweaking a brain to make it smarter and smarter doesn't actually make any sense to me.

So, first
of all, you're crushing the dreams of many people, right? There's a lot of people, like Sam Harris, a lot of physicists, Max Tegmark, who think, you know, the universe is an information-processing system, our brain is kind of an information-processing system, so what's the theoretical limit? It seems naive to think that our own brain is somehow the limit of the capabilities of this information-processing system (I'm just playing devil's advocate here). And then, if you just scale it, if you're able to build something that's on par with the brain, the process that builds it just continues, and it will improve exponentially. So that's the logic that's used, actually, by almost everybody that is worried about superhuman intelligence. So you're trying to make... so most people who are skeptical of that, their thought process is kind of like, this doesn't feel right.
That's true for me as well. So I'm more like, the whole thing is shrouded in mystery, where you can't really say anything concrete, but you could say, this doesn't feel right, this doesn't feel like that's how the brain works. And you're trying, with your blog post, to make it a little more explicit. So one idea is that the brain doesn't exist alone; it exists within the environment. So you have to somehow exponentially improve the environment and the brain together, almost, in order to create something that's much smarter, in some kind of sense. Of course, we don't have a definition of intelligence.

That's right. That's correct.
I don't think, if you look at very smart people today, even humans, not even talking about AIs, that their brain, the raw performance of their brain, is the bottleneck to their expressed intelligence, to their achievements. You cannot just tweak one part of this system, of this brain-body-environment system, and expect the capabilities that emerge out of this system to just, you know, explode exponentially, because any time you improve one part of a system with many interdependencies like this, there's a new bottleneck that arises, right? And I don't think even today, for very smart people, their brain is the bottleneck to the sort of problems they can solve. In fact, many very smart people today, you know, are not actually solving any big scientific problems. They're like Einstein, but, you know, in the patent clerk days. Einstein became Einstein because this was a meeting of a genius with a big problem at the right time, but maybe this meeting could have never happened, and then Einstein would have just been a patent clerk. And in fact, many people today are probably genius-level smart, but you wouldn't know, because they're not actually expressing any of it.

That's brilliant. So
we can think of the world, Earth, but also the universe, as just a space of problems. All these problems and tasks are roaming it, of various difficulty, and there are agents, creatures like ourselves and animals and so on, that are also roaming it. And then you get coupled with a problem, and then you solve it. But without that coupling, you can't demonstrate your quote-unquote intelligence.

Exactly. Intelligence is the meeting of great problem-solving capabilities with a great problem, and if you don't have the problem, you don't really express any intelligence. All you're left with is potential intelligence, like the performance of your brain, or, you know, how high your IQ is, which in itself is just a number.
So you mentioned problem-solving capacity. What do you think of as problem-solving capacity? Can you try to define intelligence? What does it mean to be more or less intelligent? Is it completely coupled to a particular problem, or is there something a little bit more universal?

Yeah, I do believe all intelligence is specialized intelligence. Even human intelligence has some degree of generality. Well, all intelligent systems have some degree of generality, but they're always specialized in one category of problems. So human intelligence is specialized in the human experience, and that shows at various levels. That shows in some prior knowledge that's innate, that we have at birth: knowledge about things like agents, goal-driven behavior, visual priors about what makes an object, priors about time, and so on. That shows also in the way we learn; for instance, it's very, very fast for us to pick up language, it's very, very easy for us to learn certain things, because we are basically hard-coded to learn them. And we are specialized in solving certain kinds of problems, and we are quite useless when it comes to other kinds of problems. For instance, we are not really designed to handle very long-term problems. We have no capability of seeing the very long term. We don't have that much working memory, you know.

So how
do you think about long-term? Using long-term planning, are we talking about a scale of years, millennia? What do you mean by long-term we're not very good at?

Well, human intelligence is specialized in the human experience, and the human experience is very short. One lifetime is short. Even within one lifetime, we have a very hard time envisioning, you know, things on a scale of years. It's very difficult to project yourself at the scale of five years, at the scale of ten years, and so on. We can solve only fairly narrowly scoped problems. So when it comes to solving bigger problems, larger-scale problems, we are not actually doing it on an individual level. It's not actually our brain doing it. We have this thing called civilization, right, which is itself a sort of problem-solving system, a sort of artificially intelligent system, and it's not running on one brain; it's running on a network of brains. In fact, it's running on much more than a network of brains: it's running on a lot of infrastructure, like books and computers and the internet and human institutions and so on. And that is capable of handling problems on a much greater scale than any individual human. If you look at computer science, for instance, that's an institution that solves problems, and it is superhuman, right? It operates on a greater scale; it can solve much bigger problems than an individual human could. And science itself, science as a system, as an institution, is a kind of artificially intelligent problem-solving algorithm that is superhuman.

Yes.
Computer science is like a theorem prover, at the scale of thousands, maybe hundreds of thousands, of human beings. At that scale, what do you think is an intelligent agent? So there are us humans at the individual level; there are millions, maybe billions, of bacteria on our skin, at the smaller scale. You can even go to the particle level, to systems that behave, you could say, intelligently in some ways. And then you can look at the Earth as a single organism, you can look at our galaxy, and even the universe, as a single organism. How do you think about scale in defining intelligent systems? And we're here at Google; there are millions of devices doing computation in a distributed way. How do you think about intelligence at that scale?

You can always
characterize anything as a system. I think people who talk about things like intelligence explosion tend to focus on one agent, which is basically one brain, one brain considered in isolation, like a brain in a jar that's controlling your body in a very top-to-bottom kind of fashion, and that body is pursuing goals in an environment. So it's a very hierarchical view: you have the brain at the top of the pyramid, then you have the body, just plainly receiving orders, and then the body is manipulating objects in the environment, and so on. So everything is subordinate to this one thing, this epicenter, which is the brain. But in real life, intelligent agents don't really work like this, right? There is no strong delimitation between the brain and the body, to start with. You have to look not just at the brain but at the nervous system; but then the nervous system and the body are not really separable, and so you have to look at an entire animal as one agent. But then you start realizing, as you observe an animal over any length of time, that a lot of the intelligence of an animal is actually externalized. That's especially true for humans: a lot of our intelligence is externalized. When you write down some notes, that is externalized intelligence. When you write a computer program, you are externalizing cognition. So intelligence is externalized in books, it's externalized in computers, the internet, in other humans; it's externalized in language, and so on. So there is no hard delimitation of what makes an intelligent agent; it's all about context.

Okay, but AlphaGo is
better at Go than the best human player. You know, there are levels of skill here. So do you think there is such a concept as an intelligence explosion on a specific task? Do you think it's possible to have a category of tasks on which you do have something like an exponential growth of ability to solve that particular problem?

I think if you consider a specific vertical, it's probably possible to some extent. I also don't think we have to speculate about it, because we have real-world examples of recursively self-improving intelligent systems: for instance, science. Science is clearly a
problem-solving system and knowledge-generation system, a system that experiences the world in some sense, and then gradually understands it and can act on it. And that system is superhuman, and it is clearly recursively self-improving, because science feeds into technology; technology can be used to build better tools, better computers, better instrumentation, and so on, which in turn can make science faster, right? So science is probably the closest thing we have today to a recursively self-improving superhuman AI. And you can just observe, you know: is scientific progress exploding? Which is, in itself, an interesting question. You can use that as a basis to try to understand what will happen with a superhuman AI that has a science-like behavior.

Let me linger on that a little bit more.
What is your intuition for why an intelligence explosion is not possible? Taking all the scientific revolutions, why can't we slightly accelerate that process?

So, you can absolutely accelerate any problem-solving process, so recursive self-improvement is absolutely a real thing. But what happens with a recursively self-improving system is typically not explosion, because no system exists in isolation, and so tweaking one part of the system means that suddenly another part of the system becomes a bottleneck. And if you look at science, for instance, which is clearly a recursively self-improving, clearly a problem-solving system, scientific progress is not actually exploding. If you look at science, what you see is the picture of a system that is consuming an exponentially increasing amount of resources, but that is having a linear output in terms of scientific progress. And maybe that will seem like a very strong claim. Many people are actually saying that, you know, scientific progress is exponential, but when they're claiming this, they are actually looking at indicators of resource consumption, resource consumption by science: the number of papers being published, the number of patents being filed, and so on, which are just completely correlated with how many people are working on science today.

Yeah.

Right, so it's
actually an indicator of resource consumption. But what you should look at is the output: progress in terms of the knowledge that science generates, in terms of the scope and significance of the problems that we solve. And some people have actually been trying to measure that, like Michael Nielsen, for instance. He had a very nice paper, I think that was last year, about it. His approach to measuring scientific progress was to look at the timeline of scientific discoveries over the past, you know, 100, 150 years, and for each major discovery, ask a panel of experts to rate the significance of the discovery. And if the output of science as an institution were exponential, you would expect the temporal density of significance to go up exponentially, maybe because there's a faster rate of discoveries, maybe because the discoveries are, you know, increasingly more important. And what actually happens, if you plot this temporal density of significance measured in this way, is that you see very much a flat graph. You see a flat graph across all disciplines, across physics, biology, medicine, and so on. And it actually makes a lot of sense
if you think about it, because think about the progress of physics 110 years ago, right? It was a time of crazy change. Think about the progress of technology, you know, 160 years ago, when we started, you know, replacing horses with cars, when we started having electricity and so on. It was a time of incredible change. And today is also a time of very fast change, but it would be an unfair characterization to say that today technology and science are moving way faster than they did 50 years ago, 100 years ago. And if you do try to rigorously plot the temporal density of significance, of significant ideas, sorry, you do see very flat curves.

That's fascinating.

And you can check out the paper that Michael Nielsen wrote about this idea. And so the way I interpret it is: as
you make progress in a given field, or in a given subfield of science, it becomes exponentially more difficult to make further progress. Like, the very first person to work on information theory: if you enter a new field, and it's still the very early years, there's a lot of low-hanging fruit you can pick.

That's right, yeah.

But the next generation of researchers is going to have to dig much harder, actually, to make smaller discoveries, a probably larger number of smaller discoveries, and to achieve the same amount of impact, you're going to need a much greater headcount.
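This dynamic can be captured in a toy model (my own illustration, not something from the conversation): feed the system exponentially growing resources, but let the marginal cost of each unit of progress also grow exponentially with progress already made (the low-hanging fruit is gone), and cumulative progress comes out linear.

```python
import math

def progress(t, r=0.1, c=0.1):
    """Cumulative progress p(t) solving dp/dt = resources(t) / cost(p),
    with resources(t) = exp(r*t), an exponentially growing input, and
    cost(p) = exp(c*p), exponentially rising marginal cost of progress.
    Closed form: p(t) = (1/c) * ln(1 + (c/r) * (exp(r*t) - 1))."""
    return math.log(1.0 + (c / r) * (math.exp(r * t) - 1.0)) / c
```

For large t this grows like (r/c) * t: linear output despite exponential input. With r = c, p(t) is exactly t, so doubling the time horizon doubles progress while the resource input grows by another factor of e^(r*t).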
And that's exactly the picture you're seeing with science: the number of scientists and engineers is, in fact, increasing exponentially; the amount of computational resources available to science is increasing exponentially; and so on. So the resource consumption of science is exponential, but the output in terms of progress, in terms of significance, is linear. And the reason why is because, even though science is recursively self-improving, meaning that scientific progress turns into technological progress, which in turn helps science (if you look at computers, for instance, they are a product of science, and computers are tremendously useful in speeding up science; the internet, same thing: the internet is a technology that's made possible by recent scientific advances, and itself, because it enables, you know, scientists to network, to communicate, to exchange papers and ideas much faster, it is a way to speed up scientific progress), even though you're looking at a recursively self-improving system, it is consuming exponentially more resources to produce the same amount of problem-solving.
So that's a fascinating way to paint it, and certainly that holds for the deep learning community, right? If you look at the temporal, what did you call it, the temporal density of significant ideas, if you look at that in deep learning... I'd have to think about that, but if you really look at significant ideas in deep learning, they might even be decreasing.

So I do believe the per-paper significance is decreasing, but the amount of papers is still today exponentially increasing. So I think if you look at the aggregate, my guess is that you would see linear progress. If you were to sum the significance of all papers, you would see roughly linear progress. And in my opinion, it is not a coincidence that you're seeing linear progress in science despite exponential resource consumption. I think the resource consumption is dynamically adjusting itself to maintain linear progress,
because we, as a community, expect linear progress, meaning that if we start investing less and seeing less progress, it means that suddenly there is some low-hanging fruit that becomes available, and someone is going to step up and pick it, right?

Right. So it's very much like a market for discoveries and ideas. But there's another fundamental part which you're highlighting, which is a hypothesis: that in science, or in the space of ideas, any one path you travel down, it gets exponentially more difficult to develop new ideas along that path.

Yes.

And your sense is that that's going to hold across our mysterious universe?

Yes. Exponential progress triggers exponential friction, so that if you tweak one part of a system, suddenly some other part becomes a bottleneck. For
instance, let's say you develop some device that measures its own acceleration, and then it has some engine, and it outputs even more acceleration in proportion to its own acceleration, and you drop it somewhere. It's not going to reach infinite speed, because it exists in a certain context: the air around it is going to generate friction, and it's going to, you know, block it at some top speed. And even if you were to consider the broader context and lift that bottleneck, like the bottleneck of air friction, then some other part of the system would start stepping in and creating exponential friction, maybe the speed of light, or, you know, whatever.
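The intuition behind the device analogy can be simulated in a few lines. This is my own toy, with made-up constants: the "engine" term grows in proportion to the current speed (the self-amplifying part), while a quadratic friction term grows faster and eventually caps it.

```python
def terminal_speed(a=1.0, k=0.01, v0=0.1, dt=0.001, steps=20000):
    """Integrate dv/dt = a*v - k*v**2 with forward Euler.
    a*v is the self-amplifying engine; k*v**2 is air friction.
    Growth looks exponential at first, then flattens at v = a/k."""
    v = v0
    for _ in range(steps):
        v += (a * v - k * v * v) * dt
    return v
```

Despite the exponential a*v term, the trajectory plateaus at a/k (here 100): lifting one friction bottleneck just means some other term eventually plays that role.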
And it's definitely true when you look at the problem-solving algorithm that is being run by science as an institution, science as a system. As you make more and more progress, despite this recursive self-improvement component, you are encountering exponential friction. The more researchers you have working on different ideas, the more overhead you have in communication across researchers. If you look at, you were mentioning quantum mechanics, right? Well, if you want to start making significant discoveries today, significant progress in quantum mechanics, there is an amount of knowledge you have to ingest which is huge. So there is a very large overhead to even start to contribute; there is a large amount of overhead to synchronize across researchers and so on. And of course, the significant practical experiments are going to require exponentially expensive equipment,
because the easier ones have already been run.

Right. So in your sense, is there no way of escaping this kind of friction with artificial intelligence systems?

Yeah, no, I think science is a very good way to model what would happen with a superhuman, recursively self-improving AI.

That's your intuition. I mean, that's my intuition too. It's not like a mathematical proof of anything.

That's not my point. I'm not trying to prove anything. I'm just trying to make an argument to question the narrative of intelligence explosion, which is quite a dominant narrative, and you do get a lot of pushback if you go against it. Because, for many people, right, AI is not just a subfield of computer science; it's more like a belief system, this belief that the world is headed towards an event, the singularity, past which, you know, AI will go exponential, very much, and the world will be transformed, and humans will become obsolete. And if you go against this narrative, because it is not really a scientific argument but more of a belief system, and it is part of the identity of many people, if you go against this narrative, it's like you're attacking the identity of the people who believe in it. It's almost like saying God doesn't exist, or something. So you do get a lot of pushback if you try to question these ideas.
First of all, I believe most people, they might not be as eloquent or explicit as you're being, but most people in computer science, and most people who have actually built anything that you could call AI, quote-unquote, would agree with you. They might not be describing it in the same kind of way. It's more that the pushback you're getting is from people who are attached to the narrative, not from a place of science, but from a place of imagination.

That's correct, that's correct.

So why do you think that's so appealing? Because the usual dreams that people have, when they imagine creating a superintelligent system past the singularity, are somehow always destructive. If you were to put on your psychology hat, why is it so appealing to imagine the ways that all of human civilization will be destroyed?

I think it's a good story, you know? It's a good story, and very interestingly, it mirrors religious stories, right, religious mythology. If you look at the mythology of most civilizations, it's about the world being headed towards some final event, in which the world will be destroyed and some new world order will arise, that will be mostly spiritual, like the apocalypse followed by a paradise, probably. It's a very appealing story on a fundamental level, and we all need stories. We all need stories to structure the way we see the world, especially at timescales that are beyond our ability to make predictions.

Right. So, on a more serious, non-exponential-explosion question: do you think there will be a
time when we'll create something like human-level intelligence, or intelligent systems that will make you sit back and be just surprised at, damn, how smart this thing is? That doesn't require exponential growth or an exponential improvement. But what's your sense of the timeline, and so on, where you'll be really surprised at certain capabilities? And we'll talk about limitations of deep learning. So when do you think, in your lifetime, you'll be really damn surprised?

Around 2013, 2014, I was many times surprised by the capabilities of deep learning, actually. That was before we had assessed exactly what deep learning could and could not do, and it felt like a time of immense potential. And then we started, you know, narrowing it down, but I was very surprised. So it
has already happened. Was there a moment, there must have been a day, where your surprise was almost bordering on the belief in the narrative that we just discussed? Was there a moment, because you've written quite eloquently about the limits of deep learning, where you thought that maybe deep learning is limitless?

No, I don't think I've ever believed this. What was really shocking is that it worked.

That it worked at all, yes.

Yeah, but there's a big jump between being able to do really good computer vision and human-level intelligence. So I don't think at any point I was under the impression that the results we got in computer vision meant that we were very close to human-level intelligence. I don't think we're very close to human-level intelligence. I do believe that there's no reason why we won't achieve it at some point. I also believe that, you know, the problem with talking about human-level intelligence is that implicitly you are considering an axis of intelligence with different levels. But that's not really how intelligence works. Intelligence is very multidimensional, so there's the question of capabilities, but there's also the question of being human-like, and those are two very different things. You can build potentially very advanced intelligent agents that are not human-like at all, and you can also build very human-like agents, and those are two very different things.

Right, right. Let's go from the philosophical to the
practical. Can you give me a history of Keras and all the major deep learning frameworks that you kind of remember in relation to Keras, and in general: TensorFlow, Theano, the old days? Can you give a brief, Wikipedia-style overview of that history and your role in it, before we return to AGI discussions?

Yeah, that's a broad topic. So I started working on Keras, well, it wasn't called Keras at the time; I actually picked the name just the day I was going to release it. So I started working on it in February 2015, and at the time there weren't too many people working on deep learning, maybe fewer than 10,000. The software tooling was not really developed.
So the main deep learning library was Caffe, which was mostly C++.

Why do you say Caffe was the main one?

Caffe was vastly more popular than Theano in late 2014, early 2015. Caffe was the one library that everyone was using for computer vision, and computer vision was the most popular problem in deep learning at the time. Absolutely, convnets were like the subfield of deep learning that everyone was working on. So, myself, in late 2014, I was actually interested in RNNs, in recurrent neural networks, which was a very niche topic at the time; it really took off around 2016. And so I was looking for good tools. I had used Torch 7, I had used Theano, used Theano a lot in Kaggle competitions, I had used Caffe, and there was no good solution for RNNs at the time. There was no reusable open-source implementation of an LSTM, for instance. So I decided to
build my own, and at first, the pitch for that was that it was going to be mostly around LSTMs, recurrent neural networks. It was going to be in Python. An important decision at the time, that was kind of not obvious, is that the models would be defined via Python code, which was kind of going against the mainstream at the time, because Caffe, PyLearn2, and so on, like, all the big libraries, were actually going with the approach of having static configuration files in YAML to define models. So some libraries were using code to define models, like Torch 7, obviously, but that was not Python. Lasagne was a Theano-based, very early library that was, I think, developed, I don't remember exactly, probably late 2014.

In Python as well?

In Python as well. It was, like, built on top of Theano. And so I
started working on something, and the value proposition at the time was that, not only was it what I think was the first reusable open-source implementation of LSTM, you could also combine RNNs and convnets with the same library, which was not really possible before; Caffe was only doing convnets. And it was kind of easy to use, because, so, before I was using Theano, I was actually using scikit-learn, and I loved scikit-learn for its usability, so I drew a lot of inspiration from scikit-learn when I made Keras. It's almost like scikit-learn for neural networks.

Yeah, the fit function.

Exactly, the fit function: reducing a complex training loop to a single function call.
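What a single call like model.fit(x, y, epochs=..., batch_size=...) compresses is, roughly, a loop like the following. This is my own toy sketch (a linear model trained with squared loss), not Keras source code, just to show the boilerplate the one function call removes:

```python
import numpy as np

def fit(w, x, y, epochs=100, batch_size=32, lr=0.1):
    """Hand-rolled mini-batch SGD: the training loop a fit() call hides."""
    n = len(x)
    for _ in range(epochs):
        idx = np.random.permutation(n)          # reshuffle every epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            pred = x[batch] @ w                 # forward pass
            grad = 2 * x[batch].T @ (pred - y[batch]) / len(batch)  # dMSE/dw
            w = w - lr * grad                   # SGD update
    return w
```

Every user of the library would otherwise rewrite some version of this epoch/batch/forward/backward/update scaffolding themselves, which is exactly the detail-hiding being defended here.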
Right. And of course, you know, some people will say this is hiding a lot of details, but that's exactly the point.

Right, the magic is the point.

So it's magical, but in a good way: it's magical in the sense that it's delightful.

Yeah. I'm actually quite surprised; I didn't know that it was born out of a desire to implement RNNs and LSTMs. That's fascinating. So you were actually one of the first people to really try to get the major architectures together. And it's also interesting, you made me realize that it was a design decision at all, defining the model in code. Just putting myself in your shoes, with YAML, especially since Caffe was the most popular, it was the most popular by far at the time, if I were you, I don't know. I didn't like the YAML thing, but it would have made more sense to put the definition of a model in a configuration file. So that's an interesting, gutsy move, to stick with defining it in code.

If you look back, other libraries were doing it as well, but it was definitely the more niche option.

Yeah. Okay, so Keras, and then?

So I released Keras in March 2015, and it got users pretty much from the start. The deep learning community
the start so the deep learning community
was very small at the time
lots of people were starting to be
interested in the rest um so it was
gonna release it at the right time
because it was offering an easy to use
it as team implementation exactly at the
time where lots of yours started to be
intrigued by the capabilities of onin on
ins one LP so it it grew from there
then I joined Google
about six months later and that was
actually completely unrelated to took
care us actually joined a research team
working on image classification mostly
like computer vision so I was doing
computer vision research at Google
initially and immediately when I joined
Google I was exposed to the early
internal version of tensorflow
and the way to appeal to me at the time
and that was definitely the way it was
at the time is that this was an improved
version of Tiano
So I immediately knew I had to port Keras to this new TensorFlow thing. And I was actually very busy as a Noogler — a new Googler — so I had no time to work on that. But then in November — I think it was November 2015 — TensorFlow got released, and it was kind of like my wake-up call that, hey, I had to actually go and make it happen. So in December, I ported Keras to run on top of TensorFlow. But it was not exactly a port — it was more like a refactoring, where I was abstracting away all the backend functionality into one module, so that the same codebase could run on top of multiple backends — so, on top of TensorFlow or Theano. And for the next year, Theano stayed as the default option: it was, you know, somewhat easier to use, and it was much faster, especially when it came to RNNs. But eventually, you know, TensorFlow overtook it.

Right — and the early TensorFlow had similar architectural decisions as Theano, so it was a natural transition?

Yeah, absolutely.
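The backend refactoring he describes can be sketched as follows. This is a toy stand-in with invented names — not the actual Keras backend module — but it shows the idea: isolate every numerical primitive in one swappable module, so the same framework code can run on top of multiple backends.

```python
# Minimal sketch (invented names, not the actual Keras code) of the
# backend-abstraction idea: every numerical primitive lives in one
# "backend" object, so the rest of the codebase never imports
# TensorFlow or Theano directly.
import numpy as np

class NumpyBackend:
    """Stand-in backend; a real one would wrap TensorFlow or Theano ops."""
    def dot(self, a, b):
        return np.dot(a, b)
    def relu(self, x):
        return np.maximum(x, 0.0)

_BACKEND = NumpyBackend()  # chosen once, e.g. from a config file

def set_backend(backend):
    """Swap the numerical backend without touching framework code."""
    global _BACKEND
    _BACKEND = backend

# Framework code is written once, against the backend interface only.
def dense_forward(x, w):
    return _BACKEND.relu(_BACKEND.dot(x, w))

x = np.array([[1.0, -2.0]])
w = np.array([[1.0], [1.0]])
print(dense_forward(x, w))  # the same call would run on any backend
```

Switching backends then means calling something like `set_backend(SomeOtherBackend())` once at startup, while all model-building code stays unchanged.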
So at that point, Keras was still a side, almost fun, project, right?

Yeah. It was not my job assignment; I was doing it on the side. And even though it was great to have, you know, a lot of users for a deep learning library at the time — like, throughout 2016 — I wasn't doing it as my main job. So things started changing in, I think it must have been, maybe October 2016 — so one year later. Rajat, who was the lead on TensorFlow, basically showed up one day in our building while I was doing, like — so I was doing research on things like — so I did a lot of computer vision research, also collaborations with Christian Szegedy on deep learning for theorem proving; it was a really interesting research topic. And so Rajat was saying, hey, we saw Keras, we like it, we saw that you're at Google — why don't you come over for, like, a quarter and work with us? I was like, yeah, that sounds like a great opportunity — let's do it. And so I started working on integrating the Keras API into TensorFlow more tightly. What followed was a sort of temporary, TensorFlow-only version of Keras that was in tf.contrib for a while, and it finally moved to TensorFlow core. And, you know, I've never actually gotten back to my old team doing research.
Well, it's kind of funny that somebody like you, who dreams of — or at least sees the power of — AI systems that reason, and theorem proving, which we'll talk about, has also created a system that makes the most basic kind of Lego building that is deep learning super accessible, super easy — so beautifully so. That's the funny irony, that you're responsible for both things. But so, TensorFlow 2.0 — there's a sprint, I don't know how long it'll take, but there's a sprint towards the finish. What are you working on these days? What are you excited about? What are you excited about in 2.0? I mean, eager execution — there are so many things that just make it a lot easier. What are you excited about, and what's also really hard? What are the problems you have to kind of solve?
So I've spent the past year and a half working on TensorFlow 2, and it's been a long journey. I'm actually extremely excited about it. I think it's a great product — it's a delightful product compared to TensorFlow 1. We've made huge progress. So on the Keras side, what I'm really excited about is that, you know, previously Keras has been this very easy-to-use, high-level interface to do deep learning. But if you wanted a lot of flexibility, the Keras framework, you know, was probably not the optimal way to do things, compared to just writing everything from scratch. So in some way, the framework was getting in the way.
And in TensorFlow 2, you don't have this at all, actually. You have the usability of the high-level interface, but you have the flexibility of the lower-level interface, and you have this spectrum of workflows where you can get more or less usability-and-flexibility trade-offs, depending on your needs. You can write everything from scratch, and you get a lot of help doing so by, you know, subclassing models and writing custom training loops using eager execution. It's very flexible, it's very easy to debug, it's very powerful. But all of this integrates seamlessly with higher-level features, up to, you know, the classic Keras workflows, which are very scikit-learn-like and, you know, are ideal for a data-scientist or machine-learning-engineer type of profile. So now you can have the same framework, offering the same set of APIs, that enables a spectrum of workflows — more or less low-level, more or less high-level — that are suitable for, you know, profiles ranging from researchers to data scientists and everything in between.
Yeah. So that's super exciting. I mean, it's not just that — it's connected to all kinds of tooling: you can go on mobile with TensorFlow Lite, you can go in the cloud with serving, and so on, and it's all connected together. Now, some of the best software ever written is often done by one person, sometimes two. But with Google, you're now seeing Keras having to be integrated into TensorFlow. I'm sure there's a ton of engineers working on it, and I'm sure there are a lot of tricky design decisions to be made. How does that process usually happen, from at least your perspective? What are the debates like? Is there a lot of thinking, considering different options, and so on?

Yes. So a lot of the time I spend at Google is actually on design discussions, right — writing design docs, participating in design review meetings, and so on. This is, you know, as important as actually writing the code.

Right — so there's a lot of thought —

There's a lot of thought, and a lot of care, taken in coming up with these decisions, and taking into account all of our users — because TensorFlow has this extremely diverse user base, right? It's not like just one user segment where everyone has the same needs. We have small-scale production users, large-scale production users, we have startups, we have researchers — you know, it's all over the place — and we have to cater to all of their needs.

If I just look at the standards debates over, say, C++ or Python, there are some heated debates. Do you have those at Google? I mean, they're not heated in terms of emotion, but there are probably multiple ways to do it, right? So how do you arrive, through those design meetings, at the best way to do it — especially in deep learning, where the field is evolving as you're doing it?
Is there some magic to it? Is there some magic to the process?

I don't know if there's magic to the process, but there definitely is a process. So, making design decisions is about satisfying a set of constraints, but also trying to do so in the simplest way possible, because this is what can be maintained, and this is what can be expanded in the future. So you don't want to naively satisfy the constraints by just, you know, coming up with one new argument, one new API, for each capability you need available. You want to design APIs that are modular and hierarchical, so that they have an API surface that is as small as possible, right? And you want this modular, hierarchical architecture to reflect the way that domain experts think about the problem. Because, as a domain expert, when you're reading about a new API — you're reading a tutorial or some docs pages — you already have a way that you're thinking about the problem. You already have certain concepts in mind, and you're thinking about how they relate together. And when you're reading docs, you're trying to build, as quickly as possible, a mapping between the concepts featured in the new API and the concepts in your mind. So you're trying to map your mental model as a domain expert to the way things work in the API. So you need an API — and an underlying implementation — that reflect the way people think about these things.

So, minimizing the time it takes to do this mapping?

Yes — minimizing the time, the cognitive load there is in ingesting this new knowledge about your API. An API should not be self-referential, or referring to implementation details; it should only be referring to domain-specific concepts that people already understand.
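As a hypothetical illustration of that principle (every name here is invented for the example), compare a monolithic function that grows one keyword argument per capability with a small, modular, hierarchical API whose pieces mirror the concepts a practitioner already has in mind:

```python
# Hypothetical illustration (all names invented, not from any real
# library) of modular, hierarchical API design.

# Anti-pattern: every new capability becomes another keyword argument,
# so the API surface grows without structure.
def train_model_monolithic(data, layers=2, units=64, optimizer="sgd",
                           lr=0.01, dropout=0.0, batch_norm=False):
    ...

# Modular alternative: a small surface of composable concepts (layer,
# optimizer, model) that map directly onto the reader's mental model.
class Dense:
    def __init__(self, units):
        self.units = units

class Dropout:
    def __init__(self, rate):
        self.rate = rate

class SGD:
    def __init__(self, lr=0.01):
        self.lr = lr

class Sequential:
    def __init__(self, layers):
        self.layers = list(layers)
    def compile(self, optimizer):
        self.optimizer = optimizer
        return self

model = Sequential([Dense(64), Dropout(0.2), Dense(1)]).compile(SGD(lr=0.1))
print(len(model.layers), model.optimizer.lr)  # → 3 0.1
```

New capabilities then become new composable pieces rather than new flags, keeping each individual concept's API surface small.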
Brilliant. So what does the future of Keras and TensorFlow look like? What does TensorFlow 3.0 look like?

So that's kind of too far in the future for me to answer, especially since I'm now not even the one making these decisions. But from my perspective — which is, you know, just one perspective among many different perspectives on the TensorFlow team — I'm really excited by developing even higher-level APIs, higher-level than Keras. I'm really excited by hyperparameter tuning, by automated machine learning — AutoML. I think the future is not just, you know, defining a model like you were assembling Lego blocks and then clicking fit on it. It's more like an automagical model that will just look at your data and optimize the objective you're after, right? So that's what I'm looking into.

Yeah. So you put the baby into a room with the problem, and come back a few hours later with a fully solved problem?

Exactly. It's not like a box of Legos, right — it's more like the combination of a kid that's pretty good at Legos, and a box of Legos. It's just building the thing.
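The hyperparameter-tuning side of that can be sketched with plain random search — no particular AutoML library is implied — where the system, rather than the user, picks the configuration by evaluating how well each candidate optimizes the objective:

```python
# Minimal sketch of the hyperparameter-tuning idea behind AutoML
# (plain random search, not any specific AutoML library): the system,
# not the user, picks the configuration by measuring how well each
# candidate optimizes the objective.
import random

def objective(lr, steps=100):
    """Toy training run: minimize f(w) = (w - 3)^2 by gradient descent."""
    w = 0.0
    for _ in range(steps):
        w -= lr * 2 * (w - 3.0)
    return (w - 3.0) ** 2  # final loss: lower is better

random.seed(0)
best_lr, best_loss = None, float("inf")
for _ in range(20):                   # try 20 random configurations
    lr = 10 ** random.uniform(-4, 0)  # log-uniform sample in [1e-4, 1]
    loss = objective(lr)
    if loss < best_loss:
        best_lr, best_loss = lr, loss

print(best_loss < 1e-6)  # True: a good learning rate was found automatically
```

Real AutoML systems search far richer spaces (architectures, not just scalars) with smarter strategies than random sampling, but the loop — propose, evaluate, keep the best — is the same.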
Very nice. So that's an exciting future, and I think there's a huge amount of applications, and revolutions to be had, under the constraints of the discussion we previously had. But what do you think are the current limits of deep learning, if we look specifically at these function approximators that try to generalize from data? You've talked about local versus extreme generalization. You've mentioned that neural networks don't generalize well and humans do, so there's this gap. And you've also mentioned that extreme generalization requires something like reasoning to fill those gaps. So how can we start trying to build systems like that?
All right, yes. So this is by design, right? Deep learning models are, like, huge parametric models — differentiable, so, continuous — that go from an input space to an output space, and they're trained with gradient descent, so they're trained pretty much point by point. They are learning a continuous geometric morphing from an input vector space to an output vector space, right? And because this is done point by point, a deep neural network can only make sense of points in experience space that are very close to things that it has already seen in the training data. At best, it can do interpolation across points. But that means, you know, that in order to train your network, you need a dense sampling of the input-cross-output space — almost a point-by-point sampling — which can be very expensive if you're dealing with complex real-world problems, like autonomous driving, for instance, or robotics. It's doable if you're looking at a subset of the visual space, but even then, it's still fairly expensive — you still need millions of examples — and it's only going to be able to make sense of things that are very close to what it has seen before.
And in contrast to that — well, of course we have human intelligence. But even if you're not looking at human intelligence, you can look at very simple rules, algorithms. If you have a symbolic rule, it can actually apply to a very, very large set of inputs, because it is abstract — it is not obtained by doing a point-by-point mapping. For instance, if you try to learn a sorting algorithm using a deep neural network, well, you're very much limited to learning, point by point, what the sorted representation of a specific list looks like. But instead, you could have a very simple sorting algorithm written in a few lines — maybe it's just, you know, two nested loops — and it can process any list at all, because it is abstract, because it is a set of rules. So deep learning is really like point-by-point geometric morphings, trained with gradient descent. Meanwhile, abstract rules can generalize much better, and I think the future is really to combine the two.
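The "few lines, maybe two nested loops" he mentions can be made concrete — here as a selection sort, one plausible instance of such an algorithm:

```python
# The "two nested loops" sorting algorithm alluded to above (a simple
# selection sort). Unlike a network trained point by point on example
# lists, these few abstract rules apply to any list of any length.
def selection_sort(items):
    items = list(items)                     # work on a copy
    for i in range(len(items)):             # outer loop: position to fill
        for j in range(i + 1, len(items)):  # inner loop: find a smaller element
            if items[j] < items[i]:
                items[i], items[j] = items[j], items[i]
    return items

print(selection_sort([3, 1, 4, 1, 5]))  # [1, 1, 3, 4, 5]
print(selection_sort([]))               # works on inputs never "seen": []
```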
So how do you think we combine the two? How do we combine good point-by-point functions with programs, which is what symbolic AI type systems are? At which level does the combination happen? And, you know, obviously we're jumping into a realm where there are no good answers — just kind of ideas and intuitions, and so on.

Well, if you look at the really successful AI systems today, I think they are already hybrid systems that are combining symbolic AI with deep learning. For instance, successful robotics systems are already mostly model-based, rule-based — things like planning algorithms and so on. At the same time, they're using deep learning as perception modules; sometimes they're using deep learning as a way to inject fuzzy intuition into a rule-based process. If you look at a system like a self-driving car, it's not just one big end-to-end neural network — you know, that wouldn't work at all — precisely because, in order to train that, you would need a dense sampling of experience space when it comes to driving, which is completely unrealistic, obviously. Instead, the self-driving car is mostly symbolic — you know, it's software, it's programmed by hand, it's mostly based on explicit models, in this case mostly 3D models of the environment around the car — but it's interfacing with the real world using deep learning modules.

Right. So the deep learning there serves as the way to convert the raw sensory information into something usable by the symbolic systems.
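A heavily simplified sketch of that hybrid structure — every name here is invented, and nothing like a real self-driving stack — a learned perception module turns raw sensor data into symbols, and hand-written rules act on those symbols:

```python
# Hedged toy sketch (all names invented) of the hybrid architecture
# described above: a learned perception module converts raw sensor
# data into symbols, and a hand-written rule-based planner acts on
# those symbols. No real self-driving stack is this simple.

def perceive(raw_pixels):
    """Stand-in for a neural perception module: raw input -> symbols.
    Here a trivial threshold plays the role of the trained network."""
    return {"obstacle_ahead": sum(raw_pixels) > 2.0}

def plan(world_state):
    """Symbolic, hand-programmed rules operating on perception output."""
    if world_state["obstacle_ahead"]:
        return "brake"
    return "keep_lane"

# Deep learning converts raw sensory data into something the symbolic
# layer can use; the rules themselves stay explicit and inspectable.
print(plan(perceive([0.9, 0.8, 0.7, 0.2])))  # brake
print(plan(perceive([0.1, 0.0, 0.2, 0.1])))  # keep_lane
```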
Okay, well, let's linger on that a little more. So, dense sampling from input to output — you said it's obviously very difficult. Is it possible?

In the case of self-driving, you mean?

Let's say self-driving. Self-driving is hard for many people. But let's not even talk about self-driving — let's talk about steering. So, staying inside the lane: lane following.

Yeah, that's definitely a problem you can solve with an end-to-end deep learning model, but that's, like, one small subset —

Hold on a second. Yeah, I don't like how you're jumping from one extreme so easily, because I disagree with you on that. I think — well, it's not obvious to me that you can solve lane following.

No, it's not obvious. I think it's doable. I think in general, you know, there are no hard limitations to what you can learn with a deep neural network, as long as the search space is rich enough, is flexible enough, and as long as you have this dense sampling of the input-cross-output space. The problem is that, you know, this dense sampling could mean anything from 10,000 examples to, like, trillions and trillions.

So that's my question. What's your intuition? And if you could just give it a chance and think: what kind of problems can be solved by getting huge amounts of data and thereby creating a dense mapping? So let's think about natural language dialogue — the Turing test. Do you think the Turing test can be solved with a neural network alone?

Well, the Turing test is all about tricking people into believing that they're talking to a human. I don't think that's actually very difficult, because it's more about exploiting human perception, and not so much about intelligence. There's a big difference between mimicking intelligent behavior and actual intelligent behavior.
So, okay — let's look at maybe the Alexa Prize and so on: the different formulations of natural language conversation that are less about mimicking and more about maintaining a fun conversation that lasts for 20 minutes. That's a little less about mimicking, and that's more about — I mean, it's still mimicking, but it's more about being able to carry forward a conversation with all the tangents that happen in dialogue, and so on. Do you think that problem is learnable with this kind of neural network that does the point-to-point mapping?

So, I think it would be very, very challenging to do this with deep learning. I don't think it's out of the question either; I wouldn't rule it out.

The space of problems that can be solved with a large neural network — what's your sense about the space of those problems? So, useful problems for us.

In theory, it's infinite, right — you can solve any problem. In practice, well, deep learning is a great fit for perception problems — in general, any problem which is not amenable to explicit handcrafted rules, or rules that you can generate via exhaustive search over some program space.
So: perception, artificial intuition — as long as you have sufficient training data there?

And that's the question. I mean, in perception there's interpretation and understanding of the scene, which seems to be outside the reach of current perception systems. So do you think larger networks will be able to start to understand the physics of the scene — the three-dimensional structure, the relationships of objects in the scene, and so on — or is that really where symbolic AI has to step in?

Well, it's always possible to solve these problems with deep learning; it's just extremely inefficient. An explicit, rule-based, abstract model would be a far more efficient, far better, more compressed representation of physics than learning just this mapping of "in this situation, this thing happens; if you change the situation slightly, then this other thing happens," and so on.
Do you think it's possible to automatically generate the programs that would require that kind of reasoning, or does it have to — so, the way expert systems failed: there were so many facts about the world that had to be hand-coded in. Do you think it's possible to learn those logical statements that are true about the world, and their relationships? I mean, that's kind of what theorem proving, at a basic level, is trying to do, right?

Yeah — except it's much harder to formalize statements about the world, compared to formalizing mathematical statements. Statements about the world, you know, tend to be subjective. So, can you learn rule-based models? Yes, definitely. That's the field of program synthesis. However, today we just don't really know how to do it, so it's very much a grassroots topic of research, and we are limited to, you know, the sort of brute-force search algorithms that we have today. Personally, I think genetic algorithms are very promising.

So, like, genetic programming?

Genetic programming, exactly.

Can you discuss the field of program synthesis — like, how many people are working on it and thinking about it? Where are we in the history of program synthesis? What are your
hopes f
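The brute-force end of program synthesis he refers to can be illustrated with a toy search — invented expression templates, nothing like a real system — that enumerates tiny programs until one is consistent with all input-output examples:

```python
# Hedged toy sketch of the brute-force end of program synthesis:
# enumerate tiny programs (arithmetic templates over x) and keep the
# first one consistent with every input-output example. Real program
# synthesis and genetic programming search far richer program spaces,
# far more cleverly, than this exhaustive loop.
import itertools

OPS = {
    "x+c": lambda x, c: x + c,
    "x*c": lambda x, c: x * c,
    "x*x+c": lambda x, c: x * x + c,
}

def synthesize(examples):
    """Return (template, constant) matching every (input, output) pair."""
    for (name, fn), c in itertools.product(OPS.items(), range(-10, 11)):
        if all(fn(x, c) == y for x, y in examples):
            return name, c
    return None  # no program in this tiny space fits the examples

# The examples implicitly specify the program f(x) = x*x + 1:
print(synthesize([(0, 1), (2, 5), (3, 10)]))  # ('x*x+c', 1)
```

Unlike a network fit point by point, the recovered program applies to every input, not just inputs near the examples.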