François Chollet: Measures of Intelligence | Lex Fridman Podcast #120
PUAdj3w3wO4 • 2020-08-31
The following is a conversation with François Chollet, his second time on the podcast. He's both a world-class engineer and a philosopher in the realm of deep learning and artificial intelligence. This time we talk a lot about his paper titled "On the Measure of Intelligence," which discusses how we might define and measure general intelligence in our computing machinery.
Quick summary of the sponsors: Babbel, Masterclass, and Cash App. Click the sponsor links in the description to get a discount and to support this podcast.

As a side note, let me say that the serious, rigorous, scientific study of artificial general intelligence is a rare thing. The mainstream machine learning community works on very narrow AI with very narrow benchmarks. This is very good for incremental, and sometimes big incremental, progress. On the other hand, the outside-the-mainstream, renegade, you could say, AGI community works on approaches that verge on the philosophical and even the literary, without big public benchmarks. Those who walk the line between the two worlds are a rare breed, but it doesn't have to be that way. I ran the AGI series at MIT as an attempt to inspire more people to walk this line. DeepMind and OpenAI, for a time, and still on occasion, walk this line. François Chollet does as well. I hope to also. It's a beautiful dream to work towards and to make real one day.

If you enjoy this thing, subscribe on YouTube, review it with five stars on Apple Podcasts, follow on Spotify, support on Patreon, or connect with me on Twitter at lexfridman. As usual, I'll do a few minutes of ads now and no ads in the middle. I try to make these interesting, but I give you timestamps so you can skip. But still, please do check out the sponsors by clicking the links in the description. It's the best way to support this podcast.
This show is sponsored by Babbel, an app and website that gets you speaking in a new language within weeks. Go to babbel.com and use code LEX to get three months free. They offer 14 languages, including Spanish, French, Italian, German, and yes, Russian. Daily lessons are 10 to 15 minutes, super easy, effective, designed by over 100 language experts. Let me read a few lines from the Russian poem by Alexander Blok that you'll start to understand if you sign up to Babbel. [Lex reads a few lines in Russian.] Now, I say that you'll start to understand this poem, because Russian starts with the language and ends with the vodka. Now, the latter part is definitely not endorsed or provided by Babbel, and it will probably lose me this sponsorship, although it hasn't yet. But once you graduate with Babbel, you can enroll in my advanced course of late-night Russian conversation over vodka. No app for that yet. So get started by visiting babbel.com and use code LEX to get three months free.
This show is also sponsored by Masterclass. Sign up at masterclass.com/lex to get a discount and to support this podcast. When I first heard about Masterclass, I thought it was too good to be true. I still think it's too good to be true. For $180 a year, you get an all-access pass to watch courses from, to list some of my favorites: Chris Hadfield on space exploration, hope to have him on this podcast one day; Neil deGrasse Tyson on scientific thinking and communication, Neil too; Will Wright, creator of SimCity and The Sims, on game design; Carlos Santana on guitar; Garry Kasparov on chess; Daniel Negreanu on poker; and many more. Chris Hadfield explaining how rockets work and the experience of being launched into space alone is worth the money. By the way, you can watch it on basically any device. Once again, sign up at masterclass.com/lex to get a discount and to support this podcast.

This show, finally, is presented by Cash App, the number one finance app in the App Store. When you get it, use code LEXPODCAST. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as one dollar. Since Cash App allows you to send and receive money digitally, let me mention a surprising fact related to physical money: of all the currency in the world, roughly 8 percent of it is actually physical money. The other 92 percent of the money only exists digitally, and that's only going to increase. So again, if you get Cash App from the App Store or Google Play and use code LEXPODCAST, you get ten bucks, and Cash App will also donate ten dollars to FIRST, an organization that is helping to advance robotics and STEM education for young people around the world.
And now, here's my conversation with François Chollet.

What philosophers, thinkers, or ideas had a big impact on you, growing up and today?

So, one author that had a big impact on me when I read his books as a teenager was Jean Piaget, a Swiss psychologist who is considered to be the father of developmental psychology. He has a large body of work about, basically, how intelligence develops in children. It's really old work; most of it is from the 1930s and 1940s, so it's not quite up to date. It's actually been superseded by many newer developments in developmental psychology. But to me it was very interesting, very striking, and it actually shaped the early ways in which I started thinking about the mind and the development of intelligence as a teenager.

His actual ideas, or the way he thought about it, or just the fact that you could think about the developing mind at all?

I guess both. Piaget is the author that introduced me to the notion that intelligence and the mind are something that you construct throughout your life, and that children construct it in stages. I thought that was a very interesting idea, which is, of course, very relevant to AI, to building artificial minds.

Another book that I read around the same time, and that had a big impact on me, and there was actually a little bit of overlap with Piaget as well, is Jeff Hawkins' "On Intelligence," which is a classic. He has this vision of the mind as a multi-scale hierarchy of temporal prediction modules. These ideas really resonated with me, like the notion of a modular hierarchy of, potentially, compression functions or prediction functions. I thought it was really interesting, and it reshaped the way I started thinking about how to build minds.

The hierarchical nature, which aspect? Also, he's a neuroscientist, so he was thinking about actual...

Yes, he's basically talking about how our mind works. The notion that cognition is prediction was an idea that was kind of new to me at the time, and that I really loved. And the notion that there are multiple scales of processing in the brain.

The hierarchy, yes. This is before deep learning. These ideas of hierarchies have been around for a long time, even before "On Intelligence"?

I mean, they've been around since the 1980s. And yeah, that was before deep learning, but of course, I think these ideas really found their practical implementation in deep learning.

What about the memory side of things?
I think he was talking about knowledge representation. Do you think about memory a lot? One way you can think of neural networks is as a kind of memory, you're memorizing things, but it doesn't seem to be the kind of memory that's in our brains. It doesn't have the same rich complexity, the long-term nature, that's in our brains.

Yes. The brain is more of a sparse-access memory, so that you can actually retrieve very precisely bits of your experience.

The retrieval aspect. You can introspect, you can ask yourself questions.

Yes, you can program your own memory, and language is actually the tool you use to do that. I think language is a kind of operating system for the mind, and one of the uses of language is as a query that you run over your own memory. You use words as keys to retrieve specific experiences, specific concepts, specific thoughts. Language is the way you store thoughts, not just in writing, in the physical world, but also in your own mind. And it's also how you retrieve them. Imagine if you didn't have language: then you would not really have a self-internally-triggered way of retrieving past thoughts. You would have to rely on external experiences. For instance, you see a specific sight, you smell a specific smell, and it brings up memories. But you would not have a way to deliberately access these memories without language.

Well, the interesting thing you mentioned is you can also program the memory. You can change it, probably, with language.

Yeah, using language, yes.
Let me ask you a Chomsky question, which is: first of all, do you think language is, like, fundamental? There's turtles... what's at the bottom of the turtles? It can't be turtles all the way down. Is language at the bottom of cognition, of everything? Is language the fundamental aspect of what it means to be a thinking thing?

No, I don't think so.

You disagree with Noam Chomsky?

Yes. Language is a layer on top of cognition. So it is fundamental to cognition in the sense that, to use a computing metaphor, I see language as the operating system of the brain, of the human mind. And the operating system, you know, is a layer on top of the computer. The computer exists before the operating system, but the operating system is how you make it truly useful.

And the operating system is most likely Windows, not Linux, because language is messy.

Yeah, it's messy, and it's pretty difficult to inspect it, introspect it.

How do you think about language? We use human-interpretable language, but is there something deeper, closer to, like, logical types of statements? What is the nature of language? Is there something deeper than the syntactic rules we construct, something that doesn't require utterances or writing and so on?

Are you asking about the possibility that there could exist languages for thinking that are not made of words?

Yeah.

I think so, I think so.
The mind is layers, right? And language is almost like the outermost, the uppermost layer. But before we think in words, I think we think in terms of emotion and space, and we think in terms of physical actions. And I think babies, in particular, probably express their thoughts in terms of the actions that they've seen or that they can perform, and in terms of the motions of objects in their environment, before they start thinking in terms of words.

It's amazing to think about that as the building blocks of language. So the kinds of actions, and the ways babies see the world, are more fundamental than the beautiful Shakespearean language you construct on top of it. And we probably don't have any idea what that looks like, right? Which is important for trying to engineer it into AI systems.
I think visual analogies and motion are a fundamental building block of the mind, and you actually see it reflected in language. Language is full of spatial metaphors. And when you think about things, I consider myself very much a visual thinker, you often express your thoughts by using things like visualizing concepts in 2D space, or you solve problems by imagining yourself navigating a concept space. I don't know if you have this sort of experience.

You said visualizing concept space. So I certainly visualize mathematical concepts, but you mean in concept space, visually, you're embedding ideas into some three-dimensional space you can explore with your mind, essentially?

Yeah, 2D.

2D? You're a flatlander. Okay. No, I do not. Before I jump from concept to concept, I always have to put it back down on paper. It has to be on paper. I can only travel on 2D paper, not inside my mind.
You're able to move inside your mind. But even if you're writing a paper, for instance, don't you have a spatial representation of your paper? You visualize where ideas lie, topologically, in relationship to other ideas, kind of like a subway map of the ideas in your paper?

Yeah, that's true. I mean, in papers, I don't know about you, but it feels like there's a destination. There's a key idea that you want to arrive at, and a lot of it is in the fog, and you're trying to kind of... it's almost like, what's that called, when you do a path-planning search from both directions, from the start and from the end, and then you find the shortest path. In game playing, you do this with, like, A* from both sides, and you see where they join.

Yeah. So you kind of do, at least for me: first of all, just exploring from the start, from first principles. What do I know? What can I start proving from that? And then, from the destination, you start backtracking: if I want to show some kind of set of ideas, what would it take to show them? And you kind of backtrack. But yeah, I don't think I'm doing all that in my mind, though. I'm putting it down on paper.
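For the curious reader, here is a minimal sketch of the meet-in-the-middle search idea Lex is gesturing at. This is plain bidirectional BFS over an unweighted graph, not A*, and the "idea graph" and its node names are invented for illustration:

```python
from collections import deque

def bidirectional_search(graph, start, goal):
    """Unweighted path search via BFS from both ends, meeting in the middle."""
    if start == goal:
        return [start]
    parents_fwd = {start: None}   # node -> predecessor on the forward side
    parents_bwd = {goal: None}    # node -> predecessor on the backward side
    frontier_fwd, frontier_bwd = deque([start]), deque([goal])

    def expand(frontier, parents, other):
        node = frontier.popleft()
        for nbr in graph.get(node, []):
            if nbr not in parents:
                parents[nbr] = node
                if nbr in other:          # the two frontiers have met
                    return nbr
                frontier.append(nbr)
        return None

    while frontier_fwd and frontier_bwd:
        meet = expand(frontier_fwd, parents_fwd, parents_bwd)
        if meet is None:
            meet = expand(frontier_bwd, parents_bwd, parents_fwd)
        if meet is not None:
            # Walk back to the start, then forward to the goal.
            path = []
            n = meet
            while n is not None:
                path.append(n)
                n = parents_fwd[n]
            path.reverse()
            n = parents_bwd[meet]
            while n is not None:
                path.append(n)
                n = parents_bwd[n]
            return path
    return None

# Example: a tiny undirected "idea graph"; edges are associative links.
ideas = {
    "premise":  ["lemma1", "lemma2"],
    "lemma1":   ["premise", "key_idea"],
    "lemma2":   ["premise"],
    "key_idea": ["lemma1"],
}
print(bidirectional_search(ideas, "premise", "key_idea"))
# -> ['premise', 'lemma1', 'key_idea']
```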
Do you use mind maps to organize your ideas?

Yeah, I like mind maps.

Let's get into this. I haven't really tried it. I've been so jealous of people that seem to get this fire of passion in their eyes, because everything starts making sense. It's like Tom Cruise in the movie, moving stuff around. Some of the most brilliant people I know use mind maps. I haven't tried, really. Can you explain what the hell a mind map is?

I guess a mind map is a way to take the connected mess inside your mind and just put it on paper, so that you gain more control over it. It's a way to organize things on paper, and, as a kind of consequence of organizing things on paper, it starts being more organized inside your own mind.

What does that look like? Do you have an example? What's the first thing you write on paper? What's the second thing you write?

I mean, typically, you draw a mind map to organize the way you think about a topic. So you would start by writing down the key concept for that topic. Like, you would write "intelligence" or something, and then you would start adding associative connections: what do you think about when you think about intelligence? What do you think are the key elements of intelligence? So maybe you would have "language," for instance, and also "motion." And so you would start drawing nodes with these things, and then you would ask, what do you think about when you think about motion? And so on, and you would go like that, like a tree.

Is it a tree, or is it a graph, mostly?

Oh, it's more of a graph than a tree. And it's not limited to just writing down words; you can also draw things. And it's not supposed to be purely hierarchical, right? The point is that once you start writing it down, you can start reorganizing it so that it makes more sense, so that it's connected in a more effective way.
See, but I'm so OCD that, you just mentioned "intelligence," "language," and "motion," I would start becoming paranoid that the categorization isn't perfect, that I'll become paralyzed with the mind map. Even though you're just doing associative connections, there's an implied hierarchy that's emerging, and I would start becoming paranoid that it's not the proper hierarchy. So one way to see mind maps is that you're putting thoughts on paper, like a stream of consciousness, but then you can also start getting paranoid: well, is this the right hierarchy?

Sure. It's a mind map; it's your mind map. You're free to draw anything you want, you're free to draw any connection you want, and you can just make a different mind map if you think the central node is not the right node.

Yeah, so I suppose there's a fear of being wrong.

If you want to organize your ideas by writing down what you think, which I think is very effective, like, how do you know what you think about something if you don't write it down? If you do that, the thing is that it imposes a much more syntactic structure over your ideas, which is not required with a mind map. So a mind map is kind of a lower-level, more freehand way of organizing your thoughts. And once you've drawn it, then you can start actually voicing your thoughts in terms of, you know, paragraphs.

There's a two-dimensional aspect of layout, too, right?

Yeah, and it's a kind of flower, I guess. You usually want to start with a central concept?

Yes. But typically it ends up more like a subway map, so it ends up more like a graph, a topological graph without a root node. Like in a subway map, there are some nodes that are more connected than others, and there are some nodes that are more important than others; there are destinations. But it's not going to be purely like a tree, for instance.
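To make the graph-versus-tree distinction concrete, here is a minimal sketch of a mind map as an adjacency structure in Python. The concepts and links are invented for illustration; a strict tree could not represent the cross-link added at the end:

```python
# A mind map as an undirected graph: concept -> set of associated concepts.
mind_map = {
    "intelligence": {"language", "motion"},
    "language":     {"intelligence"},
    "motion":       {"intelligence"},
}

def link(graph, a, b):
    """Add an associative connection between two concepts."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

# So far this is a tree: no cycles, with "intelligence" acting as the root.
# One cross-link turns it into a general graph, which a pure hierarchy
# could not express:
link(mind_map, "language", "motion")
```

The point of the cross-link is exactly the one made above: nodes can be connected freely, so the structure is topological (who links to whom) rather than hierarchical.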
Yeah, it's fascinating to think that there's something to that about the way our mind thinks. By the way, I just remembered an obvious thing: I have probably thousands of documents in Google Docs at this point that are bullet-point lists. You can probably map a mind map to a bullet-point list?

No, it's not the same. A bullet-point list is a tree.

It's a tree, yeah. So I create trees, but they also don't have the visual element. I guess I'm comfortable with the structure; it feels like the narrowness, the constraints, feel more comforting.

If you have thousands of documents with your own thoughts in Google Docs, why don't you write some kind of search engine, maybe a mind-mapping piece of software, where you write down a concept, and then it gives you sentences or paragraphs from your thousand Google Docs documents that match this concept?
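A minimal sketch of what's being suggested, using plain TF-IDF retrieval from scikit-learn. The document snippets are invented stand-ins for exported paragraphs; a modern variant would swap in learned sentence embeddings, which gets closer to "semantic" search:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Pretend these are paragraphs pulled out of your exported Google Docs.
paragraphs = [
    "intelligence is the efficiency with which you acquire new skills",
    "language is a kind of operating system for the mind",
    "a mind map is more of a graph than a tree",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(paragraphs)  # one TF-IDF vector per paragraph

def search(query, top_k=2):
    """Return the top_k paragraphs most similar to the query."""
    q = vectorizer.transform([query])
    scores = cosine_similarity(q, doc_matrix)[0]
    ranked = scores.argsort()[::-1][:top_k]
    return [(paragraphs[i], float(scores[i])) for i in ranked]

print(search("operating system of the mind"))
```

TF-IDF only matches on shared words, which is exactly the limitation raised next: it is lexical rather than semantic.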
The problem is that, unlike mind maps, it's so deeply rooted in natural language that it's not semantically searchable, I would say. Because the categories are very... you mentioned "intelligence," "language," and "motion": those are very strong semantically. It feels like the mind map forces you to be semantically clear and specific. The bullet-point lists I have are sparse, disparate thoughts that poetically represent a category like motion, as opposed to saying "motion." Unfortunately, that's the same problem with the internet. That's why the idea of the semantic web is difficult to realize: most language on the internet is a giant mess of natural language that's hard to interpret.

So do you think there's something to mind maps, as you originally brought up when we were talking about cognition and language? Do you think there's something to mind maps about how our brain actually thinks, reasons about things?
It's possible. I think it's reasonable to assume that there is some level of topological processing in the brain, that the brain is very associative in nature. And I also believe that a topological space is a better medium to encode thoughts than a geometric space.

What's the difference between a topological and a geometric space?

Well, if you're talking about topologies, then points are either connected or not, so a topology is more like a subway map. And geometry is when you're interested in the distance between things. In subway maps, you don't really have a concept of distance; you only have the concept of whether there is a train going from station A to station B. What we do in deep learning is that we're actually dealing with geometric spaces. We're dealing with concept vectors, word vectors, that have a distance between them, expressed in terms of a dot product. We are not really building topological models, usually.

I think you're absolutely right. Distance is of fundamental importance in deep learning. I mean, it's the continuous aspect of it.

Yes, because everything is a vector, and everything has to be a vector because everything has to be differentiable. If your space is discrete, it's no longer differentiable; you cannot do deep learning in it anymore. Well, you could, but you could only do it by embedding it in a bigger continuous space. So if you do topology in the context of deep learning, you have to do it by embedding your topology in a geometry.
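A minimal illustration of the contrast in plain Python. The vectors and the graph are made up; the dot product plays the role of the graded "distance" just described, while the graph only records which concepts are connected:

```python
import math

# Geometric view: concepts are vectors; similarity is graded, via dot product.
vec = {
    "cat":   [0.9, 0.1, 0.3],
    "dog":   [0.8, 0.2, 0.3],
    "train": [0.1, 0.9, 0.7],
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

print(cosine(vec["cat"], vec["dog"]))      # close to 1.0: graded similarity

# Topological view: only connectivity exists; there are no distances at all.
subway = {
    "station_a": {"station_b"},
    "station_b": {"station_a", "station_c"},
    "station_c": {"station_b"},
}
print("station_c" in subway["station_a"])  # False: connected or not, nothing in between
```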
Well, let me zoom out for a second. Let's get into your paper, "On the Measure of Intelligence," that you put out in 2019.

Yes, in November.

Yeah, November 2019. That was a different time.

Yeah, I still remember. It feels like a different world. You could travel, you could actually go outside and see friends.

Let me ask the most absurd question. There's some non-zero probability that there will be a textbook one day, like 200 years from now, on artificial intelligence, or it'll be called just "intelligence," because humans will already be gone. It'll have your picture with a quote: one of the early biological systems that considered the nature of intelligence. And there will be a definition of how they thought about intelligence, which is one of the things you do in your paper on the measure of intelligence: you ask, well, what is intelligence, and how do we test for intelligence, and so on. So is there a spiffy quote about what intelligence is? What is the definition of intelligence, according to François Chollet?

Yes. So, do you think the superintelligent AIs of the future will want to remember us, the way we remember humans from the past? And do you think they will be ashamed of having a biological origin?

No, I think it would be a niche topic. It won't be that interesting. It'll be like the people that study, in certain contexts, historical civilizations that no longer exist, the Aztecs and so on. That's how it'll be seen. And it'll be studied also in the context of social media: there will be hashtags about the atrocity committed to human beings when the robots finally got rid of them. It'll be seen as a giant mistake, but ultimately in the name of progress, and it created a better world, because humans were overconsuming the resources, and they were not very rational, and were destructive, in the end, in terms of productivity and putting more love in the world. And so within that context, there will be a chapter about these biological systems.

You seem to have a very detailed vision of that future. You should write a sci-fi novel about it.

I'm working on a sci-fi novel currently, yes.
Self-published? Yeah.

So, the definition of intelligence: intelligence is the efficiency with which you acquire new skills at tasks that you did not previously know about, that you did not prepare for. So intelligence is not skill itself. It's not what you know, it's not what you can do. It's how well and how efficiently you can learn new things.
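For readers who want the formal version: the paper pins this down using algorithmic information theory. Very roughly paraphrased (this is a simplified schematic of the paper's definition, not a quotation of it), the intelligence of a system over a scope of tasks is its skill-acquisition efficiency: the skill achieved, weighted by generalization difficulty, relative to the priors built into the system and the experience it consumed:

```latex
% Schematic paraphrase of "On the Measure of Intelligence" (Chollet, 2019):
% intelligence ~ skill acquired per unit of (priors + experience),
% averaged over a scope of tasks, weighted by generalization difficulty GD.
I_{\text{system}} \;\sim\; \underset{\text{task } T \,\in\, \text{scope}}{\operatorname{avg}}
  \left[ \frac{\text{skill}(T) \cdot GD(T)}{\text{priors} + \text{experience}(T)} \right]
```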
New things, yes. The idea of newness there seems to be fundamentally important.

Yes. So you would see intelligence on display, for instance, whenever you see a human being, or an AI creature, adapt to a new environment that it has not seen before, that its creators did not anticipate. When you see adaptation, when you see improvisation, when you see generalization, that's intelligence. In reverse, if you have a system that, when you put it in a slightly new environment, cannot adapt, cannot improvise, cannot deviate from what it's hardcoded to do, or what it has been trained to do, that is a system that is not intelligent. There's actually a quote from Einstein that captures this idea, which is: "The measure of intelligence is the ability to change." I like that quote. I think it captures at least part of this idea.

You know, there might be something interesting in the difference between your definition and Einstein's. I mean, he's just being Einstein and clever, but: the acquisition of the ability to deal with new things, versus the ability to just change. What's the difference between those two things? Just changing itself, do you think there's something to that, just being able to change?

Yes, being able to adapt. So not just change, but change in a certain direction: being able to adapt yourself to your environment, whatever the environment. That's a big part of intelligence, yes. And intelligence is, more precisely, how efficiently you're able to adapt, how efficiently you're able to basically master your environment, how efficiently you can acquire new skills. And I think there's a big distinction to be drawn between intelligence, which is a process, and the output of that process, which is skill.
So, for instance, if you have a very smart human programmer that considers the game of chess and writes down a static program that can play chess, then the intelligence is the process of developing that program. The program itself is just encoding the output artifact of that process. The program itself is not intelligent. And the way you tell it's not intelligent is that if you put it in a different context, if you ask it to play Go or something, it's not going to be able to perform well without human involvement, because the source of intelligence, the entity that is capable of that process, is the human programmer. So we should be able to tell the difference between the process and its output. We should not confuse the output and the process. It's the same as, you know, do not confuse a road-building company and one specific road, because one specific road takes you from point A to point B, but a road-building company can make a path from anywhere to anywhere else.

Yeah, that's beautifully put.
But it's also possible, to play devil's advocate a little bit, that there's something more fundamental than us humans. You kind of said the programmer creates the difference between the acquirer of the skill and the skill itself. You could argue the universe is more intelligent, that the base intelligence we should be trying to measure is the thing that created humans. We should be measuring God, or the source, the universe. There could be a deeper intelligence.

Sure, there's always a deeper intelligence, you can argue that. But that does not take anything away from the fact that humans are intelligent, and you can tell that because they are capable of adaptation and generality. You see that in particular in the fact that humans are capable of handling situations and tasks that are quite different from anything that any of our evolutionary ancestors has ever encountered. So we are capable of generalizing very much out of distribution, if you consider our evolutionary history as being, in a way, our training data.
Of course, evolutionary biologists would argue that we're not going that far out of the distribution. We're, like, mapping the skills we've learned previously, desperately trying to jam them into these new situations.

I mean, there's definitely a little bit of that, but it's pretty clear to me that most of the things we do on any given day in our modern civilization are things that are very, very different from what our ancestors a million years ago would have been doing in a given day. Your environment is very different. So I agree that everything we do, we do with cognitive building blocks that we acquired over the course of evolution, and that anchors our cognition to a certain context, which is the human condition, very much. But still, our mind is capable of a pretty remarkable degree of generality, far beyond anything we can create in artificial systems today. The degree to which the mind can generalize away from its evolutionary history is much greater than the degree to which a deep learning system today can generalize away from its training data.
And the key point you're making, which I think is quite beautiful, is that, if we talk about measurement, we shouldn't measure the skill; we should measure the creation of the new skill, the ability to create that new skill.

Yes.

But it's tempting... it's weird, because the skill is a little bit of a small window into the system. Whenever you have a lot of skills, it's tempting to measure the skills.

Yes. I mean, the skill is the only thing you can objectively measure. But the thing to keep in mind is that, when you see skill in a human, it gives you a strong signal that that human is intelligent, because you know they weren't born with that skill, typically. Like, say you see a very strong chess player. Maybe you're a very strong chess player yourself.

I think you're saying that because I'm Russian, and now you're prejudiced.

You assume. Oh yeah, it's just bias.

I'm biased, yeah. Well, we're all biased.

So if you see a very strong chess player, you know they weren't born knowing how to play chess, so they had to acquire that skill with their limited resources, with their limited lifetime. And they did that because they are generally intelligent, so they may as well have acquired any other skill; you know they have this potential. On the other hand, if you see a computer playing chess, you cannot make the same assumption, because you cannot just assume the computer is generally intelligent. The computer may be born knowing how to play chess, in the sense that it may have been programmed by a human that has understood chess for the computer, and that has just encoded the output of that understanding in a static program. And that program is not intelligent.

So let's zoom out just for a second and ask: what is the goal of the "On the Measure of Intelligence" paper? What do you hope to achieve with it?
So the goal of the paper is to clear up some long-standing misunderstandings about the way we've been conceptualizing intelligence in the AI community, and the way we've been evaluating progress in AI. There's been a lot of progress recently in machine learning, and people are extrapolating from that progress that we are about to solve general intelligence. If you want to be able to evaluate these statements, you need to precisely define what you're talking about when you're talking about general intelligence, and you need a formal, reliable way to measure how much intelligence, how much general intelligence, a system possesses. And ideally, this measure of intelligence should be actionable. It should not just describe what intelligence is; it should not just be a binary indicator that tells you the system is intelligent or it isn't. It should have explanatory power, so you could use it as a feedback signal. It would show you the way towards building more intelligent systems.

So at the first level, you draw a distinction between two divergent views of intelligence. As we just talked about: intelligence as a collection of task-specific skills, and as a general learning ability. What's the difference between this kind of memorization of skills and a general learning ability? We've talked about it a little bit, but can you linger on this topic for a bit?
Yeah. So the first part of the paper is an assessment of the different ways we've been thinking about intelligence and the different ways we've been evaluating progress in AI. The history of cognitive sciences has been shaped by two views of the human mind. One view is the evolutionary psychology view, in which the mind is a collection of fairly static, special-purpose, ad-hoc mechanisms that have been hardcoded by evolution over our history as a species, over a very long time. Early AI researchers, people like Marvin Minsky, for instance, clearly subscribed to this view. They saw the mind as a kind of collection of static programs, similar to the programs they would run on mainframe computers. In fact, I think they very much understood the mind through the metaphor of the mainframe computer, because that was the tool they were working with, right? So you had static programs, this collection of very different static programs, operating over a database-like memory. And in this picture, learning was not very important. Learning was considered to be just memorization. In fact, learning is basically not featured in AI textbooks until the 1980s, with the rise of machine learning.
It's kind of fun to think about: learning was the outcast. Like, the weird people were doing learning; the mainstream AI world was, I don't know what the best term is, but it was non-learning. It was seen as reasoning.

Yes, it would not be learning-based. It was considered that the mind was a collection of programs that were primarily logical in nature, and that's all you needed to do to create a mind: write down these programs, and they would operate over your knowledge, which would be stored in some kind of database. As long as your database encompassed everything about the world, and your logical rules were comprehensive, then you would have a mind.

So the other view of the mind is the mind as a sort of blank slate, right? This is a very old idea; you find it in John Locke's writings. This is the tabula rasa. This is the idea that the mind is some kind of information sponge that starts empty, starts blank, and absorbs knowledge and skills from experience. So it's a sponge that reflects the complexity of the world, the complexity of your life experience, essentially. Everything you know and everything you can do is a reflection of something you found in the outside world, essentially. So this is an idea that's very old, that was not very popular, for instance, in the 1970s, but that has gained a lot of vitality recently with the rise of connectionism, in particular deep learning. So today, deep learning is the dominant paradigm in AI, and I feel like lots of AI researchers are conceptualizing the mind via a deep learning metaphor. They see the mind as a kind of randomly initialized neural network that starts blank when you're born, and then gets trained through exposure to training data, that acquires knowledge and skills through exposure to training data.
By the way, a small tangent: I feel like people who are seriously thinking about intelligence are not conceptualizing it that way. I actually haven't met too many people who believe that a neural network will be able to reason, who seriously, rigorously think that. Because I think it's actually an interesting worldview, and we'll talk about it more. But it's been impressive what neural networks have been able to accomplish, and, to me, I don't know, you might disagree, it's an open question whether scaling up the size might eventually lead to incredible results that, to us mere humans, will appear as if they're general.

I mean, if you ask people who are seriously thinking about intelligence, they will definitely not say that all you need to do is... that the mind is just a neural network. However, that view is actually very popular, I think, in the deep learning community. Many people are kind of conceptually, you know, intellectually lazy about it.

Right, but I guess what I'm saying is exactly that: I haven't met many people, and I think it would be interesting to meet a person who is not intellectually lazy about this particular topic and still believes that neural networks will go all the way.
I think Yann LeCun is probably the closest to that.

There are definitely people who argue that current deep learning techniques are already the way to general artificial intelligence, and that all you need to do is to scale them up to all the available training data. And if you look at the waves that OpenAI's GPT-3 model has made, you see echoes of this idea.

So, on that topic:
GPT-3, similar to GPT-2, has captivated some part of the imagination of the public. There's just a bunch of hype of different kinds, and I would say it's emergent; it's not artificially manufactured, people just get excited for some strange reason. In the case of GPT-3, which is funny, there was, I believe, a couple of months' delay from release to hype. Maybe I'm not historically correct on that, but it feels like there was a little bit of a lack of hype, and then a phase shift into hype. But nevertheless, there's a bunch of cool applications that seem to captivate the imagination of the public about what this language model, trained in an unsupervised way without any fine-tuning, is able to achieve. So what do you make of that? What are your thoughts about GPT-3?
Yeah. So I think what's interesting about GPT-3 is the idea that it may be able to learn new tasks after just being shown a few examples. I think if it's actually capable of doing that, that's novel, that's very interesting, and that's something we should investigate. That said, I must say I'm not entirely convinced that we have shown it's capable of doing that. It's very likely, given the amount of data that the model is trained on, that what it's actually doing is pattern-matching a new task you give it with tasks that it's been exposed to in its training data. It's just recognizing the task, instead of actually developing a model of the task.
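For context, the few-shot setup under discussion looks roughly like this. The task and the examples are invented for illustration; the model sees only this text, with no gradient updates:

```python
# A few-shot prompt: worked demonstrations followed by an unsolved instance.
# The question raised above is whether the model learns the reversal task
# from these three examples, or merely recognizes a pattern it has already
# seen somewhere in its web-scale training data.
prompt = """Reverse each word.
cat -> tac
stone -> enots
flower -> rewolf
planet ->"""
# completion = model.generate(prompt)   # hypothetical API call, for illustration
```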
Right, but, just to interrupt, there's a parallel to what you said before, which is that it's possible to see GPT-3's prompt as a kind of SQL query into this thing that it's learned, similar to what you said before about language being used to query the memory. So is it possible that the neural network is a giant memorization thing, but then, if it gets sufficiently giant, it'll memorize sufficiently large amounts of things in the world, to where it becomes more intelligent, becomes a querying machine?

I think it's possible that a significant chunk of intelligence is this giant associative memory. I definitely don't believe that intelligence is just a giant associative memory, but it may well be a big component.
So do you think GPT-3, 4, 5, GPT-10 will eventually... like, where's the ceiling? Do you think it will be able to reason? No, that's a bad question. What is the ceiling is the better question. How well is it going to scale? How good is GPT-N going to be?

Yeah. So I believe GPT-N is going to improve on the strengths of GPT-2 and 3, which is that it will be able to generate ever more plausible text in context.

Just monotonically improving performance.

Yes. If you train a bigger model on more data, then your text will be increasingly more context-aware and increasingly more plausible, in the same way that GPT-3 is much better at generating plausible text compared to GPT-2. But that said, I don't think just scaling up the model to more transformer layers and more training data is going to address the flaw of GPT-3, which is that it can generate plausible text, but that text is not constrained by anything other than plausibility. In particular, it's not constrained by factualness, or even consistency, which is why it's very easy to get GPT-3 to generate statements that are factually untrue, or to generate statements that are even self-contradictory. Because its only goal is plausibility, and it has no other constraints; it's not constrained to be self-consistent, for instance.
And so, for this reason, one thing that I thought was very interesting with GPT-3 is that you can prime the answer it will give you by asking the question in a specific way, because it's very responsive to the way you ask the question, since it has no understanding of the content of the question. If you ask the same question in two different ways that are basically adversarially engineered to produce certain answers, you will get two different answers, two contradictory answers.

It's very susceptible to adversarial attacks, essentially.

Potentially, yes. So, in general, the problem with these generative models is that they are very good at generating plausible text, but that's just not enough, right?
I think one avenue that would be very interesting for making progress is to make it possible to write programs over the latent space that these models operate on. You would rely on these self-supervised models to generate a sort of pool of knowledge and concepts and common sense, and then you would be able to write explicit reasoning programs over it. Because the current problem with GPT-3 is that it can be quite difficult to get it to do what you want it to do. If you want to turn GPT-3 into products, you need to put constraints on it. You need to force it to obey certain rules. So you need a way to program it explicitly.
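One simple version of "putting constraints on it" is rejection sampling: generate candidates, keep only those that pass explicit checks. A minimal sketch follows; the `generate` function is a hypothetical stand-in for any text generator, and the constraint check is a toy stand-in for the real factuality and consistency tests, which are the hard part being pointed at here:

```python
from typing import Optional

def generate(prompt: str) -> str:
    """Hypothetical stand-in for sampling one completion from a language model."""
    raise NotImplementedError

def satisfies_constraints(text: str) -> bool:
    # Toy check: a real product would verify claims against a knowledge
    # base, check self-consistency, enforce output format, and so on.
    return "as an AI language model" not in text

def constrained_generate(prompt: str, max_tries: int = 10) -> Optional[str]:
    """Sample until a completion passes the explicitly programmed checks."""
    for _ in range(max_tries):
        candidate = generate(prompt)
        if satisfies_constraints(candidate):
            return candidate
    return None  # plausible text alone never guaranteed a valid answer
```

This is, of course, shallow compared to what is being proposed above, which is programs that operate on the model's latent space rather than on its text output. But it shows the gap: the generator only optimizes plausibility, and every other property has to be imposed from outside.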
Yeah. So if you look at its ability to do program synthesis, it generates, like you said, something that's plausible.

Yes. If you try to make it generate programs, it will perform well for any program that it has seen in its training data. But because program space is not interpolative, it's not going to be able to generalize to problems it hasn't seen before.

Now, do you think, sort of an absurd but, I think, useful intuition builder: GPT-3 has 175 billion parameters. The human brain has about a thousand times that, or more, in terms of the number of synapses. Obviously, these are very different kinds of things, but there is some degree of similarity. What do you think GPT will look like when it has 100 trillion parameters? Do you think our conversation might be similar in nature, or different? Because you've criticized GPT-3 very effectively now. Do you think...

No, I don't think so.
To begin with, the bottleneck with scaling up GPT models, generative pre-trained transformer models, is not going to be the size of the model or how long it takes to train it. The bottleneck is going to be the training data, because OpenAI is already training GPT-3 on a crawl of basically the entire web, right? And that's a lot of data. So you could imagine training on more data than that, Google could train on more data than that, but it would still be only incrementally more data. I don't recall exactly how much more data GPT-3 was trained on compared to GPT-2, but it's probably at least 100x, or maybe even 1,000x; I don't have the exact number. You're not going to be able to train the model on 100x more data than what you're already doing.

That's brilliant. So it's not... you know, it's easier to think of compute as a bottleneck, and then to argue that we can remove that bottleneck.

We can remove the compute bottleneck; I don't think it's a big problem. If you look at the pace at which we've improved the efficiency of deep learning models in the past few years, I'm not worried about training-time bottlenecks or model-size bottlenecks. The bottleneck in the case of these generative transformer models is absolutely the training data.
What about the quality of the data?

So, yeah, the quality of the data is an interesting point. The thing is, if you're going to want to use these models in real products, then you want to feed them data that's as high-quality, as factual, I would say as unbiased, as possible. But, you know, there's not really such a thing as unbiased data in the first place. You probably don't want to train it on Reddit, for instance; that sounds like a bad plan. From my personal experience working with large-scale deep learning models:
at some point, I was working on a model at Google that was trained on about 350 million labeled images. It was an image classification model. That's a lot of images; that's probably most publicly available images on the web at the time. And it was a very noisy dataset, because the labels were not originally annotated by hand, by humans. They were automatically derived from things like tags on social media, or just keywords on the same page the image was found on, and so on. So it was very noisy, and it turned out that you could easily get a better model, not just by training on more of the noisy data. If you train on more of the noisy data, you get an incrementally better model, but you very quickly hit diminishing returns. On the other hand, if you train on a smaller dataset with higher-quality annotations, annotations that are actually made by humans, you get a better model, and it also takes less time to train it.

Yeah, that's fascinating. With self-supervised learning, is there a way to do better automated labeling?

Yeah, so you can enrich or refine your labels in an automated way. That's correct.
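One common label-refinement recipe, sketched below, is to train a first model on the noisy labels and then keep only the examples where that model confidently agrees with the given label. This is a generic technique, not necessarily what the team at Google did; the `model` object and its `predict_proba` method are assumptions (a classifier returning a class-probability vector for one example):

```python
def refine_labels(model, dataset, threshold=0.9):
    """Keep only (features, label) pairs a trained model is confident about.

    `model.predict_proba(x)` is assumed to return a probability per class
    for a single example; `dataset` is an iterable of (x, y) pairs where
    y is the (possibly noisy) integer class label.
    """
    clean = []
    for x, y in dataset:
        probs = model.predict_proba(x)
        if probs[y] >= threshold:  # the model agrees the noisy label is right
            clean.append((x, y))
    return clean

# A second model trained on `clean` often beats one trained on all the
# noisy data, echoing the diminishing-returns point made above.
```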
Do you have hope for, I don't know if you're familiar with it, the idea of a semantic web?

What is the semantic web, just for people who are not familiar? It is the idea of being able to convert the internet, or to attach semantic meaning to the words on the internet, the sentences, the paragraphs; to be able to convert the information on the internet, or some fraction of the internet, into something that's interpretable by machines. That was kind of a dream, I think, of the semantic web papers in the '90s. It's the dream that the internet is full of rich, exciting information; even just looking at Wikipedia, we should be able to use that as data for machines. But that information is not really in a format that's available to machines.

So, no, I don't think the semantic web will ever work, simply because it would be a lot of work to provide that information in structured form, and there is not really any incentive for anyone to provide that work. So I think the way forward, to make the knowledge on the web available to machines, is actually something closer to unsupervised deep learning. GPT-3 is actually a bigger step in the direction of making the knowledge of the web available to machines than the semantic web was.
Yeah. Perhaps in a human-centric sense, it feels like GPT-3 hasn't learned anything that could be used to reason, but that might be just the early days.

Yeah, I think that's correct. I think the forms of reasoning that you see it perform are basically just reproducing patterns that it has seen in its training data. So of course, if you're trained on the entire web, then you can produce an illusion of reasoning in many different situations, but it will break down if it's presented with a novel situation.

That's the open question, between the illusion of reasoning and actual reasoning.

Yes, the power to adapt to something that is genuinely new. Because the thing is, even imagine you could train on every bit of data ever generated in the history of humanity. That model would be capable of anticipating many different possible situations, but it remains that the future is going to be something different. For instance, if you train a GPT-3 model on data from the year 2002, and then use it today, it's going to be missing many things. It's going to be missing many common-sense facts about the world. It's even going to be missing vocabulary, and so on.
Yeah. It's interesting that GPT-3 doesn't even have, I think, any information about the coronavirus.

Yes. Which is why, you know, you can tell that a system is intelligent when it's capable of adapting. So intelligence is going to require some amount of continuous learning. It's also going to require some amount of improvisation. It's not enough to assume that what you're going to be asked to do is something that you've seen before, or something that is a simple interpolation of things you've seen before. In fact, that model breaks down even for tasks that look relatively simple from a distance, like L5 self-driving, for instance. Google had a paper a couple of years back showing that something like 30 million different road situations were actually completely insufficient to train a driving model. It wasn't even L2, right? And that's a lot of data. That's a lot more data than the 20 or 30 hours of driving that a human needs to learn to drive, given the knowledge they've already accumulated.
Well, let me ask you on that topic: Elon Musk, Tesla Autopilot. One of the only companies, I believe, that is really pushing for a learning-based approach. Are you skeptical that that kind of network can achieve Level 4?

L4 is probably achievable. L5 is probably not.

What's the distinction there? L5 is complete... you can just fall asleep?

Yeah, L5 is basically human-level.

Well, you have to be careful saying human-level, because, like, most drivers... that's the clearest example of it: cars will most likely be much safer than humans in many situations where humans fail, and it's the vice versa.

So, I'll tell you, the thing is, the amount of training data you would need to anticipate pretty much every possible situation you'll encounter in the real world is such that it's not entirely unrealistic to think that, at some point in the future, we'll develop a system that's trained on enough data, especially provided that we can simulate a lot of that data; we don't necessarily need actual cars on the road for everything. But it's a massive effort. And it turns out you can create a system that's much more adaptive, that can gen...