Michael Littman: Reinforcement Learning and the Future of AI

Michael Littman: Reinforcement Learning and the Future of AI | Lex Fridman Podcast #144

c9AbECvRt20 • 2020-12-13

Kind: captions
Language: en
the following is a conversation with
michael littman a computer science
professor at brown university
doing research on and teaching machine
learning
reinforcement learning and artificial
intelligence
he enjoys being silly and lighthearted
in conversation
so this was definitely a fun one quick
mention of each sponsor
followed by some thoughts related to the
episode thank you to
simplisafe a home security company i
use to monitor
and protect my apartment expressvpn the
vpn i've used for many years to protect
my privacy on the internet
masterclass online courses that i enjoy
from some of the most amazing humans in
history
and betterhelp online therapy with a
licensed professional
please check out the sponsors in the
description to get a discount and
to support this podcast as a side note
let me say that i may experiment with
doing some solo episodes in the coming
month
or two the three ideas i have floating
in my head
currently are to use one a particular
moment in history
two a particular movie or three a book
to uh drive a conversation about a set
of uh related concepts
for example i could use 2001 a space
odyssey or ex machina
to talk about agi for one two three
hours
or i could do an episode on the yes
rise and fall of hitler and stalin
each in a separate episode using
relevant books and historical moments
for reference i find the format of a
solo episode
very uncomfortable and challenging but
that just tells me
that it's something i definitely need to
do and learn from the experience
of course i hope you come along for the
ride also
since we have all this momentum built up
on announcements
i'm giving a few lectures on machine
learning at mit this january
in general if you have ideas for the
episodes
for the lectures or for just short
videos on youtube
let me know in the comments that i
still definitely read despite my better
judgment
and the wise sage advice of the great
joe rogan if you enjoy this thing
subscribe on youtube
review it with five stars on apple
podcasts follow on spotify
support on patreon or connect with me on
twitter
lex fridman and now here's my
conversation
with michael littman i saw a video of
you talking to
charles isbell about westworld the tv
series
you guys were doing a kind of thing
where you're watching new things
together but let's
rewind back is there a sci-fi
movie or book or show
that was profound that had an
impact on you philosophically or just
like
specifically something you enjoyed
nerding out about
yeah interesting i think a lot of us
have been inspired by robots in movies
the one that i really like is uh
there's a movie called robot and frank
which i think is really interesting
because it's very near-term
future where uh robots are being
deployed as
uh helpers in people's homes and it was
it was
and we don't know how to make robots
like that at this point but it seemed
very plausible it seemed very
realistic or imaginable and i thought
that was really cool because
they did they're awkward they do funny
things it raised some interesting issues
but
it seemed like something that would
ultimately be helpful and good if we
could do it right
yeah he was an older cranky gentleman
right he was an older cranky
uh jewel thief yeah it's kind of funny
little
thing which is you know he's a jewel
thief and so he
pulls the robot into his life which is
like which is something you could
imagine
taking a home robotics thing
and pulling into whatever quirky thing
that's involved in your this is
meaningful to you exactly so yeah and i
think i think from that perspective i
mean not all of us are jewel thieves and
so when we bring our robots into
it for yourself uh explains a lot about
this apartment actually
but no the idea that that people should
have the ability to
you know make this technology their own
that that it becomes part of their lives
and and i think that's
it's hard for us as technologists to
make that kind of technology it's easier
to mold people into what we need them to
be
and um just that opposite vision i think
is really inspiring
and then there's an anthropomorphization
where we project
certain things on them because i think
the robot was kind of dumb
but i have a bunch of roombas that i play
with and they you immediately project
stuff onto them much greater level of
intelligence we'll probably do that with
each other too
much much greater degree of compassion
that's right one of the things we're
learning from ai is
where we are smart and where we are not
smart yeah
you also enjoy as people can see
and i enjoyed myself uh watching you
sing
and even dance a little bit a little bit
a little bit a little bit of dancing
a little bit of dancing that's not quite
my thing as a as a method of education
or just in life you know in general
so easy question what's the
definitive objectively speaking top
three songs of all time
maybe something that you know uh
to walk that back a little bit maybe
something that
others might be surprised by the three
three songs that you kind of enjoy
that is a great question that i cannot
answer but instead let me tell you a
story so
pick a question you do want it that's
right i've been watching the
presidential debates and vice presidential
debates and turns out yeah it's really
you can just answer any question you
want
so so it's a related question
[Laughter]
yeah well said i really like pop music
i've enjoyed pop music ever since i was
very young so 60s music 70s music
80s music this is all awesome and then i
had kids and i think i stopped listening
to music and i was starting to realize
that the
like my musical taste had sort of frozen
out and so i decided
in 2011 i think to start listening to
the top 10
billboard songs each week so i'd be on
the on the treadmill and i would listen
to that week's top 10 songs
so i could find out what was popular now
and what i discovered
is that i have no musical taste
whatsoever i like what i'm familiar
with and so yeah the first time i'd hear
a song it's the first week that was on
the charts i'd be like
and then the second week i was into it a
little bit and the third week
i was loving it and by the fourth week
it's like just part of me and so
i'm afraid that i can't tell you the
most my favorite song of all time
because it's whatever i heard most
recently yeah
that's interesting people have told me
that um there's an art to listening to
music as well
you can start to if you listen to a song
just carefully like
explicitly just force yourself to really
listen you start to
uh i did this when i was part of jazz
band and fusion band in college is
there's they you you start to hear the
layers
of the instruments you start to hear the
individual instruments and you start to
uh
you can listen to classical music or to
orchestra this way you can listen to
jazz this way i mean
uh it's funny to imagine you now to
walk that forward to listening to pop
hits now as like a scholar
listening to like cardi b or something
like that or justin timberlake is he
no not timberlake bieber i guess
they've both
been in the top 10 since i've been
listening they're still still up there
oh my god i'm so clueless
if you haven't heard justin timberlake's
top 10 in the last few years
there was one song that he did where the
music video was set
at essentially neurips oh wow oh
the one with the robotics yeah yeah yeah
yeah yeah yeah he's like at an academic
conference and he and he's doing it he
was
presenting it was sort of a cross
between the apple
like steve jobs kind of talk and neurips
um so i you know it's always fun when ai
shows up in pop culture i wonder if he
consulted somebody for that
that's very that's really interesting so
maybe on that topic i've seen your
um your celebrity multiple dimensions
but one of them is you've done
cameos in different places i've seen you
in a turbotax commercial as like
i guess the the brilliant einstein
character
and the the point is that turbotax
doesn't need
somebody like you doesn't need a
brilliant
very few things need someone like me but
yes they were specifically
emphasizing the idea that you don't need
to be a like a computer expert to be
able to use their software
how did you end up in that world i think
it's an interesting story so i was
teaching my class it was an intro
computer science class for
non-concentrators non-majors
and sometimes when people would visit
campus they would check in to say hey we
want to see what a class is like can we
sit in on your class
so a person came to my class
who was the daughter of the brother
of the hus
husband of the best friend of my wife
anyway basically a family friend came to
campus to
to check out brown and asked to come to
my class and
and came with her dad her dad is uh
who i've known from various kinds of
family events and so forth but he also
does
advertising and he said that he was
recruiting
scientists for this this this ad this
this turbotax
set of ads and he said we wrote the ad
with the idea that we get like
the most brilliant researchers um but
they all said no
so can you help us find the like b
level scientists i'm like sure
that's that's who i hang out with so
that should be fine
so i put together a list and i did what
some people call the dick cheney so i
included myself on the list
of possible candidates uh you know with
a little blurb about each one and why i
thought
it would make sense for them to to do it
and they reached out to a handful of
them but then they ultimately they
youtube stalked me a little bit and they
thought
oh i think he could do this and um they
said okay we're gonna offer you the
commercial
i'm like what so um it was it was such
an interesting experience because it's
it's they have another world the people
who do
like nationwide kind of ad campaigns and
and television shows and movies and so
forth it's quite
a a remarkable system that they have
going because like a set
yeah so i went to uh it was just
somebody's house that they rented in new
jersey
um but it in the in the commercial it's
just me and this other woman
in reality there were 50 people in that
room and another
i don't know half a dozen kind of spread
out around the house in various ways
there were people whose job it was to
control the sun
they were in the backyard on ladders
putting
filters up to try to make sure that the
sun didn't glare off the window in a way
that would wreck the shot
so there was like six people out there
doing that there was three people out
there giving
snacks the craft table there was another
three people giving
healthy snacks because that was a
separate craft table there was one
person whose job it was
to keep me from getting lost and
the i think the reason for all this is
because so many people are in one place
at one time they have to be time
efficient they have to get it done
this the morning they were going to do
my commercial in the afternoon they were
going to do a commercial of a
mathematics professor from princeton
they had to get it done no you know no
wasted time or energy and so there's
just a fleet of people
all working as an organism and it was
fascinating i was just the whole time
just looking around like
this is so neat like one person whose
job it was to take the camera off of the
camera man
so that someone else whose job it was to
remove the film canister because every
couple of takes they had to replace the
film because you know
film gets used up it was just i don't
know i was
geeking out the whole time it was so fun
how many takes did it take it looked the
opposite like there was
more than two people there it was very
relaxing right yeah the super
i mean the person who i was in the scene
with um is a professional
she's a you know uh she's an actor
improv comedian okay in your community
and when i got there they had given me a
script as such as it was and then i got
there and they said
we're gonna do this as improv i'm like i
don't know how to improv like this is
not
i don't know what this i don't know what
you're telling me to do here
don't worry she knows okay okay we'll
see how this goes
i get i guess i got pulled into the
story because like where the heck did
you come from
i guess in the scene like how did you
show up in this random person's house
i don't know yeah well i mean the
reality of it is i stood outside in the
blazing sun there was someone whose job
it was to keep an umbrella over me
because i started to schvitz i started
to sweat
and so i would wreck the shot because my
face was all shiny with sweat so there
was one person who would dab me off
had an umbrella um but yeah like the
reality of it like
why is this strange stalkery person
hanging around outside somebody's house
yeah we're not we're not sure when you
have to look in we'll have to wait for
the book
but are you uh so you make you make like
you said youtube you make videos
yourself
you make awesome parody sort of uh
parody songs that kind of focus in on
particular aspects of computer science
how much those seem really natural
how much production value goes into that
do you also have a team of
50 people videos almost all the videos
except for the ones that people would
have actually seen
were just me i write the lyrics i sing
the song i i generally
find a um like a backing track online
because i unlike you can't really play
an instrument and then i do
in some cases i'll do visuals using just
like powerpoint
lots and lots of powerpoint to make it
sort of like an animation
the the most produced one is the one
that people might have seen which is the
overfitting video that i did with
charles isbell um
and that was produced by the georgia
tech and udacity people because we were
doing a class together it was kind of i
usually do parody songs
kind of to cap off a class at the end of
a class so that one
you're wearing so it's a this the
thriller yeah you're wearing the michael
jackson the red
leather jacket the interesting thing
with podcasting that you're also
uh into is that
i really enjoy is that there's not a
team of people
it's kind of more because you know the
the there's something that happens
when there's more people involved than
just one person
that just the way you start acting i
don't know
there's a censorship you're not given
especially for like slow thinkers like
me you're not
and i think most of us are if we're
trying to actually think
we're a little bit slow and and careful
it it kind of large teams get in the way
of that
and i don't know what to do with ice
like that's the to me
like if you know this it's very popular
to criticize quote unquote mainstream
media
i but there is legitimacy to criticizing
them the same i
love listening to npr for example but
every
it's clear that there's a team behind it
there's a commercial there's constant
commercial breaks there's this kind of
like rush of like
uh okay i have to interrupt you now
because we have to go to commercial just
this whole
it creates it destroys the possibility
of nuanced conversation
yeah exactly evian uh which
charles uh isbell who i i talked to
yesterday told me that
evian is naive backwards which the fact
that his mind thinks this way is just
uh it's quite brilliant anyway there's a
freedom to this podcast he's dr awkward
which by the way is a palindrome that's
a palindrome that i happen to know
for from other parts of my life and i
just you just throw it out
well you know use it against charles dr
awkward
so what uh what was the most challenging
parody song to make
was it the thriller one hmm no that was
really fun i
wrote the lyrics really quickly um and
then i gave it over to the product
production team they recruited an
a cappella group to to sing that went it
went really smoothly it's great having a
team because then you can just focus on
the part that you really love which in
my case is writing the lyrics
uh for me the most challenging one not
challenging in a bad way but challenging
in a really fun way
was i did one of this one of the parody
songs i did
is is about the halting problem in
computer science the the fact that
you can't create a program that can tell
for any other arbitrary program whether
it's actually going to get stuck in an
infinite loop or whether it's going to
eventually stop
and so i i did it to an 80s song
because that's i hadn't started my new
thing of learning current songs
and it was billy joel's the piano man
nice which is a great song great song
yeah yeah
sing us a song you're the piano man
yeah yeah so the lyrics are great
because first of all it rhymes uh not
all songs rhyme i did i've done
rolling stones songs which turn out to
have no rhyme scheme whatsoever they're
just
sort of yelling and having a good time
which makes it not fun from a parody
perspective because like you can say
anything
but this you know the lines rhymed and
there was a lot of internal rhymes as
well
and so figuring out how to sing with
internal rhymes
a proof of the halting problem was
really challenging and
it was i really enjoyed that process
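
[editorial sketch] The argument behind the halting problem described above can be illustrated in a few lines of Python. This is only a minimal sketch of the standard diagonal argument, not the content of the parody song. The names `halts`, `make_paradox`, and `refute` are hypothetical, and the two toy deciders are stand-ins for any claimed (deterministic) halting checker.

```python
def make_paradox(halts):
    """Build a program that does the opposite of what `halts` predicts for it.

    `halts` is a hypothetical decider: halts(program) -> True if it claims
    the zero-argument callable `program` eventually stops.
    """
    def paradox():
        if halts(paradox):
            while True:  # decider said "halts", so loop forever
                pass
        return None      # decider said "loops forever", so stop right away
    return paradox


def refute(halts):
    """Return (prediction, actual) for the paradox program built from `halts`."""
    paradox = make_paradox(halts)
    prediction = halts(paradox)
    if prediction:
        # By construction paradox would now spin forever, so we do not run it.
        actual = False
    else:
        paradox()  # returns immediately, so it really does halt
        actual = True
    return prediction, actual


# Two toy stand-ins for a claimed decider (assumptions for illustration only).
def always_yes(program):
    return True


def always_no(program):
    return False


for decider in (always_yes, always_no):
    prediction, actual = refute(decider)
    print(decider.__name__, "predicted halts =", prediction,
          "| actually halts =", actual)
```

Whatever answer a decider gives about its own paradox program, the program is built to do the opposite, so prediction and actual behavior always disagree. That mismatch is the reason no program can correctly answer the question for every other program.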
what about uh
last question on this topic what about
the dancing in the thriller video how
many takes did that take
so i wasn't planning to dance they they
had me in the studio and they gave me
the jacket and it's like well you can't
if you have the jacket and the glove
like there's not much you can do yeah
so i um i think i just danced around
and then they said why don't you dance a
little bit we there was a scene with me
and charles dancing together
they did not use it in the video but we
recorded it um yeah yeah no it was
it was pretty funny and charles who has
this
beautiful wonderful voice doesn't really
sing he's not really a singer and so
that was why i designed the song with
him doing a spoken section and me doing
things very like barry white yeah it's a
smooth baritone
yeah yeah it's great that was awesome so
one of the other things charles said is
that you know
everyone knows you as like a super nice
guy super passionate about
teaching and so on uh what he said
i don't know if it's true that despite
the fact that you're
you are cold like okay
i will admit this finally for the first
time that was that was me
it's the johnny cash song shot a man in
reno just to watch him die
uh that you actually do have uh some
strong opinions on some topics
so if this in fact is true what
uh strong opinions would you say you
have are there ideas
you think maybe in artificial
intelligence machine learning
maybe in life that you believe is true
that others might
you know some number of people might
disagree with you on
so i try very hard to see things from
multiple perspectives
there's there's this great calvin and
hobbes cartoon where
cal do you know okay so calvin's dad is
always kind of a bit of a foil and he
he was he talked to calvin and just
calvin had done something wrong
the dad talks him into like seeing it
from another perspective and calvin like
this breaks calvin because he's like oh
my gosh now i can see the opposite sides
of things and so the
it's it becomes like a cubist cartoon
where there is no front and back
everything's just exposed
and it really freaks him out and finally
he settles back down it's like oh good
no i can make that go away
but like i'm that i'm that i live in
that world where i'm trying to see
everything from every perspective all
the time so there are some things that
i've formed opinions about that i
would be harder i think to disavow me of
one is um the super intelligence
argument and the existential
threat of ai is one where i feel pretty
confident
in my feeling about that one like i'm
willing to hear other arguments but like
i am not particularly moved by the idea
that
if we're not careful we will
accidentally create a super intelligence
that will destroy
human life let's talk about that let's
get you in trouble and record you on video
it's like bill gates uh i think he said
like
some quote about the internet that
that's just gonna be a small thing it's
not gonna really go anywhere
and i think uh steve ballmer said uh
i don't know why i'm sticking on
microsoft uh that's something
that like smartphones are useless
there's no reason why microsoft should
get into smartphones that kind of
so let's get let's talk about agi as agi
is destroying the world we'll look back
at this video and see
no uh i think it's really interesting to
actually talk about because nobody
really
knows the future so you have to use your
best intuition it's very
difficult to predict it but you have
spoken about agi
and the existential risks around it and
sort of
based on your intuition that we're
quite far away from that being a serious
concern relative to the other concerns
we have
can you maybe uh unpack that a little
bit yeah sure so
so as as i understand it that
uh for example i read bostrom's book and
a bunch of other
reading material about this sort of
general way of thinking about the world
and i think
the story goes something like this that
we will at
some point create computers that
are smart enough that they can help
design
the next version of themselves which
itself will be smarter than the previous
version of themselves and eventually
bootstrapped up to being smarter than
us at which point we are essentially at
the mercy of this sort of
more powerful intellect which in
principle
uh we don't have any control over what
its goals are and so if its goals
are at all out of sync with our goals
like the ex for example the continued
existence of humanity
we won't be able to stop it it'll be way
more powerful
than us and we will be toast so
there's some i don't know very smart
people who have signed on to that story
and it's a
it's a compelling story i once
now i can really get myself in trouble i
once wrote an op-ed about this
specifically responding to some quotes
from elon musk who has been
you know on this very podcast uh more
than once
and well the ai is summoning the
demon that you get
i think he said but then he came to
providence rhode island which is where i
live
and said uh to the governors of all the
states
uh you know you're worried about
entirely the wrong thing you need to be
worried about ai you need to be very
very worried about ai so uh and peop
journalists kind of reacted to that and
they wanted to get people's people's
take and
i was like okay my my my belief
is that one of the things that makes
elon musk so successful and so
remarkable as an individual
is that he believes in the power of
ideas he believes that you can
have you can if you know if you have a
really good idea for getting into space
you can get into space if you have a
really good idea for a company or for
how to change the way that people drive
you just have to do it and
and it can happen it's really natural to
apply that same idea to ai you see
these systems that are doing some pretty
remarkable computational
tricks uh demonstrations and then to
take that idea and just push it
all the way to the limit and think okay
where does this go where is this going
to take us next
and if you're a deep believer in the
power of ideas
then it's really natural to believe that
those ideas could
be taken to the extreme and kill us
so i think you know his strength is also
his undoing because
that doesn't mean it's true like it
doesn't mean that that has to happen
but it's natural for him to think that
so
another way to phrase the way he thinks
and
i find it very difficult to argue with
that
line of thinking uh so sam harris is
another person
from neuroscience perspective that
thinks like that is
saying well is there something
fundamental
in the physics of the universe that
prevents this from eventually happening
and nick bostrom thinks in the same
way they're kind of zooming out
yeah okay we humans now uh are existing
in this
like time scale of minutes and days and
so our
intuition is in this time scale of
minutes hours and days
but if you look at the span of human
history
is there any reason we you
can't see this in in 100 years and
like is there is there something
fundamental about the laws of physics
that prevent this
and if it doesn't then it eventually
will happen or will
we will destroy ourselves in some other
way it's very difficult
i find to actually argue against that
yeah
me too and not sound like
not sound like you're just like rolling
your eyes uh i'm like i have
like science fiction we don't have to
think about it but even even
worse than that which is like i don't
know kids but like i gotta pick up my
kids now like this okay i see there's
more pressing shortcuts yeah there's
more pressing short-term things that
like
uh stop over this existential crisis
where much much shorter things like
now especially this year there's covid so
like any kind of discussion like that is
like there's there's p you know there's
pressing things
uh today it's it's and then so the sam
harris argument well like
any day the exponential singularity
can can occur it's very difficult to
argue against i mean i don't know but
part of his story is also
he's he's not going to put a date on it
it could be in a thousand years it could
be in 100 years it could be in two years
it's just that as long as we keep making
this kind of progress
it ultimately has to become a concern
i i kind of am on board with that but
the thing that the the piece that i feel
like is missing from that
that way of extrapolating from the
moment that we're in
is that i believe that in the process of
actually developing technology that can
really get around in the world and
really process and and
and do things in the world in a
sophisticated way we're going to learn a
lot about
what that means which that we don't know
now because we don't know how to do this
right now
if you believe that you can just turn on
a deep learning network and eventually
give it enough compute and it'll
eventually get there well sure that
seems really scary because we won't we
won't be in the loop at all we won't we
won't be helping to design or or target
these kinds of systems but i don't i
don't see
that that feels like it is against the
laws of physics because these systems
need help right they need
they need to surpass the the
the difficulty the wall of complexity
that happens in arranging something in
the form that
that will happen in yeah like i believe
in evolution like i believe that the
that that there's an argument right so
there's another argument just to look at
it from a different
perspective that people say well i don't
believe in evolution how could evolution
it's it's sort of like a random set of
parts
assemble themselves into a 747 and that
could just never happen
yeah so it's like okay that's maybe hard
to argue against but clearly
747s do get assembled they get assembled
by us basically the idea being that
there's a process by which we will get
to the par the point of making
technology that has that kind of
awareness and
in that process we're going to learn a
lot about that process and we'll have
more
ability to control it or to shape it or
to build it in our own image
it's not something that is going to
spring into existence like that 747
and we're just gonna have to contend
with it completely unprepared
it's very possible that in the context
of the long arc of human history it will
in fact spring into existence
but that springing might take like if
you look at nuclear weapons
like even 20 years is a springing
in in the context of human history and
it's very possible just like with
nuclear weapons that we could have
i don't know what percentage you want to
put at it but the the possibility
could have knocked ourselves out yeah
the possibility of human beings
destroying themselves in the 20th
century
with nuclear weapons i don't know you
can if you really think through it
you could really put it close to like i
don't know 30 40 percent
given like the certain moments of crisis
that happen
so like i think one
like fear in the shadows that's not
being acknowledged
is it's not so much the ai will run away
is
is that as it's running away we won't
have enough time to uh think through how
to stop it
right fast takeoff or foom yeah i mean
my
much bigger concern i wonder what you
think about it which is
we won't know it's happening
so i kind of that argument i think that
there is an
agi situation already happening with
social media
that our minds our collective
intelligence of human civilization is
already being controlled by an algorithm
and like we're we're already super
like the the level of a collective
intelligence thanks to wikipedia people
should donate to wikipedia
to feed the agi man if we had a super
intelligence that
that was in line with wikipedia's values
that it's a lot better than a lot of
other things i can imagine i've i trust
wikipedia more than i trust facebook or
youtube
as far as trying to do the right thing
from a rational perspective
yeah now that's not where you were going
i understand that but it it it does
strike me that there's sort of
smarter and less smart ways of of
exposing ourselves to each other on the
internet yeah the interesting thing is
that wikipedia
and social media have very different
forces you're right i mean wikipedia if
if agi was wikipedia it'd be just like
this
cranky overly competent editor
of uh articles uh you know there's
there's something to that but the social
media aspect is is is not
so the vision of agi is as a separate
system
that's super intelligent that's super
intelligent that's one key little thing
i mean there's the paper clip argument
that's super dumb
but super powerful systems but with
social media you have
a relatively like algorithms we may talk
about today
very simple algorithms that when
uh something charles talks a lot about
which is interactive ai when they start
like
having at scale like tiny little
interactions with human beings
they can start controlling these human
beings so a single algorithm
can control the minds of human beings
slowly to what we might not
realize it could start wars it could
start it can change the way we
think about things it feels like in the
long arc of history
if i were to sort of zoom out from all
the outrage and all the tension on
social media
that it's progressing us towards uh
better and better things
it feels like chaos and toxic and all
that kind of stuff but it's chaos and
toxic
yeah but it feels like actually the
chaos and toxic is similar to the kind
of debates we had
from the founding of this country you
know there was a civil war that happened
over that over that period and
ultimately it was all about
this tension of like something doesn't
feel right about
our implementation of the core values we
hold as human beings and they're
constantly struggling with this
and that results in people calling each
other
uh like just just being shitty to each
other on twitter
but i ultimately the algorithm is
managing all that and it feels like
there's a possible future in which that
algorithm
controls us to into the direction of
self-destruction
whatever that looks like yeah so so all
right i do believe in the power of
social media to
screw us up royally i do believe in the
power of social media to benefit us too
i do think that we're in a
yeah it's sort of almost got dropped on
top of us and now we're trying to as a
culture figure out how to cope with it
there's a sense in which i don't know
there's there's some arguments that say
that for example
i guess college-age students now late
college-age students now people who were
in middle school when when social media
started to really take off
maybe maybe really damaged like me this
may have really hurt their development
in a way that we don't
have all the implications of quite yet
that's the generation who
if and i hate to make it somebody else's
responsibility but like they're the ones
who can fix it they're the ones who can
who can figure out
how do we keep the good of this kind of
technology without
letting it eat us alive and
if they're successful we move on to the
next phase the next level of the game
if they're not successful then yeah then
we're going to wreck each other we're
going to
destroy society so you're going to in
your old age sit on the porch and watch
the world burn
because the tiktok generation that uh
i believe well so my this is my kids age
right and that's certainly my daughter's
age and she's very tapped in
to social stuff but she's also she's
trying to find that balance right of
participating in it and then getting the
positives of it but without letting it
eat her alive um and i think sometimes
she ventures
hopes just to watch this sometimes i
think she ventures a little too far and
is
in and is consumed by it and other times
she gets a little distance
um and if you know if there's enough
people like her out there they're gonna
they're gonna navigate these choppy
waters that's that's an
interesting uh skill actually to develop
i talked to my dad about it
you know i've uh now somehow
this podcast in particular but other
reasons
has received a little bit of attention
and with that apparently in this world
even though
i don't shut up about love and i'm just
all about kindness
i i have now a little mini army of
trolls
oh it's kind of hilarious actually but
it also doesn't feel good
but it's a skill to learn
to not look at that like to moderate
actually how much you look at that
the discussion i have with my dad is
similar to uh it doesn't have to be
about trolls it could be about checking
email
which is like if you're anticipating you
know there's uh my dad
runs a large institute at drexel
university and
there could be stressful like emails
you're waiting like there's drama of
some kind
and so like there's a temptation to
check the email if you send an email you
cut it
and that pulls you in into it doesn't
feel good
and it's a skill that he actually
complains that he hasn't learned i mean
he
grew up without it so he hasn't learned
the skill of how to
shut off the internet and walk away and
i think young people
while they're also being quote-unquote
damaged by like
uh you know being bullied online all
those stories which are very
like horrific you basically can't escape
your bullies
these days when you're growing up but at
the same time they're also learning that
skill of how to
be able to shut off uh the like
disconnect with it be able to laugh at
it not take it too seriously
it's fascinating like we're all trying
to figure this out just like you said
it's
been dropped on us and we're trying to
figure it out yeah i think that's really
interesting and i
i guess i've become a believer in the
human design
which i feel like i don't completely
understand like how do you make
something as robust as us like we're
so flawed in so many ways and yet and
yet
you know we dominate the planet and we
do seem to manage to get ourselves out
of scrapes
eventually not necessarily the most
elegant possible way but somehow we get
we get to the next step and i don't know
how i'd make a
machine do that i i i
generally speaking like if i train one
of my reinforcement learning agents to
play a video game and it works really
hard on that first stage over and over
and over again and it makes it through
it succeeds on that first level
and then the new level comes and it's
just like okay i'm back to the drawing
board and somehow humanity we keep
leveling up
and then somehow managing to put
together the skills necessary to
achieve success some semblance of
success in that next level too
and you know i hope we can keep doing
that
you mentioned reinforcement learning so
you've had uh
a couple years in the field no quite
you know quite a few quite a long career
in artificial intelligence broadly but
reinforcement learning specifically
can you maybe give a hint about your
sense
of the history of the field and in some
ways has changed with the
advent of deep learning but has a long
roots like how is it
weaved in and out of your own life how
have you seen the community change or
maybe the ideas that it's playing with
change
i've had the privilege the pleasure of
being
of having almost a front row seat to a
lot of this stuff and it's been really
really fun and interesting so uh when i
was in college in the 80s
early 80s uh the neural net
thing was starting to happen and i was
taking a lot of psychology classes a lot
of computer science classes
as a college student and i thought you
know something that can play tic-tac-toe
and just like learn to get better at it
that ought to be a really easy thing so
i spent almost
almost all of my what would have been
vacations during college
like hacking on my home computer trying
to teach it how to play tic-tac-toe and
programming language
basic oh yeah that's that's i was i
that's my first language that's my
native language
is that when you first fell in love with
computer science just like programming
basic on that
uh what was the computer do you remember
i had i had a trs-80
model one before they were called model
ones because there was nothing else uh
i got my computer in 1979
uh instead so i was i was i would have
been bar mitzvahed
but instead of having a big party that
my parents threw on my behalf
they just got me a computer because
that's what i really really really
wanted i saw them in the in the
in the mall in radio shack and i thought
what how are they doing that i would try
to stump them i would give them math
problems like
one plus and then in parentheses two
plus one yeah and it would always get it
right i'm like
how do you know so much math like
i've had to go to algebra class for the
last few years to learn this stuff and
you just seem to know
so i was i was i was smitten and i got a
computer and i think ages
13 to 15
i have no memory of those years i think
i just was in my room with the computer
listening to billy joel communing
possibly listening to the radio
listening to billy joel
that was the one album i had uh on vinyl
at that time
and um and then i got it on cassette
tape and that was really helpful
because then i could play it i didn't
have to go down to my parents wi-fi or
hi-fi
sorry uh and at age 15 i remember kind
of walking out and like okay
i'm ready to talk to people again like
i've learned what i need to learn here
and um so yeah so so that was that was
my home computer and so i went to
college and i was like oh i'm totally
going to study computer science
i opted the college i chose specifically
had a computer science major
the one that i really wanted the college
i really wanted to go to didn't so
bye-bye to them which college did you go
to so i went to yale
uh princeton would have been way more
convenient and it was just beautiful
campus and it was close enough to home
and i was really excited about princeton
and i visited
i said so computer science major like
well we have computer engineering i'm
like oh i don't like that word
engineering
i like computer science i really i want
to do like you're saying hardware and
software they're like yeah like i just
want to do software i
i couldn't care less about hardware you
grew up in philadelphia i grew up
outside philly yeah yeah okay
uh so the you know local schools were
like penn and drexel
and uh temple like everyone in my family
went to temple at least at one point in
their lives except for me
so yeah philly philly family yale had a
computer science department and that's
when you
it's kind of interesting you said 80s
and neural networks that's when
you know that which is a hot new thing
or a hot
thing period uh so what is that in
college when you first learned about
neural networks yeah
yeah yeah i learned like it was in a
psychology class not in a cs wow
yeah was it psychology or cognitive
science or like do you remember like
what context it was yeah yeah yeah so so
i was a
i've always been a bit of a cognitive
psychology groupie
so like i studied computer science but i
like i like to hang around where the
cognitive
scientists are because i don't know
brains man they're like
they're wacky cool and they have a
bigger picture view of things they're a
little less
engineery i would say they're more
they're more interested in the
nature of cognition and intelligence and
perception it's called like the vision
system work
they're asking always bigger questions
now with
the deep learning community there i
think more there's a lot of
intersections but i do find in
that the neuroscience folks actually
and uh cognitive psychology cognitive
science folks
are starting to learn how to program how
to use neural artificial neural
networks
and they are actually approaching
problems in like totally new interesting
ways
it's fun to watch that grad students
from those departments
like approach the problem of machine
learning right they come in with a
different perspective yeah they don't
care about like your
imagenet data set or whatever they
they want like to understand the
the like the basic mechanisms
at the at the neuronal level and the
functional level of intelligence it's
kind of
it's kind of cool to see them work but
yeah okay so
you always you're always a groupie of
cognitive psychology
yeah yeah and so uh so it was in a class
by richard gerrig he was kind of my
my favorite uh psych professor in
college and i took uh like three
different classes with him
and yeah so that we they were talking
specifically the class i think was kind
of a
there was a big paper that was written
by steven pinker and
uh prince i don't i'm blanking on
prince's first name but prince and
pinker and prince
they wrote kind of a they were at that
time
kind of like ah i'm blanking on the
names of the current people
um the cognitive scientists who are
complaining a lot about deep networks
oh uh gary gary marcus sorry marcus
and who else i mean there's a few but
gary gary's the most feisty
sure gary's very feisty and with this
with his co-author they they you know
they're kind of
doing these kind of takedowns where they
say okay well yeah it does all these
amazing amazing things but
here's a shortcoming here's a
shortcoming here's your shortcoming and
so the pinker prince paper
is kind of like the that generation's
version of
marcus and davis right where they're
they're trained as cognitive scientists
but they're looking skeptically at the
results in the
in the artificial intelligence neural
net kind of world and saying
yeah it can do this and this and this
but like it can't do that and it can't
do that and it can't do that
maybe in principle or maybe just in
practice at this point but but the fact
of the matter is
you're you've narrowed your focus too
far
to be impressed you know you're
impressed with the things within that
circle
but you need to broaden that circle a
little bit you need to look at a wider
set of problems
and so um so we have so i was in this
seminar in college
that was basically a close reading of
the pinker prince paper
which was like really thick there was a
lot going on in there
and um and and it talked about
the reinforcement learning idea a little
bit i'm like oh that sounds really cool
because behavior is what is really
interesting to me about
psychology anyway so making programs
that i mean programs are things that
behave
people are things that behave like i
want to make learning that learns to
behave
in which way was reinforcement learning
presented is this uh talking about
human and animal behavior or are we
talking about actual mathematical
constructs ah that's
right so that's a good question right so
this is i think
it wasn't actually talked about as
behavior in the paper that i was reading
i think that it just talked about
learning and to me learning is about
learning to behave but really
neural nets at that point were about
learning like supervised learning so
learning to produce outputs from inputs
so i kind of tried to invent
reinforcement learning
i uh when i graduated i joined a
research group at
bellcore which had spun out of bell labs
recently at that time because of the
divestiture of the
of long distance and local phone service
in the 1980s 1984
and i was in a group uh with dave ackley
who
was the first author of the boltzmann
machine paper so the very first neural
net paper that could handle
xor right so xor sort of killed neural
nets the very first the zeroth the
first winter
yeah um the the perceptrons paper
and hinton along with his student dave
ackley and and i think there was other
authors as well
showed that no no with boltzmann machines we
can actually learn
non-linear concepts and so everything's
back on the table again and that kind of
started that second wave of neural
networks
so dave ackley was he became my mentor
at bellcore and we
talked a lot about learning and life and
computation and how all these things fit
together
now dave and i have a podcast together
so um so i get to
kind of enjoy that sort of
his his perspective uh once again even
even all these years later
and so i said so i said i was really
interested in learning but
in the concept of behavior and he's like
oh well that's reinforcement learning
here and he gave me rich sutton's 1984
td paper
so i read that paper i honestly didn't
get all of it
but i got the idea i got that they were
using that he was using ideas
that i was familiar with in the context
of neural nets and and
like sort of backprop uh but with this
idea of making predictions over time i'm
like this is so interesting but i don't
really get all the details i said
to dave and dave said oh well why don't
we have him come and give a talk
and i was like wait what you can do that
like these are real people
i thought they were just words i thought
it was just like ideas that somehow
magically seeped into paper he's like no
i
i i know rich like we'll just have him
come down and and he'll give a talk
and so i was you know my mind was blown
and uh so rich came and he gave a talk
at bellcore
and he talked about what he was super
excited about which was they had just figured
out at the time
uh q learning so uh watkins had visited
the rich sutton's lab at umass or
it's andy barto's lab that rich was a
part of
and um he was really excited about this
because it resolved a whole bunch of
problems that he didn't know how to
resolve in the
in the earlier paper and so uh
for people who don't know td temporal
difference these are all just algorithms
for reinforcement learning
right and td temporal difference in
particular is about making predictions
over time
and you can try to use it for making
decisions right because if you can
predict how good a future action
and action outcomes will be in the
future you can choose one that has
better and or
but the theory didn't really support
changing your behavior like the
predictions had to be of a consistent
process if you really
wanted it to work and one of the things
that was really cool about
q-learning algorithm for reinforcement
learning is it was off policy which
meant that you could actually be
learning about the environment and what
the value of different actions would be
while actually figuring out how to
behave
optimally yeah so that was a revelation
yeah and the proof of that is kind of
interesting
i mean that's really surprising to me
when i first read that and then in
rich sutton's book on the matter it's
it's kind of
beautiful that a single equation can
capture an equation one line of code and
like you can learn anything
yeah like enough time so equation and
code you're right like
you can the code that
you can arguably at least if you like
squint your eyes
can say this is all of intelligence
is that you can implement that in a
single wall i think i started with lisp
which is uh
shout out to lisp uh like a single line
of code
key piece of code maybe a couple that
you could do that it's kind of magical
it's uh feels too good to be true
well and it sort of is yeah it's kind of
kind of
it seems to require an awful lot of
extra stuff supporting it but
yeah but nonetheless the ideas the the
idea is really good and as far as we
know it is it is
a very reasonable way of trying to
create adaptive behavior
behavior that gets better at something
over time
did you find the idea of optimal uh at
all compelling that
you could prove that it's optimal so
like one part of computer science
that makes people feel warm and fuzzy
inside
is when you can prove something like
that a sorting algorithm worst case
runs in n log n and it makes
everybody feel so good
even though in reality it doesn't really
matter what the worst case is what
matters is like
does this thing actually work in
practice on this particular actual set
of data that i
that i enjoy did you so here's that
here's a place where i have maybe a
strong opinion uh-oh which is like
you're right of course but no no like so
so the what makes worst case so great
right if you have a worst case analysis
so great is that you get modularity
you can take that thing and plug it into
another thing and still
have some understanding of what's going
to happen when you click them together
right if it just works well in practice
in other words with respect to some
distribution that you care about
when you go plug it into another thing
that distribution can shift
it can change and your thing may not
work well anymore and you want it to
and you wish it does and you hope that
it will but it might not and then
ah so you're so so you're saying you
don't like
machine learning
but we have some positive theoretical
results for these things
you know you can come back at me with
yeah but they're really weak and yeah
they're really weak and and you can even
say that
you know sorting algorithms like if you
do the optimal sorting algorithm it's
not really the one that you want
and that might be true as well but but
it is the modularity is a really
powerful statement
really as an engineer you can then
assemble different things you can count
on them to be
i mean it's interesting it's it's a
balance
like with everything else in life you
don't want to get too obsessed i mean
this is what computer scientists do
which they
potentially get obsessed they over
optimize things
or they start by optimizing them they
over optimize yeah so it

Summary

# Exclusive Interview: The History of Artificial Intelligence, Existential Risk, and Life Philosophy with Michael Littman

### Executive Summary
This video is an in-depth discussion between Lex Fridman and Michael Littman, Professor of Computer Science at Brown University, about the evolution of artificial intelligence (AI), particularly *Reinforcement Learning* (RL). The conversation covers the history of AI from the heyday of neural networks in the 1980s to the success of AlphaGo, the heated debate over the existential threat of AGI, and the impact of social media algorithms on society. Littman also shares personal perspectives on the ethics of technology, the challenges of self-driving cars, and a philosophy of "balance" in life.

### Key Takeaways
*   **Skepticism About an AI Doomsday:** Littman believes we will not accidentally create a superintelligence that destroys humanity, because developing advanced technology requires a learning process in which humans stay continuously involved ("in the loop").
*   **Social Media as a Primitive AGI:** Social media algorithms that steer humanity's collective intelligence can be regarded as an early form of AGI with the potential to bring great change, both positive and destructive.
*   ***The Bitter Lesson*:** In the long run, simple algorithmic methods that exploit computational power tend to outperform complex academic tricks.
*   **Self-Driving Car Challenges:** The biggest obstacle in developing self-driving cars is not the mechanics of driving but social interaction and *Theory of Mind* (reading the minds of other drivers).
*   **Philosophy of Balance:** Littman defines the meaning of life as "balance" (not overdoing any one thing) and emphasizes the importance of relationships and of working hard toward good ends.

---

### Detailed Breakdown

#### 1. Introduction, Background, and Musical Taste
*   **Guest Profile:** Michael Littman is a Professor of Computer Science at Brown University who specializes in *Machine Learning* and *Reinforcement Learning*. He is known for a lighthearted, humorous approach.
*   **Influence of Science Fiction:** Littman recommends the film *Robot and Frank* for its realism; it depicts technology that must be shaped by its user, not the other way around. He also discusses anthropomorphism (people attributing human traits to objects such as robot vacuums).
*   **Musical Taste & the TurboTax Ad:** Littman tells the story of appearing in a TurboTax commercial as a "genius scientist". He also explains that he came to realize his musical taste is shaped strongly by familiarity rather than by the objective quality of the music itself.

#### 2. The World of Video Production and Podcasting
*   **Behind the Scenes of the Ad:** Littman recounts shooting the commercial with a large crew (about 50 people) in New Jersey, where every detail (from lighting to sweat) was handled by a professional team. This contrasts with his own videos, which he makes simply, using PowerPoint and parody songs.
*   **The Freedom of Podcasts:** He prefers podcasting to mainstream media because of the lack of censorship and production-team interference, which allows deeper and more natural conversation.

#### 3. Existential Risk and the AGI Debate
*   **Doubts About the "Doom" Scenario:** Littman doubts the scenario in which AI suddenly becomes a superintelligence and destroys humanity. He argues that building such technology requires a long learning process in which humans are always involved in its design and its targets.
*   **Response to Elon Musk & Sam Harris:** While acknowledging Musk's intelligence and Sam Harris's physics-based argument, Littman argues that the force of Musk's ideas could be their own "downfall", because they push extreme scenarios without considering the real complexity of technical implementation.

#### 4. Social Media and Collective Intelligence
*   **AGI Through Algorithms:** There is a view that AGI has already emerged through social media algorithms that control the collective mind. Wikipedia is compared with Facebook; Wikipedia is considered more rational, while social media algorithms are simple yet manipulative.
*   **Generational Impact:** The generation that grew up with social media may have suffered developmental harm, but it is also developing the resilience to deal with online toxicity, a skill that earlier generations may lack.

#### 5. History of *Reinforcement Learning* (RL)
*   **Early Career:** Littman got his first computer (a TRS-80) in 1979 in place of a bar mitzvah party, and in college in the 1980s he pursued computer science alongside an interest in cognitive psychology.
*   **Meeting Rich Sutton:** At Bellcore, Littman interacted with key figures such as Rich Sutton and Dave Ackley. He learned about Q-learning and TD (Temporal Difference) learning, described in the conversation as an almost magical single line of code that produces adaptive behavior.
*   **The TD-Gammon Era:** Jerry Tesauro created TD-Gammon, a backgammon program that learned through *self-play* (playing against itself). It was a major breakthrough, but at the time many other researchers failed to replicate its results because neural nets were then unstable.
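
To make the "single line of code" remark about Q-learning concrete, here is a minimal sketch of tabular Q-learning. The five-state chain environment, the function names (`step`, `q_learning`), and the hyperparameters are assumptions made up for illustration; none of them come from the interview. The behavior policy is uniformly random, which is meant to show the off-policy property mentioned in the conversation: the agent can learn values for the greedy policy while behaving differently.

```python
import random

# Hypothetical toy environment (not from the interview): a chain of 5 states.
# Action 0 moves left, action 1 moves right; reaching state 4 pays reward 1.
N_STATES = 5
ACTIONS = (0, 1)
GOAL = N_STATES - 1


def step(state, action):
    """Deterministic toy dynamics, assumed purely for illustration."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Behavior policy: uniformly random, so learning is off-policy.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # The key update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
# Greedy policy read off the learned table for the non-terminal states.
greedy_policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
```

In this sketch the agent never acts greedily during training, yet the table it learns favors moving right in every non-terminal state. As the conversation notes, the update itself is short, but practical systems need a good deal of supporting machinery around it.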

#### 6. The Evolution of AlphaGo and AlphaZero
*   **David Silver's Role:** AlphaGo succeeded in defeating the Go world champion thanks to a combination of expert data, *distributed learning*, and David Silver's persistence.
*   **AlphaZero:** A more advanced version, AlphaZero, learned without any human data at all (*pure self-play*). Littman found this very impressive ("knocked my socks off") because of how well its software engineering was integrated.
*   **Limits of Games:** Although AI keeps improving at closed games such as chess and Go, these games have finite strategic limits. Applying the *self-play* method to the real world (such as language or self-driving cars) is far more complicated.

#### 7. Language, GPT-3, and Interaction
*   **Limitations of GPT-3:** Transformer networks such as GPT-3 are remarkable at the statistics of language, but they often merely imitate past data without deep understanding or genuine storytelling ability.
*   **Importance of Interaction:** Littman speculates that true language intelligence may not be achievable by reading text alone (as GPT does), but may require two-way interaction with humans to understand context and intent.

#### 8. Moore's Law, Capitalism, and Self-Driving Cars

## Conclusion & Closing Message
This interview presents Michael Littman's perspective, which balances progress in *Reinforcement Learning* against skepticism about the existential threat of AI. From the history of AlphaGo to the impact of social media, the discussion stresses that developing AI safely requires continuous human involvement. The philosophical message about balance is also an important reminder to respond wisely to the evolution of technology.


file updated 2026-02-13 13:22:26 UTC