Transcript
Kedt2or9xlo • Oriol Vinyals: DeepMind AlphaStar, StarCraft, and Language | Lex Fridman Podcast #20
Kind: captions
Language: en
The following is a conversation with Oriol Vinyals. He's a senior research scientist at Google DeepMind, and before that he was at Google Brain and Berkeley. His research has been cited over 39,000 times; he's truly one of the most brilliant and impactful minds in the field of deep learning. He's behind some of the biggest papers and ideas in AI, including sequence-to-sequence learning, audio generation, image captioning, neural machine translation, and of course reinforcement learning. He's a lead researcher of the AlphaStar project, creating an agent that defeated a top professional at the game of StarCraft. This conversation is part of the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, iTunes, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D. And now, here's my conversation with Oriol Vinyals.
You spearheaded the DeepMind team behind AlphaStar that recently beat a top professional player at StarCraft. You have an incredible wealth of work in deep learning across a bunch of fields, but let's talk about StarCraft first. Let's go back to the very beginning, even before AlphaStar, before DeepMind, before deep learning. What came first for you: a love for programming or a love for video games?

I think for me it definitely came first, the drive to play video games. I really liked computers. I didn't really code much, but what I would do is just mess with the computer, break it and fix it. That was the level of skill, I guess, that I gained in my very early days, when I was 10 or 11. And then I really got into video games, especially StarCraft, actually the first version. I spent most of my time just playing pseudo-professionally, as professionally as you could play back in '98 in Europe, which was not a very big scene, like what's nowadays called esports. Of course, this was the '90s.

So how did you get into StarCraft? What was your favorite race? How did you develop your skill? What was your strategy, all that kind of thing?

As a player, I tended to play not many games, so as not to disclose the strategies that I had developed, and I actually liked to play random, not in competitions, but because, well, in StarCraft there are three main races, and I found it very useful to play with all of them. So I would choose random many times, even sometimes in tournaments, to gain skill on the three races, because it's not only how you play against someone: if you understand a race because you play it, you also understand what's annoying, and when you're on the other side, what to do to annoy that person, to try to gain advantages here and there, and so on. So I actually played random, although I must say in terms of favorite race I really liked Zerg. I was probably best at Zerg, and that's probably what I tended to use towards the end of my career,
the year before starting university.

So let's step back a little bit. Could you try to describe StarCraft to people that may never have played video games, especially the massively-online variety?

Right. StarCraft is a real-time strategy game, and the way to think about StarCraft, perhaps if you understand a bit of chess, is that there's a board, which is called the map, where people play against each other. There are obviously many ways you can play, but the most interesting one is the one-versus-one setup, where you just play against someone else, or even against the built-in AI, a system that can play the game reasonably well if you don't know how to play. And then on this board you have, again, pieces like in chess, but these pieces are not there initially like they are in chess. You actually need to gather resources and decide which pieces to build, so in a way you're starting almost with no pieces. You start gathering resources (in StarCraft there are minerals and gas that you can gather), and then you must decide how much you want to focus, for instance, on gathering more resources or on starting to build units, or pieces. And then once you have enough pieces, a good attack composition, you go and attack the other side of the map.

Now, the other main difference with chess is that you don't see the other side of the map, so you're not seeing the moves of the enemy. It's what we call partially observable. As a result, you must not only decide the trade-off of economy versus building your own units, but you must also decide whether you want to scout to gather information, although by scouting you might be giving away some information that you might otherwise be hiding from the enemy. So there's a lot of complex decision-making, all in real time. Also, unlike chess, this is not a turn-based game: you play basically all the time, continuously, and thus some skill in terms of speed and accuracy of clicking is also very important. People that train for this really play the game at an amazing skill level; I've seen it many times, and if you can witness it live it's really, really impressive. So in a way it's kind of a chess where you don't see the other side of the board, you're building your own pieces, and you also need to gather resources, basically to get some money to build other buildings, pieces, technology, and so on.
From the perspective of the human player, the difference between that and chess, or that and a turn-based strategy game like Heroes of Might and Magic, is that there's an anxiety, because you have to make these decisions really quickly, and if you're not actually aware of what decisions work, it's very stressful. Everything you describe is actually quite stressful and difficult to balance for an amateur human player. I don't know if it gets easier at the professional level, like if they're fully aware of what they have to do, but at the amateur level there's this anxiety: oh crap, I'm being attacked; oh crap, I have to build up resources; oh, I should probably expand; all the time. The real-time strategy aspect is really stressful, and computationally, I'm sure, difficult; we'll get into it. But for me, Battle.net... so StarCraft was released in '98, 20 years ago, which is hard to believe, and Blizzard's Battle.net came out with Diablo in '96, and to me, and it might be a narrow perspective, it changed online gaming, and perhaps society, forever. I may have way too narrow a viewpoint, but from your perspective, can you talk about the history of gaming over the past 20 years? How transformational, how important is this
line of games?

Right, so I was kind of an active gamer while this was developing, the internet and online gaming. For me, the way it came about was that I played other strategy games: I played a bit of Command & Conquer, and then I played Warcraft 2, which is from Blizzard, though at the time I didn't understand what Blizzard was or anything; Warcraft 2 was just a game, and one which was actually very similar to StarCraft in many ways. It's also a real-time strategy game, where there are orcs and humans, so there's only two races, and it was offline. So I remember a friend of mine came to school saying, oh, there's this new cool game called StarCraft, and I just said, this sounds like just a copy of Warcraft 2, until I installed it. And at the time (I am from Spain, so we didn't have very good internet) StarCraft became for us first kind of an offline experience, where you start to play these missions: you play against some sort of scripted things to develop the story of the characters in the game. Then later on I started playing against the built-in AI, and I thought it was impossible to defeat it. Eventually you defeat one, and then you can actually play against seven built-in AIs at the same time, which also felt impossible, but actually it's not that hard to beat seven built-in AIs at once. Once we achieved that, we also discovered that we could play over LAN. As I said, the internet wasn't that great, but we could play against each other if we were in the same place, because you could just connect machines with cables. So we started playing in LAN mode, as a group of friends, and it was really much more entertaining than playing against the AIs. And later on, as the internet started to develop and became a bit faster and more reliable, that's when I started experiencing Battle.net, which is this amazing universe, not only because you can play the game against anyone in the world, but because you also get to know more people. You just get exposed to this vast variety; it's a bit like when chats came about. There was a chat system, so you could play against people and also chat with them, not only about StarCraft but about anything. That became a way of life for about two years, and obviously then it exploded, and I started to play more seriously, going to tournaments and so on and so forth.
Do you have a sense, at a societal, sociological level, of this whole part of society that many of us are not aware of, and it's a huge part of society, which is gamers? I mean, every time I come across it on YouTube or streaming sites, there's a huge number of people who play games religiously. Do you have a sense of those folks, especially now that you've returned to that realm a little bit, on the AI side?
Yeah. In fact, after StarCraft I actually played World of Warcraft, which is mainly the main sort of online world, or presence, where you get to interact with lots of people. I played that for a little bit; to me it was a bit less stressful than StarCraft, because winning was kind of a given: you're just put in this world and you can always complete missions. But I think it was actually the social aspect, of especially StarCraft first and then games like World of Warcraft, that really shaped me in very interesting ways, because you get to experience people you wouldn't usually interact with. Even nowadays I still have many Facebook friends from the era when I played online, and their ways of thinking, even politically... we don't interact in the real world, but we were connected by, basically, fiber, and that way I actually got to understand a bit better that we live in a diverse world. These were connections that were made because, you know, I happened to go into a virtual city as a priest, and I met this warrior, and we became friends, and then we started playing together. So I think it's transformative, and more and more people are aware of it; I mean, it's becoming quite mainstream, but back in the day, as you were saying, in 2000, 2005 even, it was still a very strange thing to do, especially in Europe. I think there were exceptions, like Korea, for instance. It was amazing how everything happened so early there in terms of cyber cafes: if you go to Seoul, it's a city where, back in the day, you could be a celebrity by playing StarCraft, and this was '99, 2000, right? It's not recent. So, yeah, it's quite interesting to look back, and I think it's changing society in the same way that, of course, technology and social networks and so on are also transforming things.

On a quick tangent,
let me ask: you're also one of the most productive people in your particular chosen passion and path in life, and yet you also appreciate and enjoy video games. Do you think it's possible to enjoy video games in moderation?

Someone told me, back when I was playing video games, that you could choose two out of three: having a girlfriend, playing video games, or studying. And I think for the most part it was relatively true; these things do take time. With a game like StarCraft, if you take it pretty seriously and you want to study it, then you obviously will dedicate more time to it, and I definitely took gaming, and obviously studying, very seriously; I loved learning, science, and so on. So, especially when I started university as an undergrad, I kind of stepped away from StarCraft; I actually fully stopped playing. World of Warcraft was a bit more casual: you could just connect online, and it was fun, but, as I said, it was not as much of a time investment as it was for me in
StarCraft.

Okay, so let's get into AlphaStar. What's the story behind the team? DeepMind has been working on StarCraft, and released a bunch of cool open-source agents and so on over the past few years, but AlphaStar really is the moment, the first time you beat a world-class player. So what are the parameters of the challenge in the way that AlphaStar took it on, and how did you and David and the rest of the DeepMind team get into it, to consider that you could even beat the best in the world, or top players?

I think it all started back in 2015. Actually, I'm lying, I think it was 2014, when DeepMind was acquired by Google, and I at the time was at Google Brain, which was in California. In California we had this summit where the two groups, Google Brain and Google DeepMind, got together, and we gave a series of talks. Given that they were doing deep reinforcement learning for games, I decided to bring up part of my past, which I had developed at Berkeley: this thing we called the Berkeley Overmind, which is really just a StarCraft 1 bot. So I talked about that, and I remember Demis just came to me and said, well, maybe not now, it's perhaps a bit too early, but you should come to DeepMind and do this again with deep reinforcement learning. At the time that sounded very science-fiction, for several reasons. But then in 2016, when I actually moved to London and joined DeepMind, transferring from Brain, it became apparent that, because of the AlphaGo moment, and Blizzard reaching out to us to ask whether we wanted the next challenge, and also me being full-time at DeepMind, all of this came together, and I went to Irvine, in California, to the Blizzard headquarters, just to chat with them and try to explain how it would all work. Even before you do anything, the approach has always been about the learning perspective. In Berkeley we did a lot of rule-based conditioning: if you have more than three units, then go attack; if the other player has more units than me, I retreat; and so on and so forth. Of course, the point of deep reinforcement learning, of deep learning and machine learning in general, is that all of this should be learned behavior. So that was kind of the DNA of the project since its inception in 2016, when we didn't even have an environment to work with. That's how it all started, really.

So if you go back to a
conversation with Demis, or even in your own head: how far away did you think you were? Because we're talking about Atari games, we're talking about Go, which are, if you're honest about it, really far away from StarCraft. Well, now that you've beaten it, maybe you could say it's close, but it seems like StarCraft is way harder than Go, philosophically and mathematically speaking. So how far away did you think you were? Did you think that in 2018, 2019 you could be doing as well as you have?

Yeah. When I thought about, okay, I'm going to dedicate a lot of my time and focus to this (and obviously I do a lot of different research in deep learning, so before spending time on it I really had to believe something good was going to come of it), I really thought, well, this sounds impossible, and it probably is impossible to do the full thing, the full game, where you play one versus one and it's only a neural network playing, and so on. It really felt like I didn't even think it was possible. But on the other hand, I could see some stepping stones towards that goal. Clearly you could define subproblems in StarCraft, dissect it a bit, and say, okay, here is a part of the game, here's another part. And also, and this was really critical to me, there was the fact that we could access human replays. Blizzard was very kind, and in fact they open-sourced these for the whole community. It's not every single StarCraft game ever played, but it's a lot of them; you can just go and download them, and every day you can query a dataset and say, give me all the games that were played today. Given my experience with language and sequences and supervised learning, I thought, well, that's definitely going to be very helpful, and something quite unique, because never before have we had such a large dataset of replays of people playing such a complex video game at this scale. So that to me was a precious resource, and as soon as I knew that Blizzard was able to give it to the community, I started to feel positive about something non-trivial happening. But I also thought the full thing, with really no rules, not a single line of code that says, well, if you see this unit, then build a detector... not having any of these specializations seemed really, really, really difficult to me.
I do also like that Blizzard was teasing, or even trolling you, sort of, pulling you into this really difficult challenge. Were they aware of that? And what's the interest from the perspective of Blizzard, aside from just curiosity?

Yeah, I think Blizzard has really understood and really brought forward this competitiveness of esports in games. StarCraft sparked something that had almost never been seen before, especially, as I was saying, back in Korea. So they probably thought, well, this is such a pure one-versus-one setup that it would be great to see whether something that can play Atari, or Go, and then later on chess, could even tackle this kind of complex real-time strategy game. So for them, they first wanted to see, obviously, whether it was possible, whether the game they created was in a way solvable to some extent. And on the other hand, I think they're also a pretty modern company that innovates a lot, so they're starting to understand AI and how to bring AI into games. What we do is not AI for games but games for AI. Right, and I mean both ways can work: we at DeepMind obviously use games for AI, to drive AI progress, but Blizzard, and many other companies, might actually be able to start to understand how to do the opposite. So I think that is also something they can get out of this, and we have definitely brainstormed a lot about it. But one of
the interesting things to me about StarCraft and Diablo and these games that Blizzard created is the task of balancing classes: making the game fair from the starting point, and then letting skill determine the outcome. Can you comment on that? There are three races: Zerg, Protoss, and Terran. I don't know if I've ever said that out loud; is that how you pronounce it, Terran?

Yeah.

I don't think I've ever personally interacted with anybody about StarCraft; it's funny. So they seem to be pretty balanced. I wonder if the AI, the work that you're doing with AlphaStar, would help balance them even further. Is that something you think about? Is that something Blizzard is thinking about?

Right. So balancing, when you add a new unit or a new spell type, is obviously possible, given that you can always train, or retrain at scale, some agent that might start using it in unintended ways. But actually, if you look at how StarCraft has co-evolved with its players, I think it's very cool, the things and strategies that people came up with. We've seen it over and over in StarCraft: Blizzard comes up with, say, a new unit, and then some players get creative and do something unintended, something that Blizzard's designers simply didn't test or think about, and then, after that becomes mainstream in the community, Blizzard patches the game, and they maybe weaken that strategy, or make it actually more interesting but a bit more balanced. This kind of continual dialogue between players and Blizzard is kind of what has defined them, actually, in most of their games. In StarCraft but also in World of Warcraft they would do that: there are several classes, and it would not be good if everyone played absolutely the same race or class, and so on. So I think they do care about balancing, of course, and they do a fair amount of testing, but it's also beautiful to see how players get creative anyway. Whether AI can be more creative at this point, I don't think so. Sometimes something amazing just happens. I remember, back in the day, you had these dropships that could drop Reavers, and it was just never anticipated that you could drop this unit, which has what's called splash damage, and basically eliminate all the enemy's workers at once. No one thought that you could actually bring them in really early in the game and do that kind of damage, and then, well, things changed in the game. So I don't know, I think it's quite an amazing exploration process, from both sides, players and Blizzard alike.

Well,
players and Blizzard alike well
it's it's almost like a reinforcement
learning exploration but a the scale of
humans that play that play Blizzard
games is almost on the scale of a
large-scale deepmind RL experiment I
mean if you look at the numbers that's I
mean you're talking about I don't know
how many games but hundreds of thousands
of games probably a month yeah I mean so
that you could it's almost the same as
running RL agents what an aspect of the
problem of StarCraft II things the
hardest is it the like you said the
imperfect information is it the fact
they have to do long term planning is it
the real time aspects we have to do
stuff really quickly is it the fact that
a large action space that you can do so
many possible things or is it you know
in the game theoretic sense there is no
Nash equilibria at least you don't know
what the optimal strategy is because
there's way too many options right
what's is there something that stands
out is just like the hardest the most
annoying thing?

So when we looked at the problem and started to define its parameters, what the observations are and what the actions are, it became very apparent that the very first barrier one would hit in StarCraft would be the action space being so large, and not being able to search the way you can in chess or Go, even though their search spaces are also vast. The main problem we identified was that of exploration. Without any sort of human knowledge or human prior, if you think about StarCraft, and you know how deep reinforcement learning algorithms work, which is essentially by issuing random actions and hoping that they will sometimes get some wins so they can learn, then almost anything you can do in the early game is bad, because any action involves taking workers, which are mining minerals for free (that's something the game does automatically: it sends them to mine), and you would immediately just take them out of mining and send them around. So just thinking about how it was going to be possible to learn these concepts... And even more so with expanding: there are these buildings you can place at other locations on the map to gather more resources, but the location of the building is important, and you have to select a worker, send it walking to that location, build the building, wait for the building to be built, and then put extra workers there so they start mining. It just feels impossible, if you're randomly clicking, to produce that desirable state that you could then hope to learn from, because eventually it may yield a win. So for me it was the exploration problem, due to the action space, and to the fact that there aren't really turns, or rather that there are so many turns, because the game essentially ticks 22 times per second. That's how they discretize time; obviously you always have to discretize time, there's no such thing as truly continuous time, but it's really a lot of time steps at which things could go wrong. That, a priori, definitely felt like the hardest part. You mentioned many good ones, though: partial observability, and the fact that there is no perfect strategy because of the partial observability, are very interesting problems that we're starting to see more and more now that we've solved the previous ones. But the core problem to me was exploration, and solving it has been basically the focus, and how we saw the first
breakthroughs.

So, exploration in a multi-level, hierarchical way: at 22 times a second, exploration has a very different meaning than it does at the level of "should I gather resources early, or should I wait", and so on. So how do you solve the long-term part? Let's talk about the internals of AlphaStar. First of all, how do you represent the state of the game as an input? How do you then do the long-term sequence modeling? How do you build a policy? And what's the architecture like?

So AlphaStar obviously has several components, but everything passes through what we call the policy, which is a neural network, and that's kind of the beauty of it. I could just now give you a neural network and some weights, and if you fed it the right observations and understood the actions the same way we do, you would have the agent playing the game. There's absolutely nothing else needed other than those weights that were trained.
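That "weights plus observations" interface can be made concrete with a minimal sketch. This is a toy stand-in in NumPy, not the actual AlphaStar architecture: the layer sizes, the 10-dimensional observation vector, and the four-action space are all invented for illustration.

```python
import numpy as np

def policy_act(weights, obs):
    # A policy is just a function: (trained weights, observation) -> action distribution.
    W1, b1, W2, b2 = weights
    hidden = np.tanh(W1 @ obs + b1)          # tiny hidden layer
    logits = W2 @ hidden + b2
    exp = np.exp(logits - logits.max())      # numerically stable softmax
    return exp / exp.sum()

# Toy sizes; the real observation and action encodings are far richer.
rng = np.random.default_rng(42)
weights = (rng.normal(size=(16, 10)), np.zeros(16),
           rng.normal(size=(4, 16)), np.zeros(4))
obs = rng.normal(size=10)                    # stand-in for one game observation
probs = policy_act(weights, obs)             # distribution over 4 toy actions
action = int(np.argmax(probs))               # the agent could also sample from probs
```

Feed the same weights the stream of observations and you get the agent's behavior; nothing else is required.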
The first step is observing the game, and we've experimented with a few alternatives. The one we currently use mixes both spatial inputs, images that you process from the game, that is, the zoomed-out version of the map and also the zoomed-in view of the camera, or the screen, as we call it; but we also give the agent the list of units that it sees, more as a set of objects that it can operate on. That's not strictly required; we have versions of the agent that play well without this set view, which is a bit unlike how humans perceive the game, but it certainly helps a lot, because a very natural way to encode the game is just by looking at all the units that are there. They have properties like health, position, unit type, and whether it's my unit or the enemy's, and that sort of is the summary of the state of the game.

But that list of units, or set of units, that you see all the time: that's pretty close to the way humans see it. Why do you say it's not? Or are you saying the exactness of it is the difference?

The exactness of it is perhaps not the problem. The problem, I guess, if you look at how humans actually play the game, is that they play with a mouse and a keyboard and a screen, and they don't see a structured object with all the units; what they see is what's on the screen.

Yes, and I remember there's a plot you showed, with the camera-based agent, where you do exactly that, move the view around, and it seems to converge to similar performance.

Yeah, I think we're kind of experimenting with what's necessary or not. But on using the set: actually, if you
look at research in computer vision, where it makes a lot of sense to treat images as two-dimensional arrays, there is actually a very nice paper from Facebook. I forget who the authors are, but I think it's from Kaiming He's group. What they do is take an image, which is a two-dimensional signal, go pixel by pixel, and scramble the image as if it were just a list of pixels; crucially, they encode the position of each pixel with its X-Y coordinates. This uses a kind of new architecture, which we incidentally also use in StarCraft, called the transformer, from a very popular paper from last year which yielded very nice results in machine translation. If you believe in this view, that an image is really a set of pixels, and that as long as you encode X and Y it's okay, then you could argue that the list of units we see is precisely that, because we have each unit as a kind of pixel, if you will, together with its X-Y coordinates. So from that perspective, without knowing it, we used the same architecture that was shown to work very well on Pascal, on ImageNet, and so on.
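As a rough illustration of that idea, here is a minimal single-head self-attention layer over a list of unit feature vectors, in the spirit of the transformer but heavily simplified: identity projections, one head, and invented toy features (a real transformer layer uses learned query/key/value projections, multiple heads, and many layers). The point is that the input is an unordered set; position lives in the X-Y features, not in the array order.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(units):
    # Simplified transformer-style layer: every unit attends to every other unit.
    d = units.shape[-1]
    scores = units @ units.T / np.sqrt(d)    # pairwise similarity of units
    return softmax(scores) @ units           # each output row mixes all units

# Each row is one unit: [x, y, health, is_enemy]; all values are made up.
units = np.array([
    [0.1, 0.2, 1.0, 0.0],
    [0.8, 0.7, 0.5, 1.0],
    [0.3, 0.9, 0.9, 0.0],
])
out = self_attention(units)                  # same shape: one vector per unit
```

Shuffling the rows of `units` shuffles the rows of `out` in exactly the same way, so the layer treats the input as a set, with position carried in the features rather than in the ordering.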
So the interesting thing here is that, putting it that way, it starts to move towards the way you usually work with language. Especially with your expertise and work in language, it seems like there are echoes of the way you would work with natural language in the way you've approached AlphaStar. Does that help with the long-term sequence modeling somehow?

Exactly. So now that we understand what an observation for a given time step is, we need to move on to the fact that there's going to be a sequence of such observations, and an agent will need to act given all that it has seen, not only the current time step. Why? Because of partial observability: we must remember whether we saw a worker going somewhere, for instance, because then there might be an expansion at the top right of the map. Given that, what you must think about is the problem of: given all the observations, predict the next action; and not only given all the observations, but given all the observations and all the actions you've taken, predict the next action. That sounds exactly like machine translation, and that's exactly how I saw the problem, especially when you're given supervised data, replays from humans, because the problem is exactly the same: you're translating, essentially, a prefix of observations and actions onto what's going to happen next, which is exactly how you would train a model to translate, or to generate language. You have a certain prefix, and you must remember everything that came in the past, because otherwise you might start producing incoherent text. And the same architectures (we're using LSTMs and transformers to operate across time, to integrate everything that has happened in the past), the architectures that work so well in translation or language modeling, are exactly the same as what the agent uses to issue actions in the game. Moreover, the way we train it for imitation, which is step one of AlphaStar, is to take all the human experience and try to imitate it, much like you would imitate the translators who translated many pairs of sentences from French to English, say. That principle applies exactly the same way; it's almost the same code, except that instead of words you have slightly more complicated objects, the observations, and the actions are also a bit more complicated than a word is.

Is there a self-play component, too, once you run out of imitation?
right so so indeed you can bootstrap
from human replays but then the agents
you get are actually not as good as the
humans you imitated right so how do you
imitate well we take humans from 3,000
MMR and hire 3,000 MMR is just a metric
of human skill and 3,000 MMR might be
like 50% percentile right so it's just
Kevin average human what's that so maybe
quick pause MMR is a ranking scale the
matchmaking rating yeah for players so
in StarCraft remember there's like a master
and a grandmaster above 3,000 so 3,000 is
pretty bad I think it's kind of gold
level it just sounds really good
relative to chess I think oh yeah I know
the ratings the best in the world
are at 7,000 MMR
so 3,000 it's a bit like Elo indeed
right so 3,300 just allows us to not
filter a lot of the data so we like to
have a lot of data in deep learning as
you probably know so we take these kind
of 3,500 and above but then we do a very
interesting trick which is we tell the
neural network what level they are
imitating so we say this replay you're
gonna try to imitate to predict the next
action for all the actions that you're
gonna see is a 4,000 MMR replay this
one is a 6,000 MMR replay and
what's cool about this is then we take
this policy that is being trained from
humans and then we can ask it to play
like a 3,000 MMR player by setting a
bit saying well okay play like a 3,000
MMR player or play like a 6,000 MMR
player and you actually see how the
policy behaves differently
it gets a worse economy if it plays like a
gold level player it does fewer actions
per minute which is the number of clicks
or number of actions that you will issue
in a whole minute and it's very
interesting to see that it kind of
imitates the skill level quite well but
if we ask it to play like a 6,000 MMR
player
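The conditioning trick described above, labeling each replay with the player's rating, feeding that label to the network, then setting it at play time, can be sketched as a toy lookup policy. The class name, the replay format, and the coarse bucketing scheme are all illustrative assumptions:

```python
from collections import defaultdict, Counter

class MMRConditionedPolicy:
    """Toy version of an MMR-conditioned imitation policy: the skill label
    is just another input, so at inference time you can dial it up or down
    to 'play like' a player of that rating."""
    def __init__(self):
        # (mmr_bucket, observation) -> distribution over next actions
        self.table = defaultdict(Counter)

    @staticmethod
    def bucket(mmr):
        return mmr // 1000  # coarse skill buckets: 3k, 4k, 5k, 6k ...

    def train(self, replay, mmr):
        for obs, act in replay:
            self.table[(self.bucket(mmr), obs)][act] += 1

    def act(self, obs, play_like_mmr):
        dist = self.table[(self.bucket(play_like_mmr), obs)]
        return dist.most_common(1)[0][0] if dist else "noop"

policy = MMRConditionedPolicy()
policy.train([("combat", "spam_click")], mmr=3400)   # gold-ish play
policy.train([("combat", "focus_fire")], mmr=6100)   # pro-level play
print(policy.act("combat", play_like_mmr=3000))  # spam_click
print(policy.act("combat", play_like_mmr=6000))  # focus_fire
```

The same observation produces different behavior depending only on the requested skill level, which is the effect described in the interview (worse economy, fewer actions per minute at lower settings).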
we tested of course these policies to
see how well they do they actually beat
all the built-in AIs that are put
in the game but they're nowhere near
6,000 MMR players right they might be
maybe around gold level
platinum perhaps so there's still a lot
of work to be done for the policy to
truly understand what it means to win so
far we only asked them ok here is the
screen and that's what happened on the
game until this point what would the
next action be we ask you know we ask
what a pro would now say out of all this
you're gonna click here or here or there and
the point is experiencing
wins and losses is very important to
then start to refine otherwise the
policy can get lost can just go off
policy as we call it that's so
interesting that you can at least hope
eventually to be able to control a
policy to be at approximately some MMR
level that's so interesting
especially given that you have ground
truth for a lot of these cases right can
I ask you a personal question what's
your MMR
well I haven't played StarCraft 2 so I
am unranked oh that's the kind of lowest
league okay so I used to play StarCraft
the first one but you haven't
seriously played so the best player we
have at DeepMind is about five thousand
MMR which is high masters not at
Grandmaster level
Grandmaster level would be the top 200
players in a certain region like Europe
or America or Asia but for me it would
be hard to say I am very bad at the game
I actually played AlphaStar a bit too
late and it beat me I remember the whole
team was like Oriol you should play yeah and I
was like it looks like it's not so good yet
and then I remember I kind of got busy
and waited an extra week and I played
and it really beat me
very badly how was it I've heard that
feeling's not an amazing feeling it's amazing
yeah I mean obviously I tried my best
and I tried to also impress because I
actually played the first game so I'm
still pretty good at micromanagement
um the problem is I just don't understand
StarCraft 2 I understand StarCraft and
when I played StarCraft I probably was
consistently like for a couple of years
top 32 in Europe so I was decent but at
the time we didn't have this kind of MMR
system as well established so it
would be hard to know what it was
back then so what's the difference in
interface between alpha star and
Starcraft and a human player and
StarCraft are there any significant
differences between the way they both
see the game I would say the way they
see the game there's a few things that
are just very hard to simulate the main
one perhaps which is obvious in
hindsight is what's called cloaked units
which are invisible units so in
StarCraft you can make some units that
you need a particular kind of
unit to detect so these units are
invisible
if you cannot detect them you cannot
target them so they would just you know
destroy your buildings or kill your
workers but
despite the fact you cannot target the
unit there's a shimmer that as a human
you observe I mean you need to train a
little bit you need to pay attention but
you would see this kind of space-time-like
distortion and you
would know okay they're there yeah yeah
it's like a wave thing yeah it's kind of a
distortion I don't know what it's really
called the Blizzard term is shimmer
shimmer and so this shimmer professional
players actually can see it immediately
they understand it very well but it's
still something that requires certain
amount of attention and it's
kind of a bit annoying to deal with
whereas for AlphaStar in terms of
vision it's very hard for us to simulate
sort of
you know are you looking at this pixel
on the screen and so on so um the only
thing we can do is say there is a unit
that's invisible over there so AlphaStar
would know that immediately
obviously still obeys the rules you
cannot attack the unit you must have a
detector and so on but it's kind of
one of the main things where it just
doesn't feel like there's a very
proper way I mean you could imagine
you don't have high precision to know
exactly where it is or sometimes you see
it sometimes you don't but it's
just really really complicated to get it
so that everyone would agree oh that's
that's the best way to simulate this
right you know it seems like a
perception problem it is a perception
problem so if you ask what's the difference
between how humans perceive the game I
would say they wouldn't be able to tell
a shimmer immediately as it appears on
the screen
whereas AlphaStar in principle sees it
very sharply right it sees
that the bit turned from zero to one
meaning there's now a unit there
although you don't know the unit
you know that you
cannot attack it and so on got it so that
from a vision standpoint that
probably is the one that is kind of the
most obvious one then there are things
humans cannot do perfectly even
professionals which is they might miss a
detail or they might have not seen a
unit and obviously as a computer if
there's a corner of the screen that
turns green because a unit enters the
field of view that can go into the
memory of the agent the LSTM and persist
there for a while
for however long is
relevant right and in terms of actions it
seems like the rate of actions from
AlphaStar is comparable if not slower
than professional players
but it's more precise as well right so
that's really probably the one that is causing
us more issues for a couple of reasons
right
the first one is Starcraft has been an
AI environment for quite a few years in
fact I mean I was participating in the
very first competition back in 2010 and
there's really not been a kind of a very
clear set of rules how the actions per
minute the rate of actions that you can
issue is and as a result these agents or
bots that people build in a kind of
almost very cool way they do like 20,000
40,000 actions per minute now now to put
this in perspective a very good
professional human might do 300 to 800
actions per minute they might not be as
precise that's why the range is a bit
tricky to identify exactly I mean 300
actions per minute precisely is probably
realistic 800 is probably not but you
see humans doing a lot of actions
because they warm up and they kind of
select things and spam and so on just so
that when they need it they have the
accuracy so we came into this by not
having kind of a standard way to say
well how do we measure whether an agent
is at human level or not on the other
hand we had a huge advantage which is
because we do imitation learning agents
turned out to act like humans in terms
of rate of actions even precisions and
imprecisions of actions in the supervised
policy you could see all this you could
see how agents like to spam click to
move here if you played especially Diablo
you would know what I mean I mean you
just click like spam move here
move here move here you're doing
literally like maybe five actions in two
seconds but these actions are not very
meaningful one would have
sufficed so on the one hand we start
from this imitation policy that is in
the ballpark of the actions per minute
of humans because it acts
statistically trying to imitate humans
so we see these very nicely in the
curves that we showed in the blog post
like this these actions per minute and
the distribution looks very human-like
but then of course as self-play kicks in
and that's the part we haven't
talked about too much yet but of course the
agent must play against itself to
improve then there's almost no
guarantee that these actions will not
become more precise or even the rate of
actions is going to increase over time
so what we did and this is probably kind
of the first attempt that we thought was
reasonable is we looked at the
distribution of actions for humans for
certain windows of time and just to give
a perspective because I guess I
mentioned that some of these agents that
are programmatic let's call them they do
40,000 actions per minute professionals
as I said do 300 to 800 so what we
looked at is a distribution over
professional gamers and we took
reasonably high actions per minute but
we kind of identified certain cutoffs
after which even if the agent wanted
to act these actions would be dropped
but the problem is this cutoff is
probably set a bit too high and what
ends up happening is even though the games
when we asked the professionals and
the gamers by and large they feel
like it's playing human-like there are
some agents that developed maybe
slightly too high APMs which is actions
per minute combined with the precision
which made people sort of start
discussing a very interesting issue
which is should we have limited these
should we just let it loose and see what
cool things it can come up with right so
this is in itself an extremely
interesting question but the same way
that modeling the shimmer would be so
difficult
modeling absolutely all the details
about muscles and precision and
tiredness of humans would be quite
difficult right so we're here kind of
innovating in this sense of okay what
could be maybe the next iteration of
putting more rules that make the agents
more human-like in terms of restrictions
yeah putting more
constraints yeah that's really
interesting that's really innovative so
one of the constraints you put on
yourselves or at least focused on is on
the Protoss race as far as I understand
can you tell me about the different
races and how they so Protoss Terran and
the Zerg how do they compare how do they
interact why did you choose Protoss and
where they sit in the dynamics of the
game as seen from a strategic
perspective so Protoss so in StarCraft
there are three races indeed in the
demonstration we saw only the Protoss
race so maybe let's start with that one
Protoss is kind of the most
technologically advanced race it has
units that are expensive but powerful
right so in general you want to kind of
conserve your units as you go attack
and then you want to
utilize these tactical advantages of
very fancy spells and so on and so forth and
at the same time people
say they're a bit easier to
play perhaps right but that I actually
didn't know I mean I just talk now
a lot to the players that we work
with TLO and MaNa and they said oh yeah
Protoss actually people think is
one of the easiest races so
perhaps the easiest but that doesn't mean
that it's you know obviously
professional players excel at the three
races and there's never like a race that
dominates for a very long time anyway so
if you look at the top 100 in the
world is there one race that dominates
that list it would be hard to know
because it depends on the regions I
think it's pretty equal in terms of
distribution and Blizzard wants it to be
equal right they
wouldn't want one race like Protoss to
not be represented in the top places
so definitely they try for it to be
balanced right so then maybe the
opposite race of Protoss is Zerg Zerg is
a race where you just kind of expand and
take over as many resources as you can
and they have a very high capacity to
regenerate their units so if you have an
army it's not that valuable in terms of
losing the whole army is not a big deal
as Zerg because you can then rebuild it
and given that you generally accumulate
a huge bank of resources typical Zerg
play will be applying a lot of
pressure maybe losing their whole army
but then rebuilding it quickly so
although of course every race I mean
they're pretty
diverse I mean there are some units in Zerg
that are technologically advanced and
they do some very interesting spells and
there are some units in Protoss that are
less valuable and you could lose a lot
of them and rebuild them and it wouldn't
be a big deal all right so maybe I'm
missing out maybe I'm gonna say some
dumb stuff but just summary of strategy
so first there's collection of a lot of
resources right so that's one option the
other one is expanding so building other
bases then the other is obviously
attacking building units and
attacking with those units and then I
don't know what else there is maybe
there is the different timing of attacks
like attack early or attack late what
are the different strategies that
emerged that you've learned about I've
read that a bunch of people are super
happy that AlphaStar apparently
discovered
that it's really good to what is it
saturate oh yeah the mining
line yeah the mineral line yeah
for greedy amateur players
like myself that's always been a good
strategy you just build up a lot of
money and it just feels good it's just
accumulate and accumulate so thank you
for discovering that yeah validating all
of us but are there other strategies that
you discovered interesting ones yeah unique
to this game yeah so if you look at
it kind of me not being a StarCraft 2
player but of course StarCraft and
StarCraft 2 and real-time strategy games
in general are very similar I would
classify perhaps the openings of the
game they're very important and
generally I would say there's two kinds
of openings one that's a standard
opening that's generally how players
find
sort of a balance between risk and
economy and building some units
early on so that they could defend but
they're not too exposed basically but also
expanding quite quickly so this would
be kind of a standard opening and within
a standard opening then what you do
choose generally is what technology are
you aiming towards so there's a bit of
rock-paper-scissors of you could go for
spaceships or you could go for invisible
units or you could go for I don't know
like massive units that attack against
certain kinds of units but they're weak
against others so standard openings
themselves have some choices like
rock-paper-scissors style of course if
you scout and you're good at guessing
what the opponent is doing then you can
gain an advantage because if I
know you're gonna play rock I mean I'm
gonna play paper obviously so you can
imagine that normal standard games in
StarCraft look like a continuous rock-paper-scissors
game where you guess what
the distribution of rock paper and
scissors is from the enemy and react
accordingly to try to beat it or you
know put the paper out before he kind of
changes his mind from rock to scissors
and then you would be in a weak position
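The continuous rock-paper-scissors view described here, keeping a belief over the opponent's strategy and countering the likeliest one, can be sketched as a toy Bayes update. The strategies, counters, and scouting likelihoods are illustrative stand-ins for real builds and scouting information:

```python
# counter relationships: the value beats the key's listed strategy
COUNTERS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def update_belief(belief, evidence_likelihood):
    """Bayes update: P(strategy | scout) is proportional to
    P(scout | strategy) * P(strategy)."""
    posterior = {s: belief[s] * evidence_likelihood.get(s, 1.0) for s in belief}
    total = sum(posterior.values())
    return {s: p / total for s, p in posterior.items()}

def best_response(belief):
    likeliest = max(belief, key=belief.get)
    return COUNTERS[likeliest]

# start with a uniform prior over the opponent's opening
belief = {"rock": 1 / 3, "paper": 1 / 3, "scissors": 1 / 3}
# a scout strongly suggests "rock" (say, we saw the matching tech building)
belief = update_belief(belief, {"rock": 0.8, "paper": 0.1, "scissors": 0.1})
print(best_response(belief))  # paper
```

Scouting is exactly the information-gathering step from the interview: it sharpens the belief so the chosen response counters what the opponent is actually most likely doing.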
so sorry to pause on that I didn't
realize this element cuz I know it's
true in poker I know I looked at
Libratus
so here you're also estimating
trying to guess the distribution to
better and better estimate
what the opponent is likely to
be doing yeah I mean as a player you
definitely want to have a belief state
over what's up on the other side of the
map and when your belief state becomes
inaccurate when you start having
serious doubts whether he's gonna play
something that you must know that's when
you scout you wanna then gather
information right is improving the
accuracy of the belief or improving the
belief state part of the loss that you
try to optimize or is it just a side
effect it's implicit but you could
explicitly model it and it would be
quite good that's probably predicting
what's on the other side of the map but
so far it's all implicit there's
no additional reward for predicting
the enemy so there's these standard
openings and then there's what people
call cheese
which is very interesting and AlphaStar
sometimes really likes this kind of
cheese these cheeses what they are is
kind of an all-in strategy you're gonna
do something sneaky you're gonna hide
your own buildings close
to the enemy base or you're gonna go for
hiding your technological buildings so
that you do invisible units and the
enemy just cannot react to detect them and
thus loses the game and there's quite a
few of these cheeses and variants of
them and there is where actually the
belief state becomes even more important
because if I scout your base and I see
no buildings at all any human player
knows something's up they might know
well you're hiding something close to my
base should I build suddenly a lot of
units to defend should I actually block
my ramp with workers so that you cannot
come and destroy my base so there's all
this happening and defending against
cheeses is extremely important and in the
AlphaStar League many agents actually
develop some cheesy strategies and in
the games we saw against TLO and MaNa
two out of the ten agents were actually
doing these kind of strategies which are
cheesy strategies and then there's a
variant of cheesy strategy which is called
all-in so an all-in strategy is not
perhaps as drastic as oh I'm gonna build
cannons on your base and then bring all
my workers and try to just disrupt your
base and game over or GG as we say in
StarCraft um there's these kind of very
cool things that you can align precisely
at a certain time mark so for instance
you can generate exactly a ten-unit
composition that is perfectly five of
this type five of the other type and
align the upgrades so that at four
minutes and a half let's say you have
these ten units and the upgrade just
finished and at that point that army is
really scary and unless the enemy really
knows what's going on
if you push you might then have an
advantage because maybe the enemy is
doing something more standard it
expanded too much it developed too much
economy and it traded off badly against
having defenses and the enemy will lose
but it's called all-in because if you
don't win then you're gonna lose
so you see players that do these kind of
strategies if they don't succeed the game is
not over I mean they still have a base
and they're still gathering minerals but
they will just GG out of the game
because they know well game is over I
gambled and I failed
so if we start entering the game
theoretic aspects of the game it's
really rich and it's really that's why
it also makes it quite entertaining to
watch even if I don't play I still enjoy
watching the game but the agents are
trying to do this mostly implicitly but
one element that we improved in self-play
is creating the AlphaStar League and
the AlphaStar League is not pure self-play
it's trying to create different
personalities of agents so that some of
them will become cheesy agents
some of them might become very
economical very greedy like getting all
the resources but then maybe early
on they're gonna be weak but later on
they're gonna be very strong and by
creating these personalities of agents
which sometimes just happen
naturally you can see kind of an
evolution of agents that given the
previous generation they train against
all of them and then they generate kind
of the perfect counter to that
distribution but these agents you
must have them in the population
because if you don't have them you're
not covered against these things right
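The population idea described here, each generation adding an agent that counters the distribution of previous agents, can be sketched with a toy cyclic payoff. The strategy names and payoffs are made up, and the real AlphaStar League trains neural network policies rather than picking from a fixed menu:

```python
# toy cyclic payoff: each strategy beats one strategy and loses to another
BEATS = {"cheese": "greedy", "greedy": "standard", "standard": "cheese"}
STRATEGIES = list(BEATS)

def best_counter(population):
    """Pick the strategy with the best expected result vs the population."""
    def score(s):
        return sum(1 if BEATS[s] == opp else -1 if BEATS[opp] == s else 0
                   for opp in population)
    return max(STRATEGIES, key=score)

population = ["cheese"]      # generation 0: a single cheesy agent
for _ in range(3):           # each generation adds a counter to the league
    population.append(best_counter(population))
print(population)  # ['cheese', 'standard', 'standard', 'greedy']
```

Because each newcomer must do well against the whole league rather than just the latest agent, the population keeps the cheesy and greedy personalities around, which is exactly the coverage argument in the interview: if those opponents are not in the population, you are never trained against them.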
it's kind of you wanna you
know create all sorts of the opponents
that you will find in the wild so you
can be exposed to these cheeses early
aggression later aggression more
expansions dropping units in your base
from the side all these things and pure
self-play is getting a bit stuck at
finding some subset of these but not all
of these so the AlphaStar League is a
way to kind of have an ensemble of agents
that are all playing in a league
much like people play on Battle.net right
you play against someone who
does a new cool strategy and you
immediately go oh my god I want to try it I
want to play again and this to me was
another critical part of
the problem which was can we create a
Battle.net for agents yeah that's kind of
what the AlphaStar League is really
fascinating and where they stick to
their different strategies yeah wow
that's
it's really really interesting so but
that said you were fortunate enough or
just skilled enough to win 5-0 and so
how hard is it to win I mean that's not
the goal I guess I don't know what the
goal is the goal should be to win a majority
not five zero but how hard is it in
general to win all matchups
in a 1v1 so that's a very
interesting question because once you
see AlphaStar superficially you
think well okay it won
let's see some of the games like ten
to one right it lost the game that it
played with the camera interface you
might think well that's done
right it's superhuman at
the game and that's not really the claim
we can make actually the claim is
we beat a professional gamer for the
first time StarCraft has really been a
thing that's been going on for a few
years but a moment like this
had not occurred before yet
but are these agents impossible to beat
absolutely not right so that's a bit
what's you know kind of the
difference is the agents play at
Grandmaster level they definitely
understand the game enough to play
extremely well but are they unbeatable
do they play perfectly no and actually in
StarCraft because of these sneaky
strategies it's always possible that you
might take a huge risk sometimes but you
might get wins right out of this
so I think that as a domain it still has
a lot of opportunities not only because
of course we want to learn with less
experience we would like to I mean if I
learn to play Protoss I can play
Terran and learn it much quicker than
AlphaStar can right so there are
obvious interesting research challenges
as well but even as far as
the raw performance goes really the
claim here can be we are at pro level or
at high Grandmaster level but obviously
the players also did not know what to
expect right their prior
distribution was a bit off because they
played this kind of new like alien brain
as they like to say and
that's what makes it exciting for them
but also I think if you look at the
games closely you see there were
weaknesses in some points maybe AlphaStar
did not scout or if it had had
invisible units going against it at certain
points it wouldn't have known and it
would have been bad so there's still
quite a lot of work to do but it's
really a very exciting moment for us to
be seeing Wow a single neural net on a
GPU is actually playing against these
guys who are amazing I mean you have to
see them play live they're really
really amazing players yeah I'm sure
there must be a
guy in Poland somewhere right now
training his butt off to make sure that
this never happens again with alpha star
so that's really exciting in terms of
alpha star having some holes to exploit
yeah it's just great and then you build
on top of each other and it feels like
StarCraft unlike Go even if you win
it's still not there
there's so many different dimensions in
which you can explore so that's really
really interesting do you think there's
a ceiling to AlphaStar you've said that
it hasn't reached it you know this is
a big wait you know let me actually
just pause for a second how did it feel
to come to this point to beat a top
professional player that night I
mean you know Olympic athletes have
their gold medal right this is your gold
medal in a sense sure you're cited a lot
you published a lot of prestigious papers
whatever but this is like a win
how did it feel I mean for me it was
unbelievable because first the win
itself to me was so exciting I mean so
looking back to those last days of 2018
really well that's when the games were
played I'm sure I'll look back at that
moment and say oh my god I want to be
in a project like that it's like I
already feel the nostalgia of like yeah
that was huge in terms of the energy and
the team effort that went into it and so
in that sense as soon as it happened I
already knew it was kind of I was losing
it a little bit so it is almost like sad
that it happened and all
but on the other hand it also
verifies the approach but to me also
there's so many challenges and
interesting aspects of intelligence that
even though we can train a neural
network to play at the level of the best
humans there's still so many challenges
so for me it's also like well this is
really an amazing achievement but I
already was also thinking about next
steps I mean as I said these agents play
Protoss vs. Protoss but they should be
able to play a different race much
quicker right so that would be an
amazing achievement some people call
this meta reinforcement learning meta-learning
and so on right so there's so
many possibilities after that moment but
the moment itself it really felt great
we had this bet so I'm kind of
a pessimist in general so I kind of sent
an email to the team and said okay we're
playing against TLO first right like what's
gonna be the result and I really thought
we would lose like five zero right
we had some calibration made against the
5000 MMR player TLO was much stronger
than that player even if he played
Protoss which is his off race but yeah I
was not imagining we would win so for me
that was just kind of a test run or
something and then it really kind of he
was really surprised and unbelievably we
went to this bar to celebrate
and Dave tells me well why don't we
invite someone who is a thousand MMR
stronger in Protoss like an actual
Protoss player and that turned
out being MaNa right and you know we
had some drinks and I said sure why not
but then I thought well that's really
gonna be impossible to beat I mean
because it's so much ahead a thousand
MMR is really like a 99% probability that
MaNa would beat TLO at Protoss vs.
Protoss right so we did that and to me
the second game was much more
important even though a lot of
uncertainty kind of disappeared after we
could beat TLO I mean he
is a professional player so that was
kind of over that's really a very nice
achievement
but mana really was at the top and you
could see he played much better but our
agents got much better too and
then after the first game I said if we
take a single game at least we can say
we won a game I mean even if we don't
win the series for me that was a huge
relief and I mean I remember hugging
Demis and I mean it was really
like this moment for me will resonate
forever as a researcher and I mean as a
person and it's a really great
accomplishment and it was great also to
be there with the team in the room I
don't know if you saw it so it was
really like I mean from my perspective
the other interesting thing is just like
watching Kasparov watching MaNa was also
interesting because he was kind
of at a loss for words I mean whenever you
lose I've done a lot of sports you
sometimes make excuses you look for
reasons right and he couldn't really
come up with a reason yeah yeah I mean
so with the off-race for Protoss you
could say it felt awkward it
wasn't but here it was yeah he
was just beaten and it was beautiful to
look at a human being being superseded
by an AI system I mean it's a
beautiful moment for researchers so yeah
for sure it was I mean probably
the highlight of my career so far
because of its uniqueness and coolness
and I don't know I mean obviously
as you said you can look at paper
citations and so on but this
really is like a testament to the whole
machine learning approach and using
games to advance technology I mean it
really was everything came
together at that moment that's
really the summary also on the other
side it's a popularization of AI too
because just like traveling to the
moon and so on I mean this is where a
very large community of people that
don't really know AI get to really
interact with it which is very important
I mean it's really we must you know
writing papers helps our peer
researchers to understand what we're
doing but I think AI is becoming mature
enough that we must sort of try to
explain what it is and perhaps through
games is an obvious way because these
games always had built-in AI
so maybe everyone has experienced an AI
playing a video game even if they don't
know it because there's always some
scripted element and some people might
even call that AI already right so what
are other applications of the approaches
underlying alpha star that you see
happening there's a lot of echoes as you
said of the transformer of language modeling and
so on have you already started thinking
where the breakthroughs in alpha star
get expanded to other applications right
so I thought about a few things for like
kind of next months next years the main
thing I'm thinking about actually is
what's next as a kind of a grand
challenge because for me like we've seen
Atari and then there's like the sort of
three-dimensional worlds that we've seen
also like pretty good performance from
these capture-the-flag
agents that also some people at
DeepMind and elsewhere are working on we've
also seen some amazing results on like
for instance dota 2 which is also a very
complicated game so for me like the main
thing I'm thinking about is what's next
in terms of challenge so as a researcher
I see sort of two tensions between
research and then applications or areas
or domains where you apply them so on
the one hand we've done thanks to the
application of StarCraft is very hard we
develop some techniques some new
research that now we could look at
elsewhere like are there other
applications where we can apply this and
the obvious ones absolutely you can
think of feeding back to sort of the
community we took from which was mostly
sequence modeling or natural language
processing so we've developed and
extended things from the transformer
and we use pointer networks we combine
LSTMs and transformers in interesting ways
so that perhaps the kind of lowest
hanging fruit of feeding back to now
different fields of machine learning
that's not playing video games let me go
old-school and jump
to Mr. Alan Turing yeah so the Turing
test you know there's a natural language
test the conversational test what's your
thought of it as a test for intelligence
do you think it is a grand challenge
that's worthy of undertaking maybe if it
is would you reformulate it or phrase it
somehow differently right so I really
love the Turing test because I also like
sequences and language understanding and
in fact some of the early work we did in
machine translation we tried to apply
to kind of a neural chatbot which
obviously would never pass the Turing
test because it was very limited but it
is a very fascinating idea
that you could really have an AI that
would be indistinguishable from humans
in terms of asking or conversing with
it right so I think the test itself
seems very nice and it's kind of well
defined actually like the passing it or
not I think there's quite a few rules
that feel like pretty simple and
you know you could really
have I mean I think they have these
competitions every year of the
Loebner Prize but I don't know if you've
seen the
kind of bots that emerge from that
competition they're not quite what
you would expect so it feels like there's
weaknesses with the way Turing
formulated it it needs to be that the
definition of a genuine rich fulfilling
human conversation it needs to be
something else like the Alexa prize
which I'm not as well familiar with has
tried to define that more I think by
saying you have to continue keeping a
conversation for 30 minutes something
like that
so basically forcing the agent not to
just fool but to have an engaging
conversation kind of thing I
mean have you thought about
this problem richly and if
you have in general how far away are
we you've worked a lot on language
understanding language generation but
the full dialogue the conversation you
know just sitting at the bar having a
couple of beers for an hour and that
kind of conversation have you thought
about yeah so I think you touched here
on the critical point which is
feasibility right so there's
a great sort of essay by Hamming which
describes sort of grand challenges of
physics and he argues that well okay for
instance teleportation or time travel
our great grand challenges of physics
but there's no attacks we really don't
know or cannot kind of make any progress
so that's why most physicists and so on
they don't work on these in their PhDs
and as part of their careers so I
see the Turing test the full
Turing test as still a bit too early
like I think especially with
the current trend of deep learning
language models we've seen some amazing
examples I think GPT-2 being the most
recent one which is very impressive but
to fully solve passing or
fooling a human to think that
there's a human on the other side I
think we're quite far so as a result I
don't see myself and I probably would
not recommend people doing a PhD on
solving the Turing test because it just
feels it's kind of too early or too hard
of a problem yeah but that said you said
the exact same thing about Starcraft
a few years ago so
yeah you'll probably also be
the person who passes the Turing test in
three years I mean I think
yeah so we have this on record which
is nice I mean it's
true that progress sometimes is a bit
unpredictable I really wouldn't have
even six months ago I would not have
predicted the level that we see that
these agents can deliver at Grandmaster
level but I have worked on language
enough and basically my concern is not
that something could happen a
breakthrough could happen that would
bring us to solving or passing the
Turing test it's that I just think the
statistical approach to it
is not gonna cut it so we
need a breakthrough which
is great for the community but given
that I think there's quite a more
uncertainty whereas for StarCraft I knew
what the steps would be to kind of get
us there I think it was clear that using
the imitation learning part and then
using this league of agents was
gonna be key and it turned out that
this was the case and a little more was
needed but not much more for Turing test
I just don't know what the plan or
execution plan would look like so that's
why for myself working on it as a
grand challenge is hard but there are
quite a few sub challenges that are
related that you could say well I mean
what if you create a great assistant
like Google already has like the Google
assistant so can we make it better and
can we make it fully neural and so on
that I start to believe maybe we're
reaching a point where we should attempt
these challenges I like this conversation
so much because it echoes very much the
StarCraft conversation it's exactly how
you approached StarCraft let's break it
down into small pieces solve those and
you end up solving the whole game great
but that said you're behind some of
the sort of biggest pieces of work in
deep learning in the last several years
so you mentioned some limits what do you
think of the current limits of deep
learning and how do we overcome those
limits so if I had to actually use a
single word to define the main challenge
in deep learning is a challenge that
probably has been the challenge for many
years and is that of generalization so
what that means is that all that we're
doing is fitting functions to data and
when the data we see is not from the
same distribution or even if there are
sometimes that it is very close to the
distribution because of the way we
train it with limited samples we then
get to this stage where we just don't
see generalization as much as we could
generalize and I think adversarial
examples are a clear example of these
but if you study the machine learning
literature you know the reason why
SVMs became very popular was because
they
had some guarantees about generalization
which is unseen data or out of
distribution or even within distribution
where you take an image adding a bit of
noise
these models fail so I think really I
don't see a lot of progress on
generalization in the strong
generalization sense of the word I
think with our neural networks you can
always find designed examples that will
make their outputs arbitrary which is
not good because we
humans would never be fooled by these
kind of images or manipulation of the
image and if you look at the mathematics
you kind of understand this is a bunch
of matrices multiplied together
there's probably numerical instability
and you can just find corner cases so I
think that's really the underlying topic
we see many times even at the
grand stage of like the Turing test
generalization I mean if you
start
I mean passing the Turing test should
it be in English or should it
be in any language right I mean as a
human if you ask
something in a different language you
actually will go and do some research
and try to translate it and so on
should the Turing test then
include that right and it's really a
difficult problem and very fascinating
and very mysterious actually yeah
absolutely but do you think if you
were to try to solve it can you not grow
the size of the data intelligently in such a
way that the distribution of your
training set does include the entirety
of the testing set is that one
path the other path is totally new
methodology right it's not statistical
so a path that has worked well it
worked well in StarCraft in machine
translation and in language is scaling up
the data and the model and that's kind
of maybe the only single formula
that still delivers today in
deep learning right it's that scale
data scale and model scale really do
more and more of the things that we
thought oh there's no way it can
generalize to this or there's no way
it can generalize to that but I don't
think fundamentally it will be
resolved with this and for instance I'm
really liking some sort of approach
that would not only have neural networks
but would have programs or some
discrete decision-making because that's
where I feel there's a bit more
I mean the best
example I think for understanding
this which I also worked a bit on is
we can learn an algorithm with a neural
network right so you give it many
examples and it's going to
sort the input numbers or something like
that but really strong generalization is
you give me some numbers or you ask me
to create an algorithm that sorts
numbers and instead of creating a neural
net which will be fragile because it's
gonna go out of range at some point
someone's gonna give you numbers that are
too large or too small and whatnot
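That contrast between a learned sorter and a plain program can be sketched in a few lines; the "learned" sorter below is just an illustrative stand-in for a network trained only on a limited range of numbers, not any real model's code:

```python
# Contrast between a "learned" sorter and a plain program, as described above.
# learned_sort is an illustrative stand-in for a neural net trained only on
# numbers in [0, train_max]: anything outside that range gets clamped.

def learned_sort(xs, train_max=100):
    clipped = [min(max(x, 0), train_max) for x in xs]
    return sorted(clipped)

def program_sort(xs):
    # A piece of code: provably correct for every possible input list.
    return sorted(xs)

print(learned_sort([3, 1, 2]))       # fine inside the training range: [1, 2, 3]
print(learned_sort([1000, -5, 2]))   # out of range, silently wrong: [0, 2, 100]
print(program_sort([1000, -5, 2]))   # generalizes to all inputs: [-5, 2, 1000]
```

The program version generalizes to absolutely all inputs, which is the strong sense of generalization being discussed.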
if you just create a piece of
code that sorts the numbers then you can
prove that that will generalize to
absolutely all the possible inputs you
could give so I think the problem
comes with some exciting prospects I
mean scale is a bit more boring but it
really works and then maybe programs and
these discrete abstractions are a bit less
developed but clearly I think they're
quite exciting in terms of the future of
the field do you draw any insight or wisdom
from the 80s and expert systems and
symbolic systems about computing do you
ever go back to those reasoning that
kind of logic do you think that might
make a comeback you have to dust off
those books yeah I actually love
actually adding more inductive biases to
me the problem really is what are you
trying to solve if what you're trying to
solve is so important that you'll try to solve
it no matter what then absolutely use
rules use domain knowledge and then use
a bit of the magic of machine learning
to empower it to make the system the
best system that will detect cancer or
you know or detect weather patterns
right in terms of StarCraft it also
was a very big challenge so I would
definitely have been happy if we had to
cut a corner here and there it
would have been
interesting to do and in fact in
StarCraft we we start thinking about
expert systems because it's very you
know you can define I mean people
actually build StarCraft bots by thinking
about those principles I guess you know
state machines and rule-based systems and then
you could you could think of combining a
bit of a rule-based system but that has
also neural networks incorporated to
make it generalize a bit better so
absolutely I mean we should we should
definitely go back to those ideas and
anything that makes the problem simpler
as long as your problem is important
that's okay and that's research driving
a very important problem and on the
other hand if you wanna really focus on
the limits of reinforcement learning
then of course you must try not to look
at imitation data or to look
for some rules of the domain that would
help a lot or even feature engineering
right so there's this tension and
depending on what you do I think
both ways are definitely fine and I
would never not do one or the other
as long as what you're doing
is important and needs to be solved
right so there's a bunch of different
ideas that you developed that I
really enjoy so one is
the image captioning work
translating from an image to text just
a beautiful yeah beautiful
idea I think that resonates throughout
your work actually so the underlying
nature of reality being language always
yes somehow so what's the connection
between images and text rather the
visual world and the world of language
in your view right so I think a piece of
research that's been central to I would
say even extending into Starcraft is
this idea of sequence to sequence
learning what we really meant by
that is that you can now really
input anything to a neural network as
the input X and then the neural network
will learn a function f that will take X
as an input and produce any output Y and
these x's and y's don't need to be
static features or
fixed vectors or anything like
that they can really be sequences and
now beyond that data structures right so
that paradigm was tested in a very
interesting way when we moved from
translating French to English to
translating an image to its caption but
the beauty of it is that
really and that's actually how it
happened I ran I changed a line of code
in this thing that was doing machine
translation and I came the next day
and I saw how it was producing
captions that seemed like oh my god this
is really really working and the
principle is the same right so I think I
don't see text vision speech waveforms
as something different here as long as
you basically learn a function that will
vectorize you know these inputs and then
after we vectorize it we can then use
you know transformers LSTMs whatever
the flavor of the month of the model is
and then as long as we have enough
supervised data really this formula will
work and will keep working I believe to
some extent modulo these
generalization issues that I mentioned
before so but the task is to vectorize
sort of form a representation that's
meaningful and your intuition
now having worked with all this media is
that once you are able to form that
representation you can basically take
anything any sequence to go back
to Starcraft are there limits on the
length so we didn't really touch on
long-term dependence how did you overcome
the whole really long term aspect of
things here are there some tricks so
the main trick so Starcraft if you look
at absolutely every frame you might
think it's quite a long game so we
would have to multiply 22 frames per second times 60
seconds per minute times maybe at least
10 minutes per game on average so there
are quite a few frames but the trick
really was to only observe in fact which
might be seen as a limitation but
it is also a computational advantage only
observe when you act and then what the
neural network decides is what is the
gap gonna be until the next action and
if you look at most Starcraft games that
we have in the data set that
Blizzard provided it turns out that most
games are actually only I mean it is
still a long sequence but maybe like
a thousand to 1,500 actions which if you
start looking at LSTMs large LSTMs
transformers it's not
that difficult especially if you
have supervised learning if you had to
do it with reinforcement learning the
credit assignment problem what is it
in this game that made you win that
would be really difficult but thankfully
because of imitation learning we didn't
kind of have to deal with this directly
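The sequence-length arithmetic here is easy to check; the numbers below are the rough figures quoted in this conversation (22 game frames per second, a roughly 10-minute game, up to about 1,500 actions), not exact StarCraft constants:

```python
# Rough sequence-length arithmetic for StarCraft, using the figures above.
frames_per_second = 22     # approximate game speed quoted in the conversation
game_seconds = 60 * 10     # a roughly 10-minute game

frames = frames_per_second * game_seconds
actions = 1500             # upper end of typical actions per game in the data

print(frames)              # 13200 observations if you looked at every frame
print(frames // actions)   # only observing when acting cuts the sequence ~9x
```

That shrink from every-frame observation to action-by-action observation is what brings the sequences into the range LSTMs and transformers can handle.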
although if we had to we tried it and
what happens is you just take all your
workers and attack with them and that
sort of is kind of obvious in retrospect
because you start trying random actions
one of the actions will be a worker that
goes to the enemy base and because it's
self play it's not gonna know how to
defend because it basically doesn't know
almost anything and eventually what you
develop is this take our workers and
attack because the credit assignment
issue in RL is really really hard I do
believe we could do better and that's
maybe a research challenge for the
future but yeah even even in StarCraft
the sequences are maybe a thousand which
I believe there is within the realm of
what transformers can do yeah I guess
the difference between Starcraft and go
is in go and chess stuff starts
happening right away right so there's
not yeah it's pretty easy to self-play
not easy but through self-play it's possible to
develop reasonable strategies as quickly
as opposed to Starcraft meaning in go
there's only 400 actions but one action
is what people would call the god action
that would be if you had expanded the
whole search tree that's the best action
if you did minimax or whatever algorithm
you would do if you had the
computational capacity
but in StarCraft 400 is minuscule
like with 400 you
couldn't even click on the pixels
around a unit right so I think the
problem there in terms of action
space size is way harder and that
search is impossible so there's quite a
few challenges indeed that make this
kind of a step up in terms of
machine learning for humans
maybe playing Starcraft seems
more intuitive because it looks real I
mean you know like the graphics and
everything moves smoothly
whereas I don't know how come go
is a game that I would really need to
study it feels quite complicated but for
machines it's kind of maybe the reverse
yes
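The gap in action-space size can be made concrete with a back-of-the-envelope count; the 256x256 click grid below is an illustrative resolution, not AlphaStar's actual interface:

```python
# Back-of-the-envelope comparison of per-step action spaces.
go_actions = 19 * 19 + 1    # at most one move per board point, plus a pass

# In StarCraft a single click already targets a screen coordinate;
# with an illustrative 256x256 grid of click targets:
click_targets = 256 * 256

print(go_actions)     # 362
print(click_targets)  # 65536 targets for one click, before even choosing
                      # which units act or which ability they use
```

Even this simplified count shows why exhaustive tree search over StarCraft actions is impossible while it is at least conceivable in Go.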
which shows you the gap actually between
deep learning and however the heck our
brains work so you developed a lot of
really interesting ideas it's
interesting to just ask what's the
what's your process of developing new
ideas
do you like brainstorming with others do
you like thinking alone do you like like
I think Ian Goodfellow said he came
up with GANs after a few beers right he
thinks beers are essential yeah coming
up with new ideas we had beers when we decided
to play another game of Starcraft
after a week so it's really similar to
that story actually I explained this in
a DeepMind retreat and I said this
is the same as the GAN story I mean we
were in a bar and we decided let's
play again next week and that's what
happened I feel like we're giving the
wrong message to young undergrads yeah
but in general like yeah do you like
brainstorming do you like thinking alone
working stuff out and so I think I think
throughout the years also things changed
right so initially I was very fortunate
to be with great minds like Geoff Hinton
Jeff Dean Ilya Sutskever I was really
fortunate to join Brain at a very good
time so at that point for ideas I was
just kind of brainstorming with my
colleagues and learned a lot and keep
learning is actually something you
should never stop doing right so
learning implies reading papers and also
discussing ideas with others it's
very hard at some point not to
communicate whether that be reading
papers from someone or actually
discussing right so definitely that
communication aspect needs to be there
whether it's written or oral nowadays
I'm also trying to be a bit more
strategic about what research to do so I
was describing a little bit this sort of
tension between research for the sake of
research and then you have on the other
hand applications that can drive the
research right and honestly the formula
that has worked best for me is just find
a hard problem and then try to see how
research fits into it how it doesn't fit
into it and then you must innovate so I
think machine translation drove sequence
to sequence then maybe learning
algorithms that had to do with
combinatorial algorithms led to pointer
networks Starcraft led to really scaling
up imitation learning and the Alpha
Star League so that's been a formula
that I personally like but the other one
is also valid and I've seen it succeed a
lot of the time where you just want to
investigate model-based RL as a kind of
a research topic and then you must
start to think well
how are you going to test these ideas
you need kind of a minimal
environment to try things you need to
read a lot of papers and so on and
that's also very fun to do and something
I've also done quite a few times both at
Brain and DeepMind and obviously as a
PhD student so I think besides the ideas
and discussions I think it's important
also because you start sort of guiding
not only your own goals but other
people's goals to the next breakthrough
so you must really kind of
understand this you know feasibility
also as we were discussing before right
whether this domain is ready to
be tackled or not and you don't want to
be too early you obviously don't want to
be too late so it's really
interesting there's a strategic
component of research which I think as a
grad student I just had no idea about you
know I just read papers and discussed
ideas and I think this has been maybe
the major change and I recommend
people kind of fast-forward to what success
looks like and try to backtrack
rather than just kind of looking oh
this looks cool that looks cool and
then you do a bit of random work which
sometimes you stumble upon some
interesting things but in general it's
also good to plan a bit yeah I like
it
I especially like your approach of taking
a really hard problem stepping right in
and then being super skeptical about it
yeah I mean there's a
balance of both right there's a certain
optimism and a critical sort of
skepticism that's good to balance which
is why it's good to have a team of
people that balance that you don't
do that on your own you have both
mentors that have seen more and you obviously
wanna chat and discuss whether it's the
right time I mean Demis came in 2014
and he said maybe in a bit we'll do
Starcraft and maybe he knew
and I'm just following his lead which is
great because he's brilliant right
so these things are obviously
quite important you wanna be
surrounded by people who you know are
diverse they have their knowledge
that's also important too I mean I've
learned a lot from people who actually
have an idea that I might not think it's
good but if I give them the space to try
it I've been proven wrong many
times as well so that's great
I think your colleagues are
more important than yourself I think so
sure now let's real quick talk about
another impossible problem AGI right
what do you think it takes to build a
system that's human level intelligence
we talked a little bit about the
Turing test and StarCraft all these have
echoes of general intelligence but if
you think about just something that you
would sit back and say wow this is
really something that resembles human
level intelligence what do you think it
takes to build that so I find that AGI
oftentimes is maybe not very well defined
so what I'm trying to come up with
for myself is what a result would
look like that would make you
start to believe that you would have
agents or neural nets that no longer
sort of overfit to a single task right
but actually kind of learn the skill of
learning so to speak and that actually
is a field that I am fascinated by which
is the learning to learn or meta
learning which is about no longer
learning about a single domain so you
can think about the learning algorithm
itself as general right so the same
formula we applied for alpha star or
Starcraft we can now apply to kind of
almost any video game or you could apply
to many other problems and domains but
the algorithm is what's kind of
generalizing but the neural network the
weights those weights are useless even
to play another race right I train a
network to play very well at Protoss vs.
Protoss I need to throw away those
weights if I want to play now
terran vs terran i would need to retrain
a network from scratch with the same
algorithm that's beautiful
but the network itself will not be
useful so I think if I see an
approach that can observe or start
solving new problems without the need to
kind of restart the process I think that
to me would be a nice way to define some
form of AGI again I don't know the grand
views of what AGI means or whether the
Turing test comes before AGI I mean I
don't know I think concretely I
would like to see clearly that meta
learning happened meaning there is
an architecture or network that as it
sees a new problem or new data it
solves it and to make it kind of a
benchmark it should solve it at the same
speed that we do solve new problems when
I define a new object and you have to
recognize it or when you start
playing a new game you've played Atari all the
time but now you play a new Atari game
well you're gonna be pretty quickly
pretty good at the game so
perhaps what the domain is and what the
exact benchmark is is a bit difficult I
think as a community we might need to do
some work to define it
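One hedged sketch of what such a benchmark could measure is adaptation speed: how many tries an agent needs on a task it has never seen, compared to a human baseline; every name and number below is hypothetical, not a benchmark anyone has defined:

```python
# Hypothetical adaptation-speed benchmark: score an agent by how many
# episodes it needs on a brand-new task before it passes a threshold.

def episodes_to_threshold(scores, threshold):
    # scores: per-episode scores on the unseen task, in order of play.
    for episode, score in enumerate(scores, start=1):
        if score >= threshold:
            return episode
    return None  # the agent never adapted

# Toy data: a meta-learner improving across episodes of a new task.
agent_scores = [0.1, 0.3, 0.55, 0.72, 0.9]
human_episodes = 3   # hypothetical human baseline on the same task

agent_episodes = episodes_to_threshold(agent_scores, threshold=0.7)
print(agent_episodes)                    # 4
print(agent_episodes <= human_episodes)  # the bar: match human speed (False here)
```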
but I think this first step I could see
it happen relatively soon but then the
whole what AGI means and so on I am a
bit more confused about it I think
people mean different things there's an
emotional psychological level like
even the Turing test passing the
Turing test is something that we just
pass judgment on as human beings what it
means to be you know at the level of a dog as an
AGI system yeah like what level what does
it mean right yeah what does it mean but
I like the generalization and maybe as a
community we would converge towards a group
of domains that are sufficiently far
away that would be really damn
impressive if we're able to generalize
some perhaps not as close as Protoss and
Zerg but like Starcraft to Wikipedia
would be a really good step and
then like Wikipedia to
Starcraft and back yeah that kind
of thing and that feels also quite
hard and far but I think there's as long
as you put the benchmark out as we
discovered for instance with image net
then tremendous progress can be had so I
think maybe there's a lack of a benchmark
but I'm sure we'll find one and yeah the
community will then work towards
that and then beyond what AGI might
mean or would imply I really am hopeful
to see basically machine learning or AI
just scaling up and helping you know
people that might not have the resources
to hire an assistant or that they might
not even know what the weather is like
but you know so I think there's in terms
of the impact the positive impact of AI
I think that's maybe what we should also
not lose focus right the research
community building AGI I mean that's a
really nice goal but I think the way
that deepmind puts it is solve intelligence and then use
it to solve everything else right so I
think we should parallelize yeah we
shouldn't forget about all the positive
things that are actually coming out of
it already and that are going to be
coming out right but
let me ask relative to the
popular perception do you have any worry
about the existential threat of
artificial intelligence in the near or
far future that some people have I think
in the near future I'm skeptical
so I hope I'm not wrong but I'm not
concerned but I appreciate efforts
ongoing efforts and even like whole
research fields on AI safety emerging
and in conferences and so on I think
that's great in the long term I really
hope we just can simply have the
benefits outweigh the potential dangers
I am hopeful for that but also we must
remain vigilant to kind of monitor and
assess whether the trade-offs are are
there and and we have you know enough
also lead time to prevent or to redirect
our efforts if need be right so I'm
quite optimistic about the
technology and definitely more fearful
of other threats at the planetary
level at this point but obviously AI is
the one I kind of have more power
over so clearly I do start thinking more
and more about this and it's kind of
growing on me actually to start
reading more about AI safety which is a field
that so far I have not really
contributed to but maybe there's
something to be done there as well I
think it's really important you know I
talk about this issue with folks but
it's important to ask you and put it
in your head because you're at the
leading edge of actually what people are
excited about in AI I mean the work with
alpha star is arguably at the very
cutting edge of the kind of thing that
people are afraid of and so you speaking
to the fact that we're actually
quite far away from the kind of thing that
people might be afraid of but it's still
something worthwhile to think about and
it's also good that
you're not as worried and you're
also open to thinking about it yeah I mean
there's two aspects I mean me not being
worried but obviously we should prepare
for it right
for things that could go wrong
misuse of the technologies as with any
technologies right so I think
there's always trade-offs and as a
society we've kind of solved these to
some extent in the past so I'm
hoping that by having the researchers
and the whole community brainstorm and
come up with interesting solutions to
the new things that will happen in the
future
that we can still also push the research
to the avenue that I think is kind of
the greatest avenue which is to
understand intelligence right how are we
doing what we're doing and you know
obviously from a scientific standpoint
that is kind of the drive my personal
driver of all the time that I spend
doing what I'm doing really what do you
see the deep learning as a field heading
what do you think the next big big
breakthrough might be so I think deep
learning I discussed a little of this
before deep learning has to be combined
with some form of discretization or program
synthesis I think that as
research in itself is an interesting
topic to expand and start doing more
research and then as kind of what
deep learning will enable in the
future I don't think that's gonna
happen this year but also
this idea of starting not to throw away
all the weights this idea of
learning to learn and really having
these agents not having to restart their
weights you can have an agent
that is kind of solving or classifying
images on imagenet but also generating
speech if you ask it to generate some
speech and it should really be kind
of almost the same network but might not
be a neural network it might be a neural
network with an optimization algorithm
attached to it but I think this idea of
generalization to new tasks is something
that we first must define good
benchmarks but then I think that's gonna
be exciting and I'm not
sure how close we are but I think
there's a path if you have a very
limited domain I think we can start
making some progress and much like how we
made a lot of progress in computer vision
we should start thinking I really
like a talk that Leon Bottou
gave at ICML a few years ago which
is this train test paradigm should be
broken we should stop
thinking about
a training set and a test set and
these are closed you know things that
are untouchable I think we should go
beyond these and in meta learning we
call these the meta training set and the
meta test set which is really thinking
about if I know about imagenet why would
that network not work on MNIST which is
a much simpler problem but right now it
really doesn't you know yeah and but
it just feels wrong right so I think
that's kind of it on the
application or the benchmark side we
probably will see quite a few more
interest and progress and hopefully
people defining new and exciting
challenges really do you have any hope
or interest in knowledge graphs within
this context so just kind of
yeah constructing graphs so going back
to graphs yeah well okay neural
networks on graphs but I mean a
different kind of knowledge graph sort
of like semantic graphs of
concepts yeah so I think the
idea of graphs is so I've been quite
interested in sequences first and then
more interesting or different data
structures like graphs and I've studied
graph neural networks in the last three
years or so I found these models just
very interesting from a deep learning
standpoint but then what do we
want why do we want these models and
why would we use them what's the
application what's kind of the killer
application of graphs right and perhaps
if we could extract a knowledge
graph from Wikipedia automatically right
um that would be interesting because
then these graphs have this very
interesting structure that also is a bit
more comfortable with this idea of
programs and deep learning kind of
working together
like jumping neighborhoods and so on you
could imagine defining some primitives
to go around graphs right so I
really like the idea of a knowledge
graph and in fact when we started
you know as part of the research we did
for StarCraft I thought wouldn't it be
cool to give it the graph of you know all
the prerequisites like all these
buildings that depend on each other and
units that have prerequisites of being
built by that and so this is information
that the network can learn and extract
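That prerequisite structure is easy to write down as a small graph; the entries below are a hand-picked fragment of the Protoss tech tree for illustration, not the representation the agent was actually given:

```python
# A tiny fragment of the Protoss tech tree as a dependency graph:
# each entry maps a unit or building to its prerequisites.
tech_tree = {
    "pylon": [],
    "gateway": ["pylon"],
    "cybernetics_core": ["gateway"],
    "stalker": ["gateway", "cybernetics_core"],
}

def buildable(target, built, tree):
    # A target is buildable once all of its prerequisites exist.
    return all(dep in built for dep in tree[target])

print(buildable("gateway", {"pylon"}, tech_tree))             # True
print(buildable("stalker", {"pylon", "gateway"}, tech_tree))  # False, no core yet
```

Primitives like this are the kind of discrete, program-like structure that could sit alongside the neural network rather than inside it.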
but it would have been great to see um
or to think of really stack graph as a
giant graph that even also as the game
evolves use kind of star trek taking
branches and so on and we tried we read
a bit of research on these nothing too
relevant but I I really like the idea
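[editor's aside: the prerequisite structure he describes could be
encoded as a simple dependency graph; this is a hypothetical
fragment, not the actual game data or AlphaStar's representation,
and `TECH_TREE` and `prerequisites` are invented for illustration]

```python
# hypothetical fragment of a StarCraft tech tree as a dependency
# graph: each entry maps a unit/building to its prerequisites
TECH_TREE = {
    "nexus": [],
    "gateway": ["nexus"],
    "cybernetics_core": ["gateway"],
    "zealot": ["gateway"],
    "stalker": ["gateway", "cybernetics_core"],
}

def prerequisites(target, tree=TECH_TREE):
    """All transitive prerequisites of `target`, in a buildable order."""
    order, seen = [], set()
    def visit(node):
        for dep in tree[node]:
            if dep not in seen:
                seen.add(dep)
                visit(dep)
                order.append(dep)
    visit(target)
    return order

print(prerequisites("stalker"))  # ['nexus', 'gateway', 'cybernetics_core']
```

as the game evolves new nodes become reachable, which is one way
to picture the tech tree "taking branches"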
and it has elements of something you
also worked with in terms of
visualizing neural networks being able
to generate knowledge representations
that are human interpretable that maybe
human experts can then tweak or at
least understand so there's a lot of
interesting aspects there and for me
personally I'm just a huge fan of
Wikipedia and it's a shame that our
neural networks aren't taking advantage
of all the structured knowledge that's
on the web what's next for you
what's next for deepmind what are you
excited about what's next for alpha star
yeah so I think the obvious next steps
would be to apply alpha star to other
races I mean that sort of shows that
the algorithm works because we wouldn't
want to have created by mistake
something in the architecture that
happens to work for Protoss but not for
other races right so as verification I
think that's an obvious next step that
we are working on and then I would like
to see so agents and players can
specialize on
different skill sets that allow them to
be very good I think we've seen alpha
star understanding very well when to
take battles and when not to also very
good at
micromanagement and moving the units
around and so on and also very good at
producing non-stop and trading off
economy with building units but I have
not perhaps seen as much as I would like
this idea of the poker idea that you
mentioned right I'm not sure Starcraft
or alpha star rather has developed a
very deep understanding of what the
opponent is doing and reacting to that
and sort of trying to trick the
player to do something else or that you
know so this kind of reasoning I would
like to see more so I think purely from
a research standpoint there's perhaps
also quite a few new things to be
done there in the domain of StarCraft
yeah in the domain of games I've seen
some interesting work sort of even in
auctions manipulating other players so
forming a belief state and just messing
with people yeah theory of mind
yeah yeah yeah theory of mind and
StarCraft kind of they're really
made for each other
yeah so that would be very exciting to
see those techniques applied to
Starcraft or perhaps Starcraft driving
new techniques right as I said this is
always the tension between the two well
Oriol thank you so much for talking
today awesome it was great to be here
thanks