Transcript
Kedt2or9xlo • Oriol Vinyals: DeepMind AlphaStar, StarCraft, and Language | Lex Fridman Podcast #20
The following is a conversation with Oriol Vinyals. He's a senior research scientist at Google DeepMind, and before that he was at Google Brain and Berkeley. His research has been cited over 39,000 times. He's truly one of the most brilliant and impactful minds in the field of deep learning. He's behind some of the biggest papers and ideas in AI, including sequence-to-sequence learning, audio generation, image captioning, neural machine translation, and, of course, reinforcement learning. He's a lead researcher of the AlphaStar project, creating an agent that defeated a top professional at the game of StarCraft. This conversation is part of the Artificial Intelligence podcast. If you enjoy it, subscribe on YouTube, iTunes, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D. And now, here's my conversation with Oriol Vinyals.

You spearheaded the DeepMind team behind AlphaStar that recently beat a top professional player at StarCraft. So you have an incredible wealth of work in deep learning in a bunch of fields, but let's talk about StarCraft first. Let's go back to the very beginning, even before AlphaStar, before DeepMind, before deep learning. First, what came first for you: a love for programming or a love for video games?

I think for me it definitely came first, the drive to play video games. I really liked computers. I didn't really code much, but what I would do is I would just mess with the computer, break it and fix it. That was the level of skills, I guess, that I gained in my very early days, I mean when I was 10 or 11. And then I really got into video games, especially StarCraft, actually the first version. I spent most of my time just playing kind of pseudo-professionally, as professionally as you could play back in '98 in Europe, which was not a very mainstream scene, like what's called nowadays esports.

Right, of course, in the '90s. So how did you get into StarCraft? What was your favorite race? How did you develop your skill? What
was your strategy, all that kind of thing?

So as a player, I tended to try to play not many games, so as not to disclose the strategies that I developed. And I liked to play random, actually. Not in competitions, but just to... I think in StarCraft there's, well, there's three main races, and I found it very useful to play with all of them. So I would choose random many times, even sometimes in tournaments, to gain skill on the three races. Because it's not only how you play against someone; if you understand the race because you play it, you also understand what's annoying, and then, when you're on the other side, what to do to annoy that person, to try to gain advantages here and there, and so on. So I actually played random, although I must say, in terms of favorite race, I really liked Zerg. I was probably best at Zerg, and that's probably what I tended to use towards the end of my career, before starting university.

So let's step back a little bit. Could you try to describe StarCraft to people that may never have played video games, especially the massively online variety?

Right. So StarCraft is a real-time strategy game, and the way to think about StarCraft, perhaps, if you understand a bit of chess, is that there's a board, which is called the map, where people play against each other. There's obviously many ways you can play, but the most interesting one is the one-versus-one setup, where you just play against someone else, or even the built-in AI, right? Blizzard put in a system that can play the game reasonably well if you don't know how to play. And then on this board you have, again, pieces, like in chess, but these pieces are not there initially like they are in chess. You actually need to decide to gather resources, to decide which pieces to build. So in a way you're starting almost with no pieces. You start gathering resources; in StarCraft there's minerals and gas that you can gather. And then you must decide how much do you want to focus, for
instance, on gathering more resources or starting to build units or pieces. And then once you have enough pieces, or maybe a good attack composition, then you go and attack the other side of the map. And now the other main difference with chess is that you don't see the other side of the map, so you're not seeing the moves of the enemy. It's what we call partially observable. So as a result, you must not only decide on trading off economy versus building your own units, but you also must decide whether you want to scout to gather information. But also, by scouting, you might be giving away some information that you might be hiding from the enemy. So there's a lot of complex decision-making, all in real time. Also, unlike chess, this is not a turn-based game. You play basically all the time, continuously, and thus some skill in terms of speed and accuracy of clicking is also very important. And people that train for this really play this game at an amazing skill level. I've seen this many times, and if you can witness it live, it's really, really impressive. So in a way it's kind of a chess where you don't see the other side of the board, you're building your own pieces, and you also need to gather resources to basically get some money to build other buildings, pieces, technology, and so on.

From the perspective of the human player, the difference between that and chess, or maybe that and a turn-based strategy game like Heroes of Might and Magic, is that there's an anxiety, because you have to make these decisions really quickly, and if you are not actually aware of what decisions work, it's a very stressful balance. Everything you describe is actually quite stressful, difficult to balance for an amateur human player. I don't know if it gets easier at the professional level, like if they're fully aware of what they have to do, but at the amateur level there's this anxiety: oh crap, I'm being attacked; oh crap, I have to build up resources; oh, I have to probably expand; and all these,
the time, the real-time strategy aspect, is really stressful, and computationally, I'm sure, difficult. We'll get into it. But for me, Battle.net... so StarCraft was released in '98, 20 years ago, which is hard to believe, and Blizzard's Battle.net with Diablo in '96 came out. And to me, it might be a narrow perspective, but it changed online gaming and perhaps society forever. But I may have way too narrow a viewpoint. From your perspective, can you talk about the history of gaming over the past 20 years? How transformational, how important is this line of games?

Right. So I think I kind of was an active gamer whilst this was developing, the internet and online gaming. So for me, the way it came was, I played other games, strategy related. I played a bit of Command & Conquer, and then I played Warcraft II, which is from Blizzard, but at the time I didn't know, I didn't understand what Blizzard was or anything. Warcraft II was just a game, which was actually very similar to StarCraft in many ways. It's also a real-time strategy game, where there's orcs and humans, so there's only two races.

But it was offline.

And it was offline, right. So I remember a friend of mine came to school and said, "Oh, there's this new cool game called StarCraft," and I just said, "Oh, this sounds like just a copy of Warcraft II," until I kind of installed it. And at the time, I am from Spain, so we didn't have very good internet, right? So for us, StarCraft became first kind of an offline experience, where you start to play these missions, right? You play against some sort of scripted things to develop the story of the characters in the game. And then later on I started playing against the built-in AI, and I thought it was impossible to defeat it. Then eventually you defeat one, and you can actually play against seven built-in AIs at the same time, which also felt impossible, but actually it's not that hard to beat seven built-in AIs at once. So once we achieved that, we also discovered that we could play,
as I said, internet wasn't that great, but we could play over LAN, right? Basically against each other if we were in the same place, because you could just connect machines with cables, right? So we started playing in LAN mode, as a group of friends, and it was really much more entertaining than playing against the AIs. And later on, as internet was starting to develop and being a bit faster and more reliable, that's when I started experiencing Battle.net, which is this amazing universe, not only because of the fact that you can play the game against anyone in the world, but you can also get to know more people. You just get exposed to this vast variety of... it's kind of a bit like when the chats came about, right? There was a chat system. You could play against people, but you could also chat with people, not only about StarCraft but about anything. And that became a way of life for kind of two years, and obviously then it kind of exploded, and I started to play more seriously, going to tournaments and so on and so forth.

Do you have a sense, on a societal, sociological level, of this whole part of society that many of us are not aware of? It's a huge part of society, which is gamers. I mean, every time I come across that on YouTube or streaming sites, this is a huge number of people who play games religiously. Do you have a sense of those folks, especially now that you've returned to that realm a little bit, on the AI side?

Yeah. So in fact, even after StarCraft I actually played World of Warcraft, which is maybe the main sort of online world and presence where you get to interact with lots of people. So I played that for a little bit. To me it was a bit less stressful than StarCraft, because winning was kind of a given. You're just put in this world and you can always complete missions. But I think it was actually the social aspect of especially StarCraft first, and then games like World of Warcraft, that really
shaped me in very interesting ways, because you get to experience people you wouldn't usually interact with, right? So even nowadays I still have many Facebook friends from the era when I played online, and their ways of thinking, even political... we don't interact in the real world, but we were connected by, basically, fiber. And that way I actually get to understand a bit better that we live in a diverse world. And these were just connections that were made because, you know, I happened to go into a city, a virtual city, as a priest, and I met this warrior, and we became friends, and then we started playing together, right? So I think it's transformative, and more and more people are aware of it. I mean, it's becoming quite mainstream. But back in the day, as you were saying, in 2000, 2005 even, it was still a very strange thing to do, especially in Europe. I think there were exceptions, like Korea, for instance. It was amazing that everything happened so early there in terms of cyber cafes. If you go to Seoul, it's a city where, back in the day, you could be a celebrity by playing StarCraft. But this was like '99, 2000, right? It's not like recently. So yeah, it's quite interesting to look back, and I think it's changing society, the same way, of course, that technology and social networks and so on are also transforming things.

And a quick tangent, let me ask: you're also one of the most productive people in your particular chosen passion and path in life, and yet you also appreciate and enjoy video games. Do you think it's possible to enjoy video games in moderation?

Someone told me that you could choose two out of three. When I was playing video games, you could choose between having a girlfriend, playing video games, or studying. And I think for the most part it was relatively true. These things do take time. Games like StarCraft,
if you take the game pretty seriously and you want to study it, then you obviously will dedicate more time to it. And I definitely took gaming, and obviously studying, very seriously. I loved learning science, etc. So to me, especially when I started university, undergrad, I kind of stepped off StarCraft. I actually fully stopped playing. And then World of Warcraft was a bit more casual. You could just connect online, and, I mean, it was fun, but as I said, that was not as much of a time investment as StarCraft was for me.

OK, so let's get into AlphaStar. You're behind the team. So DeepMind has been working on StarCraft and released a bunch of cool open-source agents and so on over the past few years, but AlphaStar really is the moment, the first time you beat a world-class player. So what are the parameters of the challenge, in the way that AlphaStar took it on, and how did you and David and the rest of the DeepMind team get into it, considering that you could even beat the best in the world, or top players?

I think it all started back in 2015. Actually, I'm lying, I think it was 2014, when DeepMind was acquired by Google, and I at the time was at Google Brain, which was in California. In California we had this summit where we got together the two groups. So Google Brain and Google DeepMind got together and we gave a series of talks. And given that they were doing deep reinforcement learning for games, I decided to bring up part of my past, which I had developed at Berkeley, this thing which we called Berkeley Overmind, which is really just a StarCraft I bot, right? So I talked about that, and I remember Demis just came to me and said, "Well, maybe not now, it's perhaps a bit too early, but you should just come to DeepMind and do this again with deep reinforcement learning," right? And at the time it sounded very science-fiction, for several reasons. But then in 2016, when I actually moved to London and joined DeepMind, transferring from Brain, it became apparent that, because of the
AlphaGo moment, and kind of Blizzard reaching out to us to say, "Wait, do you want the next challenge?", and also me being full-time at DeepMind, all these things sort of came together. And then I went to Irvine in California, to the Blizzard headquarters, to just chat with them and try to explain how it would all work before you do anything. And the approach has always been about the learning perspective, right? So at Berkeley we did a lot of rule-based conditioning: if you have more than three units, then go attack, and if the other has more units than me, I retreat, and so on and so forth. And of course the point of deep reinforcement learning, deep learning, machine learning in general, is that all of this should be learned behavior. So that kind of was the DNA of the project since its inception in 2016, when we didn't even have an environment to work with. And so that's how it all started, really.

So if you go back to that conversation with Demis, or even in your own head, how far away did you... because we're talking about Atari games, we're talking about Go, which is kind of, if you're honest about it, really far away from StarCraft. Well, now that you've beaten it, maybe you could say it's close, but it seems like StarCraft is way harder than Go, philosophically and mathematically speaking. So how far away did you think you were? Did you think that in 2019, in 2018, you could be doing as well as you have?

Yeah, when I kind of thought about, OK, I'm going to dedicate a lot of my time and focus on this, and obviously I do a lot of different research in deep learning, so spending time on it, I mean, I really had to kind of think there's going to be something good happening out of this. So really I thought, well, this sounds impossible, and it probably is impossible to do the full thing, like the full game, where you play one versus one and it's only a neural network playing, and so on. So it really felt like I just didn't
even think it was possible. But on the other hand, I could see some stepping stones towards that goal. Clearly you could define subproblems in StarCraft and sort of dissect it a bit and say, OK, here is a part of the game, here's another part. And also, obviously, the fact, and this was really also critical to me, the fact that we could access human replays, right? So Blizzard was very kind, and in fact they open-sourced these for the whole community, where you can just go, and it's not every single StarCraft game ever played, but it's a lot of them, and download them. And every day you can just query a dataset and say, "Well, give me all the games that were played today." And given my kind of experience with language and sequences and supervised learning, I thought, well, that's definitely going to be very helpful, and something quite unique, because never before had we had such a large dataset of replays of people playing the game at this scale, of such a complex video game, right? So that to me was a precious resource, and as soon as I knew that Blizzard was able to kind of give these to the community, I started to feel positive about something non-trivial happening. But I also thought the full thing, like really no rules, no single line of code that tries to say, well, I mean, if you see this unit, build a detector, all these, not having any of these specializations, seemed really, really, really difficult to me.

I do also like that Blizzard was teasing or even trolling you, sort of almost pulling you into this really difficult challenge. Are they aware of that? And what's the interest from the perspective of Blizzard, except just curiosity?

Yeah, I think Blizzard has really understood and really brought forward this competitiveness of esports in games. StarCraft really kind of sparked a lot of, like, something that almost was never seen, especially, as I was saying, back in Korea. So they just probably thought, well, this is such a pure one-versus-one setup that it
would be great to see if something that can play Atari or Go, and then later on chess, could even tackle this kind of complex real-time strategy game, right? So for them, they wanted to see first, obviously, whether it was possible, if the game they created was in a way solvable to some extent. And I think, on the other hand, they also are a pretty modern company that innovates a lot, so just starting to understand AI, for them, how to bring AI into games...

It's not AI for games, but games for AI?

Right. I mean, both ways I think can work. Obviously at DeepMind we use games for AI, right, to drive AI progress, but Blizzard might actually be able to, and many other companies, start to understand and do the opposite. So I think that is also something they can get out of this, and we have definitely brainstormed a lot about this, right?

But one of the interesting things to me about StarCraft and Diablo and these games that Blizzard has created is the task of balancing classes, for example, sort of making the game fair from the starting point and then letting skill determine the outcome. Is there... I mean, can you first comment? There's three races: Zerg, Protoss, and Terran. I don't know if I've ever said that out loud. Is that how you pronounce it, Terran?

Yeah.

Yeah, I don't think I've ever personally interacted with anybody about StarCraft, it's funny. So they seem to be pretty balanced. I wonder if the AI, the work that you're doing with AlphaStar, would help balance them even further. Is that something you think about? Is that something that Blizzard is thinking about?

Right. So balancing when you add a new unit or a new spell type is obviously possible, given that you can always train or retrain at scale some agent that might start using that in unintended ways. But I think, actually, if you understand how StarCraft has kind of co-evolved with players, in a way I think it's actually very cool, the ways that many of the things and strategies that people came up
with, right? So I think we've seen it over and over in StarCraft: Blizzard comes up with maybe a new unit, and then some players get creative and do something kind of unintentional, or something that Blizzard designers just simply didn't test or think about. And then, after that becomes kind of mainstream in the community, Blizzard patches the game, and then they kind of maybe weaken that strategy, or make it actually more interesting but a bit more balanced. So this kind of continual talk between players and Blizzard is kind of what has defined them, actually, in most games. In StarCraft, but also in World of Warcraft, they would do that. There are several classes, and it would not be good that everyone plays absolutely the same race, and so on, right? So I think they do care about balancing, of course, and they do a fair amount of testing, but it's also beautiful to see how players get creative anyway. And I mean, whether AI can be more creative at this point, I don't think so, right? I mean, it's just sometimes something so amazing happens. Like, I remember back in the days, you have these dropships that could drop the Reavers, and that was actually not thought about, that you could drop this unit that has what's called splash damage, that would basically eliminate all the enemy's workers at once. No one thought that you could actually put them in really early in the game and do that kind of damage. And then, you know, things change in the game. But I don't know, I think it's quite an amazing exploration process from both sides, players and Blizzard alike.

Well, it's almost like a reinforcement learning exploration, but the scale of humans that play Blizzard games is almost on the scale of a large-scale DeepMind RL experiment. I mean, if you look at the numbers, you're talking about, I don't know how many games, but hundreds of thousands of games probably a month.

Yeah.

I mean, so it's almost the same as
running RL agents. What aspect of the problem of StarCraft do you think is the hardest? Is it, like you said, the imperfect information? Is it the fact that you have to do long-term planning? Is it the real-time aspect, that you have to do stuff really quickly? Is it the fact that there's a large action space, that you can do so many possible things? Or is it, you know, in the game-theoretic sense, that there is no Nash equilibrium, or at least you don't know what the optimal strategy is, because there's way too many options? Is there something that stands out as just the hardest, the most annoying thing?

So when we sort of looked at the problem and started to define the parameters of it, right, what are the observations, what are the actions, it became very apparent that the very first barrier that one would hit in StarCraft would be because of the action space being so large, and not being able to search like you could in chess or Go, even though the search space is vast. The main problem that we identified was that of exploration, right? So without any sort of human knowledge or human prior, if you think about StarCraft, and you know how deep reinforcement learning algorithms work, which is essentially by issuing random actions and hoping that they will get some wins sometimes so they can learn... so if you think of the action space in StarCraft, almost anything you can do in the early game is bad, because any action involves taking workers, which are mining minerals for free. That's something that the game does automatically, it sends them to mine. And you would immediately just take them out of mining and send them around. So just thinking, how is it going to be possible to get to understand these concepts? But even more, like expanding, right? There's these buildings you can place in other locations on the map to gather more resources, but the location of the building is important, and you have to select a worker, send it walking to that location,
build the building, wait for the building to be built, and then put extra workers there so they start mining. Just that feels impossible, if you just randomly click, to produce that desirable state that you could then hope to learn from, because eventually that may yield an extra win, right? So for me, the exploration problem, due to the action space and the fact that there's not really turns, or there's so many turns, because the game essentially ticks at 22 times per second, I mean, that's how they discretize time. Obviously you always have to discretize time, there's no such thing as real time, but it's really a lot of time steps of things that could go wrong. And that definitely felt, a priori, like the hardest. You mentioned many good ones. I think partial observability, and the fact that there is no perfect strategy because of the partial observability, those are very interesting problems that we start seeing more and more now as we solve the previous ones. But the core problem to me was exploration, and solving it has been basically kind of the focus and how we saw the first breakthroughs.

So exploration in a multi-hierarchical way. So like, 22 times a second, exploration has a very different meaning than it does in terms of "should I gather resources early or should I wait," and so on. So how do you solve the long-term... let's talk about the internals of AlphaStar. So first of all, how do you represent the state of the game as an input? How do you then do the long-term sequence modeling? How do you build a policy? What's the architecture like?

So AlphaStar has obviously several components, but everything passes through what we call the policy, which is a neural network. And that's kind of the beauty of it: I could just now give you a neural network and some weights, and if you fed it the right observations and you understood the actions the same way we do, you would have basically the agent playing the game. There's
absolutely nothing else needed other than those weights that were trained. Now, the first step is observing the game, and we've experimented with a few alternatives. The one that we currently use mixes both spatial sort of images that you would process from the game, that is, the zoomed-out version of the map and also a zoomed-in version of the camera, or the screen, as we call it. But also we give to the agent the list of units that it sees, more as a set of objects that it can operate on. It is not necessarily required to use that, and we have versions of the agent that play well without this set vision, which is a bit unlike how humans perceive the game. But it certainly helps a lot, because it's a very natural way to encode the game, just by looking at all the units that are there. They have properties like health, position, type of unit, whether it's my unit or the enemy's, and that is kind of the summary of the state of the game.

That list of units, or set of units, that you see all the time... well, that's pretty close to the way humans see the game. Why do you say it's not? You're saying the exactness of it is not like humans?

The exactness of it is perhaps not the problem. I guess maybe the problem, if you look at it from how humans actually play the game, is that they play with a mouse and a keyboard and a screen, and they don't see sort of a structured object with all the units. What they see is what they see on the screen, right?

Yes. So I remember that there's a plot that you showed with the camera-based interface, where you do exactly that, right? Move around. And that seems to converge to similar performance.

Yeah, I think we're kind of experimenting with what's necessary or not. But on using the set: actually, if you look at research in computer vision, where it makes a lot of sense to treat images as two-dimensional arrays, there is actually a very nice paper from Facebook. I forgot who the authors are, but I
think it's part of Kaiming He's group. And what they do is they take an image, which is this two-dimensional signal, and they actually take it pixel by pixel and scramble the image, as if it was just a list of pixels, and crucially they encode the position of the pixels with the x, y coordinates. And this is just kind of a new architecture, which we incidentally also use in StarCraft, called the Transformer, which is a very popular paper from last year, which yielded very nice results in machine translation. And if you actually believe in this kind of "it's actually a set of pixels, and as long as you encode x, y, it's OK," then you could argue that the list of units that we see is precisely that, because we have each unit as a kind of pixel, if you will, and then their x, y coordinates. So in that perspective, without knowing it, we use the same architecture that was shown to work very well on PASCAL, on ImageNet, and so on.

So the interesting thing here is, putting it that way, it starts to move it towards the way you usually work with language. And especially with your expertise and work in language, it seems like there's echoes of a lot of the way you would work with natural language in the way you've approached AlphaStar. Does that help with the long-term sequence modeling there somehow?

Exactly. So now that we understand what an observation for a given time step is, we need to move on to say, well, there's going to be a sequence of such observations, and an agent will need to act given all that it's seen, not only at the current time step, but all that it's seen. Why? Because there is partial observability. We must remember whether we saw a worker going somewhere, for instance, right? Because then there might be an expansion in the top right of the map. So given that, what you must then think about is the problem of, given all the observations, you have to predict the next action. And not only given all the observations, but given all the observations and given all the
actions you've taken, predict the next action. And that sounds exactly like machine translation, and that's exactly how I kind of saw the problem, especially when you are given supervised data, or replays from humans, because the problem is exactly the same. You're translating, essentially, a prefix of observations and actions onto what's going to happen next, which is exactly how you would train a model to translate or to generate language as well, right? You have a certain prefix, and you must remember everything that comes in the past, because otherwise you might start having non-coherent text. And the same architectures, we're using LSTMs and Transformers to operate across time, to kind of integrate all that's happened in the past, those architectures that work so well in translation or language modeling are exactly the same as what the agent is using to issue actions in the game. And the way we train it, moreover, for imitation, which is step one of AlphaStar, is: take all the human experience and try to imitate it, much like you try to imitate translators that translated many pairs of sentences from French to English, say. That sort of principle applies exactly the same. It's almost the same code, except that instead of words you have slightly more complicated objects, which are the observations, and the actions are also a bit more complicated than a word.

Is there a self-play component too, once you run out of imitation?

Right. So indeed, you can bootstrap from human replays, but then the agents you get are actually not as good as the humans you imitated, right? So how do we imitate? Well, we take humans from 3,000 MMR and higher. 3,000 MMR is just a metric of human skill, and 3,000 MMR might be like the 50th percentile, right? So it's just kind of an average human.

What's that? So, maybe a quick pause: MMR is a ranking scale, the matchmaking rating, for players?

Yeah.

So, I remember there's like a master and a grandmaster. What's 3,000?

So 3,000 is pretty bad. I think it's kind of
gold level it just sounds really good relative to chess I think oh yeah no the ratings the best in the world are at 7,000 MMR so 3,000 it's a bit like Elo indeed right so 3,300 just allows us to not filter a lot of the data so we like to have a lot of data in deep learning as you probably know so we take these kind of 3,500 and above but then we do a very interesting trick which is we tell the neural network what level they are imitating so we say this replay you're gonna try to imitate to predict the next action for all the actions that you're gonna see is a 4,000 MMR replay this one is a 6,000 MMR replay and what's cool about this is then we take this policy that is being trained from humans and then we can ask it to play like a 3,000 MMR player by setting a bit saying well okay play like a 3,000 MMR player or play like a 6,000 MMR player and you actually see how the policy behaves differently it gets worse economy if you play like a gold level player it does fewer actions per minute which is the number of clicks or number of actions that you will issue in a whole minute and it's very interesting to see that it kind of imitates the skill level quite well but if we ask it to play like a 6,000 MMR player we tested of course these policies to see how well they do they actually beat all the built-in AIs that Blizzard put in the game but they're nowhere near 6,000 MMR players right they might be maybe around gold level platinum perhaps so there's still a lot of work to be done for the policy to truly understand what it means to win so far we only asked them okay here is the screen and that's what happened in the game until this point what would the next action be if we asked a pro to now say oh you're gonna click here or here or there and the point is experiencing wins and losses is very important to then start to refine otherwise the policy can get loose can just go off policy as we call it
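The trick described above (telling the imitation policy which skill level it is imitating, then asking for a level at play time) can be sketched in a few lines. This is only a toy illustration under loud assumptions: the real agent uses LSTMs and transformers over full observations, whereas this sketch uses simple counts over the previous action, and the names mmr_bucket and ConditionedImitationPolicy, the 1,000-point bucket width, and the action strings are all hypothetical.

```python
from collections import Counter, defaultdict


def mmr_bucket(mmr, width=1000):
    """Round an MMR down to a coarse skill bucket (hypothetical 1,000-point width)."""
    return (mmr // width) * width


class ConditionedImitationPolicy:
    """Toy next-action predictor conditioned on the skill level being imitated.

    Stand-in for a neural policy: it only counts which action followed
    which previous action, separately for each MMR bucket.
    """

    def __init__(self):
        # (bucket, previous action) -> counts of the action that came next
        self.counts = defaultdict(Counter)

    def fit(self, replays):
        """replays: iterable of (mmr, list of action names) taken from human games."""
        for mmr, actions in replays:
            bucket = mmr_bucket(mmr)
            prev = "<start>"
            for action in actions:
                self.counts[(bucket, prev)][action] += 1
                prev = action

    def predict(self, mmr, prev_action):
        """Most frequent next action seen at this skill level, or None if unseen."""
        options = self.counts.get((mmr_bucket(mmr), prev_action))
        if not options:
            return None
        return options.most_common(1)[0][0]


# Two made-up replays: one near gold level, one from a much stronger player.
policy = ConditionedImitationPolicy()
policy.fit([
    (3200, ["build_worker", "move", "move"]),
    (6100, ["build_worker", "scout", "expand"]),
])
```

Asking the same policy to "play like" different levels then just means changing the conditioning value: policy.predict(3500, "build_worker") follows the lower-rated replay, while policy.predict(6500, "build_worker") follows the higher-rated one.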
that's so interesting you can at least hope eventually to be able to control a policy approximately to be at some MMR level that's so interesting especially given that you have ground truth for a lot of these cases right can I ask you a personal question what's your MMR well I haven't played StarCraft 2 so I am unranked which is kind of the lowest league okay so I used to play StarCraft the first one but you haven't seriously played so the best player we have at DeepMind is about 5,000 MMR which is high masters it's not at Grandmaster level Grandmaster level would be the top 200 players in a certain region like Europe or America or Asia but for me it would be hard to say I am very bad at the game I actually played AlphaStar a bit too late and it beat me I remember the whole team was saying Oriol you should play and I was like it looks like it's not so good yet and then I remember I kind of got busy and waited an extra week and I played and it really beat me very badly how did that feel isn't it an amazing feeling it's amazing yeah I mean obviously I tried my best and I tried to also impress because I actually played the first game so I'm still pretty good at micromanagement the problem is I just don't understand StarCraft 2 I understand StarCraft and when I played StarCraft I probably was consistently for a couple of years top 32 in Europe so I was decent but at the time we didn't have this kind of MMR system as well established so it would be hard to know what it was back then so what's the difference in interface between AlphaStar and StarCraft and a human player and StarCraft are there any significant differences between the way they both see the game I would say the way they see the game there's a few things that are just very hard to simulate the main one perhaps which is obvious in hindsight is what's called cloaked units which are invisible units so in StarCraft you can make some units that you need to have a particular kind of unit
to detect it so these units are invisible if you cannot detect them you cannot target them so they would just you know destroy your buildings or kill your workers but despite the fact you cannot target the unit there's a shimmer that as a human you observe I mean you need to train a little bit you need to pay attention but you would see this kind of space-time like distortion and you would know okay there it is yeah yeah it's like a wave thing yeah it's kind of a distortion I don't like it that's really like the Blizzard term is shimmer and so this shimmer professional players actually can see it immediately they understand it very well but it's still something that requires a certain amount of attention and it's kind of a bit annoying to deal with whereas for AlphaStar in terms of vision it's very hard for us to simulate sort of you know are you looking at this pixel in the screen and so on so the only thing we can do is we say there is a unit that's invisible over there so AlphaStar would know that immediately obviously it still obeys the rules you cannot attack the unit you must have a detector and so on but it's kind of one of the main things that it just doesn't feel there's a very proper way I mean you could imagine oh you don't have high precision to know exactly where it is or sometimes you see it sometimes you don't but it's just really really complicated to get it so that everyone would agree oh that's the best way to simulate this right you know it seems like a perception problem it is a perception problem so the only problem is if you ask what's the difference between how humans perceive the game I would say they wouldn't be able to tell a shimmer immediately as it appears on the screen whereas AlphaStar in principle sees it very sharply right it sees that the bit turned from zero to one meaning there's now a unit there although you don't know the unit or you don't know it you
know you know that you cannot attack it and so on got it so from a vision standpoint that probably is the one that is kind of the most obvious one then there are things humans cannot do perfectly even professionals which is they might miss a detail or they might have not seen a unit and obviously as a computer if there's a corner of the screen that turns green because a unit enters the field of view that can go into the memory of the agent the LSTM and persist there for a while for however long is relevant right and in terms of action it seems like the rate of action from AlphaStar is comparable if not slower than professional players but it's more precise as well right so that's really probably the one that is causing us more issues for a couple of reasons right the first one is StarCraft has been an AI environment for quite a few years in fact I mean I was participating in the very first competition back in 2010 and there's really not been a kind of a very clear set of rules for what the actions per minute the rate of actions that you can issue is and as a result these agents or bots that people build in a kind of almost very cool way they do like 20,000 40,000 actions per minute now to put this in perspective a very good professional human might do 300 to 800 actions per minute they might not be as precise that's why the range is a bit tricky to identify exactly I mean 300 actions per minute precisely is probably realistic 800 is probably not but you see humans doing a lot of actions because they warm up and they kind of select things and spam and so on just so that when they need they have the accuracy so we came into this by not having kind of a standard way to say well how do we measure whether an agent is at human level or not on the other hand we had a huge advantage which is because we do imitation learning agents turned out to act like humans in terms of rate of actions even precisions and
imprecisions of actions in the supervised policy you could see all this you could see how agents like to spam click to move here if you played especially Diablo you would know what I mean I mean you just like spam oh move here move here move here you're doing literally like maybe five actions in two seconds but these actions are not very meaningful one would have sufficed so on the one hand we start from this imitation policy that is in the ballpark of the actions per minute of humans because it acts statistically trying to imitate humans so we see this very nicely in the curves that we showed in the blog post like these actions per minute and the distribution looks very human-like but then of course as self-play kicks in and that's the part we haven't talked too much about yet but of course the agent must play against itself to improve then there's almost no guarantees that these actions will not become more precise or even the rate of actions is going to increase over time so what we did and this is probably kind of the first attempt that we thought was reasonable is we looked at the distribution of actions for humans for certain windows of time and just to give a perspective because I guess I mentioned that some of these agents that are programmatic let's call them they do 40,000 actions per minute professionals as I said do 300 to 800 so what we looked at is we look at a distribution over professional gamers and we took reasonably high actions per minute but we kind of identify certain cutoffs after which even if the agent wanted to act these actions would be dropped but the problem is this cutoff is probably set a bit too high and what ends up happening even though the games when we ask the professionals and the gamers by and large they feel like it's playing human-like there are some agents that developed maybe slightly too high APMs which is actions per minute combined with the precision which made people sort of start discussing
a very interesting issue which is should we have limited this should we just let it loose and see what cool things it can come up with right so this is in itself an extremely interesting question but the same way that modeling the shimmer would be so difficult modeling absolutely all the details about muscles and precision and tiredness of humans would be quite difficult right so we're here kind of innovating in this sense of okay what could be maybe the next iteration of putting more rules that make the agents more human-like in terms of restrictions yeah putting constraints more constraints yeah that's really interesting that's really innovative so one of the constraints you put on yourself or at least focused on is the Protoss race as far as I understand can you tell me about the different races and how they so Protoss Terran and the Zerg how do they compare how do they interact why did you choose Protoss in the dynamics of the game seen from a strategic perspective so Protoss so in StarCraft there are three races indeed in the demonstration we saw only the Protoss race so maybe let's start with that one Protoss is kind of the most technologically advanced race it has units that are expensive but powerful right so in general you want to kind of conserve your units as you go attack and then you want to utilize these tactical advantages of very fancy spells and so on so forth and at the same time people say like they're a bit easier to play perhaps right but that I actually didn't know I mean I just talked now a lot to the players that we work with TLO and MaNa and they said oh yeah Protoss is actually people think is actually one of the easiest races so perhaps the easier that doesn't mean that it's you know obviously professional players excel at the three races and there's never like a race that dominates for a very long time anyway so if you look at the top 100 in
the world is there one race that dominates that list it would be hard to know because it depends on the regions I think it's pretty equal in terms of distribution and Blizzard wants it to be equal right they wouldn't want one race like Protoss to not be represented in the top places so definitely they try it to be balanced right so then maybe the opposite race of Protoss is Zerg Zerg is a race where you just kind of expand and take over as many resources as you can and they have a very high capacity to regenerate their units so if you have an army it's not that valuable in terms of losing the whole army is not a big deal as Zerg because you can then rebuild it and given that you generally accumulate a huge bank of resources Zerg typically will play by applying a lot of pressure maybe losing their whole army but then rebuilding it quickly so although of course every race I mean they're pretty diverse I mean there's some units in Zerg that are technologically advanced and they do some very interesting spells and there's some units in Protoss that are less valuable and you could lose a lot of them and rebuild them and it wouldn't be a big deal all right so maybe I'm missing out maybe I'm gonna say some dumb stuff but just a summary of strategy so first there's collection of a lot of resources right so that's one option the other one is expanding so building other bases then the other is obviously attack ability building units and attacking with those units and then I don't know what else there is maybe there is the different timing of attacks like do you attack early attack late what are the different strategies that emerged that you've learned about I've read that a bunch of people are super happy that you guys have apparently that AlphaStar apparently has discovered that it's really good to what is it saturate oh yeah the mineral line yeah the mineral line yeah yeah and for greedy amateur
players like myself that's always been a good strategy you just build up a lot of money and it just feels good to just accumulate and accumulate so thank you for discovering that yeah validating all of us but are there other strategies that you discovered interesting yeah unique to this game yeah so if you look at the kind of not being a StarCraft 2 player but of course StarCraft and StarCraft 2 and real-time strategy games in general are very similar I would classify perhaps the openings of the game they're very important and generally I would say there's two kinds of openings one that's a standard opening that's generally how players find sort of a balance between risk and economy and building some units early on so that they could defend but they're not too exposed basically but also expanding quite quickly so this would be kind of a standard opening and within a standard opening then what you do choose generally is what technology are you aiming towards so there's a bit of rock-paper-scissors of you could go for spaceships or you could go for invisible units or you could go for I don't know like massive units that attack against certain kinds of units but they're weak against others so standard openings themselves have some choices like rock-paper-scissors style of course if you scout and you're good at guessing what the opponent is doing then you can play with an advantage because if I know you're gonna play rock I mean I'm gonna play paper obviously so you can imagine that normal standard games in StarCraft look like a continuous rock-paper-scissors game where you guess what the distribution of rock paper and scissors is from the enemy and reacting accordingly to try to beat it or you know put the paper out before he kind of changes his mind from rock to scissors and then you would be in a weak position so sorry to pause on that I didn't realize this element because I know it's true for poker I know I looked at Libratus this is so you're also
estimating trying to guess the distribution to better and better estimate the distribution of what the opponent is likely to be doing yeah I mean as a player you definitely want to have a belief state over what's up on the other side of the map and when your belief state becomes inaccurate when you start having serious doubts whether he's gonna play something that you must know that's when you scout you wanna then gather information right is improving the accuracy of the belief or improving the belief state part of the loss that you try to optimize or is it just a side effect it's implicit but you could explicitly model it and it would be quite good at probably predicting what's on the other side of the map but so far it's all implicit there's no additional reward for predicting the enemy so there's these standard openings and then there's what people call cheese which is very interesting and AlphaStar sometimes really likes this kind of cheese these cheeses what they are is kind of an all-in strategy you're gonna do something sneaky you're gonna hide your own buildings close to the enemy base or you're gonna go for hiding your technological buildings so that you do invisible units and the enemy just cannot react to detect it and thus loses the game and there's quite a few of these cheeses and variants of them and there is where actually the belief state becomes even more important because if I scout your base and I see no buildings at all any human player knows something's up they might know well you're hiding something close to my base should I build suddenly a lot of units to defend should I actually block my ramp with workers so that you cannot come and destroy my base so there's all this happening and defending against cheeses is extremely important and in the AlphaStar League many agents actually develop some cheesy strategies and in the games we saw against TLO and MaNa two out of the ten agents were actually doing these
kind of strategies which are cheesy strategies and then there's a variant of cheesy strategy which is called all-in so an all-in strategy is not perhaps as drastic as oh I'm gonna build cannons on your base and then bring all my workers and try to just disrupt your base and game over or GG as we say in StarCraft there's these kind of very cool things that you can align precisely at a certain time mark so for instance you can generate exactly a ten unit composition that is perfectly five of this type five of this other type and align the upgrade so that at four minutes and a half let's say you have these ten units and the upgrade just finished and at that point that army is really scary and unless the enemy really knows what's going on if you push you might then have an advantage because maybe the enemy is doing something more standard it expanded too much it developed too much economy and it trades off badly against having defenses and the enemy will lose but it's called all-in because if you don't win then you're gonna lose so you see players that do these kind of strategies if they don't succeed the game is not over I mean they still have a base and they're still gathering minerals but they will just GG out of the game because they know well game is over I gambled and I failed so if we start entering the game theoretic aspects of the game it's really rich and that's why it also makes it quite entertaining to watch even if I don't play I still enjoy watching the game but the agents are trying to do this mostly implicitly but one element that we improved in self-play is creating the AlphaStar League and the AlphaStar League is not pure self-play it's trying to create different personalities of agents so that some of them will become cheese agents some of them might become very economical very greedy like getting all the resources but then being maybe early on they're gonna be weak but later on they're gonna be very strong and by creating this
personality of agents which sometimes just happens naturally that you can see kind of an evolution of agents that given the previous generation they trained against all of them and then they generate kind of the perfect counter to that distribution but these agents you must have them in the populations because if you don't have them you're not covered against these things right you wanna you know create all sorts of the opponents that you will find in the wild so you can be exposed to these cheeses early aggression later aggression more expansions dropping units in your base from the side all these things and pure self-play is getting a bit stuck at finding some subset of these but not all of these so the AlphaStar League is a way to kind of do an ensemble of agents that are all playing in a league much like people play on Battle.net right you play against someone who does a new cool strategy and you immediately oh my god I want to try it I want to play again and this to me was another critical part of the problem which was can we create a Battle.net for agents yeah that's kind of what the AlphaStar League is really fascinating and where they stick to their different strategies yeah wow that's really really interesting so but that said you were fortunate enough or just skilled enough to win 5-0 and so how hard is it to win I mean that's not the goal I guess I don't know what the goal is the goal should be to win a majority not five zero but how hard is it in general to win all matchups on a 1v1 so that's a very interesting question because once you see AlphaStar and superficially you think well okay it won let's see if you sum all the games like ten to one right it lost the game that it played with the camera interface you might think well that's done right it's superhuman at the game and that's not really the claim we really can make actually the claim is we beat a
professional gamer for the first time StarCraft has really been a thing that's been going on for a few years but a moment like this had not occurred before yet but are these agents impossible to beat absolutely not right so that's a bit what's you know kind of the difference is the agents play at Grandmaster level they definitely understand the game enough to play extremely well but are they unbeatable do they play perfect no and actually in StarCraft because of these sneaky strategies it's always possible that you might take a huge risk sometimes but you might get wins right out of this so I think that as a domain it still has a lot of opportunities not only because of course we want to learn with less experience we would like to I mean if I learn to play Protoss I can play Terran and learn it much quicker than AlphaStar can right so there are obvious interesting research challenges as well but even as the raw performance goes really the claim here can be we are at pro level or at high Grandmaster level but obviously the players also did not know what to expect right their prior distribution was a bit off because they played this kind of new like alien brain as they like to say right and that's what makes it exciting for them but also I think if you look at the games closely you see there were weaknesses in some points maybe AlphaStar did not scout or if it had got invisible units going against it at certain points it wouldn't have known and it would have been bad so there's still quite a lot of work to do but it's really a very exciting moment for us to be seeing wow a single neural net on a GPU is actually playing against these guys who are amazing I mean you have to see them play live they're really really amazing players yeah I'm sure there must be a guy in Poland somewhere right now training his butt off to make sure that this never happens again with AlphaStar
so that's really exciting in terms of AlphaStar having some holes to exploit yeah it's just great and then you build on top of each other and it feels like StarCraft unlike Go even if you win it's still not there there's so many different dimensions in which you can explore so that's really really interesting do you think there's a ceiling to AlphaStar you've said that it hasn't reached you know this is a big wait you know what let me actually just pause for a second how did it feel to come here to this point to beat a top professional player like that night I mean you know Olympic athletes have their gold medal right this is your gold medal in a sense sure you're cited a lot you published a lot of prestigious papers whatever but this is like a win how did it feel I mean it was for me unbelievable because first the win itself was so exciting I mean so looking back to those last days of 2018 really that's when the games were played I'm sure I'll look back at that moment and say oh my god I want to be in a project like that it's like I already feel the nostalgia of like yeah that was huge in terms of the energy and the team effort that went into it and so in that sense as soon as it happened I already knew it was kind of I was losing it a little bit so it is almost like sad that it happened and all but on the other hand it also verifies the approach but to me also there's so many challenges and interesting aspects of intelligence that even though we can train a neural network to play at the level of the best humans there's still so many challenges so for me it's also like well this is really an amazing achievement but I already was also thinking about next steps I mean as I said these agents play Protoss vs.
Protoss but they should be able to play a different race much quicker right so that would be an amazing achievement some people call this meta reinforcement learning meta learning and so on right so there's so many possibilities after that moment but the moment itself it really felt great we had this bet so I'm kind of a pessimist in general so I kind of sent an email to the team and said okay let's against TLO first right like what's gonna be the result and I really thought we would lose like five zero right we had some calibration made against the 5,000 MMR player TLO was much stronger than that player even if he played Protoss which is his off race but yeah I was not imagining we would win so for me that was just kind of a test run or something and then it really kind of he was really surprised and unbelievably we went to this bar to celebrate and Dave tells me well why don't we invite someone who is a thousand MMR stronger in Protoss like an actual Protoss player like that it turned out being MaNa right and you know we had some drinks and I said sure why not but then I thought well that's really gonna be impossible to beat I mean even because it's so much ahead a thousand MMR is really like 99% probability that MaNa would beat TLO as Protoss vs.
Protoss right so we did that and to me the second game was much more important even though a lot of uncertainty kind of disappeared after we kind of beat TLO I mean he is a professional player so that was kind of over that's really a very nice achievement but MaNa really was at the top and you could see he played much better but our agents got much better too so and then after the first game I said if we take a single game at least we can say we beat a game I mean even if we don't beat the series for me that was a huge relief and I mean I remember hugging Demis and I mean it was really like this moment for me will resonate forever as a researcher and I mean as a person and it's a really great accomplishment and it was great also to be there with the team in the room I don't know if you saw like so it was really like I mean from my perspective the other interesting thing is just like watching Kasparov watching MaNa was also interesting because he was kind of at a loss for words I mean whenever you lose I've done a lot of sports you sometimes say excuses you look for reasons right and he couldn't really come up with a reason yeah yeah I mean so with the off race for Protoss you could say it felt awkward it wasn't but here it was yeah it was just beaten and it was beautiful to look at a human being being superseded by an AI system I mean it's a beautiful moment for researchers so yeah for sure it was I mean probably the highlight of my career so far because of its uniqueness and coolness and I don't know I mean it's obviously as you said you can look at paper citations and so on but this really is like a testament of the whole machine learning approach and using games to advance technology I mean it really was everything came together at that moment that's really the summary also on the other side it's a popularization of AI too because just like traveling to the moon
and so on I mean this is where a very large community of people that don't really know AI get to really interact with it which is very important I mean it's really we must you know writing papers helps our peers researchers to understand what we're doing but I think AI is becoming mature enough that we must sort of try to explain what it is and perhaps through games is an obvious way because these games always had bots so maybe everyone experienced an AI playing a video game even if they don't know because there's always some scripted element and some people might even call that AI already right so what are other applications of the approaches underlying AlphaStar that you see happening there's a lot of echoes of you said transformer of language modeling so on have you already started thinking where the breakthroughs in AlphaStar get expanded to other applications right so I thought about a few things for like kind of next months next years the main thing I'm thinking about actually is what's next as a kind of a grand challenge because for me like we've seen Atari and then there's like the sort of three-dimensional worlds where we've seen also like pretty good performance from these capture-the-flag agents that also some people at DeepMind and elsewhere are working on we've also seen some amazing results on like for instance Dota 2 which is also a very complicated game so for me like the main thing I'm thinking about is what's next in terms of challenge so as a researcher I see sort of two tensions between research and then applications or areas or domains where you apply them so on the one hand thanks to the application of StarCraft being very hard we developed some techniques some new research that now we could look at elsewhere like are there other applications where we can apply this and the obvious ones absolutely you can think of feeding back to sort of the community we took from which was mostly sequence modeling or natural language
processing so we've developed and extended things from the transformer and we use pointer networks we combine LSTMs and transformers in interesting ways so that's perhaps the kind of lowest hanging fruit of feeding back to now different fields of machine learning that's not playing video games let me go old-school and jump to Mr. Alan Turing yeah so the Turing test you know there's a natural language test the conversational test what's your thought of it as a test for intelligence do you think it is a grand challenge that's worthy of undertaking maybe if it is would you reformulate it or phrase it somehow differently right so I really love the Turing test because I also like sequences and language understanding and in fact some of the early work we did in machine translation we tried to apply to kind of a neural chatbot which obviously would never pass the Turing test because it was very limited but it is a very fascinating idea that you could really have an AI that would be indistinguishable from humans in terms of asking or conversing with it right so I think the test itself seems very nice and it's kind of well defined actually like the passing it or not I think there's quite a few rules that feel like pretty simple and you know you could really like have I mean I think they have these competitions every year yes the Loebner Prize but I don't know if you've seen the kind of bots that emerge from that competition they're not quite what you would so it feels like there's weaknesses with the way Turing formulated it it needs to be that the definition of a genuine rich fulfilling human conversation it needs to be something else like the Alexa Prize which I'm not as familiar with has tried to define that more I think by saying you have to continue keeping a conversation for 30 minutes something like that so basically forcing the agent not to just fool but to have an
engaging conversation kind of thing I mean have you thought about this problem richly and if you have in general how far away are we from you worked a lot on language understanding language generation but the full dialogue the conversation you know just sitting at the bar having a couple of beers for an hour that kind of conversation have you thought about it yeah so I think you touched here on the critical point which is feasibility right so there's a great sort of essay by Hamming which describes sort of grand challenges of physics and he argues that well okay for instance teleportation or time travel are great grand challenges of physics but there's no attack we really don't know how to or cannot kind of make any progress so that's why most physicists and so on they don't work on these in their PhDs and as part of their careers so I see the Turing test the full Turing test as a bit still too early I think especially with the current trend of deep learning language models we've seen some amazing examples I think GPT-2 being the most recent one which is very impressive but to fully solve passing or fooling a human to think that there's a human on the other side I think we're quite far so as a result I don't see myself and I probably would not recommend people doing a PhD on solving the Turing test because it just feels it's kind of too early or too hard of a problem yeah but that said you said the exact same thing about StarCraft a few years ago so into damage so I prefer yeah you'll probably also be the person who passes the Turing test in three years I mean I think that yeah so we have this on record this is nice it's true that progress sometimes is a bit unpredictable even six months ago I would not have predicted the level that we see that these agents can deliver at Grandmaster level but I 
have worked on language enough and basically my concern is not that a breakthrough could happen that would bring us to solving or passing the Turing test it's that I just think the statistical approach to it is not gonna cut it so we need a breakthrough which is great for the community but given that I think there's quite a bit more uncertainty whereas for StarCraft I knew what the steps would be to kind of get us there I think it was clear that using the imitation learning part and then using this Battle.net for agents were gonna be key and it turned out that this was the case and a little more was needed but not much more for the Turing test I just don't know what the plan or execution plan would look like so that's why for me working on it as a grand challenge is hard but there are quite a few sub-challenges that are related that you could say well I mean what if you create a great assistant like Google already has like the Google Assistant so can we make it better and can we make it fully neural and so on there I start to believe maybe we're reaching a point where we should attempt these challenges I like this conversation so much because it echoes very much the StarCraft conversation it's exactly how you approach StarCraft let's break it down into small pieces solve those and you end up solving the whole game great but that said you're behind some of the sort of biggest pieces of work in deep learning in the last several years so you mentioned some limits what do you think are the current limits of deep learning and how do we overcome those limits so if I had to actually use a single word to define the main challenge in deep learning it is a challenge that probably has been the challenge for many years and it is that of generalization so what that means is that all that we're doing is fitting functions to data and when the data we see is not from the same distribution or even if sometimes it is very 
close to the distribution but because of the way we train it with limited samples we then get to this stage where we just don't see generalization as much as we can generalize and I think adversarial examples are a clear example of this but if you study the machine learning literature you know the reason why SVMs became very popular was because they had some guarantees about generalization which is about unseen data or out of distribution or even within distribution where you take an image add a bit of noise and these models fail so I think really I don't see a lot of progress on generalization in the strong generalization sense of the word I think with our neural networks you can always design examples that will make their outputs arbitrary which is not good because we humans would never be fooled by these kind of images or manipulations of the image and if you look at the mathematics you kind of understand this is a bunch of matrices multiplied together there's probably numerical instability so you can just find corner cases so I think that's really the underlying topic many times we see even at the grand stage of like the Turing test generalization I mean if you start passing the Turing test should it be in English or should it be in any language right I mean as a human if you are asked something in a different language you actually will go and do some research and try to translate it and so on should the Turing test include that right and it's really a difficult problem and very fascinating and very mysterious actually yeah absolutely but do you think if you were to try to solve it can you not grow the size of data intelligently in such a way that the distribution of your training set does include the entirety of the testing set is that one path the other path is a totally new methodology right that's not statistical so a path that has worked well 
and it worked well in StarCraft in machine translation and in language is scaling up the data and the model and that's kind of been maybe the only single formula that still delivers today in deep learning right it's that scale data scale and model scale really do more and more of the things that we thought oh there's no way it can generalize to this or there's no way it can generalize to that but I don't think fundamentally it will be resolved with this and for instance I'm really liking some style or approach that would not only have neural networks but it would have programs or some discrete decision-making because there is where I feel there's a bit more I mean the best example I think for understanding this is something I also worked a bit on which is we can learn an algorithm with a neural network right so you give it many examples and it's going to sort the input numbers or something like that but really strong generalization is you give me some numbers or you ask me to create an algorithm that sorts numbers and instead of creating a neural net which will be fragile because it's gonna go out of range at some point you're gonna give it numbers that are too large or too small and whatnot if you just create a piece of code that sorts the numbers then you can prove that that will generalize to absolutely all the possible inputs you could give so I think the problem comes with some exciting prospects I mean scale is a bit more boring but it really works and then maybe programs and discrete abstractions are a bit less developed but clearly I think they're quite exciting in terms of the future for the field do you draw any insight or wisdom from the 80s and expert systems and symbolic systems symbolic computing do you ever go back to those reasoning that kind of logic do you think that might make a comeback do you have to dust off those books yeah I actually love adding more inductive biases to me the problem really is 
what are you trying to solve if what you're trying to solve is so important that you try to solve it no matter what then absolutely use rules use domain knowledge and then use a bit of the magic of machine learning to empower to make the system the best system that will detect cancer or you know detect weather patterns right and in terms of StarCraft it also was a very big challenge so I was definitely happy that if we had to cut a corner here and there it could have been interesting to do and in fact in StarCraft we started thinking about expert systems because you know people actually build StarCraft bots by thinking about those principles you know state machines and rule-based systems and then you could think of combining a bit of a rule-based system but that has also neural networks incorporated to make it generalize a bit better so absolutely I mean we should definitely go back to those ideas and anything that makes the problem simpler as long as your problem is important that's okay and that's research driven by a very important problem and on the other hand if you wanna really focus on the limits of reinforcement learning then of course you must try not to look at imitation data or to look for some rules of the domain that would help a lot or even feature engineering right so there is this tension that depending on what you do I think both ways are definitely fine and I would never not do one or the other as long as what you're doing is important and needs to be solved right right so there's a bunch of different ideas that you developed that I really enjoy but one is translating from image to text image captioning just another beautiful idea I think that resonates throughout your work actually so the underlying nature of reality being language always yes somehow so what's the connection between images and text rather 
the visual world and the world of language in your view right so I think a piece of research that's been central to I would say even extending into StarCraft is this idea of sequence to sequence learning and what we really meant by that is that you can now really input anything to a neural network as the input X and then the neural network will learn a function f that will take X as an input and produce any output Y and these X's and Y's don't need to be static or features like fixed vectors or anything like that they could really be sequences and now beyond that like data structures right so that paradigm was tested in a very interesting way when we moved from translating French to English to translating an image to its caption but the beauty of it is that really and that's actually how it happened I changed a line of code in this thing that was doing machine translation and I came in the next day and I saw how it was producing captions that seemed like oh my god this is really really working and the principle is the same right so I don't see text vision speech waveforms as something different here as long as you basically learn a function that will vectorize these and then after we vectorize them we can then use you know transformers LSTMs whatever the flavor of the month of the model is and then as long as we have enough supervised data really this formula will work and will keep working I believe to some extent modulo these generalization issues that I mentioned before so the task is to vectorize to sort of form a representation that's meaningful and your intuition now having worked with all this media is that once you are able to form that representation you can basically take anything any sequence going back to StarCraft are there limits on the length so we didn't really touch on the long term aspect how did you overcome the whole really long term aspect of things 
here are there some tricks so the main trick so StarCraft if you look at absolutely every frame you might think it's quite a long game so we would have to multiply 22 frames per second times 60 seconds per minute times maybe at least 10 minutes per game on average so there are quite a few frames but the trick really was to only observe in fact which might be seen as a limitation but it is also a computational advantage only observe when you act and then what the neural network decides is what is the gap gonna be until the next action and if you look at most StarCraft games that we have in the data set that Blizzard provided it turns out that most games are actually I mean it is still a long sequence but it may be like a thousand to 1,500 actions which if you start looking at LSTMs large LSTMs transformers it's not that difficult especially if you have supervised learning if you had to do it with reinforcement learning the credit assignment problem what is it in this game that made you win that would be really difficult but thankfully because of imitation learning we didn't kind of have to deal with this directly although we tried it and what happened is you just take all your workers and attack with them and that is kind of obvious in retrospect because you start trying random actions one of the actions will be a worker that goes to the enemy base and because it's self-play it's not gonna know how to defend because it basically doesn't know almost anything and eventually what you develop is this take all workers and attack because the credit assignment issue in RL is really really hard I do believe we could do better and that's maybe a research challenge for the future but yeah even in StarCraft the sequences are maybe a thousand which I believe is within the realm of what transformers can do yeah I guess the difference between StarCraft and Go is in Go and chess stuff starts happening right away 
right so it's pretty easy to self-play not easy but with self-play it is possible to develop reasonable strategies quickly as opposed to StarCraft I mean in Go there's only 400 actions but one action is what people would call the god action that would be if you had expanded the whole search tree that's the best action if you did minimax or whatever algorithm you would do if you had the computational capacity but in StarCraft 400 is minuscule like in 400 you couldn't even click on the pixels around a unit right so I think the problem there in terms of action space size is way harder and that search is impossible so there's quite a few challenges indeed that make this kind of a step up in terms of machine learning for humans maybe playing StarCraft seems more intuitive because it looks real I mean you know like the graphics and everything moves smoothly whereas I don't know I mean Go is a game that I would really need to study it feels quite complicated but for machines kind of maybe it's the reverse yes which shows you the gap actually between deep learning and however the heck our brains work so you developed a lot of really interesting ideas it's interesting to just ask what's your process of developing new ideas do you like brainstorming with others do you like thinking alone do you like was it Ian Goodfellow said he came up with GANs after a few beers right he thinks beers are essential for coming up with new ideas we had beers to decide to play another game of StarCraft after a week so it's really similar to that story actually I explained this at a DeepMind retreat and I said this is the same as the GAN story I mean we were in a bar and we decided let's play again next week and that's what happened I feel like we're giving the wrong message to young undergrads yeah but in general like yeah do you like brainstorming do you like thinking alone working stuff 
out and so I think throughout the years also things changed right so initially I was very fortunate to be with great minds like Geoff Hinton Jeff Dean Ilya Sutskever I was really fortunate to join Brain at a very good time so at that point for ideas I was just kind of brainstorming with my colleagues and learned a lot and keep learning is actually something you should never stop doing right so learning implies reading papers and also discussing ideas with others it's very hard at some point to not communicate be it reading a paper from someone or actually discussing right so definitely that communication aspect needs to be there whether it's written or oral nowadays I'm also trying to be a bit more strategic about what research to do so I was describing a little bit this sort of tension between research for the sake of research and then you have on the other hand applications that can drive the research right and honestly the formula that has worked best for me is just find a hard problem and then try to see how research fits into it how it doesn't fit into it and then you must innovate so I think machine translation drove sequence to sequence then maybe like learning algorithms that had to like combinatorial algorithms led to pointer networks StarCraft led to really scaling up imitation learning and the AlphaStar League so that's been a formula that I personally like but the other one is also valid and I've seen it succeed a lot of the time where you just want to investigate model-based RL as a kind of a research topic and then you must start to think well how are you going to test these ideas you need kind of a minimal environment to try things you need to read a lot of papers and so on and that's also very fun to do and something I've also done quite a few times both at Brain at DeepMind and obviously as a PhD so I think besides the ideas and discussions I think it's important also because you start sort 
of guiding not only your own goals but other people's goals to the next breakthrough so you must really kind of understand this you know feasibility also as we were discussing before right whether this domain is ready to be tackled or not and you don't want to be too early you obviously don't want to be too late so it's really interesting this strategic component of research which I think as a grad student I just had no idea about you know I just read papers and discussed ideas and I think this has been maybe the major change and I recommend people kind of feed forward to success what it looks like and try to backtrack rather than just kind of looking at oh this looks cool this looks cool and then you do a bit of random work where sometimes you stumble upon some interesting things but in general it's also good to plan a bit yeah I like it especially like your approach of taking a really hard problem stepping right in and then being super skeptical about yeah being robbed I mean there's a balance of both right there's a silly optimism and a critical sort of skepticism that's good to balance which is why it's good to have a team of people that balance that you don't do that on your own you have both mentors that have seen or you obviously wanna chat and discuss whether it's the right time I mean Demis came in 2014 and he said maybe in a bit we'll do StarCraft and maybe he knew and I'm just following his lead which is great because he's brilliant right so these things are obviously quite important that you wanna be surrounded by people who you know are diverse they have their own knowledge that's also important I mean I've learned a lot from people who actually have an idea that I might not think is good but if I give them the space to try it I've been proven wrong many many times as well so that's great I think your colleagues are more important than yourself I think so sure now 
let's real quick talk about another impossible problem AGI right what do you think it takes to build a system that's human level intelligence we talked a little bit about the Turing test StarCraft all these have echoes of general intelligence but if you think about just something that you would sit back and say wow this is really something that resembles human level intelligence what do you think it takes to build that so I find that AGI oftentimes is maybe not very well defined so what I'm trying to then come up with for myself is what would a result look like that you would start to believe that you would have agents or neural nets that no longer sort of overfit to a single task right but actually kind of learn the skill of learning so to speak and that actually is a field that I am fascinated by which is learning to learn or meta-learning which is about no longer learning about a single domain so you can think about it as the learning algorithm itself is general right so the same formula we applied for AlphaStar or StarCraft we can now apply to kind of almost any video game or you could apply to many other problems and domains but the algorithm is what's kind of generalizing but the neural network the weights those weights are useless even to play another race right I train a network to play very well at Protoss vs. 
Protoss I need to throw away those weights if I want to play now Terran vs Terran I would need to retrain a network from scratch with the same algorithm that's beautiful but the network itself will not be useful so I think if I see an approach that can observe or start solving new problems without the need to kind of restart the process I think that to me would be a nice way to define some form of AGI again I don't know the grand views like AGI I mean should the Turing test be solved before AGI I mean I don't know I think concretely I would like to see clearly that meta-learning happened meaning there is an architecture or network that as it sees a new problem or new data it solves it and to make it kind of a benchmark it should solve it at the same speed that we solve new problems when I define you a new object and you have to recognize it or when you start playing a new game you played all the Atari games but now you play a new Atari game well you're gonna be pretty quickly pretty good at the game so that's perhaps what's the domain and what's the exact benchmark is a bit difficult I think as a community we might need to do some work to define it but I think this first step I could see it happen relatively soon but then the whole what AGI means and so on I am a bit more confused about I think people mean different things there's an emotional psychological level that like even the Turing test passing the Turing test is something where we just pass judgment as human beings on what it means to be you know is a dog an AGI system yeah like what level what does it mean right yeah what does it mean but I like the generalization and maybe as a community we would converge towards a group of domains that are sufficiently far away that it would be really damn impressive if we're able to generalize so perhaps not as close as Protoss and Zerg but like Wikipedia that would be a good step and then a really good step but then like from 
StarCraft to Wikipedia and back yeah that kind of thing and that feels also quite hard and far but I think as long as you put the benchmark out as we discovered for instance with ImageNet then tremendous progress can be had so I think maybe there's a lack of a benchmark but I'm sure we'll find one and the community will then work towards that and then beyond what AGI might mean or would imply I really am hopeful to see basically machine learning or AI just scaling up and helping you know people that might not have the resources to hire an assistant or that might not even know what the weather is like but you know so I think in terms of the impact the positive impact of AI I think that's maybe what we should also not lose focus on right the research community building AGI I mean that's a really nice goal but I think the way that DeepMind puts it is and then use it to solve everything else right so I think we should parallelize yeah we shouldn't forget about all the positive things that are actually coming out of it already and are going to be coming out right but that said let me ask relative to the popular perception do you have any worry about the existential threat of artificial intelligence in the near or far future that some people have I think in the near future I'm skeptical so I hope I'm not wrong but I'm not concerned but I appreciate efforts ongoing efforts and even like whole research fields on AI safety emerging and in conferences and so on I think that's great in the long term I really hope we just can simply have the benefits outweigh the potential dangers I am hopeful for that but also we must remain vigilant to kind of monitor and assess whether the trade-offs are there and that we have you know enough lead time also to prevent or to redirect our efforts if need be right so but I'm quite optimistic about the technology and definitely more fearful of other threats in terms of planetary 
level at this point but obviously that's the one I kind of have more power on so clearly I do start thinking more and more about this and it's kind of growing on me actually to start reading more about AI safety which is a field that so far I have not really contributed to but maybe there's something to be done there as well I think it's really important you know I talk about this with a few folks but it's important to ask you and shove it in your head because you're at the leading edge of actually what people are excited about in AI I mean the work with AlphaStar is arguably at the very cutting edge of the kind of thing that people are afraid of and so you speaking to that fact and that we're actually quite far away from the kind of thing that people might be afraid of but it's still something worthwhile to think about and it's also good that you're not as worried and you're also open to it yeah I mean there's two aspects I mean me not being worried but obviously we should prepare for it right for things that could go wrong misuse of the technology as with any technology right so I think there's always trade-offs and as a society we've kind of solved these to some extent in the past so I'm hoping that by having the researchers and the whole community brainstorm and come up with interesting solutions to the new things that will happen in the future we can still also push the research to the avenue that I think is kind of the greatest avenue which is to understand intelligence right how are we doing what we're doing and you know obviously from a scientific standpoint that is kind of my personal driver of all the time that I spend doing what I'm doing really where do you see deep learning as a field heading what do you think the next big breakthrough might be so I think deep learning I discussed a little of this before deep learning has to be combined with some form of 
discretization program synthesis I think that as research in itself is an interesting topic to expand and start doing more research on and then as for what will deep learning enable us to do in the future I don't think that's gonna be what's gonna happen this year but also this idea of starting not to throw away all the weights this idea of learning to learn and really having these agents not having to restart their weights and you can have an agent that is kind of solving or classifying images on ImageNet but also generating speech if you ask it to generate some speech and it should really be kind of almost the same network but it might not be a neural network it might be a neural network with an optimization algorithm attached to it but I think this idea of generalization to new tasks is something for which we first must define good benchmarks but then I think that's gonna be exciting and I'm not sure how close we are but I think there's a path if you have a very limited domain I think we can start making some progress and much like how we made a lot of progress in computer vision we should start thinking I really like a talk that Léon Bottou gave at ICML a few years ago which is that this train test paradigm should be broken we should stop thinking about a training set and a test set as these closed you know things that are untouchable I think we should go beyond these and in meta-learning we call these the meta-training set and the meta-test set which is really thinking about if I know about ImageNet why would that network not work on MNIST which is a much simpler problem but right now it really doesn't you know yeah but it just feels wrong right so I think on the application or the benchmark side we probably will see quite a bit more interest and progress and hopefully people defining new and exciting challenges really do you have any hope or 
interest in knowledge graphs within this context so just kind of constructing graphs going back to graphs yeah well okay neural networks are graphs but I mean a different kind of knowledge graph sort of like semantic graphs where there's concepts yeah so I think the idea of graphs is so I've been quite interested in sequences first and then more interesting or different data structures like graphs and I've studied graph neural networks in the last three years or so I found these models just very interesting from like a deep learning standpoint but then why do we want these models and why would we use them what's the application what's kind of the killer application of graphs right and perhaps if we could extract a knowledge graph from Wikipedia automatically that would be interesting because then these graphs have this very interesting structure that also is a bit more compatible with this idea of programs and deep learning kind of working together jumping neighborhoods and so on you could imagine defining some primitives to go around graphs right so I really like the idea of a knowledge graph and in fact when we started or you know as part of the research we did for StarCraft I thought wouldn't it be cool to give the graph of you know all the prerequisites like all these buildings that depend on each other and units that have prerequisites of being built by that and so this is information that the network can learn and extract but it would have been great to see or to think of really StarCraft as a giant graph where even also as the game evolves you kind of start taking branches and so on and we did a bit of research on this nothing too relevant but I really like the idea and it has elements which is something you also worked with in terms of visualizing neural networks it has elements of being human interpretable being able to generate knowledge 
representations that are human interpretable that maybe human experts can then tweak or at least understand so there's a lot of interesting aspects there and for me personally I'm just a huge fan of Wikipedia and it's a shame that our neural networks aren't taking advantage of all the structured knowledge that's on the web what's next for you what's next for DeepMind what are you excited about what's next for AlphaStar yeah so I think the obvious next steps would be to apply AlphaStar to other races I mean that sort of shows that the algorithm works because we wouldn't want to have created by mistake something in the architecture that happens to work for Protoss but not for other races right so as verification I think that's an obvious next step that we are working on and then I would like to see so agents and players can specialize on different skill sets that allow them to be very good I think we've seen AlphaStar understanding very well when to take battles and when not to do that also very good at micromanagement and moving the units around and so on and also very good at producing non-stop and trading off economy with building units but I have not perhaps seen as much as I would like this idea of the poker idea that you mentioned right I'm not sure StarCraft or AlphaStar rather has developed a very deep understanding of what the opponent is doing and reacting to that and sort of trying to trick the player to do something else you know so this kind of reasoning I would like to see more so I think purely from a research standpoint there's perhaps also quite a few things to be done there in the domain of StarCraft yeah in the domain of games I've seen some interesting work in sort of even auctions manipulating other players so forming a belief state and just messing with people yeah theory of mind yeah yeah so theory of mind and StarCraft they're kind of really made for each other yeah so that 
would be very exciting to see those techniques applied to StarCraft or perhaps StarCraft driving new techniques right as I said this is always the tension between the two well Oriol thank you so much for talking today awesome it was great to be here thanks