Transcript
iOCfIFBBpVY • Anca Dragan: Human-Robot Interaction and Reward Engineering | Lex Fridman Podcast #81
/home/itcorpmy/itcorp.my.id/harry/yt_channel/out/lexfridman/.shards/text-0001.zst#text/0335_iOCfIFBBpVY.txt
Kind: captions Language: en

The following is a conversation with Anca Dragan, a professor at Berkeley working on human-robot interaction: algorithms that look beyond the robot's function in isolation and generate robot behavior that accounts for interaction and coordination with human beings. She also consults at Waymo, the autonomous vehicle company, but in this conversation she's 100% wearing her Berkeley hat. She is one of the most brilliant and fun roboticists in the world to talk with. I had a tough and crazy day leading up to this conversation, so I was a bit tired, even more so than usual, but almost immediately as she walked in, her energy, passion, and excitement for human-robot interaction was contagious, so I had a lot of fun and really enjoyed this conversation.

This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, review it with five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter: @lexfridman, spelled F-R-I-D-M-A-N. As usual, I'll do one or two minutes of ads now, and never any ads in the middle that can break the flow of the conversation. I hope that works for you and doesn't hurt the listening experience.

This show is presented by Cash App, the number one finance app in the App Store. When you get it, use code LEXPODCAST. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as one dollar. Since Cash App does fractional share trading, let me mention that the order execution algorithm that works behind the scenes to create the abstraction of fractional orders is an algorithmic marvel. So big props to the Cash App engineers for solving a hard problem that in the end provides an easy interface that takes a step up to the next layer of abstraction over the stock market, making trading more accessible for new investors and diversification much easier. So again, if you get Cash App from the App Store or Google Play and use the code LEXPODCAST, you get $10, and Cash App will also donate $10 to FIRST, an organization that is helping to advance robotics and STEM education for young people around the world. And now, here's my conversation with Anca Dragan.

When did you first fall in love with robotics?

I think it was a very gradual process, and it was somewhat accidental, actually, because I first started getting into programming when I was a kid, and then into math, and then into computer science; I decided computer science was the thing I was going to do. Then in college I got into AI, and then I applied to the Robotics Institute at Carnegie Mellon. I was coming from this little school in Germany that nobody had heard of, but I had spent an exchange semester at Carnegie Mellon, so I had letters from Carnegie Mellon. MIT said no, Berkeley said no, Stanford said no; that was the only place I got into, so I went there. It's the Robotics Institute, and I thought that robotics is a really cool way to actually apply the stuff that I knew and loved, like optimization. So that's how I got into robotics. I have a better story of how I got into cars. I used to do mostly manipulation in my PhD, but now I do a bit of everything application-wise, including cars. I got into cars because I was here in Berkeley, while I was still a PhD student, for RSS 2014, which Pieter Abbeel organized, and he arranged for what was Google at the time to give us rides in self-driving cars. I was in a robot, and it was just making decision after decision, the right call every time. It was so amazing.
It was a whole different experience. I mean, manipulation is so hard, you can't do anything, and there this was.

Was it the most magical robot you've ever met? For me, meeting the Google self-driving car for the first time was a transformative moment. I've had two moments like that: that, and Spot Mini. I don't know if you've met Spot Mini from Boston Dynamics. I felt like I fell in love or something, because I thought, I know how Spot Mini works: there's nothing truly special, it's great engineering work. But the anthropomorphism that went on in my brain: it came to life. It has a little arm and a head, and it looked at me, he, she looked at me, and I don't know, there was a magical connection there. It made me realize, wow, robots can be so much more than things that manipulate objects. They can be things that have a human connection. Was the self-driving car a moment like that? Was there a robot that truly inspired you?

I remember that experience very viscerally, riding in that car and being just wowed. They gave us a sticker that said "I rode in a self-driving car," with this cute little Firefly on it and their logo.

Oh, you had the really cute one, the smaller one.

Yeah. And I put it on my laptop, and I had it for years, until I finally changed my laptop out.

If we walk back, you mentioned optimization. What beautiful ideas inspired you in math and computer science early on? Why get into this field? To some it seems like a cold and boring field. What was exciting to you about it?

The thing is, I liked math from very early on, from fifth grade. That's when I got into the math olympiad and all of that.

Oh, you competed?

Yeah. In Romania it's like our national sport. So I got into that fairly early, and it was maybe a little too much pure theory: I didn't really have a goal. There was understanding, which was cool, I always liked learning and understanding, but there was no "what am I applying this understanding to?" And I think that's how I got more heavily into computer science, because it was kind of math meets something you can do tangibly in the world.

Do you remember the first program you've written?

The first program I've written, I kind of do. It was in QBasic, in fourth grade, and it was drawing a circle. I don't know how to do that anymore, but that's the first thing they taught me. You could take it as, I wouldn't say an extracurricular, but in a sense it was: you could sign up for dance or music or programming, and I did the programming thing.

Did you compete in programming? These days in Romania that's probably a big thing, programming competitions. Did that touch you at all?

I did a little bit of the computer science olympiad, but not as seriously as the math olympiad.

Programming competitions are basically: here's a hard math problem, solve it with a computer.

It was more algorithmic, exactly.

Okay. You kind of mentioned the Google self-driving car, but outside of that, who or what is your favorite robot, real or fictional, that captivated your imagination? I guess you alluded to the Google self-driving car, the Firefly, as a magical moment, but is there something else?
That was the Lexus, by the way, back then. But good question. My favorite fictional robot is WALL-E, and I love how amazingly expressive it is. I work a little bit on expressive motion, these kinds of things you can do with motion: it's a head and it's a manipulator, and what does it all mean? I like to think about that stuff, and I love Pixar, I love animation.

WALL-E has two big eyes, I think?

It has these cameras, and they move. It's super cute, and the way it moves is just so expressive: the timing of that motion, what it's doing with its arms and what it's doing with those lenses, is amazing. So I've really liked that from the start. And on top of that, sometimes I share this personal story when I teach about AI or whatnot: my husband proposed to me by building a WALL-E. He actuated it, so it's seven degrees of freedom including the lens thing, and it came in, and he made it have the belly-box-opening thing, so it did that and brought out this box made out of Legos that opened slowly, and then, bam.

Wow. That sets a bar. That might be the most impressive thing I've ever heard.

So, special connection to WALL-E. Long story short, I like WALL-E because I like animation and I like robots, and we still have this robot to this day.

How hard is that problem, do you think, the expressivity of robots? With Boston Dynamics, I never talked to those folks about this particular element, I've talked to them a lot, but it seems to be almost an accidental side effect for them. I don't know if they're faking it; they weren't trying to. They do say that the gripper on it was not intended to be a face. I don't know if that's an honest statement, but I think they're legitimate. So do we just automatically anthropomorphize anything we can see about a robot? The question is: how hard is it to create a WALL-E-type robot that connects so deeply with us humans?

It's really hard, and it depends on the setting. If you want to do it in a very particular, narrow setting where the robot does only one thing and is expressive, then you can get an animator, Pixar on call, to come in and design some trajectories. Anki had a robot called Cozmo where they put in some of these animations; that part is easy. The hard part is doing it not via handcrafted behaviors, but generally, autonomously. Just to clarify: I used to work a lot on this, and I don't work on it quite as much these days. But the notion is having robots that, when they pick something up and put it in a place, can do that with various forms of style, or where you can say this robot is succeeding at its task and is confident, versus hesitant, or maybe it's happy, or it's disappointed about some failure it had. When robots move, they can communicate so much about the internal states, or perceived internal states, that they have, and I think that's really useful, an element we'll want in the future. I was reading this article about how kids are being rude to Alexa, because they can be rude to it and it doesn't really get angry.
It doesn't reply differently in any way; it just says the same thing. So, at least for the correct development of children, there's a case for these things reacting differently. Also, you walk into your home and you have a personal robot: if you're really pissed, presumably the robot should behave slightly differently than when you're super happy and excited. But it's really hard, because, the way I've thought about it when it comes to expressing goals or intentions for robots, what's really happening is that instead of doing robotics where you have your state, you have your action space, and you have the reward function you're trying to optimize, now you have to expand the notion of state to include the human's internal state: what is the person actually perceiving, what do they think about the robot? And then you have to optimize in that system. That means you have to understand how your motion, your actions, end up influencing the observer's perception of you, and it's very hard to write math about that.
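To make that idea concrete, here is a minimal sketch in Python of what expanding the state to include the human's internal state can look like. All names and numbers are hypothetical illustrations, not Dragan's actual formulation.

from dataclasses import dataclass

@dataclass(frozen=True)
class PhysicalState:
    robot_x: float   # robot position along a corridor
    human_x: float   # nearby person's position

@dataclass(frozen=True)
class HumanInternalState:
    believed_robot_goal: str  # what the person THINKS the robot is doing
    comfort: float            # how at ease they are with the robot

@dataclass(frozen=True)
class State:
    physical: PhysicalState
    human: HumanInternalState  # the observer's perception is now part of the state

def reward(state: State) -> float:
    # The task objective (reach x = 5.0) now trades off against terms that
    # depend on the human's internal state, e.g. being correctly understood.
    task = -abs(state.physical.robot_x - 5.0)
    understood = 1.0 if state.human.believed_robot_goal == "fetch_coffee" else 0.0
    return task + 0.5 * understood + 0.1 * state.human.comfort

The hard part Dragan points to is the transition model for HumanInternalState: writing down how the person's beliefs change as a function of the robot's motion.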
When you start to think about incorporating the human into the state model, and apologies for the philosophical question, how complicated are human beings, do you think? Can they be reduced to kind of an object that moves and maybe has some basic intents, or do we have to model things like mood and general aggressiveness, all these human qualities, game-theoretic qualities? What's your sense? How hard is the problem of human-robot interaction?

Should we talk about what the problem of human-robot interaction is?

Yeah, let's talk about that.

By the way, I'm going to talk about a very particular view of human-robot interaction, which is not so much on the social side, or on the side of how you have a good conversation with the robot, or what the robot's appearance should be. It turns out that if you make robots taller versus shorter, this has an effect on how people act with them. I'm not talking about that. I'm talking about this narrower thing, which is: you take a task that a robot can do in isolation, in a lab, out there in the world but in isolation, and now you're asking what it means for the robot to do this task for, presumably, its actual end goal, which is to help some person. That ends up changing the problem in two ways. The first way it changes the problem is that the robot is no longer the single agent acting: you have humans who also take actions in that same space. Cars navigating around people; robots in an office navigating around the people in that office. If I send the robot over there to the cafeteria to get me a coffee, there are other people reaching for stuff in that same space. So you're in charge of the actions the robot is taking, but then you have these people who are also making decisions and taking actions in that same space, and even if the robot knows what it should do, just coexisting with these people, getting to mesh well together, that's problem number one. And then there's problem number two, which goes back to this notion of objectives: if I'm a programmer, I can specify some objective for the robot to go off and optimize, I can specify the task, but if I put the robot in your home, presumably you might have your own opinions. Okay, I want my house clean, but how do I want it cleaned? How close to me should the robot come? And all of that. So I think those are the two differences: you're acting around people, and what you should be optimizing for should satisfy the preferences of that end user, not of the programmer who programmed you.

And the preferences thing is tricky: figuring out those preferences, being able to interactively adjust, to understand what the human wants. So really, robots ought to understand humans in order to interact with them, in order to please them. Why is that hard? Why is understanding humans hard?

I think there are two tasks about understanding humans that in my mind are very, very similar, though not everyone agrees. There's the task of being able to just anticipate what people will do. We all know that cars need to do this: if I navigate around some people, the robot has to get some notion of where this person is going to be. That's the prediction side. And then there's what you were saying, satisfying the person's preferences, adapting to them, knowing what to optimize for, which is more the inference side: what does this person want, what is their intent, what are their preferences? To me those go together, because, at the very least, if you can look at human behavior and understand what it is that they want, that's the key enabler to anticipating what they'll do in the future. We're not arbitrary: we make the decisions we make, we act the way we do, because we're trying to achieve things. So I think that's the relationship between them. Now, how complicated do these models need to be in order to understand what people want? We've gotten a long way in robotics with something called inverse reinforcement learning, which is the notion that someone acts, demonstrates how they want the thing done.

What is inverse reinforcement learning?

You said it right. It's the problem of taking human behavior and inferring a reward function from it: figuring out what it is that that behavior is optimal with respect to. And it's a great way to think about learning human preferences, in the sense that you have a car, the person can drive it, and then you can say: I can actually learn what the person is optimizing for, I can learn their driving style. Or you can have people demonstrate how they want the house cleaned, and you can say: okay, I'm getting the trade-offs they're making, I'm getting the preferences they want out of this. We've been somewhat successful with this in robotics, and it's based on a very simple model of human behavior, remarkably simple, which is that human behavior is optimal with respect to whatever it is that people want. You make that assumption, and now you can sort of invert through it: that's why it's called inverse, well, really inverse optimal control, but also inverse reinforcement learning. And this is based on utility maximization in economics: von Neumann and Morgenstern, back in the forties, said, okay, people are making choices by maximizing utility.
Then, in the late fifties, we had Luce, and Shepard, come in and say: people are a little bit noisy and approximate in that process, so they might choose somewhat stochastically, with probability proportional to how much utility something has. There's a bit of noise in there. This has translated into robotics as something we call Boltzmann rationality: a kind of evolution of inverse reinforcement learning that accounts for human noise. And we've had some success with that too, for tasks where it turns out people act noisily enough that you can't just do the vanilla version, but you can account for noise and still infer what they seem to want. Now we're hitting tasks where that's not enough.

What are examples of such tasks?

Imagine you're trying to control some robot that's fairly complicated. You're trying to control a robot arm, maybe because you're a patient with a motor impairment and you have this wheelchair-mounted arm you're trying to control. Or one task that we've looked at, with Sergey and our students, is Lunar Lander. I don't know if you know this Atari game; it's called Lunar Lander, and it's really hard. People really suck at landing the thing; mostly they just crash it left and right. So that's the kind of task where, imagine you're trying to provide some assistance to a person operating such a robot: you want the autonomy to figure out what it is they're trying to do and help them do it. It's really hard to do that for, say, Lunar Lander, because people are all over the place, so they seem much more noisy than really rational. That's an example of a task where these models are failing us. And it's not surprising: we talked about utility in the forties, and sort-of-noisy in the late fifties; then the seventies came, and behavioral economics started being a thing, where people said: no, no, no, people are not rational. People are messy and emotional and irrational, and have all sorts of heuristics that might be domain-specific; they're just a messy mess. So what does my robot do then to understand what you want? That's why it's complicated. For the most part we get away with pretty simple models, until we don't. And then the question is, what do you do then? I've had days when I wanted to pack my bags, go home, and change jobs, because it just feels really daunting to make sense of human behavior well enough that you can reliably understand what people want. Especially as robot capabilities continue to develop: you'll get these systems that are more and more capable of all sorts of things, and then you really want to make sure you're telling them the right thing to do. What is that thing? Well, read it in human behavior.
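The Boltzmann-rational model of inverse reinforcement learning described above can be made concrete in a few lines. This is a minimal, self-contained sketch with invented trajectory features and constants, not code from any of the work mentioned: a simulated "human" picks among candidate trajectories with probability proportional to exp(BETA * reward), and we recover their reward weights by maximum likelihood.

import numpy as np

rng = np.random.default_rng(0)
BETA = 5.0  # rationality coefficient: higher = less noisy human

def features(traj):
    # Hypothetical trajectory features: mean speed, mean proximity to obstacles.
    return np.array([traj[:, 0].mean(), traj[:, 1].mean()])

def boltzmann_probs(trajs, theta):
    utils = np.array([BETA * features(t) @ theta for t in trajs])
    utils -= utils.max()                      # numerical stability
    p = np.exp(utils)
    return p / p.sum()

# Simulate a noisily rational "human" with true preferences theta_true.
theta_true = np.array([1.0, -2.0])            # likes speed, dislikes proximity
candidates = [rng.uniform(0, 1, size=(20, 2)) for _ in range(8)]
choices = rng.choice(len(candidates), size=200,
                     p=boltzmann_probs(candidates, theta_true))

# Invert: maximum-likelihood reward weights by gradient ascent.
phis = np.stack([features(t) for t in candidates])
theta = np.zeros(2)
for _ in range(2000):
    p = boltzmann_probs(candidates, theta)
    # grad of log-likelihood: BETA * (phi_chosen - E_p[phi]), averaged over data
    theta += 0.05 * BETA * (phis[choices].mean(axis=0) - p @ phis)

print("recovered direction:", theta / np.linalg.norm(theta))
print("true direction:     ", theta_true / np.linalg.norm(theta_true))

Note that only the direction of theta is identifiable here, since BETA scales it; that ambiguity between "noisier" and "weaker preference" is part of why the noisy-rationality model eventually breaks on tasks like Lunar Lander.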
If I just sit here quietly and try to understand something about you by listening to you talk, it would be harder than if I get to say something, ask you things, and interact. Can the robot help its understanding of the human by influencing the behavior, by actually acting?

Yeah, absolutely. One of the things that's been exciting to me lately is this notion that when you think of the robotics problem as "I have a robot and it needs to optimize for whatever it is that a person wants it to optimize," as opposed to what a programmer said, that problem we think of as a human-robot collaboration problem, in which both agents get to act, and in which the robot knows less than the human, because the human actually has access, at least implicitly, to what it is that they want. They can't write it down, but they can talk about it, they can give all sorts of signals, they can demonstrate. And the robot doesn't need to sit there and passively observe human behavior and try to make sense of it: the robot can act too. There are these information-gathering actions the robot can take to solicit responses that are actually informative. For instance, and this is not for the purpose of assisting people, but back to coordinating with people in cars and all of that, one thing that Dorsa did: we were looking at cars being able to navigate around people, where you might not know the driving style of the particular individual next to you, but you want to change lanes in front of them.

Navigating around other humans inside cars?

Good clarifying question. You have an autonomous car, and it's trying to navigate the road around human-driven vehicles. Similar ideas apply to pedestrians as well, but let's take human-driven vehicles. So you're trying to change lanes. You could try to infer the driving style of this person next to you; in particular, you'd like to know if they're aggressive or defensive, if they're going to let you in or not. And it's very difficult to do that just by watching: if you want to hedge your bets against them being aggressive, you shouldn't cut in, so you end up driving next to them, and driving next to them, and you still don't know, because you're not actually getting informative observations. The way someone drives when they're next to you and they just need to go straight is kind of the same whether they're aggressive or defensive. So you need to enable the robot to reason about how it might actually gather information by changing the actions it's taking. And then the robot comes up with these cool things where it kind of nudges toward you, and sees if you're going to slow down or not, and if you slow down, it updates its model of you and says: oh, okay, you're more on the defensive side, so now I can go in.

That's a fascinating dance. That's so cool: you can use your own actions to gather information. That feels like a totally open, exciting new world of robotics. How many people are even thinking about that kind of thing? Most roboticists, and I've talked to a lot of colleagues, being honest, are kind of afraid of humans, because they're messy and complicated.
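As a concrete illustration of this lane-change example, with made-up probabilities rather than the numbers from the actual work, here is a sketch where the robot compares a passive action with a probing nudge, scoring each by how much it is expected to shrink uncertainty over the other driver's style.

import numpy as np

# P(human brakes | robot action, their style): a made-up observation model.
P_BRAKE = {
    ("keep_lane", "aggressive"): 0.05, ("keep_lane", "defensive"): 0.10,
    ("nudge_in",  "aggressive"): 0.10, ("nudge_in",  "defensive"): 0.90,
}
STYLES = ("aggressive", "defensive")

def update(belief, action, braked):
    # Bayesian update over the driver's style given their observed reaction.
    likelihoods = np.array([
        P_BRAKE[(action, s)] if braked else 1.0 - P_BRAKE[(action, s)]
        for s in STYLES
    ])
    post = belief * likelihoods
    return post / post.sum()

def expected_entropy(belief, action):
    # Expected posterior entropy; a more informative action drives it lower.
    h = 0.0
    for braked in (True, False):
        p_obs = sum(
            belief[i] * (P_BRAKE[(action, s)] if braked else 1.0 - P_BRAKE[(action, s)])
            for i, s in enumerate(STYLES)
        )
        post = update(belief, action, braked)
        h += p_obs * -(post * np.log(post + 1e-12)).sum()
    return h

belief = np.array([0.5, 0.5])
for action in ("keep_lane", "nudge_in"):
    print(action, "-> expected posterior entropy:",
          round(expected_entropy(belief, action), 3))
print("belief after nudging in and seeing them brake:",
      update(belief, "nudge_in", braked=True))

Driving straight is nearly uninformative because both styles respond to it the same way; the nudge separates them, which is exactly the "use your own actions to gather information" idea.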
I understand. Going back to what we were talking about earlier: we're kind of in this dilemma. There are tasks where we can just assume people are approximately rational, and we can figure out what they want; we can figure out their goals, infer their driving styles, whatever. Cool. Then there are these tasks where we can't. So what do we do? Do we pack our bags and go home? I've had a little bit of hope recently, and I'm kind of doubting myself: what do I know that fifty years of behavioral economics hasn't figured out? But maybe it's not really in contradiction with the way that field is headed. Basically, one thing we've been thinking about is: instead of giving up and saying people are too crazy and irrational for us to make sense of them, maybe we can give them a bit of the benefit of the doubt, and maybe we can think of them as actually being relatively rational, but just under different assumptions about the world, about how the world works. When we think about rationality, the implicit assumption is that they're rational under all the same assumptions and constraints as the robot: this is the state of the world, that's what they know; this is the transition function, that's what they know; this is the horizon, that's what they know. But maybe the reason they can seem a little messy and hectic, especially to robots, is that perhaps they just make different assumptions, or have different beliefs.

That's another fascinating idea: that our kind of anecdotal desire to say that humans are irrational, perhaps grounded in behavioral economics, reflects that we just don't understand the constraints and the rewards under which they operate. So our goal shouldn't be to throw our hands up and say they're irrational; it's to say: let's try to understand what the constraints are, what it is they must be assuming, that makes this behavior make sense.

Good life lesson, right?

Good life lesson, that's true. It's good outside of robotics too, for communicating with humans: just assume there's maybe something you're missing. Have empathy.

And it especially happens with robots, because they're kind of dumb and they don't know things. Oftentimes, when people seem super irrational, it's that they actually know a lot of things that robots don't. But sometimes, like with the Lunar Lander, the robot knows much more. It turns out that if you try to say: look, maybe people are operating this thing assuming a much more simplified physics model, because they don't get the complexity of this kind of craft, or of a robot arm with seven degrees of freedom and all this inertia and whatever, so maybe they have an intuitive physics model. This notion of intuitive physics is something that's actually studied in cognitive science: Josh Tenenbaum, Tom Griffiths, that kind of stuff. And what we found is that you can actually try to figure out what physics model best explains human actions, and then you can use that to sort of correct what it is that they're commanding the craft to do. They might be sending the craft somewhere, but instead of executing that action, you can take a step back and say: if the world worked according to their intuitive physics model, where do they think the craft is going, where are they trying to send it? And then you use the real physics to figure out what you should do, so that the craft goes there instead of where they were actually sending it in the real world. And I kid you not, it worked: people landed the damn thing, between the two flags and all that. It's not conclusive in any way, but I'd say it's evidence that maybe we're underestimating humans in some ways when we're giving up and saying, oh, they're just crazy and noisy.

So then you try to explicitly model the kind of worldview that they have.

That's right. That's right.
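A toy, one-dimensional version of that correction, with invented dynamics rather than the real Lunar Lander experiment's details, looks like this: infer where the person is trying to go under their simpler internal physics, then invert the true physics to get there.

import numpy as np

def real_dynamics(x, u):
    return x + u + 0.5          # true physics: a constant wind drift of +0.5

def intuitive_dynamics(x, u):
    return x + u                 # the human's internal model ignores the wind

def assistive_command(x, u_human):
    # Step 1: under their internal model, where do they think u_human leads?
    intended_target = intuitive_dynamics(x, u_human)
    # Step 2: invert the REAL dynamics to actually reach that target.
    candidates = np.linspace(-2, 2, 401)
    errors = [abs(real_dynamics(x, u) - intended_target) for u in candidates]
    return candidates[int(np.argmin(errors))]

x = 0.0
u_human = 1.0                          # person commands "+1", expecting x = 1
print("without assistance:", real_dynamics(x, u_human))                        # drifts to 1.5
print("with assistance:   ", real_dynamics(x, assistive_command(x, u_human)))  # lands at 1.0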
And there are things to borrow from economics there, too. For instance, I touched upon the planning horizon: there's this idea of bounded rationality, essentially, the idea that maybe we work under computational constraints. I think our view recently has been: take the Bellman update in AI and just break it in all sorts of ways. State? No, the person doesn't get to see the real state; maybe they're estimating it somehow. Transition function? No. Even the actual reward evaluation: maybe they're still learning about what it is that they want. Like when you browse Netflix and you have all the things and you have to pick something: imagine the AI system interprets that choice as "this is the thing you prefer to see." How would it know? You're still trying to figure out what you like, what you don't like, et cetera. So it's important to also account for that. It's not irrationality, precisely: it's doing the right thing under the things that they know.
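A minimal sketch of "breaking the Bellman update" in that spirit follows: a standard value-iteration backup, except every ingredient is the human's possibly-wrong version. The tiny two-state example is invented for illustration.

import numpy as np

def human_bellman_backup(V, believed_T, believed_R, gamma=0.9):
    """One Bellman backup under the human's (possibly wrong) world model.

    believed_T[s, a, s']: the transition function the person THINKS holds.
    believed_R[s, a]:     their current estimate of reward (they may still
                          be learning what they want, as in the Netflix case).
    gamma:                their effective horizon; bounded rationality can
                          make it shorter than the robot's.
    """
    Q = believed_R + gamma * np.einsum("sap,p->sa", believed_T, V)
    return Q.max(axis=1), Q.argmax(axis=1)

# Two states, two actions; the person has the action effects backwards
# relative to reality, so their plan is predictable but "irrational-looking".
believed_T = np.array([[[1.0, 0.0], [0.0, 1.0]],
                       [[1.0, 0.0], [0.0, 1.0]]])
R = np.array([[0.0, 0.0], [1.0, 1.0]])   # state 1 is where the reward is
V = np.zeros(2)
for _ in range(50):
    V, policy = human_bellman_backup(V, believed_T, R)
print("policy predicted from the human's wrong model:", policy)

The robot can then predict the person's actions with the human's model while evaluating consequences with the true one, which is the same move as in the Lunar Lander example above.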
That's brilliant. You mentioned recommender systems, and we're talking about human-robot interaction: what kind of problem spaces are you thinking about? Is it robots, like wheeled robots, autonomous vehicles? Is it object manipulation? When you think about human-robot interaction in your mind, and maybe you can't speak for the entire community of human-robot interaction, what are the problems of interest here? I kind of think of open-domain dialogue as human-robot interaction too, and that doesn't happen in the physical space; it can happen in the virtual space. So where are the boundaries of this field for you, when you're thinking about the things we've been talking about?

I try to find the underlying, I don't know what to even call them. I might call what I do working on the foundations of algorithmic human-robot interaction, and trying to make contributions there. And it's important to me that whatever we do is actually somewhat domain-agnostic. Whether it's about autonomous cars, or quadrotors, or manipulation, the same underlying principles apply. Of course, when you're trying to get a particular system to work, you usually have to do some extra work to adapt to that particular domain. But these things we were talking about, around how you model humans: it turns out a lot of systems would benefit from a better understanding of how human behavior relates to what people want, and from being able to predict human behavior. Physical robots of all sorts, and beyond. I used to do manipulation, picking up stuff, and then picking up stuff with people around, and now it's very broad when it comes to the application level, but in a sense very focused on: how does the problem need to change, how do the algorithms need to change, when we're not doing a robot by itself emptying the dishwasher, but stepping outside of that?

A thought popped into my head just now, on the game-theoretic side. You said this really interesting idea of using actions to gain more information. But if we think game-theoretically, the humans that are interacting with you, the robot, also have a world model of you, and you can manipulate that. With autonomous vehicles, people have a certain viewpoint; you said, with the kids, people see Alexa in a certain way. Is there some value in trying to also optimize how people see you, as a robot? Or is that a little too far away from the specifics of what we can solve right now?

Both, really. It's really interesting, and we've seen a little bit of progress on pieces of this problem. It again kind of comes down to how complicated the human model needs to be. But in one piece of work we were looking at, we just said: okay, there are these parameters that are internal to the robot, what the robot is about to do, or maybe what objective or driving style the robot has, something like that, and what we're going to do is set up a system where part of the state is the person's belief over those parameters. Now, when the robot acts, the person gets new evidence about this robot internal state, and so they're updating their mental model of the robot. If they see a car that sort of cuts someone off, they think: oh, that's an aggressive car; they know more now. If they see a robot head toward a particular door, they think: ah, the robot's trying to get to that door. So this thing that we do with humans, trying to understand their goals and intentions, humans are inevitably going to do that to robots. And then that raises this interesting question you asked, which is: can we do something about that? It's going to happen inevitably, but we can be more confusing or less confusing to people. And it turns out you can optimize for being more informative and less confusing, if you have an understanding of how your actions are being interpreted by the human, how they're using these actions to update their belief. And honestly, all we did is just Bayes' rule. The person has a belief; they see an action; they make some assumptions about how the robot generates its actions, presumably as being rational, because robots are rational, and it's reasonable to assume that about them; and then they incorporate that new piece of evidence, in the Bayesian sense, into their belief and obtain a posterior. And now the robot is trying to figure out what actions to take such that it steers the person's belief to put as much probability mass as possible on the correct parameters. So that's kind of a mathematical formalization of it.
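Here is that formalization in miniature, in a toy one-dimensional world with invented numbers: the observer runs Bayes' rule under a noisily-rational model of the robot, and the robot scores actions by how much posterior mass they put on its true goal.

import numpy as np

GOALS = np.array([-1.0, 1.0])      # two candidate goals on a line
TRUE_GOAL = 1                      # index of the robot's actual goal

def likelihood(x, u, goal, beta=3.0):
    # Observer's model: actions that make progress toward a goal are
    # exponentially more likely (a Boltzmann-rational robot, in their eyes).
    progress = abs(x - goal) - abs(x + u - goal)
    return np.exp(beta * progress)

def posterior(belief, x, u):
    post = belief * np.array([likelihood(x, u, g) for g in GOALS])
    return post / post.sum()

belief = np.array([0.5, 0.5])
x = 0.0
# Compare a legible action (head clearly right) with an ambiguous one.
for u in (0.3, 0.0):
    print(f"action {u:+.1f} -> observer belief over goals:",
          posterior(belief, x, u).round(3))
# The robot would pick the action that maximizes belief in its TRUE goal:
actions = np.linspace(-0.5, 0.5, 11)
best = max(actions, key=lambda u: posterior(belief, x, u)[TRUE_GOAL])
print("most informative action:", best)

Choosing the action that steers the posterior toward the true goal is what makes the motion informative rather than merely efficient.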
But my worry, and I don't know if you want to go there with me, but I think about this quite a bit: the kids talking to Alexa disrespectfully worries me. I worry in general about human nature. I guess I grew up in the Soviet Union; World War Two, the Holocaust and everything. I just worry about how we sometimes treat the other, the group that we call the other. Through human history, the group that's the other has changed faces, but it seems like the robot will be the next other. And one thing is, it feels to me that robots get no respect: they get shoved around. Is there, at a shallow level, for a better experience, a sense that robots need to talk back a little bit? My intuition says, and I mean most companies, from Roomba to autonomous vehicle companies, might not be so happy with the idea of a robot having a little bit of an attitude, but it feels to me that that's necessary to create a compelling experience. We humans don't seem to respect anything that doesn't give us some attitude, some mix of mystery and attitude and anger, something that threatens us, subtly, maybe passive-aggressively. It seems like we humans need that. Do you have thoughts on this?

A couple of thoughts. One is: we respond to someone being assertive, but we also respond to someone being vulnerable. So my first thought is that robots get shoved around and bullied a lot because they're sort of tempting, and they appear to be showing off. So, going back to these things we were talking about in the beginning, making robots a little more expressive, a little more "hey, that wasn't cool to do, and now I'm bummed," I think that can actually help, because people can't help but anthropomorphize and respond to that, even though the emotion being communicated is not in any way real, and people know it's not real because they know it's just a machine. We still interpret it: there's this famous psychology experiment with little triangles and dots on a screen, where a triangle is chasing a square, and people get angry at the darn triangle, because why is it not leaving the square alone? We can't help it.

That was my first thought. The vulnerability point is really interesting. I had thought of pushing back, being assertive, as the only mechanism for forming a connection, for gaining respect, but perhaps vulnerability, perhaps there are other mechanisms that are less threatening.

Yeah. And then this other thing we can think about, and it goes back to what you were saying, is that interaction really is game-theoretic. The moment you're taking actions in a space, and humans are taking actions in that same space, but you have your own objective, you're a car, you need to get your passenger to the destination, and the human nearby has their own objective, which somewhat overlaps with yours but not entirely: you're both not interested in getting into an accident with each other, but you have different destinations, and you want to get home faster, and they want to get home faster. That's a general-sum game at that point. And I think treating it as such is kind of a way to step outside of this mode where you try to anticipate what people do and don't realize you have any influence over it, while still protecting yourself, because you understand that people also understand that they can influence you. It's this back-and-forth, this negotiation, and what we're really talking about is different equilibria of a game. The very basic way to solve coordination is to just make predictions about what people will do and then stay out of their way. That's hard, for the reasons we talked about: you have to understand people's intentions, implicitly, explicitly, who knows, but somehow you have to get enough of an understanding to anticipate what happens next. But then it's further challenged by the fact that people change what they do based on what you do, because they don't plan in isolation either. So when you see cars trying to merge on a highway and not succeeding, one of the reasons this can be is because they look at the traffic that keeps coming, they predict what those people are planning on doing, which is to just keep going, and then they stay out of the way, because there's no feasible plan: any plan would actually intersect with one of these other people. So that's bad, and you get stuck there.
Now, if you start thinking about it as: no, no, actually these people change what they do depending on what the car does, so if the car actually tries to inch itself forward, they might actually slow down and let the car in, and you can take advantage of that. That's kind of the next level. We call this the under-actuated system idea: it's like under-actuation in robotics, where you influence these other degrees of freedom, but you don't get to decide what they do.

You mentioned the human element in this picture as under-actuated. For people who don't know: under-actuation in robotics means you can't fully control the system, so you can't go in arbitrary directions in the configuration space under your control.

Yes. It's a very simple form of under-actuation, where basically there are literally these degrees of freedom that you can control, and these degrees of freedom that you can't, but you influence them. And I think that's the important part: they don't do whatever, regardless of what you do. What you do influences what they end up doing.

I also just like the poetry of calling human-robot interaction an under-actuated robotics problem.
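In code, the difference between "predict, then stay out of the way" and treating the human as influenceable degrees of freedom can be reduced to a toy merge; the response model and thresholds below are entirely made up for illustration.

def human_speed(robot_action, current_speed=1.0):
    # Hypothetical response model: people ease off when a car inches in.
    return current_speed * (0.6 if robot_action == "inch_forward" else 1.0)

def merge_feasible(robot_action):
    gap = 1.4 - human_speed(robot_action)   # faster traffic -> smaller gap
    return gap >= 0.8                       # toy feasibility threshold

# Predict-then-avoid assumes traffic ignores us, so it waits forever.
print("wait passively:", merge_feasible("wait"))           # False
# Coupled planning accounts for the influence of our own action on theirs.
print("inch forward:  ", merge_feasible("inch_forward"))   # True

The point is not the numbers but the structure: the human's trajectory is a function of the robot's action, so a planner that models that coupling finds a gap that a "predict, then avoid" planner never sees.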
There's so much nudging. I don't know, I think about this a lot in the case of pedestrians. I've collected hundreds of hours of videos; I like to just watch pedestrians.

That's a funny hobby.

Yeah, it's weird, but I learn a lot. I learn a lot about myself, about human behavior, from watching pedestrians, watching people in their environment. Basically, crossing the street, you're putting your life on the line. Tens of millions of times in America every day, people are playing this weird game of chicken when they cross the street, especially when there's some ambiguity about the right of way, having to do either with the rules of the road or with the general personality of the intersection, based on the time of day and so on. And on this nudging idea: it seems that people don't even nudge; they just aggressively make a decision. There's a runner who gave me this advice, I sometimes run in the street, not on the sidewalk, and he said that if you don't make eye contact with people when you're running, they will all move out of your way.

It's called civil inattention.

Civil inattention, that's the thing? Wow, I need to look this up. But it works. My sense was that if you communicate confidence in your actions, that you're unlikely to deviate from the action you're taking, that's a really powerful signal to others that they need to plan around your actions. As opposed to nudging, where you're sort of hesitant, and the hesitation might communicate that you're still in the dance, in the game, that they can influence you with their own actions. I recently had a conversation with Jim Keller, who is this legendary chip architect, but he also led Autopilot at Tesla for a while, and his intuition is that driving is fundamentally still like a ballistics problem: that you can ignore the human element, that it's just about not hitting things, and you can kind of learn the right dynamics required to do the merging and all those kinds of things. My sense, and I don't know if I can provide a definitive proof of this, is that it's an order of magnitude or more more difficult when humans are involved. It's not simply an object-collision-avoidance problem. Where does your intuition fall, and of course nobody knows the right answer here, on the fundamental difficulty of the driving problem when humans are involved?

Good question. I have many opinions on this. Imagine downtown San Francisco.

Yeah, it's crazy busy.

Okay, now take all the humans out. No pedestrians, no human-driven vehicles, no cyclists, no people on little electric scooters zipping around, nothing. I think we're done. I think driving at that point is done.

Nothing else really that needs to be solved?

Well, let's pause there. I think I agree with you, and I think a lot of people will agree with that, but we need to sort of internalize that idea. So what's the problem there? Because we're not quite done even with that. A lot of people focus on the perception problem; a lot of people map autonomous driving onto "how close are we to being able to detect the drivable area, the objects in the scene?" How hard is that problem? Your intuition behind your statement was that we might not have solved it yet, but we're close to solving basically the perceptual problem.

I think the perception problem, and by the way, a bunch of years ago this would not have been true, and a lot of issues in the space were coming from the fact that we don't really know what's where, but I think it's fairly safe to say at this point that, although you could always improve on things, you could drive through downtown San Francisco if there are no people around. There's no real perception issue standing in your way. Perception is hard, but we've made a lot of progress on it. And not to undermine the difficulty of the problem: I think everything about robotics is really difficult, of course, the planning problem, the control problem, all very difficult. And, you know, I picked downtown San Francisco; adapting to "well, now it's snowing, now it's no longer snowing, now it's slippery in this way," the dynamics part I could imagine still being somewhat challenging.

The thing that worries me, and my intuition is not good there, is the perceptual problem at the edge cases. Downtown San Francisco, the thing is, it may not actually be a good example, because, to what you're getting at, there are crazy construction zones and all of that, but you're traveling at slow speeds, so it doesn't feel dangerous to me. What feels dangerous is highway speeds, when everything is, to us humans, super clear. I'm assuming lidar here, by the way. I think it's kind of irresponsible to not use lidar; that's just my personal opinion.

Depending on your use case, but I think, if you have the opportunity to use lidar, it makes sense. Now, vision alone, I really just don't know enough to say. How many cameras do you have? There are all sorts of details. I imagine there's stuff that's really hard to actually see.
How do you deal with glare?

Exactly, stuff that people would see that you don't. I think my intuition comes more from systems that can actually use lidar as well. Until we know for sure, it makes sense to be using lidar; that's kind of the safety-focused approach.

I also sympathize with Elon Musk's statement, though, of lidar being a crutch. It's a fun notion to think that the things that work today are a crutch for the invention of the things that will work tomorrow. It's kind of true, in the sense that if we want to stick to the comfortable, and you see this in academic settings all the time, the things that work force you to not explore outside, to not think outside the box. The problem is that in safety-critical systems, you kind of want to stick with the things that work. So it's an interesting and difficult trade-off in the case of real-world, safety-critical robotic systems. But your intuition, just to clarify: how hard is this human element? How hard is driving when the human element is involved? Are we years, decades away from solving it? Though perhaps the better question, and it doesn't matter what the timeline is: how many breakthroughs are we away from solving the human-robot interaction problem, to get this right?

In a sense, it really depends. We were talking about how it's really hard, because just knowing what people do is hard, and on top of that, playing the game is hard. But I think we have some of the fundamental understanding for that, and you already see these systems being deployed in the real world, even driverless: there are now a few companies that don't have a driver in the car, in small areas.

I got a chance to, I went to Phoenix and I shot a video with Waymo, and I need to get that video out, people keep giving me flak about it, but there's incredible engineering work being done there. And it was one of those seminal moments for me in my life, to be able to, it sounds silly, take a ride without a driver in the seat. I was driven by a robot, without being able to take over, without being able to take the steering wheel. That's a magical moment. So in that regard, in those domains, at least for Waymo, they're solving that human element. And it felt fast, because you're freaking out at first, it was my first experience, but it's going the speed limit, 30, 40, whatever it is, and there are humans, and it deals with them quite well: it detects them, it negotiates the intersections, the left turns, and all that. So at least in those domains, it's solving it. The open question for me is how quickly we can expand, outside of the weather conditions, all those kinds of things, to cities like San Francisco.

Yeah. And I wouldn't say that it's now just pure engineering, and by the way, I'm speaking very generally here, hypothesizing, but I think that there are successes, and yet no one is everywhere out there. That seems to suggest that things can be expanded and can be scaled, and that we know how to do a lot of things.
But there are still probably new algorithms, or modified algorithms, that you need to put in there as you learn more and more about the new challenges you face.

How much of this problem do you think can be learned? This is the success of machine learning and reinforcement learning: how much of it can be learned from data, from scratch? Most of the successful autonomous vehicle systems have a lot of heuristics and rule-based stuff on top, human expertise injected, forced into the system to make it work. What's your sense? What will be the role of learning in the near term?

I think, on the one hand, that learning is inevitable here. On the other hand, when people characterize the problem as "it's a bunch of rules that some people wrote" versus "it's an end-to-end RL system or imitation learning," maybe there's something missing from that. For instance, I think a very, very useful tool in this sort of problem, both for generating the car's behavior, and robot behavior in general, and for modeling human beings, is planning: search, optimization. Robotics is a sequential decision-making problem, and when a robot can figure out on its own how to achieve its goal without hitting stuff, motion planning 101, I think of that as very much AI. There's nothing rule-based about it: you're searching through a space, or optimizing through a space, and figuring out what seems to be the right thing to do. And I think it's hard to just do that alone, because you need to learn models of the world. And I think it's hard to just do the learning part without any of that, because then you're saying: well, I could do imitation, but then when I go off-distribution I'm really screwed. Or you can say: I can do reinforcement learning, which adds a lot of robustness, but then you have to do either reinforcement learning in the real world, which with all that trial and error sounds a little challenging, or reinforcement learning in simulation. And then that means: well, guess what, you need to model things, at the very least model people, model the world enough that whatever policy you get out of it is actually fine to roll out in the world, with maybe some additional learning there.

Do you think simulation, just a quick tangent, has a role in the human-robot interaction space? Is it useful? It seems like humans, everything we've been talking about, are difficult to model and simulate.

I do think so, because you can take models and train with them ahead of time. For instance, you can...

Sorry to interrupt: are the models human-constructed, or learned?

I think they have to be a combination. Because if you get some human data and then you say "this is going to be my model of the person," whether for simulation and training, or for deployment time, as what I'm planning with as my model of how people work, if you take that data and don't assume anything else, and just say, okay, this is some data I've collected, let me fit a policy for how people work based on it...
...what tends to happen is: you collected data in some distribution, and then your robot computes a best response to that, what should I do if this is how people work, and it easily goes off-distribution, where that model you've built of the human completely sucks, because out of distribution you have no idea what's going on. If you think of all the possible policies and then take only the ones that are consistent with the human data you've observed, that still leaves a lot: a lot of things could happen outside of that distribution, where you're not confident that you know what's going on.

By the way, have you gotten used to this terminology of "out of distribution," the machine learning terminology? "Distribution" refers to the data, the states, that you've encountered so far, at training time. But it also kind of implies that there's a nice statistical model that represents that data. "Out of distribution" raises, to me, philosophical questions of how we humans reason about things that are completely novel, that we haven't seen before.

And what we're talking about here is: how do we reason about what other people do in situations where we haven't seen them? Somehow we just magically navigate that. I can anticipate what will happen in situations that are even novel in many ways, and I have a pretty good intuition for it; I don't always get it right, and I might be a little uncertain, and so on. But if you just rely on data, there are just too many possibilities, too many policies out there, that fit the data. And by the way, it's not just state, it's clearly kind of the history of states, because to really be able to anticipate what the person will do, it kind of depends on what they've been doing so far, since that's the information you need to at least implicitly say: oh, this is the kind of person this is, this is probably what they're trying to do. So you're trying to map histories of states to actions, and there are many such mappings.

History meaning, like, the last few seconds, or the last few months?

Who knows how much you need? Even what your state really is, the positions of everything or whatnot, the velocities, who knows how much you need. And there are so many mappings. So now you're talking about: how do you regularize that space? What priors do you impose? What's the inductive bias? These are all very related things to think about. Basically: what assumptions should we be making, such that these models actually generalize outside of the data we've seen? And now you're talking about, well, I don't know, what can you assume? Maybe you can assume that people actually have intentions, and that's what drives their actions. Maybe that's the right thing to do when you haven't seen data very nearby that tells you otherwise. I don't know, it's a very open question.
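The "too many policies fit the data" point is easy to demonstrate. In this sketch, with synthetic data standing in for human demonstrations, two fits agree wherever there is data and can disagree badly far outside it.

import numpy as np

rng = np.random.default_rng(1)
# Observed "human policy": action as a function of state, only for states in [0, 1].
xs = rng.uniform(0.0, 1.0, 50)
ys = np.sin(2 * np.pi * xs) + 0.05 * rng.normal(size=50)

# Two polynomial models that both explain the training data well...
fit_lo = np.polynomial.Polynomial.fit(xs, ys, deg=5)
fit_hi = np.polynomial.Polynomial.fit(xs, ys, deg=9)
for x_query in (0.5, 3.0):
    tag = "in distribution" if 0.0 <= x_query <= 1.0 else "OUT of distribution"
    print(f"x={x_query} ({tag}): deg-5 predicts {fit_lo(x_query):+.2f}, "
          f"deg-9 predicts {fit_hi(x_query):+.2f}")
# Inside [0, 1] the fits roughly agree; far outside they typically diverge
# wildly, and a robot best-responding to either model is on its own out there.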
Do you think, so, one of the dreams of artificial intelligence was to solve common-sense reasoning, whatever the heck that means. Do you think something like common-sense reasoning has to be solved, in part, to solve this dance of human-robot interaction, in the driving space or in human-robot interaction in general? You have to be able to reason about these common-sense concepts of physics, of all the things we've been talking about with humans; I don't even know how to express them with words. The basics of human behavior, the fear of death. To me it feels really important to encode the fear of death, that people don't want to die; maybe it can stay implicit, but it feels important. It seems silly, but the game of chicken that goes on with a pedestrian crossing the street is playing with the idea of mortality. We really don't want to die; it's not just a negative reward. I don't know, it feels like all these human concepts have to be encoded. Do you share that sense, or is it a lot simpler than I'm making it out to be?

I think it might be simpler than that, and I'm the first to like complicating things. Because it turns out that if you model people in what I'll call, maybe unfairly, the traditional way, as rational somehow, the utilitarian perspective, then once you say that, you automatically capture that they have an incentive to keep on being. Stuart Russell likes to say: you can't fetch the coffee if you're dead.

That's a good line.

So when you treat agents, human or artificial, as having objectives and incentives, you're implicitly modeling that they'd like to stick around so they can accomplish those goals. I think, in a sense, that's what draws me so much to the rationality framework: even though it's so broken, it's been such a useful perspective. And, as we were discussing earlier, what's the alternative? I give up and go home? I use complete black boxes, but then I don't know what to assume out of distribution, and we come back to this. It's been a very fruitful way to think about the problem, and a more positive one: these people aren't just crazy; maybe they make more sense than we think. But we also have to be ready for it to be wrong, to be able to detect when those assumptions aren't holding, all of that.

Let me ask about another side of this. We've been talking about the pure autonomous driving problem, but there are also relatively successful systems already deployed, in what you might call level-two autonomy, or semi-autonomous vehicles: Tesla Autopilot, for example, and I've worked quite a bit with Cadillac's Super Cruise system, which has a driver-facing camera that detects your state; basically a bunch of lane-centering systems. What's your sense of this way of dealing with the human-robot interaction problem: having a really dumb robot and relying on the human to help the robot out, to keep them both alive? From a research perspective, how difficult is that problem? And from a practical deployment perspective, is it a fruitful way to approach human-robot interaction?

I think what we have to be careful about there is that some of these systems, not all, make an underlying assumption: I'm a driver, but now I'm really not driving, I'm supervising, and my job is to intervene. And we have to be careful with the assumption that when I'm supervising I will be just as safe as when I'm driving; that if I wouldn't get into some kind of accident while driving, I will be able to avoid that accident when I'm supervising, too.
I'm concerned about this assumption from a few perspectives. From a technical perspective: when you let something take control and do its thing (and it depends on what that thing is, obviously, how much it's taking on, what you're trusting it to do), it will go to what we might call, from the person's perspective, off-policy states: states the person wouldn't actually find themselves in if they were the ones driving. And the assumption that the person functions just as well there as they do in the states they would normally encounter is a little questionable. Another part is the human-factors side of this. I don't know about you, but I definitely feel like I experience things very differently when I'm actively engaged in a task versus when I'm a passive observer, even if I try to stay engaged; it's very different from actively making decisions. You see this in life in general: students who actively try to come up with the answer learn the material better than when they're passively told the answer. I think this is related, and people have studied it in human factors for airplanes; I think it's fairly established that the two are not the same.

Let me push on that point, because I've gotten a huge amount of heat on this, and I stand by it. I know the human factors community well, and the work here is really strong; there are many decades of work showing exactly what you're saying. Nevertheless, I've been continuously surprised that many of the predictions of that work have been wrong in what I've seen. I still agree with everything you said, but we have to be a little more open-minded. Everything you said is, to the word, exactly correct, but here's what you didn't say: you said you can't assume a bunch of things, but we don't know whether these systems are fundamentally unsafe. That's still unknown. And there are a lot of interesting things here. I'm surprised by what seems to be the case, anecdotally, from the large-scale data collection we've done, but also from talking to a lot of people: in the supervisory role of semi-autonomous systems that are sufficiently dumb (and that, at least, might be a key element: the system being dumb), people are actually more energized as observers. They're better at observing the situation. So there might be cases, in systems where you get the interaction right, where you as a supervisor will do a better job together with the system.

I agree; I think that's really possible. I guess mainly I'm pointing out that if you do it naively, you're implicitly assuming something, and that assumption might really be wrong. But I do think that if you explicitly think about what the agent should do so that the person stays engaged, so that you essentially empower the person to do more than they could alone, that's really the goal. You still have a driver, so you want to empower them to be so much better than they would be by themselves. And that's a very different mindset from "I want them to basically not drive, but be ready to take over."
So, one of the interesting things we've been talking about is rewards; they seem to be fundamental to the way robots behave. Broadly speaking, we've been talking about utility functions, but can you comment on how we approach the design of reward functions? How do we come up with good reward functions?

[Laughter] You know, I used to think about how it's really hard to specify rewards for interaction, because the reward is supposed to be what the people want, and we talked about how you have to customize what you do to the end user. But I've realized that even if you take the interactive component away, it's still really hard to design reward functions. What do I mean by that? Assume the standard AI paradigm: there's an agent whose job is to optimize some objective, some reward, utility, loss, cost, whatever you call it depending on the setting. You write it out, you deploy the agent, and you want to make sure that whatever you specified incentivizes the behavior you want from the agent in any situation the agent will face. So say I do motion planning on my robot arm. I specify some cost function: this is how much it should try to stay away from people, this is how much it matters to be efficient, and blah blah blah. I need to make sure that whatever I specified, those constraints or trade-offs or whatever they are, produces the behavior I want to see whenever the robot goes and solves that problem in a new situation. And what I've been finding is that we have no idea how to do that. Basically, what I can do is sample: I can think of some situations that I believe are representative of what the robot will face, and I can tune the reward function until the optimal behavior is what I want in those situations. Which, first of all, is super frustrating, because through the miracle of AI we don't have to specify rules for behavior anymore, right? That's the promise: the robot comes up with the right thing to do; you drop it into this situation, it optimizes; into that situation, it optimizes. But you still have to spend a lot of time defining what that criterion should be, making sure you didn't forget about fifty bazillion things that are important, and deciding how they should all combine into telling the robot what's good and what's bad, and how good and how bad. I think this is a lesson I closed my eyes to for a while (I've been tuning cost functions for ten years now), but it really strikes me: yes, we've moved the tuning, the designing of features, from the behavior side to the reward side, and I agree there's way less of it, but it still seems really hard to anticipate every possible situation and specify a reward function that, when optimized, works well in all of them.

So you're referring to unintended consequences, or in general any kind of suboptimal behavior that emerges, beyond what you said about out-of-distribution?

Suboptimal behavior that is, you know, actually optimal. I guess that is the idea of unintended consequences: the behavior is optimal with respect to what you specified, but it's not what you want, and there's a difference between those.

But that's not fundamentally a robotics problem, right? It's a human problem.

Right, that's the thing.
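What the sample-and-tune loop described above looks like in code, as a hedged sketch: a trajectory cost with two hand-designed features (efficiency and proximity to a person), and weights the designer nudges until the minimum-cost behavior looks right on a few representative scenarios. Features, weights, and scenarios are all illustrative.

```python
import numpy as np

def trajectory_cost(traj, human_pos, w_effort=1.0, w_proximity=5.0):
    # traj: (T, 2) array of waypoints.
    # Feature 1: effort, total path length (be efficient).
    effort = np.sum(np.linalg.norm(np.diff(traj, axis=0), axis=1))
    # Feature 2: proximity, a penalty that grows sharply near the person.
    dists = np.linalg.norm(traj - human_pos, axis=1)
    proximity = np.sum(np.exp(-dists))
    return w_effort * effort + w_proximity * proximity

# The frustrating loop: sample representative situations, eyeball the
# optimal behavior, nudge w_effort / w_proximity, repeat, with no
# guarantee about the situations you did not sample.
human = np.array([5.0, 0.0])
straight = np.stack([np.linspace(0, 10, 20), np.zeros(20)], axis=1)
detour = straight.copy()
detour[:, 1] = 2.0 * np.sin(np.linspace(0, np.pi, 20))  # arcs around them

print(trajectory_cost(straight, human))  # short, but passes right by
print(trajectory_cost(detour, human))    # longer, but keeps its distance
```

Change the scene (two people, a doorway, a held mug) and the same weights can incentivize behavior the designer never intended, which is the point being made here.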
So there's this thing called Goodhart's law: you set a metric for an organization, and the moment it becomes a target that people actually optimize for, it's no longer a good metric.

What's it called?

Goodhart's law. The moment you specify a metric, it stops doing its job.

Yeah, it stops doing its job. So yes, there's such a thing as over-optimizing for things, and failing to think ahead of time of all the possible things that might be important. And that's interesting, because historically I've worked a lot on reward learning from the perspective of customizing to the end user, but it really seems like it's not just the interaction with the end user that matters. The problem of the human and the robot collaborating so that the robot does what the human wants, that back-and-forth, the robot probing, the person being informative, all of that, might be just as applicable to a maybe-new form of human-robot interaction: the interaction between the robot and the expert programmer, the roboticist designer in charge of specifying what the heck the robot should do, of specifying the robot's task.

That's so cool: collaborating on the reward design.

Collaborating on the reward design, right. So what does it mean when we stop thinking of the problem as "someone specifies your reward and your job is to optimize it," and start thinking of the designer as part of this interaction, this collaboration? The first thing that comes up is that when the person specifies a reward, it's not gospel, it's not the letter of the law, it's not the definition of the reward function you should be optimizing, because they're doing their best, but they're not some magic, perfect oracle. The sooner we start understanding that, I think, the sooner we'll get to more robust robots that function better in different situations. Then you can say: okay, it's almost as if the robots are over-learning, putting too much weight on the specified reward, and maybe leaving a lot of other information on the table. What else could we do to communicate to the robot what we want it to do, besides attempting to specify a reward function?

You have this, again, almost poetic phrase, "leaked information." You mention that humans leak information about what they want, leak reward signal, for the robot. So how do we detect these leaks? What are these leaks? I don't remember where I recently read those words, but they're going to stick with me for a while, because the information is not explicitly expressed; it leaks indirectly from our behavior.

It does, yeah, absolutely. Maybe I'll start with the surprising bits. We were talking before about my robot arm: it needs to move around people, carry stuff, put stuff away, all of that. Now imagine the robot has some initial objective that the programmer gave it so that it can do all these things; functionally, it's capable of them. And I notice that it's doing something, maybe coming too close to me (maybe I'm the designer, maybe I'm the end user, and this robot is now in my home), and I push it away, as a reaction to what the robot is currently doing. This is what we call physical human-robot interaction.
Now, there's a lot of interesting work on how the robot should respond to physical human-robot interaction: what should it do when such an event occurs? There are different schools of thought. You can treat it control-theoretically and say this is a disturbance that the robot must reject. You can treat it more heuristically and say: I'll go into some gravity-compensation mode so that I'm easily maneuverable, and I'll move in the direction the person pushed me. And part of our realization has been that this is a signal that communicates about the reward. If my robot was moving in an optimal way and I intervened, that means I disagree with its notion of optimality: whatever it thinks is optimal is not actually optimal. And, optimization difficulties aside, that means the cost function, or reward function, is incorrect, or at least not what I want it to be.

How difficult is that signal to interpret, to make actionable? Because this connects to our autonomous vehicle discussion: in the semi-autonomous or autonomous vehicle, when a safety driver disengages the car, they could have disengaged it for a million reasons.

Yeah, that's true. Again, it comes back to whether you can structure, at least a little, your assumptions about how human behavior relates to what the person wants. One thing we've done is literally treat the external torque the person applies as follows: when you take that torque and add it to the torque the robot was already applying, the overall action is probably relatively optimal with respect to whatever the person wants. That gives you information about what they want; for instance, you can learn that people want the robot to stay further away from them. Now, you're right that many things might explain that one signal, and you might need much more data for the person to shape your reward function over time. You can also do the information-gathering stuff we were talking about. To clarify, we haven't done that in this particular context, but it's definitely something we've thought about: if there are a bunch of different explanations, the robot can move in a way that tests whether you correct it again, actually planning its motion so that it can disambiguate and collect information about what you want.

Anyway, that's one kind of leaked information. Maybe even more subtle leaked information is when I just press the e-stop, out of panic, because the robot is about to do something bad. There's information there too. Okay, the robot should definitely stop, but it should also figure out that whatever it was about to do was not good. In fact, it was so not good that stopping, and remaining stopped for a while, was a better trajectory than whatever it was about to do. And that, again, is information about my preferences, about what I want.
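A toy sketch in the spirit of the work described here (learning from physical corrections; the real versions are considerably more careful): assume the reward is linear in hand-designed features, treat the human-corrected trajectory as better than the planned one under the true reward, and nudge the weights accordingly. The features, step size, and trajectories are made up for illustration.

```python
import numpy as np

def features(traj, human_pos):
    # Two illustrative features: path length, and closeness to the person.
    effort = np.sum(np.linalg.norm(np.diff(traj, axis=0), axis=1))
    closeness = np.sum(np.exp(-np.linalg.norm(traj - human_pos, axis=1)))
    return np.array([effort, closeness])

def update_from_correction(theta, planned, corrected, human_pos, lr=0.1):
    # Reward model: r(traj) = theta . phi(traj). The corrected trajectory
    # should score higher than the planned one, so step theta in the
    # direction of the feature difference (a gradient-style update).
    phi_diff = features(corrected, human_pos) - features(planned, human_pos)
    return theta + lr * phi_diff

theta = np.array([-1.0, -0.5])   # initial guess: mildly dislike both features

human = np.array([5.0, 0.0])
planned = np.stack([np.linspace(0, 10, 20), np.zeros(20)], axis=1)
corrected = planned.copy()
corrected[:, 1] += 1.5 * np.sin(np.linspace(0, np.pi, 20))  # pushed away

theta = update_from_correction(theta, planned, corrected, human)
print(theta)  # the closeness weight drops further: "stay away from me"
```

The e-stop case fits the same picture: stopping and staying stopped is implicitly a preferred trajectory over whatever the robot was about to execute.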
Speaking of e-stops: what are your expert opinions on Isaac Asimov's Three Laws of Robotics: don't harm humans, obey orders, protect yourself? It's such a silly notion, but I speak to so many people these days, regular folks, my parents and so on, about robotics, and they operate in that space of imagining our future with robots and wondering about the ethics, how we get that dance right. I know the three laws might be a silly notion, but do you think about what universal reward functions we should enforce on the robots of the future? Is that a little too far out? Or is it, as in the mechanism you just described, that it shouldn't be three fixed laws; it should be a constantly adjusting kind of thing?

I think it should constantly be adjusting. The issue with the laws is that they're words, and I have to write math; I have to translate them into math. What does it even mean to harm? We just talked about how you try to say what you want but don't always get it right, and you want these machines to do what you want, not necessarily exactly what you literally said. You don't want them to take you literally; you want them to take what you say and interpret it in context. That's what we do with specified rewards now. We don't take them literally from the designer anymore; "we" meaning not the whole community, but some of us, together with collaborators like Pieter Abbeel and Stuart Russell. We say: okay, the designer specified this thing, but I'm going to interpret it not as the universal reward function I shall optimize always and forever, but as good evidence about what the person wants. And I should interpret that evidence in the context of the situations it was specified for, because ultimately that's what the designer thought about, that's what they had in mind. Them giving me a reward function that works across those situations is really telling me that whatever behavior it incentivizes there must be good behavior with respect to the thing I should actually be optimizing for. So now the robot has uncertainty about what its reward function really is, and then there are all these additional signals, we've been finding, that it can continually learn from, adapting its understanding of what people want: every time the person corrects it, maybe demonstrates, maybe hits the e-stop; hopefully not, right?
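The "reward as evidence, not gospel" idea has a concrete formalization in inverse reward design (Hadfield-Menell et al., co-authored by Dragan). A heavily simplified sketch with made-up candidate rewards, features, and numbers: the specified proxy reward is plausible insofar as it induces high-value behavior in the training environments it was written for, which leaves the robot appropriately uncertain about features those environments never exercised.

```python
import numpy as np

# Candidate "true" reward weights over two features, e.g. (speed, lava).
candidates = [np.array([1.0,  0.0]),   # cares only about speed
              np.array([1.0, -5.0]),   # speed, but strongly avoid lava
              np.array([0.0, -5.0])]   # cares only about avoiding lava

# Feature counts of proxy-optimal behavior in the TRAINING environment.
# There was no lava there, so the lava feature never showed up.
phi_train = np.array([10.0, 0.0])

# IRD-style likelihood (normalization over alternative proxies omitted):
# the designer plausibly wrote this proxy if the behavior it incentivizes
# in training scores well under the true reward.
beta = 1.0
posterior = np.array([np.exp(beta * (w @ phi_train)) for w in candidates])
posterior /= posterior.sum()
print(posterior)
# The two candidates that agree with the proxy on training behavior tie;
# the robot stays uncertain about lava, and can act risk-averse about it.
```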
One really crazy signal is the environment itself, our world. You observe the world and its state, and it's not that you're watching behavior and reasoning about people making rational decisions, blah blah blah; it's that our world is something we've been acting in according to our preferences. I have this example: a robot walks into my home, and my shoes are laid out on the floor in a line. It took effort to do that. Even though the robot never sees me aligning the shoes, it should still be able to figure out that I want the shoes aligned, because there's no way for them to have magically instantiated themselves that way. Someone must have taken the time to do it, so it must be important.

So the environment leaks information too. The environment is the way it is because humans somehow manipulated it; you have to reverse-engineer the narrative that happened to create the environment as it is, and that leaks the information.

Yeah, and you have to be careful, because people don't have the bandwidth to do everything. Just because my house is messy doesn't mean I want it to be messy; it means I didn't put the effort into that, I put it into something else. So the robot should figure out that that something else was more important, which doesn't mean the messy house is what I want. It's a little subtle, but yes, we really think of the state itself as a kind of choice that people implicitly made about how they want their world to be.

What book or books, technical, fiction, or philosophical, had a big impact on you when you look back at your life? Maybe one that was a turning point, that was inspiring. Maybe it's some silly book that nobody in their right mind would want to read, or maybe it's a book you'd recommend to others; those could be two different recommendations of books that might be useful for people on their journey.

It's kind of a personal story. When I was in 12th grade, in Romania, I got my hands on a PDF copy of Russell and Norvig's "Artificial Intelligence: A Modern Approach." I didn't know anything about AI at that point; I had watched the movie The Matrix. So I started going through this thing, and, you know, you were asking in the beginning what drew me in: it's math, and it's algorithms. It was so captivating, this notion that you could just have a goal and figure out your way through a messy, complicated situation, what sequence of decisions you should make, autonomously, to achieve that goal. That was so cool. I'm biased, but that's a cool book.

Yeah, the notion that you can take the process of intelligence and mechanize it. I had the same experience. I was really interested in psychiatry and in trying to understand human behavior, and then AI: A Modern Approach was like, wait, you can just reduce it all...

Yes. And I think that stuck with me, because a lot of what we do in my lab is write math about human behavior, combine it with data and learning, put it all together, give it to robots to plan with, and hope that instead of writing rules for the robots, writing heuristics, designing behavior, they can autonomously come up with the right thing to do around people. That's our signature move: we wrote some math, and then, instead of hand-crafting this and that, the robot figured stuff out. Isn't that cool? It's the same enthusiasm I got back then: it figured out how to reach the goal in that graph, isn't that cool?

I apologize for the romanticized questions, and the silly ones. If a doctor gave you five years to live, sort of emphasizing the finiteness of our existence, what would you try to accomplish?

That's like my biggest nightmare, by the way. I really like living. I really don't like the idea of being told that I'm going to die.

Sorry to linger on that for a second. Do you meditate or ponder your mortality, being human, the fact that this thing ends? It seems to be a fundamental feature. Do you think of it as a feature or a bug? You said you don't like the idea of dying, but if I gave you the choice of living forever, like you're not allowed to die?

Yeah, now you'll say that I'd be wandering forever... I watched this show, it's very silly, called The Good Place, and they reflect a lot on this, and the moral of the story is that you have to make the afterlife finite too, because otherwise people end up like the humans in WALL-E. So I think the finiteness helps. But yeah, I'm not a religious person; I don't think there's something after. So I think it just ends, and you stop existing.
And I really like existing. It's just such a great privilege to exist that, yeah, I think that's the hard part.

I still think we like existing so much because it ends. And that's so sad to me. I find almost everything about this life beautiful; the silliest, most mundane things are just beautiful. And I'm cognizant of the fact that I find it beautiful because it ends. I don't know how to feel about that. I also feel like there's a lesson in there for robotics and AI: the finiteness of things seems to be a fundamental feature of human existence. Some people accuse me of just being Russian and melancholic and romantic or something, but it does seem fundamental to our existence, and maybe it should be incorporated in our reward functions. But anyway, speaking of reward functions: if you only had five years, what would you try to accomplish?

Here's the thing. I'm thinking about this question, and it's a pretty joyous moment, because I don't know that I would change much. I'm trying to make some contributions to how we understand human-AI interaction; I don't think I would change that. Maybe I'd take more trips to the Caribbean or something. But I try to do the things that bring me joy, and thinking about these things brings me joy. It's the Marie Kondo thing: don't do stuff that doesn't spark joy. For the most part, I do things that spark joy. Maybe I'd do less service in the department or something. But no, I have amazing colleagues and amazing students and an amazing family and friends, and spending time, in some balance, with all of them is what I do, and that's what I'm doing already. So I don't know that I would really change anything.

In the spirit of positivity: what small act of kindness, if one pops to mind, were you once shown that you will never forget?

When I was in high school, my classmates did some tutoring. We were gearing up for our baccalaureate exam, and they were getting tutoring on math, on whatever. I was comfortable enough with some of those subjects, but physics was something I hadn't focused on in a while. They were all working with this one teacher, and I started working with her too; her name is Nicole McConnell. She was the one who opened up this whole world for me, because she told me that I should take the SATs, apply to college abroad, work on my English, all of that. And when it came to the finances, which my parents couldn't really afford, she started tutoring me in physics for free, and on top of that she sat down with me to train me for the SATs and all that jazz she had experience with.

Wow. And obviously that has taken you to where you are today, one of the world's experts in robotics. It's funny, those little moments of kindness, for no reason, really. Just wanting to support someone.

Yeah.

So, we've talked a ton about reward functions. Let me ask the most ridiculous, big question: what is the meaning of life? What's the reward function under which we humans operate, maybe for your life, maybe more broadly for human life in general? What gives life fulfillment, purpose, happiness, meaning?

You can't even ask that question with a straight face.
That's how ridiculous it is. I can't.

Okay. So... you're going to try to answer it anyway, aren't you?

So, I was in a planetarium once, and they show you the thing where they zoom out and out, the whole "you're a speck of dust" kind of thing, conceptualizing that we humans are just on this little planet and don't matter much in the grand scheme of things. And then my mind got really blown, because they brought up this multiverse theory, where they zoomed out even further and said: this is our universe, and there are a bazillion other ones, popping in and out of existence. So our whole thing, which we can't even fathom how big it is, was like a blip that went in and out. And I thought: okay, clearly what we should be doing is trying to impact whatever local thing we can impact, our communities, our friends, our family, leave a little bit behind there, and just try to be there for other humans, because everything beyond that seems ridiculous.

How do you make sense of these multiverses? Are you inspired by the immensity of it? Is it amazing to you, or is it almost paralyzing in its mystery?

It's frustrating. I'm frustrated by my inability to comprehend it. There's some stuff, space, time, blah blah blah, that we should really be understanding, and I definitely don't. The amazing physicists of the world have a much better understanding than I do, and still, in the grand scheme of things, it's very frustrating. It feels like our brains don't have some fundamental capacity. Yet, or ever, I don't know.

Well, one of the dreams of artificial intelligence is to create systems that will expand our cognitive capacity in order to understand, to build the theory of everything in physics, and to figure out what the heck these multiverses are. I think there's no better way to end it than talking about the meaning of life and the fundamental nature of the universe. Anca, it's a huge honor; this was one of my favorite conversations I've ever had. I really, really appreciate your time. Thank you for talking with me, thank you for coming, and come back again.

Thanks for listening to this conversation with Anca Dragan, and thank you to our presenting sponsor, Cash App. Please consider supporting the podcast by downloading Cash App and using code LEXPODCAST. If you enjoy this podcast, subscribe on YouTube, review it with five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter @lexfridman. And now, let me leave you with some words from Isaac Asimov: your assumptions are your windows on the world; scrub them off every once in a while, or the light won't come in. Thank you for listening, and hope to see you next time.