Transcript
p5AtrKqQ3Fw • Karl Iagnemma & Oscar Beijbom (Aptiv Autonomous Mobility) - MIT Self-Driving Cars
All right, welcome back to 6.S094, Deep Learning for Self-Driving Cars. Today we have Karl Iagnemma and Oscar Beijbom from Aptiv. Karl is the president of Aptiv Autonomous Mobility, where Oscar is the machine learning lead. Karl founded nuTonomy, as many of you know, in 2013; it's a Boston-based autonomous vehicle company, and nuTonomy was acquired by Aptiv in 2017, so now it's part of Aptiv. Karl and team are one of the leaders in autonomous vehicle development and deployment, with cars on roads all over the United States at several sites. But most importantly, Karl is MIT through and through: as some of you may know, he got his PhD here and led a robotics group here as a research scientist for many years. So it's really a pleasure to have both Karl and Oscar with us today. Please give them a warm welcome.

All right, thanks Lex. Yeah, very glad to be back at MIT, and very impressed that you guys are here during IAP. My course load during IAP was usually ice skating, and sometimes, like, there was a wine tasting course — this is now almost twenty years ago — and that was pretty much it; that's where the academic work stopped. But you guys are here to learn something, so I'm gonna do my best and try something radical, actually. As president now of Aptiv autonomous driving, I'm sometimes not allowed to talk about anything technical or interesting. I'm gonna flout that a little bit and raise some topics that we think about that I think are interesting — questions to keep in the back of your mind as you're thinking about deep learning and autonomous driving. So I'll raise some of those questions, and then Oscar will actually present some real-life technology and some of the work that he has been doing. Oscar's our machine learning lead, and he'll cover some of the work that he and his outstanding team have been doing around machine-learning-based detectors for the detection problem.

So let me first introduce Aptiv a little bit, because people usually ask me, like, what's an Aptiv, when I say I work for Aptiv. Aptiv has actually been around for a long time, but in a different form: Aptiv was previously Delphi, which was previously part of General Motors. Everybody's heard of General Motors; some of you may have heard of Delphi. Aptiv spun from Delphi about 14 months ago, and Aptiv is a Tier 1 supplier — an automotive company that industrializes technology, essentially. They take software and hardware, they industrialize it and put it on cars so it can run for many, many hundreds of thousands of miles without failing, which is a useful thing when we think about autonomous driving.

The themes for Aptiv: they develop what they say are safer, greener, and more connected solutions. Safer means safety systems — active safety, autonomous driving systems of the type that we're building. Greener: systems to enable electrification and green vehicles. And then more connected: connectivity solutions, both within the vehicle, transmitting data around the vehicle, and then externally, wireless communication. All of these things, as you can imagine, feed very, very nicely into the future transportation systems that the software will actually only be a part of. So Aptiv is in a really interesting spot when you think about the future of autonomous driving. To give you a sense of scale — it still kind of amazes me — the biggest my research group ever was at MIT was like 18 people. Aptiv is a hundred and fifty-six thousand employees, so a significant-sized organization: about a thirteen-billion-dollar company by revenue, in about 50 countries around the world.
My group is about seven hundred people, of which Oscar is one very important person. We're about seven hundred working on autonomous driving, and we've got about a hundred and twenty cars on the road in different countries, and I'll show you some examples of that.

But first let me take a trip down memory lane and show you a couple of snapshots of where we were not too long ago — kind of as a community, but also, you know, me personally — and this will either inspire or horrify you, I'm not sure which. The fact is, in 2007, you know, there were groups driving around with cars running blade servers in the trunk that were generating so much heat you had to install another air conditioner, which then was drawing so much power you had to add another alternator, and then kind of rinse and repeat. So it wasn't a great situation, but people did enough, algorithmically and computationally, to enable these cars — this is the DARPA Urban Challenge, for those to whom that may be familiar — to do something useful and interesting on a closed course. And it kind of convinced enough people that, given enough devotion of thought and resources, this might actually become a real thing someday. I was one of those people that got convinced.

2010 — now I'm gonna crib from my co-founder Emilio, who was a former MIT faculty member in AeroAstro. Emilio started up an operation in Singapore through SMART, which some of you have probably worked with. So this is some folks from SMART — that's James, who looks really young in that picture; he was one of Emilio's students — basically taking a golf cart and turning it into an autonomous shuttle. It turned out to work pretty well, and it got people in Singapore excited, which in turn got us further excited. In 2014 they did a demo where they let the people of Singapore come ride around in these carts in a garden, and that worked great over the course of a weekend. Around this time we'd started nuTonomy — we'd actually started a commercial enterprise and kind of stepped at least partly away from MIT at that point.

2015: we had cars on the road. This is a Mitsubishi i-MiEV electric vehicle. When we had all our equipment in it, the front seat was pushed forward so far that me — I'm about six foot three — I actually couldn't sit in the front seat, so I couldn't accompany people on rides; it wasn't very practical. We ended up switching cars to a Renault Zoe platform, which is the one you see here, which had a little more legroom. We were giving, at that point, open-to-the-public rides in our cars in Singapore, in the part of the city that we were allowed to operate in. It was a quick transition: as you can see, just even visually, the evolution of these systems has come a long way in a short time, and we're just a point example of this phenomenon, which is, broadly speaking, similar across the industry.

2017: we joined Aptiv, and we were excited by that because we, as primarily scientists and technologists, didn't have a great idea how we were gonna industrialize this technology and actually bring it to market and make it reliable and robust and make it safe — which is what I'm going to talk about a little bit here today. So we joined Aptiv, with its global footprint. Today we're primarily in Pittsburgh, Boston, Singapore, and Vegas, and we've got connectivity to Aptiv's other sites in Shanghai and Wolfsburg.

Let me tell you a little bit about what's happening in Vegas. I think people were here when Luc was talking a couple of days ago — yesterday.
So Luc from Lyft — Luc Vincent — probably talked a little bit about Vegas. Vegas is really an interesting place for us. We've got a big operation there: a 130,000-square-foot garage, about 75 cars, and thirty of those cars are on the Lyft network — so Aptiv technology, but connecting to the customer through Lyft. So if you go to Vegas and you open your Lyft app, it'll ask you: do you want to take a ride in an autonomous car? You can opt in, you can opt out, it's up to you. If you opt in, there's a reasonable chance one of our cars will pick you up if you call for a ride. Anybody can do this — competitors, innocent bystanders — totally up to you; we have nothing to hide. Our cars are on the road 20 hours a day, seven days a week. If you take a ride, when you get out of the car, just like any Lyft ride, you get to give us a star rating, one through five. And that to us is actually really interesting, because, you know, it's a scalar — it's not too rich — but that star rating to me says something about the ride quality, meaning the comfort of the trip, the safety that you felt, and the efficiency of getting to where you wanted to go. And our star rating today is four point nine five, which is pretty good. Key numbers: we've given at this point over 30,000 rides to more than 50,000 passengers, and we've driven over a million miles — in Vegas and a little bit additional, but primarily there — and, as I mentioned, the 4.95.

So what does it look like on the road? I'll show just one video today; I think Oscar has a few more. This one's actually in Singapore, but it's all kind of morally equivalent. You'll see a slightly sped-up view of a run — this is now probably six, seven months old — on the road in Singapore, but it's got some interesting stuff in a fairly typical run. Some of you may recognize these roads. We're on the wrong side of the road, remember, because we're in Singapore. But to give you an example of some of the types of problems we have to solve on a daily basis, let me run this thing. What you'll see is this car cruising down the road: you have obstacles that we have to avoid, sometimes in the face of oncoming traffic that we've got to deal with; sometimes situations where other road users are maybe not perfectly behaving by the rules, and we've got to manage that in a natural way. Construction in Singapore, like everywhere else, is pretty ubiquitous, and so you have to navigate through these less structured environments. People are sometimes doing things, or indicating some future action, which you have to make inferences about; that can be tricky to navigate. So a typical day: a route that any one of us as humans would drive through without batting an eye, no problem, actually presents some really, really complex problems for autonomous vehicles. But it's the table stakes these days — these are the things you have to do if you want to be on the road, and certainly if you want to drive millions of miles with very few accidents, which is what we're doing.

So that's an introduction to Aptiv and a little bit of background. Now let me talk about learning, and how we think about learning in the context of autonomous driving. There was a period a few years ago where, I think, as a community, people thought that we would be able to go from pixels to actuator commands with a single learned architecture — a single black box, I'll say. Generally speaking, we no longer believe that's true — and I should qualify the "we" in that: I didn't believe that was ever true, but some of us maybe thought that was true.
I'll tell you part of the reason why in this part of the talk. A big part of it comes down to safety — a big part of it comes down to safety and the question of convincing ourselves that that system, that black box, even if we could train it to accurately approximate this massively complex underlying function that we're trying to approximate — can we convince ourselves that it's safe? It's very, very hard to answer that question affirmatively, and I'll raise some of the issues around why that is. This is not to say that learning methods are not incredibly useful for autonomous driving, because they absolutely are, and Oscar will show you examples of why that is and how Aptiv is using some learning methods today. But this safety dimension is tricky, because there are actually two axes here. One is the actual technical safety of the system, which is to say: can we build a system that's safe — that's provably, in some sense, safe — that we can validate, which we can convince ourselves achieves the intended functionality in our operational design domain, that adheres to whatever regulatory requirements might be imposed in the jurisdictions where we're operating? And there's a whole longer list related to technical safety, but these are primarily technical problems. But there's another dimension, which up here is called perceived safety, which is to say: when you ride in a car, even if it's safe, do you believe that it's safe, and therefore will you want to take another trip? Which sounds kind of squishy, and as engineers we're typically uncomfortable with that kind of stuff, but it turns out to be really important — and probably harder to solve, because it is a little bit squishy. And, you know, quite obviously we've got to sit up here, right? We've got to be in this upper right-hand corner, where we have not only a very safe car from a technical perspective, but one that feels safe — that inspires confidence in riders, in regulators, and everybody else.

So how do we get there, in the context of elements of this system that may be black boxes, for lack of a better word? What's required is trust. How do we get to the point where we can trust neural networks in the context of safety-critical systems, which is what an autonomous vehicle is? It really comes down to this question of how we convince ourselves that we can validate these systems — again, validating the system meaning ensuring that it can meet the requirements, the operational requirements, in the domain of interest that are imposed by the user. There are three dimensions to this key question of understanding how to validate, and I'm gonna just briefly introduce some questions, some topics of interest, around each of these.

The first one: trusting the data. Do we actually have confidence about what goes into this algorithm? I mean, everybody knows garbage in, garbage out, and there are various ways that we can make this garbage: we can have data which insufficiently covers our domain, that's not representative of the domain; we can have data that's poorly annotated by our third-party trusted partners, whom we've trusted to label certain things of interest. So do we trust the data that's going into the algorithm?

Second, do we trust the implementation? You've got a beautiful algorithm — super descriptive, super robust, not brittle at all, well trained — and we're running it on poor hardware, we've coded it poorly, we've got buffer overruns right and left. Do we trust the implementation to actually execute in a safe manner?
And do we trust the algorithm? Again, generally speaking, we're trying to approximate really complicated functions — I don't think we typically use neural networks to approximate linear systems — so this is a gnarly, nasty function, which has problems of critical interest that are really rare; in fact, they're the only ones of interest. There are these events that happen very, very infrequently that we absolutely have to get right, and it's a hard problem to convince ourselves that the algorithm is going to perform properly in these unexpected and rare situations. So these are the sorts of things that we think about, and that we have to answer in an intelligent way, to convince ourselves that we have a validated neural-network-based system.

Okay, let me just step through each of these topics really quickly. So, the topic of validation: what do we mean by that, and why is it hard? There are a number of different dimensions here. The first is that we don't have insight into the nature of the function that we're trying to approximate — the underlying phenomenon is really complicated; again, if it weren't, we'd possibly be modeling it using different techniques: we'd write a closed-form equation to describe it. So that's a problem. Second: the accidents — the actual crashes on the road; we should say crashes, not accidents — these are rare. Luckily, they're very rare, but it makes the statistical argument around these crashes, and being able to avoid these crashes, really, really difficult. If you believe RAND — and they're pretty smart folks — they say you've got to drive 275 million miles without a crash before you can claim a lower fatality rate than a human with 95% confidence. But how are we gonna do that? Can we think about using some correlated incident — maybe some kind of close call, as a proxy for crashes — which may be more frequent, and maybe back into the argument that way? There are a lot of questions here, which I won't say we don't have any answers to, because I wouldn't go that far, but they're hard questions; they're not questions with obvious answers. So this issue of rare events is one of them.
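To make that 275-million-mile figure concrete, here is a back-of-the-envelope sketch of the statistical argument. The modeling choices are my assumptions, not the talk's: fatal crashes are treated as rare independent events, so observing zero events in n miles bounds the rate below -ln(0.05)/n at 95% confidence, and the human fatality rate used is roughly the US figure the RAND analysis is usually quoted with.

```python
# Back-of-the-envelope version of the "drive X miles to prove safety" argument.
# Assumes a Poisson model of rare, independent fatal events.
import math

human_fatality_rate = 1.09 / 100e6  # assumed: ~1.09 fatalities per 100M miles
confidence = 0.95

# Miles of failure-free driving needed so the one-sided 95% upper bound
# on the AV fatality rate falls below the human rate:
miles_needed = -math.log(1 - confidence) / human_fatality_rate
print(f"{miles_needed / 1e6:.0f} million failure-free miles")  # ~275 million
```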
The regulatory dimension is one of these known unknowns: how do we evaluate a system if the requirements that may be imposed upon us from outside regulatory bodies are still to be written? That's difficult. There's a lack of consensus on what the safety target should be for these systems. This is obviously evolving — smart people are thinking about this — but today it's not at all clear, if you're driving in Las Vegas, if you're driving in Singapore, if you're driving in San Francisco, or anywhere in between, what this target needs to be.

And then lastly — and this is a really interesting one — we can get through a validation process for a build of code; let's assume we can do that. Well, what happens when we're gonna update the code? Because obviously we will. Does that mean we have to start that validation process again from scratch, which will unavoidably be expensive and lengthy? Well, what if we only changed a little bit of the code — what if I only changed one line? But what if that one line is, like, the most important line of code in the whole code base? This is one that, I can tell you, keeps a lot of people up at night, this question of revalidation. And then — now keep that code base fixed — what if we move from one city to the next, and let's say that city is quite similar to your previous city but not exactly the same? How do we think about validation in the context of new environments? So this continuous development issue is a challenge.

All right, let me move on to talking about the data. There are probably people in this room who are doing active research in this area, because it's a really interesting one, but there are a couple of questions, I would say, that we think about when we think about data. We can have a great algorithm, and if we're training it on poor data, for one reason or another, we won't have a great output. So one thing we think about is the sufficiency, the completeness of the data, and the bias that may be inherent in the data, for our operational domain. If we want to operate 24 hours a day and we only train on data collected during daytime, we're probably going to have an issue. Annotating the data is another dimension of the problem. We can collect raw data that's sufficient, that covers our space, but when we annotate it — when we hand it off to a third party, because it's typically a third party, to mark up the interesting aspects of it — we provide them some specifications, but we put a lot of trust in that third party: trust that they're gonna do a good job annotating the interesting parts and not the uninteresting parts, that they're going to catch all the interesting parts that we've asked them to catch, etc. So this annotation part, which seems very mundane, very easy to manage, and kind of like low-hanging fruit, is in fact another key aspect of ensuring that we can trust the data. And this just points to the fact that there are, again, smart people thinking about this problem, which rears its head in many domains beyond autonomous driving.

Now, what about the algorithms themselves? Moving on from the data to the actual algorithm: how do we convince ourselves that an algorithm — like any kind of learning-based algorithm we've trained on a training set — is going to do well on some unknown test set? Well, there are a couple of properties of the algorithm that we can look at, that we can interrogate and kind of poke at, to convince ourselves that the algorithm will perform well. One is invariance, and the other we can call stability: if we make small perturbations to the input of this function, does it behave well? Given, let's say, a bounded input, do we see a bounded output, or do we see some wild response? I'm sure you've all heard of examples of adversarial images that can confuse learning-based classifiers: you show it a turtle and it says, well, that's a turtle, and then you show it a turtle that's maybe fuzzed with a little bit of noise that the human eye can't perceive — so it still looks like a turtle — and it tells you it's a machine gun. Obviously, for us in the driving domain, we want a stop sign to be correctly identified as a stop sign a hundred times out of a hundred. We don't want that stop sign, if somebody goes up and puts a piece of duct tape in the lower right-hand corner, to be interpreted as a yield sign, for example. So this question of the properties of the algorithm — its invariance, its stability — is something of high interest.
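As a rough illustration of that bounded-input, bounded-output stability probe, here is a minimal sketch that perturbs an input with small bounded noise and measures the worst-case change in the output scores. The `classify` stub, the noise budget, and the trial count are hypothetical stand-ins so the sketch runs, not any real Aptiv component.

```python
# Minimal stability probe: does a bounded input perturbation produce a
# bounded output change? Large deviations flag potentially brittle inputs.
import numpy as np

rng = np.random.default_rng(0)

def classify(x: np.ndarray) -> np.ndarray:
    """Stand-in for a trained network: returns a vector of class scores."""
    W = np.ones((3, x.size)) * 0.01  # fixed dummy weights for the demo
    return W @ x.ravel()

def stability_probe(x: np.ndarray, eps: float = 1e-2, trials: int = 100) -> float:
    """Estimate worst-case output change under small L-infinity perturbations."""
    y0 = classify(x)
    worst = 0.0
    for _ in range(trials):
        delta = rng.uniform(-eps, eps, size=x.shape)  # bounded noise
        worst = max(worst, float(np.abs(classify(x + delta) - y0).max()))
    return worst

x = rng.standard_normal((8, 8))
print("max output deviation:", stability_probe(x))
```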
And then lastly, one more point: this notion of interpretability. Interpretability means understanding why an algorithm made the decision that it made. This is the sort of thing that may not be a nice-to-have; it may actually be a requirement, and would likely be a requirement from the regulatory groups that I was referring to a minute ago. So imagine the case of a crash where the system that was governing your trajectory generation was a data-driven system — a deep-learning-based trajectory generator. Well, you may need to explain to someone exactly why that particular trajectory was generated at that particular moment, and this may be a hard thing to do if the generator was a data-driven model. Now, obviously there are people working on and doing active research into this specific question of interpretable learning methods, but it's a thorny one; it's a very, very difficult topic, and it's not at all clear to me when and if we'll get to the stage where we can — even to a technical audience, but beyond that, to a lay jury — explain why algorithm X made decision Y.

Okay, so with all that in mind, let me talk a little bit about safety. That all maybe sounds pretty bleak — you think, well, man, why have I been taking this course with Lex, because we're never really gonna use this stuff — but in fact we can, and will, as a community. There are a lot of tools we can bring to bear to think about neural networks, and they're generally speaking within the context of a broader safety argument. I think that's the key: we tend not to think about using a neural network as a holistic system to drive a car, but we'll think about it as a sub-module that we can build other systems around — systems about which, generally speaking, we can make more rigorous claims regarding their performance and their underlying properties — and therefore make a convincing holistic safety argument that this end-to-end system is safe. We have tools: functional safety is maybe familiar to some of you; it's something we think about a lot in the automotive domain. And SOTIF, which stands for safety of the intended functionality, where we're basically asking ourselves the question: is this overall function doing what it's intended to do, is it operating safely, and is it meeting its specifications? There's kind of an analogy here to validation and verification, if you will. And we have to answer these questions around functional safety and SOTIF affirmatively, even when we have neural-network-based elements, in order to eventually put this car on the road.

All right, so I mentioned that we need to do some embedding, and this is an example of what it might look like. We sometimes call this "caging the learning": we put the learning in a box; it's this powerful animal we want to control. In this case it's up there at the top, in red; that might be, you know, that trajectory proposer I was talking about. So let's say we've got a powerful trajectory proposer and we want to use this thing. We've got it on what we call our performance compute, our high-powered compute. It's maybe not automotive grade; it's got some potential failure modes, but it's, generally speaking, good performance. And we've got our neural-network-based generator on it, which we can say some things about, but maybe not everything we'd like to. Well, we make the argument that if we can surround that — cage it, kind of underpin it — with a safety system about which we can say very rigorous things regarding its performance, then, generally speaking, we may be okay. There may be a path to using neural networks on autonomous vehicles if we can wrap them in a safety architecture that we can say a lot of good things about, and that's exactly what this represents.
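A toy sketch of that "caging" idea: a learned proposer whose output is accepted only if a simple, rigorously specifiable checker approves it, with a conservative fallback otherwise. All names, limits, and the fallback behavior here are illustrative assumptions, not Aptiv's actual architecture.

```python
# "Caging the learning": a learned proposer on the performance computer,
# vetoed by a rule-based checker we can make strong claims about.
from dataclasses import dataclass

@dataclass
class Trajectory:
    max_speed_mps: float
    min_clearance_m: float  # closest predicted distance to any obstacle

def learned_proposer(scene) -> Trajectory:
    """Stand-in for a neural trajectory generator."""
    return Trajectory(max_speed_mps=14.0, min_clearance_m=1.8)

def safety_cage(t: Trajectory) -> bool:
    """Hard constraints that are simple enough to verify rigorously."""
    return t.max_speed_mps <= 15.0 and t.min_clearance_m >= 1.0

def fallback(scene) -> Trajectory:
    """Conservative, verifiable behavior (e.g., a comfortable stop)."""
    return Trajectory(max_speed_mps=0.0, min_clearance_m=float("inf"))

def plan(scene) -> Trajectory:
    proposal = learned_proposer(scene)
    return proposal if safety_cage(proposal) else fallback(scene)

print(plan(scene=None))
```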
So I'm going to conclude my part of the talk here and hand it over to Oscar, with kind of a quote — an assertion — that one of my engineers insisted I show today. The argument is the following: engineering is inching closer to the natural sciences. I won't say how much closer, but closer. We're creating things that we don't fully understand, and then we're investigating the properties of our creation. We're not writing down closed-form functions — that would be too easy. We're generating these immensely complex function approximators, and then we're just poking at them, asking: boy, what does this thing do under these situations? And I'll leave you with one image, which I'll present without comment, and then hand it over to Oscar.

All right — thank you, Karl, and thanks, Lex, for the invite. Yes, my name is Oscar; I run the machine learning team at Aptiv autonomous mobility. So I wanted to begin with this slide. Classification was, you know, quite literally a joke — this is an actual comic; you've probably seen this before. Well, I was doing my PhD in this era, where building a bird classifier was like a PhD project, right? And it's funny because it's true. Then, of course, as you well know, the deep learning revolution happened, and Lex's previous introductory lectures give a great overview — I don't want to redo that. I just want to draw sort of a straight line from what I consider the breakthrough paper, by Krizhevsky et al., to the work I'll be talking about today, via sort of these three papers. So you had deep, end-to-end learning for ImageNet classification by Krizhevsky and Hinton — that paper's been cited 35,000 times; I checked yesterday. Then, in 2014, Ross Girshick at Berkeley basically showed how to repurpose the deep learning architecture to do detection in images, and that was the first time the vision community really started seeing: okay, so classification is more general — I can classify anything: an image, an audio signal, whatever, right? But detection in images was very intimate to the computer vision community; we thought we were the best in the world, right? So when this paper came out, that was sort of the final argument: okay, we all need to do deep learning now. And then in 2016 this paper came out, the Single Shot MultiBox Detector, by Liu et al., which I think is a great paper — if you haven't looked at this paper, by all means read it carefully. And, you know, that's the result: performance is no longer a joke.

So this is a network that we developed in my group: it's a joint image classification and segmentation network. We can run this at 200 Hz on a single GPU, and in this video, in this rendering, there is no tracking applied, there is no temporal smoothing — every single frame is analyzed independently from the others — and you can see that we can model several different classes, both boxes and surfaces, at the same time.

Here's my cartoon drawing of a perception system for an autonomous vehicle. You have the three different main sensor modalities; you typically have some module that does detection and tracking — there are tons of variations on this, of course — you have some sort of sensor pipelines, and then in the end you have a tracking and fusion step. So what I showed you in the previous video is basically this part: like I said, there was no tracking; it's just going from the camera to detections. And, you know, when I started — I come straight from the computer vision and machine learning community — when I started looking at this pipeline, I was like: why are there so many steps? Why aren't we optimizing things end to end?
So obviously there's a real temptation to just wrap everything in a kernel: it's a very well-defined input-output function, and, like Karl alluded to, it's one that can be verified quite well, assuming you have the right data. I'm not going to be talking about this; I am going to talk about this — namely, building a deep learning kernel for the lidar pipeline. And the lidar pipeline is arguably the backbone of the perception system for most autonomous driving systems. So this is basically going to be the goal here: we're going to have a point cloud sample, and we're gonna have a neural network that takes that sample in and then generates 3D bounding boxes that are in the world coordinate system — it's like: 20 meters that way, it's two meters wide, so long, with this rotation and this orientation, and so on. So that's what this talk is about. I'm going to talk about PointPillars, which is a new method we developed for this, and nuScenes, which is a benchmark dataset we released.

Okay, so first, PointPillars. It's a novel point cloud encoder: what we do is learn a representation that is suitable for downstream detection. It's almost like the main innovation is the translation from a point cloud to a canvas that can then be processed by a similar architecture to one you would use on an image. And we show it outperforms, you know, all published methods on KITTI by a large margin, especially with respect to inference speed, and there's a preprint out and some code available if you guys want to play around with it.

The architecture that we're going to use looks something like this — and I should say most papers in this space use this architecture, so it's kind of a natural design. You have the point cloud, and at the top you have this encoder; that's where we introduce the point pillars, but, as I'll show you, you can have various types of encoders. After that, it feeds into a backbone, which is now a standard 2D convolutional backbone; you have a detection head, and you may or may not have a segmentation head as well. The point is that after the encoder, everything looks just like an image network — it's very similar to the SSD architecture or the R-CNN architecture.

So let's go into a little bit more detail. What you're given here is a range, in meters — say you want to model, you know, a 40-meter circle around the car, for example — you have a certain resolution of your bins, and then a number of output channels. The input is a set of pillars, and a pillar here is a vertical column, right? So you have M of those that are non-empty in this space, and you say a pillar P contains all the points — a lidar point being x, y, z, and intensity — and there are N_m points, indexed by m, in each pillar. That's just to say that it varies: it could be one single point at a particular location, it could be 200 points, and it's centered around its bin. The goal here is to produce a tensor of a fixed size: its height, which is, you know, range over resolution; its width, range over resolution; and then this parameter C, the number of channels. In an image, C would be three. We don't necessarily care about that — we call it a pseudo-image, but it's the same thing: a fixed number of channels that the backbone can operate on.
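A minimal numpy sketch of the discretization just described — bucketing (x, y, z, intensity) points into an H x W grid of vertical pillars and keeping only the non-empty ones. The ranges, resolution, and random points are illustrative assumptions; the real PointPillars hyperparameters differ.

```python
# Pillar discretization: group lidar points into vertical columns on an
# H x W grid; pillar occupancy varies from one point to hundreds.
import numpy as np

rng = np.random.default_rng(0)
x_range, y_range, res = (0.0, 40.0), (-20.0, 20.0), 0.5
H = int((x_range[1] - x_range[0]) / res)  # 80 bins
W = int((y_range[1] - y_range[0]) / res)  # 80 bins

# Synthetic (x, y, z, intensity) points standing in for a lidar sweep.
points = rng.random((10_000, 4)) * [40, 40, 3, 1] + [0, -20, 0, 0]

# Map each point to its pillar index on the grid.
ix = ((points[:, 0] - x_range[0]) / res).astype(int).clip(0, H - 1)
iy = ((points[:, 1] - y_range[0]) / res).astype(int).clip(0, W - 1)
pillar_id = ix * W + iy

# Group points by pillar; only non-empty pillars are kept, as in the talk.
order = np.argsort(pillar_id)
ids, starts = np.unique(pillar_id[order], return_index=True)
pillars = np.split(points[order], starts[1:])
print(f"{len(ids)} non-empty pillars; sizes range",
      min(len(p) for p in pillars), "to", max(len(p) for p in pillars))
```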
Here's the same thing without math: you have a lot of points, and you have this space, which you just grid up into these pillars — some are empty, some aren't. So, with this notation, let me give a little bit of a literature review. What people tend to do is take each pillar and divide it into voxels — so now I have a 3D voxel grid — and then you say: I'm gonna extract some sort of features for each voxel. For example, how many points are in this voxel, or what is the maximum intensity of all the points in this voxel? Then you extract features for the whole pillar: what is the max intensity across all the points in the whole pillar? All of these are hand-engineered functions that generate a fixed-length output, so what you can do is concatenate them, and the output is this tensor, X by Y by C.

Then VoxelNet came around, I'd say a year or so ago — maybe a little bit more by now. The first step is similar: you divide each pillar into voxels, and then you map the points in each voxel. The novel thing here is that they got rid of the feature engineering: they said, we'll map from a voxel to features using a PointNet. I'm not going into the details of a PointNet, but it's basically a network architecture that allows you to take a point cloud and map it to, again, a fixed-length representation; it's a series of 1D convolutions and max pooling layers. This is a very neat paper. So what they did is: okay, we apply that to each voxel, but now I end up with this awkward four-dimensional tensor, because I still have X, Y, Z from the voxels, and then I have this C-dimensional output from the PointNet. So then they have to consolidate this Z dimension through a 3D convolution, and now you achieve your X-Y-C tensor, so now you're ready to go. It's very nice in the sense that it's an end-to-end method, and they show good performance, but at the end of the day it was very slow — I think it got like 5 Hz runtime — and the culprit here is this last step: the 3D convolution is much, much slower than a standard 2D convolution.

All right, so here's what we did. We basically said: let's just forget about voxels. We'll take all the points in the pillar and put them straight through a PointNet. That's it. Just that single change gave a 10- to 100-fold, you know, speedup over VoxelNet. And then we simplified the PointNet: a PointNet can have several layers and several modules inside it, so we simplified it to a single 1D convolution and max pooling layer. And then we showed you can get a really fast implementation by taking all your pillars that are not empty, stacking them together into a nice dense tensor — with a little bit of padding here and there — and running the forward pass as a single 2D convolution with a one-by-one kernel. So the final encoder runtime is now 1.3 milliseconds, which is really, really fast.

So the full method looks like this: you have the point cloud, you have this Pillar Feature Net — which is the encoder, with the different steps there — that feeds straight into the backbone and your detection heads, and there you go. So it's still a multi-stage architecture, but of course the key is that all the steps are fully parameterized: we can backpropagate through the whole thing and learn it.
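Here is a small numpy sketch of that simplified per-pillar PointNet: one shared linear layer (equivalent to the one-by-one convolution over the stacked pillar tensor), a max over the points in each pillar, then a scatter back into the pseudo-image that the 2D backbone consumes. Shapes, masking, and the random pillar locations are illustrative assumptions, not the paper's exact implementation.

```python
# Simplified pillar encoder: shared linear layer (a 1x1 conv over the
# stacked tensor) + max-pool over points, scattered into an H x W x C image.
import numpy as np

P, N, D, C, H, W = 1200, 32, 9, 64, 80, 80  # pillars, pts/pillar, feature dims
rng = np.random.default_rng(0)

stacked = rng.standard_normal((P, N, D))     # pillars padded/truncated to N pts
mask = rng.random((P, N)) < 0.7              # which slots hold real points
weights = rng.standard_normal((D, C)) * 0.1  # shared 1x1-conv weights

feat = np.maximum(stacked @ weights, 0.0)    # linear + ReLU, applied per point
feat[~mask] = -np.inf                        # padding must not win the max
pillar_feat = feat.max(axis=1)               # max-pool over points -> (P, C)

# Scatter back to the dense pseudo-image at each pillar's grid location.
pillar_xy = rng.integers(0, H * W, size=P)   # hypothetical pillar locations
pseudo_image = np.zeros((H * W, C))
pseudo_image[pillar_xy] = pillar_feat
pseudo_image = pseudo_image.reshape(H, W, C)
print(pseudo_image.shape)                    # (80, 80, 64)
```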
Putting these things together, these are the sort of results we got on the KITTI benchmark. If you look at the car class, we actually got the highest performance — this is, I think, the bird's-eye-view metric — and we even outperformed the methods that relied on lidar and vision, and we did that running at, you know, a little bit over 60 Hz. And like I said, this is the bird's-eye view; we can also measure on the 3D benchmark, and we get very similar performance. So, you know, car did well, cyclist did well; on pedestrian there were one or two fusion methods that did a little bit better, but in aggregate, on the top left, we ended up on top. And I put a little asterisk here: this is compared to published methods at the time of submission. There's so much happening so quickly — there are tons of submissions on the KITTI leaderboard that are completely anonymous, where we don't even know what the input was or what methods they used — so we only compared to published methods.

Here are some qualitative results. Just for visualization, you can project them into the image: the gray boxes are the ground truth, and the colored ones are the predictions. And, yeah, some challenging cases. We have, for example, the person right there — that's a person with a little stand that gets interpreted as a bicycle. We have this man on the ladder, which is an actual annotation error: we detected him as a person, but he wasn't annotated in the data. Here's a young child on a bicycle that didn't get detected — so that's, you know, that's a bummer.

Okay, so that was KITTI. Then I just wanted to show you guys that, of course, we can run this on our vehicle. This is a rendering where we just deployed the network on the full 360-degree sensor suite — the input is still the lidar sweeps, just projected into the images for visualization — and again, no tracking or smoothing applied here; every single frame is analyzed independently. See those arrows sticking out? That's the velocity estimate. So we actually show how you can accumulate multiple point clouds into this method, and then you can start reasoning about velocity as well.

So the second part I want to talk about is nuScenes, which is a dataset that we have published. All right, so what is nuScenes? It's one thousand twenty-second scenes that we collected with our development platforms — it's the same platform that Karl showed, sort of a previous-generation platform, the Zoe vehicle — so it's the full automotive sensor suite; the data is registered and synced in a 360-degree view, and it's also fully annotated with 3D bounding boxes. I think there are over 1 million 3D bounding boxes, and we actually make this freely available for research, so you can go to the nuScenes website right now and download a teaser release, which is 100 scenes; the full release will be in about a month. And the motivation is straightforward: the whole field is driven by benchmarks, and without ImageNet it might be the case that none of us would be here, right? Because they may never have been able to write that first paper and sort of start this whole thing going. Looking at 3D, I looked at the KITTI benchmark, which is truly groundbreaking — I don't want to take anything away from it — but it was becoming outdated: they don't have full 3D view, they don't have any radar. So I think this offers the opportunity to sort of push the field forward a little bit.
And just as a comparison: this is sort of the most similar benchmark, and really the only one that you can directly compare to is KITTI. There are other datasets that have maybe lidar only — and tons of datasets that have image only, of course — but it's quite a big step up from KITTI. Some details: you see the layout, with the radars along the edge, all the cameras on the roof, and the top lidar, and some of the respective fields of view; this data is all on the website. The taxonomy: we model several different subcategories of pedestrians, several types of vehicles, some static objects — barriers, cones — and then, in addition, a whole bunch of attributes on the vehicles and on the pedestrians.

All right, so without further ado, let's just look at some data. This is one of the thousand scenes. All I'm showing here is just playing the frames one by one from all the images, and again, the annotations are living in the world coordinate system: they are full 3D boxes, and I've just projected them into the image. And that's what's so neat: we're not really annotating the lidar or the camera or the radar; we're annotating the actual objects, putting them in a world coordinate system, and giving you all the transformations, so you guys can play around with it how you like. Just to show that — because everything is registered — I can now take the lidar sweep and just project it into the images at the same time. Here I'm showing the points colored by distance, so now you have some sort of sparse depth measurement on the images — a distance measurement, sorry. So that's all I wanted to show there. Thank you.
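Since everything is registered, projecting those world-coordinate annotations or lidar points into a camera is just a rigid transform plus pinhole intrinsics. Here is a generic numpy sketch of that step — textbook projection math under assumed transforms and intrinsics, not the actual nuScenes devkit API.

```python
# Project world-frame 3D points (lidar returns or box corners) into an image
# given a world->camera transform T and camera intrinsics K.
import numpy as np

def project_to_image(pts_world, T_cam_from_world, K):
    """pts_world: (N, 3). Returns pixel coords and depths for points in front."""
    pts_h = np.hstack([pts_world, np.ones((len(pts_world), 1))])
    pts_cam = (T_cam_from_world @ pts_h.T).T[:, :3]  # into the camera frame
    pts_cam = pts_cam[pts_cam[:, 2] > 0.1]           # drop points behind camera
    uv = (K @ pts_cam.T).T
    return uv[:, :2] / uv[:, 2:3], pts_cam[:, 2]     # pixels, depths

# Assumed intrinsics and an identity extrinsic, purely for the demo.
K = np.array([[1266.0, 0.0, 800.0], [0.0, 1266.0, 450.0], [0.0, 0.0, 1.0]])
T = np.eye(4)

rng = np.random.default_rng(0)
pts = rng.random((100, 3)) * [10, 10, 40] + [-5, -5, 5]
pixels, depth = project_to_image(pts, T, K)
print(pixels.shape, depth.min())  # e.g., color the pixels by depth
```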
Hi — I was really interested in your discussion around validation, and particularly continuous development, that sort of thing. My question was basically: is this nuScenes dataset enough to guarantee that your model is going to generalize to unseen data and, you know, not hit pedestrians and that stuff, or do you have other validation that you need to do?

No, no — I mean, nuScenes is purely an academic effort. We want to share our data with the academic community to drive the field forward. We're not making any claims that this is somehow a sufficient dataset for making the safety case; it's a small subset of our data.

Yeah, I would say — you know, obviously my background is in the academic world — one of the hardest things was always collecting data, because it's difficult and expensive. So we had access to a dataset like that, which was expensive to collect and annotate, but which we thought we would make available, because we hoped that it would spark academic interest, and smart people like the people in this room coming up with new and better algorithms, which could benefit the whole community — and then maybe some of them even want to come work with us at Aptiv. So, not totally — there's a little bit of self-interest there — but it wasn't intended to be for validation; it was more for research. To give you a sense of the scale of validation: there was one quote there, you know, saying you've got to drive 275 million miles or more, depending on the certainty you want to impose. To date, as an industry, we've driven about twelve to fourteen million miles — summed over all participants, in autonomous mode — over hundreds of different builds of code and in many different environments. So this would now be saying you're supposed to drive hundreds of millions of miles in a particular environment, on a single build of code, on a single platform. Now, obviously we're probably not going to do that. What we'll end up doing is supplementing the driving with quite a lot of simulation, and then other methodologies, to convince ourselves that we can ultimately make a statistical argument for safety. So there will be use of datasets like this: we'll be doing lots of regression testing on supersized versions of datasets like this, or kind of morally equivalent versions, to test different parts of the system — not just classification, but different aspects of the system: motion planning, decision-making, localization, all aspects of the system — and then augment that with on-road driving, and augment that with simulation. So the safety case is really quite a bit broader, unfortunately, than any single dataset would allow you to speak to.

From an industrial perspective, what do you think 5G can offer for autonomous vehicles?

5G — yeah, it's an interesting one. Well, these vehicles are connected; that's a requirement, certainly when you think about operating them as a fleet. When the day comes when you have an autonomous vehicle that is personally owned — and that will come at some point in the future — it may or may not be connected; it will almost certainly be connected too. But when you have a fleet of vehicles, and you want to coordinate the activity of that fleet in a way that, you know, maximizes the efficiency of that network, that transportation network, they're certainly connected. The requirements of that kind of connectivity are fairly relaxed if you're talking about just passing back and forth the position of the car and maybe some status indicators: are you in autonomous mode or manual mode, are all systems go, or do you have a fault code, and what is it. Now, there are some interesting requirements that become a little bit more stringent if you think about what we call teleoperation, remote operation of the car: the case where, if the car encounters a situation it doesn't recognize, can't figure out, gets stuck or confused, it may kind of phone a human operator who's sitting remotely to intervene. In that case, that human operator will want to have some situational awareness, and there may be a demand for high bandwidth, low latency, and high reliability, of the sort that maybe 5G is better suited to than 4G or LTE or whatever you've got. Broadly speaking, we see it as very nice to have, but like any infrastructure, we understand that it's gonna arrive on a timeline of its own and be maintained by someone who's not us, so it's very much outside our control. And so, for that reason, we design the system such that we don't rely on the coming 5G wave, but we'll certainly welcome it when it arrives.

You said you have a presence in 45 countries; did you observe any interesting patterns from that? Like, was the same self-driving car model that is deployed in Vegas as well as Singapore able to perform equally well in both Vegas and Singapore?

To speak to your question about country-to-country variation: we touched on that for a moment in the validation discussion, but obviously driving in Singapore and driving in Vegas is pretty different. I mean, you're on the other side of the road, for starters, but there are different traffic rules, and — it's sort of underappreciated — people drive differently; there are slightly different traffic norms.
So one of the things — well, if anyone was in this class last year, my co-founder Emilio gave a talk about something we call Rulebooks, which is a structure that we've designed around what we call the driving policy, or the decision-making engine. It tries to admit, in a general and fairly flexible way, the ability to reprioritize rules, reassign rules, change weights on rules, to enable us to drive in one community and then another in a fairly seamless manner. To give you an example: say you're an autonomy engineer who was tasked with writing the decision-making engine. You decided: I'm gonna do a finite-state architecture, I'm gonna write down some transition rules, I'm gonna do them by hand — it's gonna be great. And you did that for right-hand driving, and your boss came in and said: oh yeah, next Monday we're gonna be in left-hand driving, so flip all that and get it ready to go. That could be a huge pain to do, because, generally speaking, you're doing it manually, and then it's very difficult to validate — to ensure that the outputs are correct across the entire spectrum of possibilities. So we wanted to avoid that, and — long story short — we actually quite carefully designed the system such that we can scale to different cities and countries, and one of the ways you do that is by thinking carefully about the architectural design of the decision-making engine.

But it is, you know, quite different. These four cities I mentioned, which are our primary sites — Boston, Pittsburgh, Vegas, and Singapore — span a wide spectrum of driving conditions. I mean, everybody knows Boston, which is pretty bad. Vegas is warm-weather, mid-density urban — but it's Vegas, so, I mean, all kinds of stuff. And then Singapore is interesting: perfect infrastructure, good weather, flat, people generally speaking obey the rules, so it's kind of close to the ideal case. So that exposure to this different spectrum of data — I think, I'll speak for Oscar, maybe — is pretty valuable; I know for other parts of the development team it's quite valuable.

Singapore is ideal except for the constant construction zones: every time you drive out there's a new construction zone, so we've had to focus a lot of work on construction zone detection in Singapore. And the torrential rain. Yeah, and the jaywalkers, right — they do jaywalk. So other than that, it's perfect.

So which country is best equipped? It's a really good question. Well, it's interesting, because there are other dimensions. When we look at which countries are interesting to us to be in as a market, there are the infrastructure conditions; there are the driving patterns and properties, the density — you know, is it Times Square at rush hour, or is it Dubuque, Iowa — and there's the regulatory environment, which is incredibly important: you may have a perfectly well-suited city from a technical perspective, and they may not allow you to drive there. So it's really all of these things put together. We kind of have a matrix: we analyze which cities check these boxes and assign them scores, and then try to understand also the economics of that market. Does that city check all these boxes, but no one there is using mobility services, so there's no opportunity to actually generate revenue from the service? You factor in all of those things.
Yeah, and I think one thing to keep in mind — it's always the first thing I tell candidates when I interview them — is that there's a huge advantage to the business model we're proposing, right? Having a service, we can choose: even if we commit to some city, we can select the routes that we feel comfortable with, and we can roll it out sort of piece by piece. We can say: okay, we don't feel comfortable driving at night in this city yet, so we just won't accept any rides then. So there's that decision space as well.

Hi, thank you very much for coming and giving us this talk today; it was very interesting. I have a question which might reveal more about how naive I am than anything else. I was comparing your PointPillars approach to the earlier approach — the voxel-based approach — to interpreting the lidar results. With the voxels, you had a four-dimensional tensor that you were starting with, and with PointPillars you only have three dimensions: you're throwing away the Z, as I understood it. When you do that, are you concerned that you're losing information about potential occlusions or transparencies or semi-occlusions? Is this a concern?

I think — so I may have been a little bit sloppy there. We're certainly not throwing away the Z. What we're saying is that we're learning the embedding of the Z dimension jointly with everything else. VoxelNet, if you will — when I first saw that paper, I felt they felt the need to spoon-feed the network a little bit and say: let's learn everything, you know, stratified in this height dimension, and then we'll have a second step where we learn to consolidate that into a single vector. We just said: why don't we just learn those things together?

Yeah, thanks for the talk. I have a question for Karl. You mentioned that if people make a change to the code, we may need another validation or not. So, I work in the industry of nuclear power — we do nuclear power simulations — and when we make any change to our simulation code, to commercialize it we need to submit a request to the NRC, which is the Nuclear Regulatory Commission. So, in your opinion, do you think for self-driving we need third-party validation or not? Should that be a third party, or is it just self-check?

Yeah, that's a really good question. So, I don't know the answer. Let me put it this way: I would not be surprised either way — if the automotive industry ended up with third-party regulation or oversight, or if it didn't. And I'll tell you why. There are great precedents for what you just described — nuclear, aerospace: there are external bodies who have deep technical competence, who can come in, do investigations, impose strict regulation or advise regulation, and who can partner on or define requirements for certification of various types. The automotive industry has largely been self-certifying. There's an argument, which is certainly not unreasonable, that you have a real alignment of incentives within the industry and with the public to be as safe as possible — simply put, the cost of a crash is enormous, economically, socially, everything else. But whether it continues along that path, I couldn't tell you. It's an interesting space, because it's one where the federal government is actually moving very, very quickly — I would say carefully, too, not overstepping and not trying to impose too much regulation around an industry that has never generated a dollar of revenue and is still quite nascent.
But if you would have told me a few years ago that there would be very thoughtfully defined draft regulatory guidelines — or advice, let's say; it's not firm regulation — around this industry, I probably wouldn't have believed you. In fact, that exists: there's a third version that was released this summer by the Department of Transportation. So there's intense interest on the regulatory side. In terms of how far the process goes — in terms of formation of an external body — I think that really remains to be seen. I don't know the answer.

Thanks for your insightful talk. Looking at this slide, I'm wondering how easy and effective your trained models are to transfer across different lidars, and whether you need — for example, if it is snowing — specific training for your lidars to work effectively, or whether you don't see any issues in that regard?

No — I mean, I think the same rules apply to this method as to any other machine-learning-based method: you want to have support in your training data for the situations you want to deploy in. So if we have no snow in the training data, I wouldn't go and deploy this in snow. One thing I do like, after having worked so much with vision, though, is that the lidar point cloud is really easy to augment and play around with. For example, say you want to be robust to some really rare event — let's say there's a piano on the road. I really want to detect that, but it's hard, because I have very few examples of pianos on the road. Now, if you think about augmenting your visual dataset with that data, it's actually quite tricky — it's not that easy to render a photorealistic piano into your training data — but it is quite easy to do that in your lidar: you have a 3D model of your piano, you have the model of your lidar, and you can get a pretty accurate, fairly realistic point cloud return from that. So I like that part about working with lidar: you can augment it, you can play around with it. In fact, one of the things we do when we train this model is copy and paste objects from different samples: you can take a car that I saw yesterday, take the point returns on that car, and just paste it into your current lidar sweep. You have to be a little bit careful, right — and this was actually proposed by a previous paper — but we found that it was really useful. It sounds absurd, but it actually works, and it speaks to the ability to do that with lidar point clouds.

Okay, great — please give Karl and Oscar a hand again. Thank you so much.
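For reference, a minimal sketch of the lidar copy-paste augmentation Oscar describes in that last answer: take an object's point returns from one sweep and paste them into another, skipping pastes that would collide with existing geometry. The collision heuristic and all names here are illustrative assumptions, not the actual training pipeline.

```python
# Ground-truth "copy-paste" augmentation for lidar sweeps.
import numpy as np

rng = np.random.default_rng(0)

def paste_object(sweep, obj_points, min_sep=0.5):
    """sweep, obj_points: (N, 4) arrays of (x, y, z, intensity)."""
    # Crude check: only paste if no existing point is closer than min_sep
    # to the object's footprint (the "be a little bit careful" part).
    d = np.linalg.norm(sweep[:, None, :2] - obj_points[None, :, :2], axis=-1)
    if d.min() < min_sep:
        return sweep                       # skip paste; it would overlap
    return np.vstack([sweep, obj_points])  # augmented sweep

sweep = rng.random((5000, 4)) * [80, 80, 3, 1]
car = rng.random((200, 4)) * [4, 2, 1.5, 1] + [100, 100, 0, 0]  # far away
print(paste_object(sweep, car).shape)      # (5200, 4) when the paste succeeds
```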