Welcome back to 6.S094: Deep Learning for Self-Driving Cars. Today we will talk about autonomous vehicles, also referred to as driverless cars, autonomous cars, robo-cars.

First, the utopian view, where for many autonomous vehicles have the opportunity to transform our society in a positive direction. 1.3 million people die every year in automobile crashes globally; thirty-five to forty thousand die every year in the United States. So one opportunity that's huge — one of the biggest focuses for us here at MIT, for people who truly care about this — is to design autonomous systems, artificial intelligence systems, that save lives. Those systems help deal with, or take away, what NHTSA calls the four D's of human folly: drunk, drugged, distracted, and drowsy driving. Autonomous vehicles have the ability to take away drunk, drugged, distracted, and drowsy driving. Eliminating car ownership: taking shared mobility to another level. Eliminating car ownership, from the business side, is the opportunity to save people money and to increase mobility and access; removing ownership makes vehicles more accessible, because the cost of getting from point A to point B drops an order of magnitude. And the insertion of software and intelligence into vehicles makes the idea of transportation — the way we see moving from point A to point B — a totally different experience, much like with our smartphones: a personalized, efficient, and reliable experience.

Now for the negative view, the dystopian view. Eliminating jobs: any technology, throughout the history of human civilization, has created fear that jobs relying on the prior technology will be lost. This is a huge fear, especially in trucking, because so many people in the United States and across the world work in the transportation sector, and the possibility that AI will remove those jobs has potentially catastrophic consequences. Another idea we have to struggle with in the 21st century, with intelligent systems that aren't human beings being further and further integrated into our lives, is that a failure of an autonomous vehicle — even if such failures are much rarer, even if the vehicles are much safer — means there is a possibility that an AI algorithm, designed by probably one of the engineers in this room, will kill a person, where that person would not have died had they been in control of the vehicle. The idea of an intelligent system, in direct interaction with a human being, killing that human being is one that we have to struggle with at a philosophical, ethical, and technological level. Artificial intelligence systems — in popular culture and, to a lesser degree, in engineering — may not be ethically grounded at this time. Much of the focus of building these systems, as we'll talk about today and throughout this course, is on the technology: how do we make these things work? But of course, years or decades out, the ethical concerns start arising. For Rodney Brooks, one of the seminal people from MIT, those ethical concerns will not be an issue for another several decades, at least five decades, but they're still important. The thought continues: what is the role of AI in our society when the car gets to make a decision about human life? What is it basing that decision on, especially when it's a black box? What is the ethical grounding of that system? Does it conform with our social norms, or does its goal go against them?
And there are many other concerns. Security is definitely a big one: even a car that's not artificial-intelligence-based, a car that's software-based — most of the cars on the road today are run by millions of lines of source code. The idea that those lines of source code, written again by some of the engineers in this room, get to decide the life of a human being means that a hacker from outside the car can manipulate that code to also decide the fate of that human being. That's a huge concern for us.

From the engineering perspective, the truth is somewhere in the middle. We want to find the best, most positive way we can build these systems to transform our society, to improve the quality of life of everyone among us. But there's a grain of salt to the hype around autonomous vehicles. We have to remember, as we discussed in the previous lecture and as will come up again and again, that our intuition about what is difficult and what is easy — for deep learning, for autonomous systems — is flawed. If we use ourselves as the example: human beings are extremely good at driving. This will come up again and again; our intuition has to be grounded in an understanding of what is the source of data, what is the annotation, and what is the approach, the algorithm. So you have to be careful while using your intuition, extending it decades out, and making predictions, whether toward the utopian or the dystopian view. And as we talk about some of the advancements of companies working in this space today, you have to take what people say in the media, what the companies say, and what some of the speakers in this class say about their plans for the future and their current capabilities with that same grain of salt. One rule of thumb I can provide: when there's a promise of a future technology, of future vehicles, that is two years out or more, that's a very doubtful prediction; one that is within a year, as we'll give a few examples of today, still warrants skepticism. The real proof comes in actual testing on public roads, or — the most impressive, the reality of it — when it's available for consumer purchase.

I would like to use Rodney Brooks here, so it doesn't come from my mouth, though I happen to agree. His prediction: no earlier than 2032 will a driverless taxi service in a major US city provide arbitrary pick-up and drop-off locations fully autonomously — that's 14 years away — and by 2045 it will do so in multiple cities across the United States. So think about that: a lot of the engineers working in this space, a lot of the folks actually building these systems, agree with this idea, and that is the earliest I believe this will happen, and Rodney believes. But, as forecasters have been wrong throughout history, he could be wrong. This is a plot with time throughout the 20th century on the x-axis and the adoption rate, from zero to 100 percent, on the y-axis, of various technologies, from electricity to cars to the radio, the telephone, and so on. As we get closer to today, the number of years it takes for a technology to go from zero to one hundred percent adoption is getting shorter and shorter and shorter. As a society, we're better at throwing away the technology of old and accepting the technology of the new. So if a brilliant idea to solve some of the problems we're discussing comes along, it could change everything overnight.

So let's talk about different approaches to autonomy. We'll talk about sensors afterwards, we'll talk about the companies playing in this space, and then we'll talk about AI, the actual algorithms, and how they can help solve some of the problems of autonomous vehicles.
Levels of autonomy: here's a taxonomization of levels of autonomy that is useful for initial discussion, for legal discussion, for policymaking, and for blog posts and media reports, but that is not useful, I would argue, for the design and engineering of the underlying intelligence and of the system viewed from a holistic perspective — the entire thing creating an experience that is safe and enjoyable. So let's go over those levels, the six levels, as presented in SAE report J3016, the most widely accepted taxonomization of autonomy. No automation at level 0. Level 1 and level 2 are increasing levels of automation: level 1 is cruise control; level 2 is adaptive cruise control plus lane keeping. Level 3 — I don't know what level 3 is. There are a lot of people who will explain that level 3 is conditional automation, meaning it's constrained to certain geographic locations; from an engineering perspective, I'm personally a little bit confused about where that stands, and I'll try to redefine how we should view automation. Level 4 and level 5 are high and full automation. Level 4 is when the vehicle can drive itself fully part of the time: there are certain areas in which it can take care of everything, no matter what, with no human input or safekeeping required. Level 5 automation is when the car does everything, everything.

I would argue that those levels aren't useful for designing systems that actually work in the real world. I would argue that there are two systems. But first, a starting point: every system, to some degree, involves a human. It starts with manual control from a human — a human getting in the car and electing to do something. That's manual control. What we're talking about is when the human engages the system: when the system is first available and the human chooses to turn it on. That's when we have two kinds of AI systems: human-centered autonomy, where the human is needed, is involved; and full autonomy, where the AI is fully responsible for everything. From the legal perspective, that means A2, full autonomy, means the car — the designer, the AI system — is liable, is responsible; and for human-centered autonomy, the human is responsible.

What does this practically mean? For human-centered autonomy — and we'll discuss examples of all of these — when human interaction is necessary, the question becomes: how often is the system available? Is it available only in traffic conditions, so bumper-to-bumper traffic? Is it available on the highway? Is it sensor-based, as in the Tesla vehicle, meaning that based on the visual characteristics of the scene the vehicle is confident enough to make perception and control decisions? The other factor, not discussed enough and, I think, poorly and imprecisely discussed when it is, is the number of seconds given to the driver — not guaranteed, but provided as a sort of feature to the driver — to take over. In the Tesla vehicle, in all vehicles on the road today, that time is zero: zero seconds are guaranteed, zero seconds are provided. There's some room — sometimes it's hundreds of milliseconds, sometimes it's multiple seconds — but really there's no standard for how many seconds you get to, say, wake up and take control. Then there's teleoperation, something some of the companies will mention they're playing with: a human being remotely controlling the vehicle.
Being able to take over control of the vehicle remotely, when the car is not able to control itself — support by a human who is not inside the car — is a very interesting idea to explore. But on the human-centered autonomy side, all of those features are not required, not guaranteed. The human driver inside the car is always responsible at the end of the day; they must pay attention to the degree required to take over when the system fails, and under this level of autonomy the system will fail at some point. That is the point: this is a collaboration between human and robot, where the system will fail and the human has to catch it when it does. Full autonomy is when the AI is fully responsible. Now, some companies, in their marketing material and on the PR side of things, might present that there is a significant degree of autonomy; if they're talking about L3 or L4 or L5, you have to read between the lines. You're not allowed to have teleoperation: if a human is remotely operating the vehicle, a human is still in the loop, a human is still involved — it's still a human-centered autonomy system. And you don't get the ten-second rule: just because you give the driver ten seconds to take control doesn't somehow remove liability for you. If you say, "that's it, as an AI system I can't resolve, can't deal with, can't control the vehicle in this situation, and you have ten seconds to take over," that's not good enough. The driver might be sleeping; the driver may have had a heart attack; they may not be able to control the vehicle. Fully autonomous systems must find safe harbor: they must bring you, full stop, from point A to point B, where that point B might be your desired destination or might be a safe parking lot, but it has to be a safe location. That is a clear definition of the two systems. And the human, of course — at least under our current conception of artificial intelligence in cars today — always overrides the AI system. In the general case, the human gets to choose to take control, and the AI can't take control from the human, except when danger is imminent, meaning sudden crashes, as in automatic emergency braking events. We're not yet ready, as a society, for AI systems to say, "no, no, you're drunk, you can't drive."

So, beyond the traditional levels from level 0 to level 5: the starting point is level 0, no automation — all cars start here. Level 1, level 2, and level 3, I would argue, fall into human-centered autonomy systems, A1, because they do involve some degree of a human. Then L4 and L5 — to some degree, there's some crossover — fall into full autonomy, even though with L4, with Waymo, as you can ask on Friday, and with anyone playing in this space — Cruise, Uber — there's very often a human driver involved. One of the huge accomplishments of Waymo over the past month, an incredible accomplishment: in Phoenix, Arizona, the car drove without a driver, meaning there was no safety driver to catch the car, no engineer or staff member there to catch it. A human being who doesn't work for Google or Waymo got into that car and got from point A to point B without a safety driver. That's an incredible accomplishment, and that particular trip was a fully autonomous trip — that is full autonomy. When there's no human to catch the car and no teleoperation, it's a full-autonomy, A2, system: you do nothing but ride along. A human-centered autonomy system is when you have some control.
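To make that framing concrete, here is a toy sketch, my own illustration rather than anything from the SAE document: the level descriptions are paraphrased, and the grouping into the two systems follows the argument above, not SAE's own text.

```python
# Hypothetical mapping of SAE J3016 levels onto the two-system framing argued above.
# Descriptions are paraphrased; the A1/A2 grouping is the lecture's argument, not SAE's.
SAE_LEVELS = {
    0: ("no automation", "manual control", "human"),
    1: ("driver assistance, e.g. cruise control", "human-centered autonomy (A1)", "human"),
    2: ("partial automation, e.g. ACC + lane keeping", "human-centered autonomy (A1)", "human"),
    3: ("conditional automation", "human-centered autonomy (A1)", "human"),
    4: ("high automation in a constrained domain", "full autonomy (A2)", "AI system / designer"),
    5: ("full automation everywhere", "full autonomy (A2)", "AI system / designer"),
}

def who_is_responsible(level: int) -> str:
    description, system_type, responsible = SAE_LEVELS[level]
    return f"L{level} ({description}): {system_type}, {responsible} is responsible"

print(who_is_responsible(2))  # the human must catch the system when it fails
print(who_is_responsible(4))  # the system must find safe harbor on its own
```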
So, the two paths for autonomous systems, A1 and A2: on the left is A1, human-centered; on the right is A2, full autonomy. Blue, from the artificial intelligence perspective, means easier; red means harder. Easier means we do not have to achieve one hundred percent accuracy; harder means that everything that falls short of one hundred percent accuracy, no matter how small, has the potential of costing human lives and huge amounts of money for companies. We'll discuss later in the lecture the algorithms behind each of these methods, the left and the right, but this summarizes the two approaches. Localization and mapping — the car determining where it's located — is easy for human-centered autonomy. It still has to do the perception: it has to localize itself within the lane, it has to find all the neighboring pedestrians and vehicles in order to control the vehicle to some degree, but because the human is there, it doesn't have to do so perfectly; when it fails, a human is there to catch it. Scene understanding: perceiving everything in the environment, whether from camera, lidar, radar, or ultrasonic. Planning: whether it's just staying within the lane, or, for adaptive cruise control, controlling the longitudinal movement of the vehicle, or changing lanes as in Tesla Autopilot and higher degrees of automation — all of those movement-planning decisions can be made autonomously when the human is there to catch the system. It's easier because you're allowed to be wrong: rarely, but wrong. The hard part is getting the human-robot interaction piece right. That's next Wednesday's lecture, where we'll discuss how deep learning can be used, first, to perceive everything about the driver and, second, to interact with the driver. That part is hard because you can't screw it up: you have to make sure you help the driver know where your flaws are so they can take over, and if the driver is not paying attention, you have to bring their attention back to the road, back to the interaction. You have to get that piece right, because for a flawed system — one that's only rarely flawed — the rarity is the challenge; you have to get the interaction right. And then the final piece: communication. The fully autonomous vehicle must communicate extremely well with the external world — with the pedestrians, the jaywalkers, the humans in this world, the cyclists. That communication piece, at least the part of it that makes for a safe and enjoyable driving experience, is extremely difficult. For the Waymo vehicle, I wish them luck getting from point A to point B if they come to Boston, because pedestrians will take advantage. A vehicle must assert itself in order to navigate Boston streets, and that assertion is communication; that piece is extremely difficult. For a Tesla vehicle, for a human-centered autonomy vehicle, L2 or L3, the way you deal with Boston pedestrians is you take over, roll down the window, yell something, and then speed up. Getting an artificial intelligence system to actually accomplish something like that — as we'll discuss on the ethics side and the engineering side — is extremely difficult.

That said, most of the literature in the human factors field and in the autonomous vehicle field — anyone who has studied autonomy in aviation and in vehicles — is extremely skeptical about the human-centered approach. They think it's deeply irresponsible.
The argument is that when you give human beings a technology that will take control part of the time, they will get lazy, they will take advantage of that technology, they will overtrust it, they'll assume it will always work perfectly. Extended further and further, this idea means that the better the system gets — the better the car gets at driving itself — the more the humans will sit back and be completely distracted, and they will not be able to re-engage themselves in order to safely catch the system when it fails. This is Chris Urmson, the founder of the Google self-driving car program and now a co-founder — one of the other co-founders, Sterling Anderson, is a speaker in this class next Friday — of a company called Aurora, a startup. He was one of the big proponents of, or I should say opponents of, the idea that human-centered autonomy could work. They tried it: he has publicly spoken about the fact that at Google, in the early self-driving car program, they tried shared autonomy, they tried L2, and it failed, because their engineers — the people driving their vehicles — fell asleep. That's the belief people have, and we'll talk about why that may not be true; there's a fascinating truth in the way human beings can interact with artificial intelligence systems that may work in this case. As I mentioned, it's the human-robot interaction: building that deep connection of understanding and communication between human and machine.

This is what we believe happens — there are a lot of videos like this one, and it's fun, but it's also representative of what society believes happens when automation is allowed to enter the human experience of driving, where human life is at stake: that you become completely disengaged. It's kind of a natural thing to think, but the question is, does this actually happen? What actually happens on public roads? The amazing thing that people don't often talk about is that there are hundreds of thousands of vehicles on the road today equipped with Tesla Autopilot, with a significant degree of autonomy. That's data, that's information, so we can answer the question of what actually happens. Many of the people behind this team have instrumented 25 vehicles, 21 of which are Tesla Autopilot vehicles, recording everything about the driver — two HD cameras on the driver, a camera on the external roadway — and collecting everything about the car, including audio and vehicle state, pulling everything from the CAN bus, the kinematics of the vehicle, IMU, GPS — all of that information, now over 300,000 miles and over 5 billion video frames. As we'll talk about, we analyze with computer vision what can be extracted from that video of the driver: everything they're doing, the level of distraction, the allocation of attention, drowsiness, emotional state, hands on wheel versus hands off wheel, body pose, activity, smartphone usage — all the factors, all the things you would think would fall apart when you start letting autonomy into your life. We'll talk about what the initial reality is, and it should be inspiring and thought-provoking. As I said: three cameras, a single-board computer recording all the data, over a thousand machines in Holyoke, and distributed computation running the deep learning algorithms I've mentioned on these five-plus billion video frames, going from the raw data to actionable, useful information. The slides are up online if you'd like to look through them; I'll fly through some of them.
This is a video of one of thousands of trips we have in Autopilot in our data: a car driving autonomously a large fraction of the time on highways, from here to California, from here to Chicago, to Florida, and all across the United States. We take that data and use supervised and semi-supervised learning algorithms. The number of frames here is huge: for those who work in computer vision, five billion frames is several orders of magnitude larger than any dataset that people are actively annotating and working with in computer vision. We want to use that data to understand the behavior of what people are actually doing in the cars, and we want to use it to train the algorithms that do perception and control. A quick summary: over three hundred thousand miles; twenty-five vehicles — the colors here are true to the actual colors of the vehicles, a little fun fact — Tesla Model X, Model S, and now Model 3; five hundred plus miles a day and growing — most days in 2018 are now over a thousand miles a day. This is a quick GPS map: in red is manual driving across the Boston area, in blue-cyan is autonomous driving. This gives you a sense of the scope of this data: a huge number of miles of automated driving, several orders of magnitude larger than what Waymo is doing, what Cruise is doing, and what Uber is doing. The miles driven with Autopilot in this data confirm what Elon Musk stated: 33 percent of the miles are driven autonomously. This is a remarkable number. For those of you who drive, and for those of you who are familiar with these technologies, that is a remarkable adoption rate: 33 percent of miles driven in Autopilot means these drivers are getting use out of the system — it's working for them. That's an incredible number. It's also incredible because, under the decades of literature, from automation in aviation to automation in vehicles, to Chris Urmson and Waymo, the belief is that such high numbers are likely to lead to crashes, to fatalities, or at the very least to highly irresponsible behavior — drivers overtrusting the systems and getting into trouble.

We can run the glance classification algorithms — again, the actual algorithm is for next Wednesday's discussion. This is the algorithm that tells you the region the driver is looking at, comparing road, instrument cluster, left, rearview, center stack, and right. Does the allocation of glances change between Autopilot and manual driving? It does not appear to, in any significant, noticeable way — meaning you don't start playing chess, you don't get in the back seat to sleep, you don't start texting on your smartphone or watching a movie, at least in this dataset. There's promise here for the human-centered approach. The observation, to summarize this particular data, is that people are using it a lot: the percentage of miles, the percentage of hours, is incredibly high, at least relative to what would be expected of these systems, and, given that, there are no crashes and no near-crashes in Autopilot in this data. The road type is mostly highway, traveling at high speed. For mental engagement, we looked at 8,000 transfers of control from machine to human — human beings taking control of the vehicle, saying, "you know what, I'm going to take control now, I'm not comfortable with this situation," either because they're not comfortable or because they're electing to do something the vehicle is not able to do, like turning off the highway, making a right or left turn, stopping for a stop sign, these kinds of things. We looked at physical engagement as well, and, as I said, the allocation of glances remains the same.
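As a rough illustration of what a glance-region classifier like the one just described might look like — a minimal PyTorch sketch of my own, not the actual model used in this study — a small CNN maps a cropped driver-face image to one of the six glance regions:

```python
import torch
import torch.nn as nn

GLANCE_REGIONS = ["road", "instrument_cluster", "left", "rearview", "center_stack", "right"]

class GlanceNet(nn.Module):
    """Toy CNN: cropped driver-face image -> one of six glance regions."""
    def __init__(self, n_classes: int = len(GLANCE_REGIONS)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3, 64, 64) face crops from the driver-facing camera
        return self.classifier(self.features(x).flatten(1))

# usage on a dummy batch of face crops
logits = GlanceNet()(torch.randn(8, 3, 64, 64))
print(GLANCE_REGIONS[logits.argmax(dim=1)[0].item()])
```

In practice a model like this would be trained on annotated face crops; the point here is only the shape of the problem, a six-way classification over driver-camera frames.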
What do we take from this? It says something that I'd like to really emphasize as we talk about autonomous vehicles in this class, with the guest speakers, who are all on the other side. I'm representing the human-centered side; all of our speakers are focused on the full autonomy side, because that's the side roboticists know how to solve, that's the fascinating, algorithm-nerd side, and it's the side I love as well. It's just that my belief stands: solving the perception-control problem is extremely difficult and two to three decades away, so in the meantime we have to utilize human-robot interaction to actually bring these AI systems onto the road and have them operate successfully. And the way we do that, counterintuitively, is by letting the artificial intelligence systems reveal their flaws. One of the most endearing things human beings can do with each other, with friends, is reveal their flaws to each other. Now, from an automotive perspective, from a company perspective, it's perhaps not appealing for an AI system to reveal what it sees and doesn't see about the world, where it succeeds and where it fails — but that is perhaps exactly what it needs to do. In the case of Autopilot, the very limited but, I believe, successful way it currently does that is by allowing you to use Autopilot basically anywhere. So what people are doing is trying to engage Autopilot in places where they really shouldn't: rural roads, curvy, with terrible road markings, in heavy rain, in snow, with lots of cars driving at high speeds all around. They turn Autopilot on to understand, to experience, the limitations of the system, to interact with it. That human-robot interaction is tactile: by turning it on and seeing whether it's going to work here and how it's going to fail, with the human always there to catch it. That interaction, that communication, that intimate understanding is what creates successful integration of AI in the car before we're able to solve the full autonomy puzzle: learn the limitations by exploring. It starts with this guy, and hundreds of others if you search YouTube for "first time with autopilot": the amazing experience of the direct transfer of control of your life to an artificial intelligence system, in this case the Tesla Autopilot system. This is why, in the human-centered camp of autonomy, I believe that autonomous vehicles can be viewed as personal robots with which you build a relationship, where human-robot interaction is the key problem, not perception and control, and where the flaws of both humans and machines must be clearly perceived and communicated — perceived, because we use computer vision algorithms to detect everything about the human, and communicated, because through the displays of the car, or even through voice, it has to be able to reveal when it doesn't see different aspects of the scene. With the human-centered approach, we can then focus on the left side, the perception and control side — perceiving everything about the external environment and controlling the vehicle — without having to worry about being 99.99999 percent correct, approaching one hundred percent correct, because in the cases where it's extremely difficult, we can let the human catch the system: we can reveal the flaws and let the human take over when the system can't.

So let's get to the sensors, the sources of raw data that we get to work with. There are three: there are cameras — image sensors, RGB and infrared, visual data; there are radar and ultrasonic; and there's lidar.
Let's discuss what these sensors really are: their strengths, their weaknesses, and how they can be integrated together through sensor fusion. Radar is the old, trusted friend, the sensor that's commonly available in most vehicles that have any degree of autonomy; on the left is a visualization of the kind of data a high-resolution radar is able to extract. It's cheap. Both radar, which works with electromagnetic waves, and ultrasonic, which works with sound waves, send out a wave, let it bounce off obstacles, and, knowing the speed of that wave, calculate the distance to the obstacle. Radar does extremely well in challenging weather — rain, snow. The downside is low resolution compared to the other sensors we'll discuss, but it is the one that's most reliable and most used in the automotive industry today, and in sensor fusion it's always there. Lidar, visualized on the right: the downside is that it's expensive, but it produces extremely accurate depth information and a high-resolution map of the environment with 360 degrees of visibility. It has some of the big strengths of radar in terms of reliability, but with much higher resolution and accuracy; the downside is cost. Here is a visualization comparing the two, of the kind of information you get to work with: the density and quality of the information from lidar is much higher, and lidar has been the successful source of ground truth, the reliable sensor relied upon by vehicles that don't care about cost. And camera — the one most people here should be passionate about, because machine learning, deep learning, has the most ability to make a significant impact there. Why? First, it's cheap, so it's everywhere. Second, it's the highest resolution, so it carries the most densely packed information, and that information can be learned from and used to infer an interpretation of the external scene — that's why it's the best source of data for understanding the scene. The other reason it's great for deep learning is the sheer amount of data involved: there are many orders of magnitude more driving data available from cameras, visible light or infrared, than from lidar. Our world is designed for visible light, and our eyes work in ways similar to cameras, at least crudely, so the source data is similar: the lane markings, the traffic signs, the traffic lights, the other vehicles, and the pedestrians all operate with each other in this RGB space, in terms of visual characteristics. The downsides are that cameras are bad at depth estimation — it's noisy and difficult, even with stereo vision, to estimate depth relative to lidar — that they're not good in extreme weather, and that, at least for visible-light cameras, they're not good at night.
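The ranging principle radar and ultrasonic share — emit a wave, time the echo, multiply by the wave's speed, and halve — is simple enough to write down, and it also hints at why radar reaches far while ultrasonic is strictly short-range. A minimal sketch; the echo times below are made-up illustrative numbers, not measurements from the lecture:

```python
SPEED_OF_LIGHT_M_S = 3.0e8   # radar uses electromagnetic waves
SPEED_OF_SOUND_M_S = 343.0   # ultrasonic uses sound waves in air (~20 C)

def echo_distance_m(round_trip_time_s: float, wave_speed_m_s: float) -> float:
    """Distance to an obstacle from one echo: the wave travels out and back, so halve."""
    return wave_speed_m_s * round_trip_time_s / 2.0

# a 1-microsecond radar echo vs. a 1-millisecond ultrasonic echo
print(echo_distance_m(1e-6, SPEED_OF_LIGHT_M_S))   # 150.0 m
print(echo_distance_m(1e-3, SPEED_OF_SOUND_M_S))   # ~0.17 m
```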
Now compare the ranges. Here's a plot with range in meters on the x-axis and acuity on the y-axis, with ultrasonic, lidar, radar, and camera — a passive visual sensor — plotted. The range of cameras is the greatest. We're going to look at several different conditions; this first one is clear, well-lit conditions: during the day, no rain, no fog. Lidar and radar have a smaller range, under 200 meters, and ultrasonic sensors — used mostly for park assistance, blind-spot warning, these kinds of things — have terrible range; they're designed for high-resolution distance estimation at extremely close distances. Here, a little bit small: up top are the clear, well-lit conditions, the plot we just looked at; on the bottom are clear, dark conditions — a clear night, no rain, but it's dark — and on the bottom right is heavy rain, snow, or fog. Vision falls apart in terms of range and acuity under dark conditions and in rain, snow, or fog. Radar, our old trusted friend, stays strong: the same range, just under two hundred meters, at the same acuity; same with sonar. Lidar works well at night, but it does not do well with rain, fog, or snow — one of the biggest downsides of lidar, other than cost.

Here's another interesting way to visualize this that I think is productive for our discussion of which sensor will win out — is it the Elon Musk prediction of camera, or the Waymo prediction of lidar? In this kind of plot, which we'll look at for every sensor, the greater the radius of the blue region, the more successful that sensor is at a given feature, with the features arranged around the circle. For lidar: range is pretty good — not great, but pretty good; resolution is also pretty good; it works in the dark and in bright light, but it falls apart in snow; it does not provide color, texture, or contrast information; it's able to detect speed; but the sensor size, at least to date, is huge, the sensor cost, at least to date, is extremely high, and it doesn't do well in proximity, which is where ultrasonic shines. Speaking of which, ultrasonic, in the same kind of plot: it does well at proximity detection, it's the cheapest sensor of the four, the sensor size can be tiny, and it works in snow, fog, and rain, but its resolution is terrible, its range is nearly nonexistent, and it's not able to detect speed. That's where radar steps up: it's able to detect speed, it's also cheap, it's also small, but the resolution is very low, and, just like lidar, it's not able to provide texture or color information. Camera: the sensor cost is cheap, the sensor size is small, it's not good at close proximity, the range is the longest of all of them, the resolution is the best of all of them; it doesn't work in the dark; it works in bright light, but not always — one of the biggest downfalls of camera sensors is their sensitivity to lighting variation; and it doesn't work in snow, fog, or rain, so it suffers much like lidar there; but it provides rich, interesting textural information, the very kind that deep learning needs to make sense of this world.

So let's look at the cheap sensors — ultrasonic, radar, and camera — which is one approach: putting a bunch of those in a car and fusing them together. The cost there is low. One of the nice things about this visualization technique is that when the sensors are fused together, on the bottom, it gives you a sense of them working together to complement each other's strengths. And the question is whether the camera or lidar will win out for partial autonomy or full autonomy: on the bottom, this kind of visualization for a lidar sensor, and on top, for fused radar, ultrasonic, and camera. At least under these considerations, the fusion of the cheap sensors can do as well as lidar. Now, the open question is whether lidar, as this technology develops, can become cheap and its range can increase, because then lidar could win out: solid-state lidar, and a lot of developments at a lot of lidar startups, promise to decrease the cost and increase the range of these sensors. But for now we plow along with dedication on the camera front: the annotated driving data grows exponentially, and more and more people are beginning to annotate and study these particular driving perception and control problems.
The algorithms — supervised, semi-supervised, and generative networks — that we use to work with this data are also improving, so it's a race; and of course radar and ultrasonic are there to help.

So, the companies playing in this space, some of which are speaking here. Waymo: in April 2017 they exited their extensive, impressive testing process and allowed the first public riders in Phoenix. In November 2017, no safety driver — an incredible accomplishment for a company and for an artificial intelligence system. The car truly achieved full autonomy, under a lot of constraints, but it's full autonomy; it's an amazing step in the direction of full autonomy, much sooner than people would otherwise have predicted. And the miles: four million miles driven autonomously by November 2017, and growing quickly — growing in terms of "fully autonomous" driving, if I can say so cautiously, because most of those miles have a safety driver, so I would argue it's not full autonomy; but however they define full autonomy, it's four million miles driven — incredible. Uber is second on that list in terms of miles: they had driven two million miles autonomously by December of last year, 2017. The quiet player here — in the sense of not making any declarations of being fully autonomous, just quietly driving in a human-centered way, L2 — is Tesla, with over 1 billion miles driven in Autopilot. Over three hundred thousand vehicles today are equipped with Autopilot technology, with the ability to control the car laterally and longitudinally, and, if anyone believes the CEO of Tesla, there will be over 1 million such vehicles by the end of 2018. But no matter what, 300,000 is an incredible number, and 1 billion miles is an incredible number. Autopilot was first released in September 2014, one of the first such systems on the road. In October 2016 — and I count myself as one of the skeptics here — Tesla decided to let go of the incredible work done by Mobileye, now part of Intel, who had been designing their perception-control system. They decided to let go of it completely and start from scratch, using mostly deep learning methods, the Drive PX 2 system from NVIDIA, and eight cameras. That's the kind of boldness, the kind of risk-taking, that can come with naivety, but in this case it worked.

The Audi A8 system is going to be released at the end of 2018, and it's promising: one of the first vehicles promising what they're calling L3. The definition of L3, according to the head of automated driving at Audi, is that when the function operates as intended — if the customer turns the traffic jam pilot on (this L3 system is designed only for traffic jams, bumper-to-bumper traffic under 60 kilometers an hour), uses it as intended, and the car was in control at the time of an accident — the driver goes to the insurance company, the insurance company compensates the victims of the accident, and in the aftermath the insurer comes to Audi and Audi will pay them. So that means the car is liable. Under the definitions of L2 and L3, perhaps there is some truth to this being an L3 system, but the important thing here is that it's nevertheless deeply and fundamentally human-centered, because, as you see in this demonstration video with a reporter, the car, for a poorly understood reason, transfers control to the driver: it says, that's it, I can't take care of this situation, you take control. How much time do you have, in terms of seconds, before you really need to take over?
"Well, this is the new thing about level 3. With level 3, the system gives the driver a prompt to take over vehicle control ahead of time — in this case up to 10 seconds. So if the traffic jam situation clears up, or any failure in the system occurs, anything you might think of, the system still needs to be able to drive automatically, because the driver has this time to take over. You might ask what's new about this — why Audi is saying this is the first level 3 system worldwide on the market. When talking about these levels of automation, there's a classification which starts at level zero, which is basically the driver doing everything — there's no assistance, nothing — and then it gradually moves into partial automation. When we're talking about assistance functions like lane keeping and distance keeping, we're talking about level 2 assistance functions, which means the driver is obliged to permanently monitor the traffic situation, to keep their hands on the wheel — even though there's support and assistance — and to intervene immediately if anything is not quite right. You know that from lane assistance systems: when the steering is not keeping perfectly to the lane, you have to intervene and correct immediately. And that is the main difference now: you get a takeover request."

So let's talk about what that means. This is still a human-centered system; it still struggles with, and still must solve, the human-robot interaction problem. And there are many others playing in this space. On the full autonomy side: Waymo, Uber, GM Cruise, nuTonomy — whose CTO will speak here on Tuesday — Optimus Ride, Zenuity, Voyage — whose CEO will speak here next Thursday — and, not listed here, Aurora, whose founder will speak next Friday. On the human-centered autonomy side — the reason I'm speaking about it so much today is that we don't have any speakers for it; I'm the speaker — Tesla Autopilot has for several years now been doing incredible work. We're also working with Volvo Pilot Assist, which takes a different, more conservative but interesting approach; there's the Audi traffic jam assist, as I mentioned, in the A8 being released at the end of this year; the Mercedes Drive Pilot in the E-Class, an interesting vehicle that I got to drive quite a bit; the Cadillac Super Cruise in the CT6, which is very much constrained geographically to highway driving; and the loudest, proudest of them all, George Hotz and the Comma.ai Open Pilot — let's just leave that there.

So where can AI help? We'll get into the details in the coming lectures on each individual component; here I'd like to give some examples of the key areas, the problem spaces, where we can use machine learning to learn from data. Localization and mapping: being able to localize yourself in space — the very first question a robot needs to answer, "where am I?" Scene understanding: taking the scene in and interpreting it, detecting all the entities in the scene and the class of those entities, in order to then do movement planning, to move around those entities. And finally, driver state, the essential element for human-robot interaction: perceiving everything about the driver, and everything about the pedestrians, the cyclists, and the cars outside — the human element of those, the human perception side. So first, the "where am I" question: visual odometry using camera sensors, which is once again where deep learning can contribute most, the vision sensor being the most amenable to learning-based approaches.
Visual odometry is using the camera to localize yourself, to answer the "where am I" question. The traditional approach is SLAM: detect features in the scene and track them through time, from frame to frame, and from the movement of those features — thousands of features being tracked — estimate the location and orientation of the vehicle, or of the camera. Those methods, with stereo vision, first require taking the two camera streams, undistorting them, computing a disparity map from the different perspectives of the two cameras, and computing the matching between the two; then feature detection — SIFT, FAST, or any of the other non-deep-learning methods for extracting strong, detectable features that can be tracked from frame to frame — then tracking those features and estimating the trajectory and orientation of the camera. That's the traditional approach to visual odometry. In recent years, since 2015, but with the most success in the last year, there have been end-to-end deep learning approaches, with either stereo or monocular cameras. DeepVO is one of the most successful: the end-to-end method takes a sequence of images, extracts the essential features from each image with a CNN, and then uses an RNN, a recurrent neural network, to track the trajectory — the pose of the camera — over time: image to pose, end to end. Here's a visualization on the KITTI dataset using DeepVO, taking the video on the top right as input and estimating the position of the vehicle: in red is the estimate, produced, again, end to end with a CNN and an RNN, and in blue is the ground truth from the KITTI dataset. This removes a lot of the modular parts of SLAM, of visual odometry, and makes the pipeline end to end, which means it's learnable, which means it gets better with data — that's huge. Vision alone: this is one of the exciting opportunities for people working in AI — the ability to use a single sensor, and perhaps the most inspiring one, because that sensor is similar to our own, the sensor we ourselves use, our eyes, as the primary sensor to control a vehicle. That's really exciting, and the fact that vision, visible light, is the most amenable to deep learning approaches makes this a particularly exciting area for deep learning research.
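To make the end-to-end visual odometry idea concrete, here is a minimal, DeepVO-style sketch in PyTorch — my own illustration of the CNN-plus-RNN structure just described, not the published DeepVO architecture: consecutive frames are stacked, a CNN extracts features per frame pair, and an LSTM integrates them over time into a 6-DoF relative pose per step.

```python
import torch
import torch.nn as nn

class TinyVO(nn.Module):
    """Toy end-to-end visual odometry: stacked frame pairs -> CNN -> LSTM -> 6-DoF pose."""
    def __init__(self, hidden: int = 256):
        super().__init__()
        self.cnn = nn.Sequential(                       # encoder over 2 stacked RGB frames
            nn.Conv2d(6, 16, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.rnn = nn.LSTM(64, hidden, batch_first=True)
        self.pose = nn.Linear(hidden, 6)                # 3 translation + 3 rotation parameters

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, H, W); stack consecutive frames channel-wise
        pairs = torch.cat([frames[:, :-1], frames[:, 1:]], dim=2)   # (B, T-1, 6, H, W)
        b, t = pairs.shape[:2]
        feats = self.cnn(pairs.flatten(0, 1)).flatten(1).view(b, t, -1)
        out, _ = self.rnn(feats)
        return self.pose(out)                           # relative pose for each frame pair

# usage: a 5-frame clip yields 4 relative poses, which would be chained into a trajectory
poses = TinyVO()(torch.randn(1, 5, 3, 128, 416))
print(poses.shape)  # torch.Size([1, 4, 6])
```

Trained against ground-truth poses (as in KITTI), everything from pixels to pose is learned jointly, which is exactly the "learnable, gets better with data" property the lecture emphasizes.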
Scene understanding — of course, one could do a thousand slides on this. Traditionally, object detection of pedestrians and vehicles used a bunch of different types of classifiers and feature extractors, Haar-like features, and deep learning has basically taken over and come to dominate every aspect of scene interpretation: perception, understanding, tracking, recognition, classification, detection problems. And audio — we can't forget audio. We can use audio as a source of information, whether that's detecting honks or, in this case, using the audio at the tires — microphones at the tires — visualized here as a spectrogram of the incoming audio (those of you with a particularly well-tuned ear can listen to the difference), to distinguish a wet road from a dry road after the rain, when there's no rain but the road is nevertheless wet. Detecting that is extremely important for vehicles, because they still have imperfect traction control, still have poor handling of the tire-to-road-surface contact, and being able to detect the road surface condition from audio alone is a very interesting approach.

Next, on the perception-control side, is movement planning: getting from point A to point B. The traditional approaches are optimization-based: determine the optimal control by formalizing the problem in a way that's amenable to optimization. There are a lot of assumptions that need to be made, but once those assumptions are made, you're able to generate thousands or millions of possible trajectories and use an objective function to determine which of the trajectories to take; here's a race car optimizing how to take a turn at high speed. With deep learning, reinforcement learning — the application of neural networks to reinforcement learning — is particularly exciting for both the control side and the planning side. That's where two of the competitions we're doing in this class come into play: the simplistic two-dimensional world of DeepTraffic, and the high-speed, high-risk world of DeepCrash. We'll explore those in tomorrow's lecture on deep reinforcement learning.
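As a cartoon of the sample-and-score idea behind the optimization-based planners just described — entirely my own toy example, with a made-up cost function and made-up parameters — generate many candidate trajectories, score each against an objective (stay near the lane center, limit jerk, keep clear of an obstacle), and pick the cheapest:

```python
import numpy as np

def sample_trajectories(n: int = 2000, horizon: int = 30, max_lateral_m: float = 3.0):
    """Candidate lateral offsets over the horizon: smooth quadratic ramps to random end offsets."""
    ends = np.random.uniform(-max_lateral_m, max_lateral_m, size=n)
    ramps = np.linspace(0.0, 1.0, horizon) ** 2           # ease in toward the end offset
    return ends[:, None] * ramps[None, :]                  # (n, horizon) lateral positions

def cost(trajs: np.ndarray, obstacle_lateral_m: float = 1.5) -> np.ndarray:
    """Toy objective: stay near lane center, avoid jerk, stay away from one obstacle."""
    centering = (trajs ** 2).sum(axis=1)
    jerk = (np.diff(trajs, n=2, axis=1) ** 2).sum(axis=1)
    clearance = np.exp(-np.abs(trajs - obstacle_lateral_m)).sum(axis=1)  # penalty near obstacle
    return centering + 10.0 * jerk + 5.0 * clearance

candidates = sample_trajectories()
best = candidates[np.argmin(cost(candidates))]
print("chosen end lateral offset (m):", round(float(best[-1]), 2))
```

Real planners work in full vehicle state space with dynamics constraints; the point of the sketch is only the structure: many candidate trajectories, one objective function, one argmin.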
And finally, driver state: detecting everything about the driver and then interacting with them. On the left, in green, are the easier problems; on the right, in red, are the harder problems — in terms of perception, in terms of how amenable they are to deep learning methods. Body pose estimation is a very well-studied problem; we have extremely good detectors for estimating the pose — the hands, the elbows, the shoulders, every visible aspect of the body. Head pose, the orientation of the head — we're extremely good at that too. As we go smaller and smaller in scale — blink rate, blink duration, eye pose, and blink dynamics — things start getting more and more difficult. All of these metrics are extremely important for detecting things like drowsiness, or as components of detecting emotion, or of detecting where people are looking. In driving, where your head is turned is not necessarily where you're looking: in regular, non-driving life, when you look somewhere, you usually turn your head along with your eyes; in driving, your head often stays still or moves very subtly, and your eyes do a lot more of the moving. It's the kind of effect we describe as the lizard-owl effect: a small fraction of people are owls, meaning they move their head a lot, and most people are lizards, moving their eyes to allocate their attention. The problem with eyes, from the computer vision perspective, is that they're much harder to detect under lighting variation; in real-world conditions it gets harder, and we'll discuss how to deal with that — of course, that's where deep learning steps up and really helps with real-world data. Cognitive load we'll discuss as well: estimating the cognitive load of the driver — I'll show a quick clip of this — along with the driver glance classification we've seen before. The single most important problem on the driver-state side is determining whether the driver is looking on-road or off-road. It's the dumbest, simplest, but most important question: are they in the seat and looking at the road, or are they not? That's driver glance classification — not estimating the x, y, z geometric orientation of where they're looking, but a binary classification: on-road or off-road. Body pose estimation: determining whether the hands are on the wheel or not, and whether the body alignment is standard — good for the seatbelt, good for safety. This is one of the important things for autonomous vehicles: if there's imminent danger, the driver should be asked to return to a position that is safe for them in case of a crash. Driver emotion: on the top is a satisfied driver, on the bottom a frustrated driver — they self-reported. This is with voice-based navigation; one of the biggest sources of frustration for people in cars is voice-based navigation — trying to tell an artificial intelligence system, using your voice alone, where you would like to go…