Transcript
_OCjqIgxwHw • MIT Self-Driving Cars (2018)
Kind: captions
Language: en
Welcome back to 6.S094: Deep Learning for Self-Driving Cars. Today we will talk about autonomous vehicles, also referred to as driverless cars, autonomous cars, robocars. First, the utopian view, where for many autonomous vehicles have the opportunity to transform our society in a positive direction. 1.3 million people die every year in automobile crashes globally; thirty-five, thirty-eight, forty thousand die every year in the United States. So that one opportunity is huge; it's one of the biggest focuses for us here at MIT, for people who truly care about this: to design autonomous systems, artificial intelligence systems, that save lives.
And those systems help work with, deal with, or take away what NHTSA calls the four Ds of human folly: drunk, drugged, distracted, and drowsy driving. Autonomous vehicles have the ability to take away drunk, distracted, drowsy, and drugged driving. Eliminate car ownership: taking shared mobility to another level, eliminating car ownership, from the business side, is the opportunity to save people money and increase mobility and access. Removing ownership makes vehicles more accessible because the cost of getting from point A to point B drops an order of magnitude. And the insertion of software and intelligence into vehicles makes those vehicles, makes the idea of transportation, makes the way we see moving from point A to point B, a totally different experience, much like with our smartphones: it makes it a personalized, efficient, and reliable experience.
Now for the negative view, the dystopian view. Eliminate jobs: any technology throughout its history, throughout our history of human civilization, has always created fear that jobs that rely on the prior technology will be lost. This is a huge fear, especially in trucking, because so many people in the United States and across the world work in the transportation industry, the transportation sector, and the possibility that AI will remove those jobs has potentially catastrophic consequences. One idea that we have to struggle with in the 21st century, of the role of intelligent systems that aren't human beings being further and further integrated into our lives, is that with a failure of an autonomous vehicle, even if they're much rarer, even if they're much safer, there is a possibility that an AI algorithm, designed by probably one of the engineers in this room, will kill a person where that person would not have died if they were in control of the vehicle. The idea of an intelligent system, in direct interaction with a human being, killing that human being is one that we have to struggle with on a philosophical, ethical, and technological level.
Concerns about artificial intelligence systems live in popular culture, less so in engineering, and may not be ethically grounded at this time. Much of the focus of building these systems, as we'll talk about today and throughout this course, is on the technology: how do we make these things work? But of course, years or decades out, the ethical concerns start arising. For Rodney Brooks, one of the seminal people from MIT, those ethical concerns will not be an issue for another several decades, at least five decades, but they're still important. It continues the thought, the idea of what is the role of AI in our society when that car gets to make a decision about human life. What is it making that decision based on, especially when it's a black box? What is the ethical grounding of that system? Does it conform with our social norms, or does it go against them?
And there are many other concerns. Security is definitely a big one. Even a car that's not artificial intelligence based, a car that's software based, and they're becoming more and more so: most of the cars on the road today are run by millions of lines of source code. The idea that those lines of source code, written again by some of the engineers in this room, get to decide the life of a human being means that a hacker from outside the car can manipulate that code to also decide the fate of that human being. That's a huge concern for us from the engineering perspective. The truth is somewhere in the middle: we want to find the best, positive way we can build these systems to transform our society, to improve the quality of life of everyone amongst us.
But there's a grain of salt to the hype of autonomous vehicles. We have to remember, as we discussed in the previous lecture, and it will come up again and again, that our intuition about what is difficult and what is easy for deep learning, for autonomous systems, is flawed. If we use ourselves in this example: human beings are extremely good at driving. This will come up again and again. Our intuition has to be grounded in an understanding of what is the source of data, what is the annotation, and what is the approach, what is the algorithm. So you have to be careful when using your intuition, extending it decades out, and making predictions,
whether it's towards the utopian or dystopian view. And as we talk about some of the advancements of companies working in the space today, you have to take what people say in the media, what the companies say, what some of the speakers that will be speaking at this class say about their plans for the future and their current capabilities, with a grain of salt. I think a guide I can provide is: when there's a promise of a future technology, future vehicles, that are two years out or more, that's a very doubtful prediction; one that is within a year, as we'll give a few examples of today, should be taken skeptically; the real proof comes in actual testing on public roads, or, the most impressive, the most amazing, the reality of it, is when it's available for consumer purchase. I would like to use Rodney Brooks, so it doesn't come from my mouth, but I happen to agree. His prediction is that no earlier than 2032, a driverless taxi service in a major US city will provide arbitrary pick-up and drop-off locations fully autonomously (that's 14 years away), and by 2045 it will do so in multiple cities across the United States. So think about that: a lot of the engineers working in the space, a lot of folks actually building these systems, agree with this idea, that that is the earliest I believe, and Rodney believes, this will happen.
But, as all who predict technology have been wrong, he could be wrong. This is a plot, on the x-axis, of time throughout the 20th century, and on the y-axis, the adoption rate, from zero to 100%, of various technologies, from electricity to cars to radio to the telephone and so on. And as we get closer to today, the number of years it takes a technology's adoption rate to go from zero to a hundred percent is getting shorter and shorter and shorter. As a society, we're better at throwing away the technology of old and accepting the technology of new. So if a brilliant idea to solve some of the problems we're discussing comes along, it could change everything overnight. So let's talk about different approaches to autonomy; we'll talk about sensors afterwards; we'll talk about companies, players in this space; and then we'll talk about AI and the actual algorithms and how they can help solve some of the problems of autonomous vehicles. Levels of autonomy: here's a useful taxonomization of levels of autonomy, useful for initial discussion, for legal discussion, for policy making, and for blog posts and media reports, but it's not useful, I would argue, for the design and engineering of the underlying intelligence and the system viewed from a holistic perspective, the entire thing creating an experience that's safe and enjoyable. So let's go over those levels, the five, the six levels. This is presented by SAE J3016, the most widely accepted taxonomization of autonomy.
No automation is level zero. Level 1 and level 2 are increasing levels of automation: level one is cruise control; level two is adaptive cruise control plus lane keeping. Level three, I don't know what level three is. There are a lot of people that will explain that level three is conditional automation, meaning it's constrained to a certain geographical location. I will explain that from an engineering perspective I'm personally a little bit confused about where that stands; I'll try to redefine how we should view automation. Level four and level five are high and full automation: level four is when the vehicle can drive itself fully for part of the time, there are certain areas in which it can take care of everything, no matter what, with no human interaction, input, or safekeeping required; level five automation is the car does everything, everything. I would argue that those levels aren't useful for designing systems that actually work in the real world. I would argue that there are two systems, but first a starting point: every system, to some degree, involves a human.
It starts with manual control from a human: a human getting in the car and a human electing to do something. So that's manual control. When the human engages the system, when the system is first available and the human chooses to turn it on, that's when we have two AI systems: human-centered autonomy, when the human is needed, is involved, and full autonomy, when AI is fully responsible for everything. From the legal perspective, that means A2, full autonomy, means the car, the designer, the AI system is liable, is responsible; and for human-centered autonomy, the human is responsible.
What does this practically mean for human-centered autonomy? And we'll discuss examples of all of these. When human interaction is necessary, the question then becomes: how often is the system available? Is it available only in traffic conditions, so for bumper-to-bumper traffic? Is it available on the highway? Is it sensor-based, like in the Tesla vehicle, meaning that, based on the visual characteristics of the scene, the vehicle is confident enough to be able to make control decisions, perception-control decisions? The other factor, not discussed enough, and I think imprecisely discussed when it is, is the number of seconds given to the driver, not guaranteed but provided as a sort of feature to the driver, to take over. In the Tesla vehicle, in all vehicles on the road today, that time is zero: zero seconds are guaranteed, zero seconds are provided. There is some room, sometimes it's hundreds of milliseconds, sometimes it's multiple seconds, but really there's no standard of how many seconds you get to, say, wake up and take control. Then teleoperation, something that some of the companies will mention they're playing with, is when a human being is involved, controlling the vehicle remotely, so being able to take over control of the vehicle when you're not able to control it: support by a human that's not inside the car. That's a very interesting idea to explore. But for the human-centered autonomy side, all of those features are not required, they're not guaranteed; the human driver inside the car is always responsible. At the end of the day, they must pay attention to the degree that's required to take over when the system fails, and, under this consideration, under this level of autonomy, the system will fail at some point. That is the point: this is a collaboration between human and robot, as the system will fail and the human has to catch it when it does. And then full autonomy is: AI is fully responsible.
Now, again, as we'll present, some companies, in the marketing material and the PR side of things, might present that there are significant degrees of autonomy. If you're talking about L3 or L4 or L5, you have to read between the lines. You're not allowed to have teleoperation: if a human is remotely operating the vehicle, a human is still in the loop, a human is still involved, it's still a human-centered autonomy system. You don't get the ten-second rule, which is the idea that just because you give the driver ten seconds to take control, that somehow removes liability for you. If you say, that's it, as an AI system I can't resolve, can't deal with, can't control the vehicle in this situation, and you have ten seconds to take over, that's not good enough: the driver might be sleeping, the driver may have had a heart attack; they're not able to control the vehicle. Fully autonomous systems must find safe harbor; they must get you, full stop, from point A to point B. That point B might be your desired destination, or it might be a safe parking lot, but it has to bring you to a safe location. This is a clear definition of the two systems. And the human, of course, as per our current conception of artificial intelligence in cars today, always overrides the AI system. So, in the general case, the human gets to choose to take control; the AI can't take control from the human, except when danger is imminent, meaning sudden crashes, like in AEB events. We're not yet ready, as a society, for the AI systems to say: no, no, you're drunk, you can't drive. So beyond
the traditional levels: from level zero to level five, the starting point is level zero, no automation; all cars start here. Level one, level two, and level three, I would argue, fall into human-centered autonomy systems, A1, because they do involve some degree of a human. Then L4 and L5, to some degree there's some crossover, fall into full autonomy, even though with L4, with Waymo, as you can ask on Friday, and anyone, Cruise, Uber, playing in the space, there's very often a human driver involved. One of the huge accomplishments of Waymo over the past month, an incredible accomplishment, was that in Phoenix, Arizona, the car drove without a driver, meaning there was no safety driver to catch it, no engineer, no staff member there to catch the car. A human being that doesn't work for Google or Waymo got into that car and got from point A to point B without a safety driver. That's an incredible accomplishment, and that particular trip was a fully autonomous trip; that is full autonomy, where there's no human to catch the car. An AI operating without a catch is a full autonomy, A2, system: it's when you do nothing but ride along. A human-centered autonomy system is when you have some control. I'm sorry, I had to. So, the two paths for autonomous systems, A1 and A2: in blue on the left is A1, human-centered; on the right is A2, full autonomy. And blue is, from the artificial intelligence perspective, easier, and red is harder.
Easier meaning we do not have to achieve a hundred percent accuracy; harder meaning everything that's short of a hundred percent accuracy, no matter how small the gap, has the potential of costing human lives and huge amounts of money for companies. We'll discuss later in the lecture the algorithms behind each of these methods, on the left and on the right, but this summarizes the two approaches. Localization and mapping, for the car to determine where it's located: for human-centered autonomy, it's easy. It still has to do the perception: it has to localize itself within the lane, it has to find all the neighboring pedestrians and vehicles in order to be able to control the vehicle to some degree, but because the human is there, it doesn't have to do so perfectly; when it fails, a human is there to catch it. Scene understanding, perceiving everything in the environment, from the camera, whether it's lidar, radar, ultrasonic; the planning of the vehicle, whether it's just staying within the lane, or, for adaptive cruise control, controlling the longitudinal movement of the vehicle, or changing lanes, as the Tesla autopilot or higher degrees of automation do: all of those movement-planning decisions can be made autonomously when the human is there to catch it. It's easier because you're allowed to be wrong, rarely, but wrong. The hard part is getting the human-robot interaction piece right.
That's next Wednesday's lecture, where we'll discuss how deep learning can be used to, first, perceive everything about the driver, and second, interact with the driver. That part is hard, because you can't screw up on that part. You have to make sure you help the driver know where your flaws are so they can take over; if the driver is not paying attention, you have to bring their attention back to the road, back to the interaction. You have to get that piece right, because for a flawed system, one that's rarely flawed, the rarity is the challenge; in fact, it has to get the interaction right. And then the final piece: communication. The autonomous vehicle, the fully autonomous vehicle, must communicate extremely well with the external world, with the pedestrians, the jaywalkers, the humans in this world, the cyclists. That communication piece, one that at least is part of a safe and enjoyable driving experience, is extremely difficult. For a Waymo vehicle, I wish them luck if they come to Boston, getting from point A to point B, because pedestrians will take advantage. A vehicle must assert itself in order to be able to navigate Boston streets, and that assertion is communication. That piece is extremely difficult. For a Tesla vehicle, for a human-centered autonomy vehicle, L2, L3, the way you deal with Boston pedestrians is: you take over,
roll down the window yell something and
then speed up
Getting to the place where an artificial intelligence system can actually accomplish something like that, as we'll discuss on the ethics side and the engineering side, is extremely difficult. That said, most of the literature in the human factors field, in the autonomous vehicle field, anyone that has studied autonomy in aviation and in vehicles, is extremely skeptical about the human-centered approach. They think it's deeply irresponsible. It's deeply irresponsible because, as argued, human beings, when you give them a technology which will take control part of the time, will get lazy; they will take advantage of that technology, they will over-trust that technology, they'll assume it will work perfectly, always. This idea, extended further and further, means that the better the system gets, the better the car gets at driving itself, the more the humans will sit back and be completely distracted, and they will not be able to re-engage themselves in order to safely catch the system when it fails. This is Chris Urmson, the founder of the Google self-driving car program and now the co-founder, with Sterling Anderson, the other co-founder, a speaker in this class next Friday, of a startup called Aurora. He was one of the big proponents, or I should say opponents, of the idea that human-centered autonomy could work. They tried it; he has publicly spoken about the fact that at Google, in the early self-driving car program, they tried shared autonomy, they tried L2, and it failed, because their engineers, the people driving their vehicles, fell asleep. And that's the belief that people have, and we'll talk about why that may not be true. There's a fascinating truth in the way
human beings can interact with artificial intelligence systems that may work in this case. As I mentioned, it's the human-robot interaction: building that deep connection between human and machine, of understanding, of communication. This is what we believe happens: there are a lot of videos like this; it's fun, but it's also representative of what society believes happens when automation is allowed to enter the human experience, in driving, where human life is at stake: that you become completely disengaged. It's kind of a natural thing to think, but the question is: does this actually happen? What actually happens on public roads?
The amazing thing that people don't often talk about is that there are hundreds of thousands of vehicles on the road today equipped with Autopilot, Tesla Autopilot, that have a significant degree of autonomy. That's data, that's information, so we can answer the question: what actually happens? Many of the people behind this team have instrumented 25 vehicles, 21 of which are Tesla Autopilot vehicles, recording everything about the driver: two cameras, two HD cameras on the driver, two cameras on the external roadway, and collecting everything about the car, including audio, the state, pulling everything from the CAN bus, the kinematics of the vehicle, IMU, GPS. All of that information, now over 300,000 miles, over 5 billion video frames, all, as we'll talk about, analyzed with computer vision
to extract from that video of the driver everything they're doing: the level of distraction, the allocation of attention, drowsiness, emotional states, hands on wheel, hands off wheel, body pose, activity, smartphone usage. All these factors, all of these things that you would think would fall apart when you start letting autonomy into your life: we'll talk about what the initial reality is, which should be inspiring and thought-provoking. As I said: three cameras, a single-board computer recording all the data, over a thousand machines in Holyoke, and distributed computation running the deep learning algorithms I've mentioned on these five-plus billion video frames, going from the raw data to actionable, useful information. The slides are up online if you'd like to look through them; I'll fly through some of them. And this is the
video of one of thousands of trips we have in Autopilot in our data: a car driving autonomously a large fraction of the time on highways, from here to California, from here to Chicago, to Florida, and all across the United States. We take that data and use supervised, semi-supervised learning algorithms. The number of frames here is huge: for those that work in computer vision, five billion frames is several orders of magnitude larger than any dataset that people are working with in computer vision that is actively annotated. So we want to use that data for understanding the behavior, what people are actually doing in the cars, and we want to train the algorithms that do perception and control. A quick summary: over three hundred thousand miles, twenty-five vehicles (the colors are true to the actual colors of the vehicles, a little fun fact), Tesla Model X, Model S, and now Model 3; five hundred thousand, sorry, five hundred plus miles a day and growing; now most days in 2018 are over a thousand miles a day.
This is a quick GPS map: in red is manual driving across the Boston area; in blue, cyan, is autonomous driving. This gives you a sense of just the scope of this data. This is a huge number of miles with automated driving, several orders of magnitude larger than what Waymo is doing, what Cruise is doing, and what Uber is doing. The miles driven in this data with Autopilot confirm what Elon Musk has stated: 33% of the miles are driven autonomously. This is a remarkable number. For those of you who drive, and for those of you who are familiar with these technologies, that is a remarkable adoption rate: 33 percent of the miles are driven in Autopilot. That means these drivers are getting use out of the system; it's working for them. That's an incredible number. It's also incredible because, under the decades of literature, from aviation to automation in vehicles, to Chris Urmson and Waymo, the belief is that such high numbers are likely to lead to crashes, to fatalities, or, at the very least, to highly irresponsible behavior, drivers over-trusting the systems and getting in
trouble. We can run the glance classification algorithms; again, this is for next Wednesday's discussion of the actual algorithm. It's the algorithm that tells you the region the driver is looking at, comparing road, instrument cluster, left, rearview mirror, center stack, and right. Does the allocation of glances change with Autopilot versus manual driving? It does not appear to, in any significant, noticeable way, meaning you don't start playing chess, you don't get in the back seat to sleep, you don't start texting on your smartphone or watching a movie, at least in this dataset. There's promise here for the human-centered approach. The observation, to summarize this particular data, is that people are using it a lot: the percentage of miles, the percentage of hours, is incredibly high, at least relative to what would be expected from these systems, and, given that, there are no crashes, there are no near-crashes in Autopilot; the road type is mostly highway, traveling at high speeds. For mental engagement, we looked at 8,000 transfers of control from machine to human: human beings taking control of the vehicle, saying, you know what, I'm going to take control now, I'm not comfortable with the situation, for whatever reason, either not comfortable or electing to do something that the vehicle is not able to do, like turn off the highway, make a right or left turn, stop for a stop sign, these kinds of things. Physical engagement, as I said: glance remains the same. And what do we take from this? It
says something that I'd like to really emphasize. As we talk about autonomous vehicles in this class, the guest speakers are all on the other side, so I'm representing the human-centered side; all our speakers are focused on the full autonomy side, because that's the side roboticists know how to solve, that's the fascinating algorithm-nerd side, and that's the side I love as well. But my belief stands that solving the perception-control problem is extremely difficult and two to three decades away. So in the meantime, we have to utilize the human-robot interaction to actually bring these AI systems onto the road, to successfully operate. And the way we do that, counter-intuitively, is we have to let the artificial intelligence systems reveal their flaws. One of the most endearing things human beings can do with each other, with friends, is reveal their flaws to each other. Now, from an automotive
each other now from an automotive
perspective from a company perspective
it's perhaps not appealing for an AI
system to reveal what it sees about the
world and what it doesn't see about the
world where it succeeds and where it
fails but that is perhaps exactly what
it needs to do in the case of autopilot
the way the very limited but I believe
successful way is currently doing that
is allowing you to use autopilot
basically anywhere so what people are
doing is they're trying to engage their
turn on autopilot in places where they
really shouldn't rural rural roads curvy
with terrible road markings with in in
heavy rain conditions with snow with
lots of cars driving at high speeds all
around they turn autopilot on to
understand to experience the limitations
of the system to interact that
human-robot interaction is through its
tactile by turning it on and seeing is
it going to work here how's it gonna
fail and the human is always there to
catch it that interaction that's
communication that intimate
understanding is what creates successful
integration of AI in the car before
we're able to solve the full autonomy
puzzle learn the limitations by
exploring it starts with this guy
and hundreds of others, if you search on YouTube for "first time with autopilot": the amazing experience of direct transfer of control of your life to an artificial intelligence system, in this case giving control to the Tesla Autopilot system. This is why, in the human-centered camp of autonomy, I believe that autonomous vehicles can be viewed as personal robots with which you build a relationship, where the human-robot interaction is the key problem, not the perception-control, and where the flaws of both humans and machines must be clearly communicated and perceived. Perceived, because we use the computer vision algorithms to detect everything about the human; communicated, because on the displays of the car, or even through voice, it has to be able to reveal when it doesn't see different aspects of the scene. With the human-centered approach, then, we can focus on the left, the perception and control side, perceiving everything about the external environment and controlling the vehicle, without having to worry about being 99.99999% correct, approaching a hundred percent correct, because in the cases where it's extremely difficult, we can let the human catch the system; we can reveal the flaws and let the human take over when the system can't. So let's get to the sensors,
the sources of raw data that we'll get to work with. There are three: there are cameras, so image sensors, RGB, infrared, visual data; there's radar and ultrasonic; and there's lidar. Let's discuss really what these sensors are, their strengths, their weaknesses, and how they can be integrated together through sensor fusion. Radar is the old trusted friend, the sensor that's commonly available in most vehicles that have any degree of autonomy; on the left is a visualization of the kind of data that's able to be extracted from high-resolution radar. It's cheap. Both radar, which works with electromagnetic waves, and ultrasonic, which works with sound waves, operate by sending a wave, letting it bounce off the obstacles, and, knowing the speed of that wave, calculating the distance to the obstacle. Radar does extremely well in challenging weather, rain, snow; the downside is low resolution compared to the other sensors we'll discuss, but it is the one that's most reliable and most used in the automotive industry today, and it's the one that, in sensor fusion, is always there.
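Since both radar and ultrasonic share the same time-of-flight principle, here is a minimal sketch of that calculation; the echo times below are illustrative numbers, not from the lecture.

```python
# Time-of-flight ranging shared by radar (EM waves) and ultrasonic (sound waves):
# the wave travels to the obstacle and back, so distance = speed * time / 2.

SPEED_OF_LIGHT = 299_792_458.0   # m/s, radar
SPEED_OF_SOUND = 343.0           # m/s, ultrasonic (in air, ~20 C)

def distance_m(round_trip_s: float, wave_speed: float) -> float:
    """Distance to the obstacle given the round-trip echo time."""
    return wave_speed * round_trip_s / 2.0

# A radar echo returning after ~1 microsecond -> obstacle ~150 m away.
print(distance_m(1e-6, SPEED_OF_LIGHT))   # ~149.9 m
# An ultrasonic echo returning after 12 ms -> obstacle ~2 m away.
print(distance_m(12e-3, SPEED_OF_SOUND))  # ~2.06 m
```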
Lidar, visualized on the right: the downside is it's expensive, but it produces extremely accurate depth information and a high-resolution map of the environment with 360 degrees of visibility. It has some of the big strengths of radar in terms of reliability, but with much higher resolution and accuracy; the downside is cost. Here is a visualization comparing the two, the kind of information you get to work with: the density and the quality of information with lidar is much higher, and lidar has been the successful source of ground truth, the reliable sensor relied upon on vehicles that don't care about cost. And camera, the
thing that most people here should be passionate about, because machine learning, deep learning, has the most ability to have a significant impact there. Why? First, it's cheap, so it's everywhere. Second, it's the highest resolution, so there's the most highly dense amount of information, which means there is information that can be learned and inferred to interpret the external scene; that's why it's the best source of data for understanding the scene. And the other reason it's awesome for deep learning is because of the hugeness of the data involved: there are many orders of magnitude more data available for driving in camera, visible light or infrared, than there are in lidar. And our world is designed for visible light; our eyes work in similar ways to cameras, at least crudely, so the source data is similar: the lane markings, the traffic signs, the traffic lights, the other vehicles, the pedestrians, all operate with each other in this RGB space in terms of visual characteristics. The downsides are that cameras are bad at depth estimation (it's noisy and difficult, even with stereo vision cameras, to estimate depth relative to lidar), they're not good in extreme weather, and they're not good, at least visible-light cameras, at night.
Comparing the ranges: here's a plot, in meters on the x-axis, of the range, with acuity on the y-axis, for ultrasonic, lidar, radar, and camera, a passive visual sensor. The range of cameras is the greatest. We're going to look at several different conditions; this is for clear, well-lit conditions, so during the day, no rain, no fog. Lidar and radar have a smaller range, under 200 meters, and ultrasonic sensors, used mostly for park assistance and blind spot warning, these kinds of things, have terrible range: they're designed for high-resolution distance estimation at extremely close distances. Here, a little bit small, but up top is clear, well-lit conditions, the plot we just looked at; on the bottom is clear, dark conditions, so just a clear night, no rain, but it's night; and on the bottom right is heavy rain, snow, or fog. Vision falls apart in terms of range and accuracy under dark conditions and in rain, snow, or fog. Radar, our old trusted friend, stays strong: the same range, just under two hundred meters, and at the same acuity; same with sonar. Lidar works well at night, but it does not do well with rain or fog or snow, one of the biggest downsides of lidar other than cost. So here's another interesting way
to visualize this that I think is productive for our discussion of which sensor will win out: is it the Elon Musk prediction of camera, or is it the Waymo prediction of lidar? In this kind of plot, which we'll look at for every single sensor, the greater the radius of the blue, the more successful that sensor is at accomplishing that feature, with a bunch of features lined up around the circle. So, range for lidar is pretty good, not great, but pretty good; resolution is also pretty good; it works in the dark; it works in bright light; but it falls apart in the snow; it does not provide color information, texture information, contrast; it's able to detect speed; but the sensor size, at least to date, is huge, the sensor cost, at least to date, is extremely expensive, and it doesn't do well in proximity, where ultrasonic shines.
Speaking of which: ultrasonic, same kind of plot, does well in proximity detection; it's cheap, the cheapest sensor of the four; in sensor size, you can get it to be tiny; it works in snow and fog and rain; but its resolution is terrible, its range is non-existent, and it's not able to detect speed. That's where radar steps up: it's able to detect speed; it's also cheap; it's also small; but the resolution is very low, and, just like lidar, it's not able to provide texture information, color information. Camera: the sensor cost is cheap, the sensor size is small; it's not good up close, in proximity; the range is the longest of all of them; the resolution is the best of all of them; it doesn't work in the dark; it works in bright light, but not always: one of the biggest downfalls of camera sensors is sensitivity to lighting variation; and it doesn't work in snow, fog, rain, so it suffers much like lidar there. But it provides rich, interesting textural information, the very kind that deep learning needs to make sense of this world. So let's look at the cheap
sensors: ultrasonic, radar, and cameras. One approach is putting a bunch of those in a car and fusing them together; the cost there is low. One of the nice ways to visualize it, using this visualization technique, is fusing them together, on the bottom; it gives you a sense of them working together to complement each other's strengths. And the question is whether the camera or lidar will win out, for partial autonomy or full autonomy: on the bottom, showing this kind of visualization for a lidar sensor, and on top, showing it for fused radar, ultrasonic, and camera. At least under these considerations, the fusion of the cheap sensors can do as well as lidar. Now, the open question is whether lidar, in the future of this technology, can become cheap and its range can increase, because then lidar could win out. Solid-state lidar and a lot of developments at a lot of lidar startup companies are promising to decrease the cost and increase the range of these sensors. But for now, we plow along with dedication on the camera front: the annotated driving data grows exponentially, more and more people are beginning to annotate and study the particular driving perception and control problems, and the very algorithms, the supervised and semi-supervised and generative networks that we use to work with this data, are improving. So it's a race,
and of course radar and ultrasonic are always there to help. So, companies that are playing in the space, some of them speaking here. Waymo: in April 2017, they exited their extensive, impressive testing process and allowed the first public rider in Phoenix; in November 2017, no safety driver. It's an incredible accomplishment for a company and for an artificial intelligence system: in November 2017, no safety driver, so the car truly achieved full autonomy, under a lot of constraints, but it's full autonomy. It's a step, an amazing step, in the direction towards full autonomy, much sooner than people would otherwise have predicted. And the miles: four million miles driven autonomously by November 2017, and growing quickly, growing in terms of fully autonomous driving, if I can say so cautiously, because most of those miles have a safety driver, so I would argue it's not full autonomy; but however they define full autonomy, it's four million miles driven. Incredible. Uber, in terms of miles, is second on that list: they had driven two million miles autonomously by December of last year, 2017. And the quiet player here, in terms of not making any declarations of being fully autonomous, just quietly driving in a human-centered way, L2, is Tesla: over 1 billion miles in Autopilot; over three hundred thousand vehicles
today are equipped with Autopilot technology, with the ability to control the car laterally and longitudinally; and, if anyone believes the CEO of Tesla, there will be over 1 million such vehicles by the end of 2018. But no matter what, the 300,000 is an incredible number, and the 1 billion miles is an incredible number. Autopilot was first released in September 2014, one of the first systems on the road to do so. And, I call myself one of the skeptics: in October 2016, Tesla decided to let go of the incredible work done by Mobileye, now Intel, who were designing their perception-control system; they decided to let go of it completely and start from scratch, using mostly deep learning methods, the Drive PX 2 system from NVIDIA, and 8 cameras. They decided to start from scratch; that's the kind of boldness, the kind of risk-taking, that can come with naivety, but in this case it worked.
The Audi A8 system is going to be released at the end of 2018, and it's promising; it's one of the first vehicles that's promising what they're calling L3. And the definition of L3, according to the head of automated driving at Audi, is: if the customer turns the traffic jam pilot on (now, this L3 system is designed only for traffic jams, bumper-to-bumper traffic under 60 kilometers an hour) and uses it as intended, and the car was in control at the time of the accident, the driver goes to the insurance company, the insurance company will compensate the victims of the accident, and in the aftermath they come to us, and we will pay them. So that means the car is
liable. The problem is, under the definitions of L2 and L3, perhaps there is some truth to this being an L3 system. The important thing here is that it's nevertheless deeply and fundamentally human-centered, because, even as you see here in this demonstration video with a reporter, the car, for a poorly understood reason, transfers control to the driver, says: that's it, I can't take care of the situation, you take control. The reporter asks: how much time do you have, in terms of seconds, before you really need to take over? Well, this is the new thing about level 3: with level 3, the system gives the driver the prompt to take over vehicle control again ahead of time, which is, in this case, up to 10 seconds. So if the traffic jam situation clears up, or any failure in the system occurs, anything you might think of, the system still needs to be able to drive automatically, because the driver has this time to take over.
You might ask what's new about this, so why is Audi saying this is the first level 3 system worldwide on the market. When talking about these levels of automation, there's a classification which starts at level zero, which is basically the driver doing everything, there's no assistance, nothing, and then it gradually moves into partial automation. When we're talking about these assistance functions, like lane keeping and distance keeping, we're talking about level 2 assistance functions, meaning that the driver is obliged to permanently monitor the traffic situation, to keep the hands on the wheel, even though there's support and assistance, and to intervene immediately if anything is not quite right. So you know that from lane assistance systems: when the steering is not perfectly in the right lane, you have to intervene and correct immediately. And that is the main difference: now we get a takeover request. So let's
talk about what that means. This is still a human-centered system; it still struggles with, still must solve, the human-robot interaction problem. And there are many others playing in the space. On the full autonomy side: Waymo; Uber; GM Cruise; nuTonomy, the CTO of which will speak here on Tuesday; Optimus Ride, an MIT spinoff; Voyage, the CEO of which will speak here next Thursday; and Aurora, not listed here, the founder of which will speak next Friday. And on the human-centered autonomy side, the reason I am speaking about us so much today is that we don't have any speakers: I'm the speaker. The Tesla Autopilot has for several years now been doing incredible work on that side. We are also working with Volvo Pilot Assist, which takes a lot of different approaches; they're more conservative. There's the interesting Audi traffic jam assist, as I mentioned, with the A8 being released at the end of this year; the Mercedes Drive Pilot system in the E-Class; an interesting vehicle that I got to drive quite a bit is the Cadillac Super Cruise in the CT6, which is very much constrained geographically to highway driving; and the loudest, proudest of them all, George Hotz of the Comma.ai Openpilot. Let's just leave that there. So where can AI help?
We'll get into the details in the coming lectures on each individual component; I'd like to give some examples of the key areas, the problem spaces, where we can use machine learning to learn from data. Localization and mapping: being able to localize yourself in the space, the very first question that a robot needs to answer, where am I? Scene understanding: taking the scene in and interpreting it, detecting all the entities in the scene, detecting the class of those entities, in order to then do movement planning, to move around those entities. And finally, driver state, the essential element for the human-robot interaction: perceive everything about the driver, everything about the pedestrians and the cyclists and the cars outside, the human element of those, the human perception side. So first, the where-am-I: visual
odometry, using camera sensors, which is, once again, where deep learning shines: the vision sensor is the most amenable to learning-based approaches. Visual odometry is using the camera to localize yourself, to answer the where-am-I question. The traditional approach is SLAM: detect features in the scene, track them through time, from frame to frame, and from the movement of those features, tracking thousands of features, estimate the location and the orientation of the vehicle, or the camera. Those methods, with stereo vision, first require taking the two camera streams, undistorting them, computing a disparity map from the different perspectives of the two cameras, computing the matching between the two; then feature detection, with SIFT, FAST, or any of the non-deep-learning methods of extracting strong, detectable features that can be tracked from frame to frame; tracking those features; and estimating the trajectory and the orientation of the camera. That's the traditional approach to visual odometry.
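As a rough, hedged illustration of that classical pipeline, here is a minimal monocular sketch using OpenCV: FAST features, Lucas-Kanade tracking from frame to frame, and essential-matrix pose recovery. The intrinsics and the video path are placeholders, and monocular visual odometry recovers translation only up to an unknown scale.

```python
import cv2
import numpy as np

# Placeholder intrinsics and video path -- substitute your own calibration.
K = np.array([[718.856, 0.0, 607.193],
              [0.0, 718.856, 185.216],
              [0.0, 0.0, 1.0]])
cap = cv2.VideoCapture("drive.mp4")

detector = cv2.FastFeatureDetector_create(threshold=25)
ok, frame = cap.read()
prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
pts = cv2.KeyPoint_convert(detector.detect(prev)).reshape(-1, 1, 2)

R_total, t_total = np.eye(3), np.zeros((3, 1))
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Track features from the previous frame into the current one.
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev, gray, pts.astype(np.float32), None)
    good_old = pts[status.ravel() == 1]
    good_new = nxt[status.ravel() == 1]
    # Estimate relative camera motion from the tracked correspondences.
    E, _ = cv2.findEssentialMat(good_new, good_old, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, good_new, good_old, K)
    # Accumulate relative motion (sign and scale conventions vary by setup).
    R_total = R @ R_total
    t_total = t_total + R_total @ t
    prev, pts = gray, good_new.reshape(-1, 1, 2)
    if len(pts) < 500:                # re-detect when tracks thin out
        pts = cv2.KeyPoint_convert(detector.detect(prev)).reshape(-1, 1, 2)

print("final position estimate (unscaled):", t_total.ravel())
```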
In recent years, since 2015, but with most success in the last year, there have been end-to-end deep learning approaches, with either stereo or monocular cameras. DeepVO is one of the most successful: the end-to-end method takes a sequence of images, extracting with a CNN, from each image, the essential features, and then using an RNN, a recurrent neural network, to track over time the trajectory, the pose of the camera: image to pose, end to end. Here's the visualization on the KITTI dataset using DeepVO, again taking the video up on the top right as an input and estimating what's visualized, the position of the vehicle: in red is the estimate, based, again, end to end, on a CNN and an RNN, and in blue is the ground truth in the KITTI dataset. So this removes a lot of the modular parts of SLAM, of visual odometry, and allows it to be end to end, which means it's learnable, which means it gets better with data. That's huge.
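Here is a hedged sketch of that CNN-plus-RNN idea in PyTorch. It is deliberately tiny and is not the authors' actual DeepVO architecture, which, among other differences, feeds stacked consecutive frame pairs through a FlowNet-style CNN.

```python
import torch
import torch.nn as nn

class TinyDeepVO(nn.Module):
    """Minimal CNN->RNN visual odometry sketch: image sequence -> 6-DoF pose per step."""
    def __init__(self, hidden=256):
        super().__init__()
        # CNN extracts per-frame features (real DeepVO uses stacked frame pairs).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # RNN integrates features over time to track the trajectory.
        self.rnn = nn.LSTM(64, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 6)   # x, y, z, roll, pitch, yaw

    def forward(self, frames):             # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.rnn(feats)
        return self.head(out)              # (batch, time, 6) relative poses

poses = TinyDeepVO()(torch.randn(2, 10, 3, 128, 384))
print(poses.shape)  # torch.Size([2, 10, 6])
```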
Vision alone: this is one of the exciting opportunities for AI, for people working in AI, the ability to use a single sensor, and perhaps the most inspiring, because that sensor is similar to our own, the sensor that we ourselves use, our eyes, and to use it alone as the primary sensor to control a vehicle. That's really exciting, and the fact that vision, visible light, is the most amenable to deep learning approaches makes this a particularly exciting area for deep learning research. Scene understanding, of
course, one could do a thousand slides on this. Traditionally, object detection, pedestrians, vehicles, used a bunch of different types of classifiers and feature extractors, Haar-like features; and deep learning has basically taken over and dominated every aspect of scene interpretation: perception, understanding, tracking, recognition, classification, detection problems.
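To give a sense of how accessible this has become, here is a minimal sketch using a pretrained detector from torchvision; the image path is a placeholder.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Pretrained Faster R-CNN, trained on COCO (classes include person, car, bicycle).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True).eval()

img = to_tensor(Image.open("road_scene.jpg"))       # placeholder image path
with torch.no_grad():
    pred = model([img])[0]                          # dict of boxes, labels, scores

for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
    if score > 0.8:                                 # keep confident detections
        print(int(label), round(score.item(), 2), box.tolist())
```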
And audio, we can't forget audio: we can use audio as a source of information, whether that's detecting honks or, in this case, using the audio at the tires, microphones at the tires, to determine road wetness; visualized there is a spectrogram of the audio coming in. For those of you who have a particularly tuned ear, you can listen to the different audio coming in here, of a wet road versus a dry road after the rain (so there's no rain, but the road is nevertheless wet). Detecting that is extremely important for vehicles, because they still have poor traction control, poor handling of the tire-to-road-surface connection, and being able to detect that from just audio is a very interesting approach.
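As a hedged sketch of how such a wet-versus-dry classifier could look: spectrogram features from tire-microphone clips feeding a simple classifier. This is not the actual pipeline from the lecture, and the random arrays below stand in for real labeled audio.

```python
import numpy as np
from scipy.signal import spectrogram
from sklearn.linear_model import LogisticRegression

def features(audio, fs=16000):
    """Average log-spectrogram over time -> one feature vector per clip."""
    _, _, S = spectrogram(audio, fs=fs, nperseg=512)
    return np.log(S + 1e-10).mean(axis=1)

# Hypothetical labeled clips of tire noise: 1 = wet road, 0 = dry road.
rng = np.random.default_rng(0)
clips = [rng.standard_normal(16000) for _ in range(40)]   # stand-in for real audio
labels = [i % 2 for i in range(40)]

X = np.stack([features(c) for c in clips])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print("wet-road probability:", clf.predict_proba(X[:1])[0, 1])
```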
Finally, well, not finally, next, for the perception-control side: movement planning, getting from point A to point B. The traditional approach is the optimization-based approach: determine the optimal control by reducing the problem, formalizing the problem, in a way that's amenable to optimization-based methods. There are a lot of assumptions that need to be made, but once those assumptions are made, you're able to generate thousands or millions of possible trajectories and have an objective function with which you determine which of the trajectories to take. Here's a race car optimizing how to take a turn at high speed.
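Here is a minimal sketch of that sample-and-score idea: generate many candidate lateral paths, assign each a weighted cost, and pick the cheapest. The dynamics, cost weights, and obstacle position are illustrative assumptions, not a real planner.

```python
import numpy as np

def candidate(offset, n=50):
    """Smooth lateral shift to `offset` meters over a 30 m horizon (illustrative)."""
    s = np.linspace(0, 1, n)
    y = offset * (3 * s**2 - 2 * s**3)        # ease-in/ease-out polynomial
    x = np.linspace(0, 30, n)                 # forward progress along the lane
    return np.stack([x, y], axis=1)

def cost(traj, obstacle=np.array([15.0, -1.0])):
    comfort = np.sum(np.diff(traj[:, 1], 2) ** 2)       # penalize jerky steering
    deviation = np.sum(traj[:, 1] ** 2) * 1e-3          # stay near lane center
    clearance = np.min(np.linalg.norm(traj - obstacle, axis=1))
    collision = 1e6 if clearance < 2.0 else 0.0         # hard safety constraint
    return comfort + deviation + collision - 0.1 * clearance

# Sample thousands of candidates; keep the one with the lowest objective.
offsets = np.linspace(-3.5, 3.5, 2001)
best = min((candidate(o) for o in offsets), key=cost)
print("chosen lateral offset:", best[-1, 1])
```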
With deep learning, the application of neural networks, reinforcement learning, is particularly exciting for both the control and the planning side. That's where two of the competitions we're doing in this class come into play: the simplistic two-dimensional world of DeepTraffic and the high-speed, high-risk world of DeepCrash. We'll explore those in tomorrow's lecture on deep reinforcement learning.
And finally, driver state: detecting everything about the driver and then interacting with them. On the left, in green, are the easier problems; on the right, in red, are the harder problems, in terms of perception, in terms of how amenable they are to deep learning methods. Body pose estimation is a very well studied problem; we have extremely good detectors for estimating the pose, the hands, the elbows, the shoulders, every visible aspect of the body. Head pose, the orientation of the head: we're extremely good at that. And as we get smaller and smaller in terms of size, blink rate, blink duration, eye pose, and blink dynamics start getting more and more difficult, and all of these metrics are extremely important for detecting things like drowsiness, or as components of detecting emotion, or where people are looking. In driving, where your head is turned is not necessarily where you're looking: in regular, non-driving life, when you look somewhere, you usually turn your head to look with your eyes; in driving, your head often stays still, or moves very subtly, and your eyes do a lot more of the moving. It's the kind of effect that we describe as the lizard-owl effect: some fraction of people, a small fraction, are owls, meaning they move their head a lot, and most people are lizards, moving their eyes to allocate their attention. The problem with eyes, from the computer vision perspective, is that they're much harder to detect; with lighting variation, in real-world conditions, they get harder still, and we'll discuss how to deal with that. Of course, that's where deep learning steps up and really helps, with real-world data.
Cognitive load we'll discuss as well, estimating the cognitive load of the driver; to give a quick clip, this is the driver glance classification we've seen before. The most important problem on the driver state side is determining whether they're looking on-road or off-road. It's the dumbest, simplest, but most important aspect: are they in the seat and looking at the road, or are they not? That's driver glance classification: not estimating the x-y-z geometric orientation of where they're looking, but an actual binary classification, on-road or off-road.
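A hedged sketch of what such a binary on-road/off-road glance classifier could look like, a small CNN over grayscale face or eye crops; the architecture and input size are assumptions, not the actual model used in this work.

```python
import torch
import torch.nn as nn

class GlanceNet(nn.Module):
    """Face/eye crop -> P(on-road): a deliberately tiny binary CNN sketch."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 64), nn.ReLU(),
            nn.Linear(64, 1))   # single logit: on-road vs off-road

    def forward(self, x):       # x: (batch, 1, 64, 64) grayscale crops
        return torch.sigmoid(self.net(x))

probs = GlanceNet()(torch.randn(8, 1, 64, 64))
print((probs > 0.5).float().mean())   # fraction classified as on-road
```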
Body pose estimation: determining if the hands are on the wheel or not, and determining if the body alignment is standard, which is good for seatbelts, for safety. This is one of the important things for autonomous vehicles: if there's imminent danger to the driver, the driver should be asked to return to a position that is safe for them in case of a crash.
Driver emotion: on the top is a satisfied driver; on the bottom is a frustrated driver; they self-reported. This is with voice-based navigation: one of the biggest sources of frustration for people in cars is voice-based navigation, trying to tell an artificial intelligence system, using your voice alone, where you would like to go, a huge source of frustration. One of the interesting things in our large dataset, from the affective computing perspective, is determining which of the features are most commonly associated with frustrated voice-based interaction, and that's a smile, as shown there. It's the counter-intuitive notion that emotion, particularly emotion in the car, is very context-dependent: smiling is not necessarily a sign of happiness, and the stoic, bored look of the driver up top is not necessarily a reflection of unhappiness; he is indeed a 10 out of 10 in terms of satisfaction with the experience, if he has ever been satisfied with anything.
He happens to be Dan Brown, one of the amazing engineers on our team. Cognitive load: estimating, from the eye region and sequences of images, with 3D convolutional neural networks, taking in a sequence of images of the eye, looking at the blink dynamics and the eye position, to determine the cognitive load, from 0 to 2, how deep in thought you are.
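Here is a minimal sketch of that idea: a 3D convolutional network over a short eye-region clip, outputting one of three cognitive-load levels. The shapes and layers are illustrative assumptions, not the actual network.

```python
import torch
import torch.nn as nn

class EyeLoadNet(nn.Module):
    """Eye-region video clip -> cognitive load class {0, 1, 2} (a sketch)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            # 3D convolutions see blink dynamics across time as well as space.
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(16, 3))   # three load levels, 0 to 2

    def forward(self, clip):    # clip: (batch, 1, frames, H, W)
        return self.net(clip)

logits = EyeLoadNet()(torch.randn(4, 1, 16, 48, 48))
print(logits.argmax(dim=1))    # predicted cognitive-load level per clip
```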
Two paths to an autonomous future: again, I would like to, maybe for the last time, but probably not, argue for the one on the left, because our brilliant, much smarter than me, guest speakers will argue for the one on the right. The human-centered approach allows us to solve, to 99% accuracy, the problems of localization, scene understanding, and movement planning. Those are the problems we're taking on in this class: the scene segmentation that we'll talk about on Thursday, the control that we'll talk about tomorrow, and the driver state that we'll talk about next Wednesday. These problems can be solved with deep learning today. The problems on the right, solving them to close to 100% accuracy, are extremely difficult and may be decades away, because for full autonomy to be here, we have to solve this situation, I've shown this many times, the Arc de Triomphe; we have to solve this situation, I give you just a few examples, what do you do; you have to solve this situation, a sort of subtler situation:
here is a busy crosswalk where no autonomous vehicle will ever have a hope of getting through unless it asserts itself, and there are a couple of vehicles here that kind of nudge themselves through, or at least, when they have the right of way, don't necessarily nudge, but don't hesitate when a pedestrian is present; an ambulance flying by. Even though, if you used a trajectory and pedestrian-intent modeling algorithm to predict the momentum of the pedestrians, to estimate where they could possibly go, an autonomous vehicle would stop, these vehicles don't stop: they assert themselves, they move forward. Now, for a full autonomy system, and this may not be the last time I show this video, because it's taking full control, it's following a reward function, an objective function, all of the problems, the ethical and the AI problems, that arise, like this CoastRunners problem, will arise. So we have to solve those problems; we have to design that objective function. So with that, I'd like to thank you and encourage you to come tomorrow, because you get a chance to participate in DeepTraffic, a deep reinforcement learning competition. Thank you very much.
[Applause]