Transcript
p5AtrKqQ3Fw • Karl Iagnemma & Oscar Beijbom (Aptiv Autonomous Mobility) - MIT Self-Driving Cars
Kind: captions
Language: en
All right, welcome back to 6.S094, Deep Learning for Self-Driving Cars. Today we have Karl Iagnemma and Oscar Beijbom from Aptiv. Karl is the president of Aptiv Autonomous Mobility, where Oscar is the machine learning lead. Karl founded nuTonomy, as many of you know, in 2013; it's a Boston-based autonomous vehicle company. nuTonomy was acquired by Aptiv in 2017, and it's now part of Aptiv. Karl and team are one of the leaders in autonomous vehicle development and deployment, with cars on roads all over the United States at several sites. But most importantly, Karl is MIT through and through: as some of you may know, he got his PhD here and led a robotics group here as a research scientist for many years. So it's really a pleasure to have both Karl and Oscar with us today. Please give them a warm welcome.
All right, thanks, Lex. I'm very glad to be back at MIT, and very impressed that you guys are here during IAP. My course load during IAP was usually ice skating, and sometimes there was a wine tasting course (this is now almost twenty years ago), and that was pretty much it; that's where the academic work stopped. But you guys are here to learn something, so I'm going to do my best and actually try something radical. As president now of Aptiv Autonomous Mobility, I'm not allowed to talk about anything technical or interesting. I'm going to flout that a little bit and raise some topics that we think about and that I think are interesting, questions to keep in the back of your mind as you're thinking about deep learning and autonomous driving. I'll raise some of those questions, and then Oscar will present some real-life technology and some of the work he has been doing. Oscar is our machine learning lead, and he and his outstanding team have been working on machine-learning-based detectors for the detection problem.
So let me first introduce Aptiv a little bit, because when I say I work for Aptiv, people usually ask me, what's an Aptiv? Aptiv has actually been around for a long time, but in a different form: Aptiv was previously Delphi Technologies, which was previously part of General Motors. Everybody's heard of General Motors; some of you may have heard of Delphi. Aptiv spun off from Delphi about 14 months ago. Aptiv is a tier 1 supplier: an automotive company that industrializes technology. Essentially, they take software and hardware, industrialize it, and put it on cars so it can run for many hundreds of thousands of miles without failing, which is a useful thing when we think about autonomous driving. As for themes, Aptiv develops what they call safer, greener, and more connected solutions. Safer means safety systems: active safety, and autonomous driving systems of the type that we're building. Greener means systems to enable electrification and green vehicles. And more connected means connectivity solutions, both within the vehicle, transmitting data around the vehicle, and externally, via wireless communication. All of these things, as you can imagine, feed very nicely into the future transportation systems that the software will actually only be a part of. So Aptiv is in a really interesting spot when you think about the future of autonomous driving.
To give you a sense of scale, and this still kind of amazes me: the biggest my research group at MIT ever got was maybe 18 people. Aptiv has 156,000 employees, so it's a significantly sized organization, about a thirteen-billion-dollar company by revenue, in about 50 countries around the world. My group is about seven hundred people working on autonomous driving, of which Oscar is one very important person, and we've got about 120 cars on the road in different countries. I'll show you some examples of that, but first let me take a trip down memory lane and show you a couple of snapshots of where we were not too long ago, as a community but also me personally. This will either inspire or horrify you; I'm not sure which. The fact is that in 2007 there were groups driving around with cars running blade servers in the trunk, which were generating so much heat that you had to install another air conditioner, which then was drawing so much power that you had to add another alternator, and then kind of rinse and repeat. So it wasn't a great situation, but people did enough, algorithmically and computationally, to enable these cars (this is the DARPA Urban Challenge, for those to whom that may be familiar) to do something useful and interesting on a closed course. And it convinced enough people that, given enough devotion of thought and resources, this might actually become a real thing someday. I was one of those people that got convinced.
For 2010, I'm going to crib from my co-founder Emilio, a former MIT faculty member in AeroAstro. Emilio started up an operation in Singapore through SMART, which some of you have probably worked with. These are some folks from SMART; that's James, who looks really young in that picture. He was one of Emilio's students, and he was basically taking a golf cart and turning it into an autonomous shuttle. It turned out to work pretty well, and it got people in Singapore excited, which in turn got us further excited. In 2014 they did a demo where they let the people of Singapore come ride around in these carts in a garden, and that worked great over the course of a weekend. Around this time we'd started nuTonomy; we'd actually started a commercial enterprise and kind of stepped at least partly away from MIT at that point.
In 2015 we had cars on the road. This is a Mitsubishi i-MiEV electric vehicle. When we had all our equipment in it, the front seat was pushed forward so far that I, at about six foot three, actually couldn't sit in the front seat, so I couldn't accompany people on rides; it wasn't very practical. We ended up switching to a Renault Zoe platform, which is the one you see here and which had a little more legroom. At that point we were giving rides in our cars, open to the public, in Singapore, in the part of the city that we were allowed to operate in. It was a quick transition: as you can see, even just visually, the evolution of these systems has come a long way in a short time, and we're just a point example of a phenomenon that is, broadly speaking, similar across the industry. In 2017 we joined Aptiv, and we were excited by that because we, as primarily scientists and technologists, didn't have a great idea of how we were going to industrialize this technology, actually bring it to market, make it reliable and robust, and make it safe, which is what I'm going to talk about a little bit here today. So we joined Aptiv, with its global footprint. Today we're primarily in Pittsburgh, Boston, Singapore, and Vegas, and we've got connectivity to Aptiv's other sites in Shanghai and Wolfsburg.
Let me tell you a little bit about what's happening in Vegas. I think people were here when Luc was talking a couple of days ago, or yesterday; Luc Vincent from Lyft probably talked a little bit about Vegas. Vegas is a really interesting place for us. We've got a big operation there: a 130,000 square foot garage and about 75 cars, thirty of which are on the Lyft network, so Aptiv technology, but connecting to the customer through Lyft. If you go to Vegas and open your Lyft app, it'll ask you: do you want to take a ride in an autonomous car? You can opt in or opt out; it's up to you. If you opt in, there's a reasonable chance one of our cars will pick you up when you call for a ride. Anybody can do this, competitors, innocent bystanders, it's totally up to you; we have nothing to hide. Our cars are on the road 20 hours a day, seven days a week. If you take a ride, when you get out of the car, just like any Lyft ride, you get to give us a star rating, one through five, and that to us is actually really interesting. It's a scalar, so it's not too rich, but that star rating to me says something about the ride quality, meaning the comfort of the trip, the safety that you felt, and the efficiency of getting where you wanted to go. Our star rating today is 4.95, which is pretty good. Key numbers: at this point we've given over 30,000 rides to more than 50,000 passengers, and we've driven over a million miles, primarily in Vegas with a little bit elsewhere, and, as I mentioned, the 4.95 rating.
So what does it look like on the road? I'll show just one video today; I think Oscar has a few more. This one's actually in Singapore, but it's all kind of morally equivalent. You'll see a slightly sped-up view of a run, now probably six or seven months old, on the road in Singapore, and it's got some interesting stuff in a fairly typical run. Some of you may recognize these roads; we're on the wrong side of the road, remember, because we're in Singapore. It gives you an example of some of the types of problems we have to solve on a daily basis. So let me run this thing. What you'll see is this car cruising down the road. You have obstacles that we have to avoid, sometimes in the face of oncoming traffic. We've got to deal with situations where other road users are maybe not perfectly behaving by the rules, and we have to manage that in a natural way. Construction in Singapore, like everywhere else, is pretty ubiquitous, so you have to navigate through these less structured environments, with people who are sometimes doing things, or indicating some future action, that you have to make inferences about, which can be tricky to navigate. So it's a typical day, a route that any one of us as humans would drive through without batting an eye, no problem, yet it actually presents some really complex problems for autonomous vehicles. But it's table stakes these days: these are the things you have to do if you want to be on the road, and certainly if you want to drive millions of miles with very few accidents, which is what we're doing. So that's an introduction to Aptiv and a little bit of background.
Now let me talk about learning and how we think about learning in the context of autonomous driving. There was a period a few years ago where, I think, as a community, people thought we would be able to go from pixels to actuator commands with a single learned architecture, a single black box. I'll say that, generally speaking, we no longer believe that's true, and I should qualify the "we": I didn't believe it was ever true, but some of us maybe thought it was, and I'll tell you part of the reason why in this part of the talk. A big part of it comes down to safety, and the question of safety: even if we could train that black box to accurately approximate the massively complex underlying function we're trying to approximate, can we convince ourselves that it's safe? It's very, very hard to answer that question affirmatively, and I'll raise some of the issues around why that is. This is not to say that learning methods are not incredibly useful for autonomous driving, because they absolutely are, and Oscar will show you examples of why that is and how Aptiv is using learning methods today.
But this safety dimension is tricky, because there are actually two axes here. One is the actual technical safety of the system, which is to say: can we build a system that's safe, that's provably safe in some sense, that we can validate, that we can convince ourselves achieves the intended functionality in our operational design domain, and that adheres to whatever regulatory requirements might be imposed in the jurisdictions where we're operating? There's a whole longer list related to technical safety, but these are primarily technical problems. There's another dimension, though, which up here is called perceived safety, which is to say: when you ride in a car, even if it's safe, do you believe that it's safe, and therefore will you want to take another trip? That sounds kind of squishy, and as engineers we're typically uncomfortable with that kind of stuff, but it turns out to be really important, and probably harder to solve precisely because it's a little bit squishy. Quite obviously, we've got to sit up here, in this upper right-hand corner, where we have not only a very safe car from a technical perspective but one that feels safe, that inspires confidence in riders, in regulators, and in everybody else. So how do we get there, in the context of elements of this system that may be black boxes, for lack of a better word? What's required is trust.
How do we get to the point where we can trust neural networks in the context of safety-critical systems, which is what an autonomous vehicle is? It really comes down to this question of how we convince ourselves that we can validate these systems: validating the system meaning ensuring that it can meet the operational requirements, in the domain of interest, that are imposed by the user. There are three dimensions to this key question of understanding how to validate, and I'm going to briefly introduce some questions and topics of interest around each of them.

The first one is trusting the data. Do we actually have confidence about what goes into this algorithm? Everybody knows garbage in, garbage out, and there are various ways we can make this garbage: we can have data that insufficiently covers our domain, or that isn't representative of the domain, and we can have data that's poorly annotated by the third parties we've trusted to label certain things of interest. So do we trust the data that's going into the algorithm?

Second, do we trust the implementation? You've got a beautiful algorithm, super descriptive, super robust, not brittle at all, well trained, and we're running it on poor hardware, we've coded it poorly, we've got buffer overruns right and left. Do we trust the implementation to actually execute in a safe manner?

And third, do we trust the algorithm itself? Generally speaking, we're trying to approximate really complicated functions; I don't think we typically use neural networks to approximate linear systems. So this is a gnarly, nasty function, and the problems of critical interest are really rare; in fact, they're the only ones of interest. There are events that happen very, very infrequently that we absolutely have to get right, and it's a hard problem to convince ourselves that the algorithm is going to perform properly in these unexpected and rare situations. These are the sorts of things we think about, and that we have to answer in an intelligent way, to convince ourselves that we have a validated neural-network-based system.
Okay, let me step through each of these topics really quickly. First, the topic of validation: what do we mean by that, and why is it hard? There are a number of different dimensions here. The first is that we don't have insight into the nature of the function we're trying to approximate; the underlying phenomenon is really complicated. Again, if it weren't, we'd possibly be modeling it using different techniques; we'd write a closed-form equation to describe it. So that's one problem. Second, the actual crashes on the road (and we say crashes, not accidents) are rare. Luckily, they're very rare, but that makes the statistical argument around being able to avoid them really, really difficult. If you believe RAND, and they're pretty smart folks, they say you've got to drive 275 million miles without a fatal crash before you can claim a lower fatality rate than a human with 95% confidence. How are we going to do that? Can we think about using some correlated incident, maybe some kind of close call, as a proxy for crashes, one which may be more frequent, and back into it that way? There are a lot of questions here, and I won't say we don't have any answers, because I wouldn't go that far, but they're hard questions, not questions with obvious answers. This issue of rare events is one of them.
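As a rough sketch of where a figure like that comes from (this is a gloss, not RAND's exact derivation): treat fatal crashes as a Poisson process with rate \(R\) per mile. The chance of driving \(N\) miles fatality-free when the true rate is \(R\) is then

\[
P(\text{0 fatalities in } N \text{ miles}) = e^{-RN} \le 0.05
\quad\Rightarrow\quad
N \ge \frac{\ln 20}{R} \approx \frac{3}{R},
\]

and with a human fatality rate of roughly \(R \approx 1.09\) per 100 million miles, that works out to \(N \approx 2.75 \times 10^8\) miles.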
The regulatory dimension is one of these known unknowns. How do we evaluate a system if the requirements that may be imposed upon us by outside regulatory bodies are still to be written? That's difficult. There's a lack of consensus on what the safety target should be for these systems. This is obviously evolving, and smart people are thinking about it, but today it's not at all clear, whether you're driving in Las Vegas, in Singapore, in San Francisco, or anywhere in between, what this target needs to be. And then lastly, and this is a really interesting one: say we can get through a validation process for one build of code. What happens when we update the code, because obviously we will? Does that mean we have to start the validation process again from scratch, which would unavoidably be expensive and lengthy? Well, what if we only changed a little bit of the code? What if I only changed one line? But what if that one line is the most important line of code in the whole code base? This is one that, I can tell you, keeps a lot of people up at night, this question of revalidation. And then, even keeping the code base fixed, what if we move from one city to the next, and that city is quite similar to the previous city but not exactly the same? How do we think about validation in the context of new environments? So this continuous development issue is a challenge.
All right, let me move on to talking about the data. There are probably people in this room doing active research in this area, because it's a really interesting one, but there are a couple of questions, I would say, that we think about when we think about data. We can have a great algorithm, but if we're training it on poor data, for one reason or another, we won't have a great output. So one thing we think about is the sufficiency and completeness of the data, and the bias that may be inherent in the data, relative to our operational domain: if we want to operate 24 hours a day and we only train on data collected during the daytime, we're probably going to have an issue. Annotating the data is another dimension of the problem. We can collect raw data that's sufficient, that covers our space, but then we annotate it: we hand it off to a third party (because it's typically a third party) to mark up the interesting aspects of it. We provide them some specifications, but we put a lot of trust in that third party: trust that they're going to do a good job annotating the interesting parts and not the uninteresting parts, that they're going to catch all the interesting parts we've asked them to catch, and so on. So this annotation part, which seems very mundane, very easy to manage, kind of like low-hanging fruit, is in fact another key aspect of ensuring that we can trust the data. And this is just to point to the fact that there are, again, smart people thinking about this problem, which rears its head in many domains beyond autonomous driving. Now, what about the algorithms themselves?
Moving on from the data to the actual algorithm: how do we convince ourselves that an algorithm that, like any learning-based algorithm, we've trained on a training set is going to do well on some unknown test set? There are a couple of properties of the algorithm that we can look at, that we can interrogate and poke at, to convince ourselves that the algorithm will perform well. One is invariance, and the other we can call stability: if we make small perturbations to the input of this function, does it behave well? Given, let's say, a bounded input, do we see a bounded output, or do we see some wild response? I'm sure you've all heard of examples of adversarial images that can confuse learning-based classifiers: you show it a turtle and it says, well, that's a turtle; then you show it a turtle that's been fuzzed with a little bit of noise that the human eye can't perceive, so it still looks like a turtle, and it tells you it's a machine gun. Obviously, for us in the driving domain, we want a stop sign to be correctly identified as a stop sign a hundred times out of a hundred. We don't want that stop sign, if somebody goes up and puts a piece of duct tape in the lower right-hand corner, to be interpreted as a yield sign, for example. So this question of the properties of the algorithm, its invariance and its stability, is something of high interest.
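To make that bounded-input, bounded-output idea concrete, here is a minimal sketch of one such stability probe, assuming a PyTorch image classifier; the FGSM perturbation used here is one standard way to generate the imperceptible "fuzz", not necessarily anything Aptiv uses.

```python
import torch
import torch.nn.functional as F

def fgsm_stability_check(model, x, label, eps=2 / 255):
    """Stability probe: does a bounded perturbation (L-inf norm <= eps,
    invisible to the human eye) flip the model's prediction?"""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # FGSM: step in the direction of the loss gradient's sign, staying within eps.
    x_adv = (x + eps * x.grad.sign()).clamp(0, 1).detach()
    pred = model(x).argmax(dim=1)
    pred_adv = model(x_adv).argmax(dim=1)
    return (pred == pred_adv).all().item()  # True = stable on this input
```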
And then lastly, one more point, on this notion of interpretability. Interpretability means understanding why an algorithm made the decision that it made. This is the sort of thing that may not be just a nice-to-have; it may actually be a requirement, and it would likely be a requirement from the regulatory groups that I was referring to a minute ago. Imagine the case of a crash where the system governing your trajectory generation was a data-driven system, a deep-learning-based trajectory generator. You may need to explain to someone exactly why that particular trajectory was generated at that particular moment, and this may be a hard thing to do if the generator was a data-driven model. Now, obviously there are people doing active research into this specific question of interpretable learning methods, but it's a thorny one, a very, very difficult topic, and it's not at all clear to me when, and if, we'll get to the stage where we can explain, even to a technical audience, but beyond that to a lay jury, why algorithm X made decision Y.
Okay, so with all that in mind, let me talk a little bit about safety. That all maybe sounds pretty bleak; you might think, well, man, I've been taking this course with Lex and we're never really going to use this stuff. But in fact we can, and we will, as a community. There are a lot of tools we can bring to bear to think about neural networks, generally speaking within the context of a broader safety argument, and I think that's the key. We tend not to think about using a neural network as a holistic system to drive a car; we think about it as a sub-module that we can build other systems around, systems about which we can make more rigorous claims regarding their performance and their underlying properties, and therefore make a convincing holistic safety argument that the end-to-end system is safe. We have tools. Functional safety is maybe familiar to some of you; it's something we think about a lot in the automotive domain. And there's SOTIF, which stands for Safety Of The Intended Functionality, where we're basically asking ourselves: is this overall function doing what it's intended to do, is it operating safely, and is it meeting its specifications? There's kind of an analogy here to validation and verification, if you will. We have to answer these questions around functional safety and SOTIF affirmatively, even when we have neural-network-based elements, in order to eventually put this car on the road.
All right, I mentioned that we need to do some embedding, and this is an example of what it might look like. We sometimes refer to this as caging the learning: we put the learning in a box; it's this powerful animal we want to control. In this case it's up there at the top, in red; that might be the trajectory proposer I was talking about. So let's say we've got a powerful trajectory proposer that we want to use. We've got it on what we call our performance compute, our high-powered compute; it's maybe not automotive grade, and it's got some potential failure modes, but it's generally speaking good performance. And we've got our neural-network-based generator on it, about which we can say some things, but maybe not everything we'd like to. Well, we make the argument that if we can surround it, cage it, underpin it with a safety system about whose performance we can say very rigorous things, then generally speaking we may be okay. There may be a path to using neural networks on autonomous vehicles if we can wrap them in a safety architecture that we can say a lot of good things about, and that's exactly what this represents.
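A minimal sketch of that caging pattern (an illustration with hypothetical names, not Aptiv's actual architecture): the learned proposer runs on the performance compute, and a simple, rigorously analyzable checker vets every proposal, falling back to a verified safe maneuver otherwise.

```python
class SafetyCage:
    """Wrap a learned trajectory proposer in a rule-based checker that is
    simple enough to analyze rigorously (hypothetical interfaces)."""

    def __init__(self, proposer, checker, fallback):
        self.proposer = proposer  # neural net: state -> candidate trajectory
        self.checker = checker    # rule-based: (state, trajectory) -> bool
        self.fallback = fallback  # verified safe maneuver, e.g. a safe stop

    def plan(self, state):
        try:
            candidate = self.proposer(state)
        except Exception:             # the performance compute may fail
            return self.fallback(state)
        # Only trust the black box if the cage approves its output.
        if self.checker(state, candidate):
            return candidate
        return self.fallback(state)
```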
So I'm going to conclude my part of the talk here and hand it over to Oscar, with a quote, an assertion, that one of my engineers insisted I show today. The argument is the following: engineering is inching closer to the natural sciences. I won't say how much closer, but closer. We're creating things that we don't fully understand, and then we're investigating the properties of our creation. We're not writing down nice closed-form functions; that would be too easy. We're generating these immensely complex function approximators, and then we're just poking at them, asking: well, what does this thing do in these situations? I'll leave you with one image, which I'll present without comment, and then hand it over to Oscar.

All right, thank you, Karl.
Thanks, Lex, for the invite. Yes, my name is Oscar; I run the machine learning team at Aptiv Autonomous Mobility. I always begin with this slide. This specification was, you know, quite literally a joke; this is an actual comic, which some of you will have seen before. I was doing my PhD in the era where building a bird classifier was like a PhD project, right? And it was funny because it was true. Then, of course, as you all well know, the deep learning revolution happened, and Lex's previous introductory slides give a great overview of it, so I don't want to redo that. I just want to draw a straight line from what I consider the breakthrough paper, by Krizhevsky et al., to the work I'll be talking about today, through these three papers. First you had deep, end-to-end learning for image classification by Krizhevsky, Sutskever, and Hinton; that paper has been cited 35,000 times, I checked yesterday. Then in 2014, Ross Girshick at Berkeley basically showed how to repurpose the deep learning architecture to do detection in images. That was the first time the vision community really started seeing it: okay, so classification is more general, I can classify anything, an image, an audio signal, whatever, but detection in images was very intimate to the computer vision community; we thought we were the best in the world at it. So when this paper came out, that was sort of the final argument: okay, we all need to do deep learning now. And then in 2016 this paper came out, the Single Shot MultiBox Detector by Liu et al., which I think is a great paper. If you haven't looked at these papers, by all means read them carefully.
And you know, that's the result: performance is no longer a joke. This is a network that we developed in my group. It's a joint image classification and segmentation network, and we can run it at 200 hertz on a single GPU. In this video, in this rendering, there is no tracking applied and there is no temporal smoothing; every single frame is analyzed independently of the others, and you can see that we can model several different classes, both boxes and surfaces, at the same time. Here's my cartoon drawing of the perception system of an autonomous vehicle. You have the three main sensor modalities, and you typically have some module that does detection and tracking; there are tons of variations on this, of course, but you have some sort of sensor pipelines, and then at the end you have a tracking and fusion step. What I showed you in the previous video is basically this part: like I said, there was no tracking, it's just going from the camera to detections. Now, I come straight from the computer vision and machine learning community, so when I started looking at this pipeline, I thought: why are there so many steps, and why aren't we optimizing things end to end? So obviously there's a real temptation to just wrap everything in one learned kernel; it's a very well-defined input-output function, and, like Karl alluded to, it's one that can be verified quite well, assuming you have the right data. I'm not going to be talking about that, though. I am going to talk about this: namely, building a deep learning kernel for the lidar pipeline, and the lidar pipeline is arguably the backbone of the perception system for most autonomous driving systems.
So this is basically going to be the goal here. We're going to have a point cloud sample, and we're going to have a neural network that takes that sample in and generates 3D bounding boxes that are in the world coordinate system: it's 20 meters that way, it's two meters wide, so long, with this rotation and this orientation, and so on. That's what this talk is about. I'm going to talk about PointPillars, which is a new method we developed for this, and nuScenes, which is a benchmark dataset that we released. Okay, so the first part: PointPillars. It's a novel point cloud encoder, and what we do is learn a representation that is suitable for downstream detection. The main innovation is almost the translation from a point cloud to a canvas that can then be processed by an architecture similar to one you would use on an image. We show it outperforms all published methods on KITTI by a large margin, especially with respect to inference speed, and there's a preprint out and some code available if you guys want to play around with it.
The architecture we're going to use looks something like this, and I should say most papers in this space use this architecture, so it's kind of a natural design. You have the point cloud, and at the top you have this encoder; that's where we introduce the point pillars, but, as I'll show you, you can have various types of encoders. After that, it feeds into the backbone, which is a now-standard convolutional 2D backbone, then you have a detection head, and you may or may not have a segmentation head as well. The point is that after the encoder, everything looks just like an image network; it's very similar to the SSD architecture or the R-CNN architecture.
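In code, that modular layout is roughly the following (a schematic sketch; the module names and interfaces are illustrative, not the paper's):

```python
import torch.nn as nn

class LidarDetector(nn.Module):
    """Encoder -> 2D backbone -> heads: the layout shared by most
    detectors in this space (illustrative names)."""

    def __init__(self, encoder, backbone, det_head, seg_head=None):
        super().__init__()
        self.encoder = encoder    # point cloud -> (C, H, W) pseudo-image
        self.backbone = backbone  # standard 2D CNN, SSD/R-CNN style
        self.det_head = det_head  # 3D box regression + classification
        self.seg_head = seg_head  # optional segmentation head

    def forward(self, point_cloud):
        canvas = self.encoder(point_cloud)
        features = self.backbone(canvas)
        boxes = self.det_head(features)
        seg = self.seg_head(features) if self.seg_head else None
        return boxes, seg
```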
So let's go into a little bit more detail. What you're given here is a range in meters (say you want to model a 40-meter circle around you, for example), a certain resolution for your bins, and a number of output channels. The input is a set of pillars, where a pillar is a vertical column; you have M of those that are non-empty in this space. A pillar P contains all the points that fall in it, where a point is a location (x, y, z) plus an intensity, and there are N_m points in pillar m. That's just to say that the count varies: it could be one single point at a particular location, or it could be 200 points, and each pillar is centered on its bin. The goal is to produce a tensor of fixed size: its height and width are the range divided by the resolution, and then there's this parameter C, the number of channels. In an image, C would be three; we don't necessarily care about that here, so we call it a pseudo-image, but it's the same thing: a fixed number of channels that the backbone can operate on.
Here's the same thing without the math: you have a lot of points, and you have this space that you just grid up into pillars; some are empty, some aren't. With this notation in place, let me give a little bit of a literature review. What people tend to do is take each pillar and divide it into voxels, so now you have a 3D voxel grid, and then extract some sort of features for each voxel: for example, how many points are in this voxel, or what is the maximum intensity of all the points in this voxel? Then you extract features for the whole pillar: what is the max intensity across all the points in the whole pillar? All of these are hand-engineered functions that generate a fixed-length output, so you can concatenate them, and the output is this (X, Y, C) tensor.
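A toy NumPy version of that hand-engineered recipe; the two features here, point count and max intensity, are just the examples mentioned above.

```python
import numpy as np

def hand_engineered_canvas(points, x_edges, y_edges):
    """points: (N, 4) array of (x, y, z, intensity).
    Returns an (H, W, 2) pseudo-image with two hand-engineered
    pillar features: point count and max intensity."""
    H, W = len(x_edges) - 1, len(y_edges) - 1
    canvas = np.zeros((H, W, 2))
    ix = np.digitize(points[:, 0], x_edges) - 1  # pillar index along x
    iy = np.digitize(points[:, 1], y_edges) - 1  # pillar index along y
    keep = (ix >= 0) & (ix < H) & (iy >= 0) & (iy < W)
    for i, j, inten in zip(ix[keep], iy[keep], points[keep, 3]):
        canvas[i, j, 0] += 1                           # count feature
        canvas[i, j, 1] = max(canvas[i, j, 1], inten)  # max-intensity feature
    return canvas
```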
Then VoxelNet came around, I'd say a year or so ago, maybe a little more by now. The first step is similar: you divide each pillar into voxels and map the points into each voxel. The novel thing is that they got rid of the feature engineering: they said, we'll map from a voxel to features using a PointNet. I'm not going into the details of PointNet, but it's basically a network architecture that allows you to take a point cloud and map it to, again, a fixed-length representation; it's a series of 1D convolutions and max-pooling layers, and it's a very neat paper. So what they did is apply that to each voxel, but now you end up with this awkward four-dimensional tensor, because you still have X, Y, Z from the voxels and then this C-dimensional output from the PointNet. So they have to consolidate the Z dimension through a 3D convolution, and then you achieve your (X, Y, C) tensor and you're ready to go. It's very nice in the sense that it's an end-to-end method, and they showed good performance, but the downside was that it was very slow; it got something like five hertz run time. And the culprit is that last step: a 3D convolution is much, much slower than a standard 2D convolution.
All right, so here's what we did. We basically said: let's just forget about voxels; we'll take all the points in the pillar and put them straight through a PointNet. That's it. Just that single change gave a 10 to 100-fold speedup over VoxelNet. Then we simplified the PointNet: a PointNet can have several layers and several modules inside it, and we simplified it to a single 1D convolution and max-pooling layer. And then we showed you can get a really fast implementation by taking all your pillars that are not empty, stacking them together into a nice dense tensor with a little bit of padding here and there, and running the forward pass as a single 2D convolution with a one-by-one kernel. The final encoder runtime is now 1.3 milliseconds, which is really, really fast.
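A minimal sketch of that simplified encoder (the layer sizes and input layout are assumptions; the released code is the reference). It assumes the non-empty pillars have already been stacked into one dense, padded tensor:

```python
import torch
import torch.nn as nn

class PillarFeatureNet(nn.Module):
    """Simplified PointNet as described above: one 1x1 convolution
    (a shared per-point linear layer) + BatchNorm + ReLU, then a max
    over the points in each pillar. Input: (B, D, P, N) tensor of P
    pillars, each padded to N points with D features per point."""

    def __init__(self, in_dim=9, out_dim=64):
        super().__init__()
        self.conv = nn.Conv2d(in_dim, out_dim, kernel_size=1)
        self.bn = nn.BatchNorm2d(out_dim)

    def forward(self, pillars):
        x = torch.relu(self.bn(self.conv(pillars)))  # (B, C, P, N)
        x = x.max(dim=3).values                      # max over points: (B, C, P)
        # Each pillar's C-vector is then scattered back to its (H, W)
        # cell to form the pseudo-image canvas for the 2D backbone.
        return x
```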
So the full method looks like this: you have the point cloud, you have this pillar feature net, which is the encoder with the different steps I described, and that feeds straight into the backbone and your detection heads, and there you go. It's still a multi-stage architecture, but of course the key is that all the steps are fully parameterized, so we can backpropagate through the whole thing and learn it. Putting these things together, these were the sorts of results we got on the KITTI benchmark. If you look at the car class, we actually got the highest performance (this is, I think, the bird's-eye-view metric), and we even outperformed the methods that relied on lidar and vision, and we did that running at a little over 60 hertz. Like I said, this is bird's-eye view; we can also measure on the 3D benchmark, and we get very similar performance. So cars did well, cyclists did well; on pedestrians there were one or two fusion methods that did a little bit better, but in aggregate, on the top left, we ended up on top. And I put a little asterisk here: this is compared to published methods at the time of submission. There are so many things happening so quickly, and there are tons of submissions on the KITTI leaderboard that are completely anonymous, where we don't even know what the input was or what method they used, so we only compared to published methods.
Here are some qualitative results. Just for visualization, you can project the boxes into the image: the gray boxes are the ground truth, and the colored ones are the predictions. And there are some challenging cases with smaller objects. We have, for example, the person right there: a person with a little stand that gets interpreted as a bicycle. We have this man on the ladder, which is an actual annotation error: we detected him as a person, but he wasn't annotated in the data. And here's a young child on a bicycle that didn't get detected, so that's a bummer.
Okay, so that was KITTI. Then I just wanted to show you guys that, of course, we can run this on our vehicle. This is a rendering where we deployed the network, at two hertz, on the full 360-degree sensor suite. The input is still lidar sweeps, just projected into the images for visualization, and again, no tracking or smoothing is applied here; every single frame is analyzed independently. See those arrows sticking out? That's the velocity estimate: we actually show how you can accumulate multiple point clouds into this method, and then you can start reasoning about velocity as well.
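One way to read that velocity bit (a sketch of the idea, not necessarily their exact scheme): accumulate a few past sweeps into the current frame and tag each point with its time lag, so the network can regress velocity from the apparent motion across sweeps.

```python
import numpy as np

def accumulate_sweeps(sweeps, timestamps, t_now):
    """Stack several lidar sweeps (assumed already motion-compensated
    into the current vehicle frame) and append each point's time lag
    as an extra feature channel."""
    out = []
    for pts, t in zip(sweeps, timestamps):   # pts: (N, 4) x, y, z, intensity
        lag = np.full((len(pts), 1), t_now - t)
        out.append(np.hstack([pts, lag]))    # (N, 5) with time-lag channel
    return np.vstack(out)
```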
The second part I want to talk about is nuScenes, which is a dataset that we have published. So what is nuScenes? It's one thousand twenty-second scenes that we collected with our development platforms; it's the same platform Karl showed, a sort of previous-generation platform, the Zoe vehicle. It's the full automotive sensor suite, the data is registered and synced in a 360-degree view, and it's also fully annotated with 3D bounding boxes; I think there are over 1 million 3D bounding boxes. And we actually make this freely available for research: you can go to the nuScenes site right now and download a teaser release, which is 100 scenes; the full release will be in about a month. The motivation is straightforward: the whole field is driven by benchmarks, and without ImageNet, it might be the case that none of us would be here, because they might never have been able to write that first paper and sort of start this whole thing going.
looking at 3d I looked at the kiddie
benchmark which is which is truly
groundbreaking I don't want to take
anything away but it was becoming
outdated that they don't have full 3d
view they don't have any radar so I
think this this offers the opportunity
to sort of push push the field forward a
little bit right and just as a
comparison this is sort of the the most
similar benchmark and really the only
one that is the that you can really
compare to is kitty
but so there's other data sets that have
maybe lidar only tons of data sets I
have image only of course but it's it's
a it's quite a big step up from from
kidney yeah some some details so you see
the layouts with the the Raiders along
the edge all the cameras on the roof and
the top top lidar and some of the
receptive fields and this data is all on
the website the taxonomy so we model
several different sub sub categories of
pedestrians several types of vehicles
some static objects barrier cones and
then in addition all the bunch of
attributes on the vehicles and on the
pedestrians all right so with without
All right, so without further ado, let's just look at some data. This is one of the thousand scenes. All I'm showing here is the frames playing one by one from all the cameras, and again, the annotations live in the world coordinate system: they are full 3D boxes, and I've just projected them into the image. That's what's so neat: we're not really annotating the lidar or the camera or the radar; we're annotating the actual objects, putting them in a world coordinate system, and giving you all the transformations, so you guys can play around with it how you like. Just to show that: because everything is registered, I can now take the lidar sweep and project it into all the images at the same time. Here I'm showing it colored by distance, so now you have a sort of sparse distance measurement on the images.
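Because everything is registered with known transforms, that projection is a few lines of geometry. A sketch in plain NumPy (not the nuScenes devkit API; the names here are illustrative):

```python
import numpy as np

def project_lidar_to_image(points, T_cam_from_lidar, K):
    """points: (N, 3) xyz in the lidar frame.
    T_cam_from_lidar: (4, 4) rigid transform, lidar frame -> camera frame.
    K: (3, 3) camera intrinsics.
    Returns pixel coordinates plus depths (for coloring by distance)."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])  # homogeneous coords
    cam = (T_cam_from_lidar @ pts_h.T)[:3]                  # (3, N) in camera frame
    in_front = cam[2] > 0.1                                 # keep points ahead of the camera
    cam = cam[:, in_front]
    pix = K @ (cam / cam[2])                                # perspective divide + intrinsics
    return pix[:2].T, cam[2]                                # (M, 2) pixels, (M,) depths
```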
So that's all I wanted to show there. Let's take questions.

Q: Thank you. Hi, I was really interested in your discussion around validation, and particularly continuous development, that sort of thing. My question is basically: is this nuScenes dataset enough to guarantee that your model is going to generalize to unseen data and, you know, not hit pedestrians and that sort of stuff, or do you have other validation that you need to do?

Oscar: No, I mean, nuScenes is purely an academic effort. We want to share our data with the academic community to drive the field forward; we're not making any claims that this is somehow a sufficient dataset for the safety case. It's a small subset of our data.

Karl: Yeah, I would say, my background obviously being in the academic world, one of the hardest things was always collecting data, because it's difficult and expensive. So here is a dataset which was expensive to collect and annotate, but which we thought we would make available, because we hoped it would spark academic interest, with smart people like the people in this room coming up with new and better algorithms, which could benefit the whole community; and then maybe some of them would even want to come work with us at Aptiv. So not totally altruistic, a little bit of self-interest there. It wasn't intended to be for validation; it was more for research.
To give you a sense of the scale of validation: there was that one quote saying you've got to drive 275 million miles, or more, depending on the certainty you want to impose. To date, as an industry, we've driven about twelve to fourteen million miles in autonomous mode, summed over all participants, over hundreds of different builds of code and in many different environments. The claim would now be that you're supposed to drive hundreds of millions of miles in a particular environment, on a single build of code, on a single platform. Obviously we're probably not going to do that. What we'll end up doing is supplementing the driving with quite a lot of simulation, and then with other methodologies, to convince ourselves that we can ultimately make a statistical argument for safety. So there will be use of datasets like this: we'll be doing lots of regression testing on supersized versions of datasets like this, or kind of morally equivalent versions, to test different parts of the system, not just classification but different aspects of the system: motion planning, decision-making, localization, all aspects. And then we augment that with on-road driving, and augment that with simulation. So the safety case is really quite a bit broader, unfortunately, than any single dataset would allow you to speak to.
Q: From an industrial perspective, what do you think 5G can offer for autonomous vehicles?

Karl: 5G, yeah, it's an interesting one. Well, these vehicles are connected; that's a requirement, certainly when you think about operating them as a fleet. When the day comes that you have an autonomous vehicle that is personally owned, and that day will come at some point in the future, it may or may not be connected (it almost certainly will be, though). But when you have a fleet of vehicles and you want to coordinate the activity of that fleet in a way that maximizes the efficiency of that transportation network, they're certainly connected. Now, the requirements of that kind of connectivity are fairly relaxed if you're talking about just passing back and forth the position of the car and maybe some status indicators: are you in autonomous mode or manual mode, are all systems go, or do you have a fault code, and what is it? There are some interesting requirements that become a little bit more stringent when you think about what we call teleoperation, remote operation of the car: the case where, if the car encounters a situation it doesn't recognize, can't figure out, gets stuck or confused, you may kind of phone a human operator who's sitting remotely to intervene. In that case, that human operator will want to have some situational awareness, and there may be a demand for high bandwidth, low latency, and high reliability of the sort that maybe 5G is better suited to than 4G or LTE or whatever you've got. Broadly speaking, we see it as very nice to have, but like any infrastructure, we understand that it's going to arrive on a timeline of its own and be maintained by someone who's not us, so it's very much outside our control. For that reason we design the system such that we don't rely on the coming 5G wave, but we'll certainly welcome it when it arrives.
Q: So, you said you have a presence in 45 countries. Did you observe any interesting patterns from that? Like, is the same self-driving car model that is deployed in Vegas as well as Singapore able to perform equally well in both, or does it perform better in Singapore compared to Vegas?

Karl: To speak to your question about country-to-country variation: we touched on that for a moment in the validation discussion, but obviously driving in Singapore and driving in Vegas are pretty different. You're on the other side of the road, for starters, but there are also different traffic rules, and, something that's sort of underappreciated, people drive differently; there are slightly different traffic norms. So one of the things (if anyone was in this class last year, my co-founder Emilio gave a talk about it) is something we call rulebooks, which is a structure we've designed around what we call the driving policy, the decision-making engine. It tries to admit, in a general and fairly flexible way, the ability to reprioritize rules, reassign rules, and change the weights on rules, so as to enable us to drive in one community and then another in a fairly seamless manner. To give you an example: say you're an autonomy engineer who was tasked with writing the decision-making engine, and you decided, I'm going to do a finite-state architecture, I'm going to write down some transition rules, I'm going to do them by hand, it's going to be great. And you did that for right-hand driving, and then your boss came in and said, oh yeah, next Monday we're going to be doing left-hand driving, so flip all that and get it ready to go. That could be a huge pain to do, because generally speaking you're doing it manually, and then it's very difficult to validate, to ensure that the outputs are correct across the entire spectrum of possibilities.
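As a cartoon of the rulebooks idea (the rules and names here are illustrative; see Emilio's talk for the real formulation): rules become data with priorities and weights, so retargeting a jurisdiction means editing the rule set rather than rewriting the engine.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    priority: int                    # lower value = more important tier
    weight: float                    # trade-off within a priority tier
    violation: Callable[..., float]  # trajectory -> violation cost

def rulebook_score(trajectory, rulebook):
    """Lexicographic score: a violation in a higher-priority tier always
    dominates any trade-off in lower tiers (Python compares tuples
    element by element, which gives exactly that ordering)."""
    tiers = sorted({r.priority for r in rulebook})
    return tuple(
        sum(r.weight * r.violation(trajectory)
            for r in rulebook if r.priority == tier)
        for tier in tiers
    )

# Switching from Singapore to Vegas would then mostly be a data change:
# rulebook_sg = [Rule("keep_left", 0, 1e6, keep_left_violation), ...]
# rulebook_lv = [Rule("keep_right", 0, 1e6, keep_right_violation), ...]
```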
So we wanted to avoid that, and, long story short, we quite carefully designed the system such that we can scale to different cities and countries, and one of the ways you do that is by thinking carefully about the architectural design of the decision-making engine. But it is quite different across cities. The four cities I mentioned as our primary sites, Boston, Pittsburgh, Vegas, and Singapore, span a wide spectrum of driving conditions. I mean, everybody knows Boston, which is pretty bad. Vegas is warm weather, mid-density urban, but it's Vegas, so all kinds of stuff. And then Singapore is interesting: perfect infrastructure, good weather, flat, and people generally speaking obey the rules, so it's kind of close to the ideal case. That exposure to this spectrum of data is, I'll speak for Oscar here, pretty valuable, and I know it's quite valuable for other parts of the development team too.

Oscar: Singapore is ideal except for the constant construction zones; every time you drive out there's a new construction zone, so we've focused a lot of work on construction zone detection in Singapore. And the torrential rain. And the jaywalkers, right? They do jaywalk. So other than that, it's perfect.

Q: So which country is best equipped?

Karl: That's a really good question.
Well, it's interesting, because there are other dimensions. When we look at which countries are interesting for us to be in as a market, there are the infrastructure conditions; there are the driving patterns and properties, the density (is it Times Square at rush hour, or is it Dubuque, Iowa?); and there's the regulatory environment, which is incredibly important: you may have a perfectly well-suited city from a technical perspective, and they may not allow you to drive there. So it's really all of these things put together. We kind of have a matrix: we analyze which cities check these boxes, assign them scores, and then also try to understand the economics of that market. Maybe a city checks all the boxes, but no one there is taking or using mobility services, so there's no opportunity to actually generate revenue from the service. You factor in all of those things.

Oscar: Yeah, and one thing to keep in mind, and it's always the first thing I tell candidates when I interview them: there's a huge difference in the kind of business model we're proposing, right, because we're operating the service. So we can choose: even if we commit to some city, we can select the routes that we feel comfortable with, and we can roll it out sort of piece by piece. We can say, okay, we don't feel comfortable driving at night in this city yet, so we just won't accept any rides at night. So there's that decision space as well.
Q: Hi, thank you very much for coming and giving us this talk today; it was very interesting. I have a question which might reveal more about how naive I am than anything else. I was comparing your PointPillars approach to the earlier approach, the voxel-based approach to interpreting the lidar results. In the voxel approach you had a four-dimensional tensor that you were starting with, and in your PointPillars you only have three dimensions; you're throwing away the Z, as I understood it. So when you do that, are you concerned that you're losing information about potential occlusions or transparencies or semi-occlusions? Is this a concern?

Oscar: I think I may have been a little bit sloppy there. We're certainly not throwing away the Z; what we're saying is that we're learning the embedding of the Z dimension jointly with everything else. VoxelNet, if you want (this is how it felt to me when I first read that paper), felt the need to spoon-feed the network a little bit and say: let's learn everything stratified in this height dimension, and then we'll have a second step where we learn to consolidate that into a single vector. We just said: why don't we learn those things together?
Q: Thanks for the talk. I have a question for Karl. You mentioned that if people make a change to the code, we need to ask whether another validation is needed. I work in the nuclear power industry; we do nuclear power simulations, and when we make any change to our simulation code and want to commercialize it, we need to submit a request to the NRC, the Nuclear Regulatory Commission. So in your opinion, do you think self-driving needs a third-party validation body or not? Should it be a third party, or just a self-check?

Karl: Yeah, that's a really good question, and I don't know the answer. Let me put it this way: I would not be surprised either way, whether the automotive industry ended up with third-party regulation or oversight, or it didn't, and I'll tell you why. There is great precedent for what you just described: nuclear, aerospace. There are external bodies with deep technical competence who can come in, do investigations, impose strict regulation or advise on regulation, and partner on or define requirements for certification of various types. The automotive industry, by contrast, has largely been self-certifying, and there's an argument, which is certainly not unreasonable, that you have a real alignment of incentives within the industry and with the public to be as safe as possible; simply put, the cost of a crash is enormous, economically, socially, and in every other way. But whether it continues along that path, I couldn't tell you. It's an interesting space, because it's one where the federal government is actually moving very quickly, and I would say carefully, trying not to overstep and not to impose too much regulation on an industry that has never generated a dollar of revenue and is still quite nascent. If you had told me a few years ago that there would be thoughtfully defined draft regulatory guidelines, advice, let's say, since it's not firm regulation, around this industry, I probably wouldn't have believed you; but in fact that exists, and there's a third version that was released this summer by the Department of Transportation. So there's intense interest on the regulatory side. How far the process goes in terms of the formation of an external body, I think, really remains to be seen; I don't know the answer.
Q: Thanks for your insightful talk. Looking at this slide, I'm wondering how easy and effective your trained models are to transfer across different lidars, and whether, for example, if it is snowing, you need specific training for your lidars to work effectively, or whether you don't see any issues in that regard.

Oscar: No, I mean, I think the same rules apply to this method as to any other machine-learning-based method: you want to have support in your training data for the situation you want to deploy in. So if we have no snow in our training data, I wouldn't go and deploy this in snow. One thing I do like, though, after having worked so much with vision, is that the lidar point cloud is really easy to augment and play around with. For example, say you want to be robust to some really rare event: let's say there's a piano on the road, and I really want to detect that, but it's hard, because I have very few examples of pianos on the road. If you think about augmenting your visual dataset with that data, it's actually quite tricky to get a photorealistic piano into your training data, but it is quite easy to do in your lidar data: you have a 3D model of your piano, you have the model of your lidar, and you can get a fairly realistic point cloud return from that. So I like that part about working with lidar: you can augment it, you can play around with it. In fact, one of the things we do when we train this model is to copy and paste objects from different samples: you can take a car that I saw yesterday, take the point returns on that car, and just paste it into your current lidar sweep. You have to be a little bit careful, right, and this was actually proposed in a previous paper, but we found it really useful. It sounds absurd, but it actually works, and it speaks to what you can do with lidar point clouds.
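A minimal sketch of that copy-paste augmentation (sometimes called ground-truth sampling), with hypothetical names; a real implementation would also reject placements that collide with existing objects:

```python
import numpy as np

def paste_object(sweep, object_bank, rng=None):
    """sweep: (N, 4) current lidar points (x, y, z, intensity).
    object_bank: list of dicts, each with 'points' (M, 4) cropped from a
    past sample and 'center' (x, y, z) of its 3D box."""
    rng = rng or np.random.default_rng()
    obj = object_bank[rng.integers(len(object_bank))]
    pts = obj["points"].copy()
    # Drop the object at a random spot in the ground plane.
    shift = rng.uniform(-10, 10, size=2)
    pts[:, :2] += shift
    center = (obj["center"][0] + shift[0], obj["center"][1] + shift[1],
              obj["center"][2])
    return np.vstack([sweep, pts]), center  # augmented sweep + new box center
```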
Okay, great. Please give Karl and Oscar a hand again. Thank you so much.