Transcript
HVx9bwiMWGQ • MIT-AVT: Data Collection Device (for Large-Scale Semi-Autonomous Driving)
Kind: captions
Language: en
The MIT Autonomous Vehicle Technology study is all about collecting large amounts of naturalistic driving data. Behind that data collection is this box right here, which Dan has termed RIDER. Dan is behind a lot of the hardware work we do, the embedded systems, and Michael is behind a lot of the software: the data pipeline, as well as offloading the data from the device. We'd like to tell you some of the details behind RIDER and behind the sensors.

We have three cameras in the car, and the wires run back into the trunk, and that's where RIDER is sitting. There are a lot of design specifications to make this system work month after month, reliably, across multiple vehicles, across multiple weather conditions, and so on. At the end of the day, with multiple sensor streams, we have the three cameras coming in, we have IMU, GPS, and all of the raw CAN messages coming from the vehicle itself, and all of that has to be collected, reliably synchronized, and post-processed once we offload the data.

First, we have a single-board computer here running a custom version of Linux that we wrote specifically for this application. This single-board computer integrates all of the cameras and all the sensors (GPS, CAN, IMU) and offloads it all onto the solid-state drive that we have on board. There are some extra components here for cellular communication, as well as power management throughout the device.

Here we have our single-board computer, as well as sensor integration and our power system. This is our solid-state drive, which connects directly to our single-board computer. On our single-board computer we have a sensor integration board on top; here you'll be able to see our real-time clock, as well as its battery backup, and a CAN transceiver. On the reverse side of this board we have our GPS receiver and IMU. This is our CAN-controlled power board, which monitors CAN throughout the car and determines whether or not the system should be on or off. When the system is on, it sends power through a buck converter to step the 12 volts from the vehicle down to the 5 volts that operate the single-board computer. We also have a 4G wireless connection on board to monitor the health of RIDER and report things like free capacity left on our drive, as well as temperature and power usage information. The cameras connect to RIDER through this USB hub right here.

So we needed the box to do at least three things: one, record from at least three cameras; two, record CAN vehicle telemetry data; and lastly, store all this data on board for a long period of time, so that people could drive around for months without us having to offload the data from their vehicles. And we're talking about hundreds of thousands of miles' worth of data: for about every hundred thousand miles, uncompressed, that's about a hundred petabytes of video data. So one of the other key requirements was how to store all this data on the device, and how to then offload it successfully onto thousands of machines to be processed with the computer vision and deep learning algorithms that we're using. One of the essential elements for that was to do compression on board. So these are Logitech C920 webcams.
They can do up to 1080p at 30 frames a second. The major reason we went with these is that they do onboard H.264 compression of the video. That allows us to offload all that processing from our single-board computer onto the individual cameras, letting us use a very slim, pared-down, lightweight single-board computer to run all of these sensors. This is the original Logitech C920 that you would buy at a store. These are two of the same Logitech C920s, but put into a custom-made camera case just for this application. What this allows us to do is add our own CS-type lenses, so we can have a zoom lens as well as a fisheye lens within the car, giving us a greater range of fields of view inside the vehicle. This is the fisheye lens, and this is the zoom lens. CS-type (there's also C-type) are standard lens mounts that connect to these kinds of cameras, often the industrial cameras used for autonomous vehicle applications.
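[Editor's note: because the C920 compresses to H.264 on the camera, the single-board computer can copy the stream straight to disk rather than re-encode it. A minimal sketch of that kind of capture command on Linux follows; the device path, file name, and resolution are illustrative assumptions, not RIDER's actual configuration.]

```python
# Hypothetical sketch of zero-re-encode capture from a UVC camera on Linux.
# The C920 emits H.264 itself, so "-c:v copy" writes the compressed stream
# to disk with near-zero CPU cost on the single-board computer.

def build_capture_cmd(device="/dev/video0", out="cam0.mkv",
                      fps=30, size="1920x1080"):
    return [
        "ffmpeg",
        "-f", "v4l2",              # Video4Linux2 capture input
        "-input_format", "h264",   # request the camera's H.264 stream
        "-framerate", str(fps),
        "-video_size", size,
        "-i", device,
        "-c:v", "copy",            # pass-through: no re-encoding
        out,
    ]

# e.g. subprocess.run(build_capture_cmd("/dev/video2", "face_cam.mkv"))
```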
We tested these cameras to see what would happen to them if placed inside a hot car on a summer day. We wanted to see whether these cameras would hold up to the heat of the summer and still function as needed. We put these cameras in a toaster. A scientific toaster. What was the temperature it went up to? We cycled these cameras between 58 and 75 degrees Celsius, which spans the roughly 150-degrees-Fahrenheit maximum temperature that a car would reach in the summer. We also cranked it up to 127 degrees Celsius, just to see what would happen to these cameras after prolonged high heat. In fact, these cameras continued to work perfectly fine after that.

Creating a system that would intelligently and autonomously turn on and off, to start and end recording, was also a key aspect of this device. Since people were just going to be driving their normal cars, we couldn't rely on them to start and end recording, so this device, RIDER, intelligently figures out when the car is running and when it's off, to start and stop recording automatically. So how does RIDER specifically know when to turn on? We use CAN to determine when the system should turn on and off. When CAN is active, the car is running and we should turn the system on; when CAN is inactive, we should turn the system off and end recording. This also gives us the ability to trigger on certain CAN messages. For instance, if we want to start recording as soon as the driver approaches the car and unlocks the door, we can do that, or when they turn the car on, or put it into drive, and so on.
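[Editor's note: the on/off logic just described can be sketched as below. The idle timeout and the wake message ID are made-up examples, not RIDER's real parameters; in a real deployment the frames would come from a SocketCAN interface, e.g. via the python-can library.]

```python
# Illustrative sketch of CAN-activity power logic.

CAN_IDLE_TIMEOUT_S = 10.0   # assumed: bus silent this long => car is off

def should_record(seconds_since_last_frame: float,
                  idle_timeout: float = CAN_IDLE_TIMEOUT_S) -> bool:
    """CAN traffic seen recently means the car is running: keep recording."""
    return seconds_since_last_frame < idle_timeout

WAKE_IDS = {0x2F9}          # hypothetical door-unlock arbitration ID

def is_wake_message(arbitration_id: int) -> bool:
    """Trigger early recording on specific CAN messages (e.g. door unlock)."""
    return arbitration_id in WAKE_IDS
```

When `should_record` goes false, the power board can cut the buck converter's output to the single-board computer; a wake message restores it.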
The cost of the car the system resides in is about a thousand times more than the system itself; these are hundred-thousand-plus-dollar cars, so we have to make sure we design the system and run the wires in a way that doesn't do any damage to the vehicles.

What kinds of things fail, when they fail? The biggest issue we've had with the system is camera cables becoming unplugged. When a camera cable becomes unplugged, the system will try to restart that subsystem multiple times, and if it's unable to, it completely shuts off recording; as long as that cable is still unplugged, RIDER will not start up the next time.
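[Editor's note: a minimal sketch of that restart-then-shut-down policy, assuming a fixed retry count and caller-supplied check/restart hooks; none of these names are from the actual RIDER software.]

```python
# Hypothetical supervisor: retry a failed subsystem a few times, and if it
# stays down, stop recording entirely (a missing camera invalidates the
# whole session, since all streams must be recorded together).

MAX_RESTARTS = 3

def supervise(subsystem_ok, restart_subsystem, shutdown_recording,
              max_restarts: int = MAX_RESTARTS) -> bool:
    """Return True if the subsystem is healthy, False if recording stopped."""
    for _ in range(max_restarts):
        if subsystem_ok():
            return True
        restart_subsystem()
    if subsystem_ok():
        return True
    shutdown_recording()
    return False
```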
So one issue we've seen is that cables becoming unplugged cause us to lose the potential to record some data. One of the requirements of the system from the very beginning was that all the video streams are always recorded, perfectly synchronized. If any of the subsystems fail to record from the sensors, we try again, restart the system, restart it again, and if it's still not working, it should shut down. In order to understand what drivers are doing in these systems, the video is essential, so if one of the cameras is not working, the system as a whole is not working.

The other crucial component of a data collection system that's taking in multiple streams is that those streams have to be synchronized perfectly. Synchronization was the highest priority from the very beginning of RIDER's design. We have a real-time clock on board RIDER that gives us down to two parts per million of accuracy in timestamping. This means that over the course of a one-and-a-half-hour drive, the timestamps issued to each of the different subsystems may drift up to seven or so milliseconds relative to each other, which is extremely small compared to most clocks on computers today. Once the data is offloaded, the very first thing we do is make sure the data was timestamped correctly so that we can synchronize it; the first thing the data pipeline does is synchronize the data. That means taking the timestamp from the real-time clock that was assigned to every single piece of sensor data and using it to align the data together. For video, that means 30 frames a second, perfectly aligned with the GPS signals and so on. There are other sensors, like the IMU and the CAN messages coming from the car, that arrive much more frequently than 30 hertz, so we have a different synchronization scheme there. But overall, synchronization, from the very beginning of the hardware design to the very end of the software pipeline, is crucial, because we want to be able to analyze what people are doing in these semi-autonomous vehicles and how they're interacting with the technology. That means using data from the face camera, the body camera, and the forward view, synchronized together with the GPS, the IMU, and all the messages coming from the vehicle telemetry over CAN.

The video stream compression, which is a very CPU- or GPU-intensive operation, is performed on board the cameras. There are other CPU-intensive operations performed on RIDER, like the sensor fusion for the IMU, but for the most part there are sufficient CPU cycles left for the actual data collection to not have any skips or drifts in the sensor stream collection.

One of the questions we get is how we get the data from this box to our computers, and then to the cluster that's doing the compute. When we receive a hard drive from one of these RIDER boxes that we're swapping, we connect the hard drive locally to our computers and then do a remote copy to a server that contains all of our data. We then check the data for consistency and perform any fixes on the raw data in preparation for the synchronization operation. So we're not doing any remote offloading of data: the data lives on RIDER until the subjects, the drivers, the owners of the cars, come back to us, and we take the hard drive, swap it out, and offload the data from it.
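[Editor's note: the offload-and-verify step could look something like this. The destination host, paths, and the choice of rsync plus SHA-256 are illustrative assumptions, not the lab's actual tooling.]

```python
# Hypothetical sketch: copy a swapped drive to the data server, then verify
# consistency by checksum before the synchronization stage runs.
import hashlib
import subprocess
from pathlib import Path

def sha256sum(path: Path, chunk: int = 1 << 20) -> str:
    """Checksum a file in 1 MiB chunks so multi-gigabyte video files
    never have to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def offload(mount_point: str, dest: str = "dataserver:/avt/raw/") -> None:
    """Remote-copy the drive's contents; --checksum makes rsync compare
    file contents rather than trusting size and modification time."""
    subprocess.run(["rsync", "-a", "--checksum", mount_point, dest],
                   check=True)
```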
Can you tell me the journey that a pixel takes on its way from the camera to our cluster? First, the camera records the raw image data based on the settings we've configured from the RIDER box, and that raw image data is compressed on the camera itself into an H.264 format, then transmitted over the USB wire to the single-board computer on the RIDER box. There it's recorded onto the solid-state drive in a video file, where it will stay until we do an offload, over the course of about six months for some subjects and one month for others. After that, the drive is connected to a local computer, synchronized with a remote server, and then processed with initial cleaning algorithms to remove any corrupt data, or to fix any subject data in the configuration files for that particular trip. After the initial cleaning is taken care of, it's synchronized at 30 frames per second, and can then be used for different detection algorithms or for manual annotation.

So the important hard work behind the magic that deep learning and computer vision unlock is the synchronization and the cleaning of the messy data: making sure we get anything that's at all weird in any way out of the data, so that at the end of the pipeline we have a clean dataset of multiple sensor streams, perfectly synchronized, that we can then use both for analysis and for annotation, so that we can improve the neural network models used for the various detection tasks.

RIDER has done an amazing job, over 30 vehicles, of collecting hundreds of thousands of miles' worth of data, billions of video frames. We're talking about an incredible amount of data, all compressed with H.264: that's close to 300 terabytes' worth. But of course, you can always improve, so what are our next steps? One huge improvement for RIDER would be transitioning to another single-board computer, in particular a Jetson TX2. There's a lot more capability for added sensors, as well as much more compute power, and even the possibility of developing some real-time systems with the Jetson.

One of the critical things you realize when you're collecting huge amounts of driving data is that most of driving is quite boring: nothing interesting happens in terms of understanding driver behavior or training computer vision models for edge cases. So one of the future steps we're taking, based on what we've found in the data so far, is that we know which parts are interesting and which are not, and so we want to design onboard algorithms that process that video data in real time to determine: is this the kind of data I want to keep at this time? And if not, throw it out. That means we can collect more efficiently just the bits that are interesting for edge-case neural network model training or for understanding human behavior. Now, this is a totally unknown, open area, because we really don't understand what people do in semi-autonomous vehicles when the car is driving itself and when the human is driving it. So the initial stages of the study were to keep all the data, so we could do the analysis: body pose, glance allocation, activity, smartphone usage, all the various sudden decelerations, autopilot usage, where it's used, how it's used, geography, weather, night, and so on. But as we start to understand where the fundamental insights come from, we can become more and more selective about which epochs of data we want to be collecting.
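[Editor's note: that kind of onboard epoch filtering could be sketched as follows. The trigger conditions here, such as hard braking, autopilot transitions, and long off-road glances, are assumed examples drawn from the analyses mentioned above, not the study's actual selection criteria.]

```python
# Illustrative sketch: score short windows ("epochs") of a drive and keep
# only the interesting ones, so boring footage never reaches the drive.
from dataclasses import dataclass

@dataclass
class Epoch:
    decel_mps2_max: float       # strongest deceleration in the window
    autopilot_toggled: bool     # did autopilot engage or disengage?
    glance_off_road_s: float    # longest off-road glance, in seconds

def is_interesting(e: Epoch,
                   hard_brake_mps2: float = 4.0,
                   long_glance_s: float = 2.0) -> bool:
    """Keep epochs with hard braking, autopilot transitions, or long glances."""
    return (e.decel_mps2_max >= hard_brake_mps2
            or e.autopilot_toggled
            or e.glance_off_road_s >= long_glance_s)

def filter_epochs(epochs):
    """Drop boring windows; record only the candidate edge cases."""
    return [e for e in epochs if is_interesting(e)]
```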
Now, that requires real-time processing of the data, and as Dan said, that's where the Jetson TX2, and the power the TX2 brings, becomes more and more useful.

All of this work is part of the MIT Autonomous Vehicle Technology study. We've collected over three hundred twenty thousand miles so far, and we're collecting five hundred to a thousand miles every day, so we're always growing, adding new vehicles; we're working on adding a Tesla Model 3, a Cadillac CT6 with the Super Cruise system, and others. One of the driving principles behind our work is that the kind of data collection we need to design safe semi-autonomous and autonomous vehicles requires recording not just the forward roadway, or any kind of sensing of the external environment: we need rich sensor information about the internal environment, what the driver is doing, everything about their face, their glance, the cognitive load, their body pose, everything about their activity. We truly believe that autonomous vehicles require an understanding of how the human supervisors of those systems behave, and how we can keep them attentive, keep their glance on the road, and keep them as effective, efficient supervisors of those systems.