Transcript
53YvP6gdD7U • Deep Learning State of the Art (2019)
Kind: captions
Language: en
The thing I would very much like to talk about today is the state of the art in deep learning. Here we stand in 2019, really at the height of some of the great advances that have happened, but we also stand at a beginning. It's up to us to define where this incredible data-driven technology takes us. And so I'd like to talk a little bit about the breakthroughs that happened in 2017 and 2018 that take us to this point. This lecture is not about state-of-the-art results on the main machine learning benchmarks, the various image classification and object detection benchmarks, or the NLP benchmarks, or the GAN benchmarks; it isn't about the cutting-edge algorithm that performs best on a particular benchmark. This is about ideas: ideas and developments that are at the cutting edge of what defines this exciting field of deep learning. And so I'd like to go through a bunch of different areas that I think are really exciting. Of course, this is also not a complete lecture. There are other things I may be totally missing that happened in 2017-18 that are particularly exciting to people here or beyond; for example, medical applications of deep learning are something I totally don't touch on, and protein folding, and all kinds of applications where there have been exciting developments from DeepMind and so on that I don't touch on. So forgive me if your favorite developments are missing, but hopefully this encompasses some of the really fundamental things that have happened, on the theory side, on the application side, and on the community side: all of us being able to work together on these kinds of technologies.
I think 2018, in terms of deep learning, was the year of natural language processing. Many have described this year as the ImageNet moment, analogous to 2012 for computer vision, when AlexNet was the first neural network that really gave that big jump in performance in computer vision and started to inspire people about what's possible with deep learning, with purely learning-based methods. In the same way, there has been a series of developments from 2016 and 2017 leading up to 2018, culminating in the development of BERT, which has made a total leap on benchmarks and in our ability to apply NLP to solve various natural language processing tasks. So let's tell the story of
what takes us there. There are a few developments; I mentioned a little bit on Monday the encoder-decoder recurrent neural networks. The idea is that recurrent neural networks encode sequences of data and output either a single prediction or another sequence. When the input sequence and the output sequence are not necessarily the same size, as in machine translation, where we have to translate from one language to another, the encoder-decoder architecture takes the following approach. It takes in the sequence of words, or the sequence of samples, as the input and uses recurrent units, whether that's LSTMs or GRUs or beyond, to encode that sentence into a single vector. So it forms an embedding of that sentence: a representation of that sentence. Then it feeds that representation into the decoder recurrent neural network, which generates the sequence of words that form the sentence in the language being translated to. So first you encode, by taking the sequence and mapping it to a fixed-size vector representation, and then you decode, by taking that fixed-size vector representation and unrolling it into a sentence that can be of different length than the input sentence. Okay, that's the encoder-decoder structure for recurrent neural networks. It has been very effective for machine translation and for dealing with arbitrary-length input sequences and arbitrary-length output sequences.
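To make that fixed-size bottleneck concrete, here is a toy sketch (made-up embeddings, with mean-pooling standing in for the recurrent encoder; this is an illustration, not any particular paper's model):

```python
import numpy as np

# Toy sketch of the encoder-decoder bottleneck: the whole input sequence
# is collapsed into ONE fixed-size vector, which is all the decoder sees.

rng = np.random.default_rng(0)
vocab = ["i", "love", "deep", "learning", "<eos>"]
emb = {w: rng.normal(size=8) for w in vocab}          # made-up embeddings

def encode(words):
    """Collapse an arbitrary-length sequence into one fixed-size vector.
    A real encoder would take LSTM/GRU steps; mean-pooling stands in."""
    return np.mean([emb[w] for w in words], axis=0)

def decode(context, max_len=4):
    """Greedily emit tokens using only the fixed context vector."""
    out, state = [], context
    for _ in range(max_len):
        scores = {w: float(state @ emb[w]) for w in vocab}  # similarity
        w = max(scores, key=scores.get)
        out.append(w)
        state = 0.5 * state + 0.5 * emb[w]  # toy state update
        if w == "<eos>":
            break
    return out

ctx = encode(["i", "love", "deep", "learning"])
print(ctx.shape)   # fixed size (8,) regardless of input length
print(decode(ctx))
```

The point is that `ctx` has the same shape whether the input is four words or four hundred, which is exactly the limitation attention addresses.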
Next step: attention. What is attention? Well, it's the next step beyond, an improvement on the encoder-decoder architecture. It provides a mechanism that allows the decoder to look back at the input sequence. As opposed to having the input sentence all get collapsed into a single vector representation, you're allowed to look back at particular samples from the input sequence as part of the decoding process. That's attention. And you can also learn which aspects of the input sequence are important for which aspects of the decoding process. Visualized another way: there are a few visualizations here, quite incredible ones, done by Jay Alammar. I highly recommend you follow the links and look at the further details of these visualizations of attention. If we look at neural machine translation, the encoder RNN takes a sequence of words and, after every step, forms a set of hidden representations, a hidden state that captures a representation of the words seen so far. Those sets of hidden representations, as opposed to being collapsed into a single fixed-size vector, are then all pushed forward to the decoder, which uses them to translate, but in a selective way. Here, visualized with the input language on the y-axis and the output language on the x-axis, the decoder weighs the different parts of the input sequence differently in order to determine how best to generate the word that forms the translation in the full output sentence.
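That weighing step can be sketched directly (toy vectors, not a trained model; the dot-product-plus-softmax form used here is one common scoring choice):

```python
import numpy as np

# Minimal sketch of attention weighting: the decoder scores each encoder
# hidden state against its own state and takes a softmax, so instead of
# one collapsed vector it gets a different weighted mix at every step.

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

rng = np.random.default_rng(1)
encoder_states = rng.normal(size=(5, 8))   # 5 input words, hidden size 8
decoder_state = rng.normal(size=8)         # current decoder hidden state

scores = encoder_states @ decoder_state    # one relevance score per word
weights = softmax(scores)                  # attention weights, sum to 1
context = weights @ encoder_states         # weighted mix fed to decoder

print(np.round(weights, 3))                # which input words matter now
print(context.shape)                       # (8,)
```

Plotting `weights` for every decoder step gives exactly the kind of grid shown in the visualization.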
Okay, that's attention: expanding the encoder-decoder architecture to allow for selective attention over the input sequence, as opposed to collapsing everything down into a fixed representation. Okay, next step: self-
attention. In the encoding process, this allows the encoder, when forming the hidden representations, to also selectively look at other parts of the input sequence in order to form those representations. It lets you determine, for certain words, what the important, relevant aspects of the input sequence are that can help you encode that word best. So it improves the encoding process by allowing it to look at the entirety of the context. That's self-attention. Building on that: the Transformer. It uses the self-attention mechanism in the encoder to form these sets of representations of the input sequence, and then, as part of the decoding process, it follows the same approach but in reverse, with a bunch of self-attention plus attention that's able to look back at the input. So it's self-attention in the encoder and attention in the decoder, and that's where the magic is: that's what's able to capture the rich context of the input sequence in order to generate, in a contextual way, the output sequence. So let's take a step
back, then, and look at what is critical to natural language: in order to reason about words, construct a language model, and be able to reason about words so as to classify a sentence, or translate a sentence, or compare two sentences, and so on. Sentences are collections of words or characters, and those characters and words have to have an efficient representation that's meaningful for that kind of understanding. That's what the process of embedding is; we talked a little bit about it on Monday. The traditional word2vec process of embedding uses some kind of trick, in an unsupervised way, to map words into a compressed representation. Language modeling is the process of determining which words usually follow each other. One way you can do it is with a skip-gram model: taking a huge dataset of words (there's writing all over the place), taking those datasets, and feeding a neural network that, in a supervised way, looks at which words usually follow the input. The input is a word, and the output is which words are statistically likely to follow that word.
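The (input word, context word) pairs such a model trains on can be sketched like this (a toy construction in the spirit of skip-gram; real word2vec adds subsampling, negative sampling, and so on):

```python
# Toy skip-gram training-pair construction: each word is paired with its
# neighbors within a window; a network trained to predict context from
# input then yields the embedding in its hidden layer.

def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))  # (input, context)
    return pairs

corpus = "deep learning is state of the art".split()
pairs = skipgram_pairs(corpus, window=1)
print(pairs[:4])
# → [('deep', 'learning'), ('learning', 'deep'), ('learning', 'is'), ('is', 'learning')]
```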
And the same with the preceding word. Doing this kind of unsupervised learning, which is what word2vec does, you throw away the output and the input and just take the hidden representation formed in the middle. That's how you form this compressed embedding: a meaningful representation such that when two words are related in a language-modeling sense, they're going to be close to each other in that representation, and when they're totally unrelated, have nothing to do with each other, they're far away. ELMo is the approach of
using bidirectional LSTMs to learn that representation. What does bidirectional mean? Looking not just at the sequence leading up to the word, but in both directions: the sequence that follows and the sequence that comes before. That allows you to learn the rich, full context of the word, and in learning the rich, full context of the word, you're forming representations that are much better able to represent the statistical language model behind the corpus of language you're looking at. This produced a big leap in the ability of further algorithms, equipped with that language model, to reason and do things like sentence classification, sentence comparison, translation, and so on. That representation is much more effective for working with language. The
idea of the OpenAI transformer is the next step forward: taking the same Transformer I mentioned previously, the encoder with self-attention and the decoder with attention looking back at the input sequence, taking the language model learned by the decoder, and then chopping off layers and training on a specific language task, like sentence classification. Now, BERT is the thing that made the big leap in performance. In the original Transformer formulation there's no bidirectional element; it's always moving forward. With BERT, the encoding is richly bidirectional. It takes in the full sequence of the sentence and masks out some percentage of the words, 15% of the samples, of the tokens from the sequence, and tasks the entire self-attention encoding mechanism with predicting the words that are missing. Then you stack a ton of those encoders together, self-attention, feed-forward network, self-attention, feed-forward network, and that allows you to learn the rich context of the language, and then, at the end, perform all kinds of tasks. You can, first of all, like ELMo and like word2vec, create rich contextual embeddings: take a set of words and represent them in a space that's very efficient to reason with. You can do language classification; you can do sentence-pair classification; you can do similarity of two sentences, multiple-choice question answering, general question answering, tagging of sentences. Okay.
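The masking step BERT trains on can be sketched as follows (illustrative only: real BERT works on WordPiece tokens with special tokens, and applies an 80/10/10 mask/replace/keep split over the chosen 15%):

```python
import random

# Toy sketch of BERT-style masked-language-model input preparation:
# hide ~15% of tokens and ask the encoder to predict them.

def mask_tokens(tokens, mask_rate=0.15, seed=0):
    rng = random.Random(seed)
    n = max(1, round(mask_rate * len(tokens)))   # how many to hide
    idx = set(rng.sample(range(len(tokens)), n))
    masked = ["[MASK]" if i in idx else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in idx}        # what must be predicted
    return masked, targets

tokens = "the encoder learns the rich context of the language".split()
masked, targets = mask_tokens(tokens)
print(masked)
print(targets)
```

Training the stacked self-attention encoders to recover `targets` from `masked`, over a huge corpus, is what produces those rich contextual embeddings.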
I lingered on that one a little bit too long, but it is also the one I'm really excited about; really, if there's been a breakthrough this year, it's thanks to BERT. The other thing I'm very excited about is totally jumping away from NeurIPS, the theory, those kinds of academic developments in deep learning, and into the world of applied deep learning. Tesla has a system called Autopilot, where hardware version 2 of that system is an implementation on the NVIDIA Drive PX 2 platform, which runs a ton of neural networks. There are eight cameras on the car, and a variant of the Inception network is taking in all the cameras at different resolutions as input and performing various tasks, like drivable-area segmentation, object detection, and some basic localization tasks. So you now have a huge fleet of vehicles where it's not engineers (some, I'm sure, are engineers), but really regular consumers, people who have purchased the car, who in many cases have no understanding of what a neural network's limitations and capabilities are. And now a neural network is controlling the vehicle: its decisions, its perceptions, and the control decisions based on those perceptions are affecting the life of a human being. That, to me, is one of the great breakthroughs of '17 and '18 in terms of the development of what AI can do in a practical sense, in impacting the world. Over one billion miles have been driven in Autopilot. There are two types of systems currently operating in Teslas: hardware version 1 and hardware version 2. Hardware version 1 was the Intel Mobileye monocular-camera perception system; as far as we know, that was not using a neural network, and it was a fixed system that wasn't learning, at least not online in the Teslas. The other is hardware version 2, and it's about half and half now in terms of miles driven. Hardware version 2 has a neural network that's always learning; there are weekly updates; it's always improving the model, shipping new weights, and so on. That's the exciting set of breakthroughs there. In terms of AutoML: the
dream of automating some aspects, as many aspects as possible, of the machine learning process, where you can just drop in a dataset you're working on and the system automatically determines all the parameters: the details of the architecture, the size of the architecture, the different modules in the architecture, the hyperparameters used for training the architecture, running it, doing the inference. Everything is done for you; you just feed it your data. That's been the success of neural architecture search in '16 and '17, and there have been a few ideas, with Google AutoML, really trying to almost create an API where you just drop in your dataset, and it uses reinforcement learning and recurrent neural networks to, given a few modules, stitch them together in such a way that the objective function optimizes the performance of the overall system. They've shown a lot of exciting results (Google showed, and others) that outperform state-of-the-art systems both in terms of efficiency and in terms of accuracy. Now, in '18 there have been a few improvements in this direction, and one of them is AdaNet, which uses the same reinforcement-learning AutoML formulation to build ensembles of neural networks. In many cases, state-of-the-art performance can be achieved by, as opposed to taking a single architecture, building up an ensemble, a multitude, a collection of architectures, and that's what it's doing here: given candidate architectures, it stitches them together to form an ensemble to get state-of-the-art performance. Now, that state-of-the-art performance is not a breakthrough leap forward, but it's nevertheless a step forward, and it's a very exciting field that's going to be receiving more and more attention.
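The search loop can be sketched as follows; note that this toy uses random search over a made-up search space, standing in for the RNN controller trained with reinforcement learning that the Google work actually uses:

```python
import random

# Minimal architecture-search sketch: sample candidate configurations,
# evaluate each, keep the best. (Hypothetical search space; the scoring
# function stands in for "train the candidate, measure validation accuracy".)

search_space = {
    "num_layers": [2, 4, 8],
    "width":      [64, 128, 256],
    "activation": ["relu", "tanh", "swish"],
}

def sample_architecture(rng):
    return {k: rng.choice(v) for k, v in search_space.items()}

def evaluate(arch):
    # Toy stand-in score that just prefers deeper/wider configurations.
    return arch["num_layers"] * 0.05 + arch["width"] / 1000.0

rng = random.Random(0)
candidates = [sample_architecture(rng) for _ in range(20)]
best = max(candidates, key=evaluate)
print(best)
```

A learned controller differs from this sketch in that each round of feedback shapes which candidates get proposed next, rather than sampling blindly.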
there's an area of machine learning
that's heavily under studied and I think
it's extremely exciting area and if you
look at 2012 with Alex net achieving the
breakthrough performance of showing that
deep learning networks are capable of
from that point on from 2012 to today
there's been non-stop extremely active
developments of different architectures
that even on just imagenet alone on
doing the image classification tasks
have improved performance over and over
and over with totally new ideas now on
the other side on the data side there's
been very few ideas about how to do data
augmentation so data augmentation is the
process of you know it's what kids
always do when you learn about an object
right if you look at an object and you
kind of like twist it around is is
taking the the the raw data and messing
it was such a way that it can give you
much richer representation of what this
can this data can look like in other
forms in other in in other contexts in
the real world there's been very few
developments I think still and there's
this Auto augment is just the step a
tiny step into that direction that I
hope that we as a community invest a lot
of effort in so what our argument does
because it says ok so there's these data
augmentation methods like translating
the image sharing the image doing color
manipulation like color inversion let's
take those as basic actions you can take
and then use reinforcement learning and
an RNN again construct to stitch those
actions together in such a way that can
augment data like an image net - - when
you train on that data
it gets state-of-the-art performance so
mess with the data
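Those basic actions can be sketched in a few lines (toy 8x8 "image" and a hand-picked policy; AutoAugment's contribution is learning which actions to chain and with what magnitudes):

```python
import numpy as np

# Toy versions of basic augmentation actions (translate, flip, color
# inversion) applied to a fake 8x8 grayscale image. Here the policy is
# fixed by hand; AutoAugment searches for the policy instead.

img = np.arange(64, dtype=np.uint8).reshape(8, 8)

def translate(x, shift=2):
    return np.roll(x, shift, axis=1)       # shift pixels horizontally

def flip(x):
    return x[:, ::-1]                      # mirror left-right

def invert_colors(x):
    return 255 - x                         # color inversion

policy = [translate, flip, invert_colors]  # a hand-picked "policy"
aug = img
for op in policy:
    aug = op(aug)

print(aug.shape)   # same shape as the input, different content
```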
You know, optimize the way you mess with the data. And then they also showed that, given the set of data-augmentation policies that are learned, optimized for example for ImageNet given some kind of architecture, you can take that learned set of policies and apply it to a totally different dataset. That's the process of transfer learning. So what is transfer learning? We've talked about transfer learning: you have a neural network that learns to do cat-versus-dog, or learns to do the thousand-class classification problem on ImageNet, and then you transfer: you chop off a few layers and transfer onto the task of your own dataset of cat versus dog. What you're transferring are the weights learned on the ImageNet classification task, and you're then fine-tuning those weights on the specific personal cat-versus-dog dataset you have. And now you can do the same thing.
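The chop-and-fine-tune step can be sketched like this (made-up layer shapes; a real setup would use a deep-learning framework and actually train the new head):

```python
import numpy as np

# Toy sketch of weight transfer: keep the early "pretrained" layers,
# chop off the final 1000-class ImageNet head, and attach a fresh
# 2-class cat-vs-dog head that would then be fine-tuned.

rng = np.random.default_rng(0)
pretrained = [rng.normal(size=(32, 64)),    # layer 1 weights (pretrained)
              rng.normal(size=(64, 64)),    # layer 2 weights (pretrained)
              rng.normal(size=(64, 1000))]  # ImageNet classification head

transferred = pretrained[:-1]               # chop off the old head
new_head = rng.normal(size=(64, 2)) * 0.01  # fresh cat-vs-dog head
model = transferred + [new_head]

x = rng.normal(size=32)
for w in model:                             # forward pass through new model
    x = np.tanh(x @ w)
print(x.shape)                              # (2,) : cat-vs-dog scores
```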
Here, as part of the transfer-learning process, you can take the data-augmentation policies learned on ImageNet and transfer those: you can transfer both the weights and the policies. That's a really super exciting idea. I think it wasn't demonstrated extremely convincingly here in terms of performance (it got an improvement in performance and so on), but it inspired an idea: this is something we need to really think about, how to augment data in an interesting way such that, given just a few samples of data, we can generate huge datasets from which you can then form meaningful, complex, rich representations. I think that's really exciting, and one of the ways you break open the problem of how we learn a lot from a little. Training deep neural networks with synthetic data:
this is also a really exciting topic that a few groups, but especially NVIDIA, have invested a lot in. Here, from CVPR 2018, is probably my favorite work on this topic. They really went crazy and said, okay, let's mess with synthetic data in every way we possibly can. On the left they're showing a set of backgrounds; then there's also a set of artificial objects; and you have a car, or some kind of object, that you're trying to classify. So let's take that car and mess with it in every way possible: apply lighting variation in whatever way possible, rotate everything; it's crazy. What NVIDIA is really good at is creating realistic scenes, and they said, okay, let's create realistic scenes, but let's also go way above board and not be realistic at all, do things that can't possibly happen in reality. And so they generate these huge datasets, train on them, and again achieve quite interesting, quite good performance on image classification. Of course, on these kinds of tasks you're not going to outperform networks that were trained on ImageNet, but they show that with just a small sample of those real images they can fine-tune this network, trained on synthetic, totally fake images, to achieve state-of-the-art performance. Again, another way to learn a lot from very little: by generating fake worlds synthetically. The process of annotation,
which for supervised learning is what you need to do in order to train the network: you need to be able to provide ground truth, to label whatever entity is being learned. For image classification, that's saying what is going on in the image, and part of that was done on ImageNet by doing a Google search to create candidates. Now, saying what's going on in the image is a pretty easy task. Then there is the object-detection task of detecting the bounding box; drawing the actual bounding box is a little more difficult, but it's a couple of clicks and so on. Then, if we take probably one of the highest-complexity tasks of perception, of image understanding, it's segmentation: actually drawing, either at the pixel level or with polygons, the outline of a particular object. If you have to annotate that, it's extremely costly. So the work with Polygon-RNN is to use recurrent neural networks to make suggestions for polygons. It's really interesting, and there are a few tricks to produce these high-resolution polygons. The idea is: you draw a bounding box around an object, you use convolutional networks to drop in the first point, and then you use recurrent neural networks to draw around it. The performance is really good, there are a few tricks, and this tool is available online. It's a really interesting idea.
Again, the dream with AutoML is to remove the human from the picture as much as possible; with data augmentation, remove the human from the picture as much as possible for menial data work; automate the boring stuff; and in this case, the act of drawing a polygon, try to automate it as much as possible. The other interesting dimension along which deep learning has recently been optimized is: how do we make deep learning accessible, fast, cheap? The DAWNBench benchmark from Stanford formulated an interesting competition which got a lot of attention and a lot of progress. It says: if we want to achieve 93% accuracy on ImageNet and 94% on CIFAR-10, let's make that the requirement, and let's compete on how you can do the training in the least amount of time and for the least amount of dollars, literally the dollars you are allowed to spend to do this. And fast.ai, you know, a renegade group of deep learning researchers, have been able to train ImageNet in three hours, so that's the training process, for 25 bucks: training a network that achieves 93% accuracy for 25 bucks, and 94% accuracy for 26 cents on CIFAR-10. The key idea they were playing with is quite simple and really boils down to messing with the learning rate throughout the process of training. The learning rate is how much, based on the loss function, the error the neural network observes, you adjust the weights. They found that if they crank up the learning rate while decreasing the momentum, which is a parameter of the optimization process, and do the two jointly, they're able to make the network learn really fast. That's really exciting, and the benchmark itself is also really exciting, because that's exactly what, for people sitting in this room, opens up the door to doing all kinds of fundamental deep learning work without the resources of Google DeepMind or OpenAI or Facebook and so on, without big computational resources. That's important for academia; that's important for independent researchers and so on.
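A schedule in that spirit (often called the one-cycle policy; the numbers here are illustrative, not fast.ai's exact settings) can be sketched as:

```python
# Sketch of a one-cycle-style schedule: learning rate ramps up then back
# down over training while momentum does the opposite. Values are toys.

def one_cycle(step, total_steps, lr_max=0.1, lr_min=0.001,
              mom_max=0.95, mom_min=0.85):
    half = total_steps / 2
    if step <= half:                      # first half: lr up, momentum down
        t = step / half
    else:                                 # second half: lr down, momentum up
        t = (total_steps - step) / half
    lr = lr_min + t * (lr_max - lr_min)
    mom = mom_max - t * (mom_max - mom_min)
    return lr, mom

total = 100
for s in (0, 50, 100):
    print(s, one_cycle(s, total))
# the learning rate peaks mid-training exactly when momentum bottoms out
```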
GANs: there's been a lot of work on generative adversarial networks, and in some ways there have not been breakthrough ideas in GANs for quite a bit. BigGAN, from Google DeepMind, is the ability to generate incredibly high-resolution images, and it's the same GAN technique. In terms of breakthroughs: there are innovations, but mostly things scaled. They increased the model capacity and increased the batch size, the number of images fed to the network, and it produces incredible images. I encourage you to go online and look at them; it's hard to believe they're generated. So 2018 for GANs was a year of scaling and parameter tuning, as opposed to breakthrough new ideas. Video-to-video synthesis: this work is from NVIDIA. There's been a lot of work on going from image to image, from a particular image generating another image, whether it's colorizing an image or the traditionally defined GAN tasks. The idea with video-to-video synthesis, which a few people have been working on but NVIDIA took a good step forward with, is to make the temporal consistency, the temporal dynamics, part of the optimization process: to make the video look not jumpy. If you look at the comparison here, the input is the labels on the top left, and the output of the NVIDIA approach is on the bottom right; it's very temporally consistent. If you look at the image-to-image mapping, the state-of-the-art pix2pixHD, it's very jumpy, not temporally consistent at all, and there are some naive approaches for trying to maintain temporal consistency in the bottom left. You can apply this to all kinds of video-to-video mapping tasks: here it's mapping edge detection on faces to faces, generating faces from just edges; you can look at body pose to actual images, so as input to the network you can take the pose of the person and generate the video of the person.
Okay, semantic segmentation. The problem of perception sort of began with AlexNet and ImageNet, and there have been further and further developments. The basic problem is image classification, where the input is an image and the output is a classification of what's going on in that image, and the fundamental architecture can be reused for more complex tasks, like detection, like segmentation, and so on: interpreting what's going on in the image. These large networks, VGGNet, GoogLeNet, ResNet, SENet, DenseNet, all form rich representations that can then be used for all kinds of tasks. One such task is object detection. Shown here are the region-based methods, where the convolutional layers make region proposals, a bunch of candidates to be considered, and then there's a step that determines what's in those different regions and forms bounding boxes around them, in a for-loop way. Then there are the single-shot methods, where in a single pass all of the bounding boxes and their classes are generated. There has been a tremendous amount of work in the space of object detection, some with single-shot methods, some with region-based methods; a lot of exciting work, but not, I would say, breakthrough ideas. Then we take it to the highest level of perception, which is semantic segmentation. There's also been a lot of work there; the state-of-the-art performance, at least for open-source systems, is DeepLabv3+ on the Pascal VOC challenge. Semantic segmentation in general sort of started in 2014 with fully convolutional neural networks: chopping off the fully connected layers and outputting the heatmap, very grainy, very low resolution. Then that was improved with SegNet, performing max-pooling. A breakthrough idea that's reused in a lot of cases is dilated convolutions, atrous convolutions: having some spacing, which increases the field of view of the convolutional filter. The key idea behind DeepLabv3, which is the state of the art, is multi-scale processing without increasing the parameters. The multi-scale is achieved by the, quote-unquote, atrous rate: taking those atrous convolutions and increasing the spacing, and you can think of increasing that spacing as enlarging the model's field of view. So you can consider all these different scales of processing and look at the layers of features, allowing you to grasp the greater context as part of the upsampling, deconvolutional step.
That's what produces the state-of-the-art performance, and that's where we have the notebook tutorial on GitHub showing this DeepLab architecture trained on Cityscapes. Cityscapes is a driving segmentation dataset, one of the most commonly used for the task of driving-scene segmentation. Okay, on the deep
reinforcement learning front: this touches a bit on 2017, but I think the excitement really settled in in 2018, with the work from Google DeepMind and from OpenAI. It started with the DQN paper from Google DeepMind, where they beat a bunch of Atari games, achieving superhuman performance with deep reinforcement learning methods that take in just the raw pixels of the game. The same kind of architecture is able to learn how to beat these games: a super exciting idea that kind of has echoes of what general intelligence is, taking in the raw information and being able to understand the game, the sort of physics of the game, sufficiently to beat it. Then, in 2016, AlphaGo, with some supervision and some self-play, some supervised learning on expert world-champion players and some play against itself, was able to beat the top-of-the-world champion at Go. Then, in 2017, AlphaGo Zero, a specialized version of AlphaZero, was able to beat the original AlphaGo with just a few days of training and zero supervision from expert games, through the process of self-play. Again, this is kind of getting the human out of the picture more and more, which is why AlphaGo Zero was the cleanest demonstration of all the nice progress in deep reinforcement learning. I think if we look at the history of AI, when you're sitting on a porch a hundred years from now, sort of reminiscing back, AlphaZero will be a thing people remember as an interesting moment in time, a key moment in time. The AlphaZero paper was in 2017, and this year it played Stockfish in chess, one of the best chess-playing engines, and was able to beat it with just four hours of training. Of course, the "four hours" comes with a caveat, because four hours for Google DeepMind is highly distributed training; it's not four hours for an undergraduate student sitting in their dorm room. But it means it was able, through self-play, to very quickly learn to beat the state-of-the-art chess engine, and it learned to beat the state-of-the-art shogi engine, Elmo. And the interesting
thing here is, you know, with perfect-information games like chess you have a tree with all the decisions you could possibly make, and presumably, the farther down that tree you look, the better you do. That's how Deep Blue beat Kasparov in the '90s: you just look as far as possible down the tree to determine which action is most optimal. If you look at the way human grandmasters think, it certainly doesn't feel like they're looking down a tree. There's something like creative intuition: you can see the patterns on the board, you can do a few calculations, but really it's on the order of hundreds, not on the order of millions or billions, which is the Stockfish, state-of-the-art chess engine approach. AlphaZero moves closer and closer towards the human grandmaster: considering very few future moves, it's able, through the neural network estimator that estimates the quality of the current board and the quality of the moves that follow, to do much, much less look-ahead. The neural network learns the fundamental information, just as when a grandmaster looks at a board, they can tell how good it is. So that's again interesting: it's a step towards at least echoes of what human intelligence is, in this very structured, formal, constrained world of chess and Go and shogi. And then
there's the other side of the world
that's messy it's still games it's still
constrained in that way but open AI has
taken on the challenge of playing games
that are much messier to have this
semblance of the real world and the fact
that you have to do teamwork you have to
look at long time horizons with huge
amounts of imperfect information hidden
information uncertainty so within that
world they've taken on the challenge of
a popular game dota 2 on the human side
of that there's the competition the
International hosted every year where
you know in 2018 the winning team gets
11 million dollars so it's a very
popular very active competition has been
going on for for for a few years
OpenAI has been improving and achieved a lot of interesting milestones. In 2017 their 1v1 bot beat a top professional Dota 2 player. The way you achieve great things is you keep trying, and in 2018 they went 5v5: the OpenAI Five team lost two games against top Dota 2 players at the 2018 International. Of course their ranking, the MMR ranking in Dota 2, has been increasing over and over, but there are a lot of challenges that make it extremely difficult to beat the human players. And in every story, Rocky or whatever you think about, losing is an essential element of the story that leads to a movie, a book, and greatness, so you better believe they're coming back next year, and there are going to be a lot of exciting developments there. Currently there are really two games that have the public eye in terms of AI taking them on as benchmarks.
We solved Go, an incredible accomplishment, but what's next? Last year the associated work received the best paper award at NeurIPS: a heads-up Texas No Limit Hold'em AI was able to beat top-level players. What is currently, well not completely but currently, out of reach is not heads-up one-versus-one but the general team Texas No Limit Hold'em. And on the gaming side, Dota 2 is now the benchmark that everybody's targeting, and it's actually an incredibly difficult one; some people think it will be a long time before we can win. And on the
more practical side of things, 2018, starting in 2017, has been a year of the frameworks growing up, of maturing and creating ecosystems around them. TensorFlow, with its history dating back a few years, has really, with TensorFlow 1.0, come to be a mature framework. PyTorch 1.0 came out in 2018 and has matured as well. And now the really exciting developments in TensorFlow, with eager execution and beyond, are coming out in TensorFlow 2.0 in 2019. So really those two players have made incredible leaps in standardizing deep learning, in the fact that a lot of the ideas I talked about today and Monday, and that we'll keep talking about, all have a GitHub repository with implementations in TensorFlow and PyTorch, making them extremely accessible.
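The shift that eager execution represents can be illustrated without the framework at all. This is a pure-Python sketch of define-then-run (graph) versus define-by-run (eager) style, an illustration of the concept only, not TensorFlow's actual API:

```python
# Graph style: calling the op only builds a deferred computation.
def graph_add(a, b):
    return lambda: a + b        # a "node": nothing is computed yet

node = graph_add(2, 3)          # construct the graph
result_graph = node()           # compute only when the graph is run

# Eager style: the op executes immediately, like ordinary Python.
def eager_add(a, b):
    return a + b

result_eager = eager_add(2, 3)

assert result_graph == result_eager == 5
```

Eager, define-by-run execution is what makes TensorFlow 2.0 feel like ordinary Python, the style PyTorch has had from the start.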
And that's really exciting. It's probably best to quote Geoff Hinton, the quote-unquote godfather of deep learning, one of the key people behind backpropagation, who said recently of backpropagation: "My view is throw it all away and start again." He believes backpropagation is totally broken, an idea that is ancient and needs to be completely revolutionized.
And the practical protocol for doing that, he said: "The future depends on some graduate student who is deeply suspicious of everything I have said." That's probably a good way to end the discussion about what the state of the art in deep learning holds,
because everything we're doing is fundamentally based on ideas from the 60s and the 80s, and really there have not been many new ideas; especially, the state-of-the-art results I've mentioned are all based fundamentally on stochastic gradient descent and backpropagation.
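As a reminder of how simple that foundation is, here is stochastic gradient descent in its one-parameter form, fitting y = w·x to toy data. The gradient is written by hand here; backpropagation is what computes such gradients automatically through the layers of a deep network.

```python
# Fit y = w * x to data generated with true weight w = 3,
# taking one gradient step per sample (stochastic, not batch).

data = [(x, 3.0 * x) for x in range(1, 6)]  # true weight is 3
w = 0.0                                     # initial guess
lr = 0.01                                   # learning rate

for epoch in range(200):
    for x, y in data:                       # one sample at a time
        grad = 2 * (w * x - y) * x          # d/dw of (w*x - y)**2
        w -= lr * grad                      # gradient step

assert abs(w - 3.0) < 1e-3                  # converged to the true weight
```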
It's ripe for totally new ideas, so it's up to us to define the real breakthroughs and the real state of the art in 2019 and beyond. With that, I'd like to thank you; the materials are on the website, deeplearning.mit.edu.