Ian Goodfellow: Generative Adversarial Networks (GANs) | Lex Fridman Podcast #19

Z6rxFNMGdn0 • 2019-04-18

Kind: captions
Language: en
the following is a conversation with Ian
Goodfellow he's the author of the
popular textbook on deep learning simply
titled deep learning he coined the term
of generative adversarial networks
otherwise known as GANs and with his
2014 paper is responsible for launching
the incredible growth of research and
innovation in this subfield of deep
learning he got his BS and MS at
Stanford his PhD at University of
Montreal with Yoshua Bengio and Aaron
Courville he held several research
positions including at OpenAI Google
Brain and now at Apple as the director
of machine learning this recording
happened while Ian was still at Google
Brain but we don't talk about anything
specific to Google or any other
organization this conversation is part
of the artificial intelligence podcast
if you enjoy it subscribe on YouTube
iTunes or simply connect with me on
Twitter at Lex Fridman spelled F-R-I-D
and now here's my conversation with Ian
Goodfellow you open your popular deep
learning book with a Russian doll type
diagram that shows deep learning is a
subset of representation learning which
in turn is a subset of machine learning
and finally a subset of AI so this kind
of implies that there may be limits to
deep learning in the context of AI so
what do you think is the current limits
of deep learning and are those limits
something that we can overcome with time
yeah I think one of the biggest
limitations of deep learning is that
right now it requires really a lot of
data especially labeled data there's
some unsupervised and semi-supervised
learning algorithms that can reduce the
amount of labeled data you need but they
still require a lot of unlabeled data
reinforcement learning algorithms they
don't need labels but they need really a
lot of experiences as human beings we
don't learn to play pong by failing at
pong two million times so just getting
the generalization ability better is one
of the most important bottlenecks in
the capability of the technology today
and then I guess I'd also say deep
learning is like a component
of a bigger system so far nobody is
really proposing to have only what you'd
call deep learning as the entire
ingredient of intelligence you use deep
learning as sub modules of other systems
like alphago has a deep learning model
that estimates the value function most
reinforcement learning algorithms have a
deep learning module that estimates
which action to take next but you might
have other components so you're basically
building a function estimator do you
think it's possible you said nobody is
kind of been thinking about this so far
but do you think neural networks could
be made to reason in the way symbolic
systems did in the 80s and 90s to do
more create more like programs as
opposed to functions yeah I think we
already see that a little bit I already
kind of think of neural nets as a kind
of program I think of deep learning as
basically learning programs that have
more than one step so if you draw a
flowchart or or if you draw a TensorFlow
graph describing your machine
learning model I think of the depth of
that graph as describing the number of
steps that run in sequence and then the
width of that graph is the number of
steps that run in parallel now it's been
long enough that we've had deep learning
working that it's a little bit silly to
even discuss shallow learning anymore
but back when I first got involved in AI
when we used machine learning we were
usually learning things like support
vector machines you could have a lot of
input features to the model and you
could multiply each feature by a
different weight but all those
multiplications were done in parallel to
each other there wasn't a lot done in
series I think what we got with deep
learning was really the ability to have
steps of a program that run in sequence
and I think that we've actually started
to see that what's important with deep
learning is more the fact that we have a
multi-step program rather than the fact
that we've learned a representation if
you look at things like ResNets for
example they take one particular kind of
representation and they update it
several times back when deep learning
first really took off in the academic
world in 2006 when Geoff Hinton
showed that you could train deep belief
networks everybody who was interested
in the idea thought of it as each layer
learns a different level of abstraction
that the first layer trained on images
learns something like edges and the
second layer learns corners and
eventually you get these kind of
grandmother cell units that recognize
specific objects today I think most
people think of it more as a computer
program where as you add more layers you
can do more updates before you output
your final number but I don't think
anybody believes that layer 150 of the
ResNet is a grandmother cell and
you know layer 100 is contours or
something like that okay so you think
you're not thinking of it as a singular
representation that keeps building you
think of it as a program sort of almost
like a state the representation is a
state of understanding and yeah I think
of it as a program that makes several
updates and arrives at better and better
understandings but it's not replacing
the representation at each step it's
refining it and in some sense that's a
little bit like reasoning it's not
reasoning in the form of deduction but
it's reasoning in the form of taking a
thought and refining it and refining it
carefully until it's good enough to use
do you think and I hope you don't mind
we'll jump philosophical every once in a
while do you think of you know a
cognition human cognition or even
consciousness as simply a result of this
kind of sequential
representation learning do you think
that can emerge cognition yes I think so
consciousness it's really hard to even
define what we mean by that I guess
there's consciousness is often defined
as things like having self-awareness and
that's relatively easy to turn into
something actionable for a computer
scientist to reason about people also
define consciousness in terms of having
qualitative states of experience like
qualia and there's all these
philosophical problems like could you
imagine a zombie who does all the same
information processing as a human but
doesn't really have the qualitative
experiences that we have that sort of
thing I have no idea how to formalize or
turn it into a scientific question I
don't know how you could run an
experiment to tell whether a person is a
zombie or not
and similarly I don't know how you could
run an experiment to tell whether an
advanced AI system had become conscious
in the sense of qualia or not but in the
more practical sense like almost like
self attention
you think consciousness and cognition
can in an impressive way emerge from
current types of architectures though
yes yeah or or if if you think of
consciousness in terms of self-awareness
and just making plans based on the fact
that the agent itself exists in the
world reinforcement learning algorithms
are already more or less forced to model
the agent's effect on the environment so
that that more limited version of
consciousness is already something that
we get limited versions of with
reinforcement learning algorithms if
they're trained well but you say limited
so the the big question really is how
you jump from limited to human level
yeah right and whether it's possible you
know the even just building common-sense
reasoning seems to be exceptionally
difficult so okay if we scale things up
if we get much better on supervised
learning if we get better at labeling
if we get bigger datasets and more
compute do you think we'll start to see
really impressive things that go from
limited to you know something echoes of
human level cognition I think so yeah
I'm optimistic about what can happen
just with more computation and more data
I do think it'll be important to get the
right kind of data today most of the
machine learning systems we train are
mostly trained on one type of data for
each model but the human brain we get
all of our different senses and we have
many different experiences like you know
riding a bike driving a car talking to
people reading I think when you get that
kind of integrated data set working with
a machine learning model that can
actually close the loop and interact we
may find that algorithms not so
different from what we have today learn
really interesting things when you scale
them up a lot and train them on
a large amount of multimodal data so
multimodal is really interesting but
within like you're working on adversarial
examples so selecting within modal
within one mode of data selecting
better at what are the difficult cases
from which are most useful to learn from
oh yeah like could we could you get a
whole lot of mileage out of designing a
model that's resistant to adversarial
examples or something like that right
yeah question but my thinking on that
has evolved a lot over the last few
years one nice thing when I first
started to really invest in studying
adversarial examples I was thinking of
it mostly as that adversarial examples
reveal a big problem with machine
learning and we would like to close the
gap between how machine learning models
respond to adversarial examples and how
humans respond after studying the
problem more I still think that
adversarial examples are important I
think of them now more of as a security
liability than as an issue that
necessarily shows there's something
uniquely wrong with machine learning as
opposed to humans also do you see them
as a tool to improve the performance of
the system not not on the security side
but literally just accuracy I do see
them as a kind of tool on that side but
maybe not quite as much as I used to
think we've started to find that there's
a trade-off between accuracy on
adversarial examples and accuracy on
clean examples back in 2014 when I did
the first adversarially trained classifier
that showed resistance to some kinds of
adversarial examples it also got better
at the clean data on MNIST and that's
something we've replicated several times
on MNIST that when we train against
weak adversarial examples MNIST
classifiers get more accurate so far
that hasn't really held up on other data
sets and hasn't held up when we train
against stronger adversaries it seems
like when you confront a really strong
adversary you tend to have to give
something up interesting this is such a
compelling idea because it feels it
feels like that's how us humans learn
yeah the difficult cases we we try to
think of what would we screw up
and then we make sure we fix that yeah
it's also in a lot of branches of
engineering you do a worst case analysis
and make sure that your system will work
in the worst case and then that
guarantees that it'll work in all of the
messy average cases that happen when you
go out into a really randomized world
you know with driving with autonomous
vehicles there seems to be a desire to
just look for think adversarially try
to figure out how to mess up the system
and if you can be robust to all those
difficult cases then you can it's a hand
waving empirical way to show that your
system is yeah yes
today most adversarial example
research isn't really focused on a
particular use case but there are a lot
of different use cases where you'd like
to make sure that the adversary can't
interfere with the operation of your
system like in finance if you have an
algorithm making trades for you people
go to a lot of effort to obfuscate
their algorithm that's both to protect
their IP because you don't want to
research and develop a profitable
trading algorithm then have somebody
else capture the gains but it's at least
partly because you don't want people to
make adversarial examples that fool your
algorithm into making bad trades
or I guess one area that's been popular
in the academic literature is speech
recognition if you use speech
recognition to hear an audio waveform
and then turn that into a command
that a phone executes for you you don't
want a malicious adversary to be
able to produce audio that gets
interpreted as malicious commands
especially if a human in the room
doesn't realize that something like that
is happening in speech recognition has
there been much success in in being able
to create adversarial examples that fool
the system yeah actually I guess the
first work that I'm aware of is a paper
called hidden voice commands that came
out in 2016 I believe and they were able
to show that they could make sounds that
are not understandable by a human but
are recognized as the target phrase that
the attacker wants the phone to
recognize it as since then things have
gotten a little bit better on the
attacker side and worse on the defender
side it's become possible to make sounds
that sound like normal speech but are
actually interpreted as a different
sentence than the human hears the level
of perceptibility of the adversarial
perturbation is still kind of high the
when you listen to the recording it
sounds like there's some noise in the
background just like rustling sounds but
those rustling sounds are actually the
adversarial perturbation that makes the
phone hear a completely different
sentence yeah that's so fascinating
Peter Norvig mentioned that you're writing
the deep learning chapter for the fourth
edition of the Artificial Intelligence
A Modern Approach book so how do you
even begin summarizing the field of deep
learning in a chapter well in my case I
waited like a year before I actually
wrote anything
even having written a full length
textbook before it's still pretty
intimidating to try to start writing
just one chapter that covers everything
one thing that helped me make that plan
was actually the experience of having
written the full book before and then
watching how the field changed after the
book came out I realized there's a lot
of topics that were maybe extraneous in
the first book and just seeing what
stood the test of a few years of being
published and what seems a little bit
less important to have included now
helped me pare down the topics I wanted
to cover for the book it's also really
nice now that the field is kind of
stabilized to the point where some core
ideas from the 1980s are still used
today when I first started studying
machine learning almost everything from
the 1980s had been rejected and now some
of it has come back so that stuff that's
really stood the test of time is what I
focused on putting into the book there's
also I guess two different philosophies
about how you might write a book one
philosophy is you try to write a
reference that covers everything and the
other philosophy is you try to provide a
high level summary that gives people the
language to understand a field and tells
them what the most important concepts
are the first deep learning book that I
wrote with Yoshua and Aaron was
somewhere between the the two
philosophies that it's trying to be both
a reference and an introductory guide
writing this chapter for Russell and
Norvig book I was able to focus more on
just a concise introduction of the key
concepts and the language you need to
read about them more in a lot of cases
I actually just wrote paragraphs that said
here's a rapidly evolving area that you
should pay attention to it's it's
pointless to try to tell you what the
latest and best version of a you know
learn to learn model is right you know I
can I can point you to a paper that's
recent right now but there isn't a whole
lot of a reason to delve into exactly
what's going on with the latest learning
to learn approach or the latest module
produced by learning to learn algorithm
you should know that learning to learn
is a thing and that it may very well be
the source of the latest and greatest
convolutional net or recurrent net
module that you would want to use in
your latest project but there isn't a
lot of point in trying to summarize
exactly which architecture in which
learning approach got to which level of
performance
so you maybe focus more on the basics of
the methodology so from back propagation
to feed-forward to recurrent neural
networks convolutional that kind of
thing yeah yeah so if I were to ask you
I remember I took algorithms and data
structures the algorithms course I
remember the professor asked what is an
algorithm and yelled at everybody in a
good way that nobody was answering it
correctly everybody knew what the algorithm
it was a graduate course everybody knew
what an algorithm was but they weren't
able to answer it well let me ask you in
that same spirit what is deep learning I
would say deep learning is any kind of
machine learning that involves learning
parameters of more than one consecutive
step so that I mean shallow learning is
things where you learn a lot of
operations that happen in parallel you
might have a system that makes multiple
steps like you might have hand designed
feature extractors but really only one
step is learned deep learning is
anything where you have multiple
operations in sequence and that includes
the things that are really popular today
like convolutional networks and
recurrent networks but it also includes
some of the things that have died out
like Boltzmann machines where we weren't
using back propagation today I hear a
lot of people define deep learning as
gradient descent applied to these
differentiable functions and I think
that's a legitimate usage of the term
it's just different from the way that I
use the term myself so what's an example
of deep learning that is not gradient
descent on differentiable functions in
your I mean not specifically perhaps but
more even looking into the future what's
your thought about that space of
approaches yeah so I tend to think of
machine learning algorithms as
decomposed into really three different
pieces there's the model which can be
something like a neural net or a Boltzmann
machine or a recurrent model and it
basically just describes
how do you take data and how do you take
parameters and you know what function do
you use to make a prediction given the
data and the parameters another piece of
the learning algorithm is the
optimization algorithm or not every
algorithm can be really described in
terms of optimization but what's the
algorithm for updating the parameters or
updating whatever the state of the
network is and then the the last part is
the the data set like how do you
actually represent the world as it comes
into your machine learning system so I
think of deep learning as telling us
something about what does the model look
like and basically to qualify as deep I
say that it just has to have multiple
layers that can be multiple steps in a
feed-forward differentiable computation
that can be multiple layers in a
graphical model there's a lot of ways
that you could satisfy me that something
has multiple steps that are each
parameterised separately
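
[Illustration, not part of the conversation] A minimal sketch of the shallow-versus-deep distinction described above, assuming a toy 4-feature input. The helper names (linear, random_matrix) and the tanh nonlinearity are assumptions chosen for the example, and the weights are random rather than learned, so the sketch only shows where the separately parameterised steps sit.

```python
import math
import random

random.seed(0)

def linear(weights, x):
    """One parameterised step: each output is a weighted sum of the inputs."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def random_matrix(rows, cols):
    """Hypothetical stand-in for learned parameters (random values here)."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

x = [0.5, -1.0, 2.0, 0.1]

# Shallow: a single parameterised step; all multiplications happen in parallel.
shallow_weights = random_matrix(1, 4)
shallow_out = linear(shallow_weights, x)

# Deep: several steps applied in sequence, each with its own parameters.
deep_weights = [random_matrix(4, 4) for _ in range(3)]
state = x
for weights in deep_weights:
    # Each step takes the previous state and updates it.
    state = [math.tanh(v) for v in linear(weights, state)]

print("parameterised steps (shallow):", 1)
print("parameterised steps (deep):", len(deep_weights))
```

Under this view, swapping the way the weights are found (gradient descent, evolution, something else) leaves the model just as deep; only the number of consecutive parameterised steps matters.
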
I think of gradient descent as being all
about that other piece the how do you
actually update the parameters piece so
you can imagine having a deep model like
a convolutional net and training it with
something like evolution or a genetic
algorithm and I would say that still
qualifies as deep learning and then in
terms of models that aren't necessarily
differentiable
I guess Boltzmann machines are probably
the main example of something where you
can't really take a derivative and use
that for the learning process but you
you can still argue that the model has
many steps of processing that it applies
when you run inference in the model so
that's the steps of processing that's
key so geoff hinton suggests that we
need to throw away back prop back
propagation and start all over what do
you think about that what could an
alternative direction of training neural
networks look like I don't know that
back propagation is going to go away
entirely most of the time when we
decide that a machine learning algorithm
isn't on the critical path to research
for improving AI the algorithm doesn't
die it just becomes used for some
specialized set of things
a lot of algorithms like logistic
regression don't seem that exciting to
AI researchers who are working on things
like speech recognition or autonomous
cars today but there's still a lot of
use for logistic regression in things
like analyzing really noisy data in
medicine and finance or making really
rapid predictions in really time-limited
contexts so I think I think back
propagation and gradient descent are
around to stay but they may not end up
being everything that we need to get to
real human level or superhuman AI are
you optimistic about us discovering
you know back propagation has been
around for a few decades
so are you optimistic about us as a
community being able to discover
something better yeah I am I think I
think we likely will find something that
works better you could imagine things
like having stacks of models where some
of the lower level models predict
parameters of the higher level models
and so at the top level you're not
learning in terms of literally
calculating gradients but just
predicting how different values will
perform you can kind of see that already
in some areas like Bayesian optimization
where you have a Gaussian process that
predicts how well different parameter
values will perform we already use
those kinds of algorithms for things
like hyper parameter optimization and in
general we know a lot of things other
than back prop that work really well for
specific problems the main thing we
haven't found is a way of taking one of
these other non back prop based algorithms
and having it really advance the
state-of-the-art
on an AI level problem right but I
wouldn't be surprised if eventually we
find that some of these algorithms that
even the ones that already exist not
even necessarily a new one we might find
some way of customizing one of these
algorithms to do something really
interesting at the level of cognition or
or the the level of I think one system
that we really don't have working quite
right yet is like short-term memory we
have things like LSTMs they're called
long short-term memory
they still don't do quite what a human
does with short-term memory
like gradient descent to learn a
specific fact has to do multiple steps
on that fact like if I I tell you the
meeting today is at 3 p.m. I don't need
to say over and over again it's at 3
p.m. it's at 3 p.m. it's at 3 p.m. it's
at 3 p.m. right for you to do a gradient
step on each one you just hear it once
and you remember it there's been some
work on things like self attention and
attention like mechanisms like the
neural Turing machine that can write to
memory cells and update themselves with
facts like that right away but I don't
think we've really nailed it yet and
that's one area where I'd imagine that
new optimization algorithms or
different ways of applying existing
optimization algorithms could give us a
way of just lightning-fast updating the
state of a machine learning system to
contain a specific fact like that
without needing to have it presented
over and over and over again so some of
the success of symbolic systems in the
80s is they were able to assemble these
kinds of facts better but there's a
lot of expert input required and it's
very limited in that sense do you ever
look back to that as something that we'll
have to return to eventually sort of
dust off the book from the shelf and
think about how we build knowledge
representation knowledge base will we
have to use graph searches graph searches
right and like first-order logic and
entailment and things like that that kind of thing
yeah exactly
in my particular line of work which has
mostly been machine learning security
and and also generative modeling I
haven't usually found myself moving in
that direction for generative models I
could see a little bit of it could be
useful if you had something like a
differentiable knowledge base or some
other kind of knowledge base where it's
possible for some of our fuzzier machine
learning algorithms to interact with the
knowledge base I mean a neural network is kind
of like that it's a differentiable
knowledge base of sorts yeah but if if
we had a really easy way of giving
feedback to machine learning models that
would clearly help a lot with with
generative models and so you could
imagine one way of getting there would
be get a lot better at natural
language processing but another way of
getting there would be take some kind of
knowledge base and figure out a way for
it to actually interact with a neural
network being able to have a chat with a
neural network yes so like one thing in
generative models we see a lot today is
you'll get things like faces that are
not symmetrical like like people that
have two eyes that are different colors
and I mean there are people with eyes
that are different colors in real life
but not nearly as many of them as you
tend to see in the machine learning
generated data so if if you had either a
knowledge base that could contain the
fact people's faces are generally
approximately symmetric and eye color is
especially likely to be the same on both
sides being able to just inject that
hint into the machine learning model
without it having to discover that
itself after studying a lot of data it
would be a really useful feature I could
see a lot of ways of getting there
without bringing back some of the 1980s
technology but I also see some ways that
you could imagine extending the 1980s
technology to play nice with neural nets
and have it help get there
awesome so you talked about the story of
you coming up with the idea of GANs at a bar
with some friends you were arguing that
this you know GANs would work generative
adversarial networks and the others
didn't think so then you went home at
midnight coded it up and it worked so if I
was a friend of yours at the bar I would
also have doubts it's a really nice idea
but I'm very skeptical that it would
work what was the basis of their
skepticism what was the basis of your
intuition why it should work I don't
want to be someone who goes around
promoting alcohol for the science in
this case I do actually think that
drinking helped a little bit mm-hmm
when your inhibitions are lowered you're
more willing to try out things that you
wouldn't try out otherwise so I I have
noticed it in general that I'm less
prone to shooting down some of my own
ideas when I'm when I have had a little
bit to drink I think if I had had that
idea at lunch time yeah I probably would
have thought it's hard enough to train
one neural net you can't train a second
neural net in the inner loop of the
outer neural net that was basically my
friend's
objection was that trying to train two
neural nets at the same time would be
too hard so it was more about the
training process unless so my skepticism
would be you know I'm sure you could
train it but the thing it would converge to
would not be able to generate anything
reasonable and any kind of reasonable
realism yeah so so part of what all of
us were thinking about when we had this
conversation was deep Boltzmann machines
which a lot of us in the lab including
me were a big fan of deep Boltzmann
machines at the time they involved two
separate processes running at the same
time one of them is called the positive
phase where you load data into the model
and tell the model to make the data more
likely the other one's called the negative
phase where you draw samples from the
model and tell the model to make those
samples less likely in a deep Boltzmann
machine it's not trivial to generate a
sample you have to actually run an
iterative process that gets better and
better
samples coming closer and closer to the
distribution the model represents so
during the training process you're
always running these two systems at the
same time one that's updating the
parameters of the model and another one
that's trying to generate samples from
the model and they worked really well on
things like MNIST a lot of us in the
lab including me had tried to get the
Boltzmann machines to scale past MNIST
to things like generating color photos
and we just couldn't get the two
processes to stay synchronized so when I
had the idea for GANs a lot of people
thought that the discriminator would
have more or less the same problem as
the negative phase in the Boltzmann
machine that trying to train the
discriminator in the inner loop you just
couldn't get it to keep up with the
generator in the outer loop and that
would prevent it from converging to
anything useful yeah I share that
intuition yeah but it turns out to not be
the case a lot of the time with machine
learning algorithms it's really hard to
predict ahead of time how well they'll
actually perform you have to just run
the experiment and see what happens
and I would say I still today don't have
like one factor I can put my finger on
and say this is why GANs work for photo
generation and deep Boltzmann machines
don't
there are a lot of theory papers showing
that under some theoretical settings the
the GAN algorithm does actually converge
but those settings are restricted enough
that they don't necessarily explain the
whole picture in terms of all the
results that we see in practice so
taking a step back can you in the same
way as we talked about deep learning can
you tell me what generative adversarial
networks are yeah so generative
adversarial networks are a particular
kind of generative model a generative
model is a machine learning model that
can train on some set of data like so
you have a collection of photos of cats
and you want to generate more photos of
cats or you want to estimate a
probability distribution over cats so
you can ask how likely it is that some
new image is a photo of a cat GANs are one
way of doing this
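
[Illustration, not part of the conversation] A toy sketch of the two jobs a generative model can do, as described above: generating new data and estimating how likely a data point is. It assumes 1-D numbers in place of cat photos and a single fitted Gaussian in place of a neural network, so it is not a GAN; the function names (sample, density) are hypothetical.

```python
import math
import random
import statistics

random.seed(0)

# Toy "training set": 1-D numbers standing in for photos of cats.
train = [random.gauss(3.0, 0.5) for _ in range(1000)]

# "Training": fit the two parameters of a Gaussian model to the data.
mu = statistics.fmean(train)
sigma = statistics.pstdev(train)

def sample():
    """Generate a new data point from the fitted model."""
    return random.gauss(mu, sigma)

def density(x):
    """Estimate how likely x is under the fitted model."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

new_points = [sample() for _ in range(5)]
print("new samples:", new_points)
print("density near the data:", density(3.0))
print("density far from the data:", density(10.0))
```
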
some generative models are good at
creating new data other generative
models are good at estimating that
density function and telling you how
likely particular pieces of data are to
come from the same distribution as the
training data GANs are more focused on
generating samples rather than
estimating the density function there
are some kinds of GANs like Flow-GAN
that can do both but mostly GANs are
about generating samples generating
new photos of cats that look realistic
and they do that completely from scratch
it's analogous to human imagination when
a GAN creates a new image of a cat it's
using a neural network to produce a cat
that has not existed before it isn't
doing something like compositing photos
together you're not you're not literally
taking the eye off of one cat and the ear
off of another cat it's it's more of
this digestive process where the the
neural net trains on a lot of data and
comes up with some representation of the
probability distribution and generates
entirely new cats there are a lot of
different ways of building a generative
model what's specific to GANs is that we
have a two-player game in the game
theoretic sense and as the players in
this game compete
one of them becomes able to generate
realistic data the first player is
called the generator it produces output
data such as just images for example and
at the start of the learning process
it'll just produce completely random
images the other player is called the
discriminator the discriminator takes
images as input and guesses whether
they're real or fake you train it both
on real data so photos that come from
your training set actual photos of cats
and you try to say that those are real
you also train it on images that come
from the generator network and you train
it to say that those are fake as the two
players compete in this game the
discriminator tries to become better at
recognizing whether images are real
or fake and the generator becomes better
at fooling the discriminator into
thinking that its outputs are are real
and you can analyze this through the
language of game theory and find that
there's a Nash equilibrium where the
generator has captured the correct
probability distribution so in the cat
example it makes perfectly realistic cat
photos and the discriminator is unable
to do better than random guessing
because all the samples coming
from both the data and the generator
look equally likely to have come from
either source so do you ever sit back
and does it just blow your mind that
this thing works so it's
able to estimate that density function
enough to generate realistic
images I mean do you ever
sit back and think how does this even work
this is quite incredible especially
where GANs have gone in terms of
realism yeah and not just to flatter
my own work but generative models all of
them have this property that if they
really did what we asked them to do they
would do nothing but memorize the
training data right so for models that are
based on maximizing the likelihood the
way that you obtain the maximum
likelihood for a specific training set
is you assign all of your probability
mass to the training examples and
nowhere else
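
The point about maximum likelihood can be checked on a toy discrete example. The sketch below is illustrative only: the cat labels, `fit_max_likelihood`, and the 10% smoothing are assumptions made for the example, not anything from the conversation. It shows that the unconstrained maximum-likelihood fit is just the empirical distribution, so moving any probability mass onto an unseen example lowers the training likelihood.

```python
from collections import Counter
import math

def fit_max_likelihood(train):
    """Unconstrained maximum-likelihood fit over discrete outcomes:
    the empirical frequencies of the training set."""
    counts = Counter(train)
    n = len(train)
    return {x: c / n for x, c in counts.items()}

def log_likelihood(model, data):
    """Sum of log-probabilities; -inf if any point has zero mass."""
    total = 0.0
    for x in data:
        p = model.get(x, 0.0)
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total

# Hypothetical training set of four "photos" (one duplicate).
train = ["cat_a", "cat_b", "cat_b", "cat_c"]
model = fit_max_likelihood(train)

# A model that generalizes a little: 10% of the mass goes to an unseen cat.
smoothed = {x: 0.9 * p for x, p in model.items()}
smoothed["cat_new"] = 0.1

print(log_likelihood(model, train))     # memorizing model
print(log_likelihood(smoothed, train))  # strictly lower on the training set
```

The memorizing model wins on training likelihood even though the smoothed one is the only one that can ever produce a new cat, which is the tension being described here.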
for GANs the game is played using a
training set so the way that you become
unbeatable in the game is you literally
memorize training examples
one of my former interns wrote a paper
his name is Vaishnavh Nagarajan and he
showed that it's actually hard for the
generator to memorize the training data
hard in a statistical learning theory
sense that you can actually create
reasons for why it would require quite a
lot of learning steps and a lot of
observations of different latent
variables before you could memorize the
training data that still doesn't really
explain why when you produce samples
that are new why do you get compelling
images rather than you know just garbage
that's different from the training set
and I don't think we really have a good
answer for that especially if you think
about how many possible images are out
there and how few images the generative
model sees during training it seems just
unreasonable that generative models
create new images as well as they do
especially considering that we're
basically training them to memorize
rather than generalize I think part of
the answer is there's a paper called
deep image prior where they show that
you can take a convolutional net and you
don't even need to learn the parameters
of it at all
you just use the model architecture and
it's already useful for things like
inpainting images I think that shows us
that the convolutional network
architecture captures something really
important about the structure of images
and we don't need to actually use
learning to capture all the information
coming out of the convolutional net that
would imply that it would be
much harder to make generative models in
other domains so far we're able to make
reasonable speech models and things like
that but to be honest we haven't
actually explored a whole lot of
different data sets all that much we
don't for example see a lot of deep
learning models of like biology datasets
where you have lots of microarrays
measuring the amount of different
enzymes and things like that so we may
find that some of the progress that
we've seen for images and speech turns
out to really rely heavily on the model
architecture and we were able to do what
we did for vision by trying to
reverse-engineer the human visual system
and
maybe it'll turn out that we can't just
use that same trick for arbitrary kinds
of data all right so there's aspects of
the human vision system the hardware of
it that makes it without learning
without cognition just makes it really
effective at detecting the patterns
we see in the visual world yeah
that's really interesting in a
big quick overview in your
view what types of GANs are there and
what other generative models besides
GANs are there yeah so it's maybe a
little bit easier to start with what
kinds of generative models are there
other than GANs
so most generative models are likelihood
based where to train them you have a
model that tells you how much
probability it assigns to a particular
example and you just maximize the
probability assigned to all the training
examples it turns out that it's hard to
design a model that can create really
complicated images or really complicated
audio waveforms and still have it be
possible to estimate the likelihood
function from a computational point of
view most interesting models that you
would just write down intuitively it
turns out that it's almost impossible to
calculate the amount of probability they
assign to a particular point so there's
a few different schools of generative
models in the likelihood family one
approach is to very carefully design the
model so that it is computationally
tractable to measure the density it
assigns to a particular point so there
are things like autoregressive models
like PixelCNN those basically break
down the probability distribution into a
product over every single feature so for
an image you estimate the probability of
each pixel given all of the pixels that
came before it there's tricks where
if you want to measure the density
function you can actually calculate the
density for all these pixels more or
less in parallel generating the image
still tends to require you to go one
pixel at a time and that can be very
slow
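
The factorization described here is the chain rule, p(x) = p(x_1) p(x_2 | x_1) ... p(x_n | x_1..x_{n-1}). The following is a minimal sketch on binary "pixels", assuming a made-up conditional rule `p_on` (a pixel tends to copy its predecessor); it is not PixelCNN, only an illustration of why density evaluation can be parallelized while sampling cannot.

```python
import math
import random

def p_on(prefix):
    """Hypothetical conditional p(x_i = 1 | x_<i): copy the previous pixel
    with probability 0.9; the first pixel is a fair coin."""
    if not prefix:
        return 0.5
    return 0.9 if prefix[-1] == 1 else 0.1

def log_density(image):
    """log p(image) as a sum of per-pixel conditional log-probabilities.
    Each term depends only on the given image, so a real model could
    evaluate all of them in parallel."""
    terms = []
    for i, x in enumerate(image):
        p1 = p_on(image[:i])
        terms.append(math.log(p1 if x == 1 else 1.0 - p1))
    return sum(terms)

def sample(n_pixels, rng):
    """Sampling is sequential: pixel i needs the pixels sampled before it."""
    image = []
    for _ in range(n_pixels):
        image.append(1 if rng.random() < p_on(image) else 0)
    return image

print(math.exp(log_density([1, 1, 0])))  # 0.5 * 0.9 * 0.1
print(sample(8, random.Random(0)))
```

In `log_density` the loop is only there for readability; nothing in one term depends on the result of another. In `sample` the loop is unavoidable, which is the runtime cost mentioned above.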
but there are again tricks for doing this in
a hierarchical pattern where you can
keep the runtime under control is the
quality of the images it generates
putting runtime aside pretty good
they're reasonable yeah I would say
a lot of the best results are from GANs
these days but it can be hard to tell
how much of that is based on who's
studying which type of algorithm if that
makes sense the amount of effort invested
in it but yeah or like the kind of
expertise so a lot of people who've
traditionally been excited about
graphics or art and things like that
have gotten interested in GANs and to
some extent it's hard to tell are GANs
doing better because they have a lot of
graphics and art experts behind them or
are GANs doing better because they're
more computationally efficient or are
GANs doing better because they
prioritize the realism of samples over
the accuracy of the density function I
think all of those are
potentially valid explanations and
it's hard to tell so can you give a
brief history of GANs from 2014 we paid
for 13 yeah so a few highlights in the
first paper we just showed that GANs
basically work if you look back at the
samples we had now they look terrible
on the CIFAR-10 dataset you can't even
recognize objects in them your paper
used CIFAR-10 we used MNIST which is
little handwritten digits we used the
Toronto face database which is small
grayscale photos of faces
we did have recognizable faces my
colleague Bing Xu put together the first
GAN face model for that paper we also
had the CIFAR-10 dataset which is things
like very small 32 by 32 pixels of cars
and cats and dogs for that we didn't get
recognizable objects but all the deep
learning people back then were really
used to looking at these failed samples
and kind of reading them like tea leaves
right and people who were used to reading
the tea leaves recognized that our tea
leaves at least looked different right
maybe not necessarily better but there
was something unusual about them
and that got a lot of us excited one of
the next really big steps was LAPGAN
by Emily Denton and Soumith Chintala at
Facebook AI research where they actually
got really good high-resolution photos
working with GANs for the first time
they had a complicated system where they
generated the image starting at low res
and then scaling up to high res but they
were able to get it to work and then in
2015 I believe later that same year
Alec Radford and Soumith Chintala and Luke
Metz published the DCGAN paper which
stands for deep convolutional GAN
it's kind of a non-unique name because
these days basically all GANs and even
some before that were deep and
convolutional but they just kind of
picked a name for a really great recipe
where they were able to actually using
only one model instead of a multi-step
process actually generate realistic
images of faces and things like that
that was sort of like the beginning of
the Cambrian explosion of GANs like you
know once you got animals that had
a backbone you suddenly got lots of
different versions of you know like fish
and four-legged animals
and things like that so DCGAN
became kind of the backbone for many
different models that came out used as a
baseline even still yeah yeah and so
from there I would say some interesting
things we've seen are there's a lot you
can say about how just the quality of
standard image generation GANs has
increased but what's also maybe more
interesting on an intellectual level is
how the things you can use GANs for has
also changed one thing is that you can
use them to learn classifiers without
having to have class labels for every
example in your training set so
that's called semi-supervised learning
my colleague at OpenAI Tim Salimans
who's at Brain now wrote a paper
called improved techniques for training
GANs I'm a co-author on this paper but I
can't claim any credit for this
particular part one thing he showed in
the paper is that you can take the GAN
discriminator and use it as a classifier
that actually tells you you know this
image is a cat this image is a dog this
image is a car
this image is a truck and so on and not
just to say whether the image is real or
fake but if it is real to say
specifically what kind of object it is
and he found that you can train these
classifiers with far fewer labeled
examples than traditional classifiers
so if you supervise based on also not
just your discrimination ability but
your ability to classify you're going to
converge much
faster to being effective at being a
discriminator yeah so for example for
the MNIST dataset you want to look at
an image of a handwritten digit and say
whether it's a 0 a 1 or 2 and so on
to get down to less than 1% error
required around 60,000 examples until
maybe about 2014 or so in 2016 with this
semi-supervised GAN project Tim was
able to get below 1% error using only a
hundred labeled examples so that was
about a 600 X decrease in the amount of
labels that he needed he's still using
more images in that but he doesn't need
to have each of them labeled as you know
this one's a 1 this one's a 2 this one's
a 0 and so on then for
GANs to be able to generate recognizable
objects so objects from a particular class
you still need labelled data because you
need to know what it means to be a
particular class cat dog how do you
think we can move away from that yeah
some researchers at Brain Zurich
actually just released a really great
paper on semi-supervised GANs where
their goal isn't to classify it's to make
recognizable objects despite not having
a lot of labeled data they were working
off of DeepMind's BigGAN project and
they showed that they can match the
performance of BigGAN using only 10% I
believe of the labels BigGAN was
trained on the ImageNet dataset which
is about 1.2 million images and had all
of them labelled this latest project
from Brain Zurich shows that they're
able to get away with only having about
10% of the images labeled
and they do that essentially using a
clustering algorithm where the
discriminator learns to assign the
objects to groups and then this
understanding that objects can be
grouped into you know similar types
helps it to form more realistic ideas of
what should be appearing in the image
because it knows that every image it
creates has to come from one of these
archetypal groups rather than just being
some arbitrary image if you train a GAN
with no class labels you tend to get
things that look sort of like grass or
water or brick or dirt but without
necessarily a lot going on in them and I
think that's partly because if you look
at a large ImageNet image the object
doesn't necessarily occupy the whole
image and so you learn to create
realistic sets of pixels but you don't
necessarily learn that the object is the
star of the show and you want it to be
in every image you make yeah I've
heard you talk about the horse to
zebra CycleGAN mapping and how it
turns out again thought-provoking that
horses are usually on grass and zebras
are usually on drier terrain so when
you're doing that kind of generation
you're going to end up generating
greener horses or whatever so those are
connected together it's not just yeah
yeah you're not able to
segment
yeah it's generating the segments away
so are there other types of games you
come across in your mind that neural
networks can play with each other
to be able to solve problems yeah
the one that I spend most of my time on
is in security you can model most
interactions as a game where there's
attackers trying to break your system
and you are the defender trying to
build a resilient system there's also
domain adversarial learning which is an
approach to domain adaptation that looks
really a lot like GANs the authors
had the idea before the GAN paper came
out their paper came out a little bit
later and you know they were very
nice and cited the GAN paper but
I know that they actually had the idea
before it came out domain adaptation is
when you want to train a machine
learning model in one setting called a
domain and then deploy it in another
domain later and you would like it to
perform well in the new domain even
though the new domain is different from
how it was trained so for example you
might want to train on a really clean
image data set like image net but then
deploy on users phones where the user is
taking you know pictures in the dark or
pictures while moving quickly and just
pictures that aren't really centered or
composed all that well
when you take a normal machine learning
model it often degrades really badly
when you move to the new domain because
it looks so different from what the
model was trained on domain adaptation
algorithms try to smooth out that gap
and the domain adversarial approach is
based on training a feature extractor
where the features have the same
statistics regardless of which domain
you extracted them on so in the domain
adversarial game you have one player
that's a feature extractor and another
player that's a domain recognizer
the domain recognizer wants to look at
the output of the feature extractor and
guess which of the two domains the
features came from so it's a lot like
the real versus fake discriminator in
GANs and then the feature extractor you
can think of as loosely analogous to the
generator in GANs except what it's trying
to do here is both fool the domain
recognizer into not knowing which
domain the data came from and also
extract features that are good for
classification so at the end of the day
you can in the cases where it works
out you can actually get features that
work about the same in both domains
sometimes this has a drawback where in
order to make things work the same in
both domains it just gets worse at the
first one but there are a lot of cases
where it actually works out well on both
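
The two-player structure of the domain adversarial game can be written down as a pair of losses. The sketch below is a simplified illustration under stated assumptions: the function names, the weighting `lam`, and the use of binary cross-entropy for both heads are choices made for this example and are not taken from the conversation or from any specific paper's code.

```python
import math

def binary_cross_entropy(probs, labels):
    """Mean negative log-likelihood of 0/1 labels under predicted probabilities."""
    eps = 1e-12  # avoids log(0)
    return -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for p, y in zip(probs, labels)
    ) / len(labels)

def recognizer_loss(domain_probs, domain_labels):
    """The domain recognizer minimizes its error at guessing which domain
    (0 = training domain, 1 = deployment domain) the features came from."""
    return binary_cross_entropy(domain_probs, domain_labels)

def extractor_loss(class_probs, class_labels, domain_probs, domain_labels, lam=1.0):
    """The feature extractor wants accurate classification AND a confused
    recognizer, so the recognizer's loss enters with a negative sign."""
    return (binary_cross_entropy(class_probs, class_labels)
            - lam * recognizer_loss(domain_probs, domain_labels))

domains = [0, 0, 1, 1]
classes = [1, 0, 1, 0]
class_probs = [0.9, 0.2, 0.8, 0.1]

confident = [0.05, 0.1, 0.9, 0.95]  # recognizer can tell the domains apart
confused = [0.5, 0.5, 0.5, 0.5]     # recognizer is at chance

print(recognizer_loss(confused, domains))  # about log(2)
print(extractor_loss(class_probs, classes, confident, domains))
print(extractor_loss(class_probs, classes, confused, domains))
```

With the classification quality held fixed, the extractor's loss is lower when the recognizer is reduced to chance, which is the sense in which the features "have the same statistics" in both domains.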
do you think GANs can be useful in the
context of data augmentation yeah one
thing you could hope for with GANs is
you could imagine I've got a limited
training set and I'd like to make more
training data to train something else
like a classifier you could train a GAN
on the training set and then create more
data and then maybe the classifier would
perform better on the test set after
training on this bigger GAN-generated
data set so that's the simplest version
of something you might hope would
work I've never heard of that particular
approach working but I think there's
some closely related things
that I think could work in the
future and some that actually already
have worked so if you think a little bit
about what we'd be hoping for if we use
the GAN to make more training data we're
hoping that the GAN will generalize to
new examples better than the classifier
would have generalized if it was trained
on the same data
and I don't know of any reason to
believe that the GAN would generalize
better than the classifier would but
what we might hope for is that the GAN
could generalize differently from a
specific classifier so one thing I think
is worth trying that I haven't
personally tried but someone could try
is what if you trained a whole lot of
different generative models on the same
training set create samples from all of
them and then train a classifier on that
because each of the generative models
might generalize in a slightly different
way they might capture many different
axes of variation that one individual
model wouldn't and then the classifier
can capture all of those ideas by
training on all of their data so it'd be
a little bit like making an ensemble of
classifiers an ensemble of GANs
yeah in a way I think that could
generalize better the other thing that
GANs are really good for is not
necessarily generating new data that's
exactly like what you already have but
generating new data that has
different properties from the data you
already had one thing that you can do is
you can create differentially private
data so suppose that you have something
like medical records and you don't want
to train a classifier on the medical
records and then publish the classifier
because someone might be able to
reverse-engineer some of the medical
records you trained on there's a paper
from Casey Greene's lab that shows how you
can train a GAN using differential
privacy and then the samples from the GAN
still have the same differential privacy
guarantees as the parameters of the GAN
so you can make fake patient data for
other researchers to use and they can do
almost anything they want with that data
because it doesn't come from real people
and the differential privacy mechanism
gives you clear guarantees on how much
the original people's

Summary

Below is a comprehensive, structured summary based on the transcript of the interview with Ian Goodfellow.

---

# Exclusive Interview with Ian Goodfellow: The Evolution of Deep Learning, GANs, and AI Security

### Executive Summary
This interview covers the career and insights of Ian Goodfellow, inventor of Generative Adversarial Networks (GANs) and author of the book *Deep Learning*, on the development of artificial intelligence. The main topics include a redefinition of deep learning as a sequential program, the current limits of the technology with respect to data and reasoning, and the evolution of GANs from an idea conceived in a bar to an industry standard for generative modeling. The discussion also highlights the critical challenge of AI security (*adversarial examples*), applications of GANs to semi-supervised learning and privacy, and a vision of a future Artificial General Intelligence (AGI) that is safe and reliable.

---

### Key Takeaways
*   **Definition of Deep Learning:** Deep learning (DL) is viewed as learning a program with many sequential steps, not just parallel operations as in shallow learning.
*   **Origin of GANs:** The idea for Generative Adversarial Networks (GANs) came up spontaneously in a bar after an argument with friends, and was implemented quickly despite initial skepticism.
*   **Adversarial Examples:** Adversarial examples are not merely a flaw in machine learning; they are a serious security risk that can be exploited to fool systems such as speech recognition or financial systems.
*   **Data Efficiency:** GANs enable very efficient semi-supervised learning, drastically reducing the need for labeled data (for example, from 60,000 labeled examples to only 100).
*   **The Future of AI Security:** To withstand future attacks, AI models need to become dynamic rather than static systems, so that they are hard for adversaries to predict and exploit.

---

### Detailed Breakdown

#### 1. Limits and Definition of Deep Learning
Ian Goodfellow explains that deep learning is a subset of *representation learning* and *machine learning*. Although very powerful, DL has limitations:
*   **Data Requirements:** It needs a very large amount of data, especially labeled data. *Unsupervised learning* helps, but still requires a lot of unlabeled data.
*   **Efficiency of Reinforcement Learning (RL):** Humans learn far faster than RL algorithms, which need millions of failures (for example, when playing Pong).
*   **Generalization:** The ability to generalize knowledge remains a major obstacle.
*   **DL as a Program:** A neural network is viewed as a program in which depth represents sequential steps and width represents parallel steps. This differs from shallow learning (such as SVMs), which is mostly parallel.

#### 2. Consciousness and Reasoning
*   **Consciousness:** There are two definitions of consciousness: *self-awareness* and the quality of experience (*qualia*).
*   Current RL algorithms already have a limited form of self-awareness because they model the agent's effect on the environment. *Qualia*, however, are hard to formalize in computer science.
*   **Reasoning:** Reasoning in DL is not traditional symbolic deduction but the gradual refinement of understanding through updates to a representation.

#### 3. Adversarial Examples: A Security Threat
*   **Change of View:** Initially seen as a gap between ML and humans, *adversarial examples* are now viewed more as a security liability.
*   **Accuracy Trade-off:** Improving robustness to adversarial attacks often sacrifices accuracy on *clean data*.
*   **Audio Attacks:** An attacker can craft audio that sounds like noise to humans but is recognized as a specific command by a phone's virtual assistant.
*   **Defenses:** In finance, *obfuscation* is used to protect intellectual property and prevent exploitation of trading algorithms.

#### 4. History and Mechanics of GANs
*   **The Initial Idea:** The idea for GANs came to Goodfellow in a bar. His friends were skeptical because training two neural networks simultaneously was considered too hard.
*   **Mechanism:** GANs involve a two-player game (in the game-theoretic sense):
    *   **Generator:** Tries to create fake data that fools the discriminator.
    *   **Discriminator:** Tries to tell real data from fake data.
*   **Nash Equilibrium:** The goal is to reach a state where the generator produces data so realistic that the discriminator can only guess at random.
*   **Why Do GANs Work?** Unlike *Deep Boltzmann Machines*, which were hard to scale to color images, GANs succeeded because of a more efficient sampling process, although the exact reasons remain a subject of theoretical research.
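
The equilibrium claim above can be checked numerically on a small discrete distribution. The sketch below uses the standard value function from the original GAN paper, V(D, G) = E_data[log D(x)] + E_gen[log(1 - D(x))], and its known optimal discriminator D*(x) = p_data(x) / (p_data(x) + p_gen(x)); neither formula is stated in the interview itself, and the three-outcome distributions are made up for illustration.

```python
import math

def optimal_discriminator(p_data, p_gen):
    """D*(x) = p_data(x) / (p_data(x) + p_gen(x)) for each outcome x."""
    return [pd / (pd + pg) for pd, pg in zip(p_data, p_gen)]

def value(p_data, p_gen, d):
    """V(D, G) = E_data[log D(x)] + E_gen[log(1 - D(x))].
    Assumes 0 < D(x) < 1 for every outcome."""
    return sum(
        pd * math.log(dx) + pg * math.log(1.0 - dx)
        for pd, pg, dx in zip(p_data, p_gen, d)
    )

p_data = [0.2, 0.3, 0.5]      # hypothetical "real" distribution
bad_gen = [0.6, 0.3, 0.1]     # generator that has not matched the data yet
perfect_gen = list(p_data)    # generator at equilibrium

d_bad = optimal_discriminator(p_data, bad_gen)
d_eq = optimal_discriminator(p_data, perfect_gen)

print(d_eq)                               # 0.5 everywhere: random guessing
print(value(p_data, perfect_gen, d_eq))   # -log(4)
print(value(p_data, bad_gen, d_bad))      # larger than -log(4)
```

When the generator matches the data distribution, the best possible discriminator outputs 0.5 on every input, which is the "can only guess at random" condition in the Nash equilibrium bullet.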

#### 5. Evolution and Applications of GANs
*   **Progress:**
    *   **2014:** The first paper; generated images on the CIFAR-10 dataset were still blurry and unrecognizable.
    *   **LAPGAN:** Produced the first high-resolution photos using a multi-stage system.
    *   **DCGAN (2015):** Became the *backbone* for many later generative models because it could generate realistic faces with a single model.
*   **Semi-Supervised Learning:** GANs make it possible to train classifiers with very few labels.
    *   Example: On the MNIST dataset, error below 1% can be reached with only 100 labeled examples (down from roughly 60,000).
*   **Label Reduction:** Recent research shows that GANs can match the performance of a large model such as BigGAN using only 10% of the labels on the ImageNet dataset.
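
One way to turn a discriminator into a classifier, as described in the semi-supervised bullet, is to give it K + 1 outputs: K real classes plus one "fake" slot. This is the formulation used in *Improved Techniques for Training GANs*; the sketch below only shows how such outputs would be read, with made-up logits and class names, and does not train anything.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def discriminator_outputs(logits, class_names):
    """Interpret K+1 logits: K real classes followed by one 'fake' slot.
    Returns the probability that the input is real and the most likely
    real class."""
    probs = softmax(logits)
    p_fake = probs[-1]
    best = max(range(len(class_names)), key=lambda k: probs[k])
    return {"p_real": 1.0 - p_fake, "label": class_names[best]}

classes = ["cat", "dog", "car"]                               # hypothetical K = 3
real_cat = discriminator_outputs([2.0, 0.1, 0.1, -1.0], classes)
generated = discriminator_outputs([0.0, 0.0, 0.0, 3.0], classes)

print(real_cat)    # high p_real, label "cat"
print(generated)   # low p_real

# The label reduction quoted for MNIST: 60,000 labels down to 100.
print(60_000 // 100)  # 600
```

Unlabeled real images and generated images still train the real-versus-fake part of the output, which is why only a small labeled subset is needed for the class part.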

#### 6. Domain Adaptation, Privacy, and Fairness
*   **Domain Adaptation:** Uses adversarial learning to align the feature distributions of the training domain (e.g., clean images) and the deployment domain (e.g., dark photos taken by users), so the model stays accurate in both settings.
*   **Data Augmentation & Privacy:** GANs can be used to generate synthetic data that preserves privacy (*differential privacy*). This lets researchers work with fake medical data without endangering the privacy of real patients.
*   **Fairness:** Adversarial techniques can remove sensitive biases (such as gender) from a model by forcing the *feature extractor* to predict the target without letting an *analyzer* guess the sensitive variable.

#### 7. Fake Detection, Interpretability, and AGI
*   **Deepfake Detection:** Relying on visual detection alone is no longer safe. The future solution is cryptographic authentication, where the hardware (a phone) signs authentic content with a private key.
*   **Interpretability:** Currently subjective. A strong mathematical definition of interpretability, similar to the definition of *differential privacy*, is needed to advance the field.
*   **Artificial General Intelligence (AGI):** AGI requires rich simulated environments that give the agent diverse experiences (playing games, reading, discovering drugs), as well as the ability to run the entire data pipeline autonomously without human help (*glue*).

#### 8. The Future of AI Security: Dynamic Models
*   **Main Challenge:** Making AI robust to adversarial attacks in every domain (images, language, driving).
*   **Security Analogy:** Just as it was hard in 2002 to predict the future security needs of phones, AI security must prevent outsiders from taking control of the system.
*   **Solution: Dynamic Models:**
    *   Today's models are static (*frozen*), which makes them a "sitting duck" for attackers who can exploit the same mistake repeatedly.
    *   A dynamic model would update its predictions or change its behavior each time it makes a prediction. This adds uncertainty for the attacker and makes it much harder to find an exploitable pattern.

---

### Conclusion & Closing Message
Ian Goodfellow closes the interview by emphasizing that although deep learning has made remarkable progress, the AI community must shift its focus from merely improving accuracy toward strong security and robustness. The threat of *adversarial attacks* is real and will keep evolving. The most promising solution for the future is the development of **


file updated 2026-02-13 13:23:06 UTC