What is Statistics? (Michael I. Jordan) | AI Podcast Clips
AQUAPiHahVY • 2020-02-25

An absurd question, but what is statistics?

So here it's a little bit... it's somewhere between math and science and technology. It's somewhere in that convex hull. So it's a set of principles that allow you to make inferences that have got some reason to be believed, and also principles that allow you to make decisions where you can have some reason to believe you're not going to make errors. All that requires some assumptions: what do you mean by an error, what do you mean by, you know, the probabilities? But after you start making some assumptions, you're led to conclusions that, yes, I can guarantee that if you do this in this way, your probability of making an error will be small, your probability of continuing to not make errors over time will be high, and the probability that you found something that's real will be high.

The decision-making is a big part?

It may be the big part, yeah.

Statistics, a short history: it kind of goes back, as a formal discipline, 250 years or so. It was called inverse probability because around that era probability was developed, especially to explain gambling situations, of course, and interesting ones. So you would say, well, given the state of nature is this, there's a certain roulette wheel that has a certain mechanism in it, what kind of outcomes do I expect to see? And especially if I do things for long, long amounts of time, what outcomes will I see? And the physicists started to pay attention to this.

And then people said, well, let's turn the problem around. What if I saw certain outcomes; could I infer what the underlying mechanism was? That's an inverse problem. And in fact, for quite a while, statistics was called inverse probability. That was the name of the field.

And I believe that it was Laplace, who was working in Napoleon's government, who needed to do a census of France to learn about the people there. So he went and gathered data, and he analyzed that data to determine policy, and said let's call this field that does this kind of thing statistics, because the word "state" is in there. In French that's "état", but, you know, it's the study of data for the state.
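
The forward and inverse directions described above can be illustrated with a toy sketch that does not come from the interview. It assumes a hypothetical game that pays out with an unknown probability p. It also restricts p to three made-up candidate values and places a uniform prior on them. `forward_simulate` answers the forward question: given the mechanism, what outcomes do I expect? `inverse_posterior` answers the inverse question by applying Bayes' rule to the observed outcomes.

```python
import random
from fractions import Fraction

# Hypothetical candidate mechanisms: the probability that one play of a
# simplified game of chance pays out. These values are illustrative only.
CANDIDATES = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]


def forward_simulate(p, n, seed=0):
    """Forward problem: given the mechanism p, generate n outcomes (1 = payout)."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]


def inverse_posterior(outcomes, candidates=CANDIDATES):
    """Inverse problem: given observed outcomes, weigh each candidate mechanism
    by Bayes' rule, assuming a uniform prior over the candidates."""
    wins = sum(outcomes)
    losses = len(outcomes) - wins
    # With a uniform prior, the posterior is proportional to the likelihood.
    likelihoods = [p**wins * (1 - p) ** losses for p in candidates]
    total = sum(likelihoods)
    return {p: lik / total for p, lik in zip(candidates, likelihoods)}


if __name__ == "__main__":
    data = forward_simulate(Fraction(3, 4), n=200)
    for p, weight in inverse_posterior(data).items():
        print(f"p = {p}: posterior {float(weight):.4f}")
```

With enough simulated plays, most of the posterior weight should land on the candidate that generated the data. The point is only the change of direction, from mechanism to outcomes and back.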

Anyway, that caught on, and it's been statistics ever since. But by the time it got formalized, it was sort of in the '30s, and around that time game theory and decision theory were being developed nearby. People in that era didn't think of themselves as computer science or statistics or control or econ; they were all of the above. And so, you know, von Neumann is developing game theory but also thinking about it as decision theory. Wald is an econometrician developing decision theory and then, you know, turned that into statistics. And so it's all about: here's not just data that you analyze; here's a loss function, here's what you care about, here's the question you're trying to ask, here is a probability model, and here's the risk you will face if you make certain decisions. And to this day, in most advanced statistical curricula, you teach decision theory as the starting point, and then it branches out into the two branches, Bayesian and frequentist. But it's all about decisions.

In statistics, what is the most beautiful, mysterious, maybe surprising idea that you've come across?

Yeah, good question.
I mean, there's a bunch of surprising ones. There's something that's way too technical for this conversation, called James-Stein estimation, which is kind of surprising and really takes time to wrap your head around.

Can you try to...?

I don't even want to try. Let me just say that a colleague, Stephen Stigler at the University of Chicago, wrote a really beautiful paper on James-Stein estimation. It's viewed as a paradox; it kind of defeats the mind's attempts to understand it, but you can, and Steve has a nice perspective on that.

So one of the troubles with statistics is that it's like physics, or quantum physics: you have multiple interpretations. There's a wave and particle duality in physics, and you get used to that over time, but it still kind of haunts you that you don't really, you know, quite understand the relationship. The electron's a wave and the electron's a particle? Well, the same thing happens here. There are Bayesian ways of thinking and frequentist ones, and they are different. They sometimes become sort of the same in practice, but they're philosophically different, and then in some practice they are not the same at all; they give you rather different answers. And so it is very much like wave-particle duality, and that is something you have to kind of get used to in the field.

Can you define Bayesian and frequentist?

Yeah. I have a video that people could see; it's called "Are You a Bayesian or a Frequentist?", and it tries to make it really clear. It comes from decision theory. So, you know, in decision theory you talk about loss functions, which are a function of data X and parameter theta. It's a function of two arguments, okay? Now, neither one of those arguments is known. You don't know the data a priori; it's random. And the parameter is unknown. All right, so you have this function of two things you don't know, and you're trying to say, I want that function to be small, I want small loss. Well, what are you going to do? So you sort of say, well, I'm going to average over these quantities, or maximize over them, or something, so that, you know, I turn that uncertainty into something certain. So you could look at the first argument and average over it, or you could look at the second argument and average over it. That's frequentist versus Bayesian.

So the frequentist says, I'm going to look at the X, the data, and I'm going to take that as random, and I've got to average over its distribution. So I take the expected loss under X, with theta held fixed. That's called the risk. And so it's looking at all the data sets you could get and saying how well a certain procedure will do under all those data sets. That's called a frequentist guarantee.
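
In symbols, the two ways of removing an unknown argument from the loss can be written as follows. The notation is a standard textbook choice, assumed here rather than taken from the interview: δ is the decision procedure applied to the data X, L is the loss, p(x | θ) is the probability model, and p(θ | x) is the posterior.

```latex
% Frequentist risk: average the loss over data sets X, holding theta fixed
R(\theta, \delta) = \mathbb{E}_{X \sim p(x \mid \theta)}\left[ L\big(\theta, \delta(X)\big) \right]

% Bayesian posterior expected loss: condition on the observed x, average over theta
\rho(\delta \mid x) = \mathbb{E}_{\theta \sim p(\theta \mid x)}\left[ L\big(\theta, \delta(x)\big) \right]
```

The first quantity is still a function of θ. The second depends only on the observed x.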

All right, so I think it is very appropriate when, like, you're building a piece of software, you're shipping it out there, and people use it on all kinds of data sets. You want to have a stamp, a guarantee on it, that as people run it on many, many data sets you never even thought about, 95 percent of the time it will do the right thing.
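
The frequentist guarantee in the software example can be checked by simulation. This sketch assumes a textbook setting that the interview does not specify: the data are normal with a known standard deviation, and the procedure is the usual 95% z-interval for the mean. The code holds θ fixed and draws many data sets to estimate the coverage. Coverage is one minus the risk under a 0-1 loss that charges 1 whenever the interval misses θ.

```python
import random
from statistics import NormalDist, fmean

# Two-sided 95% quantile of the standard normal (about 1.96).
Z_95 = NormalDist().inv_cdf(0.975)


def z_interval(sample, sigma):
    """95% interval for a normal mean, assuming sigma is known."""
    center = fmean(sample)
    half_width = Z_95 * sigma / len(sample) ** 0.5
    return center - half_width, center + half_width


def coverage(theta, sigma=1.0, n=20, trials=20_000, seed=1):
    """Hold theta fixed, draw many data sets, and report the fraction of
    intervals that contain theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma)
        hits += low <= theta <= high
    return hits / trials


if __name__ == "__main__":
    for theta in (-3.0, 0.0, 170.0):
        print(theta, coverage(theta))
```

The reported coverage should stay near 0.95 for every θ tried. That is the kind of stamp that holds across data sets, but it says nothing specific about the one data set in hand.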

Perfectly reasonable. The Bayesian perspective says, well, no, I'm going to look at the other argument of the loss function, the theta part, okay? That's unknown and I'm uncertain about it, so I could have my own personal probability for what it is. You know, how many tall people are there out there? I'm trying to infer the average height of the population. Well, I have an idea of roughly what the height is, so I'm going to average over the theta. So now that loss function has, again, one argument gone; now it's a function of X. And that's what a Bayesian does. They say, well, let's just focus on the particular data set we got. We condition on that. Conditional on the X, I say something about my loss. That's a Bayesian approach to things.
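
The height example can be turned into a small Bayesian calculation. This sketch assumes a conjugate normal model that the interview does not specify. Individual heights are normal with a known spread sigma, and the prior belief about the average height is itself normal. Conditioning on the observed heights gives a posterior for θ. Under squared-error loss, the posterior mean is the decision that minimizes posterior expected loss. The numbers in the usage lines are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NormalPrior:
    mean: float  # prior guess for the population's average height
    sd: float    # how uncertain that guess is


def posterior_for_mean(prior, heights, sigma):
    """Conjugate normal update for an unknown mean theta, assuming each height
    is drawn from Normal(theta, sigma**2) with sigma known.
    Returns (posterior mean, posterior standard deviation)."""
    prior_precision = 1.0 / prior.sd**2
    data_precision = len(heights) / sigma**2
    post_precision = prior_precision + data_precision
    post_mean = (prior_precision * prior.mean + sum(heights) / sigma**2) / post_precision
    return post_mean, post_precision**-0.5


if __name__ == "__main__":
    # Hypothetical numbers in centimetres, for illustration only.
    prior = NormalPrior(mean=170.0, sd=10.0)
    print(posterior_for_mean(prior, [180.0, 180.0, 180.0, 180.0], sigma=10.0))
```

The posterior mean lands between the prior guess and the sample average. It moves toward the data as more heights are observed.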

And the Bayesian will argue that it's not relevant to look at all the other data sets you could have gotten and average over them, as the frequentist approach does. It's really only the data set you got that matters. And I do agree with that, especially in situations where you're working with a scientist: you can learn a lot about the domain, you really only focus on certain kinds of data, you've gathered your data, and you make inferences. I don't agree with it, though, in the sense that there are needs for frequentist guarantees. You're writing software, people are using it out there, and you want to say something. So these two things have got to fight each other a little bit, but they have to blend.

So, long story short, there's a set of ideas right in the middle called empirical Bayes.

Empirical Bayes sort of starts with the Bayesian framework, which is, arguably, philosophically more reasonable and kosher. You write down a bunch of the math that flows from that, and then realize there's a bunch of things you don't know, because it's the real world and you don't know everything. So you're uncertain about certain quantities. At that point, ask: is there a reasonable way to plug in an estimate for those things? And in some cases there's quite a reasonable thing to plug in. There's a natural thing you can observe in the world that you can plug in, and then you do a little bit more mathematics and assure yourself it's really good.

So is it based on math, or based on human expertise?

They're both going in. The Bayesian framework allows you to put a lot of human expertise in, but the math kind of guides you along that path. Then at the end it reassures you that you can put on that stamp of approval: under certain assumptions, this thing will work.
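
The James-Stein estimator mentioned earlier is one classic instance of the empirical Bayes plug-in, under its well-known empirical Bayes interpretation. This sketch makes two assumptions:

- There are p ≥ 3 independent observations x_i ~ Normal(θ_i, σ²), with σ known.
- The prior is normal, θ_i ~ Normal(0, A), with A unknown.

The Bayes estimate would be (1 − σ²/(σ² + A)) x_i. Because A is unknown, the factor σ²/(σ² + A) is replaced by (p − 2)σ² / Σ x_i², which is an unbiased estimate of it computed from the data. The `compare` function pits this estimator against using the raw x_i directly. Its setup values (50 means, prior standard deviation 1, 200 trials) are arbitrary.

```python
import random


def james_stein(x, sigma=1.0):
    """Plain (not positive-part) James-Stein shrinkage toward zero.

    Assumes x[i] ~ Normal(theta[i], sigma**2) independently, sigma known,
    and at least 3 coordinates. The Bayes factor sigma**2 / (sigma**2 + A)
    under a Normal(0, A) prior is replaced by its unbiased estimate
    (p - 2) * sigma**2 / sum(x[i]**2)."""
    p = len(x)
    if p < 3:
        raise ValueError("James-Stein shrinkage needs at least 3 coordinates")
    shrink = 1.0 - (p - 2) * sigma**2 / sum(xi * xi for xi in x)
    return [shrink * xi for xi in x]


def total_squared_error(estimates, truth):
    return sum((e - t) ** 2 for e, t in zip(estimates, truth))


def compare(p=50, prior_sd=1.0, trials=200, seed=2):
    """Average total squared error of the raw observations versus James-Stein,
    with the true means drawn from Normal(0, prior_sd**2) (an assumption)."""
    rng = random.Random(seed)
    raw_err = js_err = 0.0
    for _ in range(trials):
        theta = [rng.gauss(0.0, prior_sd) for _ in range(p)]
        x = [rng.gauss(t, 1.0) for t in theta]
        raw_err += total_squared_error(x, theta)
        js_err += total_squared_error(james_stein(x), theta)
    return raw_err / trials, js_err / trials


if __name__ == "__main__":
    print(compare())
```

The simulation draws the true means from the assumed prior, so it only shows the Bayes-style advantage. The surprising part is stronger. With p ≥ 3 and squared-error loss, James-Stein has lower frequentist risk than the raw observations for every fixed θ, not just when the normal prior holds.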

So, you asked what was my favorite, or the most surprising, nice idea. One that is more accessible is something called the false discovery rate. You're not just making one hypothesis test, or one decision; you're making a whole bag of them. And in that bag of decisions, you look at the ones where you made a discovery, where you announced that something interesting happened. That's going to be some subset of your big bag. Among the ones where you made a discovery, which subset of those are bad? Those are false discoveries. You'd like the fraction of false discoveries among your discoveries to be small. That's a different criterion than accuracy or precision or recall or sensitivity and specificity. It's a different quantity.
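
The difference between the false discovery proportion and a forward-looking error rate shows up in a small tally. The numbers below are hypothetical. There are 100 tests, 90 of which have a true null. Five of the true nulls and five of the real effects are flagged as discoveries. The false discovery rate is the expected value of the proportion computed here. The forward quantity, the fraction of true nulls that were flagged, is small, yet half of the announced discoveries are false.

```python
def false_discovery_proportion(discoveries, null_is_true):
    """Among the tests declared discoveries, the fraction whose null was true.

    discoveries[i]: True if test i was announced as a discovery
    null_is_true[i]: True if the null hypothesis for test i actually holds
    Uses the usual convention that the proportion is 0 when nothing is discovered."""
    flagged = [null for found, null in zip(discoveries, null_is_true) if found]
    return sum(flagged) / len(flagged) if flagged else 0.0


def false_positive_rate(discoveries, null_is_true):
    """Forward-direction quantity: among the tests whose null is true,
    the fraction that were (wrongly) announced as discoveries."""
    under_null = [found for found, null in zip(discoveries, null_is_true) if null]
    return sum(under_null) / len(under_null) if under_null else 0.0


if __name__ == "__main__":
    # Hypothetical bag of 100 decisions: 90 true nulls, then 10 real effects.
    null_is_true = [True] * 90 + [False] * 10
    discoveries = [True] * 5 + [False] * 85 + [True] * 5 + [False] * 5
    print(false_positive_rate(discoveries, null_is_true))        # 5 / 90
    print(false_discovery_proportion(discoveries, null_is_true))  # 5 / 10
```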

Almost all of those latter ones have more of a frequentist flavor. They say: given that the null hypothesis is true, here's the accuracy I would get; or given that the alternative is true, here's what I would get. It's kind of going forward from the state of nature to the data. The Bayesian goes the other direction, from the data back to the state of nature, and that's actually what the false discovery rate is. It says, given you made a discovery (that's conditioned on your data), what's the probability of the hypothesis? It's going the other direction.

And so the classical frequentist looks at that and says, I can't know that; there are some priors needed in that. And the empirical Bayesian goes ahead and plows forward, starts writing down these formulas, and realizes at some point that some of those things can actually be estimated in a reasonable way. It's a beautiful set of ideas. This line of argument is certainly not mine; it sort of came out from Robbins around 1960. Brad Efron has written beautifully about this in various papers and books. And the FDR is, you know, Benjamini in Israel, and John Storey did this Bayesian interpretation, and so on. So I've just absorbed these things over the years and find it a very healthy way to think about statistics.
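
The empirical Bayes reading of the false discovery rate can be sketched in a few lines, under two assumptions: p-values from true nulls are Uniform(0, 1), and a fraction π0 of the hypotheses are null. Bayes' rule then gives Fdr(t) = P(null | p ≤ t) = π0 · t / F(t), where F(t) is the overall probability that a p-value falls at or below t. The sketch replaces the unknown F(t) with the observed fraction of p-values at or below t. It sets π0 to the conservative value 1, whereas Storey's approach estimates π0 instead. With π0 = 1, the Benjamini-Hochberg step-up rule amounts to thresholding at the largest observed p-value whose estimate is at most q. This is a textbook-style illustration, not code from any of the people named above, and the p-values are hypothetical.

```python
def estimated_fdr(p_values, t, pi0=1.0):
    """Empirical Bayes estimate of Fdr(t) = P(null | p <= t) = pi0 * t / F(t),
    with F(t) replaced by the observed fraction of p-values <= t."""
    m = len(p_values)
    count = sum(p <= t for p in p_values)
    if count == 0:
        return 0.0
    return min(1.0, pi0 * t * m / count)


def benjamini_hochberg(p_values, q=0.1):
    """Indices rejected by the Benjamini-Hochberg step-up rule at level q:
    find the largest rank k with p_(k) <= k * q / m and reject the k
    smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k = rank
    return sorted(order[:k])


if __name__ == "__main__":
    # Hypothetical p-values, for illustration only.
    p = [0.039, 0.001, 0.216, 0.008, 0.041]
    rejected = benjamini_hochberg(p, q=0.05)
    print(rejected)
    print(estimated_fdr(p, max(p[i] for i in rejected)))
```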
file updated 2026-02-13 13:24:40 UTC