Regina Barzilay: Deep Learning for Cancer Diagnosis and Treatment


Regina Barzilay: Deep Learning for Cancer Diagnosis and Treatment | Lex Fridman Podcast #40

x0-zGdlpTeg • 2019-09-23

Kind: captions
Language: en
the following is a conversation with
Regina Barzilay she's a professor at MIT
and a world-class researcher in natural
language processing and applications of
deep learning to chemistry and oncology
or the use of deep learning for early
diagnosis prevention and treatment of
cancer she has also been recognized for
teaching of several successful AI
related courses at MIT including the
popular introduction to machine learning
course this is the artificial
intelligence podcast if you enjoy it
subscribe on YouTube give it five stars
on iTunes support it on Patreon or
simply connect with me on Twitter at Lex
Fridman spelled F-R-I-D-M-A-N
and now here's my conversation with
Regina Barzilay
in an interview you've mentioned that if
there's one course you would take it
would be a literature course with a
friend of yours that a friend of yours
teaches just out of curiosity because I
couldn't find anything on it are there
books or ideas that had profound impact
on your life journey books and ideas
perhaps outside of computer science and
the technical fields I think because I'm
spending a lot of my time at MIT and
previously in other institutions where I
was a student I have limited ability to
interact with people so a lot of what I
know about the world actually comes from
books and there were quite a number of
books that had profound impact on me and
how I view the world let me just give
you one example of such a book I've
maybe a year ago read a book called the
emperor of all maladies it's a book
about it's kind of a history of science
book on how the treatments and drugs for
cancer were developed and that book
despite the fact that I am in the
business of science really opened my
eyes on how imprecise and imperfect the
discovery process is and how imperfect
our current solutions are and what makes
science succeed and be implemented and
sometimes it's actually not the
strength of the idea but the devotion of
the person who wants to see it
implemented so this is one of the books
and you know at least for the last year
quite changed the way I'm thinking about
scientific process just from the
historical perspective and what do I
need to do to make my ideas really
implemented let me give you an example
of a book which is not kind of which is
a fiction book it's a book called
Americanah and
this is a book about a young female
student who comes from Africa to study
in the United States and it describes
her path you know within her studies
and her life transformation that you
know in a new country and kind of
adaptation to a new culture and when I
read this book I saw myself in many
different points of it but but it also
kind of gave me the lens on different
events and some events that I never
actually paid attention to one of the funny
stories in this book is how she arrives
at her new college and she starts
speaking in English and she had this
beautiful British accent because that's
how she was educated in her country and
this is not my case and then she notices
that the person who talks to her you
know talks to her in a very funny way in
a very slow way and she's thinking that
this woman is disabled and she's
also trying to kind of accommodate
her and then after a while when she
finishes her discussion with this
officer from her college she sees how
she interacts with the other students
with American students and she discovers
that actually she talked to her this way
because she thought that she doesn't
understand English and I said wow
this is a funny experience and
literally within a few weeks I went to
LA to a conference and I asked
somebody in the airport you know how to
find like a cab or something and then I
noticed this person is talking in a very
strange way and my first thought was that
this person has some you know
pronunciation issues or something and
I'm trying to talk very slowly to him and
I was with another professor
Frankel and he's like laughing because
it's funny that I don't get that the guy
is talking in this way because he
thinks that I cannot speak so it was
really kind of a mirroring experience and
it led me to think a lot about my own
experiences moving
you know from different countries so I
think that books play a big role in my
understanding of the world on the on the
science question you mentioned that it
made you discover that personalities of
human beings are more important than
perhaps ideas is that what I heard it's
not necessarily that they are more
important than ideas but I think that
ideas on their own
are not sufficient and many times at
least at the local horizon it is the
personalities and their devotion to
their ideas that really locally
changes the landscape now if you're
looking at AI like let's say 30 years
ago
you know Dark Ages of AI or whatever
the symbolic times you can use
any word you know there are some people now
we're looking at a lot of that work and
we're kind of thinking this is not
really maybe a relevant work but you can
see that some people manage to take it
and to make it so shiny and dominate the
you know the academic world and make it
to be the standard if you look in the
area of natural language processing it
is a well known fact that the reason
statistics in NLP took such a long time
to become mainstream is because
there were quite a number of
personalities which didn't believe in
this idea and it stopped research
progress in this area so I do not think
that you know asymptotically maybe
personalities matter but I think
locally it does make quite a bit of
impact and it generally you know
speeds up the rate of adoption of the new
ideas yeah and and the other interesting
question is in the early days of
a particular discipline I think you
mentioned in that book which is
ultimately a book about cancer it's called
the Emperor of all maladies
yeah and those maladies included
the trying to the medicine was it
centered on so it was actually centered
on you know how people thought of curing
cancer like for me it was really a
discovery
how people what was the science of
chemistry behind drug development that
it actually grew up out of the dyeing
like coloring industry that people who
developed chemistry in the 19th century in
Germany and Britain to do you know the
really new dyes they looked at the
molecules and identified that they do
certain things to cells and from there
the process started and you know like
histology thing yeah this is fascinating
that they managed to make the connection
and look under the microscope and do all
this discovery but as you continue
reading about it and you read about how
chemotherapy drugs were actually developed in
Boston and some of them were developed
by Farber Dr. Farber from Dana-Farber
you know how the experiments were done
that you know there was some
miscalculation let's put it this way and
they tried it on the patients and then
just and those were children with
leukemia and they died and they tried
another modification you look at the
process how imperfect is this process
and you know like well again looking
back like sixty years ago seventy years ago you
can kind of understand it but some of
the stories in this book which were
really shocking to me were really
happening you know maybe decades ago and
we still don't have a vehicle to do it
much faster and more effectively and you
know scientific the way I'm thinking
computer science is scientific so from the
perspective of computer science you've
gotten a chance to work on the application to
cancer and to medicine in general from a
perspective of an engineer and a
computer scientist how far along are we
from understanding the human body
biology of being able to manipulate it
in a way we can cure some of the
maladies some of the diseases so this is
a very interesting question and if you're
thinking as a computer scientist about
this problem I think one of the reasons
that we succeeded in the areas we as
computer scientists succeeded is because
we don't have we are not trying to
understand in some ways like if you're
thinking about like e-commerce Amazon
doesn't really understand you and that's
why it recommends you certain books or
certain products correct and in you know
traditionally when people were thinking
about marketing you know they divided
the population into different kinds of
subgroups identified the features of this
subgroup and came up with a strategy
which is specific to that subgroup if
you're looking at recommendation
systems they're not claiming they
understand somebody they're just
managing from the patterns of your
behavior to recommend you a product now
if you look at traditional biology and
obviously I wouldn't say that I am in any
way you know educated in this field but
you know what I see there is really a
lot of emphasis on mechanistic
understanding and it was very surprising
to me coming from computer science how
much emphasis is on this understanding
and given the complexity of the system
maybe the deterministic full
understanding of these processes is you
know beyond our capacity and the same
way as in computer science when we're doing
recognition when we do recommendation and in
many other areas it's just a probabilistic
matching process and in some way maybe
in certain cases we shouldn't even
attempt to understand or we can attempt to
understand but in parallel we can
actually do this kind of matching that
would help us to find a cure or to do
early diagnostics and so on and I know
that in these communities it's really
important to understand but I am
sometimes wondering what exactly does it
mean to understand here well there's
stuff that works and but that can be
like you said separate from this deep
human desire to uncover the mysteries of
the universe of of science of the way
the body works the way the mind works
it's the dream of symbolic AI of being
able to reduce human knowledge into into
logic and be able to play with that
logic in a way that's very explainable
and understandable for us humans I mean
that's a beautiful dream so I understand
it but it seems that what seems to work
today
and we'll talk about it more is as much as
possible reduce stuff into data reduce
whatever problem you're interested in to
data and try to apply statistical
methods apply machine learning to that on
a personal note you were diagnosed with
breast cancer in 2014 what did facing
your mortality make you think about how
did it change you you know this is a great
question and I think that I was
interviewed many times but nobody actually
asked me this question I think I was 43
at the time and it was the first time I
realized in my life that I may die and I
never thought about it before and yeah
and there is a long time from when you're
diagnosed until you actually know
what you have and how severe is your
disease for me it was like maybe two and
a half months and I didn't know where I
am
during this time because I was getting
different tests and one would say it's
bad and another would say no it is not so
until I knew where I am I really was
thinking about all these different
possible outcomes were you imagining the
worst or were you trying to be
optimistic or it was really I don't
remember you know what was my thinking
it was really a mixture with many
components at the time speaking you
know in our terms and one thing that I
remember and you know every test comes
and you're saying oh it could be this or
it may not be this and you're hopeful
then you're desperate so it's like
there is a whole you know slew of
emotions that goes through you but what I
remember is that when I came back to MIT
I was kind of going the whole time through
the treatment to MIT but my brain was
not really there but when I came back
here I finished the treatment and I
was here teaching and everything yeah I
looked back at what my group was doing
what other groups were doing and I saw
these trivialities it's like people are
building their careers on improving some
parts by around two or three percent or
whatever I was like seriously I did a
work on how to decipher Ugaritic like a
language that nobody speaks and whatever
like what is the significance when all of a sudden
you know I walked out of MIT which
is
you know where people really do care you
know what happened to your ICLR paper
you know what is your next
publication to ACL to the world where
people you know people you see a lot of
suffering that I'm kind of totally
shielded from on a daily basis and it's
like the first time I've seen like real
life and real suffering and I was
thinking why are we trying to improve
the parser or deal with some
trivialities when we have capacity to
really make a change and it was really
challenging to me because on one hand
you know I have my graduate students who
really want to do their papers and their
work and they want to continue to do
what they were doing which was great and
then it was me who really kind of
reevaluated what is the importance and
also at that point because I had to take
some break I look back into like my
years in science and I was thinking you
know like 10 years ago this was the
biggest thing I don't know topic models
that we have like millions of papers on
topic models and variations of topic models
now it's really like irrelevant and
you start looking at this you know what
do you perceive as important at different
points of time and how you know it
fades over time and since we have a
limited time all of us have limited time
and it's really important to
prioritize things that really matter to
you maybe matter to you at that
particular point but it's important to
take some time and understand what
matters to you which may not necessarily
be the same as what matters to the rest
of your scientific community and pursue
that vision so that moment did it
make you cognizant you mentioned
suffering of just the general amount of
suffering in the world is that what
you're referring to so as opposed to
topic models and specific detail
problems in NLP did did you start to
think about other people who have been
diagnosed with cancer that the way you
started to see the world perhaps
oh absolutely and it actually creates
because
for instance you know these parts of the
treatment where you need to go to the
hospital every day and you see you know
the community of people that you see and
many of them are much worse than I was
at the time and you all of a sudden see it
all and people who are happy some day
just because they feel better and for
people who are in our normal realm
you take it totally for granted that you
feel well that if you decide to go
running you can go running and you can
you know you're pretty much free to do
whatever you want with your body like I
saw like a community my community became
those people and I remember one of my
friends Dina Katabi took me to
Prudential to buy me a gift for my
birthday and it was like the first time
in months that I went to kind of to
see other people and I was like wow
first of all these people you know
they're happy and they're laughing and
they're very different from these other
my people and second of all I think
it's totally crazy they're like laughing
and wasting their money on some stupid
gifts and you know they may die they
already may have cancer and and they
don't understand it so you can really
see how the mind changes that you can
see that you know before that you can
ask didn't you know that you're gonna
die of course I knew but it was kind of
a theoretical notion it wasn't something
which was concrete and at that point
when you really see it and see how
little means sometimes the system has to
help them you really feel that we need
to take a lot of our brilliance that we
have here at MIT and translate it into
something useful
yeah and useful could have a lot of
definitions but of course alleviating
suffering alleviating trying to cure
cancer is a beautiful mission so I of
course know theoretically the notion
of cancer but just reading more and more
about it 1.7 million new cancer cases
in the United States every year
600,000 cancer related deaths every year
so this has a huge impact United States
globally
broadly before we talk about how
machine learning how MIT can help when
do you think we as a civilization will
cure cancer how hard of a problem is it
from everything you've learned from it
recently I cannot really assess it what
I do believe will happen with the
advancement in machine learning is that a
lot of types of cancer we will be able
to predict way earlier
and more effectively utilize existing
treatments I think I hope at least that
with all the advancements in AI and drug
discovery we would be able to much
faster find relevant molecules what I'm
not sure about is how long it will take
the medical establishment and regulatory
bodies to kind of catch up and to
implement it and I think this is a
very big piece of the puzzle that is
currently not addressed that's a really
interesting question so first a small
detail that I think the answer is yes
but is cancer one of the diseases
that when detected earlier that
significantly improves the outcomes
so like because we will talk about
there's the cure and then there is
detection and I think where machine
learning can really help is earlier
detection so does detection help
detection is crucial for instance the
vast majority of pancreatic cancer
patients are detected at the stage where they
are incurable that's why they have such
a you know terrible survival rate it's
like just a few percent over five years
it's pretty much today a death sentence
but if you can discover this disease
early there are mechanisms to treat it
and in fact I know a number of people
who were diagnosed and saved just
because they had food poisoning they had
terrible food poisoning they went to the ER
and they got a scan there were early
signs on the scan
and that saved their lives but this
was really an accidental case so as
we become better we would be able to
help many more people that have you
know that are likely to develop diseases
and I just want to say that as I got
more into this field I realize that you
know cancer is of course a terrible
disease but there are really a whole
slew of terrible diseases out there like
neurodegenerative diseases and others so
we of course a lot of us are fixated on
cancer just because it's so prevalent in
our society and you see these people and
there are a lot of patients with
neurodegenerative diseases and that kind
of aging diseases that we still don't
have a good solution for and we you know
and I felt as computer scientists we
kind of decided that it's other people's
job to treat these diseases because it's
like traditionally people in biology or
in chemistry and these are the ones
who are thinking about it and after I kind
of started paying attention I think that
it's really a wrong assumption and we
all need to join the battle so it
seems like in cancer specifically that
there's a lot of ways that machine
learning can help so what's what's the
role of machine learning in the
diagnosis of cancer so for many cancers
today we really don't know what is your
likelihood to get cancer
and for the vast majority of patients
especially the younger patients it
really comes as a surprise like for
instance for breast cancer 80% of the
patients are first in their families
it's like me and I never thought that I
had any increased risk because you know
nobody had it in my family and for some
reason in my head it was kind of an
inherited disease
but even if I would pay attention
the models that currently there are very
simplistic statistical models that are
currently used in clinical practice
they really don't give you an answer so
you don't know and the same is true for
pancreatic cancer the same is true for
non-smoking lung cancer and many others
so what machine learning can do here is
utilize all this data to tell us early who
is likely to be susceptible using
all the information that is already
there be it imaging be it your other tests
and you know eventually liquid biopsies
and others where the signal itself is
not sufficiently strong for the human eye to
do good discrimination because the
signal may be weak but by combining many
sources a machine which is trained on
large volumes of data can really detect
it early and that's what we've seen with
breast cancer and people are reporting
it in other diseases as well that really
boils down to data right and in the
different kinds of sources of data and
you mentioned regulatory challenges so
what are the challenges in gathering
large data sets in this space again
another great question so it took me
after I decided that I want to work on
it two years to get access to data and
like right now in this country there is
no publicly available data set of modern
mammograms that you can just go on your
computer sign a document and get it it
just doesn't exist I mean obviously
every hospital has its own collection of
mammograms there are data that
came out of clinical trials what
we're talking about here is a computer
scientist who just wants to run his or
her model and see how it works this data
like ImageNet doesn't exist and you
know there is a set which is called
like Florida data set which is film
mammograms from the 90s which is totally not
representative of the current
developments whatever you're learning on
them doesn't scale up this is the only
resource that is available
and today there are many agencies that
govern access to data like the hospital
holds your data and the hospital decides
whether they would give it to the
researcher to work with this data
individual hospital yeah I mean the
hospital may you know assuming that
you're doing a research collaboration you
can submit you know there is a proper
approval process guided by IRB and
if you go through all the processes you
can eventually get access to the data
but as you yourself know in our AI community
there are not that many people who
actually ever got access to data
because it's a very challenging process
and sorry just a quick comment at MGH
or any kind of hospital are they
scanning the data are they digitally
storing it oh it is already digitally
stored you don't need to do any extra
processing steps it's already there in
the right format is that all right now
there are a lot of issues that govern
access to the data because the hospital
is legally responsible for for the data
and you know they have a lot to lose if
they give the data to the wrong person
but they may not have a lot to gain if
they give it as a hospital as a legal
entity is giving it to you and the way
you know what I would imagine happening in
the future is the same thing that
happens when you're getting your driver's
license you can decide whether you want
to donate your organs so you can imagine
that whenever a person goes to the
hospital it should be easy for them
to donate their data for research and
it can be different kinds do they only
give their test results or only
mammogram only imaging data or the whole
medical record because at the end
we all will benefit from all these
insights and it's not like you say I
want to keep my data private but I would
really love to get it you know from
other people because other people think
in the same way so if there is a
mechanism to do this donation and
the patient has an ability to say how
they want to use their data for research
it would be really a game-changer
people when they think about this
problem there's a it depends on the
population depends on the
demographics but there's some privacy
concerns generally with not just
medical data just any kind of data
it's what you said my data it should
belong kind of to me I'm worried how
it's going to be misused how do we
alleviate those concerns it seems
like a problem that needs to be that
problem of trust of transparency needs
to be solved before we build large data
sets that help detect cancer help save
those very people in the
future so I think there are two things that could
be done there are technical solutions
and there are societal solutions so on
the technical end we today have the ability
to improve de-identification
like for instance for imaging it's you
know for imaging you can do it pretty
well
what's de-identification it's removing the
identification removing the names of
the people there are other data like if
it is raw text you cannot really achieve
99.9 percent but there are all these
techniques that actually some of them are
developed at MIT how you can do learning
on the encoded data where you locally
encode the image you train a network
which only works on the
encoded images and then you send the
outcome back to the hospital and you can
open it up so those are the technical
solutions there are a lot of people who
are working in this space where the
learning happens in the encoded form I
think we're still early but this is an
interesting research area where I think
we'll make more progress
there is a lot of work in the natural
language processing community on how to do
de-identification better but even today
there is already a lot of data which can be
de-identified perfectly like your test
data for instance correct where you can
just you know the name of the patient
you just want to extract the part with
the numbers the big problem here is
again hospitals don't see much incentive
to give this data away on one hand and
then there is general concern now when I'm
talking about societal benefits and
about the education the public needs to
understand that I think that there are
situations and I still remember myself
when I really needed an answer I had to
make a choice there was no information
to make a choice you're just guessing
and at that moment you feel that your
life is at stake but you just don't
have information to make the choice and
many times when I give talks I get
emails from women who say you know I'm
in this situation can you please run
statistics and see what are the outcomes
we get almost every week a mammogram
that comes by mail to my office at MIT I'm
serious
that people ask to run because they need
to make you know life-changing decisions
and of course you know I'm not planning
to open a clinic here but we do run and
give them the results for their doctors
but the point that I'm trying to make
that we all at some point or our loved
ones will be in the situation where you
need information to make the best choice
and if this information is not available
you would feel vulnerable and
unprotected and then the question is you
know what do I care about more because at the
end everything is a trade-off correct
yeah exactly
just out of curiosity what it seems like
one possible solution I'd like to see
what you think of it
based on what you just said based on
wanting to know answers for when you're
yourself in that situation is it
possible for patients to own their data
as opposed to hospitals owning their
data of course theoretically I guess
patients own their data but can you walk
out there with a USB stick containing
everything or upload it to the cloud
where a company you know I remember
Microsoft had a service that I tried I was
really excited about and Google
Health was there I tried it and I
was excited about it basically companies
helping you upload your data to the
cloud so that you can move from hospital
to hospital from Doctor to doctor do you
see a promise of that kind of
possibility I absolutely think this is
you know the right way to to exchange
the data
I don't know now who is the biggest
player in this field but I can clearly
see that even you know even for
totally selfish health reasons when you
are going to a new facility and many of
us are sent to some specialized treatment
they don't easily have access to your
data and today you know women who want
to send their mammogram need to go to the
hospital find some small office which
gives them a CD and they ship
the CD so you can imagine we're looking at
kind of a decades-old
mechanism of data exchange
so I definitely think this is an
area where hopefully all the right
regulatory and technical forces will
align and we will see it actually
implemented it's sad because
unfortunately and I have I need to
research why that happened but I'm
pretty sure Google Health and
Microsoft's HealthVault or whatever it's
called both closed down which means that
there was either regulatory pressure or
there's not a business case or there's
challenges from hospitals which is very
disappointing so when you say you don't
know what the biggest players are the
two biggest that I was aware of closed
their doors so I'm hoping I'd love to
see why and I'd love to see who else can
come up it seems like one of those Elon
Musk style problems that obviously
need to be solved and somebody needs to
step up and actually do this large-scale
data collection I know there is an
initiative in Massachusetts I think
led by the governor to try to
create this kind of health exchange
system or at least to help people
when you show up in the emergency
room and there is no information about
what are your allergies and other things so I
don't know how far it will go
but another thing is you said and I find
it very interesting it's actually who
are the successful players in this space
and the whole implementation how does it
go to me from the anthropological
perspective it's more fascinating than the
AI that today goes in healthcare you
know we've seen so many you know
attempts and so very few successes
and it's interesting to understand that
I by no means you know have the knowledge
to assess why we are in the position
where we are yeah it's interesting as
data is really fuel for a lot of
successful applications and when that
data requires regulatory approval like
the FDA or any kind of approval it
seems that the computer scientists are
not quite there yet in being able to
play the regulatory game understanding the
fundamentals of it I think that in many
cases even when people
do have data we still don't know
what exactly you need to demonstrate
to change the standard of care
let me give you an example related to my
breast cancer research so in
traditional breast cancer risk
assessment there is something called
density which determines the likelihood
of a woman to get cancer and this is
pretty much how much white do you
see on the mammogram the whiter it is
the more likely the tissue is dense
and the idea behind density it's not a
bad idea in 1967 a radiologist called
Wolfe decided to look back at women who
were diagnosed and see what is special
in the images can we look back and say
that they're likely to develop so he
came up with some patterns it was the
best that his human eye you know could
identify then it was kind of formalized
and coded into four categories and that's
what we are using today and today this
density assessment is actually a federal
law from 2019 approved by
President Trump and the previous FDA
Commissioner where women are supposed to
be advised by their providers if they
have high density putting them into
high-risk category and in some states
you can actually get supplementary
screening paid by your insurance because
you are in this category now you can say
how much science do we have behind it
whatever biological science or
epidemiological evidence so it turns out
that between 40 and 50 percent of women
have dense breasts so about 40 percent
of patients are coming out of their
screening and somebody tells them you
are in high risk now what exactly does
it mean if you have half of the population
at high risk it's like saying maybe I'm not you
know what do I really need to do with it
because the system doesn't provide me a
lot of the solutions because there are
so many people like me we cannot really
provide very expensive solutions for
them
and the reason this whole density became
this big deal it was actually advocated by
the patients who felt very unprotected
because many women went and did the
mammograms which were normal and then it
turned out that they already had cancer
quite developed cancer so they didn't
have a way to know who is really at risk
and what is the likelihood that when the
doctor tells you you're okay you are not
okay so at the time and it was you
know 15 years ago this maybe was the
best piece of science that we had and it
took you know quite 15 16 years to make
it federal law but now that this
is a standard now with the deep learning
model we can so much more accurately
predict who is gonna develop breast
cancer just because you're trained on a
logical thing and instead of describing
how much white and what kind of white
a machine can systematically identify the
patterns which was the original idea
behind the thought of the radiologist
machines can do it much more
systematically and predict the risk when
you train the machine to look at the
image and to say the risk in one to five
years now you can ask me how long it
will take to substitute this density
which is broadly used across the country
and really is not helping and to bring in
these new models and I would say it's not
a matter of the algorithm the algorithm is
already orders of magnitude better than
what is currently in practice I think
it's really the question who do you need
to convince how many hospitals do you
need to run the experiment in you know
all this mechanism of adoption and how
do you explain to patients and to women
across the country that this is really a
better measure and again I don't think
it's an AI question we can work more and
make the algorithm even better but I
don't think that this is the current you
know the barrier the barrier is really
this other piece that for some reason is
not really explored it's like an
anthropological piece and coming back to
your question about books there is a book
that I am reading it's called An American
Sickness by
Elisabeth Rosenthal and I got this
book from my clinical collaborator Dr.
Connie Lehman and I said I know
everything that I need to know about
American health system but you know
every page doesn't fail to surprise me
and I think there is a lot of
interesting and really deep lessons for
people like us from computer science who
are coming into this field to really
understand how complex is the system of
incentives in the system to understand
how you really need to play to drive
adoption you just said it's complex but
if we're trying to simplify it who do
you think most likely would be
successful if we push on this group of
people is it the doctors is it the
hospitals is it the governments or
policymakers is it the individual
patients consumers who needs to be
inspired to most likely lead to adoption
or is there no simple answer there's no
simple answer but I think there is a lot
of good people in the medical system who do
want you know to make a change
and I think a lot of power will come
from us as consumers because we all
are consumers or future consumers of
healthcare services and I think we can
do so much more in explaining the
potential and not in the hype terms and
not saying that we have now cured all
Alzheimer's and you know I'm really sick
of reading this kind of articles which
make these claims but really to show
with some examples what this
implementation does and how it changes
the care because I can't imagine it doesn't
matter what kind of politician it is you
know we all are susceptible to these
diseases there is no one who is free and
eventually you know we all are humans
and we're looking for a way to alleviate
the suffering and this is one
possible way which we are currently
underutilizing which I think can help so
it sounds like the biggest problems are
outside of AI in terms of the biggest
impact at this point but are there any
open problems in the application of ML
to oncology in general so improving the
detection or any other creative methods
whether it's on the detection
segmentation or the vision perception
side or some other clever type of inference
yeah what in general in your
view are the open problems in this
space yeah I just want to mention that
besides detection another area where I am
kind of quite active and I think it's
really an increasingly important area in
healthcare is drug design
because you know it's fine if you detect
something early but you still need to
get you know to get drugs and new drugs
for these conditions and today in all of
drug design ML is non-existent
we don't have any drug that was
developed by an ML model or even
not developed but at least where we know that
an ML model played a significant role I
think this area with all the new ability
to generate molecules with desired
properties to do in silico screening is
really a big open area and to be totally
honest with you now when we are doing
diagnostics and imaging we are primarily taking
the ideas that were developed for other
areas and applying them with some
adaptation the area of you know drug
design is a very technically interesting
and exciting area you need to work a lot
with graphs and capture various 3d
properties there are lots and lots of
opportunities to be technically creative
and I think there are a lot of open
questions in this area you know we're
already getting a lot of successes even
you know with the kind of the first
generation of these models but there are
many more new creative things that you
can do and what's very nice to see is that
actually you know the more powerful
the more interesting models actually
do better so there is a place to
innovate in machine learning in this
area and some of these techniques are
really unique to let's say to you know
graph generation and other things so
just to interrupt real quick I'm sorry
graph generation or graphs drug
discovery in general how do
you discover a drug is this chemistry is
this trying to predict different
chemical reactions or is it some kind of
what do graphs even represent in this
space and what's a drug okay so let's
say there are many different
types of drugs but let's say we're
gonna talk about small molecules because
I think today the majority of drugs are
small molecules so a small molecule is a
graph where a
node in the graph is an atom and then
you have the bonds so it's really a graph
representation if you look at it in 2d
correct
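
A minimal sketch of the 2D graph view described here, with atoms as nodes and bonds as edges. The class and method names (`MoleculeGraph`, `add_atom`, `add_bond`, `neighbors`) are hypothetical and chosen for illustration only; they are not from the conversation or from any particular chemistry library, and hydrogens are left implicit to keep the example small.

```python
from collections import defaultdict


class MoleculeGraph:
    """Undirected graph: nodes are atoms, edges are bonds (2D view only)."""

    def __init__(self):
        self.atoms = []                 # node index -> element symbol
        self.bonds = defaultdict(dict)  # node index -> {neighbor: bond order}

    def add_atom(self, symbol):
        """Add an atom and return its node index."""
        self.atoms.append(symbol)
        return len(self.atoms) - 1

    def add_bond(self, i, j, order=1):
        """Connect atoms i and j; the bond is stored in both directions."""
        self.bonds[i][j] = order
        self.bonds[j][i] = order

    def neighbors(self, i):
        """Indices of atoms bonded to atom i, in ascending order."""
        return sorted(self.bonds.get(i, {}))


# Ethanol with hydrogens left implicit: C - C - O
ethanol = MoleculeGraph()
c1 = ethanol.add_atom("C")
c2 = ethanol.add_atom("C")
o = ethanol.add_atom("O")
ethanol.add_bond(c1, c2)
ethanol.add_bond(c2, o)

# The middle carbon is bonded to the other carbon and to the oxygen.
print([ethanol.atoms[k] for k in ethanol.neighbors(c2)])  # prints ['C', 'O']
```

A 3D treatment would attach coordinates to each node; the sketch stays in 2D, as the conversation does.
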
you can do it in 3d but let's say well
let's keep it simple and stick to 2d so
pretty much my understanding of how
it is done today at scale in the companies
without machine learning you have
high throughput screening so you know
that you are interested to get certain
biological activity of the compound so
you scan a lot of compounds like maybe
hundreds of thousands some really big
number of compounds you identify some
compounds which have the right activity
and then at this point you know the
chemists come and they're trying now
to optimize this original hit for
different properties that you want it to
be maybe soluble you want to decrease
toxicity you want to decrease the side
effects again sorry to interrupt can that be
done in simulation or just by looking at
the molecules or do you need to actually
run reactions in real labs
so when you do high throughput
screening you really do screening it's
in the lab it's really the lab
screening you screen the molecules
correct in screening you just check them
for a certain property like in the
physical
space in the physical world like
actually there's a machine probably
that's doing some actually running the
assays actually running reactions yeah so
there is a process where you can run
these assays in high throughput you
know it became cheaper and faster to do
it on a very big number of molecules you
run the screening you identify potential
you know potential good starts and then
that's where the chemists come in who you know
have done it many times and then they
can try to look at it and say how can I
change the molecule to get the desired
profile in terms of all other properties
so maybe how do I make it more
bioactive and so on and there you know
the creativity of the chemist really is
the one that determines the success of
this design because again they have a
lot of domain knowledge you know what
works how do you decrease the toxicity and so
on and that's what they do so all the
drugs that are currently you know
FDA-approved drugs or even drugs that are
in clinical trials they are designed
using these domain experts who go
through this combinatorial space of
molecular graphs or whatever and find
the right one or adjust it to be the
right one
sounds like the breast density
heuristic from '67 the same
echoes it's not necessarily that it's
really you know it's really driven by
deep understanding it's not like they just
observe it I mean they do deeply
understand chemistry and they do
understand how different groups
change the properties so there
is a lot of science that gets into it and
a lot of kind of simulation how do you
want it to behave it's very very
complex
are they quite effective at this design
effective yeah we have drugs it
depends on how do you measure effective
if you measure it in terms of cost it's
prohibitive if you measure it in terms
of time you know we have lots of
diseases for which we don't have any
drugs and we don't even know how to
approach them not to mention the few
drugs for neurodegenerative diseases
that fail you know so there are lots of
you know trials that fail
you know in later stages which is really
catastrophic from the financial
perspective so you know is it
the most effective mechanism
absolutely no but this is the only one
that currently works and you
know I was closely interacting with
people in the pharmaceutical industry I was
really fascinated by how sharp and
what a deep understanding of the domain
they have it's not observation
driven there is really a lot of
science behind what they do but if you
ask me can machine learning change it I
firmly believe yes because even the most
experienced chemist cannot you know hold
in their memory and understanding everything
that you can learn you know from
millions of molecules and reactions
and the space of graphs is a totally new
space I mean it's a really
interesting space for machine learning
to explore graph generation yeah so
there's a lot of things that you can do
here so we do a lot of work so the first
tool that we started with was the tool
that can predict properties of the
molecules so you just give the
molecule and the property it
can be a bioactivity property or it can be
some other property and you train the
model and you can now take a new
molecule and predict this property now
when people started working in this area
they did something very simple there are
kind of existing you know fingerprints
which are kind of handcrafted features of
the molecule where you break the graph into
substructures and then you run it through a
feed-forward neural network and what
was interesting to see is that clearly you
know this was not the most effective way
to proceed and you need to have much
more complex models that can induce the
representation which can translate this
graph into the embeddings and
do these predictions so this is one
direction and another direction which is
kind of related is not only to stop at
looking at the embedding itself but
actually modify it to produce better
molecules so you can think about it as
machine translation you can
start with a molecule and then there is
an improved version of the molecule and you
can again with an encoder translate it into
the hidden space and then learn how to
modify it to produce an in some ways improved
version of the molecule so that's
kind of really exciting we have already seen
that the property prediction works
pretty well and now
we are generating molecules and there are
actually labs which are manufacturing
these molecules so we'll see where it will
get us okay that's really exciting
so there's a lot of promise
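
A rough sketch of the simple baseline mentioned earlier in this exchange: break the molecular graph into substructures, hash them into a fixed-length fingerprint, and pass that vector through a feed-forward network to score a property. Everything here is an assumption made for illustration: the path-based hashing is a simplified stand-in for the fingerprints used in practice, the function names (`path_fingerprint`, `init_mlp`, `predict`) are hypothetical, and the network weights are random and untrained, so the output only shows the data flow and is not a real prediction.

```python
import hashlib
import math
import random


def path_fingerprint(atoms, bonds, n_bits=64, max_len=3):
    """Hash every simple atom path of up to max_len atoms into a bit vector."""
    adj = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    bits = [0.0] * n_bits

    def walk(path):
        # A path and its reverse are the same substructure, so use one key.
        forward = "-".join(atoms[k] for k in path)
        backward = "-".join(atoms[k] for k in reversed(path))
        key = min(forward, backward)
        # hashlib gives a hash that is stable across Python processes.
        digest = hashlib.md5(key.encode()).digest()
        bits[int.from_bytes(digest[:4], "big") % n_bits] = 1.0
        if len(path) < max_len:
            for nxt in adj[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return bits


def init_mlp(n_in, n_hidden, seed=0):
    """Random (untrained) weights for a one-hidden-layer network."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
    return w1, b1, w2, 0.0


def predict(params, x):
    """Feed-forward pass: ReLU hidden layer, sigmoid output in (0, 1)."""
    w1, b1, w2, b2 = params
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


# Ethanol with implicit hydrogens: C - C - O
fp = path_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
score = predict(init_mlp(n_in=len(fp), n_hidden=8), fp)
print(int(sum(fp)), "bits set; untrained score:", round(score, 3))
```

The fingerprint here is fixed by hand, which is the limitation the conversation points to: the learned alternative replaces `path_fingerprint` with a model that induces its own embedding of the graph.
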
speaking of machine translation and
embeddings you have done
a lot of really great research in NLP
natural language processing can you tell
me your journey through NLP what ideas
problems approaches were you working on
were you fascinated with did you explore
before this magic of deep learning
re-emerged and after so when I started
my work in NLP it was '97
this was a very interesting time it was
exactly the time that I came to ACL and
at the time I could barely understand
English but it was exactly like the
transition point because half of the
papers were really you know rule-based
approaches where people took more kind
of heavy linguistic approaches for small
domains and tried to build up from there
and then there were the first generation
of papers which were corpus based papers
and they were very simple in our terms
when you collect some statistics and do
prediction based on them and I found it
really fascinating that you know one
community can think so very differently
about you know about the problem and I
remember the first paper that I wrote it
didn't have a single formula it didn't
have evaluation it just had examples of
outputs and this was the standard of the
field at the time in some ways I mean
people maybe just started emphasizing
the empirical evaluation but for many
applications like summarization you
just show some examples of outputs
and then increasingly you can see
how the statistical approaches
dominated the field and we've seen you
know increased performance across many
basic tasks the sad part of the story
may be that if you look again through
this journey we see that the role of
linguistics in some ways greatly
diminishes and
I think that you really need to look
through the whole proceedings to
find one or two papers which make some
interesting linguistic references it's
really today today today there were
things like syntactic trees just even
basically against our conversation about
human understanding of language which I
guess what linguistics would be
structured hierarchical
representing language in a way that's
human explainable understandable is
missing I don't know if it is what is
explainable and understandable in the
end you know we perform functions and
it's okay to have a machine which
performs a function like when you're
thinking about your calculator correct
your calculator can do calculation very
differently from how you would do the
calculation but it's very effective I
mean and this is fine if we can achieve
certain tasks with high accuracy it
doesn't necessarily mean that it has to
understand it the same way as we
understand in some ways it's even
naive to request because you have so many
other sources of information that are
absent when you are training your system
so it's okay if it delivers and I will
tell you one application that is really
fascinating in '97 when I came to ACL
there were some papers on machine
translation they were like primitive
like people were trying really really
simple things and my feeling was that
you know to make a real machine
translation system is like to fly to
the moon and build a house there and a
garden and live happily ever after I
mean it's like impossible I never could
imagine that within you know 10 years we
would already see the system working and
now you know nobody is even surprised to
utilize the system on a daily basis so
this was like a huge huge progress
on something that people for a very long time
tried to solve using other mechanisms and
they were unable to solve it that's why
coming back to your question about biology
that you know in linguistics people tried
to go this way and tried to write the
syntactic trees and tried to abstract it
and to find the right representation and
you know they couldn't get very
far with this understanding while these
models using you know other sources
are actually capable of making a lot of progress
now I'm not naive to think that we are in
this paradise space in NLP and I'm sure you
know that when we slightly change the
domain and when we decrease the amount
of training it can do like really
bizarre and funny things but I think it's
just a matter of improving
generalization capacity which is just a
technical question Wow so that's that's
the question how much of language
understanding can be solved with deep
neural networks in your intuition I mean
it's unknown I suppose but as we start
to creep towards romantic notions of the
spirit of the Turing test and
conversation and dialogue and something
that maybe to me or to us silly
humans feels like it needs real
understanding how much can that be achieved
with these neural networks or
statistical methods so I guess I am very
much driven by the outcomes
can we achieve the performance which
will be satisfactory for us for
different tasks now if you again look at
machine translation systems which are you
know trained on large amounts of data
they really can do a remarkable job
relative to where they were a few
years ago and if you you know if you
project into the future if it will be
the same speed of improvement
you know this is great now does it
bother me that it's not doing the same
translation as we are doing now if you
go to cognitive science we still don't
really understand what we are doing I
mean there are a lot of theories
there is obviously a lot of progress in
understanding but our understanding of what
exactly goes on you know in our brains
when we process language is still not
so crystal clear and precise that we can
translate it into machines
what does bother me is that you know
again that machines can be extremely
brittle when you go out of your comfort
zone when there is a
distributional shift between training
and testing and it has been years and
years every year when I teach the NLP class
you know I show them some examples of
translation from some newspaper in
Hebrew whatever it was perfect and then
I have a recipe that Tommi Jaakkola's
sister sent me a while ago and it was
w

Summary


# Where AI Meets Oncology: Regina Barzilay's Journey to Turn Data into Hope

### Executive Summary
This video covers the professional and personal journey of MIT Professor Regina Barzilay, who moved from being a *Natural Language Processing* (NLP) expert to a pioneer in applying *Machine Learning* (ML) to oncology after being diagnosed with breast cancer. The discussion covers the potential of AI for early diagnosis and drug discovery, systemic obstacles to accessing medical data, and how our understanding of artificial intelligence has evolved. It also touches on the importance of inclusive education and of finding one's real mission in life amid academic pressure.

### Key Takeaways
*   **Change of Career Direction:** A breast cancer diagnosis in 2014 shifted Regina's perspective from chasing "trivial" academic progress (small accuracy gains) to applying technology to reduce human suffering.
*   **AI in Healthcare:** ML has large potential to detect cancer earlier from medical scans and to speed up drug discovery through *graph generation*, but so far no FDA-approved drug has been developed entirely by ML.
*   **Medical Data Crisis:** The lack of public datasets (like ImageNet for computer vision) and hospital regulatory barriers are the main obstacles to AI progress in healthcare.
*   **Evolution of NLP:** *Natural Language Processing* shifted from rule-based linguistic approaches to statistical approaches, showing that machines do not need to "understand" language the way humans do in order to give effective results.
*   **Education & Meaning in Life:** ML education should be accessible to non-computer-scientists, and finding an internal mission matters far more than external achievement or social validation.

---

### Detailed Breakdown

#### 1. The Influence of Literature and the Role of Personality in Science
Regina Barzilay, an MIT professor who researches NLP and *deep learning* for chemistry and oncology, describes how books have shaped her views:
*   ***The Emperor of All Maladies*:** Opened her eyes to how imprecise scientific discovery is, and how often it depends on the dedication of the individuals carrying out an idea rather than on the strength of the idea alone.
*   ***Americanah*:** Offers a cultural lens on the minority experience, which Regina has felt directly in everyday interactions that assume incompetence based on appearance.
*   **Adoption of Scientific Ideas:** The personality and dedication of individuals play a large part in how quickly new ideas are adopted by a scientific community, as shown by how long statistical methods took to be accepted in NLP.

#### 2. Cancer Diagnosis and a Changed Outlook on Life
At age 43 (2014), Regina was diagnosed with breast cancer.
*   **Uncertainty:** The 2.5-month diagnostic period was full of mixed emotions, swinging between hope and despair.
*   **Impact on Career:** On returning to work at MIT, she felt that her usual academic work (improving model accuracy bit by bit) amounted to *trivialities* compared with the suffering of cancer patients.
*   **Decision:** She decided to redirect the talent and resources at MIT toward something of real use in reducing suffering, not just scientific publications.

#### 3. Challenges for AI in Diagnosis and Data Access
*   **Potential for Early Detection:** Cancer is often detected at an already advanced stage (as with pancreatic cancer). ML can read weak signals in medical scans that the human eye may miss.
*   **Data Barriers:** Unlike other areas of AI that have open data, modern medical data is hard to access. There is currently no representative public mammography dataset.
*   **Data Donation as a Solution:** Regina proposes a mechanism by which patients can easily donate their medical data for research, similar to the concept of organ donation.
*   **Privacy:** Techniques such as *de-identification* and training on locally encoded data can protect patient privacy without holding back research.

#### 4. Systemic Barriers and Technology Adoption
*   **Failed Data Initiatives:** Services such as Google Health and Microsoft HealthVault, which tried to centralize health data, have been shut down, which shows how difficult the current healthcare ecosystem is.
*   **Regulation vs. Innovation:** The main barrier is not the algorithm (which is already far more accurate than older methods such as breast density assessment) but the mechanism of adoption, FDA approval, and the complexity of incentives in the American healthcare system.
*   **Role of Consumers:** Change will be driven by consumers (patients) who demand better, data-driven care, not only by technology innovators.

#### 5. Machine Learning in *Drug Discovery*
*   **Molecule Representation:** Small-molecule drugs can be represented as graphs (nodes = atoms, edges = bonds).
*   **Traditional Methods vs ML:** The older approach relies on physical *High-Throughput Screening* (HTS), which is expensive and slow. ML can predict molecular properties and generate new molecules ("molecule generation") more efficiently, in a way similar to machine translation.
*   **Current Status:** Although labs have begun manufacturing AI-generated molecules, no FDA-approved drug has yet been developed entirely by an ML model.

#### 6. The Evolution of NLP and the Philosophy of Machine "Understanding"
*   **History of NLP:** In 1997 the field was moving from rule-based linguistic approaches to statistical, corpus-based approaches. The linguistic approaches could not compete with statistical models on performance.
*   **Understanding vs Function:** Regina argues that machines do not need to "understand" language the way humans do in order to be useful (analogy: a calculator does not understand mathematics but is useful).
*   **Model Brittleness:** Current models are still *brittle* under distribution shift (for example, translating a Finnish recipe after being trained on news).
*   **Turing Test:** People are in fact very ready to believe that a machine "understands", as shown by the ELIZA experiments, in which people talked for hours with a simple program.

#### 7. The Future of AI, Education, and Brain Interfaces
*   **Future Benchmarks:** The next challenge is *few-shot learning* (learning from a small number of examples), which is not yet realistic in practice.
*   **Human Augmentation:** AI should serve to augment human intelligence. Brain-computer interfaces (such as Neuralink) and cognitive feedback (such as attention trackers) have large potential.
*   **Education at MIT:** Regina created a new ML class for non-computer-science students, because the main barrier is not programming (Python) but the language of mathematics (Big O notation, linear algebra, probability).

#### 8. Life Philosophy and Advice for the Younger Generation
*   **Finding an Internal Mission:** Researchers often get lost chasing external validation or publications alone, but Regina stresses that finding an internal mission with real impact for humanity matters far more.

## Conclusion & Closing Message
Professor Regina Barzilay's journey shows that integrating AI into oncology has large potential to change cancer diagnosis and treatment, although it is still held back by data access and regulation. Beyond the technical side, the story teaches the importance of shifting focus from academic achievement alone toward a mission that brings real relief to human suffering. It is a call to encourage collaboration between technology and humanity and to push for inclusive education for a better future.


file updated 2026-02-13 13:22:26 UTC