Transcript

P_fHJIYENdI • AlphaFold - The Most Useful Thing AI Has Ever Done
Kind: captions
Language: en
what if all of the world's biggest
problems from climate change to curing
diseases to disposal of plastic waste
what if they all had the same solution a
solution so tiny it would be invisible
I'm inclined to believe this is possible
thanks to a recent breakthrough that
solved one of the biggest problems of
the last century how to determine the
structure of a protein it's been
described to me as equivalent to
Fermat's Last Theorem but for biology
over six decades tens of thousands of
biologists painstakingly worked out the
structure of 150,000 proteins then in
just a few years a team of around 15
determined the structure of 200 million
that's basically every protein known to
exist in nature so how did they do it
and why does this have the potential to
solve problems way outside the realm of
biology a protein starts simply as a
string of amino acids each amino acid
has a carbon atom at the center then on
one side is an amine group and on the
other side is a carboxyl group and the
last thing it's bonded to could be one
of 20 different side chains and which
one determines which of the 20 different
amino acids this molecule
is the amine group from one amino acid
can react with the carboxyl group of
another to form a peptide bond so a
series of amino acids can bond to form a
string and pushing and pulling between
countless molecules electrostatic forces
hydrogen bonds solvent interactions can
cause this string to coil up and fold
onto itself this ultimately determines
the 3D structure of the protein and this
shape is the thing that really matters
about the protein it's built for a
specific purpose like how hemoglobin has
the perfect binding site to carry
around oxygen in your blood these are
machines they need to be in their
correct orientation in order to work
together to move for example the
proteins in your muscles they change
their shape a little bit in order to
pull and contract but it would take
people a long time to get the structure
of just one protein absolutely so what
should proteins look like uh was only
started to answer really with
experimental techniques the first way
protein structure was determined was by
creating a crystal out of that protein
this was then exposed to x-rays to get a
diffraction pattern and then scientists
would work backwards to try to figure
out what shape of molecules would create
such a
pattern it took British biochemist John
Kendrew 12 years to get the first
protein structure his target was an
oxygen storing protein called myoglobin
an important protein in our
hearts he first tried a horse heart but
this produced rather small crystals
because it didn't have enough
myoglobin he knew diving mammals would
have lots of myoglobin in their muscles
since they're the best at conserving oxygen
so he obtained a huge chunk of whale
meat from Peru this finally gave Kendrew
large enough crystals to create an x-ray
diffraction image and when it came out it
looked really weird people expected
something kind of logical mathematical
understandable and it almost looked I
wouldn't say ugly but intricate and
complex and kind of like if you see a
rocket motor right and all the parts
hanging off this structure which has
been called turd of the century won Kendrew
the 1962 Nobel Prize in
chemistry over the next two decades only
around 100 more structures were
resolved even today protein
crystallization remains a big challenge
frankly you know it is not uncommon that
just a couple uh protein structures can
be someone's entire PhD sometimes just
one sometimes even just progress toward
one and it's expensive x-ray
crystallography can cost tens of
thousands of dollars per protein so
scientists sought another way to work
out protein structure it only costs
around $100 to find a protein sequence
of amino acids so if you could use this
to figure out how the protein would fold
that would save a lot of time effort and
money I kind of know how carbon behaves
and I know how a carbon sticks to a
sulfur and how that might you know stick
next to a nitrogen and if these ones are
here then I can imagine this one folding
making that bond there so it seems like
if you have some sense of basic
molecular dynamics you might be able to
figure out how this protein is going to
fold one of the few true predictions in
biology was actually Linus Pauling
looking at just the geometry of the
building blocks of proteins and saying
actually they should make helices and
sheets that's what we call secondary
structure the very local kind of twists
and turns of the protein but beyond
helices and sheets biochemists could not
figure out any reliable patterns that
would lead to the final structure of all
proteins one reason for this is that
evolution didn't design proteins from
the ground up it's kind of like a
programmer that doesn't know what
they're doing and whenever it looked
good they just kept adding that kind of
thing and that's that's how you end up
with these both amazing objects and
incredibly complex and hard to describe
they don't have purpose underneath them
in the same way as like a human designed
um
machine would to illustrate just how
complicated this process can get MIT
biologist Cyrus Levinthal did a back of
the envelope calculation and he showed
that even a short protein chain with 35
amino acids can fold in an astronomical
number of
ways so even if a computer checked the
energy and stability of 30,000
configurations every nanosecond it would take
200 times the age of the universe to
find the correct
structure refusing to give up University
of Maryland professor John Moult started
a competition called CASP in
1994 the challenge was simple to design
a computer model that could take an
amino acid sequence and output its
structure the modelers would not know
the correct structure beforehand but the
output from each model would be compared
to the experimentally determined
structure a perfect match would get a
score of 100 but anything over 90 was
considered close enough that the
structure was solved CASP competitors
gathered at an old wooden chapel turned
conference center in Monterey California
and at any point where a prediction
didn't make sense they were encouraged
to tap their feet as friendly banter
there was a lot of foot
tapping in the first year teams could
not achieve scores higher than 40 the
early front runner was an algorithm
called Rosetta created by University of
Washington biologist David Baker one of
his innovations was to boost computation
by pooling together processing power
from idle computers in homes schools and
libraries that volunteered to install
his software called Rosetta@home as
part of it there was a screen saver that
showed basically the course of the of
the protein folding calculation and then
we started getting people writing in
saying that they were watching the
screen saver and they thought they could
do better than the
computer so Baker had an idea he created
a video
game the game called fold it set up a
protein chain capable of twisting and
turning into different Arrangements but
now instead of the computer making the
moves uh the game players the humans
could make the moves within 3 weeks more
than 50,000 Gamers pulled their efforts
to decipher an enzyme that plays a key
role in HIV x-ray crystallography showed
their results was correct The Gamers
even got credited as co-authors on the
research
paper now one man who played Foldit was
a former child chess prodigy named Demis
Hassabis Hassabis had recently started
an AI company called DeepMind their AI
algorithm AlphaGo made headlines for
beating world champion Lee Sedol at the
game of Go one of AlphaGo's moves move
37 shook Sedol to his core but Hassabis
never forgot about his time as a Foldit gamer
so of course I was fascinated this just
from games design perspective you know
wouldn't it be amazing if we could mimic
the intuition of these Gamers who were
only by the way of course
amateurs after returning from Korea DeepMind
researchers had a week-long
hackathon where they tried to train AI
to play Foldit this was the beginning
of Hassabis's long-standing goal of using AI
to advance science he initiated a new
project called AlphaFold to solve the
protein folding problem
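
Editor's note: to put a number on the protein folding problem, the back-of-the-envelope estimate quoted earlier in the video (35 amino acids, 30,000 configurations checked per nanosecond, roughly 200 times the age of the universe) can be approximately reproduced with a few lines of arithmetic. This is only a sketch under assumptions the transcript does not state: two backbone angles per amino acid, three stable positions per angle, and a 13.8 billion year old universe. The constant and function names are illustrative.

```python
# Assumptions (not stated in the transcript): each residue has two backbone
# angles, and each angle can settle into one of three stable positions.
RESIDUES = 35
ANGLES_PER_RESIDUE = 2   # assumed
STATES_PER_ANGLE = 3     # assumed

CHECKS_PER_SECOND = 30_000 * 10**9        # 30,000 configurations per nanosecond
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9            # commonly cited estimate


def configurations(residues=RESIDUES):
    """Number of distinct chain configurations under the assumptions above."""
    return STATES_PER_ANGLE ** (ANGLES_PER_RESIDUE * residues)


def universe_ages_to_search(residues=RESIDUES):
    """Time for an exhaustive search, in multiples of the age of the universe."""
    seconds = configurations(residues) / CHECKS_PER_SECOND
    return seconds / (AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR)


if __name__ == "__main__":
    print(f"{configurations():.2e} configurations")
    print(f"{universe_ages_to_search():.0f} ages of the universe")
```

Under these assumptions the result lands on the order of a couple of hundred universe lifetimes, the same ballpark as the figure in the video. Changing the assumed number of states per angle shifts the answer by many orders of magnitude, which is why brute-force search was never a realistic option.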
meanwhile at CASP the quality of
prediction from the best performers
including Rosetta had
plateaued in fact the performance went
downhill after CASP
8 the predictions weren't good enough
even with faster computers and a growing
number of structures in the Protein
Data Bank to train on DeepMind hoped
to change this with
AlphaFold its first iteration AlphaFold 1
was a standard off-the-shelf deep neural
network like the ones used for computer
vision at that time the researchers
trained it on lots and lots of protein
structures from the Protein Data Bank
as input AlphaFold took the protein's
amino acid sequence and an important set
of clues given by
evolution evolution is driven by
mutations changes in the genetic code
which in turn change the amino acids
within a given protein sequence but as
species evolve proteins need to retain
the shape that allows them to perform
their specific function for instance
hemoglobin looks the same in humans cats
horses and basically any mammal
Evolution says if it ain't broke don't
fix it so we can compare sequences of
the same protein across different
species in this evolutionary table where
sequences are similar it's likely they
are important in the protein structure
and function but even where the
sequences are different it's helpful to
look at where mutations happen in pairs
because they can identify which amino
acids are close to each other in the
final structure say two amino acids a
positively charged lysine and a
negatively charged glutamic acid attract
and hold each other in the folded
protein now if a mutation changes lysine
to a negatively charged amino acid it
would repel glutamic acid and
destabilize the whole protein therefore
another mutation must replace glutamic
acid with a positively charged amino
acid this is known as co-evolution these
evolutionary tables were an important
input for AlphaFold
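
Editor's note: the paired-mutation idea described above can be illustrated with a toy alignment. The sketch below scores how strongly two columns of an evolutionary table vary together using mutual information. This is a simple stand-in chosen for illustration, not the method AlphaFold uses, and the sequences are made up: columns 1 and 4 swap a lysine (K) and a glutamic acid (E) together, while column 2 changes independently.

```python
from collections import Counter
from math import log2

# Hypothetical alignment: each row is the same protein in a different species.
alignment = [
    "AKGLE",
    "AKSLE",
    "AEGLK",
    "AESLK",
    "AKGLE",
    "AEGLK",
]


def column(msa, i):
    """Return the residues found at position i across all sequences."""
    return [seq[i] for seq in msa]


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    count_i = Counter(column(msa, i))
    count_j = Counter(column(msa, j))
    count_ij = Counter(zip(column(msa, i), column(msa, j)))
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


if __name__ == "__main__":
    print("columns 1 and 4:", mutual_information(alignment, 1, 4))
    print("columns 1 and 2:", mutual_information(alignment, 1, 2))
```

A high score for a pair of columns suggests the two positions mutate together and may sit close to each other in the folded protein. A score near zero suggests no such link.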
[Music]
as output instead of directly producing
a 3D structure AlphaFold predicted a
simpler 2D pair representation of that
structure the amino acid sequence is
laid out horizontally and vertically
whenever two amino acids are close to
each other in the final structure their
corresponding row-column intersection is
bright distant amino acid pairs are
dim in addition to distances the pair
representation can also hold information
on how amino acid molecules are twisted
within the
structure AlphaFold 1 fed the protein
sequence and its evolutionary table into
its deep neural network which it had
trained to predict the pair
representation once it had this a
separate algorithm folded the amino acid
string based on the distance and torsion
constraints and this was the final
protein structure
prediction with this framework
AlphaFold entered CASP13 and it immediately
turned heads it was the clear winner
after many
editions but it wasn't perfect its
score of 70 was not enough to clear the
CASP threshold of
90 DeepMind needed to get back to the
drawing board to get better results so
Hassabis recruited John Jumper to lead
AlphaFold AlphaFold 2 was really a
system about designing our deep learning
the individual blocks to be good at
learning about proteins have the types
of geometric physical evolutionary
Concepts that were needed and put it
into the middle of the network instead
of a process around it and that was a
tremendous accuracy boost there were
three key steps to get better results
with AI first maximum compute power here
DeepMind was already better positioned
than anybody in the world it had access
to the enormous computing power of
Google including their tensor processing
units second they needed a large and
diverse data set is data the biggest
roadblock
and why I think it's too easy to say
data is the roadblock and we should be
careful about it AlphaFold 2 was trained on
the exact same data with much much
better machine learning as AlphaFold
1 so everyone overestimates the data
blockage because it it gets less severe
with uh better machine learning and that
was the third key element better AI
algorithms now AI is not just good at
protein folding it can do all kinds of
tasks that no one likes from emails to
answering phone calls something I hate
is building and maintaining a website
it's so much work from optimizing the
website for different platforms finding
a good design so it looks professional
to constantly updating it with new
information about the business as it
grows that's why we partnered with
Hostinger the sponsor of today's video
Hostinger makes it super easy to build a
website for yourself or your business
and with their advanced AI tools you can
simply describe what you want your
website to look like and in just a few
seconds your personalized website is up
and running Hostinger is designed to be
as easy as possible for beginners and
professionals so any tweaks you need to
make after that are super easy too just
drag and drop any pictures or videos you
want where you want them or just type
what you want to say or have the AI help
you here too if writing isn't your thing
either and if you still want that human
touch Hostinger is always available with
24/7 support if you ever run into any
issues but when you're done building in
just a few clicks your website is live
it's all incredibly affordable too with
a domain and business email included for
free so to take your big idea online
today visit hostinger.com/ve or scan
this QR code right here and when you
sign up remember to use code VE at
checkout to get 10% off your plan I want
to thank Hostinger for sponsoring this
part of the video and now back to
protein
folding as the AlphaFold 2 team
searched for better algorithms they
turned to the transformer that's the T
in ChatGPT and it relies on a concept
called attention in the sentence the
animal didn't cross the street because
it was too tired attention recognizes
that it refers to animal and not Street
based on the word tired attention adds
context to any kind of sequential
information by breaking it down into
chunks converting these into numerical
representations or embeddings and making
connections between them in this case
the word it and animal
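
Editor's note: the attention step described above can be sketched as scaled dot-product attention over a handful of word embeddings. This is a minimal illustration, not a real language model: the two-dimensional embeddings are invented by hand so that the query for "it" points roughly toward "animal", and real transformers learn their embeddings along with separate query, key and value projections.

```python
from math import exp, sqrt


def softmax(xs):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(xs)
    exps = [exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / sqrt(d) for key in keys]
    weights = softmax(scores)
    dims = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dims)]
    return weights, output


# Hypothetical hand-picked embeddings, for illustration only.
embeddings = {"animal": [1.0, 0.2], "street": [-0.8, 0.5], "tired": [0.9, 0.4]}
query_it = [1.0, 0.3]

words = list(embeddings)
vectors = [embeddings[w] for w in words]
weights, context = attention(query_it, vectors, vectors)

if __name__ == "__main__":
    for word, weight in zip(words, weights):
        print(f"{word}: {weight:.2f}")
```

With these made-up numbers the largest weight falls on "animal" and the smallest on "street", which is the kind of connection the narration describes. The output vector `context` is the weighted blend of the word embeddings that would be passed on to the next layer.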
3Blue1Brown has a great series
of videos specifically about
transformers and
attention large language models use
attention to predict the most
appropriate word to add to a sentence
but AlphaFold also has sequential
information not sentences but amino acid
sequences and to analyze them the
AlphaFold team built their own version
of the transformer called an
Evoformer the Evoformer contained two
towers evolutionary information in the
biology tower and pair
representations in the geometry tower
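
Editor's note: the pair representation mentioned here, a grid where close amino acid pairs are bright and distant pairs are dim, can be pictured as a distance map computed from 3D coordinates. The sketch below builds one for a tiny made-up chain. The coordinates, the one-point-per-residue simplification and the 8 angstrom contact cutoff are all assumptions for illustration, and AlphaFold's actual pair representation holds learned features rather than plain distances.

```python
from math import dist

# Hypothetical coordinates (in angstroms), one point per residue, for a short
# chain that bends back on itself like a hairpin.
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (3.8, 3.8, 0.0),
]


def distance_map(points):
    """Square table of pairwise distances: entry [i][j] is residue i to residue j."""
    return [[dist(p, q) for q in points] for p in points]


def contact_map(points, cutoff=8.0):
    """1 where two residues are within the cutoff ("bright"), 0 otherwise ("dim")."""
    return [[1 if d <= cutoff else 0 for d in row] for row in distance_map(points)]


if __name__ == "__main__":
    for row in contact_map(coords):
        print(" ".join(str(cell) for cell in row))
```

In this toy chain the first and last residues are far apart in the sequence but end up in contact because the chain folds back, while residues 0 and 3 fall just outside the cutoff. Recovering exactly this kind of long-range contact from sequence alone is what the geometry tower is working toward.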
gone was AlphaFold 1's deep neural
network that started with one tower and
predicted the other instead AlphaFold
2's Evoformer builds each tower
separately it starts with some initial
guesses evolutionary tables taken from
known data sets as before and the pair
representations based on similar known
proteins and this time there's a bridge
connecting the two towers that conveys
newly found biological and geometric
clues back and forth
in the biology tower attention applied
on a column identifies amino acid
sequences that have been conserved while
along a row it finds amino acid
mutations that have occurred together
whenever the Evoformer finds two closely
linked amino acids in the evolutionary
table it means they are important to
structure and it sends this information
to the geometry tower here attention is
applied to help calculate distances
between amino acids there's also this
thing um called triangular attention
that got introduced um which is
essentially about letting triplets
attend to each other for each triplet of
amino acids AlphaFold applies the
triangle inequality the sum of two sides
must be greater than the third this
constrains how far apart these three
amino acids can be this information is
used to update the pair representation
and that helps the model produce like a
self-consistent picture of the structure
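
Editor's note: the triangle inequality constraint described above can be shown with a hand-written consistency check over a table of pairwise distances. This is only an illustration of the geometric idea. AlphaFold 2's triangular attention is a learned operation inside the network, not an explicit test like this one, and the function name and example distances below are made up.

```python
from itertools import combinations


def triangle_violations(d, tol=1e-9):
    """Return triplets (i, j, k) whose pairwise distances cannot form a triangle.

    d is a square table where d[i][j] is the proposed distance between
    residues i and j.
    """
    bad = []
    for i, j, k in combinations(range(len(d)), 3):
        a, b, c = d[i][j], d[j][k], d[i][k]
        # One side longer than the other two combined is geometrically impossible.
        if a > b + c + tol or b > a + c + tol or c > a + b + tol:
            bad.append((i, j, k))
    return bad


# Hypothetical distance tables for three residues.
consistent = [
    [0, 4, 5],
    [4, 0, 3],
    [5, 3, 0],
]
inconsistent = [
    [0, 4, 12],
    [4, 0, 3],
    [12, 3, 0],
]

if __name__ == "__main__":
    print(triangle_violations(consistent))
    print(triangle_violations(inconsistent))
```

In the second table, residues 0 and 2 are claimed to be 12 units apart even though both are within 4 units of residue 1, so that triplet is flagged. A pair representation free of such contradictions is what the narration means by a self-consistent picture of the structure.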
if the geometry tower finds it's
impossible for two amino acids to be
close to each other then it tells the
first tower to ignore their relationship
in the evolutionary table this exchange
of information within the Evoformer
is repeated 48 times until information
within both towers is refined the
geometrical features learned by this
network are passed on to AlphaFold 2's
second main innovation the structure
module for each amino acid we pick three
special atoms in the amino acid and say
that those define a frame and what the
network does is it imagines that all the
amino acids start out at the origin and
it has to predict the appropriate
translation and rotation to move these
frames to where they sit in the real
structure so that's essentially what the
structure module does but the thing that
sets the structure module apart is what
it doesn't do previously people might
have imagined that you would like to
encode the fact that this is a chain you
know and that um you know certain
residues should sit next to each other
we don't really explicitly tell AlphaFold
that it's more like we give it a
bag of amino acids and it's allowed to
position each of them separately and and
some people have thought that that um
helps it to not get stuck in terms of um
where things should be placed it doesn't
have to always be thinking about the
constraint of these things forming a
chain that's something that emerges
naturally later that's why live AlphaFold
folding videos can show it doing
some weirdly non-physical
stuff the structure module outputs a 3D
protein but it still isn't ready it's
recycled at least three more times
through the Evoformer to gain a deeper
understanding of the protein only then
the final prediction is
made in December 2020 DeepMind returned
to a virtual CASP with AlphaFold 2 and
this time they did it I'm going to read
an email from John Moult your group has
performed amazingly well in CASP14 both
relative to other groups and in absolute
model accuracy congratulations on this
work for many proteins AlphaFold 2
predictions were virtually
indistinguishable from the actual
structures and they finally beat the
gold standard score of
90 for me having worked on this problem
so long after many many stops and starts
and suddenly this is a solution we've
solved the problem this gives you such
excitement about the way science works
over six decades all of the scientists
working around the world on proteins
painstakingly found about
150,000 protein structures then in one
fell swoop AlphaFold came in and
unveiled over 200 million of them nearly
all proteins known to exist in nature in
just a few months AlphaFold advanced the
work of research labs worldwide by
several
decades it has directly helped us
develop a vaccine for malaria it's made
possible the breaking down of antibiotic
resistance enzymes which makes many
life-saving drugs effective again it's
even helped us understand how protein
mutations lead to various diseases from
schizophrenia to cancer and biologists
studying little known and endangered
species suddenly had access to proteins
and their life
mechanism the AlphaFold 2 paper has been
cited over 30,000 times it has truly
made a step function leap in our
understanding of life John Jumper and
Demis Hassabis were awarded one half of the
2024 Nobel Prize in chemistry for this
breakthrough the other half went to
David Baker but not for predicting
structures using Rosetta instead it was
for designing completely new proteins
from scratch it was really hard to make
brand new proteins that would do things
and so that's kind of the problem that
we solved to do so he uses the same kind
of generative AI that makes art in
programs like DALL-E you can say draw a
picture of a kangaroo riding on a rabbit
or something and it will do that and so
it's exactly what we did with proteins
his technique called RFdiffusion is
trained by adding random noise to a
known protein structure and then the AI
has to remove this noise once trained in
this way the AI can be asked to produce
proteins for various functions it's
given a random noise input and the AI
figures out a brand new protein that
does what you asked it to
do this work has huge implications I
mean imagine you got bitten by a
venomous snake if you're lucky you'll
have access to antivenom prepared by
milking venom from the exact kind of
snake which is then injected into live
animals and the antibodies from that
animal are extracted and refined and
then given to you as an antivenom the
trouble is often people have allergic
reactions to these antibodies from other
organisms but your odds of survival can
be a lot better with the latest
synthetic proteins designed in Baker's
lab they've created human compatible
antibodies that can neutralize lethal
snake venom this antivenom could be manufactured
in large quantities and easily
transported to the places where it's
needed with these tiny molecular
machines the possibilities are endless
what are the applications you're most
excited about so I think vaccines are
going to be really powerful we have a
number of proteins that are in human
clinical trials for cancer and we're
working on autoimmune disease now we're
really excited about problems like
capturing greenhouse gases so we're
designing enzymes that can fix methane
um break down plastic what makes this
approach so effective is how fast they
can create and iterate the proteins it's
really quite miraculous um for anyone
who's a conventional old school
biochemist or protein scientist we can
now have designs on the computer get the
amino acid sequence of the designed
proteins and then in just a couple days
we can uh get the protein out
yeah we've given a name to this which is
Cowboy biochemistry because we just like
we you just got kind of go for it as
fast as you can and it turns out to work
pretty well what AI has done for proteins
is just a hint of what it can do in
other fields and on larger scales in
materials science for example DeepMind's
GNoME program has found 2.2 million new
crystals including over 400,000 stable
materials that could power future
technologies from superconductors to
batteries AI is creating transformative
leaps in science by helping to solve
some of the fundamental problems that
have blocked human progress if you think
of the whole tree of knowledge you know
there are certain problems where you
know they're root node problems if you
unlock them if you discover a solution
to them it would unlock a whole new
Branch or Avenue of Discovery and with
this AI is pushing forward the
boundaries of human knowledge at a rate
never seen before you know speedups of
2x are nice they're great we love them
speedups of 100,000 times change what
you do you do fundamentally different
stuff and you start to rebuild your
science around the things that got easy
and that's what I'm excited about these
discoveries represent real step function
changes in science even if AI doesn't
advance beyond where it is today we will
be reaping the benefits of these
breakthroughs for decades and assuming
AI does continue to develop well it will
open up opportunities that were
previously thought impossible whether
that's curing all diseases creating
novel materials or restoring the
environment to a pristine state this
sounds like an amazing future as long as
the AI doesn't take over and destroy us
all first
[Music]