AlphaFold - The Most Useful Thing AI Has Ever Done
P_fHJIYENdI • 2025-02-10
What if all of the world's biggest problems, from climate change to curing diseases to disposal of plastic waste, what if they all had the same solution? A solution so tiny it would be invisible. I'm inclined to believe this is possible thanks to a recent breakthrough that solved one of the biggest problems of the last century: how to determine the structure of a protein.

It's been described to me as equivalent to Fermat's Last Theorem, but for biology.

Over six decades, tens of thousands of biologists painstakingly worked out the structures of 150,000 proteins. Then, in just a few years, a team of around 15 determined the structures of 200 million. That's basically every protein known to exist in nature. So how did they do it, and why does this have the potential to solve problems way outside the realm of biology?

A protein starts simply as a string of amino acids. Each amino acid has a carbon atom at the center. On one side is an amine group, and on the other side is a carboxyl group. The last thing it's bonded to could be one of 20 different side chains, and which one determines which of the 20 different amino acids this molecule is. The amine group from one amino acid can react with the carboxyl group of another to form a peptide bond, so a series of amino acids can bond to form a string. Pushing and pulling between countless molecules (electrostatic forces, hydrogen bonds, solvent interactions) can cause this string to coil up and fold onto itself. This ultimately determines the 3D structure of the protein, and this shape is the thing that really matters about the protein. It's built for a specific purpose, like how hemoglobin has the perfect binding site to carry around oxygen in your blood.

These are machines. They need to be in their correct orientation in order to work together, to move. For example, the proteins in your muscles change their shape a little bit in order to pull and contract.

But it would take people a long time to get the structure of
just one protein? Absolutely. So the question of what proteins should look like only really started to be answered with experimental techniques.

The first way protein structure was determined was by creating a crystal out of that protein. This was then exposed to X-rays to get a diffraction pattern, and then scientists would work backwards to try to figure out what shape of molecule would create such a pattern. It took British biochemist John Kendrew 12 years to get the first protein structure. His target was an oxygen-storing protein called myoglobin, an important protein in our hearts. He first tried a horse heart, but this produced rather small crystals because it didn't have enough myoglobin. He knew diving mammals would have lots of myoglobin in their muscles, since they are the best at conserving oxygen, so he obtained a huge chunk of whale meat from Peru. This finally gave Kendrew large enough crystals to create an X-ray diffraction image.

And when it came out, it looked really weird. People expected something kind of logical, mathematical, understandable, and it almost looked, I wouldn't say ugly, but intricate and complex, kind of like if you see a rocket motor and all the parts hanging off.

This structure, which has been called the "turd of the century", won Kendrew the 1962 Nobel Prize in Chemistry. Over the next two decades, only around 100 more structures were resolved. Even today, protein crystallization remains a big challenge.

Frankly, it is not uncommon that just a couple of protein structures can be someone's entire PhD. Sometimes just one. Sometimes even just progress toward one.

And it's expensive. X-ray crystallography can cost tens of thousands of dollars per protein, so scientists sought another way to work out protein structure. It only costs around $100 to find a protein's sequence of amino acids, so if you could use this to figure out how the protein would fold, that would save a lot of time, effort, and money.

I kind of know how carbon behaves, and I know how a carbon sticks to a sulfur
and how that might, you know, stick next to a nitrogen, and if these ones are here, then I can imagine this one folding, making that bond there. So it seems like if you have some sense of basic molecular dynamics, you might be able to figure out how this protein is going to fold.

One of the few true predictions in biology was actually Linus Pauling looking at just the geometry of the building blocks of proteins and saying, actually, they should make helices and sheets. That's what we call secondary structure, the very local kind of twists and turns of the protein.

But beyond helices and sheets, biochemists could not figure out any reliable patterns that would lead to the final structure of all proteins. One reason for this is that evolution didn't design proteins from the ground up.

It's kind of like a programmer that doesn't know what they're doing, and whenever it looked good, they just kept adding that kind of thing. That's how you end up with these objects that are both amazing and incredibly complex and hard to describe. They don't have purpose underneath them in the same way as a human-designed machine would.

To illustrate just how complicated this process can get, MIT biologist Cyrus Levinthal did a back-of-the-envelope calculation, and he showed that even a short protein chain with 35 amino acids can fold in an astronomical number of ways. So even if a computer checked the energy and stability of 30,000 configurations every nanosecond, it would take 200 times the age of the universe to find the correct structure.

Refusing to give up, University of Maryland professor John Moult started a competition called CASP in 1994. The challenge was simple: to design a computer model that could take an amino acid sequence and output its structure. The modelers would not know the correct structure beforehand, but the output from each model would be compared to the experimentally determined structure. A perfect match would get a score of 100, but anything over 90 was considered close enough that
the structure was solved. CASP competitors gathered at an old wooden chapel turned conference center in Monterey, California, and at any point where a prediction didn't make sense, they were encouraged to tap their feet as friendly banter. There was a lot of foot tapping. In the first year, teams could not achieve scores higher than 40.

The early front-runner was an algorithm called Rosetta, created by University of Washington biologist David Baker. One of his innovations was to boost computation by pooling together processing power from idle computers in homes, schools, and libraries that volunteered to install his software, called Rosetta@home.

As part of it, there was a screen saver that showed basically the course of the protein folding calculation, and then we started getting people writing in saying that they were watching the screen saver and they thought they could do better than the computer.

So Baker had an idea. He created a video game. The game, called Foldit, set up a protein chain capable of twisting and turning into different arrangements.

But now, instead of the computer making the moves, the game players, the humans, could make the moves.

Within 3 weeks, more than 50,000 gamers pooled their efforts to decipher an enzyme that plays a key role in HIV. X-ray crystallography showed their results were correct. The gamers even got credited as co-authors on the research paper.

Now, one man who played Foldit was a former child chess prodigy named Demis Hassabis. Hassabis had recently started an AI company called DeepMind. Their AI algorithm AlphaGo made headlines for beating world champion Lee Sedol at the game of Go. One of AlphaGo's moves, move 37, shook Sedol to his core. But Hassabis never forgot about his time as a Foldit gamer.

So of course I was fascinated by this, just from a games design perspective. Wouldn't it be amazing if we could mimic the intuition of these gamers, who were only, by the way, of course, amateurs?

After returning from Korea, DeepMind researchers had a week-long
hackathon where they tried to train AI to play Foldit. This was the beginning of Hassabis's long-standing goal of using AI to advance science. He initiated a new project called AlphaFold to solve the protein folding problem.

Meanwhile, at CASP, the quality of predictions from the best performers, including Rosetta, had plateaued. In fact, the performance went downhill after CASP 8. The predictions weren't good enough, even with faster computers and a growing number of structures in the Protein Data Bank to train on. DeepMind hoped to change this with AlphaFold.

Its first iteration, AlphaFold 1, was a standard off-the-shelf deep neural network, like the ones used for computer vision at that time. The researchers trained it on lots and lots of protein structures from the Protein Data Bank. As input, AlphaFold took the protein's amino acid sequence and an important set of clues given by evolution.

Evolution is driven by mutations, changes in the genetic code which in turn change the amino acids within a given protein sequence. But as species evolve, proteins need to retain the shape that allows them to perform their specific function. For instance, hemoglobin looks the same in humans, cats, horses, and basically any mammal. Evolution says: if it ain't broke, don't fix it. So we can compare sequences of the same protein across different species in this evolutionary table. Where sequences are similar, it's likely they are important in the protein's structure and function. But even where the sequences are different, it's helpful to look at where mutations happen in pairs, because they can identify which amino acids are close to each other in the final structure.

Say two amino acids, a positively charged lysine and a negatively charged glutamic acid, attract and hold each other in the folded protein. Now, if a mutation changes lysine to a negatively charged amino acid, it would repel glutamic acid and destabilize the whole protein. Therefore, another mutation must replace glutamic acid with a positively charged
amino acid. This is known as co-evolution. These evolutionary tables were an important input for AlphaFold.

As output, instead of directly producing a 3D structure, AlphaFold predicted a simpler 2D pair representation of that structure. The amino acid sequence is laid out horizontally and vertically. Whenever two amino acids are close to each other in the final structure, their corresponding row-column intersection is bright. Distant amino acid pairs are dim. In addition to distances, the pair representation can also hold information on how amino acid molecules are twisted within the structure.

AlphaFold 1 fed the protein sequence and its evolutionary table into its deep neural network, which it had trained to predict the pair representation. Once it had this, a separate algorithm folded the amino acid string based on the distance and torsion constraints, and this was the final protein structure prediction. With this framework, AlphaFold entered CASP 13, and it immediately turned heads. It was the clear winner after many additions, but it wasn't perfect. Its score of 70 was not enough to clear the CASP threshold of 90. DeepMind needed to get back to the drawing board. To get better results, Hassabis recruited John Jumper to lead AlphaFold.

AlphaFold 2 was really a system about designing our deep learning, the individual blocks, to be good at learning about proteins, to have the types of geometric, physical, evolutionary concepts that were needed, and to put that into the middle of the network instead of a process around it. And that was a tremendous accuracy boost.

There were three key steps to get better results with AI. First, maximum compute power. Here DeepMind was already better positioned than anybody in the world. It had access to the enormous computing power of Google, including their tensor processing units. Second, they needed a large and diverse data set.

Is data the biggest roadblock?

I think it's too easy to say data is the roadblock, and we should be careful about it. AlphaFold 2
was trained on the exact same data as AlphaFold 1, with much, much better machine learning. So everyone overestimates the data blockage, because it gets less severe with better machine learning.

And that was the third key element: better AI algorithms.

Now, AI is not just good at protein folding. It can do all kinds of tasks that no one likes, from emails to answering phone calls. Something I hate is building and maintaining a website. It's so much work, from optimizing the website for different platforms, to finding a good design so it looks professional, to constantly updating it with new information about the business as it grows. That's why we partnered with Hostinger, the sponsor of today's video. Hostinger makes it super easy to build a website for yourself or your business, and with their advanced AI tools, you can simply describe what you want your website to look like, and in just a few seconds your personalized website is up and running. Hostinger is designed to be as easy as possible for beginners and professionals, so any tweaks you need to make after that are super easy too. Just drag and drop any pictures or videos you want where you want them, or just type what you want to say, or have the AI help you here too if writing isn't your thing either. And if you still want that human touch, Hostinger is always available with 24/7 support if you ever run into any issues. When you're done building, in just a few clicks your website is live. It's all incredibly affordable too, with a domain and business email included for free. So to take your big idea online today, visit hostinger.com/ve or scan this QR code right here, and when you sign up, remember to use code VE at checkout to get 10% off your plan. I want to thank Hostinger for sponsoring this part of the video, and now back to protein folding.

As the AlphaFold 2 team searched for better algorithms, they turned to the transformer. That's the T in ChatGPT, and it relies on a concept called attention. In the sentence the animal
didn't cross the street because it was too tired, attention recognizes that "it" refers to "animal" and not "street", based on the word "tired". Attention adds context to any kind of sequential information by breaking it down into chunks, converting these into numerical representations, or embeddings, and making connections between them, in this case between the words "it" and "animal". 3Blue1Brown has a great series of videos specifically about transformers and attention.

Large language models use attention to predict the most appropriate word to add to a sentence. But AlphaFold also has sequential information: not sentences, but amino acid sequences. To analyze them, the AlphaFold team built their own version of the transformer, called the Evoformer. The Evoformer contained two towers: evolutionary information in the biology tower, and pair representations in the geometry tower. Gone was AlphaFold 1's deep neural network that started with one tower and predicted the other. Instead, AlphaFold 2's Evoformer builds each tower separately. It starts with some initial guesses: evolutionary tables taken from known data sets, as before, and pair representations based on similar known proteins. And this time there's a bridge connecting the two towers that conveys newly found biological and geometric clues back and forth.

In the biology tower, attention applied on a column identifies amino acid sequences that have been conserved, while along a row it finds amino acid mutations that have occurred together. Whenever the Evoformer finds two closely linked amino acids in the evolutionary table, it means they are important to structure, and it sends this information to the geometry tower. Here, attention is applied to help calculate distances between amino acids.

There's also this thing called triangular attention that got introduced, which is essentially about letting triplets attend to each other.

For each triplet of amino acids, AlphaFold applies the triangle inequality: the sum of two sides must be
greater than the third. This constrains how far apart these three amino acids can be. This information is used to update the pair representation.

And that helps the model produce a self-consistent picture of the structure.

If the geometry tower finds it's impossible for two amino acids to be close to each other, then it tells the first tower to ignore their relationship in the evolutionary table. This exchange of information within the Evoformer is repeated 48 times, until the information within both towers is refined. The geometrical features learned by this network are passed on to AlphaFold 2's second main innovation: the structure module.

For each amino acid, we pick three special atoms in the amino acid and say that those define a frame. What the network does is it imagines that all the amino acids start out at the origin, and it has to predict the appropriate translation and rotation to move these frames to where they sit in the real structure. So that's essentially what the structure module does.

But the thing that sets the structure module apart is what it doesn't do.

Previously, people might have imagined that you would like to encode the fact that this is a chain, and that certain residues should sit next to each other. We don't really explicitly tell AlphaFold that. It's more like we give it a bag of amino acids, and it's allowed to position each of them separately. Some people have thought that that helps it to not get stuck in terms of where things should be placed. It doesn't have to always be thinking about the constraint of these things forming a chain. That's something that emerges naturally later.

That's why live AlphaFold folding videos can show it doing some weirdly non-physical stuff. The structure module outputs a 3D protein, but it still isn't ready. It's recycled at least three more times through the Evoformer to gain a deeper understanding of the protein. Only then is the final prediction made.

In December 2020, DeepMind
returned to a virtual CASP with AlphaFold 2, and this time they did it.

I'm going to read an email from John Moult: "Your group has performed amazingly well in CASP14, both relative to other groups and in absolute model accuracy. Congratulations on this work."

For many proteins, AlphaFold 2 predictions were virtually indistinguishable from the actual structures, and they finally beat the gold standard score of 90.

For me, having worked on this problem so long, after many, many stops and starts, suddenly this is a solution. We've solved the problem. This gives you such excitement about the way science works.

Over six decades, all of the scientists working around the world on proteins painstakingly found about 150,000 protein structures. Then, in one fell swoop, AlphaFold came in and unveiled over 200 million of them, nearly all proteins known to exist in nature. In just a few months, AlphaFold advanced the work of research labs worldwide by several decades. It has directly helped us develop a vaccine for malaria. It's made possible the breaking down of antibiotic resistance enzymes, which makes many life-saving drugs effective again. It's even helped us understand how protein mutations lead to various diseases, from schizophrenia to cancer. And biologists studying little-known and endangered species suddenly had access to their proteins and their life mechanisms. The AlphaFold 2 paper has been cited over 30,000 times. It has truly made a step-function leap in our understanding of life.

John Jumper and Demis Hassabis were awarded one half of the 2024 Nobel Prize in Chemistry for this breakthrough. The other half went to David Baker, but not for predicting structures using Rosetta. Instead, it was for designing completely new proteins from scratch.

It was really hard to make brand new proteins that would do things, and so that's kind of the problem that we solved.

To do so, he uses the same kind of generative AI that makes art in programs like DALL-E.

You can say, draw a picture of a kangaroo riding on a rabbit or something,
and it will do that. And so it's exactly what we did with proteins.

His technique, called RFdiffusion, is trained by adding random noise to a known protein structure, and then the AI has to remove this noise. Once trained in this way, the AI can be asked to produce proteins for various functions. It's given a random noise input, and the AI figures out a brand new protein that does what you asked it to do.

This work has huge implications. Imagine you got bitten by a venomous snake. If you're lucky, you'll have access to antivenom, prepared by milking venom from the exact kind of snake, which is then injected into live animals. The antibodies from that animal are extracted and refined and then given to you as an antivenom. The trouble is, often people have allergic reactions to these antibodies from other organisms. But your odds of survival can be a lot better with the latest synthetic proteins designed in Baker's lab. They've created human-compatible antibodies that can neutralize lethal snake venom. This antivenom could be manufactured in large quantities and easily transported to the places where it's needed. With these tiny molecular machines, the possibilities are endless.

What are the applications you're most excited about?

So I think vaccines are going to be really powerful. We have a number of proteins that are in human clinical trials for cancer, and we're working on autoimmune disease now. We're really excited about problems like capturing greenhouse gases, so we're designing enzymes that can fix methane and break down plastic.

What makes this approach so effective is how fast they can create and iterate the proteins.

It's really quite miraculous for anyone who's a conventional, old-school biochemist or protein scientist. We can now have designs on the computer, get the amino acid sequences of the designed proteins, and then in just a couple of days we can get the protein out.

Yeah, we've given a name to this, which is Cowboy Biochemistry, because you
just got to kind of go for it as fast as you can, and it turns out to work pretty well.

What AI has done for proteins is just a hint of what it can do in other fields and on larger scales. In materials science, for example, DeepMind's GNoME program has found 2.2 million new crystals, including over 400,000 stable materials that could power future technologies, from superconductors to batteries. AI is creating transformative leaps in science by helping to solve some of the fundamental problems that have blocked human progress.

If you think of the whole tree of knowledge, there are certain problems that are root node problems. If you unlock them, if you discover a solution to them, it would unlock a whole new branch or avenue of discovery.

And with this, AI is pushing forward the boundaries of human knowledge at a rate never seen before.

Speedups of 2x are nice. They're great. We love them. Speedups of 100,000 times change what you do. You do fundamentally different stuff, and you start to rebuild your science around the things that got easy. And that's what I'm excited about.

These discoveries represent real step-function changes in science. Even if AI doesn't advance beyond where it is today, we will be reaping the benefits of these breakthroughs for decades. And assuming AI does continue to develop, well, it will open up opportunities that were previously thought impossible, whether that's curing all diseases, creating novel materials, or restoring the environment to a pristine state. This sounds like an amazing future, as long as the AI doesn't take over and destroy us all first.
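
The back-of-the-envelope calculation attributed to Cyrus Levinthal earlier in the transcript can be roughly reproduced in a few lines. The video does not say how the configurations were counted, so this is only a sketch under stated assumptions: 3 stable states for each of 2 rotatable backbone angles per amino acid, and an age of the universe of about 13.8 billion years. Those two figures and every name in the code are illustrative choices, not taken from the video. The chain length (35) and the checking rate (30,000 configurations per nanosecond) are the numbers quoted in the transcript.

```python
# Rough reproduction of the Levinthal-style estimate quoted in the transcript.
# ASSUMPTIONS (not from the video): 3 states per backbone angle,
# 2 backbone angles per residue, universe age of 13.8 billion years.
STATES_PER_ANGLE = 3
ANGLES_PER_RESIDUE = 2
AGE_OF_UNIVERSE_YEARS = 13.8e9
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Figures quoted in the transcript.
RESIDUES = 35
CONFIGS_PER_NANOSECOND = 30_000


def levinthal_estimate(residues=RESIDUES):
    """Return (number of conformations, search time in universe ages)."""
    # Every angle is treated as independent, so the counts multiply.
    conformations = STATES_PER_ANGLE ** (ANGLES_PER_RESIDUE * residues)
    configs_per_second = CONFIGS_PER_NANOSECOND * 1e9
    search_seconds = conformations / configs_per_second
    universe_seconds = AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR
    return conformations, search_seconds / universe_seconds


conformations, ratio = levinthal_estimate()
print(f"{conformations:.2e} conformations")
print(f"exhaustive search takes about {ratio:.0f} times the age of the universe")
```

Under these assumptions the ratio comes out a little under 200, which is consistent with the "200 times the age of the universe" figure in the video. A different choice of states per angle changes the result by many orders of magnitude, so the point is the exponential growth in the number of conformations, not the exact number.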