Dmitry Korkin: Evolution of Proteins, Viruses, Life, and AI

Dmitry Korkin: Evolution of Proteins, Viruses, Life, and AI | Lex Fridman Podcast #153

I51DuprOb0o • 2021-01-11

the following is a conversation with
dmitry korkin his second time on the
podcast
he's a professor of bioinformatics and
computational biology at wpi
where he specializes in bioinformatics
of complex disease
computational genomics systems biology
and biomedical data analytics
he loves biology he loves computing
plus he is russian and recites a poem in
russian
at the end of the podcast what else
could you possibly
ask for in this world quick mention of
our sponsors
brave browser netsuite business
management software
magic spoon low carb cereal and eight
sleep
self cooling mattress so the choice is
browsing privacy
business success healthy diet or
comfortable sleep
choose wisely my friends and if you wish
click the sponsor links
below to get a discount and to support
this podcast
as a side note let me say that to me the
scientists that did the best
apolitical impactful brilliant work of
2020 are the
biologists who study viruses without an
agenda
without much sleep to be honest just a
pure passion for scientific discovery
and exploration of the mysteries within
viruses
viruses are both terrifying and
beautiful terrifying because they can
threaten the fabric of human
civilization
both biological and psychological
beautiful
because they give us insights into the
nature of life
on earth and perhaps even
extraterrestrial life
of the not so intelligent variety that
might meet us one day
as we explore the habitable planets and
moons in our universe
if you enjoy this thing subscribe on
youtube review it on apple podcast
follow on spotify support on patreon or
connect with me on twitter
at lex fridman and now here's my
conversation
with dmitry korkin it's often
said that proteins and the amino acid
residues that
make them up are the building blocks of
life do you think of proteins in this
way as the
uh basic building blocks of life yes and
no
so the protein indeed is the basic
unit biological unit that
carries out uh important functioning of
the cell
however through studying the proteins
and comparing the proteins
across different species across these
different kingdoms
you realize that uh proteins are
actually
much more complicated so they
have so-called modular complexity
and so what i mean by that is
an average protein consists of
of several structural units
so we call them protein domains and so
you can imagine a protein
as a string of beads where each bead is
a protein domain
and uh you know in the past
20 years scientists have been studying
uh the nature of the protein domains
because
we realize that it's it's it's the unit
because if you look at the functions
right so so
uh many proteins have more than one
function
and those protein functions uh are often
carried out by those protein domains
so we also see that
in the evolution those proteins domains
get
shuffled so so they act actually as as
the unit
also from the structural perspective
right so you know
some people think of a protein as a sort
of a globular
molecule but as a matter of fact is is
the globular part of this protein
is the protein domain so we we often
have
this uh you know again the the the
collection of these protein domains aligned
on a string as beads and uh the protein
domains are made up of amino acid
residues so
it's it's so this is the basic build so
you're saying the protein domain
is the basic building block of the
function that we think about
proteins doing so of course you can
always talk about different building
blocks
turtles all the way down but it's
there's a point where there is
at the point of the hierarchy where it's
the most the cleanest
element block
based on which you can put them together
in different kinds of ways
to form complex function and you're
saying protein domains
why is that not talked about as often in
popular
culture well you know there are several
perspectives
on this um and one of course is the
historical perspective right
so historically scientists
have been able to structurally resolve
to obtain the 3d coordinates of
a protein for uh you know for
smaller proteins and smaller proteins
tend to be
a single domain protein so we have a
protein equal to a protein domain and
so so because of that the initial
suspicion was that
the the the proteins are they have
globular shapes
and the more of smaller proteins
you obtain structurally the more you
were
you became convinced that that's that's
the the case
and only later when uh we had
we started having um you know uh
alternative approaches so you know the
the traditional uh the traditional ones
are
x-ray crystallography and nmr
spectroscopy so this is sort of the
the the two main techniques uh
that uh give us the 3d coordinates but
nowadays uh there is huge breakthrough
in
uh cryo-electron microscopy so the the
more advanced methods that allow us to
uh
you know to get into the uh you know 3d
shapes of much larger molecules
molecular complexes
to give you uh one of the common
examples
uh for this year right so so the the
first
experimental structure of a
sars-cov-2 protein was the cryo-em
structure
of the s protein so the spike protein
and so it was solved
very quickly and the reason for that is
the
advancement of the uh of this technology
is is pretty spectacular how many
domains does the
uh is it more than one domain oh yes oh
yes i mean so so
it's it's a very complex structure and
we you know on top of the complexity
of a single protein right so this
this structure is actually is a complex
it's a trimer
so it needs to form a trimer in order to
function properly what's a complex
so a complex is an agglomeration of
multiple proteins
and so we can have the same
protein copied in multiple uh you know
made up in multiple copies and forming
something that we call uh
a homo-oligomer homo means the same
right
so so in this case so uh uh sp the spike
protein is the
is an example of a homo tetramer uh
homotrimer sorry
so these three copies of the three
copies in order to
exactly we have the these three chains
the the three molecular chains uh
coupled uh together and performing the
the function that's what when when you
look at this protein from from the top
you see a perfect triangle
yeah so uh but other uh you know so
other complexes are made up
of um you know different proteins
uh some of them are completely different
some of them are similar the the
hemoglobin molecule right so it's
actually
it's a protein complex it's made of
four basic subunits two of them
uh are identical to each other and two
other identical to each other
but they are also similar to each other
which sort of
uh gives us some ideas about the
evolution of this
uh you know uh of this uh molecule and
uh perhaps so one of the hypotheses is
that you know in the past
it was just a homo tetramer
right so four identical comp uh copies
and then it became
you know uh sort of uh modified
it became mutated over the time and and
became more specialized can we
linger on the spike protein for a little
bit is is there something interesting
or like beautiful you find about it i
mean first of all it's
an incredibly challenging protein and so
we as a part of
uh our sort of research
to understand the structural basis
of this virus to sort of decode
structurally decode
every single protein in its proteome
uh which you know we've been working on
the spike uh
protein and uh one of the main
challenges was that
um the cryo-em
data allows us to
reconstruct or to obtain the 3d
coordinates of roughly
two thirds of the protein the
rest of the one-third of this protein
it's a part that
uh is buried into the into the membrane
of the virus and uh of the of the
viral envelope and uh it also has a lot
of
unstable structures around it so it's
chemically interacting somehow with
whatever the heck is connecting yeah so
so it people are still trying to
understand so so the the nature of
and the the role of this uh you know uh
of this uh one third because
the the top part uh you know the the
primary function is to get attached to
the
you know ace2 receptor human receptor
there is also beautiful you know
mechanics of how this thing happens
right so
because there are three different copies
of this uh
chains or you know there are three
different
domains right so we're talking about
domains so this is the receptor binding
domains rbds
that gets untangled and
get ready to to to atta to get attached
to
to the receptor and now
they are not necessarily going
in a sync mode as a matter of fact
it's asynchronous yeah so yes so and
this is this is where you know
the another level of complexity comes
into play
because you know right now what we see
is
we typically see just one of the arms
going out and getting ready to
be attached to the
uh to the ace2 receptors however
there was a recent mutation that uh
people studied in that spike protein
and a
very recently a group from umass medical
school
uh we happened to collaborate with
the group so this is the group of
jeremy luban and a number of uh
other faculty um they uh
actually uh solved the uh
the mutated structure of the spike and
they showed that
actually because of these mutations you
have
more than one arm
opening up and so now so you so the
frequency
of two arms going up
increases quite you know
drastically how interesting
is that does that change the dynamics
somehow it potentially
can change the dynamics of because now
you have
two possible opportunities to get
attached to the
ace2 receptor it's a very complex
molecular
process mechanistic process but the
first step of this process
is the attachment of this spike protein
of the spike trimer to the
human ace2 receptor so this is a molecule
that sits
on the surface of the human cell
and that's essentially what initiates
the what triggers the whole
process of in you know encapsulation if
this was dating this would be the first
date
so this is the uh the way yes
so is it is it possible to have the
spike protein just like floating about
on its own or does it need that
interactive ability with the
uh with the membrane yeah so it needs to
be attached
at least as far as i know but uh you
know when you
get this thing attached on the surface
right there is also a lot of dynamics on
where how it sits on the surface right
so for example
uh there was a recent work in uh again
uh where people use the cryo-electron
microscopy to get the first glimpse
of the overall structure it's a very low
res but
you still get some interesting details
about this surface about what is
happening inside because we have
literally no clue until
recent work about how the the capsid
is organized so capsid is essentially
it's the inner core of the viral
particle where the uh there is a the
rna of the virus and it's protected by
another protein n protein
that essentially acts as a shield
but you know now we are learning more
and more so
it's actually it's not just this shield
it potentially is used
for the stability of the outer shell of
the
of the virus so it's it's pretty
complicated
and uh i mean understanding all of this
is really useful for trying to figure
out like developing a vaccine or some
kind of
drug to attack any aspects of this right
so i mean there are
many different implications to that i
mean first of all
you know it's it's important to
understand the virus
itself right so you know in order to uh
to understand how it
acts what is the overall mechanism
mechanistic process
of this virus replication of this virus
proliferation to the cell
right so so that's one uh aspect the the
other aspect
is you know designing new treatments
right so one of the
uh possible treatments is uh you know
designing nanoparticles
and so some nanoparticles that will
resemble
the viral shape that would have the
spike integrated
and essentially would act uh as a
competitor to the real virus
by blocking the ace2 receptors
and thus preventing the real virus
entering the cell
now there are also you know there is a
very interesting
direction in looking at the the membrane
at the
envelope portion of the protein and
attacking
its uh m protein so so there are
uh you know to give you a you know sort
of a brief overview
there are four structural proteins these
are the proteins that made up
a structure of the virus
so spike s protein that acts
as a trimer so it needs three copies
e envelope protein that acts as a
pentamer
so it needs five copies to act properly
m uh is a is a membrane protein
and it forms dimers and actually it forms
beautiful lattice and this is something
that we've been studying and we are
seeing it in simulations
it it actually forms a very nice grid or
you know
threads uh you know uh of of different
dimers
attached next to each of copies of each
other and they naturally when you have a
bunch of copies of each other they form
an interesting lattice
exactly and and you know you if you
think about this right so so so
the this complex you know the divi
the viral shape needs to be organized
somehow self-organized somehow
right so it you know if it was a
completely
random process you know you probably
wouldn't have the the
the envelope shell of the so ellipsoid
shape you know you would have something
you know
pretty random right shape so there is
some
you know regularity in how this uh
you know uh how this uh
m dimers get to attach to each other in
a very specific directed way
is that understood at all uh it's not
understood
we are now we we've been working in the
past six months since
you know we met actually this is where
we started working on on trying to
understand the overall
structure of the envelope and the the
key components
that made up this uh you know uh
structure does the envelope also have
the lattice structure
no so so the envelope is essentially is
the outer shell
of the viral particle the n the
nucleocapsid protein
is something that is inside but
yet the n is likely to interact
with
m does it go m and e like where's the e
and
so so e those different proteins
they occur in different copies on the
viral particle so so e this pentamer
complex we only have two or three
maybe per each particle okay
we have a thousand or so of
m dimers that essentially made up uh
that
makes up uh the entire you know
outer shell sure so most of the outer
shell is the m
m dimer and protein when you say
particle that's the virion
the virus the individual single single
element of the virus it's a single virus
single virus
right and we have about you know roughly
50 to 90
spike trimers right so so so when you you
know when you
show a per per virus particle per virus
particle
sorry what did you say 50 to 90 50 to 90
right so so this is how this thing is
organized and so now typically right so
you see
this uh uh the the antibodies
that target you know spike proteins
certain parts of the spike protein
but there could be some or also some
treatments right so so these are
you know these are small molecules that
bind strategic parts of these
proteins disrupting its functioning
so one of the promising directions
uh it's one of the newest directions is
actually targeting
the m dimer of the protein
targeting the proteins that make up this
outer shell
because if you are able to destroy the
outer shell
you are essentially destroying the the
the viral particle itself
so preventing it from from you know
functioning
at all so that's you think is uh
from a sort of cyber security
perspective virus security perspective
that's the best attack vector
is uh or like that's a promising attack
vector
i would say yeah so i mean this is still
tons of research needs to be
you know to be done but uh yes i think
you know so there's more attack surface
i guess
more attack surface but you know from
from our analysis from other evolution
analysis
this protein is evolutionarily more stable
compared to the say to the spike protein
and stable means a more uh static
target well yeah so so it it it doesn't
change
it doesn't evolve from the evolutionary
perspective
so drastically as for example the spike
protein
there's a bunch of stuff in the news
about mutations of the virus
in the united kingdom i also saw in
south africa
something maybe that was yesterday you
just
kind of mentioned about stability and so
on
which aspects of this are mutatable and
which
aspects if mutated become more dangerous
and maybe even zooming out what are your
thoughts and knowledge and ideas about
the way it's mutated all the news that
we've been hearing are you worried about
it
from a biological perspective are you
worried about it from a human
perspective
so i mean you know mutations are sort of
a general way for these viruses to
evolve
right so so it's you know it's uh
essentially this is the way they evolve
this is
the way they were able to jump
from you know one species to another
we also see uh you know some
recent jumps there were some incidents
of this virus jumping from human to dogs
so you know there is some danger in in
in in those jumps because you know every
time it jumps it also mutates
right so so it when it jumps to
to the uh to the species and jumps back
right so it acquires some mutations that
are sort of
um driven by the
environment of a new host yeah right and
it's different
from the human environment and so we
don't know whether the mutations that
are acquired
uh in the new species are neutral
with respect to the human host or maybe
you know maybe um damaging yeah change
is always scary
but so you worried about i mean it seems
like because the spread is
during winter now seems to be
exceptionally high
uh and especially with a vaccine just
around the corner already being actually
deployed there's some worry that there's
this puts evolutionary pressure
selective pressure on the virus
for it to uh to mutate
is that us yeah well i mean there is
always this thought
you know in in in the scientist's mind
you know what happened what will happen
right so
uh i know there've been uh there've been
discussions about
sort of the arms race between the you
know the ability
of of the uh of the you know humanity to
uh you know to get vaccinated faster
than the virus you know uh essentially
you know becomes uh you know resistant
to to the vaccine
um i i mean
i don't worry that much
uh simply because uh you know there is
not that much evidence to that to
aggressive mutation around the vaccine
exactly you know obviously there are
mutations around the
vaccines so the reason we get
vaccinated
every year against the seasonal
mutations
right um but uh
you know i think it's important to study
it
no doubts right so i think one of the
you know to me and again i might be
biased
uh because you know we we've been uh
trying
to to do that as well uh so but one of
the critical directions
in understanding the virus is to uh to
understand its evolution
in order to uh sort of understand the
mechanisms the key mechanisms that
lead the virus to jump you know the
zoonotic viruses to jump from
one species to another
the mechanisms that lead the virus to
become
resistant to vaccines also to
treatments
right and hopefully that knowledge was
uh will
enable us to sort of forecast the
evolutionary
uh traces the future evolutionary traces
of those viruses
i mean what uh from a biological
perspective this might be a dumb
question but is there
parts of the virus that if uh
souped up like through mutation
could make it more effective at doing
its job we're talking about the specific
coronavirus like yeah because we were
talking about the different like the
membrane
the m protein the e protein the n
and the s the spike
is there some there are 20 or so more
in addition to that but is that is that
a dumb way to look at it like
uh which of these if mutated
could have the greatest impact
potentially
damaging impact on the effectiveness of
the virus so it's actually it's it's a
very good question
because and and the short answer is we
don't know yet
but uh of course there is capacity of
this virus to
to become more efficient the reason for
that
is um you know so if you look at the
virus i mean it's it's a machine right
so it's a machine that
does a lot of different functions and
many of these functions are sort of
nearly perfect but they are not perfect
and
those mutations can make those functions
more perfect for example
the attachment to ace2 receptor right of
the spike
right so uh you know
is it has this virus reached
the efficiency in which the attachment
is carried out or there are some
mutations that
uh that still to be discovered right
that will
make this attachment uh sort of
stronger or you know
something uh more in a way more
efficient
from the point of view of this virus
functioning
that's that's sort of the obvious
example but
if you look at each of these proteins i
mean it's there for a reason it performs
certain function
and it could be that
certain mutations will you know enhance
this function
it could be that some mutations will
make this function
much less efficient right so that's
that's also the case
let's uh since we're talking about the
evolutionary history of a virus uh
let's zoom back out and uh
look at the evolution of proteins i i
glanced at this 2010 nature paper
on the quote ongoing expansion of the
protein universe
and then you know it kind of implies
and uh talks about that uh proteins
started with a common ancestor which is
you know kind of interesting it's
interesting to think about like even just
like the first
organic thing that started life on earth
and from that there's now uh you know
what is it 3.5 billion years later
there's now millions of proteins
and they're still evolving and that's
you know in part one of the things that
you're researching
is there something interesting to you
about the evolution
of proteins from this initial ancestor
to today is there something beautiful
insightful about this
long story so i think you know uh if
if i were to pick a single keyword
about uh protein evolution i would pick
modularity something that we talked
about uh
in the in the beginning and that's the
fact that
the proteins are no longer considered
as you know as a sequence of letters
there are hierarchical uh complexities
in the way these proteins are organized
and uh
these complexities are actually going
beyond
the protein sequence it's actually going
all the way back to the
uh to the gene to the nucleotide
sequence
and so you know again these
protein domains they are not only
functional building blocks they are also
evolutionary building blocks
and so what we see in the sort of in the
later stages of evolution
i mean once these stable structurally and
functionally
building blocks were discovered they
essentially
they stay those domains stay
as such so that's why if you start
comparing
different proteins you will see that
many of them
will have similar fragments and those
fragments will correspond to something
that we call
protein domain families and so so they
are still different because you you
still have
mutations and and and and you know
the you know different mutations
are attributed to to you know
diversification of the function
of this uh you know uh protein domains
however
you don't you very rarely see um
you know the the evolutionary events
that would split this domain
into fragments because and it's you know
once you have the the the the domain
split you actually you uh you know
you can completely cancel out its
function
or at the very least you can reduce it
and that's not
you know efficient from the point of
view of the you know of the
cell functioning so so the
the protein domain level is a very
important one
now on top of that right so
if you look at the proteins right so you
have these structural units
and they carry out the function but then
much less is known about things that
connect this
protein domains something that we call
linkers
and those linkers are completely
flexible
you know parts of the protein that
nevertheless carry out
a lot of function it's like little tails
little heads
so so we we do have tails so they're called
termini
c and n terminus so these are things
right on the on
on on one and
another ends of the protein sequence so
they are also very important so they
they attribute it to
very specific uh interactions between
the proteins
so but you're referring to the links
between domains
that connect the domains and you know
apart from the just the the uh
simple perspective if you have you know
a very short domain
you have sorry a very short linker you
have two domains
next to each other they are forced to be
next to each other if you have a very
long one
you have the domains that are extremely
flexible
and they carry out a lot of sort of pa
spatial
reorganization right so but
on top of that right just this linker
itself
because it's so flexible it actually can
adapt to a lot of different shapes and
therefore
it's a it's a very good interactor when
it comes to interaction between
this protein and other protein all right
so these things also evolve
you know uh and they in a way
have different law uh sort of uh uh laws
of
uh or the driving laws that
underlie the the evolution because they
no longer need to
uh to preserve certain structure
right uh unlike protein domains and so
now
on top of that you have uh something
that is
even less studied and this is something
that uh
uh attribute to to the concept of
alternative splicing so alternative
splicing so
it's a it's a very cool concept it's
something that uh
uh we've been fascinated about for you
know over a decade
uh in my lab and trying to do research
with that
but so you know so so typically you know
a simplistic
perspective is that one gene is equal
one protein product right so you have a
gene
you know you transcribe it and and
translate it and you
it becomes a protein in reality
when we talk about eukaryotes especially
sort of
more recent eukaryotes that are very
complex
the gene is not it's no longer equal to
one protein it actually can
uh produce multiple
functionally uh you know active
protein products and each of them is
you know is called an alternatively
spliced product
the reason it happens is that if you
look at the gene
it actually has it has also blocks
and the blocks some of which and it
it's essentially it goes like this so we
have a block that will later be
translated we call it exon
then we'll have a block that is not
translated cut out
we call it intron so we have exon intron
exon intron et cetera et cetera et cetera
right so
sometimes you can have uh you know
dozens of these
exons and introns so what happens is
during the the process when the gene is
converted
to rna we have
things that are cut out the introns that are
cut out and exons that now get assembled
together
and sometimes we will throw out
some of the exons and the remaining
protein products will still be
the same or different right so
so now you have
uh fragments of the protein that no
longer there
they were cut out with the introns
sometimes
you will essentially take one exon and
replace it with another one
right so there's some flexibility in in
this process
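[editor's sketch] The exon/intron description above can be illustrated with a toy Python model. This is only a sketch under stated assumptions: the gene, its sequences, and the function names are hypothetical, introns are always removed, and the first and last exons are always kept. Real splicing is a regulated process, as the conversation goes on to note, not an enumeration of subsets.

```python
from itertools import combinations

# Hypothetical toy gene: alternating exon and intron blocks (sequences are made up).
GENE = [
    ("exon", "AUGGCC"),
    ("intron", "GUAAGU"),
    ("exon", "UUUGGA"),
    ("intron", "GUCCAG"),
    ("exon", "CCAUAA"),
]

def splice(gene, skipped_exons=frozenset()):
    """Drop every intron, drop exons whose index (counting exons only) is skipped, join the rest."""
    kept = []
    exon_index = 0
    for kind, seq in gene:
        if kind == "exon":
            if exon_index not in skipped_exons:
                kept.append(seq)
            exon_index += 1
    return "".join(kept)

def exon_skipping_isoforms(gene):
    """List the transcripts obtained by skipping any subset of internal exons."""
    n_exons = sum(1 for kind, _ in gene if kind == "exon")
    internal = range(1, n_exons - 1)  # assumption: first and last exon are always kept
    isoforms = []
    for r in range(len(internal) + 1):
        for skipped in combinations(internal, r):
            isoforms.append(splice(gene, frozenset(skipped)))
    return isoforms
```

Calling exon_skipping_isoforms(GENE) gives one transcript with all three exons and one with the middle exon skipped, which is the "one gene, more than one product" point in miniature.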
so so that creates a whole new level of
complexity
because it's random though is it random
it's it's not random we
and and this is where i think uh now the
the appearance of this modern uh single
cell
uh and and before that tissue
level sequencing next generation
sequencing techniques such as rna-seq
allows us to see that this these are the
events
that often happen in response in it's a
it's a dynamic event
that happens in response to to
disease or in response to certain
developmental stage
of a cell and and this is an incredibly
complex layer that also
undergoes i mean because it's at the
gene level right so it undergoes certain
evolution right and uh
now we have this interplay between
what's happening and what is happening
in the
in the protein world and what is
happening in the
in the gene and you know rna world
and for example you know it's
it's often that we see that the
boundaries of these exons
coincide with the boundaries of the
protein domains
right so there is this you know close
interplay to that
uh it's not always i mean you know
otherwise it would be too simple right
but we do see the connection between
those
sort of machineries and obviously the
evolution
will pick up this complexity and
uh you know select for whatever is
successful
we see that complexity in play and and
makes this question you know more
complex but more exciting
as a small detour i don't know if you
think about this in
into the world of computer science
there's uh
douglas hofstadter i think came up
with the
name of quine which are i don't know if
you're familiar with these things but
it's computer programs
that have uh i guess exon and intron
and they copy the whole purpose of the
program is to copy itself
so it prints copies of itself but can
also carry information inside of it
so it's a very kind of crude fun
exercise
of um can we sort of replicate these
ideas from cells
can we have a computer program that when
you run it
just prints itself the entirety of
itself
and does it in different programming
languages and so on i've been playing
around and
writing them it's a kind of fun little
exercise you know when i was a kid
so so you know it it was essentially one
of the
of the sort of main stages
in in informatics olympiads
that you have to reach in order to be
any good
is you should be able to write a program
that
replicates itself and so the task then
becomes
even you know sort of more complicated
so what is the shortest
what is the shortest program yeah and of
course it's it's you know it's a
function of a programming language
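[editor's sketch] The self-printing program discussed above, a quine, can be shown in Python. The two-line quine below is a well-known construction, not something either speaker wrote. It is stored as a string so that a small helper can run it and confirm the output equals the source; QUINE_SOURCE, run_and_capture, and is_quine are names introduced here for illustration.

```python
import contextlib
import io

# A classic two-line Python quine, kept as a string so it can be checked.
# Written out as a file, it reads:
#   s = 's = %r\nprint(s %% s)'
#   print(s % s)
QUINE_SOURCE = "s = 's = %r\\nprint(s %% s)'\nprint(s % s)\n"

def run_and_capture(source):
    """Execute source in a fresh namespace and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()

def is_quine(source):
    """A program is a quine if its printed output is exactly its own source."""
    return run_and_capture(source) == source
```

The trick is that the string s holds a template of the whole program, and formatting s with its own repr fills the template back in. As noted in the conversation, how short such a program can get depends on the language.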
but yeah i remember you know long long
long time ago
when we tried to you know to to make it
shorter and shorter and find the the
the shortest there's actually on stack
exchange there's an
entire site called code golf
i think where the entirety is just a
competition
people just come up with whatever task i
don't know
like uh write code that reports the
weather today
and the competition is about whatever
programming language what is the
shortest program
and it makes you actually people should
check it out because it makes you
realize there's some
some weird programming languages out
there but
you know just to dig on that a little uh
deeper uh do you think
you know in computer science we don't
often think about
programs just like the machine learning
world now
uh that's still kind of basic
programs and then there's humans that
replicate themselves
right and there's these mutations and so
on
do you think we'll ever have a world
where there's programs that kind of
have an evolutionary process so i'm not
talking about evolutionary algorithms
but i'm talking about programs that kind
of
mate with each other and evolve and like
on their own replicate themselves
so this is kind of the idea here
is you know that's how you can have a
runaway
thing so we think about machine learning
as a system that gets smarter and
smarter and smarter and smarter
at least the machine learning systems of
today are
like it's it's a program that you can
like turn off
as opposed to throwing a bunch of little
programs out there and letting them
like multiply and mate and evolve and
replicate
do you ever think about that kind of
world you know when we jump from the
biological
systems that you're looking at to to
artificial ones i mean it's
almost like you you take the the sort of
the area of intelligent agents
right which are essentially the the
independent sort of uh
codes that run and interact and exchange
the information
right so i i don't see why not i mean i
you know it could be sort of a natural
evolution in in
in this you know uh area of computer
science
i think it's kind of an interesting
possibility it's terrifying too
but i think it's a really powerful tool
like to have like agents that inter
you know we have social networks with
millions of people and they interact
i think it's interesting to inject into
that was already injected into that bots
right but those bots are pretty dumb uh
uh
you know they're they're probably pretty
dumb algorithms
uh you know it's interesting to think
that
there might be bots that evolve together
with humans and there's the sea of
humans
and robots that are operating first in
the digital space
and then you can also think i love the
idea some people worked i think
at harvard at penn there's uh
robotics labs that you know build
take as a fundamental task to build a
robot that given
extra resources can build another copy
of itself
like in the physical space which is uh
super
difficult to do but super interesting i
remember there's like research on
robots that can build a bridge so they
make a copy of themselves and they
connect themselves and
sort of like self-building bridge based
on building blocks you can imagine like
a building that self-assembles so it's
basically self-assembling structures
from uh
from uh robotic parts but it's
interesting
to within that robot add the ability to
mutate and
uh and and do all the interesting
like little things that you're referring
to in evolution to go from a single
origin protein building block to like
well weird complexity and if you think
about this i mean you know the bits and
pieces
are there you know so so you mentioned
evolutionary algorithm right you know
so this is sort of yeah
and the the maybe sort of the the goal
is in a way different right so the goal
is to you know to essentially
uh to to optimize your search right so
uh but uh sort of the the ideas are
there so you people recognize that you
know that the
the you know recombination events
lead to global changes in the in in
search trajectories the mutation event
is a more refined
uh you know uh step in in the search
then you have you know uh other sort of
uh nature inspired algorithm right so
one of the reason that
that you know i think it's it's one of
the funnest ones is the slime mold
based algorithm right so that it's a
i think the first was introduced by the
japanese
group but where it was able to to solve
uh some some pre you know complex
problems
uh so so that's the yeah and and then
i think uh there are still
a lot of things we've yet to to
you know borrow from the nature right so
there are
a lot of sort of ideas that nature
uh you know gets to offer us that
you know it's up to us to grab it and to
to
to you know get the best use of it
including neural networks
you know we have a very crude inspire
inspiration from nature on neural
networks maybe there's
other inspirations to be discovered in
the brain or
other aspects of uh the various systems
even like the immune system the way it
uh interplays
i recently started to understand that
like the immune system has something to
do with the way the brain
operates like there's multiple things
going on in there
which uh all of which are not modeled in
artificial neural networks
and maybe if you throw a little bit of
that biological spice in there
you'll come up with something uh
something cool i i
i'm not sure if you're familiar with the
drake equation
that uh estimate i just did a video on
it yesterday because i wanted to give my
own estimate of it
it's uh it's an equation that combines a
bunch of factors to estimate how many
alien civilizations oh yeah i've heard
about it
yes so one one of the interesting
parameters you know it's like
how many uh stars are born every year
how many planets are on average per star
uh for this how many habitable planets
are there
and then the the one that starts being
really interesting
is uh the probability that life emerges
on a habitable planet so like
i don't know if you think about you
certainly think a lot about evolution
but do you think about the thing
which evolution doesn't describe which
is like the beginning of evolution
the origin of life i think i put the
probability of life developing on a
habitable planet
one percent this is very scientifically
uh rigorous
okay uh well first at a high level for
the drake equation what would you put
that percent that
on earth and in general do you have
something
do you have thoughts about how life
might have started
you know like the proteins being the
first kind of one of the early
jumping points yes so so um i think
back in 2018 there was a very exciting
paper published in nature
where they uh found
uh one of the simplest amino acids
glycine
in this in a comet dust
so so this is uh and i i
i apologize if i uh don't pronounce it's
a
russian named comet it's i think
churyumov-gerasimenko
this is the comet where and there was
this uh
um mission to to get and uh
get close to this comet and get the
the
stardust from from its tail and
uh when scientists analyzed it they
actually found
traces of uh you know uh
of glycine which you know makes up you
know the
one it's one of the basic uh one of the
20 basic
uh amino acids that makes up proteins
right so uh so that was exciting very
exciting
right but you know it's the question is
very interesting right so
what uh you know what if there is some
alien life
is it gonna be made of proteins right or
maybe
rnas right so we see that you know the
the
rna viruses are certainly you know
very well established sort of uh
you know group of molecular machines
right so um so yeah it's it's it's a
very
interesting question you know what what
probability would you put like how hard
is this
job like how unlikely just on earth do
you think this whole thing is
that we got going like is that are we
really lucky or is it
inevitable like what's your sense when
you sit back and think about
life on earth is it higher or lower than
one percent
well because one percent is pretty low
but it still is like damn that's pretty
good chance
yes it's it's a pretty good chance i
mean i i would
personally but again you know i'm um
you know probably not the best person to
to
to do such estimations but uh
i would you know intuitively i would
probably put it
lower yeah but still i mean you know
we're really lucky
here on earth uh i mean or the
conditions are really good
it means you know i think that there was
everything was
right in a way right so it's still it's
not
the the conditions were not like ideal
if you
try to to look at you know what was you
know
several billions years ago when the life
emerged
so there is something called uh the rare
earth hypothesis
that you know in counter to the drake
equation says that
the you know the conditions of earth if
you actually were to describe
earth it's quite a special place
so special might be unique in our galaxy
and potentially you know close to unique
in the entire universe like it's very
difficult to reconstruct those
same conditions and what the rare earth
hypothesis
argues is all those different conditions
are essential for life
and so that's the sort of the counter
you know like all the things we
thinking that earth is pretty average um
i mean i can't really
i'm trying to remember to to go through
all of them but just the fact that it um
is shielded from a lot of asteroids
the obviously the distance to the sun
but also the fact that it's
um it's like a perfect balance between
the amount of water and land and all
those kinds of things and
i don't know there's a bunch of
different factors that i remember
there's a long list
but it's fascinating to think about if
if uh
in order for something like proteins and
then
dna and rna to emerge you need um
and basic living organisms you need
to be very close to an earth-like planet
which would be sad or
exciting i don't know which uh if you
ask me i you know
in a way i put a parallel between um
you know between our own research uh
and i mean from the
from the intuitive perspective you know
you have those
two extremes and the reality is never
very rarely falls into the extremes
it's always the optimum is always reached
somewhere in between
so so i would so and that's what i tend
to think i think that
uh you know we're probably somewhere in
between so we're
not unique unique but
again the chances are you know
reasonably small
the problem is we don't know the the
other extreme is like
i tend to think that we don't actually
understand the basic mechanisms of like
what this is all originated from like it
seems like
we think of life as this distinct thing
maybe intelligence is a distinct thing
maybe the physics that from which
planets and suns are born is a distinct
thing but that could be a very
it's like the stephen wolfram thing it's
like the from simple rules emerges
greater and greater complexity
so i you know i tend to believe that
just life finds a way
it like we don't know the extreme of how
common life is
because it could be life is like
everywhere
like like so everywhere that it's almost
like
laughable like that we're such idiots to
think we're unique
like it's it's like ridiculous to even
like think
it's like ants thinking that their
little colony is the
unique thing and everything else doesn't
exist i mean
it it's also very possible that that's
uh
that's the extreme and we're just not
able to maybe comprehend
the nature of that uh life just to stick
on alien life for just a brief
moment more is there is some signs of
signs of life on venus in gaseous form
there's uh hope for life
on mars probably extinct we're not
talking about intelligent life
although that has been in the news
recently we're talking about basic like
you know uh bacteria bacteria yeah
and then also i guess uh there's a
couple moons
there yeah europa which is
jupiter's moon i think there's another
one
are you um is that exciting or is it
terrifying to you that
we might find life do you hope we find
life i certainly
do hope that we'll find life um i mean
it was very exciting to
to hear about uh you know uh this
uh news about the the
possible life on the venus it'd be nice
to have hard evidence
of something with uh which is what the
hope is for
for mars and and uh europa but do you
think those organisms would be similar
biologically or would they even be sort
of carbon based
if we do find them i would say they
they would be carbon based uh how
similar
it's a big question right so it's it's
the moment we discover things
outside earth right even if it's a tiny
little
single cell i mean there's so much
just imagine that that would be so i i
think that that would be
another turning point for for the
science you know and if especially if
it's
different in some very new way that's
exciting because that says
that's a definitive state not a
definitive but a pretty strong statement
that life is everywhere in the in the
in the universe to me at least that's
that's really exciting
you brought up joshua lederberg in an
offline conversation
i think i'd love to talk to you about
alphafold and this might be an
interesting way to
enter that conversation because uh so he
won the
1958 nobel prize in physiology medicine
for discovering that bacteria can mate
and exchange genes
but uh he also did a ton of other stuff
like uh like we mentioned helping nasa
find life on mars and uh
the uh the dendral
the the chemical expert system expert
systems remember those
uh do you uh what do you find
interesting
about this guy and his his ideas about
artificial intelligence in general
so i have a kind of personal story
to um to share so i
started my phd in canada back in 2000
and so essentially my phd was uh so we
were
developing sort of a new language for
symbolic uh machine learning
so it's different from the feature based
machine learning and and the
uh one of the sort of cleanest
applications of this
uh you know of this approach of this
formalism
was uh to uh cheminformatics and
computer aided drug design
right so so so essentially we were uh
you know as a part of my research
uh i developed a system that
essentially looked at chemical compounds
of say the same therapeutic category
you know male hormones right
and tried to figure out
the structural fragments that are the
structural building blocks
that are important that define this
class
versus structural building blocks that
are there just because
you know the to complete the structure
but they are not essentially the ones
that make up the
the chemical the the key chemical
properties of this uh
therapeutic category and and uh
you know uh for me it was something new
i was i was trained as an applied
mathematician
you know as with some a machine learning
background but you know computer aided
drug design was completely a completely
new territory
so because of that i often uh find
myself
asking lots of questions uh on one of
these sort of central uh
forums back then there were no no
facebooks or stuff like that there was a
forum
you know it's a forum it's essentially
it's like a bulletin board
yeah right yeah so you essentially you
have a bunch of people and you post a
question
and you get you know an answer from you
know different people
and and and back then this one of the
most popular uh forums was
ccl i think um computational chemistry
library not library but something like
that but ccl
that was the the forum and there i i
you know i asked a lot of dumb questions
yes i ask questions
also share some some you know some uh
information about our formalism and how
we do and whether
whatever we do makes sense and so you
know and
uh i remember that well one of this
posts i mean i still remember
you know uh i uh i would call it
desperately looking for uh for uh
a chemist advice something like that
right and so so i post my
question i explained you know how how my
uh
our formalism is what is what it does
and what kind of applications i'm
planning to
to do and you know and it was you know
in the middle of the night
and you know i went back uh you know to
bed
and and next morning have a
phone call from my advisor who also
looked at this forum it's like
you won't believe who replied to you
and and it's like who he said well you
know
there is a message to you from joshua
lederberg
and my reaction was like who is joshua
lederberg
my advisor hung up so
and essentially you know joshua wrote me
that
we we had conceptually similar ideas in
in the dendral
project you may want to look it up
and you know we should also sorry and
it's a side comment say that
even though he he won the nobel prize at
a really young age
in 58 but so he he was i think
he was what 33 yeah it's just crazy yeah
so anyway so that's so hence hence in
the 90s
responding to young whippersnappers on
the on the ccl forum
okay and and so so back then he was
already very senior i mean
he unfortunately passed away back in
uh but you know uh back in 2001 he was i
mean he was a professor emeritus at
rockefeller university and you know that
was actually believe it or not one of
the
one of the uh of uh of the reasons
i decided to join uh you know as a
postdoc
the group of andrej sali who was at
rockefeller university
with the hope that you know that i could
actually you know
uh have a chance uh to meet joshua in
person
and i met him very briefly right the you
know
just because he was walking you know
there's a little bridge that connects
the sort of the
research campus with the um
with the uh sort of skyscraper that
rockefeller
owns the where you know uh post docs and
faculty and
graduate students live and so so i met
him you know and i had a
very short conversation you know but uh
so i i started you know reading about
dendral
and i was amazed you know it's we're
talking about 1960
yeah right the ideas
were so profound well what's the
fundamental ideas of it
the the reason to make this is even
crazier
so so so lederberg wanted to make a
system
that would help him study
the extraterrestrial
molecules right so so the idea was that
you know the way you study the
extraterrestrial molecules is you do the
mass spec
analysis right and so the mass spec
gives you sort of bits numbers about
essentially uh
gives you the ideas about the possible
fragments
or you know atoms and you know and and
and maybe little fragments pieces of
this molecule
that make up the molecule right so now
you need to
sort of to decompose this information
and to figure out what was the whole
before you know it beca became uh
fragments bits and pieces right so so
in order to make this uh you know to
have this
tool the idea of lederberg was to
connect
chemistry computer science
and to design this so-called expert
system
that looks that takes into account this
it takes as an
input the mass spec data
the possible the database of possible
molecules and essentially try to
uh sort of induce the molecule that
would correspond to this spectra or you
know
essentially the what this
project ended up being
was that you know it would provide a
list of candidates
that then a chemist would look at and
and and
make final decision so but the original
idea is supposed to solve the entirety
of this problem
automatically yes so so so he uh you
know so uh
so he uh back then uh he
succeeded yes believe that
yeah it's it's amazing i mean it still
blows my mind you know that
it's that's is and this was essentially
the the
the origin of the modern bioinformatics
cheminformatics
you know back in the 60s yeah right so
that's that's you know
you know so every time you you you deal
with
with projects like this with the you
know research like this
you just you know uh so the the power
of of the of the you know intelligence
of this
people uh is is just you know
overwhelming do you think about expert
systems
is there um and why they kind of didn't
become successful especially in the
space of bioinformatics
where it does seem like there's a lot of
expertise in humans
and uh you know it's it's possible to
see that
a system like this could be made very
useful
right so it's it's actually it's a it's
a great question and and this is
something so you know so
uh you know at my university i teach
artificial intelligence and you know we
start the my first two lectures are on
the history of ai
and and there we you know we tried to
you know go through the main stages of
ai and so you know the question of why
expert systems failed
or became obsolete it's actually a very
interesting
one and there are you know if you uh try
to read the
you know the historical p

Summary

# Unraveling the Mysteries of Biology through Artificial Intelligence: Insights from Dmitry Korkin

### Executive Summary
This video explores how biology, bioinformatics, and artificial intelligence (AI) combine to help us understand life, seen through the perspective of Professor Dmitry Korkin. The discussion covers an in-depth look at the structure of proteins and viruses (such as SARS-CoV-2), the major AlphaFold breakthrough in protein structure prediction, and scientific speculation about the origin of life in the universe. It also touches on the evolution of AI, its impact on art, and the importance of transparency and efficiency in the modern scientific community.

### Key Takeaways
*   **Protein & Virus Structure:** Proteins have a modular complexity built from *domains*; Cryo-EM and AI have helped visualize complex structures such as the *spike* protein of the coronavirus.
*   **The AlphaFold Revolution:** AlphaFold 2 is regarded as one of the biggest AI breakthroughs, predicting protein structures with accuracy approaching experimental level and changing how biological research is done.
*   **Origin of Life:** The discovery of an amino acid in a comet and the "Rare Earth" hypothesis suggest that life may be rare, yet its complexity can emerge from simple rules.
*   **AI Beyond Science:** AI is not only prominent in science but is also moving into art (music, painting), although it has not yet fully reproduced the "soul" or creative emotion of humans.
*   **Scientific Efficiency:** The scientific community responded to the COVID-19 pandemic far more efficiently than it could have two decades ago, thanks to advances in sequencing and data sharing.

---

### Detailed Breakdown

#### 1. Protein Complexity and Virus Structure (SARS-CoV-2)
*   **Protein Modularity:** Professor Dmitry Korkin explains that proteins are not merely globular molecules but have a "modular complexity". They are made up of structural units called *protein domains* (like beads on a string), which are the units of function and evolution.
*   **Cryo-EM Technology:** *Cryo-electron microscopy* (Cryo-EM) lets scientists see large molecules that were previously hard to resolve, such as the *spike* protein of SARS-CoV-2.
*   **Spike Protein Mechanism:** The coronavirus *spike* protein is a trimer (three chains) that attaches to the human ACE2 receptor. Mutations in the virus can cause the "arms" of this protein to open more often, increasing the likelihood of infection.
*   **Virus Organization:** The virus is highly organized. The Membrane (M) protein forms a beautiful lattice that gives the virus its shape and stability, rather than being a random assembly.

#### 2. Evolution, Splicing, and Self-Replicating Code
*   **Alternative Splicing:** The old view that "one gene produces one protein" no longer holds. *Splicing* (cutting and rejoining RNA) creates a dynamic layer of complexity that responds to disease or developmental stage.
*   **Computer Analogy (Quines):** There is an intriguing parallel between biology and computer science, particularly *Quine* programs (programs that print a copy of themselves) and evolving code, which illustrates how life might have begun from simple rules.
*   **Virus Evolution:** Viruses have the capacity to become more efficient. Although a mutation can make a virus less efficient, natural selection tends to preserve functions that are "nearly perfect", such as the ability to attach to a host cell.
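
To make the quine idea concrete, here is a minimal Python sketch. The two-line program stored in `QUINE_SOURCE` is a classic textbook quine, not something shown in the conversation; the helper `run_and_capture` is a hypothetical name introduced only so the self-replication can be checked programmatically.

```python
import contextlib
import io

# Text of a classic two-line Python quine (illustrative, not from the podcast):
#   s = 's = %r\nprint(s %% s)'
#   print(s % s)
QUINE_SOURCE = "s = 's = %r\\nprint(s %% s)'\nprint(s % s)"


def run_and_capture(source: str) -> str:
    """Execute `source` in a fresh namespace and return everything it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


# The program's output should be its own source text plus print's trailing newline.
output = run_and_capture(QUINE_SOURCE)
print("self-replicating:", output == QUINE_SOURCE + "\n")
```

The trick is that the string `s` acts as both the template and the data filled into it, loosely similar to a genome that both encodes and is copied by the machinery it describes.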

#### 3. AI Breakthrough: AlphaFold and the Future of Bioinformatics
*   **History of Dendral:** The Dendral project of the 1960s, led by Nobel laureate Joshua Lederberg, is described as the origin of modern bioinformatics; it combined chemistry and computer science to analyze mass spectrometry data.
*   **AlphaFold's Win:** At the CASP 2020 competition, DeepMind's AlphaFold 2 "solved" the *protein folding* problem with remarkable accuracy. The system uses contact maps and evolutionary information to predict a protein's 3D structure.
*   **AlphaFold's Limits:** While outstanding on compact single-domain proteins, AlphaFold still struggles with complex multi-domain proteins (common in the nervous system) and with *protein-protein interactions*.
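
The conversation describes Dendral as taking mass spec data and a database of possible molecules, then producing a list of candidates for a chemist to review. The toy sketch below illustrates only that candidate-ranking idea. It is not Dendral's actual algorithm: the function name `rank_candidates`, the tolerance, and all masses and candidate names are hypothetical placeholders.

```python
from typing import Dict, List, Tuple


def rank_candidates(
    observed_peaks: List[float],
    candidate_fragments: Dict[str, List[float]],
    tolerance: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank candidates by the fraction of observed peaks they can explain.

    A peak counts as explained when some predicted fragment mass of the
    candidate lies within `tolerance` of it.
    """
    if not observed_peaks:
        raise ValueError("at least one observed peak is required")

    ranked = []
    for name, fragments in candidate_fragments.items():
        matched = sum(
            1
            for peak in observed_peaks
            if any(abs(peak - mass) <= tolerance for mass in fragments)
        )
        ranked.append((name, matched / len(observed_peaks)))

    # Best-explaining candidates first; a chemist would make the final call.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical spectrum and candidate database (placeholder numbers only).
observed = [15.0, 29.0, 43.0]
database = {
    "candidate_A": [15.0, 29.0, 43.0, 57.0],
    "candidate_B": [15.0, 31.0],
    "candidate_C": [18.0],
}
for name, score in rank_candidates(observed, database):
    print(f"{name}: {score:.2f}")
```

Returning a ranked list rather than a single answer mirrors what the project reportedly ended up doing: narrowing the options and leaving the final decision to a human expert.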

#### 4. The Origin of Life and Space Exploration
*   **Life Beyond Earth:** The discovery of glycine (an amino acid) on comet 67P/Churyumov-Gerasimenko indicates that the building blocks of life exist out there. The *Rare Earth Hypothesis*, however, argues that Earth's conditions are highly unusual and special.
*   **Potential for Life:** There is hope of finding life on Mars (possibly extinct bacteria), in gaseous form on Venus, or on Jupiter's moon Europa. Such a discovery would be a major turning point for science.
*   **Drake Equation:** The discussion touches on the Drake equation parameter for how long a civilization can survive, linking it to the risk of natural versus human-made pandemics in the future.
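
For reference, the Drake equation in its standard form multiplies seven factors: $N = R_* \cdot f_p \cdot n_e \cdot f_l \cdot f_i \cdot f_c \cdot L$. The sketch below simply evaluates that product. Only the one percent value for $f_l$ comes from the conversation; every other number is a placeholder chosen for easy arithmetic, not an estimate made by either speaker, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrakeParameters:
    star_formation_rate: float     # R*: stars formed per year in the galaxy
    fraction_with_planets: float   # f_p: fraction of stars that have planets
    habitable_per_system: float    # n_e: habitable planets per such system
    fraction_life: float           # f_l: fraction of those where life emerges
    fraction_intelligent: float    # f_i: fraction of those developing intelligence
    fraction_communicating: float  # f_c: fraction of those that emit signals
    lifetime_years: float          # L: years a civilization keeps signalling


def drake_estimate(p: DrakeParameters) -> float:
    """Return N, the expected number of detectable civilizations."""
    return (
        p.star_formation_rate
        * p.fraction_with_planets
        * p.habitable_per_system
        * p.fraction_life
        * p.fraction_intelligent
        * p.fraction_communicating
        * p.lifetime_years
    )


# Placeholder inputs; only fraction_life = 0.01 is taken from the conversation.
example = DrakeParameters(
    star_formation_rate=1.0,
    fraction_with_planets=1.0,
    habitable_per_system=1.0,
    fraction_life=0.01,
    fraction_intelligent=1.0,
    fraction_communicating=1.0,
    lifetime_years=10_000.0,
)
print(f"N = {drake_estimate(example):.1f}")
```

Because the result is a plain product, uncertainty in any single factor, such as the probability that life emerges or how long a civilization lasts, scales the final estimate proportionally.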

#### 5. AI, Art, and the Ethics of Science
*   **Ranking AI Breakthroughs:** Dmitry places Deep Blue (chess) at the top because of its psychological impact, while acknowledging AlphaFold as one of the top three for its impact on science and society.
*   **AI Creativity:** AI has been used to compose classical music, write haiku, and paint in the style of Rembrandt. There has not yet been a "superhuman" moment, however, in which AI creates a commercial work of art that truly changes the industry without human involvement.
*   **Pathogenicity Research:** Machine learning could be used to predict how pathogenic a viral strain is from its gene sequence. This raises an ethical dilemma: the risk of misuse to create biological weapons versus the benefit for vaccine development.

#### 6. Personal Reflections and Book Recommendations
*   **Scientific Efficiency:** Dmitry is impressed by how quickly the scientific community sequenced and shared viral data during the COVID-19 pandemic, far surpassing the response to SARS 20 years earlier.
*   **Book Recommendations:**
    *   *The Master and Margarita* by Mikhail Bulgakov: Captures Russian culture through a romantic and magical story.
    *   *Cancer Ward* by Aleksandr Solzhenitsyn: Portrays the experience of cancer patients in a deep and allegorical way.
    *   *The Computer and the Brain* by John von Neumann: A dense essay on the parallels between the brain and the computer.
    *   *Lab Girl* by Hope Jahren: The emotional story of a woman's journey to becoming a scientist.
*   **Resolution:** Dmitry shares his resolution to spend more time with his family and closes the session by reciting a poem.

## Conclusion & Closing Message
The discussion with Professor Dmitry Korkin shows how the convergence of biology and artificial intelligence, especially through AlphaFold, has transformed our understanding of protein structure and the origin of life. Beyond the scientific achievements, it also highlights AI's expansion into art and the importance of ethics and efficiency in the research community. This summary is intended to offer a fresh perspective on the future of bioinformatics and its impact on humanity.


file updated 2026-02-13 13:24:39 UTC