Dmitry Korkin: Computational Biology of Coronavirus | Lex Fridman Podcast #90
CwyOUS8TSl0 • 2020-04-22
Kind: captions
Language: en
the following is a conversation with
Dmitry Korkin he's a professor of
bioinformatics and computational biology
at WPI Worcester Polytechnic Institute
where he specializes in bioinformatics
of complex diseases computational
genomics systems biology and biomedical
data analytics I came across Dmitry's
work when in February his group used the
viral genome of COVID-19 to
reconstruct the 3d structure of its
major viral proteins and their
interaction with the human proteins in
effect creating a structural genomics
map of the coronavirus and making this
data open and available to researchers
everywhere we talked about the biology
of COVID-19 SARS and viruses in general
and how computational methods can help
us understand their structure and
function in order to develop antiviral
drugs and vaccines this conversation was
recorded recently in the time of the
coronavirus pandemic for everyone
feeling the medical psychological and
financial burden of this crisis
I'm sending love your way stay strong
we're in this together we'll beat this
thing this is the Artificial
Intelligence podcast if you enjoy it
subscribe on YouTube review it with five
stars on Apple Podcasts support it on Patreon
or simply connect with me on Twitter
at lexfridman spelled F R I D M A N this
show is presented by Cash App the
number-one finance app in the App Store
when you get it use code LexPodcast
Cash App lets you send money to
friends buy Bitcoin and invest in the
stock market with as little as $1 since
Cash App allows you to buy Bitcoin let
me mention that cryptocurrency in the
context of the history of money is
fascinating I recommend Ascent of Money
as a great book on this history debits
and credits on ledgers started around
30,000 years ago the US dollar created
over two hundred years ago and Bitcoin
the first decentralized cryptocurrency
released just over ten years ago so
given that history cryptocurrency is
still very much in its early days of
development but it's still aiming to and
just might redefine the nature of money
so again if you get Cash App from the
App Store or Google Play and use the code
LexPodcast you get ten dollars and
Cash App will also donate ten dollars
to FIRST an organization that is
helping to advance robotics and STEM
education for young people around the
world and now here's my conversation
with Dmitry Korkin do you find viruses
terrifying or fascinating when I think
about viruses I think about them I mean
I imagine them as those villains that do
their work so perfectly well that it
is impossible not to be fascinated with
them so what do you imagine when you
think about a virus do you imagine the
individual so these hundred nanometer
particle things or do you imagine the
whole pandemic like Society level the
when you say the efficiency at which
they do their work do you think of
viruses as the millions
that occupy a human body or a living
organism Society level like spreading as
a pandemic or do you think of the
individual little guy yes this is I
think this is a unique a unique concept
that allows you to move from micro scale
to the macro scale all right so the
virus itself I mean it's it's not a
living organism
it's a machine to me it's a machine but
it is perfected in a way that it
essentially has a limited number of
functions it needs to do the necessary
functions and essentially has enough
information just to do those functions
as well as the ability to modify itself
so you know it's it's a machine it's an
intelligent machine so yeah look maybe
on that point you're in danger of
reducing the power of this thing by
calling it a machine right but you now
mention that it's also possibly
intelligent it seems that there's these
elements of brilliance that a virus has
of intelligence of maximizing so many
things about its behavior to ensure
its survival and its and its success so
do you see it as intelligent so you know
I think it's a different
understanding different than you know
how I think about you know intelligence of
humankind or intelligence of the
you know of the artificial
intelligence mechanisms
I think the intelligence of a virus is
in its simplicity
the ability to do so much with so little
material and information but also I
think it's it's interesting it keeps me
thinking you know it keeps me wondering
whether or not it's also an example
of the basic swarm intelligence where
you know essentially the viruses act as
a whole and are extremely efficient in
that so what do you attribute the
incredible simplicity and the efficiency
to is it the evolutionary process so maybe
another way to ask that if you look at
the next hundred years
are you more worried about the natural
pandemics or the engineered pandemics so
how hard is it to build a virus yes it's
it's a very very interesting
question because obviously there is a
lot of conversations about the you know
whether we are capable of engineering a
you know an even worse virus
I personally expect and am mostly
concerned with the naturally occurring
viruses simply because we keep seeing
that we keep seeing new strains of
influenza emerging some of them becoming
pandemic we keep seeing new strains of
coronaviruses emerging this is a natural
process and I think this is why it's so
powerful you know if you ask me you know
I've read papers about scientists
trying to study the capacity of the
modern you know biotechnology to alter
the viruses but I hope that you
know it won't be our main concern
in the near future
what do you mean by hope well you know if you
look back and look at the history of the
of the most dangerous viruses right so
that's the first thing that comes into
mind is a smallpox so right now there is
perhaps a handful of places where this
you know the the strains of this virus
are stored right so this is essentially
the effort of the whole society to limit
the access to those viruses I mean in a
lab in a controlled environment in order
to study and then smallpox is one of the
viruses for which
it should be stated there's a vaccine
developed yes yes and that's you know
it's until the seventies it was in my
opinion it was perhaps the most
dangerous thing that was there is that
a very different virus than the
influenza and coronaviruses it is it is
different in several aspects
biologically it's a so-called
double-stranded DNA virus but also in
the way that it is much more contagious
so the R naught so this is this is
the R naught R naught is essentially
an average number a person infected by
the virus can spread to other people so
then the average number of people that
he or she can spread it to and you know
the there is still some you know
discussion about the estimates of the
current virus you know the estimations
vary between you know one point five and
three in case of smallpox it was five to
seven and we're talking about the
exponential growth right so that's
that's a very big difference it's not
the most contagious one measles for
example it's I think 15 and up so so
it's it's you know but it's definitely
definitely more contagious than the
seasonal flu than
the current coronavirus or SARS for
that matter so what makes a what makes a
virus more contagious or the I'm sure
there's a lot of variables that come
into play but is it is it that whole
discussion of aerosol and like the size
of droplets if if it's airborne or
there's some other stuff that's more
biology centered I mean there are a lot
of components and and there are
biological components and there are
also you know social components the
ability of the virus to you know the the
ways in which the virus is spread is
definitely one the ability of the virus to
stay on the surfaces to survive the
ability of the virus to replicate fast
also you know once it's in the cell or
whatever once it's inside the host and
interesting enough something that I
think we didn't pay that much attention
to is the incubation period where you
know hosts are asymptomatic and now it
turns out that another thing that one
really needs to take into account is the
percentage of the asymptomatic
population because those people still
shed this virus and still are you know
they still are contagious yes and
the Iceland study which I think is
probably the most impressive size-wise
shows 50 percent asymptomatic for this virus
I also recently learned the swine flu is
like just a number of people who got
infected was in the billions it was some
crazy number it was like it was like
like 20 percent or 30 percent of
population something crazy like that so
the lucky thing there is the fatality
rate is low but the fact that a virus
can just take over an entire population
so quickly
it's terrifying I think I mean this is
you know that's perhaps my favorite
example of a butterfly effect because
it's really I mean it's it's even tinier
than a butterfly and look at you
know and with you know if you think
about it right so it used to be in in
those bat species and perhaps because of
you know a couple of small changes in in
the in the viral genome it first had
you know become capable of jumping from
bats to human and then it became capable
of jumping from human to human
right so this is this is I mean it's
not even the size of a virus it's the
size of several you know several atoms
or say you know few atoms and all of a
sudden this change has such a major
impact so is that a mutation like on a
single virus is that like so if we talk
about those the the flap of a butterfly
wing like what's the first flap well I
think this is the the the mutations that
make that made this virus capable of
jumping from bat species to human and of
course there's you know the scientists
are still trying to find I mean they
still even trying to find the the who
was the first infected the patient
zero the first human the first human
infected right I mean the fact that
there are coronaviruses different
strains of coronaviruses in various bat
species I mean we know that so so you
know virologists observe them they study
them they look at their genomic
sequences they're trying of course to
understand what makes these viruses
jump from from bats to human there was
you know similar to that and in you know
in influenza that was I think a few
years ago there was this you know
interesting story where several groups
of scientists studying influenza virus
essentially you know made experiments to
show that this virus can jump from one
species to another you know by changing
I think just a couple of residues and
and and of course it was very
controversial I think there was a
moratorium on this study for a while but
then the study was released it was
published so that was their moratorium
is because it shows through engineering
it through modifying it you can make a
jump yes yeah I I personally think it is
important to study this I mean we should
be informed we should try to understand as
much as possible in order to prevent it
but so then the engineering aspect there
is can't you then just start searching
because there's so many strains of
viruses out there
can't you just search for the ones in
bats that are the deadliest from the
virologist perspective and then just try
to engineer try to see how to but see
that's a there's a nice aspect to it the
really nice thing about engineering
viruses it has the same problem as nuclear
weapons is it's hard for it to not lead
to mutual self-destruction so you can't
control a virus it can't be used as a
weapon right yeah that's why I you know
in the beginning I said you know I I'm
hopeful because there are definitely
definitely regulations that need to
be introduced and I mean as the
scientific society we are in charge
of
you know making the right actions making
the right decisions but I think we we
will benefit tremendously by
understanding the mechanisms by which
the virus can jump by which the virus
can become more you know more more
dangerous to humans because all these
answers would you know eventually lead to
designing better vaccines hopefully
universal vaccines right and that would
be a triumph of the you know science so
what's the universal vaccine is that
something that well how universal is
universal well I mean you know so what's
the dream I guess because you kind of
mentioned the dream of this I would be
extremely happy if you know we designed
the vaccine that is able I mean I'll
give you an example right so so every
year we do a seasonal flu shot the
reason we do it is because you know we
are in the arms race you know our
vaccines are in the arms race with with
constantly changing virus right now if
the
next pandemic influenza pandemic will
occur most likely this vaccine would
not save us right although it's it's you
know it's the same virus might be
different strain so if we're able to
essentially design a vaccine against you
know influenza A virus no matter what's
the strain no matter which species it
jumped from that would be I think that
would be a huge huge progress and
advancement
you mentioned the smallpox until the
seventies might have been something that
you would be worried the most about what
about these days well we're sitting here
in the middle of a COVID-19
pandemic but these days nevertheless
what is your biggest worry virus wise
what are you keeping your eye on it
looks like and you know based on the
past several years of the of the new
viruses emerging I think we're still
dealing with different types of
influenza I mean so so the H7N9
avian flu that was that emerged
I think a couple of years ago in China I
think the the mortality rate was
incredible I mean it was you know I
think above thirty percent
you know so this is this is huge I
mean luckily for us this strain was not
pandemic right so it was jumping from
birds to human but I don't think it it
it was actually transmittable between
the humans and you know this is actually
a very interesting question
which scientists tried to understand
right so the balance the delicate
balance between the virus being very
contagious right so
efficient in spreading and virus to be
very pathogenic you know causing you
know harm you know and and death to
the host so it looks like that the
more pathogenic the virus is the less
contagious it is is that a property of
biology or what is it well I I don't have
an answer to that
and I I think this is this is still an
open question but you know if you look
at you know you know with the corona
virus for example if you look at you
know the the deadlier relative MERS
MERS was never a pandemic virus
right but the you know again the
mortality rate from MERS is far above
you know I think twenty or thirty
percent so so whatever is making this
all happen doesn't want us dead because
it's balancing yeah nicely I mean how do
you explain that we're not dead yet like
because there's so many viruses and
they're so good at what they do why do
they keep us alive I mean we also
have you know a lot of protection right
so the immune system and so I mean we do
have you know ways to to fight against
those viruses and I think we are now
way much better equipped right so with
the discoveries of vaccines and you know
there are vaccines against the the
viruses that maybe two hundred years ago
would wipe us out completely but because
of these vaccines we are actually we're
capable of eradicating pretty much fully
as is the case with smallpox so if we
could can we go to the basics a little
bit of
the biology of the virus how does the
virus infect the body so I think there
are some key steps that the virus needs
to perform and of course the first one
the viral particle needs to get attached
to the host cell in the case of this corona
virus there is a lot of evidence that it
actually interacts in the same way
as the SARS coronavirus so it gets
attached to the ACE2 human receptor and so
there is I mean as we speak there is a
growing number of papers suggesting it
moreover a most recent I think most
recent results suggest that this virus
attaches more efficiently to this human
receptor than SARS just to sort of back up
so there is a family of viruses the corona
viruses and SARS whatever the heck
that respiratory or whatever that stands for
so SARS actually stands for the disease
that you get it's the severe acute
respiratory syndrome so SARS is the first strain
and there's MERS MERS and there is yes
but people scientists actually know more
than three strains I mean so there is
the MHV
strain which is considered to be a
canonical model disease model in mice
and so there is a lot of work done on on
this virus because it's but it hasn't
jumped to humans yet no no yes it's
fascinating so and you mentioned ACE2 so the
when you say attached proteins are
involved yeah on both sides yes so so we
have you know so we have this infamous
spike protein on the surface of the
virion particle and
it does look like a spike and I mean that's
essentially because of this protein you
know we call the coronavirus
coronavirus so that's what makes the corona on
top of the surface so so this
protein it actually it acts so it
doesn't act alone it actually it makes
three copies and it's it makes a so-called
trimer so this trimer is essentially a
functional unit a single functional unit
that starts interacting with the ACE2
receptor so this is again another
protein that now sits on the surface of
a human cell host cell I would say and
that's essentially in that way the virus
anchors itself to the host cell because
then it needs to actually it needs to
get inside you know it fuses its
membrane with the host membrane it
releases the the key components it
releases its you know RNA and then
essentially hijacks the the machinery of
the cell because none of the viruses
that we know of have ribosomes the the
machinery that allows us to print out
proteins so in order to print out
proteins that are necessary for
functioning of this virus it actually
needs to hijack the host ribosomes the
virus is an RNA wrapped in a bunch of
proteins one of which is this functional
mechanism the spike protein that does the
attachment so yeah so you know so if you
look at this virus that there are you
know several basic components right so
we start with the Spike protein this is
not the only surface protein the the
protein that lives on the surface of the
viral particle there is also perhaps the
the protein with the highest number of
copies is the membrane protein so it's
essentially it forms the capsid sorry
the envelope of the viral
particle and essentially you know helps
to maintain a certain curvature helps to
make a certain curvature then there is
another protein called envelope protein
or E protein and it it actually occurs
in in far smaller quantities and still
there is ongoing research what exactly
does this protein do so these are sort
of the three major surface proteins that
you know make the viral envelope and
when we go inside then we have another
structural protein called nucleocapsid
protein and the the purpose of this
protein is to protect the viral RNA it
actually binds to the viral RNA creates
a capsid and so the rest of the virus
viral information is inside of this you
know RNA and you know if you compare the
amount of the genes or you know proteins
that are made of these genes it's much
you know it's significantly higher than
of influenza virus for example influenza
virus has I think around eight or nine
proteins where this one has at least 29
Wow
that has to do with the length of the
RNA strand I mean so I mean so it's it
it affects the length of the RNA strand
right so so so because you essentially
need to have sort of the minimum amount
of information to encode those genes how
many proteins did you say 29 29 proteins yes
so this is this is you know something
definitely interesting because you know
believe it or not
we've been studying you know
coronaviruses for over two decades we've
yet to uncover all functionalities of
its proteins could we maybe take a small
tangent and can you can you say how one
would try to figure out what a function
of a particular protein is so you've
mentioned people are still trying to
figure out what the function of the
envelope protein might be or what's the
process so this is where the research
that computational scientists do might
be of help because you know in the past
several decades we actually
have collected a pretty decent amount of
knowledge about different proteins in
different viruses so what we can
actually try to do and this is sort of
could be sort of our first lead to a
possible function is to see whether
those you know say we have this genome
of the coronavirus of the novel
coronavirus
and we identify the potential proteins
then in order to infer the function what
we can do is we can actually see whether those
proteins are similar to those ones that
we already know okay in such a way we
can you know for example clearly
identify you know some critical
components like RNA polymerase or
different types of proteases these are
the proteins that essentially clip the
protein sequences and so this works in
many cases however in some cases you
have truly novel proteins and this is a
much more difficult task now as a small
pause when you say similar like what if
some parts are different and some parts
are similar like how do you disentangle
that
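[Editor's sketch, not part of the conversation] One common way to make "some parts are similar and some parts are different" concrete is local sequence alignment, which scores only the best-matching region of two sequences and ignores the rest. The Python below is a minimal Smith-Waterman sketch under stated assumptions: the scores (match=2, mismatch=-1, gap=-2) and the two toy sequences are made up for illustration, and real tools typically use substitution matrices such as BLOSUM62 with affine gap penalties. It is not the method used by Dmitry's group.

```python
def local_alignment(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b) for the best-scoring local region.
    The scoring values are illustrative assumptions, not a standard matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score table; scores are floored at 0 so an alignment can
    # start anywhere in either sequence.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the score drops to 0.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(x, y):
    """Percent of aligned columns where both sequences carry the same residue."""
    if not x:
        return 0.0
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x)


if __name__ == "__main__":
    # Hypothetical toy sequences: only the middle region is shared.
    known = "MKTAYIAKQR"
    novel = "GGGTAYIAKWWW"
    s, ax, ay = local_alignment(known, novel)
    print(s, ax, ay, percent_identity(ax, ay))
```

The shared region is reported on its own, so a protein can be flagged as partly similar to a known one even when the flanking parts differ. Any such hit is still only a prediction that would need experimental validation.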
you know it's it's a big question of
course you know what bioinformatics does
it does predictions right so those
predictions and they have to be
validated by experiments functional or
structural predictions both I mean we we
do structural predictions we do
functional predictions we do
interaction predictions so you just
generate a lot of predictions like
reasonable predictions based on
structure and function interaction like
you said and then here you go
that's the power of bioinformatics is
data grounded good predictions of what
should happen so we you know in the way
I see it we're helping experimental
scientists to streamline the discovery
process yeah and the experimental
scientists is that what a virologist is
so yeah virology is one of the
experimental sciences that you know
focus on viruses they often work with
other experimental scientists for
example the molecular imaging scientists
right so the the viruses often can be
viewed and reconstructed through
electron microscopy techniques so but
these are you know specialists that are
not necessarily virologists they
work with small small particles
whether it's viruses or it's an
organelle of a you know of a human cell
whether it's a you know complex
molecular machinery so the techniques
that are used are very similar in
in their essence and so
yeah so so typically I mean and we see it
now the research on you know that is
emerging and that
is needed often involves the
collaborations between biologists you
know
biochemists you know people from from
pharmaceutical sciences computational
sciences so we have to work together so
from my perspective just to step back
sometimes I look at this stuff it's the
how much we understand about RNA DNA how
much we understand about protein like
your work the amount of proteins that
you're exploring is it surprising to you
that we were able we descendants of apes
were able to figure all of this out like
how so you're a computer scientist so for
me from computer science perspective I I
know how to write a Python program
things are clear but biology is a giant
mess it feels like to me from an
outsider's perspective is how surprising
is it amazing is it that we were able to
figure this stuff out you know if you
look at the you know how computational
science and computer science was
evolving right I think it was just a
matter of time that we would approach
biology so so we we started from you
know applications to much more
fundamental systems physics you know and
now we are or you know small chemical
compounds right so now we are
approaching the more complex biological
systems and I think it's a natural
evolution of you know of the computer
science of mathematics sure that's the
computer science I just meant even at a
higher level so that to me surprising
that computer science can offer help in
this messy world but I just mean it's
incredible that the biologists and the
chemists can figure all this out or does
that sound ridiculous to you that
that of course they would it just seems
like a very complicated set of problems
like the the variety of the kinds of
things that could be produced in the
body the just just like you said 29
proteins I mean just getting
a hang of it so quickly it just seems
impossible to me I agree I mean it's and
I have to say we are you know in the
very very beginning of this journey I
mean we we've yet to I mean we've yet to
comprehend not even try to understand
and figure out all the details but we've
yet to comprehend the complexity of the
cell we know that neuroscience is not
even at the beginning of understanding
human mind
so where's biology sit in terms of
understanding the function deeply
understanding the function of viruses
and cells so there sometimes it's easy
to say when you talk about function what
you really refer to it's perhaps not a
deep understanding but more of a
understanding sufficient to be able to
mess with it using a antiviral like mess
with it chemically to prevent some of
its function or do you understand the
function well I think equally I think
we're much farther in terms of
understanding of the complex genetic
disorders such as cancer where you have
layers of complexity and we you know as
in my laboratory we're trying to
contribute to that research but we're
also in a way overwhelmed with how many
different layers of complexity different
layers of mechanisms that can be
hijacked by cancer simultaneously and so
you know I think biology in the past 20
years again from the perspective of the
outsider because I'm not a biologist but
I think it has advanced tremendously
and one thing where computational
scientists and data scientists are now
becoming very very helpful is in the
fact it's coming from the fact that we
are now able to generate a lot of
information about the cell whether it's
next-generation sequencing or
transcriptomics whether it's live
imaging information whether it is you know
complex interactions between proteins or
between proteins and small molecules
such as drugs we we are becoming very
efficient in generating this information
and now the next step is to become
equally efficient in processing this
information and extracting the the key
knowledge from that that could then be
validated with the experiment yeah yeah
so maybe then going all the way back
we're talking you said the first step is
seeing if we can match the new proteins
you found in the virus against something
we've seen before to figure out its
function and then you also mentioned
that but there could be cases where it's
a totally new protein is there something
bioinformatics can offer when it's a
totally new protein this is where many
of the methods and you probably are
aware of you know the the case of
machine learning many of these methods
rely on the previous knowledge right so
things that we try to do from
scratch are incredibly difficult you
know something that we call ab initio
and this is I mean it's not just the
function I mean you know we've yet to
have a robust method to predict the
structures of these proteins
ab initio you know by not using any
templates
of other related proteins so protein is
a chain of amino acids residues as
residues yeah and then however somehow
magically maybe you can tell me they
seem to fold in incredibly weird and
complicated 3d shapes yes so and that's
where actually the idea of protein
folding or just not the idea but the
problem of figuring out how the hell it
wants up the concept how they fold into
those weird shapes comes in so that's
another side of computational work so
what can you describe what protein
folding from the computational side is
and maybe your thoughts on the folding
at home efforts that a lot of people
know that you can use your machine to to
do protein folding
so yeah protein folding is you
know one of those 1 million dollar
prize challenges right so the reason for
that is we've yet to understand
precisely how the protein gets folded so
efficiently to the point that in many
cases where you you know where you try
to unfold it due to the high temperature
it actually folds back into its original
state right so we know a lot about the
mechanisms right but put putting those
mechanisms together and making sense
it's a computationally very expensive
task in general the proteins fold can
they fold in arbitrarily large number of
ways or do they usually fold in a very
small number no it's it's typically I
mean you we tend to think that you know
there is a one sort of canonical fold
for protein although that there are many
cases where the proteins you know upon
destabilization it can be folded into
a different conformation
and this is especially true when you
look at sort of proteins that
include more than one structural unit
so those structural units we call them
protein domains essentially protein
domain is a single unit that typically
is evolutionarily preserved that typically
carries out a single function and
typically has a very distinct fold
structure 3d structure organization but
turns out that if you look at human an
average protein in a human cell would
have a bit of two or three such
subunits and how they are trying to fold
into the sort of you know next level
fold right
so within subunit is folding and then
and then they fold into the larger 3d
structure right and and all that there's
some understanding of the basic mechanisms
but not to put together to be able to
fold it we're still I mean we're still
struggling I mean we're we're getting
pretty good about folding relatively
small proteins up to hundred residues
which I mean but we're still far away
from folding you know larger proteins
and some of them are notoriously
difficult for example transmembrane
proteins proteins that that sit in the
in the membranes of the cell they're
incredibly important but they are
incredibly difficult to solve and so
basically there's a lot of degrees of
freedom how it folds and so it's a
combinatorial problem where it just explodes
there's so many dimensions yeah well it
is a combinatorial problem but it
doesn't mean that we cannot approach it
from the not from the brute
force approach and so the machine
learning approaches you know have
emerged that try to tackle it so folding
at home
I don't know how familiar you are with it but does
that use machine learning or is it more
brute force no so folding at home it was
originally and I remember I was a it was
a long time ago I was a postdoc and we
we learned about this you know this game
because it was originally designed as
the game and we you know I took a look
at it and it's interesting because it's
it's really you know it's very
transparent very intuitive so and from
what I heard I've yet to introduce it to my
son but you know kids are actually
getting very good at folding the
proteins and it was you know it came to
me as the not as a surprise but
actually as the sort of manifest of you
know our capacity to do this kind of to
solve these kind of problems when a
paper was published in one of
these top journals with the coauthors
being the actual players of this game so
and what happened is was that they
managed to get better structures than
the scientists themselves so so that you
know that was very I mean it was kind of
profound you know revelation that
problems that are so challenging for a
computational science maybe not that
challenging for a human brain well
that's a really good that's a hopeful
message always when there's a the proof
of existence the existence proof that
it's possible that's really interesting
but the it seems what are the best ways
to do protein folding now so if you look
at what DeepMind does with AlphaFold
AlphaFold yes so they kind of is that's
a learning approach what's your sense I
mean your backgrounds in machine
learning but is this a learnable problem
is this still a brute force are we in the
Garry Kasparov Deep Blue days are we in
the AlphaGo playing the game of Go days
of folding well I think we are we are
advancing towards this direction I mean
if you look so there is a sort of
olympic game for protein folders called
CASP and it's essentially it's you know
it's a competition where different teams
are given exactly the same protein
sequences and they try to predict their
structures right and of course there's
different sort of subtasks but in the
recent competition AlphaFold was
among the top performing teams if not
the top performing team so there is
definitely a benefit from the data that
had been generated you know in the past
several decades the structural data and
certainly you know we are now at the
capacity to summarize this data to
generalize this data and to use those
principles you know in order to predict
protein structures as one of the really
cool things here is there's maybe you
can comment on it there seems to be
these open datasets of proteins how did
that with the Protein Data Bank
the Protein Data Bank I mean was it created
is this a recent thing for just the
corona virus or it's it's been for many
many years I believe the first protein
databank was designed on flash cards so
on the so yes it's so this I mean this
is a great example of the community
efforts
of everyone contributing cause every
time you solve a protein or a protein
complex this is where you submit it and
you know the scientists get access to it
scientists get to test it and we
bioinformaticians use this information to
you know to make predictions so there's
no there's no culture like hoarding
discoveries here so that's I mean you've
you've you've released a few or a bunch
of proteins they were matching its
whatever we'll talk about details a
little bit but it's kind of amazing that
that's the the it's kind of amazing how
open the culture here is it is and I
think this pandemic actually
demonstrated the ability of scientific
community to you know to solve this
challenge collaboratively and this is I
think it if anything it actually moved
us to a brand new level of
collaborations of the efficiency in
which people establish new
collaborations in in which people offer
their help to each other scientists
offer their help to each other and
publish results too it's very interesting
we're now trying to figure out there's a few
journals that are trying to sort of do
the very accelerated review cycle but so
many preprints so just posting a paper
going out
I think it's fundamentally changing the
the way we think about papers yes I mean
the way we think about knowledge now
let's say no yes because yes I
completely agree I think now it's the
knowledge
is becoming sort of the the core value
not the paper or the journal where this
knowledge is published and I think this
is again this is we are living in the in
the times where it becomes really
crystallized that the idea that the most
important value is in the knowledge so
maybe you can comment like what do you
think the future of that knowledge
sharing looks like so you have this
paper that will I hope you get a chance
to talk about a little bit but it has
like a really nice abstract and the
introduction and related like it has all
the usual I mean probably took a long
time to put together so but is that
going to remain like you could have
communicated a lot of fundamental ideas
here in much shorter amount that's less
traditionally acceptable by the journal
context so so well you know so the first
version that we posted not even on bioRxiv
because bioRxiv back then it
was essentially you know overwhelmed
with the number of submissions so so our
submission I think it took five or six
days to just for it to be screened and
and and put online so we you know
essentially we put the first preprint
on our website and you know it was
started getting accessed right away so
and and you know so this original
preprint was in a much rougher shape
than this paper and but we tried I mean
we honestly try to be as compact as
possible with you know
introducing the the information that is
necessary that to explain our you know
our results so maybe you can dive right
in if it's okay sure so it's a paper
called structural genomics
of SARS how do you even pronounce
SARS-CoV-2 yeah by the way COVID
is such a terrible name but it stuck
and yes SARS-CoV-2 indicates
evolutionary conserved functional
regions of viral proteins so this is
looking at all kinds of proteins that
are part of the this novel coronavirus
and how they match up against the
previous other kinds of corona viruses
and there's a lot of beautiful figures I
was wondering if you could I mean
there's so many questions I could ask
here but maybe first how do you get
started at doing this paper so how do
you start to figure out the 3d structure
of a novel virus yes so there is
actually a little story behind it and so
the story actually dated back in
September of 2019 and you probably
remember that back then we had another
dangerous virus Triple E virus its
eastern equine encephalitis virus and
can you maybe linger on it I have to
admit I was sadly completely unaware so
so that was actually a virus outbreak
that happened in New England only the
the danger in this virus was that it
actually it targeted your brain so so
there were deaths from this virus it was
it was transferred you know the
main vector was mosquitoes and obviously
fall time is you know the time where you
have a lot of them in New England and
you know on one hand people realize this
is this is this actually very dangerous
thing so it had an impact on the local
economy the schools were closed past six
o'clock no activities outside for the
kids because the kids were suffering
quite tremendously from you know when
infected from this virus and how do I
not know about this was impacted it was
in the news I mean it was not impacted
to a high degree in Boston
necessarily but in the Metro West area
and actually spread around I think all
the way to New Hampshire Connecticut and
you mentioned affecting the brain that's
one other comment
we should make so you mentioned ACE2
for the corona virus so these viruses
kind of attach to something in the body
so it essentially attaches to the to
these proteins in those cells in the
body where those proteins are expressed
where they actually have them in in
abundance so sometimes that could be in
the lungs that could be a brain that
could be so I think what they right now
from what I read they have the
epithelial cells inside in so did the
cells essentially inside the you know
the it's the cells that are covering the
surface you know so inside the nasal
surfaces the throat the lung cells
and I believe liver and a couple of other
organs where they are actually
expressing in abundance that's for the
ACE2 you mentioned for the ACE2 receptors okay
so back back to the story yes in the
fall so now the these you know the
impact of this virus is significant
however it's a pretty local problem to the
point that you know this something that
we would call a neglected disease
because it's not big enough to make you
know the the drug design companies to
design a new antiviral or a new vaccine
it's not big enough to generate a lot of
grants from the national funding
agencies so so does it mean we cannot do
anything about it and so what I did is I
taught a bioinformatics class and is in
Worcester Polytechnic Institute and we
are very much problem learning
institution so I thought that that would
be a perfect you know perfect project in
case study so so I asked it you know so
so I we essentially designed a study
where we tried to use bioinformatics to
to understand as much as possible about
this virus and a very substantial
portion of the study was to understand
the structures of the proteins to
understand how they interact with with
each other and with the with the host
proteins try to understand the evolution
of this virus
it's obviously you know a very important
question how where it will evolve
further how you know how it happened
here you know so so we did all this you
know
projects and now I'm trying to put them
into a paper where all these
undergraduate students will be co-authors
but essentially the projects were
finished right about mid-december and a
couple of weeks later I heard about this
mysterious new virus that was discovered
in you know was reported in in Wuhan
province and immediately I thought that
well we just did that can't we do the
same thing with this virus and so we
started waiting for the genome to be
released because that's essentially the
first piece of information that is
critical once you have the genome
sequence you can start
doing a lot using bioinformatics when
you say genome sequence that's referring
to the sequence of letters that make up
the RNA so the sequence that make up the
entire information encoded in the
protein right so so that includes all 29
genes
what are genes what's the encoding of
information so genes is essentially is
a basic functional unit that we can
consider so so each gene in the virus
would correspond to a protein that so
gene by itself doesn't do its function it
needs to be converted or translated into
the protein that will become the actual
functional unit like you said the
printer so so we need the printer for
that we need to print it okay so the the
first step is to figure out that the
genome the sequence of things that to be
then used for printing the protein so
okay so then then the next step so once
we have this and so we use the existing
information about SARS because the SARS
genomics has been done in abundance so
we have different strains of SARS and
actually other related coronaviruses
MERS the bat coronavirus and we started
by identifying the potential genes
because right now it's just the sequence
right it's a sequence that is roughly
it's less than 30,000 nucleotide long
and this the raw sequence there's
no other information really and we now
need to define the boundaries of the
genes
that would then be used to identify the
proteins and protein structures how hard
is that problem it's not I mean it's
pretty straightforward
so you know so because we use the
existing information about SARS proteins
and SARS genes
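As a rough illustration of what defining gene boundaries on a raw nucleotide sequence can look like, here is a minimal Python sketch. It is not the method described in the conversation, which relies on known SARS genes; it swaps in a plain open reading frame scan (start codon to stop codon, forward strand only). The function name `find_orfs`, the `min_codons` cutoff, and the toy sequence are all assumptions made for illustration.

```python
# Hypothetical sketch: naive forward-strand ORF scan, not the authors' pipeline.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) index pairs of ORFs, end exclusive.

    An ORF here is ATG ... first in-frame stop codon, stop included.
    min_codons is an arbitrary illustrative cutoff.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # remember the first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (pos + 3 - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None  # look for the next ORF in this frame
    return sorted(orfs)


# Toy sequence, made up for this example.
toy = "CCATGAAATTTTAGGG"
for start, end in find_orfs(toy):
    print(start, end, toy[start:end])
```

In practice candidate regions like these would then be compared against annotated genes from related viruses, which is the step the conversation describes.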
so once again we kind of we are relying
on the yes so and then once we get there
this is where sort of the first more
traditional bioinformatics step
begins we are trying to use these
protein sequences and get the 3d
information about those proteins so this
is where we are relying heavily on the
structure information specifically from
the protein data bank that we are
talking about and here you're looking
for similar proteins yes so so the the
concept that we are operating when we do
this kind of modeling it's called
homology or template based modeling so
essentially using the concept that if
you have two sequences that are similar
in terms of the letters the structures
of these sequences are expected to be
similar as well and this is at the micro
at a very local scale and at the scale
of the whole protein at the whole
protein I see actually so you know so of
course the devil is in the details and this
is why we need actually pretty
sophisticated modeling tools to do so
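The idea behind template-based modeling, that similar sequences are expected to have similar structures, can be sketched in a few lines of Python. This is only a toy under stated assumptions: the sequences are made up and already aligned (gaps written as "-"), and the helper names `percent_identity` and `pick_template` are hypothetical. Real modeling tools do far more than pick the most similar template.

```python
# Hypothetical sketch: choose a structural template by sequence identity.
from typing import Dict, Tuple


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free aligned columns where the two residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def pick_template(target: str, templates: Dict[str, str]) -> Tuple[str, float]:
    """Return (name, identity) of the template most similar to the target."""
    scored = {name: percent_identity(target, seq) for name, seq in templates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Toy pre-aligned sequences, invented for illustration only.
target = "MKTAYIAKQR-LGV"
templates = {
    "template_A": "MKTAYIGKQRELGV",
    "template_B": "MRSAY-AKHRELAV",
}
name, identity = pick_template(target, templates)
print(name, round(identity, 1))
```

The higher the identity to a protein with a solved structure, the more that structure can be trusted as a starting point for the model.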
once we get these structures of the
individual proteins we try to see
whether or not these proteins act alone
or they have to be forming protein
complexes in order to perform this
function and again so this is sort of
the next level of the modeling because
now you need to understand how proteins
interact and it could be the case that
the protein interacts with itself and
makes sort of a multimeric complex the
same protein just repeated multiple
times and we have quite quite a few such
proteins in SARS-CoV-2 specifically
spike protein needs three copies to
function and envelope protein needs five
copies to function and there are some
other multimeric complexes that's what you mean
by interacting with itself and you see
multiple copies so how do you how do you
make a good guess whether something's
going to interact well again so there
are two approaches right so one is look
at the previously solved complexes now
we're looking not at the individual
structures but the structures of the
whole complex complex is upon multiple
proteins yes so it's a bunch of proteins
essentially glued together and and when
you say glued that's the interaction
that's the interaction so so the
different forces different sort of
physical forces behind this and sorry
I keep asking dumb questions but
is it is the glue is that the
interaction fundamentally structural or
is it functional like in the way you're
thinking about it that's actually a very
good way to ask this question because
turns out that the interaction is
structural but in the way it forms this
structure
it actually also carries out the
function so interaction is often needed
to carry out very specific function of
protein but in terms of an earth-sized
figuring out you're really starting at
the structure before you figure out the
function so there's a beautiful figure
two in the paper of all the different
proteins that you were able to figure
out that make up the new the novel
coronavirus what what are we looking
at right so these are like that's
through the step two that you mentioned
when you try to guess at the possible
proteins that's what you're going to get
is these blue blue cyan blobs yes so
those are the individual proteins for
which we have at least some information
from the previous studies right so there
is advantage and disadvantage of using
previous studies the biggest well the
disadvantage is that you know we may not
necessarily have the coverage of all 29
proteins however the biggest advantage
is that the accuracy in which we can
model these proteins is very high much
higher compared to ab initio methods
that do not use any template information
so but nevertheless this figure also has
it's just beautiful and I love these
pictures so much it has like
the pink parts yes those are the parts
that are different so you're
highlighting so the difference you find
is on the 2d sequence and then you try
to infer what I would look like on the
3d yeah so the difference actually is on
1d sequence one d1 design idea so and
and so this is one of these first
questions that we try to answer is that
well if you take this new virus and you
take
the closest relatives which are SARS and
a couple of bat coronavirus strains they
are already the closest relatives that
we are aware of now what are the
difference between this virus and its
close relatives right and if you
look typically when you take a sequence
those differences could be quite far
away from each other so what
3d structure makes those differences
do is very often they tend to cluster
together interesting and all of a sudden the
differences that may look completely
unrelated actually relate to each other
and sometimes they are there because
they correspond they attack the
functional site right so they are there
because this is the functional site that
is highly mutated so that's a
computational approach to figuring
something out when when it comes
together like that that's kind of a nice
clean indication that there's something
this could be actually indicative of
what's what's happening yes I mean so we
need this information and you know 3d
the 3d structure gives us just a very
intuitive way to look at this
information and then start to ask you
know start asking questions such as so
this place of this protein that is
highly mutated does it does it is it the
functional part of the protein so does
this part of the protein interact with
some other protein so maybe with some
other ligands small small molecules
right so we would try now to
functionally inform this 3d structure
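The observation that differences far apart in the 1d sequence can sit close together in the 3d structure can be checked with a small Python sketch. Everything here is an assumption for illustration: the coordinates are invented, the function name `spatially_clustered_pairs` is hypothetical, and the 8 angstrom distance cutoff and 10 residue sequence separation are arbitrary choices, not values from the paper.

```python
# Hypothetical sketch: find mutated residues far apart in sequence but close in 3D.
import math
from itertools import combinations


def spatially_clustered_pairs(coords, mutated, max_dist=8.0, min_seq_sep=10):
    """Return (i, j, distance) for mutated residue pairs that are at least
    min_seq_sep apart in sequence yet within max_dist in space.

    coords maps residue index -> (x, y, z), e.g. one atom per residue.
    Both thresholds are illustrative assumptions.
    """
    pairs = []
    for i, j in combinations(sorted(mutated), 2):
        if j - i < min_seq_sep:
            continue  # neighbours in sequence are trivially close in space
        d = math.dist(coords[i], coords[j])
        if d <= max_dist:
            pairs.append((i, j, d))
    return pairs


# Toy coordinates, made up for this example.
coords = {5: (0.0, 0.0, 0.0), 40: (3.0, 4.0, 0.0), 90: (30.0, 0.0, 0.0)}
mutated = [5, 40, 90]
for i, j, d in spatially_clustered_pairs(coords, mutated):
    print(i, j, round(d, 1))
```

Pairs reported this way are candidates for the kind of question raised here, such as whether a cluster of mutated residues lines a functional site or a binding interface.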
so so you have a bunch of these mutated
parts is like I don't know how like how
many are there in the new novel
coronavirus being compared to SARS are
we talking about hundreds
thousands like these these pink regions
oh no much less than that and
it's very interesting that if you look
at that you know so the first thing that
you you start seeing right you know you
look at patterns right and the first
pattern that becomes obvious is that
some of the proteins in the new
coronavirus are pretty much intact right
so they're pretty much exactly the same
as SARS as the bat coronavirus
where some others are heavily mutated so
so it looks like that the you know the
evolution is not is not occurring you
know uniformly across the entire you
know viral genome but actually target
very specific proteins what do you do
with that like from the Sherlock Holmes
perspective well you know so one of the
of the most interesting findings we had
was the fact that the viral so the the
binding sites on the viral surfaces that
get targeted by the known small
molecules they were pretty much not
affected at all and so that means that
the same small drugs or small small drug
like compounds can be efficient for the
new coronavirus
this all actually maps to the drug
compounds too like so so you're actually
mapping out what old stuff is gonna work
on this thing and then possibilities for
new stuff to work by mapping out the
things that mutated yes so so we
essentially know which parts
behave differently and which parts are
likely to behave similar and again you
know of course all our predictions need
to be validated by experiments but
hopefully that sort of helps us to
delineate the regions of this virus that
you know can be promising in terms of
the drug discovery you kind of you kind
of mentioned this already but maybe you
can elabora