Luís and João Batalha: Fermat's Library and the Art of Studying Papers | Lex Fridman Podcast #209
ndMahzDCH1Y • 2021-08-09
Kind: captions
Language: en
the following is a conversation with
luís and joão batalha
brothers and co-founders of fermat's
library which is an incredible platform
for annotating papers as they write on
the fermat's library website just as
pierre de fermat scribbled his famous
last theorem in the margins professional
scientists academics and citizen
scientists can annotate equations
figures ideas and write in the margins
fermat's library is also a really good
twitter account to follow i highly
recommend it they post little visual
factoids and explorations that reveal
the beauty of mathematics
i love it
quick mention of our sponsors
skiff
simplisafe indeed netsuite and four
sigmatic check them out in the
description to support this podcast as a
side note let me say a few words about
the dissemination of scientific ideas
i believe that all scientific articles
should be freely accessible to the
public
they currently are not
in one analysis i saw more than 70 percent of
published research articles are behind a
paywall
in case you don't know the funders of
the research whether that's government
or industry
aren't the ones putting up the paywall
the journals are the ones putting up the
paywall while using unpaid labor from
researchers for the peer review process
where is all that money from the paywall
going
in this digital age the costs here
should be minimal
this cost can easily be covered through
donation advertisement or public funding
of science
the benefit versus the cost of all
papers being free to read is obvious and
the fact that they're not free goes
against everything science should stand
for which is the free dissemination of
ideas that educate and inspire
science cannot be a gated institution
the more people can freely learn and
collaborate on ideas the more problems
we can solve in the world together and
the faster we can drive old ideas out
and bring new
better ideas in
science is beautiful and powerful and
its dissemination in this digital age
should be free
this is the lex fridman podcast and
here's my conversation with luís and
joão batalha
luís you suggested an interesting idea
imagine if most papers had a
backstory section the same way that they
have an abstract
so
knowing more about how the authors ended
up working on a paper can be extremely
insightful and then you went on to give
a backstory for the feynman qed paper
this is all in a tweet by the way we're
doing tweet analysis today
how much of the human backstory do you
think is important in understanding
the idea itself that's presented in the
paper or in general
i think this gives way more context to
the work of of scientists i think people
a lot of people have this almost kind of
romantic misconception that
the way a lot of scientists work is
almost as the sum of eureka moments
where all of a sudden they sit down and
start writing two papers in a row and
the papers are usually isolated and when
you actually look at it it's the papers
are you know chapters of a way more
complex uh story
and the feynman qed paper is a good
example so feynman was actually going
through a pretty dark phase before
writing that paper it was he lost
enthusiasm with physics and doing
physics problems and there was one time
when he was in the cafeteria of cornell
and he saw a guy that was throwing
plates in the air and he noticed that
there was when the plate was in the air
there were two movements there the plate
was wobbling but he also noticed that
the the cornell symbol was rotating and
he was able to figure out the equations
of motions uh the equations of motions
of those uh plates and that uh led him
to kind of think a little bit about
electron orbits in relativity which led
to the paper of
about quantum electrodynamics so that
kind of reignited
his interest in physics and and and
ended up publishing the paper that led
to the his nobel prize basically and i
think it's it's
there are a lot of really interesting
backstories about papers that readers
never get to know for instance we did a
couple of months ago um
an ama around a paper a pretty famous
paper the gans paper with ian goodfellow
and so we did an ama where everyone was
could ask questions about the paper and
ian was responding to those questions
you also he was also telling the story
of how he got the idea for that paper in
a bar so there was also an interesting
and a back story i also read a book by
cédric villani
uh cédric villani is this
mathematician the fields medalist and in
his book he tries to explain how he got
from like
a phd student to the fields medal and he
tries to be as descriptive as possible
every single step how he got to the
fields medal and it's interesting also
to see just the amount of random
interactions and discussions with other
researchers sometimes over coffee and
how it led to like
fundamental breakthroughs and some of
his most important papers so i think
it's super interesting to have that
context of of the backstory well the ian
goodfellow story is kind of interesting
and perhaps that's true for feynman as
well i don't know if it's romanticizing
the thing but
it seems like just
a few little insights and a little bit
of work
does most of the leap required
do you have a sense that for a lot of
the stuff you've looked at
just looking back through history
uh it it wasn't necessarily the grind
of like andrew wiles with fermat's last
theorem for example
it was more like a a brilliant moment of
insight in fact ian goodfellow has a
kind of sadness to him almost in that
at that time in machine learning like at
that time especially in uh
for gans
you could
code something up really quickly on a
single machine
and almost do the invention go from idea
to uh experimental validation in like a
single night a single person could do it
and now there's kind of a sadness that a
lot of the breakthroughs you might have
in machine learning kind of require
large-scale experiments
so it was almost like the early days
so i wonder how many
low-hanging fruit there are in science
and mathematics
and even engineering where it's like
you could do that little experiment
quickly like you have an insight at a
bar why is it always a bar but you have
an insight at a bar and then just
implement
and the world changes
it's it's a good point i think it also
depends a lot on the maturity of the
field when you look at a field like
mathematics like it's a pretty mature
field uh fields like machine learning um
it's it's growing pretty fast
and um it's actually pretty pretty
interesting i i looked up like the
number of
new papers
on arxiv with the keyword machine
learning and like 50 percent of those papers
have been published in the last 12
months so you can see just five zero
five zero fifty percent so you can see
the the the the magnitude of growth in
that field and so i think like as fields
mature like those types of moments i
think naturally
uh are less frequent um
it's just a consequence of
that the other point that is interesting
about the backstory is that it can
really make it more memorable in a way
and and by making it more memorable it's
it kind of sediments the knowledge more
in your mind i remember also reading the
sort of the backstory to
to dijkstra's shortest path algorithm
right where he came up with it uh
essentially while he was
sitting down at a at a coffee shop in
amsterdam and he and he came up with
that algorithm over 20 minutes and one
interesting aspect is he didn't have any
pen or paper at the time and so he had
to do it all in his mind and so
there's only so much complexity that you
can handle if you're just thinking about
it in your mind and that like when you
think about the simplicity of dijkstra's
shortest path finding algorithm it's you
know knowing that backstory helps
sediment that algorithm in your mind so
that you don't forget about it as easily
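For readers who want to see why the algorithm is small enough to hold in your head, here is a minimal Python sketch of Dijkstra's shortest path algorithm. It is an illustration only, not Dijkstra's original formulation: the priority queue, the function name `dijkstra`, and the toy graph `roads` are all assumptions made for this example.

```python
import heapq


def dijkstra(graph, source):
    """Return the shortest distance from source to every reachable node.

    graph maps each node to a dict of {neighbor: non-negative edge weight}.
    """
    dist = {source: 0}
    frontier = [(0, source)]  # min-heap of (distance so far, node)
    while frontier:
        d, node = heapq.heappop(frontier)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry: a shorter route was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(frontier, (candidate, neighbor))
    return dist


# Hypothetical toy graph, invented for this example.
roads = {
    "a": {"b": 4, "c": 1},
    "b": {"d": 1},
    "c": {"b": 2, "d": 7},
    "d": {},
}
print(dijkstra(roads, "a"))
```

The whole idea is one loop: always settle the closest unsettled node, then see whether going through it shortens the route to its neighbors.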
it might be from you that i saw
a meme about dijkstra
it's like he's trying to solve it he
comes up with some kind of random path
and then it's like my parents aren't
home and then he does uh
he figures out the algorithm for the
shortest path
i strike through words to convey memes
but that's hilarious i don't know if
it's in post that we construct stories
that romanticize it apparently with
newton there was no apple
especially when you're working on
problems that have a physical
manifestation or a visual manifestation
it feels like the world
could be an inspiration to you
so it doesn't have to be completely in
on paper
like you could be sitting at a bar and
all of a sudden see something and a
pattern will
will spark another pattern and you can
visualize it and rethink a problem in a
particular way
of course you can also load the math
that you have on paper and always carry
that with you so when you show up to the
bar some little inspiration could be the
thing that changes it is there any other
people
almost on the human side whether it's
physics with feynman
dirac einstein or computer science
turing anybody else any backstories
that you remember that jump out
because i'm also referring to
not necessarily these stories where
something magical happens
but these are personalities they have
big egos some of them are super friendly
some of them are like self-obsessed some
of them have anger issues some of them
how do i describe feynman but he appears
to uh
have a
appreciation of the beautiful in all its
forms he has a wit and a cleverness and
a humor about him so it does that come
into play in terms of the construction
of the science
well i think you brought up newton
newton is it's a good example also to
think about his backstory because you
know there's a certain backstory of
newton that people always talk about but
then there's a whole
another aspect of him
that is also a big part of the person
that he was but you know he was really
into alchemy right and that he spent a
lot of time
thinking about that and writing about it
and he took it very seriously he was
really into bible interpretation and
trying to predict things based on the
bible and so there's also a whole
backstory then and of course you need to
look at it in the context that and the
time that when newton lived um but a but
it adds to his personality and it's
important to also understand those
aspects then maybe
you know uh i'm not people people are
not as proud to teach to little kids
but it's important it was part of who he
was and and maybe without those he who
knows what he would have done otherwise
so
well the the cool thing about alchemy
i don't know how it was viewed at the
time
but it almost like to me symbolizes
dreaming of the impossible
like most of the breakthrough ideas kind
of seem impossible until they're
actually done it's like achieving human
flight it's not completely obvious to me
that alchemy is impossible or like
putting myself in the mindset of the
time
and perhaps even still
every everything that uh
you know some of the most incredible
breakthroughs are
would seem impossible
and i wonder the value of
believing
almost like focusing and dreaming of the
impossible such that it is actually is
possible in your mind and that in itself
manifests
whether the accomplishing that goal or
making progress in some unexpected
direction so alchemy almost symbolizes
that for me i distinctly remember having
the same thought of thinking you know
when i learned about atoms and that they
have protons and electrons i was like
okay to make gold you just take whatever
has an atomic weight below it and then
shove another proton in there and then
you have a bunch of gold so like why
don't people do that
it seemed like conceptually is like you
know this sounds feasible you might be
able to do it and you can actually it's
just very very expensive yeah yeah
exactly exactly so in a sense we do have
alchemy and
maybe even back then it wasn't as crazy
that he was so into it
but good people just don't like to talk
about that as much
yeah but newton in general is a very
interesting fellow
anybody else come to mind
in terms of
people that inspire you
in terms of people that you just
are happy that they have once or still
exist on this earth
i think i mean freeman dyson for me
yeah freeman dyson was was
i've had a chance to actually exchange a
couple of emails with him it was
probably one of the most humble
scientists that i've ever met and that
had a a big impact on me we were trying
we're actually trying to convince him to
annotate a paper on fermat's library
and i sent him an email asking him
if you could annotate a paper and his
response was something like i have very
limited knowledge i just know a couple
of things about certain fields i'm not
sure if i'm qualified to do that that
was his first response and
and this was someone that should have
won a nobel prize and worked on a bunch
of different fields um did some really
really
great work
and then just the interactions that i
had with him every time i asked him a
couple of questions about his papers and
uh he always responded saying i'm not
here to answer your questions i just
want to open up more questions
um and uh so that had a big impact on me
it was like just
an example of an extremely humble
yet
accomplished
uh scientist and feynman was also a big
big inspiration in the sense that he was
able to be
you know again extremely talented and
and scientists but at the same time
socially he was able to to he was also
really smart from a social perspective
and he was able to
interact with people it was also a
really good
teacher and also did awesome
work in terms of um
explaining physics to to the masses and
motivating and getting people interested
in physics
and that for me was was also a big
inspiration
yeah i like the childlike curiosity of
some of those folks like you mentioned
freeman i have daniel kahneman i got a
chance to meet and interact with
some some of these truly special
scientists
what makes them special is that even in
uh older age
they're still
like there's still that fire of
childlike curiosity that burns
and uh some of that is like not taking
yourself so seriously that you think
you've figured it all out
but almost like thinking that you don't
know much of it
and
that's like step one in having a great
conversation or collaboration or
exploring a scientific question it's
cool how the very thing that probably
earned people the nobel prize or
or work that's seminal in some way
is the very thing that still burns even
after
uh they've won the prize it's cool to
see and they're rare humans
it seems and to that point i remember
like the last email that i sent to
freeman dyson was like in his last
birthday he was really into number
theory and primes so what i did is i
took like a photo of him a picture and
then i turned that into like
a giant prime number
so i converted the picture into a bunch
of ones and eights and then i moved some
numbers around until it was a prime
um and then i sent him that
also the the visual like it still looked
like the picture it's made up of a
prime that's tricky to do it's hard to
do it looks harder than it actually is
so the the way you do it is like you
convert the darker regions into eights
and the lighter regions into ones
and then just keep flipping yeah
but there's like some primality tests
that are cheaper from a computational
standpoint yes but what it tells you is
it excludes numbers that are not prime
then you end up with a set of numbers
that you don't know if they are prime or
not and then you run the full primality
test on that so you just have to keep
iterating on that and it was it was it's
it's funny because when he got the
picture he was like how did you do that
it was super curious too and then we got
into the details and again this was he
was already 90 i think 92 or something
and that curiosity was still there um
so you could really see that in in some
of these scientists
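The prime-portrait procedure described above (eights for dark regions, ones for light regions, flip digits, screen with a cheap test, then run a full test) can be sketched in Python roughly as follows. This is a sketch under stated assumptions, not the code that was actually used: the function names, the threshold of 128, the trial-division screen, and the use of Miller-Rabin as a stand-in for the "full" primality test are all choices made for this illustration. It assumes an image with at least three pixels.

```python
import random

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def passes_cheap_filter(n):
    """Cheap screen: trial division by a few small primes.

    False means n is definitely composite; True only means "not ruled out yet".
    """
    for p in SMALL_PRIMES:
        if n % p == 0:
            return n == p
    return True


def is_probable_prime(n, rounds=20, rng=random):
    """Miller-Rabin, used here as a stand-in for the expensive 'full' test."""
    if n < 2:
        return False
    if n in SMALL_PRIMES:
        return True
    if not passes_cheap_filter(n):
        return False
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness that n is composite
    return True


def image_to_digits(pixels, threshold=128):
    """Map rows of 0-255 grayscale values to digit rows: dark -> 8, light -> 1."""
    return ["".join("8" if v < threshold else "1" for v in row) for row in pixels]


def prime_portrait(pixels, max_tries=5000, seed=0):
    """Flip a few digits at a time until the whole picture reads as a prime."""
    rng = random.Random(seed)
    rows = image_to_digits(pixels)
    width = len(rows[0])
    base = list("".join(rows))
    base[-1] = "1"  # a number ending in 8 is even, so force the last digit to 1
    for attempt in range(max_tries):
        digits = base[:]
        if attempt > 0:
            # flip one or two digits (never the last one) between 1 and 8
            for pos in rng.sample(range(len(digits) - 1), k=rng.randint(1, 2)):
                digits[pos] = "1" if digits[pos] == "8" else "8"
        n = int("".join(digits))
        if not passes_cheap_filter(n):
            continue  # excluded cheaply, no need for the full test
        if is_probable_prime(n, rng=rng):
            text = "".join(digits)
            return [text[i:i + width] for i in range(0, len(text), width)]
    return None


# Hypothetical 5x5 "image", invented for this example.
toy = [
    [250, 30, 30, 30, 250],
    [30, 250, 250, 250, 30],
    [30, 250, 30, 250, 30],
    [30, 250, 250, 250, 30],
    [250, 30, 30, 30, 250],
]
portrait = prime_portrait(toy)
if portrait is not None:
    print("\n".join(portrait))
```

Flipping only one or two digits per attempt keeps the picture recognizable, which is why the result still looks like the original photo.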
so could we talk about fermat's library
yeah absolutely what
is it
what's the main goal what's the dream
it is a platform for annotating papers
in its essence right and so academic
papers can be one of the densest forms
of content out there and generally
pretty hard to understand at times and
the idea is that you can make them more
accessible and easier to understand by
adding these rich annotations to the
side right and so you can just imagine a
pdf view on your browser and then you
have annotations on each side and then
when you click on them a sidebar expands
and then you have
annotations that support latex and
markdown
and so the idea is that you can
say explain a tougher part of a paper
where there's a step that is not
completely obvious
or you can add more context to it
and then over time papers can become
easier and
easier to understand and can evolve in a
way but it really came from
myself luís and two other friends we've
been
we've had this this long-running habit
of kind of running a journal club
amongst us we come from different
backgrounds right i studied cs luís
studied physics and so we read papers
and present them to each other and uh
and then we tried to bring some of that
online and that's that's that's when we
decided to to to build fermat's library
um
then over time it kind of
grew into into something uh with with a
broader goal uh
and really what we're trying to do is
trying to help
uh move science in the in the the right
direction
that's really the ultimate goal and and
where we want to take it now so there's
a lot to be said so first of all for
people who haven't seen it
the interface is exceptionally well done
that's like execution is really
important here absolutely the other
things just to mention
for
a large number of people apparently
which is new to me don't know what latex
is
so it's spelled like latex so be careful
googling it if you haven't before
uh it's uh
uh
sorry i don't even know the correct
terminology typesetting it's a
typesetting language
where it's you're basically program
writing a program that then generates
something that looks
from a typography perspective beautiful
absolutely and uh so a lot of academics
use it to write papers i i think there's
like a bunch of communities that use it
to write papers i would say it's
mathematics physics computer science
yeah that's yeah that's the because i'm
collaborating currently on a paper with
uh two neuroscientists from stanford and
they don't know latex
so i'm using uh microsoft word and uh
mendeley
and like all of those kinds of things
and it's
and i'm being very zen like about about
the whole process but it's fascinating
it's a little heartbreaking actually
because uh
it actually it's it's funny to say but
uh and we'll talk about open science
actually the bigger mission behind
fermat's library is like
really opening up the world of science
to everybody
is these silly
two facts of like one community uses
latex and another uses word
is actually a barrier between them
that's like it's like boring and
practical in a sense but it makes it
very difficult to collaborate
just on that like i think there if there
are some people that should have
received like a nobel prize but
will never get it and i think one of
those is like donald knuth because of
tex and latex and then
because it had a huge impact in terms of
like just
making it easier for uh researchers to
put their content out there like making
it uniform as much as possible oh you
mean like a nobel peace prize well maybe
maybe a couple of peace prizes
maybe a nobel peace prize yeah
i
i think so i mean he at a very young age
got the turing award for his work in
algorithms and so on so yeah like an
incredible yeah like when i i think it's
in
it might be even the 60s but i think
it's the 70s that so when he was really
young and then he went on to do like
incredible work
with his book and uh yeah with tex that
people don't know and and going back
just one
on the reason why we we ended up because
i think this is interesting the reason
why we ended up using the name fermat's
library this was because of uh fermat's
last theorem and fermat's last theorem is
actually a funny story like so pierre de
fermat he was like a lawyer and he
wrote like on a book
that he had a solution to fermat's last
theorem which um but that didn't fit the
margin of that book
and so fermat's last theorem basically
states that there's no solution if you
have uh
integers a b and c there's no solution
to a to the power of n plus b to the
power of n equals to c to the power of
n
if n is bigger than two so there's
there's there's no solutions and
he said that
and
that problem remained open for almost
300 years i believe and a lot of the
most famous mathematicians tried to
tackle that problem no one was able to
figure that that out until andrew wiles
uh i think was in in the 90s was able to
publish the solution which was i i
believe almost 300 pages long
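Written out in symbols, the statement described above is usually given as follows. This is the standard formulation; the requirement that a, b and c be positive integers is left implicit in the conversation.

```latex
\[
  a^{n} + b^{n} = c^{n}
  \quad \text{has no solutions in positive integers } a,\, b,\, c
  \text{ for any integer } n > 2 .
\]
```

For n = 2 solutions do exist, for example 3^2 + 4^2 = 5^2, which is why the theorem only starts at n greater than two.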
and so it's kind of an anecdote that you
know there's a lot of of knowledge and
insights that can be trapped
in the margins then you and there's a
lot of potential energy that you can
release if you actually um spend some
time trying to digest
that and that was the the the origin
story for
for the name yes you can share the
contents of the margins with the world
exactly that could inspire a solution or
a communication that then leads to a
solution but and and if you think about
papers like papers are as as joão was
saying probably one of the densest
pieces of text that
any human can read and you have these
researchers like some of the brightest
minds in in these fields working on like
new discoveries and publishing these
work on journals that are imposing them
restrictions in terms of the number of
pages that they can have to explain a
new scientific breakthrough so at the
end of the day papers are not optimized
for clarity and for a proper explanation
of of that content because there are so
many restrictions so there's as i
mentioned there's a lot of potential
energy that can be freed if you actually
try to digest a lot of the contents of
papers
can you explain some of the other things
so margins librarian journal club
so journal club is what a lot of people
know us for uh where we every week we
release an annotated paper and in all
sorts of different fields with physics
cs math
margins is kind of the same software
that we use to to run the journal club
and to host the annotations but we've
made that available for free to anybody
that wants to use it and so
folks use it at universities and
and
for running journal clubs
and and so we just made that freely
available and then librarian is a
browser extension that we developed that
is sort of an overlay on top of arxiv
so it's about bringing some of the same
functionality around comments plus
adding some extra
niceties to to arxiv like being able
to very easily extract the references of
a paper that you're looking at or being
able to extract the bibtex in order to
cite that paper yourself
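As a rough illustration of what "extracting the bibtex" for a preprint amounts to, the Python sketch below formats paper metadata as a `@misc` entry with the `eprint`, `archivePrefix` and `primaryClass` fields commonly used for arXiv preprints. It is not how Librarian works internally; the function name `arxiv_bibtex` and the citation key are hypothetical, and the example metadata is for the GANs paper mentioned earlier in the conversation.

```python
def arxiv_bibtex(key, title, authors, year, arxiv_id, primary_class):
    """Format preprint metadata as a @misc BibTeX entry (illustrative helper)."""
    fields = [
        ("title", title),
        ("author", " and ".join(authors)),
        ("year", str(year)),
        ("eprint", arxiv_id),
        ("archivePrefix", "arXiv"),
        ("primaryClass", primary_class),
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@misc{{{key},\n{body}\n}}"


entry = arxiv_bibtex(
    key="goodfellow2014gan",  # hypothetical citation key
    title="Generative Adversarial Networks",
    authors=[
        "Ian J. Goodfellow", "Jean Pouget-Abadie", "Mehdi Mirza", "Bing Xu",
        "David Warde-Farley", "Sherjil Ozair", "Aaron Courville", "Yoshua Bengio",
    ],
    year=2014,
    arxiv_id="1406.2661",
    primary_class="stat.ML",
)
print(entry)
```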
so it's an overlay on top of arxiv the
idea is that you can have that
commenting interface without having to
leave arxiv it's kind of incredible i
didn't know about it
and once i've learned of it
it's like holy shit
why isn't it more popular given how
popular arxiv is like everybody should
be using it arxiv sucks
or uh let me rephrase that it's limited
yeah in terms of what's interesting
arxiv is a pretty incredible project
right and it is in in a way it's
it you know it the growth has been
completely linear over time if you look
at like number of papers published on
arxiv like you know it's just been
it's pretty much a straight line for the
past 20 years especially for you know
like if you're coming from a startup
background and then you were trying to
do arxiv you'd probably try like all
sorts of growth hacks and like try to
to then maybe like have paid features
and things like that and that would kind
of maybe ruin it and so there's
there's a subtle balance there yeah and
i don't know what what aspects you can
change about it and yeah for some tools
in science it just takes time for them
to to grow arxiv just turned 30 i
believe yeah and for for people that
don't know arxiv is this kind of
online repository where people put
preprints which are versions of the
papers before they actually make it to
journals
a-r-x-i-v exactly for people who don't
know and it's actually a really vibrant
place to publish your papers in in the
aforementioned
uh communities of mathematics exactly in
computer science it started with
mathematics and physics and then over
the the last 30 years it evolved and now
actually computer computer science now
it's it's a more popular category than
than physics and math on arxiv and
there's also which i don't know very
much about like a
biology medical version of that
biorxiv yeah biorxiv um it's recent
it's um it's interesting because if you
look at like these um platforms for
preprints they are
they actually play a super important
role because
if you look at a category like math
for some papers in math it might take
close to three years
after you click upload paper on the
journal website and the paper gets
published on the website of the journal
so this is literally the longest
upload period on the internet
um and during those three years like
it's it's you know
their content is just you know locked
and so this that's why it's so important
for people to have websites like arxiv
so that you can share that before it
goes to the journal with the rest of the
world it was actually on arxiv that
uh perelman published the three
papers that led to the proof of the
poincaré conjecture and then you
have other fields like
machine learning for instance where the
the field is evolving at such a high
rate that people don't even wait before
the papers go to journals before they
start working on top of those papers so
they publish them on arxiv then other
people see them they start working on
that and arxiv did a really good job
at like building that core platform to
host papers but i i think there's a
really really big opportunity in
building more features on top of that
platform apart from just hosting paper
so collaboration annotations and
like having other things apart from from
papers like code um
and and other things because uh in the
field like machine learning there's a
really big you know as i mentioned
people start working on on top of
preprints and they are assuming that
that
that preprint is correct
but you really need a way for instance
to maybe
it's not peer review but
distinguish what is good work from bad
work on arxiv how do you do that so
like a commenting interface like
librarian it's useful for that so that
you can distinguish that um at
in the field that is growing so fast as
machine learning and um and then you
have
platforms that focus for instance on
just biology biorxiv is a good
example um
biorxiv is also super interesting
because there there's actually
an interesting experiment that was run
in the 60s so in the 60s the nih um
supported this pro this
this experiment called the information
exchange group
which at the time was a way for
researchers to share biology preprints
via mail or using libraries and that
project in the 1960s got cancelled six
years after it started and it was due to
intense pressure from the journals to
kill that project because they they were
fearing a competition from from the
uh
for in for the journal industry crick uh
was also uh was one of the famous
scientists that opposed
to to the uh information exchange group
and it's interesting because right now
if you analyze the number of biology
papers that
appear first as preprints it's only two
percent of the papers
and it this was almost 50 almost 50
years after that first experiment so you
can see like that pressure from the
journals to cancel that uh initial
version of a pre-print repo had a
tremendous impact on on on the number of
papers that are showing up in biology as
preprints so it delayed a lot that
that revolution and um
but now platforms like biorxiv are
doing that work but there's still a lot
of room for growth there and i think
it's super important because those are
the papers that are open that everyone
can read
okay so but if we just look at the
entire process of science as a big
system can we just talk about how it can
be revolutionized
so
you have an idea
uh depending on the field you want to
make that idea concrete you want to run
a few experiments in computer science
there might be some code
there'd be a data set
for you know some of the more sort of
biology
psychology
you might be collecting the data set
that's called you know a study right
so that's part of that that's part of
the methodology and so you are putting
all that into a paper form
and then
you have some results
and then you you submit that to a place
for
review through the peer review process
and there's a process where how would
you summarize the peer review process
but it's it's really just like a handful
of people look over your paper and
comment and based on that decide whether
your paper is good or not
so there's a whole broken nature to it
at the same time i love the peer review
process when i buy stuff on amazon
like uh
for like uh the commenting system
whatever that is so okay so there's a
bunch of possibilities for revolutions
there and then there's the other side
which is the collaborative aspect of the
science which is people annotating
people commenting sort of the low effort
collaboration which is a comment
sometimes as you've talked about a
comment can change everything but you
know or a higher effort collaboration
like more like maybe annotations or even
like contributing to the paper you can
think of like
a
collaborative updating of the paper over
time
so there's all these possibilities for
doing things
better than they've been done
can we talk about some ideas in this
space some ideas that you're working on
some ideas that uh you're not yet
working on but should be revolutionized
because it does seem
that arxiv and like open review for
example
are like the craigslist of science like
like
yeah okay i'm very grateful that we have
it but it just feels like
it's like 10 to 20 years like it doesn't
feel like that's a feature the
simplicity of it is a feature it feels
like it's a it's a bug
but then again the the pushback there is
uh wikipedia has the same kind of
simplicity to it
and it seems to work exceptionally well
in the crowdsourcing aspect of it i'm
sorry this there's a bunch of stuff
going on on the table let's just pick
random things that we can talk about
wikipedia you know for me it's the
cosmological constant of the internet
it's like i think we are lucky to live
in the parallel universe where wikipedia
exists yes because if if someone had
pitched me wikipedia like a publicly
edited
encyclopedia like a couple of years ago
like it would be i don't know how many
people would have said that that would
have
survived
yeah i mean it makes almost no sense
it's like having a google doc that
everybody on the internet can edit and
like that will be like the most reliable
source for for knowledge and i don't
know how many but hundreds of thousands
of topics yeah exactly
it's insane it's insane and like you
have and then you have users like
there's one a single user that edited
one third of the articles on wikipedia
so you have these really really big
power users that are a substantial part
of like what makes wikipedia
successful and so
like
no one would have ever imagined that
that could happen
um and so that that's that's one thing i
i completely agree with what you just
said also sorry to interrupt briefly
maybe let's inject that into the
discussion of everything else
i also believe i've seen that with stack
overflow that one individual or a small
collection of individuals contribute or
revolutionize
most of the community like if you create
a really powerful system for arxiv
or like open review that made it really
easy
and compelling
and exciting for one person who is
like a 10x contributor to do their thing
that's going to change everything it
seems like that was the mechanism that
changed everything for wikipedia and
that's the mechanism that changed
everything for stack overflow is
gamifying or making it exciting or just
making it fun or pleasant or fulfilling
in some way for those people who are
insane
enough to like answer thousands of
questions or
write thousands of factoids and like
research them and check them all those
kinds of things or read thousands of
papers yeah no stack overflow is another
great example of that and it's just
and those are both
incredibly productive communities that
generate a ton of value and and and
capture almost none of it right and it's
and you know in a way it's almost like
counter um
it's very counter-intuitive that that
that people that these communities would
exist and thrive um
and and it's really hard to
you there aren't that many communities
like that so how do we do that for
science do you have ideas there like
what are the biggest problems that you
see you're working on some of them
like just on that there are a couple of
really interesting experiments that
people are running an example would be
like the polymath projects so this is a
so kind of a social experiment that was
uh created by tim gowers
a fields medalist and his idea was
to try to prove that it is possible to
do mathematics in a massively
collaborative way on the
internet so they decided to pick a couple
of problems and
test that and they found out that
it actually is possible for specific
types of problems
namely problems that you're able to
break down in in little pieces and go
step by step you might need as as with
open source you might need people that
are just kind of reorganizing the the
house every once in a while and then you
know people throw a bunch of ideas and
then you know you make some progress
then you reorganize you reframe the
problem you go step by step but they
were actually able to prove that it is
possible to
uh collaborate online and
make progress in terms of mathematics um
and so i'm i'm confident that there are
other avenues that could be explored
here can we talk about peer review for
example absolutely i think like in
terms of peer review i think
it's important to look at the bigger
picture here
of what the scientific
publishing ecosystem looks like because
for me
there there are a lot of things that are
wrong about that entire process so
if you look for instance at what
publishing means in like a traditional
journal you have uh journals that pay um
authors
for their articles and then they might
pay like reviewers to um review those
articles and finally they pay
people to um or distributors to
distribute the content
in in the scientific publishing world
you have scientists that are usually
backed by government grants they are
giving away their work for free in the
form of papers
and then you have other scientists that
are reviewing their work
this process is known as the peer review
process again for free
and then finally we have um
government-backed universities and
libraries that are buying back
all those
all that work so that other scientists
can read so this is for me it's
bizarre you have the government that is
funding the research is paying the
salaries of the scientists it's paying
the salaries of the reviewers and it's
buying back all that uh the product of
their work again
um and i think the problem with this
system and why it's so
difficult to break this
suboptimal equilibrium is because of
the way academia works right now and the
way you can progress in in your academic
life
and and so
in a lot of fields the the competition
in academia is is really insane
so you have hundreds of phd students
that are um trying to get to
a professor position and and it's hyper
competitive and the only way for you to
get there
is if you publish papers ideally in
journals with a
high impact factor in computer science
it's all it's often conferences are also
very prestigious or actually more
prestigious than journals now
so interesting so that's the one
discipline where i mean that has to do
with the thing we've discussed uh in
terms of the how quickly the field turns
around but like uh neurips cvpr those
conferences are more prestigious or at
the very least as prestigious as the
journal
but doesn't matter the process is what
it is and so for
people that don't know the impact
factor of a journal is basically the
average number of citations that a paper
would get if it gets published on that
journal
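The definition just given can be sketched in a few lines of Python. This is a minimal illustration, not how any indexing service computes its numbers: the function names and the citation counts are made up, and the two-year window is the commonly cited form of the metric (citations received in one year to items published in the previous two years, divided by the number of those items). The second half shows why a journal-level average says little about any single paper.

```python
from statistics import mean, median


def two_year_impact_factor(citations_this_year, items_prev_two_years):
    """Citations received this year to items from the previous two years,
    divided by the number of those items (hypothetical helper)."""
    if items_prev_two_years <= 0:
        raise ValueError("need at least one published item")
    return citations_this_year / items_prev_two_years


# made-up citation counts for seven papers in one imaginary journal
citations = [0, 0, 1, 1, 2, 3, 150]

average = mean(citations)    # journal-level average, pulled up by one paper
typical = median(citations)  # what a typical paper in the journal received

print(f"average citations per paper: {average:.1f}")
print(f"median citations per paper:  {typical}")
```

With numbers this skewed, every paper in the journal inherits the same average even though most of them were barely cited.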
but so um you can really think that
the problem with the the impact factor
is that it's a way to turn papers into
accounting units
and and and let me unpack this because
it's the impact factor is almost like a
nobility title so because papers are
born with impact even before anyone
reads them so the researchers they don't
have the incentive
to care about whether this paper is going to
ever have a long-term impact on the
world what they care about their
end goal is for the paper to get published
yes so that they get that value up front
and so for me that that is one of the
problems of of that and that really
creates a tyranny of of metrics
because at the end of the day if you are
a dean what you want to hire is like
people researchers that publish papers
on journals with high impact factors
because that will increase the ranking
of your university and will allow you to
charge more for tuition so on and so
forth and um and and that that
especially when you are in super
competitive areas you know that people
will try to gamify that system and and
misconduct starts showing up
um there's a a really interesting book
on this topic called gaming the metrics
it's a book by a researcher called mario
biagioli it goes a lot into like how
these
the impact factor and metrics affect
science negatively and it's interesting
to think especially in terms of
citations if you look at the early work
of like looking at citations there was a
lot of work that was done by a guy
called eugene garfield and in
the early work on citations they
wanted to use
citations
from a descriptive point of view so what
they wanted to to create was a map
and and that map would create a visual
representation of of influence so
citations would be links
between papers and ideally what they
would represent is that
you read someone else's paper and it had
an impact on your research they weren't
supposed to be counted i think this
inspired like larry and sergey's exactly
work right for google exactly i think
they even mentioned that but what
happens is like as you start counting
citations you create a market
and the same way like and this
the work of eugene garfield was a
big inspiration for larry and sergey for
the pagerank algorithm that um you know
led to the creation of google and they
even recognize that and and if you think
about it's like the same way there's a
gigantic market for search engine
optimization uh seo where people try to
optimize you know the pagerank and
how uh a web page will rank on
google the same will happen for papers
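To make the link between citations and ranking concrete, here is a simplified power-iteration sketch of the PageRank idea applied to a tiny citation graph. It assumes the textbook formulation with a damping factor of 0.85; the paper names and the graph are invented for illustration, and this is not a description of what any search engine actually runs.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each paper to the list of papers it cites."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # every paper starts each round with the same small baseline score
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # a paper passes its score on to the papers it cites
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # a paper that cites nothing spreads its score evenly
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# hypothetical citation graph: paper -> papers it cites
citations = {
    "a": ["c"],
    "b": ["c"],
    "c": ["d"],
    "d": [],
}

scores = pagerank(citations)
for paper in sorted(scores, key=scores.get, reverse=True):
    print(paper, round(scores[paper], 3))
```

Because the score depends only on who links to whom, anyone who can add links can move it, which is the opening that search engine optimization exploits on the web and that citation gaming exploits for papers.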
people will try to optimize like their
impact factors and the
citations that they get and that um
creates a really big problem and it's
super interesting to actually analyze
them if you look at the distribution of
the impact factors of
journals you have like nature with
nature i believe it's like in the low
40s and then you have i believe science
is high 30s and then you have a really
good set of good journals that
will
fall between 10 and 30 and then you have
a gigantic tail of journals that have
impact factor below two and you can
really see two economies here you see
the the
you know the universities that are maybe
less prestigious less known that where
the faculty are pressured to just
publish papers regardless of the journal
what i want to do is increase the
ranking of my university and so they end
up publishing as many papers as as they
they can in like journals with low
impact factor and unfortunately this
represents a lot of the global south
and then you have the luxury good
economy
so for instance for and there are also
problems here in the luxury good economy
so if you look at the journal like
nature
so with impact factor of like in the low
40s
there's no way that you're going to be
able to sustain that level of impact
factor by just grabbing the attention of
scientists
what what i mean by that is like
for the articles that
get published in nature they need to be
new york times grade so they need to
make it to the you know to the
big media they need to be captured by
the big media because that's the
only way for you to capture enough
attention to sustain that level of
citations yes and that of course creates
problems because people then will try to
again gamify the system and have like
titles or abstracts that
make claims that are bigger
than what actually can be um
you know sustained by the data or
the content of the paper and you'll have
clickbait titles or clickbait abstracts
and again this is all a consequence of
metrics and uh scientometrics and and
this is a very dangerous cycle that i
think it's very hard to break
but it's happening in academia in a lot
of fields right now
is it fundamentally the existence of
metrics or the metrics just need to be
significantly improved
because uh
like i said the metrics used for amazon
for purchasing
i don't know
computer parts it's pretty damn good in
terms of selecting which are the good
ones which are not
in that same way if if we had an amazon
type of review system in the space of
ideas in the space of science it feels
like those metrics would be a
little bit better
sort of when it's um
when it's significantly
more open to the crowd source nature of
the internet
of the of the scientific internet
meaning as opposed to like my biggest
problem with peer review
has always been
that it's like
five six seven people
usually even less and it's often
nobody's incentivized to do a good job
in the whole process
meaning
it's anonymous
in a way that
doesn't incentivize like doesn't gamify
or incentivize
great work
and also
it doesn't necessarily have to be
anonymous like there has to be um
the entire system
doesn't encourage actual sort of
rigorous review for example like
openreview
does kind of incentivize that kind of
process of collaborative review but it's
also imperfect it just feels like
the thing that amazon has which is like
thousands of people contributing their
reviews to a product
it feels like that could be applied to
science
where
the same kind of thing you're doing with
fermat's library
but doing at a scale that's much larger
it feels like that should be possible
given the number of grad students
given the number of um
general public that get like for example
i
personally as a person
who got an education in mathematics and
computer science like
uh i can i can be a quote-unquote like
reviewer
on a lot bigger set of things than than
is my exact uh expertise
if i'm one of thousands of reviewers if
i'm the only reviewer or one of five
then i'd better be like an expert in the
thing but if if i uh and i've learned
this with covid which is like
you can just use your basic skills as a
data analyst to contribute to
the review process on a particular
little aspect of a paper and be able to
comment be able to sort of uh
draw in some references that challenge
the ideas presented or to enrich the
ideas that are presented or you know and
it just feels like crowdsourcing
the review process would be able to
allow you to have
metrics
in terms of how good a paper is that are
much more representative of its actual
impact in the world of its actual value
to the world as opposed to some kind of
arbitrary gamified
version of its impact
i agree with that i think there's
definitely the possibility at least for
a more resilient system
than what we have today and that's i
think that's kind of what you're
describing lex and i mean to an
extent we we kind of have like a little
bit of a
heisenberg uncertainty principle when
you pick a metric as soon as you do it
then maybe it works as a good heuristic
for for a short amount of time but soon
enough people would start gamifying and
yeah
but but then you can definitely have
metrics that are more resilient to
gamification and they'll work as a
better heuristic to to try to push you
in
the best direction
but i guess the underlying problem
you're saying is uh there's a shortage
of positions in academia that's a big
problem for me yeah and and that and so
they're going to be constantly gamifying
the metrics it's a bit of a zero-sum
it's very competitive it's a
very competitive field and and that's
what usually happens in very competitive
fields yeah yeah
but i think some of like the peer review
problems like scale helps i think and
and it's interesting to look at like
what you're mentioning breaking it down
maybe in smaller parts and having
more people jumping in
um but
this is definitely a problem and and
the peer review problem as i mentioned
is
is correlated with the problem of like
academic career progression and it's all
intertwined and that's why i
think it's so hard to break it
um there are like a couple of really
interesting things that are being done
right now there are a couple of for
instance journals that are overlay
journals on top of
platforms like arxiv and biorxiv
that want to remove like the more
traditional journals from the equation
so essentially a journal is just a
collection of links to papers and and um
and what they are trying to do is like
removing that middleman and trying to to
make the review process a little bit
more transparent
um
and and and not charging universities
like uh
there are a couple of more famous um
ones there's one discrete analysis in
mathematics there's one uh called the
quantum journal which we are actually
working with them we have a partnership
with them for the papers that get
published in quantum journal they also
get the annotations on fermat's um and
they are doing pretty well they've been
able to grow substantially the problem
there is getting to critical mass so
it's again convincing the researchers
and especially the young researchers
that need
need that impact factor need those
publications to have citations to not
publish on the traditional journal and
go on an open journal and and publish
their work there i think there are
a couple of really high-profile
scientists people like tim gowers
that are trying to
incentivize like
famous scientists that already have
tenure and that don't need that to
publish there to increase the reputation
of those journals so that other maybe
younger scientists can start publishing
on on those as well and so they can try
to break that vicious cycle of
of um the more traditional journals i
mean another possible way to break this
cycle is to
like raise public awareness and just by
force like ban paid journals
like what exactly are they contributing
to the world
like basically making it illegal
to uh
forget the fact that it's mostly
federally funded so that's that's a
super ugly picture too
but like why should knowledge be
so expensive
like where everyone is working for the
public good
and then there's these gatekeepers
that you know most people can't read
most papers
without having to pay money and
that's that doesn't make any sense
that's like that that should be illegal
i mean that's what you're saying is
exactly right i mean for instance right
i went to school here in the us luís
studied in europe and
he would like he'd ask me all the
time to download papers and send them to
him because he just couldn't get them and
like papers that he needed for his
research and so but he's a student like
he's yeah he's a grad student he was a
grad student but that you know
i'm even referring to just regular
people oh yeah okay that too yeah and i
i think uh during 2020
because of covid a lot of journals put
down the
walls for certain kinds of coronavirus
papers
but like that just gave me an indication
that like
this should be done for everything it's
it's absurd like people should be
outraged that there's these gates
because
so the moment you dissolve the journals
then there will be an opportunity for
startups
to uh build stuff on top of arxiv it'd
be an opportunity for like
fermat's library to step up to scale up
to something much even larger i mean
that was the original dream of uh
google which i
always admired which is make the world's
information accessible actually it's
interesting that google hasn't maybe you
guys can correct me but they uh put
together google scholar which is
incredible
but they and they did the scanning of
books but they haven't really
tried to make science accessible
in the following way like
besides doing google scholar they
haven't like
delved into the papers
right mm-hmm which is especially curious
given what luís was saying right that
it's kind of in their genesis there's
this
you know research that was very
connected with how papers reference each
other and like building a network out of
that
interestingly enough like google but i
think that was not intentional
google plus which was like the google social
network that got canceled was used by a
lot of researchers yes it was uh though
i think it was just a you know kind of
a side effect but then a lot of people
ended up migrating to twitter but it was
not on purpose but yeah i agree with you
like they haven't
um gone past google scholar and well
you know what that said google scholar is
incredible people who are not familiar
it's one of t