Luís and João Batalha: Fermat's Library and the Art of Studying Papers | Lex Fridman Podcast #209
ndMahzDCH1Y • 2021-08-09
the following is a conversation with luís and joão batalha brothers and co-founders of fermat's library which is an incredible platform for annotating papers as they write on the fermat's library website just as pierre de fermat scribbled his famous last theorem in the margins professional scientists academics and citizen scientists can annotate equations figures ideas and write in the margins fermat's library is also a really good twitter account to follow i highly recommend it they post little visual factoids and explorations that reveal the beauty of mathematics i love it quick mention of our sponsors skiff simplisafe indeed netsuite and four sigmatic check them out in the description to support this podcast as a side note let me say a few words about the dissemination of scientific ideas i believe that all scientific articles should be freely accessible to the public they currently are not in one analysis i saw more than 70 percent of published research articles are behind a paywall in case you don't know the funders of the research whether that's government or industry aren't the ones putting up the paywall the journals are the ones putting up the paywall while using unpaid labor from researchers for the peer review process where is all that money from the paywall going in this digital age the costs here should be minimal this cost can easily be covered through donation advertisement or public funding of science the benefit versus the cost of all papers being free to read is obvious and the fact that they're not free goes against everything science should stand for which is the free dissemination of ideas that educate and inspire science cannot be a gated institution the more people can freely learn and collaborate on ideas the more problems we can solve in the world together and the faster we can drive old ideas out and bring new better ideas in science is beautiful and powerful and its dissemination in this digital age should be free this is
the lex fridman podcast and here's my conversation with luís and joão batalha luís you suggested an interesting idea imagine if most papers had a backstory section the same way that they have an abstract so knowing more about how the authors ended up working on a paper can be extremely insightful and then you went on to give a backstory for the feynman qed paper this is all in a tweet by the way we're doing tweet analysis today how much of the human backstory do you think is important in understanding the idea itself that's presented in the paper or in general i think this gives way more context to the work of scientists i think a lot of people have this almost kind of romantic misconception that the way a lot of scientists work is almost as the sum of eureka moments where all of a sudden they sit down and start writing two papers in a row and the papers are usually isolated and when you actually look at it the papers are you know chapters of a way more complex uh story and the feynman qed paper is a good example so feynman was actually going through a pretty dark phase before writing that paper he had lost enthusiasm with physics and doing physics problems and there was one time when he was in the cafeteria of cornell and he saw a guy that was throwing plates in the air and he noticed that when the plate was in the air there were two movements there the plate was wobbling but he also noticed that the cornell symbol was rotating and he was able to figure out the equations of motion of those uh plates and that uh led him to kind of think a little bit about electron orbits in relativity which led to the paper about quantum electrodynamics so that kind of reignited his interest in physics and he ended up publishing the paper that led to his nobel prize basically and i think there are a lot of really interesting backstories about papers that readers never get to know for instance we did a
couple of months ago um an ama around a paper a pretty famous paper the gans paper with ian goodfellow and so we did an ama where everyone could ask questions about the paper and ian was responding to those questions he was also telling the story of how he got the idea for that paper in a bar so that was also an interesting uh backstory i also read a book by cédric villani uh cédric villani is this mathematician the fields medalist and in his book he tries to explain how he got from like a phd student to the fields medal and he tries to be as descriptive as possible about every single step of how he got to the fields medal and it's interesting also to see just the amount of random interactions and discussions with other researchers sometimes over coffee and how it led to like fundamental breakthroughs and some of his most important papers so i think it's super interesting to have that context of the backstory well the ian goodfellow story is kind of interesting and perhaps that's true for feynman as well i don't know if it's romanticizing the thing but it seems like just a few little insights and a little bit of work does most of the leap required do you have a sense that for a lot of the stuff you've looked at just looking back through history uh it wasn't necessarily the grind of like andrew wiles with fermat's last theorem for example it was more like a brilliant moment of insight in fact ian goodfellow has a kind of sadness to him almost in that at that time in machine learning like at that time especially uh for gans you could code something up really quickly on a single machine and almost do the invention go from idea to uh experimental validation in like a single night a single person could do it and now there's kind of a sadness that a lot of the breakthroughs you might have in machine learning kind of require large-scale experiments so it was almost like the early days so i wonder how many low-hanging fruit there are in science
and mathematics and even engineering where it's like you could do that little experiment quickly like you have an insight at a bar why is it always a bar but you have an insight at a bar and then just implement and the world changes it's a good point i think it also depends a lot on the maturity of the field when you look at a field like mathematics like it's a pretty mature field uh fields like machine learning um it's growing pretty fast and um it's actually pretty interesting i looked up like the number of new papers on arxiv with the keyword machine learning and like 50 percent of those papers have been published in the last 12 months so you can see just the five zero fifty percent so you can see the magnitude of growth in that field and so i think like as fields mature like those types of moments i think naturally uh are less frequent um it's just a consequence of that the other point that is interesting about the backstory is that it can really make it more memorable in a way and by making it more memorable it kind of sediments the knowledge more in your mind i remember also reading the sort of the backstory to dijkstra's shortest path algorithm right where he came up with it uh essentially while he was sitting down at a coffee shop in amsterdam and he came up with that algorithm over 20 minutes and one interesting aspect is he didn't have any pen or paper at the time and so he had to do it all in his mind and so there's only so much complexity that you can handle if you're just thinking about it in your mind and that like when you think about the simplicity of dijkstra's shortest path finding algorithm it's you know knowing that backstory helps sediment that algorithm in your mind so that you don't forget about it as easily it might be from you that i saw a meme about dijkstra it's like he's trying to solve it he comes up with some kind of random path and then it's like my parents aren't home and
then he does uh he figures out the algorithm for the shortest path i strike through words to convey memes but that's hilarious i don't know if it's in post that we construct stories that romanticize it apparently with newton there was no apple especially when you're working on problems that have a physical manifestation or a visual manifestation it feels like the world could be an inspiration to you so it doesn't have to be completely on paper like you could be sitting at a bar and all of a sudden see something and a pattern will spark another pattern and you can visualize it and rethink a problem in a particular way of course you can also load the math that you have on paper and always carry that with you so when you show up to the bar some little inspiration could be the thing that changes it are there any other people almost on the human side whether it's physics with feynman dirac einstein or computer science turing anybody else any backstories that you remember that jump out because i'm also referring to not necessarily these stories where something magical happens but these are personalities they have big egos some of them are super friendly some of them are like self-obsessed some of them have anger issues some of them how do i describe feynman but he appears to uh have an appreciation of the beautiful in all its forms he has a wit and a cleverness and a humor about him so does that come into play in terms of the construction of the science well i think you brought up newton newton is a good example also to think about his backstory because you know there's a certain backstory of newton that people always talk about but then there's a whole other aspect of him that is also a big part of the person that he was you know he was really into alchemy right and he spent a lot of time thinking about that and writing about it and he took it very seriously he was really into bible interpretation and trying to predict things based on the
bible and so there's also a whole backstory there and of course you need to look at it in the context of the time when newton lived um but it adds to his personality and it's important to also understand those aspects that maybe you know uh people are not as proud to teach to little kids but it's important it was part of who he was and maybe without those who knows what he would have done otherwise so well the cool thing about alchemy i don't know how it was viewed at the time but it almost like to me symbolizes dreaming of the impossible like most of the breakthrough ideas kind of seem impossible until they're actually done it's like achieving human flight it's not completely obvious to me that alchemy is impossible or like putting myself in the mindset of the time and perhaps even still everything that uh you know some of the most incredible breakthroughs would seem impossible and i wonder the value of believing almost like focusing and dreaming of the impossible such that it actually is possible in your mind and that in itself manifests whether the accomplishing that goal or making progress in some unexpected direction so alchemy almost symbolizes that for me i distinctly remember having the same thought of thinking you know when i learned about atoms and that they have protons and electrons i was like okay to make gold you just take whatever has an atomic weight below it and then shove another proton in there and then you have a bunch of gold so like why don't people do that it seemed like conceptually you know this sounds feasible you might be able to do it and you can actually it's just very very expensive yeah yeah exactly exactly so in a sense we do have alchemy and maybe even back then it wasn't as crazy that he was so into it but people just don't like to talk about that as much yeah but newton in general is a very interesting fellow anybody else come to mind in terms of people that
inspire you in terms of people that you just are happy that they have once or still exist on this earth i think i mean freeman dyson for me yeah freeman dyson was i've had a chance to actually exchange a couple of emails with him he was probably one of the most humble scientists that i've ever met and that had a big impact on me we were actually trying to convince him to annotate a paper on fermat's library and i sent him an email asking him if he could annotate a paper and his response was something like i have very limited knowledge i just know a couple of things about certain fields i'm not sure if i'm qualified to do that that was his first response and this was someone that should have won a nobel prize and worked on a bunch of different fields um did some really really great work and then just the interactions that i had with him every time i asked him a couple of questions about his papers uh he always responded saying i'm not here to answer your questions i just want to open more questions um and uh so that had a big impact on me it was like just an example of an extremely humble yet accomplished uh scientist and feynman was also a big big inspiration in the sense that he was able to be you know again an extremely talented scientist but at the same time he was also really smart from a social perspective and he was able to interact with people he was also a really good teacher and also did awesome work in terms of um explaining physics to the masses and motivating and getting people interested in physics and that for me was also a big inspiration yeah i like the childlike curiosity of some of those folks like you mentioned freeman i have daniel kahneman i got a chance to meet and interact with some of these truly special scientists what makes them special is that even in uh older age there's still that fire of childlike curiosity that burns and uh some of
that is like not taking yourself so seriously that you think you've figured it all out but almost like thinking that you don't know much of it and that's like step one in having a great conversation or collaboration or exploring a scientific question it's cool how the very thing that probably earned people the nobel prize or work that's seminal in some way is the very thing that still burns even after uh they've won the prize it's cool to see and they're rare humans it seems and to that point i remember like the last email that i sent to freeman dyson was like on his last birthday he was really into number theory and primes so what i did is i took like a photo of him a picture and then i turned that into like a giant prime number so i converted the picture into a bunch of ones and eights and then i moved some numbers around until it was a prime um and then i sent him that also the visual like it still looked like the picture it's made up of a prime that's tricky to do it's hard to do it looks harder than it actually is so the way you do it is like you convert the darker regions into eights and the lighter regions into ones and then you just keep flipping yeah but there's like some primality tests that are cheaper from a computational standpoint yes but what it tells you is it excludes numbers that are not prime then you end up with a set of numbers that you don't know if they are prime or not and then you run the full primality test on that so you just have to keep iterating on that and it's funny because when he got the picture he was like how did you do that he was super curious too and then we got into the details and again he was already 90 i think 92 or something and that curiosity was still there um so you could really see that in some of these scientists so could we talk about fermat's library yeah absolutely what is it what's the main goal what's the dream it is a platform for annotating papers in its essence
right and so academic papers can be one of the densest forms of content out there and generally pretty hard to understand at times and the idea is that you can make them more accessible and easier to understand by adding these rich annotations to the side right and so you can just imagine a pdf view on your browser and then you have annotations on each side and then when you click on them a sidebar expands and then you have annotations that support latex and markdown and so the idea is that you can say explain a tougher part of a paper where there's a step that is not completely obvious or you can add more context to it and then over time papers can become easier and easier to understand and can evolve in a way but it really came from myself luís and two other friends we've had this long-running habit of kind of running a journal club amongst us we come from different backgrounds right i studied cs luís studied physics and so we read papers and present them to each other and uh and then we tried to bring some of that online and that's when we decided to build fermat's library um then over time it kind of grew into something uh with a broader goal uh and really what we're trying to do is trying to help uh move science in the right direction that's really the ultimate goal and where we want to take it now so there's a lot to be said so first of all for people who haven't seen it the interface is exceptionally well done like execution is really important here absolutely the other thing just to mention a large number of people apparently which is new to me don't know what latex is so it's spelled like latex so be careful googling it if you haven't before uh it's uh sorry i don't even know the correct terminology typesetting it's a typesetting language where you're basically writing a program that then generates something that looks from a typography perspective beautiful absolutely
and uh so a lot of academics use it to write papers i think there's like a bunch of communities that use it to write papers i would say it's mathematics physics computer science yeah that's yeah because i'm collaborating currently on a paper with uh two neuroscientists from stanford and they don't know latex so i'm using uh microsoft word and uh mendeley and like all of those kinds of things and i'm being very zen like about the whole process but it's fascinating it's a little heartbreaking actually because uh it's funny to say but uh and we'll talk about open science actually the bigger mission behind fermat's library is like really opening up the world of science to everybody is these silly two facts of like one community uses latex and another uses word is actually a barrier between them it's like boring and practical in a sense but it makes it very difficult to collaborate just on that like i think there are some people that should have received like a nobel prize but will never get it and i think one of those is like donald knuth because of tex and latex and because it had a huge impact in terms of like just making it easier for uh researchers to put their content out there like making it uniform as much as possible oh you mean like a nobel peace prize well maybe a couple of peace prizes maybe a nobel peace prize yeah i think so i mean he at a very young age got the turing award for his work in algorithms and so on so yeah like an incredible yeah like i think it might be even the 60s but i think it's the 70s so when he was really young and then he went on to do like incredible work with his book and uh yeah with tex that people don't know and going back just on the reason why we ended up because i think this is interesting the reason why we ended up using the name fermat's library this was because of uh fermat's last theorem and fermat's
last theorem is actually a funny story like so pierre de fermat he was like a lawyer and he wrote like on a book that he had a solution to fermat's last theorem uh but that it didn't fit the margin of that book and so fermat's last theorem basically states that there's no solution if you have uh integers a b and c there's no solution to a to the power of n plus b to the power of n equals c to the power of n if n is bigger than two so there's no solutions and he said that and that problem remained open for almost 300 years i believe and a lot of the most famous mathematicians tried to tackle that problem no one was able to figure that out until andrew wiles uh i think it was in the 90s was able to publish the solution which was i believe almost 300 pages long and so it's kind of an anecdote that you know there's a lot of knowledge and insights that can be trapped in the margins and there's a lot of potential energy that you can release if you actually um spend some time trying to digest that and that was the origin story for the name yes you can share the contents of the margins with the world exactly that could inspire a solution or a communication that then leads to a solution and if you think about papers like papers are as joão was saying probably one of the densest pieces of text that any human can read and you have these researchers like some of the brightest minds in these fields working on like new discoveries and publishing this work in journals that are imposing restrictions on them in terms of the number of pages that they can have to explain a new scientific breakthrough so at the end of the day papers are not optimized for clarity and for a proper explanation of that content because there are so many restrictions so as i mentioned there's a lot of potential energy that can be freed if you actually try to digest a lot of the contents of papers can you explain some of the other things
so margins librarian journal club so journal club is what a lot of people know us for uh where every week we release an annotated paper in all sorts of different fields physics cs math margins is kind of the same software that we use to run the journal club and to host the annotations but we've made that available for free to anybody that wants to use it and so folks use it at universities and for running journal clubs and so we just made that freely available and then librarian is a browser extension that we developed that is sort of an overlay on top of arxiv so it's about bringing some of the same functionality around comments plus adding some extra niceties to arxiv like being able to very easily extract the references of a paper that you're looking at or being able to extract the bibtex in order to cite that paper yourself so it's an overlay on top of arxiv the idea is that you can have that commenting interface without having to leave arxiv it's kind of incredible i didn't know about it and once i've learned of it it's like holy shit why isn't it more popular given how popular arxiv is like everybody should be using it arxiv sucks or uh let me rephrase that it's limited yeah in terms of what's interesting arxiv is a pretty incredible project right and in a way you know the growth has been completely linear over time if you look at like number of papers published on arxiv like you know it's pretty much a straight line for the past 20 years especially you know like if you're coming from a startup background and you were trying to do arxiv you'd probably try like all sorts of growth hacks and like try to then maybe like have paid features and things like that and that would kind of maybe ruin it and so there's a subtle balance there yeah and i don't know what aspects you can change about it and yeah for some tools in science it just takes time for them to
grow arxiv just turned 30 i believe yeah and for people that don't know arxiv is this kind of online repository where people put preprints which are versions of the papers before they actually make it to journals a-r-x-i-v exactly for people who don't know and it's actually a really vibrant place to publish your papers in the aforementioned uh communities of mathematics exactly and computer science it started with mathematics and physics and then over the last 30 years it evolved and now actually computer science is a more popular category than physics and math on arxiv and there's also which i don't know very much about like a biology medical version of that biorxiv yeah biorxiv um it's recent um it's interesting because if you look at like these um platforms for preprints they actually play a super important role because if you look at a category like math for some papers in math it might take close to three years after you click upload paper on the journal website until the paper gets published on the website of the journal so this is literally the longest upload period on the internet um and during those three years you know their content is just you know locked and so that's why it's so important for people to have websites like arxiv so that you can share that before it goes to the journal with the rest of the world it was actually on arxiv that uh perelman published the three papers that led to the proof of the poincaré conjecture and then you have other fields like machine learning for instance where the field is evolving at such a high rate that people don't even wait before the papers go to journals before they start working on top of those papers so they publish them on arxiv then other people see them they start working on that and arxiv did a really good job at like building that core platform to host papers but i think there's a really
really big opportunity in building more features on top of that platform apart from just hosting papers so collaboration annotations and like having other things apart from papers like code um and other things because uh in a field like machine learning there's a really big you know as i mentioned people start working on top of preprints and they are assuming that that preprint is correct but you really need a way for instance maybe it's not peer review but to distinguish what is good work from bad work on arxiv how do you do that so like a commenting interface like librarian is useful for that so that you can distinguish that um in a field that is growing as fast as machine learning and um and then you have platforms that focus for instance on just biology biorxiv is a good example um biorxiv is also super interesting because there's actually an interesting experiment that was run in the 60s so in the 60s the nih um supported this experiment called the information exchange group which at the time was a way for researchers to share biology preprints via mail or using libraries and that project in the 1960s got cancelled six years after it started and it was due to intense pressure from the journals to kill that project because they were fearing competition uh for the journal industry crick uh was also one of the famous scientists that was opposed to the uh information exchange group and it's interesting because right now if you analyze the number of biology papers that appear first as preprints it's only two percent of the papers and this was almost 50 years after that first experiment so you can see like that pressure from the journals to cancel that uh initial version of a preprint repo had a tremendous impact on the number of papers that are showing up in biology as preprints so it delayed that revolution a lot um but now platforms like biorxiv
are doing that work but there's still a lot of room for growth there and i think it's super important because those are the papers that are open that everyone can read okay so but if we just look at the entire process of science as a big system can we just talk about how it can be revolutionized so you have an idea uh depending on the field you want to make that idea concrete you want to run a few experiments in computer science there might be some code there'd be a data set for you know some of the more sort of biology psychology you might be collecting the data set that's called you know a study right so that's part of the methodology and so you are putting all that into a paper form and then you have some results and then you submit that to a place for review through the peer review process and there's a process where how would you summarize the peer review process but it's really just like a handful of people look over your paper and comment and based on that decide whether your paper is good or not so there's a whole broken nature to it at the same time i love the peer review process when i buy stuff on amazon like uh the commenting system whatever that is so okay so there's a bunch of possibilities for revolutions there and then there's the other side which is the collaborative aspect of the science which is people annotating people commenting sort of the low effort collaboration which is a comment sometimes as you've talked about a comment can change everything but you know or a higher effort collaboration like more like maybe annotations or even like contributing to the paper you can think of like a collaborative updating of the paper over time so there's all these possibilities for doing things better than they've been done can we talk about some ideas in this space some ideas that you're working on some ideas that uh you're not yet working on but should be revolutionized because it does seem that arxiv and like
openreview for example are like the craigslist of science like yeah okay i'm very grateful that we have it but it just feels like it's like 10 to 20 years like it doesn't feel like that's a feature the simplicity of it is a feature it feels like it's a bug but then again the pushback there is uh wikipedia has the same kind of simplicity to it and it seems to work exceptionally well in the crowdsourcing aspect of it i'm sorry there's a bunch of stuff going on on the table let's just pick random things that we can talk about wikipedia you know for me it's the cosmological constant of the internet it's like i think we are lucky to live in the parallel universe where wikipedia exists yes because if someone had pitched me wikipedia like a publicly edited encyclopedia like a couple of years ago like i don't know how many people would have said that that would have survived yeah i mean it makes almost no sense it's like having a google doc that everybody on the internet can edit and like that will be like the most reliable source for knowledge on i don't know how many but hundreds of thousands of topics yeah exactly it's insane it's insane and then you have users like there's a single user that edited one third of the articles on wikipedia so you have these really really big power users that are a substantial part of like what makes wikipedia successful and so like no one would have ever imagined that that could happen um and so that's one thing i completely agree with what you just said i also sorry to interrupt briefly maybe let's inject that into the discussion of everything else i also believe i've seen that with stack overflow that one individual or a small collection of individuals contribute or revolutionize most of the community like if you create a really powerful system for arxiv or like openreview that made it really easy and compelling and exciting for one person who
is like a 10x contributor to do their thing that's going to change everything it seems like that was the mechanism that changed everything for wikipedia and that's the mechanism that changed everything for stack overflow is gamifying or making it exciting or just making it fun or pleasant or fulfilling in some way for those people who are insane enough to like answer thousands of questions or write thousands of factoids and like research them and check them all those kinds of things or read thousands of papers yeah no stack overflow is another great example of that and those are both incredibly productive communities that generate a ton of value and capture almost none of it right and you know in a way it's very counter-intuitive that these communities would exist and thrive um and it's really hard to you there aren't that many communities like that so how do we do that for science do you have ideas there like what are the biggest problems that you see you're working on some of them like just on that there are a couple of really interesting experiments that people are running an example would be like the polymath projects so this is kind of a social experiment that was uh created by tim gowers a fields medalist and his idea was to try to prove that it is possible to do mathematics in a massively collaborative way on the internet so they decided to pick a couple of problems and test that and they found out that it actually is possible for specific types of problems namely problems that you're able to break down in little pieces and go step by step you might need as with open source you might need people that are just kind of reorganizing the house every once in a while and then you know people throw a bunch of ideas and then you know you make some progress then you reorganize you reframe the problem you go step by step but they were
actually able to prove that it is possible to uh collaborate online and make progress in terms of mathematics um and so i'm confident that there are other avenues that could be explored here can we talk about peer review for example absolutely i think in terms of the peer review i think it's important to look at the bigger picture here of like what the scientific publishing ecosystem looks like because for me there are a lot of things that are wrong about that entire process so if you look for instance at what publishing means in like a traditional journal you have uh journals that pay um authors for their articles and then they might pay like reviewers to um review those articles and finally they pay people to um or distributors to distribute the content in the scientific publishing world you have scientists that are usually backed by government grants they are giving away their work for free in the form of papers and then you have other scientists that are reviewing their work this process is known as the peer review process again for free and then finally we have um government-backed universities and libraries that are buying back all that work so that other scientists can read it so this is for me it's bizarre you have the government that is funding the research is paying the salaries of the scientists it's paying the salaries of the reviewers and it's buying back the product of their work again um and i think the problem with this system and it's why it's so difficult to break this suboptimal equilibrium is because of the way academia works right now and the way you can progress in your academic life and so in a lot of fields the competition in academia is really insane so you have hundreds of phd students that are um trying to get to a professor position and it's hyper competitive and the only way for you to get there is if you publish
papers ideally in journals with a high impact factor in computer science it's often conferences that are also very prestigious or actually more prestigious than journals now so interesting so that's the one discipline where i mean that has to do with the thing we've discussed uh in terms of how quickly the field turns around but like uh neurips cvpr those conferences are more prestigious or at the very least as prestigious as the journal but doesn't matter the process is what it is and so for people that don't know the impact factor of a journal is basically the average number of citations that a paper would get if it gets published in that journal but so um you can really think that the problem with the impact factor is that it's a way to turn papers into accounting units and let me unpack this because the impact factor is almost like a nobility title so because papers are born with impact even before anyone reads them so the researchers they don't have the incentive to care about if this paper is ever going to have a long-term impact on the world what they care their goal their end goal is the paper to get published yes so that they get that value up front and so for me that is one of the problems of that and that really creates a tyranny of metrics because at the end of the day if you are a dean what you want to hire is like people researchers that publish papers in journals with high impact factors because that will increase the ranking of your university and will allow you to charge more for tuition so on and so forth and um that especially when you are in super competitive areas you know that people will try to gamify that system and misconduct starts showing up um there's a really interesting book on this topic called gaming the metrics it's a book by a researcher called mario biagioli it goes a lot into like how the impact factor and metrics affect science negatively
and it's interesting to think especially in terms of citations if you look at the early work of like looking at citations there was a lot of work that was done by a guy called eugene garfield and this guy the early work in terms of citation they wanted to use citations from a descriptive point of view so what they wanted to create was a map and that map would create a visual representation of influence so citations would be links between papers and ideally what they would show they would represent is that you read someone else's paper and it had an impact on your research they weren't supposed to be counted i think this inspired like larry and sergey's work right for google exactly i think they even mentioned that but what happens is like as you start counting citations you create a market and this was the work of eugene garfield was a big inspiration for larry and sergey for the pagerank algorithm that um you know led to the creation of google and they even recognize that and if you think about it's like the same way there's a gigantic market for search engine optimization uh seo where people try to optimize you know the page rank and how a web page will rank on google the same will happen for papers people will try to optimize like their impact factors and the citations that they get and that um creates a really big problem and it's super interesting to actually analyze them if you look at the distribution of the impact factors of journals you have like nature i believe it's like in the low 40s and then you have i believe science is high 30s and then you have a really good set of good journals that will fall between 10 and 30 and then you have a gigantic tail of journals that have impact factor below two and you can really see two economies here you see the you know the universities that are maybe less
prestigious less known where the faculty are pressured to just publish papers regardless of the journal what i want to do is increase the ranking of my university and so they end up publishing as many papers as they can in like journals with low impact factor and unfortunately this represents a lot of the global south and then you have the luxury good economy and there are also problems here in the luxury good economy so if you look at a journal like nature so with impact factor of like in the low 40s there's no way that you're going to be able to sustain that level of impact factor by just grabbing the attention of scientists what i mean by that is like for the journals the articles that get published in nature they need to be new york times grade so they need to make it to the you know to the big media they need to be captured by the big media because that's the only way for you to capture enough attention to sustain that level of citations yes and that of course creates problems because people then will try to again gamify the system and have like titles or abstracts that make claims that are bigger than what can actually be um you know sustained by the data or the content of the paper and you'll have clickbait titles or clickbait abstracts and again this is all a consequence of metrics and uh scientometrics and this is a very dangerous cycle that i think it's very hard to break but it's happening in academia in a lot of fields right now is it fundamentally the existence of metrics or the metrics just need to be significantly improved because uh like i said the metrics used for amazon for purchasing i don't know computer parts it's pretty damn good in terms of selecting which are the good ones which are not in that same way if we had an amazon type of review system in the space of ideas in the space of science it feels like those metrics would be a little bit
better sort of when it's um when it's significantly more open to the crowd source nature of the internet of the scientific internet meaning as opposed to like my biggest problem with peer review has always been that it's like five six seven people usually even less and it's often nobody's incentivized to do a good job in the whole process meaning it's anonymous in a way that doesn't incentivize like doesn't gamify or incentivize great work and also it doesn't necessarily have to be anonymous like there has to be um the entire system doesn't encourage actual sort of rigorous review for example like open review does kind of incentivize that kind of process of collaborative review but it's also imperfect it just feels like the thing that amazon has which is like thousands of people contributing their reviews to a product it feels like that could be applied to science where the same kind of thing you're doing with fermat's library but doing at a scale that's much larger it feels like that should be possible given the number of grad students given the number of um general public that get like for example i personally as a person who got an education in mathematics and computer science like uh i can be a quote-unquote like reviewer on a lot bigger set of things than is my exact uh expertise if i'm one of thousands of reviewers if i'm the only reviewer or one of five then i'd better be like an expert in the thing but if i uh and i've learned this with covid which is like you can just use your basic skills as a data analyst to contribute to the review process and a particular little aspect of a paper and be able to comment be able to sort of uh draw in some references that challenge the ideas presented or to enrich the ideas that are presented or you know and it just feels like crowdsourcing the review process would be able to allow you to have metrics in terms of how good a paper is that are much better representative of its actual
impact in the world of its actual value to the world as opposed to some kind of arbitrary gamified version of its impact i agree with that i think there's definitely the possibility at least for a more resilient system than what we have today and i think that's kind of what you're describing lex and i mean to an extent we kind of have like a little bit of a heisenberg uncertainty principle when you pick a metric as soon as you do it then maybe it works as a good heuristic for a short amount of time but soon enough people would start gamifying and yeah but then you can definitely have metrics that are more resilient to gamification and they'll work as a better heuristic to try to push you in the best direction but i guess the underlying problem you're saying is uh there's a shortage of positions in academia that's a big problem for me yeah and so they're going to be constantly gamifying the metrics it's a bit of a zero-sum it's very competitive it's a very competitive field and that's what usually happens in very competitive fields yeah yeah but i think some of like the peer review problems like scale helps i think and it's interesting to look at like what you're mentioning breaking it down maybe into smaller parts and having more people jumping in um but this is definitely a problem and the peer review problem as i mentioned is correlated with the problem of like academic career progression and it's all intertwined and that's why i think it's so hard to break it um there are like a couple of really interesting things that are being done right now there are a couple of for instance journals that are overlay journals on top of platforms like arxiv and biorxiv that want to remove like the more traditional journals from the equation so essentially a journal is just a collection of links to papers and um what they are trying to do is like removing
that middleman and trying to make the review process a little bit more transparent um and not charging universities like uh there are a couple of more famous um ones there's one discrete analysis in mathematics there's one uh called the quantum journal which we are actually working with them we have a partnership with them for the papers that get published in quantum journal they also get the annotations on fermat's um and they are doing pretty well they've been able to grow substantially the problem there is getting to critical mass so it's again convincing the researchers and especially the young researchers that need that impact factor need those publications to have citations to not publish on the traditional journal and go on an open journal and publish their work there i think there are a couple of really high-profile scientists people like tim gowers that are trying to incentivize like famous scientists that already have tenure and that don't need that to publish there to increase the reputation of those journals so that other maybe younger scientists can start publishing on those as well and so they can try to break that vicious cycle of um the more traditional journals i mean another possible way to break this cycle is to like raise public awareness and just by force like ban paid journals like what exactly are they contributing to the world like basically making it illegal to uh forget the fact that it's mostly federally funded so that's a super ugly picture too but like why should knowledge be so expensive like where everyone is working for the public good and then there's these gatekeepers that you know most people can't read most papers without having to pay money and that doesn't make any sense that's like that should be illegal i mean what you're saying is exactly right i mean for instance right i went to school here in the us we studied in europe and you would
sit like he'd ask me all the time to download papers and send it to him because he just couldn't get it and like papers that he needed for his research and so but he's a student like he's yeah he's a grad student he was a grad student but that you know i'm even referring to just regular people oh yeah okay that too yeah and i think uh during 2020 because of covid a lot of journals put down the walls for certain kind of coronavirus papers but like that just gave me an indication that like this should be done for everything it's absurd like people should be outraged that there's these gates because so the moment you dissolve the journals then there will be an opportunity for startups to uh build stuff on top of arxiv it'd be an opportunity for like fermat's library to step up to scale up to something much larger i mean that was the original dream of uh google which i always admired which is make the world's information accessible actually it's interesting that google hasn't maybe you guys can correct me but they uh put together google scholar which is incredible and they did the scanning of books but they haven't really tried to make science accessible in the following way like besides doing google scholar they haven't like delved into the papers right mm-hmm which is especially curious given what luis was saying right that it's kind of in their genesis there's this you know research that was very connected with how papers reference each other and like building a network out of that interesting enough like google but i think there was not intent google plus was like the google social network that got canceled it was used by a lot of researchers yes it was uh i think it was just a you know kind of a side effect but then a lot of people ended up migrating to twitter but it was not on purpose but yeah i agree with you like they haven't um gone past the google scholar and well you know what that said
google scholar is incredible for people who are not familiar it's one of t