Jimmy Wales: Wikipedia | Lex Fridman Podcast #385
diJp4zoQPqo • 2023-06-18
We've never bowed down to government pressure anywhere in the world, and we never will. We understand that we're hardcore, and actually there is a bit of nuance about how different companies respond to this, but our response has always been just to say no. And if they threaten to block, well, knock yourself out, you're going to lose Wikipedia. The following is a conversation with Jimmy Wales, co-founder of Wikipedia, one of, if not the, most impactful websites ever, expanding the collective knowledge, intelligence, and wisdom of human civilization. This is the Lex Fridman Podcast. To support it, please check out our sponsors in the description. And now, dear friends, here's Jimmy Wales. Let's start at the beginning. What is the origin story of Wikipedia? The origin story of Wikipedia, well, so I was watching the growth of the free software movement, open source software, and seeing programmers coming together to collaborate in new ways, sharing code, doing that under a free license, which is really interesting because it empowers an ability to work together that's really hard to achieve if the code is still proprietary. Because then, if I chip in and help, we sort of have to figure out how I'm going to be rewarded and what that is. But the idea that everyone can copy it, and it just is part of the commons, really empowered a huge wave of creative software production. And I realized that that kind of collaboration could extend beyond just software to all kinds of cultural works, and the first thing that I thought of was an encyclopedia. I thought, oh, that seems obvious, that an encyclopedia is something you can collaborate on. There's a few reasons why. One, we all pretty much know what an encyclopedia entry on, say, the Eiffel Tower should be like. You should see a picture, a few pictures maybe, history, location, something about the architect, etc. So we have a shared understanding of what it is we're trying to do, and then we can collaborate, and different people can chip in and find
sources and so on and so forth. So I set up Nupedia first, which was about two years before Wikipedia. And with Nupedia we had this idea that in order to be respected, we had to be even more academic than a traditional encyclopedia, because with a bunch of volunteers on the internet putting out an encyclopedia, you could be made fun of if it's just every random person. So we had implemented this seven-stage review process to get anything published, and two things came of that. One thing: one of the earliest entries that we published after this rigorous process, a few days later we had to pull it, because as soon as it hit the web and the broader community took a look at it, people noticed plagiarism and realized that it wasn't actually that good, even though it had been reviewed by academics and so on. So it's like, okay, well, so much for a seven-stage review process. But also, I decided that I wanted to try it myself. I was frustrated: why is this taking so long, why is it so hard? I saw that Robert Merton had won a Nobel Prize in economics for his work on option pricing theory, and when I was in academia, that's what I worked on, option pricing theory; I had a published paper. So I'd worked through all of his academic papers and I knew his work quite well. I thought, oh, I'll just write a short biography of Merton. And when I started to do it, having been out of academia for a while (I had been a grad student a few years before), I felt this huge intimidation, because they were going to take my draft and send it to the most prestigious finance professors that we could find, to give me feedback for revisions. It felt like being back in grad school, this really oppressive feeling of, you're going to submit it for review and you're going to get critiques. (A little bit the bad part of grad school.) Yeah, yeah, the bad part of grad school, right. And so I was like, oh, this isn't intellectually fun, this is like the bad part of grad school. It's
intimidating, and there's a lot of potential embarrassment if I screw something up and so forth. And so that was when I realized, okay, look, this is never going to work. This is not something that people are really going to want to do. So Jeremy Rosenfeld, one of my employees, had brought in and showed me the wiki concept in December, and then Larry Sanger brought in the same thing and said, "What about this wiki idea?" And so in January we decided to launch Wikipedia. But we weren't sure. The original project was called Nupedia, and even though it wasn't successful, we did have quite a group of academics and really serious people, and we were concerned that, well, maybe these academics are going to really hate this idea, and we shouldn't just convert the project immediately. We should launch this as a side project, the idea being, here's a wiki where we can start playing around. But actually we got more work done in two weeks than we had in almost two years, because people were able to just jump on and start doing stuff. And it was actually a very exciting time. Back then you could be the first person who typed "Africa is a continent" and hit save, which isn't much of an encyclopedia entry, but it's true, and it's a start, and it's kind of fun: you put your name down. Actually, a funny story: several years later, I just happened to be online and I saw that, I think his name is Robert Aumann, won the Nobel Prize in economics, and we didn't have an entry on him at all, which was surprising, but it wasn't that surprising; this was still early days. And so I got to be the first person to type "Robert Aumann won a Nobel Prize in economics" and hit save, which again wasn't a very good article, but then I came back two days later and people had improved it and so forth. So that's the second half of the experience: with Robert Merton I never succeeded, because it was just too intimidating, whereas here I was able to chip in and help,
other people jumped in, everybody was interested in the topic because it was all in the news at the moment. And so it's just a completely different model, which worked much, much better. Well, what is it that made that so accessible, so fun, so natural to just add something? Well, I think especially in the early days (and this, by the way, has gotten much harder, because there are fewer topics that are just greenfield, available), you could say, oh, well, I know a little bit about this, and I can get it started. But then it is fun to come back and see that other people have added and improved it and so on and so forth. And that idea of collaborating, much like open source software, where you put your code out and then people suggest revisions and I change it and it modifies and it grows beyond the original creator, it's just a kind of fun, wonderful, quite geeky hobby, but people enjoy it. How much debate was there over the interface, over the details of how to make that seamless and frictionless? Yeah, I mean, not as much as there probably should have been. In a way, during those two years of the failure of Nupedia, where very little work got done, what was actually productive was a huge, long email discussion, very clever people talking about things like neutrality, talking about what an encyclopedia is, but also talking about more technical ideas. Back then XML was kind of all the rage, and we were thinking about, shouldn't you have certain data that might appear in multiple articles and that gets updated automatically? So for example, the population of New York City: every 10 years there's a new official census. Couldn't you just update that bit of data in one place, and it would update across all those articles? That is a reality today, but back then it was just like, how do we do that, how do we think about that? So that is a reality today where
it's... Yeah, there's some... So, data variables? Yeah, Wikidata. From a Wikipedia entry you can link to that piece of data in Wikidata. I mean, it's a pretty advanced thing, but there are advanced users who are doing that, and then when that data gets updated, it updates in all the languages where you've done that. I mean, that's really interesting. There was this chain of emails in the early days discussing the details of what is... so there's the interface, there's the... Yeah, so the interface. So an example: there was some software called UseModWiki, which we started with. It's quite amusing, actually, because the main reason we launched with UseModWiki is that it was a single Perl script, so it was really easy for me to install it on the server and just get running. But it was some guy's hobby project; it was cool, but it was just a hobby project, and all the data was stored in flat text files, so there was no real database behind it. So to search the site, you basically used grep, which is just the basic Unix utility to look through all the files. That clearly was never going to scale. But also, in the early days it didn't have real logins, so you could set your username, but there were no passwords. So I might say I'm Bob Smith, and then someone else comes along and says, no, I'm Bob Smith, and they both had it. Now, that never really happened; we didn't have a problem with it, but it was kind of obvious: you can't have a big website where everybody can pretend to be everybody. That's not going to be good for trust and reputation and so forth. So quickly I had to write a little login system, store people's passwords and things like that, so you could have unique identities. And then another example of something you would never have thought would have been a good idea, and it turned out not to be a problem: to make a link in Wikipedia in the early days, you would make a link to a page that
may or may not exist by just using CamelCase, meaning uppercase and lowercase letters with the words smashed together. So for New York City, you might type New, no space, capital-Y York, City (NewYorkCity), and that would make a link. But that was ugly; that was clearly not right. And so I was like, okay, well, that's just not going to look nice. Let's just use square brackets: two square brackets make a link. That may have been an option in the software already; I'm not sure I thought up square brackets. But anyway, we just did that, which worked really well. It makes nice links, and you can see red links or blue links depending on whether the page exists or not. But the thing that didn't even occur to me to think about is that, for example, on the standard German-language keyboard there is no square bracket. So for German Wikipedia to succeed, people had to learn some Alt codes to get the square bracket, or a lot of users just cut and pasted a square bracket in when they could find one. And yep, German Wikipedia has been a massive success, so somehow that didn't slow people down. How is it that German keyboards don't have a square bracket? How do you do programming? How do you live life to its fullest? That's a very good question. I'm not really sure. I mean, maybe it does now, because keyboard standards have drifted over time as it became useful to have a certain character. It's the same thing with Italian: there's not really a W in Italian, and it wasn't on keyboards (I think it is now). In general, W is not a letter in the Italian language, but it appears in enough international words that it's crept into Italian. And all of these things are probably Wikipedia articles in themselves. Oh yeah, the discussion of square brackets, a whole discussion, I'm sure, on both the English and the German Wikipedia, and the difference between those two might be very interesting. So Wikidata is fascinating, but even the
broader discussion of what an encyclopedia is: can you go to that sort of philosophical question of what this encyclopedia is? Sure. So the way I would put it is, an encyclopedia, or what our goal is, is the sum of all human knowledge, but "sum" meaning summary. And this was an early debate. Somebody started uploading the full text of Hamlet, for example, and we said, wait, hold on a second, that's not an encyclopedia article. But why not? And hence was born Wikisource, which is where you put original texts and things like that, out-of-copyright texts. Because we said, no: an encyclopedia article about Hamlet, that's a perfectly valid thing, but the actual text of the play is not an encyclopedia article. So most of it's fairly obvious, but there are some interesting quirks and differences. For example, as I understand it, in French-language encyclopedias it would traditionally be quite common to have recipes, which in English-language ones would be unusual. You wouldn't find a recipe for chocolate cake in Britannica. And so I actually don't know the current state; I haven't thought about that in many, many years now. The state of cake recipes in Wikipedia? In English Wikipedia, I wouldn't say there are chocolate cake recipes. You might find a sample recipe somewhere, I'm not saying there are none, but in general, no, we wouldn't have recipes. I told myself I would not get outraged in this conversation, but now I'm outraged. I'm deeply upset. It's actually very complicated. I love to cook; I'm actually quite a good cook. And what's interesting is that it's very hard to have a neutral recipe, because a canonical recipe is kind of difficult to come by: there are so many variants, and it's all debatable and interesting. For something like chocolate cake you could probably say, here's one of the earliest recipes, or here's one of the most common recipes, but for many
many things the variants are just as interesting. As somebody said to me recently, "Ten Spaniards, twelve paella recipes." So these are all matters of open discussion. Well, just to throw out some numbers: as of May 27th, 2023, there are 6.66 million articles in the English Wikipedia, containing over 4.3 billion words. Including articles, the total number of pages is 58 million. Does that blow your mind? I mean, yes, it does. In one sense it doesn't, because I know those numbers and see them from time to time, but in another, deeper sense, yeah, it does. It's really remarkable. I remember when English Wikipedia passed 100,000 articles, and when German Wikipedia passed 100,000, because I happened to be in Germany with a bunch of Wikipedians that night, and back then it seemed quite big. We knew at that time that it was nowhere near complete. I remember at Wikimania at Harvard, when we held our annual conference there in Boston, someone who had come to the conference from Poland had brought along with him a small encyclopedia, a single-volume encyclopedia of biographies, short biographies, normally a paragraph or so, about famous people in Poland, and there were some 22,000 entries. And he pointed out that even though Wikipedia in 2006 felt quite big, English Wikipedia had only a handful of these, less than 10, I think he said. And so then you realize, yeah, actually, who was the mayor of Warsaw in 1873? Don't know. Probably not in English Wikipedia then, but it might be today. There's so much out there. And of course, what we get into when we're talking about how many entries there are, and how many there could be, is this very deep philosophical issue of notability, which is the question of, well, how do you draw the limit? What is in and what is out? Sometimes people say, oh, there should be no limit, but I think that doesn't stand up to
much scrutiny if you really pause and think about it. So I see in your hand there you've got a Bic pen, pretty standard; everybody's seen billions of those in their life. A classic, though. It's a classic clear Bic pen. So could we have an entry about that Bic pen? Oh, I bet we do, about that type of Bic pen, because it's a classic, everybody knows it, and it's got a history. And actually there's something interesting about the Bic company: they make pens, they also make kayaks, and something else they're famous for. Basically, they're sort of a "definition by non-essentials" company: anything that's long and plastic, that's what they make. Wow. So if you want to find the platonic form of a Bic... But could we have an article about that very Bic pen in your hand? So, Lex Fridman's Bic pen? Oh, this very one, a very specific instance. And the answer is no. There's not much known about it. I dare say that unless it's very special to you and your great-grandmother gave it to you or something, you probably know very little about it. It's a pen; it's just here in the office. So that's just to show that there is a limit. In German Wikipedia they used to talk about the rear nut of the wheel of Uli Fuchs's bicycle, Uli Fuchs being a well-known Wikipedian of the time, to illustrate that you can't have an article about literally everything. And so then it raises the question: what can you have an article about, and what can't you? And that can vary depending on the subject matter. One of the areas where we try to be much more careful is biographies. The reason is that with a biography of a living person, if you get it wrong, it can actually be quite hurtful, quite damaging. So if someone is a private person and somebody tries to create a Wikipedia entry, there's no way to do it well; there's not much known. So for example, an encyclopedia article about my mother: my mother, a schoolteacher, later a pharmacist, a wonderful woman, but she's never been in the news. I mean,
other than me talking about why there shouldn't be a Wikipedia entry about her, which has probably made it in somewhere as a standard example. But there's not enough known. You could sort of imagine a genealogy database holding date of birth, date of death, and certain elements like that for private people, but you couldn't really write a biography. One of the areas where this comes up quite often is what we call BLP1E (we've got lots of acronyms): a biography of a living person who's notable for only one event. That's a real danger zone. The typical example would be a victim of a crime: someone who's a victim of a famous serial killer, but about whom really not much is known. They weren't a public person; they're just a victim of a crime. We really shouldn't have an article about that person. They'll be mentioned, of course, and maybe the specific crime might have an article, but for that person, no, not really. That doesn't make any sense, because how can you write a biography about someone you don't know much about? And this varies from field to field. For example, for many academics we will have an entry that we might not have in a different context, because for an academic what matters is their career, what papers they've published, things like that. You may not know anything about their personal life, but that's actually not encyclopedically relevant in the same way that it is for a member of a royal family, where it's basically all about the family. So we're fairly nuanced about notability and where it comes in. And I've always thought that the term notability is a little problematic; we struggle with how to talk about it. The problem with notability is that it can feel insulting: "No, you're not noteworthy." Well, my mother's noteworthy; she's a really important person in my life, right? So that's not right. It's more like verifiability: is there a way to get
information that actually makes an encyclopedia entry? It so happens that there's a Wikipedia page about me, as I learned recently, and the first thought I had when I saw it was, surely I am not notable enough. So I was very surprised and grateful that such a page could exist. And actually, just allow me to say thank you to all the incredible people who are part of creating and maintaining Wikipedia. It's my favorite website on the internet. The collection of articles that Wikipedia has created is just incredible. We'll talk about the various details of that, but the love and care that goes into creating pages for individuals, for a Bic pen, for all this kind of stuff, is really incredible. So I felt the love when I saw that page. But because I do this podcast, and through it have gotten to know a few individuals who are quite controversial, I've also gotten to be, as a person who loves other human beings, on the receiving end of some kinds of attacks through Wikipedia. Like you said, when it comes to living individuals, the little details of information can be quite hurtful. Because I've become friends with Elon Musk and have interviewed him, but I've also interviewed people on the left, far-left people, people on the right, some people would say far right, you take a step, you put your toe into the cold pool of politics, and the shark emerges from the depths and pulls you right into a boiling hot pool of politics. I guess it's hot. And so I got to experience some of that. I think what you also realize is that Wikipedia has to rely on credible, verifiable sources, and there's a dance there, because some of the sources are pieces of journalism, and of course journalism operates under its own complicated incentives, such that people can write articles that are not factual, or that cherry-pick all the flaws they can find, in a
journalistic article, for sure, and those can be used as sources. It's like they dance hand in hand. And so, for me, sadly enough, there was a really kind of concerted attack to say that I was never at MIT, that I never did anything at MIT. Just to clarify: I am a research scientist at MIT. I have been there since 2015. I'm there today. I'm at a prestigious, amazing laboratory called LIDS, and I hope to be there for a long time and work on AI, robotics, machine learning. There are a lot of incredible people there. And by the way, MIT has been very kind in defending me. Contrary to what Wikipedia says, it is not an unpaid position; there was no controversy. It was all very calm and happy and almost boring research that I've been doing there. And the other thing: because I am half Ukrainian, half Russian, and I've traveled to Ukraine and I will travel to Ukraine again, and I will travel to Russia for some very difficult conversations (my heart has been broken by this war; I have family in both places; it's been a really difficult time), the little battle over the biography also starts becoming important to me for the first time. I also want to use this opportunity to personally clarify some inaccuracies there. My father was not born in Chkalovsk, Russia; he was born in Kiev, Ukraine. I was born in Chkalovsk, which is a town not in Russia. There is a town called that in Russia, but there's another one in Tajikistan, a former republic of the Soviet Union. That town is now called B-U-S-T-O-N, Buston, which is funny because we're now in Austin, and I was in Boston. It seems like my whole life is surrounded by these kinds of towns. So I was born in Tajikistan. The rest of the biography is interesting, but my family is very evenly distributed, in their origins and where they grew up, between Ukraine and Russia, which adds a whole beautiful complexity to this whole thing. So I just want to correct that. The fascinating thing about Wikipedia is that in some sense those
little details don't matter, but in another sense, what I felt when I saw a Wikipedia page about me, or about anybody I know, is that there's this beautiful kind of statement that this person existed, like a community that notices you. It's like you see a butterfly floating by and you're like, huh, it's not just any butterfly, it's that one. I like that one. Or you see a puppy or something, or it's this Bic pen, this one, I remember this one, it has the scratch. And you get noticed in that way, and I know that's a beautiful thing. Maybe it's very silly and naive of me, but I feel like Wikipedia, in terms of individuals, is an opportunity to celebrate people, to celebrate ideas, and not a battleground of attacks, of the kind of stuff we might see on Twitter, like the mockery, the derision. For sure. And of course, you don't want to cherry-pick; all of us have flaws and so on. But to highlight a controversy of some sort, when in most cases it doesn't at all represent the entirety of the human, just feels sad. Yeah, yeah. So there are a few things to unpack in all that. First, one of the things I always find very interesting: your status with MIT, okay, that's upsetting, and it's an argument that can be sorted out. But what's interesting is that you gave as much time to that, which is actually important and relevant to your career and so on, as to where your father was born, which most people would hardly notice but which is really meaningful to you. And I find that a lot when I talk to people who have a biography in Wikipedia: they're often annoyed by a tiny error that no one's going to notice, like this town in Tajikistan having a new name. Nobody even knows what that means, but it can be super important. And that's one of the reasons that, for biographies, we say human dignity really matters. And so some of
the things have to do with what we call undue weight, and this is a common debate in Wikipedia. I'll give an example. There was an article I stumbled across many years ago about, you know, the mayor, no, he wasn't a mayor, he was a city council member of, I think it was Peoria, Illinois, but anyway some town in the Midwest. And in the entry, he'd been on the city council for 30 years or whatever; frankly, a pretty boring guy, and he seemed like a good local city politician. But in this very short biography there was a whole paragraph, a long paragraph, about his son being arrested for DUI, and it was clearly undue weight. It's like, what has this got to do with this guy, if it even deserves a mention? It wasn't even clear that he had done anything hypocritical, or that he himself had done anything wrong. His son got a DUI. That's never great, but it happens to people, and it doesn't seem like a massive scandal for the dad. So of course I just took that out immediately; this was a long, long time ago. And that's the sort of thing we have to really think about in a biography and about controversies: is this a real controversy? In general, one of the things we tend to say is that if there's a biography with a section called Controversies, that's actually poor practice, because it just invites people to say, oh, I want to work on this entry, and let's see, there are seven sections and this one's quite short, can I add something? Go out and find some more controversies. That's nonsense, right? And in general, putting it separate from everything else kind of makes it seem worse and also doesn't put it in the right context. Whereas if it's part of the flow, and there is a controversy (there's always potential controversy for anyone), it should just be worked into the overall article, because then it doesn't become a temptation, you can contextualize it appropriately, and so forth. So that's, you
know, part of the whole process. But I think for me, one of the most important things is what I call community health. Are we going to get it wrong sometimes? Yeah, of course; we're humans, and producing good-quality reference material is hard. The real question is how people react to a criticism or a complaint or a concern. If the reaction is defensiveness, or combativeness back, or if someone's really in there being aggressive and in the wrong, it's like, no, no, no, hold on, we've got to do this the right way. You've got to say, okay, hold on: are there good sources? Is this contextualized appropriately? Is it even important enough to mention? What does it mean? And one of the areas where I do think there is a very complicated flaw, and you've alluded to it a little bit, is that we know the media is deeply flawed. We know that journalism can go wrong. And I would say, particularly in the last 15 years or so, we've seen a real decimation of local media, local newspapers. We've seen a real rise in clickbait headlines and a sort of eager focus on anything that might be controversial. We've always had that with us, of course; there have always been tabloid newspapers. But it makes it a little bit more challenging to say, okay, how do we sort things out when we have a pretty good sense that not every source is valid? So as an example, a few years ago (it's been quite a while now), we deprecated MailOnline as a source. MailOnline, the digital arm of the Daily Mail, is a tabloid. It's not fake news, but it does tend to run very hyped-up stories. They really love to attack people and go on the attack for political reasons and so on, and it just isn't great. And about saying deprecated: I think some people say, oh, you banned the Daily Mail. No, we didn't ban it as a source; we just said, look, it's probably not a
great source, right? You should probably look for a better source. So certainly, if the Daily Mail runs a headline saying "New cure for cancer," there are probably more serious sources than a tabloid newspaper. In an article about lung cancer, you probably wouldn't cite the Daily Mail; that would be kind of ridiculous. But also for celebrities and so forth: they do cover celebrity gossip a lot, but they also tend to have vendettas, and you really have to step back and ask, is this really encyclopedic, or is this just the Daily Mail going on and on? And some of that requires great community health. It requires massive community health. Even for me, with stuff I've seen that's actually kind of iffy, about people I know, things I know about myself, I still feel a love for knowledge emanating from the article. I feel the community health. So I will take all the slight inaccuracies; I love it, because it means that, for the most part, I feel there are people of respect and love in this search for knowledge. I also love Stack Overflow and Stack Exchange for programming-related things, and they can get a little cranky sometimes, to a degree where you can feel the dynamics of the health of the particular community. Yeah, and sub-communities too, like particularly C# or Java or Python or whatever. Little communities emerge, and you can feel the levels of toxicity, because a little bit of strictness is good, but a little too much is bad. Yeah. Because of the defensiveness: when somebody writes an answer and somebody else says, well, modify it, and they get defensive, there's this tension that's not conducive to improving toward a more truthful depiction of that topic. Yeah. A great example that I really loved: this morning I saw that someone had left a note on my user talk page in
English Wikipedia with quite a dramatic headline: "Racist hook on front page." So on the front page of Wikipedia we have a little section called Did You Know. It's just little tidbits and facts, things people find interesting, and there's a whole process for how things get there. The one that somebody was raising a question about was comparing a very well-known U.S. football player, who is Black... there was a quote from another famous sports person comparing him to a Lamborghini, clearly a compliment. And so somebody said, actually, here's a study, here's some interesting information about how Black sports people are far more often compared to inanimate objects and given that kind of analogy, and I think it's demeaning to compare a person to a car, etc. But they said, I'm not deleting it, I'm not removing it, I just want to raise the question. And then there's this really interesting conversation that goes on, where I think the general consensus was: you know what, this isn't like the alarming headline. "Racist thing on the front page of Wikipedia" sounds, holy moly, that sounds bad. But it's sort of like, actually, yeah, this probably isn't the sort of analogy that we think is great, and so we should probably think about how to improve our language and not compare sports people to inanimate objects, and particularly be aware of certain racial sensitivities that there might be around that sort of thing, if there is a disparity in how the media describes people. And I just thought, you know what, nothing for me to weigh in on here; this is a good conversation. Nobody's saying people should be banned if they refer to, what was his name, the Fridge, Refrigerator Perry, the very famous comparison of a Chicago Bears player to an inanimate object many years ago. They're just saying, hey, let's be careful about analogies that we just pick up from the media. I said, yeah, you know, that's good. On the sort of
deprecation of news sources, that's really interesting, because I think what you're saying is ultimately you want to make an article-by-article decision, kind of use your own judgment. And it's such a subtle thing, because there are just a lot of hit pieces written about individuals like myself, for example, that masquerade as an objective, thorough exploration of a human being. It's fascinating to watch, because controversy and hit pieces just get more clicks. Oh yeah. I guess as a Wikipedia contributor you start to become deeply aware of that and start to have a sense, like a radar, of clickbait versus truth, to pick out the truth from the clickbaity type of language. Oh yeah, I mean, it's really important, and we talk a lot about weasel words. And actually, I'm sure we'll end up talking about it, but just to quickly mention in this area, I think one of the potentially powerful tools... because it is quite good at this, I've played around with it and practiced with it quite a lot... is GPT-4. It is really quite able to take a passage, point out potentially biased terms, and rewrite it to be more neutral. Now, it is a bit anodyne and it's a bit clichéd, so sometimes it just takes the spirit out of something that's actually not bad, it's just poetic language, and you're like, okay, that's not actually helping. But in many cases I think that sort of thing is quite interesting. And I'm also interested in this: can you imagine where you feed in a Wikipedia entry and all the sources and you say, help me find anything in the article that is not accurately reflecting what's in the sources? That doesn't have to be perfect. It only has to be good enough to be useful to the community. So if it scans an article and all the sources and it comes back with 10 suggestions, and seven of them were decent and three of them it just didn't understand, well, actually that's probably worth my time to
do, and it can help us more quickly get good people to review obscure entries and things like that. So just as a small aside on that, and we'll probably talk about language models a little bit, or a lot more: in one of the hit pieces about me, the journalist was actually very straightforward and honest about having used GPT to write part of the article. Oh, interesting. And then finding that it made an error, and apologized for the error that GPT-4 generated. Which has this kind of interesting loop: the articles are used to write Wikipedia pages, GPT is trained on Wikipedia, and then there's this interesting loop where the weasel words and the nuances can get lost, or can propagate even though they're not grounded in reality. Somehow, in the generation of the language model, new truths can be created and kind of linger. Yeah, there's a famous webcomic titled "Citogenesis," which is about how an error is in Wikipedia and there's no source for it, but then a lazy journalist reads it and writes the source. And then some helpful Wikipedian spots that it has no source, finds the source, and adds it to Wikipedia, and voilà, magic. This happened to me once. Well, it nearly happened. It was really brief; I went back and researched, I'm like, this is really odd. So Biography magazine, which is a magazine published by the Biography TV channel, had a profile of me, and it said (I'm not quoting exactly, it's been many years) "in his spare time he enjoys playing chess with friends." I thought, wow, that sounds great, I would like to be that guy. But actually, I mean, I play chess with my kids sometimes, but no, it's not a hobby of mine. And I was like, where did they get that? I contacted the magazine and said, where'd that come from? They said, oh, it was in Wikipedia. I looked in the history; there had been vandalism of Wikipedia, which was not, you know, it's
not damaging, it's just false. And it had already been removed. But then I thought, oh gosh, well, I'd better mention this to people, because otherwise somebody's going to read that and they're going to add it to the entry, and it's going to take on a life of its own. And sometimes I wonder if it has, because I was invited a few years ago to do the ceremonial first move in the World Chess Championship, and I thought, I wonder if they think I'm a really big chess enthusiast because they read this Biography magazine article. But that problem, when we think about large language models and the ability to quickly generate very plausible but not true content, I think there's going to be a lot of shakeout, a lot of implications of that. What would be hilarious is if, because of the social pressure of Wikipedia and the momentum, you would actually start playing a lot more chess. So not only are the articles written based on Wikipedia, but your own life trajectory changes, just to make it more convenient. Yeah, aspire to... aspire to, yes. Aspirational. What if we just talk about that before we jump back to some other interesting topics on Wikipedia? Let's talk about GPT-4 and large language models. So they are in part trained on Wikipedia content. Yeah. What are the pros and cons of these language models? What are your thoughts? Yeah, so, I mean, there's a lot of stuff going on. Obviously the technology has moved very quickly in the last six months and looks poised to do so for some time to come. So first things first: part of our philosophy is the open licensing, the free licensing, the idea that this is what we're here for. We are a volunteer community, and we write this encyclopedia. We give it to the world to do what you like with: you can modify it, redistribute it, redistribute modified versions, commercially, non-commercially. This is the licensing. So in that sense, of course, it's completely fine. Now, we do worry a
bit about attribution, because it is a Creative Commons Attribution-ShareAlike license. So attribution is important, not just because of our licensing model and things like that, but because proper attribution is just good intellectual practice. And that's a really hard, complicated question. If I were to write something about my visit here, I might say in a blog post, I was in Austin, which is a city in Texas. I'm not going to put a source for Austin being a city in Texas; that's just general knowledge. I learned it somewhere, I can't tell you where. So you don't have to cite and reference every single thing. But if I actually did research and I used something very heavily, it's just morally proper to give your sources. So we would like to see that. And obviously, they call it grounding; people at Google in particular are really keen on figuring out grounding, as a technical term. So, grounding any text that's generated, trying to ground it to Wikipedia-quality sources? I mean, the same kind of standard of what a source means that Wikipedia uses, for the generated text? Yeah, the same kind of thing. And of course one of the biggest flaws in ChatGPT right now is that it just literally will make things up, just to be amiable. I think it's programmed to be very helpful and amiable, and it doesn't really know or care about the truth. And it can get bullied into... Yeah, and it can be convincing, too. Well, like this morning, the story I was telling earlier about comparing a football player to a Lamborghini: I thought, is that really racial? I don't know, but I'm mulling it over. And I thought, I'm going to go to ChatGPT. So I said to GPT-4, you know, this happened in Wikipedia; can you think of examples where a white athlete has been compared to a fast car or an inanimate object? And it comes back with a very plausible essay telling you why these analogies are common. I said, no, no, could you give me some specific examples? So it gives me three specific examples: very plausible, correct names of athletes and contemporaries, and all of that could have been true. I Googled every single quote; none of them existed. And so I'm like, well, that's really not good. I wanted to explore a thought process I was in. At first I thought, how do I Google this? And it's kind of a hard thing to Google, because unless somebody's written about this specific topic... you'd think, oh, it's a large language model, it's processed all this data, it can probably piece that together. But it just can't yet. So I hope that with GPT-5, 6, 7, you know, in three to five years, we'll see a much higher level of accuracy, where when you ask a question like that, instead of being quite so eager to please by giving you a plausible-sounding answer, it just says, I don't know. Or maybe it displays how uncertain this generated text might be, like, "I would really like to make you happy right now, but I'm really stretched thin with this generation." Well, it's one of the things I've said for a long time. So in Wikipedia, one of the great things we do, which may not be great for our reputation, except in a deeper sense, for the long term I think it is, is that we'll put up a notice that says the neutrality of this section has been disputed, or the following section doesn't cite any sources. And I always joke that sometimes I wish the New York Times would run a banner saying the neutrality of this has been disputed. They could say, we had a big fight in the newsroom as to whether to run this or not, but we thought it's important enough to bring it to you; just be aware that not all the journalists are on board with it. Ah, that's actually interesting. And that's fine; I would trust them more for that level of transparency. So yeah, similarly, ChatGPT should say, yeah, 87%... Well, the neutrality
one is really interesting, because that's basically a summary of the discussions that are going on underneath. It would be amazing if... I should be honest, I don't look at the talk page often. It would be nice somehow if there was a kind of summary in this banner way, like: lots of wars have been fought on this here land for this here paragraph. It's really interesting, yeah, I hadn't thought of that. Because one of the things I do spend a lot of time thinking about these days (we're moving slowly, but we are moving) is, okay, these tools exist; are there ways that this stuff can be useful to our community? Because part of it is that we do approach things in a non-commercial way, in a really deep sense. It's been great that Wikipedia has become very popular, but really we're a community whose hobby is writing an encyclopedia. That's first, and if it's popular, great; if it's not, okay, we might have trouble paying for more servers, but it'll be fine. And so how do we help the community use these tools? What are the ways that these tools can support people? And one example I never thought about, and I'm going to start playing with it, is: feed in the article and feed in the talk page and say, can you suggest some warnings in the article based on the conversation on the talk page? I think it might be good at that. It might get it wrong sometimes, but again, if it's reasonably successful at doing that, and you can say, oh, actually, yeah, it does suggest the neutrality of this has been disputed on a section that has a seven-page discussion in the back, that might be useful. I don't know, worth playing with. I mean, some more color on not just the neutrality but also the amount of emotion laden in the exploration of this particular part of the topic. Yeah, it might actually help you look at more controversial pages, like a page on the war in Ukraine or a page
on Israel and Palestine. There could be parts that everyone agrees on, and there are parts that are just tough, the hard parts. It would be nice, when looking at those beautiful long articles, to know, all right, let me just take in the stuff where everybody agrees. I could give an example that I haven't looked at in a long time, but I was really pleased with what I saw at the time. So the discussion was that they're building something in Israel, and for their own political reasons one side calls it a wall, harkening back to the Berlin Wall, apartheid; the other calls it a security fence. So we can understand quite quickly, if we give it a moment's thought, okay, I understand why people would have this grappling over the language. You want to highlight the negative aspects of this, and you want to highlight the positive aspects, so you're each going to try and choose a different name. And so there was this really fantastic Wikipedia discussion on the talk page: how do we word that paragraph to talk about the different naming? It's called this by Israel, it's called this by Palestinians, and how you explain that to people could be quite charged, right? You could easily explain, oh, there's this difference, and it's because this side's good and this side's bad, and that's why there's a difference. Or you could say, actually, let's just try and really stay as neutral as we can and try to explain the reasons. So you may come away from it with a concept: oh, okay, I understand what this debate is about now. And just the term Israel-Palestine conflict: it's still the title of a page on Wikipedia, but the word conflict is a charged word, of course. Yeah. Because from the Palestinian side, or from certain sides, the word conflict doesn't accurately describe the situation, because if you see it as a genocide, a one-way genocide is not a conflict. The people who challenge the word conflict see
conflict as when there are two equally powerful sides fighting. Yeah, yeah, no, it's hard. And in a number of cases... so this actually speaks to a slightly broader phenomenon, which is that there are a number of cases where there is no one word that can get consensus. In the body of an article that's usually okay, because we can explain the whole thing; you can come away with an understanding of why each side wants to use a certain word. But there are some aspects, like the pages have a title. There's that same thing with certain things like photos: well, there are different photos, which one's best? A lot of different views on that, but at the end of the day you need the lead photo, because there's one slot for a lead photo. Categories are another one. So at one point (I have no idea if it's in there today, but I don't think so) I was listed in, you know, American entrepreneurs, fine, and American atheists. And I said, hmm, that doesn't feel right to me. Just personally... it's true, I mean, I wouldn't disagree with the objective fact of it, but when you click the category you see a lot of people who you might call American atheist activists, because that's their big issue. So Madalyn Murray O'Hair, or various famous people like Richard Dawkins, who make it a big part of their public argument and persona. But that's not true of me; it's just my private personal belief. It's not something I campaign about. So it felt weird to put me in the category. But what category would you put me in? And do you need that? In this case, I argued that it doesn't need that. I don't speak about it publicly except incidentally from time to time; I don't campaign about it, so it's weird to put me with this group of people. And that argument carried the day, I hope not just because it was me. But categories can be like that, where you're either in the category or
you're not, and sometimes it's a lot more complicated than that, and