Jimmy Wales: Wikipedia | Lex Fridman Podcast #385
diJp4zoQPqo • 2023-06-18
We've never bowed down to government pressure anywhere in the world, and we never will. We understand that we're hardcore, and actually there is a bit of nuance about how different companies respond to this, but our response has always been just to say no. And if they threaten to block, well, knock yourself out, you're going to lose Wikipedia.
The following is a conversation with Jimmy Wales, co-founder of Wikipedia, one of, if not the, most impactful websites ever, expanding the collective knowledge, intelligence, and wisdom of human civilization. This is the Lex Fridman Podcast. To support it, please check out our sponsors in the description. And now, dear friends, here's Jimmy Wales.
Let's start at the beginning. What is the origin story of Wikipedia?
The origin story of Wikipedia. Well, I was watching the growth of the free software movement, open source software, and seeing programmers coming together to collaborate in new ways, sharing code, doing that under a free license. That's really interesting, because it empowers an ability to work together that's really hard to achieve if the code is still proprietary: then, if I chip in and help, we sort of have to figure out how I'm going to be rewarded and what that is. But the idea that everyone can copy it, and it just is part of the commons, really empowered a huge wave of creative software production. And I realized that that kind of collaboration could extend beyond just software to all kinds of cultural works. The first thing that I thought of was an encyclopedia, and I thought, oh, that seems obvious, an encyclopedia you can collaborate on. There are a few reasons why. One, we all pretty much know what an encyclopedia entry on, say, the Eiffel Tower should be like. You should see a picture, a few pictures maybe, history, location, something about the architect, et cetera. So we have a shared understanding of what it is we're trying to do, and then we can collaborate, and different people can chip in and find sources and so on. So I first set up Nupedia, which was about two years before Wikipedia. With Nupedia, we had this idea that in order to be respected, we had to be even more academic than a traditional encyclopedia, because with a bunch of volunteers on the internet putting out an encyclopedia, you could be made fun of if it's just every random person. So we had implemented this seven-stage review process to get anything published.
Two things came with that. One of the earliest entries that we published after this rigorous process, a few days later we had to pull it, because as soon as it hit the web and the broader community took a look at it, people noticed plagiarism and realized that it wasn't actually that good, even though it had been reviewed by academics and so on. So we had to pull it. It's like, okay, well, so much for a seven-stage review process. But also, I decided that I wanted to try. I was frustrated: why is this taking so long, why is it so hard? I saw that Robert Merton had won a Nobel Prize in economics for his work on option pricing theory, and when I was in academia, that's what I worked on, option pricing theory; I had a published paper. So I'd worked through all of his academic papers and I knew his work quite well. I thought, oh, I'll just write a short biography of Merton.
And when I started to do it, I'd been out of academia, I had been a grad student for a few years, and I felt this huge intimidation, because they were going to take my draft and send it to the most prestigious finance professors that we could find to give me feedback for revisions. It felt like being back in grad school. It's this really oppressive sort of thing: you're going to submit it for review and you're going to get critiques.
A little bit of the bad part of grad school.
Yeah, yeah, the bad part of grad school, right. And so I was like, oh, this isn't intellectually fun, this is like the bad part of grad school. It's intimidating, and there's a lot of potential embarrassment if I screw something up and so forth. That was when I realized, okay, look, this is never going to work. This is not something that people are really going to want to do.
So Jeremy Rosenfeld, one of my employees, had brought and shown me the wiki concept in December, and then Larry Sanger brought in the same thing and said, what about this wiki idea? And so in January we decided to launch Wikipedia. But we weren't sure. The original project was called Nupedia, and even though it wasn't successful, we did have quite a group of academics and really serious people, and we were concerned that maybe these academics were going to really hate this idea, so we shouldn't just convert the project immediately. We should launch this as a side project, the idea being: here's a wiki where we can start playing around.
But actually, we got more work done in two weeks than we had in almost two years, because people were able to just jump on and start doing stuff. It was actually a very exciting time. Back then, you could be the first person who typed "Africa is a continent" and hit save, which isn't much of an encyclopedia entry, but it's true, and it's a start, and it's kind of fun: you put your name down. Actually, a funny story: several years later I just happened to be online and I saw that, I think his name is Robert Aumann, won the Nobel Prize in economics, and we didn't have an entry on him at all, which was surprising, but it wasn't that surprising. This was still early days.
And so I got to be the first person to type "Robert Aumann won a Nobel Prize in economics" and hit save, which again wasn't a very good article. But then I came back two days later and people had improved it and so forth. So that was the second half of the experience: with Robert Merton I never succeeded, because it was just too intimidating, but here I was able to chip in and help, other people jumped in, and everybody was interested in the topic because it was all in the news at the moment. It's just a completely different model, which worked much, much better.
Well, what is it that made that so accessible, so fun, so natural to just add something?
Well, I think it's especially true of the early days, and this, by the way, has gotten much harder, because there are fewer topics that are just greenfield, available.
But you could say, oh well, I know a little bit about this and I can get it started. And then it is fun to come back and see that other people have added and improved and so on. That idea of collaborating is much like open source software: you put your code out, people suggest revisions, I change it, it modifies, and it grows beyond the original creator. It's just a kind of fun, wonderful, quite geeky hobby, but people enjoy it.
How much debate was there over the interface, over the details of how to make that seamless and frictionless?
Yeah, I mean, not as much as there probably should have been, in a way. During those two years of the failure of Nupedia, where very little work got done, what was actually productive was a huge, long email discussion: very clever people talking about things like neutrality, talking about what an encyclopedia is, but also talking about more technical ideas. Back then XML was kind of all the rage, and we were thinking, ah, shouldn't you have certain data that might be in multiple articles that gets updated automatically? For example, the population of New York City: every 10 years there's a new official census. Couldn't you just update that bit of data in one place and have it update across all those articles? That is a reality today, but back then it was just like, how do we do that? How do we think about that?
So that is a reality today, with some kind of data variables?
Yeah, Wikidata. From a Wikipedia entry, you can link to that piece of data in Wikidata. It's a pretty advanced thing, but there are advanced users who are doing that.
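As a rough illustration of the idea described here, a value stored once and referenced from many articles so that one update reaches all of them, here is a toy sketch. It is not Wikidata's real data model or API: the `central_store` dictionary, the `{{data:item.property}}` placeholder syntax, and the `render` helper are hypothetical, and the population numbers are placeholders rather than census figures.

```python
import re

# Hypothetical central store: each fact is kept once, keyed by item and property.
# The numbers are placeholders, not real census figures.
central_store = {"new_york_city": {"population": 1_000_000}}

# Hypothetical articles in two languages reference the fact instead of copying it.
articles = {
    "en": "New York City has a population of {{data:new_york_city.population}}.",
    "de": "New York City hat {{data:new_york_city.population}} Einwohner.",
}

# Matches placeholders of the form {{data:item.property}}.
PLACEHOLDER = re.compile(r"\{\{data:(\w+)\.(\w+)\}\}")


def render(text: str, store: dict) -> str:
    """Substitute every {{data:item.property}} placeholder with its stored value."""
    return PLACEHOLDER.sub(lambda m: str(store[m.group(1)][m.group(2)]), text)


# Update the fact in exactly one place (e.g. after a new census)...
central_store["new_york_city"]["population"] = 1_100_000

# ...and every article that references it shows the new value when rendered.
for lang, text in articles.items():
    print(lang, render(text, central_store))
# en New York City has a population of 1100000.
# de New York City hat 1100000 Einwohner.
```

Because the articles never hold their own copy of the number, they cannot drift out of sync with each other; the trade-off in this toy model is that every article depends on the shared store when it is rendered.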
And then when that gets updated, it updates in all the languages where you've done that.
I mean, that's really interesting. There was this chain of emails in the early days discussing the details of what is... so there's the interface, there's the...
Yeah, so the interface. As an example, there was some software called UseModWiki, which we started with. It's quite amusing, actually, because the main reason we launched with UseModWiki is that it was a single Perl script, so it was really easy for me to install it on the server and just get running. But it was some guy's hobby project. It was cool, but it was just a hobby project, and all the data was stored in flat text files, so there was no real database behind it. To search the site, you basically used grep, which is just the basic Unix utility, to look through all the files. So that clearly was never going to scale.
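The flat-file setup described here, with grep as the search engine, can be mimicked in a few lines of Python to show why it could not scale: every query opens and reads every page, so search cost grows with the total size of the wiki. The one-`.txt`-file-per-page layout and the `grep_search` helper are assumptions for illustration, not UseModWiki's actual storage format or code.

```python
import tempfile
from pathlib import Path


def grep_search(wiki_dir: Path, term: str) -> list[str]:
    """Return the names of pages whose text contains `term` (case-insensitive).

    Like running grep over a directory, this reads every page file on every
    query, so the work grows with the total size of the wiki.
    """
    term = term.lower()
    hits = []
    for page in sorted(wiki_dir.glob("*.txt")):
        if term in page.read_text(encoding="utf-8").lower():
            hits.append(page.stem)
    return hits


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        wiki = Path(tmp)
        # Hypothetical pages, one flat text file per page.
        (wiki / "Africa.txt").write_text("Africa is a continent.", encoding="utf-8")
        (wiki / "Eiffel_Tower.txt").write_text("A tower in Paris.", encoding="utf-8")
        print(grep_search(wiki, "continent"))  # ['Africa']
```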
Also, in the early days it didn't have real logins. You could set your username, but there were no passwords, so I might say I'm Bob Smith, and then someone else comes along and says, no, I'm Bob Smith, and they both had it. Now, that never really happened; we didn't have a problem with it. But it was kind of obvious: you can't have a big website where everybody can pretend to be everybody. That's not going to be good for trust and reputation and so forth. So quickly I had to write a little login, store people's passwords and things like that, so you could have unique identities. And then another example of something you would never have thought would be a good idea, and it turned out not to be a problem: to make a link in Wikipedia in the early days, you would make a link to a page that may or may not exist just by using CamelCase, meaning uppercase, lowercase, with the words smashed together. So for New York City, you might type New, no space, capital Y, York, City, and that would make a link. But that was ugly; that was clearly not right. And so I was like, okay, well, that's just not going to look nice. Let's just use square brackets: two square brackets make a link. That may have been an option in the software; I'm not sure I thought up square brackets. But anyway, we just did that.
It worked really well. It makes nice links, and you can see red links or blue links depending on whether the page exists or not. But the thing that didn't even occur to me to think about is that, for example, on the standard German-language keyboard, there is no square bracket. So for German Wikipedia to succeed, people had to learn to use some Alt codes to get the square bracket, or a lot of users would cut and paste a square bracket in whenever they could find one. And yep, German Wikipedia has been a massive success, so somehow that didn't slow people down.
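The two link styles described here can be contrasted in a few lines. With CamelCase, any run of capitalized words smashed together, such as NewYorkCity, became a link; with double square brackets, the writer marks the link explicitly, as in [[New York City]], so titles can keep their spaces. Each link is then shown blue if the target page exists and red if it does not. This is a simplified, hypothetical sketch: the regular expressions and the `existing_pages` set are illustrative assumptions, not the actual UseModWiki or MediaWiki parsers, which handle many more cases.

```python
import re

# CamelCase: two or more capitalized chunks smashed together, e.g. "NewYorkCity".
CAMEL_CASE = re.compile(r"\b(?:[A-Z][a-z]+){2,}\b")
# Double square brackets: an explicit link such as "[[New York City]]".
BRACKETS = re.compile(r"\[\[([^\[\]]+)\]\]")


def camel_case_links(text: str) -> list[str]:
    """Early-wiki style: every CamelCase word is treated as a link target."""
    return CAMEL_CASE.findall(text)


def bracket_links(text: str) -> list[str]:
    """Bracket style: only text wrapped in [[...]] is a link target."""
    return [target.strip() for target in BRACKETS.findall(text)]


def link_color(target: str, existing_pages: set[str]) -> str:
    """Blue if the target page exists, red if it does not (yet)."""
    return "blue" if target in existing_pages else "red"


existing_pages = {"New York City"}  # hypothetical set of existing page titles

print(camel_case_links("I visited NewYorkCity last year."))  # ['NewYorkCity']
print(bracket_links("[[New York City]] and [[Peoria, Illinois]]"))
# ['New York City', 'Peoria, Illinois']
for target in bracket_links("[[New York City]] and [[Peoria, Illinois]]"):
    print(target, link_color(target, existing_pages))
# New York City blue
# Peoria, Illinois red
```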
How is it that German keyboards don't have a square bracket? How do you do programming? How do you live life to its fullest without square brackets?
That's a very good question. I'm not really sure. Maybe it does now, because keyboard standards have drifted over time and it becomes useful to have a certain character. It's the same thing as how there's not really a W in Italian, and it wasn't on keyboards, though I think it is now. In general, W is not a letter in the Italian language, but it appears in enough international words that it's crept into Italian.
And all of these things are probably Wikipedia articles in themselves.
Oh yeah. The discussion of square brackets, a whole discussion, I'm sure, on both the English and the German Wikipedia, and the difference between those two might be very interesting.
So Wikidata is fascinating, but there's also the broader discussion of what an encyclopedia is. Can you go to that sort of philosophical question: what is this encyclopedia?
Sure. So the way I would put it is, an encyclopedia, or what our goal is, is the sum of all human knowledge, but "sum" meaning summary. And this was an early debate. Somebody started uploading the full text of Hamlet, for example, and we said, wait, hold on a second, that's not an encyclopedia article. But why not? And hence was born Wikisource, which is where you put original texts and things like that, out-of-copyright texts, because people said, no, an encyclopedia article about Hamlet is a perfectly valid thing, but the actual text of the play is not an encyclopedia article. So most of it's fairly obvious, but there are some interesting quirks and differences. For example, as I understand it, in French-language encyclopedias it would traditionally be quite common to have recipes, which in English would be unusual. You wouldn't find a recipe for chocolate cake in Britannica. And I actually don't know the current state; I haven't thought about that in many, many years now.
The state of cake recipes in Wikipedia?
In English Wikipedia, I wouldn't say there are chocolate cake recipes. You might find a sample recipe somewhere, I'm not saying there are none, but in general, no, we wouldn't have recipes.
I told myself I would not get outraged in this conversation, but now I'm outraged. I'm deeply upset.
It's actually very complicated. I love to cook; I'm actually quite a good cook. And what's interesting is that it's very hard to have a neutral recipe, because a canonical recipe is kind of difficult to come by: there are so many variants, and it's all debatable and interesting. For something like chocolate cake, you could probably say, here's one of the earliest recipes, or here's one of the most common recipes. But for many, many things, the variants are just as interesting. As somebody said to me recently, 10 Spaniards, 12 paella recipes. So these are all matters of open discussion.
Well, just to throw out some numbers: as of May 27th, 2023, there are 6.66 million articles in the English Wikipedia, containing over 4.3 billion words. Including articles, the total number of pages is 58 million. Does that blow your mind?
I mean, yes, it does. In one sense it doesn't, because I know those numbers and see them from time to time, but in another, deeper sense, yeah, it does. It's really remarkable.
I remember when English Wikipedia passed 100,000 articles, and when German Wikipedia passed 100,000, because I happened to be in Germany with a bunch of Wikipedians that night, and back then it seemed quite big. We knew at that time that it was nowhere near complete. I remember at Wikimania at Harvard, when we held our annual conference there in Boston, someone who had come to the conference from Poland had brought along a small encyclopedia, a single-volume encyclopedia of biographies: short biographies, normally a paragraph or so, about famous people in Poland. There were some 22,000 entries. Even then, in 2006, Wikipedia felt quite big, but he pointed out that English Wikipedia had only a handful of these, less than 10, I think he said. And so then you realize, yeah, actually: who was the mayor of Warsaw in 1873? Don't know. Probably not in English Wikipedia back then, though it might well be today.
But there's so much out there. And of course, what we get into when we're talking about how many entries there are, and how many there could be, is this very deep philosophical issue of notability, which is the question of, well, how do you draw the limit? What is there? Sometimes people say, oh, there should be no limit, but I think that doesn't stand up to much scrutiny if you really pause and think about it. So I see in your hand there you've got a Bic pen, pretty standard. Everybody's seen billions of those in life.
Classic, though.
It's a classic clear Bic pen. So could we have an entry about that Bic pen? Well, I bet we do, about that type of Bic pen, because it's classic, everybody knows it, and it's got a history. And actually there's something interesting about the Bic company: they make pens, they also make kayaks, and there's something else they're famous for. Basically, they're sort of a definition-by-non-essentials company: anything that's long and plastic, that's what they make.
Wow. So if you want to find the platonic form of a Bic...
But could we have an article about that very Bic pen in your hand? Lex Fridman's Bic pen...
Oh, the very... this is a very specific instance.
And the answer is no. There's not much known about it, I dare say. Unless it's very special to you and your great-grandmother gave it to you or something, you probably know very little about it. It's a pen; it's just here in the office. So that's just to show there is a limit. In German Wikipedia they used to talk about the rear nut of the wheel of Uli Fuchs's bicycle, Uli Fuchs being a well-known Wikipedian of the time, to illustrate that you can't have an article about literally everything. And so then it raises the question: what can you have an article about, and what can't you? And that can vary depending on the subject matter.
One of the areas where we try to be much more careful would be biographies. The reason is that a biography of a living person, if you get it wrong, can actually be quite hurtful, quite damaging. And so if someone is a private person and somebody tries to create a Wikipedia entry, there's no way to keep it up to date; there's not much known. So, for example, an encyclopedia article about my mother: my mother, a school teacher, later a pharmacist, wonderful woman, but never been in the news, other than me talking about why there shouldn't be a Wikipedia entry about her, which has probably made it in somewhere as a standard example. There's just not enough known. You could sort of imagine a genealogy database having the date of birth, date of death, and certain elements like that for private people, but you couldn't really write a biography. One of the areas where this comes up quite often is what we call BLP1E (we've got lots of acronyms): a biography of a living person who's notable for only one event. It's a real danger zone, and the typical example would be a victim of a crime. So someone who's a victim of a famous serial killer, but about whom really not much is known. They weren't a public person; they're just a victim of a crime. We really shouldn't have an article about that person. They'll be mentioned, of course, and maybe the specific crime might have an article, but for that person, no, not really. That doesn't really make any sense, because how can you write a biography about someone you don't know much about?
And this varies from field to field. For example, for many academics we will have an entry that we might not have in a different context, because for an academic it's important to cover their career, what papers they've published, things like that. You may not know anything about their personal life, but that's actually not encyclopedically relevant in the same way that it is for a member of a royal family, where it's basically all about the family. So we're fairly nuanced about notability and where it comes in. And I've always thought that the term "notability" is a little problematic. We struggle with how to talk about it. The problem with notability is that it can feel insulting: no, you're not noteworthy. My mother's noteworthy; she's a really important person in my life, right? So that's not right. It's more like verifiability: is there a way to get information that actually makes an encyclopedia entry?
It so happens that there's a Wikipedia page about me, as I've learned recently, and the first thought I had when I saw that was, surely I am not notable enough. So I was very surprised and grateful that such a page could exist. And actually, just allow me to say thank you to all the incredible people who are part of creating and maintaining Wikipedia. It's my favorite website on the internet. The collection of articles that Wikipedia has created is just incredible.
We'll talk about the various details of that, but the love and care that goes into creating pages for individuals, for a Bic pen, for all this kind of stuff, is just really incredible. So I just felt the love when I saw that page. But I also felt something else: just because I do this podcast, and through it I've gotten to know a few individuals who are quite controversial, I've gotten to be on the receiving end of something quite... to me, as a person who loves other human beings, I've gotten to be on the receiving end of some kind of attacks through Wikipedia. Like you said, when you look at living individuals, the little details of information can be quite hurtful. Because I've become friends with Elon Musk and have interviewed him, but I've also interviewed people on the left, far-left people, people on the right, some would say far right. And so now you take a step, you put your toe into the cold pool of politics, and the shark emerges from the depths and pulls you right into a boiling hot pool of politics. I guess it's hot. And so I got to experience some of that. I think what you also realize is that Wikipedia has to have credible sources, verifiable sources, and there's a dance there, because some of the sources are pieces of journalism, and of course journalism operates under its own complicated incentives, such that people can write articles that are not factual or that cherry-pick; a journalistic article can have all those flaws, for sure. And those can be used as sources. It's like they dance hand in hand. And so for me, sadly enough, there was a really kind of concerted attack to say that I was never at MIT, that I never did anything at MIT. Just to clarify: I am a research scientist at MIT. I have been there since 2015. I'm there today. I'm at a prestigious, amazing laboratory called LIDS, and I hope to be there for a long time and work on AI, robotics, and machine learning. There are a lot of incredible people there. And by the way, MIT has been very kind to defend me. Unlike what Wikipedia says, it is not an unpaid position. There was no controversy. It was all very calm and happy and almost boring research that I've been doing there.
The other thing: I am half Ukrainian, half Russian, and I've traveled to Ukraine and will travel to Ukraine again, and I will travel to Russia for some very difficult conversations. My heart has been broken by this war. I have family in both places. It's been a really difficult time.
So the little battle about the biography there also starts becoming important, for the first time, for me. I also want to personally use this opportunity to clarify some inaccuracies there. My father was not born in Chkalovsk, Russia; he was born in Kiev, Ukraine. I was born in Chkalovsk, which is a town not in Russia. There is a town called that in Russia, but there's another one in Tajikistan, a former republic of the Soviet Union. That town is now called B-U-S-T-O-N, Buston, which is funny, because we're now in Austin, and I was in Boston. It seems like my whole life is surrounded by these kinds of towns. So I was born in Tajikistan. The rest of the biography is interesting, but my family is very evenly distributed between Ukraine and Russia in their origins and where they grew up, which adds a whole beautiful complexity to this whole thing. So I just want to correct that. The fascinating thing about Wikipedia is that in some sense those little details don't matter.
But in another sense, what I felt when I saw a Wikipedia page about me, or about anybody I know, is this beautiful kind of saving of the fact that this person existed, like a community that notices you. It's like you see a butterfly that floats by, and you're like, huh, it's not just any butterfly, it's that one. I like that one. Or you see a puppy or something, or it's this Bic pen: this one, I remember this one, it has the scratch. And you get noticed in that way, and I know that's a beautiful thing. Maybe it's very silly and naive of me, but I feel like Wikipedia, in terms of individuals, is an opportunity to celebrate people, to celebrate ideas, for sure, and not a battleground of attacks, the kind of stuff we might see on Twitter, like the mockery, the derision. And of course you don't want to cherry-pick; all of us have flaws and so on. But it just feels like highlighting a controversy of some sort, when in most cases it doesn't at all represent the entirety of the human, is sad.
Yeah, yeah, yeah. So there are a few things to unpack in all that. First, one of the things I always find very interesting: your status with MIT. Okay, that's upsetting, and it's an argument, and it can be sorted out. But then what's interesting is that you gave as much time to that, which is actually important and relevant to your career and so on, as to where your father was born, which most people would hardly notice but which is really meaningful to you. And I find that a lot when I talk to people who have a biography in Wikipedia: they're often annoyed by a tiny error that no one's going to notice, like this town in Tajikistan having a new name. Nobody even knows what that means, but it can be super important.
And so that's one of the reasons why, for biographies, we say human dignity really matters. And some of this has to do with what we call undue weight, which is a common debate that goes on in Wikipedia. I'll give an example. There was an article I stumbled across many years ago about, well, he wasn't a mayor, he was a city council member of, I think it was Peoria, Illinois, but some small town in the Midwest. He'd been on the city council for 30 years or whatever. Frankly, a pretty boring guy, and he seemed like a good local city politician. But in this very short biography, there was a whole paragraph, a long paragraph, about his son being arrested for DUI. And it was clearly undue weight. It's like, what has this got to do with this guy? It wasn't even clear that it deserved a mention. Had he done anything hypocritical? Had he himself even done anything wrong? His son got a DUI. That's never great, but it happens to people, and it doesn't seem like a massive scandal for your dad. So of course I just took that out immediately. This was a long, long time ago.
And that's the sort of thing we really have to think about in a biography and about controversies: is this a real controversy? In general, one of the things we tend to say is that if there's a biography with a section called "Controversies", that's actually poor practice, because it just invites people to say, oh, I want to work on this entry; let's see, there are seven sections, this one's quite short, can I add something? Go out and find some more controversies. That's nonsense, right? And in general, putting it separate from everything else kind of makes it seem worse, and also doesn't put it in the right context. Whereas if there is a controversy (and there's always potential controversy for anyone), it should just be worked into the overall flow of the article, because then it doesn't become a temptation, and you can contextualize it appropriately and so forth. So that's part of the whole process. But I think for me, one of the most important things is what I call community health. Are we going to get it wrong sometimes? Yeah, of course. We're humans, and producing good-quality reference material is hard.
The real question is how people react to a criticism or a complaint or a concern. Is the reaction defensiveness, or combativeness back? Or if someone's really in there being aggressive and in the wrong, it's like, no, no, no, hold on, we've got to do this the right way. You've got to say, okay, hold on: are there good sources? Is this contextualized appropriately? Is it even important enough to mention? What does it mean? And one of the areas where I do think there is a very complicated flaw, and you've alluded to it a little bit, is that we know the media is deeply flawed. We know that journalism can go wrong. And I would say, particularly in the last 15 years or so, we've seen a real decimation of local media, local newspapers. We've seen a real rise in clickbait headlines and a sort of eager focus on anything that might be controversial. We've always had that with us, of course; there have always been tabloid newspapers. But that makes it a little bit more challenging to say, okay, how do we sort things out when we have a pretty good sense that not every source is valid? So as an example, a few years ago (it's been quite a while now) we deprecated the MailOnline as a source.
The MailOnline is the digital arm of the Daily Mail. It's a tabloid. It's not fake news, but it does tend to run very hyped-up stories. They really love to attack people and go on the attack for political reasons and so on, and it just isn't great. And by saying "deprecated"... I think some people say, oh, you banned the Daily Mail. No, we didn't ban it as a source. We just said, look, it's probably not a great source, right? You should probably look for a better source. So certainly, if the Daily Mail runs a headline saying "New cure for cancer," there are probably more serious sources than a tabloid newspaper. So in an article about lung cancer, you probably wouldn't cite the Daily Mail.
that's kind of ridiculous but also for
celebrities and and so forth to sort of
they do cover celebrity gossip a lot but
they also tend to have vendettas and so
forth and you really have to step back
and go is this really encyclopedic or is
this just the daylight mail going on
around and some of that requires a great
Community Health like I mean it requires
massive Community Health even for me for
stuff I've seen as kind of if actually
iffy about people I know things I know
about myself I still feel
like a a love for knowledge emanating
from the article like in LA like I feel
the community health so I will take all
slight inaccuracies I would I I would I
love it because that means there's
people for the most part I feel of
respect and love in this search for
knowledge like sometimes because I also
love stock overflow stock exchange for
programming related things and they can
get a little cranky sometimes to a
degree where it's like
it's not as like you could see you can
feel the Dynamics of the health of the
particular Community yeah and and
sub-communities too like a particularly
c-sharp or Java or python or whatever
like there's little like communities
that emerge you can feel the levels of
toxicity because a little bit of
strictness is good but a little too much
is bad yeah because of the defensiveness
because when somebody writes an answer
and then somebody else kind of says well
modify it and get defensive and there's
this uh tension that's not conducive to
like uh improving towards a more
truthful depiction of like what with
that topic yeah a great example that I
really loved uh this morning that I saw
someone left a note on my user talk page
in English Wikipedia saying it was quite
a dramatic headline thing uh racist hook
on front page so we have on the front
page of Wikipedia we have a little
section called Did You Know it's just
little tidbits and facts things people
find interesting and there's a whole
process for how things get there
and the one that somebody was raising a
question about was it was comparing a
very well-known uh US football player
black uh there was a quote from another
famous sports person uh comparing him to
a Lamborghini clearly a compliment uh
and so somebody said actually here's a
study here's some interesting
information about how black sports
people are far more often compared to
inanimate objects and given that kind of
analogy and I think it's demeaning to
compare a person to a car
um Etc but they said I'm not I'm not
pulling I'm not deleting it I'm not
removing it I just want to raise the
question and then there's this really
interesting conversation that goes on
where I think the general consensus was
you know what this isn't like
like the alarming headline racist thing
on the front page of Wikipedia that sounds
holy moly that sounds bad but it's sort
of like um actually yeah this this
probably isn't the sort of analogy that
we think is great and so we should
probably think about how to improve our
language and not compare sports
people to inanimate objects and
particularly be aware of
certain racial sensitivities that there
might be around that sort of thing if
there is a disparity in the media of how
people are called and I just thought you
know what nothing for me to weigh in on
here this is a good conversation like
nobody's saying you know people should
be banned if if they refer to what was
his name the Fridge Refrigerator Perry
the you know very famous comparison to
an inanimate object of a Chicago Bears
player many years ago but they're just
saying hey let's be careful about
analogies that we just pick up from the
media I said yeah you know that's good
on the sort of uh deprecation of news
sources is really interesting because I
think what you're saying is ultimately
you want to make an article-by-article
decision kind of use your own judgment
and it's such a subtle thing because uh
the
there's just a lot of hit pieces written
about uh individuals like myself for
example that masquerade as
kind of an objective thorough
exploration of a human being it's
fascinating to watch because controversy
and hit pieces just get more clicks oh
yeah this is a I I guess as a Wikipedia
contributor
you start to deeply become aware of that
and start to have a sense like a radar
of clickbait versus truth like to
pick out the truth from the clickbaity
type language oh yeah I mean it's it's
really important and you know we talk a
lot about weasel words
um you know and
um you know actually I'm sure we'll end
up talking about
but just to quickly mention in this area
I think one of the potentially powerful
tools
um
well because it is quite good at this
I've played around with and practiced it
quite a lot but ChatGPT-4 is really
quite able to to take a passage
and
uh
point out potentially biased terms to to
rewrite it to be more neutral now it is
a bit uh anodyne and it's a bit you
know cliched so sometimes it just takes
the spirit out of something that's
actually not bad it's just like you know
poetic language and you're like okay
that's not actually helping but in many
cases I think that sort of thing is
quite interesting and I'm also
interested in
um you know
can you imagine where you you
feed in a Wikipedia entry and all the
sources
and you say
help me find anything in the article
that is not accurately reflecting what's
in the sources and that doesn't have to
be perfect it only has to be good enough
to be useful to the community so if it
scans an article and all the sources and
you say oh it came back with
10 suggestions and seven of them were
decent and three of them it just didn't
understand well actually that's probably
worth my time to do and it can help us
um you know really
um more quickly get good people to sort
of review obscure entries uh and things
like that so just as a small aside on
that and we'll probably talk about
language models a little bit uh or a lot
more but one of the articles uh one of
the hit pieces about me uh the
journalist actually was very
straightforward and honest about having
used GPT to write part of the article oh
interesting and then finding that it
made an error and apologized for the
error that GPT-4 generated which has this
kind of interesting Loop which is the
articles are used to write Wikipedia
pages GPT is trained on Wikipedia and
then there's like this um
interesting
Loop where the weasel words and the
nuances can get lost or can
propagate even though they're not grounded
in reality uh somehow in the generation
of the language model new truths can be
created and kind of linger yeah there's
a famous webcomic that's titled
Citogenesis which is about how
something an error is in Wikipedia and
there's no source for it but then a lazy
journalist reads it and writes the
source yeah and then some helpful
Wikipedian spots that it has no
source finds the source and adds it to
Wikipedia and voila magic this happened
to me once it it uh well it nearly
happened
um there was this I mean it was really
brief I went back and researched I'm
like this is really odd so Biography
magazine which is a magazine published
by the Biography TV channel
um had a profile of me and it said
uh in his spare time I'm not quoting
exactly it's been many years but in his
spare time he enjoys playing chess with
friends I thought wow that sounds great
like I would like to be that guy but
actually I mean I play chess with my
kids sometimes but no I'm not it's not a
hobby of mine and
uh I was like where did they get that
and I contacted the magazine said
where'd that come from they said oh it
was in Wikipedia I looked in the history
there had been vandalism of Wikipedia
which was not you know it's not damaging
it's just false so and it had already
been removed but then I thought oh gosh
well I better mention this to people
because otherwise it's somebody's going
to read that and they're going to add it
to the entry and it's going to take on a
life of its own and then sometimes I
wonder if it has because I've been I was
invited a few years ago to do the
ceremonial first move in the World Chess
Championship and I thought I wonder if
they think I'm a really big chess
enthusiast because they read this
Biography magazine article so but that
that problem uh when we think about
large language models and the ability to
quickly generate very plausible but not
true content I think it's something that
there's going to be a lot of shakeout a
lot of implications of that what would
be hilarious is because of the social
pressure of Wikipedia and the momentum
you would actually start playing a lot
more chess so not only are the articles
written based on Wikipedia but your
own life trajectory changes just
to make it more convenient
yeah aspire to aspire to yes but
aspirational
um what if we just talk about that
before we jump uh back to some other
interesting topics on Wikipedia let's
talk about gpt4 and large language
models uh so they are in part trained on
Wikipedia content yeah uh
what are the pros and cons of of these
language models what are your thoughts
yeah so I mean there's a lot of stuff
going on obviously the technology's moved
very quickly in the last six months and
looks poised to do so for some time to
come
um so first things first I mean part of
our philosophy is
the open licensing the free licensing
the idea that you know this is what
we're here for we we are a volunteer
community and we write this
um encyclopedia we give it to the world
to do what you like with you can modify
it redistribute it redistribute
modified versions commercially
non-commercially this is this is the
licensing so in that sense of course
it's completely fine now we do worry a
bit about attribution
um because it is a Creative Commons
Attribution-ShareAlike license so
attribution is important not just because
of our licensing model and things like
that but it's just proper attribution is
just good intellectual practice and so
and that's a really hard complicated
question
um you know if um
if I
were to write something about my visit
here I might say in a blog post you know
I was in uh
Austin which is a city in Texas I'm not
going to put a source for Austin as a
city in Texas that's just general
knowledge I learned it somewhere I can't
tell you where so you don't have to cite
and reference every single thing but you
know if I actually did research and I
used something very heavily it's just
proper morally proper to give your
sources so we would like to see that and
obviously
um you know they call it grounding so
particularly people at Google are really
keen on
figuring out grounding as a technical term
so ground any text that's generated
trying to ground it to the Wikipedia
quality source I mean like the
same kind of standard of what a source
means that Wikipedia uses the same kind
of generating yeah the same kind of
thing and of course one of the biggest
flaws in ChatGPT right now
um is that it just literally will make
things up just to be
like amiable I think it's programmed to
be very helpful and amiable and it
doesn't really know or care about the
truth and can get bullied into uh yeah it
can kind of be convincing too well but
like this morning I I was the story I
was telling earlier about uh comparing a
football player to a Lamborghini and I
thought is that really racial I don't
know but I'm just I'm mulling it over
and I thought I'm gonna go to ChatGPT
so I said to ChatGPT-4 uh you
know this this happened in Wikipedia can
you think of examples where a white
athlete has been compared to uh
a fast car inanimate object and it comes
back as a very plausible essay where it
tells you know why these analogies are
common and support mobile I said no no I
really uh could you give me some
specific examples so it gives me three
specific examples very plausible correct
names of athletes and contemporaries and
all of that could have been true I Googled
every single quote none of them existed
and so I'm like well that's really not
good like I I wanted to explore a
thought process I was in and I
thought first I thought how do I Google
and say well it's kind of a hard thing
to Google Because unless somebody's
written about this specific topic it's
you know oh it's a large language model it's
processed all this data it can
probably piece that together but it just
can't yet so I think
uh I hope that
GPT five six seven you know three to
five years I'm hoping we'll see a much
higher you know level of accuracy
um where when you ask a question like
that I think instead of being quite so
eager to please by giving you a
plausible sounding answer it's just like
I don't know or maybe uh display
how much uncertainty might be in this
uh generated text like yeah I really
would like to make you happy right now
but I'm really stretching it with this
generation well it's one of the things
I I've said for a long time so in
Wikipedia one of the great things we do
may not be great for our reputation
except in a deeper sense for the long
term I think it is but you know we'll
put a notice that says the
neutrality of this section has been
disputed or the following section
doesn't cite any sources
um and I always joke uh you know
sometimes I wish the New York Times
would run a banner saying the neutrality
of this has been disputed they could tell
us we had a big fight in the newsroom as
to whether to run this or not
but we thought it's important enough to
bring it to you but just be aware that
not all the journalists are on board
with it ah that's actually interesting and
that's fine I would trust them more for
that level of transparency so yeah
similarly ChatGPT should say yeah 87
um well the neutrality one is
really interesting because uh that's
basically a summary
of the discussions that are going on
underneath it would be amazing if uh
like I should be honest I don't look at
the talk page often I don't it would be
nice somehow if there was a kind of a
summary in this banner way of
like
lots of wars have been fought on
this here land for this here paragraph
It's really interesting yeah I hadn't
thought of that because we one of the
things I do spend a lot of time thinking
about these days and you know people
have found it we're moving slowly but
you know we are moving thinking about
okay these tools exist are there ways
that this stuff can be useful to our
community because a part of it is we we
do approach things in a non-commercial
way in a really deep sense it's like
it's it's been great that Wikipedia has
become very popular but really we're
just we're a community whose hobby is
writing an encyclopedia that's first and
if it's popular great if it's not okay
we might have trouble paying for more
servers but it'll be fine and so how do
we help the community use these tools
what are the ways that these tools can
support people and one example I never
thought about I'm gonna start playing
with it is you know feed in the article
and feed in the talk page and say can
you suggest some warnings in the article
based on the conversation on the talk
page I think it might be good at
that it might get it wrong sometimes but
again if it's reasonably successful at
doing that and you can say oh actually
yeah it does suggest
um you know the neutrality of this has
been disputed on a section that has a
seven page discussion in the back that
might be useful I don't know what you're
playing with I mean some more color to
the
not neutrality but also
the amount of emotion laden in the
exploration of this particular part of
the topic yeah it might it might
actually help you look at more
controversial Pages uh like on you know
a page on the war in Ukraine or a page
on Israel and Palestine there could be
parts that everyone agrees on and
there's parts that are just like tough
tough the hard part it would be nice to
when looking at those beautiful long
articles to know like all right let me
just take in some stuff where everybody
agrees on I could give an example that I
haven't looked at in a long time but I
was really pleased with what I saw at
the time so the the discussion was that
they're building something in Israel
and for their own political reasons uh
one side calls it a wall hearkening back
to Berlin Wall apartheid the other calls
it a security fence so we can understand
quite quickly if we give it a moment's
thought like okay I understand why
people would have this this grappling
over the language like okay you want to
highlight the negative aspects of this
and you want to highlight the positive
aspects so you're going to try and
choose a different name and so there was
this really fantastic Wikipedia
discussion on The Talk page how do we
word that paragraph to talk about the
different naming it's called this by
Israel it's called this by Palestinians
and that how you explain that to people
could be quite charged right you could
easily explain oh there's this
difference and it's because this side's
good and this side's bad and that's why
there's a difference or you could say
actually let's just let's try and really
stay as neutral as we can and try to
explain the reasons so you may come away
from it with with a concept uh oh okay I
understand what this debate is about now
and uh just the term
Israeli-Palestinian conflict
is still the title of a page on
Wikipedia but the word conflict is
something that is a charged word of
course yeah because uh from the
Palestinian side or from uh certain
sides the word conflict doesn't
accurately describe the situation
because if you see it as a genocide a
one-way genocide is not a conflict because
to people that uh discuss
um that challenge the word conflict they
see you know conflict is when there's
two equally powerful sides fighting yeah
yeah no it's it's hard and you know in
in a number of cases so this is this
actually speaks to a slightly broader
phenomenon which is there are a number
of cases where there is no one word that
can get consensus
and in the body of an article that's
usually okay because we can explain the
whole thing you can come away with an
understanding of why each side wants to
use a certain word but there are some
aspects like the pages have a title
um so you know there's that same thing
with
um certain things like photos you know
it's like well there's different photos
which one's best a lot of different
views on that but at the end of the day
you need the lead photo because there's
one slot for a lead photo
categories is another one
um so at one point I have no idea if
it's in there today but I don't think so
um
I was listed in uh you know kind of
American entrepreneurs fine American
atheists and I said hmm that doesn't
feel right to me like just personally
it's true I mean I wouldn't wouldn't
disagree with the objective fact of it
but when you click the category and you
see sort of a lot of people who are you
might say American atheist activist
because that's their big issue so
Madalyn Murray O'Hair or various famous
people like uh Richard Dawkins who make
it a big part of their public argument
and persona but that's not true of me
it's just like my private personal
belief it doesn't really it's not
something I campaign about so it felt
weird to put me in the category but like
what category would you put you know and
do you need that and in this case I
argued that it doesn't need that kind
of like that's not I don't speak about
it publicly except incidentally from
time to time I don't campaign about it
so it's weird to put me with this group
of people and that argument held the day I
hope not just because it was me but
um but categories can be like that where
you know you're either in the category
or you're not and sometimes it's a lot
more complicated than that and 