Transcript
-DVyjdw4t9I • Guido van Rossum: Python and the Future of Programming | Lex Fridman Podcast #341
can you imagine possible features
that python 4.0 might have that would
necessitate the creation of the new 4.0
given the amount of
pain and joy
suffering and Triumph that was involved
in the move between version 2 and
version 3.
the following is a conversation with
Guido van Rossum his second time on this
podcast he is the creator of the Python
programming language and is Python's
Emeritus BDFL benevolent dictator for
life this is the Lex Fridman podcast to
support it please check out our sponsors
in the description and now dear friends
here's Guido van Rossum
Python 3.11 is coming out very soon in it
CPython is claimed to be 10 to 60 percent
faster how did you pull that off and
what's CPython CPython is the last
Python implementation standing also the
first one that was ever created the
original python implementation that I
started over 30 years ago so what does
it mean that python the programming
language is implemented in another
programming language called C what kind
of audience do you have in mind here
people who know programming no there's
somebody on a boat that's into fishing
and have never heard about programming
but also some world-class programmers
you're gonna have to speak to both
imagine a boat with two people one of
them has not heard about programming is
really into fishing and the other one
is like uh an incredible Silicon Valley
programmer that's programmed in
everything C C++ Python Rust
Java who knows the entire history of
programming languages so you're gonna
have to speak to both I imagine that
boat in the middle of the ocean yes I'm
gonna please the guy who knows how to
fish first yes please
he seems like the most useful in the
middle of the ocean you got you gotta
make him mad I'm sure he has a cell
phone so uh he's probably very
suspicious about what goes on in that
cell phone but he must have heard that
inside his cell phone is a tiny computer
and a programming language is computer
code that tells the computer what to do
it's a very low level
language it's zeros and ones and then
there's assembly and then oh yeah we
don't talk about these really low levels
because those just confuse people I mean
when we're talking about human language
we're not usually talking about vocal
tracts and how you position your tongue
I was talking yesterday about how when
you have a Chinese person and they speak
English uh this is a bit of a stereotype
they often don't know or they they can't
can't seem to make the difference well
between an l and an r and I have a
theory about that and I've never checked
this with linguists
uh that it probably has to do with the
fact that in Chinese there is not really
a difference and it could be that there
are regional variations in how
native Chinese speakers pronounce that
one sound that
sounds like L to some of
them and like R to others so it's both the
sounds you produce with your mouth
throughout the history of your life and
what you're used to listening to I mean
every language has that Russian has
exactly the Slavic languages have sounds
like the letters
like uh Americans or English speakers
don't seem to know the sounds
they seemed uncomfortable with that
sound yeah so I'm sure oh yes okay so
we're not we're not going to the shapes
of tongues and the sounds that the mouth
can make fine words similarly we're not
going into the ones and zeros or machine
language I would say a programming
language is a list of instructions like
a cookbook recipe
that sort of tells you how to do a
certain thing like make a sandwich well
acquire a loaf of bread cut it in slices
uh take two slices uh put mustard on one
put the jelly on the other or something
then add the meat then add the cheese
I've heard that science teachers can
actually uh do great stuff with recipes
like that by trying to interpret their
students' instructions incorrectly until
the students are completely unambiguous
about it
with language see that's the difference
between
natural languages and programming
languages I think ambiguity is a feature
not a bug in human spoken languages like
uh
that's the dance of communication
between humans
well for lawyers ambiguity certainly is
a feature
uh for plenty of other cases uh the
ambiguity is is not much of a feature
but we work around it of course well
what's more important is context
so with context the Precision of the
statement becomes more and more concrete
right but you know when you say I love
you to a person that matters a lot to
you the person doesn't try to compile
that statement and return an error
saying please define love
right no but I imagine that my wife and
my son uh interpret it very differently
yes even though it's the same three
words but imprecisely still
oh for sure
lawyers would have a lot of follow-up
questions for you nevertheless the
context is already different
in that case yes fair enough so that's
what a programming language is uh the
ability to unambiguously state a recipe
actually let's go back let's go to PEP 8
you go through in PEP 8 the style guide
for Python code some ideas of what this
language should look like
feel like read like and the big idea
there is that code readability counts
what does that mean to you and how do we
achieve it so this recipe should be
readable that's a thing between
programmers
because on the one hand we always
explain the concept of programming
language as computers need instructions
and computers are very dumb and they
need very precise instructions because
they don't have much context in in fact
they have lots of context but their
their context is very different
but what we've seen emerge during the
development of software starting in the
probably in the late 40s
is that software is a very social
activity a software developer is not a
mad scientist who sits alone in his lab
writing brilliant code
software is developed by teams of people
uh even the mad scientist sitting alone
in his lab can type fast enough to
produce enough code so that by the time
he's done with his coding he no longer
remembers what the first few lines he
wrote mean so even the mad scientist
coding alone in his lab would would be
sort of wise to
adopt conventions on how to format
the instructions that he gives to the
computer so that the thing is there is a
difference between a cookbook recipe and
a computer program
The cookbook recipe the author of
the cookbook writes it once
and then it's printed in 100,000 copies
and then lots of people in their
kitchens try to recreate that recipe
that that particular
pie or dish from the recipe and so there
the
the goal of the cookbook author is to
make it clear
to the human reader of the recipe the
human amateur Chef in most cases
when you're writing a computer program
you have two audiences at once
it needs to
tell the computer what to do
but it also is useful if that program is
readable by other programmers
because computer software unlike the
typical recipe for a cherry pie is so
complex
that you don't get all of it right at
once
you end up with the activity of
debugging and you end up with the
activity of so debugging is
trying to figure out why your code
doesn't run the way you thought it
should run that could be
stupid little errors or it could be big
logical errors it could be anything
spiritual yeah it could be anything from
a typo to uh a wrong choice of algorithm
to
building something that does what you
tell it to do but that's not useful
yeah it seems to work really well 99 percent of
the time but does weird things one
percent of the time on some edge cases
that's pretty much all software nowadays
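As a hypothetical illustration of the "works 99 percent of the time" point (not code discussed in the conversation), here is a small Python function that is correct for ordinary input but breaks on one edge case, an empty list, alongside a version that handles that case explicitly:

```python
from __future__ import annotations


def average_naive(values: list[float]) -> float:
    # Correct for any non-empty list, but an empty list makes
    # len(values) zero and raises ZeroDivisionError: the rare edge case.
    return sum(values) / len(values)


def average_safe(values: list[float]) -> float | None:
    # Same computation with the edge case handled explicitly:
    # an empty list has no average, so return None instead of crashing.
    if not values:
        return None
    return sum(values) / len(values)


if __name__ == "__main__":
    print(average_naive([2.0, 4.0, 6.0]))  # 4.0
    print(average_safe([]))                # None
    try:
        average_naive([])
    except ZeroDivisionError:
        print("edge case: empty list")
```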
all good software right well yeah for
for bad software that 99 percent goes down
a lot so but it's not just about the
complexity of the program it's like you
said it is a social endeavor
in that you're constantly improving that
recipe for the cherry pie but you're
sort of you're in a group of people
improving that recipe or the mad
scientist is improving the recipe that
he created a year ago and making it
better
or adding adding something he decides
that he wants a I don't know he wants
some decoration on his pie or icing or
so there's broad philosophical things
and there's specific advice on style so
first of all the thing that people first
experience when they look up Python
is that it is very readable
but there's also like a spatial
structure to it
can you explain the indentation style of
Python and what is the magic to it spaces
are important for readability of
any kind of text
if you take a cookbook recipe
and you remove all the sort of
all the bullets and other markup
and you just crunch all the text
together maybe you leave the spaces
between the words but that's all you
leave
when you're in the kitchen trying to
figure out oh what are the ingredients
and what are the steps
and where does this step end and the
next step begin you're going to have a
hard time if it's if it's just one solid
block of text
on the other hand what what a typical
cookbook does if the paper is not too
expensive
each recipe starts on its own page maybe
there's a picture next to it the list of
ingredients comes first
uh there's a standard notation the
there's there's shortcuts so that you
don't have to sort of write two
sentences on how you have to cut the
onion because there are only three ways
that people ever cut onions in a kitchen
small medium and in slices or something
like that
right none of my examples make any sense
to real Cooks of course but yeah
we're talking to programmers with a
metaphor of cooking I love it
um but there is a strictness to the
spacing that python defines so there's
some
a looser thing some stricter things but
the four spaces for the
for the indentation is really
interesting it it really
um it really defines what the language
looks and feels like because indentation
sort of taking a block of text and then
having inside that block of text
a smaller block of text that is indented
further as sort of a group it's it's
it's like you have a a bulleted list
in a complex business document and
inside some of the bullets are other
bulleted lists you will indent those too
if each bulleted list is indented
several inches then at two levels deep
there's no no space left on the page to
put any of the words of the text so you
can't indent too far on the other hand
if you don't indent at all you can't tell
whether something is a top level bullet
or a second level bullet or a third
level bullet so you have to have
some compromise and uh based on Ancient
conventions
and the sort of the typical width of a
computer screen in the 80s
uh and all sorts of things sort of
we we came up with sort of four spaces
as a compromise I mean there there are
groups there are large groups of people
who code with uh two spaces per indent
level for example the Google style guide
uh all the Google python code and I
think also all the Google C++
code is indented with only two spaces
per block if you're not used to that
it's harder to at a glance
understand the code because the the sort
of the the high level structure is
determined by the indentation on the
other hand there are there are other
programming languages where the
indentation is uh eight spaces or a
whole tab stop in in sort of classic
Unix and to me that looks weird because
you you sort of after three indent
levels you've you've got no room left
well there's some languages where the
indentation is a recommendation
it's a stylistic one the code compiles
even without any indentation
and then in Python indentation is really a
fundamental part of the language right
it doesn't have to be four spaces so you
you can code python with two spaces per
block or six spaces or 12 if you
really want to go wild but
sort of everything
that belongs to the same block needs to
be indented the same way
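A minimal sketch of the rule just described, using Python's built-in `compile()` to check source text without running it: the indent width is up to the author, but every line in a block must line up. The snippets and the helper name `is_valid` are illustrative.

```python
def is_valid(source: str) -> bool:
    """Return True if `source` compiles, False if it has an indentation error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except IndentationError:
        return False


# Two spaces per level: unusual for Python, but legal.
two_spaces = "if True:\n  x = 1\n  y = 2\n"
# Six spaces per level: also legal, as long as the block is consistent.
six_spaces = "if True:\n      x = 1\n      y = 2\n"
# The second statement is indented further than the first one in the same block.
mismatched = "if True:\n    x = 1\n      y = 2\n"

for name, src in [("two_spaces", two_spaces),
                  ("six_spaces", six_spaces),
                  ("mismatched", mismatched)]:
    print(name, is_valid(src))
```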
in practice in most other languages
people recommend doing that anyway if
you look at
C or Rust or C++
all those languages and Java don't have a
requirement of indentation
but except in extreme cases
they're just as anal about having their
code properly indented so any IDE or
syntax highlighting that works with
Java or C++ will yell at you
aggressively if you don't do proper
indentation they'll suggest the proper
indentation for you like uh in C you
type a few words and then you type a
curly brace which is their notion
of sort of beginning an indented block
uh then you hit return and then it
automatically indents four or eight
spaces depending on uh your your style
preferences or how your editor is
configured was there a possible Universe
in which you considered having braces in
Python absolutely yeah well was it 60/40
70/30 in your head uh what was the
trade-off for a long time I was actually
convinced that the indentation was just
better
uh
without context I would still claim that
indentation is better
uh it reduces clutter however
as I started to say earlier context is
almost everything
and in the context of coding
most programmers are familiar with
multiple languages even if they're only
good at one or two
and apart from Python and maybe Fortran
I don't know how that's written these
days anymore but all the other languages
Java Rust C C++ JavaScript
TypeScript Perl are all using curly
braces uh to sort of indicate blocks and
so python is the odd one out so it's a
radical idea do you still as a radical
Renegade revolutionary do you still
stand behind this idea of space of uh
indentation versus braces
like what what can you dig into it a
little bit more
why you still stand behind indentation
because context is not the whole story
history in in a sense provides more
context so for python
there's no chance that we can switch
python is using curly braces for
something else dictionaries mostly
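For readers who want to see it: in Python, braces already mean data, not blocks. `{}` is an empty dict, `{key: value}` builds a dict, and braces around bare values build a set. CPython even keeps a small joke on this exact question, rejecting `from __future__ import braces` with a `SyntaxError` whose message is "not a chance". A short sketch (variable names are illustrative):

```python
empty = {}                           # an empty dict, not an empty block
versions = {"python": 3, "perl": 5}  # dict literal: key: value pairs
primes = {2, 3, 5}                   # set literal: bare values

print(type(empty).__name__, type(versions).__name__, type(primes).__name__)


def braces_refused() -> bool:
    """True if the compiler rejects the braces easter-egg import (as CPython does)."""
    try:
        compile("from __future__ import braces", "<example>", "exec")
    except SyntaxError as exc:
        return "not a chance" in str(exc)
    return False


print(braces_refused())
```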
we would get in trouble if we wanted to
switch just like you couldn't redefine C
to use indentation
even if you agree that it that
indentation sort of
in a Greenfield environment would be
better
you can't change that kind of thing in a
language yeah it's hard enough to reach
agreement over over much more Minor
Details maybe I mean in the past in
Python we did have a big debate about
tabs versus spaces and four spaces
versus fewer or more
and we sort of came up with
a recommended standard and sort of
options for people who want to be
different
but yes I guess the thought experiment
I'd like you to consider is if you could
travel back through time when the when
the compatibility is not an issue and you
started Python all over again
can you make the case for uh indentation
still
well it frees up a pair of
matched brackets of which there are
never enough in the world
uh for other purposes
really makes the language slightly
sort of
easier to grasp for people who don't
already know
another programming language
because the sort of one of the things
and I I mostly got this from my mentors
who taught me programming language design in
the early 80s when you're teaching
programming
for for the the total newbie who has not
coded before in not in any other
language
uh a whole bunch of Concepts in
programming are very alien or
sort of
new and and maybe very interesting but
also distracting and confusing and there
are many different things you have to
learn you have to sort of
in a typical
13-week programming course you have to
if it's like really
learning to program from scratch you
have to cover algorithms you have to
cover data structures you have to cover
syntax you have to cover variables Loops
functions recursion classes
Expressions operators there are so many
Concepts if you you sort of
if you can spend a little less time
having to worry about the syntax
the the classic example was often
oh the compiler complains every time I
put a semicolon in the wrong place or I
forget to put a semicolon
uh python doesn't have semicolons in
that sense so you can't forget them and
you're also not
sort of misled into putting them where
they don't belong because you don't
learn about them in the first place
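To illustrate the point about semicolons: in Python a newline ends a statement, so there is no terminator to forget. A semicolon is still accepted as an optional separator, which is why a stray one does no harm, though PEP 8 discourages putting multiple statements on one line. A small sketch, with a hypothetical helper `run` that executes a snippet and returns the value it binds to `y`:

```python
def run(source: str) -> int:
    """Execute a snippet in a fresh namespace and return the value it binds to y."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace["y"]


plain = "x = 1\ny = x + 1\n"             # each newline ends a statement
with_semicolons = "x = 1; y = x + 1;\n"  # legal, including the trailing ';'

print(run(plain), run(with_semicolons))  # both evaluate to 2
```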
the flip side of that is forcing the
strictness onto the beginning programmer
to teach them that programming
values attention to detail you don't
get to just write the way you write in
English they have other details that
they have to pay attention to so I think
they'll they'll still get the message
about uh
paying attention to details the
interesting design choice so I still
program quite a bit in PHP and I'm sure
there's other languages like this but
the dollar sign before a variable
that was always an annoying thing for me
it didn't quite fit into my
understanding of why this is good for a
programming language I'm not sure if you
ever thought about that one
that is a historical thing there is a
whole lineage of programming languages
PHP is one Perl was one
the Unix shell
is one of the oldest or all the
different shells
the dollar was invented for that purpose
because the very earliest shells had a
notion of scripting but they did not
have a notion of parameterizing the
scripting
right and so a script is just a few
lines of text
where each line of text is a command
that is read by a very primitive command
processor that then sort of takes the
first word on the line as the name of a
program and passes all the all the rest
of the line as text into the program for
the program to figure out what to do
with as arguments
and so by the time scripting was
slightly more mature than the very first
script
there was a convention that just like
the first word of line is uh the name of
the program the following words
uh could be names of files
input.txt
output.html things like that
the next thing that happens is oh it
would actually be really nice if we
could have variables and especially
parameters for scripts parameters are
usually what starts this process
but now you have a problem because you
can't just
say the parameters are x y and z
and so now we say let's say x is
the input file and Y is the output file
and let's forget about Z for now I have
my program
and I write program X Y well that
already has a meaning because that
presumably means
X itself is the file
it's a file name it's not a variable
name
uh and so
the inventors of things like the
Unix shell and I'm sure Job Control
Language at IBM before that
uh
had to use
something that made it clear to the
script processor
here is an X that is not actually the
name of a file which you just pass
through to the to the program you're
running here is an X that is the name of
a variable yeah and
when you're writing a script processor
you try to keep it as simple as possible
because certainly in the 50s and
60s
the thing that interprets the script
itself had to be a very small
program because it had to fit in a very
small part of memory and so saying oh
just look at each character and if you
see a dollar sign you jump to another
section of the code and then you gobble
up characters or say until the next
space or something and you say that's
the variable name
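The character-at-a-time approach described here can be sketched in a few lines of Python. This is a hypothetical, minimal command processor, not a model of any particular historical shell: the first word names the program, words starting with `$` are replaced by variable values, and everything else, such as file names, passes through unchanged. The function names `expand` and `parse_command` are illustrative.

```python
from __future__ import annotations


def expand(line: str, variables: dict[str, str]) -> str:
    """Replace each $name with its value, scanning one character at a time."""
    out = []
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "$":
            # Gobble up characters until the next space: that is the variable name.
            j = i + 1
            while j < len(line) and line[j] != " ":
                j += 1
            name = line[i + 1:j]
            out.append(variables.get(name, ""))  # unknown names expand to ""
            i = j
        else:
            out.append(ch)  # fixed text is copied through untouched
            i += 1
    return "".join(out)


def parse_command(line: str, variables: dict[str, str]) -> tuple[str, list[str]]:
    """First word is the program name; the rest become its arguments."""
    words = expand(line, variables).split()
    if not words:
        raise ValueError("empty command line")
    return words[0], words[1:]


if __name__ == "__main__":
    params = {"x": "input.txt", "y": "output.html"}
    print(parse_command("convert $x $y", params))  # ('convert', ['input.txt', 'output.html'])
    print(parse_command("convert x y", params))    # ('convert', ['x', 'y'])
```

Note how `convert x y` is left alone: without the `$`, the processor has no way to tell that `x` was meant as a parameter rather than a file name, which is the ambiguity described above.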
and so it was was sort of
invented as
a clever way to make parsing of things
that contain both variable
and fixed parts
very easy in a very simple script
processor it also helps even then it
also helps the human
author and the human reader of the
the script
to quickly see oh
20 lines down in the script I see a
reference to x y z Oh it has a dollar in
front of it so now we know that x y z
must be one of the parameters of the
script well this is fascinating several
things to say which is
the leftovers from the simple script
processor languages are now in code
bases like behind Facebook or behind
most of the back end I think PHP
probably still runs most of the back end
of the internet oh yeah yeah I think
there's a lot of it in Wikipedia too for
example yeah it's funny that those
decisions are not funny it's fascinating
that those decisions permeate Through
Time
just like biological systems right
I mean that the sort of the inner
workings of DNA
have been stable for well I don't know
how long it was like 300 million years
half a billion years yeah and there
there are all sorts of weird quirks
there
that don't make a lot of sense if you
were to design
a system like self-replicating molecules
from scratch but that system has a lot
of interesting resilience it has
redundancy that results like it messes
up in interesting ways that still is
resilient when you look at the system
level of the organism code doesn't
necessarily have that a program a
computer programming code you'd be
surprised
how much resilience modern code has
I mean if you if you look at the number
of bugs per line of code
even in in very well tested
code that in practice works just fine
there are actually lots of things that
don't work fine
and there are error correcting or
self-correcting mechanisms at many
levels including probably the user of
the code well in the end the user who
sort of is told well you got to reboot
your your PC is part of that system and
a slightly uh less drastic thing is
reload the page which we all know how to
do without thinking about it when
something weird happens you you try to
reload a few times before you say oh
there's something really weird okay or
try to click the button again if the
first time didn't work
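Programs often build in the same self-correcting habit as the user who reloads the page: retry a flaky operation a bounded number of times before giving up. A minimal sketch, assuming a hypothetical `make_flaky_fetch` that stands in for something like a page load failing transiently:

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(operation: Callable[[], T], attempts: int = 3) -> T:
    """Call `operation` up to `attempts` times; re-raise the last error if all fail."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:  # a real system would retry only transient errors
            if attempt == attempts - 1:
                raise  # out of tries: surface the error to the caller
    raise AssertionError("unreachable")


def make_flaky_fetch(failures: int) -> Callable[[], str]:
    """Hypothetical page load that fails `failures` times, then succeeds."""
    calls = 0

    def fetch() -> str:
        nonlocal calls
        calls += 1
        if calls <= failures:
            raise ConnectionError("something weird happened")
        return "page loaded"

    return fetch


if __name__ == "__main__":
    print(retry(make_flaky_fetch(failures=2)))  # succeeds on the third try
    try:
        retry(make_flaky_fetch(failures=5))
    except ConnectionError as exc:
        print("gave up:", exc)
```

Retrying like this is only safe when the operation is idempotent; a button that toggles a light, as in the exchange that follows, is exactly the kind of action where a blind retry undoes the first attempt.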
well yeah that we should all have
learned not to do that because that's
probably just gonna turn the light back
off yeah true so do it three times
that's the that's the right lesson so uh
and I wonder how many people actually
like the dollar sign like you said it is
documentation so to me it's whatever the
opposite of syntactic sugar is syntactic
poison to me it is such a pain in the
ass that I have to type in a dollar sign
also super error prone
so it's not self-documenting it's it's
like a bug generating thing it is a kind
of documentation that's the pro and the
con is it's a source of a lot of bugs
but actually I have to ask you um
this is a really interesting idea of
bugs per line of code
if you look at all the computer systems
out there from the code that runs
nuclear weapons to the code that runs
all the amazing companies that you've
been involved with and now the code that
runs Twitter and Facebook and Dropbox
and Google and Microsoft Windows and so
on
and we like laid out
wouldn't that be a cool like table bugs
per line of code and what would that
let's let's put like actual companies
aside do you think we'd be surprised by
the number we see there for all these
companies
that depends on whether you've ever read
about research that's been done in this
area before
and
I don't know the last
time I
saw some research like that that was
probably in the 90s and the research
might have been done in the 80s but the
the conclusion was across a wide range
of different software different
languages
different companies
different development styles
the number of bugs is always
I think it's in the order of about one
bug per thousand lines in sort of
mature software that that is considered
interesting as good as it gets can I
give you some facts here there's a lot
of good papers so you said mature
software right so here's uh
a report from a uh like programming
analytics company
now this is from a developer perspective
let me just say what it says because
this is very weird and surprising on
average a developer creates 70 bugs per
1000 lines of code
15 bugs per 1000 lines of code find
their way to the customers
but this is in the software they've oh I
was I was wrong by an order okay there
fixing a bug takes 30 times longer than
writing a line of code
that I can believe yeah 75 percent of a
developer's time is spent on debugging
um that's for an average developer that
they analyzed
1500 hours a year in the US alone
113 billion dollars is spent annually on
identifying and fixing bugs
imagine this is marketing literature for
someone who claims to have a golden
bullet or a silver bullet that makes all
that investment in fixing bugs go away
but that that is usually yeah not going
to yeah that's not gonna happen well
they're uh I mean they're referencing a
lot of stuff of course but it is a page
uh that is you know there's a contact us
button at the bottom presumably if you
just spend a little bit less than 100
billion dollars we're willing to solve
the problem for you
right and there's also a report on Stack
Exchange Stack Overflow on the exact
same topic but when I open it up at the
moment the page says Stack Overflow is
currently offline for maintenance oh
it's ironic yes uh by the way their
error page is awesome anyway
I mean can you believe that number of
bugs oh absolutely isn't that scary that
70 bugs per 1000 lines of code so even
10 bugs per thousand lines well that's
about one bug after every 15 lines and
that's when you're first typing it in
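For reference, the rates quoted here convert to lines of code per bug by dividing 1000 by the rate. A quick Python check (the labels are descriptive; the numbers are the ones mentioned in the conversation):

```python
def lines_per_bug(bugs_per_kloc: float) -> float:
    """Average number of lines between bugs, given bugs per 1000 lines."""
    return 1000 / bugs_per_kloc


# Rates mentioned in the conversation, in bugs per 1000 lines of code.
rates = {
    "written by an average developer": 70,
    "reaching customers": 15,
    "the lower rate of 10": 10,
    "mature software, as Guido recalls it": 1,
}

for label, rate in rates.items():
    print(f"{label}: about one bug every {lines_per_bug(rate):.0f} lines")
```

At 70 bugs per 1000 lines that is roughly one bug every 14 lines, close to the "about one bug after every 15 lines" figure above; at 10 per 1000 it would be one bug every 100 lines.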
yeah from a developer but like how many
bugs are going to be found
if you're if you're typing well the
development process is extremely
iterative yeah typically you don't make
a plan for what software you're going to
release a year from now yeah uh and work
out all the details because actually all
the details uh themselves
sort of compose a program
and that
being a program all your plans will have
bugs in them too and inaccuracies
uh but what what you actually do is
you do a bunch of typing and I'm I'm
actually really I'm a really bad typist
that just I've never learned to type
with 10 fingers
how many do you use
well I could use all 10 of them but not
very well
but I I never I never took a typing
class and I never sort of corrected that
so the first time I I seriously learned
I had to learn the layout of a qwerty
keyboard
was actually in college in my first
programming classes where we used Punch
Cards
and so
with my two fingers I sort of pecked out
my code
watch anyone
give you a little coding demonstration
they'll have to produce like four lines
of code
and now see how many times they use the
backspace key yeah because they made a
mistake and and
and some people especially when when
someone else is looking
will will backspace over 20 30 40
characters to fix a typo earlier in a
line if you're
if you're slightly more experienced of
course you use your arrow buttons to go
or your mouse to but the mouse is
usually slower than uh than the arrows
but a lot of people when they type a 20
character word which is not unusual and
they realize they made a mistake
at the start of the word they backspace
over the whole thing
and then retype it and sometimes it
takes three four times to get it right
so
I don't know what your definition of bug
is arguably mistyping a word and then
correcting it immediately is not a bug
on the other hand you you already
do sort of lose time and every once in a
while there's sort of a typo that you
don't get in that process
and now you've you've typed like 10
lines of code
uh and somewhere in the middle of it you
don't know where yet is a typo or maybe
a thinko where you you forgot that you
had to initialize a variable or
something but those are two different
things and I would say yes you have to
actually run the code to discover that
typo but forgetting to initialize a
variable is a fundamentally different
thing because that thing can go
undiscovered uh that depends on the
language in Python it will not right in
sort of modern compilers are usually
pretty good at catching that even
for C so for that specific thing
but actually deeper
it might there might be another variable
that has been initialized but logically
speaking not the one you meant a related one yep
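The difference between the two kinds of mistake can be shown in a few lines of Python (a hypothetical example, not from the conversation). A misspelled name fails loudly the first time that line runs, with an `UnboundLocalError`, which is a kind of `NameError`; a thinko, here a counter that is initialized once when it should be reset for every line, runs without any error and quietly returns wrong numbers.

```python
from __future__ import annotations


def long_words_typo(words: list[str]) -> int:
    count = 0
    for w in words:
        if len(w) > 5:
            cuont += 1  # typo: raises UnboundLocalError (a NameError) when reached
    return count


def long_words_per_line_thinko(lines: list[str]) -> list[int]:
    counts = []
    count = 0  # thinko: initialized once, but it should be reset for every line
    for line in lines:
        for w in line.split():
            if len(w) > 5:
                count += 1
        counts.append(count)
    return counts


def long_words_per_line(lines: list[str]) -> list[int]:
    counts = []
    for line in lines:
        count = 0  # fixed: a fresh counter for each line
        for w in line.split():
            if len(w) > 5:
                count += 1
        counts.append(count)
    return counts


if __name__ == "__main__":
    text = ["python is readable", "guido wrote python"]
    try:
        long_words_typo(["readable"])
    except NameError as exc:
        print("typo caught at run time:", type(exc).__name__)
    print(long_words_per_line_thinko(text))  # [2, 3]: no error, silently wrong
    print(long_words_per_line(text))         # [2, 1]
```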
it's like named the same but it's a
different thing and you forgot to
initialize uh whatever some counter or
some some basic variable they're using I
can tell that you've coded yes by the
way I should mention that I use the
Kinesis keyboard
which has the backspace under the thumb
and one of the biggest reasons I use
that keyboard is because you realize in
order to use the backspace on a usual
keyboard you have to stretch your pinky
out
and like the the for most normal
keyboards the backspace is under the pinky
and so I don't know if people realize
the pain they go through in their life
because of the backspace key being so
far away so with the Kinesis it's right
under the thumb so you don't have to
actually move your hands the backspace
and the delete what do you do if
you're ever not with your own keyboard
and you have to use someone else's PC
keyboard that has a standard layout so
first of all it turns out that you can
actually go your whole life always
having the keyboard
with you so this well except for that
that little tablet that you're using for
note-taking right now right uh
yeah so it's very inefficient
note-taking but I'm not I'm just looking
stuff up but in most cases I would be
actually using the keyboard here right
now I just don't anticipate you have to
calculate how much typing do you
anticipate if I anticipate quite a bit
then I'll just I have a keyboard
and the same same with I mean
the embarrassment
of accepting being the weirdo that I am
but you know when I go on an airplane
and I anticipate doing programming or a
lot of typing I will have a laptop and I
will pull out a Kinesis keyboard in
addition to the laptop and it's just who
I am you have to you have to accept who
you are
um but also it's a you know for a lot of
people
for me certainly there's a comfort space
where there's a certain kind of setup
that maximizes productivity and
um it's like some people have a warm
blanket that they like
when they watch a movie I like the
Kinesis keyboard takes me to uh a place
of focus and I still mostly I I'm trying
to make sure I use the state-of-the-art
IDEs for everything but my comfort place
just like the Kinesis keyboard is still
Emacs
so
I still use I still I mean that's one of
some of the debates I have with myself
about everything from a technology
perspective
is how much to hold on to the tools
you're comfortable with versus how much
to invest in using modern tools and the
signal that the communities provide you
with is the noisy one because a lot of
people year to year get excited about
new tools and you have to make a
prediction are these tools defining a
new generation or something that will
transform programming or is this just a
fad that will pass certainly with
JavaScript frameworks and the front end and the
back end of the web there's a lot of
different styles that came and went I
remember learning um what was it called
ActionScript I remember for Flash
um you know learning how to program in
Flash uh learning how to design doing
graphic animation all that kind of stuff
in Flash same with Java applets I
remember creating quite a lot of java
applets thinking that this potentially
defines the future of the web and it did
not well you know in most cases like
that the particular technology
eventually gets replaced
but
many of the concepts that the technology
introduced or made accessible first
are preserved of course
because yeah we're not using Java
applets anymore but the notion of
reactive web pages
that sort of contain little bits of code
that respond directly to
something you do like pressing a button
or a link or hovering even
uh is has certainly not gone away
and that those animations that were made
painfully
complicated with Flash
I mean flash was an innovation when it
first came up
and when it was replaced by JavaScript
equivalents
stuff
it was a somewhat better way to do
animations but those animations are
still there not all of them
but but sort of
again there is an evolution and often so
often with technology
the the sort of the technology that was
eventually thrown away or replaced
was still essential to to sort of
get started there wouldn't be jet planes
without propeller planes
I bet you but from a user perspective
yes from the feature set yes but I from
a programmer perspective it feels like
all the time I've spent
with ActionScript all the time I spent
with Java on the applet side for the GUI
development well no Java I have to
push back that was useful because
it transfers but the Flash doesn't
transfer so some things you learn and
invest time in yeah what you
learned the skill you picked up
learning ActionScript yeah
was sort of it was perhaps
a super valuable skill at the time you
picked it up if you learned
ActionScript early enough but
that skill is no longer
in demand well that's the calculation
you have to make when you're learning
new things like today people start
learning programming today I'm trying to
to see what are the new languages to try
what are the new uh systems to try that
what are the new IDEs to try to
keep improving because that's why we
start when we're young right
but that seems very true to me that that
when you're young you have your whole
life ahead of you and you're
allowed to make mistakes in fact you
should feel encouraged to
do a bit of stupid stuff yeah try not to
get yourself killed or seriously maimed
but try stuff that
deviates from what everybody else is
doing
and like nine out of ten times you'll
just learn why everybody else is not
doing that or why everybody else is
doing it some other way and one out of
ten times you sort of
you discover something that's better or
that's that somehow works I mean there
are all sorts of crazy things that were
invented
uh by accident by people trying trying
stuff together
that's great advice to try random stuff
make a lot of mistakes once you're
married with kids you're probably going
to uh be a little more risk-averse
because now there's more at stake and
you've already hopefully had some time
where you where you were experimenting
with crazy shit I like how marriage and
kids solidify their choice of
programming language how does that go the
Robert Frost poem The Road Not
Taken which I think is misinterpreted by
most people but anyway I feel like the
choices you make early on
especially if you go all in they're
going to define the rest of your life's
trajectory in a way that
like you basically are picking a camp so
uh you know there's if you invest a lot
in PHP if you invest a lot in .NET if you
invest a lot in JavaScript
you're going to stick there
you that's that's your life Journey
only as far as that technology remains
relevant yes yes I mean if if at age 16
you learn coding in C
and by the time you're 26 C is like a
dead language
then there's still time to switch
there's probably some kind of Survivor
bias or whatever it's called in in sort
of your observation that that you pick a
camp because there are many different
camps to pick and if you pick .NET
then you can coast for the rest of
your life because that technology is now
so ubiquitous of course that even
if it's bound to die it's going
to take a very long time well for me
personally
I had a very difficult and in my own head
brave leap that I had to take relevant
to our discussion which is most of my
life I programmed in C and C plus plus
and so uh having that hammer everything
looked like a nail
so I would literally even do scripting
in C plus plus like I would create
programs that do script like things and
uh when I first came to Google and
even before then already before
TensorFlow before all of that there was
a growing realization that C plus plus is not
the right tool for machine learning we
could talk about why that is it's
unclear why that is a lot of it
has to do with community and culture and
how it emerges and stuff like that but
for me I decided to take the leap to
python like all out basically switched
completely from C plus plus except for a
highly performant robotics applications
there were still uh
there's still a culture of C plus plus
in in the space of robotics
that was a big leap
like I had to you know like like people
have like existential crises or midlife
crises or whatever you have to realize
almost like walking away from uh from a
person you love
um because I was sure that C plus plus would
have to be a lifelong companion for a
lot of problems I would want to solve C
plus plus would be there and it was a
question to say well that might not be
the case well C plus plus is still one
of the most popular languages in the
world one of the most used one of the
most depended on it's also still
evolving quite a bit I mean
that is not a fossilizing
community yes they are doing great
innovative work actually a lot but yet
their innovations are hard
to follow if you're not already a
hardcore C plus plus user well this was
the thing it pulls you in it's a rabbit
hole I was hardcore all the
metaprogramming template programming like
I would start using the modern C plus
plus as it developed right not just the
not just the shared pointer and the
garbage collection that makes it easier
for you to work with some of the flaws
but the details like the metaprogramming
the crazy stuff that's coming
out there but then you have to just
empirically look and step back and say
what language am I more productive in
sorry to say what language do I enjoy my
life with more
and uh readability and able to think
through and all that kind of stuff that
those questions are harder to ask when
you already have
a loved one which in my case was C plus
plus and then there's python uh like
that meme the grass is
greener on the other side am I just
infatuated with a new fad new cool thing
or is this actually going to make my
life better and I think a lot of people
face that kind of decision it was a
difficult decision for me
um when I made it at this time it's an
obvious switch if you're into machine
learning but at that time it wasn't
quite yet so obvious so it was a risk
and you know you have the same kind of
stuff with um
I still because of my connection to
WordPress
I still do a lot of back-end programming
in PHP uh
and the question is you know Node.js
Python do you switch to do you switch
back into any of those
programming there's the case for Node.js
for me well more and more and more of
the front end runs in JavaScript
um and a lot of fascinating cool stuff is done
in JavaScript maybe use the same
programming language for the back end as
well
uh the case for python for the back end
is well you're doing so much programming
outside of the web in Python so maybe
use Python for the back end and then the
case for PHP well most of the web still
runs in PHP
you have a lot of experience with PHP
why uh fix something that's not broken
those are my own personal struggles but
I think they reflect the struggles of a
lot of people and with different
programming languages with different
problems they're trying to solve it's a
weird one and there there's not a single
answer right because depending on how
much time you have to learn new stuff
where you are in your life what what
you're currently working on who you want
to work with what communities you like
yeah there's not one right choice
maybe if you if you sort of
if you can look back 20 years you can
say well that whole detour through
ActionScript was a waste of time
but
nobody could know that
so you can't beat yourself up over
that
uh you just need to accept that not
every choice you make
is going to be perfect maybe sort of
keep Plan B in the back of your mind
uh but don't overthink it don't
try to sort of don't create
a spreadsheet with like where you're
trying to estimate well if I learn this
language I expect to make X million
dollars in a lifetime and if I learn
that language I expect to make Y
million dollars in a lifetime and which
is higher and which has more
risk and where's the chance that it's
like picking picking a stock
kind of kind of but uh
I think with stocks you can
diversify your investment as you could
with productivity in life
boy that spreadsheet is possible to
construct
like if you actually carefully analyze
what your interests in life are where you
think you can maximally impact the world
there really are better and worse choices
for a programming language that are not
just about the syntax but about the
community about where you predict the
community's headed
what large systems are programmed in
that but can you create that spreadsheet
because that sort of you're mentioning a
whole bunch of inputs that go into that
spreadsheet where you have to estimate
things that are very hard to measure and
even harder I mean they're they're hard
to measure
retroactively and they're even harder to
predict like what is the better
community
well better is one of those
incredibly difficult words what's better
for you is not better for someone else
no but we're not doing a public speech
about what's better we're doing a
personal spiritual journey I can
determine a circle of friends
circle one and circle two and I
can have a bunch of parties with one and
a bunch of parties with two and then
write down or take a mental note of
what made me happier right and that you
know you have if you're a machine
learning person you want to say Okay I
want to build a large company that does
that is grounded in machine learning but
also has a sexy interface that has a
large impact on the world what languages
do I use you look at what Facebook is
using you look at what Twitter is using
then you look at more performant newer
languages like Rust or you look at
languages that most of the
community uses in the machine learning
space that's Python and you can like
think through you can hang out and think
through it and it's always an investment
and the the level of activity of the
community is also really interesting
like you said C plus plus and python are
super active in terms of the development
of the language itself
but do you think that you can make
objective choices there no no but
there's a gut you build up like
don't you believe in that gut
feeling everything is very subjective
and yes you most certainly can have a
gut feeling and your gut can also be
wrong that's why there are billions of
people because they're not all right I
mean clearly there are more people
living in the Bay Area who have plans to
sort of create a Google sized company
than there's room in the world for
Google sized companies and they're gonna
have to duke it out in the market
space and there's many more choices than
just the programming language speaking
of which let's go back to the boat with
the fisherman who's tuned out
long ago I'll talk to the programmer
let's jump around and go back to
CPython that we tried to define as the
reference implementation and one of the
big things that's coming out in 3.11
what's the right way we tend to say 3.11
because it really was like we went 3.8
3.9 3.10 3.11 and we're planning to go
up to 3.99 99 what happens after 99
probably just 3.100 what if I make it
there okay
and go all the way to 420. I got it
forever python V3 we'll talk about four
but more for fun
so 3.11 is coming out one of the big
sexy things in it is it'll be much
faster so how did you beyond hiring a
great team or working with a great team
make it faster what are some ideas
uh that make it faster
it has to do with Simplicity of software
versus performance
and so even though C is known to be a
low-level language which is
great for writing sort of
a high performance language interpreter
when I originally started Python or
CPython
I
didn't expect there would be
great success and fame in my future
uh so I
I tried to get something working
and useful
uh in about three months
and so I I sort of I cut corners
I borrowed ideas left and right when it
comes to language design as well as
implementation
uh I also wrote much of the code as
simple as it could be
and
they're they're like
there are many things that you can code
more efficiently by adding more code
it's a bit of a sort of a time space
trade-off
where you can compute a certain thing
from a small number of inputs
uh and every time you get presented with
new input
uh you do the whole computation from the
top
that can be simple looking code it's
easy to understand it's easy to reason
about that you can you can tell quickly
that it's correct in at least in the
sort of mathematical sense of correct
uh because it's implemented in C maybe
it performs relatively well
but over time as sort of
as the requirements for that code and
the need for performance
go up
you might be able to rewrite that same
algorithm
using more memory maybe remember
previous results
so you don't have to recompute
everything from scratch like the the
classic example is Computing prime
numbers
like
is 10 a prime number
well you sort of is it divisible by two
is it divisible by three is it divisible
by four and we go all the way to is it
divisible by 9. and it is not well
actually 10 is divisible by two so there
we stop but say 11 is it divisible by
2 through 10 the answer is no nine times
in a row so now we know 11 is a prime
number
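The naive approach described here, trying every candidate divisor from 2 up to n minus 1 and stopping at the first one that divides evenly, might look like this in Python. The function name `is_prime_naive` is illustrative, not something from the conversation.

```python
def is_prime_naive(n: int) -> bool:
    """Trial division by every candidate from 2 through n - 1."""
    if n < 2:
        return False
    for x in range(2, n):
        # Stop at the first divisor found: n is not prime.
        if n % x == 0:
            return False
    return True


print(is_prime_naive(10))  # False (stops at x = 2)
print(is_prime_naive(11))  # True (no divisor among 2..10)
```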
on the other hand if we already know
that 2 3 5 and 7 are prime numbers and
you know a little bit about the
mathematics of how prime numbers work
you know that if you have a rough
estimate for the square root of 11 you
don't actually have to check is it
divisible by four or is it divisible by
five all you have to check in the
case of 11 is is it divisible by 2 is it
divisible by three
because take 12.
if it's divisible by 4 well 12 divided
by 4 is 3 so you you should have come
across the question is it divisible by 3
first
so if you know basically nothing about
prime numbers except the definition
maybe you go for X from 2
through n minus 1 is n divisible by X
and then at the end if you got uh all
no's uh for every single one of those
questions you know oh it must be a prime
number well the first thing is you can
stop iterating when you find a yes
answer
and the second is you can also stop
iterating when you have reached
the square root of n because you know
that if it has a divisor larger than
the square root it must also have a
divisor smaller than the square root
then you say oh except for two we don't
need to bother with checking for even
numbers because all even numbers are
divisible by two so if it's divisible by
four
we would already have come across the
question is it divisible by two and so
now you special-case the check is n
divisible by two and then you just check
three five seven nine eleven
uh and so now you've sort of
reduced your search space by 50 percent again by
skipping all the even numbers except
for two
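Folding in the two refinements just described, stopping at the square root and testing 2 once before skipping every other even candidate, gives a sketch like the one below. It assumes Python 3.8+ for `math.isqrt`; `is_prime` is an illustrative name.

```python
import math


def is_prime(n: int) -> bool:
    """Trial division up to sqrt(n), testing 2 once and then only odd divisors."""
    if n < 2:
        return False
    if n % 2 == 0:
        # Special case: 2 is the only even prime.
        return n == 2
    # Any divisor above sqrt(n) pairs with one below it,
    # so the odd candidates 3, 5, 7, ... up to isqrt(n) are enough.
    for x in range(3, math.isqrt(n) + 1, 2):
        if n % x == 0:
            return False
    return True


print(is_prime(11))  # True
print(is_prime(9))   # False
```

For n = 11, `math.isqrt(11)` is 3, so after the even check only the divisor 3 is tried, which matches the reasoning in the conversation.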
if you think a bit more about it or you
just
read in your book about the history of
math one of the first algorithms ever
written down
all you have to do is check is it
divisible by any of the previous prime
numbers that are smaller than the square
root
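The last refinement, dividing only by primes already found, trades memory for time in the way described earlier: the growing list of primes is the remembered previous result. A minimal sketch under that reading follows; it is trial division against stored primes, not a sieve, and `primes_up_to` is a hypothetical helper name.

```python
import math


def primes_up_to(limit: int) -> list[int]:
    """Collect primes in order, testing each candidate only against earlier primes."""
    primes = []  # remembered results: every prime found so far
    for n in range(2, limit + 1):
        root = math.isqrt(n)
        n_is_prime = True
        for p in primes:
            if p > root:
                break  # no stored prime up to sqrt(n) divides n
            if n % p == 0:
                n_is_prime = False
                break
        if n_is_prime:
            primes.append(n)
    return primes


print(primes_up_to(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```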
and before you get to a better algorithm
than that
you have to have several PhDs in
discrete math so that's as much as I
know so of course that same story
applies to a lot of other algorithms
string matching is a good example
of uh how to come up with an efficient
algorithm and sometimes yeah the more
efficient algorithm is not so much more
complex than the inefficient one but
that's an art and it's not always the
case in the general case the more
performant the algorithm the more
complex it's going to be there's a
there's a kind of trade-off the simpler
algorithms are also the ones that people
invent first
because when you're looking for a
solution
you look at the simplest way to get
there first
and so if there is a simple solution
even if it's not the best solution not
the fastest or the most memory
efficient or whatever
a simple solution and simple is
fairly subjective but mathematicians
have also thought about sort of what is
a good definition for simple in the case
of algorithms
uh but the simpler solutions
tend to be
easier to follow for other programmers
who haven't made a study of a particular
field and when I when I started with
Python I was a good programmer in
general I knew sort of basic data
structures I knew the C language pretty
well
but there were many areas where I was
only somewhat familiar with the state of
the art
and so I picked
in many cases the simplest way I could
solve a particular sub problem because
when you when you're designing and
implementing a language you have to like
you have many hundreds of little problems
to solve
and you have to have solutions for every
one of them before you can sort of
say I've invented a programming language
first of all so CPython what kind of
things does it do it's an interpreter it
takes in this readable language that we
talked about that is python what is it
supposed to do The Interpreter basically
it's it's sort of a recipe for
understanding recipes
so instead of a recipe that says bake me
a cake we have a recipe for
well given
the text of a program
how do we run that program and and that
is sort of the recipe for building a
computer the recipe for the Baker and
the chef yeah what are the
algorithmically tricky things that
happen to be low-hanging fruit that
could be improved on maybe throughout
the history of python but also now how
is it possible that 3.11 in year 2022
it's possible to get such a big
performance Improvement
we focused
on a few areas where we we still felt
there was low hanging fruit
the biggest one is actually The
Interpreter itself
and this has to do with details of
how Python is defined so I don't know
if the fisherman is going to follow this
story he already jumped off
the boat yeah
so CPython is actually even though
it's always called an interpreted
language there's also a compiler in
there it just doesn't compile to machine
code it compiles to bytecode which is
sort of code for an imaginary computer
that is called the python interpreter so
it's compiling code that is more easily
digestible by The Interpreter or is
digestible at all it is the code that is
digested by The Interpreter that's the
compiler we tweaked very minor bits of
the compiler almost all the work was
done in The Interpreter
because
when you have a program you compile it
once and then you run the code a whole
bunch of times
or maybe there's one function in the
the code that gets run many times
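To see the bytecode that CPython's compiler produces for a function, the standard-library `dis` module can disassemble it. The exact opcode names vary by version: releases up to 3.10 emit `BINARY_ADD` for `+`, while 3.11 and later emit a generic `BINARY_OP`, so treat the printed listing as version-dependent. The function `add` is only an example.

```python
import dis


def add(a, b):
    return a + b


# Human-readable disassembly of the bytecode compiled for add().
dis.dis(add)

# The same instructions as data: one opcode name per instruction.
opnames = [instr.opname for instr in dis.get_instructions(add)]
print(opnames)
```

On 3.11 and later, `dis.dis` also accepts `adaptive=True`; calling `add` many times first and then disassembling with that flag can reveal specialized instruction forms, though which forms appear depends on the version and on the arguments seen.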
now I know that that sort of people who
who know this field are expecting me to
at some point say we built a
just-in-time compiler actually we didn't we just
made the interpreter uh a little more
efficient what's a just-in-time compiler
that is a thing from the Java World
although it's now applied to almost all
uh programming languages especially
interpreted ones so you said the compiler
inside Python is not like a just-in-time
compiler but it is a compiler that creates
bytecode that is then
fed to the interpreter and the compiler
was there something interesting to say
about the compiler it's interesting that
you haven't changed that tweak that at
all or much we changed some parts of the
byte code
but not very much and so we only had to
change the parts of the compiler where
we decided that the the breakdown of a
Python program in bytecode instructions
had to be slightly different
but
that didn't gain us the
performance improvements the
performance improvements were like
making The Interpreter faster in part by
sort of
removing the fat from some internal data
structures used by The Interpreter but
uh the the key idea is an Adaptive
specializing interpreter
let's go what is adaptive about it what
is specialized about it well let me
first talk about the specializing part
because the Adaptive part is the sort of
the second order effect but they're both
important so bytecode
is a bunch of machine instructions but
it's an imaginary machine but the
machine can do things like call a
function
add two numbers
print value
those are sort of typical instructions
in Python
uh and if we take the example of adding
two numbers
actually in Python the language there's
no such thing as adding two numbers
there's just an add operation the compiler
doesn't know that you're adding two
numbers you might as well be adding two
strings or two lists
uh or two instances of some user-defined
class that happen to implement this
operator called add
that's a very interesting and and fairly
powerful mathematical concept it's
mostly a user interface trick because it
means that
a certain category of functions
can be written using a single
symbol the plus sign
and sort of a bunch of other functions
can be written using another single
symbol the multiply sign
uh so if we take addition the way
traditionally in Python the add bytecode
was executed is
pointers pointers and more pointers so
first we we have two objects an object
is basically a pointer to a bunch of
memory that contains more pointers
pointers all the way down
not quite but there there are a lot of
them so to simplify a bit uh we look up
in one of the objects
what is the type of that object and does
that object type Define an add operation
and so you can imagine that there is a
sort of a type integer that knows how to
add itself to another integer and there
is a type floating Point number that
knows how to add itself
to another floating Point number
and the integers and floating Point
numbers are sort of important I think
mostly historically because in the first
computers
uh the same bit
pattern when interpreted as a floating
point number had a very different value
than when interpreted as an integer can I
ask a dumb question here please do given
the basics of ant and float and add who
carries the knowledge of how to add two
integers is it the integer it's the type
integer versus it's the type integer and
the type float what about the operator
is the operator
just exists as a platonic form possessed
by uh the operator is more like
it's an index in a list of functions
that the integer type defines
and so the integer type
is really a collection of functions and
there is an add function and there's a
multiply function and there are like 30
other functions for other operations
there's a power function for example
and
you can imagine that
in in memory there is a distinct slot
for the add operations let's say the add
operation is the first operation of a
type and the multiply is the second
operation of a type
so now we take the integer type and we
take the floating Point type
in both cases the add operation is the
first slot and multiply is the second
slot but
each slot contains a function and the
functions are different because the
add two integers function interprets the
bit patterns as integers the add two
floats
function interprets the same bit
pattern as
as a floating Point number and then
there is the string
data type which again interprets the
bit pattern as a
the address of a sequence of characters
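The point that one bit pattern means different things depending on whether it is read as an integer or as a floating point number can be checked with the standard `struct` module. This only illustrates the hardware-level idea; CPython's own `int` objects are arbitrary-precision and its `float` objects wrap a C double, so neither is stored as a raw 32-bit word like the ones packed here.

```python
import struct

# Pack the float 1.0 as a 4-byte IEEE 754 single, then reread the same bytes as a signed int.
bits = struct.pack("<f", 1.0)
as_int = struct.unpack("<i", bits)[0]
print(as_int, hex(as_int))  # 1065353216 0x3f800000

# The reverse: reread the 4 bytes of the integer 1 as a float.
as_float = struct.unpack("<f", struct.pack("<i", 1))[0]
print(as_float)  # a tiny subnormal value, roughly 1.4e-45
```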
there are lots of lies in that story but
that's
that's sort of a basic idea I can tell I
could tell the fake news and
the fabrication going on here at the
table but uh where's the optimization is
it on the operator is it a different
so inside the interpreter the optimization is the
observation that
in a particular line of code
so now you you write your little Python
program and you write a function and
that function sort of takes a bunch of
inputs and at some point it adds two of
the inputs together
now I bet you even if you call your
function a thousand times
that all those calls are likely all
going to be about integers because maybe
your program is all about integers or
maybe
on that particular line of code where
there's that plus operator
every time the program hits that line
the variables a and b that are being
added together happen to be strings
and so what we do is instead of having
this single byte code that says here's
an add operation and the implementation
of add is fully generic it looks at the
object from the object it looks at the
type then it takes the type and it looks
up the function pointer then it
calls the function now the function has
to look at the other argument
and it has to double check that the
other argument has the right type
and then there's a bunch of error
checking before it can actually
just go ahead and add the two bit
patterns in the right way
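The fully generic path described above (look at the left operand's type, find its add function, call it, let that function check the other argument, then fall back or raise) can be modeled at the Python level. This is a simplified sketch: in CPython the lookup goes through C-level type slots (for `+`, the `nb_add` slot), and the real protocol has extra rules, such as trying a subclass's reflected method first, that are omitted here. `generic_add` and `Money` are hypothetical names.

```python
def generic_add(a, b):
    """Simplified model of a fully generic `a + b`."""
    # Look up the add function on the left operand's type.
    add = getattr(type(a), "__add__", None)
    if add is not None:
        result = add(a, b)
        # The add function checks the other argument; NotImplemented means "not my types".
        if result is not NotImplemented:
            return result
    # Fall back to the right operand's reflected add.
    radd = getattr(type(b), "__radd__", None)
    if radd is not None:
        result = radd(b, a)
        if result is not NotImplemented:
            return result
    raise TypeError(
        f"unsupported operand type(s) for +: "
        f"{type(a).__name__!r} and {type(b).__name__!r}"
    )


class Money:
    """A user-defined class that implements the add operator."""

    def __init__(self, cents: int) -> None:
        self.cents = cents

    def __add__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        return Money(self.cents + other.cents)


print(generic_add(2, 3))                      # 5
print(generic_add(2, 0.5))                    # 2.5, via float.__radd__
print(generic_add(Money(150), Money(250)).cents)  # 400
```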
what we do is
every time we execute an add instruction
like that
we keep a little note of
in the end after
we hit the code that did the
addition
for a particular type what type was it
and then after a few times through that
code if it's this if it's the same type
all the time
uh we say oh so this add operation even
though it's the generic add operation it
might as well be the add integer
operation and add integer operation is
uh much more efficient because it just
says
assume that A and B are integers do the
addition operation do it right there
inline and produce the result
and the big lie here is that in Python
even if you have great evidence that in
the past it was always two integers that
you were adding at some point in the
future that same line of code could
still be hit with two floating points or
two strings or maybe a string and an
integer it's not a great lie that's just
the fact of life
I didn't account for what should
happen in that case in the way I told
the story there is some accounting and
and so what we actually have to do is
when we have the add integer operation
we still have to check
are the two arguments in fact integers
we applied some tricks to make those
checks efficient
and we know statistically that the
outcome is almost always yes they were
they are both integers
uh and so we quickly make that check and
then we proceed with the the sort of add
integer operation and then there is a
fallback mechanism where we say
oops one of them wasn't an integer
now we're going to pretend that there
was just the fully generic add operation
we wasted a few cycles believing it was
going to be two integers and
then we had to back up but we didn't
waste that much time and statistically
uh most of the time
basically we were sort of
hoping that most of the time we guess
right because if we if it turns out that
we guessed wrong too often
uh or we didn't have a good guess at all
uh things might actually end up running
a little slower
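The adaptive specializing idea (note the operand types, assume tomorrow's weather matches today's, switch to a guarded fast path, and fall back when the guess is wrong) can be shown with a toy model of a single `+` site. Everything here is an assumption for illustration: the warm-up threshold of 8, the "drop the specialization on the first miss" policy, and the names `AddSite` and `WARMUP`. CPython's real mechanism lives in C inside the interpreter loop (3.11 introduced specialized forms such as `BINARY_OP_ADD_INT`), and its counters and back-off rules differ from this sketch.

```python
WARMUP = 8  # hypothetical: consecutive int+int executions before specializing


class AddSite:
    """Toy model of one `a + b` instruction that adapts to the types it sees."""

    def __init__(self) -> None:
        self.streak = 0          # consecutive int+int executions on the generic path
        self.specialized = False
        self.fast_hits = 0
        self.misses = 0

    def execute(self, a, b):
        if self.specialized:
            # Cheap guard: are both operands exactly int (not a subclass such as bool)?
            if type(a) is int and type(b) is int:
                self.fast_hits += 1
                return int.__add__(a, b)  # stands in for inline integer addition
            # Wrong guess: record the miss, drop the specialization, use the generic path.
            self.misses += 1
            self.specialized = False
            self.streak = 0
            return a + b
        # Generic path, while noting the operand types for next time.
        if type(a) is int and type(b) is int:
            self.streak += 1
            if self.streak >= WARMUP:
                self.specialized = True
        else:
            self.streak = 0
        return a + b


site = AddSite()
for i in range(10):
    site.execute(i, 1)
print(site.specialized, site.fast_hits)  # True 2
print(site.execute(1.5, 2))              # 3.5, via the fallback
print(site.specialized, site.misses)     # False 1
```

A program that keeps alternating operand types at the same site would pay for the guard and the fallback over and over, which is the kind of contrived slowdown mentioned in the conversation.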
so someone armed with this
knowledge
and a copy of the implementation someone
could easily construct a counter example
where they say oh I have a program and
then now it runs five times as slow in
Python 3.11 than it did in Python 3.10
but that's a very unrealistic program
that's that's just like an extreme fluke
it's a fun reverse engineering task
though oh yeah so there's uh
well people like fun yes so there's some
presumably heuristic
of what defines a momentum
of uh saying you know you seem to be
working adding two integers not two
generic types uh so how do you figure
out that heuristic I think that the
heuristic is actually we assume that the
weather tomorrow is going to be the same
as the weather today so you don't need
two days of the weather no
that is already so much better than than
guessing randomly that so how do you
find this idea
hey I wonder if instead of adding two
generic types we uh we start assuming
that the weather tomorrow is the same as
the weather today
where do you find the idea for that
because that ultimately
for you to do that you have to kind of
understand how people are using the
language right python is not the first
language to do a thing like this this is
a fairly well-known trick especially
from
other interpreted languages that had
reason to be sped up we occasionally
look at papers about HHVM which is
Facebook's uh
efficient compiler for PHP there are
tricks known from the JVM and
sometimes it just comes from Academia
and so the trick here is that the type
itself doesn't the variable doesn't know
what type it is
so this is not a statically typed
language where you can
this is a trick that is especially
important for uh for interpreted
languages with Dynamic typing because
if
if the compiler could read in the source
these X and Y that we're adding are
integers the compiler can just insert
the single add machine code that
Hardware machine instruction that exists
on every CPU and ditto for floats
uh but because in Python you don't
generally declare the types of your
variables you you don't even declare the
existence of your variables they just
spring into existence when you first
assign them which
is really cool and and sort of helps
those beginners because there's less
bookkeeping they have to learn how to do
before they can start playing around
with code but it makes the
interpretation of the code less
efficient and so we're we're sort of
trying to
to make the interpretation more
efficient without losing the the super
Dynamic nature of the language that's
always the challenge 3.5 got PEP 484
type hints
what is Type hinting and is it used by
The Interpreter the hints or is it just
syntactic sugar so the type hints is an
optional mechanism that people can use
and it's especially popular with sort of
larger companies that have very large
code bases written in Python do you
think of it as almost like documentation
saying these two variables are this type
more than documentation I mean so it
it is a sub language of python where
where you can express the types of
variables so here's a variable and it's
an integer and here's an argument to
this function and it's a string and here
is a function that returns a list of
strings but that's not checked when you
run the code but
exactly there there is a separate piece
of software called a static type Checker
that reads all your source code without
executing it and thinks long and hard
about
what from just reading the code
that code might be doing
and double checks if that makes sense if
you take the types as annotated into
account so this is something you're
supposed to run as you develop it's like
a linter yeah that's definitely a
development tool but the type
annotations currently are not used for
uh speeding up The Interpreter and there
are a number of reasons uh many people
don't use them
even when they do use them uh they
sometimes contain lies where the static
type Checker says everything's fine
I cannot prove that this integer is ever
not an integer but at runtime somehow
someone manages to violate that
assumption
and The Interpreter
ends up doing just fine if we started
enforcing type annotations in Python
many python programs would no longer
work
and some python programs wouldn't even
be possible because they're too dynamic
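A small example of annotations that are recorded but not enforced: the call with a string runs fine under CPython, while a static type checker such as mypy would report it as an argument type error. `double` is an illustrative name.

```python
def double(n: int) -> int:
    """Annotated as int -> int, but CPython does not check this at runtime."""
    return n * 2


print(double(21))    # 42
# A static checker would flag this call; the interpreter just runs it.
print(double("ab"))  # abab
# The annotations survive as metadata on the function object.
print(double.__annotations__)
```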
and so we made a choice of not
using the annotations there there is a
possible future where eventually
three four five releases in the future
we could start using those annotations
to sort of
provide hints because we can we can
still say well the source code leads us
to believe that these X and Y are both
integers and so we can generate an add
an add integer instruction
but we can still have a fallback that
says oh if somehow the code
at runtime provided something else maybe
it provided two decimal numbers
we can still use that generic add
operation as a fallback but we're not
there is there currently a mechanism or
do you see something like that where you
can almost add like an assert
inside a function that says please check
that my type hints are actually mapping
to reality sort of like insert manual
static typing there are third-party
libraries that uh are in that business
so it's possible to do that kind of
thing it's possible to for a third party
library to take a hint
and enforce it it seems like a tricky
thing what what well what we actually do
is and this I think this is a fairly
unique feature in Python the type hints
can be introspected at runtime so while
the program is running
I mean Python is a very
introspectable language you can look at
a variable and ask yourself
what is the type of this variable
and if that variable happens
to refer to a function you can ask what
are the arguments to the function
and nowadays you can also ask what are
the type annotations for the function so
the type annotations are there inside
the variable as it were at runtime they're
mostly associated with the function
object not with each individual variable
but uh right you can sort of map from
from the arguments to the variables and
that's what a third-party Library can
have exactly and the problem with that
is that all that extra runtime type
checking
uh is going to slow your code down
instead of speed it up I think uh to
reference this
uh sales pitchy blog post that says 75
percent of developers' time is spent on debugging
I would say that in some cases that
might be okay it might be okay to pay
the cost of performance
for the catching of the types the type
errors
and in most cases doing it
statically before you ship your code to
production is more efficient than doing
it at runtime piecemeal yeah
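Hints live on the function object, so a third-party library can read them at runtime and enforce them. The decorator below is a small, hypothetical sketch of that idea. `enforce_hints` is an illustrative name and not a real library API. Projects such as typeguard and beartype do this far more thoroughly. The sketch checks only plain classes like `int`, and it runs on every call. That per-call check is the runtime cost discussed above.

```python
import functools
import inspect
import typing


def enforce_hints(func):
    """Hypothetical decorator: check simple class hints on every call."""
    hints = typing.get_type_hints(func)  # annotations read at runtime
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = hints.get(name)
            # Only plain classes such as int or str are checked in this sketch;
            # parameterized generics like list[int] are skipped.
            is_plain_class = isinstance(expected, type) and typing.get_origin(expected) is None
            if is_plain_class and not isinstance(value, expected):
                raise TypeError(
                    f"argument {name!r} is {type(value).__name__}, "
                    f"expected {expected.__name__}"
                )
        return func(*args, **kwargs)

    return wrapper


@enforce_hints
def add(x: int, y: int) -> int:
    return x + y


print(inspect.signature(add))  # hints are visible at runtime: (x: int, y: int) -> int
print(add(2, 3))
try:
    add(2, "3")  # rejected at call time, at the cost of extra work per call
except TypeError as exc:
    print(exc)
```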
can you tell me about
m-y-p-y the mypy project
what is it what's the mission and in
general what is the future of static
typing in Python
well so mypy uh was started by a
Finnish uh developer Jukka Lehtosalo
so many cool things out of Finland I
gotta say just that part of the world I
guess people have nothing better to do
in those long cold Winters yeah I don't
know I think Jukka lived in England when
he invented uh that stuff actually but
mypy is the original static type
checker for Python and the the type
annotations that were introduced with
PEP 484
were sort of developed
together with the the static type
checker and in fact Jukka had first
invented a different syntax that wasn't
quite compatible with python
and uh Jukka and I sort of met at the
python conference in I think in 2013
and we we sort of came up with a
compromise syntax
that would not require any changes to
python
and that would let mypy sort of be an
add-on static type checker for python
just out of curiosity was it like double
colon or something what was he proposing
that would break python I think he was
using angular brackets for uh types like
in C plus plus or uh Java generics yeah
you can't use angular brackets in Python
it would be too tricky
for attempt well we the the key thing is
that we already had uh you know a Syntax
for annotations we just didn't know what
to use them for yet
so type annotations were just the sort
of most logical thing to to use that
existing dummy Syntax for
so but there was no there was no Syntax
for uh defining generics
directly syntactically in the language
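The sketch below illustrates that compromise. It assumes Python 3.9+ for `list[int]`, and `Box` and `first` are hypothetical names. PEP 484 reused the function annotation syntax that already existed. It expressed generics with ordinary square-bracket subscripting, so the parser needed no angle brackets. The subscripted types can even be inspected at runtime.

```python
from typing import Generic, TypeVar, get_args, get_origin

T = TypeVar("T")


class Box(Generic[T]):
    """A hypothetical generic container, parameterized as Box[T], not Box<T>."""

    def __init__(self, item: T) -> None:
        self.item = item

    def get(self) -> T:
        return self.item


def first(items: list[int]) -> int:
    # list[int] is an ordinary subscript expression the parser already handles.
    return items[0]


box: Box[str] = Box("fish")
alias = Box[int]  # subscripting works at runtime and yields an alias object
print(box.get(), first([3, 1, 2]))
print(get_origin(alias) is Box, get_args(alias))
```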
mypy literally meant my version of
python where my refers to Jukka
he had a parser that translated mypy
into python by like doing the type
checks
and then removing the annotations and
all the angular brackets from the
positions where where he was using them
but a preprocessor model doesn't work
very well with the typical workflow of
uh python development projects
that's funny I mean that could have been
another major split if it became
successful like uh if you watch
TypeScript versus JavaScript
is it like a split in the community over
types right that seems to be stabilizing
no it's not necessarily a split there
are certainly plenty of people who don't
use TypeScript but
just use the original JavaScript
notation just like there are many people
in the python world who don't use type
annotations and don't use static type
checkers no I know but there is a bit
of a split between TypeScript and
JavaScript old school JavaScript ES
whatever well in the JavaScript world
transpilers are sort of the standard way
of working anyway which is why
TypeScript being a transpiler itself is
not a big deal
and transpilers for people who don't
know is exactly the thing you
said with mypy it's the code I guess
you call it pre-processing code that
translates from one language to the
other and that's part of the culture
part of the workflow of the JavaScript
community so that's right at the same
time
an interesting development in the
JavaScript slash typescript world at the
moment is that
there is a proposal under consideration
it's only a stage one proposal
that proposes to add a feature to
JavaScript where just like python it
will ignore certain syntax
when running the JavaScript code
and what it ignores is more or less a
superset of the TypeScript annotation
syntax
interesting so that would mean that
eventually if you wanted to you could
take TypeScript
uh and you could shove it directly into
a JavaScript interpreter without
transpilation
the interesting thing in the JavaScript
world at least the web browser world
the web browsers have changed how they
deploy and uh they they sort of update
their JavaScript engines
much more quickly than they used to in
the the early days and so there's much
less of a need for
translation in JavaScript itself because
most browsers just support the most
recent version of ECMAScript just on a
tangent of a tangent if you
were to recommend somebody use a thing
would you recommend TypeScript or
JavaScript
I would recommend TypeScript just
because of the strictness of the typing
it's an enormously helpful extra tool
that helps you sort of
keep your head straight about
what your code is actually doing
I mean it's it's it it helps with
editing your code it helps with ensuring
that your code
is not too incorrect
and it's actually
quite compatible with JavaScript never
mind this syntactic sort of hack that is
still years in the future
but
any library that is written in pure
JavaScript can still be used from
TypeScript programs and also the other
way around you can write a library in
TypeScript and then export it in a form
that is totally consumable by JavaScript
that sort of compatibility is is sort of
the key to this to the success of
TypeScript
yeah just to look at it as almost like a
biological system that's evolving it's
fascinating to see JavaScript evolve the
way it does well maybe we should
consider that biological systems are
just the Engineering Systems too right
yes but very advanced
with more history
but it's almost like the most visceral
in the JavaScript world because there's
just so much code written in JavaScript
that for its history was messy if you
talk about bugs per line of code I just
feel like JavaScript
eats the cake or whatever the
terminology is it beats python by a lot
in terms of number of bugs meaning like
way more bugs in JavaScript and then and
then the obviously the browsers the
develop I mean just there's so much
active development it feels a lot more
like Evolution where a bunch of stuff is
born and dies and there's
experimentation and debates versus
python is more
um all that stuff is happening but
there's just a longer history of stable
working giant software systems written
in python versus JavaScript is just a
giant beautiful I would say mess of code
it's very different culture and
to some extent differences in culture
are random but to some extent they the
differences have to do with the
environment yeah uh and the fact that
JavaScript is primarily
the language for uh developing web
applications especially the client side
and the fact that it's basically the
only language for developing web
applications
makes that Community sort of just have a
different nature than the community of
other languages
plus the graphical component
um and the fact that they're deploying
it on all kinds of uh shapes of screens
and devices and all that kind of stuff
it just creates a beautiful chaos anyway
back to mypy
so what okay you you met you talked
about a syntax that could work
where does it currently stand
what's the future of static typing in
Python
it is still controversial but it is much
more accepted than when mypy and PEP
484 were were young
what's the connection between uh PEP 484
type hints and mypy mypy
was the original static type checker so
mypy quickly evolved from Jukka's
own variant of python to a static type
checker for Python and uh sort of PEP
484 that that was it like
a very productive year where like many
hundreds of messages were exchanged
debating the merits
of every aspect of of that PEP
and so mypy is a static type checker
for python it is itself written in
Python
most
additional static typing features that
we introduced in the time since 3.6
were also prototyped through mypy
mypy being an open source project with
a very small number of maintainers
it was successful enough that people
said the static type checking stuff
for python is actually worth an
investment for our company
nice but
somehow they chose not to support
making mypy faster say or adding new
features to mypy but both Google and
Facebook and later Microsoft developed
their own static type checker I think
Facebook was one of the first
they decided that they wanted to use the
same technology that they had
successfully used for HHVM
because they they sort of they had a
bunch of compiler writers and and sort
of static type checking experts who had
written the HHVM compiler and it was a
big success within the company
and they had done it in a certain way
sort of
they wrote a big highly parallel
application in an obscure language named
OCaml which is apparently mostly very
good for writing static type checkers
interesting yeah I have a lot of
questions about how to write a static
type checker then that's very confusing
Facebook wrote their version and they've
worked on it in secret for about a year
and then they came clean and went open
source
uh Google in the meantime was developing
something called pytype which was
mostly
interesting because it as you may have
heard they have one gigantic monorepo
so all the code is checked into a single
repository Facebook has a different
approach so Facebook developed Pyre
which which was written in OCaml which
worked well with Facebook's development
workflow
and Google developed something they
called pytype which was actually itself
written in Python
uh and it was meant to sort of fit well
in
their static type checking needs in
Google's gigantic monorepo so Google
wasn't one giant got it so the just to
clarify this static type checker
philosophically is a thing that's
supposed to exist outside of the
language itself and it's just a workflow
like a debugger for the book it's a
linter for people who don't know a
linter maybe you can correct me but it
it's the thing that runs through the
code continuously
pre-processing to find issues based on
style documentation I mean there's all
kinds of linters right you can check
that what usual things does a linter do
maybe check that you don't have too many
characters in a single line linters
often do static analysis where they try
to point out things that are likely
mistakes but not incorrect according to
the language specification like maybe
you have a variable that you never use
for the compiler that is valid you might
sort of you might be planning to use it
in future version of the of the code and
the compiler might just optimize it out
but the compiler is not going to tell
you hey you're never using this variable
a linter will tell you that variable is
not used maybe there's a typo somewhere
else where you meant to use it but you
accidentally use something else or there
are a number of sort of common scenarios
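The sketch below gives a rough illustration of one such heuristic, the unused-variable check. It uses the standard-library `ast` module to flag local variables that are assigned but never read. `unused_locals` is a hypothetical helper written for this example and is far cruder than real linters like pyflakes or pylint. In the sample code it catches a misspelled variable name.

```python
import ast


def unused_locals(source: str) -> list[tuple[str, str, int]]:
    """Return (function, variable, line) for names assigned but never read.

    A deliberately tiny heuristic: it ignores globals, nonlocals, closures,
    `del`, and many other cases a real linter has to handle.
    """
    findings = []
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        stored: dict[str, int] = {}
        loaded: set[str] = set()
        for node in ast.walk(func):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Store):
                    stored.setdefault(node.id, node.lineno)  # first assignment
                elif isinstance(node.ctx, ast.Load):
                    loaded.add(node.id)
        for name, line in stored.items():
            if name not in loaded:
                findings.append((func.name, name, line))
    return findings


code = """
def average(fishermen, programmers):
    total = fishermen + programmers
    totl = total / 2  # typo: meant to update total
    return total
"""
for func_name, var, line in unused_locals(code):
    print(f"line {line}: {var!r} is assigned in {func_name}() but never used")
```

The code is valid Python, so the compiler accepts it. Only the heuristic notices that `totl` is never read.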
and a linter is often
a big collection of little heuristics
where by looking at the combination of
how your code is laid out maybe how it's
indented maybe the comment structure
uh but also just
things like definition of names use of
names
it'll tell you likely things that are
wrong and in some cases linters are are
really style checkers
uh for python there are a number of
linters that check things like
do you use the the PEP 8
recommended naming scheme for your
functions and classes and variables
because like classes start with an
uppercase and the rest starts with a
lowercase and
there's like differences there and so
the linter can tell you hey you have a
class that uh whose first letter is not
an uppercase letter and that's just I
just find it annoying if I wanted that
to be an uppercase letter I I would have
typed an uppercase letter but other
people find it very comforting that if
the linter is no longer complaining
about their code that they have followed
all the style rules maybe it's a fast
way for a new developer joining a team
to learn the style rules right yeah
there's definitely that but the best use
of a linter is probably
not so much to to sort of
enforce team uniformity but to actually
help Developers
catch bugs that the compilers for
whatever reason don't catch and there's
lots of that in Python and so uh but a
static type checker
focuses on a particular aspect of the
linting which
I mean it probably doesn't care
how you name your classes and variables
uh
but it is meticulous about when you say
that there was an integer here and
you're passing a string there it will
tell you hey that string is not an
integer so something's wrong either
either you were incorrect when you said
it was an integer or you're incorrect
when you're passing in a string if this
is a race of static type checkers is
somebody winning as you said it's
interesting that the companies didn't
choose to invest in this uh centralized
development
of mypy is is there a future for
mypy
what do you see as the oh well one of
the companies won out and everybody
uses like uh pytype whatever Google's
is called well Microsoft is hoping that
uh Microsoft's horse in that race called
Pyright is going to win Pyright right
like r i g h t correct yeah my my all my
word processors tend to typo correct
that as pyrite the name of the I don't
know what it is
some kind of semi-precious metal all
right
I love it okay so okay that's the
Microsoft hope but it okay so let me ask
the question a different way is there
going to be ever a future where the
static type checker gets integrated into
the language
nobody is currently excited about
doing any work towards that that doesn't
mean that five or ten years from now
the situation isn't
different
uh
at the moment
all the static type checkers
uh still evolve at a much higher speed
than Python and its annotation syntax
evolve you get a new release of python
once a year those are the only times
that you can introduce new annotation
syntax and there's there are always
people who invent new new annotation
syntax that they're trying to push
uh and worse
once we've all agreed that we are going
to put some new syntax in we can never
take it back
at least a sort of deprecating an
existing feature takes many releases
because you have to assume that people
started using it as soon as we announced
it
and then you can't take it away from
them right away you have to start
telling them well this will go away but
we're not gonna commit tell you that
it's an error yet and then later it's
going to be a warning and then
eventually three releases in the future
maybe we remove it
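At the library level, Python's standard `warnings` module is the usual tool for this kind of staged removal. The old behavior keeps working and emits a `DeprecationWarning` for a few releases before it is deleted. This is only an analogy, since deprecating annotation syntax happens inside the language itself. `old_api` and `new_api` are hypothetical names.

```python
import warnings


def new_api(x: int) -> int:
    return x * 2


def old_api(x: int) -> int:
    """Hypothetical deprecated function kept working for a few releases."""
    warnings.warn(
        "old_api() is deprecated and will be removed in a future release; "
        "use new_api() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return new_api(x)


# DeprecationWarning is hidden by default unless triggered from __main__
# or enabled by a test runner, so record it explicitly to show it fires.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = old_api(21)

print(value, caught[0].category.__name__)  # prints: 42 DeprecationWarning
```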
on the other hand the typical static
type checker
still has a release like
every month every two months certainly
many times a year
uh some type Checkers also include a
bunch of
experimental ideas that aren't official
standard python syntax yet yeah uh the
static type Checkers also just get
better at discovering
things that that sort of are unspecified
by the language but that sort of could
make sense and so each static type
Checker actually has its sort of strong
and weak points so it's cool it's like a
laboratory of experiments yep Microsoft
Google and all and you get to see and
you see that everywhere right because
there's not one single uh JavaScript
engine either there is one in Chrome
there is one in Safari there's one in
Firefox
but that said you said there's not
interest I think there is a lot of
interest in type hinting right
um in PEP 484 actually like how
many people use that do you have a sense
how many people use because it's
optional it's the sugar I can't put a
number on it but
from the number of packages that do
interesting things with it at runtime
and the fact that there are like
now three or four very mature type
checkers
that each have their their segment of
the market and oh and then there is a
PyCharm which has a sort of more
heuristic based type checker that also
supports the same syntax
my assumption is that
many many people developing python
software professionally
for some kind of
production situation are using a static
type checker especially any anybody who
has a continuous integration cycle
probably has a
one of the steps in in there their
testing routine that that happens for
basically every every commit
uh is run a static type checker and in
most in most cases that will be mypy
so I think it's pretty popular topic
according to this webpage
20 to 30 percent of Python 3 code bases
are using type hints
wow I wonder how they measured that did
they just scan all of GitHub
yeah that's what it looks like yeah they
did a quick sentence all of but like a
random sampling
so you mentioned PyCharm let me ask
you the uh the big subjective question
what's the best IDE for Python and
you're extremely biased now that you're
with Microsoft
um is it PyCharm VS Code Vim or Emacs
historically I actually uh started out
with using Vim but when it was still
called vi
uh for a very long time I think from the
early 80s to uh
I'd say two years ago
I was an Emacs user nice between I'd say
2013 and 2018
I dabbled with PyCharm
uh mostly because it had
had a couple of features I mean
PyCharm
is like driving an 18-wheeler truck
whereas Emacs is more
driving uh your comfortable Toyota car
that's that's that that you've had for a
hundred thousand miles and you know what
every little rattle of the car means
I was very comfortable in emacs uh but
there were certain things it couldn't do
it wasn't very good at at sort of at
least the way I had configured it
I didn't have very good tooling in Emacs
for finding a definition of a function
got it when I was at Dropbox
exploring a 5 million line python code
base
uh just grepping all that code for where
they're where is there a class FooBar
well it turns out that if you grep all five
million lines of code there are many
classes with the same name
and so PyCharm sort of once once you've
fired it up and once it's indexed your
repository uh was very helpful but as
soon as I had to edit code I would jump
back to Emacs and do all my editing there
because I could type much faster and
switch between files when I was when I
knew which file I wanted much much
quicker and I never really got used to
the the whole PyCharm user interface
yeah I feel torn in that same kind of
way because I've used PyCharm off and on
exactly in that same way
and I feel like I'm just being an old
grumpy man
for not learning how to quickly switch
between files and all that kind of stuff
I feel like that has to do with
shortcuts that has to do with um I mean
you just have to get accustomed just
like with touch typing yeah you have to
just want to to learn that I mean if you
don't need it much you don't need touch
typing either you can type with two
fingers just fine in the short term but
in the long term your life will become
better psychologically and productivity
wise if you learn how to type with 10
fingers if you do a lot of keyboard
input before everyone emails and stuff
right like you look at the the next 20
30 years of your life you have to
anticipate where technology is going
um do you want to invest in handwriting
notes probably not more and more people
are doing uh typing versus handwriting
notes so you can anticipate that so
there's no reason to actually practice
handwriting there's more reason to
practice typing
you can actually estimate back to the
spreadsheet the number of
paragraphs sentences or words you write
for the rest of your life
yes I mean all of that is not actual
like converting to a spreadsheet but
it's a gut feeling like I have the same
kind of gut feeling about books I've
almost exclusively switched to Kindle
now so ebook readers
even though I still love and probably
always will the smell the feel of a
physical book
and
you the reason I switched to Kindle is
like all right well this is really
Paving the future is going to be digital
in terms of consuming books and content
of that nature so you should get you
know you should let your brain get
accustomed to that experience
in that same way it feels like PyCharm
or VS Code I think PyCharm is is the the
most sort of sophisticated featureful uh
Python IDE it feels like I should
probably at some point very soon switch
entirely like I'm not allowed to use
anything else for python than this IDE or
VS Code it doesn't matter but walk away
from Emacs for this particular
application because I think I'm limiting
myself in the same way that using two
fingers for typing is limiting myself
it's um this is a therapy session this
isn't I'm not even asking questions
but I'm sure a lot of people are not
going to stop you uh
I I think that that sort of everybody
has to decide for themselves which one
they want to to invest more time in
I actually
ended up giving VS Code a very tentative
try when I started out at Microsoft and
really liking it
and it sort of it took me a while before
I realized why that was
but and and I think that actually the
founders of VS Code may not necessarily
agree with me on this
but to me VS Code is in a sense the
spiritual successor of Emacs
because
as you probably know as an old Emacs
hacker
the the key part of Emacs is that it
it's mostly written in in Lisp
and that that sort of new features of of
Emacs usually update all the Lisp
packages and add new Lisp packages and
oh yeah there's also
some very obscure thing improved in the
part that's not in Lisp
but that's usually not why I would
upgrade to a new version of Emacs
there's a core implementation
that that sort of
can read a file and it can put bits on
the screen and it can sort of manage
memory and buffers and then what makes
it an editor full of features is all the
Lisp packages and of course the design
of how the Lisp packages interact with
each other and with that that sort of
that base layer of of the the core
immutable engine although almost
everything in that core engine in Emacs's
case can still be overridden or replaced
and so
VS Code has a similar architecture where
there is like
a base engine that you have no control
over
I mean it's open source but nobody
except the people who work on that part
changes it much
uh and it has a sort of a package
manager
and a whole series of interfaces for
packages and an additional series of
conventions for how packages should
interact with the lower layers and with
each other
and Powerful primitive operations that
let you
move the cursor around or select pieces
of text or delete pieces of text or
interact with the keyboard and mouse and
what other peripherals you have
and and so the sort of the the extreme
extensibility and the package ecosystem
that you that you see in VS Code is a is
a mirror of very similar architectural
features in Emacs well I'll have to give
it a serious try because uh as far as
sort of the hype and the excitement in
the general programming community VS
Code seems to dominate the interesting
thing about
PyCharm and
uh what is it PhpStorm which are these
JetBrains uh specific IDEs that are
designed for one programming language
it's interesting to
when an IDE is specialized right they're
usually actually just specializations of
IntelliJ because underneath it's all the
same
editing engine with different
veneer on top
where in VS Code
many things you do
require
loading third-party extensions
in PyCharm it is possible to have
third-party extensions but it is it is a
struggle to create one yes and it's not
part of the culture all that kind of
stuff yeah we that I remember that it
might have been five years ago or so we
were trying to get some better mypy
integration into PyCharm because
mypy is sort of python tooling and
PyCharm
had had its own
type checking heuristic thing that we
wanted to replace with uh something
based on mypy because that was what we
were using in the company and it for the
for the guy who was writing that
PyCharm extension it was really a
struggle to to sort of find
documentation and get the development
workflow going and and debug his code
and all that so that that was was not a
pleasant experience
let me talk to you about parallelism
in your post titled Reasoning about
asyncio.Semaphore
you talk about a fast food restaurant in
Silicon Valley that has only one table
is this a real thing I just wanted to
ask you about that is that just like a
metaphor you're using or is that an
actual restaurant in Silicon Valley it
was it was a metaphor of course okay
I can imagine such a restaurant so for
people who don't then read the thing you
should you should but it was uh
idea of a restaurant where there's only
one table and you show up one at a time
and you're prepared and I actually
looked it up and there are restaurants
like this throughout the world and it
just seems like a fascinating idea you
stand in line you show up there's one
table
they um they ask you all kinds of
questions they cook just for you that's
fascinating it sounds like you'd find
places like that in Tokyo it sounds like
a very Japanese thing or in the Bay Area
there are proper places that probably
more or less work like that but I've
never eaten at such a place the
fascinating thing is you propose is the
fast food this is all for a burger it
was one of my rare sort of more literary
or poetic moments where I thought I'll
I'll just open with a crazy example to
catch your attention and the rest is
very dry stuff about uh locks and
semaphores and how uh semaphore is a
generalization of a lock well it was
very poetic and well delivered and it
actually made me wonder if it's real or
not because you don't make that explicit
and it feels like it could be true and
in fact I wouldn't be surprised if
somebody like listens to this and knows
exactly a restaurant like this in
Silicon Valley anyway can we step back
and can you just talk about parallelism
concurrency threading
asynchronous all these different terms
what is it sort of a high philosophical
level the the fisherman is back in the
boat well the idea is if the fisherman
has uh two fishing rods
uh since fishing is mostly a matter of
waiting for a fish to nibble well it
depends on how you do it actually but if
you had two if if you're doing the style
of fishing where you sort of you you
throw it out and then you let it sit for
a while until maybe you see a nibble one
fisherman can easily run two or three or
four
fishing rods and so as long as you can
afford the equipment you can catch four
times as many fish by
a small investment in four fishing rods
and so since your time you sort of say
you have all Saturday to go fishing if
you can catch four times as much fish
you have a much higher productivity and
that's actually I think how deep sea
fishing is done you could just have a
rod and you put in a hole so you could
have many rods uh is there an
interesting difference between
parallelism and concurrency
and asynchronous is there one subset of
the other to you like how do you think
about these terms in the computer World
there is a big difference when people
are talking about parallelism uh like a
parallel computer
that's usually really
several complete CPUs that are sort of
tied together and and
share something like memory or an I/O
bus
uh concurrency can be a much more
abstract concept
where
you have the illusion that things happen
simultaneously but what the computer
actually does is
it spends a little time running some
this program for a while and then it
spends some time running that program
for a while and then spending some time
for the third program for a while
the parallelism is the reality and
concurrency is part reality part
illusion yeah parallelism typically
implies that there are multiple copies of
the hardware
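The fisherman with several rods is concurrency in the sense just described: one worker interleaving several waits. The minimal asyncio sketch below uses made-up timings, and `fishing_rod` is a hypothetical coroutine. It shows four 0.2-second waits overlapping on a single thread, so the total comes to roughly 0.2 s rather than 0.8 s. True parallelism would instead need multiple CPUs, for example via `multiprocessing`.

```python
import asyncio
import threading
import time


async def fishing_rod(name: str, wait: float, events: list[str], threads: set[int]) -> str:
    threads.add(threading.get_ident())  # record which OS thread runs this rod
    events.append(f"{name} cast")
    # While this rod awaits, the event loop switches to the other rods.
    await asyncio.sleep(wait)  # stands in for waiting on a fish, or on I/O
    events.append(f"{name} caught")
    return name


async def main() -> tuple[list[str], list[str], set[int], float]:
    events: list[str] = []
    threads: set[int] = set()
    start = time.perf_counter()
    rods = await asyncio.gather(
        *(fishing_rod(f"rod{i}", 0.2, events, threads) for i in range(4))
    )
    return list(rods), events, threads, time.perf_counter() - start


rods, events, threads, elapsed = asyncio.run(main())
print(events)
print(f"{len(rods)} rods on {len(threads)} thread(s) took {elapsed:.2f} s instead of 0.8 s")
```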
you write that implementing
synchronization Primitives is hard in
that blog post and you talk about locks
and semaphores why is it hard to
implement synchronization Primitives
because at the conscious level our
brains are not trained to to sort of
keep track of multiple things at the
same time like obviously you can walk
and chew gum at the same time
because they're both activities that
require only a little bit of your
conscious
activity but try balancing your
checkbook
and watching a murder mystery on TV yeah
you'll mix up the digits or you'll miss
an essential clue on in the TV show
so why does it matter that the
programmer the human
is uh is bad because the programmer is
at least with the current state of the
art is responsible for
writing the code correctly and it's hard
enough
to keep track of
a recipe that you just
execute one step at a time
chop the carrots then peel the potatoes
mix the icing you need your whole brain
when you're when you're reading a piece
of code what what is going on okay we're
we're loading the number of mermaids in
variable a and the number of mermen in
variable B and now we take the average
or whatever
metaphor to Metaphor I like it you have
to keep in your head what is an a what
is in B what is in C uh hopefully you
have better names
and that is challenging enough
if you have two different
pieces of code that are are sort of
being executed
simultaneously whether it's using the
parallel or the concurrent
approach
if like
a is the number of fishermen and B is
the number of programmers
but in another part of the code a is the
number of mermaids and B is the number
of mermen
and somehow that's the same variable if
you do it sequentially if first you do
your mermaid merpeople computation and
then you do your people in the boat
computation it doesn't matter that the
variables are called A and B and that is
literally the same variable because you
you're done with one use of that
variable but when you mix them together
suddenly
the number of merpeople replaces the
number of fishermen and your computation
goes dramatically wrong and there's all
kinds of ordering
of operations that could result in the
assignment of those variables and so you
have to anticipate all possible
orderings and you think you're smart and
you'll put a lock around it and in
practice in terms of bugs per line of code per
a thousand lines of code
this is an area where everything is
worse so a lock is a mechanism by which
you ensure only one
chef can access the oven at a time
something like that and then semaphores
allow you to do what multiple ovens
that's not a bad idea because if
you're sort of
if you're preparing if you're baking
cakes and you have multiple people all
baking cakes but there's only one oven
yeah then maybe you can tell that the
oven is in use but maybe it's preheating
uh and so you have to maybe maybe you
make a sign that says oven in use
uh and you flip the sign over and it
says oven is free when you're done
baking your cake
uh that's a lock that's sort of and and
what do you do when you have two ovens
or maybe you have ten ovens do you you
can put a separate sign on each oven or
maybe you can sort of someone who comes
in wants to see at a glance
and maybe there's an electronic sign
that says uh there are still five ovens
available
uh or maybe they're already
three people waiting for an oven so you
can
if you see an oven that's not in use
it's already reserved for someone else
who got in line first
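In asyncio terms, the "N ovens available" sign corresponds roughly to `asyncio.Semaphore`. A semaphore created with a count of one acts like a lock. The sketch below is hypothetical, and the bakers, ovens and timings are made up. It counts how many bakers hold an oven at the same moment and returns the peak, which never exceeds the number of ovens.

```python
import asyncio


async def bake(ovens: asyncio.Semaphore, stats: dict[str, int]) -> None:
    async with ovens:  # wait in line until an oven is free, release when done
        stats["in_use"] += 1
        stats["peak"] = max(stats["peak"], stats["in_use"])
        await asyncio.sleep(0.05)  # baking time; other bakers may run meanwhile
        stats["in_use"] -= 1


async def main(num_ovens: int, num_bakers: int) -> int:
    ovens = asyncio.Semaphore(num_ovens)  # the "N ovens available" sign
    stats = {"in_use": 0, "peak": 0}
    await asyncio.gather(*(bake(ovens, stats) for _ in range(num_bakers)))
    return stats["peak"]


print(asyncio.run(main(num_ovens=1, num_bakers=5)))  # prints 1: one oven acts like a lock
print(asyncio.run(main(num_ovens=3, num_bakers=5)))  # prints 3
```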
and that's sort of what what what the
restaurant metaphor was trying to
explain yeah and so you're now tasked
you're sitting as a designer of python
with a team of brilliant core developers
and have to try to figure out to what
degree can any of these ideas be
integrated and not so maybe this is a
good time to ask what is asyncio
and how has it evolved since Python 3.4
wow yeah so we had this really old
library for for doing things
concurrently especially things that had
to do with I/O and uh networking I/O was
especially uh sort of a popular topic
and
in the python standard Library we had a
brief period where there was lots of
development and I think it was late 90s
maybe early 2000s and like
two little modules were added that were
the state of the Art of Doing
asynchronous IO or sort of non-blocking
AIO which means that you can keep
multiple network connections open and
sort of service them all in parallel
like a typical web server does so iOS
input and outputs you're writing either
to the network network connection or
reading and writing to a hard drive the
story also possible and you can do uh
the ideas you could do to multiple while
also doing computation
process of running some code that does
some fancy stuff yeah like when you're
writing a web server when a request
comes in a user sort of needs to see a
particular web page
uh you have to find that page maybe in
the database and format it properly and
send it back to the client and
there's a lot of waiting waiting for the
database waiting for the network and so
you can handle hundreds or thousands or
millions of requests
concurrently on one machine anyway ways
of doing that in Python had kind of
stagnated and uh
I forget it might have been around
2012 2014
uh when someone for the umpteenth time
actually said these asynchat and asyncore
modules that you have in the
standard library are not quite enough to
solve my particular problem
can we add one tiny little feature and
everybody said no
you're not supposed to use that
stuff write your own using uh
a third-party library and then everybody
started a debate about what the right
third-party library was
and somehow I felt that
that was actually a cue for well maybe
we need a better state-of-the-art
module in the standard library for
multiplexing input and output from different
sources you could say that it spiraled
out of control a little bit at
the time it was the largest Python
enhancement proposal that was ever
proposed
and you were deeply involved with that
at the time I was very much involved
with that I was like the lead architect
uh I ended up
talking to people who had already
developed serious third-party libraries
that did similar things and sort of
taking ideas from them and
getting their feedback on my design and
eventually we put it in the standard
library and after a few years I got
distracted I think the thing the big
thing that distracted me was actually
type annotations
but other people kept it alive and
kicking and it's been quite successful
actually yeah
in the world of Python web clients so
initially what are some of the design
challenges there in that debate for the
PEP and what are some things that got
rejected what are some things that got
accepted that stand out to you
there are a couple of different ways you
can handle parallel I/O and this happens
sort of at an architectural level in
operating systems as well like Windows
prefers to do it one way and Unix
prefers to do it the other way
you sort of
you have an object that represents a
network endpoint say a connection with a
web browser that's your client
and say you're you're waiting for an
incoming request two fundamental
approaches are
okay I'm waiting for an incoming request
I'm doing something else come wake me up
or of course sort of come tell me when
uh something interesting happened like a
packet came in on that network
connection
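The first approach, "come tell me when something interesting happened," is what Unix-style readiness APIs such as select and epoll provide, and Python's standard selectors module wraps them. As a rough sketch using only the standard library, a connected socket pair stands in for the browser connection, and the callback runs only once the OS reports data waiting. Reading Guido's Windows/Unix split as completion-based (Windows IOCP, used by asyncio's ProactorEventLoop) versus readiness-based (Unix) is our gloss, not something stated in the conversation.

```python
import selectors
import socket

sel = selectors.DefaultSelector()   # epoll/kqueue/select, whichever the OS offers best

# A connected socket pair stands in for "a connection with a web browser".
server_side, browser_side = socket.socketpair()
server_side.setblocking(False)

received = []


def on_readable(sock: socket.socket) -> None:
    # Only called once the OS reports data waiting, so recv() will not block.
    received.append(sock.recv(1024))


# "Come tell me when something interesting happens on this connection."
sel.register(server_side, selectors.EVENT_READ, data=on_readable)

browser_side.sendall(b"GET / HTTP/1.1\r\n")   # a packet arrives

# Wait up to one second for readiness events, then dispatch each one.
for key, _events in sel.select(timeout=1.0):
    key.data(key.fileobj)

print(received)

sel.unregister(server_side)
sel.close()
server_side.close()
browser_side.close()
```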
and the other paradigm is
we're on a team of a whole bunch of
people each with maybe a little mind and we
can only manage one web connection at
a time so
I'm just sitting
looking at this web connection and
I'm just blocked until something comes
in and then uh I'm already waiting for
it
uh I get the data I process the
data and then I go back to the top and
say now I'm waiting for the next
packet those are about the two paradigms
one is
a paradigm where there is sort of
notionally a thread of control whether
it's an actual operating system thread
or more an abstraction in asyncio we
call them tasks
but a task in asyncio or a thread in
other contexts is devoted to one thing
and it has logic for all the stages like
when it's a web request like
first wait for the first line of
the web request parse it because then
you know if it's a GET or a POST or a
PUT or whatever or an error uh then wait
until you have a bunch of lines until
there's a blank line then parse that as
headers and then interpret that and then
wait for the rest of the data to come in
if there is any more that your request
expects that sort of standard web stuff
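The task-per-connection paradigm can be sketched with asyncio's streams API: one coroutine per connection walks through the stages in order (request line, headers up to the blank line, then a body if a Content-Length header says there is one), and its local variables are its entire context. This is a deliberately minimal, hypothetical handler for illustration, not a real HTTP implementation; the /fish path and the reply format are made up.

```python
import asyncio


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # One task per connection; its local variables hold all the context it needs.
    request_line = await reader.readline()                  # 1. wait for the first line
    method, path, _version = request_line.decode().split()  # GET, POST, PUT, ...

    headers = {}
    while True:                                             # 2. header lines until a blank line
        line = await reader.readline()
        if line in (b"\r\n", b"\n", b""):
            break
        name, _, value = line.decode().partition(":")
        headers[name.strip().lower()] = value.strip()

    length = int(headers.get("content-length", "0"))        # 3. the rest of the data, if any
    body = await reader.readexactly(length) if length else b""

    reply = f"{method} {path} {len(body)}".encode()
    writer.write(b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n%b" % (len(reply), reply))
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def main() -> bytes:
    # Port 0 asks the OS for any free port.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(b"POST /fish HTTP/1.1\r\nContent-Length: 5\r\n\r\nsalmo")
        await writer.drain()
        response = await reader.read()   # read until the server closes the connection
        writer.close()
        await writer.wait_closed()
    return response


response = asyncio.run(main())
print(response.decode())
```

Each await is a point where this task can pause while the event loop serves other connections, which is what lets one process keep many such conversations going at once.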
and the other thing is and there's
always endless debate about which
approach is more efficient and which
approach is more error prone
where I just have a whole bunch of
stacks in front of me and uh whenever
a packet comes in I sort of look at the
number of the packet there's some
number on the packet and I say oh that
packet goes on this pile
and then I can do a little bit and then
sort of that pile provides my context
and as soon as I'm done with with the
processing I sort of
I can forget everything about what's
going on because the next packet will
come in from some random other client
and it's that pile or this pile
uh and every time a pile is maybe empty
or full or whatever the criteria is I
can toss it away or use it for a new
space but
several traditional third-party
libraries for asynchronous I/O
processing in Python chose the model of
a callback
and that's that's the idea where you
have a bunch of different stacks of
paper in front of you and every time
someone gives you a piece gives you a new
sheet you decide which stack it belongs
to
and that leads to a certain style of
spaghetti code that
I find sort of aesthetically
not pleasing and I I was sort of never
very successful and I had heard many
stories about people who were also
sort of complaining about that style of
coding
uh it was very prevalent in JavaScript
at the time at least because it was like
how the JavaScript event loop basically
works and so I thought well the
task-based model where each task has a
bunch of logic we had mechanisms in the
Python language that we could easily
reuse for that and I thought I want
to build a whole library for
asynchronous networking I/O
uh and all the other things that may
need to be done asynchronously
uh based on that paradigm and so I just
chose a paradigm and tried to see how
far I could get with that and it turns
out that it's a pretty good paradigm so
people enjoy that kind of paradigm of
programming for asynchronous I/O relative
to callbacks
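To make the contrast concrete, here is a small, self-contained sketch (no networking, hypothetical names throughout) of the same work done both ways. The callback version keeps an explicit dictionary of "piles," one buffer per connection, that a single dispatcher consults on every packet. The task version gives each connection its own generator, so the pile is just a local variable. Generators resumed with send() are the kind of existing language mechanism asyncio's first version built on (its coroutines were generators using yield from, PEP 380); the async/await syntax came later, in Python 3.5.

```python
# Callback style: one dispatcher; per-connection state lives in explicit "piles".
piles: dict[int, bytearray] = {}
callback_lines: list[tuple[int, bytes]] = []


def on_packet(conn_id: int, data: bytes) -> None:
    pile = piles.setdefault(conn_id, bytearray())  # find the right pile
    pile += data
    while b"\n" in pile:                           # a complete line is ready
        line, _, rest = bytes(pile).partition(b"\n")
        callback_lines.append((conn_id, line))
        piles[conn_id] = pile = bytearray(rest)


# Task style: one generator per connection; its local variable is the pile.
task_lines: list[tuple[int, bytes]] = []


def connection_task(conn_id: int):
    buffer = b""
    while True:
        buffer += yield                            # suspend until the next packet arrives
        while b"\n" in buffer:
            line, _, buffer = buffer.partition(b"\n")
            task_lines.append((conn_id, line))


tasks = {}


def on_packet_task(conn_id: int, data: bytes) -> None:
    task = tasks.get(conn_id)
    if task is None:
        task = tasks[conn_id] = connection_task(conn_id)
        next(task)                                 # run the task up to its first yield
    task.send(data)                                # resume it with the new packet


# Packets from two clients arrive interleaved and split at arbitrary points.
packets = [(1, b"GET /a"), (2, b"GET /b\n"), (1, b"\nPUT /c\n")]
for conn_id, data in packets:
    on_packet(conn_id, data)
    on_packet_task(conn_id, data)

# Both print [(2, b'GET /b'), (1, b'GET /a'), (1, b'PUT /c')]
print(callback_lines)
print(task_lines)
```

The results are identical; the difference is where the bookkeeping lives. In the task version the code for one connection reads top to bottom, which is the property Guido describes preferring over callback "spaghetti."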
okay beautiful so how does that all
interplay with
the infamous GIL the global
interpreter lock
maybe can you say what the GIL is and
how does it dance beautifully with
asyncio
the global interpreter lock
solves the problem that Python
originally was not written with either
asynchrony or parallelism in mind
at all there was no concurrency in the
language there was no parallelism there
were no threads
only a small number of years into
Python's initial development
all the new cool operating systems like
uh SunOS and Silicon Graphics IRIX and
then eventually POSIX and Windows all
came with threading libraries
that lets you do multiple things in
parallel and there is a certain
certain sort of principle which is the
operating system handles the threads for
you
and the program can pretend that there
are as many CPUs as as there are threads
in the program
uh and those CPUs work completely
independently
and if you don't have enough CPUs the
operating system sort of simulates those
extra CPUs on the other hand if you have
enough CPUs you can get a lot of work
done by deploying those multiple CPUs
but python wasn't written
to to do that
uh and so
as libraries for multi-threading
were added to C
but every operating system vendor was
adding their own version of that
we thought and maybe we were wrong but
at the time we thought well we quickly
want to be able to support these
multiple threads
because they seemed at the time in the
early 90s when they were new at least to
me
they seemed a cool interesting
programming Paradigm and one of the
things that that python at least at the
time
felt was nice about the language was
that we could give a
safe version of all kinds of cool new
operating system toys to the python
programmer like I remember
one or two years before threading I
had spent some time adding networking
sockets
uh to Python and they were a very literal
translation of the networking sockets
that were in the BSD operating system so
Unix BSD
but the nice thing was if you're using
sockets from python then all the things
you can do wrong with sockets in C would
automatically give you a clear error
message instead of just ending up with a
malfunctioning hanging program
and so we thought well we'll do the same
thing with threading
but we didn't really want to rewrite the
interpreter
to be thread-safe because that
was like
that would be a very complex refactoring
of all the interpreter code and all the
runtime code because all the objects
were written with the assumption that
there is only one thread and so we said
okay well we'll take our losses we'll
provide something that looks like
threads
and as long as you only have a single
CPU on your computer which most
computers at the time did
uh it feels just like threads because
the the whole idea of of multiple
threads in the OS was that even if your
computer only had one CPU you could
still fire up as many threads as you
wanted well within reason maybe 10 or 12
not 5000.
uh and so we thought we had conquered
the
the abstraction of threads pretty well
because multi-core uh CPUs were not
in most Python programmers' hands
anyway
and then of course a couple of more
iterations of Moore's Law and computers
getting faster and at some point
uh the chip designers decided that they
couldn't make the CPUs faster but they
could still make them smaller and so
they could put multiple CPUs on one chip
and suddenly there was all this pressure
about
do things in parallel and that's where
the solution we had in Python didn't
work
and that's sort of the moment
that the GIL became
infamous
because the GIL was the
solution we used to sort of
take this single interpreter and share
it between all the different operating
system threads that you could create and
so as long as the hardware
physically only had one CPU that was all
fine
and then as hardware vendors were
suddenly telling us all oh you've got to
parallelize everything everything's got to be
parallelized
people started saying oh uh but we can
use multiple threads in Python and uh
then they discovered oh but actually all
threads effectively run on a single core yeah
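A quick way to see the effect described here, as a rough sketch rather than a benchmark: run a pure-Python, CPU-bound loop four times serially and then on four threads. On a standard CPython build with the GIL, the threaded run typically takes about as long as the serial one (sometimes longer), because only one thread executes Python bytecode at a time; timings depend on the machine, so no numbers are claimed. The function name count_down and the loop size are arbitrary choices.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def count_down(n: int) -> int:
    # Pure-Python, CPU-bound work: the running thread holds the GIL
    # until the interpreter asks it to switch.
    while n > 0:
        n -= 1
    return n


N = 1_000_000

start = time.perf_counter()
serial = [count_down(N) for _ in range(4)]
serial_s = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(count_down, [N] * 4))
threaded_s = time.perf_counter() - start

print(f"serial: {serial_s:.2f}s  4 threads: {threaded_s:.2f}s")

# How often (in seconds) a thread running Python code is asked to release the GIL.
print(sys.getswitchinterval())
```

For CPU-bound work on a GIL build, the usual workaround is separate processes (for example concurrent.futures.ProcessPoolExecutor), which trades shared objects for the cost of communicating between independent programs.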
I mean is there a way are there ideas for
in the future to remove
the global interpreter lock the GIL like
maybe multiple subinterpreters some
tricky
interpreters on top of interpreters kind
of thing yeah there are a couple
of possible uh futures there the
most likely future is that we'll get
multiple subinterpreters
which each run a completely independent
Python program nice uh but there's
still some benefit of
sort of faster communication between
those programs but it's also managing
for you this running of multiple Python
programs yeah so it's hidden from you
right it's hidden from you but
you have to spend more time
communicating between those programs
because the sort of
the attractive
thing about the multi-threaded model is
that the threads can share objects at
the same time that's also the downfall
of the multi-threaded programming model
because when you do share objects
and you didn't necessarily
intend to share them or uh there were
aspects of those objects that
were not reusable you get all kinds
of concurrency bugs and so
the reason I wrote that little blog post
about semaphores was that concurrency
bugs are just harder it would be nice if
Python had uh no global interpreter lock
and it had the so-called free threading
but it would also cause a lot more
software bugs
uh the interesting thing is that there
is still a possible future where we are
actually going to or where we could
experiment at least with that
because there is a guy working for
Facebook who has developed a fork of
CPython
that he called the nogil
interpreter where he removed the GIL
and made a whole bunch of optimizations
so that the single-threaded case
doesn't run too much slower
uh and the multi-threaded case will actually
uh use all the cores that you have
and so that that would be an interesting
possibility
if we would be willing as Python core
developers to actually
uh maintain that code indefinitely
and if we're willing to put up with the
additional complexity of the interpreter
and the additional sort of overhead for
the single-threaded case and I'm
personally not convinced
that
there are enough people
uh needing the speed of multiple threads
with their python programs
that it's worth it to sort of
take that performance hit and that
complexity hit
and I feel that the GIL actually is a
pretty nice
Goldilocks point between no threads and
uh all threads all the time but not
everybody agrees on that so that is
definitely a possible future the
subinterpreters look like a fairly safe bet
for 3.12 so say a year from now a year so
the goal is to do a new version every
year
yeah for Python let me ask you perhaps a
fun question but there's a philosophical
dimension to it will there ever be a Python 4.0
now before you say it's currently a joke
and probably not I'm gonna go to 3.99 or
3.999
can you imagine possible features
that python 4.0 might have that would
necessitate the creation of the new 4.0
given the amount of
pain and joy
suffering and Triumph that was involved
in the move between version 2 and
version three
yeah well we
as a community and as a core development
team we have a large amount of painful
memories about the Python 3 transition
uh which is one reason that sort of
everybody is happy that we've decided
there's not going to be a 4.0 at least
not anytime soon and if there is going
to be one we'll sort of plan the
transition very differently because
clearly we underestimated the pain the
transition
cost for our users in the Python 3 case
and
had we known we could have sort of
designed Python 3 somewhat differently
without making it any worse
we just thought that we had a good plan
but we underestimated
what sort of the users were capable
of when it comes to that kind of
transition by the way I think we talked
way before like a year and a half before
the uh Python 2 official
end of life end of life oh yeah
what was that what was your memory of
the end of life did you shed a tear on
January 1st 2020 were you
standing alone our team had
basically moved on years before
yeah it was purely a
little symbolic moment
uh to signal to the remaining users
that
there was no longer going to be any new
releases or support for Python 2.7
did you shed a single tear while looking
out over the horizon
I'm not a very poetic person and I
don't shed tears like that but no
yeah we actually had planned a party
but the party was planned for uh the
US Python conference that
year which of course never happened
because of the pandemic oh is it
like in March yeah the conference was uh
going to be I think late April that year
oh
so that that was a very difficult
decision to cancel it but
they did so anyway if we're going to
have a python 4 we're going to have to
have both a different reason for for
having that
and a different process for managing the
transition can you imagine a possible
process that so I think you're
implying that if there is a 4.0 in some
ways it would break backward compatibility
well so
here is here is a concrete thought I've
had and I'm not unique but not everyone
agrees with this so this is definitely a
personal opinion
if we were to try something like that no-GIL
Python
uh
my expectation is that
it would feel
just different enough
at least for the the part of the Python
ecosystem that
is heavily based on C extensions
and that is like the entire machine
learning data science Scientific Python
world is all based on C extensions for
python
and so
those people would likely feel the pain
the most
because even if we don't change
anything about the syntax of the
language and the semantics of the
language when you're writing Python code
we could even say suppose that after
Python say 3.19 instead of 3.20 we'll
have 4.0 suppose that's the time when we
flip the switch to 4.0 we will not
have a GIL
imagine it was like that
so
I would probably say
that particular year the release that we
named 4.0 will be syntactically it will
not have any new syntactical features no
new modules in the standard Library no
new built-in functions
everything will be at the python level
will be purely compatible with python
3.19
however
extension modules
will have to make a change they all have
to be recompiled they will not have the
same
binary interface
uh
the semantics and APIs for some
things that are frequently accessed by C
extensions will be different and so for
a pure python user
4.0 would be a breeze except that there
are very few pure python users left
because everybody who is using python
for something significant is using
third-party extensions there are like I
don't know several hundreds of thousands
of third-party extensions on uh the
PyPI service
and I'm not saying they're all
good but there is a large list of
extensions that would have to do work
and some of those extensions are
currently already low on maintainers and
they're struggling to keep afloat so
you'd have to give a huge heads up to
them if you go to 4.0 to really keep
developing it yeah we'd probably have to
do something like
several years before who knows maybe
five years earlier like 3.15 we would
have to say and I'm just making
the specific numbers up but at some
point we'd have to say no-GIL
Python could be an option
it might be a compile-time option
uh
if you want to use no-GIL Python you
have to recompile python from source for
your platform using your tool set
all you have to do is change one
configuration variable and then you just
run make
or configure and make and it will build
it for you
but now you also have to use the
no-GIL-compatible versions of all
extension modules you want to use
and so as long as many extension modules
don't have fully functional
sort of variants that work within the
no-GIL world
that's not a very practical thing for
python users but it would allow
extension developers
to test the waters see what they need to
change syntactically to be able to compile at
all maybe they're using
functions that are defined by the Python
3 runtime that won't be in the python 4
runtime those functions will not work
they'll have to find an alternative
uh but they can experiment with that and
sort of write test applications and that
would be a way to transition and that
that could be a series of releases where
the python 4 is more and more imminent
uh we have gotten more and more
third-party extension modules to have
solid support that works for no-GIL
Python for that new API
uh and then sort of Python 4.0 is
like the official moment that the mayor
comes out and cuts the ribbon and now
the sort of no-GIL mode
is the default and maybe the only mode
there is
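For readers checking where this ended up: after this conversation, CPython 3.13 (released in 2024) shipped an experimental free-threaded build along roughly the lines sketched here, chosen at compile time with ./configure --disable-gil (PEP 703), with extension modules needing builds for a separate ABI. The small helper below (its name is hypothetical) reports which kind of interpreter is running; on versions before 3.13 it simply reports a regular build with the GIL enabled.

```python
import sys
import sysconfig


def interpreter_build_info() -> dict:
    # Py_GIL_DISABLED is 1 on a free-threaded (--disable-gil) CPython 3.13+ build;
    # on regular builds and older versions it is 0 or None.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

    # sys._is_gil_enabled() exists from 3.13. Even a free-threaded build can turn
    # the GIL back on at run time (PYTHON_GIL=1, or importing an extension module
    # that does not declare free-threading support), so check it separately.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)

    return {
        "version": sys.version.split()[0],
        "free_threaded_build": free_threaded_build,
        "gil_enabled_now": bool(is_gil_enabled()),
        # Free-threaded builds add a "t" ABI flag (e.g. python3.13t); abiflags
        # is not defined on every platform, hence the default.
        "abiflags": getattr(sys, "abiflags", ""),
    }


print(interpreter_build_info())
```

The separate "is the GIL on right now" check mirrors the transition worry in the conversation: a single extension module that has not been updated can pull a free-threaded interpreter back into GIL mode.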
the internet wants to know from Reddit
uh it's uh it's a small and fun
question there are many fun questions but
uh out of the PyPI packages
uh do you have
ones you like in your opinion
are there must-have PyPI libraries or ones
you use all the time constantly oh my
that
I should really have a standard answer
for that question but like a positive
standard answer but my current standard
answer is that I'm not a big user of
third-party packages
when I write python code I'm usually
developing some tooling around building
python itself
and the last thing we want is
dependencies on third-party packages so
I I tend to just use the standard
library and that's where your focus is
that's where your mind is
but do you keep an eye on what's
out there to understand where the
standard library could be moving should
be moving it's a good kind of landscape
of what's missing from the standard library
well usually when something's missing
from the standard Library nowadays uh
it is a relatively new idea and there is
a third party implementation or maybe
possibly multiple third-party
implementations but they evolve at a
much higher rate than they could when
they're in the standard library so
it would be a big reduction in
activity to
incorporate things like that in the
standard Library so I I like that there
is a lively package ecosystem and that
sort of recent Trends in the standard
Library are actually that we're doing
the occasional spring cleaning where
we're just
choosing some
modules that have not had a lot of
change in a long time and that maybe
would be better off not existing at all
at this point because there might be a
better third party
alternative anyway and we're sort of
slowly removing those that often those
are things that I sort of
I spiked somewhere in 1992 or 1993 and
if you look through the commit
history it's very sad like
all cosmetic changes like changes in the
indentation style or uh the name of this
other standard Library module got
changed or like like nothing nothing of
any substance the API is identical to
what it was 20 years ago
So speaking of packages they have
a lot of impact on a lot of people's
lives does it make sense to you why
Python has become the primary the
dominant language for the machine
learning community so packages like uh
PyTorch TensorFlow scikit-learn and
even like the lower-level stuff like
NumPy SciPy pandas Matplotlib for
visualization can you like does it make
sense to you why it uh
uh permeated the entire data science
machine learning AI community
well it's
part of it is an effect that's as simple
as
we're all driving on the right side of
the road right
uh it's compatibility yeah it's it's in
and part of it is uh
not quite as fundamental as
driving on the right side of the road
which you have to do for for safety
reasons I mean you have to agree on
something
they could have picked
JavaScript or Perl there was
a time in the early 2000s that it really
looked like Perl was going to dominate
like biosciences
because DNA search was all based on
regular expressions and Perl has the
fastest and most comprehensive regular
expression engine still does
I spent quite a long time with Perl
that was another letting go
letting go of this kind of uh data
processing uh system yeah the reasons
why python
became the lingua franca of the
scientific code and and
machine learning in particular
and data science
it really had a lot to do with
anything was better than C or C++
recently a guy who worked at Lawrence
Livermore National Laboratory in the
the sort of computing division
wrote me his memoirs and he had
his own view of how he helped
something he called computational
steering into existence
and this was the idea that you you take
libraries that in in his days were
written in Fortran that that solved
Universal mathematical problems
uh and those libraries still work but uh
the scientists that use the libraries
use them to solve continuously different
specific applications and answer
different questions and so those poor
scientists
were
were required to use say Fortran
because Fortran was the
language that the library was written in
and then the scientist would have to
write an application that sort of uses
the library to solve a particular
equation or set of equations
or answer a set of questions and the
same for C++
because there's
interoperability so the dusty decks are
written either in C++ or Fortran
uh
and so Paul Dubois was one of the people
who
I think in the mid 90s
saw that that you needed a higher level
language
for the scientists
to sort of tie together the
fundamental mathematical algorithms of
linear algebra and other stuff
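A minimal sketch of that "higher-level language steering a compiled library" idea, using only the standard library: ctypes loads the system C math library and a few lines of Python decide which inputs to feed it. This is a stand-in, not what Dubois and colleagues used (ctypes arrived in Python much later, and their libraries were Fortran and C++). It assumes a POSIX system where the C math library can be found, and the steer() helper is hypothetical.

```python
import ctypes
import ctypes.util
import math

# Assumption: a POSIX system with the C math library. find_library("m") may return
# None, in which case CDLL(None) falls back to symbols already loaded into the process.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature double cos(double) so ctypes converts arguments correctly.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def steer(angles: list[float]) -> dict[float, float]:
    # The "scientist's script": Python chooses the inputs and collects results,
    # while the numerical work runs inside the compiled library.
    return {angle: libm.cos(angle) for angle in angles}


results = steer([0.0, math.pi / 3, math.pi])
for angle, value in results.items():
    print(f"cos({angle:.4f}) = {value:.4f}")
```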
and so
gradually some libraries started
appearing that did very fundamental
stuff with arrays of numbers in Python I
mean when I first created python I was
not expecting it to be used for arrays
of numbers much I thought that was like
an outdated data type
and everything was like objects and
strings and like python was good and
fast at string manipulation and objects
obviously but arrays of numbers were not
very efficient and multi-dimensional
arrays didn't even exist in the language
at all
uh but there were people who realized
that python had extensibility
that was flexible enough that they could
write
third-party packages that did support
large arrays of numbers and operations
on them very efficiently
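The standard library still has no NumPy-style array type, but it can show the underlying idea those packages relied on: a compact, typed block of numbers plus a way to view the same memory with a multi-dimensional shape. The sketch below uses the array module and memoryview, whose multi-dimensional buffer support comes from PEP 3118, a revision of the buffer protocol largely motivated by NumPy's needs. It illustrates the concept only; it is not how NumPy itself is implemented.

```python
from array import array

# Six C doubles stored contiguously: compact, unlike a list of Python float objects.
flat = array("d", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

# View the same memory as a 2x3 grid without copying. memoryview.cast() only
# converts to or from a byte format, hence the intermediate cast("B").
grid = memoryview(flat).cast("B").cast("d", [2, 3])

print(grid.shape)     # (2, 3)
print(grid[1, 2])     # 5.0, row 1, column 2
grid[0, 0] = 10.0     # writes through to the underlying array
print(flat[0])        # 10.0
print(grid.tolist())  # [[10.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
```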
and somehow they got a foothold
through sort of different
parts of the scientific community I
remember that the Hubble Space Telescope
people in Baltimore were somehow big
Python fans in the late 90s
and at various points
small improvements were made and more
people got in touch with using Python to
drive these libraries
of interesting uh algorithms and like
once once you have a bunch of scientists
who are working on similar problems say
they're all working on stuff with
data that comes in from the Hubble
Space Telescope but they're looking at
different things some are looking
at stars in this galaxy others are
looking at galaxies the math is
completely different but the the
underlying
libraries are still the same
and so they exchange
code they say well I wrote this Python
program or I wrote a Python library to
solve this class of problems
and the other guys either say oh I can
use that library too or if you make a few
changes I can use that library too
why start from scratch in Perl or
JavaScript
where there's not that infrastructure
for arrays of numbers yet whereas in
Python you have it and so more and more
scientists at different places doing
different
different work
discovered Python and then people
who had an idea for an important new
fundamental library decided oh Python is
actually already known to our users
so
let's use Python as the user interface I
imagine at
least that's how TensorFlow ended up
with Python as the user
interface right but with TensorFlow
there's a deeper history of what the
community is it's not just like what
packages it needs it's like what the
community leans on for programming
language because TensorFlow
uh had a prior library that was internal
to Google but there were also competing
machine learning frameworks like Theano
Caffe they were in Python there was some
Scala
um some other languages but python was
really dominating it
and it's interesting because
um there's other languages from the
engineering space like MATLAB
that a lot of people used but different
design choices by the company by the
core developers led to it not spreading
and one of the choices for MATLAB
uh by MathWorks is to not make it open
source right or yeah not you know having
people pay it was a very expensive
product and so uh universities
especially disliked it because it was a
price per seat I I remember hearing
yeah but I think that's not why it
failed or it failed to spread I think
the universities didn't like it but they
would still pay for it
the thing is it didn't feed into that
GitHub open source
uh packages culture so like and that's
somehow a precondition for um for viral
spreading the hacker culture like the
tinkerer culture uh with python it feels
like you can build a package from
scratch or solve a particular problem
and get excited about sharing that
package with others and that creates an
excitement about a language I tend to
like Python's approach to open source in
particular because it's sort of
it's almost egalitarian
uh there's there's little hierarchy
there's there's obviously some because
the like you only need to decide whether
you drive on the left or the right side
of the road sometimes
but there is a lot of access for people
with little power you don't have to work
for a big tech company to make a
difference in the python world
uh we have affordable events that really
care about community and support people
and sort of the community is
like a big deal at our conferences
and in the PSF when the PSF funds
events it's always about
growing the community the PSF funds very
little development
they do some but most of the
money that the PSF
forks out
uh is for community
fostering things
So speaking of egalitarian last time we
talked four years ago it was just after
you stepped down from your role as the
benevolent dictator for life BDFL
looking back what are your insights and
lessons
you learn from that experience about
python developer Community about human
nature about human civilization
life itself oh my uh
I probably held on to the position too
long
I remember being just
extremely stressed for a long time
and
it wasn't very clear to me
what was leading what was causing the
stress
and looking back
uh
I I should have sort of
relinquished my central role as BDFL
sooner
what were the pros and cons of the BDFL
role like of you not
relinquishing it what are the
benefits of that for the community and
what are the drawbacks well the
benefits for the community would be
things like
uh
Clarity of vision and sort of
a clear Direction because I I had
certain ideas in in mind when I created
Python and while I sort of let myself be
influenced by many other ideas as python
evolved and became
more successful and more complex and
more used
I also stuck to certain principles and
it's still hard to say what are Python's
core principles
but the fact that I was playing that
role and sort of always very active
grew the community in a certain way
it modeled to the community how to think
about
how to how to solve a certain problem
well
that was a source of stress but it was
also beneficial it was a source of
stress for me personally but it was
beneficial for the community because uh
people sort of
over time had
learned how I was thinking and could
predict
yeah how I would decide about a
particular issue and not always
perfectly of course but there was like
there wasn't a lot of jerking around
like this year the
Democrats are in power and we're doing
these kind of things and now the
Republicans are in power and they roll
all that back and do those kind of
things
there is a clear fairly straight path
ahead
and so fortunately the successor
structure with the Steering Council
has sort of found a similar way of
leading the community
in a fairly steady Direction without
stagnating and and for me personally
it's more fun because there are there
are things I can just ignore
yeah oh yeah there's a bug in
multiprocessing let someone else decide
whether that's important to solve or not
I'll stick to typing and asyncio
and the faster interpreter yeah it
allows you to focus a little bit more
yeah
uh what are interesting differences in
culture if you can comment on between
Google Dropbox and Microsoft from a
Python programming perspective all
places you've been to the positive
is there a difference or is it just
about people and there's great people
everywhere or is there culture
differences
sort of Dropbox is much smaller than the
other two in your list yeah so that
that is a big difference the set of
products they provide is more it's
narrower so they're more focused smaller
code base yeah and and Dropbox sort of
at least during the time I was there
had the tendency of sort of
making a big plan putting the whole
company behind that plan for a year and
then evaluate and then suddenly find
that
everything was wrong about the plan and
then they had to do something completely
different
so there were there was like
the annual engineering reorg was was
sort of an unpleasant tradition that
Dropbox because like oh there's a new VP
of engineering and so now all the
directors are being reshuffled and this
guy was in charge of of
infrastructure one year and the next
year he was made in charge of I don't
know product development
It's fascinating, because you don't think about these companies internally, but Dropbox to me, from the very beginning, was one of my favorite services. There are certain programs and online services that make me happy, make me more efficient, all that kind of stuff, but one of the powers of those kinds of services is that they disappear. You're not supposed to think about how it all works. But it's incredible to me that you can sync stuff effortlessly across so many machines so quickly and not have to worry about conflicts. They take care of that. As a person that comes from version repositories and all that kind of stuff, where merging is super difficult and just keeping different versions of different files is very tricky, the fact that they could take care of that is just... I don't know, the engineering behind the scenes must be super difficult, both on the computer infrastructure and the software.
A lot of internal sort of hand-wringing about things like that, but the product itself always worked very smoothly.
Yeah, it does. But there's probably a lot of lessons in that. You can have a lot of turmoil inside on the engineering side, but if the product is good, the product is good, and maybe don't mess with that either.
It is, you know. When it's good, keep it. It's like with Google: focus on the search and the ads, right, and the money will come.
Yeah, and make sure that's done extremely well, and don't forget what you do extremely well and in what ways you provide value and happiness to the world. Make sure you do that well.
Is there something else to say about Google and Microsoft? Microsoft has had a very fascinating shift recently with the new CEO, the recent CEO: purchasing GitHub, embracing open source culture, embracing the developer culture. It's pretty interesting to see.
That's like why I joined Microsoft. I mean, after retiring and thinking that I would stay retired for the rest of my life, which of course was a ridiculous thought, I was done working for a bit, and then the pandemic made me realize that work can also provide a source of fulfillment.
Keep you out of trouble.
Microsoft is a very interesting company because it has this incredible, very long and varied history and this amazing catalog of products, many of which also date way back. I mean, I've been talking to a bunch of Excel people lately, and Excel is like 35 years old, and they can still read spreadsheets that they might find on an old floppy drive.
Yeah, man, they built so many incredible tools through the years. Excel... one of the great shames of my life is that I've never learned how to use Excel well. I mean, it just always felt like so many features are there. It's similar with IDEs like PyCharm. It feels like I converge quickly to the dumbest way to use a thing to get the job done, when clearly there's so much more power at your fingertips.
Yeah, but I do think there's probably expert users of Excel. And, oh, Excel is a cash cow, actually.
Oh, it actually brings in money?
Oh yeah. And a lot of the engineering, if you look deep inside Excel, there's some very good engineering, very, very impressive stuff.
Okay, now I definitely need to learn it a little better. I had issues because I'm a keyboard person, so I had issues coming up with shortcuts. I mean, Microsoft sometimes, and it's changed over the years, but sometimes they kind of want to make things easier for you on the surface and therefore make it harder for people that like to have shortcuts and all that kind of stuff to optimize their workflow. Now people are probably yelling at me, like, no, Excel probably has a lot of ways to optimize work.
In fact, I keep discovering that there are many features in Excel that only exist as keyboard shortcuts.
Yeah, that's the sense I have, and now I'm embarrassed.
It's just, you just have to know what they are.
Yeah.
There's no logic or reason to the assignment of the keyboard shortcuts, because they go back even longer than 35 years.
Can you maybe comment about Satya Nadella and how hard it is for a CEO to sort of pivot a company towards open source or developer culture? Is there something you could say about what the role of leadership is in such a pivot and in the definition of a new vision?
I've never met him, but I hear he's just a really sharp thinker, and he also has an incredible business sense. He took the organization, which had very solid pieces but was also struggling with all sorts of shameful things, especially in the Steve Ballmer time, and, I imagine in part through his personal charm and thinking, and of course the great trust that the rest of the leadership has in him, he managed to really turn the company around and sort of change it from openly hostile to open source to actively embracing open source. And that doesn't mean that suddenly Excel is going to go open source, but it means that there's room for a product like VS Code, which is open source.
Yeah, that's fascinating. It gives me faith that large companies with good leadership can grow, can expand, can change and pivot and so on, develop, because it gets harder and harder as the company gets large.
You wrote a blog post in response to a person looking for advice about whether, with a CS degree, to choose a nine-to-five job or to become an entrepreneur. It's an interesting question if you just think from first principles. Right now somebody has taken a few years of programming and has loved software engineering. In some sense, creating Python is an entrepreneurial endeavor. That's a choice that a lot of people that are good programmers have to make: do I work for a big company, or do I create something new?
Or you can work for a big company and create something new there.
Oh, inside the...
Yeah. I mean, big companies have individuals who create new stuff that eventually grows big all the time. And if you're the person that creates a new thing and it grows big, you'll have a chance to move up quickly in the company to run that thing, if that's your aspiration. What can also happen is that someone is a brilliant engineer and sort of builds a great first version of a product and has no aspirations to then become a manager and grow the team from five people to 20 people to 100 people to a thousand people and be in charge of hiring and meetings, and they move on to inventing another crazy thing inside the same company, or sometimes they found a startup, or they move to a different great large or small company. There's all sorts of models. And sometimes people sort of do have this whole trajectory, from engineer buckling down writing code, not nine to five but more like noon till midnight, seven days a week, and coming up with a product and staying in charge. I mean, if you take Drew Houston, Dropbox's founder, he is still the CEO, and at least when I was there he had not checked out or anything. He was a good CEO, but he had started out as the technical inventor or co-inventor. And so he was someone who... I don't know if he always aspired to that. I think when he was 16 he already started a company, so maybe he did. But it turned out that he did have the personal skill set needed to grow and stay on top. And other people are brilliant engineers and horrible at management. I count myself at least in the second category.
So your first love, and still your love, is to be the quote-unquote individual contributor, the programmer.
Do you have advice for a programming beginner on how to learn Python the right way?
Find something you actually want to do with it. If you say "I want to learn skill X," that's not enough motivation. You need to pick something, and it can be a crazy problem you want to solve, it can be completely unrealistic, but something that challenges you into actually learning coding in some language.
And there's so many projects out there you can look for like that. It doesn't have to be some big ambitious thing. It could be writing a small bot: if you're into social media, you can write a Reddit bot or a Twitter bot, or automate some aspect of something that you do every single day, processing files, all that kind of stuff.
Nowadays you can take machine learning components and sort of plug those things together, do cool stuff with them.
So that's actually a really good example. If you're interested in machine learning, the state of machine learning is such that a tutorial that takes an hour can get you started using pre-trained models to do something super cool. And that's a good way to learn Python, because you learn just enough to run this model, and that's like a sneaky way to get in there, to figure out how to import stuff, how to write basic I/O, how to run functions. I'm not sure if it's the best way to learn the basics of Python, but it could be nice to just fall in love first and then figure out the basics, right?
Yeah. You can't expect to learn Python from a one-hour video recording. I'm blanking out on the name of someone who wrote a very funny blog post where he said, I see all these ads for things like "Learn Python in 10 days" or so, and he said the goal should be "Learn Python in 10 years."
That's hilarious, but I completely disagree with that. I think the criticism behind that is that the places, just like the blog post from earlier, the places that tell you you'll learn Python in five minutes or 10 minutes, are actually usually really bad tutorials. So the thing is, I do believe that you can learn a thing in an hour, get something interesting quickly, like, it hooks you. I mean this. But it just takes a tremendous amount of skill to be that kind of educator. Richard Feynman was able to condense a lot of ideas in physics into a very short amount of time, but that takes a deep, deep understanding. And so yes, of course, I think the 10 years is about the experience, the pain along the way.
And there's something you have to practice. You can memorize the syntax, well, I couldn't, but maybe someone else can, but that doesn't make you a coder.
Yeah. Actually, coding has changed in fascinating ways, because so much of coding is copying and pasting from Stack Overflow and then adjusting, which is another way of coding, and I don't want to talk down to that kind of style of coding, because it's kind of nicely efficient. But you know where that is going.
I use it every day, and it really, yeah, it writes a lot of code for me. Usually it's slightly wrong, but it still saves me typing, because all I have to do is change one word in a line of text that otherwise it generated perfectly. And how many times... Like, what was I doing this morning? I was looking for a begin marker and I was looking for an end marker. And so begin is blah blah blah, search for begin, this is the begin token, and then on the next line I type "e" and it completes the whole line with end instead of begin. That's a very simple example. Sometimes, if I name my function right, it writes a 5 or 10 line function.
And you know Python enough to very quickly detect the issues, so it becomes a really good dance partner.
Then it doesn't save me a lot of thinking, but since I'm a poor typist, I'm very much appreciative of all the typing it does for me. It's much better, actually, than the previous generation of suggestions that are also still built into VS Code, where when you hit a dot, it tries to guess what the type is of the variable to the left of the dot, and then it gives you a list, a pop-down menu, of what the attributes of that object are. But Copilot is much, much smoother than that.
Well, it's fascinating to hear that you use GitHub Copilot. Do you worry about the future of that, the automatic code generation, the increasing amount of that kind of capability? Are programmers' jobs threatened, or is there still a significant role for humans?
Are programmers' jobs threatened by the existence of Stack Overflow? I don't think so. It helps you take care of the boring stuff, and you shouldn't try to use it to do something when you have no way of understanding what you're doing yet. A tool like that is always best when the question you're asking is "please remind me of how I do this," which I could look up, but right now I've forgotten whether the method is called foo or bar, or what the shape of the API is: does it use a builder object or a constructor or a factory or something else, and what are the parameters? It serves that role. It's like a great assistant. But the creative work of deciding what you want the code to do is totally yours.
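The older dot-triggered suggestions Guido contrasts with Copilot can be sketched in a few lines. This is a simplified, hypothetical model: editors like VS Code typically infer the type statically without running your code, whereas this sketch takes a shortcut and inspects a live object with `dir()`, closer in spirit to completion in Python's interactive interpreter. The function name `complete_attribute` is made up for illustration.

```python
def complete_attribute(expression: str, namespace: dict) -> list[str]:
    """Suggest attribute names for text like 'name.pre' typed just after a dot.

    Hypothetical helper: only handles a single plain variable name left of the dot.
    """
    # Split at the last dot: the object on the left, the partially typed attribute on the right.
    target, _, prefix = expression.rpartition(".")
    if not target or target not in namespace:
        return []
    # Instead of statically guessing the type, ask the live object what it has.
    obj = namespace[target]
    # The pop-down menu: public attributes matching what has been typed so far.
    return sorted(
        name
        for name in dir(obj)
        if name.startswith(prefix) and not name.startswith("_")
    )


names = {"greeting": "hello", "numbers": [3, 1, 2]}
print(complete_attribute("greeting.st", names))  # ['startswith', 'strip']
print(complete_attribute("numbers.so", names))   # ['sort']
```

A real editor has to do this without executing anything, which is why it must guess the type; the runtime shortcut here only works because the objects already exist.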
What do you think is the future of Python in the next 10, 20, 50 years, 100 years? When you look forward, do you ever imagine a future of human civilization living inside the metaverse, on Mars, with humanoid robots everywhere? What part does Python play in that?
It'll eventually become sort of a legacy language that plays an important role but that most people have never heard of and don't need to know about, just like all kinds of basic structures in biology, like mitochondria.
So it permeates all of life, all of digital life, but people just build on top of it, and they only know the stuff that's on top of it.
Yeah, you build layers of abstractions. I mean, most programmers nowadays rarely need to do binary arithmetic, right?
Yeah, or even think about it, or even learn about it. They could go quite far without knowing.
I started building little digital circuits out of NAND gates that I built myself with transistors and resistors. So I feel very blessed that, with that start when I was a teenager, I learned at least some of the basic concepts that go into building a computer, and for every part I have some understanding of what it's for, why it's there, and how it works. I can forget about all that most of the time, but I enjoy knowing: oh, if you go deeper, at some point you get to NAND gates and half adders and shift registers. And when it comes to the point of how you actually make a chip out of silicon, I have no idea. That's just magic to me.
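Guido's path from NAND gates up to half adders, and the binary arithmetic most programmers never have to think about, can be made concrete in a few lines. This is a minimal logic-level sketch, not a circuit simulation: every gate is derived from a single `nand` function, two half adders plus an OR gate make a full adder, and full adders are chained into ripple-carry addition. All function names are illustrative.

```python
def nand(a: int, b: int) -> int:
    """The only primitive: output is 0 only when both inputs are 1."""
    return 0 if (a and b) else 1


# Every other gate is wired up from NAND alone.
def not_(a: int) -> int:
    return nand(a, a)


def and_(a: int, b: int) -> int:
    return not_(nand(a, b))


def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))


def xor(a: int, b: int) -> int:
    n = nand(a, b)  # the classic four-NAND XOR
    return nand(nand(a, n), nand(b, n))


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Add two bits; return (sum_bit, carry)."""
    return xor(a, b), and_(a, b)


def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Two half adders plus an OR gate add three bits."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, or_(c1, c2)


def add(x: int, y: int, width: int = 8) -> int:
    """Ripple-carry addition of two unsigned integers, one bit position at a time."""
    result, carry = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result  # any carry out of the top bit is dropped, as in a fixed-width adder


print(add(19, 23))  # 42
```

Because the result is truncated to `width` bits, `add(255, 1)` wraps around to 0, the same overflow behavior a fixed-width hardware adder has.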
But you enjoy knowing that you can walk a while towards the lower and lower layers, but you don't need to.
It's nice. The other day, as sort of a mental exercise, I was trying to figure out if I could build a flip-flop circuit out of relays. It was just sort of trying to remember: how does a relay really work? Yeah, there's this electromagnetic force that pulls a switch open or shut, and it can open one switch and shut another, and you can have multiple contacts that go at once. And how many relays do I really need to sort of represent one bit of information that can just feed on itself? I don't think I got to the final solution, but it was fun that I could still do a little bit of problem solving and thinking at that level.
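Guido's relay puzzle, one bit of information that "feeds on itself," is the classic set-reset (SR) latch. The sketch below models it at the logic level with two cross-coupled NOR gates, each of which could in principle be built from relays; it does not model relay contacts or coil timing, and it does not answer his question about the minimum number of relays. The `SRLatch` class and its `apply` method are hypothetical names.

```python
def nor(a: int, b: int) -> int:
    """Output is 1 only when both inputs are 0."""
    return 0 if (a or b) else 1


class SRLatch:
    """One bit of memory from two cross-coupled NOR gates (logic-level model only)."""

    def __init__(self) -> None:
        self.q, self.q_bar = 0, 1  # power up in the "reset" state

    def apply(self, set_: int, reset: int) -> int:
        """Drive the S and R inputs, let the feedback loop settle, and return Q."""
        if set_ and reset:
            raise ValueError("S=1 and R=1 together is the forbidden input for a NOR latch")
        # Each gate's output is the other gate's input; re-evaluate until nothing changes.
        # A few passes are enough for valid inputs.
        for _ in range(4):
            q = nor(reset, self.q_bar)
            q_bar = nor(set_, q)
            if (q, q_bar) == (self.q, self.q_bar):
                break
            self.q, self.q_bar = q, q_bar
        return self.q


latch = SRLatch()
print(latch.apply(1, 0))  # set: 1
print(latch.apply(0, 0))  # hold: still 1, the bit "feeds on itself"
print(latch.apply(0, 1))  # reset: 0
print(latch.apply(0, 0))  # hold: still 0
```

The memory comes entirely from the feedback: with both inputs at 0, each gate's output keeps the other gate in its current state.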
And it's cool how we build on top of each other. You stood on the shoulders of giants, and there's others who will stand on your shoulders, and it's a giant, beautiful hierarchy.
Yeah. I feel I sort of covered this middle layer of the technology stack, where it sort of peters out below the level of NAND gates, and at the top I sort of lose track when it gets to machine learning.
And then eventually the machine learning will build higher and higher layers that will help us understand the lowest layer of the physics, and thereby the universe figures out how it itself works. Maybe, maybe not.
Yeah, I mean, it's possible. If you think of human consciousness, if that's even the right concept, it's interesting that we have this super parallel brain that does all these incredible parallel operations, like image recognition. When I recognize your face, there's a huge amount of processing that goes on in parallel. There's lots of nerves between my eyes and my brain, and the brain does a whole bunch of stuff all at once, because it's actually really slow circuits, but there are many of them that all work together. On the other hand, when I'm speaking, everything is completely sequential. I have to sort of string words together one at a time. And when I'm thinking about stuff, when I'm understanding the world, I'm also thinking of everything one step at a time. And so we've got all this incredible parallel circuitry in our brains, and eventually we use that to simulate a single-threaded, much, much higher-level interpreter.
It's exactly... I mean, that's the illusion of it, for us, that it's a single sequential set of thoughts. And all of that came from a single cell through the process of embryogenesis. So DNA is the code. DNA holds the entirety of the code, the information and how to use that information to build up an organism, the entire thing: the arms, how it's built, the brain. So you don't buy a computer, you buy a seed, a diagram, and then you plant the computer and it builds itself in almost the same way, and then does the computation, and then eventually dies. It gets stale but gives birth to young computers, more and more, and gives them lessons, but they figure stuff out on their own, and over time it goes on that way. And those computers, when they go to college, try to figure out how to program, and they build their own little computers, increasingly more intelligent, at increasingly higher and higher levels of abstraction.
Isn't it interesting that you see the same thing appearing at different levels, though? Because you have cells that create new cells, and eventually that builds a whole organism, but then the animal or the plant or the human has its own mechanism of replication that is sort of connected in a very complicated way to the mechanism of replication of the cells. And then if you look inside the cell, if you see how DNA and proteins are connected, then there is yet another completely different mechanism whereby proteins are mass-produced using enzymes and a little bit of code from DNA. And of course viruses break into it at that level.
And while the mechanisms might be different, it seems like the nature of the mechanism is the same, and it carries across natural languages and programming languages, humans, maybe even human civilizations or intelligent civilizations, and then all the way down to the single-cell organism.
It is fascinating to see what abstraction levels are built on top of individual humans, and how you have whole societies that sort of have a similar self-preservation (I don't know what it is: instinct, nature, abstraction) as the individuals have and the cells have, and they self-replicate and breed in different ways.
It's hard for us humans to introspect it, because we're very focused on our particular layer of abstraction. But from an alien perspective, looking at Earth, they'll probably see the higher-level organism of human civilization as part of this bigger organism of life on Earth itself. In fact, that could be an organism all on its own: just life on Earth.
This has been a wild conversation, both philosophical and technical. Guido, you're an amazing human being. You were gracious enough to talk to me when I was first doing this podcast, one of the earliest people I talked to, somebody I've admired for a long time. It's just a huge honor that you did it at that time, and that you did it again. You're awesome.
Thank you, Lex.
Thanks for listening to this conversation with Guido van Rossum. To support this podcast, please check out our sponsors in the description. And now, let me leave you with some words from Oscar Wilde: "Experience is the name everyone gives to their mistakes." Thank you for listening, and hope to see you next time.