Transcript
gFEE3w7F0ww • Travis Oliphant: NumPy, SciPy, Anaconda, Python & Scientific Programming | Lex Fridman Podcast #224
Kind: captions
Language: en
The following is a conversation with Travis Oliphant, one of the most impactful programmers and data scientists ever. He created NumPy, SciPy, and Anaconda. NumPy formed the foundation of tensor-based machine learning in Python, SciPy formed the foundation of scientific programming in Python, and Anaconda, specifically with Conda, made Python more accessible to a much larger audience.

Travis's life work, across a large number of programming and entrepreneurial efforts, has had and will continue to have an immeasurable impact on millions of lives, by empowering scientists and engineers in big companies, small companies, and open source communities to take on difficult problems and solve them with the power of programming. Plus, he's a truly kind human being, which, when combined with vision and ambition, makes for a great leader and a great person to chat with.

To support this podcast, please check out our sponsors in the description. This is the Lex Fridman Podcast, and here is my conversation with Travis Oliphant.
What was the first computer program you've ever written? Do you remember?

Whoa, that's a good question. I think it was in fourth grade: just a simple loop in BASIC, on an Atari 400 I think, or maybe it was an Atari 800. It was part of a class, and we were just writing basic loops to print things out.

Did you use GOTO statements?

Yes, yes, we used GOTO statements.

I remember that in the early days, that's when I first realized there are principles to programming: when I was told not to use GOTO statements, that they're bad software engineering, that they go against what great, beautiful code is. I was like, oh okay, there are rules to this game.

I didn't see that until high school, when I took an AP computer science course. I did a lot of other kinds of programming on the TI, but it was when I finally took an AP computer science course in Pascal...

Wow, that was Pascal.

...that's when I saw, oh, there are these principles.

Not C or C++?

No, I didn't take C until the next year; in college I had a course in C. But I haven't done much in Pascal, just that AP computer science course.
Now, sorry for the romanticized question, but when did you first fall in love with programming?

Oh man, good question. I think actually when I was 10. My dad got us a Timex Sinclair, and he was excited about the spreadsheet capability, but I made him get the BASIC add-on so we could actually program in BASIC, and just being able to write instructions and have the computer do something... Then we got a TI-99/4A when I was about 12, and it had sprites and graphics and music; you could actually program it to make music. That's when I really fell in love with programming.

So this is a full, real computer, with memory and storage and processors and so on?

The TI, yeah. The Timex Sinclair was one of the very first. It was cheap; well, it was still expensive, but it had 2K of memory, and we got the 16K add-on pack. But yeah, it had memory and you could program it. In order to store your programs you had to attach a tape drive. Remember the sound that would play when the modem converted digital bits to audio on the tape? I still remember that sound.
But that was the storage.

And what was the programming language, do you remember?

It was BASIC. It was BASIC, and then they had VisiCalc, so a little bit of spreadsheet programming in VisiCalc, but mostly just BASIC.

Do you remember what kind of things drew you to programming? Was it working with data? Was it video games? Mathy stuff?

Video games, and yeah, I've always loved math. A lot of people think they don't like math because, I think, when they're exposed to it early, it's about memory. When you're exposed to math early, if you have a good short-term memory, you memorize times tables. And I do have a reasonably long (not perfect, but reasonably long) short-term memory buffer, so I did great at times tables, and people said, oh, you're good at math. But I started to really like math for the problem-solving aspect, and computing was problem solving, applied. So that's always kind of been the draw, coupled with the mathematics.
Did you ever see the computer as an extension of your mind, something able to achieve...

Not till later. No, not then. Then it was just a little set of puzzles you could play with; you could play with math puzzles. It was too rudimentary early on: it was a lot of work to take a thought you'd have and actually get it implemented. That's still work, but it's getting easier. So yeah, I would say that's definitely what attracted me to Python: it was more real. I could think in Python.
Speaking a foreign language... I only speak one other language fluently besides English, which is Spanish, and I remember the day when I started dreaming in Spanish. You start to think in that language. I definitely believe that language limits or expands your thinking; there are some languages that actually lead you to certain thought processes.

Yeah. I speak Russian fluently, and that's certainly a language that leads you down certain thoughts. There's the history of the two world wars, of millions of people starving to death or near to death throughout its history, of suffering, of injustice, of promises sold to the people and then the carpet swept out from under them. It's broken promises, and all of that pain and melancholy is in the language: the sad songs, the sad hopeful songs, the over-romanticized "I love you, I hate you," the swings across the whole spectrum of emotion. That's all within the language, the way it's twisted. There's a strong culture of rhyming poetry, the bards. There's a musicality to the language too.

Did Dostoevsky write in Russian?

Yeah.

So, yes, all the...

[Laughter]

All the ones that I know about are translations, and I'm curious how the translations...

So Dostoevsky did not use the musicality of the language too much, so it actually translates pretty well, because it's so philosophically dense that the story does a lot of the work. But there are a bunch of things that are untranslatable; certainly the poetry is not translatable.
I actually have a few conversations coming up, offline and also on this podcast, with people who've translated Dostoevsky, and people who work in this field know how difficult that is. Sometimes you can spend months thinking about a single sentence in its context, because there's magic captured by that sentence, and how do you translate it in just the right way? Those words can be really powerful. There's a famous line from Dostoevsky, "Beauty will save the world." There are so many ways to translate that. And you're right: the language gives you the tools with which to tell the story, but it also leads your mind down certain trajectories and paths, to where, over time, as you think in that language, you become a different human being.

Yes, yeah. That's a fascinating reality. I know people have explored that, but it keeps being rediscovered.
Well, we live in our own little pockets. This is the sad thing: I feel like, unfortunately, given time and getting older, I'll never know the Chinese world, because I don't truly know the language. Same with Japanese; I don't truly know Japanese. And Portuguese, Brazil, that whole South American continent: yeah, I'll go to Brazil and Argentina, but will I truly understand the people if I don't understand the language? It's sad, because I wonder how many geniuses we're missing because so much of the scientific world, so much of the technical world, is in English, and so much of it might be lost because we just don't have a common language.

I completely agree. I'm very much in that vein: there's a lot of genius out there that we miss, and it's fortunate when it bubbles up into something we can understand or process. There's a lot we miss. So I tend to lean towards really loving democratization, things that empower people, and I'm very resistant to authoritarian structures, fundamentally for that reason. Well, for several reasons, but it just hurts us. We're worse off.
So, speaking of languages that empower you: Python was the first language for me that I really enjoyed thinking in, and as you said, it sounds like you shared that experience. Do you remember when you first connected with Python, maybe even fell in love with Python?

It's a good question. It was a process; it took about a year. I first encountered Python in 1997. I was a graduate student studying biomedical engineering at the Mayo Clinic. Previously I'd been involved in taking information from satellites. I was an electrical engineering student, used to taking data and trying to extract information from it, and I'd done that in MATLAB, I'd done that in Perl, I'd done that in scripting on VMS (there was actually a VAX/VMS system, and it had its own little scripting tools around Fortran). I'd done a lot of that. Then, as a graduate student, I was looking for something and encountered Python. Python had things that kept me from filtering it away, because I was filtering out a bunch of stuff: I looked at Yorick, and I looked at a few other languages that were out there at the time, in 1997. But it had arrays. There was a library called Numeric that had just been written in 1995, not too much earlier, by an MIT alum, Jim Hugunin. I went back and read the mailing list to see the history of how it grew. It's fascinating to do that, actually: to see how this emergent, unstructured cooperation happens in the open source world and leads to this collective programming, which is something we might get into a little later, what that looks like.
What gap did Numeric fill?

Numeric filled the gap of having an array object. There was no array. There was a one-dimensional byte concept, but there was no n-dimensional (two-, three-, four-dimensional) array. They call it a tensor now. I'm still in the camp that a tensor is another thing and that we should just call it an nd-array, but I kind of lost that battle.

There are many battles in this world, some of which we'll win and some we'll lose.

That's exactly right.
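To make the "array object" idea concrete, here is a minimal sketch using NumPy, Numeric's successor. Numeric's own API differed in details, so treat this as an illustration of the concept rather than of the 1995 library. One three-dimensional array is handled as a single object: element-wise math and a reduction along one axis, with no explicit loops.

```python
import numpy as np

# One 3-D array: 2 "slices", each with 3 rows and 4 columns.
a = np.arange(24, dtype=float).reshape(2, 3, 4)

print(a.ndim)    # 3: the number of axes
print(a.shape)   # (2, 3, 4): the length along each axis

# Element-wise math applies to every entry at once, with no Python loops.
b = np.sqrt(a) * 2.0 + 1.0

# Reduce along the last axis: one sum per (slice, row) pair.
row_sums = a.sum(axis=-1)
print(row_sums.shape)   # (2, 3)
print(row_sums[0, 0])   # 6.0, i.e. 0 + 1 + 2 + 3
```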
And that byte array had no math to it; Numeric had math and a basic way to think in arrays. So I was looking for that, and it had complex numbers. A lot of programming languages lack them, and you can see why: if you're just a computer scientist, you think, ah, complex numbers are just two floats, people can build that on top. But in practice complex numbers are one of the significant algebras that help connect a lot of physical and mathematical ideas, particularly the FFT for an electrical engineer. It's a really important concept, and not having it built in means you have to develop it several times.
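Travis's point that complex numbers matter for the FFT can be seen in a minimal NumPy sketch; the signal, amplitude, and sampling rate are made-up values. The FFT of a real signal is complex-valued: the magnitude of each entry gives an amplitude and its angle gives a phase, so a shared complex type spares every library from reinventing one.

```python
import numpy as np

fs = 64                                    # sampling rate in Hz (assumed)
t = np.arange(fs) / fs                     # one second of samples
signal = 3.0 * np.cos(2 * np.pi * 5 * t)   # a 5 Hz cosine with amplitude 3

spectrum = np.fft.rfft(signal)             # complex-valued result
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)

peak = np.argmax(np.abs(spectrum))
print(spectrum.dtype)                      # complex128
print(freqs[peak])                         # 5.0
# Rescale the peak magnitude back to the cosine's amplitude.
amplitude = 2 * np.abs(spectrum[peak]) / signal.size
print(amplitude)                           # approximately 3.0
```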
And those separate implementations may not share an approach. One of the things programming enables is abstraction, but when you have shared abstractions it's even better: it gets to the level of a language, where we all think of this the same way, which is both powerful and dangerous. Powerful in that we can now quickly build bigger, higher-level things on top of those abstractions; dangerous because it also limits us, leaving behind whatever we gave up in producing that abstraction. That's at the heart of programming today and of how the programming world is being built, so I think it's a fascinating philosophical topic, and it will continue for many years, I think, as it builds more and more abstractions.

Yes. I often think about how we have a world that's built on these abstractions. Were they the only ones possible? Certainly not, but they led to where we are, and now it's very hard to do it differently. There's an inertia that's very hard to push away from. That has implications for things like the Julia language, which you've heard of, I'm sure. I've met the creators and I like Julia; it's a really cool language. But they've struggled against the tide, the inertia of people using Python. There are strategies to approach that, but nonetheless it's a phenomenon.

And sometimes... So I love complex numbers and I love arrays, so I looked at Python.
And then I had the experience... I did some stuff in Python while I was doing my PhD. My focus was on a combination of MRI and ultrasound, looking at a phenomenon called elastography: you push waves into the body, observe those waves (you can actually measure them), and then do a mathematical inversion to see what the elasticity is. That's the problem I was solving: how to do that with both ultrasound and MRI. I needed some tool to do that with, so I started to use Python in 1997. In '98 I went back, looked at what I'd written, and realized I could still understand it, which was not the experience I'd had with Perl in '95. I'd done the same thing there, looked back, and forgotten what the code was even saying. Now, I'm not saying... but that meant: hey, this may work. I like this. This is something I can retain without becoming an expert per se. So that led me to push more into it, and '98 was when I started to fall in love with Python, I would say.
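The elastography "mathematical inversion" described above can be sketched, under strong simplifying assumptions, with the 1-D Helmholtz equation mu * u'' + rho * omega**2 * u = 0 for a time-harmonic shear wave in a homogeneous, lossless, linear-elastic medium. Solving it pointwise for the shear modulus gives mu = -rho * omega**2 * u / u''. This is one textbook inversion approach, not necessarily the method used in Travis's thesis; the helper name `helmholtz_inversion_1d` and every numeric value below are hypothetical.

```python
import numpy as np

def helmholtz_inversion_1d(u, dx, rho, omega):
    """Hypothetical helper: estimate shear modulus mu (Pa) from a complex
    displacement phasor u(x), assuming mu * u'' + rho * omega**2 * u = 0."""
    d2u = np.gradient(np.gradient(u, dx), dx)   # finite-difference u''
    mu = -rho * omega**2 * u / d2u              # pointwise algebraic inversion
    return mu.real

# Synthetic check with assumed, tissue-like values.
rho = 1000.0                             # density in kg/m^3
mu_true = 4000.0                         # shear modulus in Pa (wave speed 2 m/s)
omega = 2 * np.pi * 100.0                # 100 Hz excitation
k = omega * np.sqrt(rho / mu_true)       # wavenumber of the shear wave
x = np.linspace(0.0, 0.05, 2001)         # 5 cm profile, 25 micrometre spacing
u = np.exp(1j * k * x)                   # complex displacement phasor

mu_est = helmholtz_inversion_1d(u, x[1] - x[0], rho, omega)
interior = mu_est[5:-5]                  # skip edges, where np.gradient is one-sided
print(np.median(interior))               # close to 4000
```

Using a complex phasor rather than a real cosine keeps u away from zero everywhere, so the pointwise division stays well conditioned; real measured data would also need noise handling that this sketch omits.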
There are a few peculiar things about Python compared to Perl and some of the other languages. There are no braces. Whitespace, or indentation I should say, is used as part of the language. Did you... I mean, that's quite a leap. Were you comfortable with that leap, or were you just very open-minded?

Good question. I was open-minded. I was cognizant of the concern, and it definitely has specific challenges: cutting and pasting, for example. If you're cutting and pasting code and your editor isn't supportive of that, or you're pasting into a terminal, particularly in the past when terminals didn't necessarily have the intelligence to manage it... Now IPython and Jupyter notebooks handle it just fine, so there's really no problem, but in the past it created some formatting challenges. Also mixed tabs and spaces: if your editor wasn't clear about what was happening, you would have these issues. So there were really concrete reasons that I heard and understood. I never really encountered a problem with it.
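Mixed tabs and spaces, one of the concrete hazards mentioned here, is something CPython 3 now refuses to guess about: when a tab and eight spaces could be read as the same indentation level or as different ones, compilation fails with `TabError`. A minimal sketch; the helper name `indentation_error_name` is hypothetical.

```python
# A tab on one line and eight spaces on the next can look identical in an
# editor with 8-column tabs, but CPython 3 rejects the ambiguity.
mixed = "if True:\n\tx = 1\n        y = 2\n"
consistent = "if True:\n    x = 1\n    y = 2\n"

def indentation_error_name(source):
    """Return the name of the indentation error raised, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as exc:   # TabError is a subclass of IndentationError
        return type(exc).__name__
    return None

print(indentation_error_name(mixed))       # TabError
print(indentation_error_name(consistent))  # None
```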
Personally, there were occasional annoyances, but I really liked the fact that it didn't have all these extra characters. These extra characters didn't show up in my visual field when I was just trying to understand a snippet of code.

Yeah, there's a cleanness to it. But I mean, the idea is supposed to be that Perl also has a cleanness to it, because of the minimalism of how few characters it takes to express a certain thing.

Yeah, it's very compact.

But what you realize is that with that compactness comes a culture that prizes compactness, and so the code gets more and more compact and less and less readable, to the point where, to be a good programmer in Perl, you write code that's basically unreadable. There's a culture like that.

Correct, and you're proud of it.

Yeah, you're proud of it. Right, exactly. And it feels good, and it's really selective: it means you have to be an expert in Perl to understand it.

Yeah, whereas Python allowed you not to have to be an expert; you didn't have to spend all this brain energy. You could leverage, as I said, your English language center, which you're using all the time. I've wondered about other languages, particularly non-Latin-based ones. With Latin-based languages, where the characters are at least similar, I think people have an easier time, but I don't know what it's like to be a Japanese or Chinese person trying to learn a different syntax. What would computer programming look like in that setting? I haven't looked at that at all, but I'm not sure Python, or any programming language, leverages your Chinese language center.
But that was a big deal, the fact that it was accessible. I could be a scientist... What I really liked is that many programming languages demand a lot of you, and you can do a lot if you learn them, but Python enables you to do a lot without demanding a lot of you. There's nuance to that statement, but it certainly was more accessible, so more people could use it: as a scientist or an engineer trying to solve some problem other than programming, I could still use this language, get things done, and be happy about it. I was also comfortable in C at that time, and in MATLAB.

You did a little MATLAB?

I did a lot before that, exactly. So those three languages were really the tools I used during my studies and schooling.
But to your point about language helping you think: one of the big things about MATLAB, and APL before it (I don't know if you remember APL), is array-based programming. APL is actually the predecessor of array-based programming, which I think is really underappreciated when I talk to people who are steeped in computer programming and computer science, like most of the people Microsoft has hired in the past, for example. Microsoft as a company generally did not understand array-based programming; culturally they didn't understand it, so they kept missing the boat, kept missing the understanding of what this was.
They've gotten better, but there's still a whole culture of folks for whom programming is systems programming, or web programming, or lists and maps. And what about an n-dimensional array? "Oh yeah, that's just an implementation detail." Well, you can think that, but if you actually have it as a construct, you think differently. APL was the first language to understand that, in the 1960s. The challenge of APL is that it was very dense: not only new characters, new glyphs; they even had a new keyboard to produce those glyphs. This was back in the early days of computing, when the QWERTY keyboard maybe wasn't as established, so it was like, we could have a new keyboard, no big deal. But it was a big deal, and it didn't catch on. And with APL, very much like Perl, people would pride themselves on how compactly they could write things, like the Game of Life in 30 characters of APL. APL has characters that mean summation, and they have adverbs. You know, they have adjectives, and these things called adverbs, which are like methods: reduction would be an adverb on the add operator.
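In APL, reduction is an operator that modifies a function: `+/` sums a vector, `×/` multiplies it out, and `+\` gives running sums. NumPy's ufuncs expose a similar "adverb" as methods such as `reduce` and `accumulate`. A minimal sketch of the analogy, with the APL spellings in comments for comparison only:

```python
import numpy as np

v = np.array([1, 2, 3, 4])

total = np.add.reduce(v)          # APL: +/v  gives 10
product = np.multiply.reduce(v)   # APL: ×/v  gives 24
running = np.add.accumulate(v)    # APL: +\v  gives 1 3 6 10

# The same "adverb" applied along one axis of a 2-D array.
m = v.reshape(2, 2)                        # [[1, 2], [3, 4]]
column_totals = np.add.reduce(m, axis=0)   # [4, 6]

print(total, product, running, column_totals)
```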
By using these tools you could construct things, and then you start to think at that level. You think in n dimensions, as I like to say, and you start to think differently about data at that point. It really helps.

Yeah. I mean, outside of programming, if you really internalize linear algebra as a course, it philosophically allows you to think of the world differently. It's almost liberating: you don't have to think about the individual numbers in the n-dimensional array; you can think of it as an object in itself, and all of a sudden this world can open up.
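As an example of this whole-array thinking (and of the Game of Life that APL programmers famously wrote in a handful of characters), here is one generation of Conway's Game of Life in NumPy. It is a common formulation on a wraparound grid, offered as a sketch rather than anything from the conversation; the helper name `life_step` is hypothetical.

```python
import numpy as np

def life_step(grid):
    """Advance Conway's Game of Life by one generation on a wraparound grid."""
    # Count live neighbours by summing the 8 shifted copies of the grid.
    neighbours = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    alive = grid.astype(bool)
    # A cell is born with exactly 3 neighbours and survives with 2 or 3.
    return ((neighbours == 3) | (alive & (neighbours == 2))).astype(grid.dtype)

# A "blinker": three cells in a row flip between horizontal and vertical.
grid = np.zeros((5, 5), dtype=np.uint8)
grid[2, 1:4] = 1
after_one = life_step(grid)
after_two = life_step(after_one)

print(after_one[1:4, 2])                 # [1 1 1]
print(np.array_equal(after_two, grid))   # True
```

No loop ever visits an individual cell; the rule is expressed as arithmetic and comparisons on whole arrays, which is the shift in thinking described above.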
Now, you're saying MATLAB and APL were like the early... I don't know if many languages got that right, ever.

No, no, they didn't, even still, I would say. NumPy is an inheritor of those traditions. J was another version of APL: it didn't have the glyphs, just short characters that a Latin keyboard could type. Numeric inherited from that in terms of arrays plus broadcasting plus methods like reduction, and even some of the language. Rank, for example, is a concept that was in there, and it's still in Python, for the number of dimensions, right? That's different from, say, the rank of a matrix, which people think of as well. So it came from that tradition, but NumPy is a very pragmatic, practical tool.
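The two meanings of "rank" mentioned here can be compared directly in NumPy, where the number of axes is exposed as `ndim` and the linear-algebra rank is computed by `np.linalg.matrix_rank`. A minimal sketch with an arbitrary matrix:

```python
import numpy as np

m = np.array([[1.0, 2.0],
              [2.0, 4.0]])             # second row is twice the first

n_axes = m.ndim                        # 2: "rank" in the array sense
lin_rank = np.linalg.matrix_rank(m)    # 1: rank in the linear-algebra sense
cube = np.zeros((2, 3, 4))             # ndim counts axes, regardless of values

print(n_axes, lin_rank, cube.ndim)     # 2 1 3
```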
NumPy inherited from Numeric, and we can get to where NumPy came from. It's the current array library, at least current as of 2016-17; now there are a ton of them from the past two or three years, but we can get into that too.

So if we just linger on the early days: what was your favorite feature of Python? Do you remember? It's so interesting to linger on what really makes you connect with a language. I'm not sure it's easy to introspect.

No, it isn't, and I've thought about that at some length. I think definitely the fact that I could read it later, that I could use it productively without becoming an expert, whereas other languages I had to put more effort into.
That's like an empirical observation. You're not analyzing any one aspect of the language; it just seems that, time after time, you look back and it's somehow readable.

It's somewhat readable. And then I could take executable English and translate it to Python more easily. There was no translation layer. As an engineer or as a scientist, I could think about what I wanted to do, and the syntax wasn't that far behind it.
Yeah, right. Now, there were some warts still; it wasn't perfect. There were some areas where I thought, ah, it'd be better if this were different. Some of those things got out of the language too. I was really grateful for some of the early pioneers in the Python ecosystem. Python's first version came out in 1991, and Guido was very open to users. One of the sets of users were people like Jim Hugunin, David Ascher, Paul Dubois, and Konrad Hinsen. These were people on the mailing list, and they were just asking for things like, hey, we really should have complex numbers in this language.

So there's a j, there's a 1j, right? And the fact that they went the engineering route with j is interesting.

I don't think that's entirely a favor to engineers. I think it's because i is so often used as the index of a for loop.
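Python's complex literal, the `1j` mentioned here, in a minimal sketch. Why `j` was chosen (engineering convention or keeping `i` free for loop indices) is the speakers' speculation; the code only shows both uses side by side.

```python
z = (1 + 2j) * (3 - 1j)    # the imaginary unit is written with a j suffix
print(z)                    # (5+5j)
print(z.real, z.imag)       # 5.0 5.0
print(abs(3 + 4j))          # 5.0, the modulus

# The name i stays free for ordinary use, such as a loop index.
powers = [1j ** i for i in range(4)]    # approximately 1, j, -1, -j
print(abs(sum(powers)) < 1e-12)         # True: the four powers cancel
```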
So I think that's actually probably right.

I mean, there's a pragmatic aspect. The complex numbers were there. I loved the fact that I could write nd-array constructs, that reduction was there, so it was very simple to write summations, and that broadcasting was there, so I could do addition of whole arrays. So that was cool; those were things I loved about it.
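The whole-array addition, summation, and broadcasting Travis describes look like this in NumPy, Numeric's successor (a minimal sketch; the array values are arbitrary). Broadcasting stretches a smaller array across a larger one when their dimensions, compared from the right, are equal or one of them is 1.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,): matches a's last axis
col = np.array([[100], [200]])     # shape (2, 1): the 1 stretches across columns

doubled = a + a          # whole-array addition, element by element
plus_row = a + row       # row is repeated for each row of a
plus_col = a + col       # col is repeated for each column of a
total = a.sum()          # a one-line summation

print(plus_row)
# [[11 22 33]
#  [14 25 36]]
print(total)             # 21
```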
I don't know what to start talking to you about, because you've created so many incredible projects that basically changed the whole landscape of programming. But okay, let's go chronologically, starting with SciPy. You created SciPy over two decades ago now.

Yes, right. I'd love to talk about SciPy. SciPy was really my baby.

What is it? What was its goal, what is its goal, how does it work?

Yeah, fantastic. So SciPy was effectively this: I'm using Python to do stuff that I previously used MATLAB to do, and I was using Numeric, an array library that made a lot of it possible, but there were things missing. I didn't have an ordinary differential equation solver I could just call. I didn't have integration: hey, I want to integrate this function; okay, well, I don't have a function I can call to do that. These are things I remember being critical that I was missing. Optimization: I just want to pass a function to an optimizer and have it tell me what the optimum value is. So I thought, well, why don't we just write a library that adds these tools?
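The three missing capabilities (integration, ODE solving, optimization) look roughly like this in today's SciPy, not in the original 1999 modules. Per the SciPy documentation, `integrate.quad` wraps QUADPACK and `integrate.odeint` wraps LSODA from ODEPACK, Fortran libraries of the Netlib kind mentioned later in this conversation. The test problems are illustrative choices with known exact answers.

```python
import numpy as np
from scipy import integrate, optimize

# Integration: the area under sin(x) from 0 to pi (exact answer: 2).
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)

# ODE: dy/dt = -y with y(0) = 1, evaluated at t = 1 (exact answer: e**-1).
def decay(y, t):
    """Right-hand side in odeint's (y, t) argument order."""
    return -y

t = np.array([0.0, 1.0])
y = integrate.odeint(decay, 1.0, t)      # shape (2, 1): one row per time

# Optimization: minimise (x - 2)**2 + 1 (exact minimiser: x = 2).
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2 + 1.0)

print(area, y[-1, 0], result.x)
```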
I started a post on the mailing list, and people had previously discussed it. I remember Konrad Hinsen saying, wouldn't it be great if we had this optimizer library, or David Ascher would say this stuff. And I'm, well, ambitious is the wrong word, eager, and I probably had more time than sense. I was a poor graduate student. My wife thinks I'm working on my PhD, and I am, but the part of a PhD that I loved was the fact that it's exploratory. You're not just taking orders, fulfilling a list of things to do; you're trying to figure out what to do. So I thought, well, I'm writing tools for my own use in a PhD, so I'll just start this project.

So '98 was when I first started to write libraries for Python, particularly when I fell in love with Python in '98. I thought, well, there are just a few things missing. Like, oh, I need a reader to read DICOM files. I was in medical imaging, and DICOM was a format I wanted to be able to load into Python. Okay, how do I write a reader for that? So I wrote something called... it was an I/O package, and that was my very first extension module, which is C. I wrote C code to extend Python so that in Python I could write things more easily. That combination kind of hooked me: the idea that here's this powerful tool I can use as a scripting language and a high-level language to think in, but that I can extend easily.
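Travis's DICOM reader was a C extension module. As a pure-Python illustration of the very first step such a reader takes, here is a sketch that checks the DICOM Part 10 file signature: a 128-byte preamble followed by the four bytes `b"DICM"`. The helper name `looks_like_dicom` is hypothetical, and a real reader needs a full parser (data elements, transfer syntaxes) beyond this check.

```python
import tempfile
from pathlib import Path

def looks_like_dicom(path):
    """Hypothetical helper: True if the file carries the DICOM Part 10 signature."""
    with open(path, "rb") as f:
        header = f.read(132)               # 128-byte preamble + 4-byte prefix
    return len(header) == 132 and header[128:132] == b"DICM"

# Exercise the check on two throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "demo.dcm"
    good.write_bytes(b"\x00" * 128 + b"DICM")
    bad = Path(tmp) / "notes.txt"
    bad.write_bytes(b"hello")
    result_good = looks_like_dicom(good)
    result_bad = looks_like_dicom(bad)

print(result_good, result_bad)             # True False
```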
Easily in C, well, easily for me, because I knew enough C. And Guido had written a... I mean, the only hard part of extending Python was the way memory management works: reference counting. There's tracking of reference counts that you have to do manually.
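In CPython's C API, every object carries a reference count that extension code adjusts by hand with `Py_INCREF` and `Py_DECREF`; a missing decrement means the object is never freed, which is the memory leak Travis mentions. The count can be observed (though not managed) from Python with the CPython-specific `sys.getrefcount`. In this minimal sketch only the difference between readings is meaningful, since the absolute numbers vary between Python versions.

```python
import sys

x = []                          # a fresh list object
before = sys.getrefcount(x)     # may include a temporary reference held by the call
y = x                           # a second name bound to the same object
after = sys.getrefcount(x)      # one more reference than before
del y                           # dropping the name decrements the count again
restored = sys.getrefcount(x)

print(after - before, restored == before)   # 1 True
```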
And if you don't, you have memory leaks, so that's hard. Plus, with C you have to put more effort into it. Now I have to think about pointers and stuff that is different. It's like putting a new cartridge in your brain: okay, I'm thinking about MRI; now I'm thinking about programming. They're distinct modules you end up having to think about, so it's harder. When I was just in Python, I could just think about MRI and high-level writing. But I could do it, and I liked it. I found it enjoyable and fun, and so I ended up thinking, oh well, let me just add a bunch of stuff to Python to do integration.
And the cool thing is the power of the internet. I was just looking around and found, oh, there's this Netlib, which has hundreds of Fortran routines that people wrote in the '60s, '70s, and '80s, in Fortran 77. Fortunately they weren't Fortran 66; they'd been ported to Fortran 77. And Fortran 77 is actually a really great language. Fortran 90 is probably my favorite Fortran, because it's got complex numbers, it's got arrays, and it's pretty high level. Now, the problem is you'd never want to write a program in Fortran 90 or Fortran 77, but it's totally fine to write a subroutine in. And then Fortran kind of got a little off course when they tried to compete with C++. But at the time I just wanted libraries to do something: oh, here's an ordinary differential equation solver, here's integration, here's Runge-Kutta integration, already done. I don't have to think about that algorithm. You could, but it's nice to have somebody who's already done one and tested it. So I started this journey in '98, really. If you look back at the mailing list, there's this productive era of me writing an extension module to connect Runge-Kutta integration to Python, making an ordinary differential equation solver, and then releasing that as a package. I called it ODEPACK, I think, then QUADPACK, and I just made these packages. Eventually they became Multipack, because they were originally modular: you could install them separately.
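Runge-Kutta integration, one of the algorithms Travis connected to Python, can be sketched in a few lines. This is the classic fixed-step fourth-order method for a scalar ODE, not the adaptive Fortran routines from Netlib that he wrapped; the helper name `rk4` is hypothetical.

```python
import math

def rk4(f, y0, t0, t1, n_steps):
    """Classic fourth-order Runge-Kutta for dy/dt = f(t, y) with scalar y."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        k1 = f(t, y)                         # slope at the start of the step
        k2 = f(t + h / 2, y + h * k1 / 2)    # slope at the midpoint, using k1
        k3 = f(t + h / 2, y + h * k2 / 2)    # slope at the midpoint, using k2
        k4 = f(t + h, y + h * k3)            # slope at the end of the step
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y

# dy/dt = -y with y(0) = 1, so the exact value at t = 1 is e**-1.
approx = rk4(lambda t, y: -y, 1.0, 0.0, 1.0, 10)
print(approx, math.exp(-1))
print(abs(approx - math.exp(-1)) < 1e-5)     # True with just 10 steps
```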
But a massive problem in Python was actually just getting your stuff installed. At the time, releasing software (people today wonder, what does that mean?) meant some poorly written web page. I had some bad web page up, and I put up a tarball, just a gzipped tarball of source code, and that was the release.
But okay, can we just stay on that? Because the community aspect of creating a package and sharing it, to have both at that time, is rare. That was pretty early. Well, maybe not rare, and maybe you can correct me on this, but it seems like in the scientific community so many people were basically solving the problems they needed to solve, processing the data for their particular application, and to also have the mindset of "I'm going to make this usable for others"...
I would say I was inspired. I'd been inspired by Linux, by Linus making his code available. I was starting to use Linux at the time, and I went, this is cool. So I had been primed that way, and generally I was into science because I liked the sharing notion. I liked the idea that if we collectively build knowledge and share it, we can all be better off.

Okay, so it's not that this energized you; you already valued it.

Yeah, right, and I can't deny it. I liked that part of science, that part of sharing, and then all of a sudden, oh wait, here's something, and here's something I could do.
And then, slowly, over years, I learned how to share better, so that you could actually engage more people faster. One of the key things was giving people a binary they could install, so it wasn't just "here's the source code, good luck, compile this," but compiled and ready to install. In fact, a lot of the journey from '98 through 2012, when I started Anaconda, was about that. That's really the key to why a scientist with dreams of doing MRI research ended up starting a software company that installs software.

I work with a few folks now who don't program, on the creative side, the video side, the audio side, and because my whole life runs on scripts, I now have the task of teaching them enough Python to run the scripts. So I've actually been facing this, whether it's through Conda or something else: the task of how to minimally explain, basically to my mom, how to write a Python script. And it's an interesting challenge.
it's a to-do item for me to figure out
like what is the minimal amount of
information i have to teach what are the
tools you use that one you enjoy it to
your effect of it they're related to two
related questions and then the debugging
like the the iterative process of
running the script to figure out what
the error is maybe even for some people
to do the fix yourself yeah so do you
compile it do this like how do you
distribute that code to them and it's
interesting because i think
it's exactly what you're talking about
if you increase the circle
of empathy that the circle of people
that are able to use your programs
you increase it its like effectiveness
and its power and so yeah you have to
think
you know can i write scripts can i write
programs that can be used by biomedical
engineers by all kinds of
people that don't know programming and
actually maybe plant the seed and
have them catch the bug of programming
so that they start on their journey
that's a huge responsibility and
ultimately has to do with the amazon
one-click buy
like how how frictionless can you make
the early steps frictionless is actually
really key to growing any community is
every any friction point you're just
going to lose you're going to lose some
people yeah right now sometimes you may
want to
intentionally do that if you're early
enough on you need
a lot of help you need people who have
the skills you might actually find it's
helpful to not necessarily have too
many users as opposed to
contributors if you're early
on
anyway uh scipy started in
98 but it really emerged as this
collection of modules that i was just
putting on the net people were
downloading and they you know
i think i got 100 users right by the end
of that year but there but the fact that
i got 100 users and more than that
people started to email me with fixes
like and that was actually intoxicating
right that was the
that was the you know here i'm writing
papers and i'm giving conferences and i
get people who would say hey yeah good
job but mostly it was you're reviewed
with
it it's competitive yeah right you
publish a paper and people were like oh
it wasn't my paper you know
i was starting to see that sense of
academic
life where it was so much i thought
there was a cooperative effort but it
sounds like we're here just to
one-up each other right and
you know it's not it's not true across
the board but a lot of that's there but
here in this world i was
getting responses from people all over
the world
uh you know i remember pearu peterson
in estonia right was one of the first
people and he sent me back this make
file because the first thing he said was yeah
your build thing stinks and here's a
better make file now it was a complex
make file i think i never understood
that make file actually but it worked
and it did a lot more and so then thanks
this is cool and that was my first kind
of engagement with community
development but you know the process was
he sent me a patch file i had to upload
a new tar ball and i just found i really
loved that and the style back then was
here's a mailing list it certainly wasn't
the kind of tools that are
available today it was very early on but
i really started that's the whole year i
i think i did about seven packages that
year right and then by the end of the
year i collected them into a thing
called multi-pack so 99 there was this
thing called multi-pack and that's when
a high school student he was a high
school student at the time a guy named
robert kern
took
that package and made a windows
installer
right and then of course a massive
increase of usage so by the way most of
this development was under linux yes yes
it was on linux i was a linux developer
doing it on my linux box i mean at the time
i was actually getting into i had a new
hard drive i had to do some kernel
programming to make the hard drive
work i mean not programming but
modification to the kernel so i could
actually get the hard drive working
i i love that aspect of it i was also in
you know at school i was building a
cluster i took mac computers like uh and
you put yellow dog linux on them uh they
were at the mayo clinic they were just
they're all these macs that were older
they were just getting rid of and so i
kind of got permission to go grab them
together i put about 24 of them together
in a cluster and a cabinet
and put yellow dog linux on them all and
i wrote a c plus plus um
program to do mri simulation that was
what i was doing
at the same time for my day job so to
speak so i was loving the whole process
at the same time i was like oh i need an
ordinary differential equation solver that's
why ordinary differential equations were
key because the heart of a
bloch equation simulator for mri is an
ode solver and so that's
but i actually did that
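Travis describes the Bloch equation as the heart of an MRI simulator, with an ODE solver doing the work. The sketch below is not his original C++ simulator; it is a minimal illustration using modern SciPy's `scipy.integrate.solve_ivp` (a much later API), with made-up relaxation times `T1` and `T2` and an assumed off-resonance frequency `D_OMEGA`. It integrates free precession after an idealized 90-degree pulse and ignores RF fields, gradients, and diffusion; the sign of the precession term depends on the convention chosen.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative parameters (assumed values, not taken from the conversation)
T1 = 0.8                      # longitudinal relaxation time, seconds
T2 = 0.1                      # transverse relaxation time, seconds
M0 = 1.0                      # equilibrium magnetization (arbitrary units)
D_OMEGA = 2 * np.pi * 10.0    # off-resonance frequency in the rotating frame, rad/s


def bloch_rhs(t, m):
    """Bloch equations for free precession (no RF field) with T1/T2 relaxation."""
    mx, my, mz = m
    return [
        D_OMEGA * my - mx / T2,   # transverse components precess and decay with T2
        -D_OMEGA * mx - my / T2,
        (M0 - mz) / T1,           # longitudinal component recovers toward M0 with T1
    ]


# Start right after an ideal 90-degree pulse: all magnetization along x.
t_eval = np.linspace(0.0, 0.5, 501)
sol = solve_ivp(bloch_rhs, (0.0, 0.5), [M0, 0.0, 0.0],
                t_eval=t_eval, rtol=1e-8, atol=1e-10)

mx, my, mz = sol.y
transverse = np.hypot(mx, my)  # magnitude of the transverse magnetization
print(f"|Mxy| at t=0.1 s: {transverse[100]:.4f}, Mz: {mz[100]:.4f}")
```

Free precession has a closed-form solution for this starting state (|Mxy| = M0*exp(-t/T2), Mz = M0*(1 - exp(-t/T1))), which makes it a convenient check on the numerical solver before adding RF pulses or gradients.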
it doesn't often happen that what you're
working on and what you're interested in
are coinciding i was definitely
scratching my own itch
in terms of building stuff and uh which
helped in the sense that i was using it
for me so at least had one user yeah i
had one person who's like well i know
this is better i like this interface
better and i had the experience of
matlab to guide some of what those apis
might look like but you know you're just
doing yourself you're building all this
stuff but with the windows installer it
was the first time i realized oh yeah
the binary installer really helps people
and so
that led to spending more time on that
side of things so around 2000 so i
graduated my phd in 2000 end of year
2000. so
98 doing a lot of work there 99 kind of spending more
time on my phd you know helping people
use the tools thinking about what i want
to go from here there was a company
there's a guy actually eric jones and
travis vaught they were two friends who
founded a company called enthought it's here
in austin still here
and they
eric contacted me at the time when i was
a uh
i was a graduate student still and he
said hey why don't you come down we want
to build a company
you know we want we're thinking of you
know a scientific
company and we want to take what you're
doing and kind of add it to some stuff
that he'd done he'd written some tools
and then pearu peterson had done f2py
let's come together and pull this
all together and call it scipy
so that's the origin of the scipy brand
it came from you know multi-pack and a
whole bunch of modules i'd written plus
a few things from some other folks and
then pulled together in a single
installer
scipy was really a distribution of
python masquerading as a library
how did you think about scipy in
context of python in context of numeric
like what we saw scipy as a way to make
an r&d environment for python like use
python uh dependent on numeric so
numeric was the array library we
depended on and then from there
extend it with a bunch of modules and
at the time the original
vision of scipy was to have
plotting was to have you know a repl
you know the repl environment and kind
of a whole really a whole data
environment
um that you could then install and get
going with and that was kind of the
thinking
it didn't really evolve that way right
it sort of had a but one
it's really hard to do massive scale
projects
with open source collectives
actually there's a there's sort of an
intrinsic uh cooperation limit
as to which you know too many cooks in
the kitchen you know you can do amazing
infrastructure work when it comes down
to bringing it all together into a
single deliverable that actually
requires a little more
a little more product management that is
not
it doesn't really emerge from the same
dynamic so it struggled you know
struggled to get
almost too many voices it's hard to have
everybody agree you know consensus
doesn't really work at that scale you
end up with politics you know with the
same kind of things that's happened in
large organizations trying to decide on
what to do together um
so consensus building was still was was
challenging at scale as more people came
in right early on it's fine because
there's nobody there and so it works but
then as you get more successful the more
people use it all of a sudden oh there's
this this
scale at which this doesn't work anymore
and we have to come up with different
approaches so scipy came out officially
in 2001 that was the first release and i
remember the days of getting that
release ready it was a windows installer
and there was there were bugs on how you
know the windows compiler handled
complex numbers and you were you're
chasing segmentation faults and it was
it's a lot of work there's a lot of
effort had nothing to do with my
area of study at the same time i just
got an offer so he wondered if i wanted
to come down and help him start that you
know start that company with his friend
and i at the time i was like i was
intrigued but i was pursuing an
academic path and i just got an offer to
go and teach at my alma mater so i took
that tenure track position
and so kind of then i started
work on scipy as a professor too
okay so that's i left the
mayo clinic graduated wrote my thesis
using scipy wrote you know there's
there's images that were created
now the plotting tool i used was
something from yorick actually it was a
plotting a plt kind of a plotting
language that i used yorick is a
programming language it was a
programming language that had a plotting tool
and dislin
we had integration to dislin i ended
up using dislin plus some some of the
plotting from yorick
linked to from python anyway it was a
people don't plot that way now but this
is before and scipy was trying to add
plotting yeah right
it didn't have much success really the
success of plotting came from john
hunter
who had a similar experience to my
experience my kind of maverick
experience as a person just trying to
get stuff done and kind of having more
time than than money maybe right and
john hunter created what matplotlib
he's the creator of matplotlib yeah so john
hunter was uh you know he wasn't a
student at the time but he was an actor
he was working in quant field and he
said we need better plotting so he just
went out and said cool i'll make a new
project and we'll call it matplotlib and
he released it in 2001 about the same time
that scipy came out and it was separate
library separate install
it used numeric scipy used numeric
and so scipy you know 2001 released
scipy and then enthought created a
conference called scipy which
was brought people together to talk
about the space and that conference is
still ongoing it's one of the favorite
conferences of a lot of people because
it's
it's changed over the years but early on
it was you know a collection of 50
people who care about
scientists mostly
practicing scientists who want to care
about
coding and doing it well and not using
matlab
i remember being driven by you know i
like matlab but i didn't like the fact
that like so i'm not opposed to proprietary
software i'm actually not an open source
zealot i love open source for the what
it brings but i also see the role for
proprietary software what i didn't like
was the fact that i would develop code
and publish it and then effectively
telling somebody here to run my code you
have to have this proprietary software
right and there's also culture around
matlab
as much because i've talked to a few
folks
mathworks great it's my life yeah
i mean there's just a culture they try
really hard but it's just there's this
corporate ibm style culture that's like
or whatever i don't don't want to say
negative things about ibm or whatever
but there's a
no it's it's really that connection
something i'm in the middle of right now
is is the business of open source and
how do you connect the ethos of
cooperative development with the
necessity of
of creating profits
right and like right now today you know
i'm still i'm still in the middle of
that that's actually the early days of
of me exploring this question
because i was writing scipy i mean as
an aside i also had so i had three kids
at the time i have six kids now i got
married early wanted a family uh i had
three kids and i remember reading i
remember reading richard stallman's post
and i was i was a fan of stallman i
would read his work i liked this
collective ideas he would have certainly
the ideas on ip law i read a lot of
stuff but then he said you know
okay
well how do i make money with this how
do i make a living how do i pay for my
kids all this stuff was in my mind a
young graduate student making no money
thinking i got to get a job and he said
well you know i think just be like me
and don't have kids right that's just
don't don't that's his take on this that
was just that was that was the what what
he said in that moment right that's the
thing i read and i went
okay this is a train i can't get on
yeah
there has to be a way to preserve the
culture of open source and still be able
to make sufficient money to feed you yes
exactly there's got to be well so that
actually led me to a study of economics
because at the time i was ignorant and
it really was i'm actually i'm
embarrassed for educational system that
they could let me and i was
valedictorian in my high school class
and i did super well in college and like
academically i did great
right but the fact that i could do that
and then be clueless about this key part
of life
it led me to go there's a problem like i
should i should have learned this in
fifth grade i should learn this in
eighth grade like everybody should come
out with a basic knowledge of economics
you're an interesting example because
you've created tools that uh change the
lives of probably millions of people and
the fact that you don't understand at
the time of the creation of those tools
the basics economics of how like to
build up giant system is a problem yeah
it's a problem and so i during my phd at
the same time this is actually in 98 99
at the same time i was in the library i
was reading books on capitalism i was
reading books on marxism i was reading
books on you know what is this thing
what does it what does it mean yeah and
i encountered a basically what i
encountered a set of writings from
people that said they were the
inheritors of adam smith and i read adam smith for
the first time right which is the wealth
of nations and kind of this notion of
emergent
emergent uh societies and realized oh
there's this whole world out here of
people
and the challenge is economics is
also political
like because economics you know
people
different parties running for office
they'll
they want their economic friends they
want their economist to back them up
right or to to be there
to be their magicians like the magicians
in pharaoh's court right the people that
are going to say hey this is you should
listen to me because i've got the expert
who says this
and so it gets really muddled right but
i was looking at from as a scientist as
a scientist going what is this space
what does this mean how do people how
does paris get fed how does how what is
money how does it work i found a lot of
writings i really loved i found some
things that i really loved and i learned
from that it was writings from people
like von mises he wrote a pre-order
paper in 1920 that still should be read
more than it is it's got i mean it was
economic calculation in the
socialist commonwealth it's basically in
response to the bolshevik revolution in
1917. and his basic argument was it's
not going to work to not have private
property you're not going to be able to
come up with prices the bureaucrats
aren't going to be able to determine how
to allocate resources without a price
system and a price system emerges from
people making trades and they can only
make trades if they have authority over
the thing they're trading
and that that that creates information
flow that you just don't have
if you try to top down it right right
it's like huh that's a really good point
yeah the prices have a signal that's
used and it's important to have that
signal
when you're trying to build a community
of productive people like you would in
the software engineering yeah the prices
are actually an important
signaling mechanism yeah right and that
money
is just a bartering tool right so this
is the first time i've encountered any
of this concept right and the fact that
oh this is actually
really critical like it's so critical to
our prosperity and
that
we're dangerously
not learning about this not teaching our
children about this you know so you had
the three kids you had to make some
stuff how to make some money right i had
to figure it out but i didn't really
care i mean i was never i've never been
driven by money just need it right right
to eat so what how did that resolve
itself in terms of sci-fi
so i would say it didn't really resolve
itself it sort of started a journey that
i'm continuing on i'm still on i would
say i don't think it resolved itself but
i will say
i went in eyes wide open like i
knew that there were problems with you
know um giving stuff away and creating
uh the market externalities the
fact that yeah people might use it and i
might not get paid for it and i'll have
to figure something else out to get paid
like at least i can say i'm not bitter
that a lot of people have used stuff
that i've written and i haven't
necessarily benefited economically from
it like yeah i've heard other people be
you know bitter about that when they
write or they talk like oh i should have
got more value out of this and i'm also
i want to create systems that let people
like me who might have these desires to
do things let them benefit so it
actually creates more of the same
not to turn on your bitterness module
but
there's some aspect i wish there was
mechanisms for me to reward whoever
created scipy and numpy because it
brought so much joy to my life i
appreciate that i mean the tip jar
notion was there i appreciate that and i
think there should be a
mechanism i totally
agree i would love to talk about some of
the ideas i have because i actually came
across i think i've come up with some
interesting notions that could work but
they'll require
you know anything that will work takes
time to emerge right like things don't
just turn overnight that's definitely
one thing i've also understood and
learned
is any fixes
that's why it's kind of funny we often
give credit to you know oh this
president gets elected and oh look how
great things have done
and
i saw that when when i had a transition
at anaconda when a new ceo came in right
and it's like the success that's
happening there's an inertia there yeah
right and sometimes the decision you
made like 10 years before is the reason
why the success you see right exactly so
we're sort of just running around taking
credit for stuff credit assignment has
like a delay to it yes
that this makes the credit assignment
basically wrong more than right wrong
more than right exactly and so i'm like
oh this is you know that's the stuff i
would i would read a ton about you know
early on so i don't i feel like i'm with
you like i want the same thing i want to
be able to and honestly not for
personally i've been happy i've been
i've been happy i feel like i don't have
any i mean we've done reasonably
okay but i've had to pursue it like
that's that's really what started my
um trajectory from academia is reading
that stuff letting me say oh
entrepreneurship matters
so i love software but we need
more entrepreneurs and i want to
understand that better so once i kind of
had that that virus infect my brain
it even though i was on a trajectory to
go to a tenure track position at a
university
and i was there for six years i was kind
of already out the door
when i started and we can get into that
but yeah um what can i just ask a quick
question on
is there some design principles that
were in your mind around scipy like is
there some key ideas that were just like
sticking to you that this is this is the
fundamental ideas yeah i would say so i
would think it's basically accessibility
to scientists like give them give
scientists and engineers tools they
don't have to think a lot about
programming so give them really good
building blocks give them functions that
they want to call and
sort of just the right length of
spelling
you know
there's a
one tradition in programming where it's
like you know make very very long names
right and you can see it in some
programming languages where the names
get you know take half the screen and
i in
in the fortran world names would have
to be six letters early on right and
that's way too little
but i was like
i like to have names that were
informative but short so even though
python
well this is a different conversation
but
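Fortran 77 limited names to six characters, which is why the classic LAPACK routines carry terse names like `dgesv`. A minimal sketch contrasting that convention with SciPy's short but readable high-level name, assuming SciPy's `scipy.linalg.solve` and its low-level wrapper `scipy.linalg.lapack.dgesv`; the 2x2 system is an arbitrary example.

```python
import numpy as np
from scipy import linalg
from scipy.linalg import lapack

a = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([[9.0],
              [8.0]])              # one right-hand side, as a column

# High-level SciPy name: short but self-describing.
x_high = linalg.solve(a, b)

# The underlying LAPACK routine keeps the traditional terse scheme:
# d = double precision, ge = general matrix, sv = solve.
lu, piv, x_low, info = lapack.dgesv(a, b)   # info == 0 means success

print(x_high.ravel(), x_low.ravel(), info)
```

Both calls solve the same system; the high-level name is what most users see, while the terse Fortran name survives one layer down.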
documentation is doing some work there
so when you look at great scientific
libraries and functions there's
there's a richness of documentation that
helps you get into the details
the first glance at a function gives
you the intuition of all it needs to do
by looking at the headers and so on but
to get the depths of all the
complexities involved all the options
involved documentation does something
else documentation is essential yeah so
that was actually a
so we thought about several things one
is we wanted plotting we wanted
interactive environment we wanted good
documentation these are things we knew
we wanted the reality is those took
about
10 years to evolve
right given the fact that we didn't have
a big budget it was all volunteer labor
it was sort of
um
when enthought got created and they started
to you know try to find
projects people would pay for pieces and
they were able to fund some of it
not nearly enough to keep up with what
was necessary
no criticism just simply the reality i
mean it's it's hard to start a business
and then do consulting and also promote
an open source project that's still
fairly new
scipy is fairly niche
we stayed connected all while i was a
student sorry a professor i went to byu
and started to teach electrical
engineering all the applied math courses
i loved teaching signal processing
probability theory electromagnetism i
was the if you look at rate my professor
which my kids love to do
i wasn't
like i got some bad reviews because
people
what was the criticism um i would speak
too high too high of a level like i
definitely had a calibration problem
coming out of uh graduate work
where i hate to be condescending to
people like i really have a ton of
respect for people fundamentally like my
fundamental thing is i respect people
sometimes that can lead to a i was i was
thinking they were
they had more knowledge than they did
and so i would just speak at a very high
level yeah assume they got it but they
need to rise to the standard that you
set i mean that's one of the some of the
greatest teachers do that yeah and i
agree and that was kind of what was
inspiring me but but you know you also
have to
i cannot say i was uh i was as articulate
as some of the greatest teachers right i
was you know like one one classic
example when i first taught at byu my
very first class it was overheads
transparencies overheads
before projectors are really that common
so transparencies i'm writing my
notes out i go in the room's half dark
i'm just blazing through these
transparencies here it is here it is
here it is
and i gave a quiz after two weeks
nobody knew anything
nothing had gotten anywhere
and i realized okay this is not
working so i put away the
transparencies and i turned around just
started using the chalkboard
and what it did is it slowed me down
right the chalkboard just slowed me down
and gave people time to process and to
think and then that made me focus my
writing wasn't great on the chalkboard
but
i really love that part of like the
teaching so that that entered scipy's
world in terms of we always understood
that there's a didactic aspect of scipy
kind of how do you take the knowledge
and then produce it the challenge we had
was the scope like ultimately scipy was
everything right and so 2001 when it
first came out people were starting to
use it no this is cool this is a tool we
actually use
at the same time 2001 time frame there
was a little bit of um like the hubble
space telescope the folks at hubble had
started saying hey python we're gonna
use python for processing images from
hubble and so perry greenfield was a
good friend and running that program and
he
had called me before i left for byu and
said you know
we want to do this but numeric actually
has some challenges in terms of you know
it's not the array doesn't have enough
types
uh we need more operations you know
broadcast needs to be a little more
settled uh they wanted record arrays
they wanted you know record arrays are
like a data frame but a little bit
different but they wanted more
structured data so
he had called me even early on then and
they said yeah hey would you want to
work on something to make this work i
said yeah i'm interested but i'm going
here and i you know we'll see if i have
time so in the meantime while i was
teaching and scipy was emerging and i
had a student i was constantly while i
was teaching trying to find a way to fund this
stuff
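Perry Greenfield's group wanted record arrays, meaning more structured data than a plain numeric array. The sketch below shows roughly what that looks like with the structured `dtype` that NumPy later provided; it is an illustration with hypothetical field names and values, not the numarray record-array API the Hubble team actually used at the time.

```python
import numpy as np

# Hypothetical field names and values, chosen only for illustration.
star_dtype = np.dtype([
    ("name", "U10"),   # Unicode string, up to 10 characters
    ("ra", "f8"),      # right ascension, degrees (float64)
    ("dec", "f8"),     # declination, degrees (float64)
    ("mag", "f4"),     # apparent magnitude (float32)
])

catalog = np.array(
    [("alpha", 10.5, -5.2, 3.1),
     ("beta", 200.1, 45.0, 6.8),
     ("gamma", 33.3, 12.7, 4.4)],
    dtype=star_dtype,
)

# Indexing by field name gives an ordinary ndarray "column".
bright = catalog[catalog["mag"] < 5.0]     # boolean mask over whole records
bright_names = bright["name"].tolist()

# np.recarray is a view of the same memory that adds attribute access.
rec = catalog.view(np.recarray)
mean_dec = rec.dec.mean()

print(star_dtype.itemsize, bright_names, round(float(mean_dec), 3))
```

Each record is a fixed-size block of bytes described by the dtype, which is what makes this "like a data frame but a little bit different": columns are typed and named, but the storage is still one contiguous array.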
so i had a graduate student uh my only
graduate student a
chinese fellow
luhongza is his name great guy he wrote
a bunch of stuff for iterative
linear algebra like got into writing
some of the iterative linear algebra
uh tools that are currently there in
scipy and they've gotten better since
but this is in 2005.
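Iterative linear algebra methods refine a guess using only matrix-vector products instead of factoring the matrix, which suits large sparse systems. A minimal sketch using SciPy's conjugate-gradient solver `scipy.sparse.linalg.cg` on a 1-D Poisson matrix; the matrix, its size, and the right-hand side are illustrative assumptions, and this is not a reconstruction of the student's original code.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

# 1-D Poisson (second-difference) matrix: sparse and symmetric positive
# definite, the setting where conjugate gradient is designed to work.
n = 100
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# Only products with A are needed; A is never factored or densified.
x, info = cg(A, b, maxiter=10 * n)   # info == 0 means the default tolerance was met

rel_residual = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
print(info, rel_residual)
```

For a matrix with millions of rows, storing only the three nonzero diagonals and iterating is far cheaper than a dense factorization, which is the main reason these solvers belong in a scientific library.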
kept working on scipy but
perry had started working on a
replacement to numeric called numarray
and in 2004 a package called nd_image
it was an image processing library
that was written for numarray and it had
in it a morphology tool i don't know if you know
what morphology is it's opening dilation
closing you know there's sort of this
as a medical imaging student i knew what
it was because it was used in
segmentation a lot and in fact i wanted
to do something like that in python in
scipy but just had never gotten around
to it so when it came out that it worked
only on numarray
and scipy needed numeric and so we
effectively had the beginning of this
split
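Binary morphology of the kind Travis mentions (erosion, dilation, opening, closing) is what `nd_image` offered and what survives today as `scipy.ndimage`. A minimal sketch on a hypothetical toy segmentation mask: opening removes a speckle smaller than the structuring element, while closing fills a one-pixel hole.

```python
import numpy as np
from scipy import ndimage

# Toy binary segmentation mask (hypothetical data): a 7x7 object with a
# one-pixel hole, plus an isolated one-pixel speckle elsewhere in the image.
mask = np.zeros((13, 13), dtype=bool)
mask[2:9, 2:9] = True     # the object
mask[5, 5] = False        # hole inside the object
mask[11, 11] = True       # noise speckle

structure = np.ones((3, 3), dtype=bool)   # 3x3 square structuring element

# Opening = erosion followed by dilation: removes features smaller than the
# structuring element (the speckle) while keeping larger shapes.
opened = ndimage.binary_opening(mask, structure=structure)

# Closing = dilation followed by erosion: fills gaps smaller than the
# structuring element (the hole) while keeping the outline.
closed = ndimage.binary_closing(mask, structure=structure)

print(opened[11, 11], opened[5, 5], closed[5, 5], closed[11, 11])
```

In MRI segmentation this is the typical cleanup step after thresholding: open to drop isolated noise, close to patch small holes in the region of interest.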
and numeric and numarray didn't share
data they were just two so you could
have a gigabyte of numarray data
and a gigabyte of numeric data and they
wouldn't share it yeah and so you have
these any of these scientific libraries
written on top
i got really bugged by that yeah i got
really like oh man this is not good
we're not cooperating now we're not
we're sort of redoing each other's work
and we're just this young community
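The practical cost of two array packages was that data held by one could not be handed to the other without a copy. As a modern contrast (an illustration, not how Numeric or numarray worked), the sketch shows a single NumPy array wrapping an existing buffer and sharing it through views and Python's buffer protocol; `raw` is a hypothetical stand-in for memory owned by some other library.

```python
import numpy as np

# Stand-in for memory owned by some other library: room for six float64 values.
raw = bytearray(6 * 8)

# np.frombuffer wraps existing memory without copying; a writable buffer
# gives a writable array.
a = np.frombuffer(raw, dtype=np.float64)
a[:] = np.arange(6)           # writes land directly in `raw`

# Reshapes and basic slices are views onto the same bytes, not copies.
grid = a.reshape(2, 3)
col = grid[:, 1]
col *= 10                     # visible through a, grid, and raw alike

# memoryview exposes the same bytes through Python's buffer protocol,
# so any buffer-aware library could consume them without a copy.
mv = memoryview(grid)
print(a, np.shares_memory(a, col), mv.shape)
```

With one shared array object, a gigabyte of data stays a gigabyte no matter how many libraries built on top look at it.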
so that's what led me even though i knew
it was risky because my
you know i had i was on a tenure track
position 2004 i got reviewed they said
hey things are going okay you're doing
well paper's coming out but you're kind
of spending a lot of time in this open
source stuff maybe do a little less of
that and a little more of the paper
writing and grant writing which was
naive but it was definitely the time you
know the thinking it still goes on still
goes on
you're basically creating a thing which
enables science in the 21st century
right um maybe don't emphasize that so
much in your for your tenure right
it illustrates some of the challenges
yeah
it does it's and it's people mean well
yeah like but we've gotten broken in a
bunch of ways
certain things programming understanding
the role of software engineering and
programming exactly society is a little
bit less exactly no i was in electrical
engineering position right so it was
even worse
there
yeah it was very they were very focused
and so you know good people and i had a
great time i loved my time i loved my
teaching i loved all the things i did
there the problem was this split was
happening in this community i loved i
saw people and i go oh my gosh this is
going to be
this is not great and so
it happened you know fate
i had a class i had signed up for it
it's a i was trying to build an mri
system so i just i had a kind of a radio
a instead of a radio a digital radio
class as a digital mri class
uh and i had people sign up two people
signed up then they dropped and so i had
nobody in this class
so
and i didn't have any other courses to
teach and i thought oh i've got some
time
and i'll just write i'll just write a
merger of numeric and numarray
like i'll basically take the numeric
code base add the features numarray was
adding
and then kind of come up with a single
array library that everybody can use so
that's where numpy came from was my
thinking hey i can do this and who else
is going to because at that point i'd
been around the community long enough
and i'd written enough c code i knew
i knew the structures and i in fact my
first contribution to numeric had been
writing the c api documentation that
went in the first documentation for
numpy for numeric sorry this is paul
dubois david ascher konrad hinsen and
myself
i got credit because i wrote this
chapter which is all the c api of
numeric all the c stuff so i said ah i'm
probably the one to do it you know
nobody else is going to do this so it's
sort of out of a sense of duty and
passion
knowing that
i don't think my academic advice i don't
think the department here is going to
appreciate this but i
it's the right thing to do i was like
can we just
linger on that moment because
the importance of the way you thought
and the action you took
i feel is
is understated and
is rare and i would love to see so much
more of it because what happens as the
tools become more popular
uh there's a split that happens
and it's a truly heroic and impactful
action to in those early in that early
split to step up
and you it's like great leaders
throughout history like get what is the
braveheart like get on a horse and rally
the troops because i think that can have
make a big difference we have tensorflow
versus pytorch in the machine learning
we have the same problem today yes
i wonder it's actually bigger i wonder
if it's possible
to in the early days to rally the troops
it is possible especially in the early
days the longer it goes the harder right
the more energy and the factions the
harder but in the early days it is
possible and it's extremely
helpful and there's a willingness there
yeah but but but the challenge is
there's usually not a willingness to
fund it yeah there's not a willingness
to you know like i was literally walking
into a field saying i'm going to do this
and here i am like you know i have five
kids at home now
pressure builds sometimes my wife hears
these stories and she's like you did
what
i thought we were gonna i thought you
were actually on a path to make sure we
had resources and money but oh wow but
again there's a there's an aspect i'm a
very hopeful person i'm an optimistic
person by nature i love people i learned
that about myself later on uh
part of my
my religious beliefs actually lead to
that and it's why i hold them dear
because it's actually how i feel about
that's why it's what leads me to this
these attitudes sort of this hopefulness
and this sense of yeah it may not it may
not work out for me financially or maybe
but that's not the ultimate game like
that's a thing but it's not that's not
the score card
uh for me and so i just wanted to be
helpful and i knew and partly because
these sci-fi conferences because the
mailing-list conversations i knew there
was a lot of need for this
right so i had this it wasn't like i was
alone in terms of no feedback i had
these people who knew
but it was crazy like people at the time
said yeah we didn't think you'd be able
to do it yeah we thought it was crazy
and also instructive like practically
speaking
that you had a cool feature that you
were chasing the morphology like the yes
like it's it it's not either the end
result it's not some visionary thing i'm
going to unite the community you were
like correct you were actually
practically this is what one person
actually could do
uh and actually build because that is
important because you can get over your
skis yeah
you can definitely get over your skis
and i had in fact this almost got me
over my skis right i would say well in
retrospect i hate looking back
we i can tell you all the flaws with
numpy right you want to go into it
there's lots of stuff that i'm like oh
man that's embarrassing that was wrong i
wish i had somebody stop me with a wet
fish there yeah like i needed in fact
what i wished i'd had was somebody
with more experience and certainly
library writing an array library there's
like i wish i had me i could go back in
time and go do this do that there's
important because there's things we
did
that are still there that are
problematic that created challenges for
later and and i didn't know it at the
time didn't understand how important
that was and in many cases didn't know
what to do like there were pieces of the
design of numpy i didn't know what to
do until five years ago
now i know what they should have
been but i didn't know at the time and
nobody and i couldn't get the help
anyway so i wrote it it took about it
took four months to write the first
version then about 14 months to make it
usable
but it was it wasn't it was that first
four months of intense writing coding
getting something out the door that
worked
that was it was it was definitely
challenging and then the big thing i did
was create a new type object called
dtype that was probably the
contribution and the fact that i added
uh not just broadcasting but
advanced indexing so that you could do
um mask indexing and indirect indexing
instead of just slicing so for people
who don't know maybe you can elaborate
yeah numpy i guess the vision in the
narrowest sense is to have this object
that represents
n-dimensional arrays
and like at any level of abstraction you
want but basically it could be a black
box that you can investigate in ways
that you would naturally want to
investigate yes such objects yes exactly
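A short sketch of the indexing features Travis describes (slicing, mask indexing, indirect indexing) plus broadcasting, as they behave in present-day NumPy. It assumes a recent NumPy install and illustrates current semantics, not the original 2006 implementation; the array values are arbitrary.

```python
import numpy as np

a = np.arange(10, 20)                  # [10, 11, ..., 19]

# Basic slicing returns a view that shares memory with `a`.
window = a[2:5]                        # [12, 13, 14]

# Mask indexing: a boolean array of the same shape selects elements.
evens = a[a % 2 == 0]                  # [10, 12, 14, 16, 18]

# Indirect (integer-array) indexing: pick elements by position, repeats allowed.
picked = a[[7, 0, 7]]                  # [17, 10, 17]

# Advanced indexing returns a copy; slicing does not.
assert np.shares_memory(window, a)
assert not np.shares_memory(picked, a)

# Broadcasting: a (3, 1) column combined with a (4,) row yields a (3, 4) table.
table = np.arange(3)[:, None] * 10 + np.arange(4)
print(table)
# [[ 0  1  2  3]
#  [10 11 12 13]
#  [20 21 22 23]]
```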
so you could do math on it easily math
on it easily so you had so it had an
associated library of math operations
and effectively scipy became an even
larger set of
math operations so the key for me was
i was going to write numpy and then move
scipy to depend on numpy
in fact early on one of the initial
proposals was that we would just write
scipy and it would have the numeric
object inside of it and it would be
scipy.array or something
that turned out to be problematic
because numeric already had a little
mini library of linear algebra and some
functions and it had enough momentum
enough users that nobody wanted to they
wanted backward compatibility one of the
big challenges of numpy was i had to be
backward compatible with both numeric
and numarray in order to allow both of
those communities to come together there
was a ton of work in creating that
backward compatibility that also created
echoes in today's object like some of
the complexity in today's object is
actually from that
goal of backward compatibility these
other communities which if you didn't
have that you'd do something different
which is instructive because a lot of
things are there
you know what is that there for it's
like well it was it was a remnant it's
an artifact of its historical existence
um by the way i love the empathy and the
lack of ego behind that because i feel
you see that in the split
in the javascript frameworks for example
the arbitrary branching right is
you i think in order to unite people you
have to kind of put your ego aside and
truly listen to others like you do what
do you love about numarray what do you
love about numeric like actually get a
sense we're talking about languages
earlier sort of empathize to the culture
of the people that love something about
this particular api
some
some the the naming style or the
the the use the actual usage patterns
and like truly understand them and so
that you can like
create that same draw and
unite them i completely agree and you
have to also have enough passion that
you'll do it it can't be just like a
perfunctory yeah oh yes i'll listen i'm
really i listen to you and then i'm not
really excited about it you so it really
is an aspect it's a it's a philosophical
like there's a philia there's a love of
esteeming of others it's actually at the
heart of
what
it's sort of a life philosophy for me
right that i'm constantly pursuing and
that helped absolutely helped it makes
me wonder in a philosophical like
looking at human civilization as one
object
it makes me wonder how we can copy and
paste travis's in the circle
well in some aspects maybe some aspects
right right exactly well i
it's a good question how do we teach
this how do we how do we encourage it
how we lift it because so much of the
software world it's it's giant
communities right but it seems like so
much is moved by like little individuals
you talk about like linus torvalds
it's like can you could you have not
could you have had linux without him
could you it's like guido and python you
know
python i mean the scipy community
particularly it's like i said we wanted
to build this big thing but ultimately
we didn't what happened is we had uh
mavericks and champions like john hunter
created matplotlib we had fernando
perez who created ipython and so we sort
of inspired each other but and then it
kind of there's sort of a culture of of
this
selfless give the stewardship mentality
as opposed to ownership mentality with
stewardship and and
and community um focused
community focused but intentional work
like not not waiting for everybody else
to do the work but you're doing it for
the benefit of others and not you're not
worried about what you're going to get
you know
you're not worried about the credit
you're not worried about we're going to
get you're worried about i later
realized that i have to worry a little
about credit not because i want the
credit because i want people to
understand what led to the results
like i don't it's not it's not about me
it's i want to understand this is what
led to the result so let's like i think
doing and this is what had no impact on
the result like let's let's promote this
just like you said i want to promote the
attributes so let that help make us
better off how do we make more of
wes mckinney like wes mckinney was
critical to the success of python
because of his creation of pandas which
is
the roots of that were all the way back
in uh
numeric numarray and numpy where numpy
created an array of records
west started to use that almost like a
data frame except it's an array of
records and data frame the challenge is
okay if you want to augment it with
another column you have to insert you
have to do all this memory movement to
insert a column whereas data frames
became oh i'm going to have a loose
collection of arrays
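The array-of-records versus record-of-arrays trade-off can be sketched with a NumPy structured array and a plain dict of columns. This is a simplified illustration, not how pandas is implemented; `add_field` is a hypothetical helper and the field names are made up.

```python
import numpy as np

# Array of records: one buffer in which every element packs all fields together.
records = np.zeros(3, dtype=[("x", "f8"), ("y", "i4")])
records["x"] = [1.0, 2.0, 3.0]
records["y"] = [10, 20, 30]

def add_field(arr, name, values, fmt):
    """Hypothetical helper: appending a field needs a wider dtype and a full copy."""
    wider = np.dtype([(n, arr.dtype[n]) for n in arr.dtype.names] + [(name, fmt)])
    out = np.empty(arr.shape, dtype=wider)
    for field in arr.dtype.names:
        out[field] = arr[field]        # every existing value moves to a new buffer
    out[name] = values
    return out

records2 = add_field(records, "z", [0.5, 0.5, 0.5], "f8")

# Record of arrays: a loose collection of independent columns (the data-frame idea).
columns = {"x": np.array([1.0, 2.0, 3.0]), "y": np.array([10, 20, 30])}
x_before = columns["x"]
columns["z"] = np.full(3, 0.5)         # new column; existing arrays are untouched
```

The structured array has to be rebuilt with a wider record and every row copied, while the dict of arrays simply gains a new key.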
so it's a record of arrays that is the
heart of a data frame and we thought
about that back in the numarray days but
wes
ended up doing the work to build it and
then then also the operations that were
relevant for data processing what i
noticed is just the each of these little
things creates just another tick another
up so numpy ultimately took a little
while about six months in people started
to join me you know francesc
alted robert kern charles harris
and these these people are many of the
unsung heroes i would say people who are
you know they don't they sometimes don't
get the credit they deserve
because they were critical both to
support
like you know it's it's hard and you
want you need some support people need
support and i needed just encouragement
and they were helping encourage me by
contributing and the big thing
for me was when john hunter
he had previously done kind of a simple
thing called numerix to kind of you know
between numeric and numarray he had a
little high level tool that would just
select either one for matplotlib in 2006
he finally said we're going to just make
numpy the dependency of matplotlib
as soon as he did that and i remember
specifically when he did that i said
this okay we've done it like that was
when i knew we had succeeded
and before then it was still you know
unsure but that kind of started a roller
coaster and then 2006 to 2009
and then i've been floored by the by
what it's done like i had i knew it
would help i had no idea how
much it would help
right so and it has to do with again the
language thing that just people started
to think in terms of numpy like yes
and that opened up a whole
new way of thinking and part of the
story that you kind of
mentioned but
maybe you can
elaborate is it seems like at some point
in this story
python took over science and data
science yeah and
uh no bigger than that
the scientific community
started to think like programmers or
started to utilize the tools of
computers to do like at a scale that
wasn't done with fortran like at this
gigantic scale they started opening
their hearts and then python was the
thing i mean there's a few other
competitors i guess but python i think
really really took over i agree there's
a lot of stories here that are kind of
during this journey because this is sort
of the start of this journey in 2006.
so
my tenure committee i applied for tenure
in 2006 2007. it came back i split the
department i was very polarizing i had
some huge fans and then some people said
no way right so it was very i was a
polarizing figure in the department it
went all the way up to the university
president
ultimately my department chair had the
had this sway and they didn't say no
they said come back in two years and do
it again
and i went
at that point i was like i said i i i
mean i had this interest in
entrepreneurship this interest in in
not the academic circles not the like
how do how do we make industry work so i
do have to give credit to that expert
that exploration of economics because
that led me
oh i had a lot of opinions i was i was
actually very libertarian at the time
and i'm still i have some libertarian
trends but i'm more of a i understand
i'm more of a collectivist libertarian
so you value broadly philosophically
freedom i value broadly philosophically
freedom but i also understand the
the power of communities like the power
of of collective behavior and and so
what's that balance right that makes
sense
um so by the time i was just i got to go
out and explore this entrepreneur world
so i left academia i said no thanks
called my friend eric here who had his
company was going i said hey could i
join you and start this trend and and he
would at that time they were using
scipy a lot they were trying to get
clients and so i came down to texas and
in texas where i
sort of it's my entrepreneurial world
right i left academia and went to
entrepreneur world in in 2007. so moved
here in 2007 kind of took a leap knew
nothing really about business knew
nothing about a lot of stuff there
there's you know for a long time i've
kept some connections to a lot of
academics because i still value it i
still love the
scientific tradition i still value the
the essence and the soul and the heart
of what is possible
don't like
a lot of the administration and the kind
of
we can go into detail about why and
where and how this happens what are
those challenges i mean i i don't know
but i'm with you so
well i'm still affiliated with mit
i still love mit because there's magic
there yeah there's people i talk to like
researchers
faculty
in those conversations and the white
board and and just the conversation
that's magic there
all the other stuff the administration
all that kind of stuff
seems to
um
you don't you don't want to say too
harshly criticize sort of bureaucracies
but there's a lag that seems to get in
the way of the magic yeah and i don't
i still have a lot of hope that
that can change because
i don't often see that particular type
of magic
elsewhere in industry so like we need
that and we need that flame going and um
it's the same thing as exactly as you
said it has the same kind of elements
like the open source community does
and but then if you like the reason i
stepped away the reason i'm here just
like you did in austin is like if i want
to build one robot i'll stay at mit but
if i want to build millions
and make money enough to where i can
explore the magic of that then you can't
and i think
that dance is um that translational
dance has been lost a bit yeah right and
there's a lot of reasons for that i'm
certainly not an expert on this stuff
i have an opinion like anybody else but i
i realized that i wanted to explore
entrepreneurship which i knew and really
figure out and it's been a driving
passion for 20 years 20 25 years how do
we connect
capital markets and company um because
again i fell in love with the notion
that oh profit seeking on its own is not
a bad thing it's actually a coordination
mechanism for allocating resources that
you know not in an emergent way right
that respects
everybody's opinions right so this is
actually powerful so so
i i say all the time when i make a
company and we do something that makes
profit what we're saying is hey we're
collecting of the world's resources and
voluntarily people are asking us to do
something they like
and that's a huge deal and so i really
like that energy so that's what i came
to do and to learn and to try to figure
out and that's what i've been kind of
stumbling through since for the past 14
years 2007 2007. so you were still
so no i was just emerging just right
one thing i've done i've done it's worth
mentioning because it
emphasized the exploratory nature of my
thinking at the time i said well i don't
know how to fund this thing i've got a
graduate student i'm paying for and i
got no funding for him and i had done
some fundraising from the public to try
to get public fundraiser for my lab
i didn't really want to go out and just
do the fundraising circuit the way it's
traditionally done
so i wrote a book and i said i'm going
to write a book and i'm going to charge
for it it was called guide to numpy and
so ultimately numpy became documentation
driven development because i basically
wrote the book and made sure the stuff
worked so the book would work
so it really helped actually make numpy
become a thing so that doc writing that
book
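The recording doesn't say what tooling kept Guide to NumPy and the code in sync. As one hedged illustration of the documentation-driven idea, Python's standard `doctest` module runs the examples written in a docstring and reports any that no longer match; `scale` is a hypothetical function used only for this sketch.

```python
import doctest

def scale(values, factor):
    """Multiply every element of ``values`` by ``factor``.

    >>> scale([1, 2, 3], 10)
    [10, 20, 30]
    >>> scale([], 5)
    []
    """
    return [v * factor for v in values]

# Collect the examples embedded in the docstring and run them as tests.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(scale, "scale", globs={"scale": scale}):
    runner.run(test)

failed, attempted = runner.summarize(verbose=False)
print(f"{attempted} examples, {failed} failures")  # 2 examples, 0 failures
```

If the code changes so a documented example no longer holds, the runner prints the mismatch and `failed` becomes nonzero.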
and it was not a i mean it's not a page
turner i mean kind of not a book you
pick up and go oh this is great over the
fire but it was it's where you could
find the details like how'd all this
work and a lot of people love that book
and so a lot of people end up so i but i
said look i i need to so i'm going to
charge for it
uh and i got some flack for that not
that much just just probably five angry
messages people you know yelling at me
saying i was you know
bad guy for for charging for this book
one of them richard
no just kidding no i i haven't really
had any interaction with him personally
uh like i said um but but there were a
few but but actually surprisingly not
there was actually a lot of people like
no it's fine you know you can charge for
a book that's no big deal we know that's
a way you can you can try to make money
around open source so so what i did what
i i did in an interesting way i said
well
you know kind of my ideas around around
ip law and stuff i love the idea you can
share something you can spread it like
once it's the fact that you have a thing
and copying is free
but the creation is not free so how do
we how do you fund the creation and
allow the copying right and then
software it's a little more complicated
than that because creation is actually a
continuous thing you know it's not like
you build a widget that's done it's sort
of a process of emerging and continuing
to to create but i wrote the book and
had this market determined price thing i
said look i need i think i said 250 000.
if i make 250 000 from this book it's
it'll make it free so as soon as i get
that much money or i said five years
right so there's a time limit like
that's forever cool i didn't know the
story yeah
i i released it on this and it's
actually interesting because one of the
people who also thought that was
interesting ended up being chris white
who was the director of the darpa project
that we got funding through at anaconda
and the reason he even called us back is
because he remembered my name from this
book and he thought that was interesting
and so
even though we hadn't gone to the demo
days we applied and the people said yeah
nobody ever gets this without coming to
the demo day first that's the first time
i've seen it but it's because
i knew you know chris had done this and
had this interaction so it did have
impact i was actually really really
pleased by the result i mean i ended up
i ended up in three years i mean 90 000.
so i sold 30 000 copies by myself i just
put it up on you know use paypal and
sold it uh and those were my first
taste of kind of
okay this can work to some degree and
and i you know all over the world right
from germany to japan to it was actually
it did work and so i appreciated the
fact that paypal existed and had a way
to make to get the money the
distribution was simple
this is pre-amazon book stuff so it was
just published in a website it was the
popularity of scipy emerging and
getting company usage
i ended up not letting it go the five
years and not trying to make the full
amount because
you know a year and a half later i was
at enthought i had left academia i was at
enthought and i kind of had a full-time
job and then actually what happened is
the documentation people there's a group
that said hey we wanna do documentation
for scipy as a collective and they're
essentially
needing the stuff in the book right and
and so
they kind of ask hey can we just use the
stuff in your book and at that point
said yeah i'll just open it up so that's
but it had served its purpose and the
money that i made actually funded my
grad student like it was actually you
know i paid him 25 000 a year
uh out of that money the funny thing is
if you do very similar kind of
experiment now with numpy or something
like it you could probably make a lot
more it's probably true
because of the tooling and the community
building yeah i agree like the and
social media there there's just a
virality to that kind of idea i agree
there'd be things to do i've thought
about that but and really had thought
about a couple of books or a couple of
things that could be done there and i
just
haven't right even
i tried to hire a ghostwriter this week
this year too to speak effect would help
but it it didn't
part of my problem is this i've been so
excited by a number of things that steps
in from that like so i came here worked
at enthought for four years uh graciously
you know eric made me president and we
started to work closely together we
actually helped him buy out his partner
um
it didn't end great like unfortunately
eric and i aren't real aren't friends
now um i still respect him i have a lot
i mean i wish we were but uh
he didn't like the fact that i that
peter and i started anaconda right that
was not i mean
um so i'm there's two sides of that
story so i'm not gonna go into it right
sure um but you
as human beings and you wish you still
could be friends i do
i do it saddens me i mean that that's um
that's the story of great minds building
great companies yeah somehow it's sad
that um yeah when there's that kind of
and
i i i hold him in esteem i'm grateful
for him i think he's they're doing you
know enthought still exists they're
doing great work uh helping scientists
they still run the scipy conference
they're in they have an r&d platform
they're selling now that's a tool that
you can go get today right so
um
enthought has played a role in scipy
in supporting the community around
scipy i would say they ended up not
being able to
they ended up building a tool suite to
write gui applications like that's where
they could actually make that the
business could work and so this
supporting scipy and numpy itself wasn't
as possible like they didn't they tried
i mean it was not just because it was
just because the business aspect so and
i wanted to build a company that could
do that could get venture funding
right for better or worse i mean that's a
longer story we could talk a lot about
that but and that's that's where
anaconda came in so
let me let me ask you it's it's a little
bit for fun because you built this
amazing thing and so let's
let's talk about uh like an old warrior
looking over old battles
um you've you know there's a sad
letter in 2012 that you wrote uh to the
numpy mailing list announcing that
you're leaving numpy yeah and what some
of the things you've listed
some some of the things you regret or
not regret necessarily but some things
to think about
if you could go back and you could fix
stuff about numpy or
both sort of in a personal level but
also like looking forward what kind of
things would you like to see changed
good question so i think there's
technical questions and social questions
right there um
first of all you know i wrote numpy as a
service
and i spent a lot of time doing it and
then other people came help make it
happen numpy succeeded because the work
of a lot of people right so it's
important to understand that
i'm grateful for the opportunity the
role i had i could play and grateful
that things i did had an impact
but they only had the impact they had
because the other people that came to
the story and so they were essential but
the way data types were handled the way
data types we had array scalars for
example
that are really just an um a
substitute for a type concept
right so we had
array scalars are actual python objects
so that there's one for a 32-bit
float or a 16-bit float or a 16-bit
integer
python doesn't have a natural one it just
has one integer there's one float well
what about these
lower precision types these larger
precision types
so we had them in numpy
so that you could have a collection of
them but then have an object in python
that was one of them
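A minimal sketch of dtypes and array scalars in present-day NumPy (assumed installed): the array's `dtype` describes every element, and indexing one element out returns an object of a NumPy scalar type such as `np.float32` rather than a Python `float`. The values are arbitrary, and this shows current behavior rather than the original design.

```python
import numpy as np

arr = np.array([1.5, 2.5], dtype=np.float32)

# The dtype object describes the element layout shared by the whole array.
assert arr.dtype == np.dtype("float32")
assert arr.dtype.itemsize == 4

# Indexing out one element returns an "array scalar": a Python object whose
# type is a NumPy scalar type rather than the builtin float.
x = arr[0]
assert type(x) is np.float32
assert not isinstance(x, float)             # float32 does not subclass float
assert isinstance(np.float64(1.0), float)   # float64 is the one that does

# Arithmetic on array scalars keeps the reduced precision.
assert (x + x).dtype == np.float32

# A lower-precision type cannot hold 0.1 as closely as the builtin float does.
assert float(np.float16(0.1)) != 0.1
```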
and there's questions about like in
retrospect i wouldn't have created those
i would have improved the type system
and like made the type system actually a
python type system as opposed to
currently it's a python one level type
system i don't know if you know the
difference between python one python two
it's kind of technical kind of depth but
python two one of its big things that
guido did it was really brilliant it was
he actually
in python one
all classes were one type
if you as a user wrote a class it was an
instance of a single python type called
the class type
right in python 2
he used a meta typing hook to actually
go oh we can extend this and have users
write classes that are new types
so it's able to have your user classes
be actual types and the python type
system got a lot more rich
i barely understood that at the time
that numpy was written and so i
essentially in python numpy created a
type system that was python one era
it was every
every d type is an instance of the same
type as opposed to having new d types be
really just python types with additional
metadata what's the cost of that is it
efficiency is it usability uh it's
usability primarily the cost isn't
really efficiency it's it's it's the
fact that it's clumsy to create new
types
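The Python 1 versus Python 2 distinction can be illustrated with the standard library alone. In this sketch `DTypeMeta`, `Float32`, `Quaternion` and `Descr` are hypothetical names: the metaclass version makes each element type a real Python class carrying metadata, while the `Descr` version mirrors the pattern Travis describes for NumPy, where every data type is an instance of one descriptor class. This is not NumPy's actual dtype machinery.

```python
# Python 2 style: every user class is itself a type object.
class Plain:
    pass

assert isinstance(Plain, type)
assert type(Plain()) is Plain


class DTypeMeta(type):
    """Hypothetical metaclass: each class it builds is a real Python type
    that also carries dtype-like metadata (here just an item size)."""

    def __new__(mcls, name, bases, namespace, itemsize=0):
        cls = super().__new__(mcls, name, bases, namespace)
        cls.itemsize = itemsize
        return cls

    def __init__(cls, name, bases, namespace, itemsize=0):
        super().__init__(name, bases, namespace)


class Float32(metaclass=DTypeMeta, itemsize=4):
    pass


class Quaternion(metaclass=DTypeMeta, itemsize=16):
    """A new user-defined element type is just another class statement."""


# Python 1 style: one descriptor class, and every data type is an instance of it.
class Descr:
    def __init__(self, name, itemsize):
        self.name = name
        self.itemsize = itemsize


float32_descr = Descr("float32", 4)
quaternion_descr = Descr("quaternion", 16)

assert isinstance(Float32, type) and type(Float32) is DTypeMeta
assert type(float32_descr) is type(quaternion_descr) is Descr
```

With the metaclass approach, adding something like a quaternion type needs no changes to a central descriptor class, which is the usability point being made.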
uh it's hard and then one of the
challenges you want to create new types
you want a quaternion type or you want
to uh add a new you know posit type or
you want to um so it's hard now in the
and and now
if we had done that well
when numba came on the scene where we
could actually compile python code it
would integrate with that type system
much cleaner and now all of a sudden you
could
do gradual typing more easily you could
actually have python when you add numba
plus better typing could actually be a
uh
you'd smooth out a lot of rough edges
but there's already there's like but are
you talking about from the perspective
of developers within numpy or users of
numpy developers of numpy not
really users of numpy so much it's the
development of numpy so you're thinking
about like
how to design numpy so that it's
contributors yeah the contributors are
it's easier it's easier it's less work
to make it better and to keep it
maintained and and where that's impacted
things for example is
the gpu like all of a sudden gpus start
getting added and
we don't have them in numpy like numpy
should just work on gpus right the fact
that we have to have to download a whole
other object called cupy to have arrays
on gpus is just an artifact of
history because there's no fundamental
reason for it well that's really
interesting if we could sort of go on
that tangent briefly is
you have
pytorch and other
libraries like tensorflow that basically
tried to mimic uh yeah
like you've created a sort of platonic
form
basically
yeah exactly well the problem was they
didn't realize that yeah the platonic
form has a lot of edges they're like
well we should cut those out before we
present it so i i wonder if you can
comment is there like a difference
between their implementations do you
wish that they were all using numpy over
like in this abstraction yeah
and sorry to interrupt that there's
gpus asics there might be other
neuromorphic computing there might be
other kind of or the aliens will come
with a new kind of computer like an
abstraction that numpy should just
operate nicely over the things that are
more and more and smarter and smarter
with uh with this multi-dimensional
arrays yeah yeah i have there's several
comments there we are working on
something now called
data-apis.org you can go
there today and it's
it's our answer it's my answer you know
it's not just me it's me and ralf and
athan and aaron and a lot of
companies are helping us at quansight
labs
uh it's not unifying all the arrays it's
creating an api that is unified um so
we do care about this and trying to try
to work through it
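The Python array API standard published at data-apis.org lets code look up the right function namespace from the array it receives, via an `__array_namespace__` method, instead of hard-coding `numpy`. In this sketch `ToyArray` and `toy_xp` are hypothetical stand-ins for a real array library so it runs with only the standard library; a conforming library would supply its own namespace.

```python
from types import SimpleNamespace


class ToyArray:
    """Hypothetical 1-D array type standing in for a real array library."""

    def __init__(self, values):
        self.values = [float(v) for v in values]

    def __array_namespace__(self, api_version=None):
        # Array API hook: hand back the namespace of functions for this type.
        return toy_xp

    def __sub__(self, scalar):
        # Subtract a Python scalar from every element.
        return ToyArray(v - scalar for v in self.values)


# Hypothetical namespace: only the one function this sketch needs.
toy_xp = SimpleNamespace(
    mean=lambda x: sum(x.values) / len(x.values),
)


def center(x):
    """Library-agnostic: ask the array for its namespace instead of importing one."""
    xp = x.__array_namespace__()
    return x - xp.mean(x)


centered = center(ToyArray([1, 2, 3, 4]))
print(centered.values)  # [-1.5, -0.5, 0.5, 1.5]
```

The point of the design is that `center` never names a specific library, so the same function could in principle serve arrays from different back ends that implement the standard.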
i actually had the chance to go and meet with
the tensorflow team and the pytorch
team and talk to them after uh after
exiting anaconda just talking about
because
the first year after leaving anaconda
i became deeply aware of this and
realized that oh this split in the array
community that exists today
makes what i was concerned about in 2005
pretty parochial
it's a lot worse right now there's a lot
more people
so the perhaps the industry can sustain
more stacks right there's a lot of money
but it makes it a lot less efficient i
mean this
but i've also learned to appreciate it's
okay to have some competition it's okay
to have different implementations
but it's better if you can at least
refactor some parts i mean you're going
to be more efficient if you can refactor
parts it's uh it's nice to have
competition over things
where they're innovative
competition right they're innovative
yeah innovative and then maybe on the
infrastructure right uh whatever however
you define infrastructure right
maybe it's nice to have controversial
exactly i agree and i think but it was
interesting to hear the stories i mean
tensorflow came out of
the c++ library uh jeff dean
wrote i think that was uh basically uh
how they were doing inference right and
then they realized oh we could do this
tensorflow thing
that close library then what was
interesting to me was the fact that both
google and facebook
did not it's not like they supported
python or numpy initially they just
realized they had to they they came to
this world and then all the users like
hey where's the numpy interface
oh and then they kind of came late to it
and then they had these bolt-ons
tensorflow's bolt on
i don't mean to offend but it was so bad
yeah it's the first time that i i i'm
usually so i mean
one of the challenges i have is i don't
criticize enough because
in the sense that i don't give people
input enough you know if um i think it's
universally agreed upon that the
bolt-ons on tensorflow right but i went
through it there was a talk given at a
mallorca in in spain and it got a great
guy i came and gave a talk i said you
should never show that api again at a
pydata conference
like that
that's terrible like you're taking this
beautiful system you've created and like
you're corrupting all these poor python
people forcing them to write code like
that or thinking they should
uh fortunately you know they adopted
keras and keras is
better and so keras tensorflow is fine
is reasonable but um they bolted it on
facebook did too like facebook had their
own
c++ library for doing inference and
they also had the same you know reaction
they had to do this one big difference
is facebook maybe because the way it's
situated in the in part of fair part of
the research library tensorflow is
definitely used and you know they have
to make they couldn't just open it up
and let the community you know change
what that is because
i guess they were worried about
disrupting their operations
facebook's been much more open to having
community input on the structure itself
whereas google and tensorflow they're
really eager to have user community
users people use it and build the
infrastructure but it's much more walled
like it's harder to become a contributor
to tensorflow and it's also this is very
difficult question to answer and i don't
mean to be throwing shade at anybody but
you have to wonder it's the microsoft
question
of when you have a tool like pytorch or
tensorflow
how much are you tending to the hackers
and how much are you tending to the big
corporate clients correct and so correct
like the ones that so do you tend to the
millions of people that are giving you
almost no money
or do you tend to the peop the few that
are giving you a ton of money i tend to
um
stand with the people right because i
feel like if you uh nurture the hackers
you will make the right decisions in the
long term that will make the companies
happy i lean that way too
totally but then you have to find the
right date but it's a balance yeah it's
because you can lean to the hackers and
run out of money yeah exactly
exactly
which has been some of the challenge
i've faced yes in the sense that like i
like i would look at some of the
experiments like numpy the fact that we
have the split is a factor of i wasn't
able to collect more money towards
numpy development yeah right i mean i
didn't succeed at in the early days of
getting enough financial contribution
into numpy so that i could work on it
right i couldn't work on it full-time i
had to just
catch an hour here an hour there
and i've basically not liked that like
i've wanted to be able to do something
about that for a long time and trying to
figure out how well there's lots of ways
i mean possibly one could say you know
we had an offer from microsoft in the early
days of anaconda uh in 2014 they
offered to come buy us right
the problem was the right people at
microsoft didn't offer to buy us and
they were still
they were it was really uh we were like
a second
they had really bought they just bought
r the r company called um
it was not rstudio but it was another
r company that was emergent and it was
kind of a well we should also get a
python play but they were really
doubling down on r
right and so it was like it was where
you would go to die so it's not it
wasn't it was before satya was there
satya had just started just started
right and if the and the offer was
coming from someone two levels down from
him gotcha right and if it had come from
scott guthrie so i got a chance to meet
scott guthrie great guy i like him if the
offer had come from him
we'd probably be at microsoft right now
that'd be fascinating that would be
really nice actually especially given uh
what microsoft has since done for the
open source community yes i think
they're doing well i really like some of
the stuff they've been doing they're
still working and they've you know
they've hired guido now and they've
hired a lot of python developers
he retired then he came out of
retirement and he's working out i was
just talking to him and he didn't
mention this person well
i should i should have been further
because i know he loved dropbox but i
wasn't sure what he was doing what he was
up to well he was kind of saying he'd
retire but uh and it's it's literally
been
five years since i last sat down and
really talked to guido right um
guido is a technology uh expert right
he's a so i i came i was excited because
i'd finally figure out the type system
for numba i wanted to kind of talk
about that with him and i kind of
overwhelmed him could you stay in that
mo just for a brief moment because
you're a fascinating person in the
history of programming and he is a
fascinating person what have you learned
from guido about
programming about life yeah yeah uh a
lot actually i've been a fan of guido's
you know we've had a chance to talk some i
wouldn't say you know we talk all the
time not at all um but we
talked enough that i respect him back when
i first started numpy one of the first
things i did was i asked guido
for a meeting with him and paul dubois
in san mateo and i went and met him for
lunch and basically to say maybe we can
actually part of the strategy for numpy
was to get it into python 3 and maybe be
part of python so we talked about that
that's cool that approach right i
would have loved to be a fly on the
wall that was good and over
the years from guido i learned
so he was open like he was willing to
listen to people's ideas
right and you know over the years now
generally you know i'm not saying
universally that's been true but but
generally that's been true so he's
willing to listen he's willing to defer
like on the scientific side he would
just kind of defer he didn't really
always understand what we were doing
like and he'd defer one place where he
didn't enough
was we missed a matrix multiply operator
like that finally got added to python
but about 10 years later than it should
have
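For readers who haven't used it, here is a small sketch of what the matrix multiply operator changes (PEP 465, Python 3.5 and later, shown with NumPy arrays). Before `@`, a chain of matrix products had to be written as nested `np.dot` calls; the matrices below are arbitrary illustrative values.

```python
import numpy as np

# Small illustrative matrices and vector (arbitrary values).
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
x = np.array([1.0, -1.0])

# Before Python 3.5: nested function calls, read inside-out.
old_style = np.dot(np.dot(A, B), x)

# With the @ operator: reads like the math, A B x (left-associative).
new_style = A @ B @ x

# For arrays, * stays elementwise multiplication, not a matrix product.
elementwise = A * B

print(old_style, new_style)  # [1. 1.] [1. 1.]
```

Keeping `*` elementwise and giving matrix multiplication its own operator is what lets array code use both without ambiguity.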
but the reason was because nobody
it took it takes a lot of effort and i
learned this while i was writing numpy i
also was a python dev
and i added some pieces of python um
like the memoryview object i wanted the
structure of numpy into python so we
didn't get numpy into python but we got
the basic structure of it in the python
like so you could build on it nobody did
for a while but eventually database
authors started to
and it was it's a lot better they did
and also antoine pitrou and stefan krah
actually fixed the memoryview object
because i wrote the underlying
infrastructure in c but the python
exposure was terrible until they came in
and fixed it partly because i was
writing numpy and numpy was the python
exposure i didn't really care about if
you didn't have numpy installed anyway
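As a rough illustration of the Python-level face of that buffer machinery, this standard-library-only sketch builds a shaped, zero-copy view over a flat buffer. The `cast` calls reinterpret six 32-bit integers as a 2×3 grid; `memoryview.cast` requires one side of each cast to be a byte format, hence the intermediate `"B"` step. Sizes and values are arbitrary.

```python
import array

# A flat buffer of six signed 32-bit integers ('i' typecode).
buf = array.array("i", range(6))

# memoryview gives a zero-copy view onto the same memory.
m = memoryview(buf)

# Reinterpret as raw bytes, then as a 2x3 grid of 'i' items (C order).
grid = m.cast("B").cast("i", shape=[2, 3])

print(grid.shape, grid.ndim, grid.format)  # (2, 3) 2 i
print(grid.tolist())                       # [[0, 1, 2], [3, 4, 5]]

# Writing through the flat view is visible in the grid and in the array:
# all three objects share one block of memory, nothing was copied.
m[5] = 42
print(grid[1, 2], buf[5])                  # 42 42
```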
guido
open to ideas technologically you know
brilliant like really
i really got a lot of respect for him
when i saw what he did with
this type class merger thing
that was actually tricky right and then
and then willing to share willing to
share his ideas so the other thing early
on in 1998 when i started i wrote my first
extension module the reason i could is
because he wrote this blog post on how
to do reference counting
right and without it i would have been
lost right but he was he was willing to
at least try to write this post and so
he's been motivated
early on with python it was a computer
science for everybody we kind of have
this early on desire to oh maybe we
should be pushing programming to more
people so he had this
populist notion i guess or populist
sense
um so i learned that there's a certain skill
i've seen it in other people too
of
engaging with contributors sufficiently
to because when somebody engages with
you and wants to contribute to you if
you ignore them they go away so building
that early contributor base requires
real engagement
with other people and he would do that
can you also comment on this tragic
uh stepping down from his position as
the benevolent dictator for life
over the walrus
uh uh you know the walrus operator the
walrus operator was the last battle
i don't know if that's the cause of it
but uh this there's this for people who
don't know you can look up there's the
walrus operator which is uh looks like a
colon and an equal sign yeah and equal
sign and
it actually does
maybe the thing that you
that an equal sign should be doing yeah
maybe right exactly uh yeah but it's
just historically equal sign means
something else it just means assignment
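A short sketch of what the walrus operator (PEP 572, Python 3.8 and later) adds: `:=` is an assignment that is also an expression, so a value can be bound and tested in one place, which plain `=` cannot do. The regex and input strings are arbitrary examples.

```python
import re

line = "numpy 1.26.4"

# Without :=, compute the match on one line and test it on the next.
match = re.search(r"(\d+)\.(\d+)", line)
if match:
    major, minor = match.groups()

# With :=, the assignment happens inside the condition, and the bound
# value is also the value of the expression.
if (m := re.search(r"(\d+)\.(\d+)", line)) is not None:
    print(m.group(0))  # 1.26

# A common use: reuse a computed value inside a comprehension filter.
values = [3, 10, 7, 12]
big_squares = [sq for v in values if (sq := v * v) > 50]
print(big_squares)  # [100, 144]
```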
so he stepped down over this what do you
think about the pressure of leadership
some of the you mentioned the letter i
wrote on numpy at the time that was a
hard time actually i mean you know
there's been really hard times it was
hard you know
you get criticized right and you get
pushed and you get um not everybody
loves what you do like anytime you do
anything that has impact at all
you're not universally loved right you
get some real critics and
that's an important energy because
it's impossible to do everything
right you need people to be pushing but
sometimes people can get mean yeah
people can
i
i prefer to give people the benefit of the
doubt i don't immediately assume they
have bad intentions
and maybe for other you know maybe other
maybe that doesn't happen for everybody
they for whatever reason their past
their experience with people they
sometimes have had bad experiences and so they
immediately attribute to you bad
intentions so you're like where did this
come from i mean i'm definitely open to
criticism but i think you're
misinterpreting the whole point
uh because i i would get that you know
certainly when i started anaconda you
know
i've been
sometimes i say to people uh i know i'm
i care enough about entrepreneurship to
make some open source people
uncomfortable
and i care enough about open source to
make investors uncomfortable
so i sort of you know create you create
kind of doubters on both sides so when
you have and this is just a
a plea to the listener and the public
i've noticed this too
that
there's a tendency and social media
makes this worse
when you don't have perfect information
about the situation you tend to fill the
gaps with the
the worst possible or at least a bad
uh story that fills those gaps and
i think it's good to live life
uh maybe not fully naively but filling
in the gaps with the with the with the
good
with the
best with the positive with the with the
hopeful explanation of why you see this
so if you see somebody like you trying
to make money on a book about numpy
there's a million stories around that
that are positive and those are good to
think about
to project positive intent on to people
because
for many reasons usually because people
are good and they do have good intent
and also when you project that positive
intent people step up to that
too yes so like it's it has a great
point it has this kind of viral nature
to it and of course
what twitter and facebook figured out
early on is that they can make a lot of
money and engagement from the negative
yes so like there's this we're fighting
this mechanism i agree it's just
challenging it's like easier it's just
easier to be to be negative and then for
some reason something in our minds
really enjoys sharing that and getting
getting all excited about the negativity
we do yeah but but the protective
mechanism perhaps that we're going
to get eaten if we don't exactly for us to
be effective as a group of people in a
software engineering project you have to
project positive intent i think i
totally agree totally agree and i think
that's very so that that happens in this
in the space but python has done a
reasonable job in the past but here's a
situation where i think it's it's
starting to get this pressure where it
didn't i was i really didn't i didn't
know enough about what happened i've you
know talked to several people about it
and i know
i think most of the steering committee
members today
uh one one person nominated me for that
role but it's the wrong role for me
right now right
um i have a lot of respect for the
python developer space and the python
developers i also understand the gap
between computer science python
developers and array programming
developers or science developers in fact
python succeeds in the array space the
more it has people in that boundary and
there's often very few like i was
playing a role in that boundary and you
know working like everything to try to
keep up with the
with the what even what guido was saying
like
i'm a c programmer but not a computer
scientist like i was an engineer and
physicist and mathematician
and i don't i didn't always understand
what they were talking about and why
they would have opinions the way they
did so you have to listen and try to
understand then you also have to explain
your point of view in a way they can
understand and that takes a lot of work
and that that communication
is always the challenge and it's just
what we're describing here about the
negativity is just another form of that
like how do we come together and it does
appear we're wired anyway to at least
have a
there's a part of us that will enemy you
know friend enemy and
and we see yeah it's like why are we
erring on the enemy front yeah so so why
are we pushing that why are we promoting
that so deeply let's assume friend until
proven otherwise yes yeah
so because you have such a fascinating
mind and all this let me just ask you
these questions so one interesting side
on the python
history is the move from python 2 to
python 3. you mentioned move from python
1 to python 2 but
the move from python 2 to python 3 is a
little bit interesting because it took a
very long time it uh it broke
in quite a small way backward
compatibility but even that small way
seemed to have been very painful for
people are there lessons
tons of lessons from uh from how long it
took and how painful it seemed to be
yeah tons of lessons well i mentioned
here earlier that
numpy was written in 2005.
it was in 2005 that i actually went to
guido to talk about getting numpy into
python 3. like my strategy was to
oh we're moving to python 3. let's have
that be and it seems funny in retrospect
because like wait python 3 that was in
2020 right when we finally
ended support for python 2 or at least
2017. the reason it took a long time a
lot of time i think it was because
one of the things is there wasn't much
to like about python 3. 3.0 3.1 it
really wasn't until 3.3 like i consider
python 3.3 to be python 3.0
but it wasn't until python 3.3 that i
thought there's enough stuff in it
to make it worth anybody using it
right and then three four started to be
oh yeah i want that and then three five
has the matrix multiply operator and now
it's like okay we gotta use that plus
the libraries that started leveraging
the some of the features of python
exactly yeah so it really the challenge
was it was
but it also illustrated a truism that
you know it's when you have inertia when
you have a group of
people using something it's really hard
to move them away from it you can't just
change the world on them and python 3
you know made some i think it fixed some
things guido had always hated i
think he didn't like the fact that print was
a statement he wanted to make it a
function but in some sense there's a bit
of gratuitous change to the language and
you could argue and there's people have
but
there was
one of the challenges was there wasn't
enough features and too many just
changes
without features and
so that empathy for the end user as to
why they would switch wasn't wasn't
there i think also it illustrated just
the funding realities like python wasn't
funded like it was also a project with a
bunch of volunteer labor
right it had more people so more
volunteer labor but it was still it was
funded in the sense that at least guido had a
job and i i've learned some of the
behind the scenes on that now since
since talking to people who lived
through it and
uh maybe not on air we can talk about
something
but it's interesting to see but guido
had a job but he but his full-time job
wasn't just work on python yeah like he
had other things to do
it's just wild it is wild isn't it as
well how few people are funded yes how
much impact they have yes
maybe that's a feature not a bug i don't
know maybe yes exactly at least early on
like it's sort of i know yeah it's like
olympic athletes are often severely
underfunded but maybe that's what brings
out the greatness perhaps yes correct no
exactly
maybe this is
essential part of it because i do think
about that in terms of i currently have
an incubator for open source startups
like what i'm trying to do right now is
create the environment i wish that
existed when i was leaving academia with
numpy and trying to figure out what to
do i'm trying to create those
opportunities and environments so
uh and that's that's what drives me
still is how do i make the world easier
for the open source entrepreneur uh so
let me stay
i mean i could probably stay on numpy
for a long time but um this is a fun
question so andrej karpathy leads the
tesla autopilot team and uh he's also
one of the most like legit
uh programmers uh i know
it's like he builds stuff from scratch a
lot and that's how he builds intuition
about how a problem works he just built
it from scratch and i always love that
and the primary language he uses is
python
for for the intuition building but he
posted something on
twitter
saying that they got a significant
improvement
on some aspect of their uh like data
loading i think
by switching away from np.sqrt
so numpy's implementation of square
root to math.sqrt and then
somebody else commented that you can
get an even greater improvement
by using the vanilla
python square root which is just x ** 0.5
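A small sketch for comparing the three spellings on a single Python float. It checks that they agree numerically and shows that they return different types; the timing loop prints per-call cost on whatever machine runs it, and no particular ratio is claimed, since that depends on hardware and on Python and NumPy versions.

```python
import math
import timeit

import numpy as np

x = 2.0

# Three spellings of a scalar square root.
a = np.sqrt(x)    # NumPy ufunc: returns a NumPy scalar (np.float64)
b = math.sqrt(x)  # math module (C library sqrt): returns a Python float
c = x ** 0.5      # built-in power operator on a Python float

print(type(a).__name__, type(b).__name__, type(c).__name__)

# Per-call cost on one scalar; results vary by machine and version.
n = 20_000
for label, stmt in [("np.sqrt", "np.sqrt(x)"),
                    ("math.sqrt", "math.sqrt(x)"),
                    ("x ** 0.5", "x ** 0.5")]:
    t = timeit.timeit(stmt, globals={"np": np, "math": math, "x": x}, number=n)
    print(f"{label:10s} {t / n * 1e9:8.1f} ns per call")
```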
and it's fascinating to me i just wanted
to
so that absolutely i mean that was some
shade throwing at some no no
but also we're talking about it's a good
way to ask
the trade-off between usability and
efficiency
broadly in numpy but also on these like
specific weird quirks of like a single
function yep so
on that point
if you use a numpy math function on a
scalar
it's going to be slower than using a
python function on that scalar yeah but
because the math object in
numpy is more complicated right because
you can also call that math object on an
array
and so effectively it goes through the
same machinery there aren't enough
shortcuts for the scalar case and you
could do like checks and
fast paths
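The "checks and fast paths" idea can be sketched in user code. `sqrt_fast` below is a hypothetical helper, not part of NumPy: it sends plain non-negative Python numbers to `math.sqrt` and everything else (arrays, lists, NumPy scalars, negatives, NaN) to the ufunc. The sketch also shows why such shortcuts are not free to add: the two paths return different types, and `math.sqrt(-1.0)` raises `ValueError` where `np.sqrt(-1.0)` returns `nan`, so negatives have to stay on the ufunc path to keep NumPy's behavior.

```python
import math

import numpy as np

def sqrt_fast(x):
    """Hypothetical helper: scalar fast path, ufunc for everything else."""
    # Fast path: a plain, non-negative Python int or float skips the ufunc
    # machinery (no 0-d array, no type resolution, no broadcasting setup).
    if type(x) in (int, float) and x >= 0:
        return math.sqrt(x)
    # General path: arrays, lists, NumPy scalars, negatives, NaN, etc.
    return np.sqrt(x)

print(sqrt_fast(9.0))                   # 3.0
print(sqrt_fast(np.array([1.0, 4.0])))  # [1. 2.]
```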
so yeah if you're basically doing a list
if you run over a list in fact for
problems that are less than
a thousand
even maybe 10 000 it's probably the if
you're going more than 10 000 that's
where you definitely need to be using
arrays but if you're less than that and
for reading if you're doing a reading
process and essentially it's not compute
bound it's i/o bound and so you're
you're really taking lists of thousand
at a time and doing work on it yeah you
could be faster just using python
straight up python see but also
and then this is the so sorry to
interrupt there's the fundamental
questions
when you look at the long arc of history
it's very possible that np.sqrt
is much faster it could be so
like in terms of like don't worry about
it it's the the evils of over
optimization or whatever all the
different quotes are on that it's is uh
sometimes obsessing about this
particular little uh quirk is not it's
not
it's efficient like for somebody like uh
if you're if you're trying to optimize
your path i mean i agree premature
optimization
creates all kinds of challenges right
because now but you may have to do it i
believe the quote is it's the root of
all it's root of all evil right
is it knuth i think or attributed to
somebody else
well don knuth is kind of like mark twain
people just attribute things to him it doesn't matter
and it's fine because he's brilliant so no i
was a tex user myself and so i have a
lot of respect and he did more than that
of course but
uh yeah
someone i really appreciate in the
computer science space yeah i don't i
think that's appropriate there's a lot
of little things like that where people
actually if you understood it you go
yeah of course that's the case yeah like
and the other part and the other part i
didn't mention and
numba was a thing we wrote early on and
i was really excited by numba because
it's something we wanted it was a
compiler for python syntax and i wanted
it from the beginning of writing numpy
because
of this function question like
taking the power of arrays is really
that you can write functions using all
of it
it has implicit looping right so you
don't worry about this n-dimensional for
loop with you know four for loops four for
statements you just say oh big
four-dimensional array i'm gonna do this
operation this plus this minus this
reduction
and you get this it's called
vectorization in other areas but you can
basically think at a high level and get
massive amounts of computation done with
the added benefit
of oh it can be parallelized easily it can
be put in parallel you don't have to
think about that in fact it's worse to
go decompose your you write the for
loops and then try to infer parallelism
from for loops that's actually harder
problem than to take the array problem
and just automatically parallelize that
problem that's what
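A minimal sketch of the contrast described above, using arbitrary random 4-D arrays: the same "this plus this minus this, then a reduction" written once with four nested `for` statements and once as a single array expression whose looping is implicit and runs in compiled code.

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (3, 4, 5, 6)  # an arbitrary four-dimensional shape
a, b, c = (rng.standard_normal(shape) for _ in range(3))

def combine_loops(a, b, c):
    # Explicit version: four nested for statements, one element at a time.
    out = np.empty(a.shape[:3])
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            for k in range(a.shape[2]):
                total = 0.0
                for m in range(a.shape[3]):
                    total += a[i, j, k, m] + b[i, j, k, m] - c[i, j, k, m]
                out[i, j, k] = total
    return out

def combine_arrays(a, b, c):
    # Array version: elementwise plus/minus, then a reduction over the last axis.
    return (a + b - c).sum(axis=-1)

print(np.allclose(combine_loops(a, b, c), combine_arrays(a, b, c)))  # True
```

The array form also states the whole computation at once, which is what makes it straightforward for a library to split the work across cores instead of inferring parallelism from the loop nest.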
and so functions in numpy are called
universal functions ufuncs so square
root is an example of a ufunc there are
others sine cosine add subtract
in fact one of the first libraries in
scipy was something called special where
i added bessel functions and like all
these special functions that come up in
physics and i added them as ufuncs so
they could work on arrays so i
understood ufuncs very very well
from day one inside of numeric that was
one of the things we tried to make
better in numpy was how do they work can
they do broadcasting what does
broadcasting mean
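A short sketch of what "being a ufunc" buys, using only NumPy (functions in `scipy.special`, such as the Bessel function `scipy.special.jv`, are ufuncs too and behave the same way). Every ufunc shares the same machinery, so methods like `reduce`, `accumulate`, and `outer`, as well as broadcasting, come with it automatically.

```python
import numpy as np

# np.sqrt and np.add are instances of the same ufunc type.
print(isinstance(np.sqrt, np.ufunc), isinstance(np.add, np.ufunc))  # True True
print(np.add.nin, np.add.nout)  # 2 1: two inputs, one output

x = np.array([1.0, 2.0, 3.0])

# Methods shared by every binary ufunc:
print(np.add.reduce(x))         # 6.0, the same as x.sum()
print(np.add.accumulate(x))     # [1. 3. 6.], a running sum
print(np.multiply.outer(x, x))  # 3x3 table of pairwise products

# Broadcasting: a (3,) array and a (2, 1) array combine into (2, 3).
col = np.array([[10.0], [20.0]])
print(np.add(x, col))
```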
but one of the problems is okay what do
i do with a python scalar
so what happens the python scalar gets
broadcast to a zero dimensional array
and then it goes through the whole same
machinery as if it were a ten thousand
dimensional array and then that then
then it kind of unpacks the element and
then does the addition
that's not to mention the function it
calls
in the case of square root is just the c
library square root right in some cases like
python's power there's some
optimizations they're doing
that could be faster than
just calling the c library square root
in the interpreter or no in
the c code in the python runtime
so they're
they really optimize it and they have
the freedom to do that because they
don't have to worry about it's just a
scalar it's just a scalar right they
don't have to worry about the fact that
oh this could be an object with many you
know many pieces they're not the ufunc
machinery is also generic in the sense
that uh type casting and broadcasting
broadcasting's idea of i'm gonna go i
have a zero dimensional array i have a
scalar with a four dimensional array and
i add them
oh i have to just kind of coerce the
shape of this guy
to make it work against the whole
four-dimensional array so it's the idea
if i can do a one-dimensional array
against a two-dimensional array and have
it make sense well that's what numpy
does is it challenges you to reformulate
rethink your problem yes as a
multi-dimensional array problem versus
like
move away from scalars completely right
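A minimal sketch of the broadcasting behavior described above: a scalar is treated as a 0-d array and stretched across a 4-D array, and a 1-D array lines up against a 2-D one by matching shapes from the right. Shapes are arbitrary.

```python
import numpy as np

# A scalar behaves like a 0-d array: shape (), zero dimensions.
s = np.asarray(5.0)
print(s.shape, s.ndim)  # () 0

big = np.ones((2, 3, 4, 5))   # a four-dimensional array
print((big + 5.0).shape)      # (2, 3, 4, 5): the scalar is stretched to fit

# 1-D against 2-D: shapes align from the right; missing axes are added.
row = np.arange(3)            # shape (3,)
grid = np.zeros((4, 3))       # shape (4, 3)
print((grid + row).shape)     # (4, 3): every row of grid gets row added

# The result shape can be computed without allocating anything
# (np.broadcast_shapes is available in NumPy 1.20 and later).
print(np.broadcast_shapes((4, 1), (3,)))  # (4, 3)
```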
exactly yeah exactly in fact that's
where some of the edge cases boundaries
are is that well the they're still there
and this is where array scalars are
particular so array scalars are particularly
bad in the sense that they were written
so that you could optimize the math on
them but that hasn't happened
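A small sketch of the array scalar distinction: indexing one element of an array returns a NumPy array scalar, not a Python float, while `.item()` and `.tolist()` convert to plain Python objects. In a pure-Python loop the values are identical either way; what differs is which object type each arithmetic step goes through. The data is arbitrary.

```python
import numpy as np

arr = np.linspace(0.0, 1.0, 5)  # [0.0, 0.25, 0.5, 0.75, 1.0]

# Indexing one element returns a NumPy "array scalar".
elem = arr[1]
print(type(elem))  # <class 'numpy.float64'>

# .item() and .tolist() return plain Python floats.
py_elem = arr.item(1)
py_list = arr.tolist()
print(type(py_elem), type(py_list[1]))  # <class 'float'> <class 'float'>

# Same values, different object types flowing through the Python loop.
total_array_scalars = sum(v * v for v in arr)
total_python_floats = sum(v * v for v in py_list)
print(total_array_scalars == total_python_floats)
```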
right and so their default is to use is
to coerce the arrays together to a zero
dimensional array and then use the
number the numpy machinery that's what
and you could specialize but it doesn't
happen all the time so in fact when we
first wrote numba we'd do comparisons
and say look it's a thousand x speed up
we're lying a little bit in the sense
that well first there's the
40x slowdown of using array scalars
inside of a loop because if you just
used python scalars you'd already be
10 times faster yeah but then we would
get 100 times faster over that using
just compilation
but what we do is compile the loop from
out of the interpreter to machine code
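A minimal sketch of compiling a loop out of the interpreter with Numba's `njit` decorator, assuming Numba is installed. If it is not, the fallback below leaves the function uncompiled so the sketch still runs; the results match either way and only the speed differs. The sum-of-squares function is an arbitrary example.

```python
import numpy as np

try:
    from numba import njit  # compiles the decorated function to machine code
except ImportError:
    # Fallback so this sketch runs without Numba: no compilation happens.
    def njit(func):
        return func

@njit
def sum_of_squares(values):
    # An explicit scalar loop: slow in the interpreter, fast once compiled.
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total

data = np.arange(1_000, dtype=np.float64)
result = sum_of_squares(data)  # with Numba, the first call triggers compilation
print(result, float(np.dot(data, data)))
```

The Python source stays the high-level description; only the hot loop is handed to the compiler, which is the separation of concerns described above.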
and then that's always been the power of
python is this extensibility so you can
people say oh python's so slow well sure
if you do all your logic in the runtime
of the python interpreter yeah but the
power is that you don't have to you
write all the logic which you do in the
high level is just high level logic and
the the actual calls you're making could
be on
gigabyte arrays of data and that's all
done at compiled speeds
and the fact that integration is
one can happen but two is separable
that's one of the uh there are languages like
julia that say we're gonna be all in one
can do all of it together and then
there's the jury's out is that possible
i tend to think that you're gonna
there's separate concerns there you want
to pre-compile them but generally you
will want to pre-compile your
some of your loops like scipy has a
compilation step to install scipy it
takes about two hours if you have many
machines maybe you can get it down to
one hour but to compile those libraries
takes about takes a while you don't want
to do that at runtime
you don't do that all the time you want
to have this precompiled binary
available that you're then just linking
into so there's real questions about
the whole you know source code
code is running binary code is more than
source code it's created object code
it's the linker it's the loader it's the
how does that interpret it inside the
virtual memory space there's a lot of
details there that actually i didn't
understand for a long time until i you
know read books on the topic and it led
to
the more you know the better off you are
and you can do more details but
sometimes it helps with abstractions too
well the problem as we mentioned earlier
with abstractions is you kind of
sometimes assume
that whoever implemented this thing
had your case in mind and found the
optimal solution yes or like you assume
certain things i mean there's a lot of
correct one of the really powerful
things to me
early on
i mean it sounds silly to say but with
python probably one of the reasons i
fell in love with it is dictionaries yes
um so
obviously probably most languages have
some
concepts some mapping concept but it
felt like it was a first class citizen
and it was just my brain was able to
think in dictionaries but then there's
the thing that i guess i still use to
this day is ordered dictionaries
because that seems like a more natural
way to construct dictionaries yeah and
and from a computer science perspective
the running time cost is not that
significant but there's a lot of things
to understand about dictionaries
that
the abstraction kind of
doesn't necessarily incentivize you to
understand
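One thing the dictionary abstraction hides is how ordering works now. Since Python 3.7, plain `dict` is guaranteed to preserve insertion order, but `collections.OrderedDict` still differs in a couple of ways, sketched here with arbitrary keys: it can reorder existing keys, and its equality check is order-sensitive.

```python
from collections import OrderedDict

# Plain dicts preserve insertion order (a language guarantee since 3.7).
d = {}
d["b"] = 1
d["a"] = 2
d["c"] = 3
print(list(d))  # ['b', 'a', 'c']

# OrderedDict can move an existing key to the end.
od = OrderedDict(d)
od.move_to_end("b")
print(list(od))  # ['a', 'c', 'b']

# Equality: dict comparison ignores order, OrderedDict comparison does not.
print(dict(od) == d)         # True
print(od == OrderedDict(d))  # False
```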
right do you really understand the
notion of a hashmap and how that
dictionary is implemented but you're
right dictionaries are a good example of
an abstraction that's powerful and i
agree with you one of the love i agree i
love dictionaries too it took me a while
to understand that once you do you
realize oh they're everywhere and python
uses them everywhere too like it's
actually constructed that one of the
foundational things is dictionaries and
it does everything with dictionaries yeah so
it is it's powerful ordered dictionaries
came later but it is very very powerful
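To make "the notion of a hashmap" concrete, here is a deliberately simplified open-addressing table with linear probing. `ToyHashMap` is a hypothetical teaching example, not how CPython's `dict` is implemented: CPython also uses open addressing, but with a different probe sequence and a compact, insertion-ordered entry layout. This sketch has no deletion.

```python
class ToyHashMap:
    """Hypothetical teaching example: open addressing with linear probing."""

    _EMPTY = object()  # sentinel marking an unused slot

    def __init__(self, capacity=8):
        self._keys = [self._EMPTY] * capacity
        self._values = [None] * capacity
        self._size = 0

    def _slot(self, key):
        # Start at hash(key) mod capacity and step forward until we find
        # either the key itself or an empty slot.
        capacity = len(self._keys)
        i = hash(key) % capacity
        while self._keys[i] is not self._EMPTY and self._keys[i] != key:
            i = (i + 1) % capacity
        return i

    def __setitem__(self, key, value):
        # Keep the table at most half full so probing always finds a free slot.
        if (self._size + 1) * 2 > len(self._keys):
            self._grow()
        i = self._slot(key)
        if self._keys[i] is self._EMPTY:
            self._size += 1
        self._keys[i] = key
        self._values[i] = value

    def __getitem__(self, key):
        i = self._slot(key)
        if self._keys[i] is self._EMPTY:
            raise KeyError(key)
        return self._values[i]

    def __len__(self):
        return self._size

    def _grow(self):
        # Rehash every existing entry into a table twice as large.
        old = [(k, v) for k, v in zip(self._keys, self._values)
               if k is not self._EMPTY]
        self._keys = [self._EMPTY] * (2 * len(self._keys))
        self._values = [None] * len(self._keys)
        self._size = 0
        for k, v in old:
            self[k] = v


m = ToyHashMap()
for word in ["array", "dict", "tuple", "list", "set", "tree"]:
    m[word] = len(word)
print(m["tuple"], len(m))  # 5 6
```

Lookups cost roughly constant time as long as the table stays sparse, which is why resizing (here, doubling when half full) is part of the design rather than an afterthought.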
it took me a little while coming from
just the array programming entirely to
understand these other objects like
dictionaries and lists and
tuples and binary trees
like i said i wasn't a computer
scientist so i studied arrays first and
so i was very array-centric and you
realize oh these others do have
purposes and value actually
um i agree there's a friendliness about
like one way to think about arrays
is um
arrays are just
not like full of numbers
but to make them accessible to humans
and make them less error-prone to human
users sometimes you want to attach names
human interpretable names
that are sticky to those arrays so yeah
that's how you start to think about
dictionaries yes
you start to convert numbers into
something that's human interpretable and
that's actually the tension i've had
correct with numpy because correct i've
built so much tooling around
human uh
human interpretability and also
protecting me
a year later from making the
mistakes by being i wanted to force
myself to use english versus uh
numbers
yes so there's a there's a project
called labeled arrays like very
early it was recognized that oh we need
we we're indexing numpy we're just
numbers all the columns and particularly
the dimensions i mean if you have an
image you don't necessarily need to
label each column or row but if you have
a lot of images so you have another
dimension you at least like to label the
dimension as this is x this is y z or
this is give us some human meaning or
some domain-specific meaning
that was one of the impetuses for pandas
actually was just oh we do need to label
these things
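A minimal sketch of the "label the dimensions" idea, written as a tiny hypothetical wrapper around a NumPy array. `LabeledArray` is not a real library class; pandas and xarray are the mature tools in this space. The point is that a reduction is requested by dimension name rather than by remembering an axis number.

```python
import numpy as np

class LabeledArray:
    """Hypothetical wrapper: a NumPy array plus one name per dimension."""

    def __init__(self, data, dims):
        data = np.asarray(data)
        if len(dims) != data.ndim:
            raise ValueError("need exactly one name per dimension")
        self.data = data
        self.dims = tuple(dims)

    def _axis(self, dim):
        # Translate a human-readable dimension name into an axis number.
        return self.dims.index(dim)

    def mean(self, dim):
        # Reduce over the named dimension and drop its label.
        axis = self._axis(dim)
        remaining = self.dims[:axis] + self.dims[axis + 1:]
        return LabeledArray(self.data.mean(axis=axis), remaining)

# A stack of 10 images, each 4 pixels high and 6 pixels wide.
images = LabeledArray(np.ones((10, 4, 6)), dims=("image", "y", "x"))

# "Average over images" reads as intent; no need to remember it was axis 0.
mean_image = images.mean("image")
print(mean_image.dims, mean_image.data.shape)  # ('y', 'x') (4, 6)
```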
and label label array was an attempt to
add that like a lighter weight version
of that and there's been like that's an
example of something i think numpy could
add
could be added to numpy
but one of the challenges again how do
you fund this like like i said one of
the tragedies i think is that so i i
never had the chance to i was never paid
to work on numpy right so i've always
just done in my spare time always taken
from one thing taken from another thing
to do it
and at the time i mean today it would be
the wrong time today like pay me to work
on numpy now would not be a good use of
effort but
but we are finally at quansight labs i'm
actually paying people to work on numpy
and scipy which is i'm thrilled with i'm
excited by uh i've wanted to do that
it's what i wanted to do from day one it
just took me a while
to figure out a mechanism to do that
even like in the university setting
respecting that from like
pushing students young minds the younger
graduate students to contribute
and then figuring out financial uh
mechanisms that enable them to
contribute and then sort of reward them
for their um
innovative scientific journey that that
would be nice but then also just the
better allocation of resources
well you know it's the 20-year anniversary
since 9 11 and i was just looking we
spent over six trillion dollars in in
the middle east after 9 11
in the various efforts there and sort of
to
put politics and all that aside it's
just you think about the education
system all the other ways we could have
possibly allocated that money
to me yeah to take it back
the amount of impact you would have by
allocating a little bit of money to uh
the programmers yeah that build the
tools that run the world is fascinating
i mean it is i
it i don't know
i think uh again there is some aspect to
uh being broke
as somewhat of a feature not a bug that
you make sure that you manage that right
now i know i it's so i but i don't think
that's a big part so it's like i think
you can you can have enough money and
actually be wealthy while maintaining
your values agreed i think agreed
there's an old adage that you know
nations that trade together don't go to
war together yeah right i i've often
thought about you know nations that code
together
[Laughter]
one thing i love about open source is
it's global it's multinational like
there aren't national boundaries one of
the challenges with business and open
source is the fact that business is
national like businesses are entities
that are recognized in legal
jurisdictions right and have laws that
are respected in those jurisdictions and
hiring and yet the open source ecosystem
is not it's not it's not there like
currently one of the problems we're
solving is hiring people all over the
world
right because we
it's a global effort and i've had the
chance to work and i've loved the chance
i've never been to uh like iran but i
once had a conference i was able to talk
to people there right and talk to folks
in uh pakistan never been there but we
had a a
call where there are people there like
just scientists and normal people and
you know and and it's there's a there's
a certain amount of
humanizing right that gets away from the
like we often get the memes of society
that bubble up and get discussed
but the memes are not even an accurate
reflection of the reality of what people
are
well if you look at the major power
centers that are leading to something
like cyber war in the next few decades
it's united states it's russia and china
right and th those three countries in
particular have incredible developers
so if they work together yeah i think
that's one way the politicians can do
their stupid bickering but like the
there's a layer of infrastructure of
humanity yeah if they collaborate
together that that i think can prevent
major uh major
military conflict which would i think
most likely happen at the cyber level
versus the actual hot war level you're
right no i think that's good that's good
prediction
nations that code together uh
don't go to war yeah they don't go to
war together that's that's a hope right
that's one of the philosophical hopes
but yeah
so you mentioned uh the project of
numba
which is um
fascinating so from the early days there
was kind of a push back on python that
it's not fast
you know you see if you want to write
something that's fast you use c or c++ if
you want to write something that's
usable and friendly but slow you use
python
and so
what is uh numba what is its goal how
does it work great yeah yes that's what
the argument and the reality was people
would write
high-level code and use compiled code
but there's still user story use cases
where you want to
write python but then have it still be
fast you still need to write a for loop
like before numba it was always don't
write a for loop you know write it in a
vectorized way you put in an array
and often that'll that can make a memory
trade off like quite often you can do it
but then you make maybe use more memory
because you have to build this array of
data that you don't necessarily need all
the time
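A small sketch of that memory trade-off, with an arbitrary array size. The first vectorized expression allocates a temporary array as large as the input just to hold the squares before summing; `np.dot` computes the same reduction without that temporary, and when an intermediate is unavoidable, the ufunc `out=` argument lets one preallocated buffer be reused.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1_000_000)

# Vectorized, but x ** 2 materializes a full-size temporary before summing.
with_temporary = (x ** 2).sum()

# Same reduction without a temporary array of squares.
no_temporary = np.dot(x, x)

# When an intermediate array is needed, write it into a reusable buffer.
buf = np.empty_like(x)
np.multiply(x, x, out=buf)
with_reused_buffer = buf.sum()

print(np.isclose(with_temporary, no_temporary),
      np.isclose(no_temporary, with_reused_buffer))
```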
so
numba was it started from a desire to
have
a kind of a vectorize that worked
vectorize was a tool in numpy it
was released you give it a python
function and it gave you a universal
function a ufunc that would work on
arrays so get the function that just
worked on a scalar like you could make a
like the classic case was a simple
function with an if-then statement in it so
uh sine x over x function the sinc function
if x equals zero return one otherwise do
sine x over x
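The sinc example can be written out directly. `np.vectorize` gives the scalar function a ufunc-like interface, but it still calls back into the Python function once per element; the NumPy documentation describes it as a convenience, essentially a for loop, not a performance tool. NumPy also ships `np.sinc`, which uses the normalized definition sin(πx)/(πx), so `np.sinc(x / np.pi)` gives the sin(x)/x form used here.

```python
import math

import numpy as np

def sinc_scalar(x):
    # The scalar function from the conversation: an if/else on one number.
    if x == 0:
        return 1.0
    return math.sin(x) / x

# np.vectorize accepts arrays and broadcasts like a ufunc,
# but each element is still a Python-level call to sinc_scalar.
sinc_vec = np.vectorize(sinc_scalar, otypes=[float])

xs = np.linspace(-10.0, 10.0, 11)  # step 2.0, so x = 0 is included exactly
ys = sinc_vec(xs)

# Cross-check against NumPy's built-in normalized sinc.
print(np.allclose(ys, np.sinc(xs / np.pi)))  # True
```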
the challenge is you don't want that
loop happening in python so you want a
compiled version of that
um but the vectorize in numpy
would just give you a python function so
it would take the array of numbers and
at every call do a loop back into python
so it was very slow it gave me the
appearance of a ufunc but it was very
slow so i always wanted a vectorize that
would take that python scalar function
and produce a ufunc working on binary
native code so in fact i had somebody
work on that with pypy to see if
pypy could be used to produce a
ufunc like that early on um in 2009 or
something like that 2010 um it didn't
work that well it was kind of pretty
bulky but in 2012
uh peter and i just started anaconda we
had i just i
i'd learned to raise money
that's a different topic but i've
learned to you know raise money from
friends family and fools as they say and
that's a good line
oh that's a good sign but you know so i
we're trying to do something we were
trying to change the world peter and i
are super ambitious we wanted to make
array computing and we had ideas for
really what's still still the energy
right now how do you do
at scale data science we had a bunch of
ideas there but one of them
i just talked to people about llvm and i
was like there's a way to do this i just
i went uh i heard my friend dave
beazley had a compiler course so i was
looking at compilers like and i realized
oh this is what you do and so i wrote a
version
of numba that just basically
mapped
python byte code to llvm
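The input to that kind of translation is the function's Python bytecode, which the standard library `dis` module can display. This sketch only shows what the bytecode looks like for a small scalar function; opcode names vary between Python versions, so none are hard-coded, and the translation of these instructions to LLVM IR (the part Numba does) is not shown.

```python
import dis

def scaled_sum(a, b):
    # A tiny scalar function of the kind a bytecode-to-LLVM translator targets.
    return 2.0 * a + b

# Human-readable listing of the bytecode instructions.
dis.dis(scaled_sum)

# The same instructions as objects: the raw material a translator walks over.
ops = [instr.opname for instr in dis.get_instructions(scaled_sum)]
print(ops)
```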
nice right so
and the first version is like this works
and it produces code that's fast this is
cool for you know obviously a reduced
subset of python i didn't support all
the python language there had been
efforts to speed up python in the past
but those efforts were i would say not
from the array computing perspective not
from the perspective of wanting to
produce a vectorized improvement they
were from perspective of speeding up the
runtime of python which is fundamentally
hard because python allows for some
constructs that aren't you can't speed
up like it's this generic you know what
is this variable so i from the start did
not try to replicate python's semantics
entirely i said i'm going to take a
subset of the python syntax and let
people write syntax in python but it's
kind of a new language really so it's
almost like for loops like focusing on
for loops scalar arithmetic
you know typed you know really typed
language
a typed subset
that was the key so but we wanted to
add inference of types so you didn't
have to spell all the types out because
when you call a function
so python is typed it's just dynamically
typed so you don't tell it what the
types are but when it runs every time an
object runs there's a type for the
variables you know what it is
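Because every Python value carries its type at run time, a compiler that knows the argument types of a call can propagate them to the intermediate variables. A toy illustration of that propagation follows, assuming straight-line code, only `int`/`float`/`complex`, and Python 3's promotion rules; the tuple encoding of the function body is invented for the example, and Numba's actual type inference is far more general.

```python
# Assumed promotion order for Python's built-in numeric types:
# an operation on two types yields the later one in this list.
PROMOTION_ORDER = [int, float, complex]


def infer_binop(op: str, left: type, right: type) -> type:
    """Infer the result type of `left <op> right` for + - * /."""
    if op not in {"+", "-", "*", "/"}:
        raise ValueError(f"unsupported operator {op!r}")
    result = PROMOTION_ORDER[max(PROMOTION_ORDER.index(left),
                                 PROMOTION_ORDER.index(right))]
    # In Python 3, true division of two ints produces a float.
    if op == "/" and result is int:
        return float
    return result


def type_of(operand, env: dict) -> type:
    """Variable names are looked up; literals contribute their runtime type."""
    if isinstance(operand, str):
        return env[operand]
    return type(operand)


def infer_types(arg_types: dict, body: list) -> dict:
    """Propagate types through straight-line code of (target, op, lhs, rhs)."""
    env = dict(arg_types)
    for target, op, lhs, rhs in body:
        env[target] = infer_binop(op, type_of(lhs, env), type_of(rhs, env))
    return env


# Body of a hypothetical function f(x, scale):
#   t = x * scale; y = t + 1; z = y / 2
BODY = [("t", "*", "x", "scale"), ("y", "+", "t", 1), ("z", "/", "y", 2)]
```

Calling `infer_types({"x": int, "scale": int}, BODY)` infers `int` for `t` and `y` but `float` for `z`, without running the function.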
and so the design goals of
numba were to make it possible to write
functions that could be compiled
and and have them used for numpy arrays
like they needed to support numpy arrays and
so uh how does it work you have a
comment within python that tells it to do
like how do you help out the compiler yeah
so
there isn't much actually you don't it's
kind of magical in the sense that just
looks at the type of the objects and
then does type inference to determine
any other variables it needs and then it
was also because we had a use case that
that could work early like one of the
challenges of any kind of new new
development is if you have something
that to make it work it's going to take
you a long time it's really hard to get
out of the ground if you have a project
where there's some incremental story it
can start working today and solve a
problem then you can start getting it
out there getting feedback because
numba today you know numba's nine years
old today right the first
two three versions were not great right
but they solved a problem and some
people could try it we could get some
feedback on it not great in that it was
very focused very very fragile very
substantive
the subset it would actually compile was
small and so if you wrote python code
and said so the way it worked is you'd
write a function and you say @jit use
decorators so decorators are just these
little constructs let you
decorate code with an @ and then a
name
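Numba's real decorator is `numba.jit` (usually written `@jit` after `from numba import jit`). The stand-in below, `toy_jit`, is hypothetical and compiles nothing: it only mimics the shape being described, where the decorator replaces the Python function with a dispatcher that keeps one specialization per tuple of argument types.

```python
import functools


def toy_jit(fn):
    """Hypothetical stand-in for a JIT decorator (not Numba's API).

    Replaces fn with a dispatcher that caches one "specialization" per
    tuple of argument types. No code generation happens: each cached
    entry is just the original Python function.
    """
    specializations = {}

    @functools.wraps(fn)
    def dispatcher(*args):
        signature = tuple(type(a) for a in args)
        if signature not in specializations:
            # A real JIT would run type inference and emit native code here.
            specializations[signature] = fn
        return specializations[signature](*args)

    dispatcher.specializations = specializations
    return dispatcher


@toy_jit
def sum_of_squares(n):
    # Loops plus scalar arithmetic: the kind of code described as the target.
    total = 0
    for i in range(n):
        total += i * i
    return total


@toy_jit
def scale(x, k):
    return x * k
```

Calling `scale(2, 3)` and then `scale(2.0, 3)` creates two cache entries, one per type signature; that is where a real JIT would hold two separately compiled machine-code versions.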
the @jit would take your python
function and actually just compile it
and replace the python function with a
another function that interacts with
this compiled function got it and it
would just do that and
it would you know we went from python
bytecode then we went to ast i mean
writing compiler is actually i learned a
lot about why computer science is taught
the way it is because
compilers can be hard to write there's
they use tree structures they use all
the concepts of computer science that
are needed and it's actually hard to to
you can it's easy to
to write a compiler and have it be
spaghetti code like the passes become
challenging and we ended up with three
versions of numba right numba got
written three times
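Numba's front end moved between Python bytecode and the AST. Both representations are available from CPython's standard library, and the sketch below (using only `ast`, `dis`, and a made-up `sinc_like` function) shows the difference: the AST keeps the tree shape of the source, while the bytecode is a flat list of stack-machine instructions. Exact opcode names vary between CPython versions, so the example only collects them.

```python
import ast
import dis

SOURCE = """
def sinc_like(x):
    if x == 0:
        return 1.0
    return x * 2
"""

# AST view: a tree that still mirrors the if/return structure of the source.
tree = ast.parse(SOURCE)
node_kinds = sorted({type(node).__name__ for node in ast.walk(tree)})

# Compile the same tree and pull out the function to inspect its bytecode.
namespace = {}
exec(compile(tree, "<sinc_like>", "exec"), namespace)
sinc_like = namespace["sinc_like"]

# Bytecode view: a flat sequence of stack-machine instructions.
bytecode_ops = [ins.opname for ins in dis.get_instructions(sinc_like)]
```

The `If` node in the AST has no single bytecode counterpart; in the bytecode it becomes a comparison followed by a conditional jump.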
what's uh
what programming language is numba
written in python
okay yeah python
so really yeah it's fascinating yeah so
python but then the whole goal of numba
is to translate python byte code
to llvm and so llvm actually does the
code generation in fact a lot of times
they'd say yeah it's super easy to write
a compiler if you're not writing the
parser nor the code generator right so
for people who don't know llvm is the
compiler itself so you're compiling to it
yeah it's really badly named low level
virtual machine which that part of it is
not used it's really llvm doesn't mean
that
i love chris but the name makes you think
that the virtual machine is what it's
all about it's actually the ir and the
library the the code generation that's
the real beauty of it the fact that what
i love about llvm was the fact that it
was a platform you could collaborate on
right instead of the internals of gcc or
the internals of the intel compiler like
how do i extend that
and it was a place we could collaborate
and
we were early i mean people had started
before it's a slow compiler like it's
not a fast compiler so
for
some kind of jits like jits are common
in languages because one uh every browser
has a javascript jit it does real-time
compilation of the javascript to machine
code
for people who don't know jit is just in
time compilation thank you yeah just in
time compilation they're actually really
sophisticated in fact i got jealous of
how much effort was put into the
javascript jits yes well it's kind of
incredible what they've done yes that
was good
i completely agree i'm very impressed um
but you know numba wasn't it was it was
an effort to make that happen with
python and so we used some of the money
raised for anaconda to do it and then we
also applied for this darpa grant and
used some of that money to continue the
development and then we used proceeds
from service projects we would do we
get consulting projects on
uh that we would then use some of the
profits to invest in numba so we ended
up with a team of two or three people
working on numba
it was fits and starts right and
ultimately the fact that we had a
commercial version of it also we were
writing so part of the way i was trying
to fund numba was to say well let's do
the free numba and then we'll have a
commercial version of numba called
numbapro and what numbapro did is it
targeted gpus so we had the very first
cuda jit and the very first
jit compiler that in 2013
2013 you could run not just a ufunc
on cpu but a ufunc on gpus and it was
awesome automatically parallelize it and
get 1000x speed and that's a that's an
interesting funding mechanism because
you know large companies
or larger companies care about
speed exactly in just this way so it's
it's it's exactly a really good way yeah
there's been a couple things you know
people will pay for one they'll pay for
really good user interfaces
right and so and so i'm always looking
for what are the things people will pay
for that you can actually adapt to the
open source infrastructure one is
definitely user interfaces the second is
speed yeah like a better run time faster
run time and then when you say people
you mean like a small number of people
pay a lot of money but then there's also
this other mechanism that that's true a
ton of people pay that's true a little
bit first i gotta
we mentioned anaconda we mentioned uh
friends family and fools
so uh anaconda is yet another so there's
a company but there's also a project
correct that is exceptionally impactful
uh in in terms of uh for many reasons
but one of which is bringing a lot more
people into the
um
into the community of folks who use
python so what is
anaconda
what is its goals yeah maybe what is
conda versus anaconda yeah i'll tell you
a little bit of the history of that
because anaconda we we wanted to do uh
we wanted to scale python because we you
know that was the peter and i had the
goal of when we started on anaconda we
actually started as continuum analytics
was the name of the company that started
it got renamed to anaconda in 2015.
but we uh we said we want to scale
analytics numpy's great pandas is
emerging
but these need to run at scale with lots
of machines the other thing we wanted to
do was make user interfaces that were
web we wanted to make sure the web did
not pass by the python community that we
had a way to translate your data
science to the web so those are the two
kind of technical areas we thought oh
we'll build products in this space and
that was the
idea
very quickly in but of course the thing
i knew how to do was to do consulting to
make money and to make sure my family
and friends and fools who invested
didn't lose their money so it's a little
different than if you take money from a
venture fund if you take money from a
venture fund the venture fund they want
you to go big or go home and they're
kind of like expecting 9 out of 10 to
fail or 99 out of 100 to fail
it's different i was i was on a
barbell strategy i was like i can't fail
i mean i may not do super well but i
cannot lose their money so i'm going to
do something i know can return a profit
but i want to have exposure to an upside
so that's what happened in anaconda we
didn't there was lots of things we did
not well in terms of that structure and
i've learned from since and do it
better but we've uh we did a really good
job of kind of attracting the interest
around the area to get good people
working and then funnel some money
on some interesting projects super
excited about what came out of our
energy there like a lot did
so what are some of the interesting
things
dask numba bokeh conda uh there was
datashader panel
holoviz
um these are all tools that are
extremely relevant in terms of helping
you build applications build tools build
you know faster code
um there's a couple of others jupyterlab
jupyterlab came out of this too
fascinating yeah
okay so uh well bokeh does plotting is
that okay is plotting so bokeh was one
of the foundational things to say i want
to do plot and python but have the
things show up in a web right that's
right that's right that's right so
plotting to me still with all due
respect to matplotlib and bokeh feels
like still an unsolved problem not
it's a big problem right because you're
i mean i don't know it's a visualization
broadly yes right i think we've got a
pretty good api story around certain use
cases of python plotting yeah but
there's a difference between static
plots versus interactive plots versus
i'm an end user i just want to write a
simple
you know pandas started the idea of
here's a data frame dot plot i'm
just going to attach plot as a method to
my object
which was a little bit controversial
right but works pretty well actually
because
there's a lot less you have to pass in
right you can just say here's my object
you know what you are
you tell the visualization what to do
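The design point here is that a plot method attached to the data object needs fewer arguments, because the object already knows its columns and labels. In pandas this is `DataFrame.plot`, which draws with matplotlib by default. The `TinyFrame` class below is a hypothetical, dependency-free sketch of the same idea that renders a text bar chart of column totals instead of a real figure.

```python
class TinyFrame:
    """Hypothetical minimal column store (not pandas)."""

    def __init__(self, columns: dict):
        self.columns = columns  # column name -> list of numbers

    def plot(self, width: int = 20) -> str:
        """Text bar chart of each column's total.

        The caller passes no data or labels: the object already knows them.
        """
        totals = {name: sum(vals) for name, vals in self.columns.items()}
        biggest = max(totals.values()) or 1  # avoid dividing by zero
        label_width = max(len(name) for name in totals)
        lines = []
        for name, total in totals.items():
            bar = "#" * round(width * total / biggest)
            lines.append(f"{name.ljust(label_width)} | {bar} {total:g}")
        return "\n".join(lines)


frame = TinyFrame({"sales": [3, 4, 3], "returns": [1, 1, 0]})
chart = frame.plot(width=10)
```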
so
that and there's things like that that
have not been you know super well
developed entirely but bokeh was focused
on interactive plotting so you could
it's a short path between interactive
plotting and
application dashboard application and
there's some incredible work that got
done there right and it was a hard
project because then you're basically
doing javascript and python so we we
wanted to tackle some of these hard
problems and try to and just go after
them we got some darpa funding to help
and it was super helpful it's a funny
story there we actually did two darpa
proposals but one we were five minutes
late for
and darpa has a very strict cutoff
window and so i we had two proposals one
for the bokeh and one for actually
numba and the the other work
which one were you late for
the foundational numerical work so
fortunately chris let us use some of the
money to fund still some of the other
foundational work
but it wasn't as
yeah his hands were tied he couldn't do
anything about it uh
that was a whole interesting story
so one of the incredible projects that
you worked on is conda yes so what is
that about yeah conda it was early on
like i said with scipy scipy was a
distribution masquerading as a library and we
talked about compiler
issues and trying to get the stuff
shipped and the fact that people can use
your libraries that they have it
so for a long time we've understood the
packaging problem in python and one of
the first things we did at what was
continuum analytics that became anaconda
was organize the pydata ecosystem in
conjunction with numfocus we actually
started numfocus uh uh with some other
folks in the community the same year we
started anaconda i said we're going to
build a corporation but also got to
reify the community aspect and build a
nonprofit so we do both of those can we
pause real quick and and uh can you say
what is pypi the python package index
like this whole
story yeah of packaging in python yeah
that's what i'm going to get to actually
this is exactly the journey i'm on so
to sort of explain packaging in python i
think it's best expressed through the
conversation i had with guido at a
conference where i said so
you know yeah packaging is kind of a
problem and guido said i don't ever care
about packaging i don't use it i don't
install new libraries i'm like i guess
if you're the language creator and if
you need something just put it put it in
the distribution maybe you don't worry
about packaging
but guido has never really cared about
packaging right and never really cared
about the problem of distribution
somebody else's problem and that's a
fair position to take i think as a
language creator
in fact there's a
philosophical question about should you
have different development packaging
managers should you have a package
manager per language is that really the
right approach i think there are some
answers of
it is appropriate to have development
tools and there's an aspect of
development tool that is related to
packaging and every language should have
some
story there to help their developers
create so you should have language
specific language tools development
tools that relate to package managers
but then there's a very specific user
story around package management that
those language specific package managers
have to interact with and currently
aren't doing a good job of that that was
one of the challenges of not seeing
that difference and it still exists in
the in the difference today conda always
was a user
i'm i'm going to use python to do data
science i'm going to use python to do
something
how do i get this installed it was
always focused on that so it didn't have
like a develop you know
classic example is pip has a pip develop
it's like i want to install this into my
current development environment today
now
conda doesn't have that concept because
it's not part of the story for people
who don't know
pip
is a uh python
specific packaging manage package
manager right that's
that's exceptionally popular that's
probably like the default thing it's the
default user yeah and so the story there
emerged because what happened is
in 2012 we had this meeting at the
google googleplex and guido was there to
come talk about what we're going to do
how we're going to make things work
better and
wes mckinney me peter peter has a great
photo of me talking to guido and he
pretends we're talking about this story
maybe we were maybe before but we did at
that meeting talked about it and asked
you know what
we need to fix packaging in python like
people can't get the stuff and he said
go fix it yourself i don't think we're
gonna do it
all right the origin story right there
all right you said okay you said to do
this ourselves so
at the same time people did start to
work on the packaging story in python
it just took a little longer so in 2012
kind of motivated by our training
courses we were teaching like how to
very similar to what you just mentioned
about your mother like it was motivated
by the same purpose like how do we get
this into people's hands and it's this
big long process it takes too expensive
it was actually hurting numpy
development because i would hear people
were saying don't make that change to
numpy because i just spent a week
getting my python environment and if you
change
if you change numpy i have to reinstall
everything and reinstalling such a pain
don't do it i'm like wait okay so now
we're not making changes to a library
because of the installation problem that
it'll cause for end users okay there's a
problem with pac there's a problem with
installation we've got to fix this so we
we we said we're going to make a
distribution of python and we'd
previously done that previously done
that at enthought
i wanted to make one that would give
away for free everyone could just get
like that was critical that we just get
it you know it wasn't tied to a product
it was just you could get it
and then we had constantly thought about
well do we just leverage rpm do we but
the challenge had always been we want a
package manager that works on windows
mac os x and linux the same
right and it wasn't there like you don't
have anything like that you have and for
people who don't know rpm is red hat
operating system specific packaging
correct it's operating system specific yes
so do you create the the design uh
questions do you create an umbrella
package manager then yes cross operating
system yes that was the decision and a
neighboring design question is do you
also create a package manager that spans
multiple programming languages correct
exactly that was the world we faced and
we decided to go multiple operating
systems multiple and programming
language independent because even python
in particular was important was scipy
has a bunch of fortran in it right and
scikit-learn has links to a bunch of c++
there's a lot of compiled code and
the python package managers especially
early on didn't even support that so in
2000 so we
we we released anaconda which was just a
distribution of libraries but we started
work on conda in 2012.
first version of conda came out in early
2013 2000 summer of 2013 and it was a
package manager so you could say conda
install scikit-learn in fact
that was the
scikit-learn was a fantastic project
that emerged kind of it was the the
classic example of the scikits i
talked about earlier about scipy being too
big to be a single library well what the
community had done is said let's make
scikits and there's scikit-image
there's scikit-learn there's a lot of
scikits
and it was a fantastic move you know the
community did i didn't do it
i was like okay that's a good idea i
didn't like the name i didn't like the
fact you typed scikit-image i was like
that's going to be simpler sklearn we
got to make this smaller
i don't like typing all this stuff from
imports so i was kind of a pressure that
way but i love the energy i love the
fact that they went out and they did it
and those people jarrod millman and then
of course gaël and and there's people
i'm not even naming that scikit-learn
really emerged this fantastic project
and the documentation around that is
also incredible this was incredible
exactly i don't know who did that but
they did a great job a lot of people in
inria a lot of people a lot of european
contributors um andreas there's some
andreas uh in the u.s there's a lot of
just people i just adore i think are
amazing people um
awesome use of scipy right i love the
fact that they were using scipy
effectively do something i love which is
machine learning
um but i couldn't install it because
there's so many pieces involved so many
dependencies right yes so our our use
case of conda was conda install
scikit-learn
right and it was the best way to install
scikit-learn in
2013 to really 2018
17 18.
pip finally caught up i still don't i
still think you should conda install
scikit-learn versus pip install
scikit-learn but you can pip install
scikit-learn the issue is the package they
created was wheels and pip does not
handle the multi-vendor approach they
don't handle the fact you have c++
libraries you're depending on they just
stop at the python boundary and so what
you have to do in the real world is you
have to
vendor you have to take all the binary
and vendor it now if a change happens in an
underlying dependency
you have to redo the whole wheel so
tensorflow is a good example but you
should not pip install tensorflow it's a
terrible idea people do it because
the popularity of pip many people think
of course that's how i install
everything python yeah this is one of
the big challenges
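The vendoring problem described in this exchange can be made concrete with a small model. In the sketch below, the package names and library sets are invented: each binary wheel carries its own copy of a native library, so a fix to that library means rebuilding every wheel that bundles it, whereas a manager that ships the library as its own package only replaces that one package (assuming the new build stays ABI-compatible with its dependents).

```python
# Hypothetical metadata: native libraries each binary wheel bundles ("vendors")
# inside its own archive. All package names are invented for illustration.
VENDORED_LIBS = {
    "pkg-linalg": {"libopenblas", "libgfortran"},
    "pkg-signal": {"libopenblas"},
    "pkg-pure": set(),  # pure Python: nothing vendored
}


def wheels_to_rebuild(changed_lib: str) -> list:
    """Vendored model: every wheel carrying its own copy must be rebuilt."""
    return sorted(pkg for pkg, libs in VENDORED_LIBS.items() if changed_lib in libs)


def shared_packages_to_replace(changed_lib: str) -> list:
    """Shared model: the library is its own package, so only it is replaced.

    Assumes the new library build stays ABI-compatible with its dependents.
    """
    return [changed_lib]
```

A single fix to `libopenblas` touches two wheels in the vendored model but one package in the shared model, and the gap grows with every additional wheel that bundles the same library.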
you know you take a github repository or
just a basic blog post the number of
times pip is mentioned over conda is
like 100x to one correct correct so they
just haven't that was increasing it
wasn't true early because pip didn't
exist like conda came first so but
that's like the long tail of the
internet documentation user generated so
that like you think how do i install
you google how do i install tensorflow
you're just not going to see conda in
that first page not correct exactly and
that today you would you would have in
2017 and it's sad because you see that
conda solves a lot of usability issues
correct like for especially super
challenging thing i don't know one of
the big pain points for me was uh just
on the computer vision side uh opencv
yeah installation that's a perfect example
yes i think is i don't know if
conda solved that conda has an opencv
package i don't know i i certainly know
pip has not solved
i mean there's complexities there
because right i actually don't know i
should probably know a good answer for
this but
you know if if you compile
opencv with certain dependencies
you'll be able to do certain things so
there's this kind of flexibility of what
you like what br what options you
compile with yes and i don't think it's
trivial to do that with conda or or
conda has a notion of variants
of a package you can actually have
different compilation versions of a
package so not just the version's
different but oh this is compiled with
these optimizations
so conda does have an answer to those
flavors that's flavors basically
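Variants mean the same package name and version can exist as several binaries built with different options. A minimal sketch of picking among them follows; the `Build` record, the `vision-lib` package, and the feature tags are all hypothetical. Conda itself encodes this in build strings and conda-build variant configuration rather than through this API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    """One hypothetical binary build: same name/version, different options."""
    name: str
    version: str
    features: frozenset  # compile-time options this binary was built with


AVAILABLE = [
    Build("vision-lib", "4.5", frozenset()),
    Build("vision-lib", "4.5", frozenset({"cuda"})),
    Build("vision-lib", "4.5", frozenset({"cuda", "contrib"})),
]


def select_build(name: str, version: str, required: frozenset = frozenset()) -> Build:
    """Return the build with the fewest features that has all required ones."""
    candidates = [
        b for b in AVAILABLE
        if b.name == name and b.version == version and required <= b.features
    ]
    if not candidates:
        raise LookupError(f"no {name} {version} build with {sorted(required)}")
    return min(candidates, key=lambda b: len(b.features))
```

Preferring the build with the fewest extra features is one reasonable tie-break; a real solver also weighs channel priority, dependencies, and what is already installed.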
as far as i know does not no no pip
generally
hasn't thought deeply about the binary
dependency problem right and that's
that's why fundamentally it it doesn't
work for the scipy ecosystem it barely
it you can sort of paper over it and
duct tape and it kind of works until it
doesn't it falls apart entirely so it's
been a mixed bag like
and i've been having lots of
conversations with people over the years
because again it's an area where if you
if you understand some things but not
all the things but they've done a great
job of
community appeal this is an area where i
think anaconda
uh as a company needed to do some things
in order to
make conda more community-centric right
and this is a i talk about this all the
time there's there's a balance between
you have every project starts what i
call company backed open source even if
the company is yourself it's just one
person just you know doing business as
but ultimately for products to succeed
virally and become massive influencers
they have to create they have to get
community people on board they have to
get other people on board so it has to
become community driven and a big part
of that is engagement with those people
empowering people governance around it
and there was and what happened with
conda in the early days pip emerged and
we did we did do some good things
conda-forge the conda-forge community is
sort of the community
recipe creation community mm-hmm the
condo itself i am still believe and and
you know peter is ceo of anakin he's my
co-founder i ran anaconda tell 2017 2018
is peter still in peter salanakande
right and we're still great friends we
are great friends we talk all the time
i love him to death there's a long story
there about like why and how when we
could cover in some some other podcast
perhaps yeah sort of a more maybe a more
business focused one but but um
this is one area where i think conda
should be
more community driven like he he should
be pushing more to get more community
contributors to conda and let let let
the
not like anaconda shouldn't be fighting
this battle yeah right it's actually uh
it's really the developers like you said
like help the developers yeah and then
they'll actually move us the right
direction but the problem i
have is many of the cool kids i know
don't use conda and that to me is
confusing it is confusing and it's
really a matter of conda has some
challenges first of all conda still
needs to be improved there's lots of
improvements we made and that it's that
aspect of wait who's doing this and the
fact that then the pypa really stepped
up like they were not solving the
problem at all and now they kind of got
to where they're solving it for the most
part and then effectively you could get
like conda solved a problem that was
there and it still does and it's still
you know there's still great things it
can do
but um and we still use it all the time
at quansight and with other clients but
with uh but you can kind of do similar
things with pip and docker
right so
especially with the web development
community that part of it again is this
is the there's a lot of different kind
of developers in the python ecosystem
and
there's still a lack of of some clear
understanding i go to the python
conference all the time and there's only
a few people in the pypa who get it and
then others who are just massively
trumpeting the power of pip but just do
not understand the problem yeah so one
of the obvious things to me from a mom
from a non-programmer perspective is the
across
operating system usability that's much
more natural so yeah they use
windows and just it seems much easier to
uh to recommend conda there but then it
you should also recommend it across the
board so i'll i'll definitely surf but
what i recommend now is a hybrid i do i
mean i have no problem is it possible to
use oh it is it is what i like build the
environment with pip with conda build an
environment with conda
and then pip install on top of that
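The hybrid workflow recommended here is order-sensitive: conda first for the heavy compiled packages, pip on top for the rest. The sketch below only builds the command strings; the `NATIVE_HEAVY` set is an assumption for illustration (drawn from packages named in this conversation), not an official list, and `some-pure-python-lib` is a placeholder name.

```python
# Assumed, illustrative split: packages with large compiled dependencies.
NATIVE_HEAVY = {"numpy", "scipy", "scikit-learn", "opencv", "tensorflow"}


def install_plan(packages):
    """Split packages into (conda_first, pip_after), preserving input order."""
    conda_first = [p for p in packages if p in NATIVE_HEAVY]
    pip_after = [p for p in packages if p not in NATIVE_HEAVY]
    return conda_first, pip_after


def as_commands(packages):
    """Emit conda before pip so pip only layers on top of the conda env."""
    conda_first, pip_after = install_plan(packages)
    commands = []
    if conda_first:
        commands.append("conda install " + " ".join(conda_first))
    if pip_after:
        commands.append("pip install " + " ".join(pip_after))
    return commands
```

For `["numpy", "opencv", "some-pure-python-lib"]` this yields a `conda install numpy opencv` step followed by `pip install some-pure-python-lib`.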
that's fine be careful about pip
installing
opencv or tensorflow or because if
somebody's installed that it's going to be
almost surely done in a way that can't be
updated that easily so install like the
big
packages the infrastructure yeah
and then the weirdos yeah the the like
the weird like implementation for some
uh i had a there's a cool
library i used
that
based on your location
and time of day and date tells you the
exact position of the sun relative
to the earth and it's just like a simple
library but yeah it's very precise and i
was like all right but that was
that was uh in this episode the thing
they did really well is python
developers who want to get their stuff
published they you have to have a pip
recipe yeah right i mean even if it's
you know the challenge is there's a key
thing that needs to be added to pip just
simply add to pip the ability to defer
to a system package manager like because
it's you know recognize you're not going
to solve all the dependency problem
so let like give up and allow the allow
system packager to work that way
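A minimal sketch of the detection step such a pip feature would need, under stated assumptions: the function is hypothetical (pip has no such hook in this form), a conda environment is recognized by a `conda-meta` directory under its prefix, and a Red Hat-family system by `/etc/redhat-release`. The filesystem checks are injectable so the logic can be exercised without touching a real system.

```python
import os
import sys


def detect_system_installer(prefix=sys.prefix, isdir=os.path.isdir, exists=os.path.exists):
    """Hypothetical hook: which package manager should pip defer to?

    Heuristics (assumptions, not pip behaviour):
    - a conda environment has a "conda-meta" directory under its prefix;
    - a Red Hat-family system ships /etc/redhat-release.
    """
    if isdir(os.path.join(prefix, "conda-meta")):
        return "conda"
    if exists("/etc/redhat-release"):
        return "rpm"
    return "pip"  # nothing to defer to: pip installs the package itself
```

Checking for conda before rpm mirrors the example given here: inside an anaconda install the conda environment, not the operating system, owns the packages.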
if anaconda's installed and it has pip it
would default to conda to install this
stuff but red hat rpm would default to
rpm to install but it's all more things
like that's the that's a key
not difficult but somewhat where some
work feature needs to be added that's an
example of something like i've known we
need to do it i mean it's where i wish i
had more money i wish i was more
successful in the in the business side
trying to get there but i wish my you
know my family friends and fools community
that i know was larger was larger and
had more money because i know tons of
things to do effectively with more
resources
but
you know i have not yet been successful
at channeling
tons of it you know some you know i'm
happy with what we've done
we we've created again at quansight
what we created to get anaconda started
we created continuum analytics and it kind of
started done it again with quansight
super excited by that by the way it took
three years to do it what is quansight
what is its mission yeah we've talked a
few times about different fascinating
aspects of it but let's like big picture
what is big picture quansight quansight is
uh its mission is to connect data to an
open economy so it's basically
consulting the pydata ecosystem right
it's a consulting company and what i've
said when i started it was we're trying
to create products people and technology
so it's divided into two groups and a
third one as well the two groups are a
consulting services company that just
helps people do data science and data
engineering and data management
uh better and more efficient full stack
like full stack data science full thing we'll
help you build infrastructure if
you're using jupyter we need we do staff
augmentation need more programmers help
use dask more effectively help use gpus
more effectively just basically a lot of
people need help so we do training as
well to help people you know both uh
immediate help and then get get learn
from somebody uh we've added a bunch of
stuff too we've kind of separated some
of these other things into another
company called open teams that we
recently started one thing i loved
what we did at anaconda was creating a
community innovation team and so i want
to replicate that this time we did a lot
of innovation in anaconda i wanted to do
innovation but also
contribute to the projects that existed
like create a place where maintainers so
the scipy and numpy and numba and all
these projects we already started can
pay people to work on them and keep them
going so that's labs quansight labs is a
separate organization it's a non-profit
mission
the profits of quansight help fund it in
fact every project that we
have at quansight a portion of the money
goes directly to quansight labs to help
keep it funded so we've gotten several
mechanisms we keep quansight labs funded
and currently so i'm really excited
about labs because it's been a mission
for a long time what kind of projects
are within labs so labs is working to
make the software better like make numpy
better make scipy better make it it's it
only works on open source
so
you know if somebody wants to so you
know companies do we have a thing called
the community work order we call it if a
company says i want to make spyder
better okay cool
um you can pay for a month of
a developer of spyder
or developer of numpy or developer of
scipy you're not you can't tell them
what you want them to do you can give
them your priorities and things you wish
you wish existed and they'll work
on those priorities with the community
to get what the community wants and what
emerges what the community wants is
there some aspect on the consulting side
that is helping
as we were talking about morphology and
so on is there specific applications
that are particularly like
driving sort of inspiring the need for
updates to science correct absolutely
absolutely gpus are absolutely one of
those and new hardware beyond gpus i
mean
tesla's dojo chip i'm hoping we'll have
a chance to work on that perhaps
um
things like that are definitely driving
it the other thing is driving is
scalable like speed and scale
uh how do i write numpy code or numpy
like code if i want it to run across a
cluster you know that's dask or maybe
it's ray i mean there's sort of ways to
do that now or there's modin and there's
so pandas code numpy code scipy code
scikit-learn code that i want to scale
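Scaling NumPy-style code across a cluster usually means splitting the data into chunks, running the same operation on each chunk, and combining the partial results, roughly the idea behind `dask.array`'s chunked arrays. The sketch below is not Dask, Ray, or Modin code: it uses the standard library's `ThreadPoolExecutor` as a stand-in for cluster workers, and because of the GIL it shows the structure of the computation, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, chunk_size):
    """Split a sequence into consecutive chunks; the last one may be shorter."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def partial_sum_of_squares(chunk):
    """The per-chunk work each worker does independently."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(values, chunk_size=4, workers=4):
    """Map the per-chunk reduction over a worker pool, then combine."""
    chunks = chunked(values, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)
```

The pattern only works because a sum of squares can be computed per chunk and then added; operations without that property need a different combine step.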
so that's one big area have you gotten a
chance to chat with andrej and elon about
partic because like no i would love to
by the way i have not but i'd love to i
just saw their tesla ai day uh video
yeah super exciting so that's one of the
you know i love great engineering
software engineering teams and
engineering teams in general and they're
doing a lot of incredible stuff with
python they're like they are
revolutionizing so many aspects of the
machine learning pipeline i agree that's
operating in the real world and so much
of that is python like you said the guy
running you know andrej karpathy running
autopilot is tweeting about optimization
of
in fact we have at quansight we've been
fortunate enough to work with facebook
on pytorch directly so we have about 13
developers at quansight some of them are
in labs working directly on pytorch on
torch right so i basically started
quansight i went to both tensorflow and
pytorch and said hey i want to help
connect what you're doing to the broader
scipy ecosystem because i see what
you're doing we have this bigger mission
we want to make sure we don't you know
lose energy here so uh and facebook
responded really positively and
i didn't get the same reaction not yet
not yet i love the folks
so i really love the folks at tensorflow
too they're fantastic i think it's the
just
how it integrates with their business i
mean like i said there's a lot of
reasons just the timing the integration
with their business what they're looking
for they're probably looking for more
users and i was looking to kind of have
some development effort and they
couldn't receive that as easily i think
so i'm hoping i'm really hopeful uh and
love love the people there
what's the idea behind open teams so
open teams i'm super excited about open
teams because it's one of the
i mentioned my idea for investing
directly in open source so that's a
concept called faiross but one of the
things when we started quansight we
knew we would do is we developed
products and ideas and new companies
might come out at anaconda this was
clear right at anaconda we did so much
innovation that like five or six
companies could have come out of that
and we just didn't structure it so they
could but in fact they have you look at
dask
there's two companies coming out of dask
you know bokeh could be a company
there's like lots of companies that
could exist off the work we did there
and so i thought oh here's a recipe for
an incubation
a concept where we could actually spawn
new companies and new innovations
then the idea has always been
well money they earn should come back to
fund the open source project so labs
is you know i think there should be a
lot of things like quansight labs i think
this concept is one that scales you
could have a lot of open source research
labs
along the way so in 2018 when the bigger
idea came of how to make open source
investable i said oh i need to
create a venture fund so we
created a venture fund called quansight
initiate at the same time it's an angel
fund really it's you know we started to
learn that process how do we actually do
this how do we get lps how do we
actually go in this direction and build
a fund
and i'm like every venture fund should
have an associated open source research
lab there's just no reason not to like our
venture fund
the carried interest
portion of it goes to the lab it
directly will fund the lab that's
fascinating so you use the power
of the organic formation of teams in the
open source community and then
naturally
that leads to a business that can make
money yeah correct and it always
loops back to the open
source exactly it's a
natural fit there's something there
there's absolutely a repeatable pattern
there and it's also beneficial because
i have natural connections to
the open source if i have an open source
research lab
they'll always be out there
talking to people and so
we've had a chance to talk to a
lot of early stage companies and
our fund focuses on the early stages so
quansight has the services the lab the
fund right in that process a lot of
stuff started to happen like oh you know
we started to do recruiting and support
and training and i was starting to build
a bigger sales team and marketing team
and people besides just developers and
one of the challenges with that is you
end up with different cultural aspects
you know developers
you know there's a
in any company you go to you kind of go
look is this a business-led company a
developer-led company do they kind of
coexist how are they what's the
interface between them there's always a
bit of tension there like we were
talking about before you know what is
the tension there with open teams i
thought wait a minute we can actually
just create like this concept of
quansight plus labs well while
it's specific to the pydata ecosystem
the concept is general for all open
source so open teams emerged as oh we can
create a business development company
for
many many quansights like thousands of
quansights and it can be a marketplace
to connect essentially be the enterprise
software company of the future if you
look at what enterprise software wants
from the customer side and during this
journey i've had the chance to work with
and sell to lots of companies exxon and
shell and jp morgan bank of america like
the fortune 100
and talk to a lot of people in
procurement and see what are they buying
and why are they buying so you know i
don't know everything but i've learned a
lot about oh what are they really
looking for and they're looking for
solutions
they're constantly given products from
enterprise software
here's open source-led enterprise
software now i buy it and they have to
stitch it together into a solution
open source is fantastic for gluing
those solutions together
so
whereas they keep getting new platforms
they're trying to buy what most
enterprises want is tools that they can
customize that are as inexpensive
as they can be yeah and so you almost want
to maintain the connection to the open
source because that's yes so open teams is
about solving enterprise software
problems brilliant brilliant idea by the
way with that connection but we do it honoring
the topology we don't hire all the
people we are a network connecting the
sales energy and the procurement
energy and we on the business
side get the deals closed and then have
a network of partners like quansight and
others who we hand the deals to right to
actually do the work and then we
i feel like we
have to maintain some level of quality
control so the client can rely on open
teams to ensure the deliveries it's not
just here's a lead go figure that out
but no we're going to make sure you get
what you need
right by the way it's such a skill and i
don't know if i have the patience or ever will
have the patience to talk to the
business people
or more specifically i mean there's all
kinds of flavors of business people
like yeah
marketing people
there's a challenge i hear what you're
saying because i've had the same
challenge yeah and it's true
sometimes you think okay wait
this is way overwrought yeah so you have
to become an adult you have to because
the companies have needs they have
ways to make money and they
also want to learn and grow and yet it's
your job to kind of educate them on the
best way like the value of open source
for example right and i'm really
grateful for all my experiences over the
past 14 years understanding that side of
it and still learning for sure but not
just understanding companies but
also dealing with marketing
professionals and sales professionals
and people that make a career out of
that understanding what they're thinking
about and also understanding let's
make this better like we can really make
a place like open teams which i see as the
transmission layer between companies and
open source communities
producing enterprise software solutions
like eventually we want to like today
we're taking on sas and matlab and tools
that we know we can replace for folks
really anytime you have a software tool in
an organization
where you have to do a lot of
customization to make it work for you
because you're not just buying this
thing off the shelf and it works it's
like okay you buy the system then you
customize a lot usually with expensive
consultants to actually make it work for
you
all of those should be replaced by open
source foundations with the same
customers
such important work such important work
in these giant organizations they're
doing exactly that taking some
proprietary software and hiring a huge
team of consultants to customize it
and then that whole thing gets
outdated quickly correct and so i mean
that that's brilliant right the one
solution to that
is kind of what
tesla's doing a little bit which is
basically build up a software
engineering team yeah like build a team
from scratch from scratch and companies
are doing it well that's what they're
doing right now yeah right exactly
that's okay and you're creating a
topology for some of that like right you
just don't have to do it that's not the
only answer yeah right and so other
companies can access this new more
flexible that's why we really
say open teams is the future of
enterprise software
um we're still early like this idea
just percolated over the past year as
we've kind of grown quansight and
realized the extensibility of it uh we
just finished our seed round uh the
work to help you know get more sales
people and then push the
messaging correctly and there's lots of
tools we're building to make this easier
like we want to automate the processes
we feel like a lot of the power is the
efficiency of the sales process there's
a lot of wasted energy
in small teams
and the sales energy to get into large
companies and make a deal there's a lot
of money spent on that stuff creating
the tools and processes
makes that super seamless so a single
company can go oh i've got my contract
with open teams we've got a subscription
they can make that
procurement seamless and then the fact
they have access to the entire open
source ecosystem and
we have a part of our work that's
embracing open source ecosystems and
making sure we're doing things useful
for them we're serving them and then
companies making sure they're getting
solutions they care about and then
figuring out which targets we have
you know yeah we're not taking on
all of open source all of enterprise
software yet but well this
feels like the future the idea and the
vision is brilliant uh can i ask you uh
why do you think microsoft bought github
and what do you think is the future
great point i thought it was a brilliant
move i think they did it because microsoft
has always had a developer-centric
culture
like one thing
microsoft's always done well is
understand that their power is
developers right it's been
ballmer didn't necessarily make a
good meme about how he approached that
but
they're broadening that i think that's
why because they recognize github is
where developers are at right and so but
do they have a vision like an open teams
type of situation right so yeah
are they just basically throwing money
at developers to show their support i
think so without uh a topology like you
put it like a way to leverage that
like to give developers actual money
right i don't think so i think
they're still an enterprise
software company and they make a bunch
of money they make a bunch of games they
have a big they're a big company they
sell products i think part of it is they
know there's opportunity to make money
from github right there's definitely a
business there you know uh to sell to
developers or to sell to people doing
development i think there's part of that
i think part of it is also they
definitely wanted to recognize
that you need to value open source
to get great developers which
is an important concept that was
emerging over the past 10 years that you
know
pydata we were able to convince jp
morgan to support pydata because of
that fact right that was where the money
for them putting a couple hundred
thousand into supporting pydata for
several conferences came from they want
developers and they realized that
developers want to participate in open
source so enterprise software folks
don't always understand how their
software gets used having spent a lot of
time on the floors at jpmorgan at
shell at exxon mobil you see oh these
companies have large development teams
and then they're kind of
dealing with what's being delivered
to them so i really feel kind of a
privilege that i had a chance to learn from
some of these people and see what
they're doing and even work
alongside them uh you know as a
consultant uh
using open source and trying
how do we make this work inside of our
large organization some of it is
actually for a large organization some
of it is messaging to the world that you
care about developers and you're
cool yep you care like for
example like ford because i talked to
them like car companies right they
want to attract you know you want
to take on tesla and autopilot you want
to take that on right and so
what do you do there you show that
you're cool like you try to show
off that you care about developers and
they have a lot of trouble doing that
and like one way i think like ford
should have bought github just to
show off better yeah yeah like these old
school companies and it's in a lot of
good point a lot of different industries
there's probably different ways it's
probably an art
to showing that you care about developers and the
developers it's exactly what
you like
for example
just spitballing here but like ford or
somebody like that could give
a hundred million dollars to the
development of numpy and
uh like literally look at the
top most popular projects in python
and just say
we're just gonna give money right like
that's gonna immediately make you cool
they could actually yeah and in fact
we set up numfocus to make it easy
yeah but the challenge is also you
have to have some business development
like it's a bit of a
seeding problem right and you look at
how i've talked to the folks at linux
foundation i know how they're doing it i
know how
in starting numfocus because
we had two babies in 2012 one was
anaconda one was numfocus right and
they were both important efforts they
had distinct journeys and super grateful
that both existed and still grateful
both exist um
but there's different energies in
getting donations
than there is in getting
um this is important to my business like
i'm selling you something that
this is not a donation it's a sale it's i'm gonna
make money this way if you can tie
the message to an roi for
the company it becomes more effective
it's much more effective right so
and there are rational arguments to make
i've tried to have conversations with
marketing especially marketing departments
like very early on it was clear to me
that
oh you could just take a fraction of
your marketing budget and just spend it
on open source development and you get
better results from your marketing
like because how do those can i sorry
i'm gonna try not to go
here what have you learned from the
interaction with the marketing folks on
that kind of because
you gave a great example of something
that would obviously be a much better
investment in terms of marketing
supporting open source projects the
challenge is not dissimilar from the
challenge you have in academia at the
different colleges right knowledge gets
very specific and very channeled right
and so people get a lot of
learning in the thing they know about
and it's hard then to bridge that and to
get them to think differently enough to
have a sense that you might have
something to offer because it's
different it's like well how do i
implement that how do i what do i do
with that like which budget do i
take from do i slow down my spend on
google ads or my spend on facebook ads
or do i not hire a content creator and
say like there's an operational
aspect to that so you have
to be the cmo
right or the ceo you have to get the
right level so you have to go to a higher
position level right because they care
about this right or they won't
know how right right and because you
can also do it very clumsily yeah right
and i've seen it because you
actually have to honor and recognize the
people you're going to and the
fact that if you just throw money at
them it could actually create more
problems can i just say this is not you
saying it can i just because i just need to
say this
i've been very surprised
how often marketing people are terrible
at marketing
i feel like the best marketing is doing
something novel and unique that
anticipates the future
it feels like so much of the marketing
practice
is like what they took in school or
maybe they're studying what was the
best thing that was done in the past
decade
and they're just repeating that over and
over as opposed to innovating like
taking the risk to me marketing great
point is taking the big risk that's a
great being the first one to risk yeah
and there's an aspect of data
observation from that risk right that's
that's you i think because you've shared
what you're doing already but it
absolutely it's about i think it's
content like there's this whole world of
content marketing that you could almost
say well yeah
you can get
inundated with stuff that's not relevant
to you whereas what you're saying
would be highly relevant and highly
useful and highly beneficial yeah
but it's risky i mean that's
why sort of uh there's a lot of
innovative ways of doing that tesla is an
example of people that basically don't
do marketing uh they do marketing in a
very like
it's like elon hired a person who's just
good at twitter for running tesla's
twitter account
you know right right that's exactly what
you want to be doing you want to be
constantly innovating in uh right
there's an aspect of telling i mean i've
definitely seen
people doing great work where they're not
talking about it like i would say that's
actually a problem i have right now with
quansight labs quansight labs has been doing
amazing work really excited about it we
have not been talking about it enough we
haven't been and there's different ways
to talk about it there's different
channels in which to
communicate right there's also
like
i'll just throw some um shade at
companies i love
uh so for example irobot i just had a
conversation with them they make roombas
sure and uh i love the
incredible robots but like every time
they do commercials like advertisements not
advertisement but like marketing type
stuff it just looks so corporate
and to me i may be wrong in
the case of irobot i don't know but to
me when you're talking about engineering
systems
it's really nice to show off the magic
of the engineering and the software and
all the geniuses behind this
product and the tinkering and like the
raw authenticity of what it takes to
build that system versus uh the
marketing people who want to have like
pretty people like standing there all
pretty with the robots like moving
perfectly so to me there's some aspect
it's like speaking to the hackers you
have to throw some
uh bones some
some care towards the engineers the
developers because
there's some aspect one for the hiring
but two there's an authenticity to that
kind of communication that's really
inspiring to the end user as well like
if they know that brilliant people the
best in the world are working at your
company they start to believe that that
product that you're creating is really
it's interesting because your initial
reaction would be wait there's different
users here why would you do that to you
know my wife bought a roomba but
she you know loves developers
she loves me but she doesn't
care about
that yeah
so essentially what you said is actually
the authenticity because everyone has a
friend everyone knows people there's
word of mouth i mean word of
mouth is so so yeah exactly
because i think it's the lack of
that realization there's this halo
effect right and it also influences your
general marketing interesting for some
stupid reason i do have a platform and
it seems that the reason i have a
platform many others like me millions of
others is like the authenticity and like
we get excited naturally about stuff
yeah and like i don't want to get
excited about that irobot video because
it's as boring as marketing as corporate
as opposed to
i wanted to do something fun this is
me like a shout out to irobot is they're
not letting me get into the robot
yeah well there's an aspect of that they
could be benefiting from a
culture of modularity like
add-ons and right like that could
actually dramatically help you've
seen that over history i mean apple is
an example of a company like that or
the like i can see what your
point is is that you have something that
needs to be
adopted broadly the
concept needs to be adopted broadly and
if you want to go beyond this one device
you need
to engage this community yeah and
connecting to the open source as you
said i got to ask you
you're a programmer
uh one of the most impactful programmers
ever
you've led many programmers you lead
many programmers
what are some from a programmer
perspective
what makes a good programmer what makes
a productive programmer is there
advice you can give to be a great
programmer great great question and
there are times in my life i'd probably
have answered this even better than i
can today because i've
thought about this numerous times
but right now i've spent so much
time recently hiring sales people that
your mind is
on something else but i
reflected on the past and also uh you
know i have some really the only way i
can do this i have some really great
programmers that i work with who lead
the teams that they lead and my
goal is to inspire them and
hopefully help them encourage them and
help them encourage their
teams i would say there's a number of
things couple things one is um curiosity
things couple things one is um curiosity
like you i think a programmer without
curiosity
is uh mundane like you'll lose interest
you don't do your best work so it's sort
of it's an affect it's sort of are you
you have some curiosity about things
i think two don't try to do everything
at once recognize that you're you know
we're limited as humans you're limited
as a human and each one of us are
limited in different ways you know we
all have our different strengths and
skills so it's adapting the art of
programming to your skills one of the
things that always works is to limit
what you're trying to solve
right so um
if you're part of a team usually maybe
somebody else has put the architecture
together and they've given a
portion to you
if you're not part of a team it's sort of
breaking down the problem into smaller
parts is essential for you to make
progress it's very easy to take on a big
project and try to do it all at once and
you get lost and then you do it badly
and so
thinking about you know um very
concretely what you're doing
defining the inputs and outputs
defining what you want to get done
um even just talking about that and
writing down
before you write code just what are you
trying to accomplish and being very
specific about it
uh really really helps
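One way to practice the advice about defining inputs and outputs before writing code is to write the signature, the docstring and an example first, and only then fill in the body. The function `moving_average` below is a hypothetical example chosen only to illustrate that habit; the doctest in its docstring doubles as an executable statement of what the code must do.

```python
from typing import List


def moving_average(values: List[float], window: int) -> List[float]:
    """Return the mean of every full run of `window` consecutive values.

    Inputs:  a list of numbers and a window size of at least 1.
    Output:  len(values) - window + 1 means, or [] if values is too short.

    >>> moving_average([1.0, 2.0, 3.0, 4.0], 2)
    [1.5, 2.5, 3.5]
    >>> moving_average([1.0], 3)
    []
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    # The body is written only after the contract above was settled.
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


if __name__ == "__main__":
    import doctest

    doctest.testmod()  # runs the examples written in the docstring
```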
using other people's work
right
don't be afraid that somehow
you're
like you should do it all like
nobody does stand on the shoulders
of giants
but don't just copy and paste
that's particularly relevant in the era
of codex and the uh you know the
auto-generated code which is essentially
i see as an indexing of stack overflow
right exactly it's like it's a
search engine it's a search engine over
stack overflow basically so it's not i
mean we've had this for a while yeah but
really you want to cut and paste but
not blindly like
absolutely i've cut and pasted to
understand but then you understand oh
this is what this means oh this is what
it's doing and understanding as much as
you can so it's critical that's where
the curiosity comes in if you're just
blindly cutting and pasting you're not
going to understand and so understand
and then you know
be
uh be sensitive to hype cycles
right every few years there's
always a oh test-driven development is
the answer oh object-oriented is the
answer oh
there's always an answer you know agile
is the answer
be cautious of jumping onto a hype cycle
likely there's signal like there's
a thing there that's actually
valuable you can learn from but it's
almost certainly not the answer to
everything you need what lessons do you
draw from having created numpy and
scipy like
in service of sort of answering the
question of what it takes to be a great
programmer and giving advice to people
yeah how can you be the next person to
create a scipy yeah so one is listen
to uh to people that
have a problem
right which is everybody right but
listen and listen to many and
try to uh then do
like you're gonna have to do
an experiment you know do fall down
don't be afraid to fall down don't be
afraid the first thing you do is
probably going to suck and that's okay
right
honestly i think iteration is
the key to innovation
and it's almost that
psychological um
hesitation we have to just
uh iterate like yeah we know
it's not great but next i want it to
be better i mean just keep learning
and keep breathing and keep improving so
it's an attitude um and then
it does take intense concentration
right good things don't happen just
it's not quite like tiktok or like
facebook you know you can't scroll your
way to good programming right there are
you know sincere like hours of
deep don't be afraid of the deep
problem like often people will run away
from something because oh i can't solve
this and you might be right but give
it an hour give it a couple of hours and
see
and you know just um five minutes is
not gonna give you that was it
lonely when you were building scipy
and numpy hugely yeah absolutely lonely
in the sense of
you have to have an inner drive
and that inner drive for me always comes
from i have to see that this
is right
in some angle i have to believe
that this is the right approach the right
thing to do
with scipy it was like oh yeah the world
needs
libraries in python clearly python is
popular enough with enough influential
people to start and it needs more
libraries
so that is a good in itself so i'm going
to go do that good so find a good find a
thing that you know is good and
just work on it um so that has to happen
and you kind of have to have
enough realization of your mission
to be okay with the naysayers or the fact
that nobody joins you up front in fact
one thing i've talked to people about a lot
i've seen a lot of projects come and
some fail like not everything i've done
has actually worked perfectly i've tried
a bunch of stuff that okay that didn't
really work or this isn't working and
why
but you see the patterns and one of the
key things is
you can't even know
for
six months i say 18 months right now if
you're starting a new project you got to
give it a good 18-month run before you
even know if the feedback's there like
you're not gonna know in six months
you might have the perfect thing but six
months from now it's still kind of
emerging so give it time because you're
dealing with humans and humans have an
inertial
inertia energy that just doesn't change
that quickly
so let me ask a silly question but uh
you know
like you said you're focused on the
sales side of things currently but
you know back when you were actively
programming maybe in the 90s you talked
about ides
what's the
setup that you have that brings you
joy keyboard number of screens yeah well
linux i do still like to program
sometimes not as much as i used to i
have two projects i'm super interested
in
trying to find funding for them trying
to figure out some good teams for them
but i could talk about those um but
yeah what i use i'm an emacs guy great
thank you
the superior editor everybody i've
i don't often delete tweets but one
of the tweets i deleted was when i said
emacs was better than vim and then the
hate i got i was like i'm walking
away from this
i do too i don't push it i mean i'm
just joking of course yeah
exactly it's kind of like but people do
take the editor seriously they tie it
to their life
it is
but uh there's
something beautiful to me about emacs
but for people that love vim
there's something beautiful to them i
mean i do use vim for quick editing
like command line if i say quick editing
i will still sometimes use it but not
much like simple
corrections single edited characters so
when you were developing scipy you were
using emacs yep scipy and numpy were all written
in emacs on a linux box
and uh
cvs and then svn version control git
came later like git has i love
distributed branch stuff i think git is
pretty complicated but i love the
concept and uh also of course github
uh and then gitlab made git
definitely consumable
um but that came later did you ever
touch lisp at all like what were
your emotional feelings about the
parentheses great question so i find
myself appreciating this today much more
than i did early on because when i came to
programming i knew programming but i was
a domain expert right and to me the
parentheses were in the way
it's like wow
it just gets in
the way of my thinking about what i'm
doing so why would i have all these
right
um that was my initial reaction to it uh
you know now as i appreciate kind of the
structure that kind of naturally maps to
logical thinking about a program
i can appreciate them right and
actually you could
create editors that make it not so
problematic right honestly um yeah so i
actually have much more appreciation
of lisp and things like clojure and
there's hy which is a python you know
a lisp that compiles to python byte
code
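The point that the parentheses map naturally onto the structure of a program can be illustrated with a tiny s-expression reader: each parenthesized group becomes exactly one node of a nested list, which is already a tree that code can walk. This is a hypothetical Python toy that supports only `+` and `*` over integers; it is not how hy or any real lisp is implemented.

```python
from typing import List, Union

Expr = Union[str, int, list]


def tokenize(source: str) -> List[str]:
    """Split an s-expression into '(', ')' and atom tokens."""
    return source.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: List[str]) -> Expr:
    """Build a nested list from tokens; each '( ... )' becomes one sub-list."""
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # discard the closing ')'
        return node
    # Atoms: integers become ints, everything else stays a symbol string.
    return int(token) if token.lstrip("-").isdigit() else token


def evaluate(expr: Expr) -> int:
    """Evaluate + and * over integers by walking the tree recursively."""
    if isinstance(expr, int):
        return expr
    op, *args = expr
    values = [evaluate(a) for a in args]
    if op == "+":
        return sum(values)
    if op == "*":
        result = 1
        for v in values:
            result *= v
        return result
    raise ValueError(f"unknown operator: {op}")


tree = parse(tokenize("(+ 1 (* 2 3) 4)"))
print(tree)            # ['+', 1, ['*', 2, 3], 4]
print(evaluate(tree))  # 11
```

The parse step does almost no work because the parentheses already spell out the tree, which is one reason lisp-family editors and tools can manipulate code structurally.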
um i think it's challenging like
typically
these languages are you know i even saw
a whole data science programming system
in lisp that somebody created which is
you know cool but again i think
it's the lack of recognition of the fact
that there exist what i call occasional
programmers yes people who are never going
to be programmers for a living they
don't want to have all this cuteness
in their head they just want
it's why basic you know microsoft had
the right idea with basic in terms of
having that be the language of
visual basic the language of excel and
um sql server they should have
converted that to python 10 years ago
the world would be a better place
if they had but
there's also uh there's a beauty and a
magic to the history behind a language
and lisp you know some of the most
interesting people in the history of
computer science and artificial
intelligence have used lisp so yes you
feel well it's back to that language
when you have a language you
can think in it yeah and it helps you
think about it it attracts a certain kind
of people that think a certain kind of
way and then that's there okay so
what about like small laptop with a tiny
keyboard or is there like
a screen you know good question i've
never gotten into the many screens to be
honest i mean and
maybe it's because in my head i kind of
just swap between windows
like
partly because i guess i really can't
process three screens at once anyway
like
i'm looking at one and i just
flip you know i flip an application open
so
where it's really helpful is actually
when i'm trying to you know here's data
and i want to input it from here right
that's the only time i really need
another screen so now because you're
both a developer who leads developers but
then there's also these businesses and
there's sales people you're working with
large companies operations people hiring
people yeah the whole thing which
operating system is your favorite
at this point so linux
was the earliest so yeah i love
linux on the server side and in the
early days i had my own linux
desktop um i've been on mac laptops for
10 years now yeah this is what
leadership looks like
you switch to mac
okay great yeah pretty much i mean
just the fact that i had to do
powerpoints i had to do presentations
and you know plug in i just couldn't
mess with plugging in laptops and it
wouldn't project and so uh you mentioned
also quansight labs and things like that uh
can you give advice on how to hire great
programmers
and great people yeah i would say
produce an open source project
yeah get people contributing to it and
hire those people yeah
i mean you're doing it sort of uh
you may be perhaps a little biased but
that's probably 100 percent
really good advice
i find it hard to hire i still find it
hard to hire like
in terms of i don't think like it's not
hard to hire if i've worked with
somebody for a couple of weeks
but an hour or two of interviews
i have no idea so that
instinct that radar of knowing if someone's
good or not you've found that
you're still not able to it's really
hard i mean the resume can help but
again the resume is like a presentation
of the things they want you to see not
the reality
and there's also um
you know you have to understand what
you're hiring for there are different
stages and different kinds of skills and
so it isn't just
one of the things i talk a lot about
internally at my companies is that the
whole idea of measuring ourselves
against a single axis is flawed
because we're not it's a
multi-dimensional space and how do you
order a multi-dimensional space there
isn't one ordering so this whole idea
you you immediately have projected into
a thing when you're talking about hiring
or best or worst or better or not better
so what is the thing you're actually
needing
and you can even hire for that there is
such a thing generally i really value
people who have the affect
they care about open source like so in
some cases their affinity to open source
is simply a kind of a filter of an
affect
however i have found this interesting
dichotomy between
open source contributors and
product creation
i don't know if it's fully true
but there does seem to be the more
uh the more experience the more affect
somebody has in an open source community
the less ability they have to actually produce
product
but the other one's kind of true too the
more product focused star i find a lot
of people talk to a lot of people who
produce really great products and they
they have a they're looking over the
open source communities kind of wanting
to participate and play but they've
played here
and they do a great job here and then
they
don't necessarily have some of the same
i don't think that i don't think that's
entirely necessary i think part of it is
cultural how that's how they've emerged
except because one of the things that
open source communities often lack is
great product management like some
product management energy that's
brilliant but you want both of those
energies in the same place together yes
you really do and so it's a lot of it's
creating these teams of people that have
these needed skills and attributes that
are
hard and so
so one of the big things i look for is
somebody that fundamentally recognizes
their need to learn to learn like one of
the values that we have in all of
the things we do is learning like
if somebody thinks they know it all
they're going to struggle and some of
that is just
there's more basic things like humility
just
being humble in the face of all the
things you don't know and that's like
step one of learning that's step one of
learning right and you know i've spent a
lot of time learning right other people
spend a lot more time but i've spent a
lot of time learning i mean my whole
goal was to get a phd because i loved
school and i wanted to be a scientist
and then
what i found is what's been written
about elsewhere as well is the more i
learned the more i didn't know the more
i realized man i
i know about this
but this is such a tiny thing in the
global scope of what i might want to
know about so
i need to be listening a whole lot
better than i am just
talking
that's changed a little bit actually my
wife says that i used to be a better
listener now that i have i'm so full of
all these ideas i want to do she kind of
says you got to give people time to talk
so you you've uh succeeded on multiple
dimensions so one is the
tenure-track faculty
uh the other is just creating all these
products then building up the businesses
then working with businesses
uh do you have advice for young people
today in high school in college of how
to
live a life
as a non-linear and as successful as
yours
a life that could be uh they could be
proud of well like that's that's a super
compliment i'm humbled by that actually
i i would say
a life they can be proud of honestly one
thing i've said to people is
first um find people you love and care
about them
like family matters to me a lot and
family means people you love and have
committed to
right so it can be whatever you you mean
by that but it's it you need to have a
foundation
uh so find people you love and want to
commit to and do that
um because it anchors you in a way that
nothing else can
right and then
and then you find other things and then
kind of from out there you find other
kinds of things you can commit to
whether it's ideas or or people or
groups of people
um so you know especially in high school
i would say don't settle on what you
think you know
right
give yourself 10 years to think about
the world like there's i see a lot of
high school students who who seem to
know everything already i i think i did
too i think it's maybe natural but
but recognize that
the things you care about you might
change your perspective over time
i certainly have over time as a teenager
you know i was really passionate about
one specific thing and i've kind of
softened you know i was a big um i
didn't like the federal reserve right
and
there's still we can have a longer
conversation about monetary policy and
finances but
but
i'm a little more uh nuanced in my in my
perspective at this point
um
but you know that's that's one area
where you learn about someone going i
want to attack it you know
build don't destroy
like build like so often the
tendency is to not like something and
want to go attack it
build something build something to replace it
yeah build up you know attract people to
your new thing
you'll be far better off
right you don't need to destroy
something to build something else
um so that's i guess generally
uh and then you know definitely
uh like curiosity you know follow your
curiosity
and and let it um
don't just follow the money
and all of that like you said is
grounded in um
family friendship and ultimately love
yes
which is uh
a great way to end it travis you're one
of the most impactful people in the
engineering computer science in the
human world so i truly appreciate
everything you've done
and i really appreciate that you would
spend your valuable time with me it was
an honor it was a real pleasure for me i
appreciate that thanks for listening to
this conversation with travis oliphant
to support this podcast please check out
our sponsors in the description
and now let me leave you with something
that in the programming world is called
hodgson's law
every sufficiently advanced lisp
application will eventually be
re-implemented in python
thank you for listening and hope to see
you next time