Transcript
pdJQ8iVTwj8 • Chris Lattner: Future of Programming and AI | Lex Fridman Podcast #381
Kind: captions
Language: en
on one axis you have more Hardware
coming in on the other hand you have an
explosion of innovation in AI and so
what happened with both tensorflow and
pytorch is that the explosion of
innovation in AI has led to it's not
just about multiplication and
convolution these things have now like
2,000 different operators
and on the other hand you have I don't
know how many pieces of Hardware there
are there it's a lot part of my thesis
part of my belief of where Computing
goes if you look out 10 years from now
it's not going to get simpler
physics isn't going back to where we
came from it's only going to get weirder
from here on out right and so to me the
exciting part about what we're building
is it's about building that Universal
platform which the world can continue to
get weird because again I don't think
it's avoidable it's physics but we can
help lift people help them scale and do
things with it and they don't have to
rewrite their code every time a new
device comes out
and I think that's pretty cool
the following is a conversation with
Chris Lattner his third time on this
podcast as I've said many times before
he's one of the most brilliant engineers
in modern Computing having created the
LLVM compiler infrastructure project the
Clang compiler the Swift programming language
a lot of key contributions to tensorflow
and tpus as part of Google he served as
vice president of autopilot software at
Tesla was a software innovator and
leader at Apple and now he co-created a
new Full stack AI infrastructure for
distributed training inference and
deployment on all kinds of Hardware
called modular and a new programming
language called Mojo that is a superset
of python giving you all the usability
of python but with the performance of C
and C++ in many cases Mojo code has
demonstrated over 35,000x speedup over
python if you love
machine learning if you love python you
should definitely give Mojo a try this
programming language this new AI
framework and infrastructure and this
conversation with Chris is mind-blowing
I love it
it gets pretty technical at times so I
hope you hang on for the ride this is
the Lex Fridman podcast to support it
please check out our sponsors in the
description and now dear friends
here's Chris Lattner
it's been I think two years since we
last talked and then in that time you
somehow went and co-created a new
programming language called Mojo so it's
optimized for AI it's a superset of
python let's look at the big picture
what is the vision for Mojo
well so I mean I think you have to zoom
out so I've been working on a lot of
related Technologies for many many years
so I've worked on llvm and a lot of
things and mobile and servers and things
like this
but the world's changing and what's
happened with AI is we have new gpus and
new
machine learning accelerators and other
ASICs and things like that that make AI
go really fast at Google I worked on TPUs
that's one of the largest-scale
deployed systems that exist for AI and
really what you see is if you look
across all of the things that are
happening in the industry there's this
new compute platform coming and it's not
just about
CPUs or gpus or tpus or npus or ipus or
whatever all the pus right it's about
how do we program these things
right and so for software folks like us
right it doesn't do us any good if
there's this amazing Hardware that we
can't use
and one of the things you find out
really quick is that having the
theoretical capability of programming
something and then having the world's
power and the innovation of all
the smart people in the world get
Unleashed on something can be quite
different and so really where Mojo came
from was starting from a problem of we
need to be able to take machine learning
take the infrastructure underneath it
and make it way more accessible way more
usable way more understandable by normal
people and researchers and other folks
that are not themselves like experts in
gpus and things like this and then
through that Journey we realized hey we
need Syntax for this we need to do a
programming language so one of the
main features of the language I say
so fully in jest is that it allows you
to have the file extension be an
emoji the fire Emoji which is
one of the first
emojis used as a file extension I've
ever seen in my life and then you ask
yourself the question why in the 21st
century are we not using Unicode for file
extensions does that mean it's an epic
decision I think clearly the most
important decision you made the most but
but you could also just use mojo as the
file extension well so okay so take a
step back I mean come on Lex do you
think that the world's ready for this
this is a big moment in the world right
this is we'll release this onto the
world this is innovation
I mean it really is kind of brilliant
emojis are such a big part of our daily
lives why are they not in programming well
and and like you take a step back and
look look at what file extensions are
right they're basically metadata right
and so why are we spending all the
screen space on them and all the stuff
also you know you have them stacked up
next to text files and PDF files and
whatever else like if you're gonna do
something cool you want to stand out
right and emojis are colorful they're
visual they're they're beautiful right
what's been the response so far from uh
is is there support on like Windows on
the operating system in displaying like
file explorer yeah the one problem I've
seen is that git doesn't escape it right
and so it thinks that the fire Emoji is
unprintable and so it like prints out
weird hex things if you use the command
line git tool but everything else as far
as I'm aware works fine and I I have
faith that git can be improved so GitHub
is fine GitHub is fine yep GitHub is
fine Visual Studio code Windows like all
this stuff totally ready because people
have internationalization yeah in their
normal part of their past
so this is just like taking the next
step right
somewhere between oh wow that makes
sense cool I like new things too oh my
God you're killing my baby like what are
you talking about this can never be like
I can never handle this how am I going
to type this like all these things and
so this is something where I think that
the world will get there we don't have
to bet the whole Farm on this I think we
can provide both paths but I think it'll
be great uh when can we have emojis as
part of the code I wonder uh yeah so I
mean lots of languages provide that so
um I think that we have partial support
for that it's probably not fully done
yet but but yeah you can you can do that
for example in Swift you can do that for
sure so an example we gave at Apple
was the dogcow yeah so that's a
classic Mac heritage thing and so it's
the dog and the cow emoji together
that could be your variable name but of
course the internet went and made pile
of poop for everything yeah so you know
if you want to name your function pile
of poop then you can totally go to town
and see how that gets through code
review
okay so uh let me just ask a bunch of
random questions uh so is Mojo primarily
designed for AI or is it a general
purpose programming language yeah good question
so it's AI first and so AI is driving a
lot of the requirements and so
um modular is building and designing and
driving Mojo forward it's not because
it's an interesting project
theoretically to build it's because we
need it
that's what modular is about we're really
tackling the AI infrastructure landscape
and the big problems in AI and the
reasons it is so difficult to use and
scale and adopt and deploy and like all
these big problems in AI and so we're
coming out from that perspective now
when you do that when you start tackling
these problems you realize that the
solution to these problems isn't
actually an AI specific solution
and so while we're doing this we're
building Mojo to be a fully General
programming language and that means that
you can
obviously tackle gpus and CPUs and like
these AI things but it's also a really
great way to build
numpy and other things like that or you
know just if you look at what many
python libraries are today often they're
a layer of python for the API and they
end up being C and C plus plus code
underneath them that's very true in AI
That's True in lots of other domains as
well and so anytime you see this pattern
that's an opportunity for Mojo to help
simplify the world and help people have
one thing to optimize through
simplification
by having one thing so you mentioned
modular Mojo is the programming language
modular is the whole software stack so
just over a year ago we started this
company called modular yeah okay what
modular is about is it's about taking Ai
and up leveling it into the Next
Generation right and so if you take a
step back what's gone on in the last
five six seven eight years is that we've
had things like tensorflow and pytorch
and these other systems come in you've
used them you know this and what's
happened is these things have grown like
crazy they get tons of users it's in
production deployment scenarios it's
being used to power so many systems I
mean AI is all around us now it used
to be controversial years ago but now
it's a thing but the challenge with
these systems is that they haven't
always been
um thought out with current demands in
mind and so you think about it when
where were llms eight years ago well
they didn't exist right AI has changed
so much and a lot of what people are
doing today are very different than when
these systems were built and meanwhile
the hardware side of this has gone into
a huge mess there's tons of new chips
and accelerators and every big
company is announcing a new chip every
day it feels like and so between that
you have like this moving system on one
side a moving system on the other side
and it just turns into this gigantic
mess which makes it very difficult for
people to actually use AI particularly
in production deployment scenarios
that's what modular is doing is we're
helping build out that software stack to
help solve some of those problems so
then people can be more productive and
get more AI Research into production
now what Mojo does is it's a really
really really important piece of that
and so that is you know part of that
engine and part of the technology that
allows us to solve these problems so
Mojo is a programming language that
allows you to do a higher level
programming the low level programming
like do all kinds of programming
in that spectrum that gets you closer
and closer to the hardware so take a
step back what do you love about
python oh boy
where do I begin
um what is love what do I love about
python you're a guy who knows love I
know this yes
um
how intuitive it is
thank you
how it feels like I'm writing natural
language English
uh
how when I can not just write but read
other people's code somehow I can
understand it faster it's more
condensed than other languages like
ones I'm really familiar with like C++
and C uh there's a bunch of
sexy little features yeah uh we'll
probably talk about some of them but
list comprehensions and stuff like this
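As an aside, the list comprehension feature being praised here looks like this; a generic Python example of mine, not one from the conversation:

```python
# Build a list in one readable expression instead of an explicit
# loop: squares of the even numbers from 0 to 9.
squares = [n * n for n in range(10) if n % 2 == 0]
print(squares)  # → [0, 4, 16, 36, 64]
```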
and don't forget the entire ecosystem of
all the packages oh yeah there's probably
huge there's always something if you
want to do anything there's always a
package yeah so it's not just
the ecosystem of the packages and the
ecosystem of the humans that do it that
that's a really
that's an interesting dynamic because I
think something
about the the usability and the
ecosystem makes the thing viral it grows
and then it's a virtuous cycle I think
well there's many things that went into
that like so I think that ml was very
good for Python and so I think that
tensorflow and pytorch in these systems
embracing python really took and helped
python grow but I think that the major
thing underlying it is that Python's
like the universal connector right it
really helps bring together lots of
different systems so you can compose
them and build out larger systems
without having to understand how it
works but then what is the problem with
python
well I guess you could say several
things but probably that it's slow
I think that's usually what people
complain about right and so slow I mean
other people complain about tabs in
spaces versus curly braces or whatever
but I mean those people are just wrong
because it is actually just better to
use indentation
wow strong words so actually on a small
tangent let's actually take that let's
take all kinds of tangents oh come on
Lex you can push me on it I can take it
listen I've recently left
emacs for vs code the kind of hate mail
I had to receive because on the way to
doing that I also said I've considered
Vim yep and uh chose not to and went
with vs code and people are especially
deeply religious about that right anyway uh tabs is an
interesting design decision and so
you've really written a new programming
language here yes it is a a super set of
python but you can make a bunch of
different interesting decisions here
totally yeah and you chose actually to
stick with python
uh in terms of some of the syntax well
so let me explain why right so
I mean you can explain this in many
rational ways I think that the
indentation is beautiful but that's not
a rational explanation right so but I
can defend it rationally right so first
of all python won
it has millions of programmers yeah it is
huge it's everywhere it owns machine
learning right so factually it is the
thing right second of all if you look at
it C code C plus plus code Java whatever
Swift curly brace languages also run
through formatting tools and get
indented and so if they're not indented
correctly first of all we'll twist your
brain around it can lead to bugs there's
notorious bugs that have happened across
time where the indentation was wrong or
misleading and it wasn't formatted right
and so it turned into an issue right and
so what ends up happening in modern
large-scale code bases is people run
automatic formatters
so now what you end up with is
indentation and curly braces
well if you're going to have
you know the notion of grouping why not
have one thing right and get rid of all
the Clutter and have a more beautiful
thing right also you look at many of
these languages it's like okay well we
can have curly braces or you can omit
them if there's one statement or you
just like enter this entire world of
complicated design space that
objectively you don't need if you have
python style indentation so yeah I would
love to actually see statistics on
errors made because of indentation like
how many errors are made in python
versus in C plus plus that have to do
with basic formatting all that kind of
stuff I would love to see I think it's
it's probably pretty minor because once
you get uh like you use vs code I do too
so if you get vs code set up it does the
indentation for you generally right and
so you don't you know it's actually
really nice to not have to fight it and
then what you can see is the editors
telling you how your code will work by
indenting it which I think is pretty
cool I honestly don't think
I've ever I don't remember having an
error in Python because I indented stuff
wrong so I mean I think that there's
again this is a religious thing and so I
can joke about it and I love I love to
kind of
you know I realized that this is such a
polarizing thing and everyone wants to
argue about and so I like poking at the
bear a little bit right but but frankly
right come back to the first point
python won like it's huge it's in AI um
it's the right thing for us like we see
Mojo being an incredible part of the
Python ecosystem we're not looking to
break python or change it or quote
unquote fix it we love python for what
it is our view is that python is just
not done yet
and so if you look at you know you
mentioned python being slow well there's
a couple of different things go into
that which we can talk about if you want
but one of them is it just doesn't have
those features that you would use to do
c like programming and so if you say
okay well I'm forced out of python into
C for certain use cases
well then what we're doing is we're
saying okay well why why is that can we
just add those features that are missing
from python back up to Mojo and then you
can have everything that's great about
python all the things you're talking
about that you love plus not be forced
out of it when you do something a little
bit more computationally intense or
weird or Hardware or whatever it is that
you're doing well a million questions I
want to ask what high level again is it
compiled or is it an interpretive
language so python is just in time
compilation what's what's Mojo
so Mojo a complicated answer does all
the things so it's interpreted it's Chip
compiled and it's statically compiled
um and so this is for a variety of
reasons so
one of the things that makes python
beautiful is that it's very Dynamic and
because it's Dynamic one of the things
they added is that it has this powerful
meta programming feature and so if you
look at something like pytorch or
tensorflow or or I mean even a simple
simple use case like you define a class
that has the plus method right you can
overload the dunder methods like Dunder
add for example and then the plus method
works on your class and so it has very
nice and very expressive
Dynamic meta programming features
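A minimal Python sketch of the dunder-method overloading being described; the Vec2 class is my own illustration:

```python
# Defining __add__ ("dunder add") makes the + operator work on a
# user-defined class: the dynamic hook the conversation refers to.
class Vec2:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        # invoked when Python evaluates v + w
        return Vec2(self.x + other.x, self.y + other.y)

v = Vec2(1, 2) + Vec2(3, 4)
print(v.x, v.y)  # → 4 6
```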
in Mojo we want all those features to
come in like we don't want to break python we
want it all to work but the problem is you
can't run those super Dynamic features
on an embedded processor or on a GPU
right or if you could you probably don't
want to just because of the performance
and so we entered this question of
saying okay how do you get the power of
this Dynamic meta programming into a
language that has to be super efficient
in specific cases and so what we did was
we said okay we'll take that interpreter
python has an interpreter in it right
take that interpreter and allow it to run
at compile time and so now what you get
is you get compile time meta programming
and so this is super interesting and
super powerful because
one of the big advantages you get is you
get python style expressive apis you get
the ability to have overloaded operators
and if you look at what happens inside
of like pytorch for example with
automatic differentiation and eager mode
like all these things they're using
these really Dynamic and Powerful
features at runtime but we can take
those features and lift them so they run
at compile time so because C++
does meta programming with
templates
but it's really messy it's super messy
it was accidentally
mean different people have different
interpretations my interpretation is
that it was made accidentally powerful
it was not designed to be Turing
complete for example but that was
discovered kind of along the way
accidentally
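The "run ordinary code ahead of time to generate specialized code" idea can be sketched in plain Python; make_power is an illustrative name of mine, and Mojo does this inside the compiler rather than at call-definition time:

```python
# Generate a specialized function once, up front, instead of
# deciding n at every call: a tiny analogue of lifting
# metaprogramming out of the hot path.
def make_power(n):
    def power(x):
        result = 1
        for _ in range(n):   # n was fixed when power was built
            result *= x
        return result
    return power

cube = make_power(3)   # the "compile time" decision
print(cube(2))  # → 8
```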
um and so there have been a number of
languages in the space and so they
usually have templates or code
instantiation code copying features of
various sorts
um some more modern languages or some
more newer languages let's say like you
know they're fairly unknown like Zig for
example
um says okay well let's take all of
those types so you can run it all those
things you can do at runtime and allow
them to happen at compile time and so
one of the problems with C plus plus I
mean which is one of the problems
with C plus plus there we go strong
words
oh that's okay I mean everybody hates me
for a variety of reasons anyways I'm
sure right I've written that's the way
they show love I have written enough C
plus plus code to earn a little bit of
grumpiness with C plus plus but
um but one of the problems with it is
that the meta programming system
templates is just a completely different
Universe from the normal runtime
programming world and so if you do meta
programming and programming it's just
like a different Universe different
syntax different concepts different
stuff going on and so again one of our
goals with Mojo is to make things really
easy to use easy to learn and so there's
a natural stepping stone
and so as you do this you say okay well
I have to do programming at runtime
after you do programming at compile time
why are these different things how hard
is that to pull off because that
sounds to me as a fan of meta
programming in C++ even
how hard is it to pull that off that
sounds really exciting because
you can do the same style programming at
compile time and at runtime that's really
really exciting yep and so I mean in
terms of the compiler implementation
details it's hard
I won't be shy about that it's super
hard it requires I mean what Mojo has
underneath the covers is a completely
new approach to the design of the
compiler itself and so this Builds on
these Technologies like MLIR that you
mentioned but it also includes other
like caching and other interpreters and
jit compilers and other stuff like so
you have like an interpreter inside
within the compiler yes
and so it really takes the standard
model of programming languages and kind
of twisted and unifies it with the
runtime model right which I think is
really cool and to me the value of that
is that again many of these languages
have meta programming features like they
grow macros or something right lisp
right yes I know your roots right
um you know and this is a powerful thing
right and so you know if you go back to
Lisp one of the most powerful things
about it is that it said that the
meta programming and the programming are the
same right and so that made it way
simpler way more consistent way easier
to understand reason about and it made
it more composable so if you build a
library you can use it both at runtime
and compile time
which is pretty cool yeah and then for
machine learning I think meta
programming
I think we could generally say is
extremely useful and so you get features
I mean I'll jump around but there's the
feature of Auto tuning and adaptive
compilation just blows my mind yeah well
so okay so let's come back to that all
right so so what what is what is what is
machine learning like or what is a
machine learning model like you take a
pie torch model off the internet right
um it's really interesting to me because
what a pipe what pi torch and what
tensorflow and all these Frameworks are
kind of pushing compute into as they're
pushing into like this abstract
specification of a compute problem which
then gets mapped in a whole bunch of
different ways right so this is why it
became a meta programming problem is
that you want to be able to say cool I
have this neural net now run with
batch size a thousand right do a
mapping across batch or okay I want to
take this problem now running across a
thousand CPUs or gpus right and so like
this this problem of like just describe
the compute and then map it and do
things and transform it or like actually
it's very profound and that's one of the
things that makes machine Learning
Systems really special uh maybe can you
describe Auto tuning and how do you pull
off I mean I guess adaptive compilation
is what we're talking about as meta
programming yeah how do you pull off
auto-tune I mean is that as
profound as I think it is it seems like
a really you know we
mentioned list comprehensions to me from
a quick glance at Mojo uh which by the
way I have to absolutely dive into uh
as I realized how amazing this is it
looks like just an incredible feature
for machine learning people yeah well so
so what is autotune so take a step back
auto tuning is a feature in Mojo it's
not research so very little of what we're
doing is actually research like many of
these ideas have existed in other
systems and other places and so what
we're doing is we're pulling together
good ideas remixing them and making them
into hopefully a beautiful system right
and so Auto tuning the observation is
that it turns out hardware systems
algorithms are really complicated turns
out maybe you don't actually want to
know how the hardware works
right A lot of people don't right and so
there are lots of really smart Hardware
people I know a lot of them uh where
they know everything about okay the
cache size is this and the number of
registers is that and if you use this
what length of vector is going to be
super efficient because it Maps directly
onto what it can do and like all this
kind of stuff or the GPU has SMS and it
has a warp size of whatever right all
the stuff that goes into these things or
the tile size of a TPU is 128 like
these factoids right
my belief is that most normal people and
I love Hardware people also I'm not
trying to offend literally everybody in
the internet
um but uh most programmers actually
don't want to know this stuff right and
so if you come at it from perspective of
how do we allow people to build both
more abstracted but also more portable
code
because you know it could be that the
vector length changes or the cash size
changes it could be that the tile size
of your Matrix changes or the number you
know an A100 versus an H100 versus a
Volta versus whatever GPU have different
characteristics right a lot of the
algorithms that you run are actually the
same but the parameters these magic
numbers you have to fill in end up being
really fiddly numbers that an expert has
to go figure out and so what Auto tuning
does it says okay well
guess what there's a lot of compute out
there
right so instead of having humans go
randomly try all the things or do a grid
search or go search some complicated
multi-dimensional space
how about we have computers do that
right and so what autotuning does is you
can say hey here's my algorithm
if it's a matrix operation or
something like that you can say okay I'm
going to carve it up into blocks I'm
going to do those blocks in parallel and
I want this this with 128 things that
I'm running on I want to cut it this way
or that way or whatever and you can say
hey go see which one's actually
empirically better on the system
and then the result of that you cache for
that system yep you save it and so come
back to twisting your compiler brain
right so not only does the compiler have
an interpreter that's used to do meta
programming that compiler that
interpreter that meta programming now
has to actually take your code and go
run it on a Target machine
see which one it likes the best and then
Stitch it in and then keep going right
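The loop Chris describes can be caricatured in a few lines of Python; sum_blocked and autotune are illustrative names of mine, not Mojo's actual API:

```python
import time

# Empirically time a kernel for several candidate block sizes on
# this machine and keep the winner; a real system caches the
# result per target so the search runs once per machine.
def sum_blocked(data, block):
    total = 0
    for i in range(0, len(data), block):
        total += sum(data[i:i + block])
    return total

def autotune(data, candidates=(32, 128, 512)):
    best, best_time = candidates[0], float("inf")
    for block in candidates:
        start = time.perf_counter()
        sum_blocked(data, block)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = block, elapsed
    return best

data = list(range(10_000))
print(autotune(data) in (32, 128, 512))  # → True
```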
so part of the compilation is machine
specific yeah well so I mean this is an
optional feature right so you don't have
to use it for everything but yeah if you
if you're so one one of one of the
things that we're in the quest of is
Ultimate performance yes right ultimate
performance is important for a couple of
reasons right so if you're an Enterprise
you're looking to save cost and compute
and things like this ultimate
performance translates to you know fewer
servers
if you care about the environment hey
better performance leads to more
efficiency
I mean you could joke and say like you
know Python's bad for the environment
right and so if you move to Mojo it's
like at least 10x better or just out of
the box and then keep going right
um uh but but performance is also
interesting because it leads to better
products and so in the space of machine
learning right if you reduce the latency
of a model
so that it runs faster so every time you
query the server running the model it
takes less time well then the product
team can go and make the model bigger
well that's actually makes it so you
have a better experience as a customer
and so a lot of people care about that
so for auto-tune for like tile size you
mentioned 128 for tpus you would specify
like a bunch of options to try yeah just
in the code it's a simple statement and
then you can just set it and forget it and
know that wherever it compiles
it'll actually be the fastest and yeah
exactly the beauty of this is that it
helps you in a whole bunch of different
ways right so if you're building so
often what will happen is that you know
you've written a bunch of software
yourself right you you wake up one day
you say I have an idea I'm going to go
put up some code I get to work
I forget about it
and move on with life I come back six
months or a year or two years or three
years later you dust it off and you go
use it again in a new environment and
maybe your GPU is different maybe you're
running on a server instead of a laptop
maybe whatever right and so the problem
now is you say okay well I mean again
not everybody cares about performance
but if you do you say okay well I want
to take advantage of all these new
features I don't want to break the old
thing though
right and so the typical way of handling
this kind of stuff before is you know if
you're talking about C++ templates
or you're talking about C with macros you
end up with ifdefs you get like all
these weird things get layered in make
the code super complicated and then how
do you test it right it becomes this
this crazy complexity multi-dimensional
space that you have to worry about and
you know that just doesn't scale very
well
actually let me just jump around before
it goes to specific features like the
increase in performance here that we're
talking about can be just insane uh you
write that Mojo can provide a 35,000x
speedup over python uh how
does it do that yeah so it can even do
more but uh we'll get to that so uh so
first of all when we say that we're
talking about what's called CPython
it's the default python that everybody
uses when you type python3 that's
typically the one you use right
CPython is an interpreter
and so interpreters they have an extra
layer of like byte codes and things like
this that they have to go read parse
interpret and it makes them kind of slow
from that perspective and so one of the
first things we do is we move to a
compiler
and so I'm just moving to a compiler
getting The Interpreter out of the loop
is two to five to ten X speed up
depending on the code so just out of the
gate
just using more modern techniques right
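You can see the interpreter overhead being described with the standard-library dis module; even a one-line function becomes several bytecode instructions that CPython's loop must fetch and dispatch (exact opcode names vary by Python version):

```python
import dis

def add(a, b):
    return a + b

# Each printed instruction is one trip through CPython's
# interpreter loop, with dynamic type checks on top; removing
# that loop is the first 2-10x a compiler buys you.
dis.dis(add)
```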
now if you do that one of the things you
can do is you can start to look at how
CPython lays out data
and so one of the things that CPython
did and this isn't part of the
Python spec necessarily but is just a
set of decisions is that
if you take an integer for example it'll
put it in an object because in Python
everything's an object and so they do
the very logical thing of keeping the
memory representation of all objects the
same so all objects have a header they
have like payload data and what
this means is every time you pass around
an object you're passing around a
pointer to the data
well this has overhead it turns out that
modern computers don't like chasing
pointers very much and things like this
it means that you have to allocate the
data it means you have to reference count
it which is another way that python
uses to keep track of memory and so this
has a lot of overhead and so if you say
okay
let's try to get that out of
the heap out of a box out of
indirection and into the registers
that's another 10x so it adds up
if you're reference counting
every single thing
you create that adds up yeah and if you
look at you know people complain about
the python GIL this is one of the
things that hurts parallelism
um that's because of the reference
counting
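Both costs are visible from Python itself; a small demo of my own using the sys module:

```python
import sys

# Every CPython object is heap-allocated with a header and a
# reference count; aliasing an object bumps the count.
x = [1, 2, 3]
before = sys.getrefcount(x)
y = x                      # one more reference
after = sys.getrefcount(x)
print(after - before)      # → 1

# Even a small integer is a boxed object, not a bare machine word.
print(sys.getsizeof(1))    # typically 28 bytes on 64-bit CPython
```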
right and so the GIL and reference
counting are very tightly intertwined in
Python it's not the only thing but it's
very tightly intertwined and so then you
lean into this and you say okay cool
well modern computers they can do more
than one operation at a time and so they
have vectors what is a vector well a
vector allows you to take one instead of
taking one piece of data doing an add or
multiply and then picking up the next
one you can now do 4 or 8 or 16 or 32
at a time right well python doesn't
expose that because of reasons and so
now you can say okay well you can adopt
that
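The idea can be sketched in plain Python (purely illustrative: real SIMD is a hardware feature, and the lane width of 4 here is an arbitrary choice):

```python
def scalar_add(a, b):
    # One element per step: how a naive interpreted loop proceeds.
    return [x + y for x, y in zip(a, b)]

def vector_add(a, b, lanes=4):
    # Conceptually what a SIMD unit does: each chunk of `lanes` elements
    # would be a single vector instruction on real hardware.
    out = []
    for i in range(0, len(a), lanes):
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
    return out

a = list(range(8))
b = list(range(8))
print(vector_add(a, b))   # [0, 2, 4, 6, 8, 10, 12, 14]
```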
now you have threads now you have like
additional things like you control
memory hierarchy and so what Mojo allows
you to do is it allows you to start
taking advantage of all these powerful
things that have been built into the
hardware over time and it gives the
library gives um very nice features so
you can say just parallelize this do
this in parallel right so it's very very
powerful weapons against slowness which
is why people have been I think having
fun like just taking code and making it go
fast because it's just kind of an
adrenaline rush to see like how fast you
can get things before I talk about some
of the interesting stuff with
parallelization all that let's let's
first talk about like the basics we
talked about indentation right so this
thing looks like python
it's sexy and beautiful like python as I
mentioned uh is it a typed language so
what's the role of types yeah good
question so python has types it has
strings it has integers it has dictionaries
and like all that stuff but they all
live at runtime
right and so
because all those types are at runtime in
Python you never or you don't have to
spell them python also has like this
whole typing thing going on now and a
lot of people use it yeah I'm not
talking about that that's that's kind of
a different thing we can go back to that
if you want but but typically the um
you know you just say I take I have a
def and my def takes two parameters
I'm going to call them A and B and I
don't have to write a type okay so that
is great but what that does is that
forces what's called a consistent
representation so these things have to
be a pointer to an object with the
object header and they all have to look
the same and then when you dispatch a
method you go through all the same
different paths no matter what the the
receiver whatever that type is so what
Mojo does is it allows you to have more
than one kind of type and so what it
does is allows you to say okay cool I
have an object and objects behave
like python does and so it's fully
Dynamic and that's all great and for
many things classes like that's all very
powerful and very important
but if you want to say hey it's an
integer and it's 32 bits or 64 bits or
whatever it is or it's a floating point
value
at 64 bits well then the compiler
can take that and it can use that to do
way better optimization and turns out
again getting rid of the indirections
is huge it means you can get better code
completion because you have
um because the compiler knows what the type
is and so knows what operations work on
it and so that's actually pretty huge
and so what Mojo does is allow you to
progressively adopt types into your
program so you can start again it's
compatible with python and so then you
can add however many types you want
wherever you want them and if you don't
want to deal with it you don't have to
deal with it right and so one of one of
you know our opinions on this is It's
not that types are the right thing or
the wrong thing
it's a very useful thing
which was kind of optional it's not
strict typing you don't have to specify
a type exactly
okay so starting from the thing that
Python's kind of reaching towards right
now with trying to inject types into it
yeah with a very different approach but
yes yes what's the different approach
I'm actually one of the people
that have not been using types very much
in Python okay why did you say
it's just well because I I know the
importance it's like adults use strict
typing and so I I refuse to grow up in
that sense it's a it's a kind of
rebellion but I I just know that um
it probably reduces the amount of Errors
even just for forget about performance
improvements it probably reduces errors
of when you do strict typing yeah so I
mean I think it's interesting if you
look at that right and the reason is I'm
giving a hard time yeah is that that
there's this this cultural norm this
pressure this like there has to be a
right way to do things like you know
only grown-ups only do it one way and if
you want to do that you should feel bad
yes right like some people feel like
Python's a guilty pleasure or something
and that's like when I get serious I
need to go rewrite it right yeah exactly
I mean cool I understand history and I
understand kind of where this comes from
but I don't think it has to be a guilty
pleasure yeah right and so if you look
at that you say why do you have to
rewrite it well you have to rewrite it
to deploy well why do you want to deploy
well you care about performance you care
about predictability or you want you
know a tiny thing on the server that has
no dependencies or you know you have
objectives you're trying to attain
so what if python can achieve those
objectives
so if you want types well maybe you want
types because you want to make sure
you're passing the right thing sure you
can add a type if you don't care you're
protyping some stuff you're hacking some
things out you're like pulling some Ram
good off the internet it should just
work right and you shouldn't be like
pressured you shouldn't feel bad about
doing the right thing or the thing that
feels good now if you're in a team right
you're working at some massive internet
company and you have 400 million lines
of python code well they they may have a
house rule that you use types yeah right
because it makes it easier for different
humans to talk to each other and
understand what's going on and bugs at
scale right and so there are lots of
good reasons why you might want to use
types but that doesn't mean that
everybody should use them all the time
right so what Mojo does is it says cool
well allow people to use types and if
you use types you get nice things out of
it right you get better performance and
things like this right but Mojo is a
full compatible superset of python
and so that means it has to work without
types
it has to support all the dynamic things
it has to support all the packages it has to support
uh for comprehensions list comprehensions
and things like this right and so that
that starting point I think is really
important and I think that
again you can look at why I care so much
about this and there's many different
aspects of that one of which is the
world went through a very challenging
migration from python 2 to python 3.
right yes this migration took many years
and it was very painful for many teams
right and there's a lot of a lot of
things that went on in that
um I'm not an expert in all the details
I honestly don't want to be I don't want
the world to have to go through that
yeah right and you know people can
ignore Mojo and if it's not their thing
that's that's cool but if they want to
use Mojo I don't want them to have to
rewrite all their code yeah I mean just
look at the superset part is
there's just I mean there's so much
brilliant stuff here that definitely is
is incredible
um we'll talk about that yeah first of
all how's the typing implemented
differently in uh in python versus uh
Mojo So this heterogeneous flexibility
you said it's definitely implemented
yeah so I'm not a full expert in the
whole backstory of types in Python so
I'll give you I'll give you that I can
give you my understanding
um my understanding is basically like
many Dynamic languages the ecosystem
went through a phase where people went
from writing scripts during a large
scale huge code bases in Python and at
scale kind of helps have types yeah
people want to be able to reason about
interfaces what what do you expect
a string or an int or like what these
basic things right and so what the
python Community started doing is it
started saying okay let's have tools on
the side
checker tools right that go and like
enforce invariants check for bugs try
to identify things these are called
Static analysis tools generally and so
these tools run over your code and try
to look for bugs
what ended up happening is there's so
many of these things so many different
weird patterns and different approaches
on specifying the types and different
things going on that the python
Community realized and recognized hey
there's a thing here and so what they
started to do is they started to
standardize the Syntax for adding types
to python now one of the challenges that
they had is that they're coming from
kind of this fragmented world where
there's lots of different tools they
have different trade-offs and
interpretations and the types mean
different things and so if you look at
types in Python according to the python
spec
the types are ignored
right so according to python spec you
can write pretty much anything in in a
type position okay and um
you can technically you can write any
expression okay now
that's beautiful because you can extend
it you can do cool things you can build
your own tools you can build your own
house linter or something like that
right but it's also a problem because
any existing Python program may be using
different tools and they have different
interpretations and so if you adopt
somebody's package into your ecosystem
try to run the tool you prefer it may
throw out tons of weird errors and
warnings and problems just because it's
incompatible with how these things work
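Both points, that annotations accept arbitrary expressions and that the interpreter never enforces them, can be seen in a couple of lines (an illustrative sketch, the function name is made up):

```python
# Any expression is legal in a type position, and CPython does not check it.
def add(a: "not" + " a type", b: int) -> int:
    return a + b

print(add("x", "y"))        # "xy": the int annotations were never enforced
print(add.__annotations__)  # the evaluated expressions are just stored here
```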
also because they're added late and
they're not checked by the python
interpreter it's always kind of more of
a hint than it is a requirement also the
C python implementation can't use them
for performance and so it's really
that's a big one right so you can't
utilize the for the compilation for the
just in time compilation okay exactly
and this this all comes back to the
design principle if it's it's kind of
they're kind of hints they're kind of
the definition is a little bit murky
it's unclear exactly the interpretation
in a bunch of cases and so because of
that you can't actually even if you want
to it's really difficult to use them to
say like it is going to be an INT and if
it's not it's a problem right a lot of
code would break if you did that so so
in Mojo right so you can still use those
kind of type annotations it's fine but
in Mojo if you declare a type and you
use it then it means it is going to be
that type and the compiler helps you
check that and enforce it and it's safe
um and it's not it's not a like best
effort kind of a thing so if you try to
shovel string type thing into an integer
you get an error from the compiler
compile time
nice okay what kind of basic types are
there yeah so uh Mojo is
um pretty hardcore in terms of what it
tries to do in the language which is the
philosophy there is that we
um
again if you if you look at python right
Python's a beautiful language because
it's so extensible right and so all of
the different things in Python like for
loops and plus and like all these things
can be accessed through these dunder
methods okay so you have to say
okay if I make something that is super
fast I can go all the way down to the
metal why do I need to have integers
built into the language
right so what Mojo does is it says okay
well we can have this notion of structs
so we have classes in Python now you can
have structs classes are Dynamic structs
are static
cool we can get high performance we can
write C plus plus kind of code with
structs if you want these things mix and
work beautifully together but what that
means is that you can go and Implement
strings and ints and floats and arrays
and all that kind of stuff in the
language
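Mojo's struct vs. class split has no direct CPython equivalent, but `__slots__` gives a rough feel for the dynamic-vs-fixed-layout distinction (an analogy only, not Mojo's mechanism; class names are made up):

```python
class DynamicPoint:           # like a Python class: open-ended at runtime
    pass

d = DynamicPoint()
d.anything = 42               # fine: instances carry a per-object dict

class FixedPoint:             # fixed layout: roughly struct-flavored
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x, self.y = x, y

p = FixedPoint(1, 2)
try:
    p.z = 3                   # rejected: the layout was pinned at class creation
except AttributeError:
    print("no new attributes allowed")
```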
right and so that's really cool because
you know to me as an idealizing
compiler language type of person
what I want to do is I want to get magic
out of the compiler and put it in the
libraries because if somebody can you
know if we can build an integer that's
beautiful and it has an amazing API it
does all the things you'd expect an
integer to do
if you don't like it maybe you want a
big integer maybe you want to like
sideways integer I don't know like what
what all the space of integers are um
then uh then you can do that and it's
not a second class citizen
and so if you look at certain other
languages like C plus plus one I also
love and use a lot um
int is hardcoded in the language
but complex is not and so isn't it kind
of weird that you know you have this STD
complex class but you have int and
complex tries to look like a natural
numeric type and things like this but
integers and floating Point have these
like special promotion rules and other
things like that that are magic and
they're hacked into the compiler and
because of that you can't actually make
something that works like the built-in
types is there something provided as a
standard because uh you know because
it's AI first
you know numerical types are so
important here so is there something
like a nice standard implementation of
integers and floats yeah so so we're still
building all that stuff out so we
provide ints and floats and all that
kind of stuff we also provide like
buffers and tensors and things like that
that you'd expect in an ml context
honestly we need to keep designing and
redesigning and working with the
community to build that out and make
that better that's not our strength
right now
give us six months or a year and I think
it'll be way better but um but the power
of putting in the library means we can
have teams of experts that aren't
compiler Engineers that can help us
design and refine and drive this forward
so uh one of the exciting things we
should mention here is that
this is uh this is new and fresh this
cake is unbaked
it's almost baked you can tell it's
delicious
but it's not fully ready to be consumed
yep that's very fair it is very useful
but it's very useful if you're a super
low level programmer right now and what
we're doing is we're working our way up
the stack and so the way I would look at
Mojo today in May of 2023
um is that it's like a 0.1
so I think that you know a year from now
it's gonna be way more interesting to a
variety of people but what we're doing
is we're we decide to release it early
so that people can get access to it and
play with it and we can build it with the
community we um have a big road map
fully published being transparent about
this and a lot of people are involved in
this stuff and so what we're doing is
we're really optimizing for building
this thing the right way and building it
the right way is kind of interesting
working with the community because
everybody wants it yesterday
and so it's sometimes it's kind of you
know there's some Dynamics there but
yeah I think it's good it's the right
thing so there's a Discord also so the
Dynamics is pretty interesting sometimes
the community probably can be very
chaotic
and uh introduce a lot of stress Guido
famously quit over the stress of the
walrus operator I mean yeah you know it
broke maybe
exactly and so like it could be very
stressful to develop but can you just
add tangent upon a tangent is it
stressful to to uh
to work through the design of various
features here given that the community
is so richly involved well so um so I've
been doing open development and
Community stuff for decades now somehow
this has happened to me
um so I've I've learned some tricks but
the the thing that always gets me is I
want to make people happy right and so
this is this is maybe not all people all
happy all the time but generally I want
I want people to be happy right and so
the challenge is that again we're
tapping into some long
some deep-seated long tensions and
pressures both in the python world but
also in the AI world in the hardware
world and things like this and so people
just want us to move faster right and so
again our decision was let's release
this early let's get people used to it
or access to it and play with it and
like let's let's build it in the open
which we could have you know had the the
language monk sitting in the Cloister up
on the hilltop like beavering away
trying to build something but in my
experience you get something that's way
better if you work with the community
right uh and so yes it can be
frustrating can be challenging for lots
of people involved and you know if you I
mean if you mention our Discord we have
over 10 000 people on the Discord 11 000
people or something keep in mind we
released Mojo like two weeks ago yeah so
um very effective so it's very cool
um but what that means is that um you
know 10 11 000 people all will want
something different right and so what
we've done is we've tried to say okay
cool here's our roadmap here here and
the roadmap isn't completely arbitrary
it's based on here's the logical order
in which to build these features or add
add these capabilities and things like
that and what we've done is we've spun
really fast on like bug fixes and so we
actually have very few bugs which is
cool I mean actually for a project in
the state but then what we're doing is
we're dropping in features very
deliberately I mean this is fun to watch
because you got the two
gigantic communities of like Hardware
like systems engineers and then you have
the machine learning python people that
are like higher level yeah and it's just
like two armies like uh they've
been at War yeah they've been at War
right and so so here's here's a Tolkien
novel or something okay so here's a test
again like it's it's super funny for for
something that's only been out for two
weeks right people are so impatient
right but okay cool let's fast forward a
year
like in a year's time Mojo will be
actually quite amazing and solve tons of
problems and be very good
um people still have these problems
right and so you you look at this you
say and the way I look at this at least
is to say okay well we're solving big
long-standing problems
to me I again working on many different
problems I want to make sure we do it
right
there's like a responsibility you feel
because if you mess it up right there's
very few opportunities to do projects
like this and have them really have
impact on the world if we do it right
then maybe we can take those feuding
armies and actually heal some of those
wounds yeah like this feels this feels
like a speech by George Washington or
Abraham Lincoln or something and you
look at this it's like okay well how
different are we yeah we all want
beautiful things we all want something
that's nice we all want to be able to
work together we all want our stuff be
used right and so if we can help heal
that now I'm not optimistic that
all people will use Mojo and they'll
stop using C plus plus like that's not
my goal right but um but if we can heal
some of that I think that'd be pretty
cool yeah and we start by putting the
people who like braces into the gulag no
uh so so there are proposals for adding
braces to Mojo and we just know what's
your thing we tell them no okay
politely yeah anyway so there's a lot of
amazing features on the roadmap and
those already implemented it it'd be
awesome I could just ask you a few
things yeah so uh the the other
performance Improvement comes from
immutability so what's the what's this
var and this let thing that we got going
on what's immutability
yeah so one of the things that is uh
useful and it's not always required but
it's useful is knowing whether something
can change out from underneath you right
so in Python you have a pointer to an
array right and so you pass that pointer
to an array around to things
if you pass into a function they may
take that and squirrel it away in some other
data structure and so you get your array
back and you go to use it now somebody
else is like putting stuff in your array
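Here is that exact hazard in plain Python (a minimal sketch; the function name is made up):

```python
stash = []

def register(arr):
    # Keeps a reference to the caller's list, not a copy.
    stash.append(arr)

my_array = [1, 2, 3]
register(my_array)
stash[0].append(99)    # "somebody else" mutates the shared object...
print(my_array)        # [1, 2, 3, 99]: your array changed underneath you
```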
how do you reason about that it gets to
be very complicated at least lots of
bugs right and so one of the things that
you know again this is not something Mojo
forces on you but something that Mojo
enables is a thing called value
semantics and what value semantics do is
they take
collections like arrays like
dictionaries also tensors and strings
and things like this that are much
higher level and make them behave like
proper values and so it makes it look
like if you pass these things around you
get a logical copy of all the data and
so if I pass you an array your array you
can go do what you want to it you're not
going to hurt my array now that is an
interesting and very powerful design
principle it defines away a ton of bugs
you have to be careful to implement it
in an efficient way is there a performance
hit that's significant
uh generally not if you implement it the
right way but it requires a lot of very
low level uh getting the language right
bits I assume there'll be a huge
performance hit because it's a really
the benefit is really nice because you
don't get into that absolutely well the
trick is you can't do
copies
so you have to provide
the behavior of copying without doing
the copy yeah how do you do that
is that how do you do that it's not
magic it's just it's actually pretty
cool well so first before we talk about
how that works let's talk about how it
works in Python right so in Python you
define a person class or maybe a
person class is a bad idea you define a
database class right and database class
has an array of Records something like
that right and so the problem is that if
you pass in a record or class instance
into the database it'll take a hold of
that object and then it assumes it has
it and if you're passing an object in
you have to know that that database is
going to take it and therefore you
shouldn't change it after you put in the
database right this is this you kind of
have to know that you just have to kind
of know that right and so you roll out
version one of the database you just
kind of have to know that of course Lex
uses its own database right yeah right
because you built it you understand how
this works right somebody else joins the
team they don't know this yes right and
so now they suddenly get bugs you're
having to maintain the database you
shake your fist you argue the tenth time
this happens you're like okay we have to
do something different right and so what
you do is you go to change your python
code and you change your database class
to copy the record every time you add it
and so what ends up happening is you say
okay I will do what's called a defensive
copy inside the database and then that
way if somebody passes something in I
will have my own copy of it and they can
go do whatever and they're not going to
break my thing
okay this is usually the the two design
patterns if you look in pytorch for
example this is cloning a tensor like
there's a specific thing and you have to
know where to call it if you don't call
in the right place you get these bugs
and this is state of the art right
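The defensive-copy pattern described here looks like this in Python (names and record shape are illustrative, not from any real library):

```python
import copy

class Database:
    def __init__(self):
        self.records = []

    def add(self, record):
        # Defensive copy: snapshot the record on the way in so the
        # caller can no longer mutate our internal state.
        self.records.append(copy.deepcopy(record))

db = Database()
rec = {"name": "record-1", "value": 381}
db.add(rec)
rec["value"] = 999             # caller keeps mutating their object...
print(db.records[0]["value"])  # 381: the database kept its own copy
```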
so a different approach so it's used in
many languages so I've worked with it in
Swift
um is you say okay well let's provide
value semantics and so we want to
provide the view that you get a
logically independent copy but we'll
do the copy lazily
and so what what we do is you say okay
if you pass something into a function it
doesn't actually make a copy what it
actually does is it just increments a
reference to it and if you pass it
around you stick in your database
they can hold on to it in the database or not
and then you come back out of the stack
nobody's copied anything you come back
out of the stack and then the caller
lets go of it well then you've just
handed it off to the database you've
transferred it and there's no copies
made
now on the other hand if you know your
co-worker goes and hands you a record
and you pass it in you stick it in the
database and then you go to town and you
start modifying it what happens is you
get a copy lazily on demand
and so what this does is gives you
copies only when you need them and it
also so it defines away the bugs but
also generally reduces the number of
copies in practice and so but the
implementation details are tricky here
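A toy copy-on-write container shows the trick (an illustrative sketch of the general technique, not Mojo's actual implementation):

```python
class CowList:
    """Share the buffer until someone writes; copy only then."""
    def __init__(self, items=None):
        self._buf = list(items or [])
        self._shared = [1]            # shared reference-count cell

    def copy(self):
        # "Copying" just shares the buffer and bumps the count.
        other = CowList.__new__(CowList)
        other._buf = self._buf
        self._shared[0] += 1
        other._shared = self._shared
        return other

    def _make_unique(self):
        if self._shared[0] > 1:
            self._shared[0] -= 1
            self._buf = list(self._buf)   # the real copy happens only now
            self._shared = [1]

    def set(self, i, value):
        self._make_unique()
        self._buf[i] = value

    def get(self, i):
        return self._buf[i]

a = CowList([1, 2, 3])
b = a.copy()               # no data copied yet
b.set(0, 99)               # a write triggers the lazy copy
print(a.get(0), b.get(0))  # 1 99
```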
yeah so this is yes something with
reference Counting
but to make it performant
across a number of different kinds of
objects yeah well so you need a couple
of things and so there's many so this
concept has existed in many different
worlds and so that again it's not novel
research at all right the magic is
getting the design right so that you can
do this in a reasonable way right and so
there's a number of components that go
into this one is when you're passing
around so we're talking about Python and
reference counting and the expense of
doing that when you're passing values
around you don't want to do extra
reference counting for no good reason
and so you have to make sure that you're
efficient and you transfer ownership
instead of duplicating references and
things like that which is a very low
level problem you also have to adopt
this and you have to build these data
structures and so if you say
um you know Mojo has to be compatible
with python so of course the default
list is a reference semantic list that
works the way you'd expect in Python but
then you have to design a value semantic
list and so you just have to implement
that and then you implement the logic
within and so the the role of the
language here is to provide all the
low-level hooks that allow the author of
the type to be able to get and express
this Behavior without forcing it into
all cases or hard coding this into the
language itself but there's ownership
so you're constantly transferring
you're tracking who owns the thing yes
and so there's a whole system called
ownership and so this is related to work
done in the rust Community also the
Swift community's done a bunch of work
and there's a bunch of different other
languages that have all kind of C plus
plus actually has copy Constructors and
destructors and things like that and so
um and I mean C++ lets you spell
everything so it has move constructors
it has like this whole world of things
and so this is uh this is a body of work
that's kind of been developing for many
many years now and so Mojo takes some of
the best
ideas out of all these systems and
remixes in a nice way so that you get
the power of something like the rust
programming language but you don't have
to deal with it when you don't want to
which is a major thing in terms of
teaching and learning and being able to
use and scale these systems uh how does
that play with argument conventions what
are they why are they important how does
the value semantics how does the
transfer ownership uh work with with the
arguments when they're passing different
yeah so so if you go deep into systems
programming land so this isn't again
this is not something for everybody but
if you go deep into systems programming
land what you encounter is you encounter
these types that get weird so if you're
used to python you think about
everything I could just copy it around I
can go change it and mutate it and do
these things and it's all cool
um if you get into systems programming
land you get into these things like I
have an atomic number or I have a mutex
or I have a uniquely owned database
handle things like this right so these
types you can't necessarily copy yeah
sometimes you can't necessarily even
move them to a different address
and so what Mojo allows you to do is it
allows you to express hey I don't want
to get a copy of this thing I want to
actually just get a reference to it and
by doing that what you can say is you
can say okay if I'm defining something
weird like a atomic number or something
it's like it has to be so an atomic
number is an area in memory that
multiple threads can access at a time
without synchronization without
locks right and so uh and so like the
definition of atomic number is multiple
different things have to be poking it
therefore they have to agree on where it
is right so you can't just like move it
out from underneath them because it kind
of breaks what it means and so
that's that's an example of a type that
you can't even you can't copy you can't
move it like once you create it has to
be where it was right now if you look at
many other examples like a database
handle right so okay well what happens
how do you copy a database handle do you
copy the whole database that's not
something you necessarily want to do
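Unique ownership of something like a database handle can be sketched in plain Python (illustrative only; real ownership systems like Mojo's or Rust's enforce this at compile time, which Python cannot):

```python
class UniqueHandle:
    """A handle that can be transferred but never copied."""
    def __init__(self, resource):
        self._resource = resource

    def __copy__(self):
        raise TypeError("UniqueHandle cannot be copied")
    __deepcopy__ = __copy__

    def transfer(self):
        # Move the resource out, leaving this handle empty.
        res, self._resource = self._resource, None
        return UniqueHandle(res)

    @property
    def resource(self):
        if self._resource is None:
            raise RuntimeError("handle was moved away")
        return self._resource

h = UniqueHandle("db-connection")
h2 = h.transfer()          # ownership moves to h2
print(h2.resource)         # db-connection
# h.resource would now raise: the old handle no longer owns anything
```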
um the there's a lot of types like that
where you want to be able to say that
they are uniquely owned and so there's
always one of this thing and or if if I
create a thing I don't copy it and so
what Mojo allows you to do is it allows
you to say hey I want to pass around a
reference to this thing without copying
it and so it has borrowed conventions so
you can say you can use it but you don't
get to change it you can pass it by
mutable reference and so if you do that
then you can you get a reference to it
but you can change it and so it manages
all that kind of stuff so it's uh it's
just a really nice implementation of
like C plus plus has yeah uh you know
the different kinds of pointers yeah
smart pointers different kinds of
smart pointers that you
can uh explicitly Define this allows you
but you're saying that's more like
um the weird case versus the common case
well it depends on where I mean I mean I
don't I don't think I'm a normal person
so yes I mean I'm not one to call other
people weird yeah
but the uh uh but you know if you talk
to a normal python a typical python
programmer they're typically not thinking about
this right this is a lower level of
abstraction now if you talk to a C plus
plus programmer certainly if you talk to
a rust programmer again they're not
weird they're delightful like these are
all good people right
um those folks will think about this
all the time
right and so I look at this as there's a
spectrum between very deep low-level
systems I'm going to go poke the bits
and care about how they're laid out in
memory all the way up to application and
scripting and other things like this and
so it's not that anybody's right or
wrong it's about how do we build
one system that scales
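The atomic-number idea from a moment ago can be sketched with a lock-guarded counter in Python (illustrative; a real atomic uses hardware instructions, not a mutex):

```python
import threading

class AtomicCounter:
    """All threads agree on one location; the lock makes updates safe."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, read-modify-write from several threads races.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value

counter = AtomicCounter()

def worker():
    for _ in range(10_000):
        counter.increment()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)   # 40000: no increments lost
```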
by the way the the idea of an atomic
number has been something that always
brought me deep happiness because
the flip side of that the the idea that
threads can just modify stuff
um
asynchronously it's the whole idea of
concurrent programming is a source of
infinite stress for me well so this is
where you jump into
um you know again you zoom out and get
out of program languages or compilers
and just look what the industry has done
my mind is constantly blown by this
right and you look at what you know
Moore's Law Moore's law has this idea
that like computers for a long time
single thread performance just got
faster and faster and faster and faster
for free
but then physics and other things
intervened in power consumption like
other things started to matter and so
what ended up happening is we went from
single Core computers to multi-core then
we went to accelerators right this this
trend towards specialization of Hardware
is only going to continue and so for
years us programming language nerds and
compiler people have been saying okay
well how do we tackle multi-core right
for a while it was like multi-core is
the future we have to get on top of this
thing and then it was multi-cores to
default what are we doing with this
thing and that is like there's chips
with hundreds of cores in them what
happened right yeah and so
I'm super inspired by the fact that you
know in the face of this you know those
machine learning people invented this
idea of a tensor right and what is a tensor
a tensor is
like an arithmetic and algebraic concept
it's like an abstraction around a
gigantic parallelizable data set right and
because of that and because of things
like tensorflow and pytorch we're able
to say okay we'll Express the math
of the system this enables you to do
automatic differentiation enables you to
do like all these cool things
um and and it's it's an abstract
representation well because you have
that abstract representation you can now
map it onto these parallel machines
without having to
um control okay put that right here put
that right there put that right there
and this has enabled an explosion in
terms of AI compute accelerators like
all the stuff and so that's super super
exciting what about the the deployment
the execution across multiple machines
so uh you write that the modular compute
platform dynamically partitions models
with billions of parameters and
distributes their execution across
multiple machines enabling unparalleled
efficiency
nice use of unparalleled in that
sentence anyway enabling unparalleled
efficiency scale and reliability for the
largest workloads so how do you do this
um
abstraction of uh distributed deployment
of of a large models yeah so one of the
really interesting
um tensions so there's a whole bunch of
stuff that goes into that I'll pick a
random walkthrough uh if you if you go
back and replay the history of machine
learning right I mean the brief the
brief most recent history of machine
learning because this is as you know
very deep I I knew Lex when he had an AI
podcast yes
right yeah
so if you look at just TensorFlow
and PyTorch which is pretty recent
history in the big picture right
TensorFlow is all about graphs PyTorch
I think pretty unarguably ended up
winning and why did it win mostly
because of usability
right and the usability of PyTorch is I
think huge and I think again that's a
huge testament to the power of taking
abstract theoretical technical concepts
and bringing them to the masses right now
the challenge with the TensorFlow
versus PyTorch
design points was that TensorFlow is kind
of difficult to use for researchers but
it was actually pretty good for
deployment
PyTorch is really good for researchers
but it's kind of not super great for
deployment right and so I think we
as an industry have been struggling and
if you look at what deploying a machine
learning model today means is that
you'll have researchers who are I mean
Wicked smart of course but they're
Wicked smart at model architecture and
data and calculus and like all like
they're Wicked Smart in various domains
they don't want to know anything about
the hardware deployment or C plus plus
or things like this right and so what's
happened is you get people who train the
model they throw over throw it over the
fence and they have people that try to
deploy the model
well any time team A does X
throws it over the fence and team B
does Y you have a
problem because of course it never works
the first time and so you throw over the
fence they figure out okay it's too slow
it won't fit doesn't use the right
operator the tool crashes whatever the
problem is then they have to throw it
back over the fence
and every time you throw a thing over a
fence it takes three weeks of project
managers and meetings and things like
this and so what we've seen today is
getting models in production can take
weeks or months like it's not atypical
right I talk to lots of people and you
talk about like VP of software some
internet company trying to deploy a
model and they're like why do I need a
team of 45 people
okay it's so easy to train a model why
why can't I deploy it right and if you
dig into this
every layer is problematic so if you
look at the language piece I mean this
is tip of the iceberg it's a very
exciting tip of the iceberg for folks
but you've got python on one side and C
plus plus on the other side python
doesn't really deploy I mean it can
theoretically technically in some cases
but often a lot of production teams will
want to get things out of python because
they get their performance and control
and whatever else so Mojo can help with
that
if you look at serving so you talk about
gigantic models well a gigantic model
won't fit on one machine
right and so now you have this model
it's written in Python it has to be
Rewritten in C plus plus now it also has
to be carved up so that half of it runs
on one machine half of it runs on
another machine or maybe it runs on 10
machines
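The "carve the model up across machines" idea above can be illustrated with a hedged NumPy sketch (not Modular's actual partitioner): the simplest partitioning splits a layer's weight matrix column-wise, each "machine" computes its shard independently, and concatenating the shards reproduces the unpartitioned result.

```python
import numpy as np

# Two "machines" here are just two array shards; the point is that the
# math decomposes, so a model too big for one device can be split.

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))      # batch of activations
W = rng.standard_normal((8, 6))      # layer weights, too "big" for one device

# Partition the columns of W across two devices.
W0, W1 = W[:, :3], W[:, 3:]

# Each device computes its shard of the output (conceptually in parallel).
y0 = x @ W0
y1 = x @ W1

# Gather: concatenating the shards reproduces the unpartitioned result.
y = np.concatenate([y0, y1], axis=1)
assert np.allclose(y, x @ W)
print(y.shape)  # (4, 6)
```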
well so now suddenly the complexity is
exploding right and the reason for this
is that if you if you look into
tensorflow pytorch these systems they
weren't really designed for this world
right they're designed for you know back
in the day when we were starting and
doing things where it was a different
much simpler world like you want to run
ResNet-50 or some ancient model
architecture like this it was a
completely different world
trained on one GPU exactly
yeah now it's not right the major
breakthroughs came
and the world has changed right and so
now the challenge is that TensorFlow and
PyTorch these systems weren't
actually designed for LLMs like that
was not a thing and so
whereas TensorFlow actually has amazing power
in terms of scale and deployment and
things like that and I think Google is
I mean maybe not unmatched but they're
incredible in terms of their
capabilities at gigantic scale
many researchers are using PyTorch right
and PyTorch doesn't have those same
capabilities and so what modular can do
is it can help with that now if you take
a step back and say like what is modular
doing right so modular has like a
a bitter enemy they were fighting
against in the industry and it's one of
these things where everybody knows it
but nobody is usually willing to talk
about it the bitter enemy The Bitter
thing that we have to destroy that we're
all struggling with and it's like all
around it's like fish can't see water
it's complexity
sure yes
complexity right that was very
philosophical
and so if you look at it yes it is on
the hardware side yes all these all
these accelerators all these software
Stacks that go with the accelerator all
these like this massive complexity over
there you look at
what's happening on the modeling side
massive amount of complexity like things
are changing all the time people are
inventing turns out the research is not
done
right and so people want to be able to
move fast Transformers are amazing but
there's a ton of diversity even within
Transformers and what's the next
Transformer right and you look into
serving also huge amounts of complexity
it turns out that all the cloud
providers right have all their very
weird but very cool hardware for
networking all this kind of stuff and
it's all very complicated people aren't
using that you look at classical serving
right there there's this whole world of
people who know how to write high
performance servers with zero copy
networking and like all all this fancy
uh asynchronous I O and like all these
fancy things in the in in the serving
Community very little that has pervaded
into the machine learning world right
and why is that well it's because again
these systems have been built up over
many years they they haven't been
rethought there hasn't been a first
principle's approach to this and so what
modular is doing is we're saying okay
we've built many of these things so
I've worked on TensorFlow and TPUs and
things like that other folks on our team
have worked on PyTorch core we've
worked on ONNX at one point we've worked on
many of these other systems like the
Apple accelerators and
all that kind of stuff our team is
quite amazing and so one of the things
that roughly everybody at modular is grumpy
about is that when you're working on one
of these projects you have a first order
goal
get the hardware to work get the system
to enable one more model get this
product out the door enable the specific
workload or make it solve this problem
for this this product team right and
nobody's been given a chance to actually
do that step back and so we as an
industry we didn't take two steps
forward we took like 18 steps forward in
terms of all this really cool technology
across compilers and systems and
runtimes and heterogeneous Computing
like all this kind of stuff and like all
this technology has been you know I
wouldn't say uh beautifully designed but
it's been proven in different quadrants
like you know you look at Google with
tpus massive huge exaflops of compute
strapped together into machines that
researchers are programming in Python in
a notebook that's huge that's amazing
that's incredible right it's incredible
and so you look at the technology that
goes into that and the the algorithms
were actually quite General
and so lots of other Hardware out there
and lots of other teams out there don't
have the sophistication or that maybe
the the years working on it or the the
budget or whatever that Google does
right and so they should be getting
access to same algorithms but they just
don't have that right that's what
modular's doing is we're saying
cool this is not research anymore like
we've built auto-tuning in many
systems we've built programming
languages right I've
implemented C++ I've
implemented Swift I've implemented many
of these things and so you know this
it's hard but it's not research and you
look at accelerators well we know
there's a bunch of different weird kind
of accelerators but they actually
cluster together right and you look at
gpus well there's a couple of major
vendors of gpus and they maybe don't
always get along but their architectures
are very similar you look at CPUs CPUs
are still super important for the
deployment side of things you see new
new architectures coming out from all
the cloud providers and things like this
and they're all super important to the
world right but they don't have the 30
years of development that the entrenched
people do right and so what modular can
do is we're saying okay all this
complexity like it's not it's not bad
complexity it's actually Innovation
right and so it's Innovation that's
happening and it's for good reasons but
I have sympathy for the poor software
people right I mean again I'm a
generally a software person too I love
Hardware but software people want to
build applications and products and
solutions that scale over many years
they don't want to build a solution for
one generation of Hardware with one
vendor's tools right and because of this
they need something that scales with
them they need something that works on
cloud and mobile
right because you know their product
manager said hey I wanted to be have
lower latency and it's better for
personalization or whatever they decide
right products evolve and so the
challenge with the machine learning
technology and the infrastructure we
have today in the industry is that it's
all these Point Solutions
and because they're all point
solutions it means that as your product
evolves you have to switch to
different technology stacks or switch to a
different vendor and what that does is
slow down progress
so basically a lot of the things we've
developed in those little uh silos for
machine learning tasks you want to make
that the first class citizen of a
general purpose programming language
they can then be compiled across all
these kinds of Hardware well so it's not
really about a programming language I
mean the programming language is a
component of the mission right and the
mission is are not literal but our
joking mission is to save the world from
terrible AI software
so so you know if you look at this
Mission you need a syntax
so that's so yeah she needed a
programming language right and and like
we wouldn't have to build the
programming language if one existed
right so if python was already good
enough then cool we've just used it
right we're not just doing very large
scale expensive engineering projects for
the sake of it like it's to solve a
problem right it's also about
um uh accelerators it's also about
exotic numerics and B float 16 and
Matrix multiplications and convolutions
and like this this kind of stuff
um within the stack there are things
like uh kernel Fusion
that's a esoteric but really important
thing that leads to much better
performance and much more general
research hackability together
right and that that's enabled by the
Asics that's enabled by certain Hardware
so it's like where's the dance between
um there's several questions here like
how do you add a piece of hardware to
this stack yeah if I
have this genius invention
of a specialized accelerator yeah how do
I add that to the modular framework and
also how does modular as a standard
start to define the kind of
hardware that should be developed yeah
so let me take a step back and talk
about status quo okay yes and so um if
you go back to TensorFlow 1 and PyTorch 1
this kind of time frame
and these have all evolved and gotten
way more complicated so let's go back to
the the Glorious simple days right these
things basically were CPUs and CUDA and
so what you do is you say go do
a dense layer and a dense layer has a
matrix multiplication in it right and so
when you say that you say go do this big
matrix multiplication operation and
if it's on a GPU kick off a CUDA kernel if
it's on a CPU go do
an Intel algorithm or something
like that with the Intel MKL okay now
that's really cool if you're either
NVIDIA or Intel right but then more
hardware comes in
right and on one axis you have
more hardware coming in on the other
you have an explosion of innovation
in AI and so what happened with both
TensorFlow and PyTorch is that the
explosion of innovation in AI has meant
it's not just about matrix multiplication and
convolution these things now have like
2,000 different operators
and on the other hand you have I don't
know how many pieces of Hardware there
are out there it's a lot
it's it's not it's not even hundreds
it's probably thousands okay and across
all of Edge and across like all the
different things that are used at scale
yeah exactly I mean so it's not just
like ai's everywhere yeah it's not a
handful of TPU Alternatives correct it's
it's every phone often with many
different right chips inside of it from
different vendors right like it's AI is
everywhere it's a thing right why are
they all making their own chips like
what why is everybody making their own
thing
well so is that a good thing
what's Chris's philosophy on hardware yeah right
so my philosophy is that there isn't one
right solution
right and so I think that again we're at
the end of Moore's Law specialization
happens yeah if you if you're building
if you're training gpt5 you want some
crazy super computer data center thingy
if you're making a smart camera that
runs on batteries you want something
that looks very different
if you're building a phone you want
something looks very different if you
have something like a laptop you want
something that looks maybe similar but a
different scale right and so AI ends up
touching all of our Lives robotics right
and like lots of different things and so
as you look into this these have
different Power envelopes there's
different trade-offs in terms of the
algorithms there's new Innovations and
sparsity and other data formats and
things like that and so uh Hardware
Innovation I think is a really good
thing right and what I'm interested in
is unlocking that innovation there's
also analog and quantum and all
the really weird stuff right and so
if somebody can come up with a chip that
uses analog Computing and it's 100x more
power efficient think what that would
mean in terms of the daily impact on the
products we use that would be huge now
if you're building an analog computer
you may not be a compiler specialist
right these are different skill sets
right and so you can hire some compiler
people if you're running a big company
maybe but it turns out these are really
uh like exotic new generation of
compilers like this this is a different
thing right and so if you if you take a
step back out and come back to what is
the status quo status quo is that
if you're Intel or you're NVIDIA you
can keep up with the industry and
you chase okay there are 1,900 operators now
there are 2,000 now there are 2,100 and you
have a huge team of people
trying to keep up and tune and optimize
and even when one of the big guys
comes out with a new generation of their
chip they have to go back and rewrite
all these things right so really it's
only powered by having hundreds of
people all frantically
trying to keep up and what that does is
keep out the little guys
and sometimes the not so little guys the
big guys that are also just not not in
those dominant positions and so
and so what has been happening and
a lot of why you see the rise of new
exotic crazy accelerators is people have
been trying to turn this from a
let's-go-write-lots-of-special-kernels problem
into a compiler problem
and so we and I contributed to this as
well we as an industry went into it like
let's go make this a compiler problem
phase let's call it and much of the
industry is still in this phase by the
way so it's I won't say this phase is
over and so the idea is to say look okay
what a compiler does is it provides a
much more General extensible
uh hackable interface for dealing with
the general case right and so
within machine learning algorithms
for example people figured out that hey
if I do a matrix multiplication and I do
a ReLU right the classic activation
function it is way faster to do one pass
over the data and do the ReLU on
the output as I'm writing out the
data because ReLU is just a maximum
operation right max with zero and so
it's an amazing optimization to take
matmul and ReLU and squish them together in one
operation now we have matmul-relu
well wait a second if I do that now I
just went from having two
operators to three
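The fusion being described can be sketched with NumPy. This is a hedged illustration of the idea, not Modular's implementation: "matmul then relu" as two passes writes the intermediate matrix to memory and reads it back, while a fused matmul-relu applies the max-with-zero as each output value is produced.

```python
import numpy as np

# Two separate kernels: the intermediate a @ b is materialized in memory.
def matmul(a, b):
    return a @ b

def relu(x):
    return np.maximum(x, 0.0)

# Fused kernel: in a real accelerator kernel the max happens in registers
# before the store; NumPy can only mimic the semantics, not the memory
# traffic, but the results are identical.
def matmul_relu_fused(a, b):
    return np.maximum(a @ b, 0.0)

a = np.random.default_rng(1).standard_normal((64, 32))
b = np.random.default_rng(2).standard_normal((32, 16))

assert np.allclose(relu(matmul(a, b)), matmul_relu_fused(a, b))
```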
but now I figure out okay well there are
a lot of activation functions what about
leaky ReLU what about a
million other things that are out there right
and so as I start fusing these in I
get permutations of all these algorithms
right and so what the compiler people
said is hey cool I will
go enumerate all the algorithms and I
will enumerate all the pairs and I will
actually generate a kernel for you and I
think this has been very
useful for the industry this is one of
the things that powers Google TPUs
PyTorch 2 is rolling out really
cool compiler stuff with Triton this
other technology and things like this
and so the compiler people are kind of
coming into their own and saying
awesome this is a compiler problem
we'll compile it
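The enumerate-all-the-pairs approach can be sketched in Python. This is a hedged toy version: closures stand in for what a compiler would actually code-generate, but the structure (ops × activations, one fused kernel per pair) is the same.

```python
import numpy as np

# Enumerate ops and activations, then "generate" a fused kernel per pair.
ops = {
    "matmul": lambda a, b: a @ b,
    "add":    lambda a, b: a + b,
}
activations = {
    "relu":       lambda x: np.maximum(x, 0.0),
    "leaky_relu": lambda x: np.where(x > 0, x, 0.01 * x),
}

# Default-argument trick binds op/act per iteration; each entry is a
# stand-in for a compiler-generated fused kernel.
fused = {
    (op_name, act_name): (lambda op=op, act=act: lambda a, b: act(op(a, b)))()
    for op_name, op in ops.items()
    for act_name, act in activations.items()
}

a = np.array([[1.0, -2.0], [3.0, -4.0]])
b = np.array([[2.0, 0.0], [0.0, 2.0]])

y = fused[("matmul", "relu")](a, b)
assert np.allclose(y, np.maximum(a @ b, 0.0))
print(sorted(fused))  # 2 ops x 2 activations -> 4 generated kernels
```

The combinatorics in the transcript follow directly: 2,000 operators times even a handful of activations is why enumeration has to be automated rather than hand-written.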
here's the problem
not everybody's compiler person I love
compiler people trust me right but not
everybody can or should be a compiler
person it turns out that there are
people that know analog computers really
well or they know
some GPU internal architecture thing
really well or they know some crazy
sparse numeric interesting algorithm
that is the cusp of research but they're
not compiler people and so one of the
challenges with this new wave of
Technology trying to turn everything
into a compiler
once again it's excluded a ton of people
and so you look at what does mojo do
what is the modular stack do it brings
programmability back into this world
like it enables I wouldn't say normal
people but like a new you know different
kind of delightful nerd that cares about
numerics or cares about Hardware or
cares about things like this to be able
to express that in the stack and extend
the stack without having to actually go
hack the compiler itself to extend the
stack on the on the algorithm side yeah
and then on the hardware side yeah so
again go back to the simplest
example of Int right and so what both
Swift and Mojo and other things like
this did is say okay pull magic out
of the compiler and put it in the
standard library
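The "pull magic out of the compiler" idea can be illustrated in Python. This is a hedged sketch only: a fixed-width integer defined as an ordinary library type with operator methods rather than a type the compiler special-cases. It is not Mojo's actual `Int` definition (which is written in Mojo itself); the names here are illustrative.

```python
# A fixed-width Int as a plain library type: all the "magic" (wrap-around
# arithmetic, operators) lives in ordinary methods, not in the compiler.

class Int64:
    MASK = (1 << 64) - 1

    def __init__(self, value):
        self.value = value & self.MASK

    def __add__(self, other):
        # Wrap-around add, as 64-bit machine integers do.
        return Int64(self.value + other.value)

    def __mul__(self, other):
        return Int64(self.value * other.value)

    def __eq__(self, other):
        return self.value == other.value

    def __repr__(self):
        return f"Int64({self.value})"

a, b = Int64(2**63), Int64(2**63)
print(a + b)          # wraps around to Int64(0)
print(Int64(6) * Int64(7))
```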
right so what modular is doing with the
engine that we're providing and like
this this very deep technology stack
right which goes into heterogeneous run
times and like a whole bunch of really
cool really cool things
um this this whole stack allows that
stack to be extended and hacked and
changed by researchers and by Hardware
innovators and by people who know things
that we don't know because you know
modular has some smart people but we
don't have all the smart people it turns
out right uh what are heterogeneous
runtimes yeah so what is
heterogeneous right heterogeneous
just means many different kinds of
things together and so the
simplest example you might come up with
is a CPU and a GPU and so it's a simple
heterogeneous computer to say I'll run
my data loading and pre-processing and
other algorithms on the CPU and then
once I get it into the right shape I
shove it into the GPU I do a lot of
Matrix multiplications and convolutions
and things like this and I get it back
out and I do some reductions and
summaries and they shove it across the
wire to across the network to another
machine right and so you've got now what
are effectively two computers
a CPU and a GPU talking to each other
working together in a heterogeneous
system
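The CPU-plus-GPU pipeline just described can be sketched as two "devices" talking through queues. This is a hedged stand-in (threads and queues in place of real hardware, squaring numbers in place of matmuls); the point is the asynchronous message-passing structure, not the math.

```python
import queue
import threading

to_gpu: queue.Queue = queue.Queue()
to_cpu: queue.Queue = queue.Queue()

def gpu_worker():
    # "GPU": receive a batch, do the heavy compute, send the result back.
    while True:
        batch = to_gpu.get()
        if batch is None:          # shutdown signal
            return
        to_cpu.put([x * x for x in batch])

gpu = threading.Thread(target=gpu_worker)
gpu.start()

# "CPU": load and preprocess data, then ship batches to the accelerator.
results = []
for batch in ([1, 2, 3], [4, 5, 6]):
    preprocessed = [x + 1 for x in batch]  # stand-in for data loading
    to_gpu.put(preprocessed)
    results.extend(to_cpu.get())           # gather the GPU's output

to_gpu.put(None)
gpu.join()
print(results)  # [4, 9, 16, 25, 36, 49]
```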
um
but that was 10 years ago
okay you look at a modern cell phone
you've got CPUs and
they're not just CPUs there are
big.LITTLE CPUs and so there are multiple
different kinds of CPUs that are again
working together they're multi-core
you've got GPUs you've got neural
network accelerators you've got dedicated
hardware blocks for media so for
video decode and JPEG decode and things
like this and so you've got this
massively complicated system and this
isn't just cell phones every laptop
these days is doing the same thing and
all these blocks can run at the same
time
and need to be
choreographed right and so again one of
the cool things about machine learning
is it's moving things to like data flow
graphs and higher level of abstractions
and tensors and these things that it
doesn't specify here's how to do the
algorithm it gives the system a lot more
flexibility in terms of how to translate
or map it or compile it onto the system
that you have and so what you need you
know at the bottom is part of the layer
there is a way for all these devices to
talk to each other
and so this is one thing that you know
I'm very passionate about I mean you
know I'm a nerd but um but all these all
these machines and all these systems are
effectively parallel computers running
at the same time sending messages to
each other and so they're all fully
asynchronous well this is actually a
small version of the same problem you
have in a data center right in a data
center you now have multiple different
machines sometimes very specialized
sometimes with GPUs or TPUs in one node
and sometimes with disks in other nodes
and so you get a much larger scale
heterogeneous computer and so what ends
up happening is you have this like
multi-layer abstraction of hierarchical
parallelism hierarchical
asynchronous communication and making
that again the enemy my enemy is
complexity by getting that away from
being different specialized systems at
every different part of the stack and
having more consistency and uniformity I
think we can help lift the world and
make it much simpler and actually get
used but how do you leverage like the
strengths of the different specialized
systems so looking inside the smartphone
yeah like there's there's what like I
don't know five six computers
essentially inside a smartphone how
do you
without
making it explicit decide which computer
is supposed to be used for which
operation yeah so there's a
pretty well known algorithm and what
you're doing is you're looking at two
two factors you're looking at the factor
of sending data from one thing to
another right because it takes time to
get it from that side of the chip to
that side of the Chip and things like
this and then you're looking at what is
the time it takes to do an operation on
a particular block so take CPUs CPUs are
fully General they can do anything right
but then you have a neural net
accelerator that's really good at Matrix
multiplications okay and so you say okay
well if my workload is all Matrix
multiplications I start up I send the
data over the neural net thing it goes
and does matrix multiplications when
it's done it sends me back the result
all is good right and so the simplest
thing is just saying do matrix
operations over there right but then you
realize you get a little bit more
complicated because you can do Matrix
multiplications on a GPU you can do it
on
a neural net accelerator you can do it
on CPU and they'll have different
trade-offs and costs and it's not just
matrix multiplication and so what you
actually look at is you look at I have
generally a graph of compute I want to
do a partitioning I want to look at the
communication the bisection bandwidth
and like the overhead and the sending of
all these different things and and build
a model for this and then decide okay
it's an optimization problem where do I
want to place this compute
this is the old school theoretical
computer science problem of scheduling
and then
presumably it's possible to somehow
magically include auto-tuning into this
absolutely so I mean in my opinion this
is an opinion this is not uh not
everybody would agree with this but in
my opinion the world benefits from
simple and predictable systems at the
bottom that you can control
but then once you have a predictable
execution layer you can build lots of
different policies on top of it right
and so one policy can be that
the human programmer says do that here
do that here do that here do that here
and like fully manually controls
everything
and the system should just do it right
then you quickly get in the mode of like
I don't want to have to tell it to do it
yeah and so the next logical step that
people typically take because they write
some terrible heuristic oh if it's
amazing location do it over there or if
it's floating Point dude on the GPU if
it's integer due on the CPU like
something like that right and and then
you you then get into this mode of like
people care more and more and more and
you say okay well let's actually
um like make your stick better let's get
into auto tune let's actually do a
search of the space to decide well what
is actually better right well then you
get into this problem where you realize
this is not a small space this is a many
dimensional
hyperdimensional space that you cannot
exhaustively search
so do you know of any algorithms that
are good at searching very complicated
spaces for
don't tell me you're going to turn this
into a machine learning problem so then
you turn into a machine learning problem
and then you have a space of genetic
algorithms and reinforcement learning
and like all these all these what can
you include that into the stack into the
into the modulus that yeah yeah where
does it sit where does it live is it
separate thing or is it part of the
compilation so you start from simple and
predictable models and so you can have
full control and you can have
coarse-grained knobs that nudge the system so
you don't have to do this but if you
really care about getting the best you
know the last ounce out of a problem
then you can use additional tools and
they're the cool thing is you don't want
to do this every time you run a model
you want to figure out the right answer
and then cache it and once you do that
you can get you can say okay cool I can
get up and running very quickly I can
get good execution out of my system I
can decide if something's important and
if it's important I can throw a
bunch of machines at it and do a big
expensive search over the space using
whatever technique I feel like is suited
to the problem and then when
I get the right answer cool I can just
start using it
right and so you can get out of this um
this trade-off between okay am I gonna
like spend forever doing a thing or do I
get up and running quickly and it's a
quality result like these these are
actually not In Contention with each
other if the system's designed to scale
you started and did a little bit of a
whirlwind overview of how you get a
35,000x speedup or more over Python
um Jeremy Howard did a really great
presentation about sort of the basic
like look at the code here's how you get
the speed up like you said that's
something we could uh probably
developers can do for their own code to
see how you can get these gigantic
Speedos but can you maybe speak to the
machine learning task in general how do
you how do you make some of this code
fast and specifics like what would you
say is the main bottleneck
uh for uh machine learning tasks so are
we talking about uh Matt Mall matrix
multiplication how do you make that fast
so I mean if you just look at the python
problem right you can say how do I make
python faster
there have been a lot of people
working on
okay how to make Python 2x faster 10x
faster or something like that right and there
have been a ton of projects in that vein right
Mojo started from the what can the
hardware do
like what is the limit of physics yeah
what is the speed of light of this thing
how fast can it go and then
how do I express that yeah right and so
it wasn't anchored
relative to making Python a little bit
faster it's saying cool I know what the
hardware can do let's unlock that
now just think how
gutsy that is to be in the meeting and
as opposed to trying to see how do we
get the Improvement it's like what can
the physics do
I mean maybe I'm a special kind of nerd
but you look at that what is the limit
of physics how fast can these things go
right
when you start looking at that typically
it ends up being a memory problem right
and so today uh particularly with these
specialized accelerators the problem is
that you can do a lot of math within
them but you get bottleneck sending data
back and forth to memory whether it be
local memory or distant memory or disk
or whatever it is and and that that
bottleneck particularly is the training
sizes get large as you start doing tons
of inferences all over the place like
that becomes a huge bottleneck for
people right
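The claim that it's "typically a memory problem" can be made concrete with a hedged back-of-the-envelope sketch: compare arithmetic intensity (FLOPs per byte moved) of a matmul against an elementwise op. Low intensity means the hardware waits on memory no matter how fast its ALUs are. The byte counts below are the idealized minimum traffic, ignoring caches and tiling.

```python
def matmul_intensity(n, bytes_per_elem=4):
    flops = 2 * n ** 3                        # n^3 multiply-adds
    bytes_moved = 3 * n * n * bytes_per_elem  # read A, B; write C
    return flops / bytes_moved

def elementwise_intensity(n, bytes_per_elem=4):
    flops = n * n                             # one op per element
    bytes_moved = 2 * n * n * bytes_per_elem  # read input, write output
    return flops / bytes_moved

# Matmul intensity grows with n, so big matmuls can keep ALUs busy;
# elementwise ops stay constant and memory-bound at any size.
print(matmul_intensity(1024))      # ~170 FLOPs/byte
print(elementwise_intensity(1024)) # 0.125 FLOPs/byte
```

This ratio is also why kernel fusion, discussed below in the transcript, pays off: fusing an elementwise op into the matmul removes a full read-and-write of the intermediate.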
so again what happened is we went
through a phase of many years where
people took the special case and
hand-tuned it and tweaked it and tricked it
out and they knew exactly how the
hardware worked and they knew the model
and they made it fast but it
didn't generalize
and so you can make ResNet-50
or AlexNet or
Inception v1 fast like you can do
that right because the models are small
they fit in your head right but as the
models get bigger more complicated as
the machines get more complicated it
stops working right and so this is where
things like kernel fusion come in so
what is kernel fusion it's this idea
of saying let's avoid going to memory
and let's do that by building a new
hybrid kernel a numerical algorithm that
actually keeps things in the accelerator
instead of having to write all the way
out to memory now what's happened
with these accelerators is you
get multiple levels of memory like in a
GPU for example you'll have Global
memory and local memory and like all
these things
um if you zoom way into how Hardware
works the register file is actually a
memory
so the registers are like an L0
cache and so a lot of taking advantage
of the hardware ends up being fully
utilizing the full memory hierarchy
in all of its capability and this has a
number of problems right one of which is
again the complexity disaster right
there's too much Hardware even if you
just say let's look at the chips from
one line of vendor like apple or Intel
or whatever it is
each version of the chip comes out with
new features and they change things so
that it takes more time or less time to
do different things and you can't
rewrite all the software whenever a new
chip comes out right and so this is
where you need a much more scalable
approach and this is what Mojo and what
the modular stack provides is it
provides this infrastructure and the
system for factoring all this complexity
and then allowing people to express
algorithms you talk about Auto tuning
for example Express algorithms in a more
portable way so that when a new chip
comes out you have to you don't have to
rewrite it all
so to me like you know I kind of joke
like what is a compiler well there's
many ways to explain that you convert
thing a into thing B and you convert
source code to machine code like you can
talk about many many
things that compilers do but to me it's
about a bag of tricks it's about a
system and a framework that you can hang
complexity it's a system that can then
generalize and it can work on problems
that are bigger than fit in one human's
head right and so what that means what a
good stack and what the modular stack
provides is the ability to walk up to it
with a new problem and it'll generally
work quite well
and that's something a lot of machine
learning infrastructure and tools and
Technologies don't have typical
state-of-the-art today as you walk up
particularly if you're deploying if you
walk up with a new model you try to push
it through the converter the converter
crashes
that's crazy the state of ml tooling
today is not anything that a c
programmer would ever accept right and
it's always been this kind of flaky set
of tooling that's never been integrated
well and it's been uh never worked
together and because it's not designed
together it's built by different teams
it's built by different Hardware vendors
it's built by different systems it's
built by different internet companies
that are trying to solve their their
problems right and so that means that we
get this fragmented terrible mess of
complexity
so I mean the specifics of the demo Jeremy
showed uh there's the vectorize
function which I guess is
uh built into Mojo does
that vectorize as he showed is built
into the library into the library
instead of the language
um vectorize and parallelize
vectorize is more low-level
parallelize is higher level there's the
tiling thing which is how he
demonstrated the um
autotune I think so so think of think
about this in like levels hierarchical
levels of abstraction right and so it at
the very if you zoom all the way into a
compute problem you have one floating
Point number right so then you say okay
I can do things one at a
time in an interpreter it's pretty slow
right so I can get to doing one at a
time in a compiler and then I can
get to doing four or eight or 16 at a
time with vectors that's called
vectorization
then you can say hey I have a whole
bunch of different
you know what what a multi-core computer
is is it's basically a bunch of
computers
right so they're all independent
computers that can talk to each other
and they share memory and so now what
parallelize does is it says okay run
multiple instances of this on different
computers and now they can all work
together on a problem right and so what
you're doing is you're saying keep going
out to the next level out and and as you
do that how do I take advantage of this
so tiling is a memory optimization right
it says okay let's make sure that we're
keeping the data close to the compute
part of the problem instead of sending
it all back and forth through memory
every time I load a block and the
size of the block is that's
how you get to the autotune to make
sure it's optimized yeah well so all of
these The Details Matter so much to get
good performance this is another funny
thing about machine learning and high
performance Computing that is very
different than C compilers we all grew
up grew up with where you know if you
get a new version of GCC or new version
of clang or something like that you know
maybe something will go one percent
faster
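the ladder just described, from one-at-a-time scalar code up through vectorization and cache tiling, can be sketched in Python, with NumPy standing in for what a vectorizing compiler emits (the tile size is exactly the kind of magic number auto-tuning searches over):

```python
import math
import numpy as np

x = np.arange(1_000_000, dtype=np.float64)

def scalar_dot(v):
    # level 0: one floating point number at a time, interpreter style
    total = 0.0
    for e in v:
        total += e * e
    return total

def vector_dot(v):
    # vectorized: many elements processed per instruction under the hood
    return float(np.dot(v, v))

def tiled_dot(v, tile=8192):
    # tiled: walk the data in cache-sized blocks so each block is loaded
    # once; the tile size is a tunable knob an autotuner would search
    total = 0.0
    for i in range(0, v.size, tile):
        block = v[i:i + tile]
        total += float(np.dot(block, block))
    return total

assert scalar_dot(x[:1000]) == vector_dot(x[:1000])  # exact for small integer values
assert math.isclose(vector_dot(x), tiled_dot(x), rel_tol=1e-9)
```

all three produce the same answer (up to rounding); what changes is how many elements move per instruction and how well the working set fits in cache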
right and so compiler engineers will work
really really really hard to get half a
percent out of your C code something
like that but when you're talking about
an accelerator or an AI application or
you're talking about these kinds of
algorithms now these are things people
used to write in Fortran for example
right
if you get it wrong it's not five
percent or one percent it could be 2X or
10x right if you think about it
um you really want to make use of the
full memory you have the cash for
example but if you use too much space it
doesn't fit in the cache now you're
going to be thrashing all the way back
out to main memory and these can be 2x
10x Major Performance differences and so
this is where getting these magic
numbers and these things right is really
actually quite important so you
mentioned that Mojo is a superset of
python
can you run
python code
as if it's Mojo code
yes yes so and so and this has two sides
of it so Mojo's not done yet so I'll
give you disclaimer mode it's not done
yet but already we see people that take
small pieces of python code move it over
they don't change it and you can get 12x
speedups like somebody's just tweeting
about that yesterday which is pretty
cool right and again interpreters
compilers right and so without changing
any code also this is not
this is not jit compiling or doing
anything fancy this is just basic stuff
moving straight over now Mojo will
continue to grow out and as it grows out
it will have more and more and more
features and our North Star is to be a
full super set of python and so you can
bring over
basically arbitrary python code and have
it just work and it may not always be
12x faster but um but it should be at
least as fast and way faster in many
cases this is the goal right
um
now I'll take time to do that and python
is a complicated language there's not
just the obvious things but there's also
non-obvious things that are complicated
like we have to be able to talk to C
python packages to talk to the C API and
there's a bunch of there's a bunch of
pieces so you have to I mean to make
explicit the obvious that may not be so
obvious until you think about it so you
know to run python code that means you
have to run all the python packages and
libraries yeah yeah so that means what
what's the relationship between Mojo and
C python The Interpreter that
presumably would be tasked with getting
those packages to work yep so in the
fullness of time Mojo will solve for all
the problems and you'll be able to move
python packages over and run them in
Mojo without the C python without C
python someday yeah right it's not today
it's someday and that'll be a beautiful
day because then you'll get a whole
bunch of advantages and you'll get
massive speed ups and things like this
but you can do that one at a time right
you can move packages one exactly but
but we're not willing to wait for that
python is too important the ecosystem is
too broad uh we want to both be able to
build Mojo out we also want to do it the
right way without time like in without
intense time pressure we're obviously
moving fast but
um and so what we do is we say okay well
let's make it so you can import an
arbitrary existing package
arbitrary
including like you write your own on
your local disk or whatever it's not
it's not like a standard like an
arbitrary package
and import that using C python because C
python already runs all the packages
right and so what we do is we built an
integration layer where we can actually
use C python again I'm practical and to
actually just load and use all the
existing packages as they are the
downside of that is you don't get the
benefits of Mojo for those packages
right and so they'll run as fast as they
do in the traditional C python way
but what that does is that gives you an
incremental migration path and so if you
say hey cool well here's a you know the
python ecosystem is vast I want all of
it to just work but there's certain
things that are really important and so
if I if I'm doing weather forecasting or
something well
I want to be able to load all the data I
want to be able to work with it and then
I have my own crazy algorithm inside of
it Well normally I'd write that in C
plus plus
if I can write in Mojo and have one
system that scales well that's way
easier to work with is it hard to do
that to to have that layer
that's running C python because is there
some communication back and forth yes
it's complicated I mean this is what we
do so I mean we make it look easy but um
it is it is complicated but what we do
is we use
the C python existing interpreter so
it's running its own byte codes and
that's how it provides full
compatibility and then it gives us C
python objects and we use those objects
as is and so that way we're fully
compatible with all the C python objects
and all the the you know it's not just
the python part it's also the C packages
the C libraries underneath them because
they're often hybrid and so we can fully
run and we're fully compatible with all
that and the way we do that is that we
have to play by the rules right and so
we we keep objects in that
representation when they're coming from
that world what's the representation
that's being used in memory you'd have
to know a lot about how the C python
interpreter works it has for example
reference counting but also different
rules on how to pass pointers around and
things like this super low level fiddly
and it's not like python it's like how
The Interpreter works okay and so that
gets all exposed out and then you have
to Define wrappers around the low level
C code right and so
what this means is you have to know not
only C
which is a different world from python
obviously not only python but the
wrappers and The Interpreter and the
wrappers and the implementation details
and the conventions and it's just this
really complicated mess and when you do
that now suddenly you have a debugger
that debugs python they can't step into
C code
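the "wrappers around low-level C code" pattern can be seen in miniature with ctypes from the standard library, here wrapping libc's abs on a POSIX system: the Python side must restate C's calling convention by hand, and a Python debugger stops at the wrapper boundary

```python
import ctypes

# load the C runtime already linked into this process (POSIX-style lookup)
libc = ctypes.CDLL(None)

# the wrapper has to declare the C-world convention explicitly:
# arguments and return value are C ints, not Python objects
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

def c_abs(n: int) -> int:
    # Python-facing wrapper; a Python debugger cannot step past this
    # call into the C implementation on the other side
    return libc.abs(n)

assert c_abs(-7) == 7
assert c_abs(3) == 3
```

this is the two-world split in its smallest form: to write or debug `c_abs` you need to know Python, C, and the convention glue between them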
right so you have this two world problem
right and so by pulling this all into
Mojo what you get is you get one world
you get the ability to say cool I have
untyped very Dynamic beautiful simple
code
okay I care about performance for
whatever reason right there's lots of
reasons you could you you might care and
so then you add types you can
parallelize things you can vectorize
things you can use these techniques
which are General techniques to solve a
problem and then you can do that by
staying in the system and if you're uh
you have that one python package it's
really important to you you can move it
to Mojo you get massive performance
benefits on that and other other
advantages you know if you like static
types it's nice if they're enforced
some people like that right rather than
being hints so there's other advantages
too and then
um
and then you can do that incrementally
as you go
so one different perspective on this
would be um
why Mojo instead of making C python
faster or redesigning C python yeah well
I mean you can argue Mojo is redesigning
C python but but uh but why not make C
python faster and better and other
things like that uh there's lots of
people working on that so actually
there's a team at Microsoft that is
really improving I think C python
3.11 came out in October or something
like that and it was you know 15 percent
faster 20 percent faster across the board
which is pretty huge given how mature
python is and things like this and so
um that's awesome I love it
um doesn't run on GPU it doesn't do AI
stuff like it doesn't do vectors doesn't
do things
um but 20 percent is good 35 000 times is
better right so like they're
definitely I'm a huge fan of that work
by the way and it composes well with
what we're doing and so it's not it's
not like we're fighting or anything like
that it's actually just general it's
goodness for the world but it's just a
different path right and again we're not
working forwards from making python a
little bit better we're working
backwards from what is the limit of
physics what's the process of uh
supporting python code to Mojo is there
a
what's involved in that in the process
is there tooling for that not yet so um
we're missing some basic features right
now and so we're continuing to drop out
new features like on a weekly basis but
um you know at the fullness of time give
us a year and a half maybe two years is
it an automatable process so when we're
ready it'll be very automatable yes is
it automatable automate like is it
possible to automate
in the general case the python Mojo
conversion yeah well you're saying it's
possible well so and this is why I mean
among other reasons why we use tabs yes
right so first of all by being a
superset yep you could it's like C
versus C plus plus can you move C code
to C plus plus
yes yeah right you can
move C code to C plus plus and uh then
you can adopt classes you can adopt
templates you can adopt references
or whatever C plus plus features you want
after you move C code to C plus
plus like you can't use templates in C
right and so if you leave it as C fine
you can't use the cool features but it
still works right and C and C plus plus
code work together and so that's the
analogy right now
um here right you you you
there's not a python is bad and the Mojo
is good
right Mojo just gives you superpowers
right and so if you want to stay with
python that's cool uh but the tooling
should be actually very beautiful and
simple because we're doing the hard work
of defining a superset right so you're
right so there's several things to say
there but also the conversion tooling
should probably give you hints as to
like how you can improve the code and
then yeah exactly once you're in the new
world then you can build all kinds of
cool tools to say like hey should you
adopt this feature or like and we
haven't built those tools yet but I
fully expect those tools will exist and
then you can like you know quote
modernize your code or however you want
to look at it right so I mean one of the
things that I think is really
interesting about Mojo is that there
have been a lot of projects to improve
python over the years
um everything from you know getting
python to run on the Java virtual
machine uh PyPy which is the jit
compiler there's tons of these projects
out there that have been working on
improving python in various ways
they fall into one of two camps so PyPy
is a great example of a camp that is
trying to be compatible with python
even there not really it doesn't work
with all the C packages and stuff like
that but um but they're trying to be
compatible with python there's also
another category of these things where
they're saying well python is too
complicated
and you know I'm gonna cheat on the
edges and you know like integers in
Python can be an arbitrary size integer
like if you care about it fitting in a
register and going fast on a computer
that's really annoying right and so you
can you can choose to pass on that right
you can say well people don't really use
big integers that often therefore I'm
gonna just not do it and it'll be fine
not not a python superset or you can do
the hard thing and say okay this is
python
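the big-integer point is easy to see concretely: Python ints grow without bound, while a 64-bit register wraps, and that wrapping is what superset-skipping implementations quietly accept (a small sketch)

```python
# Python integers are arbitrary precision: no overflow, ever
big = 2 ** 100
assert big + 1 - big == 1

# a 64-bit machine register, simulated here by masking to 64 bits,
# wraps around: the same value loses everything above bit 63
MASK = (1 << 64) - 1
assert (2 ** 100) & MASK == 0      # 2**100 has no bits in the low 64
assert (2 ** 64 + 5) & MASK == 5   # wraps back to a small number
```

a true superset has to preserve the first behavior even though the second one is what the hardware gives you for free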
you can't be a super set of python
without
being a super set of python and that's a
really hard technical problem but it's
in my opinion worth it right and it's
worth it because it's not about any one
package it's about this ecosystem it's
about what python means for the world
and it also means we don't want to
repeat the python 2 to Python 3
transition like we want we want people
to be able to adopt this stuff quickly
and so by doing that work we can help
lift people yeah the challenge it's
really interesting technical
philosophical challenge of
really making a language a superset of
another language
that's breaking my brain a little bit
well it paints you in the corners so
um again I'm very happy with python
right so joking all joking aside I think
that the indentation thing is not the
actual important part of the problem yes
right but the the fact that python has
amazing Dynamic meta programming
features and they translate to beautiful
static meta programming features I think
is profound I think that's huge right
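the dynamic metaprogramming he means is ordinary Python: classes are values built at runtime by regular code, as in this sketch (Mojo's bet, per the conversation, is that the same patterns can translate to compile-time metaprogramming)

```python
def make_record(name, *fields):
    # build a class at runtime: type(name, bases, namespace) is the
    # same machinery the `class` statement uses under the hood
    def __init__(self, **kwargs):
        for f in fields:
            setattr(self, f, kwargs[f])

    def __repr__(self):
        vals = ", ".join(f"{f}={getattr(self, f)!r}" for f in fields)
        return f"{name}({vals})"

    return type(name, (object,),
                {"__init__": __init__, "__repr__": __repr__, "fields": fields})

# manufacture a class on the fly, then use it like any hand-written one
Point = make_record("Point", "x", "y")
p = Point(x=1, y=2)
assert (p.x, p.y) == (1, 2)
assert Point.fields == ("x", "y")
assert repr(p) == "Point(x=1, y=2)"
```

libraries like dataclasses and ORMs lean on exactly this kind of runtime class construction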
and so python I've talked with Guido
about this it's it's like it was not
designed to do what we're doing that was
not the reason they built it this way
but because they really cared and they
were very thoughtful about how they
designed the language it scales very
elegantly in the space but if you look
at other languages for example C and C
plus plus
right if you're building a superset you
get stuck with the design decisions of
the subset
right and so you know C plus plus is way
more complicated because of C in the
Legacy than it would have been if they
would have theoretically designed a from
scratch thing
and there's lots of people right now
that are trying to make C plus plus
better and recent tax C plus plus it's
gonna be great we'll just change all the
syntax
uh but if you do that now suddenly you
have zero packages
so you don't have compatibility so what
what are the if you could just uh Linger
on that what are the
biggest challenges of keeping that
superset status
what are the things you're struggling
with is it all boil down to having a big
integer
no I mean it's it's one of many
things usually it's the um the long
tail of weird things so let me give you a
war story okay so War story in the space
is
um you go way back in time project I
worked on is called clang clang what it
is is a C C plus plus parser right and
when I started working on clang
it was like 2006 or something maybe 2007
when I first started working on it
right
um it's funny how time flies yeah the uh
uh I started that project and I'm like
okay well I want to build a c parser C
plus plus parser for LLVM it's gonna be
the work GCC is yucky you know this is
me in earlier times it's yucky it's
unprincipled it has all these weird
features like all these bugs like
it's yucky so I'm going to build a
standard compliant C and C plus parser
it's gonna be beautiful it'll be amazing
well engineered all the cool things an
engineer wants to do
and so I started implementing building
it out building on building out and then
I got to include stdio.h
and all of the headers in the world use
all the GCC stuff
okay and so again coming back away
from
Theory back to reality right I was
at a fork in the road I could have built
an amazingly beautiful academic thing
that nobody would ever use
or I could say well it's yucky in
various ways all these design mistakes
accents of History the Legacy at that
point GCC was like over 20 years old
which by the way yeah now lvm's over 20
years old yeah that's funny how yeah
time catches up to you right and so
um you you say okay well what what is
easier right I mean as an engineer it's
it's actually much easier for me to go
Implement long tail compatibility weird
features even if they're distasteful and
just do the hard work and like figure it
out reverse engineer understand what it
is write a bunch of test cases like try
to understand Behavior
it's way easier to do all that work as
an engineer than it is to go talk to all
C programmers and get argue with them
and try to get them to rewrite their
code yeah
right and because that breaks a lot more
things yeah and and you have realities
like nobody actually even understands
how the code works because it was
written by the person who quit 10 years
ago right and so this is this software
has kind of frustrating that way but
it's that's how the world works right
yeah unfortunately it can never be this
perfect beautiful thing well there there
are occasions in which you get to build
like you know you invent a new
data structure or something like that or
there's this beautiful algorithm that
just like makes you super happy right I
I love that moment but but when you're
working with people yeah and you're
working with code and Dusty that code
bases and things like this right
it's not about what's theoretically
beautiful it's about what's practical
what's real what people will actually
use and I don't meet a lot of people
that say I want to rewrite all my code
just for the sake of it
by the way there could be interesting
possibilities and we'll probably talk
about it where AI can help rewrite some
code that might be farther out future
but it's a really interesting one how
that could create more
be a a tool in the battle against this
monster of complexity that you mentioned
yeah
Guido the benevolent
dictator for life of python what does he
think about Mojo have you talk too much
about it uh I have talked with him about
it he found it very interesting
um we actually talked with Guido before
it launched and so he was aware of it
before it went public
um I have a ton of respect for Guido for
a bunch of different reasons you talk
about walrus operator and like Guido's
pretty amazing in terms of
steering such a huge and diverse
community and and
and
like driving forward and I think python
is what it is thanks to him right and so
to me it was really important starting
to work on Mojo to get his feedback and
get his input and get his eyes on this
right now
um a lot of what Guido was
I think concerned about is how do we not
fragment the community yeah we don't
want a python 2 to Python 3 thing like
that was that was really painful for
everybody involved and so we spent quite
a bit of time talking about that and
some of the tricks I learned from Swift
for example so in the migration to
Swift we managed to like not just
convert
Objective C into a slightly prettier
Objective C which we did we then
converted not entirely but almost an
entire Community to completely different
language
right and so there's a bunch of tricks
that you learn along the way that are
directly relevant to what we do and so
this is where for example the you
leverage C python
while bringing up the new thing like
that that approach is I think proven and
and comes from experience and so Guido
is very interested in like okay cool
like I think that python is really his
legacy it's his baby I have tons of
respect for that incidentally I see mojo
as a member of the Python family I'm not
trying to take python away from Guido
and from the python Community
um uh and so uh to me it's really
important that we're a good member of
that community and so yeah I think that
again you would have to ask Guido this
but I think that he was very interested
in this notion of like
cool but I think python's been beat up for
being slow
maybe there's a path out of that
right and that you know if the future is
python right I mean look look at the the
far outside
case on this right and I'm not saying
this is Guido's perspective but you know
there's this path of saying like Okay
well suddenly python can suddenly go all
the places it's never been able to go
before
right and that means the python can go
even further and can have even more
impact on the world so in some sense
Mojo could be seen as python 4.0
I would not say that I think that would
drive a lot of people really crazy
because of the PTSD of the 2 to 3 thing
I'm willing to annoy people about Emacs
versus Vim or tabs versus spaces but that one
I don't know that might be a little bit
far even for me like my my skin may not
be that thick but the point is the step
to it being a superset and allowing all
these capabilities I think is the
evolution of a language it feels like an
evolution of a language
so he he's interested by the ideas that
you're playing with but also concerned
about the fragmentation so how what are
the ideas you've learned what are you
thinking about how do we avoid
fragmenting the community where the the
pythonistas and the
uh I don't know what to call the Mojo
people uh Mojicians Mojicians yeah I
like it uh can coexist happily and and
share a code and basically just have
these big code bases that are using uh C
Python and more and more moving towards
Mojo well so again these are lessons I
learned from Swift and and here we Face
very similar problems right and Swift
you have Objective C super Dynamic uh
they're very different syntax right but
you're talking to people who have large
scale code bases I mean Apple's got the
biggest largest scale code base of
objective c code right and so you know
none of the companies none of the iOS
developers none of the other developers
want to rewrite everything all at once
and so you want to be able to adopt
things piece at a time and so a thing
that I found that worked very well in
the Swift Community was saying okay cool
and this is when switch was very young
as you say okay you have a million line
of code Objective C app
don't rewrite it all but when you
implement a new feature go Implement
that new class
using Swift right and so now this turns
out is a very wonderful thing for an app
developer
but it's a huge challenge for this
compiler team and the systems people
that are implementing that's right and
this comes back to what is this
trade-off between doing the hard thing
that enables scale versus doing the
theoretically pure and ideal thing right
and so Swift adopted and built a lot of
different Machinery to deeply integrate
with the Objective-C runtime and we're
doing the same thing with python right
now what what happened in the case with
swift is that
Swift as the language got more and more
and more mature over time right and
incidentally Mojo is a much simpler
language than Swift in many ways and so
I think that Mojo will develop way
faster than Swift for a variety of
reasons but as the language gets more
mature in parallel with that you have
new people starting new projects
right and stuff when the language is
mature and somebody's starting a new
project that's when they say okay cool
I'm not dealing with a million lines of
code I'll just start and use the new
thing for my whole stack now the problem
is again you come back to where
communities and we're
people that work together you build new
subsystem or a new feature or new thing
in Swift or you build a new thing in Mojo
then you want it to end up being used on
the other side
right and so then you need to work on
integration back the other way
and so it's not just Mojo talking python
it's also python talking to Mojo right
and so what I would love to see and I
don't want to see this next month right
but what I want to see over the course
of time is I would love to see people
that are building these packages like
you know numpy or uh you know tensorflow
or what you know these packages that are
half python half C plus plus
and if you say okay cool I want to get
out of this python C plus plus world
into a unified world and so I can move to
Mojo
but I can't give up on my python clients
because they're like these libraries get
used by everybody and they're not all
going to switch ever all you know all
once and maybe never right well so the
way we should do that is we should vend
python interfaces to the Mojo types
and that's what we did in Swift and it
worked great I mean it was a huge
implementation challenge for the
compiler people right but um there's
only a dozen of those compiler people
and there are millions of users and so
it's a very expensive Capital intensive
like skill set intensive problem but
once you solve that problem it really
helps adoption it really helps the
community progressively adopt
Technologies and so I think that this
approach will work quite well with with
the Python and the Mojo world so for a
package ported to Mojo and then create a
python interface yep
so how do just the Linger on these
packages numpy Pi torch and tensorflow
yeah how do they play nicely together so
is uh Mojo supposed to be let's talk
about the machine learning ones
is Mojo's kind of vision to replace
pytorch or tensorflow uh to incorporate it
what's what's the relationship in this
all right so um so take a step
back so I wear many hats so you're
you're angling it on the Mojo side yes
Mojo is a programming language and so it
can help solve the C C plus plus python
Feud that's happening the fire Emoji got
me I'm sorry we should be talking about
modular yes yes yes okay so the fire
Emoji is amazing I love it uh it's it's
a big deal the other side of this is the
fire Emoji is in service of solving some
big AI problems yes right and so the big
AI problems are again this fragmentation
this Hardware nightmare this uh this
explosion of new potential but that's
not getting felt by the industry right
and so when you look at how does the
modular engine help tensorflow and pytorch
right it's not replacing them right in
fact when I talk to the people again
they don't like to rewrite all their
code you have people that are using a
bunch of pytorch a bunch of
tensorflow they have models that they've
been building over the course of many
years right and when I talk to them
there's a few exceptions but generally
they don't want to rewrite all their
code
right and so what we're doing is we're
saying okay well you don't have to
rewrite all your code what happens is
the modular engine goes in there and
goes underneath tensorflow and Pi torch
it's fully compatible and just provides
better performance better predictability
better tooling
it's a better experience that helps lift
tensorflow and pytorch and make them
even better I love python I love
tensorflow I love by torch right this is
about making the world better because we
need AI to go further but if I have a
process that trains a model and have a
process that performs inference on that
model and have the model itself
uh what should I do with that in the
long Arc of History
in terms of if I use Pi torch to train
it should I rewrite stuff in Mojo would
that if I care about performance well so
I mean again it depends so if you care
about performance then writing and mojos
can be way better than writing in Python
but if you look at
um if you look at llm companies for
example so you look at open AI rumored
and you look at many of the other folks
that are working on maybe these many of
these LMS and other like Innovative
machine learning models on the one hand
they're innovating in the data
collection and the model billions of
parameters in the model architecture and
the RL HF and the the like all these all
the cool things that people are talking
about
but on the other hand they're spending a
lot of time writing Cuda kernels
so you say wait a second how much faster
could all this progress go if they were
not having to handwrite all these Cuda
kernels right and so there are a few
technologies that are out there and
people have been working on this problem
for a while and
um and they're trying to solve subsets
the problem again kind of fragmenting
the space and so what Mojo provides for
these kinds of companies is the ability
to say cool I can have a unifying Theory
right again this the The Better Together
the unifying Theory the the two world
problem or the three world problem or
the N world problem like this is the
thing that is slowing people down and so
as we help solve this problem I think
it'll be very helpful for making this
whole cycle go faster
so obviously we've talked about the
transition from Objective C to Swift you
designed this uh programming language
and you've also talked uh quite a bit
about the use of Swift for machine
learning uh context
why have you decided to move away from
uh maybe an intense focus on Swift for
the machine learning context versus sort
of Designing a new programming language
that happens to be a superset this is
an irrational set of Life Choices I make
I go to the desert and did you meditate
on it okay all right no it was bold it
was bold and needed and I think uh I
mean it's just bold and sometimes to
take those leaps is a difficult leap to
take yeah well so okay I mean I think
there's a couple of different things so
um actually I left apple back in 2017
like January 2017. so it's been a number
of years that I left apple and the
reason I left Apple was to do AI
okay so and again I won't comment on
Apple and AI but the uh uh at the time
right I want to get into and understand
and understand the technology understand
the applications the workloads and so I
was like okay I'm gonna go dive deep
into applied and Ai and then the
technology underneath it right
um
I found myself at Google
and that was like when tpus were yep
waking up exactly and so I found myself
at Google and uh Jeff Dean who's a rock
star as you know right and the and in
2017 tensorflow was like really taking
and doing incredible things and I was
attracted to Google to help them with
the tpus right and tpus are an
Innovative Hardware accelerator platform
which have now I mean I think proven
out at massive scale and like done incredible
things right and so one of the things
that this led into is a bunch of
different projects which I'll skip over
right one of which was this Swift for
tensorflow project right and so that
project was a research project and so
the idea of that is say okay well let's
look at Innovative new programming
models where we can get a fast
programming language we can get
automatic differentiation into the
language let's push the boundaries of
these things in a research setting right
now that project I think lasted two
three years there's some really cool
outcomes of that so one of the things that's
really interesting is um
I published a talk at an llvm conference
in 2018 again that seems like so long
ago about graph program abstraction
which is basically the thing that's in
pytorch 2
and so pytorch 2 with all this Dynamo
stuff it's all about this graph
program abstraction thing from Python
bytecodes and so a lot of the research
that was done
um ended up pursuing and going out
through the industry and influencing
things and I think it's super exciting
and awesome to see that
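the graph program abstraction he's describing, lifting a computation graph out of python bytecodes, can be illustrated loosely with the standard library's dis module; this is only a toy sketch of the underlying idea, not how pytorch 2's TorchDynamo is actually implemented:

```python
import dis

# A tiny function whose computation we would like to "capture" as a graph.
def f(x, y):
    return x * y + 1

# Python exposes the compiled bytecode of any function; graph-capture
# systems hook at this level to reconstruct the operations being performed.
ops = [ins.opname for ins in dis.get_instructions(f)]
print(ops)  # opcode names vary by python version
```

the point is just that the interpreter's bytecode stream is observable from python itself, which is the raw material such systems work from.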
but the Swift for tensorflow project itself
did not work out super well and so
there's a couple of different problems
with that one of which is that you may
have noticed Swift is not python
there's a few people that write python
code yes and so it turns out that all of
ml is pretty happy with python it's
actually a problem that other
programming languages have as well that
they're not python well probably maybe
briefly talk about Julia was a very
interesting uh beautiful programming
language but it's not python exactly
well and so if and so like if you're
saying I'm going to solve a machine
learning problem where all the
programmers are python programmers
yeah and you say the first thing you
have to do is switch to a different
language
well your new thing may be good or bad
or whatever but if it's a new thing the
adoption barrier is massive it's still
possible still possible yeah absolutely
the world changes and evolves and
there's definitely room for new new and
good ideas but it just makes it so much
harder right and so
lesson learned Swift is not Python and
people are not always in search of like
learning a new thing for the sake of
learning a new thing and if you want to
be compatible with all the world's code
turns out
meet the world where it is right second
thing is that
um you know a lesson learned is that uh
Swift as a very fast and efficient
language kind of like Mojo but a
different a different take on it still
um
really worked well with eager mode
and so eager mode is something that
pytorch does and it proved out really
well and it enables really expressive
and dynamic and easy to debug
programming
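the eager versus graph distinction he's drawing can be sketched in plain python; this is a toy illustration of the two execution styles, not pytorch or tensorflow code:

```python
# Eager: each operation runs immediately, so intermediate values are
# inspectable with ordinary tools (print, pdb, asserts).
def eager_demo(x):
    y = x * 2          # executes now; y is a plain number
    assert y == x * 2  # can check/debug mid-computation
    return y + 1

# Graph/deferred: first build a description of the computation,
# then run it later; debugging means inspecting the graph, not live values.
def make_graph():
    graph = [("mul", 2), ("add", 1)]  # a recorded program, not results
    def run(x):
        for op, c in graph:
            x = x * c if op == "mul" else x + c
        return x
    return run

print(eager_demo(10))       # 21
runner = make_graph()
print(runner(10))           # 21, but computed from the recorded graph
```

both produce the same answer; the difference is when the work happens and how debuggable the intermediate state is, which is the trade-off being discussed.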
um tensorflow at the time was not set up
for that
let's say the timing is
also important in this world yeah yeah
tensorflow is a good thing and it
has many many strengths but uh
you could say Swift for tensorflow is a good
idea except for the Swift part and except for
the tensorflow part
Swift because it's not python and
tensorflow because it wasn't
set up for eager mode at the time yeah
it was 1.0 exactly yeah and so one of
the so one of the things about that is
in the context of it being a research
project I'm very happy with the fact
that we built a lot of really cool
technology we learned a lot of things I
think the ideas went on to have
influence on other systems like pytorch
a few people use that right here right
and so I think that's super cool and for
me personally I learned so much from it
right and I think a lot of the engineers
that worked on it also learned a
tremendous amount and so you know I
think that
um that's just really exciting to see
and and you know I'm sorry that the
project didn't work out I wish it did of
course right but um
uh but you know it's it's a research
project and so you're there to learn
from it but it's interesting to think
about
uh the evolution of programming
as we come up with these whole new set
of algorithms in machine learning in
artificial intelligence and what's going
to win out because it could be a new
programming language yeah it could be
um I mean we I just mentioned Julia I
think there's a lot of ideas behind
Julia that
Mojo shares
um what what are your thoughts about
Julia in general
um
um so I would I will have to say that
when we launched Mojo the
one of the biggest things I didn't
predict was the response from the Julia
community and so
um I was not I mean I've okay let me
take a step back I've known the Julia
folks for a really long time they were
they're an adopter of llvm a long time
ago they've been pushing state of the
art in a bunch of different ways Julia
is a really cool system
um I had always thought of Julia as
being mostly a scientific Computing
focused environment right and and I
thought that was its focus
um I neglected
to understand that one of their missions
is to like help make python work end to
end and so I think that was my my error
for not understanding that and so I
could have been maybe more sensitive to
that but um but there's major
differences between what Mojo's doing
what Julia is doing so as you say Julia
is not python
right and so one of the things that a
lot of the Julia people came out and
said is like okay well if we put a ton
of more energy and ton more money or
engineering or whatever into Julia maybe
uh that would be better than starting
Mojo right
well I mean maybe that's true but it
still wouldn't make Julia into python so
if you've worked backwards from the goal
of let's build something for python
programmers without requiring them to
relearn syntax
then Julia just isn't
there right I mean that's a different
thing right and so if you anchor on I
love Julia and I want Julia to go
further then you can you can look at it
from a different lens But the lens we
were coming at it was hey everybody is
using python python's syntax isn't
broken let's take what's great about
Python and make it even better and so
it's just a different starting point so
I think Julia is a great language the
community is a lovely Community they're
doing really cool stuff but it's just a
different a slightly different angle
but it does seem that python is quite
sticky uh is there some
uh philosophical almost thing you could
say about why python by many measures
seems to be the most popular programming
language in the world well I can tell
you things I love about it maybe that's
one way to answer the question right so
huge package ecosystem
super lightweight and easy to integrate
it has very low startup time
right so what startup time you mean
money curve or what yeah so if you if
you look at certain other languages that
you know you say like go and it just
takes a like Java for example it takes a
long time to compile all the things and
and then the the VM starts up and the
garbage collector kicks in and then it
revs its engines and then it can plow
through a lot of Internet stuff or
whatever right
um python is like scripting like it's it
just goes right python has very low
compile time like so you're not sitting
there waiting python integrates into
notebooks in a very elegant way that
makes exploration super interactive and
it's awesome right python is also um
it's like almost the glue of computing
because it has such a simple object
representation a lot of things plug into
it that Dynamic meta programming thing
we were talking about also enables
really expressive and beautiful apis
right so there's lots of reasons that
you can look at
technical things that python has done
and say like okay well this is actually
a pretty amazing thing and any one of
those you can neglect people often just
talk about indentation
and ignore like the fundamental things
but then you also look at the community
side right so python owns machine
learning
machine learning is pretty big yeah and
it's growing and it's growing right and
it's growing in importance right and so
and there's a reputation of prestige to
machine learning to where like if you're
a new programmer you're thinking about
like which programming language do I use
well I should probably care about
machine learning therefore let me try
Python and what kind of builds and
builds a bit and even go go back before
that like my kids Learn Python
probably not because I'm telling them to
Learn Python but because were they
rebelling against you or what no no well
they also learn scratch right and things
like this too but it's because python is
taught everywhere right because it's
easy to learn right and because it's
pervasive right and there's like my day
we learned Java and C plus plus yeah but
uphill both directions but yes I guess
python is the main language of teaching
software engineering schools now yeah
well if you look at if you look at this
there's these growth Cycles right if you
look at what causes things to become
popular and then gain in popularity
there's reinforcing feedback loops and
things like this and I think python has
done again the whole Community has done
a really good job of building those
growth loops and help Propel the
ecosystem and I think that again you
look at what you can get done with just
a few lines of code it's amazing so this
kind of self
building Loop
it's interesting to understand because
when you look at Mojo what it stands for
some of the features
it seems sort of clear that this is a
good direction for programming languages
to evolve in the machine Learning
Community but it's still not obvious
that it will because of this
whatever the engine of popularity of
virality
um is there something you could speak to
like how how do you get people to switch
Yeah well I mean I think that the the
the the viral growth Loop is to switch
people to Unicode yeah I think the
Unicode file extensions are what I'm
betting on I think that's going to be
the thing yeah tell the kids that you
could use the fire emojis exactly what
exactly
uh well in all seriousness like I mean I
think there's really I'll give you two
opposite answers one is
I hope if it's useful if it solves
problems and if people care about those
problems being solved
they'll adopt the tech
right that's that's kind of the simple
answer and when you're looking to get
Tech adopted the question is is it
solving an important problem people need
solved and is the adoption cost low
enough that they're willing to make the
switch and cut over and do do the pain
up front so they can actually do it
right
and so hopefully Mojo will be that for a
bunch of people and you know people
building these hybrid packages are
suffering it's really painful and so I
think that we have a good shot of
helping people but the other side is
like it's okay if people don't use Mojo
like it's not my job to say like
everybody should do this like I'm not
saying python is bad like I hope python
CPython like all these
implementations because the python ecosystem
is not just CPython it's also a bunch
of different implementations with
different trade-offs and this ecosystem
is really powerful and exciting
um as are other programming languages
it's not like typescript or something is
going to go away right and so it's not a
there's not a winner take all thing and
so I hope that Mojo is exciting and
useful to people but if it's not that's
also fine but I also wonder what uh
the use case
for why you should try Mojo would be so
practically speaking Yeah it seems like
uh so there's entertainment there's a
dopamine hit of saying holy this is
10 times faster
uh this little piece of code is 10 times
faster in Mojo out of the box before you
get to 35,000x exactly I mean just even
that I mean that's the dopamine hit that
uh every programmer sort of dreams of is
uh the optimization it's it's also the
drug that can uh pull you in and have
you waste way too much of your life
without optimizing and over optimizing
right
um but so what uh what do you see it
would be like I mean this is very hard
to predict of course but
um you know if you look 10 years from
now and Mojo is uh super successful what
do you think would be the thing
where people like try it and then use it
regularly and it kind of grows and grows
and grows well let's say you talk about
dopamine hit and so what again humans
are not one thing and
some people love rewriting their code
and learning new things and throwing
themselves in the deep end and trying
out a new thing in my experience most
people don't
like they're too busy they have other
things going on
um by number most people don't want like
this I want to rewrite all my code
but
even those people the two busy people
the people that uh don't actually care
about the language that just care about
getting stuff done those people do like
learning new things
right and so you talk about the dopamine
Rush of 10x faster wow that's cool I
want to do that again well it's also
like here's here's the thing I've heard
about in a different domain and I don't
have to rewrite all my code I can learn a
new trick right well that's called
growth you know and so and so one thing
that I think is cool about Mojo and
again those will take a little bit of
time for for example the blog posts and
the books and like all that kind of
stuff develop and the languages get
further along but what we're doing you
talk about types like you can say look
you can start with the world you already
know and you can progressively learn new
things and adopt them where it makes
sense
if you never do that
that's cool you're not a bad person
if you if you get really excited about
and want to go all the way in the deep
end and want to rewrite everything and
like whatever that's cool right but I
think the middle path is actually the
more likely one where it's um you know
you you come out with a new a new idea
and you discover wow that makes my code
way simpler way more beautiful way
faster way whatever and I think that's
what people like now if you fast forward
and you said like 10 years out right uh
I can give you a very different answer
on that which is I mean
if you go back and look at what
computers look like 20 years ago
every 18 months they got faster for free
right 2x faster every 18 months it was
like clockwork it was it was free right
you go back 10 years ago and we entered
in this world where suddenly we had
multi-core CPUs and we had gpus
and if you squint and turn your head
what a GPU is it's just a many core or
very simple CPU thing kind of right and
so
um and 10 years ago it was CPUs and gpus
and graphics
today we have CPUs gpus graphics
and AI because it's so important because
the compute is so demanding because of
the smart cameras and the watches and
all the different places the AI needs to
to work on our lives it's caused this
explosion of hardware
and so part of my thesis part of my
belief of where Computing goes if you
look out 10 years from now it's not
going to get simpler
physics isn't going back to where we
came from it's only going to get weirder
from here on out right and so to me the
exciting part about what we're building
is it's about building that Universal
platform which the world can continue to
get weird because again I don't think
it's avoidable it's physics but we can
help lift people's scale do things with
it and they don't have to rewrite their
code every time a new device comes out
and I think that's pretty cool and so if
Mojo can help with that problem then I
think that it will be hopefully quite
interesting and quite useful to a wide
range of people because there's so much
potential and like there's someone you
know maybe analog computers will become
a thing or something right and we need
to be able to get into a mode where we
can move this programming model forward
but do so in a way where we're lifting
people and and growing them instead of
forcing them to rewrite all their code and
exploding them do you think there will
be a few major libraries that go Mojo
first
uh well so I mean the modular engine is
all Mojo so I come back to like
we're not building Mojo because it's fun
we're building Mojo because we had to to
solve these accelerators that's the
origin story but I mean ones that are
currently in Python yeah so I think that
a number of these projects will and so
one one of the things again this is just
my best guess like each of the package
maintainers also has I'm sure plenty of
other things going on people don't like
really don't like rewriting code just
for the sake of rewriting code
um but sometimes like people are excited
about like adopting a new idea yeah it
turns out that while rewriting code is
generally not People's First
thing turns out that redesigning
something while you rewrite it and using
a rewrite as an excuse to redesign can
lead to the 2.0 of your thing that's way
better than the 1.0 right and so I have
no idea I can't predict that but there's
a lot of these places where again if you
have a package that is half C and half
python right it it just solve the pain
make it easier to move things faster
make it easier to debug and evolve your
Tech adopting Mojo kind of makes sense
to start with and then it gives you this
opportunity to rethink these things so
the two big gains are that there's
performance gain
and then
um there's the
portability to all kinds of different
devices and there's safety right so you
talk about real types
I mean not saying this is for everybody
but that's actually a pretty big thing
right yeah types are and and so there's
a bunch of different aspects of what you
know what value Mojo provides and so I
mean it's funny for me like I've been
working on these kinds of Technologies
and tools for too many years now
um but you look at Swift right and we
talked about Swift for tensorflow but
Swift as a programming language right
so Swift is now
13 years old from when I started it yeah
so because I started in 2010 if I
remember and so
that that project and I was involved
with it for 12 years or something right
that that project has gone through its
own really interesting story arc right
and it's a mature successful used by
millions of people system right uh
certainly not dead yet right but but
also going through that story arc I
learned a tremendous amount about
building languages about building
compilers about working with community
and things like this and so that
experience like I'm helping Channel and
bring directly into Mojo and you know
other systems same thing like apparently
I like building building and iterating
and evolving things and so you look at
this llvm thing I worked on 20 years ago
you look at mlir right and so a lot of
the Lessons Learned in llvm got fed into
mlir and I think that mlir is a way
better system than llvm was and you know
Swift is a really good system and it's
it's amazing but I hope that Mojo will
take the next
step forward in terms of design
in terms of running Mojo people can play
with it what's uh Mojo playground yeah
and uh
from the interface perspective and from
the hardware perspective what's this
incredible thing running on yeah so
right now so here we are two weeks after
launch yes we decided that okay we're we
have this incredible set of technology
that
we think might be good but we have not
given it to lots of people yet and so
we're very conservative and said let's
put it in a notebook so that if it
crashes we can do something about it we
can monitor and track that right so
um again things are still super early
but we're having like one person a
minute
sign up with over 70 000 people two
weeks in it's kind of crazy so you you
can sign up to playground and you can
use it in in the cloud yeah in your
browser and so what that's running on
Notebook yeah what that's running on is
that's running on
um Cloud VMS and so you share a machine
with a bunch of other people but turns
out there's a bunch of them now because
there's a lot of people and so what
you're doing is you're getting free
compute and you're getting a play with
this thing and kind of a limited
controlled way so that we can make sure
that it doesn't
totally crash and
be embarrassing right yeah so um now a
lot of the feedback we've gotten is
people want to download it and run it
locally so we're working on that right
now and so that's that's the goal to be
able to download locally yeah that's
what everybody expects and so we're
working on that right now and so we just
want to make sure that we do it right
and I think this is this is one of the
lessons I learned from Swift also by the
way
is that when we launched Swift uh gosh it
feels like forever ago in 2014 and uh
we I mean it was super exciting I and we
the team had worked on Swift for a
number of years in secrecy okay and we
uh four years into this development
roughly of working on this thing
at that point about 250 people at Apple
knew about it yeah okay so secret
Apple's good at secrecy and it was a
secret project and so we launched this
at WWDC with a bunch of hoopla and excitement
and said developers are going to be able
to develop and submit apps to the App Store
in three months okay well several
interesting things happened right so
first of all we learned that a it had a
lot of bugs and it was not actually
production quality and it was extremely
stressful in terms of like trying to get
it working for a bunch of people and so
what happened was we went from zero to
you know I don't know how many
developers Apple had at the time but a
lot of developers overnight and they ran
into a lot of bugs and it was really
embarrassing and it was very stressful
for everybody involved right it was also
very exciting because everybody was
excited about that the other thing I
learned is that when that happened
roughly every software engineer who did
not know about the project at Apple
their head exploded when it was launched
because they didn't know it was coming
and so they're like wait what is this I
I signed up to work for Apple because I
love Objective C why is there a new
thing right and so uh
now what that meant practically is that
the push from launch to first of all the
fall but then to 2.0 and 3.0 and like
all the way forward was
super painful for the engineering team
and myself it was very stressful the
developer Community was very grumpy
about it because they're like okay well
wait a second you're changing and
breaking my code and like we have to fix
the bugs and it was just like a lot of
tension and friction on all sides
um uh there's a lot of technical debt in
the compiler because we have to run
really fast you have to go implement the
thing and unblock the use case and do
the thing and and you know it's not
right but you never have time to go back
and do it right and I'm very proud of
the Swift team because they've come
I mean we but they came so far and made
so much progress over over this time
since launch it's pretty incredible and
Swift is a very very good thing but I
just don't want to do that again right
and so we iterate more through the
development process and so what we're
doing is we're not launching it when
it's hopefully 0.9 with no testers
we're launching it and saying it's 0.1
right and so we're setting expectations
of saying like Okay well don't use this
for production
right if you're interested in what we're
doing we'll do it in an open way and we
can do it together but don't use it in
production yet like we'll get there but
let's let's do it the right way and I'm
also saying we're not in a race
the thing that I want to do is build the
world's best thing yeah right because if
you do it right and it lifts the
industry it doesn't matter if it takes
an extra two months yeah like two months
is worth waiting and so doing it right
and not being overwhelmed with technical
debt and things like this is like again
War wounds
um Lessons Learned uh whatever you want
to say I think is absolutely the right
thing to do even though right now people
are very frustrated that you know you
can't download it or it doesn't have
feature X or something like this and so
what have you learned in the in a little
bit of time since it's been
released into the wild or that people
have been complaining about feature X or
Y or Z what have they been complaining
about what they have been
uh excited about like yeah almost like
detailed things versus a big I think
everyone would be very excited about the
big Vision yeah yeah well so I mean I've
been very pleased and in fact I mean
we've been massively overwhelmed with
response which is um a good problem to
have um it's kind of like a success
disaster yeah in a sense right
um and um so I mean if you go back in
time when we started modular which is
just
um not yet a year and a half ago so it's
still a pretty new company new team
small but very good team of people like
we started with extreme conviction that
there's a set of problems we need to
solve and if we solve it then people
will be interested in what we're doing
right but but again you're building in
basically secret right you're trying to
figure it out it's the creation's a
messy process you're having to go
through different paths and understand
what you want to do and how to explain
it often when you're doing disruptive
and new kinds of things
just knowing how to explain it is super
difficult right
um and so when we launched we hope
people would be excited but you know I'm
I'm an optimist but I'm also like don't
want to get ahead of myself and so when
people found out about Mojo I think
their heads exploded a little bit right
and you know here here's a I think a
pretty credible team that has built some
languages and some tools before and so
they have some lessons learned and are
tackling some of the deep problems in
the python ecosystem and giving it the
love and attention that it should be
getting and I think people got very
excited about that and so if you look at
that I mean I think people are excited
about ownership and taking a Step Beyond
rust right there's people that are very
excited about that there's people that
are excited about uh you know just like
I made Game of Life go 400 times faster
right and things like that and that's
really cool there are people that are
really excited about the okay I really
hate writing stuff in C plus plus save
me like systems and you're they're like
stepping up like yeah yes so
that's me by the way also
um I really want to stop writing C plus
plus but the um I get third person
excitement when people tweet here I made
this code Game of Life or whatever it's
faster and you're like yeah yeah and and
also like um well I would also say that
um Let me let me cast blame out to
people who deserve it sure these
terrible people who convinced me to do
some of this yes Jeremy Howard yes that
guy
well he's been pushing for this kind of
thing he's wanted this for years yeah
he's wanted this for a long time he's
wanted this for years and so for people who
don't know Jeremy Howard he's like one of
the most legit people in the machine
Learning Community he's uh has a
Grassroots he really teaches he's an
incredible educator he's an incredible
teacher but also legit uh in terms of a
machine learning engineer himself yeah I
think he's been running fast.ai
and looking I think for uh exactly
what you've done exactly so and so um I
mean the first time so I met Jeremy
pretty early on but the first time I sat
up and I'm like
this guy is ridiculous is when I was at
Google and we're bringing up tpus and we
had a whole team of people and we're
there was this competition called dawn
bench of who can train uh imagenet yeah
fastest right yes and Jeremy and one of
his researchers
crushed Google yeah not through sheer
force of the amazing amount of compute
and the number of tpus and stuff like
that that he just decided that
progressive image resizing was the right
way to train the model and make each
epoch faster and make the whole thing go
go vroom right yep and I'm like this guy
is incredible right so you can say
anyways come back to you know where's
Mojo coming from Chris finally listened
to Jeremy
it's all his fault well there's a kind
of very uh
refreshing uh pragmatic view that he has
about machine learning that
um I don't know if it's like this mix of
a desire for efficiency But ultimately
grounded in a desire to make uh machine
learning more accessible to a lot of
people I don't know what that is I guess
that's coupled with efficiency and
performance but it's not just obsessed
about performance well so so a lot of AI
and AI research ends up being that it
has to go fast enough to get scale so a
lot of people don't actually care about
performance particularly on the research
side until it allows them to have a
bigger data set
right and so suddenly now you care about
distributed compute and like all these
exotic HPC like you don't actually want
to know about that you just want to be
able to do more experiments faster and
do so with bigger data sets right and so
Jeremy has been really pushing limits
and one of the things I'll say about
Jeremy and there's many things I could
say about Jeremy because I'm a fanboy of
his but uh he uh it fits in his head
and Jeremy actually takes the time where
many people don't to really dive deep
into why is the beta parameter of the
adam optimizer equal to this yeah right
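for reference, the beta parameters he's referring to are the exponential decay rates in the adam update rule; a minimal single-scalar sketch of the textbook form, not any particular library's implementation:

```python
import math

def adam_step(p, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter (textbook form)."""
    m = beta1 * m + (1 - beta1) * grad         # beta1: decay rate of the gradient mean
    v = beta2 * v + (1 - beta2) * grad * grad  # beta2: decay rate of the squared-gradient mean
    m_hat = m / (1 - beta1 ** t)               # bias correction for zero-initialized moments
    v_hat = v / (1 - beta2 ** t)
    return p - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

# first step from zero-initialized moments
p, m, v = adam_step(p=1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(p)  # slightly below 1.0: the first step moves by roughly lr
```

the 0.9 and 0.999 defaults are the conventional values from the original adam paper, which is exactly the kind of choice being asked about.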
and he'll go survey and understand what
are all the activation functions in the
trade-offs and why is it that everybody
that does uh you know this model pick
that thing so the why not just trying
different values like really what is
going on here right and so as a
consequence of that like he's always he
again he makes time but he he spends
time to understand things at a depth
that a lot of people don't and as you
say he then brings it and teaches people
and he's his mission is to help lift you
know his website says making AI uncool
again like it's about like forget about
the hype it's actually practical
and useful let's teach people how to do
this right now the problem Jeremy
struggled with is he's pushing the
envelope
right research isn't about doing the
thing that is staying on the happy path
or the the well-paved road right and so
a lot of the systems today have been
these really fragile fragmented
things or special cased on this happy
path and if you fall off the happy path
you get eaten by an alligator
so
what about uh so python has this giant
ecosystem of packages uh and there's a
package repository do you have ideas of
how to do that well for Mojo
yeah how to do a repository of packages
well so that's another really
interesting problem that I knew about
but I didn't understand how big of a
problem it was uh python Packaging
a lot of people have very big pain
points and a lot of scars with python
packaging Oh you mean uh so there's
several things building and distributing
yes managing dependencies and versioning
and all this stuff so from the
perspective of if you want to create
your own package yes yeah and then or
you want to build on top of a bunch of
other people's packages and then they
get updated and it's like this now I'm
not an expert in this so I don't know
the answer I think this is one of the
reasons why it's great that we work as a
team and there's other really good and
smart people involved
But one of the things I've heard from smart people who've done a lot of this is that packaging becomes a huge disaster when you get Python and C together. If you have this problem where you have code split between Python and C, now not only do you have to package the C code, you have to build the C code, and C doesn't have a package manager. C doesn't have a dependency versioning management system. Now, I'm not experienced in the state of the art of all the different Python package managers, but my understanding is that's a massive part of the problem, and I think Mojo solves that part of the problem directly, head-on. Now, one of the things I think we'll do with the community, and again, we're not solving all the world's problems at once, we have to be kind of focused to start with, is that I think we will have an opportunity to reevaluate packaging. I think we can come back and say, okay, given the new tools and technologies and the cool things we have built up, because we have not just syntax, we have an entirely new compiler stack that works in a new way, maybe there are other innovations we can bring together, and maybe we can help solve that problem.
Almost a tangent to that question, from the user perspective of packages: it was always surprising to me that it was not easier to explore and find packages with pip install. It's an incredible ecosystem; it's just interesting that it wasn't made, and I think still hasn't been made, easier to discover packages. Search and discovery, as YouTube calls it. Well, I mean, it's kind of funny,
because this is one of the challenges of these intentionally decentralized communities, and I don't know what the right answer is for Python. I don't even know the right answer for Mojo; there are many people that would have much more informed opinions than I do. But it's interesting if you look at open source communities: there's git, which is fully decentralized, anybody can do it any way they want, but then there's GitHub, a centralized, commercial thing in that case, that really helped pull together and solve some of the discovery problems and helped build a more consistent community. And so maybe there are opportunities for something like a GitHub. Yeah, although even GitHub, I
might be wrong on this, but the search and discovery for GitHub is not that great; I still use Google search. Well, maybe that's because GitHub doesn't want to replace Google search, and I think there is room for specialized solutions to specific problems. But sure, I don't know the right answer for GitHub either; I think they can go figure that out. But the point is to have an interface that's usable, that's accessible to people of all different skill levels. Well, and again, what's the benefit of standards? Standards allow you to build the next level up: the next ecosystem, the next level of infrastructure, the next level of things. And so,
again, come back to: I hate complexity. C plus Python is complicated; it makes everything more difficult to deal with, it makes it difficult to port and move code around, and all these things get more complicated. And so, I mean, I'm not an expert, but maybe Mojo can help a little bit by reducing the amount of C in this ecosystem, and therefore making it scale better. So any kind of package that is hybrid in nature would be a natural fit to move to Mojo, which is a lot of them, by the way. Yeah, a lot of them, especially the ones doing some interesting computational stuff.
Let me ask you about some features. Yeah. So we talked about, obviously, the indentation; it's a typed language, or optionally typed, is that the right way to say it? It's either optionally or progressively or aggressively, I think; people have very strong opinions on the right word to use. Yeah, I don't know, I look forward to your letters. So there's var versus let: let is for constants, and var makes it mutable, so you can reassign.
Okay. Then there's function overloading. Oh, okay, yeah. I mean, there are a lot of sources of happiness for me here, but function overloading, that's... I guess, is that for performance? Why does Python not have function overloading?
So I can speculate. Python is a dynamic language, and the way it works is... Python and Objective-C are actually very similar worlds if you ignore syntax. Objective-C is straight-line derived from Smalltalk, a really venerable, interesting language that much of the world has forgotten about, but the people who remember it generally love it. And the way Smalltalk works is that every object has a dictionary in it, and the dictionary maps from the name of a function, or the name of a value within an object, to its implementation. So the way you call a method in Objective-C is: the way I call foo is I go look up foo, I get a pointer to the function back, and then I call it. Okay, that's how Python works.
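As a rough illustration of what he's describing, here is how a Python method call can be spelled out as a dictionary lookup. This is a simplified sketch with invented names; CPython's real attribute lookup has more machinery (descriptors, inheritance), but the core idea is the same:

```python
# Simplified sketch: a Python method call is, at its core, a lookup
# of a name in a dictionary, followed by a call to what came back.
class Greeter:
    def foo(self):
        return "hello"

g = Greeter()

direct = g.foo()  # the normal way

# Roughly what happens under the hood: methods live in the class's
# dictionary, keyed by their name as a string.
method = type(g).__dict__["foo"]   # look up "foo", get a function back
via_lookup = method(g)             # then call it

assert direct == via_lookup == "hello"
```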
Right. And so now the problem with that is that in the dictionary within a Python object, all the keys are strings, and it's a dictionary, so you can only have one entry per name. You think it's as simple as that? I think it's as simple as that. So now, why did they never fix this? Why did they not change it to not be a dictionary, to do other things?
Well, you don't really have to in Python, because it's dynamic. You can say: I get into the function, and now if I got passed an integer, do some dynamic tests for it; if it's a string, go do another thing. There's an additional challenge, which is that even if you did support overloading, you're saying, okay, here's a version of a function for integers and a function for strings. Even if you could put both in that dictionary, you'd have to have the caller do the dispatch, and so every time you call the function you'd have to ask: is it an integer, is it a string? And you'd have to figure out where to do that test. So in a dynamic language, overloading is something you generally don't have to have.
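A small Python sketch of that alternative: one function, one dictionary entry, with the type tests done dynamically inside the body (the function name and behaviors are made up for illustration):

```python
# Sketch: the dynamic-language substitute for overloading is a single
# function that does runtime type tests on its argument.
def describe(value):
    if isinstance(value, int):
        return f"int: {value * 2}"
    elif isinstance(value, str):
        return f"str: {value.upper()}"
    raise TypeError(f"unsupported type: {type(value).__name__}")

# One dictionary entry named "describe", but behavior varies by type.
assert describe(3) == "int: 6"
assert describe("hi") == "str: HI"
```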
But now you get into a typed language, and, you know, in Python if you subscript with an integer, you typically get one element out of a collection; if you subscript with a range, you get a different thing out. And so often in typed languages you'll want to be able to express the fact that, cool, I have different behavior depending on what I actually pass into this thing. If you can model that, it can make things safer, more predictable, faster, all of these. It somehow feels safer, yes, but it also feels empowering, in terms of clarity: you don't have to design whole different functions. Yeah, well, this is also one of the challenges with the existing Python typing systems: in practice, take subscript, a lot of these functions don't have one signature; they actually have different behavior in different cases. And so this is why it's difficult to retrofit typing onto existing Python code and make it play well: you kind of have to design for that. Okay, so there's an interesting distinction that people who program Python might be interested in: def versus fn, two different ways to define a function.
Yeah, and fn is a stricter version of def. What's the coolness that comes from the strictness? So here you get into: what is the trade-off with a superset? With a superset, you have to, or you really want to, be compatible. If you're doing a superset, you've decided compatibility with existing code is the important thing, even if some of the decisions they made were maybe not what you would choose. Okay, so that means you put a lot of time into compatibility, and it means you get locked into decisions of the past, even if they may not have been a good thing. Now, systems programmers
typically like to control things, and they want to make sure... not in all cases, of course, and even systems programmers are not one thing, but often you want predictability. And one of the things Python has, for example, as you know, is that to define a variable you just say x equals four, and I have a variable named x. Now I say some_long_name equals 17, print out some_long_name... oops, but I typoed it. Well, the Python compiler doesn't know, in all cases, what you're defining and what you're using, and did you typo the use of it or the definition? And so for people coming from typed languages, and I'm not saying they're right or wrong, that drives them crazy, because they want the compiler to tell them: you typoed the name of this thing.
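The hazard being described can be shown in a few lines of plain Python (the variable names are made up): assigning to a misspelled name silently creates a new variable, and reading a misspelled name only fails when that line actually runs.

```python
# Sketch of the typo hazard: Python treats an assignment to a
# misspelled name as a brand-new variable, with no warning.
some_long_name = 17
some_long_nme = 18   # typo: quietly creates a second variable

caught = False
try:
    print(some_long_naem)   # typo on a *read*: fails, but only at runtime
except NameError:
    caught = True

assert some_long_name == 17   # the original was never updated
assert caught                 # the read typo surfaced only when executed
```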
And so what fn does is, as you say, turn on a strict mode. It says, okay, you have to intentionally declare your variables before you use them. That gives you more predictability, more error checking, and things like this, but you don't have to use it. And this is a way that Mojo is both compatible, because defs work the same way defs have always worked, and provides a new alternative that gives you more control, and allows certain kinds of people who have a different philosophy to be able to express that and get it. But usually, if you're writing Mojo code from scratch, you'll be using fn?
It depends, again, on your mentality. It's not that def is Python and fn is Mojo; Mojo has both, and it loves both. It really depends on... It's just strict? Yeah, exactly. Are you playing around and scripting something out? Is it a one-off throwaway script? Cool, Python is great at that. I'll still be using nothing but... yeah. Well, so, I love strictness. Okay, well, so: control, power. You also like suffering, right? Yes, they go hand in hand. How many pull-ups? I have lost count at this point. So, I mean, that's cool, I love you for that, and I love other people who like strict things, but I don't want to say that's the right thing, because Python's also very beautiful for hacking around and doing stuff, and research, and these other cases where you may not want that. You see, I
just feel like, uh, maybe I'm wrong about that, but it feels like strictness leads to faster debugging, in terms of going, even on a small project, from zero to completion. It's just... I guess it depends how many bugs you generate, usually. Well, so, I mean, again, lessons learned from looking at the ecosystem: if you study some of these languages over time, like the Ruby community, for example. Now, Ruby is a pretty well-developed, pretty established community, but along their path they really invested in unit testing. I think the Ruby community really pushed forward the state of the art of testing, because they didn't have a type system that caught a lot of bugs at compile time. And so you can have the best of both worlds: you can have good testing and good types, and things like this. But I thought it was really interesting to see how certain challenges get solved. And in Python, for example,
and in Python for example
the interactive notebook kind of
experiences and stuff like this are
really amazing if you typo something it
doesn't matter it just tells you that's
fine right and so I think that the
tryouts are very different if you're
building a
um you know large scale production
system versus you're building and
exploring a notebook and the speaking of
control the hilarious thing if you look
at code I write just for myself for fun
it's like littered with asserts
everywhere okay
Yeah. An assert is basically saying, in a dictatorial way: this should be true now, otherwise everything stops. And that, I love you, man, but that is a sign of somebody who likes control. Yeah. And so, yes, I think you'll like Mojo. A Mojo therapy session. Yes, I definitely will. Speaking of asserts: exceptions are called errors. Why are they called errors? So, I mean, we use the same term, we're the same as Python, but we implemented it a very different way. If you look at other languages, we'll pick on C++, our favorite: C++ has this thing called zero-cost exception handling.
Okay. And this is, in my opinion, something to learn lessons from. That's a nice, polite way of saying it. And so, zero-cost exception handling: the way it works is that it's called zero cost because, if you don't throw an exception, there's supposed to be no overhead on the non-error code, and so it takes the error path out of the common path. It does this by making throwing an error extremely expensive: if you actually throw an error with a C++ compiler using exceptions, it goes and looks up in tables on the side and does all this stuff, and so throwing an error can be something like 10,000 times more expensive than returning from a function. Also, it's called zero-cost exceptions, but it's not zero cost by any stretch of the imagination, because it massively blows up your binary, it adds a whole bunch of different paths because of destructors and other things that exist in C++, and it reduces the number of optimizations; it has all these effects. So this thing that was called zero-cost exceptions really ain't. Okay. Now if you fast
forward to newer languages, and this includes Swift and Rust and Go, and now Mojo... well, Python's a little bit different because it's interpreted, so it's got a little bit of a different thing going on, but if you look at compiled languages, many newer languages say: okay, let's not do that zero-cost exception handling thing. Let's actually treat throwing an error the same as returning a variant: returning either the normal result or an error. Now, programmers generally don't want to deal with all the typing machinery of pushing around a variant, and so you use all the syntax that Python gives us, for example try and except, functions that raise, and things like this; you can put a raises decorator on your functions if you want to control that, and the language can provide syntax for it. But under the hood, the way the computer executes it, throwing an error is basically as fast as returning something. Interesting.
So it's exactly the same from a compiler perspective. And so this is actually, I mean, a fairly nerdy thing, which is why I love it, but it has a huge impact on the way you design your APIs. So in C++, huge communities turn off exceptions because the cost is just so high; the zero cost's cost is so high. And so that means you can't actually use exceptions in many libraries.
Right. And even for the people that do use it, okay: how and when do you want to pay the cost? If I try to open a file, should I throw an error? Well, what if I'm probing around looking for something, looking in many different paths? Well, if it's really slow to do that, maybe I'll add another function that doesn't throw an error and returns an error code instead, and now I have two different versions of the same thing. And so it causes you to fork your APIs. And so, you know, one of the things I learned from Apple, and still love, is that the art of API design is actually really profound. I think this is something Python's also done a pretty good job at, in terms of building out this large-scale package ecosystem; it's about having standards and things like this. And so we wouldn't want to end up in a mode where there's this theoretical feature that exists in the language but people don't use it in practice.
Now, I'll also say one of the other really cool things about this implementation approach is that it can run on GPUs, and it can run on accelerators and things like this, and that standard zero-cost exception thing would never work on an accelerator. So this is also part of how Mojo can scale all the way down to little embedded systems and up to running on GPUs and things like that. Can you actually say something about... maybe, is there some high-level way to describe the challenge of exceptions and how they work in code during compilation? It's just this idea of percolating up a thing, an error. Yeah. So the way to think about it is:
think about a function that doesn't return anything, just as a simple case. So you have function one calls function two calls function three calls function four, and along that call stack there are try blocks. So say function one calls function two, function two has a try block, and within it, it calls function three. Well, what happens if function three throws? Well, actually, start simpler: what happens if it returns? If it returns, it's supposed to go back out and continue executing, and then fall off the bottom of the try block and keep going, and all's good. If the function throws, you're supposed to exit the current function, get into the except clause, do whatever code's there, and then keep going. And so the way a compiler like Mojo works is that the call to that function, which happens in the try block, calls the function, and then, instead of returning nothing, it actually returns a variant between nothing and an error. So if you return normally, fall off the bottom or do a return, you return the nothing; and if you throw an error, you return the variant that says: I'm an error. So when you get to the call, you say: okay, cool, I called the function; I know locally I'm in a try block; so I call the function and then check what it returns. Aha, if it's that error thing, jump to the except block.
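That lowering can be sketched by hand in Python: pretend the compiler has rewritten a throwing function to return a tagged variant, and rewritten the caller's try block into a tag check. All names here are hypothetical; this is only a model of what's being described, not Mojo's actual implementation.

```python
# Hand-lowered sketch of "errors as variant returns": a throwing
# function returns either ("ok", value) or ("err", error), and the
# caller checks the tag instead of unwinding the stack.

def f3_lowered():
    # 'raise ValueError("boom")' becomes a tagged return
    return ("err", ValueError("boom"))

def f2_lowered():
    # the body of a try block: call, then check the tag
    tag, payload = f3_lowered()
    if tag == "err":
        # this branch plays the role of "jump to the except block"
        return ("ok", f"handled: {payload}")
    return ("ok", "no error")

tag, result = f2_lowered()
assert tag == "ok" and result == "handled: boom"
```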
And that's all done for you behind the scenes? Exactly, the compiler does all this for you. And, I mean, if you dig into how this stuff works in Python, it gets a little more complicated, because you have finally blocks, which you need to go into and do some stuff, and those can also throw and return. Wait, what? And this stuff matters for compatibility. There's nesting, there's with clauses, and with clauses are kind of like finally blocks with some special stuff going on. Nesting in general, nesting of functions, should be illegal; it just feels like it adds a level of complexity. Lex, I'm merely an implementer.
Oh, this is, again, one of the trade-offs you get when you decide to build a superset: you get to implement a full-fidelity implementation of the thing that you decided is good. And so, yeah, I mean, we can complain about the reality of the world and shake our fist. But it always feels like you shouldn't be allowed to do that, to declare functions inside functions. What happened to Lex the Lisp guy? No, I understand that, but Lisp is what I used to do in college. So now you've grown up. You know, we've all done things in college we're not proud of.
Okay, yeah, I was gonna say, you're taking on the whole internet. It worked as a joke in my head, and, yeah. Right, so nested functions, joking aside, are actually really great for certain things. These are also called closures, and closures are pretty cool: you can pass callbacks, and there are a lot of good patterns. So, speaking of which, I don't think you have nested functions implemented yet in Mojo? We don't have lambda syntax, but we do have nested functions. Yeah. So there are a few things on the roadmap that it would be cool to just fly through, because it's interesting to see how many features there are in a language, small and big, that you have to implement. Yeah. So first of all, there's tuple support, and that has to do with some very specific aspect of it, like the parentheses or no parentheses? Yeah, this is just totally a syntactic thing. A syntactic thing, okay. But it's cool, it's still...
So, keyword arguments in functions. Yeah, so this is where in Python you can call a function with x equals four, and x is the name of the argument. That's a nice self-documenting feature. Yeah, and again, this isn't rocket science to implement; it's just on the laundry list.
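For readers coming from elsewhere, this is the existing Python feature being referred to (a trivial example with made-up names):

```python
# Keyword arguments: call sites name the parameters, which documents
# intent at a glance and lets arguments arrive in any order.
def resize(width, height, keep_aspect=False):
    return (width, height, keep_aspect)

assert resize(width=640, height=480, keep_aspect=True) == (640, 480, True)
assert resize(height=480, width=640) == (640, 480, False)  # order-free
```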
The bigger features are things like traits. Traits are for when you want to define abstractions: when you get into typed languages, you need the ability to write generics, and so you want to say, I want to write this function, and I want it to work on all things that are arithmetic-like. Well, what does arithmetic-like mean? Arithmetic-like is a categorization of a bunch of types, and you can define it many different ways, and I'm not going to go into ring theory or something, but you can say a type is arithmetic-like if you can add, subtract, multiply, and divide it, for example. And so what you're saying is: there's a set of traits that apply to a broad variety of types, and all these types are arithmetic-like, all these tensors and floating point and integer types, there's this category of types. And then I can define, on an orthogonal axis, algorithms that work against any type that has those properties. And so this is, again, a widely known thing; it's been implemented in Swift and Rust and many languages, and also Haskell, which is where everybody learns their tricks from. But we need to implement that, and that will enable a new level of expressivity.
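Since Mojo's traits were not implemented at the time of this conversation, the closest runnable analogue is Python's structural typing with typing.Protocol. Here's a sketch of an "arithmetic-like" category and a generic algorithm written against it; the names ArithmeticLike and double_and_add are invented for illustration:

```python
# Trait-like sketch using typing.Protocol: define a category of types
# by the operations they support, then write one algorithm against it.
from typing import Protocol, TypeVar

class ArithmeticLike(Protocol):
    def __add__(self, other): ...
    def __mul__(self, other): ...

T = TypeVar("T", bound=ArithmeticLike)

def double_and_add(x: T, y: T) -> T:
    # works for any type with + and *: ints, floats, numpy arrays, ...
    return x * 2 + y

assert double_and_add(3, 4) == 10
assert double_and_add(1.5, 0.5) == 3.5
```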
So, classes. Yeah, classes are a big deal, a big deal still to be implemented. Like you said, lambda syntax, and there's detailed stuff like whole-module import, support for top-level code at file scope, and then global variables too, so being able to have variables outside of a function. Well, and so this comes back to where Mojo came from, and the fact that it's 0.1. And so we're
building... Modular is building an AI stack, and an AI stack has a bunch of problems: working with hardware, writing high-performance kernels, doing the kernel fusion thing I was talking about, and getting the most out of the hardware. And so we've really prioritized and built Mojo to solve Modular's problems. Right now our north star is to build out and support all the things, and we're making incredible progress. Mojo's only like seven months old, by the way.
That's another interesting thing. I mean, part of the reason I wanted to mention some of these things is that there's a lot to do, and it's pretty cool; sometimes you take for granted how much there is in a programming language, how many cool features you rely on, and this is a nice reminder when you lay it out as a to-do list. Yeah, and also, it's amazing how much is already there and you take it for granted: a value, if you define it, will get destroyed automatically. That little feature itself is actually really complicated, given the way the ownership system has to work, and the way that works within Mojo is a huge step forward from what Rust and Swift have done. Can you say that again? When a
value, when you define it, gets destroyed? Yeah. So say you have a string; you define a string on the stack, okay, whatever that means, in your local function. So, whether it's in a def or an fn, you just say x equals "hello world". Well, if your string type requires you to allocate memory, then when it's destroyed, you have to deallocate it. So in Python and Mojo, you define that with the __del__ method. Where does that get run? Well, it gets run sometime between the last use of the value and the end of the program. And here you get into garbage collection, you get into all these long-debated trade-offs; you talk about religions, and this is a hugely, hotly contested world.
If you look at C++, the way this works is that if you define a variable, or a set of variables, within a function, they get destroyed in last-in-first-out order, so it's like nesting. Okay. This has a huge problem: if you have a big scope, and you define a whole bunch of values at the top, and then you use them, and then you run a whole bunch of code that doesn't use them, they don't get destroyed until the very end of that scope. And this also destroys tail calls, which matters for functional programming. This has a bunch of different impacts on, you know, reference counting optimizations and things like this, a bunch of very low-level things. And so Mojo has a different approach from any language I'm familiar with: it destroys values as soon as possible.
And by doing that, you get better memory use, you get better predictability, you get tail calls that work, and you get better ownership tracking. There's a bunch of these very simple, very fundamental things that are already built into Mojo today, things that nobody talks about generally, but when they don't work right, you find out, and you have to complain. Is it trivial to know the soonest possible point to delete a thing that's not going to be used again? Yeah, well, I mean, it's generally trivial: it's after the last use of it. So if you define x as a string, and then you have some use of x somewhere in your code within that scope... I mean, within the scope where it's accessible? Yeah, exactly, you can only use something within its scope. And so then it doesn't wait until the end of the scope to delete it; it destroys it after the last use. So there's kind of some very eager machine that's just sitting there deleting things. Yeah, and it's all in the compiler, so it's not at runtime, which is also cool. And so, yeah,
this is actually non-trivial, because you have control flow, and so it gets complicated pretty quickly. Also, you have to insert the deletes in a lot of places, potentially? Yeah, exactly, so the compiler has to reason about this. And this is where, again, it's experience: building languages and not getting this right before means you get another chance to do it, and you get basic things like this right. But it's extremely powerful when you do that, and so there are a bunch of things like that that combine together. And this comes back to: you get a chance to do it the right way. Do it the right way, and make sure that every brick you put down is really good, so that when you put more bricks on top of it, they stack up to something that's beautiful.
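The destruction-timing question can be seen at runtime in plain Python, where CPython's reference counting runs __del__ as soon as the last reference goes away; Mojo, per the discussion, makes the equivalent decision at compile time, right after the last use. A small sketch with an invented class:

```python
# Runtime illustration of destruction timing. In CPython, dropping the
# last reference triggers __del__ immediately, mid-scope, rather than
# waiting for the end of the function.
events = []

class Tracked:
    def __init__(self, name):
        self.name = name
    def __del__(self):
        events.append(f"destroyed {self.name}")

def demo():
    x = Tracked("x")
    events.append(f"using {x.name}")
    del x                        # last reference gone: destroyed here...
    events.append("later work")  # ...not at the end of the scope

demo()
assert events == ["using x", "destroyed x", "later work"]
```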
Well, there's also: how many design discussions have to happen about particular details, the implementation of particular small features? Because some features that seem small, I bet, require really big design decisions. Yeah, well, so let me give you another example of this: Python has a feature called async/await. It's a relatively new feature, in the long arc of history, that allows way more expressive asynchronous programming. Okay, and again, Python's a beautiful thing, and they did things that are great for Mojo for completely different reasons.
The reason async/await got added to Python, as far as I know, is because Python doesn't support threads well. Okay. So Python doesn't support threads... I mean, Python does support threads, it's just not its strength... but you want to work with networking and other things like that that can block. And so they added this feature called async/await. It's also seen in other languages, like Swift and JavaScript, and many other places as well.
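Here's the feature as it exists in Python today, showing the point of it: two waits overlap instead of blocking each other. A minimal sketch using asyncio.sleep as a stand-in for blocking network I/O:

```python
# async/await in Python: both "requests" run concurrently on one
# thread, so total time is roughly the max of the delays, not the sum.
import asyncio
import time

async def fetch(label, delay):
    await asyncio.sleep(delay)   # stand-in for non-blocking network I/O
    return label

async def main():
    start = time.monotonic()
    results = await asyncio.gather(fetch("a", 0.1), fetch("b", 0.1))
    return results, time.monotonic() - start

results, elapsed = asyncio.run(main())
assert results == ["a", "b"]   # gather preserves argument order
```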
Async/await in Mojo is amazing because we have a high-performance heterogeneous compute runtime underneath the covers that then allows non-blocking IO, so you get full use of your accelerator. That's huge. It turns out it's actually a really important part of fully utilizing a machine. You talk about design discussions: that took a lot of discussion, and it will probably require more iteration. And so my philosophy with Mojo is that we have a small team of really good people pushing it forward, and they're very good at the extremely deep stuff, knowing how the compiler and runtime and all the low-level pieces work together.
But they're not perfect, same thing as the Swift team. And this is one of the reasons we released Mojo much earlier: so we can get feedback. We've already renamed a keyword based on community feedback. Which one? We used an ampersand, and now it's named inout. We're not renaming existing Python keywords, because that breaks compatibility; we're renaming things we're adding, and making sure they are designed well. We get usage experience, we iterate, and we work with the community, because, again, if you scale something really fast and everybody writes all the code and starts using it in production, then it's impossible to change. And so you want to learn from people, and you want to iterate and work on that early on, and this is where design discussions are actually quite important. Could you
incorporate an emoji into the language, into the main language? Do you have a favorite one? Well, in terms of humor, I really like ROFL, rolling on the floor laughing. So what would the use case for that be? I can accept... throw an exception of some sort? I don't know, you should totally file a feature request. Or maybe a hard one. It has to be a hard one. People have told me that I'm insane, so this is... I'm liking this. I'm gonna use the viral nature of the internet to actually get this passed. I mean, it's funny, you come back to the flame emoji file extension: we have the option to use the flame emoji, and just even that concept... for example, the people at GitHub say, "now I've seen everything." Yeah, there's something reinvigorating about it. It's like, oh, that's possible? That's really cool. For some reason that makes everything else... Actually, I'm really excited the world is ready for this stuff. And so, you know, when we have a package manager, we'll clearly have to innovate by having the compiled package be the little box with the bow on it. I mean, it has to be done. It has to be done. Is
there some stuff on the roadmap that you're particularly stressed about, or excited about, that you're thinking about a lot? I mean, as a today snapshot, which will be obsolete tomorrow: the lifetime stuff is really exciting. Lifetimes give you safe references to memory without dangling pointers. This has been done in languages like Rust before, and we have a new approach which is really cool. I'm very excited about that; that'll be out to the community very soon. The traits feature is really a big deal, and it's blocking a lot of API design, so there's that; I think that's really exciting. A lot of it is these kinds of table-stakes features.
One of the things that is, again, a lesson learned with Swift is that programmers in general like to add syntactic sugar. And so it's like, oh, this annoying thing: in Python you have to spell __add__; why can't I just use plus? Def plus, come on, why can't I just do that? And each little bit of syntactic sugar makes sense, it's beautiful, it's obvious. We're trying not to do that, for two different reasons. One of which is, again, a lesson learned from Swift: Swift has a lot of syntactic sugar, which maybe is a good thing, maybe not, I don't know, but because it's such an easy and addictive thing to do... sugar, like, makes your blood go crazy, right... the community will really dig into that and want to do a lot of it, and I think it's very distracting from building the core abstractions. Second is, we want to be a good member of the Python community. We want to work with the broader Python community, and yes, we're pushing forward a bunch of systems programming features, and we need to build them out to understand them, but once we get a long way forward, I want to make sure that we go back to the Python community and say, okay, let's do some design reviews, let's actually talk about this stuff, let's figure out how we want all of this to work together. And syntactic sugar just makes all of that more complicated. So...
and uh yeah list comprehensions that
have to be implemented and my favorite I
mean dictionaries
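For reference, the two Python features mentioned here, with the explicit loop a comprehension desugars to (a minimal Python sketch, not Mojo code):

```python
# A list comprehension is sugar for an explicit loop-and-append.
squares = [n * n for n in range(5)]

desugared = []
for n in range(5):
    desugared.append(n * n)

assert squares == desugared == [0, 1, 4, 9, 16]

# Dictionaries have a comprehension form of their own.
lengths = {word: len(word) for word in ["mojo", "python"]}
assert lengths == {"mojo": 4, "python": 6}
```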
yeah but nonetheless it's actually still
quite interesting and useful as you
mentioned modular is very new
Mojo is very new it's a relatively small
team yeah it's building up this yeah
we're just gigantic stack it's
incredible stack that's going to perhaps
Define the future of
development of our AI overlords uh we
just hope it will be useful
as do all of us uh so what uh what have
you learned from this process of
building up a team maybe one question is
how do you hire
great programmers great people that
operate in this
compiler Hardware machine learning
software interface design space yeah and
maybe are a little bit fluid in what
they can do so okay so language design
too so building a company is just as
interesting in different ways is
building a language like different skill
sets different things but super
interesting and I've built a lot of
teams in a lot of different places
um if you zoom in from the big problem
into recruiting
well so here's our problem okay I'll
I'll just I'll be very straightforward
about this we started modular with a lot
of conviction about we understand the
problems we understand the customer pain
points we need to work backwards from
the suffering in the industry and if we
solve those problems we think it'll be
useful for people
but the problem is is that the people we
need to hire as you say are all these
super specialized people that have jobs
at big tech companies right
you know we I don't think we have
um product Market fit in the way that a
normal startup does we don't have
product Market fit challenges because
right now everybody's using Ai and so
many of them are suffering and they want
help and so again we started with strong
conviction now again you have to hire
and recruit the best and the best all
have jobs and so what we've done is we
said okay well let's build an amazing
culture
start with that that's usually not
something a company starts with usually
you hire a bunch of people and then it
people start fighting and it turns into
gigantic mess and then you try to figure
out how to improve your culture later my
co-founder Tim in particular is super
passionate about making sure that that's
right and we've spent a lot of time
early on to make sure that we can scale
can you comment on that before we get to the
second yeah what makes for a good
culture
um so I mean there's many different
cultures and I have learned many things
from
several very unique almost famously
unique cultures and some of them I
learned what to do and some of them I
learned what not to do yep okay and so
um
we want an inclusive culture uh I
believe in like amazing people working
together
and so I've seen cultures where people
you have amazing people and they're
fighting each other
I see amazing people and they're told
what to do like Thou shalt line up and
do what I say it doesn't matter if it's
the right thing do it right and neither
of these is the and I've seen people
that have no Direction they're just kind
of floating in different places and they
want to be amazing they just don't know
how and so a lot of it starts with having
a clear vision
right and so we have a clear vision of
what we're doing and um so I kind of
grew up at Apple in my engineering life
right and so a lot of the Apple DNA
rubbed off on me my co-founder Tim also
is like a strong product guy and so what
we learned is you know what I learned at Apple
is that you don't start from building cool
technology you don't start from like coming
up with a cool product and thinking about the
features you'll have and the big check
boxes and stuff like this
because if you go talk to customers they
don't actually care about your product
they don't care about your technology
what they care about is their problems
right and if your product can help solve
their problems well hey they might be
interested in that right and so if you
speak to them about their problems if
you understand and you have compassion
you understand what people are working
with then you can work backwards to
building an amazing product so the vision is
finding the problem and then you can
work backwards into solving it with technology got
it and at Apple like it's I think pretty
famously said that you know for every
you know there's a hundred no's for
every yes
I would refine that to say that there's a
hundred not-yets for every yes but
famously if you go back to the iPhone
for example right the iPhone one every I
mean many people laughed at it because
it didn't have 3G it didn't have copy
and paste
right and then a year later okay finally
it has 3G but it still doesn't have copy
and paste it's a joke nobody will ever
use this product blah blah blah blah
blah blah right well year three it had
copy and paste and people stopped
talking about it right and so and so
being laser focused and having
conviction and understanding what the
core problems are and giving the team
the space to be able to build the right
Tech is really important
um also I mean you come back to
recruiting you have to pay well right so
we have to pay industry leading salaries
and have good benefits and things like
this that's a big piece uh we're a
remote first company and so we have to
uh
uh so remote first has a very strong set
of pros and cons on the one hand you can
hire people from wherever they are and
you can attract amazing talent even if
they live in strange places or unusual
places on the other hand you have time
zones
on the other hand you have like
everybody on the internet will fight if
they don't understand each other and so
we've had to learn how to like have a
system where we actually fly people in
and we get the whole company together
periodically and then we get work groups
together and we plan and execute
together and there's like an intimacy to
the in-person brainstorming yeah I guess
you lose but maybe you don't maybe if
you get to know each other well and you
trust each other maybe you can do that
yeah well so when the pandemic first hit
I mean I'm curious about your experience
too the first thing I missed was having
whiteboards yeah right in those design
discussions where like I can high high
intensity work through things get things
done work through the problem of the day
understand where you're on figure out
and solve the problem and move forward
yeah
um but we figured out ways to work
around that now with you know all these
uh screen sharing and other things like
that that we do the thing I miss now is
sitting down at a lunch table with the
team yeah the spontaneous things like
those the the coffee the coffee bar
things and the and the bumping into each
other and getting to know people outside
of the transactional solve a problem
over Zoom okay and I think there's
there's just a lot of stuff that um I'm
not an expert at this I don't know who
is hopefully there's some people but
there's stuff that somehow is missing on
Zoom
even with the Whiteboard if you look at
that
if you have a room with one person at
the Whiteboard and there's like three
other people at a table
there's uh first of all there's a social
aspect to that where you're just
shooting the breeze a little bit almost
like yeah as people just kind of coming
in and yeah that but also while
like it's a breakout discussion that
happens for like seconds at a time maybe
an inside joke or it's like this
interesting Dynamic that happens that
Zoom you're bonding yeah you're bonding
you're bonding but through that bonding
you get the excitement there's certain
ideas are like complete and
you'll see that in the faces of others
that you won't see necessarily on zoom
in like something it feels like that
should be possible to do
without being in person well I mean
being in person is a very different
thing yeah I think it's worth it but you
can't always do it and so again we're
still learning and we're also learning
as like Humanity with this new reality
right but um but what we found is that
getting people together whether it be a
team or the whole company or whatever
it is worth the expense because people
work together and are happier
after that like it just it just like
there's a massive period of time where
you like go out and things start getting
frayed pull people together and then you
realize that we're all working together
we see things the same way we work
through the disagreement or the
misunderstanding we're talking across
each other and then you work much better
together and so things like that I think
are really quite important what about uh
people that are kind of specialized in
very different aspects of the stack
working together what are some
interesting challenges there yeah well
so I mean I mean there's lots of
interesting people as you can tell I'm
you know hard to deal with too
but you're one of the most lovable the
uh uh so one of the so there's different
philosophies in building teams uh for me
and so some people say hire 10x
programmers and that's the only thing
that whatever that means right
um what I believe in is building
well-balanced teams teams that have
people that are different in them like
if you have all generals and no troops
or all troops and no generals or you
have all people that think in one way
and not the other way what you get is
you get a very biased and skewed and
weird situation where people end up
being unhappy and so what I like to do
is I like to build teams of people where
they're not all the same you know we do
have teams and they're focused on like
runtime or compiler GPU or accelerators or
whatever the specialty is but people
bring a different take and have a
different perspective and I look for
people that complement each other and
particularly if you look at leadership
teams and things like this you don't
want everybody thinking the same way you
want people bringing different
perspectives and experiences and so I
think that's really important that's
team but what about building a a company
as ambitious as modular so what uh some
interesting questions there oh I mean so
many like so um one of the things I love
about okay so modular is the first
company I built from scratch
um
uh one of the first things that was
profound was I'm not cleaning up
somebody else's mess right and so if you
look at and that's liberating to
something it's super liberating and
um and also many of the projects I've
built in the past have not been core to
the product of the company
Swift is not Apple's product
right MLIR is not Google's revenue
machine or whatever right it's not it's
it's important but it's like working on
the accounting software for you know the
the retail giant or something right it's
it's it's like enabling infrastructure
and technology and so at modular the the
tech we're building is
here to solve people's problems like it
is directly the thing that we're giving
to people and so this is a really big
difference and what it means for me as a
leader but also for many of our
Engineers is they're working on the
thing that matters and that's actually
pretty I mean again for for compiler
people and things like that that's
that's usually not the case right and so
that's that's also pretty exciting and
and quite nice but the um one of the
ways that this manifests is it makes it
easier to make decisions
and so one of the challenges I've had in
other worlds is it's like okay well
Community matters
somehow for the goodness of the world
like or open source matters
theoretically but I don't want to pay
for a t-shirt
right or some Swag like well t-shirts
cost 10 bucks each you can have 100
t-shirts for a thousand dollars to a
megacorp a thousand dollars is
uncountable they can't count that low right
but justifying it and getting a t-shirt
by the way if you'd like a t-shirt
why would— 100 percent I'd
like a t-shirt are you joking you can
have a fire Emoji t-shirt is that I will
I will treasure this I will pass it down
to my grandchildren and so you know it's
it's very liberating to be able to
decide I think that life should have a
t-shirt
right and it becomes very simple
like Lex
it's this uh this is awesome
um so
I have to ask you about the
one of the interesting developments with
large language models
is that they're able to generate code
uh recently really well
I guess to a degree that maybe a
I don't know if you understand but I
have I struggle to understand because it
it forces me to ask questions about the
nature of programming of the nature of
thought
because the uh language models are able
to predict the kind of code I was about
to write so well yep that it makes me
wonder like how unique my brain is and
where the valuable ideas actually come
from like how much do I contribute in
terms of uh
Ingenuity Innovation to code I write or
design and that kind of stuff
when you stand on the shoulders of
giants are you really doing anything and
what LLMs are helping you do is they
help you stand on the shoulders of
giants as you program there's mistakes
they're interesting that you learn from
but I just it would love to get your
opinion first high level yeah of what
you think about this impact of large
language models when they do program
synthesis when they generate code yeah
well so
um
I don't know where it all goes yeah
um I'm an optimist and I'm a human
Optimist right I think that things I've
seen are that a lot of the llms are
really good at crushing LeetCode
projects and they can reverse the linked
list like crazy well it turns out
there's a lot of
instances of that on the internet and
it's a pretty stock thing and so if you
want to see
standard questions answered LLMs can
memorize all the answers and that can be
amazing and also they do generalize out
from that and so there's good work on
that but um but I think that if in my
experience building things building
something like you talk about Mojo where
you talk about these things where you
talk about building an applied solution
to a problem it's also about working
with people
it's about understanding the problem
what is the product that you want to
build what are the use case what are the
customers you can't just go survey all
the customers because they'll tell you
that they want a faster horse maybe they
need a car right and so a lot of it
comes into
um you know I don't feel like we have to
compete with LLMs I think they'll
help automate a ton of the mechanical
stuff out of the way and just like you
know I think we all try to scale through
delegation and things like this
delegating rote things to an LLM I
think is an extremely valuable
approach that will help us all scale and
be more productive but I think it's a
it's a fascinating companion but I'd say
I don't think that means that we're
going to be done with coding
but there's power in it as a companion
from there I could I would love to zoom
in onto Mojo a little bit do you think
uh do you think about that do you think
about llm's generating Mojo code
and helping sort of like when you design
new programming language it almost seems
like man it would be nice to sort of
um
almost as a way to learn how I'm
supposed to use this thing
for them to be trained on some of the
most good so I do lead an AI company so
maybe there will be a Mojo llm at some
point uh but if your question is like
how do we make a language to be suitable
for llms yeah I think that the
um
I think the cool thing about LLMs is you
don't have to
and so if you look at what is English or
any of these other terrible languages
that we as humans deal with on a
continuous basis they're never designed
for machines and yet they're the
intermediate representation they're The
Exchange format that we humans use to
get stuff done right and so these
programming languages they're an
intermediate representation between the
human and the computer or the human and
the compiler roughly right and so I
think the llms will have no problem
learning whatever keyword we pick maybe
the fire emoji is gonna oh maybe that's
gonna break it it doesn't tokenize no
the reverse of that it will actually
enable it because one of the issues I
could see with being a super set of
python is there would be Confusion by
the gray area
so we'll be mixing stuff
but well I'm a human Optimist I'm also
an llm optimist I think that will solve
that problem but the uh um but but you
look at that and you say okay well
reducing the rote thing right turns out
compilers are very particular and they
really want things they really want the
indentation to be right they really want
the colon to be there on your else or
else it'll complain right I mean
compilers can do better at this but um
LLMs can totally help solve that problem
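The strictness being described here can be seen with Python's built-in compile(), which rejects exactly this kind of missing colon (a small illustrative sketch):

```python
# Compilers are picky: Python insists on the colon after `else`.
good = "if x > 0:\n    y = 1\nelse:\n    y = 2\n"
bad = "if x > 0:\n    y = 1\nelse\n    y = 2\n"   # colon missing on `else`

compile(good, "<snippet>", "exec")  # parses fine (nothing is executed)

try:
    compile(bad, "<snippet>", "exec")
except SyntaxError as err:
    print("rejected:", err.msg)
```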
and so I'm very happy about the new uh
predictive coding and copilot type
features and things like this because I
think it'll all just make us more
productive it's still messy and fuzzy
and uncertain unpredictable so but is
there a future you see given how big of
a leap GPT-4 was where you start to see
something like LLMs inside a compiler
uh I mean you could do that yeah
absolutely I mean I think that would be
interesting there's otherwise well well
I mean it would be very expensive so
compilers run fast and they're very
efficient and LLMs are currently very
expensive there's on device llms and
there's other things going on and so
maybe there's an answer there
um I think that one of the things that I
haven't seen enough of is that
so llms to me are amazing when you tap
into the creative potential of the
hallucinations right and so if you're
building doing creative brainstorming or
creative writing or things like that the
hallucinations work in your favor
um
if you're writing code that has to be
correct because you're going to ship it
in production then maybe that's not
actually a feature
and so I think that there there has been
research and there has been work on
building algebraic reasoning systems and
kind of like figuring out more things
that feel like proofs and so I think
that there could be interesting work in
terms of building more reliable systems
at scale and that could be interesting
but if you chase that rabbit hole down
the question then becomes how do you
express your intent to the machine and
so maybe you want the LLM to provide the spec
but you have a different kind of net
that then actually implements the code
right so it's used as documentation and
and inspiration versus the actual
implementation yeah potentially
since uh if successful modular will be
the thing that runs I say so jokingly
our AI overlords but AI systems that are
used across
uh I know it's a cliche term but uh
internet of things so of course so so
I'll joke and say like AGI should be
written in Mojo yeah AGI you're joking
but it's also possible that it's not a
joke uh that a lot of the ideas behind
Mojo is uh seems like the the natural
set of ideas that would enable at scale
training and inference of AI systems
um so just I have to ask you about the
big philosophical question about human
civilization so folks like uh uh
Eliezer Yudkowsky are really concerned
about the threat of AI do you think
about
the the good
and the bad that can happen at scale
deployment of AI systems well so I've
I've thought a lot about it and there's
a lot of different parts to this problem
everything from job displacement to
Skynet things like this and so you can
zoom into sub parts of this problem
um
I'm not super optimistic about AGI being
solved next year
I don't think that's going to happen
personally so you have a kind of
zen-like calm about because there's a
nervousness because the leap of GPT-4
seems so big sure that's like we're
almost we're there's some kind of
transition here period you're thinking
well so so I mean there's a couple of
things going on there one is
um I'm sure GPT five and seven and 19
will be also huge leaps
um they're also getting much more
expensive to run and so there may be a
limiting function in terms of just
expense on the one hand and train like
that that could be a limiter that slows
things down but I think the bigger
limiter is outside of like Skynet takes
over and I don't spend any time thinking
about that because if Skynet takes over
and kills us all then I'll be dead so I
don't worry about that so you know I
mean that's just okay I have other
things worry about I'll just focus on
yeah I'll focus and not worry about that
one
um but I think that the the other thing
I'd say is that
AI moves quickly but humans move slowly
and we adapt slowly and so what I expect
to happen is just like any technology
diffusion like the promise and then the
application takes time to roll out and
so I think that I'm not even too worried
about autonomous cars eliminating all
the taxi drivers remember autonomy is
supposed to be solved by 2020. yeah boy
do I so and um and so like I think that
on the one hand we can see amazing
progress but on the other hand we can
see that uh you know the the reality is
a little bit more complicated and it may
take longer to roll out than than you
might expect well that's in the physical
space I I do think in the digital space
is a the stuff that's built on top of
llms that runs
you know the millions of apps that could
be built on top of them
and they could be run on millions of
devices millions of types of devices
I I just think
that the rapid effect it has in human
civilization could be truly
transformative to it yeah you don't even
know well so that predict well and there
I think it depends on are you an
optimist or a pessimist yeah or a
masochist
um just to clarify uh optimist about
human civilization me too and so I look
at that as saying okay cool well yeah I
do right and so some people say oh my
God it's going to destroy us all how do
we prevent that I I kind of look at it
from a is it going to unlock us all
right you talk about coding it's going
to make so I don't have to do all the
repetitive stuff
well suddenly that's a very optimistic
way to look at it and you look at what a
lot of a lot of these technologies have
done to improve our lives and I want
that to go faster
what do you think the future of
programming looks like in the next 10 20
30 50 years
there are the LLMs and uh with
Mojo with modular like your vision for
devices the hardware to compilers to
this to the different stacks of software
yeah well so what I want I mean coming
coming back to my arch nemesis right
it's complexity right so again me being
The Optimist if we drive down complexity
we can make these tools these
Technologies these cool Hardware widgets
accessible to way more people right and
so what I'd love to see is more
personalized experiences more uh things
the research getting into production
instead of being lost at NeurIPS right
and so like these things
that impact people's lives by entering
products
and so one of the things that I'm a
little bit concerned about is right now
um the big companies are investing huge
amounts of money and are driving the top
line of AI capability forward really quickly
but if it means that you have to have
100 million dollars to train a model or
more 100 billion dollars right well
that's gonna make it very concentrated
with very few people in the world that
can actually do this stuff I would much
rather see
lots of people across the industry
be able to participate and use this
right and you look at this you know I
mean a lot of great research has been
done in the health world and looking at
like detecting pathologies and doing
Radiology with AI and like doing all
these things well the problem today is
that to deploy and build these systems
you have to be an expert in radiology
and an expert in AI
and if we can break down the barriers so
that more people can use AI techniques
it's more like programming python
which roughly everybody can do if they
want to right then I think that we'll
get a lot more practical application of
these techniques in a lot more niche
cool but narrower domains I think that's
that's going to be really cool do you
think we'll have more or less
programmers in the world than now well so
um I think we'll have more more
programmers but they may not consider
themselves to be programmers that'd be a
different name for you right I mean do
you consider somebody that uses uh you
know I think that arguably the most
popular programming language is Excel
yeah
right yeah and so do they consider
themselves to be programmers maybe not I
mean some of them make crazy macros and
stuff like that but but but what what
the you mentioned Steve Jobs it's the uh
bicycle for the mind it allows you to go
faster right and so I think that as we
look forward right what is AI I look at
it as hopefully a new programming
Paradigm it's like object-oriented
programming right if you want to write a
cat detector you don't use for Loops it
turns out that's not the right tool for
the job right and so right now
unfortunately because I mean it's not
unfortunate but it's just kind of where
things are AI is this weird different
thing that's not integrated into
programming languages and normal tool
chains and all the Technologies really
weird and doesn't work right and you
have to babysit it and every time you
switch Hardware it's different shouldn't
be that way when you change that when
you fix that suddenly again the tools
Technologies can be way easier to use
you can start using them for many more
things and so that that's that's what I
would be excited about
what kind of advice could you give to
somebody in high school right now or
maybe early college who's curious about
programming
and
feeling like the world is changing
really quickly here yeah what kind of
stuff to learn what kind of stuff to
work on
should they finish college they
go work at a company there build a thing
what do you think well so I mean one of
the things I'd say is that um you'll be
most successful if you work on something
you're excited by
and so don't get the book and read the
book
cover to cover and study and memorize
and recite and flashcard it go build
something like go solve a problem go
build the thing that you wanted to exist
go build an app go build train a model
like go build something and actually use
it and set a goal for yourself and if
you do that then you'll you know there's
a success there's the adrenaline rush
there's the achievement there's the
unlock that I think is where you know if
you keep setting goals and you keep
doing things and Building Things
learning by building is really powerful
um in terms of career advice I mean
everybody's different it's very hard to
give generalized experience generalized
advice
um I'll speak as you know a compiler nerd
if everybody's going
left sometimes it's pretty cool to go
right yeah and so just because
everybody's doing a thing it doesn't
mean you have to do the same thing and
follow the herd in fact I think that
sometimes the most exciting path through
life leads to being curious about things
that nobody else actually focuses on
right and turns out that understanding
deeply parts of the problem that people
want to take for granted makes you
extremely valuable and specialized in
ways that the herd is not and so again I
mean there's lots of room for
specialization lots of room for uh
generalists there's lots of room for
different kinds and parts of the problem
but but I think that it's you know just
because everything everybody's doing one
thing doesn't mean you should
necessarily do it and now the herd is
using python so if you want to be a
rebel
go check out Mojo and uh help Chris and
the rest of the world fight the arch
nemesis of complexity because simple is
beautiful there you go
because you're an incredible person
you've uh you've been so kind to me ever
since we met you've been extremely
supportive I'm forever grateful for that
thank you for being who you are for
being legit for being kind for fighting
this um
um really interesting problem of how to
make AI accessible to a huge number of
people huge number of devices yeah well
so Lex you're a pretty special person
too right and so I think that you know
one of the funny things about you is
that besides being curious and pretty
damn smart you're actually willing to
push on things and you're you're I think
that you've got an agenda to like make
the world think
which I think is a pretty good agenda
it's a pretty good one uh thank you so
much for talking today Chris yeah thanks
Lex
thanks for listening to this
conversation with Chris Lattner to
support this podcast please check out
our sponsors in the description and now
let me leave you some words from Isaac
Asimov
I do not fear computers
I fear the lack of them
thank you for listening and hope to see
you next time