Rajat Monga: TensorFlow | Lex Fridman Podcast #22
NERNE4UThHU • 2019-06-03
The following is a conversation with Rajat Monga. He's an engineering director at Google, leading the TensorFlow team. TensorFlow is an open source library at the center of much of the work going on in the world in deep learning, both the cutting-edge research and the large-scale application of learning-based approaches. But it's quickly becoming much more than a software library. It's now an ecosystem of tools for the deployment of machine learning in the cloud, on the phone, in the browser, on both generic and specialized hardware: TPU, GPU, and so on. Plus, there's a big emphasis on growing a passionate community of developers. Rajat, Jeff Dean, and a large team of engineers at Google Brain are working to define the future of machine learning with TensorFlow 2.0, which is now in alpha. I think the decision to open-source TensorFlow was a definitive moment in the tech industry. It showed that open innovation can be successful and inspired many companies to open-source their code, to publish, and in general engage in the open exchange of ideas. This conversation is part of the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, iTunes, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D. And now, here's my conversation with Rajat Monga.
You were involved with Google Brain since its start in 2011 with Jeff Dean. It started with DistBelief, the proprietary machine learning library, and turned into TensorFlow in 2014, the open source library. So what were the early days of Google Brain like? What were the goals, the missions? How do you even proceed forward once there are so many possibilities before you?
It was interesting back then. You know, when I started, when we were even just talking about it, the idea of deep learning was interesting and intriguing. In some ways it hadn't yet taken off, but it held some promise; it had shown some very promising early results. I think the idea where Andrew and Jeff had started was: what if we can take this, what people are doing in research, and scale it to what Google has in terms of compute power, and also put that kind of data together? What does it mean? And so far the results had been, if you scale the compute and scale the data, it does better. Would that work? And so that was the first year or two: can we prove that out? And with DistBelief, when we started, the first year we got some early wins, which is always great.
What were the wins like? What were the ones where you thought, there's some promise to this, this is going to be good?

I think the two early wins were: one was speech, where we collaborated very closely with the speech research team, who was also getting interested in this. And the other one was on images, where, you know, the cat paper, as we call it, was covered by a lot of folks.

And the birth of Google Brain was around neural networks, so it was deep learning from the very beginning? That was the whole mission? So in terms of scale, what was the sort of dream of what this could become? Were there echoes of this open-source TensorFlow community that might be brought in? Was there a sense of TPUs? Was there a sense that machine learning was now going to be at the core of the entire company, that it was going to grow in that direction?
Yeah, so that was interesting. If I think back to 2012 or 2011, first it was: can we scale it? And in a year or so we had started scaling it to hundreds and thousands of machines; in fact, we had some runs even going to 10,000 machines, and all of those showed great promise in terms of machine learning at Google. The good thing was, Google had been doing machine learning for a long time. Deep learning was new, but as we scaled this up, we were pretty sure that, yes, it was possible and it was going to impact lots of things. We started seeing real products wanting to use this. Again, speech was the first; there were image things that Photos came out of, and many other products as well. So that was exciting. As we went along with that for a couple of years, externally also academia started to, you know, there was lots of push on: okay, deep learning is interesting, we should be doing more, and so on. And so by 2014 we were looking at: okay, this is a big thing, it's going to grow, not just internally but externally as well. Yes, maybe Google's ahead of where everybody else is, but there's a lot to do. So a lot of this started to make sense and come together.
So, the decision to open source. I was just chatting with Chris Lattner about this. The decision to go open source with TensorFlow, I would say, for me personally, seems to be one of the big seminal moments in all of software engineering ever. When a large company like Google decides to take a large project that many lawyers might argue has a lot of IP, and just decides to go open source with it, and in so doing leads the entire world in saying, you know what, open innovation is a pretty powerful thing and it's okay to do that. That's an incredible moment in time. So do you remember those discussions happening, whether open source should happen? What was that like?
I would say the initial idea came from Jeff, who was a big proponent of this. I think it came off of two big things. One was research: in his view, as a research group we were putting all our research out there. If you wanted to, we were building on others' research, and we wanted to push the state of the art forward, and part of that was to share the research. That's how I think deep learning and machine learning have really grown so fast. So the next step was: okay, would software help with that? And it seemed like there were a few libraries out there, Theano being one, Torch being another, and a few others, but they were all done by academia and the level was significantly different.
The other one was from a software perspective. Google had done lots of software that we used internally, and we published papers. Often there was an open source project that came out of that, where somebody else picked up that paper and implemented it, and they were very successful. Back then it was like: okay, there's Hadoop, which has come off of tech that we built. We know the tech we've built is way better, for a number of different reasons. We've invested a lot of effort in it, and it turns out we have Google Cloud, and we are now not really providing our tech, but we are saying: okay, we have Bigtable, which was the original thing; we're now going to provide HBase APIs on top of that, which isn't as good, but that's what everybody's used to. So it's like, can we make something that is better and really helps the community in lots of ways, but also helps push a good standard forward?
So how does Cloud fit into that? There's TensorFlow, the open source library, and how does the fact that you can use so many of the resources that Google provides in the cloud fit into that strategy?
So TensorFlow itself is open, and you can use it anywhere, right, and we want to make sure that continues to be the case. On Google Cloud, we do make sure that there are lots of integrations with everything else, and we want to make sure that it works really, really well there.
So you're leading the TensorFlow effort. Can you tell me the history and the timeline of the TensorFlow project in terms of major design decisions? So, like, the open source decision, but really, you know, what to include and not. There's this incredible ecosystem that I'd like to talk about, all these parts. What are just some sample moments that defined what TensorFlow eventually became through its, I don't know if you're allowed to say history when it's just a few years, but in deep learning everything moves so fast that in just a few years it's already history.

Yes, yes.
So, looking back, we were building TensorFlow. I guess we open sourced it in November 2015. We started on it in the summer of 2014, I guess, and somewhere like three to six months in, by late 2014, we had decided that, okay, there's a high likelihood we'll open source it, so we started thinking about that and making sure we were heading down that path.
By that point we had seen, you know, lots of different use cases at Google, so there were things like: okay, yes, you want to run at large scale in the data center; yes, we need to support different kinds of hardware. We had GPUs, and at that point our first TPU was about to come out, roughly around that time, so the design sort of included those. We had started to push on mobile, so we were running models on mobile at that point; people were customizing code, so we wanted to make sure TensorFlow could support that as well. That sort of became part of the overall design.
When you say mobile, you mean, like, pretty complicated algorithms running on the phone?

That's correct.

So then you have a model that you deploy on the phone and run it there, right? Already at that time there were ideas of running machine learning on the phone?

That's correct. We already had a couple of products that were doing that by then, and in those cases we had basically customized handcrafted code, or some internal libraries that we were using.
So I was actually at Google during this time, in a parallel, I guess, universe, but we were using Theano and Caffe. Was there some degree to which you were bouncing off those, like, trying to see what Caffe was offering people, trying to see what Theano was offering, that you wanted to make sure you were delivering on, whatever that is, perhaps the Python part of things? Maybe did that influence any design decisions?
Totally. So when we built DistBelief, some of that was in parallel with some of these libraries coming up. I mean, Theano itself is older, but we were building DistBelief focused on our internal thing, because our systems were very different. By the time we got to this, we looked at a number of libraries that were out there. Theano; there were folks in the group who had experience with Torch, with Lua; there were folks here who had seen Caffe, I mean, actually, Yangqing was here as well. There were one or two other libraries; I think we looked at a number of things, might even have looked at Chainer back then, I'm trying to remember if it was around. In fact, we did discuss ideas around, okay, should we have a graph or not. So supporting all of these together, there were definitely key decisions that we wanted. We had seen limitations in our prior DistBelief things. A few of them were: research was moving so fast, we wanted flexibility; the hardware was changing fast, we expected that to keep changing. So those probably were the two things. And yeah, I think the flexibility in terms of being able to express all kinds of crazy things was definitely a big one then.
So what about the graph decisions? With the move towards TensorFlow 2.0, by default there will be eager execution, so it's sort of hiding the graph a little bit, you know, because it's less intuitive in terms of the way people develop, and so on. What was that discussion like, in terms of using graphs? It seemed, since it's kind of the Theano way, like the obvious choice.
So I think where it came from was, DistBelief had a graph-like thing as well, much more simple. It wasn't a general graph; it was more like a straight-line thing, more like what you might think of Caffe, I guess, in that sense. But we always cared about the production stuff; like, even with DistBelief we were deploying a whole bunch of stuff in production, so the graph did come from that. Then we thought: okay, should we do that in Python? And we experimented with some ideas where it looked a lot simpler to use, but not having a graph meant, okay, how do you deploy now? So that was probably what triggered the balance for us, and eventually we ended up with the graph.
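To make that trade-off concrete, here is a minimal sketch of how it eventually landed in TensorFlow 2.x, where eager execution is the default and tf.function traces the same Python into a deployable graph; the tensors and shapes are purely illustrative:

    import tensorflow as tf  # TensorFlow 2.x

    x = tf.constant([[1.0, 2.0]])
    w = tf.Variable([[0.5], [0.5]])

    # Eager by default: ops execute immediately, like ordinary Python.
    print(tf.matmul(x, w))

    # tf.function traces the Python into a graph, which is what makes
    # deployment and optimization possible.
    @tf.function
    def predict(inputs):
        return tf.matmul(inputs, w)

    print(predict(x))  # first call traces the graph; later calls reuse it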
And I guess the question there is, I mean, production seems to be the really good thing to focus on, but did you even anticipate the other side of it, where there could be, what is it, what are the numbers, something crazy: 41 million downloads? Was that even, like, a possibility in your mind, that it would be as popular as it became?
So I think we did see a need for this a lot from the research perspective, and, like, the early days of deep learning in some ways. 41 million? I don't think I imagined that number then. It seemed like there was a potential future where lots more people would be doing this, and how do we enable, I would say, that kind of growth. I probably started seeing that somewhat after the open-sourcing, where it was like: okay, you know, deep learning is actually growing way faster, for a lot of different reasons, and we are in just the right place to push on that, and leverage that, and deliver on a lot of things that people want.
So what changed once it was open sourced? You know, this incredible amount of attention from a global population of developers, how did the project start changing? I don't... do you actually even remember those times? I know looking now there's really good documentation, there's an ecosystem of tools, there's a community, a blog, a YouTube channel now. It's very, very community-driven. Back then, I guess, 0.1 was the version?

I think we called it 0.6 or 0.5, something like that.

What changed leading into 1.0?
It's interesting. You know, I think we've gone through a few things there. When we first came out, people loved the documentation we had, because it was just a huge step up from everything else, because all of those were academic projects, people doing, you know, they don't think about documentation. I think what that changed was, instead of deep learning being a research thing, some people who were just developers could now suddenly take this and do some interesting things with it, right, people who had no clue what machine learning was before then. And that, I think, really changed how things started to scale up in some ways, and pushed on it. Over the next few months, as we looked at, you know, how do we stabilize things, as we looked at not just researchers, now we want stability, people who want to apply things, that's how we started planning for 1.0, and there are certain needs from that perspective. And so again, documentation comes up, designs, more kinds of things to put that together.
And so it was exciting to get to a stage where more and more enterprises wanted to buy in and really get behind that. And I think post-1.0, and with the next few releases, that enterprise adoption also started to take off. I would say between the initial release and 1.0, it was researchers, of course, and then a lot of hobbyists and early-interest people, people excited about this, who started to get on board, and then over the 1.x releases, lots of enterprises.

I imagine anything that's below 1.0 gets a certain kind of pressure; enterprises probably want something that's stable.

Exactly.
And do you have a sense now of where TensorFlow is today? It feels like deep learning in general is an extremely dynamic field; so much is changing. And TensorFlow has been growing incredibly. Do you have a sense of stability at the helm of it? I know you're in the midst of it, but...
Yeah, I think in the midst of it, it's often easy to forget what an enterprise wants, what some of the people on that side want. There are still people running models that are three or four years old. Inception is still used by tons of people; even ResNet-50 is, what, a couple of years old now, or more, but tons of people use that and they're fine. They don't need the last couple of bits of performance or quality; they want some stability and things that just work, and so there is value in providing that kind of stability, and in making it really simple, because that allows a lot more people to access it. And then there's the research crowd, which wants, okay, they want to do these crazy things, exactly like you're saying, right, not just the straight-up deep learning models that used to be there. They want RNNs, and even RNNs are maybe old; there are transformers now, and now it needs to combine with RL and GANs, and so on. So there's definitely that area, the boundary that's shifting and pushing the state of the art. But I think there's more and more of the past that's much more stable, and even stuff that was two or three years old is very, very usable by lots of people. That part makes it all easier.
So I imagine, maybe you can correct me if I'm wrong, one of the biggest use cases is essentially taking something like ResNet-50 and doing some kind of transfer learning on a very particular problem that you have. That's basically probably what the majority of the world does, and you want to make that as easy as possible.

That's right. So I would say, from the hobbyist perspective, that's the most common case, right. In fact, the apps on phones and stuff that you'll see, the early ones, that's the most common case.
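A hedged sketch of that pattern with tf.keras; the class count, input size, and optimizer here are illustrative choices, not recommendations from the conversation:

    import tensorflow as tf

    # Pretrained ResNet-50 backbone, without its ImageNet classification head.
    base = tf.keras.applications.ResNet50(
        weights="imagenet", include_top=False, pooling="avg",
        input_shape=(224, 224, 3))
    base.trainable = False  # freeze the pretrained weights

    # New head for your particular problem, e.g. 10 classes.
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(train_images, train_labels, epochs=5)  # with your own data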
I would say there are a couple of reasons for that. One is that everybody talks about it; it looks great on slides.

Yeah, it makes for a good presentation.

Exactly. What enterprises want, that is part of it, but that's not the big thing. Enterprises really have data that they want to make predictions on. What they often used to do, with the people who were doing ML, was just regression models: linear regression, logistic regression, linear models, or maybe gradient boosted trees, and so on. Some of them still benefit from deep learning, but that's the bread and butter, like the structured data and so on. So depending on the audience you look at, it's a little bit different.

And they just, I mean, the best case for an enterprise is probably having a very large dataset where deep learning can really shine.

That's correct, right.
And then I think the other piece is what they want, again with 2.0, and what at the developer summit we put together: the whole TensorFlow Extended piece, which is the entire pipeline. They care about stability across the entire thing; they want simplicity across the entire thing. I don't need to just train a model; I need to do that every day, again, over and over again.
I wonder to which degree you have a role in this, I don't know. So I teach a course on deep learning, and I have people like lawyers come up to me and say, you know, when is machine learning going to enter the legal realm? The same thing in all kinds of disciplines: immigration, insurance. Often, when I see what it boils down to, these companies are often a little bit old-school in the way they organize the data. So the data is just not ready yet; it's not digitized. Do you also find yourself being in the role of an evangelist for, like, let's get your data organized, folks, and then you'll get the big benefit of TensorFlow? Do you have those conversations?
So, yeah, I get all kinds of questions there, from, okay, what can I do, what do I need to make this work, to, do we really need deep learning? I mean, there are all these things, I already use this linear model, why would this help? I don't have enough data, let's say, or, I want to use machine learning but I have no clue where to start. So it ranges all the way from that to the experts, who ask very specific things.
It's interesting. Is there a good answer? Does it boil down to, oftentimes, digitizing data? So whatever you want automated, whatever data you want to make predictions based on, you have to make sure it's in an organized form. And within, in the sense of, like, the TensorFlow ecosystem, you're now providing more and more datasets and more pretrained models. Are you finding yourself also the organizer of datasets?
Yes. I think with TensorFlow Datasets, which we just released, that's definitely come up. People want these datasets; can we organize them, and can we make that easier? So that's definitely one important thing.
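A minimal sketch of what that looks like with the tensorflow_datasets package under TF 2.x; the choice of MNIST is just an example:

    import tensorflow_datasets as tfds

    # Loads a ready-made, versioned dataset as a tf.data pipeline.
    ds = tfds.load("mnist", split="train", as_supervised=True)
    for image, label in ds.take(1):
        print(image.shape, label.numpy())  # (28, 28, 1) and a digit 0-9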
The other related thing, I would say, is I often tell people: you know what, don't think of the fanciest thing, the newest model that you see. Make something very basic work, and then you can improve it. There are just lots of things you can do there.

Yeah, start with the basics.
One of the big things that made TensorFlow even more accessible was the appearance, whenever that happened, of Keras. The Keras standard started sort of outside of TensorFlow; I think it was Keras on top of Theano at first only, and then Keras became on top of TensorFlow. Do you know when Keras chose to also add TensorFlow as a backend? Was it just the community that drove that initially? Do you know if there were discussions, conversations?
Yeah, so François started the Keras project before he was at Google, and the first backend was Theano. I don't remember if that was after TensorFlow was created or before. And then at some point, when TensorFlow started becoming popular, there were enough similarities that he decided to, okay, create this interface and put TensorFlow in as the backend. I believe that might still have been before he joined Google, so, you know, we weren't really talking about that. He decided on his own, and thought that was interesting and relevant to the community. In fact, I didn't find out about him being at Google until a few months after he was here. He was working on some research ideas and doing Keras as his nights-and-weekends project. So he wasn't, like, part of TensorFlow; he had joined the research organization, and he's done some amazing work, has some papers on the research he's done; he's a great researcher as well.
And at some point we realized, oh, he's doing this good stuff, people seem to like the API, and he's right here. So we talked to him, and he said, okay, why don't I come over to your team and work with you for a quarter, and let's make that integration happen. And we talked to his manager, and he said, sure, my quarter's fine. And that quarter's been something like two years now, so Keras got integrated into TensorFlow in a deep way.
Yeah, and now with TensorFlow 2.0, Keras is kind of the recommended way for a beginner to interact with TensorFlow, which makes that initial sort of transfer learning, or the basic use cases, even for an enterprise, super simple, right?

That's right.
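For reference, a minimal sketch of that beginner path, with Keras built into TF 2.x as tf.keras; the layer sizes and shapes are illustrative:

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    # model.fit(x_train, y_train, epochs=10)  # with your own data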
So what was that decision like? That seems like, I mean, kind of a bold decision as well.

We did spend a lot of time thinking about that one. We had a bunch of APIs, some built by us. There was a parallel layers API that we were building, and when we decided to do Keras in parallel, there were like two things we were looking at, and the first thing we were trying to do was just have them look similar, be as integrated as possible, share all of that stuff. There were also like three other APIs that others had built over time, because we didn't have a standard one. But one of the messages that we kept hearing from the community was: okay, which one do we use? And they kept seeing, like, okay, here's a model in this one and here's a model in that one; which should I pick? So that's sort of, like, okay, we had to address that straight on with 2.0. The whole idea is you need to simplify; you had to pick one. Based on where we were, we were like, okay, let's see what do people like, and Keras was clearly one that lots of people loved. There were lots of great things about it, so we settled on that.

Organically, that's kind of the best way to do it, which was great. It was surprising, nevertheless, to sort of bring in an outside thing; I mean, there was a feeling like Keras might be almost like a competitor, in a certain kind of way, to TensorFlow, and in a sense it became an empowering element of TensorFlow.

That's right.
Yeah, it's interesting how you can put two things together and have them align like that.

In this case, I think François, the team, and I, you know, a bunch of us have chatted, and I think we all want to see the same kinds of things. We all care about making it easier for the huge set of developers out there, and that makes a difference.
So, Python has Guido van Rossum, who until recently held the position of Benevolent Dictator for Life, right. Does a huge, successful open source project like TensorFlow need one person who makes the final decision? You did a pretty successful TensorFlow Dev Summit just now, in the last couple of days; there are clearly a lot of different new features being incorporated in an amazing ecosystem, and so on. How are those design decisions made? Is there a BDFL in TensorFlow, or is it more distributed and organic?
I think it's somewhat different, I would say. I've always been involved in the key design directions, but there are lots of things that are distributed, where there are a number of people, Martin Wicke being one, who has really driven a lot of our open source stuff, a lot of the APIs in there. There are a number of other people who have been, you know, pushing and been responsible for different parts of it. We do have regular design reviews. Over the last year, we've really spent a lot of time opening up to the community and adding transparency; we're setting more processes in place, so RFCs, special interest groups, to really grow that community and scale that. I don't think we could scale with me as the sole decision-maker.
So, yeah, the growth of that ecosystem, maybe you can talk about that a little bit. First of all, when it started with Andrej Karpathy, when he first did ConvNetJS, the fact that you could train a neural network in the browser, in JavaScript, was incredible. So now TensorFlow.js is really making that a serious, like, a legit thing, a way to operate, whether it's in the backend or the front end. Then there's TensorFlow Extended, like you mentioned; there's TensorFlow Lite for mobile; and all of it, as far as I can tell, is really converging towards being able to, you know, save models in the same kind of way, so you can move them around; you can train on the desktop and then move it to mobile, and so on. There's that cohesiveness. So maybe give me, whatever I missed, the bigger overview of the mission of the ecosystem that's trying to be built, and where it's moving forward.
Yeah. So, in short, the way I like to think of this is, our goal is to enable machine learning, and in a couple of ways. One is, there are lots of exciting things going on in ML today. We started with deep learning, but we now support a bunch of other algorithms too. So one is, on the research side, to keep pushing on the state of the art: how do we enable researchers to build the next amazing thing? So BERT came out recently; you know, it's great that people are able to do new kinds of research, and there's lots of amazing research that happens across the world. So that's one direction. The other is, how do you take that across to all the people outside who want to take that research and do some great things with it, and integrate it to build real products, to have a real impact on people? So that's the other axis, in some ways.

You know, at a high level, one way I think about it is, there are a crazy number of compute devices across the world, and we often used to think of ML and training and all of this as, okay, something you do either on the workstation or in the data center or cloud. But we see things running on the phones, we see things running on really tiny chips; I mean, we had some demos at the developer summit. And so the way I think about this ecosystem is, how do we help get machine learning onto every device that has the compute capability? And that continues to grow. So in some ways this ecosystem has looked at various aspects of that, and grown over time to cover more of those, and we continue to push the boundaries.
In some areas, we've built more tooling and things around that to help you. I mean, the first tool we started with was TensorBoard; you want to understand just the training piece. TFX, or TensorFlow Extended, is there to really do your entire ML pipelines, if you care about all that production stuff. But then, going to the edge, going to different kinds of things. And it's not just us now; we're at a place where there are lots of libraries being built on top. There are some for research, maybe things like TensorFlow Agents or TensorFlow Probability, that started as research things, or for researchers focusing on certain kinds of algorithms, but they're also being used by, you know, production folks. Some have come from within Google, just teams across Google who wanted to build these things; others have come from the community, because there are different pieces that different parts of the community care about. And I see our goal as enabling even that, right. We cannot and won't build every single thing; that just doesn't make sense. But if we can enable others to build the things that they care about, and there's a broader community that cares about that, and we can help encourage that, that's great. That really helps the entire ecosystem, not just us.
One of the big things about 2.0 that we're pushing on is: okay, we have so many different pieces, right? How do we help make all of them work well together? So there are a few key pieces there that we're pushing on, one being the core format, and how we share the models themselves, through SavedModel and TensorFlow Hub and so on, and a few of the pieces that really put this together.
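A rough sketch of that shared format in TF 2.x: the same SavedModel directory can be reloaded on a server, or handed to the TensorFlow Lite converter for mobile; the tiny model and file paths here are illustrative:

    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
    model.save("exported_model")  # writes a SavedModel directory

    # Reload it anywhere TensorFlow runs...
    restored = tf.keras.models.load_model("exported_model")

    # ...or convert the same artifact for mobile with TensorFlow Lite.
    converter = tf.lite.TFLiteConverter.from_saved_model("exported_model")
    with open("model.tflite", "wb") as f:
        f.write(converter.convert())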
I was very skeptical when TensorFlow.js came out, or deeplearn.js as it was first called. It seemed like a technically very difficult project. As a standalone it's not as difficult, but as a thing that integrates into the ecosystem, it's very difficult. So, I mean, there are a lot of aspects of this that you're making look easy, but on the technical side, how many challenges had to be overcome here?

A lot, and some still have to be.

Yes, that's the other question here too.

There are lots of steps to it; we've iterated over the last few years, so there's a lot we've learned. And, yeah, often when things come together well, things look easy.
And that's exactly the point: it should be easy for the end user, but there are lots of things that go on behind that. If I think about the challenges still ahead: we have a lot more devices coming on board, for example, from the hardware perspective. How do we make it really easy for these vendors to integrate with something like TensorFlow? There's a lot of compiler stuff that others are working on; there are things we can do in terms of our APIs, and so on. You know, TensorFlow started as a very monolithic system, and to some extent it still is. There are lots of tools around it, but the core is still pretty large and monolithic. One of the key challenges for us, to scale that out, is how do we break that apart with clear interfaces. In some ways it's software engineering 101, but for a system that's now four years old, I guess, or more, that's still rapidly evolving, and that we're not slowing down with, it's hard to change and modify and really break apart. It's sort of like, as people say, changing the engine with the car running, and that's exactly what we're trying to do.
So there's a challenge here, because the downside of so many people being excited about TensorFlow, and coming to rely on it in many of their applications, is that you're kind of responsible, it's the technical debt, you're responsible for previous versions still working to some degree. So when you're trying to innovate, I mean, it's probably easier to just start from scratch every few months. So do you feel the pain of that? 2.0 does break some backward compatibility, but not too much; it seems like the conversion is pretty straightforward. Do you think that's still important, given how quickly deep learning is changing? Can you, with the things you've learned, just start over, or is there pressure to not?

It's a tricky balance. If it was just a researcher writing a paper, who a year later will not look at that code again, sure, it doesn't matter.
There are a lot of production systems that rely on TensorFlow, both at Google and across the world, and people worry about this. These systems run for a long time, so it is important to keep that compatibility and so on. And yes, it does come with a huge cost. We have to think about a lot of things as we do new things and make new changes. I think it's a trade-off, right? You might slow certain kinds of things down, but the overall value you're bringing because of that is much bigger, because it's not just about not breaking the person from yesterday; it's also about telling the person coming tomorrow that, you know what, this is how we do things: we're not going to break you when you come on board, because there are lots of new people who are also going to come on board.

You know, one way I like to think about this, and I always push the team to think about it as well: when you want to do new things, you want to start with a clean slate. Design with a clean slate in mind, and then we'll figure out how to make sure all the other things work. And yes, we do make compromises occasionally, but unless you've designed with the clean slate, and not worried about that, you'll never get to a good place.

That's brilliant. So even though you are responsible, when you're in the idea stage, when you're thinking of something new, just put all that behind you.

Yeah, that's right.
Okay, that's really well put. So I have to ask this, because a lot of students and developers ask me how I feel about PyTorch versus TensorFlow. So I've recently completely switched my research group to TensorFlow. I wish everybody would just use the same thing, and TensorFlow is as close to that, I believe, as we have. But do you enjoy competition? TensorFlow is leading in many ways, on many dimensions: in terms of the ecosystem, in terms of the number of users, momentum, production level, and so on. But, you know, a lot of researchers are now also using PyTorch. Do you enjoy that kind of competition, or do you just ignore it and focus on making TensorFlow the best that it can be?
So, just like research or anything people are doing, right, it's great to get different kinds of ideas. When we started with TensorFlow, like I was saying earlier, it was very important for us to also have production in mind. We didn't want just research, right, and that's why we chose certain things. Now PyTorch came along and said, you know what, I only care about research, this is what I'm trying to do, what's the best thing I can do for this? And it started iterating, and said: okay, I don't need to worry about graphs; let me just run things. I don't care if it's not as fast as it can be, but let me just make this part easy. And there are things you can learn from that, right? They again had the benefit of seeing what had come before, but also explored certain different kinds of spaces, and they had some good things there, you know, building on, say, things like Chainer and so on before that. So competition is definitely interesting. It made us, you know, this is an area that we had thought about, like I said, very early on. Over time we had revisited this a couple of times: should we add this? And at some point we said, you know what, it seems like this can be done well, so let's try it again, and that's how we started pushing on eager execution: how do we combine those two together? Which has finally come together very well in 2.0, but it took us a while to get all the things together, and so on.
So let me, I mean, ask it another way. I think eager execution is a really powerful thing that was added. You know, Muhammad Ali versus Frazier, right: do you think it wouldn't have been added as quickly if PyTorch wasn't there?

It might have taken longer.

Longer, yeah.

I mean, we had tried some variants of that before, so I'm sure it would have happened, but it might have taken longer.

I'm grateful that TensorFlow responded in the way it did; it's doing some incredible work these last couple of years. What are the things that we didn't talk about that you're looking forward to in 2.0, that come to mind? So we talked about some of the ecosystem stuff, making it easily accessible through Keras, eager execution. Are there other things that we missed?
Yeah, I would say one is just where 2.0 is, with all the things that we've talked about. I think as we think beyond that, there are lots of other things that it enables us to do, and that we're excited about. So what it's setting us up for: okay, here are these really clean APIs; we've cleaned up the surface for what the users want. What it also allows us to do is a whole bunch of stuff behind the scenes, once we're ready with 2.0. So, for example, in TensorFlow, with graphs and all the things you could do, you could always get a lot of good performance if you spent the time to tune it, right, and we have clearly shown that; lots of people do that. With 2.0, with these APIs where we are, we can give you a lot of performance just with whatever you do, because we see that it's much cleaner, we know most people are going to do things this way, and we can really optimize for that and get a lot of those things out of the box. And it really allows us, both for single machine and distributed and so on, to really explore other spaces behind the scenes, after 2.0 and in the future versions as well. So right now the team is really excited about that; over time, I think we'll see that.
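One example of the kind of behind-the-scenes work those clean APIs enable is TF 2.x's distribution strategies, where unchanged Keras code can be replicated across local GPUs; a hedged sketch, with an illustrative model:

    import tensorflow as tf

    # MirroredStrategy handles replication and gradient aggregation
    # across the GPUs available on one machine.
    strategy = tf.distribute.MirroredStrategy()
    with strategy.scope():
        model = tf.keras.Sequential(
            [tf.keras.layers.Dense(1, input_shape=(10,))])
        model.compile(optimizer="sgd", loss="mse")
    # model.fit(...) now trains across all local GPUs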
The other piece that I was talking about, in terms of just restructuring the monolithic thing into more pieces and making it more modular, I think that's going to be really important for a lot of the other people in the ecosystem, other organizations and so on that want to build things.

Can you elaborate a little bit on what you mean by making TensorFlow more modular?
So, the way it's organized today is, there are lots of repositories in the TensorFlow organization on GitHub. The core one, where we have TensorFlow, has the execution engine, it has the key backends for CPUs and GPUs, it has the work to do distributed stuff, and all of these just work together in a single library or binary. There's no way to split them apart easily. There are some interfaces, but they're not very clean. In a perfect world, you would have clean interfaces where, okay, I want to run it on my fancy cluster with some custom networking, just implement this and do that. I mean, we kind of support that, but it's hard for people today. I think as we're starting to see more interesting things in some of these spaces, having that clean separation will really start to help. And again, going to the large size of the ecosystem and the different groups involved, enabling people to evolve and push on things more independently just allows it to scale better.
And by people, you mean individual developers and organizations?

That's right.

So the hope is that everybody, sort of, major, I don't know, Pepsi or something, uses it; that major corporations go to TensorFlow to do this kind of thing.
lot of them are already using tensorflow
they are not the ones that do the
development or changes in the core some
of them do but a lot of them don't I
mean they cut small pieces there are
lots of these some of them being let's
say hardware vendors who are building
their custom hardware and they want
their own Vsauce or some of them being
bigger companies say IBM I mean they're
involved in some of our special interest
groups and they see a lot of users who
want certain things and they want to
optimize for that so folks like that
often a tourist vehicle companies
perhaps exactly yes so yeah like I
So, yeah, like I mentioned, TensorFlow has been downloaded 41 million times, with 50,000 commits, almost 10,000 pull requests, and 1,800 contributors. So I'm not sure if you can explain it, but what does it take to build a community like that? In retrospect, what do you think was the critical thing that allowed for this growth to happen, and how does that growth continue?
Yeah, that's an interesting question. I wish I had all the answers there, I guess, so you could replicate it. I think there are a number of things that need to come together, right. One, you know, just like any new thing, there's a sweet spot of timing: what's needed, does it grow with what's needed. So in this case, for example, TensorFlow has grown not just because it was a good tool; it's also grown with the growth of deep learning itself. So those factors come into play.
Other than that, though, I think it's just hearing, listening to the community: what do they need? Being open to external contributions; we've spent a lot of time making sure we can accept those contributions well, that we can help the contributors in adding those, putting the right processes in place, getting the right kind of community, welcoming them, and so on. Like, over the last year we've really pushed on transparency; that's important for an open source project. People want to know where things are going, and they're like: okay, here's a process where you can do that, here are RFCs, and so on. So, thinking it through, there are lots of community aspects that come into that, which you can really work on. As a small project, it's maybe easy to do, because there are like two developers and you can just do those things. As you grow, putting more of these processes in place, thinking about the documentation, thinking about what developers care about, what kinds of tools they would want to use, all of these come into play, I think.
So one of the big things, I think, that feeds the TensorFlow fire is people building something on TensorFlow. You know, someone implements a particular architecture that does something cool and useful, they put it on GitHub, and so it just feeds this growth. Do you have a sense that with 2.0 and 1.0 there may be a little bit of a partitioning, like there is with Python 2 and 3, where there will be code bases in the older versions of TensorFlow that will not be as easily compatible? Or are you pretty confident that this kind of conversion is pretty natural and easy to do?
So, we're definitely working hard to make that very easy to do. There's lots of tooling that we talked about at the developer summit this week, and we'll really continue to invest in that tooling. You know, when you think of these significant version changes, that's always a risk, and we are really pushing hard to make that transition very, very smooth.
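Two pieces of that tooling, for reference: the tf_upgrade_v2 script that ships with TF 2.x rewrites 1.x code in place, and a compatibility module keeps legacy code running unconverted. A hedged sketch of the latter:

    # TF 2.x bundles the 1.x API surface under a compatibility module.
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()  # restore graph-and-session semantics

    # A classic 1.x pattern keeps working under the compat layer.
    x = tf.placeholder(tf.float32, shape=(None, 3))
    y = tf.reduce_sum(x, axis=1)
    with tf.Session() as sess:
        print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))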
And I think, at some level, people want to move, and they see the value in the new thing. They don't want to move just because it's a new thing; some people do, but most people want a really good thing. And I think over the next few months, as people start to see the value, we'll see that shift happening. So I'm pretty excited and confident that we will see people moving. As you said earlier, this field is also moving rapidly, so that'll help, because we can do more things, and all the new things will clearly happen in 2.x, so people will have lots of good reasons to move.
So what do you think TensorFlow 3.0 looks like? Is everything happening so crazily that even the end of this year seems impossible to plan for? Or is it possible to plan for the next five years?
I think it's tricky. There are some things that we can expect in terms of: okay, change, yes, change is going to happen. Are there some things that are going to stick around, and some things that aren't? I would say the basics of deep learning, the convolutional models or the basic kinds of things, will probably be around in some form in five years. Will RNNs and GANs stay? Very likely, based on where they are. Will we have new things? Probably, but those are hard to predict. And directionally, some things that we can see, things that we're starting to do with some of our projects right now, is, with 2.0, combining eager execution and graphs, where we're starting to make it more like just your natural programming language; you're not trying to program something else. Similarly, with Swift for TensorFlow, we're taking that approach: can you do something from the ground up, right? So some of those ideas seem like, okay, that's the right direction; in five years, we expect to see more in that area. Other things we don't know: will hardware accelerators be the same? Will we be able to train with four bits instead of 32 bits? And I think the TPU side of things is exploring that.
I mean, TPUs are already on version three. It seems that the evolution of TPUs and TensorFlow are co-evolving, almost, in terms of both learning from each other, and from the community, and from the applications where the biggest benefit is achieved.

That's right.
You've been trying, sort of, with eager, with Keras, to make TensorFlow as accessible and easy to use as possible. What do you think, for beginners, is the biggest thing they struggle with? Have you encountered that? Or is it basically what Keras is solving, that eager thing, like we talked about?
Yeah. For some of them, like you said, right, beginners want to just be able to take some image model, they don't care if it's Inception or ResNet or something else, and do some training or transfer learning on that kind of model. Being able to make that easy is important. So in some ways, you do that by providing them simple models, say, in Hub or so on; they don't care about what's inside that box, but they want to be able to use it. So we are pushing on, I think, different levels. If you look at just a component that you get, which has the layers already smushed in, the beginners probably just want that. Then the next step is, okay, look at building models with layers. If you go out to research, then they are probably writing custom layers themselves. So there's a whole spectrum there.
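To make that spectrum concrete at the research end, a custom layer is just a subclass of tf.keras.layers.Layer; a toy sketch, where the layer itself is made up purely for illustration:

    import tensorflow as tf

    class Scale(tf.keras.layers.Layer):
        """Toy custom layer: multiplies inputs by a learned scalar."""
        def build(self, input_shape):
            self.alpha = self.add_weight(
                name="alpha", shape=(), initializer="ones")

        def call(self, inputs):
            return self.alpha * inputs

    # Drops into a model like any built-in layer.
    model = tf.keras.Sequential([Scale(), tf.keras.layers.Dense(1)])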
And then providing the pretrained models seems to really decrease the time from when you're trying to start, so you could basically, in a Colab notebook, achieve what you need.
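A sketch of that Colab-style flow using TensorFlow Hub; the module handle below is an example from tfhub.dev and should be checked against the current catalog, and the classification head is illustrative:

    import tensorflow as tf
    import tensorflow_hub as hub

    # Pretrained image feature extractor from tfhub.dev (example handle).
    feature_extractor = hub.KerasLayer(
        "https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/4",
        input_shape=(224, 224, 3), trainable=False)

    model = tf.keras.Sequential([
        feature_extractor,
        tf.keras.layers.Dense(5, activation="softmax"),  # your classes
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")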
So, basically answering my own question, because I think what TensorFlow delivered on recently is making this trivial for beginners. So I was just wondering if there were other pain points you've tried to ease, but I'm not sure there would be.

No, those are probably the big ones. I see high schoolers doing a whole bunch of things now; it's pretty amazing.

It's both amazing and terrifying.

Yes.

In a sense that, when they grow up, some incredible ideas will be coming from them.
So there's certainly a technical aspect to your work, but you also have a management aspect to your role with TensorFlow, leading the project, a large number of developers and people. So what do you look for in a good team? What do you think? You know, Google has been at the forefront of exploring what it takes to build a good team, and TensorFlow is one of the most cutting-edge technologies in the world, so in this context, what do you think makes for a good team?
It's definitely something I think a fair bit about. I think, in terms of the team being able to deliver something well, one of the things that's important is cohesion across the team, so being able to execute together. At this scale, an individual engineer can only do so much; there's a lot more they can do together, even though we have some amazing superstars across Google and in the team. But, you know, often the way I see it is, the product of what the team generates is much larger than the individual parts put together. And so, how do we have all of them work together? The culture of the team itself. Hiring good people is important, but part of that is, it's not just that, okay, we hire a bunch of smart people and throw them together and let them do things. It's also that people have to care about what they're building; people have to be motivated for the right kinds of things; that's often an important factor. And, finally, how do you put that together with a somewhat unified vision of where we want to go? So are we all looking in the same direction, or is everyone going all over? And sometimes it's a mix. Google's a very bottom-up organization in some sense, also research, even more so, and that's how we started. But as we've become this larger product and ecosystem, I think it's also important to combine that well with a mix of: okay, here's the direction we want to go in; there is exploration we'll do around that, but let's keep staying in that direction, not just be all over the place.
And is there a way you monitor the health of the team? Sort of, is there a way you know you did a good job, that the team is good? Like, I mean, you're sort of saying nice things, but it's sometimes difficult to determine, because it's not binary; there are tensions and complexities and so on. And the other element of this mesh is the superstars. You know, even at Google, such a large percentage of the work is done by individual superstars, and sometimes those superstars can be against the dynamic of the team, and those tensions... I mean, I'm sure in TensorFlow it might be a little bit easier, because the mission of the project is sort of beautiful; you're at the cutting edge, it's exciting.

Yeah.

Have you had to struggle with that? Have there been challenges?
There are always people challenges, in different kinds of ways. That said, I think what's been good is getting people who care and have, you know, the same kind of culture, and that's Google in general to a large extent. But also, like you said, given that the project has had so many exciting...