Kind: captions Language: en

Okay, so today I'm going to briefly introduce you to how to use Theano and go over the basic principles behind the library. If you paid attention during yesterday's presentation of TensorFlow, some concepts will be familiar to you, and if you paid attention to Hugo Larochelle's introductory talk, you'll see some similar concepts as well. There are going to be four main parts. The first one is, well, this slide: an introduction to what the concepts of Theano are. There is a companion IPython notebook on GitHub; if you go to that page or clone that GitHub repository, there is an IPython notebook that has all the code snippets from these slides, so that you can run them at the same time. Then we're going to have a more hands-on example, applying logistic regression to the MNIST digits dataset, and then, if we have time, we'll go quickly over two more examples: the basic LeNet architecture, and an LSTM for character-level generation of text.

So Theano is, we can say, a mathematical symbolic expression compiler. What does that mean? It means that it makes it possible to define expressions that represent mathematical computations using NumPy syntax, so it's easy to use, and it supports all the basic mathematical operations, like min, max, addition, subtraction, that kind of thing, not only larger blocks like layers of neural nets or whole networks. It makes it possible to manipulate those expressions through substitution, cloning, and replacement, and also to go through that graph and perform things like automatic differentiation (symbolic differentiation, actually), and also the R-operator for forward differentiation; to apply some optimizations for increased numerical stability; and then to use that optimized graph and the Theano runtime to actually compute output values given inputs. We also have a couple of
tools that help debug both Theano's code and the user's code, and to inspect and understand better what's actually happening when you're using Theano. Theano is currently more than 8 years old. It started small, with only a couple of contributors from the ancestor of Mila, which was called LISA at the time, and it grew a lot: we now have contributors from all over the world, users from all over the world, and it's been used to drive a lot of research papers and prototypes for industrial applications, in startups and in larger companies. Theano has also been the base of other software projects that build on top of it. For instance, Blocks, Keras, and Lasagne are machine learning / deep learning libraries that use Theano as a backend and provide a user interface at a higher level, with concepts like layers, training algorithms, that kind of thing, whereas Theano is more of a backend. sklearn-theano as well, which is nice because it has a converter to load Caffe models from the Caffe model zoo and use them in Theano, and it does a lot of other things too. PyMC3 actually uses Theano not to do machine learning but for probabilistic programming. And we have two other libraries, Platoon, which Mila is developing, and Theano-MPI, developed elsewhere, which are layers on top of Theano to help train on multiple machines and multiple GPUs, with some level of model parallelism and data parallelism. So, how to use Theano? Well, first of all, we are working with symbolic expressions and symbolic variables, which will make up a computation graph. So let's see how to do that. We define the expression first, then we compile a function, and then we execute that function on values. To define the expression, we start by defining inputs. The inputs are symbolic variables that have some type, so you have to define in advance whether a variable is, say, a vector or a matrix, and what its data type is: floating point, integers, and so on. Things like the number of dimensions
have to be known in advance, but the shape is not fixed, and the memory layout is not fixed, so you could have shapes that change between one mini-batch and the next, or between different calls to the function in general. So x and y here are purely symbolic variables; we will give them values later, but for now they are just empty. There is another kind of input variable, shared variables. They are symbolic, but they also hold a value, and that value is persistent across function calls and shared between different Theano functions. They are usually used, for instance, for storing parameters of the model that you want to learn, and their values can be updated as well. So here we create two shared variables from values: this one has two dimensions, because its initial value has two dimensions, and this one has only one, so that's basically the weight matrix and the bias. We can name variables by assigning to the name attribute. Shared variables do not have a fixed size either; they are usually kept fixed in most models, but it's not a requirement. Then, from these inputs, we can define expressions that will build new variables, intermediate variables, which are the results of some computation. For instance, here we can take the product of x and W, add the bias, apply the sigmoid function on that, and say this is our output variable, and from the output and y we can define, say, the squared-error cost. Those new variables are connected to the previous ones through the operations that we defined, and we can visualize the graph structure by using, for instance, pydotprint, which is a helper function. Variables are the square boxes, and we have other nodes, called apply nodes, that represent the mathematical operations that connect them. Input variables and shared variables do not have any ancestors, there are no arrows leading into them, but then you see the intermediate results, and more of them.
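In NumPy terms, the expression built above computes something like the following. This is an editorial sketch with made-up sizes, showing what the symbolic graph evaluates to; Theano itself only records the operations at this stage and computes values later:

```python
import numpy as np

# Hypothetical small sizes; in the talk, x is a minibatch matrix,
# and W and b are the shared variables (weight matrix and bias).
rng = np.random.RandomState(0)
x = rng.rand(4, 3)          # minibatch of 4 examples, 3 features each
W = rng.rand(3, 2)          # weights: 3 inputs -> 2 outputs
b = np.zeros(2)             # bias, one value per output unit

def forward(x, W, b):
    """out = sigmoid(x . W + b): the expression built symbolically above."""
    z = x.dot(W) + b
    return 1.0 / (1.0 + np.exp(-z))

out = forward(x, W, b)
y = np.ones((4, 2))               # dummy targets, for illustration only
cost = np.mean((out - y) ** 2)    # the squared-error cost from the slides
```

In the Theano version, `x` and `y` stay empty symbolic variables and only `W` and `b` carry values; here everything is numeric so the sketch can run on its own.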
Usually when we visualize, we don't necessarily care about all the intermediate variables unless they have a name or something, so this is a simplified version of exactly the same graph, where we hide the unnamed intermediate variables, but you can still see all the operations, and you can actually see the types on the edges. So, once you have defined a graph, say the forward computation of your model, you want to be able to use backpropagation to get your gradients. This is just the basic concept of the chain rule: we have a scalar cost, we have intermediate variables that here are vectors, and this is just the general formulation starting from the cost. The full derivative of that function g is actually a whole Jacobian matrix, which is M by N if the intermediate variables are vectors of size N and M. Usually you don't need that, and it's actually usually a bad idea to compute it explicitly unless you need it for some other purpose. The only thing you need is an expression that, given any vector representing the gradient of the cost with respect to the output, will compute the gradient of the cost with respect to the input: basically, the dot product between that vector and the whole Jacobian matrix. That's also sometimes called the L-operator, and almost all operations in Theano implement a grad method that returns it. It actually returns not numbers, not a numerical value, but a symbolic expression that represents that computation, again usually without having to explicitly represent or build the whole Jacobian matrix. So you can call theano.grad, which will backpropagate through the graph, from the cost towards the inputs that you give, and along the way it will call that grad method of each operation. Backpropagating means starting from one for the cost and going backwards through the whole graph, accumulating when the same variable is used more than once, and so on.
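To make the L-operator concrete, here is a NumPy sketch (not Theano code) for an elementwise sigmoid: its Jacobian is diagonal, so the vector-Jacobian product collapses to an elementwise multiply and the full matrix never has to be built. The explicit Jacobian below is only there to check the result:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_Lop(z, v):
    """Vector-Jacobian product for y = sigmoid(z), without the Jacobian.

    Since sigmoid is elementwise, v . J reduces to v * y * (1 - y)."""
    y = sigmoid(z)
    return v * y * (1.0 - y)

z = np.array([0.5, -1.0, 2.0])
v = np.array([1.0, 2.0, 3.0])   # "gradient of the cost w.r.t. the output"

# Explicit Jacobian, for checking only: exactly what you normally avoid.
y = sigmoid(z)
J = np.diag(y * (1.0 - y))
```

This is what each op's grad method returns in symbolic form: an expression for `v . J`, never the matrix `J` itself.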
And again, here, dC/dW and dC/db are symbolic expressions, just as if you had manually defined the gradient expressions using Theano operations like the dot product, the sigmoid, and so on, as we've seen earlier. So we have non-numerical values at that point, and they are part of the computation graph: the computation graph was extended to add these variables, and we can continue extending the graph from them, for instance to compute update expressions corresponding to gradient descent, like we do here. This is what the extended graph for the gradient looks like: you see there are a lot of small operations that have been inserted, and among the outputs you actually have the gradient with respect to the bias, which is both an output and an intermediate result that helps compute the gradient with respect to the weights. And here is the graph for the update expressions, so you have, as intermediate variables, the gradients that we had on the previous slide, and then a version scaled by the constant 0.1 that is in there somewhere. So once we have defined the whole graph, the whole expression that we actually care about, from the inputs and initial weights to the weight updates of our training algorithm, we want to compile a function that will actually compute those numbers given inputs and perform the weight updates. To compute values, what we do is call theano.function, and we provide it with the input variables that we want to feed and the output variables that we want to get. You don't necessarily have to provide values for all the inputs that you might have declared, especially if you don't want to go all the way to the end of the graph: you can have a function that only computes the expressions for a subset of the graph. For instance, we can have a predict function here that goes only from x to out; we don't need values for y, and the gradient and so on will not be computed, it's just going
to take a small part of the graph and make a function out of it. So that's it: you can compile it and then call it. You have to provide values for all the input variables that you defined, but you don't have to provide values for the shared variables W and b that we declared earlier; they are implicit inputs to all the functions, and their values will automatically be fetched when needed. You can declare other functions, like a monitoring function that computes both the output and the cost; then you have two outputs, and you also need the second input, y. You can also compile a function that does not start from the beginning. For instance, if I want an error function that only computes the mismatch between the prediction and the actual targets, then I don't have to start from the input; I can just start from the prediction and compute the cost. The next thing that you might want to do is update your variables; for training, it's necessary. Again, you can pass theano.function an updates argument, a list of updates, where each update is a pair of a shared variable and the symbolic expression that computes the new value for that shared variable. So you can see the updates for W and b here as implicit outputs of the function: just as W and b were implicit inputs, the updated W and updated b are implicit outputs that will be computed at the same time as c, and after all the outputs are computed, the updates actually take effect and the values are updated. So here, if we print the value of b before and after having called the train function, we see that the value has changed. What also happens during function compilation is that the graph that we selected for that particular function gets optimized. What we mean by that is that it's going to be rewritten in parts; some expressions will be substituted, and so on, and there are different goals for that. Some are quite simple.
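The ordering of outputs and updates can be mimicked in plain Python. This is an invented toy sketch of the semantics just described, not Theano's implementation: outputs are computed with the old shared values, and only then do the updates take effect:

```python
import numpy as np

# A stand-in for one shared variable, mimicking what
# theano.function(..., updates=[(b, b - 0.1 * db)]) does.
shared = {"b": np.zeros(2)}

def train_step(grad_b, lr=0.1):
    # Outputs are computed first, using the OLD value of the shared variable.
    cost = float(np.sum(shared["b"] ** 2))
    # The update expression, then applied after the outputs are computed.
    shared["b"] = shared["b"] - lr * grad_b
    return cost

before = shared["b"].copy()
cost = train_step(np.array([1.0, -1.0]))
after = shared["b"]
```

Printing `before` and `after` shows the value changed, exactly like printing b around the call to the train function in the talk.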
For instance, if you have the same computation defined twice, we only want it to be executed once, and if you have expressions that are not necessary, you don't want to compute them at all: if you have x divided by x, and x is not used anywhere else, we just want to replace that by one. There are numerical stability optimizations: for instance, log(1 + x) can underflow if x is really small, which would give 0 whereas the result should be close to x, and things like log(softmax(x)) get optimized into a more stable log-softmax operation. It's also the time when in-place and destructive operations are inserted: for instance, if an operation is the last one to be executed on some numbers, then instead of allocating output memory it can just work in place on its input, and so on. Also, the transfer of graph expressions to the GPU is done during the optimization phase. By default, Theano tries to apply most of the optimizations, so that the runtime is almost as fast as possible, except for a couple of checks and assertions. But if you're iterating and want fast feedback, and don't care that much about runtime speed, then you have a couple of ways of enabling and disabling sets of optimizations, and you can do that either globally or function by function. To have a look at what happens during the graph optimization phase, here are the original and optimized graphs going from the inputs x and W to the output prediction; it's the same one we've seen before. If we compare that with the compiled function that goes from these input variables to out, the one called predict, this is what we have. I won't go into detail about what's happening in there, but here you have a GEMM operation, which basically calls an optimized BLAS routine that can do multiplication and accumulation at the same time, and we have a sigmoid operation here that will work in place, destructively, on its input, which is denoted by the red arrow.
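The log-softmax stability rewrite can be illustrated in NumPy. A sketch of the substitution: the naive form overflows for large inputs, while the shifted form that the optimization produces stays finite:

```python
import numpy as np

def log_softmax_naive(x):
    """Literal log(softmax(x)): overflows in exp for large x."""
    e = np.exp(x)
    return np.log(e / e.sum())

def log_softmax_stable(x):
    """The rewritten form: log(softmax(x)) = (x - m) - log(sum(exp(x - m)))
    with m = max(x), which never exponentiates a large number."""
    shifted = x - x.max()
    return shifted - np.log(np.exp(shifted).sum())

x_small = np.array([1.0, 2.0, 3.0])
x_large = np.array([1000.0, 1001.0, 1002.0])   # breaks the naive version
```

Both versions agree on well-behaved inputs; only the stable one survives `x_large`, which is the whole point of doing the rewrite at the graph level rather than trusting the user to write it.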
If you have a look at, for instance, the optimized graph computing the expressions for the updated W and b: this was the original one, and the optimized one is much smaller. It also has in-place operations, and it has fused elementwise operations. For instance, if you have a whole tensor and you do an elementwise addition with a constant, and then a sigmoid, and then something else, and so on, you want to loop only once through the array, applying all these scalar operations on each element before going to the next, rather than iterating over the whole array once for each new operation. Those kinds of patterns happen often when you have automatically generated gradient expressions. And here you see the updates for the shared variables, which are inputs, and you see the cost and the implicit outputs for the updated W and b here and here. Another graph visualization tool that exists is debugprint, which prints a text-based, tree-like structure of the graph, assigning arbitrary IDs and printing the variable names and so on; here you can see in more detail what the structure is, and you see the inputs of the GEMM and the scaling parameters and so on. So, when the function is compiled, we can actually run it. A Theano function is a callable Python object, and we've seen those examples here, for instance where we call train and so on. But what happens, to get an optimized runtime? It's not only the graph optimizations: we also generate C++ or CUDA code, for instance for the elementwise loop fusion that I mentioned. We can't know in advance which elementwise operations will occur in which order in any graph that the user might define, so we have on-the-fly code generation for that: we generate a Python module written in C++ or in CUDA that gets compiled and imported back, so that we can use it from Python. The runtime environment then calls, in the right order, the different operations that have to be executed, from the inputs to the outputs, so
that we get the desired results. We have a couple of different runtimes, and in particular there's one written in C++, which avoids having to switch context between the Python interpreter and the C++ execution engine. Something else that's really crucial for speed and performance is the GPU. So how do you use a GPU in Theano? We wanted to make it as simple as possible in the usual cases. The new backend supports a couple of different data types, not only float32 but double precision if you really need it, and integers as well, and we now have easier interaction with GPU arrays from Python itself, so you can just use Python code to handle GPU arrays outside of a Theano function if you'd like. All of that will be in the upcoming 0.9 release that we hope to get out soon. To use it, you select the primary device that you want with just a configuration flag: for instance, you could ask for the first GPU that's available, or one specific one. If you specify that in the configuration, then all shared variables will by default be created in GPU memory, and the optimizations that move computation from CPU to GPU, replacing CPU operations by GPU operations, are going to be applied. Usually you want to make sure you use float32, or even float16 for storage, which is experimental, because most GPUs don't have good performance for double precision. How do you set those configuration flags? You have a .theanorc configuration file, which is just a basic configuration file format for Python; you have an environment variable, THEANO_FLAGS, where you can define them, and the environment variable overrides the config file; and you can also set things directly from Python. But some flags have to be known before Theano is imported, for instance the device itself, so those you have to set either in the configuration file or through the environment flags.
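A sketch of setting the flags from Python before the import happens; the flag values shown are just typical examples, and the exact device name depends on which backend version you run:

```python
import os

# The device must be chosen before Theano is imported, so set the
# environment variable first (example values; "device=gpu" on the
# old backend, "device=cuda0" on the new one).
os.environ["THEANO_FLAGS"] = "device=cuda0,floatX=float32"

# import theano   # would pick up the flags above at import time

# Parse the flags back out, just to show the key=value,key=value format.
flags = dict(item.split("=") for item in os.environ["THEANO_FLAGS"].split(","))
```

The same keys can live in `.theanorc`; the environment variable wins when both are set, as described above.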
So I'm going to quickly go over some more advanced topics, and if you want to learn more about them, there are other tutorials available online and there's the documentation on deeplearning.net. To have loops in the graph: we've seen that the expression graph is basically a directed acyclic graph, so we cannot have loops in there. One way, if you know the number of iterations in advance, is just to unroll the loop: use a for loop in Python that builds all the nodes for all the time steps. That doesn't work if you want, for instance, a dynamic size for the loop; for models that generate sequences it can be an issue. What we have for that in Theano is called scan, and basically it's one node that encapsulates another whole Theano function, and that inner function, or step function, represents the computation that has to be done at each time step. So you have a step function that performs the computation for one time step, and you have the scan node that calls it in a loop, taking care of the bookkeeping of indices and sequences, feeding the right slice at the right point, and feeding back the outputs where needed. Having that structure also makes it possible to define a gradient for that node, which is basically another scan node, another loop that goes backwards and applies backpropagation through time. And it can be transferred to GPU as well, in which case the internal function is transferred and recompiled on the GPU. There's an example of scan in the LSTM example later; this is just a small example, but we don't really have time for it here. We also have visualization, debugging, and diagnostic tools. One reason this is important is that in Theano, like in TensorFlow, the definition of a function is separate from its execution, and if something doesn't work during the execution, if you encounter errors and so on, it's not obvious how to connect that to where the expression was actually defined. So we try to have informative error messages.
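The step-function-plus-loop structure can be sketched in plain Python. `scan_like` below is an invented stand-in for theano.scan, with a toy recurrence as the step function; the real scan does the same slicing and carrying of state, but symbolically:

```python
import numpy as np

def step(x_t, h_prev, W):
    """One time step: the role of scan's inner 'step function'.
    Toy recurrence: h_t = tanh(W * h_prev + x_t)."""
    return np.tanh(W * h_prev + x_t)

def scan_like(step_fn, sequence, h0, W):
    """Plain-Python stand-in for theano.scan: feed each slice of the
    sequence to the step function and carry the output forward."""
    h = h0
    outputs = []
    for x_t in sequence:          # scan handles this bookkeeping for you
        h = step_fn(x_t, h, W)
        outputs.append(h)
    return np.array(outputs)

seq = np.array([1.0, 0.5, -0.5])
hs = scan_like(step, seq, h0=0.0, W=0.8)
```

The gradient of the real scan node is another such loop run backwards over the outputs: backpropagation through time.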
We also have some compilation modes that let you, for instance, check for NaN values. You can assign test values to the symbolic variables, so that each time you create a new intermediate symbolic variable, each time you define a new expression, the test value gets computed, and so you can evaluate on one piece of data at the same time as you build the graph, which can be useful to detect shape mismatch errors and things like that. It's possible to extend Theano in a couple of ways: you can create an op just from Python, by calling Python wrappers for existing efficient libraries; you can extend Theano by writing C or CUDA code; and you can also add optimizations, either for increased numerical stability, or for more efficient computation, or for introducing your new ops in place of the naive versions that a user might have used. We have a couple of new features that have been recently added to Theano. I mentioned the new GPU backend, with support for many data types, and we've had some performance improvements, especially for 2D and 3D convolution, especially on GPU. We've made some progress on the time taken by the graph optimization phase, and we've also introduced new ways of avoiding recompiling the same graph over and over again. And we have new diagnostic tools that are quite useful: an interactive graph visualization tool, and a PdbBreakpoint op that lets you monitor a couple of variables and only break if some condition is met, rather than monitoring every piece of data every time. In the future, well, we're still working on new operations on GPU; we still want to wrap more operations for better performance, and in particular the basic RNNs should be completed in the following days, hopefully; someone has been working on that a lot recently. We want better support for 3D convolutions, still faster graph optimization, and more work on data parallelism as well. So who do we want to thank? Well, most of my colleagues
and the main Theano developers, and the people who contributed one way or another to the lab and to the software development efforts, and of course the organizers of the summer school. The slides are available online, as I mentioned, as is the companion notebook, and there are more resources if you want to go further. And now I think it's time to start the practical examples. For those who have not cloned the repository yet, this is the command line you want to run; for those who have cloned it, you might want to do a git pull just to make sure you have the latest version, and then you can launch a Jupyter notebook on the repository itself. We have three examples that we are going to go over: logistic regression, a convnet, and the LSTM. So, I've launched the Jupyter notebook here, and let's start. The intro Theano notebook is the companion notebook; there's nothing new in there, just the code snippets I showed you already. Okay, let's go with logistic regression. Is that big enough, or do we need to increase the font size? Okay. I'm going to skip over the text, because you probably already know about the model. We've packaged the MNIST dataset with the repository on GitHub, so let's load the data, and let's see how we define the model. It's basically the same way we did it in the slides. We define sizes that will be useful for the shared variables; we define an input variable, here a matrix, because we want to use mini-batches; and we have shared variables initialized from zeros. Then we define our model, so here's our predictor, the probability of the class given the input: an affine model and then the softmax on top of it. And the prediction, if you want a hard prediction, is going to be the class of maximum probability, so an argmax over that axis, because we still want one prediction for each element of the mini-batch.
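In NumPy terms, the predictor just described looks roughly like this; the sizes follow MNIST (784 inputs, 10 classes) and the zero initialization follows the notebook, but the helper names are made up:

```python
import numpy as np

def softmax(z):
    """Row-wise softmax over the class axis, shifted for stability."""
    shifted = z - z.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.RandomState(0)
x = rng.rand(5, 784).astype("float32")     # minibatch of 5 flattened digits
W = np.zeros((784, 10), dtype="float32")   # zero-initialized, as in the notebook
b = np.zeros(10, dtype="float32")

# Affine model, then softmax: p(class | input), one row per example.
p_y_given_x = softmax(x.dot(W) + b)
# Hard prediction: argmax over the class axis, one class per example.
y_pred = np.argmax(p_y_given_x, axis=1)
```

With zero weights, every class gets probability 0.1 and the argmax degenerates to class 0; training is what makes the rows informative.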
Then we define the loss function: here it's going to be the negative log-likelihood of the label given the input, or the cross-entropy, and we define it simply. We don't need to have one cross-entropy or log-likelihood operation by itself; we can just build it from the basic building blocks. So we take the log of the probabilities, we take the index of the actual target, and then we take the mean of that to get the mean loss over the mini-batch. Then we derive the update rules. Again, we don't have one gradient-descent object or something like that; we just build whatever rule we want. We could use momentum, for example, by defining other shared variables that would hold the velocity, and then you would have update expressions for both the velocity and the shared variable itself. Then we compile a training function going from x and y, outputting the loss, and updating W and b. So, while the code is getting generated and compiled and the graph is getting optimized, let's see the next step. We also want to monitor not only the log-likelihood but the misclassification rate on the validation and test sets. It's simply how many elements differ between the prediction, which was the argmax, and the actual target, and the rate is the mean over the mini-batch. We compile another Theano function outputting that, and not doing any updates, of course. To train the model, first we need to process the data a little bit: we want to feed the model one mini-batch of data at a time. So here we simply have a helper function, nothing to really pay attention to, that gives us mini-batch number i, and the same function is used for the training, validation, and test sets. We define a couple of parameters for early stopping in the training loop; it's not necessary, it's just a way of knowing when to stop, and of using only the best model that was encountered during the optimization.
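The loss and error built from basic blocks, in NumPy form; the Theano version indexes the log-probabilities the same way, just symbolically:

```python
import numpy as np

def nll(p_y_given_x, y):
    """Mean negative log-likelihood, assembled from basic operations:
    take the log of the probabilities, index at the true targets,
    then average over the minibatch."""
    n = y.shape[0]
    return -np.mean(np.log(p_y_given_x)[np.arange(n), y])

def error_rate(y_pred, y):
    """Misclassification rate: fraction of predictions that differ."""
    return np.mean(y_pred != y)

# Tiny made-up minibatch: 2 examples, 3 classes.
p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])
y = np.array([0, 1])
```

Nothing here is a special "cross-entropy op"; it's log, fancy indexing, and a mean, which is exactly the point made in the talk.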
So let's define that. This is the main training loop. It's a bit more complex than it might need to be, because we use early stopping and we only want to validate when we are confident that the training error has gone down enough, but basically the most important part is: you loop over the epochs, unless you encounter the early-stopping condition, and during each epoch you loop over the mini-batches and call the train function. Every once in a while you validate and print the validation error; here we call the test function on the validation set for that, then keep track of what the best model currently is, get the test error as well, and save the best one. To save the model, we usually just save the values of all the parameters, which is more robust than trying to pickle the whole Python object, and it also makes it easier to transfer to other frameworks, to visualization frameworks, and so on. So let's execute that. Of course, it's a simple model and the data is not that big, so it should not take that long. You see that at the beginning we get better on the training set at almost every iteration, and then after a while the progress is slower. Okay, so let's just wait a little bit more; it seems to stall more and more, and okay, here's the end, after 96 epochs. Now, if we want to visualize what filters were learned, or what the final trained model looks like, we use a helper function to visualize the filters. It's not really important, but what we do is call get_value on the weights to access the internal value of the shared variable, and then we use that to plot the different filters. We can see it's kind of reasonable: this is the filter for class zero, and you can kind of see a zero; what's important for the two is to have an opening here, and so on. So yeah, let's have a look at the final error.
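The early-stopping logic in that loop can be sketched as follows; this is an editorial skeleton with an invented validation curve, not the notebook's exact code:

```python
def train_with_early_stopping(validation_errors, patience=3):
    """Keep the best validation error seen so far, and stop once it has
    not improved for `patience` consecutive validations. The list of
    errors stands in for repeated calls to the compiled validation
    function inside the epoch loop."""
    best_error = float("inf")
    best_index = -1
    since_best = 0
    for i, err in enumerate(validation_errors):
        if err < best_error:
            # New best model: this is when you would save the parameter
            # values (via get_value on the shared variables).
            best_error, best_index, since_best = err, i, 0
        else:
            since_best += 1
            if since_best >= patience:
                break            # the early-stopping condition
    return best_error, best_index

# Invented curve: improves, stalls, and would improve again too late.
curve = [0.30, 0.20, 0.15, 0.16, 0.17, 0.18, 0.05]
best, when = train_with_early_stopping(curve, patience=3)
```

Note that the late 0.05 is never reached: patience ran out first, which is the trade-off early stopping makes.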
Well, we're not plotting the training error, but the validation and test errors are quite high, and we know that human-level error on this task is quite low, so it really means that the model is too simple and we should use something more advanced. To do that, if you go back to the home of the Jupyter notebook, you can have a look at the convnet and run the LeNet notebook. This new example uses the same data, it's still MNIST, because it has the advantage of training fast even on an older laptop, but this time we're going to use a convolutional net: a couple of convolution layers, then fully connected layers, and then the final classifier. I'm going to make sure floatX is float32 here, and let's see how we can use Theano to define helper classes, layers, that make it easier for a user to compose them, if they want to replicate some results or use some classical architectures. This is usually done in frameworks built on top of Theano, like Keras, like Blocks, like Lasagne, and some people also develop their own mini-framework, with their own versions of the layers that they find useful and intuitive. So this logistic regression layer basically holds parameters, a weight and a bias, computes the conditional probability of the classes and the prediction, holds the params, and has expressions for the negative log-likelihood and the errors. If you were to use only that class, it would do essentially the same as what we did by hand in the previous notebook. In the same way, we can define a layer that does convolution and pooling. Again, in the __init__ method we pass it the filter shape, the image shape, the size of the pooling, and so on; we initialize the weights using the formula from Glorot and Bengio (2010) and the biases from zeros; and then, from the inputs, we compute the convolution with the filters, then max pooling, and output the tanh of the pooling plus the bias.
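The initialization formula mentioned above, sketched in NumPy; the fan-in/fan-out sizes are made up for illustration:

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Weight initialization from Glorot & Bengio (2010): uniform in
    [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)), which
    keeps activation and gradient variances roughly balanced."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.RandomState(42)
W = glorot_uniform(400, 500, rng)       # hypothetical layer sizes
limit = np.sqrt(6.0 / (400 + 500))
```

For a conv layer, fan-in and fan-out are computed from the filter shape and pooling size rather than from a plain matrix shape, but the bound is the same.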
Here the bias is only one number per channel, which means that you don't have a different bias for each location in the image, so you could actually apply such a layer to images of various sizes without having to initialize new parameters or retrain. Then, in the same way, we define the hidden layer, which is just a fully connected layer, again initializing a weight and a bias, with a symbolic expression going from the input and the shared variables to the output after the activation, and again we collect the parameters so that we know what we will want to train. Then here's the function that contains the main training loop. We have a mini-batch generator again, the same as before, and here we are building the whole graph, always with the same process. We define symbolic input variables, a matrix and a vector of ints; here lvector is a vector of longs, because the targets are indices, not one-hot vectors or masks or anything like that. We create the first layer, which is a LeNet conv/pool layer, with its sizes, and then the next one, where the image size has changed. Passing the image size is mostly for efficiency, actually; you don't really have to pass it for these particular models, but you do need the shape of the filters (you have the filters anyway), and it's still useful to have those sizes, because even if the convolution layers can handle arbitrarily sized images, after them we want to flatten the whole feature maps and feed them into a fully connected layer and then into the classification layer, so that one has to be fixed: we have to know what dimensions the last conv layer will have. And here we have the fully connected layer and the output layer, which is just the logistic regression class, the same as before. We want the final cost to be the negative log-likelihood of that, and we again have the errors, which is the misclassification rate.
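A NumPy sketch of the per-channel bias and 2x2 max pooling just described; the helper names are invented, and real Theano code would use conv2d and pool_2d instead:

```python
import numpy as np

def max_pool_2x2(fmap):
    """Non-overlapping 2x2 max pooling on a single (H, W) feature map."""
    H, W = fmap.shape
    return fmap.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

def add_channel_bias(fmaps, b):
    """One bias per channel, broadcast over all spatial locations,
    which is why the layer works on images of any size."""
    return fmaps + b[:, None, None]   # fmaps has shape (channels, H, W)

# Made-up feature maps: 2 channels of 4x4 activations.
fmaps = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
b = np.array([1.0, -1.0])

# tanh(pool(x) + bias), mirroring the layer's output expression
# (the notebook adds the bias after pooling; order is equivalent
# here because max pooling commutes with adding a constant).
out = np.tanh(np.stack([max_pool_2x2(f) for f in add_channel_bias(fmaps, b)]))
```

The bias vector has one entry per channel, not per pixel, so the same parameters apply whatever the spatial size of the input.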
The parameters are the concatenation of the parameters of all the layers, and once we have that we can build the gradients: just one call to grad, of the cost with respect to the parameters. The updates are again just regular SGD, but we could have a class or something that implements momentum, AdaGrad, AdaDelta, whatever you need. We compile the function, and here we have again the early-stopping routine with the same main loop: for all epochs until we are done, loop over the mini-batches, validate every once in a while, and stop when it's finished. So let's just declare all of that; loading the data is exactly the same as before, and here we can actually run it. This was the result of a previous run, which took 5 minutes, so I will probably not have time to redo it, but here you can see basically what happens, and if you want to run it or play with it during the lunch break or later, you're welcome to. After that, you can visualize the trained filters as well; here you have them for the first layer, and here you have an example of the activations of the first layer for one input.

We have just a little bit more time to cover the LSTM example. If you go back to the home of the Jupyter notebook and go to lstm: this model is an LSTM network that tries to predict the next character of a sentence given the previous ones. I'm not going to go into details, but here you can see that the LSTM layer is defined with shared variables for all the matrices that you need, and the different biases for the different gates, and so on, so you have a lot of parameters. It would be possible, and sometimes more efficient, to define only one variable that contains the concatenation of a couple of those matrices; that way you can do one bigger, more efficient matrix-matrix multiply. But this is just one simple implementation. And here's an example of how to use scan for the loop: we define a step function that takes a couple of different inputs, so you have the different activations and so on from the previous time step, and you have the current sequence input. From those, it computes basically the LSTM formulas, where you have the dot products and the sigmoid or tanh of the different connections inside the cell, and in the end you get the hidden state and the cell state. Once you have that, the step function is passed to theano.scan, where the sequences are the mask and the input. The mask is useful because we're using mini-batches of sequences, and not all the sequences in the same batch have the same length. For efficiency, we usually want to group examples of similar length together, but they may not always be exactly the same length, so in that case we pad up to the longest sequence in the mini-batch (not the longest sequence in the whole set, just in the mini-batch). We still have to pad, and to remember the lengths of the different sequences, in order to correctly predict and backpropagate.

Then we define the cost function, which is the categorical cross-entropy of the sequence, and here again you see that the mask is used so that we don't consider the predictions after the end of a sequence. The logistic regression is the same as before and gives the final cost. For processing the data, we're using Fuel, which is another tool being developed by students at MILA. It's nice because it can read from plain text data and do some preprocessing on the fly, including the things I mentioned earlier, like grouping sequences of similar length together, then shuffling, padding, and doing all of that, and it outputs a generator that you can then feed, in your main loop, through a Theano function. So that whole preprocessing happens outside of Theano, and the processed values are fed into the Theano function. So here we build our final Theano graph: we have symbolic inputs for the input and the mask.
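To make the step function and the mask handling concrete, here is a plain-NumPy sketch of one LSTM time step of the kind that scan would iterate. The names are illustrative, and it uses the concatenated-matrices variant mentioned above (one big multiply for all four gates), whereas the tutorial's simple implementation keeps separate shared variables per gate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, m_t, h_prev, c_prev, W, U, b):
    """One time step of the recurrence.

    The four gate matrices are concatenated into W (input weights) and
    U (recurrent weights), giving one bigger matrix multiply per step.
    """
    n = h_prev.shape[1]
    z = x_t @ W + h_prev @ U + b
    i = sigmoid(z[:, 0 * n:1 * n])   # input gate
    f = sigmoid(z[:, 1 * n:2 * n])   # forget gate
    o = sigmoid(z[:, 2 * n:3 * n])   # output gate
    g = np.tanh(z[:, 3 * n:4 * n])   # candidate cell update
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    # Where the mask is 0 (past a padded sequence's end), carry the
    # previous state through unchanged, so padding never affects it.
    m = m_t[:, None]
    h = m * h + (1.0 - m) * h_prev
    c = m * c + (1.0 - m) * c_prev
    return h, c
```

In the tutorial, theano.scan applies such a step over the time axis, with the mask and input as sequences and the hidden and cell states as recurrent outputs.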
We create the LSTM layer and the logistic regression layer, and define our cost. The parameters are the concatenation of the parameters of the logistic regression and of the recurrent layer. We take the gradients of the cost with respect to all the parameters; as I mentioned, it's going to use backpropagation through time to get the gradient through the scan operation. The update rule is again simple SGD, no momentum, nothing fancy; that's something you could add if you want to play with it. And we compile the functions to train and evaluate the model. Here the main loop does the training, and we also have another function that generates one character at a time given the previous ones; that's why we declare another input here, and that's what the speak function does: it gets the probability predictions and normalizes them, because we are working in float32, and sometimes if you divide by the sum in float32 the result doesn't add up to exactly one, so we want higher precision for just that operation. Then we try to generate a sequence every once in a while.

So again, this is the result of a previous run. For monitoring, we seed the prediction with "the meaning of life is" and then let the network generate. If I tried to run it now it would take too long, but here are some examples that I generated yesterday in a previous run. It starts with not much: it has a couple of unusual characters (it's not usual to have, say, a Chinese character in the middle of a word, or punctuation in the middle of a word, and so on), but as it progresses you see that it slowly gets better and better, and "the meaning of life is..." well, of course, this is not going to give you the actual meaning of life, but why not. I interrupted the training at some point, but you can play with it a little bit, and here are some suggestions of things you might want to try: better training algorithms, different nonlinearities inside the LSTM cell, different initializations of the weights, or generating something else than "the meaning of life is".

So I hope I could give you a good introduction to what Theano is, what it can be used for, and what you can build on top of it. If you have any questions later, we have the general users mailing list, we are answering questions on Stack Overflow as well, and we would be happy to have your feedback.

We have time for a few quick questions. [Question: Can you give a quick example of what debugging might look like in Theano? Could you just break something in there and show us what happens and how you figure out what it was?] Actually, yeah, I think I had one. OK, so let's go to a simpler example. I'm just going to go to the logistic regression notebook and say, for instance, that when I initialize my parameters I don't use the right shape. You can still build the whole symbolic graph, and at the time you actually execute it, you get an error message that tells you there's a shape mismatch: x has this number of columns but y has only that number of rows, the apply node that caused the error is that dot product, and it gives the inputs again. In that case it's not really able to tell you where the node was defined, but if you remove the optimizations, it might. So we can do that: we go back to where the train operation was defined, the train_model Theano function, and I'll just say optimizer equals None; sorry, I have to pass it through the compilation mode, theano.Mode(optimizer=None), that's right. So it's recompiling the function; let's rerun everything, and now the updated error message has a backtrace from when the node was created: it's somewhere in my kernel, on the line where p_y_given_x is defined. Of course there are several operations on that line, but you know that there's a dot product in there, and it's probably a mismatch between its operands. So that's one example.
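The failure mode being demonstrated, a graph that builds fine but only fails with a shape mismatch at execution time, can be mimicked in plain NumPy with a deferred computation. The variable names below are illustrative, not the notebook's actual code.

```python
import numpy as np

# Building the "graph" succeeds: nothing is computed yet, so a wrong
# parameter shape goes unnoticed, just as when compiling a Theano function.
predict = lambda x, W, b: x @ W + b

x = np.ones((5, 784), dtype=np.float32)        # a mini-batch of MNIST vectors
W_bad = np.zeros((10, 784), dtype=np.float32)  # transposed by mistake
b = np.zeros(10, dtype=np.float32)

# The error only surfaces when the function runs with concrete values, and
# the message reports the mismatched dimensions, much like Theano's
# "shape mismatch" error points at the offending dot product.
try:
    predict(x, W_bad, b)
    error_message = None
except ValueError as err:
    error_message = str(err)
```

The extra step Theano offers on top of this, recompiling with optimizations disabled to get a backtrace to where the node was created, has no NumPy analogue, since NumPy raises at the call site directly.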
There are other techniques we can use: we can set breakpoints, as I said, and so on. I don't have a tutorial about that right now, but there is some documentation online that I could point you to.

[Question: I have some models I'd like to distribute, and I don't want to require people to install Python and a bunch of compilers.] Unfortunately, at the moment we're pretty intermingled with Python, because all the memory management during execution is done by Python: we use NumPy ndarrays for our intermediate values on the CPU, and a similar structure on the GPU, even though that one might be easier to convert. All our C code deals with Python objects and does the Py_INCREF and Py_DECREF and so on, so that Python manages the memory. If you want to distribute a model, I would suggest something like a Docker container; recently, even for GPUs, nvidia-docker has become quite efficient. And we don't have anything like the model serving we had seen earlier, so it's not ideal. If someone has the time and the will to help us disentangle Theano from the Python runtime, that would be awesome, but it's a huge project.

OK, let's thank Pascal again, and we reconvene in 55 minutes for the next talk. Have a good lunch!