Machine Learning at Spotify - Gustav Soderstrom | AI Podcast Clips
_TWgsvF4hBQ • 2019-10-09
There is an interesting statistic I saw. Spotify has, maybe you can correct me, over 50 million songs, tracks, and over 3 billion playlists. So 50 million songs and 3 billion playlists: 60 times more playlists. What do you make of that?

Yeah, so the way I think about it is that, from a statistician or machine learning point of view, you have all these... if you think about reinforcement learning, where you have this state space of all the tracks, you can take different journeys through this world. I think of these as people helping themselves and each other, creating interesting vectors through this space of tracks. And then it's not so surprising that across, you know, many tens of millions of kind of atomic units there will be billions of paths that make sense, and we're probably quite far away from having found all of them. So that's kind of our job now. When Spotify started, it was really a search box that was, for that time, pretty powerful. And then I like to refer to this programming language called playlisting, where if you, as you probably were, were pretty good at music, you knew your new releases, you knew your back catalog, you knew your Stairway to Heaven, you could create a soundtrack for yourself using this playlisting tool, which is like a meta programming language for music, to soundtrack your life. And for people who were good at music (it's back to how you scale the product), that was actually enough. If you had the catalog and a good search tool, you could create your own sessions; you could create a really good soundtrack for your entire life, probably perfectly personalized because you did it yourself. But the problem was that most people, many people, aren't that good at music. They just can't spend the time. Even if you're very good at music, it's going to be hard to keep up. So what we did to try to scale this was to essentially try to build, you can think of
them as agents, this friend that some people had who helped them navigate this music catalog. That's what we're trying to do for you.

But also, there is something like 200 million active users on Spotify. So, okay, from the machine learning perspective you have these 200 million people, plus they're creating... it's really interesting to think of playlists as, I mean, I don't know if you meant it that way, but it's almost like a programming language. It's at least a trace of exploration of those individual agents, the listeners, and you have all these new tracks coming in. So it's a fascinating space that is ripe for machine learning. So how can playlists be used as data in terms of machine learning, and just to help Spotify organize the music?

So we found in our data, not surprisingly, that people who playlisted a lot retained much better; they had a great experience. And so our first attempt was to playlist for users. So we acquired this company called Tunigo, of editors and professional playlisters, and kind of leveraged the maximum of human intelligence to help build these vectors through the track space for people, and that broadened the product. Then the obvious next... and, you know, we used statistical means, where they could see, when they created a playlist, how that playlist performed. They could see skips of the songs, they could see how the songs performed, and they manually iterated the playlist to maximize performance for a large group of people. But there were never enough editors to playlist for you personally. So the promise of machine learning was to go from kind of group personalization, using editors and tools and statistics, to individualization. And then what's so interesting about the 3 billion playlists we have is... the truth is we lucked out. This was not an a priori strategy, as is often the case. It looks really smart in hindsight, but it was dumb luck. We looked
at these playlists, and we had some people in the company, a person named Erik Bernhardsson, who was really good at machine learning already back then, in like 2007-2008. Back then it was mostly collaborative filtering and so forth. But we realized that what this is, is people grouping tracks for themselves that have some semantic meaning to them, and then they actually label it with a playlist name as well. So in a sense, people were grouping tracks along semantic dimensions and labeling them. And so could you use that information to find that latent embedding? So we started playing around with collaborative filtering, and we saw tremendous success with it, basically trying to extract some of these dimensions. And if you think about it, it's not surprising at all. It would be quite surprising if playlists were actually random, if they had no semantic meaning. Most people group these tracks for some reason. So we just happened across this incredible data set where people are taking these tens of millions of tracks and grouping them along different semantic vectors.

And the semantics being outside the individual user, so it's some kind of universal... there's a universal embedding that holds across people on this earth?

Yes. I do think that the embeddings you find are going to be reflective of the people who playlisted. So if you have a lot of indie lovers who playlist, your embedding is going to perform better there. But what we found was that yes, there were these latent similarities, and they were very powerful. And it was interesting, because I think that the people who playlisted the most initially were these so-called music aficionados who were really into music, and their taste was often geared towards a certain type of music. And so what surprised us: if you look at the problem from the outside, you might expect that the algorithms would start performing best with mainstreamers first, because
it somehow feels like an easier problem to solve mainstream taste than really particular taste. It was the complete opposite for us. The recommendations performed fantastically for people who saw themselves as having very unique taste, probably because all of them playlisted. And they didn't perform so well for mainstreamers, who actually thought the recommendations were a bit too particular and unorthodox. So we had the complete opposite of what we expected: success with the hardest problem first, and then we had to try to scale to more mainstream recommendations.
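
The idea described in the conversation, that tracks grouped into the same playlists share some latent similarity, can be illustrated with a minimal sketch of item-based collaborative filtering. This is not Spotify's actual system: the playlist names, track IDs, and the helper functions `track_similarities` and `most_similar` are hypothetical, and plain co-occurrence cosine similarity stands in for whatever models were used in practice.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical toy data: each playlist is a user-made, user-labeled grouping of tracks.
playlists = {
    "indie evenings": ["track_a", "track_b", "track_c"],
    "late night indie": ["track_a", "track_b"],
    "gym hits": ["track_d", "track_e"],
    "workout": ["track_d", "track_e", "track_c"],
}


def track_similarities(playlists):
    """Cosine similarity between tracks, based on shared playlist membership."""
    appearances = defaultdict(int)  # track -> number of playlists containing it
    together = defaultdict(int)     # (track, track) -> number of shared playlists
    for tracks in playlists.values():
        unique = sorted(set(tracks))
        for track in unique:
            appearances[track] += 1
        for a, b in combinations(unique, 2):
            together[(a, b)] += 1

    sims = defaultdict(dict)
    for (a, b), shared in together.items():
        # Cosine of two binary membership vectors: overlap / sqrt(|a| * |b|).
        score = shared / sqrt(appearances[a] * appearances[b])
        sims[a][b] = score
        sims[b][a] = score
    return sims


def most_similar(sims, track, k=3):
    """Return up to k (track, score) pairs, most similar first."""
    ranked = sorted(sims.get(track, {}).items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


sims = track_similarities(playlists)
print(most_similar(sims, "track_a"))
```

In this toy data, `track_a` and `track_b` always appear together, so they come out as the closest pair, while tracks that never share a playlist get no similarity score at all. The same property explains the effect described above: listeners whose taste is well represented among active playlisters get better recommendations, because more of the relevant co-occurrence signal exists for their tracks.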