Transcript
f2aOe-AATps • Rohit Prasad: Alexa Prize | AI Podcast Clips
/home/itcorpmy/itcorp.my.id/harry/yt_channel/out/lexfridman/.shards/text-0001.zst#text/0256_f2aOe-AATps.txt
Kind: captions Language: en

Can you briefly speak to the Alexa Prize for people who are not familiar with it, and also maybe where things stand, what you have learned, and what's surprising? What have you seen that's surprising from this incredible competition?

Absolutely, it's a very exciting competition. The Alexa Prize is essentially a grand challenge in conversational artificial intelligence, where we threw down the gauntlet to the universities who do active research in the field, to say: can you build what we call a socialbot that can converse with you coherently and engagingly for 20 minutes? That is an extremely hard challenge. Talking to someone you're meeting for the first time, or even someone you've met quite often, for 20 minutes on any topic, with the topics evolving as you go, is super hard. We have completed two successful years of the competition. The first was won by the University of Washington, the second by the University of California. We are in our third instance. We have an extremely strong team of 10 cohorts, and the third instance of the Alexa Prize is underway now. And we are seeing a constant evolution. The first year was definitely a learning year; there were a lot of things to be put together. We had to build a lot of infrastructure to enable these universities to build magical experiences and do high-quality research.

Just a few quick questions, sorry for the interruption. What does failure look like in the 20-minute session? What does it mean to fail, not to reach the 20-minute mark?

Awesome question. First of all, I forgot to mention one more detail: it's not just 20 minutes, the quality of the conversation matters too. And the beauty of this competition, before I answer that question on what failure means, is first that you actually converse with millions and millions of customers as these socialbots. During the judging phases, there are multiple phases before we get to the finals, which is a very controlled judging situation where
we bring in judges and we have interactors who interact with these socialbots. That is a much more controlled setting. But until the point we get to the finals, all the judging is essentially by the customers of Alexa, and there you basically rate on a simple question: how good was your experience? So that's where we are not testing for the 20-minute boundary being crossed, because you do want a clear-cut winner to be chosen, and it's an absolute bar. Whether you really broke that 20-minute barrier is why we have to test it in a more controlled setting, with actors, essentially interactors, and see how the conversation goes. So there's a subtle difference between how it's being tested in the field with real customers versus in the lab to award the prize. On the latter one, what it means is that there are three judges, and two of them have to say this conversation has stalled, essentially.

Got it. And the judges are human experts?

Judges are human experts.

Okay, great. So this is the third year. What's been the evolution? In the DARPA challenge with autonomous vehicles, nobody finished in the first year; in the second year a few more finished in the desert. So how far along within this, I would say, much harder challenge are we?

This challenge has come a long way, to the extent that we're definitely not close to the 20-minute barrier being met with coherent and engaging conversation. I think we are still five to ten years away on that horizon to complete that. But the progress is immense. What you're finding is that the accuracy in the kinds of responses these socialbots generate is getting better and better. What's even more amazing to see is that now there's humor coming in. You're talking about the ultimate signs of intelligence. I think humor is a very high bar in terms of what it takes to create humor, and I don't mean just being goofy. I really mean a good sense of humor is also a sign of
intelligence in my mind, and something very hard to do. So these socialbots are now exploring not only what we think of as natural language abilities, but also personality attributes, and aspects of when to inject an appropriate joke, and, when you don't know the domain of the question, how you come back with something more intelligible so that you can continue the conversation. If you and I are talking about AI and we are domain experts, we can speak to it. But if you suddenly switch to a topic that I don't know, how do I change the conversation? So you're starting to notice these elements as well, and that's coming partly from the nature of the 20-minute challenge: people are getting quite clever about how to really converse and essentially mask some of the understanding defects, if they exist.

So this is not Alexa the product. This is somewhat for fun, for research, for innovation and so on. I have a question, sort of in this modern era. You look at Twitter and Facebook and so on, there's public discourse going on, and some things are a little bit too edgy, people get blocked and so on. Just out of curiosity, are people in this context pushing the limits? Is anyone using the f-word? Is anyone sort of pushing back, sort of arguing, I guess I should say, as part of the dialogue, to really draw people in?

First of all, let me just back up a bit in terms of why we are doing this. You said it's fun. I think fun is more part of the engaging part for customers. It is one of the most used skills as well in our skill store. But that apart, the real goal was this: with a lot of AI research moving to industry, we felt that academia has the risk of not being able to have the same resources at its disposal that we have, which is lots of data, massive computing power, and clear ways to test these AI advances with real customer benefits. So we brought all these three together in the Alexa Prize. That's why it's one
of my favorite projects at Amazon. And with that, the secondary effect is, yes, it has become engaging for our customers as well. We're not there in terms of where we want it to be, but it's huge progress. But coming back to your question on how the conversations evolve: yes, there are some natural attributes of what you said in terms of argument and some amount of swearing. The way we take care of that is that there is a sensitive filter we have built. It's more than keywords; of course there's a keyword-based part too, but there's more in terms of context. These words can be very contextual, as you can see. And also the topic can be something that you don't want a conversation to happen about, because this is a communal device as well; a lot of people use these devices. So we have put in a lot of guardrails for the conversation to be more useful for advancing AI, and not so much about these other issues you attributed to what's happening in the field as well.

Right, so this is actually a serious opportunity. I didn't use the right word, fun. I think it's an open opportunity to do some of the best innovation in conversational agents in the world. Why just universities? Why just students?

Because, as I said, I really felt young minds... It's also, if you think about the other aspect of where the whole industry is moving with AI, there's a dearth of talent given the demands. So you do want universities to have a clear place where they can invent and research and not fall behind, and with which they can motivate students. Imagine if all grad students left for industry, like us, or faculty members, which has happened too. So if you're passionate about the field and you feel industry and academia need to work well together, this is a great example and a great way for universities to participate.

So what do you think it takes to build a system that wins the Alexa Prize?

I think you have to start focusing on aspects of
reasoning. As it is, there are still more lookups of what intents the customer is asking for, and responding to those, rather than really reasoning about the elements of the conversation. For instance, if the conversation is about games, and it's about a recent sports event, there's so much context involved, and you have to understand the entities that are being mentioned so that the conversation is coherent, rather than suddenly just switching to knowing some fact about a sports entity and just relaying that, rather than understanding the true context of the game. If you just said, "I learned this fun fact about [unclear]," rather than really saying how he played the game the previous night, then the conversation is not really that intelligent. So you have to go to more reasoning elements of understanding the context of the dialogue and giving more appropriate responses, which tells you that we are still quite far, because a lot of times it's more facts being looked up, and something that's close enough as an answer but not really the answer. So that is where the research needs to go: more toward actual true understanding and reasoning. And that's why I feel it's a great way to do it, because you have an engaged set of users working to help make these AI advances happen in this case.

Right. You mentioned customers there quite a bit, and there's a skill. What is the experience for the user that is helping? Just to clarify, as far as I understand, this skill is a standalone for the Alexa Prize. That means you're focused on the Alexa Prize. It's not you ordering certain things on Amazon, or checking the weather, or playing Spotify, right? It's a separate skill.

Exactly.

So you're focused on helping. How do customers think of it? Are they having fun? Are they helping teach the system? What's the experience like?

I think it's both, actually. And let me tell you how you invoke
this skill. All you have to say is "Alexa, let's chat." The first time you say "Alexa, let's chat," it comes back with a clear message that you're interacting with one of those three socialbots, so you know exactly how you interact. That is why it's very transparent: you are being asked to help. And we have a lot of mechanisms: when we are in the first phase, the feedback phase, we send a lot of emails to our customers, and then they know that the team needs a lot of interactions to improve the accuracy of the system. So we know we have a lot of customers who really want to help these university bots, and they're conversing with them. Some are just having fun with just saying "Alexa, let's chat." And there is also some adversarial behavior, to see how much you understand as a socialbot. So I think we have a good, healthy mix of all three situations.

So if we talk about solving the Alexa challenge, the Alexa Prize, what does the data set of really engaging, pleasant conversations look like? If we think of this as a supervised learning problem, I don't know if it has to be, but if it does, maybe you can comment on that. Do you think there needs to be a data set of what it means to be an engaging, successful, fulfilling conversation?

That's part of the research question here. I think we at least got the first part right, which is to have a way for universities to build and test in a real-world setting. Now you're asking in terms of the next phase of questions, which we are also asking, by the way: what does success look like from an optimization function? That's what you're asking. We as researchers are used to having a great corpus of annotated data and then tuning our algorithms on those. And fortunately and unfortunately, in this world of the Alexa Prize, that is not the way we are going after it. So you have to focus more on learning based
on live feedback. That is another element that's unique. I started by telling you how you ingress and experience this capability as a customer. What happens when you're done? They ask you a simple question: on a scale of one to five, how likely are you to interact with this socialbot again? That is good feedback, and customers can also leave more open-ended feedback. And I think that, to me, is one part of the question you're asking, which I am saying is a mental model shift: as researchers, you also have to change your mindset. This is not a DARPA evaluation or an NSF-funded study where you have a nice corpus. This is real world, you have real data. The scale is amazing. That is the beautiful thing.

And then the customer, the user, can quit the conversation at any time?

Exactly. That is also a signal for how good you were at that point.

And then on a scale of one to five, one, two, three, do they say how likely are you, or is it just a binary?

One to five.

One to five. Wow, okay. That's such a beautifully constructed challenge. Okay.