Transcript
f2aOe-AATps • Rohit Prasad: Alexa Prize | AI Podcast Clips
Can you briefly speak to the Alexa Prize for people who are not familiar with it, and also maybe where things stand? What have you learned, and what have you seen that's surprising from this incredible competition?

Absolutely. It's a very exciting competition. The Alexa Prize is essentially a grand challenge in conversational artificial intelligence, where we threw down the gauntlet to the universities who do active research in the field, to say: can you build what we call a socialbot that can converse with you coherently and engagingly for 20 minutes? That is an extremely hard challenge. Talking to someone you're meeting for the first time, or even someone you've met quite often, for 20 minutes on any topic, with an evolving nature of topics, is super hard. We have completed two successful years of the competition.
The first was won by the University of Washington, the second by the University of California. We are in our third instance now: the third Alexa Prize is underway, with an extremely strong field of 10 teams, and we are seeing a constant evolution. The first year was definitely about learning. There were a lot of things to put together; we had to build a lot of infrastructure to enable these university teams to build magical experiences and do high-quality research.

Just a few quick questions, sorry for the interruption. What does failure look like in the 20-minute session? What does it mean to fail, to not reach the 20-minute mark?
Awesome question. First of all, I forgot to mention one more detail: it's not just the 20 minutes; the quality of the conversation also matters. And before I answer what failure means, the beauty of this competition is that, as these socialbots, you actually converse with millions and millions of customers. There are multiple judging phases before we get to the finals. The finals are a very controlled judging situation, where we bring in judges and interactors who interact with these socialbots; that is a much more controlled setting. But until we get to the finals, all the judging is essentially done by the customers of Alexa, who rate, on a simple question, how good their experience was. That's where we are not testing for the 20-minute boundary being crossed, because you do want a very clear-cut winner to be chosen, and it's an absolute bar: did you really break that 20-minute barrier? That is why we have to test it in a more controlled setting, with actors, essentially interactors, and see how the conversation goes. So there's a subtle difference between how it's tested in the field with real customers versus in the lab to award the prize. On the latter, what failure means is that there are three judges, and two of them have to say the conversation has stalled.

Got it. And the judges are human experts?

The judges are human experts.

Okay, great.
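The finals rule just described (three judges, two of whom must say the conversation has stalled) can be made concrete. This is a hypothetical illustration, not the Alexa Prize's actual scoring code: it assumes each judge records the minute at which they consider the conversation stalled (or `None` if they never do), treats the conversation as stalled once a second judge agrees, and ignores the conversation-quality criterion also mentioned above. The names `stall_minute` and `breaks_barrier` are invented for this example.

```python
from typing import Optional, Sequence

# Assumed constants for this sketch, taken from the description in the conversation.
BARRIER_MINUTES = 20.0  # the 20-minute barrier
JUDGES = 3              # three judges in the finals
VOTES_TO_STALL = 2      # two of them must say the conversation has stalled


def stall_minute(judge_stall_minutes: Sequence[Optional[float]]) -> Optional[float]:
    """Return the minute at which the second judge declares a stall, or None if fewer than two ever do."""
    if len(judge_stall_minutes) != JUDGES:
        raise ValueError(f"expected {JUDGES} judges, got {len(judge_stall_minutes)}")
    # Keep only the judges who called a stall, earliest first.
    times = sorted(t for t in judge_stall_minutes if t is not None)
    if len(times) < VOTES_TO_STALL:
        return None
    # The stall becomes official when the second vote arrives.
    return times[VOTES_TO_STALL - 1]


def breaks_barrier(judge_stall_minutes: Sequence[Optional[float]]) -> bool:
    """True if no two-judge stall happens before the 20-minute mark."""
    stalled_at = stall_minute(judge_stall_minutes)
    return stalled_at is None or stalled_at >= BARRIER_MINUTES
```

For example, `breaks_barrier([4.5, 12.0, None])` is `False`, because the second stall vote lands at minute 12, while `breaks_barrier([21.0, 25.0, 8.0])` is `True`, since only one judge called a stall before minute 20.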
So this is the third year. What's been the evolution? As in the DARPA challenge: in the first year of the autonomous vehicles challenge nobody finished, and in the second year a few finished in the desert. So how far along are we in this, I would say, much harder challenge?
This challenge has come a long way, though we're definitely not close to breaking the 20-minute barrier with coherent and engaging conversation. I think we are still five to ten years away from completing that. But the progress is immense. What you're finding is that the accuracy of the responses these socialbots generate is getting better and better. What's even more amazing to see is that there's now humor coming in. You're talking about the ultimate signs of intelligence: I think humor is a very high bar in terms of what it takes to create. And I don't mean just being goofy; I really mean a good sense of humor, which is also a sign of intelligence in my mind and something very hard to do.
These socialbots are now exploring not only what we think of as natural language abilities, but also personality attributes: when to inject an appropriate joke, and, when you don't know the domain of the question, how you come back with something more intelligible so that you can continue the conversation. If you and I are talking about AI and we are domain experts, we can speak to it, but if you suddenly switch to a topic I don't know, how do I change the conversation? So you're starting to notice these elements as well, and that's coming partly from the nature of the 20-minute challenge: people are getting quite clever about how to really converse and essentially mask the understanding defects, if they exist.

So some of this is not Alexa the product; this is somewhat for fun, for research, for innovation, and so on. I have a question. In this modern era, if you look at Twitter and Facebook and so on, there's public discourse going on, and some things are a little bit too edgy, people get blocked, and so on. Just out of curiosity, are people in this context pushing the limits? Is anyone using the f-word? Is anyone pushing back, arguing, I guess I should say, as part of the dialogue, to really draw people in?

First of all, let me just back up a bit in terms of why we are doing this. You said it's fun. I think fun is more the engaging part for customers; it is one of the most used skills in our skill store as well. But that apart, the real goal was this: with a lot of AI research moving to industry, we felt that academia risks not having the same resources at its disposal that we have, which is lots of data, massive computing power, and clear ways to test these AI advances with real customer benefits. So we brought all three of these together in the Alexa Prize. That's why it's one of my favorite projects at Amazon. And with that, the secondary fact is, yes, it has become engaging for our customers as well. We're not yet where we want it to be, but it's huge progress.

But coming back to your question on how the conversations evolve: yes, there are some natural attributes of what you said, in terms of argument and some amount of swearing. The way we take care of that is a sensitivity filter we have built that looks at words, and it's more than keywords. Of course there's a keyword-based part too, but there's more in terms of context, because these words can be very contextual, as you can see. Also, the topic can be something you don't want a conversation to happen about, because this is a communal device as well; a lot of people use these devices. So we have put a lot of guardrails in place for the conversation to be more useful for advancing AI, and not so much about these other issues you attributed to what's happening out there.

Right. So this is actually a serious opportunity; I didn't use the right word with "fun." I think it's an open opportunity to do some of the best innovation in conversational agents in the world. Why just universities?

Because, as I said, I really felt young minds... It's also this: if you think about the other aspect of where the whole industry is moving with AI, there's a dearth of talent given the demands. So you do want universities to have a clear place where they can invent and research and not fall behind, and with that they can motivate students. Imagine all grad students leaving for industry like us, or faculty members, which has happened too.
So in a way, if you're passionate about the field, where you feel industry and academia need to work well together, this is a great example and a great way for universities to participate.
What do you think it takes to build a system that wins the Alexa Prize?

I think you have to start focusing on aspects of reasoning. There are still more lookups of what intents customers are asking for, and responding to those, rather than really reasoning about the elements of the conversation. For instance, if the conversation is about games, about a recent sports event, there's so much context involved, and you have to understand the entities being mentioned so that the conversation is coherent, rather than suddenly switching to some fact you know about a sports entity and just relaying that, without understanding the true context of the game. If you just said, "I learned this fun fact about ...," rather than really saying how he played the game the previous night, then the conversation is not really that intelligent. So you have to go to more reasoning elements: understanding the context of the dialogue and giving more appropriate responses. That tells you we are still quite far, because a lot of times it's more facts being looked up, and something that's close enough as an answer but not really the answer. That is where the research needs to go more: actual true understanding and reasoning. And that's why I feel it's a great way to do it, because you have an engaged set of users working to help make these AI advances happen in this case.

Right. You mentioned customers quite a bit, and there's a skill. What is the experience for the user that is helping? Just to clarify, as far as I understand, this skill is a standalone for the Alexa Prize; it's focused on the Alexa Prize. It's not you ordering certain things, checking the weather, or playing Spotify, right? Separate skills.

Exactly.

So you're focused on helping that. I don't know, how do customers think of it?
Are they having fun? Are they helping teach the system? What's the experience like?

I think it's both, actually. Let me tell you how you invoke this skill. All you have to say is "Alexa, let's chat," and the first time you say "Alexa, let's chat," it comes back with a clear message that you're interacting with one of those, you know, three socialbots. So you know exactly how you interact, and that is why it's very transparent: you are being asked to help. And we have a lot of mechanisms: when we are in the first phase, the feedback phase, we send a lot of emails to our customers, and they know that the teams need a lot of interactions to improve the accuracy of the system. So we know we have a lot of customers who really want to help these, you know, socialbots, and they're conversing with them. Some are just having fun saying "Alexa, let's chat," and there is also some adversarial behavior, to see how much you understand as a socialbot. So I think we have a good, healthy mix of all three situations.

So if we talk about solving the Alexa Prize challenge, what does the data set of really engaging, pleasant conversations look like? If we think of this as a supervised learning problem (I don't know if it has to be, but if it does, maybe you can comment on that), do you think there needs to be a data set of what it means to be an engaging, successful, fulfilling conversation?
That's part of the research question here. I think we at least got the first part right, which is to have a way for universities to build and test in a real-world setting. Now you're asking about the next phase of questions, which we are also still asking, by the way: what does success look like from an optimization-function standpoint? As researchers, we are used to having a great corpus of annotated data and then tuning our algorithms on that. Fortunately and unfortunately, in the world of the Alexa Prize, that is not the way we are going after it. You have to focus more on learning based on live feedback. That is another element that's unique. I started by telling you how you ingress and experience this capability as a customer.
What happens when you're done? They ask you a simple question: on a scale of one to five, how likely are you to interact with this socialbot again? That is good feedback, and customers can also leave more open-ended feedback. To me, that is one part of the question you're asking. What I am saying is that there is a mental model shift: as researchers, you have to change your mindset. This is not a DARPA evaluation or an NSF-funded study where you have a nice corpus. This is the real world; you have real data. The scale is amazing, and that's the beautiful thing.

And then the customer, the user, can quit the conversation at any time.

Exactly. That is also a signal for how good you were at that point.

And then on a scale of one to five, is it "how likely are you," or is it just a binary?

One to five.

Wow. Okay, that's such a beautifully constructed challenge. Okay.
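The two live-feedback signals described here, the one-to-five rating at the end of a session and the user quitting mid-conversation, could be aggregated per socialbot. This is a minimal sketch under assumptions: the `Session` record and the `summarize` function are hypothetical, the fields chosen are illustrative, and the Alexa Prize's actual feedback pipeline is not described in this conversation.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Optional


@dataclass
class Session:
    """Hypothetical record of one "Alexa, let's chat" session with a socialbot."""
    turns: int             # number of exchanges before the session ended
    user_quit: bool        # True if the user ended the conversation themselves
    rating: Optional[int]  # 1-5 answer to "how likely are you to interact again?", None if skipped


def summarize(sessions: Iterable[Session]) -> dict:
    """Aggregate ratings and quit behavior across sessions for one socialbot."""
    sessions = list(sessions)
    ratings = [s.rating for s in sessions if s.rating is not None]
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on a 1-5 scale")
    # Turn counts at which users chose to leave: where the bot stopped holding them.
    quits = [s.turns for s in sessions if s.user_quit]
    n = len(sessions)
    return {
        "sessions": n,
        "mean_rating": mean(ratings) if ratings else None,
        "rated_fraction": len(ratings) / n if n else 0.0,
        "quit_fraction": len(quits) / n if n else 0.0,
        "mean_turns_at_quit": mean(quits) if quits else None,
    }
```

With three sessions, `Session(12, True, 4)`, `Session(3, True, 1)` and `Session(30, False, None)`, the mean rating is 2.5 and users quit after 7.5 turns on average. Keeping the rating and the quit point as separate signals, rather than folding them into one score, avoids assuming how the competition weighs them.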