Transcript
MGW_Qcqr9eQ • Turing Test: Can Machines Think?
In this video I propose to ask the question that was asked by Alan Turing almost seventy years ago in his paper "Computing Machinery and Intelligence": can machines think? This is the first paper in a paper reading club that we started, focused on artificial intelligence but also including mathematics, physics, computer science, you know, all the scientific and engineering disciplines. On the surface this is a philosophical paper, but really it's one of the most impactful and important first steps towards actually engineering intelligent systems, by providing a test, a benchmark that we today call the Turing test, of how we can actually know, quantifiably, that a system has become intelligent. So I'd like to give an overview of the ideas in the paper, provide some of the objections inside the paper and external to the paper, consider some alternatives to the test proposed within the paper, and then finish with some takeaways. Like I said, the title of the paper was "Computing Machinery and Intelligence," published almost 70 years ago, in 1950; author, Alan Turing. And to me (now, we can argue about this; on the slide I say it's one of the most impactful papers) it probably is the most impactful paper in the history of artificial intelligence, while only being a philosophy paper. I think the number of researchers, from inside computer science and from outside, that it has inspired, that it has made dream, at the collective-intelligence level of our species, that this is possible, is immeasurable. For all the major engineering breakthroughs and computer science breakthroughs and papers, stretching all the way back to the 30s and 40s, with even the work by Alan Turing on the Turing machine, some of the mathematical foundations of computer science, to today with deep learning, a sequence of papers from the very practical AlexNet paper to the backpropagation paper, all of these papers that underlie the actual successes of the field: I think the seed was planted, the dream was born, with this paper.
And it happens to have some of my favorite opening lines of any paper I've ever read. It goes: "I propose to consider the question, 'Can machines think?' This should begin with the definitions of the meaning of the terms 'machine' and 'think.' The definition might be framed so as to reflect, so far as possible, the normal use of the words, but this attitude is dangerous. If the meaning of the words 'machine' and 'think' are to be found by examining how they're commonly used, it is difficult to escape the conclusion that the meaning and the answer to the question 'Can machines think?' is to be sought in a statistical survey such as a Gallup poll. But this is absurd. Instead of attempting such a definition, I shall replace the question by another, which is closely related to it and is expressed in relatively unambiguous terms." And he goes on to define the imitation game, the construction that we today call the Turing test, which goes like this. There's a human interrogator on one side of the wall, and there are two entities, one a machine and one a human, on the other side. The human interrogator communicates with the two entities on the other side of the wall by written word, by passing notes back and forth, and after some time of this conversation the human interrogator is tasked with making a decision: which of the other two entities is a human and which is a machine. I think this is a powerful leap of engineering, which is to take an ambiguous but profound question like "can machines think?" and convert it into a concrete test that can serve as a benchmark for intelligence. But there are echoes in this question of some of the other profound questions that we often ask. So not only "can machines think?" but: can machines be conscious? Can machines fall in love? Can machines create art, music, poetry? Can machines enjoy a delicious meal, a piece of chocolate cake? I think these are really, really important questions, but very difficult to ask when we're trying to create a non-human system that
tries to achieve human-level capabilities. So that's where Turing formulates this imitation game, and his prediction was that by the year 2000, or 50 years after the paper, a machine with 100 megabytes of storage would fool 30% of humans in a five-minute test of conversation. Another, broader societal prediction he made, which I think is also interesting, is that people will no longer consider a phrase like "thinking machine" contradictory. So basically, artificial intelligence at a human level becomes so commonplace that we would just take it for granted. And the other part, which he goes on at length towards the end of the paper to describe, is that he believes that learning machines, or machine learning, will be a critical component of this success. I think it's also useful to break apart two implied claims within the paper, open claims, open questions. One is that the imitation game, as Turing proposes it, is a good test of intelligence, and the second is that machines can actually pass this test. So when you say "can machines think?" you're both proposing an engineering benchmark for the word "think" and raising the question: can machines pass this benchmark? One of the perhaps tragic but also exciting aspects of this whole area of work is that we still have a lot of work to do. So throughout this presentation I will not only describe some of the ideas in the paper and outside of it in the years since, but also some of the open questions that remain at the philosophical, the psychological, and the technical levels. So here the open question stands: is it even possible to create a test of intelligence for artificial systems that will be convincing to us, or will we always raise the bar? A corollary to that question is looking at the prediction that was made, that people will no longer find the phrase "thinking machines" contradictory. Why do we still find that phrase contradictory? Why do we still think that computers are not at all intelligent? For many people, the game of chess was seen as the highest level
of intelligence in these early days. In fact, we assign a lot of intelligence to Garry Kasparov for being one of the greatest, if not the greatest, chess players of all time. As a human, why do we not assign at least an inkling of that to IBM Deep Blue when it beat Garry Kasparov? Now, of course, you might start saying it's a brute-force algorithm, or, in the case of AlphaGo and AlphaZero, you know how the learning mechanisms behind those algorithms work when they mastered the game of Go and the game of chess. And we'll get to some of those objections, but there's something deeply psychological within those objections, an almost fear of an artificial intelligence that passes the test. So the Turing test is very interesting as a thought experiment, as a philosophical construct, but it's also interesting as a real engineering test. One of the implementations of it has been called the Loebner Prize, which has been running since 1991 to today, and the award structure behind it is $25,000 for a system that passes the test using text alone, and $100,000 for one that uses other modalities like visual and auditory input. The rules of the competition have changed through the years, but they currently are as follows: it's a 25-minute conversation, and in order to win, to pass the test, you have to fool 50 percent of the judges with which the system communicates. Mitsuku and Rose, from Steve Worswick and Bruce Wilcox, have been dominating the past ten years, winning all but one of the years. Some details: Mitsuku and Rose are both mostly scripted, rule-based chatbots, so they're not end-to-end learning systems. I believe there is a little bit of machine learning, but as I understand it, at the core they're mostly scripted. On the next slide we'll go through a transcript of a conversation with Mitsuku to give you a sense of what those conversations are like. But I'd like to mention, as sort of a tragic note really, that as far as I understand the prize is no longer funded. But money aside, it shows that there's not a
broad interest in this construction, both inside and outside the research community. It's really surprising to me that Google, DeepMind, Facebook AI, major groups, major individual researchers in the field of AI have not taken on this challenge. I hope to actually be a small part of changing that a little bit, because I think there is a lot of value in considering the Turing test. I don't think it's a distraction, and I'll return to that point later on in this presentation. So you can pause and maybe read parts of this transcript. I'll highlight, on the right, at the top of the right column. Human: "You're a robot." Mitsuku: "How can a person be a robot?" Human: "Many people live lives akin to robots." Mitsuku: "How many? More than a million?" Human: "Yes." Mitsuku: "That seems like rather a lot." Human: "Do you think Southampton will beat Swansea?" Mitsuku: "I think so." Human: "Why?" Mitsuku: "No reason, it's just what I think." Human: "You must have a reason." Mitsuku: "Nope, no real reason." Human: "Which team do you support?" Mitsuku: "I support Leeds United." Human: "Leeds! How's the season going?" Mitsuku: "Yes, it is a city in the north of England. About half a million people live there. It is the same as ever." So you see in this transcript there's just some little gap of inhumanity that becomes apparent. There's a lingering on and a repetition of points, there are certain gaps in understanding and in the ability to follow tangents, all those kinds of things. But it's still not clear to me, as an open question, how to make explicit where exactly the point of failure of the test is. I believe that hasn't actually been researched that well in these constructions. As opposed to making a decision at the very end of a conversation, is this human or not, you would rather mark parts of a conversation as more or less human-like, the suspicious parts that make you wonder whether this is not human. I think it'd be really interesting to see if it's possible to make explicit what aspects of the conversation are the failure points. One of the times that
the claim was made that the Turing test was passed, I think most famously, was in 2014, at an exhibition event that marked the 60th anniversary of Turing's death. Eugene Goostman fooled 33% of the event judges, and the method it used was to portray a 13-year-old Ukrainian boy that had a bunch of different personality quirks, and obviously the language barrier, and had some humor and a constant sort of drive towards misdirecting the conversation back to the places where it was comfortable. There's some criticism in the community of this event due to some sort of smoke and mirrors, kind of the PR and marketing side of things, that I think is always there with these kinds of exhibition events. But setting that aside, I think the interesting lesson here is that the parameters, the rules of the actual engineering of the Turing test, can determine whether it contains the spirit of the Turing test, which is a test that captures the ability of an agent to have a deep, meaningful conversation. So in this case you can argue that a few tricks were used to circumvent the need to have a deep, meaningful conversation, and 30% of judges were fooled without rigorous, thorough, transparent, open-domain testing. On the left is a transcript with Scott Aaronson, the famed computer scientist and quantum computing researcher. I talked to him on the podcast; brilliant guy. He was one of the judges, and he posted on his blog some of the conversation that he had with Eugene, which I think is really interesting. It shows that when the judge, the interrogator, is an expert, they can truly put the bot to the test. Scott really didn't allow the kind of misdirection that Eugene non-stop tried to do, and you can see that in the transcript: Scott refuses to take the misdirection. So, as I mentioned, despite the waning, I guess, popularity of the Loebner Prize and the Turing test idea in general, Google has published a paper and proposed a system called Meena, a chatbot that's an end-to-end deep
learning system. The goal of the 2.6 billion parameters is to capture the conversational context well, to be able to generate text that fits the conversational context well. Now, one interesting aspect of this, besides being a serious attempt at creating a learning-based system for open-domain conversational agents, is that a new metric is proposed, and it's a two-part metric of sensibleness and specificity. Sensibleness is that a bot's responses have to make sense in context; they have to fit the context. Just to give you a sense, humans have 97 percent sensibleness, so the ability to match what we're saying to the context. Now, the reason you need another side of that metric is that you can be sensible, you can fit the context, by being boring, by being generic, by making statements like "I don't know" or "that's a good point." These are generic statements that fit a lot of different kinds of contexts. So the other side of the metric is specificity. Basically, the goal there is: don't be boring, say something very specific to this context. So not only does it match the context, but it captures something very unique to this particular set of lines of conversation that form the context. I think it's fair to say that the beauty, the music, the humor, the wit of conversation comes from that ability to play with the specifics, the specificity metric. So both are really important. Humans achieve 86% on sensibleness and specificity; Meena achieves 79%, compared to Mitsuku, which achieves 56%. Now, take all of this with a grain of salt. I want to be very careful here, because (not to throw shade) it's closed source currently, and there's a little bit of a feeling of a PR and marketing situation here. Naturally, perhaps, the paper, the methodology, and the results are made in such a way that they benefit the way the learning framework was constructed. Now, I don't want to over-criticize that, because I think there's still a lot of
interesting ideas in this paper. But in terms of looking at the actual percentages, of 86 percent human performance and 79 percent Meena performance, I think we're quite a ways from being able to make conclusive statements about a system achieving human-level conversational capabilities. So those plots should be taken with a grain of salt, but the actual content of the ideas I think is really interesting. I think quite obviously the future, long term but hopefully short term, is in end-to-end learning-based approaches to open-domain conversation. So just as Turing described, funnily enough, 70 years ago in this paper, that machine learning would be essential to success, I believe the same. It's a lot less interesting and revolutionary to think so today, but I believe that machine learning will also need to be a very central part of achieving human-level conversational capabilities. So let's talk through some objections. Nine of them are highlighted by Turing himself in his paper; here I provide some informal, highly informal, summaries. The first objection is religious, which connects thinking to, quote unquote, the soul, and God presumably is the giver of the soul to humans. Now, Turing's response to that is: God is all-powerful, so there is no reason why he can't assign souls to anything, biological or artificial. So it doesn't seem that whatever mechanism by which the soul arrives in the human cannot also be repeated for artificial creatures. The second objection is the, quote unquote, "head in the sand." It's a bit of a ridiculous one, but I think it's an important one, because it keeps coming up often, even in today's context, highlighted by folks like Elon Musk, Stuart Russell, and so on. The head-in-the-sand objection is that AGI is scary. So human-level and superhuman-level intelligence is scary; today we talk about existential threats. It seems like the world would be totally transformed if we had something like that, and it could be transformed in a highly negative way, so let's not
think about it, because it kind of seems far away, so it probably won't happen, so let's just not think about it. That's kind of the objection to the Turing test: it's so far away that it's not worthwhile to even think about a test for this intelligence, or what human-level intelligence means, or what superhuman-level intelligence means. The response, quite naturally, is that how you feel about something doesn't matter to whether it's going to happen or not. So we kind of have to set our feelings aside and not allow fear or emotion to muddle our thinking or detract us from thinking about it at all. The third objection is from Gödel's incompleteness theorem, saying there are limits to computation. This is the Roger Penrose line of thinking: basically, if a machine is a computational system, it has limited capabilities, in that it can never be a perfectly rational system. Turing's response to this is that humans are not rational either; they're flawed. Nowhere does it say that intelligence equals infallibility. In fact, it could probably be argued that fallibility is at the core of intelligence. The fourth objection is that consciousness may be required for intelligence. Turing's response to this is to separate whether something is conscious from whether something appears to be conscious. The focus of the Turing test is how something appears, and in some sense humans, to us, as far as we know, only appear to be conscious. We can't prove that humans outside of ourselves are actually conscious. And so, since humans only appear to be conscious, there's no reason to think that machines can't also appear to be conscious, and that's at the core of the Turing test. So the Turing test kind of skirts around the question of whether something is or isn't intelligent, whether it is or isn't conscious. The fundamental question is: does it appear to be intelligent, does it appear to be conscious? So he actually doesn't respond to the idea that consciousness is or isn't required for intelligence; he just says that if it is,
there's no reason why you can't fake it, and that will be sufficient to achieve the display of intelligence. The fifth objection is the "negative Nancy" objection: machines will never be able to do X, whatever X is. You can make it love, joke, understand or generate humor, eat and enjoy food, create art, music, poetry, and so on. So there are a lot of things we could put in that X that machines could never do, basically highlighting our human intuition about the limitations of machines. Just like with the second objection, naturally the response here is that the objection that machines will never do X doesn't have any actual reasoning behind it. It's just a vapid opinion based on the world today, refusing to believe that the world of tomorrow will be different. The sixth objection, probably the most important one, the most interesting, comes by way of Ada Lovelace, Lady Lovelace, the mother of computer science. The basic idea is that machines can only do what we program them to do. Now, this is an objection that appears in many forms, before Turing and after Turing, and I think it's a really important objection to think about. In this particular case I think Turing's response is quite shallow, but it is nevertheless pretty interesting, and we'll talk about it again later on. His response is: well, if machines can only do what we program them to do, we can rephrase that statement as saying machines can't surprise us. And when you rephrase it that way, it becomes clear that machines actually surprise us all the time. A system that is sufficiently complex will no longer be one for which we have a solid intuition of how it behaves, even if we built all the individual pieces of code. For those of you who have programmed things (and I've written a lot of programs): in the initial design stage you have an intuition about how it should behave. There's a design, there's a plan, you know what the individual functions do. But as the piece of code grows, your ability to intuit exactly the mapping from input to
output fades with the size of the code base, even if you understand everything about the code and even if you set logical and syntactic bugs aside. The seventh objection looks to the brain and looks to the continuous, analog nature of that particular neural network system. Turing's response to that is: sure, the brain might be analog, and digital computers are discrete, but if you have a big enough digital computer it can sufficiently approximate the analog system, meaning to a sufficient degree that it would appear intelligent. The eighth objection is the free will objection, which is that when you have deterministic rules, laws, algorithms, they're going to result in predictable behavior, and this kind of exactly deterministic, predictable behavior doesn't quite feel like the mind that we know we humans possess. This kind of feeling about what's required for intelligence, for a mind, I think is behind the Chinese room thought experiment that we'll talk about next. Turing's response here is that humans very well could be a complex collection of rules; there's no indication that we're not. Just because we don't understand, or don't even have the tools to explore, the kind of rules that underlie our brain doesn't mean it's not just a collection of deterministic, perfectly predictable sets of rules. Objection number nine is kind of fun. Quite possibly Turing is trolling us, but more likely the ideas of mind-reading, extrasensory perception, and telepathy were a little bit more popular in his time. So the objection here is: what if mind-reading was used to cheat the test? Basically, if human-to-human communication through telepathy could be used, then a machine can't achieve that same kind of telepathic communication, and so that can be used to circumvent the effectiveness of the test. Now, Turing's response to this is: well, you just have to design a room that not only protects you from being able to see whether it's a robot or a human, but also design a telepathy-proof
room that prevents telepathic communication. Again, it could be Turing trolling us, but more importantly I think it's a nice illustration, at the time and even still today, that there's a lot of mystery about how our mind works. If you chuckle and completely laugh off the possibility of telepathic communication, I think you're assuming too much about your own knowledge of how our mind works. I think we know very little about how our mind works. It is true we have very little scientific evidence of telepathic communication, but you shouldn't take the next leap and have a feeling like you understand that telepathic communication is impossible. You should nevertheless maintain an open mind. But as an objection it doesn't seem to be a very effective one. I wanted to dedicate just one slide to probably the most famous objection to the Turing test, proposed by John Searle in 1980 in his paper "Minds, Brains, and Programs," commonly known as the Chinese room thought experiment. It's kind of a combination of objections number four, number six, and number eight on the previous slide: that consciousness is required for intelligence; the Ada Lovelace objection, that programs can only do what we program them to do; and the deterministic free will objection, that deterministic rules will lead to predictable behavior, and that doesn't seem to be like what the mind does. So there are echoes of all those objections that Turing anticipated, all put together into the Chinese room. As a small aside, it is now 6 a.m.
I did not sleep last night, so this video is brought to you by this magic potion called nitro cold brew, an excessively expensive canned beverage from Starbucks that fuels me this wonderful Saturday morning. Here's to you, dear friends. Okay. The Chinese room involves following the instructions of an algorithm. There's a human sitting inside a room who doesn't know how to speak Chinese, but there are notes being passed to them inside the room from outside, in Chinese, and all they do is follow a set of rules in order to respond in that language. So the idea is: if the brain inside the system that passes the Turing test is simply following a set of rules, then it's not truly understanding, it is not conscious, it does not have a mind. The objection is philosophical, so, for my computer science, engineering self, there's not enough meat in it to even make it that interesting. It's very human-centric. But allow us to explore it further. The key argument is that programs, computational systems, are formal, and so they can capture syntactic structure; minds, our brains, have mental content, so they can capture semantics. And so the claim that I think is the most important, the clearest in the paper, is that syntax by itself is neither constitutive of nor sufficient for semantics. So just because you can replicate the syntax of the language doesn't mean you can truly understand it. This is the same kind of criticism we hear of the language models of today, with transformers: that OpenAI's GPT-2 really doesn't understand the language, it's just mimicking the statistics of it so well that it can generate text that is syntactically correct and even has echoes of semantic structure that indicate some kind of understanding, but it doesn't understand. To me that argument is not very interesting from an engineering perspective, because it just sounds like saying: humans can understand things, humans are special, therefore machines cannot understand things. It's a very human-centric argument that's not allowing us to
rigorously explore what exactly understanding means from a computational perspective. Or, put in other words: if understanding, intelligence, consciousness, any one of those, is not achievable through computation, then where is the point at which computation hits the wall? The most interesting open questions to me here are on the point of faking things, or mimicking, or the appearance of things. Does the mimicking of thinking equal thinking? Does the mimicking of consciousness equal consciousness? Does the mimicking of love equal love? This is something that I think a lot about and, depending on the day, go back and forth on. But from an engineering perspective I tend to agree with the spirit and the work of Alan Turing, in that at this time, as engineers, we can only focus on building the appearance of thinking, the appearance of consciousness, the appearance of love. I think as we work towards creating that appearance, we'll actually begin to understand the fundamentals of what it means to be conscious, what it means to love, what it means to think. You may have even heard me say sometimes that the appearance of consciousness is consciousness. I think that's me being a little bit poetic, but I think from our perspective, from our exceptionally limited understanding, both problems are in the same direction. So it's not as if focusing on creating the appearance of consciousness is going to lead us astray. In my personal view, it's going to lead us very far down the road of actually understanding and maybe one day engineering consciousness. And now I'd like to talk about some alternatives to and variations of the Turing test that I find quite interesting. There are a lot of kind of natural variations and extensions to the Turing test. First, the total Turing test, proposed in 1989. It extends the Turing test from the natural language conversation domain to perception, computer vision, and object manipulation, robotics, so it takes it into the world. The interesting question here to me is
whether adding extra modalities like audio, visual, and manipulation makes the test harder or easier. To me it's very possible that a test with a narrow bandwidth of communication, such as the natural language communication of the Turing test, is actually harder to pass than one that includes other modalities. But anyway, one of the powerful things about the original Turing test is that it's so simple. The Lovelace test, proposed in 2001, builds on the Ada Lovelace objection to form a test that says the machine has to do something surprising that the creator, or the person who's aware of how the program was created, cannot explain. So it should be truly surprising. There was also, proposed in 2014, the Lovelace 2.0 test, which emphasizes a more constrained definition of what surprising is, because it's very difficult to pin down, to formalize, the ideas of "surprise" and "explain" in the original formulation of the Lovelace test. Lovelace 2.0 emphasizes creativity, art, and so on, so it's more concrete than surprise. Especially if you define constraints on which creative medium we're operating in, you basically have to create an impressive piece of artistic work. I think that's an interesting conception, but it takes us into a land that's much more, not less, subjective than the original Turing test. But this brings us to the open and very interesting question of surprise, which I think is really at the core of our conception of intelligence. I think it is true that our idea of an intelligent machine is one that really surprises us. So when we one day finally create a system of human-level or superhuman-level intelligence, we will surely be surprised. So we have to think: what kind of behavior is one that will surprise us to the core? I have many examples in mind that I'll cover in future videos, but certainly one of the hardest ones is humor. And finally, the truly total Turing test, proposed in 1998, proposes an interesting philosophical idea: that we should not judge
the performance of an individual agent in an isolated context, but instead look at the body of work produced by a collection of intelligent agents throughout their evolution, with some constraints on the consistency underlying the evolutionary process. It's interesting to suggest that the way we conceive of intelligence amongst us humans is grounded in the long arc of history, in the body of work we've created together. I don't find that argument convincing, but I do find interesting, as an open question, the idea that we should measure systems not in the moment, or over a particular five-minute period or 20-minute period, but over a period of months and years, perhaps condensed in a simulated context. So really increase the scale at which we judge interactions by several orders of magnitude. That to me is a really interesting idea: to judge AlphaZero's performance not on a single game of chess but by looking at millions of games, and not looking at a million games for a static set of parameters, but looking at the millions of games played as the system was trained from scratch and became better and better and better. There's something about that full journey that may capture intelligence. So intelligence very well could be the journey, not the destination. I think there's something there. It's very imprecise in this construction, but it struck me as a very novel idea for a benchmark: not to measure instantaneous performance, but performance over time and the improvement of performance over time. It appears that there's something to that, but I can't quite make it concrete, and I'm not sure it's possible to formalize in the way that the original Turing test is formalized. Another kind of test is the Winograd schema challenge, which I think is really compelling in many ways. First, to explain it with an example, there's a sentence, really two sentences, let's say: the trophy doesn't fit into the brown suitcase because it's too small; and: the trophy doesn't fit into the
brown suitcase because it is too large. And the question is: what is too small, and what is too large? The answer for "what is too small?" is that the suitcase is too small: the trophy doesn't fit into the brown suitcase because it is too small. And then for the second question, "what is too large?", the answer is the trophy: the trophy doesn't fit into the brown suitcase because it is too large. The basic idea behind this challenge is that the ambiguity in the sentence can only be resolved with common-sense reasoning about ideas in this world. And so the strength of this test is that it's quite clear, quite simple, and yet requires, at least in theory, this deep thing that we think makes us human, which is the ability to reason at the very basic level of common-sense reasoning. The other nice thing is that it can be a benchmark like we're used to in the machine learning world, one that doesn't require subjective human judges; there's literally a right answer. The weakness here, which holds for other similar challenges in this space, is that it's very difficult to come up with a large number of questions. Each one is handcrafted, and so that means you can't build a benchmark of millions or billions of questions; it has to be on a small scale. Variations of the Winograd schema are included in some natural language benchmarks of today that people use in the machine learning context. The Amazon Alexa Prize I think captures nicely the spirit of the Turing test. I think it's actually quite an amazing challenge and competition that uses voice conversation in the wild, so with real people, and they can use, I think it's called a socialbot skill, on their Alexa devices. I don't want to wake up my own Alexa devices, but basically you say her name and say "let's chat," and that brings up one of the bots involved in the challenge, and then you can have a conversation. The bar that's to be reached is for you to have a twenty-minute or longer conversation with the bot, and for two-thirds or more of the
interactions to be that long so the basic metric of successful interaction is the duration of the interaction and as of today we're still really really far away from that so why is this a good metric and I do think it's a really powerful metric as opposed to us judging the quality of conversation in retrospect we speak with our actions so a deep meaningful conversation is one we don't want to leave when we have other things contending for our time when we make the choice to stay in that conversation that's as powerful a signal as any to show that that conversation has content has meaning is enjoyable I think that's what passing the Turing test in its original spirit actually is and I should mention that as of today no team has even come close to passing the Turing test as it is constructed by the Alexa Prize there are several things that are really surprising about this challenge one is that it's not a lot more popular and two that Amazon chose to limit it to students only I mean almost making it an educational exercise as opposed to a moonshot challenge for our entire generation of researchers I mentioned before but I'll say it again here that it's surprising to me that the biggest research labs in industry and academia have not focused on this problem have not found the magic within the Turing test problem and the Alexa Prize as it's formulated I believe captures the spirit of the Turing test quite well a very different kind of test is the Hutter Prize by Marcus Hutter which I think is really fascinating from both a philosophical and a mathematical angle underlying it is the idea that compression is strongly correlated with intelligence put another way the ability to compress knowledge well requires intelligence and the better you compress that knowledge the more intelligent you are I think this is a really compelling notion because then we can make explicit we can quantify how intelligent you are by how well you're able to compress knowledge as the prize webpage puts
it being able to compress well is closely related to acting intelligently thus reducing the slippery concept of intelligence to hard file size numbers so the task is to take one gigabyte of Wikipedia data and compress it down as much as possible the current best is an eight point five eight compression factor so down from one gigabyte to one hundred seventeen megabytes and the award is that for each one percent improvement you win five thousand euros I find this competition just amazing and fascinating on many levels I think it's a really good formulation of an intelligence challenge but it's not a test that's one of its kind of limitations at least in the poetic sense that it doesn't set a bar beyond which we're really damn impressed meaning it's harder to set a bar like the one formulated by the Turing test beyond which we feel it would be human level intelligence now the bars that are set by Alan Turing and others the Loebner Prize the Alexa Prize are also arbitrary but it feels like we're able to intuit a good bar in that context better than being able to intuit the kind of bar we need to set for the compression challenge another fascinating challenge is the abstraction and reasoning challenge put forth by Francois Chollet just a few months ago so this is very exciting it's actually ongoing as a competition on Kaggle I think with the deadline in May it's a really really interesting idea I haven't internalized it fully yet and perhaps we'll do a separate video on just this paper alone and I'll talk to Francois I'm sure on the podcast and in other contexts in the future about it I think there's a lot of brilliant ideas here that I still have to kind of digest a little bit but let me describe the high level ideas behind this benchmark so first of all the name is Abstraction and Reasoning Corpus or Challenge ARC the domain is in a grid world of patterns not limited in size but the grid world is filled with cells that can be of different colors and the spirit of the set of tests that
Francois proposes is to stay close to IQ tests so psychometric intelligence tests that we use to measure the intelligence of human beings now the Turing test is kind of at a higher level of natural language in this construction of ARC it goes as close as possible to the very basic elements of reasoning just like in an IQ test of patterns it gets to the very core such that we can then make explicit the priors the concepts that we bring to the table of those tests and if we can make them explicit it reduces the test as close as possible to the measure of the system's ability to reason now the concepts that are brought to this grid world here's just a couple of examples of priors that Francois shows in his paper which I highly recommend it's called On the Measure of Intelligence here prior concept is not referring to a previous concept it's referring to a prior set of knowledge that you bring to the table so this first row of illustrations of the grid world illustrates the idea of object persistence with noise so we're able to understand that large objects when there is some visual noise occluding our ability to see them still exist in the world and if that noise changes the object is still unchanged so that idea of object persistence in the world is a prior that we bring to the table of understanding this grid world another prior is on the left at the bottom is objects are defined by spatial contiguity so objects in this grid world when the cells are the same color and they're touching each other they're probably part of the same object and if there's black cells that separate those groupings of cells that means there's multiple objects so this kind of spatial contiguity of colored cells defines the entity of the object and on the right at the bottom is the color based contiguity which means that even if cells are touching if their colors are different that means they likely belong to different objects that's the basic prior
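As a rough sketch of the two contiguity priors just described, the snippet below groups grid cells into objects. It is not taken from the paper or from any ARC tooling. The names `find_objects` and `BACKGROUND`, the choice of 0 for a black cell, and the use of edge adjacency (up, down, left, right) to mean "touching" are all assumptions made for illustration.

```python
from collections import deque

BACKGROUND = 0  # assumption: 0 stands for a black (empty) cell


def find_objects(grid):
    """Group same-colored, edge-adjacent cells into objects.

    Returns a list of (color, set_of_cells) pairs. Touching cells of
    different colors end up in different objects (color-based contiguity),
    and same-colored groups separated by other cells are distinct objects
    (spatial contiguity).
    """
    rows, cols = len(grid), len(grid[0])
    seen = set()
    objects = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == BACKGROUND or (r, c) in seen:
                continue
            color = grid[r][c]
            cells = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            # Breadth-first flood fill restricted to cells of the same color.
            while queue:
                y, x = queue.popleft()
                cells.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and grid[ny][nx] == color):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            objects.append((color, cells))
    return objects


# Hypothetical 3x4 grid: 1 and 2 are two arbitrary colors.
grid = [
    [1, 1, 0, 1],
    [1, 2, 0, 1],
    [0, 2, 0, 0],
]

for color, cells in find_objects(grid):
    print(color, sorted(cells))
```

On this made-up grid the function finds three objects: two separate groups of color 1 split by a black column, and one group of color 2 that touches the first group but is kept apart because its color differs.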
and there's a few others by the way just beautiful pictures in that paper that make you really think about the core elements of intelligence I love that paper worth looking at there's a lot of interesting insights in there just to give you some examples of what the actual task for the machine in this test looks like it's similar to the kind of task we've seen in an IQ test so here there's three pairings and the task is for the fourth pairing of images to generate the grid world that fits the other three that fits the generating pattern of the other three so in this case figure four from the paper a task where the implicit goal is to complete a symmetrical pattern the nature of the task is specified by the three input-output examples the test-taker must generate the output grid corresponding to the input grid of the test input bottom right so here what you're tasked with understanding from the first three pairings is that the input has a perfect global symmetry to it and also that there's parts of the image that are missing that can be filled in order to complete that perfect symmetry now that's relying on another prior another basic concept of symmetry which I think underlies a lot of our understanding of visual patterns again so the intelligent system has to have a good representation of symmetry in various contexts this is fascinating and beautiful beautiful images okay another example figure 10 from the paper a task where the implicit goal is to count unique objects and select the object that appears the most times the actual task has more demonstration pairs than these three so again there's three pairings you see in the first one there's three blue objects and in the second one there's four yellow objects and in the third one there's three red objects so you have to figure that out and then the output is the grid cells capturing that
object that appears the most times and so apply that kind of reasoning to complete the output of the fourth pairing one of the challenges for this kind of test is it's difficult to generate but just like I said I think there's a lot of really interesting technical and philosophical ideas here that are worth exploring so let's quickly talk through a few takeaways so zooming out is the Turing test a good measure of intelligence and can it serve as an answer to the big ambiguous but profound philosophical question of can machines think so first some notes on the underlying challenges of the Turing test let's talk about intelligence so if we compare human behavior and intelligent behavior it's clear that the Turing test hopes to capture the intelligent parts of human behavior but if we're trying to really capture human level intelligence it's also possible that we want to capture the unintelligent irrational parts of human behavior so it's an open question whether the natural conversation is a test of intelligence or humanness because if it's a test of intelligence it's focusing only on kind of rational systematic thinking if it's a test of humanness then you have to capture the full range of emotion the mess the irrationality the laziness the boredom all the things that make us human and all the things that then project themselves into the way we carry out conversation as I mentioned in the previous objections the Turing test really focuses on the external appearances not the internal processes so like I said from an engineering perspective I think it's very difficult to create a test for internal processes for some of these concepts that we have a very poor understanding of like intelligence like consciousness I think the best we can do right now in terms of quantifying and having a measure of something is we have to look at the external performance of the system as opposed to some properties of the internal processes another challenge for the Turing test as Scott
Aaronson's conversation with Eugene Goostman indicates is that the skill of the interrogator is really important here that's both one just the conversational skill of how much you can stretch and challenge the conversation and two on the human side of it the ability of the interrogator to identify the humanness of both the human and the machine so the ability to have a conversation that challenges the bot and the ability to make the actual identification of human or machine those are both skills that are essential to the Turing test also what to me is really interesting is the anthropomorphization in human to inanimate object interaction I think it's really fascinating and it's an open question whether in some construction of the Turing test where anthropomorphism is leveraged to convince the human whether that's cheating the Turing test or in fact that's an essential element to convincing us humans that something is intelligent perhaps as a starting point we have to anthropomorphize something before we allow it to be intelligent in our subjective judgment of its intelligence and finally another limitation of the Turing test that could be narrowly stated as why do we expect a bot to talk what if it doesn't feel like talking does it still fail I think a more general way to phrase that is why do we judge the performance of a system on such a narrow window of time I think as I mentioned before there could be something interesting in expanding the window of time over which we analyze the intelligence of the system looking not just at the average performance but the growth of its performance as it interacts with you as the individual I think one key aspect of intelligence is a social aspect and a social connection I think in part may require getting to know the person and there's something to rethink in the Turing test that relies on us building a relationship with a person as part of the test so you could think of it as kind of the Ex Machina Turing test
where they spend a series of conversations together several days together all those kinds of things that feels like an interesting extension of the Turing test which could reveal the significant limitation of the current construction of the Turing test which is a limited window of time and a one time at the end interrogator judgment of whether it's human or machine now my view overall on the Turing test is that yes something like the Turing test as originally constructed so the natural language conversation is close to the ultimate test of intelligence and moreover this is where I disagree I think I disagree with Francois Chollet and other world-class researchers in the area Stuart Russell and so on in that I think the Turing test is not a distraction for us to think about it doesn't pull us away from actually making progress in the field I think it keeps us honest I think truly analyzing where we stand in natural language conversation will help us understand how far away we are and more than that I think there should be active research in this field I think the Loebner Prize type of formulations the Alexa Prize formulations should be more popular than they are and I think researchers should take them very seriously now that doesn't mean that the work on the ARC benchmark with the IQ test type of intelligence tests is not also going to be fruitful potentially very fruitful but I think ultimately the real test of human level intelligence will occur in something like the construction of the Turing test with natural language open domain conversation that results in deep meaningful connection between human and machine zooming out a little bit I think in general AI researchers don't like and try to avoid the messiness of human beings as is captured by the human robot interaction field and set of problems I think more than just embracing the Turing test I think we should embrace the messiness of the human being in all the different domains of computer
vision of natural language of robotics autonomous vehicles I've been a longtime advocate that semi autonomous vehicles are here to stay for a long time we're going to have to figure out the human robot interaction problem and for that we have to embrace perceiving everything about the human inside the car perceiving everything about the humans outside the car as I mentioned this presentation of the paper is actually part of our paper reading club focused on artificial intelligence where we discuss a couple of times a week on the Discord server called Lex plus AI podcast you're welcome to join we have an amazing community of brilliant people there that discuss all kinds of topics in artificial intelligence and beyond this particular illustration that I just love is from Will Scobie who's an illustrator from the United Kingdom who is part of this Discord community so he contributed it and in general aside from the amazing conversations I encourage and hope to see other members of the community contribute art code visualizations slides ideas for these kinds of videos I'm really excited by the kind of conversations I've seen if you're watching this video and want to join in click on the Discord link in the description or on the slide join the conversation new paper every week it's fun just to give you a little sense of the ideas behind this AI paper reading club like what the goals are so what is it I think the goal is to take a seminal paper in the field and not just focus in on the specific sort of paragraph to paragraph section to section analysis of what the paper's saying but actually use the paper to discuss the history the big-picture development of the field within the context of that paper now that could be philosophical papers like the Turing test paper or it could be very specific papers in the field again physics mathematics computer science and probably quite a bit of deep learning so the hope is to prioritize beautiful powerful impactful insights as opposed to
full coverage of all the contents of the paper and the actual meetings on Discord hopefully are less one person presenting and more discussion there's a lot of brilliant people there it's civil so you can have 300 400 people on voice chat which is a really intimate setting and yet people aren't interrupting each other it's not chaos it's quite an amazing community the other goal I'd love to see is even if we cover technical papers the goal is for it to be accessible to everyone so both high school students people outside of all of these fields in general but also I'd love to make it be useful to experts in the field expert researchers so avoid using technical jargon but still try to discover insights that are new that are interesting that are important for the researchers in the field that's what I would love to achieve here with this paper reading club if you're interested join in listen in or contribute to the conversation suggest papers suggest content visualizations code always welcome it's an amazing community thanks for watching this excessively long presentation if you have suggestions let me know otherwise hope to see you next time