the following is a conversation with jay mcclelland a cognitive scientist at stanford and one of the seminal figures in the history of artificial intelligence and specifically neural networks having written the parallel distributed processing book with david rumelhart who co-authored the backpropagation paper with geoff hinton in their collaborations they've paved the way for many of the ideas at the center of the neural network-based machine learning revolution of the past 15 years to support this podcast please check out our sponsors in the description this is the lex fridman podcast and here is my conversation with jay mcclelland you are one of the seminal figures in the history of neural networks at the intersection of uh cognitive psychology and computer science what to you has over the decades emerged as the most beautiful aspect about neural networks both artificial and biological the fundamental thing i think about with neural networks is how they allow us to link biology with the mysteries of thought and um you know when i was first entering the field myself in the late 60s early 70s cognitive psychology had just become a field there was a book published in 67 called cognitive psychology um and the author said that you know the study of the nervous system was only of peripheral interest it wasn't going to tell us anything about the mind and i didn't agree with that i i always felt oh look i'm i'm a physical being i from dust to dust you know ashes to ashes and somehow i emerged from that um so that's really interesting so there was a sense with cognitive psychology that in understanding the sort of neuronal structure of things you're not going to be able to understand the mind and then your sense is if we study these neural networks we might be able to get at least very close to understanding the fundamentals of the human mind yeah i used to think um or i used to talk about the idea of awakening from the cartesian
dream so descartes you know thought about these things right he he was walking in the gardens of versailles one day and he stepped on a stone and a statue moved and he walked a little further stepped on another stone and another statue moved and he like why did the statue move when i stepped on the stone and he went and talked to the gardeners and he found out that they had a hydraulic system that allowed the physical contact with the stone to cause water to flow in various directions which caused water to flow under the statue and move the statue and he used this as the beginnings of a theory about how animals act and he had this notion that these little fibers that people had identified that weren't carrying the blood you know were these little hydraulic tubes that if you touch something there would be pressure and it would send a signal of pressure to the other parts of the system and that would cause action so he had a mechanistic theory of animal behavior and he thought that the human had this animal body but that some divine something else had to have come down and been placed in him to give him the ability to think right so the physical world includes the body in action but it doesn't include thought according to descartes right and so the study of physiology at that time was the study of sensory systems and motor systems and things that you could directly measure when you stimulated neurons and stuff like that and um the study of cognition was something that you know was tied in with abstract computer algorithms and things like that but when i was an undergraduate i learned about the physiological mechanisms uh and so when i'm studying cognitive psychology as a first year phd student i'm saying wait a minute the whole thing is biological you had that intuition right away that was seemed obvious to you yeah yeah it isn't that magical though that from just the little bit of biology can emerge the full beauty of the human experience is that why is that so 
obvious to you well i it's obvious and not obvious at the same time um and i think about darwin in this context too because darwin knew very early on that none of the ideas that anybody had ever offered gave him a sense of understanding how evolution could have worked but he wanted to figure out how it could have worked that was his goal and he spent a lot of time working on this idea and you know reading about things that gave him hints and thinking they were interesting but not knowing why and drawing more and more pictures of different birds that differ slightly from each other and so on you know and and then then he figured it out but after he figured it out he had nightmares about it he would dream about the complexity of the eye and the arguments that people had given about how ridiculous it was to imagine that that could have ever emerged from some sort of you know unguided process right that it hadn't been the product of design and and uh so he he didn't publish for a long time in part because he was scared of his own ideas he didn't think they could possibly be true yeah um but then you know by the time the 20th century rolls around we all uh you know we understand or many people understand or believe that evolution uh produced you know the entire uh range of uh animals that there are uh and uh you know descartes' idea starts to seem a little wonky after a while right like well wait a minute um there's the apes and the chimpanzees and the bonobos and you know like they're pretty smart in some ways you know so what oh you know somebody comes oh there's a certain part of the brain that's still different they don't you know there's no hippocampus in the monkey brain it's only in the human brain uh huxley had to do a surgery in front of many many people in the late 19th century to show to them there's actually a hippocampus in the chimpanzee's brain you know so so the continuity of the species is another element uh that you know
contributes to um this sort of you know idea that we are ourselves uh a total product of nature um and uh that to me is the is the magic and the mystery how how nature could actually um you know give rise to uh organisms that have the capabilities that we have so it's interesting because even the idea of evolution is hard for me to keep all together in my mind so because we think of a human time scale it's hard to imagine that like like the the development of the human eye would give me nightmares too because you have to think across many many many generations and it's very tempting to think about kind of a growth of a complicated object and it's like how is it possible for such a thing to be built because also me from a robotics engineering perspective it's very hard to build these systems how through an undirected process can a complex thing be designed it seems not it seems wrong yeah so that's absolutely right and i you know a slightly different career path that would have been equally interesting to me would have been um to actually study the process of embryological development flowing on into brain development and the um exquisite sort of laying down of pathways and so on that occurs in the brain and uh i know only the slightest bit about that it's not my field but um there are you know fascinating aspects to this process that eventually result in the you know the complexity of of uh various brains at least you know one thing um we're um in in the field i think people have felt for a long time that in the study of vision the continuity between humans and non-human animals has been has been second nature for a lot longer i was having i had this conversation um with somebody who's a vision scientist and they were saying oh we we don't have any problem with this you know the monkey's visual system and the human visual system are extremely similar um up to certain levels of course they they diverge after a while but um the first the the visual pathway
from the eye to the brain and the first few um layers of cortex um or cortical areas i guess one would say uh are are extremely similar yeah so on the cognition side is where the leap seems to happen with humans that it does seem we're kind of special and that's a really interesting question when thinking about alien life or if there's other intelligent alien civilizations out there is how special is this leap so one special thing seems to be the origin of life itself however you define that there's a gray area and the other leap this is a very biased perspective of a human is the the origin of intelligence and again from an engineering perspective it's a difficult question to ask an important one is how difficult is that leap how special were humans did uh did uh a monolith come down did aliens bring down a monolith and some apes had to touch a monolith to get it it's a lot like descartes' uh you know idea right exactly i it's but it just seems that it seems one heck of a leap yeah to get to this level of intelligence yeah and you know so chomsky um uh argued um that you know some uh genetic fluke occurred a hundred thousand years ago and you know it just happened that some human some hominin ancestor of current humans had this one genetic tweak that resulted in language yeah and language then provided this special thing that separates us from all other animals um i'm i think there's a lot of truth to the value and importance of language but i think it comes along with um the evolution of a lot of other related things related to sociality and mutual engagement with others and um establishment of i don't know rich mechanisms for organizing an understanding of the world which language then plugs into right so it's uh language is a tool that allows you to do this kind of collective intelligence and whatever is at the core of the thing that allows for this collective intelligence is the main thing and it's interesting to think about that one fluke one mutation could lead to
the like the the first crack opening of the door to human intelligence like all it takes is one like evolution just kind of opens the door a little bit and then time and selection take care of the rest you know there's so many fascinating aspects to these kinds of things so we think of evolution as continuous right we think oh yes okay over 500 million years there could have been these you know relatively continuous uh changes and um but that's not what anthropologists evolutionary biologists found from the fossil record they found you know hundreds of millions of years of stasis and then you know suddenly a change occurs well suddenly on that scale is a million years or something but but or even 10 million years but but um the concept of punctuated equilibrium was a very important concept in evolutionary biology uh and that also feels somehow right about you know the stages of our mental abilities we we seem to have a certain kind of mindset at a certain age and then at another age we like look at that four-year-old and say oh my god how could they have thought that way so piaget was known for this kind of stage theory of child development right and you look at it closely and suddenly those stages aren't so discrete in the transitions but the difference between the four-year-old and the seven-year-old is profound and that's another thing that's always interested me is how we something happens over the course of several years of experience where at some point we reach the point where something like an insight or a transition or a new stage of development occurs and uh you know these kinds of things can be understood um in complex systems uh research and so um evolutionary biology developmental biology cognitive development are all things that have been approached in this kind of way yeah just like you said i find both fascinating those early years of human life but also the early like minutes days of from the embryonic development
to like how from embryos you get like the brain that development again from the engineering perspective is fascinating so it's not so the early when you deploy the brain to the human world and it gets to explore that world and learn that's fascinating but just like the assembly of the mechanism that is capable of learning that's like amazing the stuff they're doing with like brain organoids where you can build mini brains and study that um self-assembly of a mechanism from like the dna material that that's like what the heck you have literally like biological programs that just generate a system this mushy thing that's able to be robust and learn in a very unpredictable world and learn seemingly arbitrary things or like a very large number of things that enable survival yeah ultimately um that is a very important part of the whole process of you know understanding this sort of emergence of mind from brain kind of kind of thing and the whole thing seems to be pretty continuous so let me uh let me step back to neural networks for for another brief minute you wrote parallel distributed processing books that explored ideas of neural networks in the 1980s together with a few folks but the books you wrote with david uh rumelhart who is the first author on the backpropagation paper with geoff hinton so these are just some figures at the time that were thinking about these big ideas what are some memorable moments of discovery and beautiful ideas from those early days i'm going to start sort of with my own process in the mid 70s and then into the late 70s when i met geoff hinton and he came to san diego and we were all together in my time in graduate school as i've already described to you i had this sort of feeling of okay i'm really interested in human cognition but this disembodied sort of way of thinking about it that i'm getting from the current mode of thought about it isn't working fully for me and when i got my assistant professorship i went to ucsd and um that
was in 1974 something amazing had just happened dave rumelhart had written a book together with another man named don norman and the book was called explorations in cognition and it was a series of chapters exploring interesting questions about cognition but in a completely sort of abstract you know non-biological kind of way and i'm saying gee this is amazing i'm coming to this community where people can get together and feel like they're collectively exploring you know ideas and um it was a book that had a lot of i don't know lightness to it and you know the the don norman who was the the more senior figure to rumelhart at that time who led that project um you know always created this spirit of playful exploration of ideas and so i'm like wow this is great but i was also you know still trying to get from the neurons to the to the cognition and i realized at one point i i got this opportunity to go to a conference where i heard a talk by a man named james anderson who is an engineer but by then a professor in a psychology department who had used linear algebra to create neural network models of perception and categorization and memory and it just blew me out of the water that one could you know create a model that was simulating neurons not just kind of engaged in a stepwise algorithmic process that was construed abstractly but it was simulating remembering and recalling and um recognizing the prior occurrence of a stimulus or something like that so for me this was a bridge between the mind and the brain and it just like stuck and i i remember i was walking across campus one day in 1977 and i almost felt like saint paul on the road to damascus i said to myself you know if i think about the mind in terms of a neural network it will help me answer the questions about the mind that i'm trying to answer and that really excited me so i think that a lot of people were becoming excited about that and one of those people was jim anderson who i had mentioned another
one was steve grossberg who had been writing about neural networks since the 60s and geoff hinton was yet another and his phd dissertation showed up uh in an applicant pool to a postdoctoral training program that dave and don the two men i mentioned before rumelhart and norman were administering and rumelhart got really excited about hinton's phd dissertation um and so uh hinton was one of the first um people who came and joined this group of postdoctoral scholars uh that was funded by this this wonderful grant that they got another one who is also well known in neural network circles is paul smolensky he was another one of that group anyway um geoff and jim anderson organized a conference at ucsd uh where we we were and uh it was called parallel models of associative memory and it brought all the people together who had been thinking about these kinds of ideas in 1979 or 1980 and this this began to kind of really resonate with some of rumelhart's um own thinking some of his reasons for wanting something other than the kinds of computation he'd been doing so far so let me talk about rumelhart now for a minute okay with that context well let me also just pause because you said so many interesting things before we go to rumelhart so first of all for people who are not familiar uh neural networks are at the core of the machine learning deep learning revolution of today uh geoffrey hinton that we mentioned is one of the figures that were important in the history like yourself in the development of these neural networks artificial neural networks that are then used for the machine learning application like i mentioned the backpropagation paper is one of the optimization mechanisms by which these uh networks uh can learn and uh the word parallel is really interesting so it's it's almost like synonymous from a computational perspective what how you thought at the time about neural networks that it is parallel computation is that would that be fair to say well
yeah the the parallel the word parallel in this you know comes from the idea that each neuron is an independent computational unit right it it gathers data from other neurons it integrates it in a certain way and then it produces a result and it's a very simple little computational unit but it it's autonomous in the sense that you know it does its thing right it's it's in a biological medium where it's getting nutrients and various uh chemicals from that medium um but it's uh you know you can think of it as almost like a little little computer in and of itself so the idea is that each you know our brains have oh look you know a hundred or hundreds almost a billion of these little neurons right um and they're all capable of doing their work at the same time so it's like instead of just a single central processor that's engaged in you know chug chug one step after another we have a billion of these little computational units working at the same time so at the time that's i don't know maybe you can comment it seems to me even still to me uh quite a revolutionary way to think about computation relative to the development of theoretical computer science alongside of that where it's very much like sequential computer you're analyzing algorithms that are running on a single computer that's right you're saying wait a minute what what why don't we take a really dumb very simple computer and just have a lot of them interconnected together and they're all operating in their own little world and they're communicating with each other and thinking of computation in that way and from that kind of computation on trying to understand how things like certain characteristics of the human mind can emerge right that's quite a revolutionary way of thinking i would say well yes i agree with you and um there's still this sort of sense of not sort of knowing how we kind of get all the way there um i think and this very much remains at the core of the questions that everybody's asking about 
the capabilities of deep learning and all these kinds of things but if i could just play this out a little bit a a convolutional neural network or a cnn which you know many people may have heard of is a set of you could think of it biologically as a set of collections of neurons each one had each collection has maybe 10 000 neurons in it but there's many layers right some of these things are hundreds or even a thousand layers deep but others are closer to the biological brain and maybe they're like 20 layers deep or something like that so we have within each layer we have thousands of neurons or tens of thousands maybe well in the brain we probably have millions in each layer so but we're getting sort of similar in a certain way right um and then we think okay at the bottom level there's an array of things that are like the photoreceptors in there in the eye they respond to the amount of light of a certain wavelength at a certain location on the on the pixel array so that's like the biological eye and then there's several further stages going up layers of these neuron-like units and you go from that raw input array of pixels to a classification you've actually built a system that could do the same kind of thing that you and i do when we open our eyes and we look around and we see there's a cup there's a cell phone there's a water bottle and these systems are doing that now right so they are in in terms of the parallel idea that we were talking about before they are doing this massively parallel computation in the sense that each of the neurons in each of those layers is thought of as computing its little bit of something about the input uh simultaneously with all the other ones in the same layer we get to the point of abstracting that away and thinking oh it's just one whole vector that's being computed one one activation pattern is computed in a single step and that that that abstraction is useful but it's still that parallel and distributed processing right each 
one of these guys is just contributing a tiny bit to that whole thing and that's the excitement that you felt that from these simple things when you add these levels of abstraction on it yeah you can start getting all the beautiful things that we think about as cognition right and so okay so you have this conference i forgot the name already but it's parallel and something associative memory and so on very exciting technical and exciting title and you started talking about dave rumelhart so who is this person that was so you've spoken very highly of him yeah can you tell me about him his ideas his mind who he was as a human being as a scientist so dave came from a little tiny town in western south dakota and his mother was the librarian and his father was the editor of the newspaper um and uh i know one of his brothers pretty well um they grew up there were four brothers uh and uh they grew up together uh and their father encouraged them to compete with each other a lot they competed in sports and they competed in mind games you know um i don't know things like sudoku and chess and various things like that and uh dave um was a standout undergraduate he went at a younger age than most people do to college at the university of south dakota and majored in mathematics and i don't know how he got interested in psychology but he applied to the mathematical psychology program at stanford and was accepted as a phd student to study mathematical psychology at stanford so mathematical psychology is the use of mathematics to model mental processes right so something that i think these days might be called cognitive modeling that whole space yeah it's mathematical in the sense that um you say if this is true and that is true then i can derive that this should follow okay and so you say these are my stipulations about the fundamental principles and this is my prediction about behavior and it's all done with equations it's not done with a computer simulation
right so you solve the equation and that tells you what the probability that the subject will be correct on the seventh trial of the experiment is or something like that right so it's a use of mathematics to descriptively characterize uh aspects of of behavior and uh stanford at that time was the place where uh there were several really really strong mathematical thinkers who were also connected with three or four others around the country who um you know brought a lot of really exciting ideas uh onto the table and it was a very very prestigious part of the field of psychology at that time so rumelhart comes into this um he was a very strong student within that program uh and uh he got this job at this brand new university in san diego in 1967 he was one of the first assistant professors in the department of psychology at ucsd so i got there in 74 seven years later and rumelhart at that time was still doing mathematical modeling um but he had gotten interested in cognition he'd gotten interested in understanding and you know understanding i think remains you know what does it mean to understand anyway you know uh it's it's an interesting sort of curious you know like how would we know if we really understood something but but he was interested in building machines that would you know hear a couple of sentences and have an insight about what was going on so for example one of his favorite things at that time was margie was sitting on the front step when she heard the familiar jingle of the good humor man she remembered her birthday money and ran into the house what is margie doing why well there's a couple of ideas you could have but the most natural one is that the good humor man brings ice cream she likes ice cream she's she knows she needs money to buy ice cream so she's gonna run into the house and get her money so she can buy herself an ice cream it's a huge amount of inference that has to happen to get those things to link up with
each other and and he was interested in how the hell that could happen and he was trying to build um you know good old-fashioned ai style models of representation of language and and content of you know things like has money so like a lot of like formal logic and like knowledge bases like that kind of stuff yeah so he was integrating that with his thinking about cognition yes the mechanisms of cognition how can they like mechanistically be applied to build these knowledge like to actually build something that looks like a web of knowledge and thereby from from there emerges something like understanding whatever the heck that is yeah he was grappling this was something that they grappled with at the end of that book that i was describing explorations in cognition but he was realizing that the paradigm of good old-fashioned ai wasn't giving him the answers to these questions yeah and by the way that's called good old-fashioned ai now it was called that well it was it was beginning to be called that because it was from the 60s yeah by by the late 70s it was kind of old-fashioned and it hadn't really panned out you know and people were beginning to recognize that but and and rumelhart was you know like yeah he was part of the recognition that this wasn't all working anyway so he started thinking in terms of uh the idea that we needed systems that allowed us to integrate multiple simultaneous constraints in a way that would be mutually influencing each other so he wrote a paper that just really first time i read it i said oh well you know yeah but is this important but after a while it just got under my skin and it was called an interactive model of reading and in this paper he laid out the idea that every aspect of our interpretation of what what's coming off the page when we read at every level of analysis you can think of actually depends on all the other levels of analysis so what are the actual pixels making up each letter and what do those pixels signify about
which letters they are and what do those letters tell us about what words are there and what do those words tell us about what ideas the author is trying to convey and so he had this model where you know we have these little tiny uh elements that represent each of the pixels of each of the letters and then other ones that represent the line segments in them and other ones that represent the letters and other ones that represent the words and um at that time his idea was there's this set of experts there's an expert about how to construct a line out of pixels and another expert about which sets of lines go together to make which letters and another one about which letters go together to make which words and another one about what the meanings of the words are and another one about how the meanings fit together and you know things like that and all these experts are looking at this data and they're they're um updating hypotheses at at other levels so the word expert can tell the letter expert oh i think there should be a t there because i think there should be a word the here and the bottom-up sort of feature to letter expert could say i think there should be a t there too and if they agree then you see a t right and so there's a top-down bottom-up interactive process but it's going on at all layers simultaneously so everything can filter all the way down from the top as well as all the way up from the bottom and it's a completely interactive bi-directional parallel distributed process that somehow because of the abstractions is hierarchical so like yeah so there's different layers of responsibilities different levels of responsibilities first of all it's fascinating to think about it in this kind of mechanistic way so not thinking purely from the structure of a neural network or something like a neural network but thinking about these little little guys that work on letters and then the letters become words and words become sentences and uh that's a very
interesting hypothesis that from that kind of hierarchical structure can emerge uh understanding yeah so but the thing is though i want to just sort of relate this to the earlier part of the conversation um when rumelhart was first thinking about it there were these experts on the side one for the features and one for the letters and one for how the letters make the words and so on and and they would each be working sort of evaluating various propositions about you know is this combination of features here going to be one that looks like the letter t and so on and and what he realized kind of after reading hinton's dissertation and hearing about jim anderson's linear algebra-based neural network models that i was telling you about before was that he could replace those experts with neuron-like processing units which just would have their connection weights that would do this job so what ended up happening was that rumelhart and i got together and we created a model called the interactive activation model of letter perception which takes these little pixel level uh inputs constructs uh line segment features letters and words but now we built it out of a set of neuron-like processing units that are just connected to each other with connection weights so the unit for the word time has a connection to the unit for the letter t in the first position and the letter i in the second position and so on and because these connections are bi-directional if you have prior knowledge that it might be the word time that starts to prime the letters and the features and if you don't then it has to start bottom up but the directionality just depends on where the information comes in first and and if you have context together with features at the same time they can convergently result in an emergent perception and that um that was the um the piece of work that we did together that uh sort of got us both completely convinced that you know this neural
network way of thinking was going to be able to actually address the questions that we were interested in as cognitive psychologists so the algorithmic side the optimization side those are all details like when you first start the idea that you can get far with this kind of way of thinking that in itself is a profound idea so do you like the term uh connectionism to describe this kind of set of ideas i think it's useful it highlights the notion that the knowledge that the system exploits is in the connections between the units right there isn't a separate dictionary it's in the connections between the units so i already sort of laid that on the table with the connections from the letter units to the unit for the word time right the unit for the word time isn't a unit for the word time for any other reason than that it's got the connections to the letters that make up the word time those are the units on the input that excite it when it's excited it in a sense represents in the system that there's support for the hypothesis that the word time is present in the input um but the word time isn't written anywhere inside the model it's only written there in the picture we drew of the model to say that's the unit for the word time right yeah and um if if if somebody wants to ask me well how do you spell that word you have to use the connections from that out to to then get those letters for example that's such a that's a counter-intuitive idea we humans want to think in this logic way this this idea of connectionism it doesn't it's weird it's weird that this is how it all works yeah but let's go back to that cnn right that cnn with all those layers of neuron-like processing units that we were talking about before it's going to come out and say this is a cat that's a dog but it has no idea why it said that it's just got all these connections between all these layers of neurons like from the very first layer to the you know the like whatever these
layers are they just get numbered after a while because they you know they they somehow further in you go the more the more abstract the features are but it's a graded and continuous sort of process of abstraction anyway and you know it goes from very local very very specific to much more sort of global but it's still you know another sort of pattern of activation over an array of units and then at the output side it says it's a cat or it's a dog and when when we when i open my eyes and say oh that's lex or um oh you know there's my own dog and i recognize my dog which is a member of the same species as many other dogs but i know this one because of some slightly unique characteristics i don't know how to describe you know what it is that makes me know that i'm looking at lex or at my particular dog right yeah or even that i'm looking at a particular brand of car like i could say a few words about it but if i wrote you a paragraph about the car you you would have trouble figuring out which car is he talking about right so the idea that we have propositional knowledge of what it is that allows us to recognize that this is an actual instance of this particular natural kind is um has always been you know something that uh it never worked right you couldn't ever write down a set of propositions for you know visual recognition and and and so in that space it sort of always seemed very natural that something more implicit um you know you don't have access to what the details of the computation were in between you just get the result so that's the other part of connectionism you cannot you don't read the contents of the connections the connections only cause outputs to occur based on inputs yeah it's it's and for us that like final layer or some particular layer is very important the one that tells us that it's our dog or like it's a cat or a dog but you know each layer is probably equally as important in the grand scheme of things like there's no reason why the cat versus
dog is more important than the lower level activations it doesn't really matter i mean all of it is just this beautiful stacking on top of each other and we humans live in these particular layers for us for us it's useful to to survive to to use those cat versus dog predator versus prey all those kinds of things it's fascinating that it's all continuous but then you then ask you know the history of artificial intelligence you ask are we able to introspect and convert the very things that allow us to tell the difference between cat and dog into logic into formal logic that's been the dream i would say that's still part of the the dream of symbolic ai and i've recently talked to uh doug lenat who created cyc and that's that's a project that lasted for many decades and still carries a sort of dream in it right um but we still don't know the answer right it seems like connectionism is really powerful but it also seems like there's this building of knowledge and so how do we how do you square those two like do you think the connections can contain the depth of human knowledge and the depth of what uh dave rumelhart was thinking about of understanding well uh that remains the 64 thousand dollar question and um with inflation that number yeah maybe it's the 64 billion dollar question now uh you know i think that um from the emergence side which you know uh i placed myself on um so i i used to sometimes tell people i was a radical eliminative connectionist because i didn't want them to think that i wanted to build like anything into the machine but um i don't like the word eliminative uh anymore because it makes it seem like it's wrong to think that there is this emergent level of understanding and um i disagree with that so i think you know i would call myself a radical emergentist uh connectionist rather than eliminative connectionist right because i want to acknowledge that that these higher level kinds of aspects of our cognition are are real but they're not they're they don't they
don't exist as such and so there was an example that uh doug hofstadter used to use that i thought was helpful in this respect just the idea that we could think about sand dunes as entities and talk about like how many there are even um but we also know that a sand dune is a very fluid thing it's it's it's a it's a it's a pile of sand that is capable of moving around under the wind and the and and um you know reforming itself in somewhat different ways and and if we think about our thoughts as like sand dunes as being things that you know emerge from uh just the the way all the lower level elements sort of work together and and are constrained by external forces then we can we can say yes they exist as such but they they also you know we shouldn't treat them as completely monolithic entities that we we can understand without understanding sort of all of the stuff that allows them to change in the ways that they do and that's where i think the connectionist feeds into the into the cognitive it's like okay so if the under if the substrate is parallel distributed connectionist um then it doesn't mean that the contents of thought isn't you know like abstract and symbolic and um but it's more fluid maybe than uh is easy to capture with a set of logical expressions yeah that's a heck of a sort of thing to put at the top of a resume radical emergentist connectionist so i there is just like you said a beautiful dance between that between the machinery of intelligence like the neural network side of it and the stuff that emerges i mean the stuff that emerges seems to be um i don't know i don't know what that is that it seems like maybe all of reality is emergent what i what i think about this is made most distinctly rich to me when i look at cellular automata look at game of life there from very very simple things very rich complex things emerge that start looking very quickly like organisms that you forget that the forget how the actual thing operates they start
looking like they're moving around they're eating each other some of them are generating offspring it you forget very quickly and it seems like maybe it's something about the human mind that wants to operate in some layer of the emergent and forget about the the mechanism of how that emergence happens so i it just like you are in your radicalness i'm uh also it seems like unfair to eliminate the magic of that emergent like eliminate the the fact that that the emergence is real yeah no i agree i'm not that's why i got rid of eliminative right yeah yeah because it seemed like that was trying to say that you know it's all completely like an illusion of some kind well it it you know who knows whether there isn't there aren't some illusory characteristics there um and and i i think that uh philosophically um many people have have confronted that possibility over time but but uh it it's still important to um you know accept it as magic right so you know i think of fellini in this context i think of um others who have appreciated uh the role of magic uh of actual trickery in creating illusions that that move that move us you know and plato was on to this too it's like somehow or other these shadows you know give rise to something much deeper than that and and that's that's so you know we won't try to figure out what it is we'll just accept it as given that that that occurs and um you know but he was still on to the magic of it yeah yeah we won't try to really really really deeply understand how it works we just enjoy the fact that it's kind of fun okay but you uh worked closely with dave rumelhart he passed away as a human being what do you remember about him do you miss the guy absolutely you know he passed away um 15 ish years ago now and um his his demise was actually one of the most poignant and um you know like relevant uh tragedies um relevant to our conversation he started to undergo a progressive neurological condition that isn't fully understood that is
to say his particular course isn't fully understood um because certain you know brain scans weren't done in certain stages and no autopsy was done or anything like that the wishes of the family um so we don't know as much about the underlying pathology as we might but um i had begun to get interested in this neurological condition that might have been the very one that he was succumbing to as my own efforts to uh understand another aspect of this mystery that we've been discussing while he was beginning to get progressively more and more affected so i'm going to talk about the disorder and not about rumelhart for a second okay sure the disorder is something my colleagues and collaborators have chosen to call semantic dementia so it's a specific form of loss of mind related to meaning semantic dementia and it's progressive in the sense that the patient loses the ability to appreciate the meaning of the experiences that they have either from touch from sight from sound from language they i hear sounds but i don't know what they mean kind of thing um the so as as this illness progresses it starts with the patient being unable to um differentiate like similar breeds of dog or remember you know the the lower frequency unfamiliar categories that they used to be able to remember but as it progresses it it it becomes more and more striking and and you know the the patient loses the ability to recognize um you know things like pigs and goats and sheep and calls all middle-sized animals dogs and all can't recognize rabbits and and rodents anymore they call all the little ones cats and they can't recognize hippopotamuses and and cows anymore they call them all horses you know so there was this one patient who went through this progression where uh at a certain point any four-legged animal he would call it either a horse or a dog or a cat and if it was big he would tend to call it a horse if it was small he'd tend to call it a cat middle-sized ones he called dogs this is
just a part of the syndrome though it the the patient loses the ability to relate uh concepts to each other so my my collaborator in this work karalyn patterson developed a test called the pyramids and palm trees test so you give the patient a picture of pyramids and they have a choice which goes with the pyramids palm trees or pine trees and you know she showed that this wasn't just a matter of language because the patient's loss of this ability shows up whether you present the material with words or with pictures the pictures they can't put the pictures together with each other properly anymore they can't relate the pictures to the words either they can't do word picture matching but they've lost the conceptual grounding from either modality of input and um so it's that's why it's called semantic dementia the very semantics is disintegrating and and we we understand this in terms of our idea that distributed representation a pattern of activation represents the concepts really similar ones as you degrade them they start being you lose the differences and and then um so the difference between the dog and the goat sort of is no longer part of the pattern anymore and since dog is really familiar that's the thing that remains and and we understand that in the way the models work and learn but but rumelhart underwent this this condition so on the one hand it's a fascinating aspect of parallel distributed processing to me uh and it reveals this uh this sort of texture of distributed representation in a very nice way i've always felt but at the same time it was extremely poignant because this is exactly the condition that rumelhart was undergoing and there was a period of time when he was this man who had been the most focused um goal-directed competitive um thoughtful person who was willing to work for years to solve a hard problem you know he he he starts to disappear and there was a period of time when it was like hard for any of us to really appreciate that
he was sort of in some sense not fully there anymore do you know if he was able to introspect this um the dissolution of this you know the the understanding mind was he i mean this is one of the big scientists that thinks about this yeah was he able to look at himself and understand the fading mind you know um we can we can contrast um hawking and rumelhart in this way and i i like to do that to honor rumelhart because i think rumelhart is sort of like the hawking of you know cognitive science to me in some ways um but both of them suffered from a degenerative condition and in hawking's case it affected the motor system in in rumelhart's case it's it's affecting the semantics uh and um not not just the pure uh object semantics but maybe the self semantics as well and we don't understand that broadly but but but it's so i would say uh he didn't and this was part of what from the outside was a profound tragedy but but on the other hand at some level he sort of did because you know there was a period of time when it finally was realized that he had really become profoundly impaired this was clearly a biological condition and he wasn't you know it wasn't just like he was distracted that day or something like that so he retired uh you know from his professorship at stanford and he became um he he uh lived with his brother for a couple years and then he moved into a a facility for people with um cognitive impairments um a one that you know many elderly people end up in when they have cognitive impairments and i would spend time with him during that period this was like in the late 90s around 2000 even and you know i would we would go bowling and he could still bowl uh and um i after bowling i took him to lunch and i i said where would you like to go you want to go to wendy's and he said nah and i said okay well where you want to go and he he just pointed he said turn here you know so he still had a certain amount of spatial cognition and he could get me to the restaurant
and then when we got to the restaurant i i said what do you want to order and um he couldn't come up with any of the words but he knew where on the menu the thing was that he wanted so so fascinating it's it you know and he couldn't say what it was but he knew that that's what he wanted to eat and and so there was you know that it's it's it's like it isn't monolithic at all this the our cognition is is you know first of all graded in certain kinds of ways but also multipartite there's many elements to it and things uh certain sort of partial competencies still exist in the absence of of other aspects of these competencies so this is what always fascinated me about what uh used to be called cognitive neuropsychology you know the effects of brain damage on cognition but in particular this gradual disintegration part you know i'm a big believer that the loss of a human being that you value is as powerful as you know first falling in love with that human being i think it's all a celebration of the human being so the disintegration itself too is a celebration yeah yeah yeah and but just to say something more about the scientists and and the back propagation idea that you mentioned um so in in 1982 hinton had been there as a postdoc and organized that conference he'd actually gone away and gotten an assistant professorship and then um there was this opportunity to bring him back so jeff hinton was back on a sabbatical in san diego and uh rumelhart and i had decided we wanted to do this you know we thought it was really exciting and um our the papers on the interactive activation model that i was telling you about had just been published and we both sort of saw a huge potential for this work and and and jeff was there and so the three of us uh started a research group which we called the pdp research group and several other people came um francis crick who was at the salk institute heard about it from jeff um and because jeff was known among brits to be
brilliant and francis was well connected with his british friends so francis crick came and a heck of a group of people wow and uh uh several as paul smolensky um was one of the other postdocs he was still there as a postdoc and a few other people but anyway jeff talked to us about learning and how we should think about how you know learning occurs in a neural network and he said the problem with the way you guys have been approaching this is that you've been looking for inspiration from biology to tell you how what the rules should be for how the synapses should change the strengths of their connections how the connections should form he said that's the wrong way to go about it what you should do is you should think in terms of how you can adjust connection weights to solve a problem so you define your problem and then you figure out how the adjustment of the connection weights will solve the problem and rumelhart heard that and said to himself okay so i'm going to start thinking about it that way i'm going to essentially imagine that i have some objective function some goal of the computation i want my machine to correctly classify all of these images and i can score that i can measure how well they're doing on each image and i get some measure of error or loss it's typically called in in deep learning and um i'm going to figure out how to adjust the connection weights so as to minimize my loss or reduce the error uh and that's called you know gradient descent and engineers were already familiar with the concept of gradient descent and in fact there was an algorithm called the delta rule that had been invented by a professor in the engineering electrical engineering department at stanford uh widrow bernie widrow and a collaborator named hoff i never met him anyway so so gradient descent in continuous neural networks with multiple neuron-like processing units was already understood um for a single layer of connection weights we have some inputs
over a set of neurons we want the output to produce a certain pattern we can define the difference between our target and what the neural network is producing and we can figure out how to change the connection weights to reduce that error so what rumelhart did was to generalize that so as to be able to change the connections from earlier layers of units to the ones at a hidden layer between the input and the output and so he first called the algorithm the generalized delta rule because it's just an extension of the gradient descent idea and interestingly enough hinton was thinking that this wasn't going to work very well so hinton had his own alternative algorithm at the time based on the concept of the boltzmann machine that he was pursuing so the paper on the boltzmann machine came out in learning in boltzmann machines came out in 1985 but it turned out that backprop worked better than the boltzmann machine learning algorithm so this generalized delta algorithm ended up being called back propagation as you say back prop yeah and the you know probably that name is opaque to maybe what what does that mean what it what it meant was that in order to figure out what the changes you needed to make to the connections from the input to the hidden layer you had to back propagate the error signals from the output layer through the connections from the hidden layer to the output to get the signals that would be the error signals for the hidden layer and that's how rumelhart formulated it was like well we know what the error signals are at the output layer let's see if we can get a signal at the hidden layer that tells each hidden unit what its error signal is essentially so it's back propagating through the connections from the hidden to the output to get the signals to tell the hidden units how to change their weights from the input and that's why it's called back prop yeah but so it came from hinton having introduced the concept of you know define your objective function figure
out how to take the derivative so that you can um adjust the connections so that they make progress towards your goal so stop thinking about biology for a second and let's start to think about optimization and computation yeah a little bit more so what about jeff hinton what you've gotten a chance to work with him in that little the set of people involved there is quite incredible the small set of people under the pdp flag it's just given the amount of impact those ideas have had over the years it's kind of incredible to think about but you know just like you said uh like yourself jeffrey hinton is seen as one of the not just like a seminal figure in ai but just a brilliant person just a like the horsepower of the mind is pretty high up there for him because he's just a great thinker so what kind of ideas have you learned from him have you influenced each other on have you debated over what stands out to you in in in the full space of ideas here at the intersection of computation and cognition well so um jeff has said many things to me that had a profound impact on my thinking um and he's written several articles which um uh were way ahead of their time um he uh he had two papers in 1981 just to give one example uh one of which was essentially the idea of transformers and another of which was an early paper on semantic cognition which inspired uh him and rumelhart and me uh throughout the 80s and uh um you know still uh i think sort of grounds my own thinking about um the semantic aspects of of cognition he also in a in a small paper that was never published that he wrote in 1977 you know before he actually arrived at ucsd or maybe a couple of years even before that i don't know uh when he was a phd student he he um described how a neural network could do recursive computation and um it was a very clever idea that he's continued to explore over time which was sort of the idea that um when you when you call a subroutine you need to save the state that you had when
you called it so you can get back to where you were when you're finished with the subroutine and and the idea was that you would save the state of the calling routine by making fast changes to connection weights and then when you finished with the subroutine call those fast changes and the connection weights would allow you to go back to where you had been before and reinstate the previous context so that you could continue on with the the top level of the computation anyway that was part of the idea and um i always thought okay that's really you know he just he had extremely creative ideas that were uh quite a lot ahead of his time and many of them in the 1970s and early early 1980s so another thing about jeff hinton's way of thinking which has profoundly influenced my effort to understand human mathematical cognition is that he doesn't write too many equations and people tell stories like oh in in the hinton lab meetings you don't get up at the board and write equations like you do in everybody else's machine learning lab what you do is you draw a picture and and you know he he explains aspects of the way deep learning works by putting his hands together and showing you the shape of a ravine and um using that as a geometrical metaphor for the what's happening as this gradient descent process you're coming down the wall of a ravine if you take too big a jump you're going to jump to the other side and um so that's why we have to turn down the learning rate for example um and it it speaks to me of the fundamentally intuitive character of uh deep insight together with a commitment to really understanding um in a way that's absolutely ultimately explicit and clear uh but also intuitive yeah the there's certain people like that here's an example some kind of weird mix of uh visual and intuitive and all those kinds of things feynman is another example different style thinking but very unique and when you when you're around those people for me in the engineering realm
uh there's a guy named jim keller who's a chip designer engineer every time i talk to him it doesn't matter what we're talking about just having experience that unique way of thinking transforms you and makes your work much better and that's that's the magic you look at daniel kahneman you look at the great collaborations throughout the history of science that's the magic of that it's not always the exact ideas that you talk about but it's the process of generating those ideas being around that spending time with that human being you can come up with some brilliant work especially when it's cross-disciplinary as it was a little bit in your case yeah with jeff yeah um jeff is uh a descendant of the logician boole he comes from a long line of english academics and together with the um deeply intuitive thinking ability that he has he also um has uh you know it's been clear he's he's described this to me um and i think he's mentioned it from time to time in other interviews with that he's had with people um you know he's he's wanted to be able to sort of think of himself as contributing to the to the understanding of reasoning itself not just human reasoning like boole like is about logic right it's about what can we conclude from what else and how do we formalize that and um as a computer scientist uh logician philosopher you know um the goal is to understand how we derive truths from other from givens and things like this and and the work that jeff was doing in the um early to mid 80s on something called the boltzmann machine was his way of connecting with that boolean tradition and bringing it into the more continuous probabilistic graded constraint satisfaction realm um and it was it was um beautiful uh a set of ideas linked with theoretical physics um and um as well as with logic and um it it's always been i mean i've always been inspired by the boltzmann machine too it's it's like well if the neurons are probabilistic rather than you know deterministic in their
computations then you know that that maybe this somehow is part of the um serendipity or you know advantageousness of the moment of insight right it might not have occurred at that particular instant it might be sort of partially the result of a stochastic process and uh and and that too is part of the magic of the emergence of uh some of these things well you're right with the boolean lineage and the the dream of computer science is uh somehow i mean i certainly think of humans this way that humans are one particular manifestation of intelligence that there's something bigger going on and you're trying to you're hoping to figure that out the mechanisms of intelligence the mechanisms of cognition are much bigger than just humans yeah so i think of um i've i started using the phrase computational intelligence at some point as to characterize the the field that i thought you know people like jeff hinton um and many of the of the people i know at deepmind um are are working in and where i i feel like i'm um you know i'm a i'm a kind of a human-oriented computational intelligence researcher in that i'm actually kind of interested in the human solution but at the same time i i i feel like that's that's where um a huge amount of the the excitement of deep learning actually lies is in the idea that you know we may be able to even go beyond what we can achieve with our own nervous systems when we build computational intelligences that are um you know not limited in the ways that we are by our own biology perhaps allowing us to scale the very mechanisms of human intelligence just increase its power through scale yes and and i think that that you know obviously that's the that's being played out massively at google brain at open ai and to some extent at deepmind as well um i guess i shouldn't say to some extent yeah uh the the massive scale of the um computations that uh are used to succeed at games like go or to solve the protein folding problems that they've been solving
and so on still not as many uh synapses and neurons as the human brain so we still got we're still still beating them on that we humans are beating the ais but uh they're catching up pretty quickly you write about modeling of uh mathematical cognition so let me first ask about mathematics in general um there's a paper uh titled parallel distributed processing approach to mathematical cognition where in the introduction there's some beautiful discussion of mathematics and uh you reference there uh tristan needham who criticizes a narrow formal view of mathematics by likening the studying of mathematics as symbol manipulation to studying music without ever hearing a note so from that perspective what do you think is mathematics what is this world of mathematics like well i think of mathematics as a set of tools for exploring idealized worlds that often turn out to be extremely relevant to the real world but need not um but there are worlds in which objects exist with idealized properties and in which the relationships among them can be characterized with precision so as to allow the implications of certain facts to then allow you to derive other facts with certainty so you know if uh you have two triangles and you know that there is um uh an angle in the first one that has the same measure as an angle in the second one and you know that the lengths of the sides adjacent to that angle in each of the two triangles the corresponding sides adjacent to that angle also have the same measure then you can then conclude that the triangles are congruent that is to say they have all of their properties in common and and that is something about triangles it's not a matter of formulas these are idealized objects in fact you know we built bridges out of triangles and uh we understand how to measure the height of something we can't climb by um extending these ideas about triangles a little further and um uh you know all of the ability to um get a tiny speck of matter launched
from uh the planet earth to intersect with some tiny tiny little body way out in way beyond pluto somewhere at exactly a predicted time and date is is something that depends on these ideas right so but and it's actually uh happening in the real physical world that these ideas make contact with it uh in those kinds of instances um and um so but you know there are these idealized objects these triangles or these distances or these points whatever they are that um uh allow for this um set of tools to be created that then gives human beings the uh it's this incredible leverage that they didn't have without these concepts and uh i think this is actually already true when we think about just you know the natural numbers um i always like to include zero so i'm going to say the non-negative integers but that's that's the place where some people prefer not to include zero but uh yeah we like zero here natural numbers zero one two three four five six seven and so on yeah and and you know because they give you the ability to um be exact about um like how many sheep you have like you know i sent you out this morning there were 23 sheep you came back with only 22. 
what happened yeah right the fundamental problem of physics how many sheep you have yeah it's a fundamental problem of life of human uh society that you damn well better bring back the same number of sheep as you started with uh and you know it allows commerce it allows um contracts it allows the establishment of uh records and so on to have systems that allow these things to be notated but they they have um an inherent aboutness to them that's at one and the same time sort of abstract and idealized and generalizable while at the other on the other hand um potentially very very grounded and concrete and one of the things that makes for the incredible achievements of the human mind is the fact that humans invented these idealized systems that leverage the power of human thought in such a way as to allow all this kind of thing to happen and and so that's what mathematics to me is the development of systems for thinking about uh the properties and relations among uh sets of idealized objects and um uh you know the the mathematical notation system that we unfortunately focus way too much on is um just our way of expressing uh propositions about these properties right it's just just like we're talking with chomsky and language it's the thing we've invented for the communication of those ideas they're not necessarily the deep representation of those ideas yeah so what um what's uh what's a good way to model such powerful mathematical reasoning would you say what what are some ideas you have for capturing this in a model the insights that human mathematicians have had is a combination of the kind of the intuitive kind of connectionist like knowledge that makes it so that something is just like obviously true so that you don't have to think about why it's true that then makes it possible to then take the next step and ponder and reason and figure out something that you previously didn't have that intuition about it then ultimately becomes a part of the
intuition that the next generation of mathematical thinkers have to ground their own thinking on so that they can extend the ideas even further i came across this quotation from henri poincare while i was um walking in the in the woods with my wife in a state park in northern california uh late last summer and what it said on the bench was it is by logic that we prove but by intuition that we discover and so what what for me the the essence of the of the project is to understand how to bring the intuitive connectionist resources to bear on letting the intuitive discovery arise uh you know from engagement in thinking with this formal system so i i think of you know the ability of somebody like hinton or newton or einstein or rumelhart or poincare to um archimedes is another example right so suddenly a flash of insight occurs it's it's like the constellation of all of these simultaneous constraints that somehow or other causes the mind to settle into a novel state that it never did before and and give rise to a new idea um that you know then you can say okay well now how can i prove this you know how do i write down the steps of that theorem that that'll allow me to make it rigorous and certain and so i feel like the the kinds of things that we're beginning to see um deep learning systems do of their own accord kind of gives me this feeling of of um i don't know hope or encouragement that ultimately it'll all happen so in particular as many people now have have become really interested in thinking about you know neural networks that have been trained with massive amounts of text can be given a prompt and they can then sort of generate some really interesting fanciful creative story from that prompt um and there's there's kind of like a sense that they've somehow synthesized something like novel out of the you know all of the particulars of all of the billions and billions of experiences that went into the training data that that gives rise to something like this sort
of intuitive sense of what would be a a fun and interesting little story to tell or something like that it just sort of wells up out of the out of the letting the thing play out its own imagining of what somebody might say given this prompt as a as a input to get it to start to generate its own thoughts and and to me that that sort of represents the potential of capturing this the intuitive side of this yeah and there's other examples i don't know if you find them as captivating is you know on the deepmind side with alpha zero if you study chess the kind of solutions that has come up in terms of chess it is it there's novel ideas there it feels very uh like there's brilliant moments of insight and the mechanism they use if you think of search as as maybe more towards good old-fashioned ai and and then there's the connectionist the neural network that has the intuition of looking at a board looking at a set of patterns and saying how good is this set of positions and the next few positions how good are those and that's it no that's just an intuition uh yeah great grandmasters have this and understanding positionally tactically how good the situation is how can it be improved without doing this full like deep search um and then maybe doing a little bit of the what uh human chess players call calculation which is the search taking a particular set of steps down the line to see how they unroll but there there is moments of genius in those systems too so that's another hopeful illustration that from neural networks can emerge this novel creation of an idea yes and i think that you know i think demis hassabis is um you know he's spoken about those things he uh i heard him describe a move that was made in in one of the go matches against lee sedol in this very in a very similar way and um it caused me to become really excited to kind of collaborate with some of those guys at deepmind um so i think though that what what i like to really emphasize here is one part of what i
like to emphasize about mathematical cognition at least is that philosophers and logicians going back three or even a little more than 3000 years ago began to develop these formal systems and gradually the whole idea about thinking formally got constructed um and you know it's preceded euclid um certainly present in the work of thales and others and i'm not the world's leading expert in all the details of that history but euclid's elements were the the kind of the touch point of a of a coherent document that sort of laid out this idea of an actual formal system within which these objects were characterized and the um the system of uh inference that um allowed new truths to be derived from others was sort of like established as a paradigm and what what i find interesting is the idea that the ability to become a person who is capable of thinking in this abstract formal way is you know a result of the same kind of immersion uh in in experience thinking in that way that you know we now begin to think of our understanding of language as being right so we immerse ourselves in in a particular language in a particular world of objects and their relationships and we learn to talk about that and we develop intuitive understanding of the real world in in a similar way we can think that what academia has created for us what you know those early philosophers and their academies in athens and alexandria and others other places allowed was the development of these schools of thought modes of thought that that then become deeply ingrained and you know it becomes what it is that makes it so that somebody like jerry fodor would think that um systematic thought is the essential characteristic of the human mind as opposed to a derived and an acquired characteristic that results from acculturation in a certain mode that's been invented by humans would you say it's more fundamental than like language if we start dancing if we if we bring chomsky back into the conversation first of
all is it unfair to draw a line between mathematical cognition and language linguistic cognition i think that's a very interesting question and i think um it's one of the ones that i'm actually very interested in right now but i i think the answer is in important ways it is important to draw that line but then to come back and look at it again and see some of the subtleties and interesting aspects of the difference so if we think about chomsky himself he was born into an academic family his father was a professor of rabbinical studies at a small rabbinical college in philadelphia and he was deeply enculturated in uh you know a culture of thought and reason and brought to the effort to understand natural language this profound engagement with these formal systems and um you know i think that there was tremendous power in that and that chomsky had some amazing insights into the structure of natural language but that i'm going to use the word but there the actual intuitive knowledge of these things only goes so far and does not go as far as it does in people like chomsky himself and this was something that was discovered in the phd dissertation of lila gleitman who was actually trained in the same linguistics department with chomsky so what lila discovered was that the intuitions that linguists had about even the meaning of a phrase not just about its grammar but about what they thought a phrase must mean were very different from the intuitions of an ordinary person who wasn't a formally trained thinker and well it recently has become much more salient i happen to have learned about this when i myself was a phd student at the university of pennsylvania but um i never knew how to put it together with all of my other thinking about these things so so i actually currently have the hypothesis that formally trained linguists and other formally trained academics whether it be linguistics philosophy cognitive science computer science machine learning mathematics have a mode 
of engagement with experience that is intuitively deeply structured to be more organized around the systematicity uh and um ability to be conformed with the principles of a system than um then is actually true of the natural human mind without that immersion that's fascinating so the different fields and approaches with which you start to study the mind actually take you away from the natural operation of the mind so it makes it very difficult for you to to be somebody who introspects yes and you know this is where um uh things about human belief and so-called knowledge that we consider private not our business to manipulate in others we are not entitled to tell somebody else what to believe about certain kinds of things um what are those beliefs well they are the product of this sort of immersion and enculturation uh that is what i believe so and that's limiting it's it's something to be aware of does that limit you from uh having a good model of some of cognition you can so when you look at mathematical or linguistics so i mean what what is that line then what um so is chomsky unable to sneak up to the full picture of cognition are you when you're focusing on mathematical uh thinking are you also unable to do so i think you're you're right i think that's a great way of characterizing it and um i also think that um it's related to um the concept of beginner's mind uh and um another concept called the expert blind spot so the expert blind spot is much more prosaic seeming than than this point that you were just making but it's it's something that plagues experts when they try to communicate their understanding to non-experts and that is that things are self-evident to them that they they can't begin to even think about how they could explain it to somebody else because it's like well it's just like so patently obvious that it must be true and um you know like um when kronecker said god made the natural numbers all else is the work of man he was expressing that that 
intuition that um somehow or other you know the basic fundamentals of discrete quantities being countable and enumerable and you know indefinite in number um was was not something that had to be discovered um but he was wrong it turns out that many cognitive scientists agreed with him for a time there was a long period of time where there were where um you know the natural numbers were considered to be a part of the innate endowment of you know core knowledge or you know to use the kind of phrases that spelke and and carey use to talk about what they believe are the innate primitives of the human mind and um they no longer believe that they it's actually um been more or less accepted by almost everyone that the natural numbers are actually a cultural construction and it's it's so interesting to go back and sort of like study those few people who still exist who you know who don't have those systems so so this is just an example to me and where you know a certain mode of thinking about language itself or a certain mode of thinking about geometry and those kinds of relations so become so second nature that you don't know what it is that you need to teach and um and in fact we don't really teach it all that explicitly anyway and it's it's you know you take a math class the professor sort of teaches it to you the way they understand it some of the students in the class sort of like you know they get it they start to get the way of thinking and they can actually do the problems that get get put on the homework that the professor thinks are interesting and challenging ones but but but most of the students who don't kind of engage as deeply don't ever get you know and we we think oh that man must be brilliant he must have this special insight but i you know he must have some you know biological sort of bit that's different right that makes him so that he or she could have that insight but i i'm i although i don't want to dismiss biological individual differences
completely i i find it much more interesting to think about the possibility that um you know it was that difference in the dinner table conversation at the chomsky house when he was growing up that made it so that he had that cast of mind yeah and uh there's there's a few topics we talked about that kind of interconnect because because i wonder the better i get at certain things we humans the deeper we understand something what are you starting to then miss about the rest of the world we talked about david and his uh degenerative mind and you know when you look in the mirror and wonder how different am i am i cognitively from the man i i was a month ago from the man it was a year ago like what you know if i can um having thought about language if i'm chomsky for for 10 20 years what am i no longer able to see what is in my blind spot and how big is that and then to somehow be able to leap back out of your deep like structure that you form for yourself about thinking about the world leap back and look at the big picture again or jump out of the your current way of thinking um and to be able to introspect like what are the limitations of your mind are how is your mind less powerful than you used to be or more powerful or different powerful in different ways so that seems to be a difficult thing to do because we're living we're looking at the world through the lens of our mind right to step outside and introspect is difficult but it seems necessary if you want to make progress you know one of the threads of psychological research that's always been very um i don't know important to me to be aware of is is is the idea that our explanations of our own behavior aren't necessarily um actually part of the causal process that caused that behavior to occur or even valid observations of the set of constraints that led to the outcome but they are post-hoc rationalizations that we can give based on information at our disposal about what might have contributed to the result that 
we came to when asked and um so this this is an idea that was introduced in a very important paper by nisbett and wilson about you know the limits on our ability to to uh be aware of the factors that cause us to make the choices that we make um and um you know i think it's it's uh it's something that we really ought to be much more um cognizant of in general as human beings is that our own insight into exactly why we hold the beliefs that we do and we hold the attitudes and make the choices and and and feel the feelings that we do is not something that we um we totally control or totally observe and um it's subject to you know our culturally transmitted understanding of what it is that is the mode that we give to explain uh these things uh when asked to do so as much as it is about anything else and so even our ability to introspect and think we have access to our own thoughts is a product of of culture and uh belief you know practice so let me ask you the big uh question of advice so you've lived an incredible life in terms of the ideas you've put out into the world in terms of the trajectory you've taken through your career through your life what advice would you give to young people today in high school and college about um how to have a career or how to have a life they can be proud of finding the thing that you are intrinsically motivated to engage with and then celebrating that discovery is is what uh what it's all about when when i was in college i struggled with that i i um i had thought i wanted to be a psychiatrist because i think i was interested in human psychology in high school and it it at that time the only sort of information i had that had anything to do with the psyche was you know freud and erich fromm and sort of popular psychiatry kinds of things and so well they were psychiatrists right so i had to be a psychiatrist and that meant i had to go to medical school and i got to college and i find myself taking you know the first semester of a
three-quarter physics class and it was mechanics and this was so far from what it was i was interested in but it was also too early in the morning in the winter court semester so i i never made it to the physics class um but i wandered about the rest of my freshman year and um most of my sophomore year until uh i found myself in the midst of this situation where around me um there was this big revolution happening i was at columbia university in 1968 and the vietnam war is going on columbia's building a gym in morningside heights which is part of harlem and people are thinking oh the big bad rich guys are stealing the the park land that belongs to the people of harlem and um you know they're part of the military-industrial complex which is enslaving us and sending us all off to war in vietnam and um so there was a big revolution that involved a confluence of black activism and you know sds and social justice and the whole university blew up and got shut down and um i got a chance to sort of think about why people were behaving the way they were in this context and i you know i happen to have taken mathematical statistics i happened to have been taking psychology that quarter just psych one and somehow things in that space all ran together in my mind and got me really excited about about asking questions about why people what made certain people go into the buildings and not others and things like that and so suddenly i had a path forward that and i had just been wandering around aimlessly and at the different points in my career you know and i think okay well should i take this class or should i just read that book about some idea that i want to understand better you know or should i should i pursue the thing that excites me and interests me or should i you know meet some requirement you know that's i always did the latter so i ended up my my professors in psychology were thought i was great they wanted me to go to graduate school um they they nominated me for phi
beta kappa and i went to the phi beta kappa in the ceremony and this guy came up now he said oh are you magna or summa i wasn't even getting honors based on my grades they just happened to have thought i was interested enough in ideas to belong to phi beta kappa so i mean would it be fair to say you kind of stumbled around a little bit through accidents of too early morning of classes in physics and so on until you discovered intrinsic motivation as you mentioned and then that's it it hooked you and then you celebrate the fact that this happens to you human beings yeah like and what is it that made what i did intrinsically motivating to me well that's interesting and i don't know all the answers to it and i don't think uh i wanna i want anybody to think that um you should be sort of in any way i don't know sanctimonious or anything about it you know it's like i really enjoyed doing statistical analysis of data i really enjoyed running my own experiment which was what i got a chance to do in the psychology department that chemistry and physics had never i never imagined that mere mortals would ever do an experiment in those sciences except one that was in the textbook that you were told to do in lab class but in psychology we were already like even when i was taking psych one it turned out we had our own rat and we got to after two set experiments we got to okay do something you think of you know with your rat you know so it's the opportunity to do it myself yeah and and to to bring together a certain set of things that that engaged me intrinsically and and i think it it has something to do with why certain people turn out to be you know profoundly um amazing musical geniuses right they get immersed in it at an early enough point and it just sort of gets into the fabric so my my little brother had intrinsic motivation for music as we witnessed when he discovered how to put records on the phonograph when he was like 13 months old and recognize which one he wanted to
play not because he could read the labels because he could sort of see which ones had which scratches which were the different you know oh that's rhapsodie espagnole and that's oh wow you know and and and he enjoyed that that connected with him somehow yeah and and there was something that it fed into and you're extremely lucky if you have that and if you can nurture it and can let it grow and let it be be a important part of your life yeah those are those are the two things is like be attentive enough to to feel it when it comes like this is something special i mean i don't know uh for example i really like um tabular data like excel sheets like it it brings me deep joy i don't know how useful that is for anything but there's this i don't know what i'm talking about exactly so there's like a million not a million but there's a lot of things like that for me you have to hear that for yourself like be like realize this is really joyful but then the other part that you're mentioning which is the nurture is take time and stay with it stay with it a while and see where that takes you uh in life yeah and i think i think the um the the motivational engagement results in the immersion that then creates the opportunity to obtain the expertise so you know that we could call it there the mozart effect right i mean when i think about mozart i think about you know the person who was born as the fourth member of the family's string quartet right and uh and they handed him the violin when he was six weeks old all right start playing you know it's like and um so the the level of immersion there was was amazingly profound but uh hopefully he also had you know some something maybe this is where the more uh sort of the genetic part comes in sometimes i think uh you know something in him resonated to the music so that that the synergy of the combination of that was so powerful so so that's what i really consider to be the mozart effect it's sort of the the synergy of something with with
experience that that then results in the unique flowering of a particular you know mind um so i i know my siblings and i are all very different from each other we've all gone in our own different directions and you know i mentioned my younger brother who was very musical um i had my other younger brother was like this amazing like intuitive engineer um and um my sister one of my sisters was passionate about uh in you know water conservation well before it was a you know such a hugely important issue that it is today so we all sort of somehow these find a different thing um and uh i don't i don't mean to say it isn't uh tied in with something about about us biologically but but it's also when that happens where you can find that then you know you can do your thing and you can be excited about it so people can be excited about fitting people on bicycles as well as excited about making neural networks achieve insights into human cognition right yeah like for me personally i've always been excited about love and friendship between humans and just like the actual experience of it since i was a child just observing people around me and also been excited about robots and there's something in me that thinks i really would love to explore how those two things combine it doesn't make any sense a lot of it is also timing just to think of your own career in your own life you found yourself in certain places that happen to involve some of the greatest thinkers of our time and so it just worked out that like you guys developed those ideas and there may be a lot of other people similar to you and they were brilliant and they never found that right connection and place to where they their ideas could flourish so it's timing its place it's people and uh ultimately the whole ride you know it's uh undirected can ask you about something you mentioned in terms of psychiatry when you were younger because i had a similar experience of you know reading freud and uh carl jung and
just you know those kind of popular psychiatry ideas and that was a dream for me early on in high school to uh like i hope to understand the human mind by i somehow psychiatry felt like the right discipline for that does that make you sad that psychiatry is not the the mechanism by which you want to are able to explore the human mind so for me i was a little bit disillusioned because of how much prescription medication and biochemistry is involved in the discipline of psychiatry as opposed to the dream of the the freud like use the mechanisms of language to explore the human mind so that was a little disappointing and and that's why i kind of went to computer science and thinking like maybe you can explore the human mind by trying to build the thing yes i wasn't exposed to the um sort of the biomedical slash pharmacological aspects of psychiatry at that point because um i didn't i dropped out of that whole idea the physical pre-med that i never even found out about that until much later but you're absolutely right that's uh so i was actually a member of the um national advisory mental health council that is to say the board of scientists who advised the director of the national institute of mental health and that was around the year 2000 and in fact um at that time the man who came in as the new director i had been on this board for a year when he came in um okay schizophrenia is a biological illness it's a lot like cancer we've made huge strides in curing cancer and that's what we're going to do with schizophrenia we're going to find the medications that are going to cure this disease and we're not going to listen to anybody's grandmother anymore and um you know good old behavioral psychology is not something we're going to support any further and um you know he he uh completely alienated me from the institute and from all of its prior policies which had been much more holistic i think really at some level and and basically and the the other people on the board 
were like psychiatrists right uh very biological psychiatrist it didn't pan out right that that that nothing has changed in in our ability to uh to help people with mental illness uh and um so 20 years later that that that particular path uh was a dead end as far as i can tell well there's some aspect to and sorry to romanticize the whole philosophical conversation about the human mind but to me psychiatrists for a time held the flag of we're the deep thinkers in the same way that physicists are the deep thinkers about the nature of reality psychiatrists are the deep thinkers about the nature of the human mind and i think that flag has been taken from them and carried by people like you it's like it's more in the cognitive psychology especially when you have a foot in the computational view of the world because you can both build it you can like intuit about the functioning of the mind by building little models and be able to say mathematical things and then deploying those models especially in computers to say does this actually work they do a little like experiments and then some combination of neuroscience where you're starting to actually be able to observe you know do certain experiments on human beings and observe how the uh the brain is actually functioning and there using intuition you can start being the philosopher like richard feynman is the philosopher a cognitive psychologist can become the philosopher and psychiatrists become much more like doctors they're like very medical they help people with medication by biochemistry and so on but they are no longer the the the the book writers and the philosophers which of course i admire the i admire the richard feynman ability to do great low-level mathematics and physics and the high-level philosophy yeah i think it was uh fromm and jung more than freud that was sort of initially kind of like made me feel like oh this is really amazing and interesting and i want to explore it further i actually when i got to
college and i lost that thread i i found more of it in sociology and literature than i did in any place else so i took quite a lot of both of those disciplines as an undergraduate and you know i was actually deeply ambivalent about the psychology because i was doing experiments after the initial flurry of interest in why people would occupy buildings during an insurrection and consider you know uh to be be sort of like so over committed to their beliefs but i ended up in in the psychology laboratory running experiments on pigeons and and so i had these profound sort of like dissonance between okay the kinds of issues that would be explored when i was thinking about uh what i read about in in modern british literature um versus what i could study with my pigeons in the laboratory that got resolved when i went to graduate school and i discovered cognitive psychology and and so for me that was uh that was the path out of this sort of like extremely sort of um ambivalent divergence between the interest in the human condition and and uh the desire to do you know actual mechanistically oriented thinking about it um and i think we we've come a long way in that regard and that uh is you're absolutely right that nowadays this is something that's accessible to people through the pathway in through computer science or the pathway in through uh neuroscience you know you can get derailed in neuroscience down to the bottom of the system where you might find the cures of various conditions but you don't get a chance to think about the higher level stuff so it's in the systems in cognitive neuroscience and computational intelligence miasma up there at the top that i think these opportunities are most are richest uh right now and um so yes i am indeed blessed by having had the opportunity to fall into that space so you mentioned the human condition speaking of which you happen to be a human being who is unfortunately not immortal that seems to be a fundamental part of the human
condition that this ride ends do you think about the fact that you're going to die one day are you afraid of death uh i i would say that i am not as much afraid of death as i am of um degeneration uh and uh i say that in part for reasons of having you know seen some tragic degenerative situations uh unfold it's exciting when you can continue to participate and uh feel like you're you're near the the place where the the wave is breaking on the shore i feel like you know um and and i i i think about you know my own uh future potential um if if i were to undergo a uh begin to suffer from dementia uh alzheimer's disease or semantic dementia or some other condition you know i would sort of gradually lose the thread of that ability and so so one can live on for several for a decade after you know sort of having to retire because one no longer has uh these kinds of um abilities to engage and uh i think that's the thing that i feared the most the losing of that like that that um the the breaking of the wave the flourishing of the mind where you could have these ideas and they're swimming around you're able to play with them yeah and and and and collaborate with other people who you know are themselves uh um really helping to push these ideas forward so yeah what about the edge of the cliff the end i mean the the mystery of it the i mean my graded sort of conception of mind and you know sort of continuous sort of way of thinking about most things makes it so that uh to to me the the the um the discreteness of that transition is less less less apparent than it seems to be to most people i see i see yeah um yeah i wonder so i don't know if you know the work of ernest becker and so on i wonder what what role mortality and our ability to be cognizant of it and anticipate it and perhaps be afraid of it what role that plays in in our reasoning of the world i think that it it can be motivating to people to think they have a limited period left um i think in in my own case you
know it it's like seven or eight years ago now that i was i was sitting around doing experiments on decision making that were satisfying in a certain way because i could really get closure on what whether the model fit the data perfectly or not and i could see how one could test you know the predictions in monkeys as well as humans and really see what the neurons were doing but i just realized hey wait a minute you know i may only have about 10 or 15 years left here and i don't feel like i'm getting towards the answers to the really interesting questions while i'm doing this this particular level of work and that's when i said to myself okay um let's pick something that's hard you know so that's when i started working on mathematical cognition and um i i think it was more in terms of well i got 15 more years possibly of useful life left let's imagine that it's only 10. i'm actually getting close to the end of that now maybe three or four more years um but i'm beginning to feel like well i probably have another five after that so okay i'll give myself another another six or eight um but a deadline is a little bit like and that's not gonna go on forever yeah and so um so uh yeah i gotta keep um thinking about the questions that i think are the interesting and important ones for sure what do you hope your legacy is you've done some incredible work in your life as a man as a scientist when the aliens and the human civilization is long gone and the aliens are reading the encyclopedia about the human species what do you hope is the paragraph written about you i would wanted to sort of highlight a couple things that i was you know able to see um one path that was more exciting to me than the one that seemed already to be there for a cognitive psychologist you know but not for any super special reason other than that i'd had the right context prior to that but that i had gone ahead and followed that lead you know and then i forget the exact wording but i i said uh in this 
preface that the the joy of science is the moment in which you know a partially formed thought in the mind of one person gets crystallized a little better in the discourse and becomes the foundation of some exciting concrete piece of actual scientific progress and i feel like that you know moment happened when rumelhart and i were doing the interactive activation model and when rumelhart heard hinton talk about gradient descent and having the objective function to guide the learning process and um it it happened a lot in that period and i i sort of seek that kind of thing in my uh collaborations with my students right so um you know the idea that this is a person who contributed to science by finding exciting collaborative opportunities to engage with other people through is something that i certainly hope is part of the paragraph and uh like you said taking a step maybe in directions that are not not obvious so it's the the old robert frost road less taken so maybe because you said like this incomplete initial idea that step you take is a little bit uh off the beaten path if if i could just say one more thing here i uh this was something that really contributed to energizing me in a way that i uh that i feel it would be useful to share i uh my my phd dissertation project was completely empirical experimental project and i i wrote a paper based on the the two main experiments that were the core of my dissertation and i submitted it to a journal and at the end of the paper i had a little section where i laid out my the beginnings of my theory about what i thought was going on that would explain the data that i had collected and i had submitted the paper to the journal of experimental psychology so i got back a letter from the editor saying thank you very much these are great experiments we'd love to publish them in the journal but what we'd like you to do is to leave the theorizing to the theorists and take that part out of the paper and so i did i took that 
part out of the paper but you know i almost found myself labeled as a non-theorist right by this uh and um i could have like succumbed to that and said okay well i guess my job is to just go on and do experiments right but but uh that's not what i wanted to do and and so when i when i got to my assistant professorship um although i continued to do experiments because i knew i had to get some papers out i also at the end of my first year submitted my first article to psychological review which was the theoretical journal where i took that section and elaborated it and wrote it up and submitted it to them and they didn't accept that either but they said oh this is interesting you should keep thinking about it this time and then that was what got me going to think okay you know so it's not a superhuman thing to contribute to the development of theory you know you don't have to be you can do it as a mere mortal and the broader i think lesson is don't succumb to the labels of a particular or anybody labeling you right you know exactly i mean that yeah exactly and then you especially as you become successful you'll label labels get assigned to you for that you're successful for that connectionist cognitive scientist and not a neuroscientist and then you can you can completely that's just that's the stories of the past you're today a new person that can completely revolutionize in totally new areas so don't let those labels um hold you back well let me ask the big question um when you look at into you said it started with columbia trying to observe these humans and they're doing weird stuff and you want to know why are they doing this though so let's zoom out even bigger at the 100 plus billion people who've ever lived on earth why do you think we're all doing what we're doing what do you think is the meaning of it all the big why question we seem to be very busy doing a bunch of stuff and we seem to be kind of directed towards somewhere but why well um i myself think that 
we make meaning for ourselves and that um we find inspiration in the meaning that other people have made in the past uh you know and the great uh religious thinkers uh of the first millennium bc and you know a few few that came in the early part of the second uh millennium uh you know laid down some important foundations for us um but i i i do believe that you know we are uh an emergent uh result of a process that happened naturally without guidance and um that meaning is what we make of it and that the creation of uh efforts to refine meaning in um like religious traditions and so on is just a part of the expression of that of that goal that we have to you know not not find out what the meaning is but to make it ourselves and um so to me it's something that's very personal it's very individual it's like meaning will come for you through the particular combination of synergistic elements that are your fabric and your experience and your um context and your and um you know you should it's it's it it's all made in a in a certain kind of a local context though right it's what here i am at ucsd with this brilliant man rumelhart uh who's having you know these doubts about um symbolic artificial intelligence that resonate with my desire to see it grounded in the biology and um uh let's make the most of that you know yeah and so and so from that like little pocket there's some kind of uh peculiar little emergent process that then uh which is basically each one of us each one of us humans is a kind of you know you think cells and they come together and it's an emergent process that then tells fancy stories about itself and then gets just like you said just enjoys the beauty of the stories we tell about ourselves it's an emergent process that lives for a time uh is defined by its local pocket and context uh in time and space and then tells pretty stories and we write those stories down and then we celebrate how nice the stories are and then it continues because we build 
stories on top of each other and eventually we'll colonize hopefully other planets other solar systems other galaxies and will tell even better stories but it all starts uh here on earth jay you're speaking of uh peculiar emergent processes that lived one heck of a story you're you're one of the the great scientists of cognitive uh science of psychology of computation it's a huge honor you would talk to me today that you spend your very valuable time i really enjoy talking with you and thank you for all the work you've done i can't wait to see what you do next well thank you so much and i uh you know this has been an amazing opportunity for me to let ideas that i've never fully expressed before come out because you asked such a wide range of um you know the deeper questions that we're all we've all been thinking about for so long so thank you very much for that thank you thanks for listening to this conversation with jay mcclelland to support this podcast please check out our sponsors in the description and now let me leave you with some words from geoffrey hinton in the long run curiosity driven research works best real breakthroughs come from people focusing on what they're excited about thanks for listening and hope to see you next time