Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs | Lex Fridman Podcast #426
F3Jd9GI6XqE • 2024-04-17
Naively, I certainly thought that all humans would have words for exact counting, and the Pirahã don't. Okay, so they don't have any words for even one. There's not a word for "one" in their language, and so there's certainly not a word for two, three, or four. So that kind of blows people's minds, often.

Yeah, that's blowing my mind. That's pretty weird. How are you going to ask, "I want two of those"?

You just don't. And so that's just not a thing you can possibly ask in Pirahã. It's not possible; there are no words for that.

The following is a conversation with Edward Gibson, or Ted, as everybody calls him. He is a psycholinguistics professor at MIT. He heads the MIT Language Lab, which investigates why human languages look the way they do, the relationship between culture and language, and how people represent, process, and learn language. Also, he should have a book titled Syntax: A Cognitive Approach, published by MIT Press, coming out this fall, so look out for that. This is the Lex Fridman Podcast. To support it, please check out our sponsors in the description. And now, dear friends, here's Edward Gibson.

When did you first become fascinated with human language?

As a kid in school, when we had to structure sentences in English grammar, I found that process interesting. I found it confusing as to what it was I was told to do. I didn't understand what the theory was behind it, but I found it very interesting.

So when you look at grammar, you're almost thinking about it like a puzzle, almost like a mathematical puzzle?

Yeah, I think that's right. I didn't know I was going to work on this at all at that point. I was kind of a math geek, a computer scientist. I really liked computer science, and then I found language as a neat puzzle to work on from an engineering perspective, actually. I sort of accidentally, well, after I finished my undergraduate degree, which was computer science and math in Canada at Queen's
University, I decided to go to grad school. That's what I always thought I would do. I went to Cambridge, where they had a master's program in computational linguistics, and I hadn't taken a single language class before. All I had taken was CS, computer science, and math classes, pretty much, as an undergrad. And I just thought, oh, this was an interesting thing to do for a year, because it was a single-year program. And then I ended up spending my whole life doing it.

So fundamentally, your journey through life was one of a mathematician and a computer scientist, and then you kind of discovered the puzzle, the problem of language, and approached it from that angle, to try to understand it almost like a mathematician, or maybe even an engineer.

As an engineer, I'd say. I mean, to be frank, I had taken an AI class, I guess it was '83 or '84 or '85, somewhere in there, a long time ago, and there was a natural language section in there, and it didn't impress me. I thought there must be more interesting things we can do. It seemed just a bunch of hacks to me. It didn't seem like a real theory of things in any way. And so I just thought this seemed like an interesting area where there wasn't enough good work.

Did you ever come across the philosophy angle of logic? So if you think about the '80s with AI, the expert systems, where you try to kind of maybe sidestep the poetry of language and some of the syntax and the grammar and all that kind of stuff, and go to the underlying meaning that language is trying to communicate, and try to somehow compress that in a computer-representable way. Did you ever come across that in your studies?

I mean, I probably did, but I wasn't as interested in it. I was trying to do the easier problems first, the ones I thought maybe were handleable. It seems like the syntax is easier, which is just the forms as opposed to the meaning. When you're starting
talking about the meaning, that's a very hard problem, and it still is a really, really hard problem. But the forms are easier. And so I thought at least figuring out the forms of human language, which sounds really hard, is actually maybe more tractable.

So it's interesting. You think there is a big divide, there's a gap, there's a distance between form and meaning, because that's a question you have discussed a lot with LLMs, because they're damn good at form.

Yeah, I think that's what they're good at, is form, exactly. And that's why they're good, because they can do form. Meaning's hard.

Oh, wow. And I mean, it's an open question, right, how close form and meaning are. We'll discuss it. But to me, studying form, maybe it's a romantic notion, form is like the shadow of the bigger meaning thing underlying language. Language is how we communicate ideas. We communicate with each other using language. So in understanding the structure of that communication, I think you start to understand the structure of thought and the structure of meaning behind those thoughts and communication, to me. But to you, big gap?

Yeah.

What do you find most beautiful about human language? Maybe the form of human language, the expression of human language.

What I find beautiful about human language is some of the generalizations that happen across the human languages, within and across a language. So let me give you an example of something which I find kind of remarkable. That is, if a language has a word order such that the verbs tend to come before their objects, and English does that. So the subject comes first in a simple sentence. So I say, you know, "The dog chased the cat" or "Mary kicked the ball." So the subject's first, and then after the subject there's the verb, and then we have objects. All these things come after in English. So it's generally a verb, and most of the
stuff that we want to say comes after the subject: it's the objects. There's a lot of things we want to say that come after. And there's a lot of languages like that. About 40% of the languages of the world look like that. They're subject-verb-object languages. And then these languages tend to have prepositions, these little markers on the nouns that connect nouns to other nouns or nouns to verbs. So a preposition like "in" or "on" or "of" or "about": I say "I talk about something," and the "something" is the object of that preposition. These little markers also, just like verbs, come before their nouns.

Okay, so now we look at other languages, like Japanese or Hindi. These are so-called verb-final languages. Those are maybe a little more than 40%, maybe 45% of the world's languages, or more; I mean, 50% of the world's languages are verb-final. Those tend to have postpositions. They have the same kinds of markers as we do in English, but they put them after the noun. So instead of "talk about a book," you say "book about," the opposite order, in Japanese or in Hindi. And the "talk" comes at the end, so the verb will come at the end as well. So instead of "Mary kicked the ball," it's "Mary ball kicked." And if it says "Mary kicked the ball to John," it's "John to": the "to," the little marker there, the preposition, is a postposition in these languages.

And so the fascinating thing to me is that within a language, this order aligns. It's harmonic. So it's either verb-initial or verb-final, and then you'll have prepositions or postpositions, respectively. And that's across the languages that we can look at. There's around 7,000
languages on the Earth right now, but we have information about, say, word order on around a thousand of those, a pretty decent amount of information. And for those thousand which we know about, about 95% fit that pattern. It's about half and half: half are verb-initial, like English, and half are verb-final, like Japanese.

So just to clarify, verb-initial is subject-verb-object?

That's correct.

Verb-final is still subject-object-verb?

That's correct. Yeah, the subject is generally first.

That's so fascinating. "I ate an apple" or "I apple ate."

Yes.

Okay, and it's fascinating that there's a pretty even division in the world amongst those, 40%, 45%.

Yeah, it's pretty even. And those two are the most common by far, those two word orders. The subject tends to be first. There are so many interesting things, but the thing I find so fascinating is that there are these generalizations within and across a language. And there's actually a simple explanation, I think, for a lot of that, and that is you're trying to minimize dependencies between words. That's basically the story, I think, behind a lot of why word order looks the way it does. I'm talking to you in sentences, you're talking to me in sentences. These are sequences of words which are connected, and the connections are dependencies between the words. And it turns out that what we're trying to do in a language is actually minimize the length of those dependency links. It's easier for me to say things if the words that are connecting for their meaning are close together. It's easier for you in understanding if that's also true. If they're far away, it's hard to produce that, and it's hard for you to understand. And the languages of the world, within a language and across languages, fit that generalization. So it turns out that
having verbs initial and then having prepositions ends up making dependencies shorter, and having verbs final and having postpositions ends up making dependencies shorter, than if you cross them. If you cross them, it's possible, you can do it within a language, it just ends up with longer dependencies than if you didn't. So languages tend to go that way. They call it harmonic. It was observed a long time ago, without the explanation, by a guy called Joseph Greenberg, who's a famous typologist from Stanford. He observed a lot of generalizations about how word order works, and these are some of the harmonic generalizations that he observed.

Harmonic generalizations about word order. There are so many things I want to ask you. Okay, let me just ask some basics. You mentioned dependencies a few times. What do you mean by dependencies?

Well, what I mean is, in language there are kind of three components to the structure of language. One is the sounds. So "cat" is "k," "a," and "t" in English. I'm not talking about that part. Then there are two meaning parts, and those are the words, and you were talking about meaning earlier. So words have a form, and they have a meaning associated with them. And so "cat" is a full form in English, and it has a meaning associated with whatever a cat is. And then the combinations of words, that's what I'll call grammar or syntax. That's like when I have a combination like "the cat" or "two cats," where I take two different words and put them together, and I get a compositional meaning from putting those two different words together. And so that's the syntax. And in any sentence or utterance, whatever, I'm talking to you, you're talking to me, we have a bunch of words and we're putting them together in a sequence. It turns out they are connected, so that every word is connected to just one other word in that
sentence. And so you end up with what's called, technically, a tree. It's a tree structure, where there's a root of that utterance, of that sentence, and then there's a bunch of dependents, like branches from that root, that go down to the words. The words are the leaves in this metaphor for a tree.

So a tree is also sort of a mathematical construct, a graph-theoretical thing. It's fascinating that you can break down a sentence into a tree, and then every word is hanging on to another, is depending on it.

Right, and everyone agrees on that. So all linguists will agree with that.

Not controversial?

That is not controversial.

There's nobody sitting here mad at you?

I don't think so.

Okay, there's no linguist sitting there mad at this.

I think everyone agrees that all sentences are trees at some level.

Can I pause on that? Because to me, just as a layman, it's surprising that you can break down sentences in many, most, all languages into a tree.

I think so. I've never heard of anyone disagreeing with that.

That's weird.

The details of the trees are what people disagree about.

Well, okay, so what's at the root of a tree? How do you construct it? How hard is it? What is the process of constructing a tree from a sentence?

Well, this is where, you know, there are different theoretical notions. I'm going to say the simplest thing, dependency grammar. A bunch of people invented this. Tesnière was the first, a French guy. I mean, the paper was published in 1959, but he was working on it in the '30s and stuff. And it goes back to, you know, the philologist Pāṇini, who was doing this in ancient India. And so, doing something like this, the simplest thing we can think of is that there are just connections between the words to make the utterance. And so let's just say I have, like, "Two dogs entered a room." Okay, here's a sentence. And so
we're connecting "two" and "dogs" together. There's some dependency between those words to make some bigger meaning. And then we're connecting "dogs" now to "entered," right? And we connect "a room" somehow to "entered," and so I'm going to connect "a" to "room" and then "room" back to "entered." That's the tree. The root is "entered." The thing is like an entering event, that's what we're saying here. And the subject, which is whatever that dog is, is "two dogs," and the connection goes back to "dogs," which then goes back to "two." That's my tree. It starts at "entered," goes to "dogs," down to "two," and on the other side, after the verb, the object, it goes to "room," and then that goes back to the determiner or article, whatever you want to call that word. So there's a bunch of categories of words here we're noticing. There are verbs; those are these things that typically refer to events and states in the world. And there are nouns, which typically refer to people, places, and things, is what people say, but they can refer to events themselves as well. What the category, the part of speech, of a word is, is how it gets used in language. That's how you decide what the category of a word is: not by the meaning, but by how it gets used.

How it's used. What's usually the root? Is it going to be the verb that defines the event?

Usually, yes. I mean, if I don't say a verb, then there won't be a verb, and so it'll be something else.

Are we talking about language that's, like, correct language? What if you're doing poetry and messing with stuff? Then rules go out the window, right?

No, no, no, you're still constrained by whatever language you're dealing with. Probably you have other constraints in poetry, such that, usually in poetry, there are
multiple constraints. You usually want to convey multiple meanings, is the idea, and maybe you have a rhythm or a rhyming structure as well. But you usually are constrained by the rules of your language, for the most part, and so you don't violate those too much. You can violate them somewhat, but not too much, so it has to be recognizable as your language. Like in English, I can't say "dogs two entered room a." I mean, I meant "two dogs entered a room," and I can't mess with the order of the articles and the nouns. You just can't do that. In some languages, you can mess around with the order of words much more. I mean, you speak Russian. Russian has a much freer word order than English. I told you that English has subject-verb-object word order; so does Russian, but Russian is much freer than English, and so you can actually mess around with the word order. So probably Russian poetry is going to be quite different from English poetry, because the word order is much less constrained.

Yeah, there's a much more extensive culture of poetry throughout the history of the last 100 years in Russia, and I always wondered why that is. But it seems that there's more flexibility in the way the language is used. You're morphing the language more easily by altering the words, altering the order of the words, messing with it.

Well, you can just mess with different things in each language. And so in Russian you have case markers, right, on the end, which are these endings on the nouns which tell you how each noun connects to the verb, right? We don't have that in English. And so when I say "Mary kissed John," I don't know who the agent or the patient is, except by the order of the words, right? In Russian, you actually have a marker on the end. If you're using a Russian name, each of those names will also say, is it, you know, agent, it'll be the
nominative, which is marking the subject, or an accusative will mark the object. And you could put them in the reverse order. You could put the accusative first. You could put the patient first, and then the verb, and then the subject, and that would be a perfectly good Russian sentence. I could say "John kissed Mary," meaning Mary kissed John, as long as I use the case markers in the right way. You can't do that in English.

I love the terminology of agent and patient, and the other ones you used. Those are sort of linguistic terms, correct?

Those are for, like, meaning. And subject and object are generally used for position. So the subject is just the thing that comes before the verb, and the object is the one that comes after the verb. The agent is kind of like the thing doing it. That's kind of what that means, right? The subject is often the person doing the action, right?

Okay, this is fascinating. So how hard is it to form a tree in general? Is there a procedure to it? If you look at different languages, is it supposed to be very natural? Is it automatable, or is there some human genius involved?

I think it's pretty automatable at this point. People can figure out what the words are. They can figure out the morphemes. Technically, morphemes are the minimal meaning units within a language. And so when you say "eats" or "drinks," it actually has two morphemes in English. There's the root, which is the verb, and then there's some ending on it which tells you, you know, that's third person singular.

Say what morphemes are.

Morphemes are just the minimal meaning units within a language. And a word is just kind of the things we put spaces between in English, and they have a little bit more. They have the morphology as well. They have the endings, this inflectional morphology, on the roots.
They modify something about the word that adds additional meaning?

Yeah, yeah. And so we have a little bit of that in English, very little, much more in Russian, for instance. We have a little on the nouns: you can say it's either singular or plural. And same thing for verbs, like simple past tense, for example. Notice in English we say "drink," "drinks": "he drinks," but everyone else is "I drink," "you drink," "we drink." It's unmarked, in a way. But in the past tense, it's just "drank" for everyone. There is morphology, it's marking past tense, but it's an irregular. "Drink" to "drank," you know, it's not even a regular word. In most verbs, many verbs, there's an "-ed" we add, so "walk" to "walked"; we add that to say it's the past tense. I just happened to choose an irregular because it's a high-frequency word, and the high-frequency words tend to have irregulars in English.

What's an irregular?

Irregular is just, there isn't a rule. So "drink" to "drank" is an irregular.

"Drink," "drank." Okay, as opposed to "walk," "walked," "talk," "talked."

And there are a lot of irregulars in English. The frequent ones, the common words, tend to be irregular. There are many, many more low-frequency words, and those tend to be the regular ones.

The evolution of the irregulars is fascinating. It's essentially slang that's sticky, because you're breaking the rules, and then everybody uses it and doesn't follow the rules. They say screw it to the rules. It's fascinating. So you said morphemes. Lots of questions. So morphology is what, the study of morphemes?

Morphology is the connections between the morphemes onto the roots. So in English, we mostly have suffixes. We have endings on the words, not very much, but a little bit,
as opposed to prefixes. Words, depending on your language, can have mostly prefixes, mostly suffixes, or both. And then several languages have things called infixes, where you have some kind of a general form for the root and you put stuff in the middle; you change the vowels.

That's fascinating. So wait, in general there's what, two morphemes per word? Usually one or two or three?

Well, in English it tends to be one or two. There can be more. In other languages, a language like Finnish, which has a very elaborate morphology, there may be 10 morphemes on the end of a root, and so there may be millions of forms of a given word.

Okay. I will ask the same question over and over, but sometimes, to understand things like morphemes, it's nice to just ask the question: how do these kinds of things evolve? So you have a great book studying sort of the cognitive processing, how language is used for communication, the mathematical notion of how effective language is for communication, what role that plays in the evolution of language. But just high level, how does a language evolve where English has one or two morphemes per word and then Finnish has infinity per word? How does that happen?

That's a really good question. Yeah, that's a very good question: why do languages have more morphology versus less morphology? And I don't think we know the answer to this. I think there are just a lot of good solutions to the problem of communication. I believe, as you hinted, that language is an invented system by humans for communicating their ideas. And I think it comes down to: we label things we want to talk about. Those are the morphemes and words. Those are the things we want to talk about in the world,
and we invent those things, and then we put them together in ways that are easy for us to convey, to process. But that's like a naive view, and, I mean, I think it's probably right.

It's naive and probably right?

Well, I don't know if it's naive. I think it's simple.

Simple, yeah. I think naive is an indication that it's incorrect somehow, that it's trivial, too simple. I think it could very well be correct. But it's interesting how sticky it is. It just feels like once you figure out certain aspects of a language, that just becomes sticky, and the tribe forms around that language. Or maybe the tribe forms first, and then the language evolves, and then you just kind of agree and you stick to whatever that is.

I mean, these are very interesting questions. We don't really know very much about how even words get invented. Assuming they get invented, we don't really know how that process works and how these things evolve. What we have is kind of a current picture of a few thousand languages, a few thousand instances. We don't have any pictures of how these things are evolving, really. And then the evolution is massively confused by contact, right? As soon as one group runs into another, humans are smart, and they take on whatever is useful in the other group. And so any kind of contrast which you're talking about which I find useful, I'm going to start using as well. I worked a little bit in specific areas of words, in number words and in color words. And in color words, in English we have around 11 words that everyone knows for colors, and many more if you happen to be interested in color for some reason or other. If you're a fashion designer or an artist or something, you may have many, many more words. But we can see millions,
like, if you have normal color vision, normal trichromatic color vision, you can see millions of distinctions in colors. So we don't have millions of words. The most detailed color vocabulary would have over a million terms to distinguish all the different colors that we can see, but of course we don't have that. So somehow it's been useful for English to have evolved in some way so that there are 11 terms that people find useful to talk about: black, white, red, blue, green, yellow, purple, gray, pink, and I probably missed something there. Anyway, there are 11 that everyone knows. But you go to different cultures, especially the non-industrialized cultures, and there'll be many fewer. Some cultures will have only two, believe it or not. The Dani in Papua New Guinea have only two labels that the group uses for color, and those are roughly black and white. They are very, very dark and very, very light, which are roughly black and white. And you might think, oh, they're dividing the whole color space into light and dark or something, and that's not really true. They mostly just label the black and the white things. They just don't talk about the colors for the other ones. And then there are other groups. I've worked with a group called the Tsimane', down in Bolivia in South America, and they have three words that everyone knows, but there are a few others that many people know. And so, depending on how you count, they have between three and seven words that the group knows. And again, they're black and white, everyone knows those, and red. Red tends to be the third word that cultures bring in. If there's a third word, it's always red. And then after that, all bets are off about what they bring in. And so after that, they
bring in a sort of big blue-green space, "grue"; they have one for that. And then different people have different words that they'll use for other parts of the space. So anyway, it's probably related to what they want to talk about, not what they see, because they see the same colors as we see. It's not like they have a weak, low color palette in the things they're looking at. They're looking at a lot of beautiful scenery, a lot of different-colored flowers and berries and things, and so there are lots of things of very bright colors, but they just don't label the color in those cases. And the reason, we don't know this, but we think probably what's going on here is that why you label something is you need to talk to someone else about it. And why do I need to talk about a color? Well, if I have two things which are identical, and I want you to give me the one that's different, and the only way it varies is color, then I invent a word which tells you this is the one I want. So I want the red sweater off the rack, not the green sweater, right? There are two, and those things will be identical, because these are things we made and they're dyed, and there's nothing different about them. And so in an industrialized society, everything we've got is pretty much arbitrarily colored. But you go to a non-industrialized group, and that's not true. And it's not that they're not interested in color. You bring bright-colored things to them, they like them just like we like them. Bright colors are great, they're beautiful. But they just don't need to talk about them.

So probably color words are a good example of how language evolves from sort of function, when you need to communicate the use of something.

I think so.

Then you kind of invent different variations, and
basically, you can imagine that the evolution of a language has to do with what the early tribe is doing, what they want, what kind of problems are facing them, and they're quickly figuring out how to efficiently communicate the solution to those problems, whether it's aesthetic or functional, all that kind of stuff, running away from a mammoth or whatever. So I think what you're pointing to is that we don't have data on the evolution of language, because many languages formed a long time ago, so you don't get the chatter.

We have a little bit of, like, Old English to Modern English, because there was a writing system, and we can see how Old English looked. So the word order changed, for instance, from Old English to Middle English to Modern English, and we can see things like that. But most languages don't even have a writing system. Of the 7,000, only a small subset have a writing system, and even if they have a writing system, it's a very modern writing system, and so they don't have it. So basically, for Mandarin, for Chinese, we have a lot of evidence for a long time, and for English, and not for much else. For German, a little bit. But for long-term language evolution, we don't have a lot. We just have snapshots, is what we've got, of current languages.

Yeah, you get an inkling of that from the rapid communication on certain platforms. Like on Reddit, there are different communities, and they'll come up with different slang, usually, from my perspective, driven by a little bit of humor, or maybe mockery or whatever, just talking in different kinds of ways. And you could see the evolution of language there, because I think with a lot of things on the internet, you don't want to be the boring mainstream, so you want to deviate from the proper way of talking. And so you get a lot of deviation, like rapid deviation. Then when
communities collide, you get, like, just like you said, humans adapt to it, and you can see it through the lens of humor. I mean, it's very difficult to study, but you can imagine, like, 100 years from now, well, if there's a new language born, for example, we'll get really high-resolution data on it.

I mean, English is changing. English changes all the time. All languages change all the time. So, you know, there's the famous result about the Queen's English. So if you look at the Queen's vowels, the Queen's English is supposed to be, you know, originally the proper way to talk was sort of defined by however the queen talked, or the king, whoever was in charge. And so if you look at how her vowels changed from when she first became queen in 1952 or '53, when she was coronated, I mean, that's Queen Elizabeth, who died recently, of course, until, you know, 50 years later, her vowels changed, her vowels shifted a lot. And so, you know, even in the sounds of British English, the way she was talking was changing, the vowels were changing slightly. So that's just in the sounds, there's change. I don't know what's, you know, I'm interested, we're all interested in what's driving any of these changes. The word order of English changed a lot over a thousand years, right? So it used to look like German. It used to be a verb-final language with case marking, and it shifted to a verb-medial language, a lot of contact, so a lot of contact with French, and it became a verb-medial language with no case marking. And so it became this, you know, verb-initially thing. And so that's evolving, it totally evolved. And so it may very well, I mean, you know, it doesn't evolve maybe very much in 20 years, is maybe what you're talking about, but over 50 and 100 years, things change a lot. I think we'll now have good data on it, which is great. That's for sure.

Can you talk to what is syntax and what is grammar? So you wrote a book on syntax.

I did. You were asking
me before about what, you know, how do I figure out what a dependency structure is. I'd say the dependency structures aren't that hard, generally. I think there's a lot of agreement on what they are for almost any sentence in most languages. I think people will agree on a lot of that. There are other parameters in the mix, such that some people think there's a more complicated grammar than just a dependency structure. And so, you know, like Noam Chomsky, he's the most famous linguist ever, and he is famous for proposing a slightly more complicated syntax. And so he invented phrase structure grammar. So he's well known for many, many things, but in the '50s and early '60s, like, but late '50s, he was basically figuring out what's called formal language theory. And he figured out sort of a framework for figuring out how complicated, you know, a certain type of language might be, so-called phrase structure grammars of language might be. And so his idea was that maybe we can think about the complexity of a language by how complicated the rules are, okay? And the rules will look like this: they will have a left-hand side and a right-hand side. Something on the left-hand side will expand to the thing on the right-hand side. So we'll start with an S, which is like the root, which is a sentence, okay? And then we're going to expand to things like a noun phrase and a verb phrase, is what he would say, for instance, okay? An S goes to an NP and a VP is a kind of a phrase structure rule. And then we figure out what an NP is: an NP is a determiner and a noun, for instance. And a verb phrase is something else, is a verb and another noun phrase, for instance. Those are the rules of a very simple phrase structure, okay? And so he proposed phrase structure grammar as a way to sort of cover human languages, and then he actually figured out that, well, depending on the formalization of those grammars, you
might get more complicated or less complicated languages. So he said, well, these are things called, you know, context-free languages, that he thought, you know, human languages tend to be what he calls context-free languages. But there are simpler languages, which are so-called regular languages, and they have a more constrained form to the rules of the phrase structure, of these particular rules. So he basically discovered and kind of invented ways to describe the language, and those are phrase structure, a human language. And he was mostly interested in English initially, in his work in the '50s.

So, quick questions around all this. So formal language theory is the big field of just studying language formally?

Yes. And it doesn't have to be human language there. We have computer languages, any kind of system which is generating some set of expressions in a language, and those could be like the, you know, the statements in a computer language, for example. So it could be that or it could be human language.

So technically you can study programming languages?

And they have been heavily studied using this formalism. There's a big field of programming languages within the formal language.

Okay, and then phrase structure grammar is this idea that you can break down language into this S, NP, VP?

It's a particular formalism for describing language, okay? And Chomsky was the first one, he's the one who figured that stuff out back in the '50s. And that's equivalent, actually, the context-free grammar is actually kind of equivalent, in the sense that it generates the same sentences as a dependency grammar would. You know, the dependency grammar is a little simpler in some way. You just have a root and it goes, like, we don't have any of these, the rules are implicit, I guess, and we just have connections between words. The phrase structure
grammar is a kind of a different way to think about the dependency grammar. It's slightly more complicated, but it's kind of the same in some ways.

So to clarify, dependency grammar is the framework under which you see language, and you make the case that this is a good way to describe language?

That's correct.

And Noam Chomsky is watching this, is very upset right now, so let's, I'm just kidding. But what's the difference between, where's the place of disagreement between phrase structure grammar and dependency grammar?

They're very close. So phrase structure grammar and dependency grammar aren't that far apart. I like dependency grammar because it's more perspicuous, it's more transparent about representing the connections between the words. It's just a little harder to see in phrase structure grammar. You know, the place where Chomsky sort of devolved, or went off from this, is he also thought there was something called movement, okay? And that's where we disagree, okay? That's the place where I would say we disagree. And, I mean, maybe we'll get into that later, but the idea is, if you want to, do you want me to explain that now?

I would love, can you explain movement?

Movement, okay.

You're saying so many interesting things.

Yeah, yeah. Okay, so here's the movement: Chomsky basically sees English and he says, okay, I said, you know, we had that sentence earlier, like, it was like "two dogs enter the room." It's changed a little bit, say, "two dogs will enter the room." And he notices that, hey, in English, if I want to make a question, a yes/no question, from that same sentence, I say, instead of "two dogs will enter the room," I say "will two dogs enter the room?" Okay, there's a different way to say the same idea. And it's like, well, the auxiliary verb, that "will" thing, it's at the front as opposed to in the middle, okay? And so, you know, if you look at English, you see that that's true for all those modal verbs and for
other kinds of auxiliary verbs in English. You always do that, you always put an auxiliary verb at the front. And when he saw that, so, you know, if I say "I can win this bet," "can I win this bet?", right, so I move a "can" to the front. So actually that's a theory, I just gave you a theory there. He talks about it as movement. He thinks the declarative is the root, is the sort of default way to think about the sentence, and you move the auxiliary verb to the front. That's a movement theory, okay? And he just thought that was just so obvious that it must be true, that there's nothing more to say about that, that this is how auxiliary verbs work in English. There's a movement rule such that, to get from the declarative to the interrogative, you're moving the auxiliary to the front. And it's a little more complicated as soon as you go to simple present and simple past, because, you know, if I say "John slept," you have to say "did John sleep?", not "slept John?", right? And so you have to somehow get an auxiliary verb, and I guess underlyingly it's like "slept" is, it's a little more complicated than that, but that's his idea, there's a movement, okay? And so a different way to think about that, that isn't, I mean, then he ended up showing later, so he proposed this theory of grammar which has movement. There's other places where he thought there's movement, not just auxiliary verbs, but things like the passive in English and things like questions, wh-questions, a bunch of places where he thought there's also movement going on. And in each one of those, there's words, well, phrases and words, moving around from one structure to another, what he called deep structure to surface structure. I mean, there's like two different structures in his theory, okay? There's a different way to think about this, which is there's no movement at all. There's a lexical copying rule such that the word "will" or the
word "can," these auxiliary verbs, they just have two forms, and one of them is the declarative and one of them is interrogative. And you basically have the declarative one and, oh, I form the interrogative, or I can form one from the other, it doesn't matter which direction you go. And I just have a new entry which has the same meaning, which has a slightly different argument structure. Argument structure is just a fancy word for the ordering of the words. And so if I say, you know, it was, the two dogs can or will enter the room, there's two forms of "will." One is "will" declarative, and then, okay, I've got my subject to the left, it comes before me, and the verb comes after me in that one. And then the "will" interrogative, it's like, oh, I go first. Interrogative "will" is first, and then I have the subject immediately after, and then the verb after that. And so you can just generate from one of those words another word with a slightly different argument structure, with different ordering. And these are just lexical copies. They're not necessarily moving from one to another. There's no movement.

There's a romantic notion that you have, like, one main way to use a word and then you could move it around, right? Which is essentially what movement is implying.

Yeah, but the lexical copying is similar. So then we do lexical copying for that same idea, that maybe the declarative is the source and then we can copy it. And so an advantage, there's multiple advantages of the lexical copying story. It's not my story. This is, like, Ivan Sag, a bunch of linguists have been proposing these stories as well, you know, in tandem with the movement story, okay? You know, Ivan Sag died a while ago, but he was one of the proponents of the non-movement, of the lexical copying story. And so a great advantage is, well, Chomsky really famously in 1971 showed that the movement story leads to learnability problems. It
leads to problems for how language is learned. It's really, really hard to figure out what the underlying structure of a language is if you have both phrase structure and movement. It's, like, really hard to figure out what came from what. There's, like, a lot of possibilities there. If you don't have that problem, that learning problem gets a lot easier, say there's lexical copies.

And when we say the learning problem, do you mean, like, humans learning a new language?

Yeah, just learning English. So a baby is lying around in the crib, listening to me talk, and, you know, how are they learning English? Or, you know, maybe it's a two-year-old who's learning, you know, interrogatives and stuff. How are they doing that? Are they doing it from, like, are they figuring it out? So Chomsky said it's impossible to figure it out, actually. He said it's actually impossible, not hard, but impossible. Mhm. And therefore that's where universal grammar comes from, is that it has to be built in. And so what they're learning is, movement is built in, in his story, is absolutely part of your language module, and then you're just setting parameters. English is just sort of a variant of the universal grammar, and you're figuring out, oh, which orders does English do these things in. The non-movement story doesn't have this. It's, like, much more bottom-up. You're learning rules, you're learning rules one by one, and, oh, this word is connected to that word. Another advantage, it's learnable. Another advantage of it is that it predicts that not all auxiliaries might move. Like, it might depend on the word, and that turns out to be true. So there's words that don't really work as auxiliaries, they work in declarative and not in interrogative. So I can say, I'll give you the opposite first, so
I can say "aren't I invited to the party?", okay? And that's an interrogative form, but it's not from "I aren't invited to the party." There is no "I aren't," right? So that's interrogative only. And then we also have forms like "ought." "I ought to do this," and I guess some British, old British people can say ["ought I?"].

Exactly, it doesn't sound right, does it? For me it sounds ridiculous. I don't even think "ought" is great, but, I mean, I totally recognize "I ought to." It's not too bad, actually. I can say "I ought to do this." That sounds, if I'm trying to sound sophisticated, maybe. I don't know, it just sounds completely out to me.

Yeah, anyway, so there are variants here, and a lot of these words just work in one versus the other, and that's, like, fine under the lexical copying story. It's like, well, you just learn the usage. Whatever the usage is, is what you do with this word. But it's a little bit harder in the movement story. That's an advantage, I think, of lexical copying: in all these different places, there's all these usage variants, which make the movement story a little bit harder to work.

So one of the main divisions here is the movement story versus the copying story, that has to do with the auxiliary words and so on. But if you rewind to the phrase structure grammar versus dependency grammar?

Those are equivalent in some sense, in that for any dependency grammar, I can generate a phrase structure grammar which generates exactly the same sentences. I just like the dependency grammar formalism because it makes something really salient, which is the lengths of dependencies between words, which isn't so obvious in the phrase structure. It's just kind of hard to see. It's in there, it's just very, it's opaque.

Technically, I think phrase structure grammar is mappable to dependency grammar.

And vice versa.

And vice versa, yeah. There's, like, these, like,
little labels, S, NP, VP.

Yeah. For a particular dependency grammar, you can make a phrase structure grammar which generates exactly those same sentences, and vice versa. But there are many phrase structure grammars which you can't really make a dependency grammar for. I mean, you can do a lot more in a phrase structure grammar. You get many more of these extra nodes, basically. You can have more structure in there, and some people like that, and maybe there's value to that. I don't like it.

Well, for you, so we should clarify. So dependency grammar, it's just, well, one word depends on only one other word, and you form these trees, and that really puts priority on those dependencies, just like as a tree where you can then measure the distance of the dependency from one word to the other. They can then map to the cognitive processing of these sentences, how easy it is to understand and all that kind of stuff. So it just puts the focus on just, like, the mathematical distance of dependence between words. So, like, it's just a different focus.

Absolutely.

Just to continue on the thread of Chomsky, because it's really interesting, because as you're discussing disagreement, to the degree there's disagreement, you're also telling the history of the study of language, which is really awesome. So you mentioned context-free versus regular. Does that distinction come into play for dependency grammar?

No, not at all. I mean, regular languages are too simple for human languages. It's a part of the hierarchy, but human languages, in the phrase structure world, are definitely, they're at least context-free, maybe a little bit more, a little bit harder than that. So there's something called context-sensitive as well, where you can have, like, this is just the formal language description. In a context-free grammar you have one, this is like a bunch of, like, formal language theory we're doing here.

I love it.

Okay, so
you have a left-hand side category and you're expanding
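
The phrase structure rules described earlier in the conversation (an S goes to an NP and a VP, an NP is a determiner and a noun, a VP is a verb and another noun phrase) can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not anything presented in the conversation: the word lists and the names `RULES` and `expand` are made up for the example, and the grammar has no recursion, so listing every sentence terminates.

```python
import itertools

# Toy phrase structure rules: each left-hand side category expands to
# one of the right-hand sides listed for it.
# S -> NP VP, NP -> Det N, VP -> V NP come from the conversation;
# the words under Det, N and V are illustrative assumptions.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"], ["two"]],
    "N": [["dogs"], ["room"]],
    "V": [["enter"]],
}


def expand(symbol):
    """Yield every word sequence that `symbol` can expand to."""
    if symbol not in RULES:
        # Not a category, so it is a word: nothing left to expand.
        yield [symbol]
        return
    for rhs in RULES[symbol]:
        # Expand each right-hand-side symbol, then combine every choice
        # for the first symbol with every choice for the second, etc.
        options = [list(expand(s)) for s in rhs]
        for parts in itertools.product(*options):
            yield [word for part in parts for word in part]


sentences = [" ".join(words) for words in expand("S")]

for sentence in sentences:
    print(sentence)
```

The rules only constrain categories and order, so alongside "two dogs enter the room" the toy grammar also generates strings like "the room enter two dogs". That is expected for a grammar this small, which says nothing about meaning or agreement.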