Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity | Lex Fridman Podcast #452
ugvHCXCOmm4 • 2024-11-11
Transcript
Dario Amodei: If you extrapolate the curves that we've had so far... if you say, well, I don't know, we're starting to get to PhD level, and last year we were at undergraduate level, and the year before we were at the level of a high school student... Again, you can quibble about at what tasks and for what. We're still missing modalities, but those are being added: computer use was added, image generation has been added. If you just eyeball the rate at which these capabilities are increasing, it does make you think that we'll get there by 2026 or 2027. I think there are still worlds where it doesn't happen in a hundred years. The number of those worlds is rapidly decreasing. We are rapidly running out of truly convincing blockers, truly compelling reasons why this will not happen in the next few years. The scale-up is very quick. We do this today: we make a model, and then we deploy thousands, maybe tens of thousands, of instances of it. I think by the time, certainly within two to three years, whether we have these super-powerful AIs or not, clusters are going to get to the size where you'll be able to deploy millions of these. I am optimistic about meaning. I worry about economics and the concentration of power. That's actually what I worry about more: the abuse of power. AI increases the amount of power in the world, and if you concentrate that power and abuse that power, it can do immeasurable damage. Yes, it's very frightening. It's very frightening.

Lex Fridman: The following is a conversation with Dario Amodei, CEO of Anthropic, the company that created Claude, which is currently and often at the top of most LLM benchmark leaderboards. On top of that, Dario and the Anthropic team have been outspoken advocates for taking the topic of AI safety very seriously, and they have continued to publish a lot of fascinating AI research on this and other topics. I'm also joined afterwards by two other brilliant people from Anthropic. First, Amanda Askell, who is a researcher working on alignment and fine-tuning of Claude, including the design of Claude's character and personality. A few folks told me she has probably talked with Claude more than any human at Anthropic, so she was definitely a fascinating person to talk to about prompt engineering and practical advice on how to get the best out of Claude. After that, Chris Olah stopped by for a chat. He's one of the pioneers of the field of mechanistic interpretability, which is an exciting set of efforts that aims to reverse-engineer neural networks to figure out what's going on inside, inferring behaviors from neural activation patterns inside the network. This is a very promising approach for keeping future superintelligent AI systems safe, for example by detecting from the activations when the model is trying to deceive the human it is talking to. This is the Lex Fridman Podcast. To support it, please check out our sponsors in the description. And now, dear friends, here's Dario Amodei.

Lex Fridman: Let's start with a big idea of scaling laws and the scaling hypothesis. What is it, what is its history, and where do we stand today?

Dario Amodei: I can only describe it as it relates to my own experience, but I've been in the AI field for about 10 years, and it was something I noticed very early on. I first joined the AI world when I was working at Baidu with Andrew Ng in late 2014, which is almost exactly 10 years ago now. The first thing we worked on was speech recognition systems. In those days, I think deep learning was a new thing. It had made lots of progress, but everyone was always saying: we don't have the algorithms we need to succeed, we're only matching a tiny, tiny fraction, there's so much we need to discover algorithmically, we haven't found the picture of how to match the human brain. In some ways I was fortunate. You can have almost beginner's luck, right? I was a newcomer to the field, and I looked at the neural net that we were using for speech, the recurrent neural networks, and I said, I don't know, what if you make them bigger and give them more layers? And what if you scale up the data along with this? I just saw these as independent dials that you could turn, and I noticed that the models started to do better and better as you gave them more data, as you made the models larger, as you trained them for longer. I didn't measure things precisely in those days, but along with colleagues, we very much got the informal sense that the more data and the more compute and the more training you put into these models, the better they perform. Initially my thinking was: hey, maybe that is just true for speech recognition systems. Maybe that's just one particular quirk, one particular area. I think it wasn't until 2017, when I first saw the results from GPT-1, that it clicked for me that language is probably the area in which we can do this. We can get trillions of words of language data, we can train on them. And the models we were training in those days were tiny; you could train them on one to eight GPUs, whereas now we train jobs on tens of thousands, soon going to hundreds of thousands, of GPUs. When I saw those two things together... and there were a few people, like Ilya Sutskever, who you've interviewed, who had somewhat similar views. He might have been the first one, although I think a few people came to similar views around the same time. There was Rich Sutton's bitter lesson, there was Gwern, who wrote about the scaling hypothesis. But I think somewhere between 2014 and 2017 was when it really clicked for me, when I really got conviction that, hey, we're going to be able to do these incredibly wide cognitive tasks if we just scale up the models. At every stage of scaling, there are always arguments, and when I first heard them, honestly, I thought: probably I'm the one who's wrong, and all these experts in the field are right. They know the situation better than I do. There's the Chomsky argument, that you can get syntax but you can't get semantics. There's the idea that you can make a sentence make sense, but you can't make a paragraph make sense. The latest ones we have today are: we're going to run out of data, or the data isn't high quality enough, or models can't reason. And each time, every time, we manage to either find a way around, or scaling just is the way around. Sometimes it's one, sometimes it's the other. So I'm now at this point where I still think it's always quite uncertain. We have nothing but inductive inference to tell us that the next few years are going to be like the last 10 years. But I've seen the movie enough times, I've seen the story happen enough times, to really believe that probably the scaling is going to continue, and that there's some magic to it that we haven't really explained on a theoretical basis yet.
Lex Fridman: And of course, the scaling here is bigger networks, bigger data, bigger compute?

Dario Amodei: Yes. In particular, linear scaling up of bigger networks, bigger training times, and more and more data. All of these things... it's almost like a chemical reaction. You have three ingredients in the chemical reaction, and you need to linearly scale up the three ingredients. If you scale up one and not the others, you run out of the other reagents and the reaction stops. But if you scale up everything in series, then the reaction can proceed.

Lex Fridman: And of course, now that you have this kind of empirical science, or art, you can apply it to other, more nuanced things, like scaling laws applied to interpretability, or scaling laws applied to post-training, or just seeing how this thing scales. But the big scaling law, I guess the underlying scaling hypothesis, has to do with big networks and big data leading to intelligence?

Dario Amodei: Yeah, we've documented scaling laws in lots of domains other than language. Initially, the paper we did that first showed it was in early 2020, where we first showed it for language. There was then some work late in 2020 where we showed the same thing for other modalities, like images, video, text-to-image, image-to-text, math. They all had the same pattern. And you're right, now there are other stages, like post-training, or there are new types of reasoning models, and in all of those cases that we've measured, we see similar types of scaling laws.
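For reference, the early-2020 paper Dario mentions (Kaplan et al., "Scaling Laws for Neural Language Models") reports loss falling as a smooth power law in each ingredient. A sketch of that form, with the fitted constants left symbolic rather than quoting exact values:

```latex
% Cross-entropy loss as a power law in parameters N, data D, and compute C
% (form from Kaplan et al. 2020; N_c, D_c, C_c and the exponents are
% empirically fitted constants, each exponent roughly in the 0.05-0.1 range).
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
```

The "chemical reaction" analogy above corresponds to the empirical finding that loss is governed by whichever of N, D, or C is the binding constraint when the three are not scaled in tandem.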
Lex Fridman: A bit of a philosophical question, but what's your intuition about why bigger is better in terms of network size and data size? Why does it lead to more intelligent models?

Dario Amodei: In my previous career, I was a biophysicist. I did a physics undergrad and then biophysics in grad school. So I think back to what I know as a physicist, which is actually much less than what some of my colleagues at Anthropic have in terms of expertise in physics. There's this concept of 1/f noise and 1/x distributions, where often, just like if you add up a bunch of natural processes you get a Gaussian, if you add up a bunch of differently distributed natural processes... if you take a probe and hook it up to a resistor, the distribution of the thermal noise in the resistor goes as one over the frequency. It's some kind of natural convergent distribution. And I think what it amounts to is that if you look at a lot of things that are produced by some natural process that has a lot of different scales, not a Gaussian, which is narrowly distributed... if I look at the large and small fluctuations that lead to electrical noise, they have this decaying 1/x distribution. So now I think of patterns in the physical world, or in language. If I think about the patterns in language, there are some really simple patterns: some words, like "the," are much more common than others. Then there's basic noun-verb structure. Then there's the fact that nouns and verbs have to agree, they have to coordinate. And there's the higher-level sentence structure. Then there's the thematic structure of paragraphs. So given that there's this hierarchy of structure, you can imagine that as you make the networks larger, first they capture the really simple correlations, the really simple patterns, and then there's this long tail of other patterns. And if that long tail of other patterns is really smooth, like it is with the 1/f noise in physical processes like resistors, then you could imagine that as you make the network larger, it's capturing more and more of that distribution, and that smoothness gets reflected in how well the models are at predicting and how well they perform. Language is an evolved process, right? We've developed language. We have common words and less common words, common expressions and less common expressions. We have ideas, cliches, that are expressed frequently, and we have novel ideas. And that process has developed, has evolved, with humans over millions of years. So the guess, and this is pure speculation, would be that there is some kind of long-tail distribution of these ideas.
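A toy illustration of the heavy-tailed structure of language he's gesturing at (my sketch, not anything from the conversation): word frequencies in any sizable text roughly follow a power law (Zipf's law), with a few very common words and a long tail of rare ones. "corpus.txt" below is a placeholder for any plain-text file.

```python
# Toy illustration of the long-tail structure of language (Zipf's law):
# a few words are extremely common, then frequency decays roughly as a
# power law in rank. "corpus.txt" is a placeholder for any plain-text file.
from collections import Counter

words = open("corpus.txt", encoding="utf-8").read().lower().split()
ranked = Counter(words).most_common()

for rank, (word, freq) in enumerate(ranked[:15], start=1):
    # Plotting log(freq) against log(rank) gives a near-straight line
    # (slope around -1) for natural text -- the smooth tail Dario describes.
    print(f"rank {rank:2d}  count {freq:7d}  {word}")
```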
Lex Fridman: So there's the long tail, but there's also the height of the hierarchy of concepts that you're building up. So the bigger the network, presumably you have a higher capacity to...

Dario Amodei: Exactly. If you have a small network, you only get the common stuff. If I take a tiny neural network, it's very good at understanding that a sentence has to have, you know, verb, adjective, noun, but it's terrible at deciding what those verbs, adjectives, and nouns should be and whether they should make sense. If I make it just a little bigger, it gets good at that. Then suddenly it's good at the sentences, but it's not good at the paragraphs. So these rarer and more complex patterns get picked up as I add more capacity to the network.

Lex Fridman: Well, the natural question then is: what's the ceiling of this? How complicated and complex is the real world? How much stuff is there to learn?

Dario Amodei: I don't think any of us knows the answer to that question. My strong instinct would be that there's no ceiling below the level of humans. We humans are able to understand these various patterns, and so that makes me think that if we continue to scale up these models, to develop new methods for training them and scaling them up, we'll at least get to the level that we've gotten to with humans. There's then a question of how much more is it possible to understand than humans do, how much is it possible to be smarter and more perceptive than humans. I would guess the answer has got to be domain-dependent. If I look at an area like biology, and I wrote about this in the essay "Machines of Loving Grace," it seems to me that humans are struggling to understand the complexity of biology. If you go to Stanford or to Harvard or to Berkeley, you have whole departments of folks trying to study, say, the immune system or metabolic pathways, and each person understands only a tiny part of it, specializes, and they're struggling to combine their knowledge with that of other humans. And so I have an instinct that there's a lot of room at the top for AIs to get smarter. If I think of something like materials in the physical world, or addressing conflicts between humans, or something like that, it may be that some of these problems are not intractable, but much harder, and there's only so well you can do with some of these things. Just like with speech recognition: there's only so clearly I can hear your speech. So I think in some areas there may be ceilings that are very close to what humans have done. In other areas, those ceilings may be very far away. I think we'll only find out when we build these systems. It's very hard to know in advance. We can speculate, but we can't be sure.

Lex Fridman: And in some domains, the ceiling might have to do with human bureaucracies and things like this, as you write about. So humans fundamentally have to be part of the loop. That's the cause of the ceiling, not maybe the limits of the intelligence.

Dario Amodei: Yeah, I think in many cases, in theory, technology could change very fast. For example, all the things that we might invent with respect to biology. But remember, there's a clinical trial system that we have to go through to actually administer these things to humans. I think that's a mixture of things that are unnecessary and bureaucratic and things that protect the integrity of society. And the whole challenge is that it's hard to tell what's going on, it's hard to tell which is which. My view is definitely, in terms of drug development, that we're too slow and we're too conservative. But certainly, if you get these things wrong, it's possible to risk people's lives by being too reckless. So at least some of these human institutions are in fact protecting people. It's all about finding the balance. I strongly suspect that balance is more on the side of pushing to make things happen faster, but there is a balance.

Lex Fridman: If we do hit a limit, if we do hit a slowdown in the scaling laws, what do you think would be the reason? Is it compute-limited, data-limited, is it something else? Idea-limited?

Dario Amodei: A few things. Now we're talking about hitting the limit before we get to the level and the skill of humans. So one that's popular today, and that I think could be a limit we run into... like most of the limits, I would bet against it, but it's definitely possible... is that we simply run out of data. There's only so much data on the internet, and there are issues with the quality of the data. You can get hundreds of trillions of words on the internet, but a lot of it is repetitive, or it's search engine optimization drivel, or maybe in the future it'll even be text generated by AIs itself. So I think there are limits to what can be produced in this way. That said, we, and I would guess other companies, are working on ways to make data synthetic, where you can use the model to generate more data of the type that you already have, or even generate data from scratch. If you think about what was done with DeepMind's AlphaGo Zero, they managed to get a bot all the way from no ability to play Go whatsoever to above human level, just by playing against itself. There was no example data from humans required in the AlphaGo Zero version of it. The other direction, of course, is these reasoning models that do chain of thought and stop to think and reflect on their own thinking. In a way, that's another kind of synthetic data, coupled with reinforcement learning. So my guess is that with one of those methods we'll get around the data limitation, or there may be other sources of data that are available. We could also just observe that, even if there's no problem with data, as we start to scale models up, they just stop getting better. It seemed to be a reliable observation that they've gotten better; that could just stop at some point, for a reason we don't understand.
The answer could be that we need to invent some new architecture. There have been problems in the past with, say, numerical stability of models, where it looked like things were leveling off, but when we found the right unblocker, they didn't end up doing so. So perhaps there's some new optimization method or some new technique we need to unblock things. I've seen no evidence of that so far, but if things were to slow down, that perhaps could be one reason.

Lex Fridman: What about the limits of compute, meaning the expensive nature of building bigger and bigger data centers?

Dario Amodei: Right now, I would guess most of the frontier model companies are operating at roughly the $1 billion scale, plus or minus a factor of three. Those are the models that exist now or are being trained now. I think next year we're going to go to a few billion, and then in 2026 we may go above $10 billion, and probably by 2027 there are ambitions to build $100 billion clusters. And I think all of that actually will happen. There's a lot of determination to build the compute, to do it within this country, and I would guess that it actually does happen. Now, if we get to $100 billion and that's still not enough compute, still not enough scale, then either we need even more scale, or we need to develop some way of doing it more efficiently, of shifting the curve. Between all of these, one of the reasons I'm bullish about powerful AI happening so fast is just that if you extrapolate the next few points on the curve, we're very quickly getting towards human-level ability. Some of the new models that we developed, some reasoning models that have come from other companies, are starting to get to what I would call the PhD or professional level. If you look at their coding ability: the latest model we released, Sonnet 3.5, the new or updated version, gets something like 50% on SWE-bench, and SWE-bench is an example of a bunch of professional, real-world software engineering tasks. At the beginning of the year, I think the state of the art was 3 or 4%. So in 10 months we've gone from 3% to 50% on this task, and I think in another year we'll probably be at 90%. I mean, I don't know, it might even be less than that. We've seen similar things in graduate-level math, physics, and biology from models like OpenAI's o1. If we just continue to extrapolate this in terms of skill, I think if we extrapolate the straight curve, within a few years we will get to these models being above the highest professional level in terms of humans. Now, will that curve continue? You've pointed to, and I've pointed to, a lot of possible reasons why that might not happen. But if the extrapolation curve continues, that is the trajectory we're on.

Lex Fridman: Anthropic has several competitors. It'd be interesting to get your view of it all: OpenAI, Google, xAI, Meta. What does it take to win, in the broad sense of "win," in this space?

Dario Amodei: Yeah, so I want to separate out a couple of things. Anthropic's mission is to try to make this all go well, and we have a theory of change called "race to the top."
Race to the top is about trying to push the other players to do the right thing by setting an example. It's not about being the good guy; it's about setting things up so that all of us can be the good guy. I'll give a few examples of this. Early in the history of Anthropic, one of our co-founders, Chris Olah, who I believe you're interviewing soon... he's the co-founder of the field of mechanistic interpretability, which is an attempt to understand what's going on inside AI models. We had him and one of our early teams focus on this area of interpretability, which we think is good for making models safe and transparent. For three or four years, that had no commercial application whatsoever. It still doesn't today. We're doing some early betas with it, and probably it will eventually, but this is a very long research bet, and one in which we've built in public and shared our results publicly. And we did this because we think it's a way to make models safer. An interesting thing is that, as we've done this, other companies have started doing it as well. In some cases because they've been inspired by it, in some cases because they're worried that if other companies doing this look more responsible, they want to look more responsible too. No one wants to look like the irresponsible actor, and so they adopt it as well. When folks come to Anthropic, interpretability is often a draw, and I tell them: the other places you didn't go, tell them why you came here. And then you soon see that there are interpretability teams elsewhere as well. In a way, that takes away our competitive advantage, because it's like, oh, now others are doing it too. But it's good for the broader system, and so we have to invent some new thing that we're doing that others aren't doing yet. The hope is to basically bid up the importance of doing the right thing. And it's not about us in particular. It's not about having one particular good guy. Other companies can do this as well. If they join the race to do this, that's the best news ever. It's about shaping the incentives to point upward instead of shaping the incentives to point downward.

Lex Fridman: And we should say that this example, the field of mechanistic interpretability, is just a rigorous, non-hand-wavy way of doing AI safety.

Dario Amodei: Yes, or it's tending that way, trying to be. I think we're still early in terms of our ability to see things, but I've been surprised at how much we've been able to look inside these systems and understand what we see. Unlike with the scaling laws, where it feels like there's some law driving these models to perform better, on the inside, there's no reason why the models should be designed for us to understand them. They're designed to operate, they're designed to work, just like the human brain or human biochemistry. They're not designed for a human to open up the hatch, look inside, and understand them. But we have found, and you can talk in much more detail about this with Chris, that when we open them up, when we do look inside them, we find things that are surprisingly interesting.

Lex Fridman: And as a side effect, you also get to see the beauty of these models. You get to explore the beautiful nature of large neural networks through this mech interp kind of way.
Dario Amodei: I'm amazed at how clean it's been. I'm amazed at things like induction heads. I'm amazed at things like the fact that we can use sparse autoencoders to find these directions within the networks, and that the directions correspond to very clear concepts. We demonstrated this a bit with Golden Gate Claude. This was an experiment where we found a direction inside one of the neural network layers that corresponded to the Golden Gate Bridge, and we just turned that way up. We released this model as a demo, kind of half as a joke, for a couple of days, but it was illustrative of the method we developed. You could take the model and ask it about anything. You could say, "How was your day?" And anything you asked, because this feature was activated, would connect to the Golden Gate Bridge. It would say, you know, "I'm feeling relaxed and expansive, much like the arches of the Golden Gate Bridge," or it would masterfully change the topic to the Golden Gate Bridge and integrate it. There was also a sadness to it, to the focus it had on the Golden Gate Bridge.

Lex Fridman: I think people quickly fell in love with it.

Dario Amodei: I think so. People already miss it, because it was taken down, I think, after a day. Somehow these interventions on the model, where you adjust its behavior, somehow emotionally made it seem more human than any other version of the model.

Lex Fridman: Strong personality, strong identity.

Dario Amodei: Strong personality. It has these kind of obsessive interests. We can all think of someone who's obsessed with something, so it does make it feel somehow a bit more human.
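To make "we found a direction and turned it way up" concrete, here is a minimal PyTorch sketch of steering one layer's activations by adding a direction vector. Everything here is illustrative: the tiny network, the random "feature direction," and the strength are stand-ins. Anthropic's actual experiment used a sparse-autoencoder feature inside Claude (see their "Scaling Monosemanticity" work).

```python
# Minimal sketch of feature steering: clamp a fixed "feature direction"
# onto one layer's activations on every forward pass. Illustrative only;
# the real Golden Gate feature was found with a sparse autoencoder.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))

feature_direction = torch.randn(16)  # hypothetical concept direction
steering_strength = 10.0             # "turning the feature way up"

def steer(module, inputs, output):
    # Returning a tensor from a forward hook replaces the layer's output.
    return output + steering_strength * feature_direction

handle = model[1].register_forward_hook(steer)  # hook after the ReLU
print(model(torch.randn(1, 16)))  # output now dominated by the feature
handle.remove()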
Lex Fridman: Let's talk about the present. Let's talk about Claude. This year, a lot has happened. In March, Claude 3 Opus, Sonnet, and Haiku were released. Then Claude 3.5 Sonnet in June, with an updated version just now released. And then Claude 3.5 Haiku was also released. Okay, can you explain the difference between Opus, Sonnet, and Haiku, and how we should think about the different versions?

Dario Amodei: Yeah, so let's go back to March, when we first released these three models. Our thinking was: different companies produce large and small models, better and worse models. We felt that there was demand both for a really powerful model, one that might be a little bit slower and that you'd have to pay more for, and also for fast, cheap models that are as smart as they can be for how fast and cheap they are. Whenever you want to do some kind of difficult analysis... like if I want to write code, for instance, or brainstorm ideas, or do creative writing, I want the really powerful model. But then there are a lot of practical applications, in a business sense, where it's like: I'm interacting with a website, I'm doing my taxes, or I'm talking to a legal adviser and I want to analyze a contract. And we have plenty of companies that just want to do autocomplete in an IDE, or something like that. For all of those things, you want to act fast and you want to use the model very broadly. So we wanted to serve that whole spectrum of needs. We ended up with this poetry theme. What's a really short poem? It's a haiku. So Haiku is the small, fast, cheap model, which at the time it was released was surprisingly intelligent for how fast and cheap it was. A sonnet is a medium-sized poem, a couple of paragraphs, so Sonnet was the middle model: smarter, but also a little bit slower, a little bit more expensive. And Opus, like a magnum opus is a large work, was the largest, smartest model at the time. So that was the original thinking behind it. Our thinking then was: well, each new generation of models should shift that tradeoff curve. So when we released Sonnet 3.5, it has roughly the same cost and speed as the Sonnet 3 model, but it increased its intelligence to the point where it was smarter than the original Opus 3 model, especially for code, but also just in general. And now we've shown results for Haiku 3.5, and I believe Haiku 3.5, the smallest new model, is about as good as Opus 3, the largest old model. So basically the aim here is to shift the curve, and then at some point there's going to be an Opus 3.5. Now, every new generation of models has its own thing. They use new data, their personality changes in ways that we try to steer but are not fully able to steer. So there's never quite an exact equivalence where the only thing you're changing is intelligence. We always try to improve other things, and some things change without us knowing or measuring. It's very much an inexact science. In many ways, the manner and personality of these models is more an art than it is a science.

Lex Fridman: So what is the reason for the span of time between, say, Claude Opus 3 and 3.5? What takes that time, if you can speak to it?

Dario Amodei: Yeah, so there are different processes. There's pre-training, which is just the normal language model training, and that takes a very long time. These days, that uses tens of thousands, sometimes many tens of thousands, of GPUs or TPUs or Trainium... we use different platforms, but, you know, accelerator chips, often training for months. There's then a kind of post-training phase, where we do reinforcement learning from human feedback, as well as other kinds of reinforcement learning. That phase is getting larger and larger now, and often that's less of an exact science. It often takes effort to get it right. Models are then tested with some of our early partners to see how good they are, and they're then tested both internally and externally for their safety, particularly for catastrophic and autonomy risks. We do internal testing according to our responsible scaling policy, which I could talk about in more detail, and then we have an agreement with the US and the UK AI Safety Institutes, as well as other third-party testers in specific domains, to test the models for what are called CBRN risks: chemical, biological, radiological, and nuclear. We don't think that models pose these risks seriously yet, but with every new model we want to evaluate whether we're starting to get close to some of these more dangerous capabilities. So those are the phases, and then it just takes some time to get the model working in terms of inference and launching it in the API. So there are just a lot of steps to actually making a model work.
And of course, we're always trying to make the processes as streamlined as possible. We want our safety testing to be rigorous, but we also want it to be automatic, to happen as fast as it can, without compromising on rigor. Same with our pre-training process and our post-training process. It's just like building anything else, it's just like building airplanes: you want to make them safe, but you want to make the process streamlined. And I think the creative tension between those is an important thing in making the models work.

Lex Fridman: Rumor on the street, I forget who was saying it, is that Anthropic has really good tooling. So probably a lot of the challenge here is on the software engineering side: to build the tooling to have an efficient, low-friction interaction with the infrastructure.

Dario Amodei: You would be surprised how much of the challenge of building these models comes down to software engineering, performance engineering. From the outside, you might think: oh man, we had this eureka breakthrough, right? Like the movie version of science: we discovered it, we figured it out. But I think all things, even incredible discoveries, almost always come down to the details, and often super, super boring details. I can't speak to whether we have better tooling than other companies. I mean, I haven't been at those other companies, at least not recently. But it's certainly something we give a lot of attention to.

Lex Fridman: I don't know if you can say, but from Claude 3 to Claude 3.5, is there any extra pre-training going on, or is it mostly focused on the post-training? There's been a leap in performance.

Dario Amodei: Yeah, I think at any given stage, we're focused on improving everything at once, just naturally. There are different teams, each team makes progress in a particular area, in making their particular segment of the relay race better, and it's just natural that when we make a new model, we put all of these things in at once.

Lex Fridman: So the data you have, like the preference data you get from RLHF, is that applicable? Are there ways to apply it to newer models as they get trained up?

Dario Amodei: Yeah, preference data from old models sometimes gets used for new models, although of course it performs somewhat better when it's trained on the new models. Note that we have this constitutional AI method, such that we don't only use preference data; there's also a post-training process where we train the model against itself. And there are new types of post-training the model against itself that are used every day. So it's not just RLHF, it's a bunch of other methods as well. Post-training, I think, is becoming more and more sophisticated.
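For context on what "preference data" feeds into (a generic sketch, not Anthropic's specific recipe): RLHF-style reward models are commonly trained with a pairwise loss that pushes the score of the preferred response above the rejected one; constitutional AI then replaces some of those human labels with model-generated critiques of its own outputs.

```python
# Sketch (my illustration) of the standard pairwise preference loss used in
# RLHF-style reward-model training: loss = -log sigmoid(r_chosen - r_rejected),
# so the reward model is pushed to score the preferred response higher.
import torch
import torch.nn.functional as F

def preference_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # r_* are scalar reward-model scores for two responses to the same prompt.
    return -F.logsigmoid(r_chosen - r_rejected).mean()

print(preference_loss(torch.tensor([1.2]), torch.tensor([0.3])))  # small loss
print(preference_loss(torch.tensor([0.3]), torch.tensor([1.2])))  # large loss
```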
Lex Fridman: Well, what explains the big leap in performance for the new Sonnet 3.5, at least on the programming side? And maybe this is a good place to talk about benchmarks. What does it mean to get better? The number just went up, but I program, and I also love programming, and Claude 3.5 through Cursor is what I use to assist me in programming, and at least experientially, anecdotally, it's gotten smarter at programming. So what does it take to get it smarter?

Dario Amodei: We observed that as well, by the way. There were a couple of very strong engineers here at Anthropic for whom all the previous code models, both produced by us and produced by all the other companies, hadn't really been useful. They said: maybe this is useful to a beginner, it's not useful to me. But Sonnet 3.5, the original one, for the first time, they said: oh my god, this helped me with something that would have taken me hours to do. This is the first model that has actually saved me time. So again, the waterline is rising. And then I think the new Sonnet has been even better. In terms of what it takes, I'll just say it's been across the board: it's in the pre-training, it's in the post-training, it's in various evaluations that we do. And if we go into the details of the benchmark: since you're a programmer, you'll be familiar with pull requests. Pull requests are sort of an atomic unit of work; you could say, I'm implementing one thing. SWE-bench actually gives you a real-world situation where the codebase is in a current state and the model is trying to implement something that's described in language. We have internal benchmarks where we measure the same thing: you give the model free rein to do anything, run anything, edit anything. How well is it able to complete these tasks? And it's that benchmark that's gone from "it can do it 3% of the time" to "it can do it about 50% of the time." So I actually do believe that... you can game benchmarks, but if we get to 100% on that benchmark in a way that isn't overtrained or gamed for that particular benchmark, it probably represents a real and serious increase in programming ability. And I would suspect that if we can get to 90 or 95%, it will represent the ability to autonomously do a significant fraction of software engineering tasks.
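A rough sketch of what scoring a SWE-bench-style task involves (my simplification, not the official harness; the real benchmark pins specific repositories, issues, and fail-to-pass tests): apply the model's patch to the repository, and count the task as resolved only if the test suite passes.

```python
# Rough sketch (not the official SWE-bench harness) of scoring one task:
# the model sees a real repo plus an issue description and proposes a patch;
# the task counts as "resolved" only if the project's tests then pass.
import subprocess

def evaluate_patch(repo_dir: str, patch: str) -> bool:
    """Apply a model-generated patch and run the project's test suite."""
    apply = subprocess.run(["git", "-C", repo_dir, "apply", "-"],
                           input=patch, text=True)
    if apply.returncode != 0:
        return False  # the patch doesn't even apply cleanly
    tests = subprocess.run(["python", "-m", "pytest", repo_dir, "-q"])
    return tests.returncode == 0

# The benchmark score is the fraction of tasks resolved; Dario cites this
# going from roughly 3-4% at the start of 2024 to about 50%.
```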
Lex Fridman: Well, ridiculous timeline question: when is Claude Opus 3.5 coming out?

Dario Amodei: Not giving you an exact date, but, as far as we know, the plan is still to have a Claude 3.5 Opus.

Lex Fridman: Are we gonna get it before GTA 6, or no? Like Duke Nukem Forever.

Dario Amodei: There was some game that was delayed 15 years. Was that Duke Nukem Forever?

Lex Fridman: Yeah. And I think GTA is now just releasing trailers.

Dario Amodei: It's only been three months since we released the first Sonnet.

Lex Fridman: Yeah, the incredible pace of release. It just tells you about the pace, the expectations for when things are going to come out. So what about 4.0? How do you think about versioning as these models get bigger and bigger, and also just versioning in general? Why "Sonnet 3.5, updated," with the date? Why not Sonnet 3.6?

Dario Amodei: Naming is actually an interesting challenge here, because I think a year ago, most of the model was pre-training, and so you could start from the beginning and just say: okay, we're going to have models of different sizes, we're going to train them all together, we'll have a family with a naming scheme, then we'll put some new magic into them, and then we'll have the next generation. The trouble starts already when some of them take a lot longer than others to train. That already messes up your timing a little bit. But as you make big improvements in pre-training, you suddenly notice: oh, I can make a better pre-trained model, and that doesn't take very long to do, but clearly it has the same size and shape as previous models. So I think those two things together, as well as the timing issues... any kind of scheme you come up with, the reality tends to frustrate that scheme, tends to break out of the scheme. It's not like software, where you can say: oh, this is 3.7, this is 3.8. No, you have models with different tradeoffs. You can change some things in your models, you can change other things. Some are faster and slower at inference, some have to be more expensive, some have to be less expensive. So I think all the companies have struggled with this. I think we were in a good position in terms of naming when we had Haiku, Sonnet, and Opus, and we're trying to maintain it, but it's not perfect. We'll try to get back to the simplicity, but it's just the nature of the field. I feel like no one's figured out naming. It's somehow a different paradigm from normal software, and so none of the companies have been perfect at it. It's something we struggle with surprisingly much, relative to how trivial it is compared to the grand science of training the models.

Lex Fridman: From the user side, the user experience of the updated Sonnet 3.5 is just different from the previous, June 2024, Sonnet 3.5. It would be nice to come up with some kind of labeling that embodies that, because people talk about Sonnet 3.5, but now there's a different one, and so how do you refer to the previous one and the new one? When there's a distinct improvement, it just makes conversation about it challenging.

Dario Amodei: Yeah, yeah. I definitely think this question of... there are lots of properties of the models that are not reflected in the benchmarks. I think that's definitely the case, and everyone agrees, and not all of them are capabilities. Models can be polite or brusque. They can be very reactive, or they can ask you questions. They can have what feels like a warm personality or a cold personality. They can be boring, or they can be very distinctive, like Golden Gate Claude was. We have a whole team focused on, I think we call it, Claude's character. Amanda leads that team, and she'll talk to you about that. But it's still a very inexact science, and often we find that models have properties that we're not aware of. The fact of the matter is that you can talk to a model 10,000 times, and there are some behaviors you might not see. Just like with a human: I can know someone for a few months and not know that they have a certain skill, or not know that there's a certain side to them. So I think we just have to get used to this idea, and we're always looking for better ways of testing our models, to demonstrate these capabilities, and also to decide which personality properties we want models to have and which we don't want them to have. That itself, the normative question, is also super interesting.

Lex Fridman: I've got to ask you a question from Reddit.

Dario Amodei: From Reddit? Oh boy.

Lex Fridman: There's just this... it's fascinating, to me at least. It's a psychological, social phenomenon where people report that Claude has gotten dumber for them over time. And so the question is: does the user complaint about the dumbing down of Claude 3.5 Sonnet hold any water? Are these anecdotal reports a kind of social phenomenon, or are there any cases where Claude would get dumber?
Dario Amodei: So, this isn't just about Claude. I believe I've seen these complaints for every foundation model produced by a major company. People said this about GPT-4, they said it about GPT-4 Turbo. So, a couple of things. One: the actual weights of the model, the actual brain of the model, do not change unless we introduce a new model. There are just a number of reasons why it would not make sense, practically, to be randomly substituting in new versions of the model. It's difficult from an inference perspective, and it's actually hard to control all the consequences of changing the weights of the model. Let's say you wanted to fine-tune the model to, I don't know, say "certainly" less, which an old version of Sonnet used to do. You actually end up changing a hundred things as well. So we have a whole process for it, a whole process for modifying the model. We do a bunch of testing on it, we do a bunch of user testing with early customers. So we both have never changed the weights of the model without telling anyone, and, certainly in the current setup, it would not make sense to do that. Now, there are a couple of things that we do occasionally do. One is that sometimes we run A/B tests, but those are typically very close to when a model is being released, and for a very small fraction of time. So, you know, the day before the new Sonnet 3.5... I agree, we should have had a better name, it's clunky to refer to it... there were some comments from people that it's gotten a lot better, and that's because a fraction were exposed to an A/B test for those one or two days. The other is that occasionally the system prompt will change. The system prompt can have some effects, although it's unlikely to dumb down models, it's unlikely to make them dumber. And we've seen that while these two things, which I'm listing to be very complete, happen quite infrequently, the complaints, for us and for other model companies, that the model changed, the model isn't good at this, the model got more censored, the model was dumbed down... those complaints are constant. So I don't want to say people are imagining it or anything, but the models are, for the most part, not changing. If I were to offer a theory, I think it actually relates to one of the things I said before, which is that models are very complex and have many aspects to them. So often, if I ask a model a question, if I say "do task X" versus "can you do task X?", the model might respond in different ways. There are all kinds of subtle things you can change about the way you interact with the model that can give you very different results. To be clear, this itself is a failing, by us and by the other model providers, that the models are often sensitive to small changes in wording. It's yet another way in which the science of how these models work is very poorly developed. And so, if I go to sleep one night and I was talking to the model in a certain way, and I slightly change the phrasing of how I talk to the model, I could get different results.
So that's one possible explanation. The other thing is: man, it's just hard to quantify this stuff. It's hard to quantify this stuff. I think people are very excited by new models when they come out, and then, as time goes on, they become very aware of the limitations. So that may be another effect. But that's all a very long-winded way of saying that, for the most part, with some fairly narrow exceptions, the models are not changing.

Lex Fridman: I think there is a psychological effect: you just start getting used to it, the baseline rises. Like when people first got Wi-Fi on airplanes, it was amazing, magic, and now it's: I can't get this thing to work, this is such a piece of crap.

Dario Amodei: Exactly.

Lex Fridman: So it's easy to have the conspiracy theory of: they're making the Wi-Fi slower and slower. This is probably something I'll talk to Amanda much more about, but another Reddit question: "When will Claude stop trying to be my puritanical grandmother, imposing its moral worldview on me as a paying customer? And also, what is the psychology behind making Claude overly apologetic?" So these are reports about the experience, a different angle on the frustration. It has to do with the character.

Dario Amodei: Yeah, so a couple of points on this. The first one is: the things that people say on Reddit and Twitter, or X, or whatever it is... there's actually a huge distribution shift between the stuff that people complain loudly about on social media and what actually, statistically, users care about and what drives people to use the models. People are frustrated with things like the model not writing out all the code, or the model just not being as good at code as it could be, even though it's the best model in the world at code. I think the majority of things are about that. But certainly a vocal minority raise these concerns: they're frustrated by the model refusing things that it shouldn't refuse, or apologizing too much, or just having these kind of annoying verbal tics. The second caveat, and I just want to say this super clearly, because I think some people don't know it, and others kind of know it but forget it: it is very difficult to control, across the board, how the models behave. You cannot just reach in there and say, oh, I want the model to apologize less. You can do that, you can include training data that says the model should apologize less, but then in some other situation, the model ends up being super rude, or overconfident in a way that misleads people. So there are all these tradeoffs. For example, another thing is: there was a period during which models, ours and I think others as well, were too verbose. They would repeat themselves, they would say too much. You can cut down on the verbosity by penalizing the models for talking for too long. What happens when you do that, if you do it in a crude way, is that when the models are coding, sometimes they'll say "rest of the code goes here," because they've learned that that's a way to economize, and they see it. And then that leads the model to be so-called lazy in coding, where it's just like: ah, you can finish the rest of it.
It's not because we want to save on compute, or because the models are lazy during winter break, or any of the other conspiracy theories that have come up. It's actually just very hard to control the behavior of the model.
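A toy sketch of the tradeoff he's describing (purely illustrative numbers and names, not anyone's actual training setup): if post-training crudely penalizes length, a terse answer that elides required code can outscore a complete one.

```python
# Toy sketch of a crude length penalty in a shaped reward: penalizing every
# token cuts rambling, but the model can also "win" by eliding required code,
# which shows up as the "rest of the code goes here" laziness Dario describes.
def shaped_reward(quality_score: float, num_tokens: int,
                  penalty_per_token: float = 0.001) -> float:
    return quality_score - penalty_per_token * num_tokens

print(shaped_reward(0.9, 400))  # verbose but complete answer: 0.5
print(shaped_reward(0.7, 40))   # terse, elided answer scores higher: 0.66
```

The point of the example: a single knob (the per-token penalty) moves many behaviors at once, which is why one behavior can't be tuned in isolation.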