In 2011, an article was published in the reputable Journal of Personality and Social Psychology. It was called "Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect," or, in other words, proof that people can see into the future. The paper reported on nine experiments. In one, participants were shown two curtains on a computer screen and asked to predict which one had an image behind it; the other just covered a blank wall. Once the participant made their selection, the computer randomly positioned an image behind one of the curtains, then the selected curtain was pulled back to show either the image or the blank wall. The images were randomly selected from one of three categories: neutral, negative, or erotic. If participants selected the curtain covering the image, this was considered a hit.

Now, with there being two curtains and the images positioned randomly behind one of them, you would expect the hit rate to be about 50 percent, and that is exactly what the researchers found, at least for negative and neutral images. For erotic images, however, the hit rate was 53 percent. Does that mean that we can see into the future? Is that slight deviation significant?

Well, to assess significance, scientists usually turn to p-values, a statistic that tells you how likely a result at least this extreme is if the null hypothesis is true. In this case, the null hypothesis would just be that people couldn't actually see into the future, and the 53 percent result was due to lucky guesses. For this study, the p-value was 0.01, meaning there was just a 1 percent chance of getting a hit rate of 53 percent or higher from simple luck. P-values less than 0.05 are generally considered significant and worthy of publication, but you might want to use a higher bar before you accept that humans can accurately perceive the future and, say, invite the study's author on your news program. But hey, it's your choice. After all, the 0.05 threshold was arbitrarily selected by Ronald Fisher in a book he published in 1925.

But this raises the question: how much of the published research literature is actually false? The intuitive answer seems to be 5 percent. I mean, if everyone is using p less than 0.05 as a cutoff for statistical significance, you would expect 5 of every 100 results to be false positives. But that unfortunately grossly underestimates the problem, and here's why.

Imagine you're a researcher in a field where there are a thousand hypotheses currently being investigated. Let's assume that 10 percent of them reflect true relationships and the rest are false, but no one, of course, knows which are which; that's the whole point of doing the research. Now, assuming the experiments are pretty well designed, they should correctly identify around, say, 80 of the 100 true relationships. This is known as a statistical power of 80 percent, so 20 results are false negatives; perhaps the sample size was too small or the measurements were not sensitive enough. Now consider that of those 900 false hypotheses, using a p-value of 0.05, 45 will be incorrectly considered true. The rest will be correctly identified as false, but most journals rarely publish null results; they make up just 10 to 30 percent of papers, depending on the field. Which means that the papers that eventually get published will include 80 true positive results, 45 false positive results, and maybe 20 true negative results. Nearly a third of published results will be wrong, even with the system working normally.

Things get even worse if studies are underpowered, and analysis shows they typically are, if there is a higher ratio of false to true hypotheses being tested, or if the researchers are biased. All of this was pointed out in a 2005 paper entitled "Why Most Published Research Findings Are False."
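To double-check that arithmetic, here's a minimal Python sketch of the same bookkeeping. The inputs (a thousand hypotheses, a 10 percent base rate of true effects, 80 percent power, a 0.05 false positive rate, and roughly 20 published null results) are the illustrative figures from the scenario above, not measured values.

```python
# Illustrative numbers from the scenario above, not real survey data.
n_hypotheses = 1000   # hypotheses under investigation
base_rate    = 0.10   # fraction that reflect true relationships
power        = 0.80   # chance a true effect is correctly detected
alpha        = 0.05   # false positive rate at p < 0.05

n_true  = n_hypotheses * base_rate         # 100 true hypotheses
n_false = n_hypotheses - n_true            # 900 false hypotheses

true_positives  = power * n_true           # 80 correct detections
false_negatives = n_true - true_positives  # 20 missed effects
false_positives = alpha * n_false          # 45 lucky false alarms

# Journals mostly publish positive results; suppose about 20 null
# results also make it into print.
published_nulls = 20
published = true_positives + false_positives + published_nulls

print(f"Wrong fraction of published results: "
      f"{false_positives / published:.0%}")   # -> 31%, nearly a third
```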
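And going back to the precognition result for a moment: a one-sided p-value for a hit rate is just the binomial probability of guessing at least that well by pure luck. The study pooled trials across participants in its own way, so the trial count below is purely hypothetical, chosen only to show how a 53 percent hit rate can land near p = 0.01.

```python
from math import comb

def p_at_least(k, n):
    """Exact one-sided p-value: P(at least k hits in n fair coin flips).
    Uses integer arithmetic throughout, so it stays exact for large n."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

# Hypothetical numbers for illustration: 1,500 two-choice trials,
# of which 795 (53 percent) were hits.
print(f"p = {p_at_least(795, 1500):.3f}")   # -> roughly 0.01
```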
So recently, researchers in a number of fields have attempted to quantify the problem by replicating some prominent past results. The Reproducibility Project repeated 100 psychology studies but found only 36 percent had a statistically significant result the second time around, and the strength of the measured relationships was, on average, half that of the original studies. An attempted verification of 53 studies considered landmarks in the basic science of cancer managed to reproduce only six, even working closely with the original studies' authors. These results are even worse than I just calculated.

The reason for this is nicely illustrated by a 2015 study showing that eating a bar of chocolate every day can help you lose weight faster. In this case, the participants were randomly allocated to one of three treatment groups: one went on a low-carb diet, another went on the same low-carb diet plus a 1.5-ounce bar of chocolate per day, and the third group was the control, instructed just to maintain their regular eating habits. At the end of three weeks, the control group had neither lost nor gained weight, but both low-carb groups had lost an average of five pounds per person. The group that ate chocolate, however, lost weight 10 percent faster than the non-chocolate eaters. The finding was statistically significant, with a p-value less than 0.05. As you might expect, this news spread like wildfire: to the front page of Bild, the most widely circulated daily newspaper in Europe, then to the Daily Star, the Irish Examiner, The Huffington Post, and even Shape magazine.

Unfortunately, the whole thing had been faked. Kind of. I mean, the researchers did perform the experiment exactly as they described, but they intentionally designed it to increase the likelihood of false positives. The sample size was incredibly small, just five people per treatment group, and for each person, 18 different measurements were tracked, including weight, cholesterol, sodium, blood protein levels, sleep quality, well-being, and so on. So if weight loss didn't show a significant difference, there were plenty of other factors that might have, and the headline could have been "chocolate lowers cholesterol" or "improves sleep quality" or something. The point is, a p-value is only really valid for a single measure. Once you're comparing a whole slew of variables, the probability that at least one of them gives you a false positive goes way up, and this is known as p-hacking.

Researchers can make a lot of decisions about their analysis that decrease the p-value. For example, let's say you analyze your data and find it nearly reaches statistical significance, so you decide to collect just a few more data points to be sure. Then, if the p-value drops below 0.05, you stop collecting data, confident that these additional data points could only have made the result more significant if there were really a true relationship there. But numerical simulations show that relationships can cross the significance threshold by adding more data points, even though a much larger sample would show that there really is no relationship. In fact, there are a great number of ways to increase the likelihood of significant results, like having two dependent variables, adding more observations, controlling for gender, or dropping one of three conditions. Combining all of these strategies together increases the likelihood of a false positive to over 60 percent, and that is using p less than 0.05.
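To put a number on that multiple-comparisons problem: if each of the chocolate study's 18 measures were tested independently at the 0.05 level (a simplifying assumption, since real measures are correlated), the chance of at least one false positive is already around 60 percent.

```python
alpha      = 0.05   # significance threshold per measure
n_measures = 18     # outcomes tracked in the chocolate study

# Probability that at least one measure crosses p < 0.05 by chance
# alone, assuming (illustratively) independent measures.
p_any_false_positive = 1 - (1 - alpha) ** n_measures
print(f"{p_any_false_positive:.0%}")   # -> about 60%
```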
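And the peeking strategy is easy to demonstrate with a small Monte Carlo simulation. In this sketch, both groups are drawn from the same distribution, so the true effect is zero and every "significant" result is a false positive; the starting sample size, step size, and cap are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

def run_experiment(n_start=20, n_max=60, step=5, alpha=0.05):
    """One simulated study with optional stopping: test after every few
    new data points and stop as soon as p < alpha. Both groups come from
    the same distribution, so any 'significant' result is false."""
    a = list(rng.normal(size=n_start))
    b = list(rng.normal(size=n_start))
    while True:
        p = ttest_ind(a, b).pvalue
        if p < alpha:
            return True       # declared "significant" -- a false positive
        if len(a) >= n_max:
            return False      # gave up: correctly non-significant
        a.extend(rng.normal(size=step))   # collect a few more points
        b.extend(rng.normal(size=step))

n_sims = 5000
hits = sum(run_experiment() for _ in range(n_sims))
print(f"False positive rate with peeking: {hits / n_sims:.1%}")
```

Run as written, the false positive rate comes out well above the nominal 5 percent, even though every individual test used p < 0.05.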
Now, if you think this is just a problem for psychology, neuroscience, or medicine, consider the pentaquark, an exotic particle made up of five quarks, as opposed to the regular three for protons or neutrons. Particle physics employs particularly stringent requirements for statistical significance, referred to as five sigma, or one chance in 3.5 million of getting a false positive. But in 2002, a Japanese experiment found evidence for the theta-plus pentaquark, and in the two years that followed, 11 other independent experiments looked for and found evidence of that same pentaquark, with very high levels of statistical significance. From July 2003 to May 2004, a theoretical paper on pentaquarks was published, on average, every other day. But alas, it was a false discovery. Further experimental attempts to confirm the theta-plus pentaquark using greater statistical power failed to find any trace of its existence. The problem was that those first scientists weren't blind to the data: they knew how the numbers were generated and what answer they expected to get, and the way the data were cut and analyzed, or p-hacked, produced the false finding.
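For reference, that five-sigma threshold converts to a p-value like this (taking it as the one-sided tail of a normal distribution):

```python
from scipy.stats import norm

# One-sided tail probability of a five-sigma fluctuation on a standard
# normal distribution: the discovery threshold in particle physics.
p = norm.sf(5)
print(f"p = {p:.2e}")             # -> about 2.9e-07
print(f"about 1 in {1/p:,.0f}")   # -> about 1 in 3.5 million
```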
Now, most scientists aren't p-hacking maliciously. There are legitimate decisions to be made about how to collect, analyze, and report data, and these decisions impact the statistical significance of results. For example, 29 different research groups were given the same data and asked to determine if dark-skinned soccer players are more likely to be given red cards. Using identical data, some groups found there was no significant effect, while others concluded dark-skinned players were three times as likely to receive a red card. The point is that data doesn't speak for itself; it must be interpreted. Looking at those results, it seems that dark-skinned players are more likely to get red-carded, but certainly not three times as likely. Consensus helps in this case, but for most results, only one research group provides the analysis.

And therein lies the problem of incentives. Scientists have huge incentives to publish papers; in fact, their careers depend on it. As one scientist, Brian Nosek, puts it: "There is no cost to getting things wrong. The cost is not getting them published." Journals are far more likely to publish results that reach statistical significance, so if a method of data analysis results in a p-value less than 0.05, then you're likely to go with that method. Publication is also more likely if the result is novel and unexpected. This encourages researchers to investigate more and more unlikely hypotheses, which further decreases the ratio of true to spurious relationships that are tested.

Now, what about replication? Isn't science meant to self-correct by having other scientists replicate the findings of an initial discovery? In theory, yes, but in practice it's more complicated. Take the precognition study from the start of this video: three researchers attempted to replicate one of those experiments, and what did they find? Well, surprise, surprise, the hit rate they obtained was not significantly different from chance. When they tried to publish their findings in the same journal as the original paper, they were rejected. The reason? The journal refused to publish replication studies. So if you're a scientist, the successful strategy is clear: don't even attempt replication studies, because few journals will publish them, and there is a very good chance that your results won't be statistically significant anyway, in which case, instead of being able to convince colleagues of the lack of reproducibility of an effect, you will be accused of just not doing it right. A far better approach is to test novel and unexpected hypotheses and then p-hack your way to a statistically significant result.

Now, I don't want to be too cynical about this, because over the past ten years things have started changing for the better. Many scientists acknowledge the problems I've outlined and are starting to take steps to correct them. There have been more large-scale replication studies undertaken in the last ten years. Plus, there's a site, Retraction Watch, dedicated to publicizing papers that have been withdrawn. There are online repositories for unpublished negative results, and there is a move towards submitting hypotheses and methods for peer review before conducting experiments, with the guarantee that the research will be published regardless of results, so long as the procedure is followed. This eliminates publication bias, promotes higher-powered studies, and lessens the incentive for p-hacking.

The thing I find most striking about the reproducibility crisis in science is not the prevalence of incorrect information in published scientific journals. After all, getting to the truth we know is hard, and mathematically, not everything that is published can be correct. What gets me is the thought that even trying our best to figure out what's true, using our most sophisticated and rigorous mathematical tools, peer review, and standards of practice, we still get it wrong so often. So how frequently do we delude ourselves when we're not using the scientific method? As flawed as our science may be, it is far and away more reliable than any other way of knowing that we have.

This episode of Veritasium was supported in part by these fine people on Patreon and by Audible.com, the leading provider of audiobooks online, with hundreds of thousands of titles in all areas of literature, including fiction, non-fiction, and periodicals. Audible offers a free 30-day trial to anyone who watches this channel; just go to audible.com/veritasium so they know I sent you. A book I'd recommend is called The Invention of Nature by Andrea Wulf, which is a biography of Alexander von Humboldt, an adventurer and naturalist who actually inspired Darwin to board the Beagle. You can download that book, or any other of your choosing, for a one-month free trial at audible.com/veritasium. So, as always, I want to thank Audible for supporting me, and I really want to thank you for watching.