#### 科技工作者之家


科技工作者之家 2019-03-31

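It's time for the weekly Nature Podcast again! Welcome to this week's science stories, brought to you by Benjamin Thompson and Nick Howe. This segment discusses rethinking statistical significance. Head to iTunes or your preferred podcast platform to download the full episode and catch up on the week's research news anytime, anywhere.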

Transcript:

Host: Nick Howe

You have a science background, right Ben?

Host: Benjamin Thompson

Oh yes, I’ve got my PhD. I was very much a proper scientist.

Host: Nick Howe

Brilliant, so you must have done all sorts of statistical testing in your research then. I bet you know your ANOVAs from your ANCOVAs and your t-tests from your Mann-Whitney Us?

Host: Benjamin Thompson

Um, I recognise some of those words.

Host: Nick Howe

Well, don’t worry. Just to jog your memory, these are some of the statistical tests that are often used to calculate statistical significance. So, let’s say you’re looking at two sets of data and have found a difference between the two groups. How would you know that that difference is statistically significant?

Host: Benjamin Thompson

I know this! You’d need to calculate the P value to work it out, right?

Host: Nick Howe

Yeah, that’s right. The P value can be thought of as a measure of how likely you are to see this difference by chance, so a low P value means it’s unlikely to be coincidental. Results are often deemed significant if the P value is below a certain threshold.
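
To make the idea of a P value concrete, here is a minimal Python sketch of a permutation test. Stated a little more precisely than in conversation, the P value is the probability of seeing a difference at least as large as the observed one if there were no real difference between the two groups. The function name `permutation_p_value` and the measurements are hypothetical, chosen only for illustration; they do not come from the podcast or the Comment.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided P value for a difference in means by permutation."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Under "no real difference", group labels are interchangeable,
        # so shuffle the pooled values and split them again.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical measurements, for illustration only.
treatment = [5.1, 6.3, 5.8, 6.0, 5.5, 6.4]
control = [5.0, 5.6, 5.2, 5.9, 4.8, 5.3]

p = permutation_p_value(treatment, control)
print(f"observed difference = {mean(treatment) - mean(control):.2f}, P = {p:.4f}")
```

The result is a number anywhere between 0 and 1. Nothing in the calculation itself singles out 0.05; the threshold is applied afterwards by convention.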

Host: Benjamin Thompson

A threshold like 0.05?

Host: Nick Howe

Exactly, but what is so special about 0.05? I asked Regina Nuzzo, a statistician and writer from Gallaudet University.

Interviewee: Regina Nuzzo

Nothing, really. It’s an accident of history, believe it or not. The statistician in the early 1900s who’s most widely attributed for developing the P value, said 0.05 seems good enough to me, and it kind of stuck.

Host: Benjamin Thompson

Wait, so 0.05 is completely arbitrary?

Interviewer: Nick Howe

Well, it can be useful to have a threshold, but statisticians have been saying for a long time that whilst P values are a useful rule of thumb, they’re often misunderstood and misused when we talk about statistical significance. Now, there’s a Comment this week in Nature saying we should get rid of the concept of statistical significance altogether. I spoke to one of the authors of the Comment, Blake McShane, a statistician from Kellogg School of Management, about why he thinks statistical significance isn’t that significant.

Interviewee: Blake McShane

Basically, the idea is we’ll see some slide and usually there’ll be a graph and it’ll be comparing two or more groups, maybe a treatment group and a control group, and you’ll see the two measurements are not exactly equal. But the speaker will then tell you that in fact, these two things which the naked eye shows you are in fact different, there’s no difference. And the reason they do is because the difference failed to attain statistical significance.

Interviewer: Nick Howe

The problem here is that by crossing an arbitrary threshold and categorising results as significant or non-significant, researchers can be misled in their interpretation. They may consider differences to exist when they don’t, or vice versa.

Interviewee: Blake McShane

Even if the underlying statistics that led to the categorisation are perfectly in order, the very act of cutting them at some threshold, it then causes people to think that these items are in fact categorically different.

Interviewer: Nick Howe

Blake also tested scientists and undergraduates on their ability to decide whether there were differences in data. They were presented with significant and non-significant P values. When P values were non-significant, many respondents became blind to real differences. It could be that the way we’re being taught statistics leads to this categorisation.

Interviewee: Blake McShane

If you look at undergraduates and break them up by those who have taken a statistics class versus those who have not, those who have taken statistics classes make the error at the rate that the other scientists do, which means this is sort of deeply ingrained from the get-go in terms of statistical education. Interestingly, the undergraduates who have never taken a statistics class do not make this error because they don’t know what a P value is, they don’t know what statistical significance is, so when you present it, they ignore it and just say is 8.2 bigger than 7.3?

Interviewer: Nick Howe

Rather than dumping results into one of two categories and using that to decide whether or not an effect is interesting or even real, Regina and Blake think that statistics should be used thoughtfully. Reporting all the data, making clear the uncertainty in estimates and presenting P values precisely as a continuum, may help avoid the problems that come with categorisation. There are concerns, though, that humans are great groupers. We love putting things in boxes, so whatever subtle results are observed, humans will inevitably find a category to put them in. Regina, though, is optimistic that we can learn not to categorise.
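
One way to report an estimate together with its uncertainty, rather than a significant or non-significant verdict, is an interval around the observed difference. The sketch below uses a percentile bootstrap, which is one technique among several and is not attributed to the Comment's authors. The function name `bootstrap_diff_interval` and the data are hypothetical; the values were made up so that the group means come out at the 8.2 and 7.3 Blake mentions.

```python
import random
from statistics import mean


def bootstrap_diff_interval(group_a, group_b, n_resamples=5_000, level=0.95, seed=0):
    """Return (difference in means, lower bound, upper bound) via percentile bootstrap."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(mean(resample_a) - mean(resample_b))
    diffs.sort()
    tail = (1 - level) / 2
    lower = diffs[int(tail * n_resamples)]
    upper = diffs[int((1 - tail) * n_resamples) - 1]
    return mean(group_a) - mean(group_b), lower, upper


# Hypothetical measurements with means 8.2 and 7.3, for illustration only.
treatment = [8.9, 7.6, 8.4, 8.1, 7.9, 8.3]
control = [7.1, 7.8, 6.9, 7.5, 7.4, 7.1]

estimate, low, high = bootstrap_diff_interval(treatment, control)
print(f"difference = {estimate:.2f}, 95% interval = ({low:.2f}, {high:.2f})")
```

Reported this way, the reader sees both the size of the difference and how uncertain it is, and can judge the result without sorting it into one of two boxes.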

Interviewee: Regina Nuzzo

I think as a society – and certainly many individuals – we’re now becoming more comfortable with acknowledging the shades of grey that exist in the real world. So, I wonder if now is a very good time, it’s the right time, for us to be looking at P values and saying you know what, it is continuous.

Interviewer: Nick Howe

So, what are the alternatives? Whilst Blake and his co-authors wish to put statistical significance out to pasture, there is utility in having some kind of threshold to assess the quality of publications. Regina has some thoughts on potential ways forward, in particular the idea of journals using ‘registered reports’.

Interviewee: Regina Nuzzo

Researchers can submit their entire study and data protocol before they’ve collected any data, submit that for peer review, and if accepted, the researchers get an in-principle publication no matter what their results are. So that means no matter what P value I get, if I’ve done it smartly then I get a publication.

Interviewer: Nick Howe

Many other solutions have been proposed, but so far, the concept of statistical significance has stuck. If categorisation is counter-productive then Regina thinks that we’ll need systemic change.

Interviewee: Regina Nuzzo

You can tell people what’s right and wrong and they can know that intellectually, but if they’re surrounded by a system where the incentives and rewards and all their peers are doing something different, then it’s very hard to shift them into doing the right thing. Journals and labs have been putting into place various policies and innovations that support using P values as they were originally intended and I think that’s very exciting when we can change the surrounding environment to make it easier to do the right thing.

Interviewer: Nick Howe

That was Regina Nuzzo from Gallaudet University in the US. You also heard from Blake McShane from the Kellogg School of Management, also in the US. You can read Blake’s Comment piece at nature.com/opinion.

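Every week, the Nature Podcast brings you news stories from across the world of science, covering a wide range of research fields and highlighting exciting research published in Nature. We hand the microphone to the scientists behind the research and present in-depth analysis from Nature's reporters and editors. In 2017, listens and downloads from China exceeded 500,000, the second highest of any country.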


Source: Nature-Research Nature自然科研

Original link: http://mp.weixin.qq.com/s?__biz=MzAwNTAyMDY0MQ==&mid=2652559525&idx=3&sn=8a9e1391904c315cd513f8b8a949fcfa&chksm=80cd762bb7baff3d27350684b0885baa0b0c0ab838e236b75c531cdd1f7c5f8477978daac8e3&scene=27#wechat_redirect
