Perspective

Beyond p-value: The Rigor and Power of Study

Fengyu Zhang, Claude Hughes

Received June 29, 2019, Accepted December 6, 2019

ABSTRACT

There has been a series of recent discussions and debates on the p-value and statistical significance, concepts that have been taught in statistics and used in research for nearly a century since R.A. Fisher. These discussions, including more than 40 papers published in a special issue of The American Statistician, provide an excellent opportunity to consider technical measures that could be implemented in practice in grant applications and publications. While several factors have been discussed, it may be the rigor of a study that should determine the p-value threshold used for reporting study results and for judging whether a replication is consistent. Based on the Fisherian and Neyman-Pearson theories of statistical hypothesis testing, we propose new criteria, which can be implemented without fundamental changes to existing statistical methods, to reduce false positives and the irreplicability of studies that are either inadequately powered or overpowered.
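The contrast between inadequately powered and overpowered studies can be made concrete with the standard Neyman-Pearson power calculation. The sketch below is an illustration only and is not the criteria proposed in this article. It assumes a two-sided, two-sample z-test with equal group sizes, known unit variance, and a standardized effect size d. The function names `z_test_power` and `n_per_group` are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def z_test_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Power of a two-sided two-sample z-test for standardized effect d.

    Illustrative assumptions: equal group sizes and known unit variance,
    so the test statistic is normal with mean d * sqrt(n / 2).
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided critical value
    ncp = d * sqrt(n_per_group / 2)  # expected value of the z statistic
    # Probability that the statistic falls in either rejection region.
    return _STD_NORMAL.cdf(ncp - z_crit) + _STD_NORMAL.cdf(-ncp - z_crit)


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Smallest per-group n giving roughly the target power.

    Uses the usual approximation that ignores the far rejection tail.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)


if __name__ == "__main__":
    print(n_per_group(0.5))               # 63 per group for a medium effect
    print(n_per_group(0.5, alpha=0.005))  # a stricter threshold needs more subjects
    print(z_test_power(0.05, 100_000))    # a trivial effect is almost surely "significant"
```

Under these assumptions, a study with about 63 subjects per group has roughly 80% power to detect d = 0.5 at alpha = 0.05. A t-test would require slightly more subjects. With 100,000 subjects per group, even an effect of d = 0.05 is declared significant almost every time. This shows how an overpowered study can produce very small p-values for effects of little practical importance.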

KEYWORDS

P-value, rigor of study, statistical power, statistical hypothesis testing, statistical significance

How to cite this article:

Zhang F, Hughes CL. Beyond p-value: the rigor and power of study. Glob Clin Transl Res. 2020;2(1):1-6. doi:10.36316/gcatr.02.0021.

Comments in

HC Kraemer. A comment on “Beyond p-value: the rigor and power of study”. Glob Clin Transl Res. 2020; 2(1):7-9. doi:10.36316/gcatr.02.0022.

S Wu. Improve reproducibility by using appropriate statistical methods. Glob Clin Transl Res. 2020; 2(1):10-11. doi:10.36316/gcatr.02.0023.

A Hendrix. A comment on “Beyond p-value: the rigor and power of study”. Glob Clin Transl Res. 2020; 2(1):12. doi:10.36316/gcatr.02.0024.

F Zhang, C Hughes. Authors’ reply to comments. Glob Clin Transl Res. 2020; 2(1):12. doi:10.36316/gcatr.02.0025.

References

1. Ioannidis JPA. The Proposal to Lower P Value Thresholds to .005. JAMA. 2018;319(14):1429-30.

2. Benjamin DJ, Berger JO, Johannesson M, Nosek BA, Wagenmakers E-J, Berk R, et al. Redefine statistical significance. Nat Hum Behav. 2018;2(1):6.

3. Amrhein V, Greenland S, McShane B. Scientists rise up against statistical significance. Nature. 2019;567(7748):305-7.

4. Ioannidis JP. Why most published research findings are false. PLoS Med. 2005;2(8):e124.

5. Savalei V, Dunn E. Is the call to abandon p-values the red herring of the replicability crisis? Front Psychol. 2015;6:245.

6. Benjamin DJ, Berger JO, Johannesson M, Nosek BA, Wagenmakers E-J, Berk R, et al. Redefine statistical significance. Nat Hum Behav. 2018;2(1):6-10.

7. McNutt M. Journals unite for reproducibility. Science. 2014;346(6210):679.

8. Goodman S, Greenland S. Why most published research findings are false: problems in the analysis. PLoS Med. 2007;4(4):e168.

9. Shrier I. Power, reliability, and heterogeneous results. PLoS Med. 2005;2(11):e386; author reply e98.

10. Nuzzo R. Scientific method: statistical errors. Nature. 2014;506(7487):150-2.

11. Mathew R. The ASA’s p-value statement, one year on. Significance. 2017;14(2):38-41.

12. McShane BB, Gal D, Gelman A, Robert C, Tackett JL. Abandon Statistical Significance. The American Statistician. 2019;73(sup1):235-45.

13. Wasserstein R, Schirm A, Lazar N. Moving to a World Beyond "p<0.05". The American Statistician. 2019;73(sup1):1-19.

14. Leek JT, Peng RD. Statistics: P values are just the tip of the iceberg. Nature. 2015;520(7549):612.

15. Fisher RA. On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London Series A, Containing Papers of a Mathematical or Physical Character. 1922;222(1):309-36.

16. Fisher R. Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd; 1925.

17. Lehmann E. Fisher, Neyman, and the Creation of Classical Statistics: Springer; 2011.

18. Cowles M. Statistics in Psychology: An Historical Perspective: Taylor & Francis; 2005.

19. Perezgonzalez JD. Fisher, Neyman-Pearson or NHST? A tutorial for teaching data testing. Front Psychol. 2015;6:223.

20. Neyman J, Pearson ES. On the use and interpretation of certain test criteria for purposes of statistical inference: Part I. Biometrika. 1928:175-240.

21. Neyman J, Pearson ES. IX. On the problem of the most efficient tests of statistical hypotheses. Philosophical Transactions of the Royal Society of London Series A, Containing Papers of a Mathematical or Physical Character. 1933;231(694-706):289-337.

22. Lehmann EL. The Fisher, Neyman-Pearson theories of testing hypotheses: one theory or two? Journal of the American Statistical Association. 1993;88(424):1242-9.

23. Xia L, Xia K, Weinberger DR, Zhang F. Common genetic variants shared among five major psychiatric disorders: a large-scale genome-wide combined analysis. Glob Clin Transl Res. 2019;1(1):21-30.

24. Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature. 2014;511(7510):421-7.

25. Locascio J. The Impact of Results Blind Science Publishing on Statistical Consultation and Collaboration. The American Statistician. 2019;73(sup1):346-51.

26. Steele F, Diamond I, Wang D. The determinants of the duration of contraceptive use in China: a multilevel multinomial discrete-hazards modeling approach. Demography. 1996;33(1):12-23.

27. Short S, Zhang F. Use of maternal health services in rural China. Popul Stud (Camb). 2004;58(1):3-19.

28. Finkel A. The road to bad research is paved with good intentions. Nature. 2019;566:297.

29. Landis SC, Amara SG, Asadullah K, Austin CP, Blumenstein R, Bradley EW, et al. A call for transparent reporting to optimize the predictive value of preclinical research. Nature. 2012;490(7419):187-91.

30. Schulz KF, Altman DG, Moher D, CONSORT Group. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. BMJ. 2010;340:c332.

31. Trafimow D, Amrhein V, Areshenkoff CN, Barrera-Causil CJ, Beh EJ, Bilgic YK, et al. Manipulating the Alpha Level Cannot Cure Significance Testing. Front Psychol. 2018;9:699.

32. Bishop D. Rein in the four horsemen of irreproducibility. Nature. 2019;568:435.

33. Colquhoun D. An investigation of the false discovery rate and the misinterpretation of p-values. R Soc Open Sci. 2014;1(3):140216.

34. Hochster HS. The power of "p": on overpowered clinical trials and "positive" results. Gastrointest Cancer Res. 2008;2(2):108-9.

35. Ioannidis JP. The Mass Production of Redundant, Misleading, and Conflicted Systematic Reviews and Meta-analyses. Milbank Q. 2016;94(3):485-514.

36. Nikpay M, Goel A, Won HH, Hall LM, Willenborg C, Kanoni S, et al. A comprehensive 1,000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nat Genet. 2015;47(10):1121-30.

37. Case LD, Ambrosius WT. Power and sample size. Methods Mol Biol. 2007;404:377-408.

38. International Schizophrenia Consortium, Purcell SM, Wray NR, Stone JL, Visscher PM, O'Donovan MC, et al. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009;460(7256):748-52.

39. Kraft P, Zeggini E, Ioannidis JP. Replication in genome-wide association studies. Stat Sci. 2009;24(4):561-73.