Statistical significance: Difference between revisions

From Citizendium
imported>Robert Badgett
mNo edit summary
 
(22 intermediate revisions by 3 users not shown)
Line 5: Line 5:
==Hypothesis testing==
==Hypothesis testing==
Usually, the null hypothesis is that there is no difference between two samples in regard to the factor being studied.<ref name="isbn0-910133-36-0">{{cite book |author=Mosteller, Frederick; Bailar, John Christian |authorlink= |editor= |others= |title=Medical uses of statistics |edition= |language= |publisher=NEJM Books |location=Boston, Mass |year=1992 |origyear= |pages= |quote= |isbn=0-910133-36-0 |oclc= |doi= |url= |accessdate=}} [http://books.google.com/books?isbn=0910133360 Google Books]</ref>
Usually, the null hypothesis is that there is no difference between two samples in regard to the factor being studied.<ref name="isbn0-910133-36-0">{{cite book |author=Mosteller, Frederick; Bailar, John Christian |authorlink= |editor= |others= |title=Medical uses of statistics |edition= |language= |publisher=NEJM Books |location=Boston, Mass |year=1992 |origyear= |pages= |quote= |isbn=0-910133-36-0 |oclc= |doi= |url= |accessdate=}} [http://books.google.com/books?isbn=0910133360 Google Books]</ref>
===Choosing a statistical method===
The choice of statistical method to use in an analysis is determined by:<ref name="urlAn Overview: Choosing the Correct Statistical Test">{{cite web |url=http://www-users.cs.umn.edu/~ludford/stat_overview.htm |title=An Overview: Choosing the Correct Statistical Test |author=Ludford PJ |authorlink= |coauthors= |date= |format= |work= |publisher=University of Minnesota |pages= |language= |archiveurl= |archivedate= |quote= |accessdate=}}</ref><ref name="urlHow to choose a statistical test">{{cite web |url=http://www.graphpad.com/www/Book/Choose.htm |title=How to choose a statistical test |author= |authorlink= |coauthors= |date= |format= |work= |publisher=GraphPad |pages= |language= |archiveurl= |archivedate= |quote= |accessdate=}}</ref><ref name="urlChoosing the Right Statistical Test">{{cite web |url=http://www.socr.ucla.edu/Applets.dir/ChoiceOfTest.html |title=Choosing the Right Statistical Test |author=Dinov I |authorlink= |coauthors= |date= |format= |work= |publisher=University of California at Los Angeles |pages= |language= |archiveurl= |archivedate= |quote= |accessdate=}}</ref>
* Type of data, for example: continuous, categorical, dichotomous
* Whether the data is normally distributed. Various [[normality test]]s such as the Shapiro-Wilk<ref>Shapiro, S. S. and Wilk, M. B. (1965). "[http://biomet.oxfordjournals.org/cgi/pdf_extract/52/3-4/591 An analysis of variance test for normality (complete samples)]", ''Biometrika'', Vol. 52, No. 3/4, pages 591–611. {{doi|10.1093/biomet/52.3-4.591}}</ref> are available.<ref name="doi10.2307/2286009">{{cite journal
| first = M. A. | last = Stephens |  year = 1974 | title = EDF Statistics for Goodness of Fit and Some Comparisons  | journal = Journal of the American Statistical Association  | volume = 69 | issue = | pages = 730–737 | id = | url =  | doi = 10.2307/2286009}}</ref>
* Whether the samples are independent or paired
* Number of samples to compare


==Statistical errors==
==Statistical errors==
Line 14: Line 22:


===Type II error (beta error)===
===Type II error (beta error)===
Type II error, also called beta error, is the acceptance of an incorrect null hypothesis. This error may occur when the sample size was insufficient to have power to detect a statistically significant difference.<ref name="pmid7647644">{{cite journal |author=Altman DG, Bland JM |title=Absence of evidence is not evidence of absence |journal=BMJ (Clinical research ed.) |volume=311 |issue=7003 |pages=485 |year=1995 |month=August |pmid=7647644 |pmc=2550545 |doi= |url=http://bmj.com/cgi/pmidlookup?view=long&pmid=7647644 |issn=}}</ref><ref name="pmid3985731">{{cite journal |author=Detsky AS, Sackett DL |title=When was a "negative" clinical trial big enough? How many patients you needed depends on what you found |journal=Archives of internal medicine |volume=145 |issue=4 |pages=709–12 |year=1985 |month=April |pmid=3985731 |doi= |url= |issn=}}</ref><ref name="pmid6881780">{{cite journal |author=Young MJ, Bresnitz EA, Strom BL |title=Sample size nomograms for interpreting negative clinical studies |journal=Annals of internal medicine |volume=99 |issue=2 |pages=248–51 |year=1983 |month=August |pmid=6881780 |doi= |url= |issn=}}</ref>
Type II error, also called beta error, is the acceptance of an incorrect null hypothesis. This error may occur when the sample size was insufficient to have power to detect a statistically significant difference.<ref name="pmid7647644">{{cite journal |author=Altman DG, Bland JM |title=Absence of evidence is not evidence of absence |journal=BMJ (Clinical research ed.) |volume=311 |issue=7003 |pages=485 |year=1995 |month=August |pmid=7647644 |pmc=2550545 |doi= |url=http://bmj.com/cgi/pmidlookup?view=long&pmid=7647644 |issn=}}</ref><ref name="pmid3985731">{{cite journal| author=Detsky AS, Sackett DL| title=When was a "negative" clinical trial big enough? How many patients you needed depends on what you found. | journal=Arch Intern Med | year= 1985 | volume= 145 | issue= 4 | pages= 709-12 | pmid=3985731 | doi=10.1001/archinte.1985.00360040141030 | pmc= | url= }} </ref><ref name="pmid6881780">{{cite journal |author=Young MJ, Bresnitz EA, Strom BL |title=Sample size nomograms for interpreting negative clinical studies |journal=Annals of internal medicine |volume=99 |issue=2 |pages=248–51 |year=1983 |month=August |pmid=6881780 |doi= |url= |issn=}}</ref>


==Philosophical approaches to error testing==
==Philosophical approaches to error testing==
===Frequentist method===
===Frequentist method===
This approach uses mathematical formulas to calculate deductive probabilities (p-value) of an experimental result.<ref name="pmid10383371">{{cite journal |author=Goodman SN |title=Toward evidence-based medical statistics. 1: The P value fallacy |journal=Ann Intern Med |volume=130 |pages=995–1004 |year=1999 |pmid=10383371 |doi=|url=http://www.annals.org/cgi/content/full/130/12/995}}</ref> This approach can generate [[confidence interval]]s.
This approach uses mathematical formulas to calculate deductive probabilities (p-value) of an experimental result.<ref name="pmid10383371"/> This approach can generate [[confidence interval]]s.


A problem with the [[frequentist]] analyses of p-values is that they may overstate "statistical significance".<ref name="pmid10383371">{{cite journal | author = Goodman SN | title = Toward evidence-based medical statistics. 1: The P value fallacy. | journal = Ann Intern Med | volume = 130 | issue = 12 | pages = 995–1004 | year = 1999 | url=http://www.annals.org/cgi/content/full/130/12/1005 | pmid = 10383371}}</ref><ref name="pmid10383350">{{cite journal | author = Goodman SN | title = Toward evidence-based medical statistics. 2: The Bayes factor. | journal = Ann Intern Med | volume = 130 | issue = 12 | pages = 1005–13 | year = 1999|url=http://www.annals.org/cgi/content/full/130/12/1005 | pmid = 10383350}}</ref>
A problem with the [[frequentist]] analyses of p-values is that they may overstate "statistical significance".<ref name="pmid10383371"/><ref name="pmid10383350">{{cite journal | author = Goodman SN | title = Toward evidence-based medical statistics. 2: The Bayes factor. | journal = Ann Intern Med | volume = 130 | issue = 12 | pages = 1005–13 | year = 1999|url=http://www.annals.org/cgi/content/full/130/12/1005 | pmid = 10383350}}</ref>


===Likelihood or Bayesian method===
===Likelihood or Bayesian method===
Some argue that the P-value should be interpreted in light of how plausible is the hypothesis based on the totality of prior research and physiologic knowledge.<ref name="pmid3573245">{{cite journal |author=Browner WS, Newman TB |title=Are all significant P values created equal? The analogy between diagnostic tests and clinical research |journal=JAMA |volume=257  |pages=2459–63 |year=1987 |pmid=3573245 |doi=}}</ref><ref name="pmid10383371"/><ref name="pmid10383350">{{cite journal |author=Goodman SN |title=Toward evidence-based medical statistics. 2: The Bayes factor |journal=Ann Intern Med |volume=130 |pages=1005–13 |year=1999 |pmid=10383350 |doi=|url=http://www.annals.org/cgi/content/full/130/12/1005}}</ref><ref name="pmid15172393">{{cite journal |author=Diamond GA, Kaul S |title=Prior convictions: Bayesian approaches to the analysis and interpretation of clinical megatrials |journal=J. Am. Coll. Cardiol. |volume=43 |issue=11 |pages=1929–39 |year=2004 |month=June |pmid=15172393 |doi=10.1016/j.jacc.2004.01.035 |url=http://linkinghub.elsevier.com/retrieve/pii/S0735109704004784 |issn=}}</ref><ref name="pmid18611956">{{cite journal |author=Ioannidis JP |title=Effect of formal statistical significance on the credibility of observational associations |journal=Am. J. Epidemiol. |volume=168 |issue=4 |pages=374–83; discussion 384–90 |year=2008 |month=August |pmid=18611956 |doi=10.1093/aje/kwn156 |url=http://aje.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=18611956 |issn=}}</ref> This approach can generate Bayesian 95% credibility intervals.<ref name="isbn1-58488-410-X">{{cite book |author=Gelfand, Alan E.; Sudipto Banerjee; Carlin, Bradley P. 
|authorlink= |editor= |others= |title=Hierarchical Modeling and Analysis for Spatial Data (Monographs on Statistics and Applied Probability) |edition= |language= |publisher=Chapman & Hall/CRC |location=Boca Raton |year=2003 |origyear= |pages= |quote= |isbn=1-58488-410-X |oclc= |doi= |url= |accessdate=|id={{LCC| QA278.2 .B36}}}}</ref>
Some argue that the P-value should be interpreted with [[Bayes Theorem]], or in other words, in light of how plausible is the hypothesis based on the totality of prior research and physiologic knowledge.<ref name="pmid3573245">{{cite journal |author=Browner WS, Newman TB |title=Are all significant P values created equal? The analogy between diagnostic tests and clinical research |journal=JAMA |volume=257  |pages=2459–63 |year=1987 |pmid=3573245 |doi=}}</ref><ref name="pmid10383371"/><ref name="pmid10383350"/><ref name="pmid15172393">{{cite journal |author=Diamond GA, Kaul S |title=Prior convictions: Bayesian approaches to the analysis and interpretation of clinical megatrials |journal=J. Am. Coll. Cardiol. |volume=43 |issue=11 |pages=1929–39 |year=2004 |month=June |pmid=15172393 |doi=10.1016/j.jacc.2004.01.035 |url=http://linkinghub.elsevier.com/retrieve/pii/S0735109704004784 |issn=}}</ref><ref name="pmid18611956">{{cite journal |author=Ioannidis JP |title=Effect of formal statistical significance on the credibility of observational associations |journal=Am. J. Epidemiol. |volume=168 |issue=4 |pages=374–83; discussion 384–90 |year=2008 |month=August |pmid=18611956 |doi=10.1093/aje/kwn156 |url=http://aje.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=18611956 |issn=}}</ref> This approach can generate Bayesian 95% credibility intervals.<ref name="isbn1-58488-410-X">{{cite book |author=Gelfand, Alan E.; Sudipto Banerjee; Carlin, Bradley P. |authorlink= |editor= |others= |title=Hierarchical Modeling and Analysis for Spatial Data (Monographs on Statistics and Applied Probability) |edition= |language= |publisher=Chapman & Hall/CRC |location=Boca Raton |year=2003 |origyear= |pages= |quote= |isbn=1-58488-410-X |oclc= |doi= |url= |accessdate=|id={{LCC| QA278.2 .B36}}}}</ref> Details of Bayesian calculations have been reviewed.<ref name="pmid16446352">{{cite journal |author=Greenland S |title=Bayesian perspectives for epidemiological research: I. 
Foundations and basic methods |journal=Int J Epidemiol |volume=35 |issue=3 |pages=765–75 |year=2006 |month=June |pmid=16446352 |doi=10.1093/ije/dyi312 |url=http://ije.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=16446352 |issn=}}</ref>
 
The Bayesian method has been proposed for adaptive trial designs for comparative effectiveness research.<ref name="pmid19567619">{{cite journal |author=Luce BR, Kramer JM, Goodman SN, ''et al.'' |title=Rethinking Randomized Clinical Trials for Comparative Effectiveness Research: The Need for Transformational Change |journal=Ann. Intern. Med. |volume=151 |issue=3 |pages= |year=2009 |month=June |pmid=19567619 |doi= |url=http://www.annals.org/cgi/content/full/0000605-200908040-00126v1 |issn=}}</ref> In the United States, [[Medicare]]'s [[Centers for Medicare and Medicaid Services]] (CMS) is investigating this role.<ref name="urlHealth Care: Technology Assessment Subdirectory Page">{{cite web |url=http://www.ahrq.gov/clinic/techix.htm#progress |title=Health Care: Technology Assessment Subdirectory Page |author=Anonymous |authorlink= |coauthors= |date= |format= |work= |publisher=Agency for Healthcare Research and Quality |pages= |language=English |archiveurl= |archivedate= |quote= |accessdate=2009-08-03}}</ref>


Bayesian inference:<ref name="pmid10383350">{{cite journal |author=Goodman SN |title=Toward evidence-based medical statistics. 2: The Bayes factor |journal=Ann Intern Med |volume=130 |pages=1005–13 |year=1999 |pmid=10383350 |doi=|url=http://www.annals.org/cgi/content/full/130/12/1005}}</ref>
Bayesian inference:<ref name="pmid10383350"/>
:<math>\text{Prior Odds of Null Hypothesis}\ *\ \text{Bayes Factor}\ =\ \text{Posterior Odds of Null Hypothesis}</math>
:<math>\text{Posterior Odds of Null Hypothesis}\ =\ \text{Prior Odds of Null Hypothesis}\ *\ \text{Bayes Factor}</math>


The Bayesian analysis creates a Bayes Factor. Unlike the traditional P-value, the Bayes factor is not a probability of rejecting the null hypothesis, but is a  ratio of probabilities. A value greater than 1 supports the null hypotheses, whereas a value less than 1 supports the alternative hypothesis. The equation for the Bayes Factor is:<ref name="pmid10383350">{{cite journal |author=Goodman SN |title=Toward evidence-based medical statistics. 2: The Bayes factor |journal=Ann Intern Med |volume=130 |pages=1005–13 |year=1999 |pmid=10383350 |doi=|url=http://www.annals.org/cgi/content/full/130/12/1005}}</ref>
The Bayesian analysis creates a Bayes Factor. Unlike the traditional P-value, the Bayes factor is not a probability of rejecting the null hypothesis, but is a  ratio of probabilities. The Bayes Factor is a [[likelihood ratio]]. A value greater than 1 supports the null hypotheses, whereas a value less than 1 supports the alternative hypothesis. The equation for the Bayes Factor is:<ref name="pmid10383350"/>
:<math>\text{Bayes Factor}\ =\ \frac{\text{Probability of the null hypothesis given the data found}}{\text{Probability of the null hypothesis given the data found}}</math>
:<math>\text{Bayes Factor}\ =\ \frac{\text{Probability of the data found given the null hypothesis}}{\text{Probability of the data found given the alternative hypothesis}}</math>


Example of a coin flip that comes up heads in one of four tosses:
Example of a coin flip that comes up heads in one of four tosses, is the coin unbiased or is the chance of heads in each toss only 1/4?:
:<math>\text{Bayes Factor}\ =\ \frac{4\ *\ \left(\frac{1}{2}\right)^3\ *\ \left(\frac{1}{2}\right)^1}{\ 4\ *\ \left(\frac{3}{4}\right)^3\ *\ \left(\frac{1}{4}\right)^1}\ =\ \frac{4\ *\ \frac{1}{16}}{4\ *\ \frac{27}{256}} = 0.59</math>
:<math>\text{Bayes Factor}\ =\ \frac{4\ *\ \left(\frac{1}{2}\right)^3\ *\ \left(\frac{1}{2}\right)^1}{\ 4\ *\ \left(\frac{3}{4}\right)^3\ *\ \left(\frac{1}{4}\right)^1}\ =\ \frac{4\ *\ \frac{1}{16}}{4\ *\ \frac{27}{256}} = 0.59</math>


Line 41: Line 47:
-->
-->


Goodman gives the following three methods of interpreting an example Bayes Factor of 1/2:<ref name="pmid10383350">{{cite journal |author=Goodman SN |title=Toward evidence-based medical statistics. 2: The Bayes factor |journal=Ann Intern Med |volume=130 |pages=1005–13 |year=1999 |pmid=10383350 |doi=|url=http://www.annals.org/cgi/content/full/130/12/1005}}</ref>
Goodman gives the following three methods of interpreting an example Bayes Factor of 1/2:<ref name="pmid10383350"/>
# ''Objective probability:'' "The observed results are half as probable under the null hypothesis as they are under the alternative."
# ''Objective probability:'' "The observed results are half as probable under the null hypothesis as they are under the alternative."
# ''Inductive evidence:'' "The evidence supports the null hypothesis half as strongly as it does the alternative."
# ''Inductive evidence:'' "The evidence supports the null hypothesis half as strongly as it does the alternative."
# ''Subjective probability'': "The odds of the null hypothesis relative to the alternative hypothesis after the experiment are half what they were before the experiment."
# ''Subjective probability'': "The odds of the null hypothesis relative to the alternative hypothesis after the experiment are half what they were before the experiment."


The Minimum Bayes Factor is proposed by Goodman as another way to help readers make Bayesian interpretations if they are accustomed to p-values:<ref name="pmid10383350">{{cite journal |author=Goodman SN |title=Toward evidence-based medical statistics. 2: The Bayes factor |journal=Ann Intern Med |volume=130 |pages=1005–13 |year=1999 |pmid=10383350 |doi=|url=http://www.annals.org/cgi/content/full/130/12/1005}}</ref>
The Minimum Bayes Factor is proposed by Goodman as another way to help readers make Bayesian interpretations if they are accustomed to p-values:<ref name="pmid10383350"/>
:<math>\text{Minimum Bayes Factor }\ =\  e^\left(-\frac{Z^2}{2}\right)</math>
:<math>\text{Minimum Bayes Factor }\ =\  e^\left(-\frac{Z^2}{2}\right)</math>


Note that the Minimum Bayes Factor when p = 0.05, or Z= 1.96, is 0.15. This Bayes Factor leads to a posterior probability of 13%, far higher than the 5% probability calculated using frequentist statistics.
Note that the Minimum Bayes Factor when p = 0.05, or Z = 1.96, is 0.15. It corresponds to the likelihood ratio for comparing the hypothesis that a normally distributed variable Z with variance one has mean zero against the hypothesis that its mean is 1.96, when the observed value is z = 1.96. Under classical (frequentist) hypothesis testing at significance level 0.05, this value of Z would just lead us to reject the null hypothesis that the mean is zero against the alternative that it is not zero. From the Bayesian point of view, this Bayes Factor leads to a posterior probability of 13%, if the prior odds were 50-50.<ref name="pmid10383350"/> However, it is suspect to compute the Bayes Factor or the posterior odds for a hypothesis (mean is 1.96) that is itself suggested by the data (z = 1.96).


{| class="wikitable" align="right"
{| class="wikitable" align="right"
Line 67: Line 73:
| < 0.010|| "decisive support"  
| < 0.010|| "decisive support"  
|}
|}
An empiric comparison of frequentist and Bayesian approaches to the analyses of [[randomized controlled trial]]s has been done.<ref name="pmid18947971">{{cite journal| author=Wijeysundera DN,  Austin PC, Hux JE, Beattie WS, Laupacis A| title=Bayesian statistical  inference enhances the interpretation of contemporary randomized  controlled trials. | journal=J Clin Epidemiol | year= 2009 | volume= 62 |  issue= 1 | pages= 13-21.e5 | pmid=18947971 |  doi=10.1016/j.jclinepi.2008.07.006 | pmc= |  url=http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&tool=sumsearch.org/cite&retmode=ref&cmd=prlinks&id=18947971  }} </ref>
====Applications of Bayesian approach====
;Interim analyses of randomized controlled trials.
A Bayesian approach to interim analysis may help reduce bias and adjust the estimate of effect in [[randomized controlled trial]]s.<ref name="pmid17577008">{{cite journal |author=Goodman SN |title=Stopping at nothing? Some dilemmas of data monitoring in clinical trials |journal=Ann. Intern. Med. |volume=146 |issue=12 |pages=882–7 |year=2007 |month=June |pmid=17577008 |doi= |url=http://www.annals.org/cgi/content/full/146/12/882 |issn=}}</ref>
;Subgroup analyses of randomized controlled trials.
Bayesian analyses provide an alternative to Bonferroni adjustments when testing significance of multiple comparisons.<ref name="pmid18453632">{{cite journal |author=Greenland S |title=Multiple comparisons and association selection in general epidemiology |journal=Int J Epidemiol |volume=37 |issue=3 |pages=430–4 |year=2008 |month=June |pmid=18453632 |doi=10.1093/ije/dyn064 |url=http://ije.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=18453632 |issn=}}</ref>
;Adaptive trial designs.
The Bayesian method has been proposed for adaptive trial designs for comparative effectiveness research.<ref name="pmid19567619">{{cite journal |author=Luce BR, Kramer JM, Goodman SN, ''et al.''  |title=Rethinking Randomized Clinical Trials for Comparative  Effectiveness Research: The Need for Transformational Change  |journal=Ann. Intern. Med. |volume=151 |issue=3 |pages= |year=2009  |month=June |pmid=19567619 |doi= |url=http://www.annals.org/cgi/content/full/0000605-200908040-00126v1 |issn=}}</ref> In the United States, [[Medicare]]'s [[Centers for Medicare and Medicaid Services]] (CMS) is investigating this role.<ref name="urlHealth Care: Technology Assessment Subdirectory Page">{{cite web |url=http://www.ahrq.gov/clinic/techix.htm#progress  |title=Health Care: Technology Assessment Subdirectory Page  |author=Anonymous |authorlink= |coauthors= |date= |format= |work=  |publisher=Agency for Healthcare Research and Quality |pages=  |language=English |archiveurl= |archivedate= |quote=  |accessdate=2009-08-03}}</ref>
;Interpretation of clinical practice guidelines.
Diamond has proposed combining a Bayesian statistical approach with an evidentiary classification for the interpretation of [[clinical practice guideline]]s.<ref name="pmid19667308">{{cite journal| author=Diamond GA, Kaul S| title=Bayesian classification of clinical practice guidelines. | journal=Arch Intern Med | year= 2009 | volume= 169 | issue= 15 | pages= 1431-5 | pmid=19667308 | doi=10.1001/archinternmed.2009.235 | pmc= | url=http://www.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&tool=sumsearch.org/cite&retmode=ref&cmd=prlinks&id=19667308  }} </ref>


==References==
==References==
<references/>
<references/>
==External links==
* [http://sumsearch.org/calc/ Bayesian calculator][[Category:Suggestion Bot Tag]]

Latest revision as of 06:00, 22 October 2024

This article is developing and not approved.
Plot of the standard normal probability density function.[1]

In statistics, statistical significance is a "term indicating that the results obtained in an analysis of study data are unlikely to have occurred by chance, and the null hypothesis is rejected. When statistically significant, the probability of the observed results, given the null hypothesis, falls below a specified level of probability (most often P < 0.05)."[2] The P-value, which is used to represent the likelihood that the observed results are due to chance, is defined as "the probability, under the assumption of no effect or no difference (the null hypothesis), of obtaining a result equal to or more extreme than what was actually observed."[3]

Hypothesis testing

Usually, the null hypothesis is that there is no difference between two samples in regard to the factor being studied.[4]

Choosing a statistical method

The choice of statistical method to use in an analysis is determined by:[5][6][7]

  • Type of data, for example: continuous, categorical, dichotomous
  • Whether the data is normally distributed. Various normality tests such as the Shapiro-Wilk[8] are available.[9]
  • Whether the samples are independent or paired
  • Number of samples to compare
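The four criteria above can be read as a small decision procedure. The following Python sketch is illustrative only: the function name `choose_test` and the mapping from criteria to test names are assumptions based on common textbook conventions, not taken from the cited guides, and real analyses need more care (for example, about sample size and variance assumptions).

```python
def choose_test(data_type: str, normal: bool, paired: bool, n_groups: int) -> str:
    """Return a conventional test name for the four criteria in the list.

    Hypothetical helper: the mapping is a simplified, assumed convention.
    """
    if n_groups < 2:
        raise ValueError("need at least two samples to compare")

    if data_type in ("categorical", "dichotomous"):
        if not paired:
            return "chi-square test"
        if data_type == "dichotomous":
            return "McNemar test" if n_groups == 2 else "Cochran Q test"
        raise ValueError("paired categorical data is not covered by this sketch")

    if data_type != "continuous":
        raise ValueError(f"unknown data type: {data_type!r}")

    if n_groups == 2:
        if normal:
            return "paired t-test" if paired else "unpaired t-test"
        return "Wilcoxon signed-rank test" if paired else "Mann-Whitney U test"

    # Three or more samples of continuous data.
    if normal:
        return "repeated-measures ANOVA" if paired else "one-way ANOVA"
    return "Friedman test" if paired else "Kruskal-Wallis test"
```

For example, `choose_test("continuous", normal=False, paired=False, n_groups=2)` returns `"Mann-Whitney U test"` under this assumed mapping.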

Statistical errors

Two errors can occur when deciding whether to reject the null hypothesis:

Type I error (alpha error)

Type I error, also called alpha error, is the rejection of a correct null hypothesis. The accepted probability of this error is the significance level (alpha), against which the p-value is compared. Usually the null hypothesis is rejected if the p-value is less than 5%. However, this threshold may be adjusted when multiple hypotheses are tested.[10]
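As a sketch of how the threshold can be adjusted for multiple hypotheses, the following Python code compares the plain Bonferroni rule with a step-up procedure of the kind described by Hochberg. The function names and the example p-values are illustrative assumptions.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m (m = number of tests)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def hochberg(p_values, alpha=0.05):
    """Step-up procedure: find the largest rank k (p-values sorted ascending)
    with p_(k) <= alpha / (m - k + 1), then reject the k smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha / (m - rank + 1):
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


example = [0.01, 0.04, 0.03]  # made-up p-values for illustration
bonferroni_result = bonferroni(example)
hochberg_result = hochberg(example)
```

With these made-up p-values, Bonferroni rejects only the first hypothesis, whereas the step-up rule rejects all three because the largest p-value (0.04) is below the unadjusted 0.05 threshold.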

Type II error (beta error)

Type II error, also called beta error, is the acceptance of an incorrect null hypothesis. This error may occur when the sample size was insufficient to have power to detect a statistically significant difference.[11][12][13]
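The link between sample size and Type II error can be sketched with a normal approximation. The Python code below assumes a two-sided comparison of two group means with a known, common standard deviation; the function name and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample_z(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    Assumes equal group sizes and a known common standard deviation.
    The probability of a Type II error (beta) is 1 minus this value.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standardized true difference: effect divided by its standard error.
    shift = abs(effect) / (sd * sqrt(2 / n_per_group))
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


small_trial = power_two_sample_z(effect=0.5, sd=1.0, n_per_group=16)
larger_trial = power_two_sample_z(effect=0.5, sd=1.0, n_per_group=64)
```

Under these assumptions the larger trial has roughly 80% power to detect a difference of half a standard deviation, while the smaller trial has well under 50%, so a "negative" result from the smaller trial says little about whether a difference exists.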

Philosophical approaches to error testing

Frequentist method

This approach uses mathematical formulas to calculate deductive probabilities (p-value) of an experimental result.[3] This approach can generate confidence intervals.
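A minimal sketch of these two frequentist quantities, assuming a test statistic that follows a standard normal distribution under the null hypothesis; the function names and inputs are illustrative.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Probability, under the null hypothesis, of a result at least as
    extreme as the observed z statistic (in either direction)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def confidence_interval(estimate, std_error, level=0.95):
    """Normal-approximation confidence interval around an estimate."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z_crit * std_error, estimate + z_crit * std_error


p = two_sided_p_value(1.96)
low, high = confidence_interval(estimate=10.0, std_error=2.0)
```

Here a z statistic of 1.96 gives a two-sided p-value of about 0.05, and the hypothetical estimate of 10 with standard error 2 gives a 95% interval of roughly 6.1 to 13.9.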

A problem with the frequentist analyses of p-values is that they may overstate "statistical significance".[3][14]

Likelihood or Bayesian method

Some argue that the P-value should be interpreted with Bayes Theorem, or in other words, in light of how plausible the hypothesis is based on the totality of prior research and physiologic knowledge.[15][3][14][16][17] This approach can generate Bayesian 95% credibility intervals.[18] Details of Bayesian calculations have been reviewed.[19]

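Bayesian inference:[14]

:<math>\text{Posterior Odds of Null Hypothesis}\ =\ \text{Prior Odds of Null Hypothesis}\ *\ \text{Bayes Factor}</math>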

The Bayesian analysis creates a Bayes Factor. Unlike the traditional P-value, the Bayes Factor is not a probability of rejecting the null hypothesis, but is a ratio of probabilities. The Bayes Factor is a likelihood ratio. A value greater than 1 supports the null hypothesis, whereas a value less than 1 supports the alternative hypothesis. The equation for the Bayes Factor is:[14]

:<math>\text{Bayes Factor}\ =\ \frac{\text{Probability of the data found given the null hypothesis}}{\text{Probability of the data found given the alternative hypothesis}}</math>

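Example: a coin comes up heads in one of four tosses. Is the coin unbiased, or is the chance of heads in each toss only 1/4?

:<math>\text{Bayes Factor}\ =\ \frac{4\ *\ \left(\frac{1}{2}\right)^3\ *\ \left(\frac{1}{2}\right)^1}{\ 4\ *\ \left(\frac{3}{4}\right)^3\ *\ \left(\frac{1}{4}\right)^1}\ =\ \frac{4\ *\ \frac{1}{16}}{4\ *\ \frac{27}{256}} = 0.59</math>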
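The coin-toss Bayes Factor can be checked numerically. This Python sketch treats the number of heads as binomial, with the unbiased coin (chance of heads 1/2) as the null hypothesis and a chance of heads of 1/4 as the alternative; the function names are illustrative.

```python
from math import comb


def binomial_likelihood(heads, tosses, p_heads):
    """Probability of seeing `heads` heads in `tosses` independent tosses."""
    return comb(tosses, heads) * p_heads**heads * (1 - p_heads) ** (tosses - heads)


def bayes_factor(heads, tosses, p_null, p_alt):
    """Likelihood of the data under the null divided by that under the alternative."""
    return binomial_likelihood(heads, tosses, p_null) / binomial_likelihood(
        heads, tosses, p_alt
    )


bf = bayes_factor(heads=1, tosses=4, p_null=0.5, p_alt=0.25)
```

The result is 16/27, about 0.59: the data are somewhat less probable under the unbiased-coin hypothesis than under the alternative.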


Goodman gives the following three methods of interpreting an example Bayes Factor of 1/2:[14]

  1. Objective probability: "The observed results are half as probable under the null hypothesis as they are under the alternative."
  2. Inductive evidence: "The evidence supports the null hypothesis half as strongly as it does the alternative."
  3. Subjective probability: "The odds of the null hypothesis relative to the alternative hypothesis after the experiment are half what they were before the experiment."

The Minimum Bayes Factor is proposed by Goodman as another way to help readers make Bayesian interpretations if they are accustomed to p-values:[14]

:<math>\text{Minimum Bayes Factor }\ =\  e^\left(-\frac{Z^2}{2}\right)</math>

Note that the Minimum Bayes Factor when p = 0.05, or Z = 1.96, is 0.15. It corresponds to the likelihood ratio for comparing the hypothesis that a normally distributed variable Z with variance one has mean zero against the hypothesis that its mean is 1.96, when the observed value is z = 1.96. Under classical (frequentist) hypothesis testing at significance level 0.05, this value of Z would just lead us to reject the null hypothesis that the mean is zero against the alternative that it is not zero. From the Bayesian point of view, this Bayes Factor leads to a posterior probability of 13%, if the prior odds were 50-50.[14] However, it is suspect to compute the Bayes Factor or the posterior odds for a hypothesis (mean is 1.96) that is itself suggested by the data (z = 1.96).
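The two figures quoted here (a Minimum Bayes Factor of 0.15 at Z = 1.96 and a posterior probability of 13% from 50-50 prior odds) can be reproduced with a short Python sketch. It applies the relation posterior odds = prior odds * Bayes Factor and the formula e^(-Z^2/2); the function names are illustrative.

```python
from math import exp


def minimum_bayes_factor(z):
    """Minimum Bayes Factor for a z statistic: exp(-z^2 / 2)."""
    return exp(-(z**2) / 2)


def posterior_probability_of_null(prior_probability, bayes_factor):
    """Convert a prior probability of the null to odds, multiply by the
    Bayes Factor, and convert the posterior odds back to a probability."""
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)


mbf = minimum_bayes_factor(1.96)
posterior = posterior_probability_of_null(0.5, mbf)
```

Rounded to two decimals, `mbf` is 0.15 and `posterior` is 0.13. A more sceptical prior (a higher prior probability of the null) would leave a higher posterior probability for the null hypothesis.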

Interpretation of the Bayes Factor[20]

  Bayes Factor (B)    Interpretation of support for the alternative hypothesis
  > 1.00              increases the odds of the null hypothesis
  0.32–1.00           "not worth more than a bare mention"
  0.100–0.320         "substantial support"
  0.032–0.100         "strong support"
  0.010–0.032         "very strong support"
  < 0.010             "decisive support"
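The table can be expressed as a simple lookup. In this Python sketch the category labels for Bayes Factors up to 1.00 are taken from the table; the handling of values exactly on a boundary (assigned to the weaker category) and the label for values above 1.00 are assumptions, as is the function name.

```python
def interpret_bayes_factor(b):
    """Map a Bayes Factor to the table's level of support for the
    alternative hypothesis. Boundary handling is an assumption."""
    if b <= 0:
        raise ValueError("a Bayes Factor must be positive")
    if b > 1.0:
        return "no support (favors the null hypothesis)"
    if b >= 0.32:
        return "not worth more than a bare mention"
    if b >= 0.10:
        return "substantial support"
    if b >= 0.032:
        return "strong support"
    if b >= 0.010:
        return "very strong support"
    return "decisive support"
```

For instance, the coin-toss Bayes Factor of 0.59 falls in the "not worth more than a bare mention" band, and the Minimum Bayes Factor of 0.15 for p = 0.05 falls in the "substantial support" band.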

An empiric comparison of frequentist and Bayesian approaches to the analyses of randomized controlled trials has been done.[21]

Applications of Bayesian approach

Interim analyses of randomized controlled trials.

A Bayesian approach to interim analysis may help reduce bias and adjust the estimate of effect in randomized controlled trials.[22]

Subgroup analyses of randomized controlled trials.

Bayesian analyses provide an alternative to Bonferroni adjustments when testing significance of multiple comparisons.[23]

Adaptive trial designs.

The Bayesian method has been proposed for adaptive trial designs for comparative effectiveness research.[24] In the United States, Medicare's Centers for Medicare and Medicaid Services (CMS) is investigating this role.[25]

Interpretation of clinical practice guidelines.

Diamond has proposed combining a Bayesian statistical approach with an evidentiary classification for the interpretation of clinical practice guidelines.[26]

References

  1. Anonymous (2006). "Normal Distribution", NIST/SEMATECH e-Handbook of Statistical Methods. Gaithersburg, MD: National Institute of Standards and Technology. Retrieved on 2009-02-10.
  2. Anonymous. JAMAevidence Glossary. American Medical Association. Retrieved on 2009-02-10.
  3. Goodman SN (1999). "Toward evidence-based medical statistics. 1: The P value fallacy". Ann Intern Med 130: 995–1004. PMID 10383371.
  4. Mosteller, Frederick; Bailar, John Christian (1992). Medical uses of statistics. Boston, Mass: NEJM Books. ISBN 0-910133-36-0. Google Books.
  5. Ludford PJ. An Overview: Choosing the Correct Statistical Test. University of Minnesota.
  6. How to choose a statistical test. GraphPad.
  7. Dinov I. Choosing the Right Statistical Test. University of California at Los Angeles.
  8. Shapiro, S. S. and Wilk, M. B. (1965). "An analysis of variance test for normality (complete samples)". Biometrika 52 (3/4): 591–611. DOI:10.1093/biomet/52.3-4.591.
  9. Stephens, M. A. (1974). "EDF Statistics for Goodness of Fit and Some Comparisons". Journal of the American Statistical Association 69: 730–737. DOI:10.2307/2286009.
  10. Hochberg, Yosef (1988-12-01). "A sharper Bonferroni procedure for multiple tests of significance". Biometrika 75 (4): 800–802. DOI:10.1093/biomet/75.4.800. Retrieved on 2008-10-15.
  11. Altman DG, Bland JM (August 1995). "Absence of evidence is not evidence of absence". BMJ (Clinical research ed.) 311 (7003): 485. PMID 7647644. PMC 2550545.
  12. Detsky AS, Sackett DL (1985). "When was a "negative" clinical trial big enough? How many patients you needed depends on what you found". Arch Intern Med 145 (4): 709–12. DOI:10.1001/archinte.1985.00360040141030. PMID 3985731.
  13. Young MJ, Bresnitz EA, Strom BL (August 1983). "Sample size nomograms for interpreting negative clinical studies". Annals of internal medicine 99 (2): 248–51. PMID 6881780.
  14. Goodman SN (1999). "Toward evidence-based medical statistics. 2: The Bayes factor". Ann Intern Med 130 (12): 1005–13. PMID 10383350.
  15. Browner WS, Newman TB (1987). "Are all significant P values created equal? The analogy between diagnostic tests and clinical research". JAMA 257: 2459–63. PMID 3573245.
  16. Diamond GA, Kaul S (June 2004). "Prior convictions: Bayesian approaches to the analysis and interpretation of clinical megatrials". J. Am. Coll. Cardiol. 43 (11): 1929–39. DOI:10.1016/j.jacc.2004.01.035. PMID 15172393.
  17. Ioannidis JP (August 2008). "Effect of formal statistical significance on the credibility of observational associations". Am. J. Epidemiol. 168 (4): 374–83; discussion 384–90. DOI:10.1093/aje/kwn156. PMID 18611956.
  18. Gelfand, Alan E.; Sudipto Banerjee; Carlin, Bradley P. (2003). Hierarchical Modeling and Analysis for Spatial Data (Monographs on Statistics and Applied Probability). Boca Raton: Chapman & Hall/CRC. LCC QA278.2 .B36. ISBN 1-58488-410-X.
  19. Greenland S (June 2006). "Bayesian perspectives for epidemiological research: I. Foundations and basic methods". Int J Epidemiol 35 (3): 765–75. DOI:10.1093/ije/dyi312. PMID 16446352.
  20. Jeffreys, Harold [1961] (1998). Theory of probability. Oxford: Clarendon Press. ISBN 0-19-850368-7.
  21. Wijeysundera DN, Austin PC, Hux JE, Beattie WS, Laupacis A (2009). "Bayesian statistical inference enhances the interpretation of contemporary randomized controlled trials". J Clin Epidemiol 62 (1): 13–21.e5. DOI:10.1016/j.jclinepi.2008.07.006. PMID 18947971.
  22. Goodman SN (June 2007). "Stopping at nothing? Some dilemmas of data monitoring in clinical trials". Ann. Intern. Med. 146 (12): 882–7. PMID 17577008.
  23. Greenland S (June 2008). "Multiple comparisons and association selection in general epidemiology". Int J Epidemiol 37 (3): 430–4. DOI:10.1093/ije/dyn064. PMID 18453632.
  24. Luce BR, Kramer JM, Goodman SN, et al. (June 2009). "Rethinking Randomized Clinical Trials for Comparative Effectiveness Research: The Need for Transformational Change". Ann. Intern. Med. 151 (3). PMID 19567619.
  25. Anonymous. Health Care: Technology Assessment Subdirectory Page (English). Agency for Healthcare Research and Quality. Retrieved on 2009-08-03.
  26. Diamond GA, Kaul S (2009). "Bayesian classification of clinical practice guidelines". Arch Intern Med 169 (15): 1431–5. DOI:10.1001/archinternmed.2009.235. PMID 19667308.

External links