Statistics: Post hoc power analysis

From MathWiki

Some collected links and comments on post hoc power analysis

Here are two very wrong things that people try to do with my software:

  • Retrospective power (a.k.a. observed power, post hoc power). You've got the data, did the analysis, and did not achieve "significance." So you compute power retrospectively to see if the test was powerful enough or not. This is an empty question. Of course it wasn't powerful enough -- that's why the result isn't significant. Power calculations are useful for design, not analysis.

(Note: These comments refer to power computed based on the observed effect size and sample size. Considering a different sample size is obviously prospective in nature. Considering a different effect size might make sense, but probably what you really need to do instead is an equivalence test; see Hoenig and Heisey, 2001.)

  • Specify T-shirt effect sizes ("small", "medium", and "large"). This is an elaborate way to arrive at the same sample size that has been used in past social science studies of large, medium, and small size (respectively). The method uses a standardized effect size as the goal. Think about it: for a "medium" effect size, you'll choose the same n regardless of the accuracy or reliability of your instrument, or the narrowness or diversity of your subjects. Clearly, important considerations are being ignored here. "Medium" is definitely not the message!
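To make the second point concrete, here is a minimal Python sketch (not part of the comments quoted above). It uses the usual normal approximation for a two-sided, two-sample comparison at 5% significance and 80% power. The function names and the two instruments (standard deviations of 2 and 20 units) are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample z-test
    with standardized effect size d (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


def n_for_raw_difference(raw_difference, sd, alpha=0.05, power=0.80):
    """Same calculation, but starting from a difference in measurement
    units and the standard deviation of the instrument."""
    return n_per_group(raw_difference / sd, alpha, power)


# Hypothetical instruments: one precise (sd = 2), one noisy (sd = 20).
for sd in (2.0, 20.0):
    # "Medium" (d = 0.5) ignores sd entirely, so n never changes.
    n_medium = n_per_group(0.5)
    # A fixed, scientifically chosen difference of 5 units does depend on sd.
    n_raw = n_for_raw_difference(5.0, sd)
    print(f"sd={sd:5.1f}  n for 'medium'={n_medium}  n for 5-unit difference={n_raw}")
```

Under these assumptions the "medium" target gives about 63 subjects per group for both instruments, while a fixed 5-unit difference needs only a handful of subjects with the precise instrument and roughly 250 per group with the noisy one. The normal approximation understates n when the answer is very small, so the smallest figure should not be taken literally.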

Post Hoc Power Analysis -- Another View

Joshua Fogel, M.A.

Drs. Levine and Ensom[1] advocate that publication of clinical trials of nonsignificance should state a confidence interval rather than a post hoc power analysis. Their discussion is quite valid. The psychology discipline has advocated for power analysis in reporting of empiric studies.[2]

Psychologists have an illustrious history of rigorous statistical expertise. Unlike other journals, all psychological journals published by the American Psychological Association require reporting of exact statistic values (e.g., F, t) and significance levels. Reporting of effect sizes are also encouraged. This reporting allows others to fully critique the statistical methods of a study.

Interestingly enough, in 1999 a committee of the American Psychological Association convened to discuss this issue among a variety of other statistical issues. In their discussion on power analysis, they concluded, "Once the study is analyzed, confidence intervals replace calculated power in describing results."[3] This policy is required for all submissions to American Psychological Association journals. I hope that in the near future, other journals will advocate this policy as well.

Author from: the Department of Clinical Health Psychology, Yeshiva University, Bronx, New York 10461.

Pharmacotherapy 21(9):1150, 2001. © 2001 Pharmacotherapy Publications

It is never possible to simply ask "What is the power of this experiment?" Rather, you must ask "What is the power of this experiment to detect an effect of some specified size?" Which effect size should you use? How large a difference should you be looking for? A power analysis only makes sense when you think about the data scientifically. It is not purely a statistical question, but a scientific one.
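As a minimal sketch of this point, the Python function below cannot return a power until an effect size is supplied. It assumes a two-sided, two-sample z-test with a known, common standard deviation and equal group sizes; the function name and all numbers are hypothetical.

```python
from statistics import NormalDist


def power_two_sample(difference, sd, n_per_group, alpha=0.05):
    """Power of a two-sided, two-sample z-test to detect a given true
    difference in means (known common sd, equal group sizes)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standardized distance between the null and the alternative.
    shift = difference / (sd * (2 / n_per_group) ** 0.5)
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Same hypothetical design (sd = 10, n = 50 per group), different questions.
for difference in (0.0, 2.0, 4.0, 6.0, 8.0):
    print(f"difference={difference:4.1f}  power={power_two_sample(difference, 10.0, 50):.3f}")
```

The same design has a power close to the significance level for a negligible difference and a high power for a large one, so "the power of this experiment" has no single value.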

Some programs try to take the thinking out of the process by computing only a single value for power. These programs compute the power to detect the effect size (or difference, relative risk, etc.) actually observed in that experiment. The result is sometimes called observed power, and the procedure is sometimes called a post-hoc power analysis or retrospective power analysis.


If your study concluded that the difference is not statistically significant, then its power to detect the effect actually observed is necessarily low. Observed power is a one-to-one function of the P value, so any result with P greater than 0.05 has an observed power of about 50% or less at the 5% level. You learn nothing new by such a calculation. You already know that the difference was not statistically significant, and now you know that the power of the study to detect that particular difference is low. Not helpful. What would be helpful is to know the power of the study to detect some hypothetical difference that you think would have been scientifically or clinically worth detecting.
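The redundancy can be seen in a short Python sketch. It assumes a two-sided z-test, for which the observed power can be computed from the P value alone; the function name is hypothetical, and for t-tests the relationship holds only approximately.

```python
from statistics import NormalDist


def observed_power(p_value, alpha=0.05):
    """Observed (post hoc) power of a two-sided z-test, computed from
    nothing but the two-sided P value (0 < p_value <= 1)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # |z| statistic implied by the P value, treated as the true effect.
    z_obs = z.inv_cdf(1 - p_value / 2)
    return z.cdf(z_obs - z_crit) + z.cdf(-z_obs - z_crit)


# No sample size, effect size, or variance is needed: the P value says it all.
for p in (0.001, 0.01, 0.05, 0.10, 0.30, 0.70):
    print(f"P={p:5.3f}  observed power={observed_power(p):.3f}")
```

Because the function takes only the P value as input, reporting observed power alongside a nonsignificant P value adds no information. In this setting a P value of exactly 0.05 corresponds to an observed power of about one half, and larger P values give smaller observed power.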

These articles discuss the futility of post-hoc power analyses:

  1. M Levine and MHH Ensom, Post Hoc Power Analysis: An Idea Whose Time Has Passed, Pharmacotherapy 21:405-409, 2001.
  2. SN Goodman and JA Berlin, The Use of Predicted Confidence Intervals When Planning Experiments and the Misuse of Power When Interpreting the Results, Annals of Internal Medicine 121:200-206, 1994.
  3. RV Lenth, Some Practical Guidelines for Effective Sample Size Determination, The American Statistician 55:187-193, 2001.

Dr. Ke-Hai Yuan:

January 23, 2003 Ke-Hai Yuan

On the Post Hoc Power in Testing Mean Differences

Retrospective or post hoc power analysis is recommended by reviewers and editors of many journals, yet little of the literature gives post hoc power serious study. When the sample size is large, the observed effect size is a good estimator of the true effect size. One would hope that the post hoc power is also a good estimator of the true power. This paper studies whether such a power estimator provides valuable information about the true power.

Using analytical, numerical and Monte Carlo approaches, our results show that the estimated power does not provide useful information when the true power is small. It is almost always a biased estimator of the true power, and the bias can be negative or positive. A large sample size alone does not guarantee that the post hoc power is a good estimator of the true power. In fact, when the population variance is known, the cumulative distribution function of the post hoc power is solely a function of the population power. This distribution is uniform when the true power equals 0.5 and highly skewed when the true power is near 0 or 1. When the population variance is unknown, the post hoc power behaves essentially the same as when the variance is known.
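The known-variance case described in the abstract can be explored with a small Monte Carlo sketch in Python. This is not the paper's code: it assumes a one-sided z-test with known variance, and the function name, seed, and replication count are hypothetical.

```python
import random
from statistics import NormalDist, mean


def simulate_post_hoc_power(true_power, alpha=0.05, reps=20000, seed=1):
    """Draw z statistics from a design with the given true power and
    return the post hoc power estimate from each replicate."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha)              # one-sided critical value
    shift = z_crit + z.inv_cdf(true_power)     # mean of z giving that true power
    rng = random.Random(seed)
    # Post hoc power plugs the observed z in place of the unknown mean.
    return [z.cdf(rng.gauss(shift, 1.0) - z_crit) for _ in range(reps)]


for true_power in (0.1, 0.5, 0.9):
    estimates = simulate_post_hoc_power(true_power)
    # Share of estimates falling in each quarter of the [0, 1] interval.
    quarters = [sum(lo <= e < lo + 0.25 for e in estimates) / len(estimates)
                for lo in (0.0, 0.25, 0.5, 0.75)]
    print(f"true power={true_power:.1f}  mean estimate={mean(estimates):.3f}  "
          f"quarters={[round(q, 2) for q in quarters]}")
```

In this simplified setting the estimates spread roughly evenly over the unit interval when the true power is 0.5, and pile up near one end when the true power is 0.1 or 0.9, with the average estimate pulled toward 0.5 in both cases. Note that the sample size never enters the simulation, which is consistent with the statement that the distribution depends only on the true power.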

A pdf file of the paper can be found at

Authors: Onwuegbuzie, Anthony J.; Leech, Nancy L.
Source: Understanding Statistics, 2004, Vol. 3, Issue 4, pp. 201-230
Document Type: Article
Subject Terms: Experimental design; Statistical power analysis
Abstract: This article advocates the use of post hoc power analyses. First, reasons for the nonuse of a priori power analyses are presented. Next, post hoc power is defined and its utility delineated. Third, a step-by-step guide is provided for conducting post hoc power analyses. Fourth, a heuristic example is provided to illustrate how post hoc power can help to rule in/out rival explanations in the presence of statistically nonsignificant findings. Finally, several methods are outlined that describe how post hoc power analyses can be used to improve the design of independent replications.

Related links