Statistics: Gain scores vs. residualized gain scores

From MathWiki

Hand and Taylor (1987) provide a good summary:

"In general, when one is presented with measurements taken at two time points there are several ways in which the analysis may be approached, the most obvious are either to work with difference scores or to use the first occasion's measurements as covariates in analysing the second. These two approaches can yield different results -- a fact which has led to some confusion in the past (and has been given the name of Lord's paradox). The reason that the results can differ is simply that the two approaches are asking different questions. This is most easily illustrated in the simple two-group comparison case; that is, we have two groups of subjects, each measured at two time points, and we wish to compare the changes the two groups experience. Then the difference-score approach enquires whether there is a difference in average change of the two populations. The covariance approach asks whether a member of group 1 is expected to change more than a member of group 2, given that they have the same initial value. It is this final question that distinguishes the questions." [p. 166]

The critical issue is then to understand which 'statistical question' addresses which 'scientific question'. The answer often hinges on the causal position of the first measurement. If it is a potential 'confounder' or covariate not affected by group membership, then it is appropriate to use it as a covariate. If it is a 'mediator' or a partial mediator, then the question is more complex: different models may answer different scientific questions.
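The gap between the two statistical questions can be written down directly. As a sketch, write Y0 for the first measurement and Y for the second, let bars denote group means, and assume a common within-group slope b of Y on Y0 (the usual ANCOVA assumption; the notation is chosen here for illustration):

```latex
\begin{aligned}
\hat\Delta_{\text{CHANGE}} &= (\bar Y_1 - \bar Y_2) - (\bar Y_{0,1} - \bar Y_{0,2}),\\
\hat\Delta_{\text{ANCOVA}} &= (\bar Y_1 - \bar Y_2) - b\,(\bar Y_{0,1} - \bar Y_{0,2}),\\
\hat\Delta_{\text{ANCOVA}} - \hat\Delta_{\text{CHANGE}} &= (1 - b)\,(\bar Y_{0,1} - \bar Y_{0,2}).
\end{aligned}
```

The two estimates therefore agree when b = 1 or when the groups have the same mean on the first measurement. Randomization makes the baseline difference zero in expectation; with unequal baselines and b < 1, as is typical of observational data, the two approaches give different answers.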

Some recent materials on using gain scores vs. using the pre-test as a covariate

Everitt and Pickles (2004, p. 128), discussion of baseline measurements:

Three ways of using a baseline measure:

  1. POST: analyze post measure only -- ignore baseline
  2. CHANGE: difference or gain score
  3. ANCOVA: use baseline as covariate (In this case one can use either the post measure or the gain score as an outcome. The estimated treatment effects will be identical)

Senn (1998), 'Some controversies in planning and analysing multi-centre trials':

For clinical trials, Senn seems to suggest using ANCOVA unless the correlation between the pre and post measures is weak (< .2), because ANCOVA gives more power.
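The power argument can be made concrete with the standard large-sample variance factors for a randomized two-arm trial. The sketch below assumes equal variances at baseline and follow-up and the same pre-post correlation rho in both arms; these assumptions and the function name are illustrative and are not taken from Senn. Relative to analysing POST alone, CHANGE multiplies the variance of the estimated treatment effect by 2(1 - rho) and ANCOVA multiplies it by 1 - rho^2.

```python
def relative_variance(rho):
    """Variance of the treatment-effect estimate relative to POST (= 1.0).

    Assumes a randomized two-arm trial, equal variances at baseline and
    follow-up, and the same pre-post correlation rho in both arms.
    """
    return {
        "POST": 1.0,
        "CHANGE": 2.0 * (1.0 - rho),
        "ANCOVA": 1.0 - rho ** 2,
    }


for rho in (0.0, 0.2, 0.5, 0.8):
    factors = relative_variance(rho)
    print(rho, {name: round(value, 2) for name, value in factors.items()})
```

Under these assumptions ANCOVA is never less efficient than the other two approaches, CHANGE beats POST only when rho exceeds 0.5, and at rho = 0.2 ANCOVA reduces the variance by only 4% relative to POST. In small samples ANCOVA also spends one degree of freedom on the slope, which these factors ignore.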

Fitzmaurice, Laird and Ware (2004; FLW below) have a more detailed discussion in the context of longitudinal data analysis:

Comparing CHANGE and ANCOVA, they say:

"The answer depends critically on whether the data arose from an observational study or a randomized trial. If the study is an observational one, for example, a longitudinal study for the determinants of rate of decline of pulmonary function in adults, it is usually not advisable to employ the analysis of covariance approach because the baseline value may be associated with other variables whose effects are to be studied, raising problems of confounding in an analysis intended to describe how the pattern of response over time is influenced by the characteristics of study participants.

"For example, individuals who are smokers as adults may have smoked during adolescence. If smoking affected the attained pulmonary function level for young adults, then smoking will likely be associated with pulmonary function level later in adult life, even if cigarette smoking does not influence the rate of decline of pulmonary function with age. Thus, adjustment for baseline pulmonary function level using 'ANCOVA' could introduce an association between smoking status and rate of decline of pulmonary function, even if the unadjusted rates of decline are nearly equivalent in the various smoking groups." (p. 123)

The basic idea here is that controlling for the pre-test might amount to controlling for another outcome, thereby distorting the relationship between the predictor of interest (X) and the outcome (Y). In other words, the pre-test should be treated as a covariate only if it has a status equivalent to a 'pre-treatment' covariate in an experimental study. Typically, the 'treatment' in an observational study precedes the 'pre-test'.

To fix notation, call the pre-test Y0, the predictor of interest X, and the post-test Y.

We want the relationship (in some sense) between X and Y. If Y0 is also related to X as another outcome, then we don't want to hold Y0 constant when assessing the effect of X on Y.

Note, however, that Y0 might, in the right situation, provide a very valuable control variable, provided we can exclude the possibility that it acts as another outcome variable affected by X.
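A small simulation illustrates the point. It is only a sketch with invented parameters: X lowers a stable underlying level (so Y0 is itself an outcome of X), both groups decline by the same amount between Y0 and Y, and both measurements carry independent noise. The helper names are hypothetical. The ANCOVA estimate is computed as the difference in post-test means minus the pooled within-group slope times the difference in pre-test means, which is the coefficient of X in the regression of Y on X and Y0.

```python
import random
from statistics import fmean


def group_difference_estimates(rows):
    """rows: list of (x, y0, y) with x in {0, 1}.

    Returns (change_diff, ancova_diff): the group difference in mean change
    scores, and the coefficient of x in the regression y ~ x + y0.
    """
    means = {}
    for g in (0, 1):
        y0s = [y0 for x, y0, _ in rows if x == g]
        ys = [y for x, _, y in rows if x == g]
        means[g] = (fmean(y0s), fmean(ys))
    # Pooled within-group slope of y on y0 (the ANCOVA slope).
    sxy = sum((y0 - means[x][0]) * (y - means[x][1]) for x, y0, y in rows)
    sxx = sum((y0 - means[x][0]) ** 2 for x, y0, _ in rows)
    slope = sxy / sxx
    d_y0 = means[1][0] - means[0][0]
    d_y = means[1][1] - means[0][1]
    return d_y - d_y0, d_y - slope * d_y0


def simulate(n_per_group=5000, seed=1):
    """Invented data: x shifts the level, not the rate of decline."""
    rng = random.Random(seed)
    rows = []
    for x in (0, 1):
        for _ in range(n_per_group):
            level = 100 - 10 * x + rng.gauss(0, 10)  # x lowers the stable level
            y0 = level + rng.gauss(0, 5)             # baseline = level + noise
            y = level - 5 + rng.gauss(0, 5)          # same decline of 5 in both groups
            rows.append((x, y0, y))
    return rows


change_diff, ancova_diff = group_difference_estimates(simulate())
print(f"CHANGE: {change_diff:.2f}  ANCOVA: {ancova_diff:.2f}")
```

With these parameters the within-group slope of Y on Y0 is about 0.8, so ANCOVA reports a group difference near -2 even though both groups decline by exactly the same amount, while the change-score difference is near 0.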

FLW continue: [comments added in square brackets]

"When participants have been randomized to the several treatment groups and the baseline value has been obtained before any study interventions, adjustment for baseline through analysis of covariance is of interest. [Comment: here we can't have Y0 as an alternative outcome] In that setting, the mean response at time 1 is independent of treatment assignment. One can then show that the one-degree-of-freedom test for equality of response profiles based on a contrast and the corresponding test based on analysis of covariance represent alternative tests of the same null hypothesis and that the t-test based on the analysis of covariance approach will always be more efficient. That is, the analysis of covariance approach yields estimates of treatment effects with smaller standard errors than those obtained by calculating contrasts."

The 'conundrum' between CHANGE and ANCOVA in observational data is Lord's Paradox. A good example is comparing weight gain in boys and girls (X). Suppose a measurement is made at the age of 10 (Y0) and again at 15 (Y). Controlling for Y0 will show much greater gain among boys [imagine the ellipses] -- but that gain is a consequence of regression to the mean -- i.e. the within-gender slopes relating Y to Y0 are not 1. Looking at unadjusted change scores themselves might show no difference. The problem here is equating weight in boys and girls. A given weight might be heavy for a girl and light for a boy. Regression to the mean, through the fading of transient effects, moves each child toward the mean of his or her own group, so the boy will tend to be relatively heavier and the girl relatively lighter at the time of the post-test.

"In conclusion, it is the study design [and its relationship to research questions] and not issues of statistical precision and power that should primarily determine the choice of analytic methods for adjusting the baseline response."

Baseline score and treatment selection

Maris (1998), 'Covariance Adjustment Versus Gain Scores--Revisited', discusses the choice between gain scores and ANCOVA quite thoroughly in relation to the determinants of selection for treatment. Roughly summarizing:

  • If allocation is related directly to the baseline measure, then ANCOVA is appropriate.
  • If allocation is related to a latent variable, of which the baseline and the post-treatment outcome are equally reliable measures (given treatment effects), then gain scores are appropriate.
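Both cases in this summary can be checked by simulation. The sketch below uses invented parameters and hypothetical helper names: a latent score is measured with equal error at baseline and post-test, the true treatment effect is set to zero (so any nonzero estimate is bias), and allocation to treatment depends either on the observed baseline or on the latent score itself.

```python
import random
from statistics import fmean


def change_and_ancova(rows):
    """rows: (g, y0, y) tuples with g in {0, 1}.

    Returns the group difference in mean change scores and the coefficient
    of g in the regression y ~ g + y0 (pooled within-group slope).
    """
    mean_y0 = {g: fmean([r[1] for r in rows if r[0] == g]) for g in (0, 1)}
    mean_y = {g: fmean([r[2] for r in rows if r[0] == g]) for g in (0, 1)}
    sxy = sum((y0 - mean_y0[g]) * (y - mean_y[g]) for g, y0, y in rows)
    sxx = sum((y0 - mean_y0[g]) ** 2 for g, y0, _ in rows)
    d_y0 = mean_y0[1] - mean_y0[0]
    d_y = mean_y[1] - mean_y[0]
    return d_y - d_y0, d_y - (sxy / sxx) * d_y0


def simulate(select_on, n=20000, effect=0.0, seed=2):
    """Allocate to treatment (g = 1) when the selection variable is below 50."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        latent = rng.gauss(50, 10)
        y0 = latent + rng.gauss(0, 5)               # baseline: latent + error
        selector = y0 if select_on == "baseline" else latent
        g = 1 if selector < 50 else 0
        y = latent + effect * g + rng.gauss(0, 5)   # post: same reliability as y0
        rows.append((g, y0, y))
    return rows


results = {how: change_and_ancova(simulate(how)) for how in ("baseline", "latent")}
for how, (change_diff, ancova_diff) in results.items():
    print(f"selection on {how}: CHANGE {change_diff:.2f}, ANCOVA {ancova_diff:.2f}")
```

When selection is on the observed baseline, ANCOVA stays close to the true effect of zero while the change-score estimate is biased by regression to the mean. When selection is on the latent score, the pattern reverses: the change-score estimate is close to zero and ANCOVA is biased.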

Alternative ways of handling baseline response for longitudinal studies

For longitudinal studies, FLW (p. 126 ff.) list four options:

  1. retain baseline in outcome vector and make no assumptions about group differences at baseline
  2. retain but assume baseline group means are the same
  3. subtract baseline from all post responses and analyze these differences
  4. use baseline as covariate in analysis of post

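Suppose Time is equal to 0 or 1, the response is Y, the group is Group, Y0 is the contextual variable equal to Y at time 0, and id is the individual identifier. Written as lmer-style model formulas, the effect of interest is the Time:Group interaction in models (1) and (2) and the Group effect in models (3) and (4):

1) Baseline retained, no assumption about baseline group differences

Y ~ Time + Group + Time:Group + (1|id)

2) Baseline retained, baseline group means assumed equal

Y ~ Time + Time:Group + (1|id)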

3) Raw change scores

I(Y - Y0) ~ Group, subset = Time > 0

4) Adjusted post scores

Y ~ Group + Y0, subset = Time > 0

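Note that (4) is equivalent to the following model, in which the Group coefficient is unchanged and the Y0 coefficient is lower by exactly 1: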

4a) Adjusted change scores

I(Y - Y0) ~ Group + Y0, subset = Time > 0
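The equivalence of the adjusted post score and the adjusted change score can be verified numerically. The sketch below fits both regressions by ordinary least squares on invented data; the small `ols` helper is a hypothetical stand-in for a regression routine and solves the normal equations directly.

```python
import random


def ols(columns, y):
    """Least-squares coefficients for y ~ columns (intercept must be a column).

    Solves the normal equations X'X b = X'y by Gauss-Jordan elimination
    with partial pivoting; adequate for a handful of predictors.
    """
    k = len(columns)
    a = [[sum(ci * cj for ci, cj in zip(columns[i], columns[j])) for j in range(k)]
         + [sum(ci * yi for ci, yi in zip(columns[i], y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Invented post-test data: one row per individual (Time > 0 only).
rng = random.Random(3)
n = 200
group = [i % 2 for i in range(n)]
y0 = [rng.gauss(50, 10) for _ in range(n)]
y = [5 + 0.7 * b + 3 * g + rng.gauss(0, 4) for g, b in zip(group, y0)]
ones = [1.0] * n

post = ols([ones, group, y0], y)                                   # model (4)
gain = ols([ones, group, y0], [yi - bi for yi, bi in zip(y, y0)])  # model (4a)
print("Group coefficient:", post[1], gain[1])
print("Y0 coefficient:   ", post[2], gain[2])
```

The Group coefficients agree up to rounding error, and the Y0 coefficient in (4a) equals the one in (4) minus 1, because subtracting Y0 from the outcome only shifts the coefficient of a predictor that is already in the model.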

Some references

  • Rogosa, D. R., & Willett, J. B. (1983). Demonstrating the reliability of the difference score in the measurement of change. Journal of Educational Measurement, 20, 335-343.
  • Rogosa, D. (1988). Myths about longitudinal research. In K. W. Schaie, R. T. Campbell, W. M. Meredith, & S. C. Rawlings (Eds.), Methodological issues in aging research (pp. 171-209). New York, NY: Springer Publishing Company.
  • Maris, E. (1998). Covariance adjustment versus gain scores--revisited. Psychological Methods, 3, 309-327.
  • Maxwell, S. E., & Delaney, H. D. (1990). Designing experiments and analyzing data: A model comparison approach. Belmont, CA: Wadsworth.
  • Willett, J. B. (1989). Questions and answers in the measurement of change. In E. Z. Rothkopf (Ed.), Review of Research in Education, Volume 15 (pp. 345-422). Washington, DC: American Educational Research Association.