# Statistics: Propensity scores

## A few notes on propensity scores

The propensity score is the predicted probability of being in a treatment group rather than a control group, given covariates. An analogue for a continuous predictor variable is its conditional expectation given the other covariates. Although the resulting prediction is not a propensity score in the strict sense, it nevertheless has some linear properties that are similar to those of the propensity score.

I am sure that the following results are well known and I would be very interested in being made aware of a source.

## Basic concept

The propensity score rests on the idea of a coarsest conditioning partition. For a linear analogue, consider a regression of a vector of responses, $Y\,\!$, on two sets of predictor variables contained in two matrices, $X_1\,\!$ and $X_2\,\!$. The vector of least-squares regression coefficients on $X_1\,\!$ is $\hat{\beta}_1$, where

$Y = X_1 \hat{\beta}_1 + X_2 \hat{\beta}_2 + e$ with $e' [X_1 X_2] = 0 \,$

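Let $P_2 = X_2(X'_2X_2)^{-1}X'_2\,\!$ be the projection matrix onto $\operatorname{span}(X_2)\,\!$, and let $Q_2 = I - P_2\,\!$ be the projection matrix onto the orthogonal complement of $\operatorname{span}(X_2)\,\!$. Both are symmetric and idempotent, and $Q_2 X_2 = 0\,\!$.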

Then:

$Q_2 Y =Q_2 X_1 \hat{\beta}_1 + Q_2 X_2 \hat{\beta}_2 + Q_2 e= Q_2 X_1 \hat{\beta}_1 + 0 + Q_2 e$

Now $Q_2 e \perp Q_2 X_1$, because $e'Q'_2Q_2 X_1 = e'Q_2 X_1 =e'(I - P_2)X_1=e'X_1 - e'P_2X_1 =0\,\!$, where $P_2 = I - Q_2\,\!$. The last step uses $e \perp X_1$ and $e \perp X_2$, the latter giving $e'P_2 = 0\,\!$.

Thus, $\hat{\beta}_1$ is the regression coefficient of $Q_2Y\,\!$ on $Q_2 X_1\,\!$ .

This is the basis of added-variable plots and an early theorem of Econometrics known as the Frisch-Waugh-Lovell theorem. [Thanks to Barry Smith]
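The identity can be checked numerically. The following is a minimal sketch in Python with NumPy on simulated data; the sample size, seed, and coefficient values are arbitrary assumptions made for illustration and are not part of the argument above.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n = 200

# X2: intercept plus two covariates; X1: two predictors correlated with X2.
X2 = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
X1 = X2 @ rng.normal(size=(3, 2)) + rng.normal(size=(n, 2))
Y = X1 @ np.array([1.5, -0.5]) + X2 @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=n)

# Full least-squares fit of Y on [X1, X2]; the first two coefficients are beta_1.
X = np.hstack([X1, X2])
beta_full = np.linalg.lstsq(X, Y, rcond=None)[0]
beta1_full = beta_full[:2]
e = Y - X @ beta_full  # residuals, orthogonal to both X1 and X2

# Projection onto span(X2) and onto its orthogonal complement.
P2 = X2 @ np.linalg.solve(X2.T @ X2, X2.T)
Q2 = np.eye(n) - P2

# Regression of Q2 Y on Q2 X1.
beta1_partial = np.linalg.lstsq(Q2 @ X1, Q2 @ Y, rcond=None)[0]

print(beta1_full)
print(beta1_partial)
print(np.abs((Q2 @ X1).T @ (Q2 @ e)).max())  # orthogonality check
```

The two coefficient vectors should agree up to floating-point error, and the last printed value should be numerically zero.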

We could take this a few steps further and show that the residuals are the same, that the $R^2$ is the same as the partial $R^2$ in the multiple regression, etc. But all we need to do here is to contemplate the resulting formula for $\hat{\beta}_1$:

$\hat{\beta}_1 = (X'_1 Q'_2 Q_2 X_1)^{-1} ( X'_1 Q'_2 Q_2Y )$
$= (X'_1 Q_2 X_1)^{-1} ( X'_1 Q_2Y ) = \left([Q_2 X_1]' [Q_2 X_1] \right)^{-1} ( [Q_2 X_1]'Y )$

The formula reveals that, given $X_1\,\!$, $\hat{\beta}_1$ depends on $X_2\,\!$ only through $Q_2 X_1\,\!$ or, equivalently, $P_2 X_1\,\!$, since $Q_2 X_1 = X_1 - P_2 X_1\,\!$.

In other words, if we replace $X_2\,\!$ with a different set of variables, $X_3\,\!$, say, with $P_3\,\!$ the projection matrix onto $\operatorname{span}(X_3)\,\!$, then $\hat{\beta}_1$ will have the same value if $P_2 X_1 = P_3 X_1\,\!$, that is, if the predicted values from regressing $X_1\,\!$ on $X_2\,\!$ are the same as those from regressing $X_1\,\!$ on $X_3\,\!$.
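As an illustration, the sketch below replaces $X_2$ by the fitted values of $X_1$ alone and compares the resulting coefficients on $X_1$. The helper `beta1`, the simulated data, and the choice of which columns to drop are hypothetical and chosen only for this example.

```python
import numpy as np

def beta1(Y, X1, Xc):
    """Coefficients on X1 from the least-squares fit of Y on [X1, Xc]."""
    X = np.hstack([X1, Xc])
    return np.linalg.lstsq(X, Y, rcond=None)[0][: X1.shape[1]]

rng = np.random.default_rng(1)  # arbitrary seed
n = 300
X2 = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
X1 = X2 @ rng.normal(size=(4, 2)) + rng.normal(size=(n, 2))
Y = X1 @ np.array([1.0, 2.0]) + X2 @ np.array([0.5, -1.0, 1.0, 3.0]) + rng.normal(size=n)

P2 = X2 @ np.linalg.solve(X2.T @ X2, X2.T)
X3 = P2 @ X1          # fitted values only: two columns instead of four
X3_drop = X2[:, :2]   # drops two covariates, so the fitted values of X1 change

b_X2 = beta1(Y, X1, X2)         # controlling for all of X2
b_X3 = beta1(Y, X1, X3)         # controlling for the fitted values only
b_drop = beta1(Y, X1, X3_drop)  # controlling for a subset of X2

print(b_X2)
print(b_X3)
print(b_drop)
```

The first two vectors should coincide up to floating-point error, while the third will in general differ because dropping covariates changes the predicted values of $X_1$.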

Now, the smallest such space is $\operatorname{span}( P_2 X_1)= \operatorname{span}(\hat{X}_{1 \cdot 2})\,\!$, where $\hat{X}_{1 \cdot 2} = P_2 X_1\,\!$ is the least-squares predictor of $X_1\,\!$ from $X_2\,\!$, i.e. the linear analogue of the propensity score.

## Balancing spaces

We can call any space, represented by a basis matrix $X_3\,\!$, a 'balancing' space if regressing $X_1\,\!$ on it produces the same residuals as regressing $X_1\,\!$ on $X_2\,\!$ [actually this is an inappropriate borrowing from the expression used in the context of propensity scores. I'm sure these concepts are well known, probably in Econometrics ... help!].

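It can be shown that $\operatorname{span}(X_3)\,\!$ is a balancing space if and only if $\operatorname{span}(P_2 X_1) \subseteq \operatorname{span}(X_3) \subseteq \operatorname{span}(Q_2 X_1)^\perp = \operatorname{span} (X_2) + \left(\operatorname{span}(X_1) + \operatorname{span}(X_2)\right)^\perp\,\!$. The lower bound says that the fitted values $P_2 X_1\,\!$ can be reproduced from $X_3\,\!$; the upper bound says that $X_3\,\!$ is orthogonal to the residuals $Q_2 X_1\,\!$ and so cannot explain any more of $X_1\,\!$ than $X_2\,\!$ does.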
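Whether a candidate $X_3$ spans a balancing space can be checked directly from the definition, by comparing fitted values of $X_1$. The sketch below does this for a few candidates; the helper names `proj` and `is_balancing`, the simulated data, and the particular candidates are assumptions made for illustration.

```python
import numpy as np

def proj(A):
    """Orthogonal projection matrix onto span(A); pinv tolerates rank deficiency."""
    return A @ np.linalg.pinv(A)

def is_balancing(X3, X2, X1):
    """True if regressing X1 on X3 gives the same fitted values as regressing on X2."""
    return bool(np.allclose(proj(X3) @ X1, proj(X2) @ X1))

rng = np.random.default_rng(2)  # arbitrary seed
n = 100
X2 = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
X1 = X2 @ rng.normal(size=(4, 1)) + rng.normal(size=(n, 1))

X1_hat = proj(X2) @ X1                 # linear analogue of the propensity score
z = rng.normal(size=(n, 1))
v = z - proj(np.hstack([X1, X2])) @ z  # orthogonal to both X1 and X2

candidates = {
    "X1_hat alone": X1_hat,
    "X1_hat plus a column of X2": np.hstack([X1_hat, X2[:, 1:2]]),
    "X1_hat plus v": np.hstack([X1_hat, v]),
    "two columns of X2": X2[:, :2],
}
results = {name: is_balancing(X3, X2, X1) for name, X3 in candidates.items()}
for name, ok in results.items():
    print(f"{name}: {ok}")
```

The first three candidates contain $P_2 X_1$ and add only directions orthogonal to the residuals of $X_1$, so they should be reported as balancing. The last one omits covariates that $X_1$ depends on and should not.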

## Better proofs

There are better ways of seeing this that reveal how the linear analogue of the propensity score allows $\operatorname{span}(X_2)\,\!$ to be decomposed into a sum of two orthogonal subspaces.
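One way to make such a decomposition explicit, offered as a sketch and not necessarily the argument intended here: for any $v \in \operatorname{span}(X_2)\,\!$ we have $P_2 v = v\,\!$, where $P_2 = X_2(X'_2X_2)^{-1}X'_2\,\!$, so

$v' P_2 X_1 = (P_2 v)' X_1 = v' X_1$

Hence a vector in $\operatorname{span}(X_2)\,\!$ is orthogonal to $\operatorname{span}(P_2 X_1)\,\!$ exactly when it is orthogonal to $\operatorname{span}(X_1)\,\!$, which gives

$\operatorname{span}(X_2) = \operatorname{span}(P_2 X_1) \oplus \left( \operatorname{span}(X_2) \cap \operatorname{span}(X_1)^\perp \right)$

The first summand carries everything in $X_2\,\!$ that affects the fitted values of $X_1\,\!$; the second is orthogonal to $X_1\,\!$ and contributes nothing to them.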

## Broader approaches

This discussion is based on the assumption of a linear model. The concept of propensity scores is developed in a broader context in which the relationship between $Y\,\!$ and $X_2\,\!$ controlling for $X_1\,\!$ is not necessarily linear. Still, we only need to include the prediction (not necessarily linear) of $X_1\,\!$ from $X_2\,\!$.

But then we can't merely treat $\hat{X}_{1\cdot 2}$ as a linear predictor. We need to condition on the actual values of $\hat{X}_{1\cdot 2}$, i.e. treat it as a categorical variable or use a suitable approximation: intervals as categories, or a non-parametric fit.
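The "intervals as categories" option can be sketched as follows for a binary $X_1$ (a treatment indicator) and an outcome that is nonlinear in $X_2$. Everything here is an assumption for illustration: the simulated design, the true effect `tau`, the number of strata `K`, and the use of a least-squares fit to predict $X_1$. That fit serves only to rank units, which is reasonable in this design because the true treatment probability is a monotone function of a linear index of Gaussian covariates; a non-parametric fit could be substituted.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed
n, tau, K = 20_000, 2.0, 10     # sample size, true effect, number of strata (assumed)

X2 = rng.normal(size=(n, 2))
index = X2[:, 0] - 0.5 * X2[:, 1]
T = (rng.random(n) < 1.0 / (1.0 + np.exp(-index))).astype(float)  # binary X1
Y = tau * T + X2[:, 0] + X2[:, 1] ** 2 + rng.normal(size=n)        # nonlinear in X2

# Predicted X1 from X2. A least-squares fit is used here only to rank units.
D = np.column_stack([np.ones(n), X2])
score = D @ np.linalg.lstsq(D, T, rcond=None)[0]

# Intervals of the score as categories: K quantile-based strata labelled 0..K-1.
edges = np.quantile(score, np.linspace(0.0, 1.0, K + 1))
stratum = np.digitize(score, edges[1:-1])

# Unadjusted comparison of treated and control means.
naive = Y[T == 1].mean() - Y[T == 0].mean()

# Difference in means within each stratum, averaged with stratum sizes as weights.
effects, weights = [], []
for k in range(K):
    in_k = stratum == k
    treated, control = in_k & (T == 1), in_k & (T == 0)
    if treated.any() and control.any():  # skip strata lacking a comparison group
        effects.append(Y[treated].mean() - Y[control].mean())
        weights.append(in_k.sum())
stratified = np.average(effects, weights=weights)

print(f"naive: {naive:.2f}, stratified: {stratified:.2f}, true: {tau}")
```

Conditioning on intervals of the predicted value should remove most, though not all, of the confounding in the naive comparison; some residual bias remains within strata, and it shrinks as the intervals get finer.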

## References (annotated)

Imbens, G. W. (2000). The role of the propensity score in estimating dose-response functions. Biometrika 87(3): 706-710. doi:10.1093/biomet/87.3.706