# Statistics: Venn diagram for correlations

Image:Monette-correlation-venn-diagram.png

## Venn diagram for correlations

A problematic heuristic?

## Enhancer variables data example

Each predictor, x or z, is much more significant when the other predictor is also in the model than when it is entered alone.

```
> round(dd,2)
      x    y z
1  0.16 1.67 0
2  0.53 2.06 0
3  0.85 2.45 0
4  1.28 2.85 0
5  1.79 3.25 0
6  0.97 2.72 0
7  0.78 2.18 0
8  0.32 1.89 0
9  0.58 2.31 0
17 3.05 1.57 1
21 2.92 1.70 1
31 3.39 2.04 1
41 3.13 1.82 1
51 3.33 2.41 1
61 3.62 2.39 1
71 3.94 2.55 1
81 4.36 2.92 1
91 4.01 2.93 1
10 3.71 2.65 1
11 4.00 2.72 1
12 4.29 2.78 1
13 4.46 3.25 1
14 4.25 3.21 1
15 3.78 2.12 1
16 3.15 2.02 1
```
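To experiment with the example, the data frame can be rebuilt from the printout. The following is a minimal sketch that transcribes the values shown above; because they are rounded to two decimals, sums of squares computed from it will differ slightly from the output on this page. The original row labels are not reproduced, since they play no role in the fits.

```r
# Values transcribed from the rounded printout (2 decimals), so results
# will only approximately match the Anova tables on this page.
dd <- data.frame(
  x = c(0.16, 0.53, 0.85, 1.28, 1.79, 0.97, 0.78, 0.32, 0.58,
        3.05, 2.92, 3.39, 3.13, 3.33, 3.62, 3.94, 4.36, 4.01,
        3.71, 4.00, 4.29, 4.46, 4.25, 3.78, 3.15),
  y = c(1.67, 2.06, 2.45, 2.85, 3.25, 2.72, 2.18, 1.89, 2.31,
        1.57, 1.70, 2.04, 1.82, 2.41, 2.39, 2.55, 2.92, 2.93,
        2.65, 2.72, 2.78, 3.25, 3.21, 2.12, 2.02),
  z = c(rep(0, 9), rep(1, 16))   # 9 rows with z = 0, then 16 rows with z = 1
)
```

With `dd` defined this way, `anova(lm(y ~ x + z, dd))` should give values close to, but not identical with, the tables in this page.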

## Anova tables

```
> fit <- lm( y ~ x + z, dd)
> anova(fit)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)
x          1 0.8235  0.8235  26.321 3.844e-05 ***
z          1 4.6040  4.6040 147.154 3.228e-11 ***
Residuals 22 0.6883  0.0313
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> fit <- lm( y ~ z + x, dd)
> anova(fit)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq  F value    Pr(>F)
z          1 0.0247  0.0247   0.7901    0.3837
x          1 5.4028  5.4028 172.6845 6.816e-12 ***
Residuals 22 0.6883  0.0313
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
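`anova()` on an `lm` fit reports sequential sums of squares, so the first row of each table is a predictor entered alone and the second row is a predictor adjusted for the one before it. A small helper can put the two readings side by side. This is only a sketch: the name `ss_alone_vs_adjusted` is hypothetical, and it assumes a data frame with columns named `x`, `y` and `z` as in the example.

```r
# Hypothetical helper: sum of squares for each predictor entered alone
# versus adjusted for the other predictor.
# Assumes `data` has numeric columns named x, y and z.
ss_alone_vs_adjusted <- function(data) {
  xz <- anova(lm(y ~ x + z, data))  # x first, then z given x
  zx <- anova(lm(y ~ z + x, data))  # z first, then x given z
  data.frame(
    predictor = c("x", "z"),
    alone     = c(xz["x", "Sum Sq"], zx["z", "Sum Sq"]),
    adjusted  = c(zx["x", "Sum Sq"], xz["z", "Sum Sq"])
  )
}
```

Calling `ss_alone_vs_adjusted(dd)` on the example data should show the `adjusted` column far exceeding the `alone` column for both predictors, which is the enhancer pattern.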

## Enhancer variables

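Reading the sums of squares against the diagram, A is the part of the variation in y attributed to x alone, C the part attributed to z alone, B the overlap shared by x and z, and D the residual.

```
From the first Anova table:

A + B  =  0.8235
C  =  4.6040
D  =  0.6883

From the second Anova table:

B + C  =  0.0247
A  =  5.4028
D  =  0.6883

Therefore:
B  =  (A + B) - A  =  0.8235 - 5.4028  =  -4.5793

Or:
B  =  (B + C) - C  =  0.0247 - 4.6040  =  -4.5793
```

The overlap B comes out negative, which no area in a Venn diagram can be. This is why the heuristic is problematic for enhancer variables.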
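The same arithmetic can be written as a single identity that links it to the inequality in the title of the reference below. This is a sketch using only the values in the two Anova tables; the notation $SS(\cdot)$ for a regression sum of squares and $SS_T$ for the total sum of squares is introduced here for illustration.

$$
\begin{aligned}
B &= SS(x) + SS(z) - SS(x, z) \\
  &= 0.8235 + 0.0247 - (0.8235 + 4.6040) = -4.5793 \\[4pt]
SS_T &= 0.8235 + 4.6040 + 0.6883 = 6.1158 \\[4pt]
R^2 &= \frac{SS(x, z)}{SS_T} = \frac{5.4275}{6.1158} \approx 0.887 \\[4pt]
r^2_{yx} + r^2_{yz} &= \frac{0.8235}{6.1158} + \frac{0.0247}{6.1158} \approx 0.135 + 0.004 = 0.139
\end{aligned}
$$

Dividing the first line by $SS_T$ shows that $B < 0$ is the same statement as $R^2 > r^2_{yx} + r^2_{yz}$, which holds here.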

## Reference

• David Hamilton, Sometimes $R^2 > r^2_{yx_1} + r^2_{yx_2}$: Correlated Variables Are Not Always Redundant. The American Statistician, Vol. 41, No. 2 (May 1987), pp. 129-132. Stable URL: http://links.jstor.org/sici?sici=0003-1305%28198705%2941%3A2%3C129%3ASCVANA%3E2.0.CO%3B2-O&size=LARGE#abstract