Statistics: Ecological correlation

From MathWiki

  • Freedman, D., Pisani, R., Purves, R. (1978) Statistics. Norton, N.Y., pp. 141-142:
However, correlation coefficients based on rates or averages are often misleading. Here is an example. From 1970 Census data, it is possible to compute the correlation between income and education, for men aged 35 to 54 in the United States. This correlation is about 0.4. The Census Bureau divides the United States up into nine geographical regions. For each region, it is possible to compute the average income and average education for the men living in that region. Then, it is possible to compute the correlation coefficient between these nine pairs of averages and it works out to 0.7. If you used the correlation for the regions to estimate the correlation for the men, you would be way off. The reason is that within each region, there is a lot of spread around the averages. Replacing each region by the average eliminates the spread, and gives a misleading impression of tight clustering. ...
Correlations based on rates or averages are called ecological correlations. They are often used in political science and sociology. So watch out.
  • Robinson, W. S. (1950) "Ecological Correlations and the Behavior of Individuals." American Sociological Review, 15(3), pp. 351-357. Stable URL: http://links.jstor.org/sici?sici=0003-1224%28195006%2915%3A3%3C351%3AECATBO%3E2.0.CO%3B2-R
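The effect described in the Freedman, Pisani and Purves passage can be reproduced with a small simulation. The sketch below uses entirely synthetic data, not the 1970 Census figures. It assumes nine regions whose mean education rises from 10 to 14 years. Individuals scatter around their region's mean with a standard deviation of 3 years, and income follows a made-up linear model with large individual-level noise. The functions `pearson`, `simulate` and `correlations` are hypothetical helpers written for this illustration. The sketch compares the correlation computed over every individual with the correlation computed over the nine pairs of regional averages.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_regions=9, people_per_region=1000, seed=1):
    """Synthetic (NOT Census) data: a list of (region, education, income).

    Assumptions for illustration only: regional mean education of
    10 + 0.5*r years, within-region spread of 3 years, and
    income = 20000 + 1000*education + noise with SD 6000.
    """
    rng = random.Random(seed)
    people = []
    for r in range(n_regions):
        region_mean_edu = 10 + 0.5 * r
        for _ in range(people_per_region):
            edu = rng.gauss(region_mean_edu, 3)                # spread within region
            income = 20000 + 1000 * edu + rng.gauss(0, 6000)   # assumed income model
            people.append((r, edu, income))
    return people


def correlations(people, n_regions=9):
    """Return (individual-level r, ecological r over regional averages)."""
    # One point per person.
    individual_r = pearson([p[1] for p in people], [p[2] for p in people])
    # One point per region: replacing each region by its averages
    # discards the within-region spread.
    avg_edu, avg_inc = [], []
    for r in range(n_regions):
        members = [p for p in people if p[0] == r]
        avg_edu.append(sum(p[1] for p in members) / len(members))
        avg_inc.append(sum(p[2] for p in members) / len(members))
    ecological_r = pearson(avg_edu, avg_inc)
    return individual_r, ecological_r


if __name__ == "__main__":
    ind, eco = correlations(simulate())
    print(f"individual r = {ind:.2f}, ecological r = {eco:.2f}")
```

Under these made-up parameters the individual-level correlation comes out at roughly 0.5. The correlation of the nine regional averages is close to 1. Averaging within each region removes nearly all of the individual noise, so the ecological correlation overstates how tightly education and income are related for individuals.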