# Prac07 Quetelet Assignment 3

## Measurement Error in the Covariates

What happens to the analysis of the data when we have measurement error in a covariate?

Measurement error in Stress: its effect on the estimated effects of Stress and Coffee on Heart.

We have data on three variables: Heart, Coffee, and Stress. 'Heart' is a measure of heart condition (the higher the value, the less healthy); 'Coffee' is a measure of coffee consumption; and 'Stress' is a measure of occupational stress.

We are using the measure of Heart as the dependent variable and we are interested in the predictor variable Coffee and the effect of Coffee on Heart. We have also included the predictor variable Stress in the model as our previous analysis has shown that this variable needs to be controlled for.

Our task is to examine the effect of measurement error in the variable Stress. We want to study what happens to the estimates of the effect of Stress and Coffee in the model as we add greater levels of measurement error to Stress.

The link for the R code used in this portion can be accessed here.

First we look at the original data and its plot. This data is called Data0.

```
 X ID Coffee Stress Heart
 1  1     23     14     6
 2  2     35     34     9
 3  3     38     58    41
 4  4     48     50    31
 5  5     52     86    63
 6  6     56     73    44
 7  7     61     82    69
 8  8     62     87    80
 9  9     64     74    63
10 10     71     80    72
11 11     74     87    83
12 12     76     92    58
13 13     87    128   113
14 14     97    115    88
15 15    100    123    92
16 16    104    117    92
17 17    107    148   144
18 18    124    146   103
19 19    141    175   145
20 20    154    197   162
```
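Since the full Data0 table is printed above, the multiple regression can also be checked outside R. A minimal sketch in Python with NumPy, assuming the table was transcribed faithfully:

```python
import numpy as np

# Data0, keyed in from the table above
coffee = np.array([23, 35, 38, 48, 52, 56, 61, 62, 64, 71,
                   74, 76, 87, 97, 100, 104, 107, 124, 141, 154], dtype=float)
stress = np.array([14, 34, 58, 50, 86, 73, 82, 87, 74, 80,
                   87, 92, 128, 115, 123, 117, 148, 146, 175, 197], dtype=float)
heart  = np.array([6, 9, 41, 31, 63, 44, 69, 80, 63, 72,
                   83, 58, 113, 88, 92, 92, 144, 103, 145, 162], dtype=float)

# Ordinary least squares for Heart ~ Coffee + Stress
X = np.column_stack([np.ones_like(coffee), coffee, stress])
beta, *_ = np.linalg.lstsq(X, heart, rcond=None)
print(beta.round(4))  # intercept ≈ -7.7943, Coffee ≈ -0.4091, Stress ≈ 1.1993
```

The coefficients agree with the R summary reported later in this section.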

Now we look at the data and its plot with a small amount of measurement error added to the variable Stress. This data is called data_1.

```
 X ID Coffee Stress Heart
 1  1     23     10     6
 2  2     35     34     9
 3  3     38     58    41
 4  4     48     52    31
 5  5     52     86    63
 6  6     56     73    44
 7  7     61     82    69
 8  8     62     87    80
 9  9     64     70    63
10 10     71     80    72
11 11     74     87    83
12 12     76     82    58
13 13     87    128   113
14 14     97    115    88
15 15    100    123    92
16 16    104    117    92
17 17    107    140   144
18 18    124    146   103
19 19    141    190   145
20 20    154    190   162
```

Now we look at the data and its plot with a medium amount of measurement error added to the variable stress. This data is called data_2.

```
 X ID Coffee Stress Heart
 1  1     23     18     6
 2  2     35     34     9
 3  3     38     58    41
 4  4     48     59    31
 5  5     52     86    63
 6  6     56     73    44
 7  7     61     82    69
 8  8     62    100    80
 9  9     64     70    63
10 10     71     88    72
11 11     74     87    83
12 12     76     82    58
13 13     87    150   113
14 14     97    115    88
15 15    100    123    92
16 16    104    100    92
17 17    107    100   144
18 18    124    146   103
19 19    141    200   145
20 20    154    190   162
```

Now we look at the data and its plot with a large amount of measurement error added to the variable stress. This data is called data_3 from here forward.

```
 X ID Coffee Stress Heart
 1  1     23     10     6
 2  2     35     10     9
 3  3     38     10    41
 4  4     48     10    31
 5  5     52     86    63
 6  6     56     73    44
 7  7     61     82    69
 8  8     62    100    80
 9  9     64    100    63
10 10     71     88    72
11 11     74     87    83
12 12     76    108    58
13 13     87    190   113
14 14     97    115    88
15 15    100    193    92
16 16    104    170    92
17 17    107    200   144
18 18    124    246   103
19 19    141    220   145
20 20    154    290   162
```

We can see that the data stretch out in the direction of Stress as we add error. This is because adding independent measurement error inflates the variance of the observed Stress values.
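This variance inflation is additive: when the error is uncorrelated with the true score, Var(S + e) = Var(S) + Var(e). A quick illustrative check with synthetic numbers (not the assignment data):

```python
import numpy as np

rng = np.random.default_rng(0)
true_stress = rng.normal(100, 40, size=200_000)  # latent stress scores (sd 40)
error = rng.normal(0, 25, size=200_000)          # classical measurement error (sd 25)
observed = true_stress + error

# The observed variance is close to 40**2 + 25**2 = 2225
print(true_stress.var(ddof=1).round(0), observed.var(ddof=1).round(0))
```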

Now that we have looked at what happens to the data as we add measurement error to Stress, let's look at what happens to the model as we add measurement error to Stress.

This is the visualization of the regression of Heart on Coffee only.

And this is what happens visually as we add measurement error to Stress.

The blue plane does not change, because this is the regression plane that does not include the Stress variable. But the yellow plane changes quite significantly as we add error to the Stress variable. The more error we add, the more the yellow plane becomes similar to the blue plane.

From this we can see that adding error is like not measuring the Stress variable at all: the multiple regression plane becomes more and more like the simple regression plane based on Heart and Coffee only.
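The convergence of the two planes can also be checked numerically. The sketch below (Python/NumPy, with hypothetical noise levels of my choosing) adds classical measurement error of increasing standard deviation to Stress in Data0 and refits the model. Averaged over many replicates, the Stress coefficient shrinks toward zero while the Coffee coefficient climbs toward its simple-regression value of about 1.11:

```python
import numpy as np

rng = np.random.default_rng(1)

# Data0, keyed in from the table above
coffee = np.array([23, 35, 38, 48, 52, 56, 61, 62, 64, 71,
                   74, 76, 87, 97, 100, 104, 107, 124, 141, 154], dtype=float)
stress = np.array([14, 34, 58, 50, 86, 73, 82, 87, 74, 80,
                   87, 92, 128, 115, 123, 117, 148, 146, 175, 197], dtype=float)
heart  = np.array([6, 9, 41, 31, 63, 44, 69, 80, 63, 72,
                   83, 58, 113, 88, 92, 92, 144, 103, 145, 162], dtype=float)

def fit(s):
    """OLS coefficients for Heart ~ Coffee + s."""
    X = np.column_stack([np.ones_like(coffee), coffee, s])
    return np.linalg.lstsq(X, heart, rcond=None)[0]

results = {}
for sd in (0, 20, 40, 80):  # hypothetical error standard deviations
    # average the fitted coefficients over 2000 noisy replicates
    betas = np.mean([fit(stress + rng.normal(0, sd, stress.size))
                     for _ in range(2000)], axis=0)
    results[sd] = betas
    print(f"error sd {sd:3d}: b_Coffee {betas[1]: .3f}, b_Stress {betas[2]: .3f}")
```

Because Coffee and Stress are highly correlated in these data, even a modest amount of error in Stress shifts a large share of its explanatory role onto Coffee, which is exactly the pattern in the summaries below.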

Let's look at the value for Beta Coffee and Beta Stress as we add measurement error:

First looking at the multiple regression with the original data:

```
lm(formula = Heart ~ Coffee + Stress)

Residuals:
     Min       1Q   Median       3Q      Max
-13.5744  -7.5225  -0.4664   6.8669  18.0733

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -7.7943     5.7927  -1.346    0.196
Coffee       -0.4091     0.2918  -1.402    0.179
Stress        1.1993     0.2244   5.345 5.36e-05 ***
---
```

For data_1, the data with a small amount of measurement error:

```
lm(formula = Heart ~ Coffee + Stress)

Residuals:
    Min      1Q  Median      3Q     Max
-22.865  -9.770  -2.130   8.351  45.166

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -10.1854     8.8635  -1.149   0.2664
Coffee        0.7175     0.2681   2.676   0.0159 *
Stress        0.3225     0.2043   1.578   0.1329
---
```

For data_2, the data with a medium amount of measurement error:

```
lm(formula = Heart ~ Coffee + Stress)

Residuals:
    Min      1Q  Median      3Q     Max
-22.865  -9.770  -2.130   8.351  45.166

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -10.1854     8.8635  -1.149   0.2664
Coffee        0.7175     0.2681   2.676   0.0159 *
Stress        0.3225     0.2043   1.578   0.1329
---
```

For data_3, the data with a large amount of measurement error:

```
lm(formula = Heart ~ Coffee + Stress)

Residuals:
      Min        1Q    Median        3Q       Max
-30.79077 -12.84350   0.04408  10.76147  30.84538

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.3275    12.0023   0.277    0.785
Coffee        0.6076     0.3368   1.804    0.089
Stress        0.2241     0.1435   1.561    0.137
---
Multiple R-Squared: 0.9462,
```

It is useful to compare these beta coefficients to the coefficients in the coffee only model.

```
lm(formula = Heart ~ Coffee)

Residuals:
     Min       1Q   Median       3Q      Max
-25.1006 -10.8545  -0.6428  10.4100  34.7385

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -9.3138     9.2055  -1.012    0.325
Coffee        1.1082     0.1072  10.339 5.34e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 16.48 on 18 degrees of freedom
Multiple R-Squared: 0.8559, Adjusted R-squared: 0.8479
F-statistic: 106.9 on 1 and 18 DF, p-value: 5.337e-09
```
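As a sanity check, the simple-regression slope can be recovered directly from the Data0 table. A Python/NumPy sketch, assuming the table above was transcribed faithfully:

```python
import numpy as np

coffee = np.array([23, 35, 38, 48, 52, 56, 61, 62, 64, 71,
                   74, 76, 87, 97, 100, 104, 107, 124, 141, 154], dtype=float)
heart  = np.array([6, 9, 41, 31, 63, 44, 69, 80, 63, 72,
                   83, 58, 113, 88, 92, 92, 144, 103, 145, 162], dtype=float)

# Slope of Heart ~ Coffee is cov(Coffee, Heart) / var(Coffee)
slope = np.cov(coffee, heart, ddof=1)[0, 1] / np.var(coffee, ddof=1)
print(round(slope, 4))  # 1.1082, matching the Coffee estimate above
```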

This is what the confidence intervals for the beta coefficients look like visually as we add error.

Looking at the above models and visualizations, it is clear that accurate measurement of all variables in the model is essential. If a confounding variable is measured with enough error, you are effectively controlling for nothing, even though the confounder appears to be included in the model.