# SPIDA 2009: Mixed Models with R: Lab Session 3: Generalized Linear Mixed Models and Related Topics

### From MathWiki

- Home Page for SPIDA 2009: Mixed Models with R
- Installing the 'spida' and 'p3d' packages:SPIDA 2009: Mixed Models with R: Installing spida and p3d packages

# The master copy of this script is currently kept at # http://wiki.math.yorku.ca/index.php?title=SPIDA_2009:_Mixed_Models_with_R:_Lab_Session_4:_Generalized_Linear_Mixed_Models_and_Related_Topics # please make corrections or discuss suggestions on the wiki ### ### ### Lab Session Section 3: Generalized linear mixed models and related topics ### ## ## Related Topics: ## #* Accelerated Longitudinal Designs and age-period-cohort linear confounding #* Modeling seasonal and periodic effects with Fourier Analysis #* Using general splines to model effect of age or time #* Linear, quadratic, cubic and natural cubic splines #* General spline generator: splines with arbitrary degrees and smoothness #* Defining hypothesis matrices and using Wald tests to explore splines #* Plotting splines and spline features with confidence bounds #* Plotting log-odds or probabilities #* Interpreting hypothesis tests using confidence bounds #* Bonferroni and Scheffe confidence bound adjustment factors #* Testing non-linear cohort effects #* Alternatives to glmmPQL: lmer, glmmML,GLMMGibbs, library( car ) library( MASS ) library( nlme ) library( spida ) library( p3d ) library( lattice ) # Also need later: # # library( GLMMGibbs ) # library( lme4 ) # library( glmmML ) # # # Data: Respiratory infections among pre-school children in Indonesia # - 274 children were oberved at community clinics up to 6 times 3 months apart # - Baseline age ranges from 4 to 80 months # Question: Experience suggests that boys get more infections than # girls especially around 3 to 4 years of age. # # This data set is particularly interesting because it has a structure very # similar to Statistics Canada longitudinal surveys like the NPHS and the # NLSCY. There are 6 waves of data with subjects entering at different ages. # # This kind of design has been called an "Accelerated Longitudinal Design" # -- also a cross-sequential design or a cohort design (Miyazaki and # Raudenbush, 2000). Its competitors are the cross-sectional design in which # individuals of different ages are observed at one point in time, and a # single cohort design in which a single cohort is followed over a very # long time. # # In the cross-sectional design, age effects and cohort effects are perfectly # confounded. # # In the single cohort design, age effects and period effects are perfectly # confounded. # # In the ALD (accelerated longitudinal design) no pair of age, cohort.birth or period # are perfectly confounded but the LINEAR effect of any one is confounded with # a combination of the linear effects of the other two, in that the predictors # obey: # # period = cohort.birth + age # # so that there are only two degrees of freedom to estimate three parameters. # Note that we let period = time of measurement, cohort = time of birth. # # If we use all three variables, the model will be singular. Thus we drop one # of the three. If we drop 'cohort' on the basis that the effect of cohort # is more 'distal' than that of 'period' or 'age', we get the following # where we use 'beta[x]' for the 'true' but unknowable effect of 'x' and # 'beta.estimated[x]' for its estimated value in a particular model. # # combined effect of period, cohort and age: # # = beta[period] x period + beta[cohort.birth] x cohort.birth + beta[age] x age # # = beta[period] x period + beta[cohort.birth] x (period - age) + beta[age] x age # # = (beta[period] + beta[cohort.birth]) x period + (beta[age] - beta[cohort.birth]) x age # # So # beta.estimated[period] = beta[period] + beta[cohort.birth] # and # beta.estimated[age] = beta[age] - beta[cohort.birth] # # So depending on our assumption about beta[cohort.birth] we get: # # beta[period] = beta.estimated[period] - beta[cohort.birth] # and # beta[age] = beta.estimated[age] + beta[cohort.birth] # # In other words, the same data can be explained by a 0 effect of cohort # or by a non-zero effect of cohort with corresponding adjustments to the # estimated effect of age and period. # # Note that if cohort is identified by 'age at baseline' then the # sign of the cohort effect is the reverse of the sign if it is measured # by data of birth. So: # # beta[period] = beta.estimated[period] + beta[cohort.baseline.age] # and # beta[age] = beta.estimated[age] - beta[cohort.baseline.age] # # # It may seem paradoxical that the separate effects of age, period and cohort # cannot be estimated and the reason is that they, in fact, can be. # 'Non-linear' effects can be estimated separately under the assumption # that there are no interactions, i.e. that the separate effects are # additive. # It is only the linear effects that are fully confounded. # # Miyazaki and Raudenbush (2000) propose an approach using mixed models to # test wether cohort effects can be neglected. Essentially their tests ask # whether each cohorts trajectory 'splices' into those of adjoing cohorts. # If so the entire age trajectory can be explained by age effects only. # Their approach, however, assumes that there are no period effect, an effect # that can be irregular and potentially strong in my limited experience. # # We propose later to test a model that includes period effects and test # for possible effect of cohort beyond linear. If higher order cohort # can be ignored, then the estimation of an age trajectory is simplified. # # # # A first look at the data: # data(Indonesia) ?Indonesia ind <- Indonesia xqplot( ind ) scatterplot.matrix( ind , ell = TRUE ) indu <- up( ind, ~ id ) xqplot( indu ) # number of observations per child: ind$n <- capply( ind, ~ id, nrow ) indu <- up( ind, ~ id ) xqplot( indu ) scatterplot.matrix( indu, ell = TRUE ) inda <- up( ind, ~ id , all = T ) xqplot( inda ) scatterplot.matrix( inda, ell = T ) # # QUESTIONS: # How would you interpret # 1) the correlation between n and time # 2) the tapering of the season vs n plot # # Notes: # Although there are 1154 observations on 274 subjects, there are only tab( ind, ~ resp) # 107 instances of respiratory infections in tab( up( ind, ~ id, all=T)$resp > 0 ) # 81 subjects. For a dichotomous response, the 'effective sample size' # (Harrell, 2001) is the number of cases in the smaller group, here 107. # # Despite the seemingly large number of observations, we need to # interpret results cautiously. # # # Looking at marginal relationships between resp and age # # # Note: I do not carry out the following sequence of plots when # I'm analyzing data unless I'm preparing a graphic for publication # or I'm trying to impress someone. For exploratory analysis # most plots are created with 1 or 2 lines of code. # # The following sequence shows how you might go # from a simple plot for quick visualization to a more complex plot. # xyplot( resp ~ age, ind ) xyplot( resp ~ age, ind , panel = function(x,y,...) { panel.xyplot( x, y, ...) panel.loess( x, y, ...) } ) # fails for a subtle reason: the 'default' in panel.loess # family = "symmetric" which tries to fit a ROBUST estimate # of location. Here, however, a robust estimator is precisely # what we don't want. We want to fit the curve with an # ordinary mean to estimate the probability of 'resp'. # We get that by specifying family = "gaussian" # ?panel.loess xyplot( resp ~ age, ind , panel = function(x,y,...) { panel.xyplot( x, y, ...) panel.loess( x, y, ..., family = 'gaussian') } ) # focus scale of y to see predicted proportion of resp xyplot( resp ~ age, ind , ylim = c(0, .2), panel = function(x,y,...) { panel.xyplot( x, y, ...) panel.loess( x, y, ..., family = 'gaussian') } ) # create a factor for sex ind$Sex <- factor( ifelse( ind$sex == 0, "Male", "Female")) xyplot( resp ~ age | Sex, ind , ylim = c(0, .2), panel = function(x,y,...) { panel.xyplot( x, y, ...) panel.loess( x, y, ..., family = 'gaussian') } ) td( lwd = 2 ) # by specifying 'lwd' here instead of as an argument to # to xyplot, the change in lwd will be applied to # the key as well as to the main graph. # But it will persist for subsequent graphs. # xyplot( resp ~ age , ind , ylim = c(0, .2), groups = Sex, panel = panel.superpose, panel.groups = function(x,y,...) { panel.loess( x, y, ..., family = 'gaussian') }, auto.key = list( columns = 2, lines = T, points = F) ) # # Analysis # # Issues: # # -- data are longitudinal clustered in children so obs. are not independent # # -- the trajectories are not linear or quadratic, each might have a # a different degree as a polynomial # # -- age is confounded with time, hence possible period effects # # -- age and period are confounded with cohort, i.e. 'baseline_age' # # # We used a generalized linear mixed model fitted by penalized quasi-likelihood # # Essentially, we fit a sequence of 'lme' models with changing weights # much as a 'glm' involves fitting a sequence of 'lm' models with changing # weights. # # With true 'glm's the result is equivalent to maximizing # the likelihood of the the glm model. # # With 'glmm' models it is not, # hence PQL = penalized QUASI likelihood. # We cannot do LRTs to compare models but can use Wald tests. # # For binary outcomes with few observations per cluster, this approach # can yield biased estimates, particularly of variance parameters. # So we interpret results cautiously. # # The advantage of glmmPQL is that it is relatively fast and convergent # and its approach is not too difficult to understand since it parallels # lm and glm. Also glmmPQL allows correlation structures and realively complex # structures for the G matrix, allowing, for example, the fitting of # smoothing splines. # # One alternative is the 'lmer' function in 'lme4' that has a superior # fitting algorithm but lacks some of the other features of glmmPQL. # Currently it also lacks many methods. # library(MASS) # for glmmPQL fit <- glmmPQL ( resp ~ time + season + age + sex + height_for_age , data = ind, family = binomial, random = ~ 1 | id) summary( fit ) # # QUESTION: anything wrong? # fit <- glmmPQL ( resp ~ time + factor(season) + age * sex + height_for_age , data = ind, family = binomial, random = ~ 1 | id) summary(fit) wald( fit, "season") # straddling the Equator doesn't eliminate seasonal effects!! # QUESTIONS: # 1) What happened to season? it went from not significant to highly significant. # 2) Note that sex and its interaction with age are not significant. Do we quit # looking for sex effects? Or why not? # ind$Season <- factor( c("Spring","Summer","Fall","Winter") [ ind$season ] ) ind$Season <- reorder( ind$Season, ind$season) # QUESTION: Why are we doing this? tab( ~ Season, ind) # QUESTION: Why so imbalanced? tab( ~ Season + time, ind ) # We have 5 degrees of freedom for time (WHY?) and we've used 4: # 1 for linear time and # 3 for season. # We could try one more before moving on: fit2 <- glmmPQL ( resp ~ time + I(time^2) + Season + age * Sex + height_for_age , data = ind, family = binomial, random = ~ 1 | id) summary( fit2 ) wald( fit2, 'time' ) # # QUESTION: Should we drop quadratic time? # I would lean to drop because I'm concerned about the size of # the eventual model. # # So our provisional model is: fit3 <- glmmPQL ( resp ~ time + Season + age * Sex + height_for_age , data = ind, family = binomial, random = ~ 1 | id) summary( fit3 ) wald( fit3, 'Season' ) wald( fit3, 'Sex' ) wald( fit3, list( Interaction = ":")) # # Modeling Seasonal effects # # # In this example time is discrete as if all subjects were observed # nearly at the same time on each occasion. Often time is more # continuous, measured to the day or to the month. # # When the time scale of a study is a substantial part of a year # or stretches over many years, and the phenomemon might include # a periodic (seasonal) pattern that is repeated # each year, you can consider modeling periodic effects. # # Another common context for periodic effects is modeling # diurnal periodic patterns. Circadian cycles that are not # locked into a fixed period can sometimes be approximated # by diurnal models or modeled through autoregressive models # on the R side of a mixed model. # # The basic tool for periodic patterns is Fourier Analysis. # Fourier Analysis is to periodic effects what # exponential models are to asymptotic effects and what # lines and polynomials are to general smooth effects. # # Fourier analysis sounds extremely advanced but it's only # slightly more complicated than fitting polynomials. # # Fourier analysis involves fitting a sine 'wave' to the data, a wave # whose period is equal to the basic period (e.g. year, day). # This fundamental wave corresponds to a line in general analysis. # If a single sine wave doesn't do the job we consider adding # higher harmonics: sine waves whose periods are increasingly # smaller whole fractions of the basic period, i.e. 1/2 year, # 1/3 year, 1/4 year, 1/5 year and so on. As in fitting polynomials # we can keep going until the is no significant improvement. # # Just like it takes at least 2 points to fit a line, 3 points to fit # to fit a quadratic,4 points to fit a cubic and so on, # it takes takes at least # # 3 points within each period to fit a fundamental wave # 5 points ... to fit a fundamental + a first harmonic ( 1/2 year) # 7 points ... to fit up to the second harmonic (1/3 year) # 9 points ... to fit up to the third harmonic (1/4 year) # ... etc # # In our data we have 4 points so we can't go higher than the # fundamental. The fundamental is not a 'saturated' model for the # effect and we can test whether time used as a factor (the # saturated model) gives a better fit. # # The equivalent of terms in a polynomial (i.e. x, x^2, x^3, etc. # are sine,cosine pairs. So additional terms come in twos # # Suppose we have data measured in days stretching from -100 to 600, # almost two years. days <- -100:600 # Since we have 365 points # per year we could go very high with our harmonics but we # just look at the kinds of patterns generated by the first # few harmonics. # # For a first look, we won't fit a model but we will look at # what models look like. A bit like looking at a straight line # then a quadratic, then a cubic, etc. # # Note that our time points DO NOT have to be evenly spaced NOR # at the same times in each period. # # The period of the sin or cos function is 2*pi 2*pi # The period for our data is 365.25 days. (The .25 is only # necessary if our data stretches over such a long time that # ignoring leap years would cause the model to get out of phase, # but there's no harm in illustrating how easy it is to # incorporate a fractional period. # # To produce a sine curve with a period of 365.25 days, we need # to change our time scale from that of the data to that of the # the sin function. We do this by dividing by the period of # the data and multiplying by the period of the sin function. plot( days, sin( 2*pi* days / 365.25) , type = 'l') abline( h = 0, col = 'gray' ) abline( v = 365.25*(-1:2), col = 'gray') # When we fit a model, sin( 2*pi*days/365.25) will be # a regressor. Fitting the model will fit a coeffiecient for # this regressor, for example, 0.345 lines( days, 0.345 * sin( 2*pi* days / 365.25), col = 'green3') # A single sin curve will always have value 0 at 0, i.e. # its 'phase' must be zero. How could we fit other possibilities # for the phase (the first time when the curve is equal to 0). # Obviously we're talking about two parameters here: # the amplitude and the phase # We could use a formally non-linear model but the magic # of Fourier analysis is that we can accomplish this by # combining our sin curve with a cos curve of the same period # but possibly different amplitudes; lines ( days, 0.789 * cos( 2*pi* days / 365.25), col = 'blue') # Note that the cos curve is a sin curve with phase # equal to 1/4 year. # # When we add them we get another 'sine' curve with some other # phase: lines( days, 0.345 * sin( 2*pi* days / 365.25) + 0.789 * cos( 2*pi* days / 365.25), col = 'red') # The amplitude of the resulting curve is sqrt( 0.345^2 + 0.789^2) # and its (a) phase is 365.25 * atan2( -0.345, 0.789) # Fitting sine curves ('sine' generic for 'sin' or 'cos' also # called trigonometric curves) fit2.sin <- glmmPQL ( resp ~ time + sin( 2*3.1416 * (time + 6) / 4) + cos( 2*3.1416 * (time + 6) / 4) + age * Sex + height_for_age , data = ind, family = binomial, random = ~ 1 | id) summary( fit2.sin ) # QUESTIONS: # 1 ) Why did we add 6 to time? What happens if we don't. # 2 ) The 'sin' term is highly significant but the 'cos' term is # not. Should we drop the 'cos' term. # # The answer to the second question is too important to leave up # in the air: # If we drop the 'cos' term, then the 'sine' curve is force to # have a phase of 0, i.e. the curve = 0 at time = 0. # Thus, the model would not be invariant if we add a constant # to the time variable and the model would violate a principle of # invariance. # # Another way of thinking about it is that each term is # 'marginal' to the other and dropping one without the other # violates the principle of marginality. Except in very rare # circumstances, should be added or dropped as a pair. This # means that we need to carry out a wald test with 2 numDFs: wald ( fit2.sin, "sin|cos") # Of course we knew this would be significant. # # If we had more time points, we could try adding higher harmonics, # the next being: lines( days, sin(2 * 2 * 3.1416 * days / 365.25), lty = 2, col = 'green3') lines( days, cos(2 * 2 * 3.1416 * days / 365.25), lty = 2, col = 'blue') # Note how easy it is to generate each harmonic. You just multiply # the whole argument of the 'sin' or 'cos' by the appropriate integer: # 2 for the first harmonic # 3 for the second, etc. # # Alas we have too few points per year to illustrate this here. # If we did, we could keep adding other harmonics until we decide # to stop. Note that lower harmonics are treated as 'marginal' to # higher harmonics, just like polynomials. # # To make the process much cleaner and easier, we can define a # 'Sine' function that generates the matrix with our cos,sin pair # incorporating phase shifts and period scaling: Sine <- function( x ) cbind( sin = sin( 2 * 3.1416 * (x + 6) / 4), cos = cos( 2 * 3.1416 * (x + 6) / 4)) # to see what this does: time.ex <- seq( -1, 6, .1) head( Sine( time.ex ) ) matplot( time.ex, Sine( time.ex ), type = 'l') # generating higher harmonics is a cinch: matlines( time.ex , Sine( 2 * time.ex ), col = 'blue') matlines( time.ex , Sine( 3 * time.ex ), col = 'red') # combining all this with random coeffients produces white noise: try it a few times: plot( time.ex, cbind( Sine(time.ex) , Sine(2*time.ex), Sine(3 * time.ex)) %*% rnorm( 6 ), type = 'l') # Back to Indonesia: (we can fit the same model with:) Sine <- function( x ) cbind( sin = sin( 2 * 3.1416 * (x + 6) / 4), cos = cos( 2 * 3.1416 * (x + 6) / 4)) fit2.sin <- glmmPQL ( resp ~ time + Sine ( time ) + age * Sex + height_for_age , data = ind, family = binomial, random = ~ 1 | id) summary( fit2.sin ) wald( fit2.sin, 'Sine') # QUESTIONS: # 1) Sine( time ) is a function of 'time' NOT 'Season'. # Is is appropriate to think of it as modeling a # a Season effect. # 2) With this model, what is the interpretation of the # coefficient for the intercept term. # # There are 3 degrees of freedom for Season (4 levels - 1). # Our Sine function uses 2. # We'd like to test the next harmonic by adding the term: # # ... + Sine(2*time) + ..... # # but this would require 4 degrees of freedom. # How can we test whether we need to include the omitted # degree of freedom in this case? Just include the 'cos' # term of the next highest harmonic. Note: very rarely # the cos term won't work and you should try again with # the sin term. # fit2.sinS <- update( fit2.sin , . ~ . + Sine(2*time)[,2]) summary(fit2.sinS) # Turns out to be highly significant so we need all available # degrees of freedom to model Season. # # So we might as well just use the factor Season. # # This suggests that if we had more time points we would have # needed at least the first harmonic in our model. # # With the same number of subjects and the same number of # occasions for each subject BUT subjects coming in at different # dates, we could have used the variability in dates to # fit and test higher harmonics. # # So we are back where we started BUT much wiser! # EXERCISE: # # Try fitting a periodic curve to the sunspot data # in the 20th century # Note: we will use part of the same data later for splines. # data(sun) # monthly sunspot data in 20th century plot( spots ~ year , sun) # This is not strictly periodic data because the phase is # not fixed but it wanders randomly. # Trying a model with a period of 9.5 years plus a few harmonics Sine <- function( year ) cbind( sin = sin( 2*3.1416*year), cos = cos( 2.*3.1416*year)) # NOTE: By using a period of 1 in the Sine function we set things up so # that when we use the Sine function # we DIVIDE BY THE PERIOD and MULTIPLY FOR HARMONICS: fitsun <- lm( spots ~ Sine( year / 9.5) + Sine( 2 * year / 9.5) + Sine( 3 * year / 9.5), sun) summary( fitsun ) lines( predict( fitsun) ~ year, sun) # EXERCISE: # # Look at the graph, particularly at the peaks of the fitted curve # Is the fit too slow or too fast? # Adjust the period to see whether you can find one that fits better> # # # Modeling age with splines # # Splines: a very simple idea # # Problem with fitting a single polynomial: # # - Might need a very high degree for a good fit # # - But high degree polynomials go wild away from data # (Runge's phenomenon) # # - Points have influence at a distance -- generally not reasonable # # Spline: # # - split the range into disjoint intervals (e.g. 3 intervals) # # - the points at the end points are called 'knots' # ( 3 intervals => 2 knots ; p intervala => p-1 knots) # # - fit a separate polynomial in each interval # # - BUT do it in such a way that the polynomials join up # at the knots. # # - How? different 'degrees' of joining up: # - 0: just continuous but perhaps a kink # - change in slope = 1st derivative # - 1: smooth: no kink but perhaps a sudden change in curvature # - change in 2nd derivative = curvature = acceleration # - 2: smooth of degree 2: no sudden change in curvature but # possible sudden change 3rd derivative = jerk # - 3, 4, 5: snap, crackle and pop # # - Difficult: generating regressors that do the right job # BUT if the regressors are generated for you, nothing's # easier than using splines. # # gsp is a simple function to generate the columns of the X matrix # for arbitrary splines # # # Primer for splines # data(sun) # monthly sunspot data in 20th century spex <- sun[ sun$year > 1980,] xqplot( spex ) plot( spots ~ year , spex) # # Linear spline: lines continuous at knots: choose knots where slope changes: # # General Spline Generator: # Preliminary step: # - transform time so it has a resonable range # - max( abs(time) )^(highest degree) should not be much greater than 10000 # - you can keep the original time scale for plotting spex$ys <- spex$year - 1980 ?gsp # define a function to generate the spline: # linear spline splin <- function( x ) { gsp( x, knots = c( 1986, 1990, 1996 )-1980, # 3 knots => 4 intervals degree = c( 1, 1, 1, 1 ) , # linear in each interval smooth = c( 0, 0, 0 ) # continuous at each knot ) } fitlin <- lm( spots ~ splin(ys), spex ) summary( fitlin ) lines( predict(fitlin) ~ year, spex) abline( v = 1980 + splin(NULL)$knots , col = 'gray') # quadratic spline: quadratic in each interval and smooth at knots spquad <- function( x ) { gsp( x, knots = c( 1989, 1993 )-1980, # 2 knots => 3 intervals degree = c( 2, 2, 2 ) , # quadratic in each interval smooth = c( 1, 1 ) # smooth at each knot ) } fitquad <- lm( spots ~ spquad(ys), spex ) summary( fitquad ) lines( predict(fitquad) ~ year, spex, col = 'red') abline( v = 1980 + spquad(NULL)$knots , col = 'pink') # note the appearance of the spline where it goes over the knot # cubic spline spcubic <- function( x ) gsp( x, 1987 - 1980, c(3, 3), 2) fitcubic <- lm( spots ~ spcubic(ys), spex ) summary( fitcubic ) lines( predict(fitcubic) ~ year, spex, col = 'blue') abline( v = 1980 + spcubic(NULL)$knots , col = 'blue') AIC ( fitlin, fitquad, fitcubic ) # fit linear was best for 6 dfs # but we might have done better # by finding better places for knots # natural cubic spline # add 'boundary knots' at the extreme values of the data and # extrapolate with linear term # spnat <- function( x ) gsp( x, c(1980, 1987, 1998) - 1980, c(1, 3, 3, 1), c(1,2,1)) fitnat <- lm( spots ~ spnat(ys), spex ) summary( fitnat) lines( predict(fitnat) ~ year, spex, col = 'green') abline( v = 1980 + spnat(NULL)$knots , col = 'green') # # Summary: # The right spline depends on the phenomenon being modeled # # Cubic splines are very popular but for many situations they # don't bend fast enough. # # AIC ( fitlin, fitquad, fitcubic, fitnat ) # # Using splines for age # quantile(ind$age, seq(0,100,5)/100) ind$age.y <- ind$age / 12 # transform to age in years -- good to limit max(abs(x)) -- # when using raw polynomial and splines to control condition # number of X matrix # try a cubic fit.3 <- glmmPQL ( resp ~ time + Season + ( age.y + I(age.y^2) + I(age.y^3)) * Sex + height_for_age , data = ind, family = binomial, random = ~ 1 | id) summary( fit.3 ) wald( fit.3, 'Sex' ) # does not reveal much # Try a natural cubic spline qs <- quantile( ind$age.y, c(0,1,2,3,4,5,6)/6) qs # natural cubic spline puts knots at both ends with linear terms to extrapolate if necessary sp <- function( x ) gsp( x, qs, c(1,3,3,3,3,3,3,1), c(2,2,2,2,2,2,2)) # natural cubic spline using 5 interior # knots separating intervals # witn equal proportions of the data # and two boundary knots fitsp <- glmmPQL( resp ~ sp(age.y) * Sex + time + Season + height_for_age, ind, random = ~ 1 | id, family = binomial ) summary( fitsp ) pred <- expand.grid( age = 4:84, Sex = levels( ind$Sex), height_for_age = 0, Season = levels(ind$Season), time = 1:6) pred$age.y <- pred$age / 12 pred$resp.lo <- predict( fitsp, pred, level = 0) wald( fitsp, 'Sex') # No strong evidence for an overall effect of Sex # But what if we target the question we wanted to ask # Effect of Season: xyplot( resp.lo ~ Season, pred, type = 'l', subset = Sex == "Male" & time == 1 & age == 50) wald( fitsp, Ldiff( fitsp, "Season", reflevel = "Spring")) # Overall period effect: wald( fitsp, list( "period effects" = "time|Season") ) # QUESTION: What does this say? Recall that we are # controlling for age at the time of observation xyplot( resp.lo ~ age, pred, groups = Sex, type = 'l', subset = Season == "Summer" & time == 1, auto.key = list( columns = 2, lines = T, points =F)) # # Using Wald tests to explore splines # # We need an estimate of the difference between the two curves wald( fitsp ) # # We see that the columns of the model matrix consist of: # 1: a column of 1s for the intercept, # 2-7: a block of 6 columns for the spline, # 8: male indicator # 9: time # 10-12: a block of 3 columns for Season # 13: height_for_age # 14-19: another block of 6 columns for the interaction between # age and Sex # # To estimate the difference between the two sexes at each age # we need to generate the correct L matrix. This would seem formidable # without a detailed understanding of the parametrization of the # spline. # # However, there is a function to help with this. It generates the # portions of the L matrix for the blocks that model the spline. # # # To generate the portion of the matrix to evaluate the spline at # at age.y = 2, 4, and 6, use sc( sp , c(2,4,6) ) # # This is the portion of the L matrix that would estimate the # value of the response offset from the intercept. # # To evaluate the derivative (slope) of the spline at the same # points: sc( sp, c(2,4,6), D = 1) # first derivative (slope) of spline # the second, third and fourth derivatives: sc( sp, c(2,4,6), D = 2) sc( sp, c(2,4,6), D = 3) sc( sp, c(2,4,6), D = 4) # # To handle possible discontinuites at knots, a fourth argument # 'type' that can be set to 0, 1 or 2 signifies that the # derivative should be evaluated from the left, from the right # or that the saltus should be evaluated. # To create an L matrix to estimate Female response at # each month, given values of other predictors. The values # are eventually arbitrary and we choose: # # Season == "Spring" & time == 2 # # We need: # # Lfemale = the L matrix is Season == "Summer" & time == 1, # Lfemale <- cbind( 1, sc( sp, seq(4,84,1)/12), 0, 1, 1, 0, 0, 0, 0*sc(sp,seq(4,84,1)/12) ) some( Lfemale ) zapsmall( some( Lfemale )) Lmale <- cbind( 1, sc( sp, seq(4,84,1)/12), 1, 1, 1, 0, 0, 0, 1*sc(sp,seq(4,84,1)/12) ) zapsmall( some( Lmale )) Ldiff <- Lmale - Lfemale zapsmall( some( Ldiff) ) (ww <- wald( fitsp, Ldiff )) # estimates gap at each age wald( fitsp, "Sex") # QUESTION: How is this output similar to the previous wald output? # Turn the output of the 'wald' function into a data frame for plotting # include eta.hat +/- 2 SE df.diff <- as.data.frame( wald( fitsp, Ldiff), se = 2) df.diff$age <- seq(4,84,1) some( df.diff ) # estimated coef with Upper and Lower confidence Bound xyplot( coef + U2 + L2 ~ age, df.diff, type = 'l', panel = function(x,y,...) { panel.superpose(x,y,...) panel.abline( h = 0) }) # Better labels td( lty = c(1,2,2), col = c('blue','blue','blue'), lwd = 2) xyplot( coef + U2 + L2 ~ age, df.diff, type = 'l', sub = "Boy-Girl gap in log odds of respiratory infection", ylab = "Log odds difference +/- 2 SEs (single prior hypothesis)", xlab = "age (in months)", panel = function(x,y,...) { panel.superpose(x,y,...) panel.abline( h = 0) }) # To make this more interpretable, we would like to transform # graphs to a probability scale where possible. # Fitted responses are easy: probability = 1/(1+exp(-log.odds)) # Let's write it as a function Invlogit <- function ( lo ) 1/(1+exp(-lo)) Logit <- function( p ) log( p / (1-p) ) # i.e. log(odds(p)) # So the graph showing each sex is, instead of: xyplot(resp.lo ~ age, pred, groups = Sex, type = 'l', subset = Season == "Summer" & time == 1, auto.key = list( columns = 2, lines = T, points =F)) # we get: xyplot( Invlogit(resp.lo) ~ age, pred, groups = Sex, type = 'l', ylab = "Probability of respiratory infection", subset = Season == "Summer" & time == 1, auto.key = list( columns = 2, lines = T, points =F)) # QUESTION: Are the bumps and valleys artifacts of our knots? # # EXERCISE LATER: redo this playing with different knots, # possibly knots chosen on the basis of the initial plot # What price should you pay? The knot adjustments are # non-linear parameters and you could pay 1 DF per knot # to be safe. # # # Plotting the difference: # # There isn't a direct way to turn the difference in log odds into # a difference of probabilities. Probabilities are 1-1 functions of log odds # but probability differences are not 1-1 functions of differences in log odds # # However: # # the ratio of odds = exp( diff of log odds ) # and if the probabilities are relatively small, then the ratio of odds # is approximately the proportion of probabilities which we can # wrap our minds around. # # td( lty = c(1,2,2), col = c('blue','blue','blue'), lwd = 2) xyplot( exp(coef) + exp(U2) + exp(L2) ~ age, df.diff, type = 'l', sub = "Boy/Girl odds ratio for respiratory infection", ylab = "Odds ratio +/- 2 SEs (single prior hypothesis)", xlab = "age (in months)", panel = function(x,y,...) { panel.superpose(x,y,...) panel.abline( h = 0) }) # oops xyplot( exp(coef) + exp(U2) + exp(L2) ~ age, df.diff, type = 'l', ylim = c(.1,10), sub = "Boy/Girl odds ratio for respiratory infection", ylab = "Odds ratio +/- 2 SEs (single prior hypothesis)", xlab = "age (in months)", panel = function(x,y,...) { panel.superpose(x,y,...) panel.abline( h = 1) }) # not pretty. If the changing the plotted values does not work, then # just change the labels on the axes! xyplot( coef + U2 + L2 ~ age, df.diff, type = 'l', ylim = log( 2^c(-8,8)), scale = list( y = list( at = log( 2^seq(-7,7,1)), labels = fractions( 2^seq(-7,7,1))), x = list( at = seq( 0,90,10))), sub = "Boy/Girl odds ratio for respiratory infection (on log odds scale)", ylab = "Odds ratio +/- 2 SEs (single prior hypothesis)", xlab = "age (in months)", panel = function(x,y,...) { panel.superpose(x,y,...) panel.abline( h = log( 2^seq(-7,7,1)), col = 'gray') panel.abline( h = 0) panel.abline( v = seq(0,90,5), col = 'gray') }) # # So ... at 35 months it seems that boys are estimated to be over 4 times # as likely as girls to get respiratory infections and # they are at least twice as likely as girls using a 95% CI. # # BUT this strictly works only if we have one age as a prior hypothesis. # What if we have selected 35 months because it is significant. # # We can adjust the CIs to take multiple prior hypotheses into # account using Bonferroni adjustment or to protect against any # number by using a Scheffe adjustment. # # # Buying a fishing license # # # Rules for fishing licenses: # # The analogy doesn't quite work because # you are not paying for the number of fish you catch, # you are paying for the particular fish you plan to # look at whether you decide to keep them or not. # If you've paid to look at the fish, # there's no extra cost for keeping it! # # This subtlety is critical: it's your intentions that # count, not your actions!! You pay for the privilege # to look at fish, not for keeping them. # # 1) No window shopping. You have to pay for every fish you PLAN to look at. # # 2) The first fish is free. # # 3) Bonferroni's franchise: if you plan to look at a fixed number of fish # you get your licence from Mr. Bonferroni who will charge you per # fish you plan to look at. # # 4) Scheffe's franchise: You can get an unlimited fishing licence for a # SUBSPACE of hypotheses from Mr. Scheffe. He charges according to # the dimension of the subspace. # # 5) If you find that a Scheffe license is cheaper for your purposes # than a Bonferroni license, you can get a full refund on your # Bonferroni license and use a Scheffe license instead. # # Note: there are many other purveyors of fishing licences but # they are special purpose licences. They tend to be cheaper when # they are just right for you application. But they aren't as general # as Bonferroni and Scheffe. # # How do we buy fishing licences? # # It's extremely easy: # When we use # # estimated.value +/- 2 x SE # # the '2' comes from the fact that the interval -2 to +2 contains # approximately 95% of the probability for a standard normal random # variable. # # All we need to do is to use something other than a '2' # # For Bonferroni to test H hypotheses, we want to choose K (instead of 2) # so that the probability that H normals (independent or not) will all # fall in the interval (-K , K) is at least 95%. # # For Scheffe to fish in a SUBSPACE of dimension D, we want to choose K # so that any linear combination chosen from a particular subspace of # dimension D, will fall in the interval (-K , K) # # The following are two functions that will produce the right factor # to use instead of '2': Bonferroni.factor <- function( nhypotheses, alpha = .95, denDF = Inf) qt( 1 - (1-alpha)/(2*nhypotheses), denDF) Scheffe.factor <- function( dimension , alpha = .95, denDF = Inf) sqrt( dimension * qf( alpha, dimension, denDF)) # Note: qnorm( .975 ) Bonferroni.factor( 1 ) Scheffe.factor( 1 ) cat( "\n\nCost of protection:\n") mat <- cbind( Bonf = Bonferroni.factor( 1:20 )/qnorm(.975) , Sch = Scheffe.factor( 1:20)/qnorm(.975)) rownames( mat ) <- 1:20 round( mat, 2) # # Note that costs are not really comparable because you aren't buying the # same thing. # # # Some scenarios: # # Scenario 1: # # Suppose we had preselected (credibly on some basis independent of the data) # three ages we wanted to test, say 36,42 and 48 months. This could use # a Bonferroni adjustment with nhypotheses = 3. # # Scenario 2: # # Suppose we want to test 36 months but we played with two of the knots to # make the gap as big as possible. We could use a Scheffe adjustment # with dimension 3 (2 more than 1) # # Scenario 3: # # Suppose we're just looking for any Sex differences at all, we have # no preconceived idea where to look. We could use s Scheffe adjustment # with dimension equal to the dimension of the Sex comparisons. This is # numDF in the wald output for Ldiff wald(fitsp, Ldiff) # i.e. 7. # # QUESTION: How many prior hypotheses do you think we would need to test # before a Scheffe adjustment is cheaper than a Bonferroni adjustment. # # Last question first: uniroot( function(K) Bonferroni.factor(K) - Scheffe.factor(7), c(1,1000)) # Scheffe is, surprisingly, quite expensive. # # Now the first 3 questions: ses = c(std = qnorm(.975), Bonf.3 = Bonferroni.factor( 3 ), Sch.3 = Scheffe.factor( 3 ), Sch.7 = Scheffe.factor( 7 )) ses dmult <- as.data.frame( wald( fitsp, Ldiff), se = ses) some(dmult) dmult$age <- seq(4,84,1) td( col = c('blue','blue','green3','red','black','blue','green3','red','black'), lty = c( 1,2,3,4,5,2,3,4,5), lwd = 2) xyplot( coef + Ustd + UBonf.3 + USch.3 + USch.7 + Lstd + LBonf.3 + LSch.3 + LSch.7 ~ age, dmult, type = 'l', ylim = log( 2^c(-8,8)), scale = list( y = list( at = log( 2^seq(-7,7,1)), labels = fractions( 2^seq(-7,7,1))), x = list( at = seq( 0,90,10))), sub = "Boy/Girl odds ratio for respiratory infection (on log odds scale)", ylab = "Odds ratio with confidence intervals", xlab = "age (in months)", panel = function(x,y,...) { panel.superpose(x,y,...) panel.abline( h = log( 2^seq(-7,7,1)), col = 'gray') panel.abline( h = 0) panel.abline( v = seq(0,90,5), col = 'gray') }, auto.key = list( text = c('Estimated','Ordinary 95% CI', "Bonferroni k=3", "Scheffe d=3", "Scheffe d=7"), lines = T, points = F, columns = 2) ) # # EXERCISE: # What happens if, instead of using Scheffe(7) we resort to # Bonferroni on a fine grid of ages, e.g. every whole month. # Do we get significant results? Is this a reasonable approach? # # QUESTION: # Is there a connection between the p-value of wald( fitsp, Ldiff)[[1]]$anova$`p-value` # of 0.06 with 7 dfs in the overall F-test for Sex differences and the fact that # the specific Sex differences at each age are never significant when # we use the Scheffe( 7 ) adjustment? # # EXERCISES: # 1) Experimenting with knots: # See what happens if you choose different knots. # Coordinate with participants near you to try # different experiments and then compare results. # Fit a model with different knots, then compare # results for testing sex differences: # Things to try: # 1) Try 7 interior knots instead of 5 # 2) Try 3 interior knots and move them around # to increase the significance (decrease the # p-value) for sex comparisons # 3) Try a natural quadratic spline with 5 # interior knots # # e.g. Try using 3 interior # possibly knots chosen on the basis of the initial plot # What price should you pay? The knot adjustments are # non-linear parameters and you could pay 1 DF per knot # to be safe. # # # Age - period = cohort # # We performed the analysis above ignoring cohort (baseline_age), i.e. # assuming there is no effect. # # Consider: # # If 20-year =-olds in 2000 are much healthier than 20-year-olds in 1990 # this could have two explanations: # - General health improved a lot from 1990 to 2000 (period effect), or # - People born in 1980 are much healthier than people born in # 1970 (cohort effect) # What if people of all ages saw steadily improving health from 1990 to # to 2000. Again this could be the period or it could be that # people born later are healthier than people born earlier. # Of course, if this is the explanation, we would expect the # the phenomenon to be persistent. If some cohorts improve steadily # from 1990 to 2000 and others do not, then the explanation that # it's only a period effect is harder to support. # # The problem is that steadily (linearly) improving health over a # long period of time looks exactly the same as lineary improving # cohorts over a long period of time. Age, period and cohort are # collinear. The only way to break the collinearity would be to find # a 20-year-old in 2000 who was born in 1970. # # So disentangling linear effects can't be done. However, non-linear # effects are a different thing. For example if everyone suddenly # becomes less stressed between 1994 and 1995, it could be a period # effect (the referendum?) or it could be that a characteristic of # people born in 1970 was that they would become less stressed at the age of 25, # people born in 1971 would become less stressed at the age of 24, # people born in 1972 would become less stressed at the age of 23, etc. # The only way to blame a sudden change in one calendar year without # accepting that it's a period effect is to postulate a very complex # and implausible interaction between cohort and age. # # In our analysis, we did not include cohort effects. If we had tried # by including baseline_age, we would have had a collinearity in the model. # Is there a way of verifying whether it's reasonable to ignore cohort # effect and to attribute changes to age and period? One way is to test # for cohort effects that are beyond linear. We can do this by # generating a spline and then removing its linear component. If # such cohort effects are negligible, we gain some confidence about # ignoring them. If they are significant, we will wish to explore # them. # # # Since our model is already burdened with parameter, we will simplify # it by removing all but time and age effects before adding higher-order # cohort effects. # # In addition, the fact that we can never disentangle linear age # effects from possibly compensating linear period and # linear cohort effects implies that the non-linear effects # of age should be particularly interesting since they # are the only age effects that are identifiable. # # summary( ind$baseline_age) ind$cohort <- scale( ind$baseline_age ) # mean 0 sd 1 quantile( ind$cohort ) cqs <- quantile( ind$cohort, (1:5)/6) # five knots some(gsp( ind$cohort, cqs, c(2,2,2,2,2,2), c(1,1,1,1,1))) spcohort <- function( x ) gsp( x, cqs, 2, 1)[,-1] # remove 1st column fitspc <- glmmPQL( resp ~ sp(age.y) + time + Season + spcohort( cohort) , ind , random = ~ 1 | id, family = binomial ) summary( fitspc ) wald( fitspc, "cohort" ) # there is no striking evidence of a cohort effect suggesting that # the earlier analysis is reasonable # Recall that our effective sample size is somewhat small so this # result needs to be interpreted cautiously # # Alternatives to glmmPQL # # Advantages of glmmPQL: # # -- Based on lme and can model G and R side models like lme, # -- Can be used for smoothing splines with mixed models # -- Its limitations are somewhat understood. # -- Genearally fast # -- Wide set of methods for lme can be applied # # Disadvantages: # # -- Questionable estimates of components of variance # especially with binary # -- not based on likelihood - can't do LRTs # # glmmPQL has flaws in theory but is still a popular workhorse # because it generally converges and is easy to use # # Alternatives for generalized linear mixed models: # # glmmML in glmmML: # likelihood based, random intercept only. # # lmer in package lme4: # likelihood based, handles crossed effects well (so does lme # but not quite as well), very advanced numerical underpinnings, # no R side, more limited options for G side # hence no smoothing splines. # Many methods have not been developed yet. # lme4 has a function to perform a MCMC analysis on a # fitted model but there are bugs with the method # that need to be resolved. # # glmm in GLMMGibbs: beta release, limited choices: binomial and poisson # Use Bayesian inference. Can use glmmPQL for starting values. # # # glmmML # library(glmmML) fit.ML <- glmmML( resp ~ sp(age.y) * Sex + time + Season + height_for_age, data = ind, cluster = id, family = binomial ) summary( fit.ML ) coef( fit.ML ) str( fit.ML ) str(fitsp) sqrt( diag( fitsp$varFix)) # comparison with glmmPQL comparison <- data.frame( "PQL_coef" = pc <- fixef( fitsp ) , "PQL_sd" = ps <- sqrt( diag( fitsp$varFix)), "PQL_z" = pc/ps, "ML_coef" = mc <- coef(fit.ML), "ML_sd" = ms <- fit.ML$coef.sd, "ML_z" = mc/ms) comparison # ML is more conservative. I would expect them to be more similar # with a larger effective sample size and more observations per cluster. # # Some disadvantage: # Limited G side, no R side, only two families, limited links, # Few methods: (e.g. predict, resid, vcov do no exist yet) but it would # probably be easy to write them # # # lmer # # recall: fitsp <- glmmPQL( resp ~ sp(age.y) * Sex + time + Season + height_for_age, ind, random = ~ 1 | id, family = binomial ) library( lme4 ) # this used to conflict with nlme and one would have # to close and restart R before loading the other # I have experienced some problems but the two # should work much better now fitlmer <- lmer( resp ~ sp(age.y) * Sex + time + Season + height_for_age + (1|id), data = ind, family = binomial) ss <- summary(fitlmer) fixef( fitlmer) # coefficient sqrt( diag( vcov ( fitlmer ))) # their sds # EXERCISE: # # Add lmer to the comparison matrix. # Does lmer look closer to glmmPQL or to glmmML? # Plot the profile of coefs, sds and zs superposed for each method # What patterns emerge? # # # # Summer projects # # 1) Add robust covariance option to 'wald' 2) Add influence diagnostics to nlme 3) Wald test for non-linear functions of parameters 4) Bootstrapping for mixed models 5) Weighted bootstrapping for mixed models 6) Clean up documentation for 'spida' and 'p3d'