R: lmer

From MathWiki

Many new methods have been written for 'lmer' objects. For example, there is now an 'mcmcsamp' method that allows MCMC sampling from the posterior distribution of the parameters, and there is a non-linear analogue, 'nlmer'. The main limitation of 'lmer' in comparison with 'lme' is the absence of an "R-side" model.
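As a quick orientation, the usual extractor functions for a fitted 'lmer' model look like this (a minimal sketch, assuming the lme4 package and its bundled sleepstudy data):

```r
## Sketch of the common extractors for a fitted lmer model
## (assumes the lme4 package and its bundled sleepstudy data).
library(lme4)

fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)

fixef(fm1)             # estimated fixed effects
sqrt(diag(vcov(fm1)))  # their standard errors
VarCorr(fm1)           # variance components of the random effects
ranef(fm1)             # conditional modes of the random effects
```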


Using lmer and nlme at the same time

Summary:

A workaround: Redefine print.summary.lme by commenting out the line

   class(corr) <- "correlation"

R list posting by Martin Maechler:

>>>>> "Dirk" == Dirk Enzmann <dirk.enzmann@uni-hamburg.de>
>>>>>     on Mon, 24 Jul 2006 13:55:44 +0200 writes:

    Dirk> After loading the package "Matrix" (version 0.995-12),
    Dirk> using the summary function with an lme (package nlme
    Dirk> version 3.1-75) object results in an error message
    Dirk> saying

    Dirk> Fehler in dim(x) : kein Slot des Namens "Dim" für
    Dirk> dieses Objekt der Klasse "correlation"

    Dirk> (translated: 'Error in dim(x) : no slot of name
    Dirk> "Dim" for this object of class "correlation"').

    Dirk> Without loading "Matrix" this error message does not
    Dirk> occur.

    Dirk> Any ideas?

Yes. Thank you for the report.
The error happens when PRINTing a summary.lme object, 
i.e.,
	ssl <- summary(<lme> )  ## works fine
	ssl                     ##-> gives the error

The short answer:  
   Don't use 'Matrix' together with 'nlme'  for the time being
   (but rather use 'lme4' and lmer() if possible)

A long answer:

The problem happens because
'Matrix' defines an (S4) class "correlation"  (which inherits from
"dpoMatrix").  Hence  dim( <correlation> ) is accessing the
'Dim' slot of the <correlation> object --- which obviously
must fail for that part of a  "summary.lme" object which is not
a proper S4 object at all.

Here is a reproducible example showing the problem without even
doing anything nlme-related:

   > x <- round(cor(matrix(rnorm(100), 20,5)),2); class(x) <- "correlation" ; x
	 [,1]  [,2]  [,3]  [,4]  [,5]
   [1,]  1.00 -0.29 -0.07 -0.11  0.03
   [2,] -0.29  1.00  0.10 -0.02  0.20
   [3,] -0.07  0.10  1.00  0.05  0.13
   [4,] -0.11 -0.02  0.05  1.00 -0.21
   [5,]  0.03  0.20  0.13 -0.21  1.00
   attr(,"class")
   [1] "correlation"
   > dim(x)
   [1] 5 5
   > library(Matrix)
   Loading required package: lattice
   > dim(x)
   Error in dim(x) : no slot of name "Dim" for this object of class "correlation"
   > dim
   .Primitive("dim")
   > showMethods(dim)

   Function "dim":
    x = "ANY"
    x = "Matrix"
    x = "correlation"
       (inherited from x = "Matrix")
   > showClass("correlation")

   Slots:

   Name:         sd         x       Dim  Dimnames      uplo   factors
   Class:   numeric   numeric   integer      list character      list

   Extends: 
   Class "dpoMatrix", directly
   Class "dsyMatrix", by class "dpoMatrix"
   Class "ddenseMatrix", by class "dpoMatrix"
   Class "symmetricMatrix", by class "dpoMatrix"
   Class "dMatrix", by class "dpoMatrix"
   Class "denseMatrix", by class "dpoMatrix"
   Class "Matrix", by class "dpoMatrix"
   Class "compMatrix", by class "dpoMatrix"
   > 


A workaround -- not for you but the authors of 'Matrix' and/or 'nlme' --
is for one of the two packages to call the class differently.
At the moment, I tend to do the change in 'nlme', since there,
the "correlation" class has been almost entirely hidden from the user.

A workaround for you:  Redefine print.summary.lme, e.g., by
commenting out the line
    class(corr) <- "correlation"
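A hedged sketch of that workaround: copy nlme's unexported print method, drop the statement that tags the matrix with class "correlation", and define the edited copy in the global environment so it is found first. The `nlme:::` access and the exact line of source matched below are assumptions about the nlme release in use:

```r
## Sketch of the workaround: take nlme's (unexported) print.summary.lme,
## delete the statement that assigns class "correlation", and install
## the edited copy so it masks nlme's version.
## The exact matched text below is an assumption about the nlme release.
library(nlme)

f <- nlme:::print.summary.lme
src <- deparse(f)
src <- src[!grepl('class(corr) <- "correlation"', src, fixed = TRUE)]
print.summary.lme <- eval(parse(text = paste(src, collapse = "\n")))
environment(print.summary.lme) <- environment(f)  # keep nlme's namespace
```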

Regards,
Martin

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

R-list messages from Doug Bates

Hi Spencer,

On 7/15/06, Spencer Graves <spencer.graves@pdf.com> wrote:

>> p.s.  I intended to include the following extension to an example from
>> the 'lmer' help page:
>>
>> fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
>> (fm1.fix <- fixef(fm1))
>> (fm1.fix.se <- sqrt(diag(vcov(fm1))))
>> fm1.fix/fm1.fix.se
>>
>> fm1.ran <- VarCorr(fm1)
>> diag(fm1.ran$Subject)


I'm confident that you are aware that sqrt(diag(vcov(fm1))) and
sqrt(diag(fm1.ran$Subject)) refer to different parameters in the model.
However, some readers of your reply may not see the distinction.
Allow me to try to clarify.

The summary for this fitted model is

lmer> (fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy))
Linear mixed-effects model fit by REML
Formula: Reaction ~ Days + (Days | Subject)
	  Data: sleepstudy
      AIC      BIC    logLik MLdeviance REMLdeviance
 1753.628 1769.593 -871.8141   1751.986     1743.628
Random effects:
 Groups   Name        Variance Std.Dev. Corr
 Subject  (Intercept) 612.090  24.7405
          Days         35.072   5.9221  0.066
 Residual             654.941  25.5918
number of obs: 180, groups: Subject, 18

Fixed effects:
            Estimate Std. Error t value
(Intercept) 251.4051     6.8246  36.838
Days         10.4673     1.5458   6.771

Correlation of Fixed Effects:
     (Intr)
Days -0.138

The fixed effects described in the lower part of this summary are a
familiar type of parameter in a statistical model.  They are
coefficients in a linear predictor.  We produce estimates of these
parameters and also provide a measure of the precision of these
estimates - their standard errors.  The vcov generic function returns
an estimate of the precision of the estimated parameters (typically
parameters that are coefficients in a linear predictor).  Thus


>> sqrt(diag(vcov(fm1)))

[1] 6.824558 1.545789

provides the standard errors of the fixed effects estimates.

The random effects, although they also appear in the linear predictor,
are not formally parameters in the model.  They are unobserved random
variables whose variance covariance matrix has a known form but
unknown value.  It is the values that determine the
variance-covariance matrix that are parameters in the model.  These
parameters are returned by VarCorr


>> VarCorr(fm1)

$Subject
2 x 2 Matrix of class "dpoMatrix"
            (Intercept)     Days
(Intercept)   612.09032  9.60428
Days            9.60428 35.07165

attr(,"sc")
[1] 25.59182

In other words, the 2 x 2 matrix shown above is the estimate of the
variance-covariance matrix of the random effects associated with the
grouping factor "Subject".

Thus


>> sqrt(diag(VarCorr(fm1)$Subject))

[1] 24.740459  5.922132

gives the estimated standard deviations of the random effects.  These
are estimates of parameters in the model.  They are *not* standard
errors of parameter estimates.  The lmer function and related software
does not return standard errors of the estimates of variance
components.  This is intentional.  Below I give my familiar rant on
why I think returning such standard errors is a bad practice.  I
encourage users of lmer who wish to determine the precision of the
estimates of the variance components to create a Markov chain Monte
Carlo sample of the parameters and evaluate the HPDintervals.


>> sm1 <- mcmcsamp(fm1, 50000)
>> library(coda)

Warning message:
use of NULL environment is deprecated

>> HPDinterval(sm1)

                          lower        upper
(Intercept)         236.6518363  266.5465536
Days                  7.0136243   13.8947993
log(sigma2)          6.2550082    6.7295329
log(Sbjc.(In))        5.4928205    7.5751372
log(Sbjc.Days)        2.8197523    4.6337518
atanh(Sbj.(I).Dys)   -0.6988632    0.9836688
deviance           1752.5158501 1766.6461469
attr(,"Probability")
[1] 0.95

<rant>
Some software, notably SAS PROC MIXED, does produce standard errors
for the estimates of variances and covariances of random effects.  In
my opinion this is more harmful than helpful.  The only use I can
imagine for such standard errors is to form confidence intervals or to
evaluate a z-statistic or something like that to be used in a
hypothesis test.  However, those uses require that the distribution of
the parameter estimate be symmetric, or at least approximately
symmetric, and we know that the distribution of the estimate of a
variance component is more like a scaled chi-squared distribution
which is anything but symmetric.  It is misleading to attempt to
summarize our information about a variance component by giving only
the estimate and a standard error of the estimate.
</rant>

>>
>> ########################
>>           The structure of 'lmer' objects and helper functions is outlined in
>> the 'lmer' and 'lmer-class' help pages.  The latter mentions "'vcov
>> 'signature(object = "mer")': Calculate variance-covariance matrix of the
>> _fixed_ effect terms, see also 'vcov'."  Thus,
>> sqrt(diag(vcov(lmer.object))) should give you standard errors of the
>> fixed effects.
>>
>>           The parameter estimates can be obtained using 'VarCorr'.  However,
>> characterizing their random variability is harder, because their
>> distribution is not as simply summarized.  The 'lmer-class' help page
>> says that an 'lmer' object includes a slot, "'Omega': A list of
>> positive-definite matrices stored as '"dpoMatrix"' objects that are the
>> relative precision matrices of the random effects associated with each
>> of the grouping factors."  However, I don't know how to use this.  For
>> problems like this, the 'lme4' and 'Matrix' packages include a function
>> 'simulate', which is what I might use, at least until I figured out how
>> to get essentially the same answer from the Omega slot.
>>
>>           Hope this helps,
>>           Spencer Graves
>>
>> prabhu bhaga wrote:
>
>>> > Dear all,
>>> >
>>> > I'm trying to store/extract the mean& standard error of the fixed effects
>>> > parameter and the variance of the random effects parameter from "lmer"
>>> > procedure from mlmre4 package developed by bates n pinheiro. while storing
>>> > fixed effects parameter is straight forward, the same is not true for
>>> > storing the variance parameter of the random effects. kindly help me
>>> >
>>> > ~prabhu



On 7/15/06, A.R. Criswell <rstatistics@gmail.com> wrote:

>> Hello All,
>>
>> I would like to retrieve some of the results from the lmer(...)
>> function in library lme4. If I run a model, say
>>
>> fm.1 <- lmer(y ~ 1 + (1 | x), data = dog)
>>
>> and try names(fm.1), I get NULL. Is there anyway to retrieve the information?


Yes.

The recommended way of retrieving information from a fitted model
object like an lmer object is with extractor functions.  See

?lmer-class

for a list of such methods.  That help page also documents the slots
in the object.  The lmer class is an S4 class and uses typed slots
instead of named components.

The str function displays the structure of pretty well any type of R
object, including S3 classed objects or S4 classed objects.  That is
my favorite way of checking the structure of an object.
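The S4 point can be seen without lme4 at all. Here is a minimal sketch (the class name "demo" and its slots are made up for illustration):

```r
## Minimal illustration of why names() returns NULL for an S4 object:
## S4 objects carry typed slots, not named list components.
library(methods)

setClass("demo", slots = c(x = "numeric", y = "character"))
d <- new("demo", x = c(1, 2, 3), y = "hello")

slotNames(d)  # the typed slots: "x" "y"
names(d)      # NULL -- an S4 object is not a named list
d@x           # direct slot access, analogous to what an extractor does
str(d)        # str() works on S4 objects too
```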

Please remember that it is risky to count on being able to reach in to
an object and pull out slots or components and operate on them.  The
names and contents of slots are not guaranteed to stay constant.  The
lme4 and Matrix packages have been under development for a long time
and should continue to be regarded as under development.  When we
change the internal representation we do change the extractor
functions accordingly.  It is a bug if an internal change causes an
extractor function in the package to fail to return correct results.
It is not a bug if an internal change causes your code that assumes a
particular, obsolete representation to break.




Repeated indices in sparse matrices

The apparent contradiction here is again a case of inadequate
documentation and checking.

Because the order of the triplets in the triplet form of a sparse
matrix is essentially arbitrary, it is permissible to have repeated
indices.  As you have seen, the interpretation of repeated indices is
that the value at any index is the sum of the values in the triplets
corresponding to that index.
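The collapse-by-summation rule can be sketched in plain R, independent of the Matrix classes (the index vectors mirror the dgTMatrix example quoted below):

```r
## Plain-R sketch of the triplet rule: entries with the same (i, j)
## index pair collapse by summation (indices here are 0-based, as in
## the dgTMatrix example quoted below).
i <- c(0, 0, 1, 1, 4)
j <- c(0, 1, 2, 2, 4)
x <- c(1, 2, 3, 4, 5)

m <- matrix(0, 5, 5)
for (k in seq_along(x))
  m[i[k] + 1, j[k] + 1] <- m[i[k] + 1, j[k] + 1] + x[k]

m[2, 3]  # 7: the two (1,2) triplets, 3 and 4, have been summed
```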

It is not permissible to have repeated indices in the compressed
form.  In the compressed form there is a well-defined ordering of the
indices, first by column, then by row within column, and the row
indices must be increasing within columns.
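The compressed (column-oriented) layout can be decoded in plain R to see why the ordering matters. In this sketch, entries p[k] through p[k+1] of the 0-based row-index vector i belong to column k; note that column 3 below contains row 2 twice, exactly the repeated-index situation the compressed form forbids (the vectors are adapted from the Mc example quoted below, with a corrected column-pointer length):

```r
## Plain-R decoding of compressed sparse column storage:
## p[k]..p[k+1] delimit the entries of i and x belonging to column k.
## Within a column, row indices must be strictly increasing.
i <- c(0, 0, 1, 1, 4)        # 0-based row indices
p <- c(0, 1, 2, 4, 5, 5)     # column pointers for a 5-column matrix
x <- c(1, 2, 3, 4, 5)

for (col in 1:5) {
  idx <- seq.int(p[col] + 1, length.out = p[col + 1] - p[col])
  if (length(idx))
    cat("column", col, ": rows", i[idx] + 1, "values", x[idx], "\n")
}
## column 3 lists row 2 twice -- an invalid compressed matrix
```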

Your matrix Mc should be flagged as invalid.  Martin and I should
discuss whether we want to add such a test to the validity method.  It
is not difficult to add the test but there will be a penalty in that
it will slow down all operations on such matrices and I'm not sure if
we want to pay that price to catch a rather infrequently occurring
problem.

There is some documentation of the internal representations of these
formats in the directory $R_LIB/Matrix/doc/UFSparse/.  The User Guides
in that directory are taken directly from the various sparse matrix
packages that Tim Davis at the University of Florida has written.  He
has a book that is scheduled for publication this September:

Tim Davis (2006), Direct Methods for Sparse Linear Systems, SIAM,
Philadelphia, PA.

I hope we will be able to refer to that book for details of the
representation and algorithms.

On 7/8/06, Thaden, John J <ThadenJohnJ@uams.edu> wrote:

>> In the Matrix package v. 0.995-11 I see that the dgTMatrix
>> Class for compressed, sparse, triplet-form matrices handles
>> Identically indexed data instances by summing their values,
>> e.g.,
>>
>> library(Matrix)
>> (Mt <- new("dgTMatrix",
>>    i = as.integer(c(0,0,1,1,4)),
>>    j = as.integer(c(0,1,2,2,4)),
>>    x = as.double(1:5),
>>    Dim = as.integer(c(5,5))))
>> ## 5 x 5 sparse Matrix of class "dgTMatrix"
>> ## [1,] 1 2 . . .
>> ## [2,] . . 7 . .    <--- 7 = 3 + 4.
>> ## [3,] . . . . .
>> ## [4,] . . . . .
>> ## [5,] . . . . 5
>>
>> # If instead I make a dgCMatrix-class matrix, the first
>> # instance is overwritten by the second, e.g.,
>>
>> library(Matrix)
>> (Mc <- new("dgCMatrix",
>>    i = as.integer(c(0,0,1,1,4)),
>>    p = as.integer(c(0,1,2,4,5)),
>>    x = as.double(1:5),
>>    Dim = as.integer(c(5,5))))
>> ## 5 x 5 sparse Matrix of class "dgCMatrix"
>> ##
>> ## [1,] 1 2 . .
>> ## [2,] . . 4 .   <-- the datum '3' has been lost.
>> ## [3,] . . . .
>> ## [4,] . . . .
>> ## [5,] . . . 5
>>
>> # If one arrives at the dgCMatrix via the dgTMatrix class,
>> # the summed value is of course preserved, e.g.,
>>
>> (Mtc <- as(Mt, "dgCMatrix"))
>> ## 5 x 5 sparse Matrix of class "dgCMatrix"
>> ##
>> ## [1,] 1 2 . . .
>> ## [2,] . . 7 . .
>> ## [3,] . . . . .
>> ## [4,] . . . . .
>> ## [5,] . . . . 5
>>
>> As there is nothing inherent in either compressed, sparse,
>> format that would prevent recognition and handling of
>> duplicated index pairs, I'm curious why the dgCMatrix
>> class doesn't also add x values in those instances?
>> I wonder also if others might benefit also by being able
>> to choose how these instances are handled, i.e.,
>> whether they are summed, averaged or overwritten?
>>
>> -John Thaden, Ph.D.
>> Research Assistant Professor of Geriatrics
>> University of Arkansas for Medical Sciences
>> Little Rock AR, USA
>>




Degrees of freedom for lmer

On 7/26/06, Bill Shipley <bill.shipley@usherbrooke.ca> wrote:

> > Hello.  Douglas Bates has explained in a previous posting to R why he does
> > not output residual degrees of freedom, F values and probabilities in the
> > mixed model (lmer) function:  because the usual degrees of freedom (obs -
> > fixed df -1) are not exact and are really only upper bounds.  I am
> > interpreting what he said but I am not a professional statistician, so I
> > might be getting this wrong...

> > Does anyone know of any more recent results, perhaps from simulations, that
> > quantify the degree of bias that using such upper bounds for the denominator
> > degrees of freedom produces?  Is it possible to calculate a lower bound for
> > such degrees of freedom?

I have not seen any responses to your request yet Bill.  I was hoping
that others might offer their opinions and provide some new
perspectives on this issue.  However, it looks as if you will be stuck
with my responses for the time being.

You have phrased your question in terms of the denominator degrees of
freedom associated with terms in the fixed-effects specification and,
indeed, this is the way the problem is usually addressed.  However,
that is jumping ahead two or three steps from the initial problem,
which is how to perform a hypothesis test comparing two nested models
- a null model without the term in question and the alternative model
including this term.

If we assume that the F statistic is a reasonable way of evaluating
this hypothesis test and that the test statistic will have an F
distribution with a known numerator degrees of freedom and an unknown
denominator degrees of freedom then we can reduce the problem of
testing the hypothesis to one of approximating the denominator degrees
of freedom.  However, there is a lot of assumption going on in that
argument.  These assumptions may be warranted or they may not.

As far as I can see, the usual argument made for those assumptions is
by analogy.  If we had a balanced design and if we used error strata
to get expected and observed mean squares and if we equated expected
and observed mean squares to obtain estimates of variance components
then the test for a given term in the fixed effects specification
would have a certain form.  Even though we are not doing any of these
things when estimating variance components by maximum likelihood or by
REML, the argument is that the test for a fixed effects term should
end up with the same form.  I find that argument to be a bit of a
stretch.

Because the results from software such as SAS PROC MIXED are based on
this type of argument many people assume that it is a well-established
result that the test should be conducted in this way.  Current
versions of PROC MIXED allow for several different ways of calculating
denominator degrees of freedom, including at least one, the
Kenward-Roger method, that uses two tuning parameters: denominator
degrees of freedom and a scale factor.

Some simulation studies have been performed comparing the methods in
SAS PROC MIXED and other simulation studies are planned but for me
this is all putting the cart before the horse.  There is no answer to
the question "what is the _correct_ denominator degrees of freedom for
this test statistic" if the test statistic doesn't have an F
distribution with a known numerator degrees of freedom and an unknown
denominator degrees of freedom.

I don't think there is a perfect answer to this question.  I like the
approach using Markov chain Monte Carlo samples from the posterior
distribution of the parameters because it allows me to assess the
distribution of the parameters and it takes into account the full
range of the variation in the parameters (the F-test approach is
conditional on estimates of the variance components).  However, it
does not produce a nice cryptic p-value for publication.

I understand the desire for a definitive answer that can be used in a
publication.  However, I am not satisfied with any of the "definitive
answers" that are out there and I would rather not produce an answer
than produce an answer that I don't believe in.
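One route consistent with the discussion above, for readers who nevertheless need a test, is a likelihood-ratio comparison of the nested models fit by maximum likelihood. This is a sketch assuming lme4 and its sleepstudy data; note the chi-squared reference distribution for the likelihood ratio is itself an approximation:

```r
## Sketch: compare nested models by likelihood ratio rather than
## guessing a denominator df (assumes lme4 and its sleepstudy data).
## The chi-squared reference for the LRT is also an approximation.
library(lme4)

fm0 <- lmer(Reaction ~ 1    + (Days | Subject), sleepstudy, REML = FALSE)
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy, REML = FALSE)

anova(fm0, fm1)  # likelihood-ratio test for the fixed effect of Days
```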
