2009-09-18

Power Analysis for mixed-effect models in R

The power of a statistical test is the probability that a null hypothesis will be rejected when the alternative hypothesis is true. In lay terms, power is your ability to refine or "prove" your expectations from the data you collect. The most frequent motivation for estimating the power of a study is to figure out what sample size will be needed to observe a treatment effect. Given a set of pilot data or some other estimate of the variation in a sample, we can use power analysis to inform how much additional data we should collect.

I recently did a power analysis on a set of pilot data for a long-term monitoring study of the US National Park Service. I thought I would share some of the things I learned and a bit of R code for others that might need to do something like this. If you aren't into power analysis, the code below may still be useful as examples of how to use the error handling functions in R (withCallingHandlers, withRestarts), parallel programming using the snow package, and linear mixed effect regression using nlme. If you have any suggestions for improvement or if I got something wrong on the analysis, I'd love to hear from you.

1 The Study

The study system was cobblebars along the Cumberland river in Big South Fork National Park (Kentucky and Tennessee, United States). Cobblebars are typically dominated by grassy vegetation that include disjunct tall-grass prairie species. It is hypothesized that woody species will encroach onto cobblebars if they are not seasonally scoured by floods. The purpose of the NPS sampling was to observe changes in woody cover through time. The study design consisted of two-stages of clustering: the first being cobblebars, and the second being transects within cobblebars. The response variable was the percentage of the transect that was woody vegetation. Because of the clustered design, the inferential model for this study design has mixed-effects. I used a simple varying intercept model:

where y is the percent of each transect in woody vegetation sampled n times within J cobblebars, each with K transects. The parameter of inference for the purpose of monitoring change in woody vegetation through time is β, the rate at which cover changes as a function of time. α, γ, σ2γ, and σ2y are hyper-parameters that describe the hierarchical variance structure inherent in the clustered sampling design.

Below is the function code used I used to regress the pilot data. It should be noted that with this function you can log or logit transform the response variable (percentage of transect that is woody). I put this in because the responses are proportions (0,1) and errors should technically follow a beta-distribution. Log and logit transforms with Gaussian errors could approximate this. I ran all the models with transformed and untransformed response, and the results did not vary at all. So, I stuck with untransformed responses:

Model <- function(x = cobblebars,
  type = c("normal","log","logit")){
  ## Transforms
  if (type[1] == "log")
    x$prop.woody <- log(x$prop.woody)
  else if (type[1] == "logit")
    x$prop.woody <- log(x$prop.woody / (1 - x$prop.woody))

  mod <- lme(prop.woody ~ year,
             data = x,
             random = ~ 1 | cobblebar/transect,
             na.action = na.omit,
             control = lmeControl(opt = "optim",
               maxIter = 800, msMaxIter = 800)
             )
  mod$type <- type[1]

  return(mod)
}

Here are the results from this regression of the pilot data:

Linear mixed-effects model fit by REML
 Data: x 
        AIC       BIC   logLik
  -134.4319 -124.1297 72.21595

Random effects:
 Formula: ~1 | cobblebar
        (Intercept)
StdDev:  0.03668416

 Formula: ~1 | transect %in% cobblebar
        (Intercept)   Residual
StdDev:  0.02625062 0.05663784

Fixed effects: prop.woody ~ year 
                  Value  Std.Error DF   t-value p-value
(Intercept)  0.12966667 0.01881983 29  6.889896  0.0000
year        -0.00704598 0.01462383 29 -0.481815  0.6336
 Correlation: 
     (Intr)
year -0.389

Number of Observations: 60
Number of Groups: 
              cobblebar transect %in% cobblebar 
                      6                      30 

2 We don't learn about power analysis and complex models

When I decided upon the inferential model the first thing that occurred to me was that I never learned in any statistics course I had taken how to do such a power analysis on a multi-level model. I've taken more statistics courses than I'd like to count and taught my own statistics courses for undergrads and graduate students, and the only exposure to power analysis that I had was in the context of simple t-tests or ANOVA. You learn about it in your first 2 statistics courses, then it rarely if ever comes up again until you actually need it.

I was, however, able to find a great resource on power analysis from a Bayesian perspective in the excellent book "Data Analysis Using Regression and Multilevel/Hierarchical Models" by Andrew Gelman and Jennifer Hill. Andrew Gelman has thought and debated about power analysis and you can get more from his blog. The approach in the book is a simulation-based one and I have adopted it for this analysis.

3 Analysis Procedure

For the current analysis we needed to know three things: effect size, sample size, and estimates of population variance. We set effect size beforehand. In this context, the parameter of interest is the rate of change in woody cover through time β, and effect size is simply how large or small a value of β you want to distinguish with a regression. Sample size is also set a priori. In the analysis we want to vary sample size by varying the number of cobblebars, the number of transects per cobblebar or the number of years the study is conducted.

The population variance cannot be known precisely, and this is where the pilot data come in. By regressing the pilot data using the model we can obtain estimates of all the different components of the variance (cobblebars, transects within cobblebars, and the residual variance). Below is the R function that will return all the hyperparameters (and β) from the regression:

GetHyperparam<-function(x,b=NULL){
  ## Get the hyperparameters from the mixed effect model
  fe <- fixef(x)
  
  if(is.null(b))
    b<-fe[2] # use the data effect size if not supplied

  mu.a <- fe[1] 

  vc <- VarCorr(x)
  sigma.y <- as.numeric(vc[5, 2]) # Residual StdDev
  sigma.a <- as.numeric(vc[2, 2]) # Cobblebar StdDev
  sigma.g <- as.numeric(vc[4, 2]) # Cobblebar:transect StdDev

  hp<-c(b, mu.a, sigma.y, sigma.a, sigma.g)
  names(hp)<-c("b", "mu.a", "sigma.y", "sigma.a", "sigma.g")
  return(hp)
}

To calculate power we to regress the simulated data in the same way we did the pilot data, and check for a significant β. Since optimization is done using numeric methods there is always the chance that the optimization will not work. So, we make sure the regression on the fake data catches and recovers from all errors. The solution for error recovery is to simply try the regression on a new set of fake data. This function is a pretty good example of using the R error handling function withCallingHandlers and withRestarts.

fakeModWithRestarts <- function(m.o, n = 100,  ...){
  ## A Fake Model
  withCallingHandlers({
    i <- 0
    mod <- NULL
    while (i < n & is.null(mod)){
      mod <- withRestarts({
        f <- fake(m.orig = m.o, transform = F, ...)
        return(update(m.o, data = f))
      },
      rs = function(){
        i <<- i + 1
        return(NULL)
      })
    }
    if(is.null(mod))
      warning("ExceededIterations")
    return(mod)
  },
  error = function(e){
    invokeRestart("rs")
  },
  warning = function(w){
    if(w$message == "ExceededIterations")
      cat("\n", w$message, "\n")
    else
      invokeRestart("rs")
  })
}

To calculate the power of a particular design we run fakeModWithRestarts 1000 times and look at the proportion of significant β values:

dt.power <- function (m, n.sims = 1000, alpha=0.05, ...){
  ## Calculate power for a particular sampling design
  signif<-rep(NA, n.sims)
  for(i in 1:n.sims){
    lme.power <- fakeModWithRestarts(m.o = m, ...)
    if(!is.null(lme.power))
      signif[i] <- summary(lme.power)$tTable[2, 5] < alpha
  }
  power <- mean(signif, na.rm = T)
  return(power)
}

Finally, we want to perform this analysis on many different sampling designs. In my case I did all combinations of set of effect sizes, cobblebars, transects, and years. So, I generated the appropriate designs:

factoredDesign <- function(Elevs = 0.2/c(1,5,10,20),
                           Nlevs = seq(2, 10, by = 2),
                           Jlevs = seq(4, 10, by = 2),
                           Klevs = c(3, 5, 7), ...){
  ## Generates factored series of sampling designs for simulation
  ## of data that follow a particular model.
  ## Inputs:
  ##   Elevs - vector of effect sizes for the slope parameter.
  ##   Nlevs - vector of number of years to sample.
  ##   Jlevs - vector of number of cobblebars to sample.
  ##   Klevs - vector of number of transects to sample.
  ## Results:
  ##   Data frame with where columns are the factors and
  ##   rows are the designs.

  # Level lengths
  lE <- length(Elevs)
  lN <- length(Nlevs)
  lJ <- length(Jlevs)
  lK <- length(Klevs)

  # Generate repeated vectors for each factor
  E <- rep(Elevs, each = lN*lJ*lK)
  N <- rep(rep(Nlevs, each = lJ*lK), times = lE)
  J <- rep(rep(Jlevs, each = lK), times = lE*lN)
  K <- rep(Klevs, times = lE*lN*lJ)
  
  return(data.frame(E, N, J, K))
}

Once we know our effect sizes, the different sample sizes we want, and the estimates of population variance we can generate simulated dataset that are similar to the pilot data. To calculate power we simply simulate a large number of dataset and calculate the proportion of slopes, β that are significantly different from zero (p-value < 0.05). This procedure is repeated for all the effect sizes and sample sizes of interest. Here is the code for generating a simulated dataset. It also does the work of doing the inverse transform of the response variables if necessary.

fake <- function(N = 2, J = 6, K = 5, b = NULL, m.orig = mod,
                 transform = TRUE, ...){
  ## Simulated Data for power analysis
  ## N = Number of years
  ## J = Number of cobblebars
  ## K = Number of transects within cobblebars
  year <- rep(0:(N-1), each = J*K)
  cobblebar <- factor(rep(rep(1:J, each = K), times = N))
  transect <- factor(rep(1:K, times = N*J))

  ## Simulated parameters
  hp<-GetHyperparam(x=m.orig)
  if(is.null(b))
    b <- hp['b']
  g <- rnorm(J*K, 0, hp['sigma.g'])
  a <- rnorm(J*K, hp['mu.a'] + g, hp['sigma.a'])
  
  ## Simulated responses
  eta <- rnorm(J*K*N, a + b * year, hp['sigma.y'])
  if (transform){
    if (m.orig$type == "normal"){
      y <- eta
      y[y > 1] <- 1 # Fix any boundary problems.
      y[y < 0] <- 0
    }
    else if (m.orig$type == "log"){
      y <- exp(eta)
      y[y > 1] <- 1
    }
    else if (m.orig$type == "logit")
      y <- exp(eta) / (1 + exp(eta))
  }
  else{
    y <- eta
  }
  
  return(data.frame(prop.woody = y, year, transect, cobblebar))
}

Then I performed the power calculations on each of these designs. This could take a long time, so I set this procedure to use parallel processing if needed. Note that I had to re-~source~ the file with all the necessary functions for each processor.

powerAnalysis <- function(parallel = T, ...){
  ## Full Power Analysis
  
  ## Parallel
  if(parallel){
    closeAllConnections()
    cl <- makeCluster(7, type = "SOCK")
    on.exit(closeAllConnections())
    clusterEvalQ(cl, source("cobblebars2.r"))
  }
  
  ## The simulations
  dat <- factoredDesign(...)

  if (parallel){
    dat$power <- parRapply(cl, dat, function(x,...){
      dt.power(N = x[2], J = x[3], K = x[4], b = x[1], ...)
    }, ...)
  } else {
    dat$power <- apply(dat, 1, function(x, ...){
      dt.power(N = x[2], J = x[3], K = x[4], b = x[1], ...)
    }, ...)
  }

  return(dat)
}

The output of the powerAnalysis function is a data frame with columns for the power and all the sample design settings. So, I wrote a custom plotting function for this data frame:

plotPower <- function(dt){
  xyplot(power~N|J*K, data = dt, groups = E,
         panel = function(...){panel.xyplot(...)
                               panel.abline(h = 0.8, lty = 2)},
         type = c("p", "l"),
         xlab = "sampling years",
         ylab = "power",
         strip = strip.custom(var.name = c("C", "T"),
           strip.levels = c(T, T)),
         auto.key = T
         )
}

Below is the figure for the cobblebar power analysis. I won't go into detail on what the results mean since I am concerned here with illustrating the technique and the R code. Obviously, as the number of cobblebars and transects per year increase, so does power. And, as the effect size increases, observing it with a test is easier.

Author: Todd Jobe <toddjobe@unc.edu>

Date: 2009-09-18 Fri

HTML generated by org-mode 6.30trans in emacs 22

Labels: , , , , ,