For the current analysis we needed to know three things: effect
size, sample size, and estimates of the population variance. We set
effect size beforehand. In this context the parameter of interest
is the rate of change in woody cover through time, β, and
effect size is simply how large or small a value of β we want to be
able to detect with the regression. Sample size is also set a priori.
In the analysis we vary sample size by varying the number of
cobblebars, the number of transects per cobblebar, or the number of
years the study is conducted.
The population variance cannot be known precisely, and this is where
the pilot data come in. By fitting the model to the pilot data we can
obtain estimates of all the different variance components (among
cobblebars, among transects within cobblebars, and the residual
variance). Below is the R function that returns all of these
hyperparameters (and β) from the fitted model:
GetHyperparam <- function(x, b = NULL){
  ## Get the hyperparameters from the mixed effect model
  fe <- fixef(x)
  if(is.null(b))
    b <- fe[2] # use the data effect size if not supplied
  mu.a <- fe[1]
  vc <- VarCorr(x)
  sigma.y <- as.numeric(vc[5, 2]) # Residual StdDev
  sigma.a <- as.numeric(vc[2, 2]) # Cobblebar StdDev
  sigma.g <- as.numeric(vc[4, 2]) # Cobblebar:transect StdDev
  hp <- c(b, mu.a, sigma.y, sigma.a, sigma.g)
  names(hp) <- c("b", "mu.a", "sigma.y", "sigma.a", "sigma.g")
  return(hp)
}
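The exact call used to fit the pilot data is not shown here, but the indexing of VarCorr above implies a mixed-effects model with random intercepts for transects nested within cobblebars. A minimal sketch of how GetHyperparam might be used, assuming such a model fit with nlme (pilot.data is a placeholder for the pilot data frame):

library(nlme)

## Hypothetical pilot fit: random intercepts for transects nested in cobblebars
mod <- lme(prop.woody ~ year, random = ~ 1 | cobblebar/transect,
           data = pilot.data)
hp <- GetHyperparam(mod)
hp["b"]       # slope, i.e. the effect size observed in the pilot data
hp["sigma.y"] # residual standard deviation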
To calculate power we regress the simulated data in the same way we
did the pilot data, and check for a significant β. Since the model is
fit by numerical optimization, there is always a chance that the
optimization will fail. So, we make sure the regression on the fake
data catches and recovers from any errors. The recovery strategy is
simply to try the regression again on a new set of fake data. This
function is a pretty good example of using the R error-handling
functions withCallingHandlers and withRestarts.
fakeModWithRestarts <- function(m.o, n = 100, ...){
  ## Refit the original model to fake data, retrying on errors up to n times
  withCallingHandlers({
    i <- 0
    mod <- NULL
    while (i < n & is.null(mod)){
      mod <- withRestarts({
        f <- fake(m.orig = m.o, transform = F, ...)
        return(update(m.o, data = f))
      },
      rs = function(){
        i <<- i + 1
        return(NULL)
      })
    }
    if(is.null(mod))
      warning("ExceededIterations")
    return(mod)
  },
  error = function(e){
    invokeRestart("rs")
  },
  warning = function(w){
    if(w$message == "ExceededIterations")
      cat("\n", w$message, "\n")
    else
      invokeRestart("rs")
  })
}
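To see the handler/restart mechanics in isolation, here is a toy sketch that is not part of the analysis: an error signalled inside withRestarts is caught by the calling handler, which jumps to the rs restart, and the restart simply returns NULL instead of letting the error propagate.

## Toy example only: convert an error into a NULL result via a restart
res <- withCallingHandlers(
  withRestarts(stop("model fit failed"),
               rs = function() NULL),
  error = function(e) invokeRestart("rs"))
is.null(res)  # TRUE: the error was absorbed, so we could simply try again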
To calculate the power of a particular design we run fakeModWithRestarts 1000 times and look at the proportion of significant β values:
dt.power <- function(m, n.sims = 1000, alpha = 0.05, ...){
  ## Calculate power for a particular sampling design
  signif <- rep(NA, n.sims)
  for(i in 1:n.sims){
    lme.power <- fakeModWithRestarts(m.o = m, ...)
    if(!is.null(lme.power))
      signif[i] <- summary(lme.power)$tTable[2, 5] < alpha
  }
  power <- mean(signif, na.rm = T)
  return(power)
}
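The extra arguments to dt.power are passed through fakeModWithRestarts to fake, so the sample sizes of the simulated data can be set in the call. An illustrative use, assuming mod is the fitted pilot model from above:

## Power for a hypothetical design: 6 cobblebars, 5 transects each, 5 years,
## at the effect size estimated from the pilot data
p <- dt.power(mod, n.sims = 1000, N = 5, J = 6, K = 5)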
Finally, we want to perform this analysis on many different sampling
designs. In my case I did all combinations of a set of effect sizes,
numbers of cobblebars, numbers of transects, and numbers of years. So, I
generated the appropriate designs:
factoredDesign <- function(Elevs = 0.2/c(1, 5, 10, 20),
                           Nlevs = seq(2, 10, by = 2),
                           Jlevs = seq(4, 10, by = 2),
                           Klevs = c(3, 5, 7), ...){
  ## Generates a factored series of sampling designs for simulation
  ## of data that follow a particular model.
  ## Inputs:
  ##   Elevs - vector of effect sizes for the slope parameter.
  ##   Nlevs - vector of numbers of years to sample.
  ##   Jlevs - vector of numbers of cobblebars to sample.
  ##   Klevs - vector of numbers of transects to sample.
  ## Results:
  ##   Data frame where columns are the factors and
  ##   rows are the designs.
  # Level lengths
  lE <- length(Elevs)
  lN <- length(Nlevs)
  lJ <- length(Jlevs)
  lK <- length(Klevs)
  # Generate repeated vectors for each factor
  E <- rep(Elevs, each = lN*lJ*lK)
  N <- rep(rep(Nlevs, each = lJ*lK), times = lE)
  J <- rep(rep(Jlevs, each = lK), times = lE*lN)
  K <- rep(Klevs, times = lE*lN*lJ)
  return(data.frame(E, N, J, K))
}
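With the default levels this produces a 4 x 5 x 4 x 3 = 240-row grid, one row per candidate design (base R's expand.grid builds the same kind of grid, just in a different row order):

designs <- factoredDesign()
dim(designs)   # 240 designs, 4 columns (E, N, J, K)
head(designs)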
Once we know our effect sizes, the different sample sizes we want,
and the estimates of population variance, we can generate simulated
datasets that are similar to the pilot data. To calculate power we
simply simulate a large number of datasets and calculate the
proportion of slopes β that are significantly different from
zero (p-value < 0.05). This procedure is repeated for all the
effect sizes and sample sizes of interest. Here is the code for
generating a simulated dataset. It also does the work of
back-transforming the response variable if necessary.
fake <- function(N = 2, J = 6, K = 5, b = NULL, m.orig = mod,
                 transform = TRUE, ...){
  ## Simulated data for power analysis
  ## N = Number of years
  ## J = Number of cobblebars
  ## K = Number of transects within cobblebars
  year <- rep(0:(N-1), each = J*K)
  cobblebar <- factor(rep(rep(1:J, each = K), times = N))
  transect <- factor(rep(1:K, times = N*J))
  ## Simulated parameters
  hp <- GetHyperparam(x = m.orig)
  if(is.null(b))
    b <- hp['b']
  g <- rnorm(J*K, 0, hp['sigma.g'])
  a <- rnorm(J*K, hp['mu.a'] + g, hp['sigma.a'])
  ## Simulated responses
  eta <- rnorm(J*K*N, a + b * year, hp['sigma.y'])
  if (transform){
    if (m.orig$type == "normal"){
      y <- eta
      y[y > 1] <- 1 # Fix any boundary problems.
      y[y < 0] <- 0
    }
    else if (m.orig$type == "log"){
      y <- exp(eta)
      y[y > 1] <- 1
    }
    else if (m.orig$type == "logit")
      y <- exp(eta) / (1 + exp(eta))
  }
  else{
    y <- eta
  }
  return(data.frame(prop.woody = y, year, transect, cobblebar))
}
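Note that the transform branch expects the pilot model object to carry a type element ("normal", "log", or "logit") recording how the response was transformed; a plain lme object has no such element, and fakeModWithRestarts above calls fake with transform = F, so this branch is skipped during the power simulations. An illustrative call, again assuming mod is the pilot fit from above:

## One simulated dataset: 5 years, 6 cobblebars, 5 transects per cobblebar
f <- fake(N = 5, J = 6, K = 5, m.orig = mod, transform = FALSE)
str(f)  # prop.woody, year, transect, cobblebar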
Then I performed the power calculations on each of these designs. This
could take a long time, so I set this procedure up to use parallel
processing if needed. Note that I had to source the file with all the
necessary functions on each processor.
powerAnalysis <- function(parallel = T, ...){
  ## Full Power Analysis
  ## Parallel
  if(parallel){
    closeAllConnections()
    cl <- makeCluster(7, type = "SOCK")
    on.exit(closeAllConnections())
    clusterEvalQ(cl, source("cobblebars2.r"))
  }
  ## The simulations
  dat <- factoredDesign(...)
  if (parallel){
    dat$power <- parRapply(cl, dat, function(x, ...){
      dt.power(N = x[2], J = x[3], K = x[4], b = x[1], ...)
    }, ...)
  } else {
    dat$power <- apply(dat, 1, function(x, ...){
      dt.power(N = x[2], J = x[3], K = x[4], b = x[1], ...)
    }, ...)
  }
  return(dat)
}
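The cluster functions used here (makeCluster, clusterEvalQ, parRapply) are from the snow package, which must be loaded, and cobblebars2.r is the file containing the functions above. The pilot model itself is passed through the ... arguments as m, which dt.power picks up. An illustrative, non-parallel run with a reduced number of simulations:

## Small illustrative run; the full analysis would use the defaults
results <- powerAnalysis(parallel = FALSE, m = mod, n.sims = 100)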
The output of the powerAnalysis function is a data frame with columns for the power and all the sampling design settings. So, I wrote a custom plotting function for this data frame:
plotPower <- function(dt){
  ## Requires the lattice package for xyplot()
  xyplot(power ~ N | J*K, data = dt, groups = E,
         panel = function(...){
           panel.xyplot(...)
           panel.abline(h = 0.8, lty = 2)
         },
         type = c("p", "l"),
         xlab = "sampling years",
         ylab = "power",
         strip = strip.custom(var.name = c("C", "T"),
                              strip.levels = c(T, T)),
         auto.key = T)
}
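So, assuming results is the data frame returned by powerAnalysis above, the figure is produced with:

library(lattice)
plotPower(results)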
Below is the figure for the cobblebar power analysis. I won't go into
detail on what the results mean, since I am concerned here with
illustrating the technique and the R code. Obviously, as the number of
cobblebars and the number of transects per cobblebar increase, so does
power. And the larger the effect size, the easier it is to detect with
the test.