Nested Anova (Stan)

Quarto
R
Academia
Software
Statistics
Published

February 9, 2020

Abstract

This tutorial will focus on the use of Bayesian estimation to fit simple linear regression models …

Keywords

Software, Statistics, Stan

This tutorial will focus on the use of Bayesian estimation to fit simple linear regression models. BUGS (Bayesian inference Using Gibbs Sampling) is an algorithm and supporting language (resembling R) dedicated to performing the Gibbs sampling implementation of Markov Chain Monte Carlo (MCMC) method. Dialects of the BUGS language are implemented within three main projects:

  1. OpenBUGS - written in component pascal.

  2. JAGS - (Just Another Gibbs Sampler) - written in C++.

  3. Stan - a dedicated Bayesian modelling framework written in C++ and implementing Hamiltonian MCMC samplers.

Whilst the above programs can be used stand-alone, they do offer the rich data pre-processing and graphical capabilities of R, and thus, they are best accessed from within R itself. As such there are multiple packages devoted to interfacing with the various software implementations:

This tutorial will demonstrate how to fit models in Stan (Gelman, Lee, and Guo (2015)) using the package rstan (Stan Development Team (2018)) as interface, which also requires to load some other packages.

Overview

Introduction

When single sampling units are selected amongst highly heterogeneous conditions, it is unlikely that these single units will adequately represent the populations and repeated sampling is likely to yield very different outcomes. For example, if we were investigating the impacts of fuel reduction burning across a highly heterogeneous landscape, our ability to replicate adequately might be limited by the number of burn sites available.

Alternatively, sub-replicates within each of the sampling units (e.g. sites) can be collected (and averaged) so as to provided better representatives for each of the units and ultimately reduce the unexplained variability of the test of treatments. In essence, the sub-replicates are the replicates of an additional nested factor whose levels are nested within the main treatment factor. A nested factor refers to a factor whose levels are unique within each level of the factor it is nested within and each level is only represented once. For example, the fuel reduction burn study design could consist of three burnt sites and three un-burnt (control) sites each containing four quadrats (replicates of site and sub-replicates of the burn treatment). Each site represents a unique level of a random factor (any given site cannot be both burnt and un-burnt) that is nested within the fire treatment (burned or not).

A nested design can be thought of as a hierarchical arrangement of factors (hence the alternative name hierarchical designs) whereby a treatment is progressively sub-replicated. As an additional example, imagine an experiment designed to comparing the leaf toughness of a number of tree species. Working down the hierarchy, five individual trees were randomly selected within (nested within) each species, three branches were randomly selected within each tree, two leaves were randomly selected within each branch and the force required to shear the leaf material in half (transversely) was measured in four random locations along the leaf. Clearly any given leaf can only be from a single branch, tree and species. Each level of sub-replication is introduced to further reduce the amount of unexplained variation and thereby increasing the power of the test for the main treatment effect. Additionally, it is possible to investigate which scale has the greatest (or least, etc) degree of variability - the level of the species, individual tree, branch, leaf, leaf region etc.

  • Nested factors are typically random factors, of which the levels are randomly selected to represent all possible levels (e.g. sites). When the main treatment effect (often referred to as Factor A) is a fixed factor, such designs are referred to as a mixed model nested ANOVA, whereas when Factor A is random, the design is referred to as a Model II nested ANOVA.

  • Fixed nested factors are also possible. For example, specific dates (corresponding to particular times during a season) could be nested within season. When all factors are fixed, the design is referred to as a Model I mixed model.

  • Fully nested designs (the topic of this chapter) differ from other multi-factor designs in that all factors within (below) the main treatment factor are nested and thus interactions are un-replicated and cannot be tested. Indeed, interaction effects (interaction between Factor A and site) are assumed to be zero.

Linear models (frequentist)

The linear models for two and three factor nested design are:

\[ y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \epsilon_{ijk}, \]

\[ y_{ijkl} = \mu + \alpha_i + \beta_{j(i)} + gamma_{k(j(i))} + \epsilon_{ijkl}, \]

where \(\mu\) is the overall mean, \(\alpha\) is the effect of Factor A, \(\beta\) is the effect of Factor B, \(\gamma\) is the effect of Factor C and \(\epsilon\) is the random unexplained or residual component.

Linear models (Bayesian)

So called “random effects” are modelled differently from “fixed effects” in that rather than estimate their individual effects, we instead estimate the variability due to these “random effects”. Since technically all variables in a Bayesian framework are random, some prefer to use the terms ‘fixed effects’ and ‘varying effects’. As random factors typically represent “random” selections of levels (such as a set of randomly selected sites), incorporated in order to account for the dependency structure (observations within sites are more likely to be correlated to one another - not strickly independent) to the data, we are not overly interested in the individual differences between levels of the ‘varying’ (random) factor. Instead (in addition to imposing a separate correlation structure within each nest), we want to know how much variability is attributed to this level of the design. The linear models for two and three factor nested design are:

\[ y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \epsilon_{ijk}, \;\;\; \epsilon_{ijk} \sim N(0, \sigma^2), \;\;\; \beta_{j(i)} \sim N(0, \sigma^2_{B}) \]

\[ y_{ijkl} = \mu + \alpha_i + \beta_{j(i)} + \gamma_{k(j(i))} + \epsilon_{ijkl}, \;\;\; \epsilon_{ijkl} \sim N(0, \sigma^2), \;\;\; \beta_{j(i)} \sim N(0, \sigma^2_{B}) \;\;\; \gamma_{k(j(i))} \sim N(0, \sigma^2_C) \]

where \(\mu\) is the overall mean, \(\alpha\) is the effect of Factor A, \(\beta\) is the variability of Factor B (nested within Factor A), \(\gamma\) is the variability of Factor C (nested within Factor B) and \(\epsilon\) is the random unexplained or residual component that is assumed to be normally distributed with a mean of zero and a constant amount of standard deviation (\(\sigma^2\)). The subscripts are iterators. For example, the \(i\) represents the number of effects to be estimated for Factor A. Thus the first formula can be read as the \(k\)-th observation of \(y\) is drawn from a normal distribution (with a specific level of variability) and mean proposed to be determined by a base mean (\(\mu\) - mean of the first treatment across all nests) plus the effect of the \(i\)-th treatment effect plus the variabilitythe model proposes that, given a base mean (\(\mu\)) and knowing the effect of the \(i\)-th treatment (factor A) and which of the \(j\)-th nests within the treatment the \(k\)-th observation from Block \(j\) (factor B) within treatment effect.

Null hypotheses

Separate null hypotheses are associated with each of the factors, however, nested factors are typically only added to absorb some of the unexplained variability and thus, specific hypotheses tests associated with nested factors are of lesser biological importance. Hence, rather than estimate the effects of random effects, we instead estimate how much variability they contribute.

Factor A: the main treatment effect (fixed)

  • \(H_0(A): \mu_1=\mu_2=\ldots=\mu_i=\mu\) (the population group means are all equal). That is, that the mean of population \(1\) is equal to that of population \(2\) and so on, and thus all population means are equal to one another - no effect of the factor on the response. If the effect of the \(i\)-th group is the difference between the \(i\)-th group mean and the mean of the first group (\(\alpha_i=\mu_i-\mu_1\)) then the \(H_0\) can alternatively be written as:

  • \(H_0(A) : \alpha_1=\alpha_2=\ldots=\alpha_i=0\) (the effect of each group equals zero). If one or more of the \(\alpha_i\) are different from zero (the response mean for this treatment differs from the overall response mean), there is evidence that the null hypothesis is not true indicating that the factor does affect the response variable.

Factor A: the main treatment effect (random)

  • \(H_0(A) : \sigma^2_{\alpha}=0\) (population variance equals zero). There is no added variance due to all possible levels of A.

Factor B: the nested effect (random)

  • \(H_0(B) : \sigma^2_{\beta}=0\) (population variance equals zero). There is no added variance due to all possible levels of B within the (set or all possible) levels of A.

Factor B: the nested effect (fixed)

  • \(H_0(B): \mu_{1(1)}=\mu_{2(1)}=\ldots=\mu_{j(i)}=\mu\) (the population group means of B (within A) are all equal).

  • \(H_0(B): \beta_{1(1)}=\beta_{2(1)}=\ldots=\beta_{j(i)}=0\) (the effect of each chosen B group equals zero).

Analysis of variance

Analysis of variance sequentially partitions the total variability in the response variable into components explained by each of the factors (starting with the factors lowest down in the hierarchy - the most deeply nested) and the components unexplained by each factor. Explained variability is calculated by subtracting the amount unexplained by the factor from the amount unexplained by a reduced model that does not contain the factor. When the null hypothesis for a factor is true (no effect or added variability), the ratio of explained and unexplained components for that factor (F-ratio) should follow a theoretical F-distribution with an expected value less than 1. The appropriate unexplained residuals and therefore the appropriate F-ratios for each factor differ according to the different null hypotheses associated with different combinations of fixed and random factors in a nested linear model (see Table below).

fact_anova_table
NA       df        MS         F-ratio (B random)    Var comp (B random)        
NA A     "a-1"     "MS A"     "(MS A)/(MS B'(A))"   "((MS A) - (MS B'(A)))/nb" 
NA B'(A) "(b-1)a"  "MS B'(A)" "(MS B'(A))/(MS res)" "((MS B'(A)) - (MS res))/n"
NA Res   "(n-1)ba" "MS res"   ""                    ""                         
NA       F-ratio (B fixed)     Var comp (B fixed)         
NA A     "(MS A)/(MS res)"     "((MS A) - (MS res))/nb"   
NA B'(A) "(MS B'(A))/(MS res)" "((MS B'(A)) - (MS res))/n"
NA Res   ""                    ""

The corresponding R syntax is given below.

#A fixed/random, B random (balanced)
summary(aov(y~A+Error(B), data))
VarCorr(lme(y~A,random=1|B, data))

#A fixed/random, B random (unbalanced)
anova(lme(y~A,random=1|B, data), type='marginal')

#A fixed/random, B fixed(balanced)
summary(aov(y~A+B, data))

#A fixed/random, B fixed (unbalanced)
contrasts(data$B) <- contr.sum
Anova(aov(y~A/B, data), type='III')

Variance components

As previously alluded to, it can often be useful to determine the relative contribution (to explaining the unexplained variability) of each of the factors as this provides insights into the variability at each different scales. These contributions are known as Variance components and are estimates of the added variances due to each of the factors. For consistency with leading texts on this topic, I have included estimated variance components for various balanced nested ANOVA designs in the above table. However, variance components based on a modified version of the maximum likelihood iterative model fitting procedure (REML) is generally recommended as this accommodates both balanced and unbalanced designs. While there are no numerical differences in the calculations of variance components for fixed and random factors, fixed factors are interpreted very differently and arguably have little clinical meaning (other to infer relative contribution). For fixed factors, variance components estimate the variance between the means of the specific populations that are represented by the selected levels of the factor and therefore represent somewhat arbitrary and artificial populations. For random factors, variance components estimate the variance between means of all possible populations that could have been selected and thus represents the true population variance.

Assumptions

An F-distribution represents the relative frequencies of all the possible F-ratio’s when a given null hypothesis is true and certain assumptions about the residuals (denominator in the F-ratio calculation) hold. Consequently, it is also important that diagnostics associated with a particular hypothesis test reflect the denominator for the appropriate F-ratio. For example, when testing the null hypothesis that there is no effect of Factor A (\(H_0(A):\alpha_i=0\)) in a mixed nested ANOVA, the means of each level of Factor B are used as the replicates of Factor A. As with single factor anova, hypothesis testing for nested ANOVA assumes the residuals are:

  • normally distributed. Factors higher up in the hierarchy of a nested model are based on means (or means of means) of lower factors and thus the Central Limit Theory would predict that normality will usually be satisfied for the higher level factors. Nevertheless, boxplots using the appropriate scale of replication should be used to explore normality. Scale transformations are often useful.

  • equally varied. Boxplots and plots of means against variance (using the appropriate scale of replication) should be used to explore the spread of values. Residual plots should reveal no patterns. Scale transformations are often useful.

  • independent of one another - this requires special consideration so as to ensure that the scale at which sub-replicates are measured is still great enough to enable observations to be independent.

Unbalanced nested designs

Designs that incorporate fixed and random factors (either nested or factorial), involve F-ratio calculations in which the denominators are themselves random factors other than the overall residuals. Many statisticians argue that when such denominators are themselves not statistically significant (at the \(0.25\) level), there are substantial power benefits from pooling together successive non-significant denominator terms. Thus an F-ratio for a particular factor might be recalculated after pooling together its original denominator with its denominators denominator and so on. The conservative \(0.25\) is used instead of the usual 0.05 to reduce further the likelihood of Type II errors (falsely concluding an effect is non-significant - that might result from insufficient power).

For a simple completely balanced nested ANOVA, it is possible to pool together (calculate their mean) each of the sub-replicates within each nest (site) and then perform single factor ANOVA on those aggregates. Indeed, for a balanced design, the estimates and hypothesis for Factor A will be identical to that produced via nested ANOVA. However, if there are an unequal number of sub-replicates within each nest, then the single factor ANOVA will be less powerful that a proper nested ANOVA. Unbalanced designs are those designs in which sample (subsample) sizes for each level of one or more factors differ. These situations are relatively common in biological research, however such imbalance has some important implications for nested designs.

Firstly, hypothesis tests are more robust to the assumptions of normality and equal variance when the design is balanced. Secondly (and arguably, more importantly), the model contrasts are not orthogonal (independent) and the sums of squares component attributed to each of the model terms cannot be calculated by simple additive partitioning of the total sums of squares. In such situations, exact F-ratios cannot be constructed (at least in theory), variance components calculations are more complicated and significance tests cannot be computed. The denominator MS in an F-ratio is determined by examining the expected value of the mean squares of each term in a model. Unequal sample sizes result in expected means squares for which there are no obvious logical comparators that enable the impact of an individual model term to be isolated. The severity of this issue depends on which scale of the sub-sampling hierarchy the unbalance(s) occurs as well whether the unbalance occurs in the replication of a fixed or random factor. For example, whilst unequal levels of the first nesting factor (e.g. unequal number of burn vs un-burnt sites) has no effect on F-ratio construction or hypothesis testing for the top level factor (irrespective of whether either of the factors are fixed or random), unequal sub-sampling (replication) at the level of a random (but not fixed) nesting factor will impact on the ability to construct F-ratios and variance components of all terms above it in the hierarchy. There are a number of alternative ways of dealing with unbalanced nested designs. All alternatives assume that the imbalance is not a direct result of the treatments themselves. Such outcomes are more appropriately analysed by modelling the counts of surviving observations via frequency analysis.

  • Split the analysis up into separate smaller simple ANOVA’s each using the means of the nesting factor to reflect the appropriate scale of replication. As the resulting sums of squares components are thereby based on an aggregated dataset the analyses then inherit the procedures and requirements of single ANOVA.
  • Adopt mixed-modelling techniques.

We note that, in a Bayesian framework, issues of design balance essentially evaporate.

Linear mixed effects models

Although the term “mixed-effects” can be used to refer to any design that incorporates both fixed and random predictors, its use is more commonly restricted to designs in which factors are nested or grouped within other factors. Typical examples include nested, longitudinal (measurements repeated over time) data, repeated measures and blocking designs. Furthermore, rather than basing parameter estimations on observed and expected mean squares or error strata (as outline above), mixed-effects models estimate parameters via maximum likelihood (ML) or residual maximum likelihood (REML). In so doing, mixed-effects models more appropriately handle estimation of parameters, effects and variance components of unbalanced designs (particularly for random effects). Resulting fitted (or expected) values of each level of a factor (for example, the expected population site means) are referred to as Best Linear Unbiased Predictors (BLUP’s). As an acknowledgement that most estimated site means will be more extreme than the underlying true population means they estimate (based on the principle that smaller sample sizes result in greater chances of more extreme observations and that nested sub-replicates are also likely to be highly intercorrelated), BLUP’s are less spread from the overall mean than are simple site means. In addition, mixed-effects models naturally model the “within-block” correlation structure that complicates many longitudinal designs.

Whilst the basic concepts of mixed-effects models have been around for a long time, recent computing advances and adoptions have greatly boosted the popularity of these procedures. Linear mixed effects models are currently at the forefront of statistical development, and as such, are very much a work in progress - both in theory and in practice. Recent developments have seen a further shift away from the traditional practices associated with degrees of freedom, probability distribution and p-value calculations. The traditional approach to inference testing is to compare the fit of an alternative (full) model to a null (reduced) model (via an F-ratio). When assumptions of normality and homogeneity of variance apply, the degrees of freedom are easily computed and the F-ratio has an exact F-distribution to which it can be compared. However, this approach introduces two additional problematic assumptions when estimating fixed effects in a mixed effects model. Firstly, when estimating the effects of one factor, the parameter estimates associated with other factor(s) are assumed to be the true values of those parameters (not estimates). Whilst this assumption is reasonable when all factors are fixed, as random factors are selected such that they represent one possible set of levels drawn from an entire population of possible levels for the random factor, it is unlikely that the associated parameter estimates accurately reflect the true values. Consequently, there is not necessarily an appropriate F-distribution. Furthermore, determining the appropriate degrees of freedom (nominally, the number of independent observations on which estimates are based) for models that incorporate a hierarchical structure is only possible under very specific circumstances (such as completely balanced designs). Degrees of freedom is a somewhat arbitrary defined concept used primarily to select a theoretical probability distribution on which a statistic can be compared. Arguably, however, it is a concept that is overly simplistic for complex hierarchical designs. Most statistical applications continue to provide the “approximate” solutions (as did earlier versions within R). However, R linear mixed effects development leaders argue strenuously that given the above shortcomings, such approximations are variably inappropriate and are thus omitted.

Markov chain Monte Carlo (MCMC) sampling methods provide a Bayesian-like alternative for inference testing. Markov chains use the mixed model parameter estimates to generate posterior probability distributions of each parameter from which Monte Carlo sampling methods draw a large set of parameter samples. These parameter samples can then be used to calculate highest posterior density (HPD) intervals (also known as Bayesian credible intervals). Such intervals indicate the interval in which there is a specified probability (typically \(95\)%) that the true population parameter lies. Furthermore, whilst technically against the spirit of the Bayesian philosophy, it is also possible to generate P values on which to base inferences.

Data generation

Imagine we has designed an experiment in which we intend to measure a response (\(y\)) to one of treatments (three levels; “a1”, “a2” and “a3”). The treatments occur at a spatial scale (over an area) that far exceeds the logistical scale of sampling units (it would take too long to sample at the scale at which the treatments were applied). The treatments occurred at the scale of hectares whereas it was only feasible to sample y using 1m quadrats. Given that the treatments were naturally occurring (such as soil type), it was not possible to have more than five sites of each treatment type, yet prior experience suggested that the sites in which you intended to sample were very uneven and patchy with respect to \(y\). In an attempt to account for this inter-site variability (and thus maximize the power of the test for the effect of treatment, you decided to employ a nested design in which 10 quadrats were randomly located within each of the five replicate sites per three treatments. As this section is mainly about the generation of artificial data (and not specifically about what to do with the data), understanding the actual details are optional and can be safely skipped.

library(plyr)
set.seed(123)
nTreat <- 3
nSites <- 15
nSitesPerTreat <- nSites/nTreat
nQuads <- 10
site.sigma <- 12
sigma <- 5
n <- nSites * nQuads
sites <- gl(n=nSites,k=nQuads, lab=paste0('S',1:nSites))
A <- gl(nTreat, nSitesPerTreat*nQuads, n, labels=c('a1','a2','a3'))
a.means <- c(40,70,80)
## the site means (treatment effects) are drawn from normal distributions
## with means of 40, 70 and 80 and standard deviations of 12
A.effects <- rnorm(nSites, rep(a.means,each=nSitesPerTreat),site.sigma)
#A.effects <- a.means %*% t(model.matrix(~A, data.frame(A=gl(nTreat,nSitesPerTreat,nSites))))+rnorm(nSites,0,site.sigma)
Xmat <- model.matrix(~sites -1)
lin.pred <- Xmat %*% c(A.effects)
## the quadrat observations (within sites) are drawn from
## normal distributions with means according to the site means
## and standard deviations of 5
y <- rnorm(n,lin.pred,sigma)
data.nest <- data.frame(y=y, A=A, Sites=sites,Quads=1:length(y))
head(data.nest)  #print out the first six rows of the data set
NA          y  A Sites Quads
NA 1 42.20886 a1    S1     1
NA 2 35.76354 a1    S1     2
NA 3 23.44121 a1    S1     3
NA 4 36.78107 a1    S1     4
NA 5 30.91034 a1    S1     5
NA 6 27.93517 a1    S1     6
library(ggplot2)
ggplot(data.nest, aes(y=y, x=1)) + geom_boxplot() + facet_grid(.~Sites)

Exploratory data analysis

Normality and Homogeneity of variance

#Effects of treatment
boxplot(y~A, ddply(data.nest, ~A+Sites,numcolwise(mean, na.rm=T)))

#Site effects
boxplot(y~Sites, ddply(data.nest, ~A+Sites+Quads,numcolwise(mean, na.rm=T)))

## with ggplot2
ggplot(ddply(data.nest, ~A+Sites,numcolwise(mean, na.rm=T)), aes(y=y, x=A)) +
  geom_boxplot()

Conclusions:

  • there is no evidence that the response variable is consistently non-normal across all populations - each boxplot is approximately symmetrical.

  • there is no evidence that variance (as estimated by the height of the boxplots) differs between the five populations. More importantly, there is no evidence of a relationship between mean and variance - the height of boxplots does not increase with increasing position along the y-axis. Hence it there is no evidence of non-homogeneity.

Obvious violations could be addressed either by:

  • transform the scale of the response variables (to address normality, etc). Note transformations should be applied to the entire response variable (not just those populations that are skewed).

Model fitting

For non-hierarchical linear models, uniform priors on variance (standard deviation) parameters seem to work reasonably well. Gelman et al. (2006) warns that the use of the inverse-gamma family of non-informative priors are very sensitive to ϵ particularly when variance is close to zero and this may lead to unintentionally informative priors. When the number of groups (treatments or varying/random effects) is large (more than \(5\)), Gelman et al. (2006) advocated the use of either uniform or half-cauchy priors. Yet when the number of groups is low, Gelman et al. (2006) indicates that uniform priors have a tendency to result in inflated variance estimates. Consequently, half-cauchy priors are generally recommended for variances.

Full parameterisation

\[ y_{ijk} \sim N(\mu_{ij}, \sigma^2), \;\;\; \mu_{ij}=\alpha_0 + \alpha_i + \beta_{j(i)}, \]

where \(\beta_{ij)} \sim N(0, \sigma^2_B)\), \(\alpha_0, \alpha_i \sim N(0, 1000000)\), and \(\sigma^2, \sigma^2_B \sim \text{Cauchy(0, 25)}\). The full parameterisation, shows the effects parameterisation in which there is an intercept (\(\alpha_0\)) and two treatment effects (\(\alpha_i\), where \(i\) is \(1,2\)).

Matrix parameterisation

\[ y_{ijk} \sim N(\mu_{ij}, \sigma^2), \;\;\; \mu_{ij}=\boldsymbol \alpha \boldsymbol X + \beta_{j(i)}, \]

where \(\beta_{ij} \sim N(0, \sigma^2_B)\), \(\boldsymbol \alpha \sim MVN(0, 1000000)\), and \(\sigma^2, \sigma^2_B \sim \text{Cauchy(0, 25)}\). The full parameterisation, shows the effects parameterisation in which there is an intercept (\(\alpha_0\)) and two treatment effects (\(\alpha_i\), where \(i\) is \(1,2\)). The matrix parameterisation is a compressed notation, In this parameterisation, there are three alpha parameters (one representing the mean of treatment a1, and the other two representing the treatment effects (differences between a2 and a1 and a3 and a1). In generating priors for each of these three alpha parameters, we could loop through each and define a non-informative normal prior to each (as in the Full parameterisation version). However, it turns out that it is more efficient (in terms of mixing and thus the number of necessary iterations) to define the priors from a multivariate normal distribution. This has as many means as there are parameters to estimate (\(3\)) and a \(3\times3\) matrix of zeros and \(100\) in the diagonals.

\[ \boldsymbol \mu = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \;\;\; \sigma^2 \sim \begin{bmatrix} 1000000 & 0 & 0 \\ 0 & 1000000 & 0 \\ 0 & 0 & 1000000 \end{bmatrix}. \]

Hierarchical parameterisation

\[ y_{ijk} \sim N(\beta_{i(j)}, \sigma^2), \;\;\; \beta_{i(j)}\sim N(\mu_i, \sigma^2_B), \]

where \(\mu_i = \boldsymbol \alpha \boldsymbol X\), \(\alpha_i \sim N(0, 1000000)\), and \(\sigma^2, \sigma^2_B \sim \text{Cauchy(0, 25)}\). In the heirarchical parameterisation, we are indicating two residual layers - one representing the variability in the observed data between individual observations (within sites) and the second representing the variability between site means (within the three treatments).

Full means parameterisation

rstanString="
data{
   int n;
   int nA;
   int nB;
   vector [n] y;
   int A[n];
   int B[n];
}

parameters{
  real alpha[nA];
  real<lower=0> sigma;
  vector [nB] beta;
  real<lower=0> sigma_B;
}
 
model{
    real mu[n];

    // Priors
    alpha ~ normal( 0 , 100 );
    beta ~ normal( 0 , sigma_B );
    sigma_B ~ cauchy( 0 , 25 );
    sigma ~ cauchy( 0 , 25 );
    
    for ( i in 1:n ) {
        mu[i] = alpha[A[i]] + beta[B[i]];
    }
    y ~ normal( mu , sigma );
}

"

## write the model to a text file
writeLines(rstanString, con = "fullModel.stan")

Arrange the data as a list (as required by Stan). As input, Stan will need to be supplied with: the response variable, the predictor matrix, the number of predictors, the total number of observed items. This all needs to be contained within a list object. We will create two data lists, one for each of the hypotheses.

data.nest.list <- with(data.nest, list(y=y, A=as.numeric(A), B=as.numeric(Sites),
  n=nrow(data.nest), nB=length(levels(Sites)),nA=length(levels(A))))

Define the nodes (parameters and derivatives) to monitor and the chain parameters.

params <- c("alpha","sigma","sigma_B")
burnInSteps = 3000
nChains = 2
numSavedSteps = 3000
thinSteps = 1
nIter = burnInSteps+ceiling((numSavedSteps * thinSteps)/nChains)

Start the Stan model (check the model, load data into the model, specify the number of chains and compile the model). Load the rstan package.

library(rstan)

Now run the Stan code via the rstan interface.

data.nest.rstan.c <- stan(data = data.nest.list, file = "fullModel.stan", chains = nChains, pars = params, iter = nIter, warmup = burnInSteps, thin = thinSteps)
NA 
NA SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
NA Chain 1: 
NA Chain 1: Gradient evaluation took 3.3e-05 seconds
NA Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.33 seconds.
NA Chain 1: Adjust your expectations accordingly!
NA Chain 1: 
NA Chain 1: 
NA Chain 1: Iteration:    1 / 4500 [  0%]  (Warmup)
NA Chain 1: Iteration:  450 / 4500 [ 10%]  (Warmup)
NA Chain 1: Iteration:  900 / 4500 [ 20%]  (Warmup)
NA Chain 1: Iteration: 1350 / 4500 [ 30%]  (Warmup)
NA Chain 1: Iteration: 1800 / 4500 [ 40%]  (Warmup)
NA Chain 1: Iteration: 2250 / 4500 [ 50%]  (Warmup)
NA Chain 1: Iteration: 2700 / 4500 [ 60%]  (Warmup)
NA Chain 1: Iteration: 3001 / 4500 [ 66%]  (Sampling)
NA Chain 1: Iteration: 3450 / 4500 [ 76%]  (Sampling)
NA Chain 1: Iteration: 3900 / 4500 [ 86%]  (Sampling)
NA Chain 1: Iteration: 4350 / 4500 [ 96%]  (Sampling)
NA Chain 1: Iteration: 4500 / 4500 [100%]  (Sampling)
NA Chain 1: 
NA Chain 1:  Elapsed Time: 0.682 seconds (Warm-up)
NA Chain 1:                0.37 seconds (Sampling)
NA Chain 1:                1.052 seconds (Total)
NA Chain 1: 
NA 
NA SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
NA Chain 2: 
NA Chain 2: Gradient evaluation took 8e-06 seconds
NA Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.08 seconds.
NA Chain 2: Adjust your expectations accordingly!
NA Chain 2: 
NA Chain 2: 
NA Chain 2: Iteration:    1 / 4500 [  0%]  (Warmup)
NA Chain 2: Iteration:  450 / 4500 [ 10%]  (Warmup)
NA Chain 2: Iteration:  900 / 4500 [ 20%]  (Warmup)
NA Chain 2: Iteration: 1350 / 4500 [ 30%]  (Warmup)
NA Chain 2: Iteration: 1800 / 4500 [ 40%]  (Warmup)
NA Chain 2: Iteration: 2250 / 4500 [ 50%]  (Warmup)
NA Chain 2: Iteration: 2700 / 4500 [ 60%]  (Warmup)
NA Chain 2: Iteration: 3001 / 4500 [ 66%]  (Sampling)
NA Chain 2: Iteration: 3450 / 4500 [ 76%]  (Sampling)
NA Chain 2: Iteration: 3900 / 4500 [ 86%]  (Sampling)
NA Chain 2: Iteration: 4350 / 4500 [ 96%]  (Sampling)
NA Chain 2: Iteration: 4500 / 4500 [100%]  (Sampling)
NA Chain 2: 
NA Chain 2:  Elapsed Time: 0.659 seconds (Warm-up)
NA Chain 2:                0.377 seconds (Sampling)
NA Chain 2:                1.036 seconds (Total)
NA Chain 2:
print(data.nest.rstan.c, par = c("alpha", "sigma", "sigma_B"))
NA Inference for Stan model: anon_model.
NA 2 chains, each with iter=4500; warmup=3000; thin=1; 
NA post-warmup draws per chain=1500, total post-warmup draws=3000.
NA 
NA           mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
NA alpha[1] 41.90    0.19 5.62 30.38 38.19 41.93 45.39 53.23   866    1
NA alpha[2] 69.63    0.18 5.43 59.38 66.03 69.66 73.13 80.67   865    1
NA alpha[3] 82.69    0.17 5.49 71.08 79.24 82.76 86.26 93.08  1020    1
NA sigma     5.04    0.01 0.30  4.49  4.82  5.03  5.23  5.66  1693    1
NA sigma_B  11.73    0.08 2.78  7.76  9.78 11.22 13.12 18.49  1174    1
NA 
NA Samples were drawn using NUTS(diag_e) at Mon Jul 22 12:41:33 2024.
NA For each parameter, n_eff is a crude measure of effective sample size,
NA and Rhat is the potential scale reduction factor on split chains (at 
NA convergence, Rhat=1).
data.nest.rstan.c.df <-as.data.frame(extract(data.nest.rstan.c))
head(data.nest.rstan.c.df)
NA    alpha.1  alpha.2  alpha.3    sigma   sigma_B      lp__
NA 1 34.66394 74.84391 83.33568 4.783857 11.030622 -355.9341
NA 2 38.10838 72.29916 82.46015 5.186627 10.425868 -355.7169
NA 3 33.98648 69.14414 95.22563 5.515138 11.765572 -360.1139
NA 4 35.26812 65.90908 84.96725 4.967946  7.948799 -354.9913
NA 5 30.38241 77.09534 84.52162 4.453866 12.540387 -358.2463
NA 6 40.79643 65.72887 86.83717 4.993259  8.487666 -353.6030

Full effect parameterisation

rstan2String="
data{
   int n;
   int nB;
   vector [n] y;
   int A2[n];
   int A3[n];
   int B[n];
}

parameters{
  real alpha0;
  real alpha2;
  real alpha3;
  real<lower=0> sigma;
  vector [nB] beta;
  real<lower=0> sigma_B;
}
 
model{
    real mu[n];

    // Priors
    alpha0 ~ normal( 0 , 100 );
    alpha2 ~ normal( 0 , 100 );
    alpha3 ~ normal( 0 , 100 );
    beta ~ normal( 0 , sigma_B );
    sigma_B ~ cauchy( 0 , 25 );
    sigma ~ cauchy( 0 , 25 );
    
    for ( i in 1:n ) {
        mu[i] = alpha0 + alpha2*A2[i] + 
               alpha3*A3[i] + beta[B[i]];
    }
    y ~ normal( mu , sigma );
}

"

## write the model to a text file
writeLines(rstan2String, con = "full2Model.stan")

Arrange the data as a list (as required by Stan). As input, Stan will need to be supplied with: the response variable, the predictor matrix, the number of predictors, the total number of observed items. This all needs to be contained within a list object. We will create two data lists, one for each of the hypotheses.

A2 <- ifelse(data.nest$A=='a2',1,0)
A3 <- ifelse(data.nest$A=='a3',1,0)
data.nest.list <- with(data.nest, list(y=y, A2=A2, A3=A3, B=as.numeric(Sites),
   n=nrow(data.nest), nB=length(levels(Sites))))

Define the nodes (parameters and derivatives) to monitor and the chain parameters.

params <- c("alpha0","alpha2","alpha3","sigma","sigma_B")
burnInSteps = 3000
nChains = 2
numSavedSteps = 3000
thinSteps = 1
nIter = burnInSteps+ceiling((numSavedSteps * thinSteps)/nChains)

Now run the Stan code via the rstan interface.

data.nest.rstan.c2 <- stan(data = data.nest.list, file = "full2Model.stan", chains = nChains, pars = params, iter = nIter, warmup = burnInSteps, thin = thinSteps)
NA 
NA SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
NA Chain 1: 
NA Chain 1: Gradient evaluation took 3.3e-05 seconds
NA Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.33 seconds.
NA Chain 1: Adjust your expectations accordingly!
NA Chain 1: 
NA Chain 1: 
NA Chain 1: Iteration:    1 / 4500 [  0%]  (Warmup)
NA Chain 1: Iteration:  450 / 4500 [ 10%]  (Warmup)
NA Chain 1: Iteration:  900 / 4500 [ 20%]  (Warmup)
NA Chain 1: Iteration: 1350 / 4500 [ 30%]  (Warmup)
NA Chain 1: Iteration: 1800 / 4500 [ 40%]  (Warmup)
NA Chain 1: Iteration: 2250 / 4500 [ 50%]  (Warmup)
NA Chain 1: Iteration: 2700 / 4500 [ 60%]  (Warmup)
NA Chain 1: Iteration: 3001 / 4500 [ 66%]  (Sampling)
NA Chain 1: Iteration: 3450 / 4500 [ 76%]  (Sampling)
NA Chain 1: Iteration: 3900 / 4500 [ 86%]  (Sampling)
NA Chain 1: Iteration: 4350 / 4500 [ 96%]  (Sampling)
NA Chain 1: Iteration: 4500 / 4500 [100%]  (Sampling)
NA Chain 1: 
NA Chain 1:  Elapsed Time: 1.48 seconds (Warm-up)
NA Chain 1:                0.731 seconds (Sampling)
NA Chain 1:                2.211 seconds (Total)
NA Chain 1: 
NA 
NA SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
NA Chain 2: 
NA Chain 2: Gradient evaluation took 1.4e-05 seconds
NA Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.14 seconds.
NA Chain 2: Adjust your expectations accordingly!
NA Chain 2: 
NA Chain 2: 
NA Chain 2: Iteration:    1 / 4500 [  0%]  (Warmup)
NA Chain 2: Iteration:  450 / 4500 [ 10%]  (Warmup)
NA Chain 2: Iteration:  900 / 4500 [ 20%]  (Warmup)
NA Chain 2: Iteration: 1350 / 4500 [ 30%]  (Warmup)
NA Chain 2: Iteration: 1800 / 4500 [ 40%]  (Warmup)
NA Chain 2: Iteration: 2250 / 4500 [ 50%]  (Warmup)
NA Chain 2: Iteration: 2700 / 4500 [ 60%]  (Warmup)
NA Chain 2: Iteration: 3001 / 4500 [ 66%]  (Sampling)
NA Chain 2: Iteration: 3450 / 4500 [ 76%]  (Sampling)
NA Chain 2: Iteration: 3900 / 4500 [ 86%]  (Sampling)
NA Chain 2: Iteration: 4350 / 4500 [ 96%]  (Sampling)
NA Chain 2: Iteration: 4500 / 4500 [100%]  (Sampling)
NA Chain 2: 
NA Chain 2:  Elapsed Time: 1.632 seconds (Warm-up)
NA Chain 2:                1.047 seconds (Sampling)
NA Chain 2:                2.679 seconds (Total)
NA Chain 2:
print(data.nest.rstan.c2, par = c("alpha0", "alpha2", "alpha3", "sigma", "sigma_B"))
NA Inference for Stan model: anon_model.
NA 2 chains, each with iter=4500; warmup=3000; thin=1; 
NA post-warmup draws per chain=1500, total post-warmup draws=3000.
NA 
NA          mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
NA alpha0  42.46    0.19 5.30 32.37 39.01 42.53 45.80 52.98   792    1
NA alpha2  27.25    0.28 7.67 11.75 22.68 27.36 32.25 41.95   750    1
NA alpha3  40.71    0.26 7.55 25.57 35.91 40.46 45.42 55.99   850    1
NA sigma    5.04    0.01 0.31  4.47  4.81  5.03  5.25  5.70  1239    1
NA sigma_B 11.54    0.09 2.65  7.69  9.70 11.09 12.92 18.04   779    1
NA 
NA Samples were drawn using NUTS(diag_e) at Mon Jul 22 12:42:12 2024.
NA For each parameter, n_eff is a crude measure of effective sample size,
NA and Rhat is the potential scale reduction factor on split chains (at 
NA convergence, Rhat=1).
data.nest.rstan.c2.df <-as.data.frame(extract(data.nest.rstan.c2))
head(data.nest.rstan.c2.df)
NA     alpha0   alpha2   alpha3    sigma  sigma_B      lp__
NA 1 47.83462 26.07003 29.50999 5.543406 13.06805 -355.4811
NA 2 45.27770 25.90933 37.47072 5.470879 12.19998 -353.8504
NA 3 40.49191 35.16179 39.21332 5.767743 11.64687 -358.2453
NA 4 45.51317 26.64989 36.41803 5.613998 10.87368 -360.0820
NA 5 47.70665 21.64392 33.10168 4.785280 11.33905 -353.9069
NA 6 31.05164 43.27888 45.95428 4.722624 12.17590 -356.0434

Matrix parameterisation

rstanString2="
data{
   int n;
   int nB;
   int nA;
   vector [n] y;
   matrix [n,nA] X;
   int B[n];
   vector [nA] a0;
   matrix [nA,nA] A0;
}

parameters{
  vector [nA] alpha;
  real<lower=0> sigma;
  vector [nB] beta;
  real<lower=0> sigma_B;
}
 
model{
    real mu[n];

    // Priors
    //alpha ~ normal( 0 , 100 );
    alpha ~ multi_normal(a0,A0);
    beta ~ normal( 0 , sigma_B );
    sigma_B ~ cauchy( 0 , 25);
    sigma ~ cauchy( 0 , 25 );
    
    for ( i in 1:n ) {
        mu[i] = dot_product(X[i],alpha) + beta[B[i]];
    }
    y ~ normal( mu , sigma );
}

"

## write the model to a text file
writeLines(rstanString2, con = "matrixModel.stan")

Arrange the data as a list (as required by Stan). As input, Stan will need to be supplied with: the response variable, the predictor matrix, the number of predictors, the total number of observed items. This all needs to be contained within a list object. We will create two data lists, one for each of the hypotheses.

X <- model.matrix(~A, data.nest)
nA <- ncol(X)
data.nest.list <- with(data.nest, list(y=y, X=X, B=as.numeric(Sites),
   n=nrow(data.nest), nB=length(levels(Sites)), nA=nA,
   a0=rep(0,nA), A0=diag(100000,nA)))

Define the nodes (parameters and derivatives) to monitor and the chain parameters.

params <- c("alpha","sigma","sigma_B")
burnInSteps = 3000
nChains = 2
numSavedSteps = 3000
thinSteps = 1
nIter = burnInSteps+ceiling((numSavedSteps * thinSteps)/nChains)

Now run the Stan code via the rstan interface.

data.nest.rstan.m <- stan(data = data.nest.list, file = "matrixModel.stan", chains = nChains, pars = params, iter = nIter, warmup = burnInSteps, thin = thinSteps)
NA 
NA SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
NA Chain 1: 
NA Chain 1: Gradient evaluation took 4.1e-05 seconds
NA Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.41 seconds.
NA Chain 1: Adjust your expectations accordingly!
NA Chain 1: 
NA Chain 1: 
NA Chain 1: Iteration:    1 / 4500 [  0%]  (Warmup)
NA Chain 1: Iteration:  450 / 4500 [ 10%]  (Warmup)
NA Chain 1: Iteration:  900 / 4500 [ 20%]  (Warmup)
NA Chain 1: Iteration: 1350 / 4500 [ 30%]  (Warmup)
NA Chain 1: Iteration: 1800 / 4500 [ 40%]  (Warmup)
NA Chain 1: Iteration: 2250 / 4500 [ 50%]  (Warmup)
NA Chain 1: Iteration: 2700 / 4500 [ 60%]  (Warmup)
NA Chain 1: Iteration: 3001 / 4500 [ 66%]  (Sampling)
NA Chain 1: Iteration: 3450 / 4500 [ 76%]  (Sampling)
NA Chain 1: Iteration: 3900 / 4500 [ 86%]  (Sampling)
NA Chain 1: Iteration: 4350 / 4500 [ 96%]  (Sampling)
NA Chain 1: Iteration: 4500 / 4500 [100%]  (Sampling)
NA Chain 1: 
NA Chain 1:  Elapsed Time: 1.714 seconds (Warm-up)
NA Chain 1:                1.076 seconds (Sampling)
NA Chain 1:                2.79 seconds (Total)
NA Chain 1: 
NA 
NA SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
NA Chain 2: 
NA Chain 2: Gradient evaluation took 1.4e-05 seconds
NA Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.14 seconds.
NA Chain 2: Adjust your expectations accordingly!
NA Chain 2: 
NA Chain 2: 
NA Chain 2: Iteration:    1 / 4500 [  0%]  (Warmup)
NA Chain 2: Iteration:  450 / 4500 [ 10%]  (Warmup)
NA Chain 2: Iteration:  900 / 4500 [ 20%]  (Warmup)
NA Chain 2: Iteration: 1350 / 4500 [ 30%]  (Warmup)
NA Chain 2: Iteration: 1800 / 4500 [ 40%]  (Warmup)
NA Chain 2: Iteration: 2250 / 4500 [ 50%]  (Warmup)
NA Chain 2: Iteration: 2700 / 4500 [ 60%]  (Warmup)
NA Chain 2: Iteration: 3001 / 4500 [ 66%]  (Sampling)
NA Chain 2: Iteration: 3450 / 4500 [ 76%]  (Sampling)
NA Chain 2: Iteration: 3900 / 4500 [ 86%]  (Sampling)
NA Chain 2: Iteration: 4350 / 4500 [ 96%]  (Sampling)
NA Chain 2: Iteration: 4500 / 4500 [100%]  (Sampling)
NA Chain 2: 
NA Chain 2:  Elapsed Time: 2.052 seconds (Warm-up)
NA Chain 2:                1.185 seconds (Sampling)
NA Chain 2:                3.237 seconds (Total)
NA Chain 2:
print(data.nest.rstan.m, par = c("alpha", "sigma", "sigma_B"))
NA Inference for Stan model: anon_model.
NA 2 chains, each with iter=4500; warmup=3000; thin=1; 
NA post-warmup draws per chain=1500, total post-warmup draws=3000.
NA 
NA           mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
NA alpha[1] 41.97    0.21 5.56 30.65 38.51 41.99 45.45 53.17   700    1
NA alpha[2] 27.79    0.27 7.79 12.57 22.80 27.76 32.78 43.27   860    1
NA alpha[3] 41.44    0.28 7.77 25.51 36.54 41.54 46.56 56.74   782    1
NA sigma     5.04    0.01 0.32  4.47  4.81  5.03  5.24  5.71  1484    1
NA sigma_B  11.58    0.08 2.63  7.53  9.70 11.16 13.06 17.69  1012    1
NA 
NA Samples were drawn using NUTS(diag_e) at Mon Jul 22 12:42:50 2024.
NA For each parameter, n_eff is a crude measure of effective sample size,
NA and Rhat is the potential scale reduction factor on split chains (at 
NA convergence, Rhat=1).
data.nest.rstan.m.df <-as.data.frame(extract(data.nest.rstan.m))
head(data.nest.rstan.m.df)
NA    alpha.1  alpha.2  alpha.3    sigma   sigma_B      lp__
NA 1 38.51260 28.65160 45.31224 5.234954  9.437163 -352.1193
NA 2 42.44434 27.88099 30.81416 4.885081 10.432388 -354.4981
NA 3 45.66044 15.23036 31.42478 4.916171 18.635395 -357.9730
NA 4 38.10635 29.53828 46.86861 4.890449  9.377387 -363.9728
NA 5 45.01428 21.53882 31.16592 4.814843  9.749733 -355.6716
NA 6 37.51673 39.07947 51.74908 4.770156  8.408097 -359.3753

Hierarchical parameterisation

rstanString3="
data{
   int n;
   int nA;
   int nSites;
   vector [n] y;
   matrix [nSites,nA] X;
   matrix [n,nSites] Z;
}

parameters{
   vector[nA] beta;
   vector[nSites] gamma;
   real<lower=0> sigma;
   real<lower=0> sigma_S;
   
}
 
model{
    vector [n] mu_site;
    vector [nSites] mu;

    // Priors
    beta ~ normal( 0 , 1000 );
    gamma ~ normal( 0 , 1000 );
    sigma ~ cauchy( 0 , 25 );
    sigma_S~ cauchy( 0 , 25 );

    mu_site = Z*gamma;
    y ~ normal( mu_site , sigma );
    mu = X*beta;
    gamma ~ normal(mu, sigma_S);
}

"

## write the model to a text file
writeLines(rstanString3, con = "hierarchicalModel.stan")

Arrange the data as a list (as required by Stan). As input, Stan will need to be supplied with: the response variable, the predictor matrix, the number of predictors, the total number of observed items. This all needs to be contained within a list object. We will create two data lists, one for each of the hypotheses.

dt.A <- ddply(data.nest,~Sites,catcolwise(unique))
X<-model.matrix(~A, dt.A)
Z<-model.matrix(~Sites-1, data.nest)
data.nest.list <- list(y=data.nest$y, X=X, Z=Z, n=nrow(data.nest),
  nSites=nrow(X),nA=ncol(X))

Define the nodes (parameters and derivatives) to monitor and the chain parameters.

params <- c("beta","sigma","sigma_S")
burnInSteps = 3000
nChains = 2
numSavedSteps = 3000
thinSteps = 1
nIter = burnInSteps+ceiling((numSavedSteps * thinSteps)/nChains)

Now run the Stan code via the rstan interface.

data.nest.rstan.h <- stan(data = data.nest.list, file = "hierarchicalModel.stan", chains = nChains, pars = params, iter = nIter, warmup = burnInSteps, thin = thinSteps)
NA 
NA SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
NA Chain 1: 
NA Chain 1: Gradient evaluation took 3.3e-05 seconds
NA Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.33 seconds.
NA Chain 1: Adjust your expectations accordingly!
NA Chain 1: 
NA Chain 1: 
NA Chain 1: Iteration:    1 / 4500 [  0%]  (Warmup)
NA Chain 1: Iteration:  450 / 4500 [ 10%]  (Warmup)
NA Chain 1: Iteration:  900 / 4500 [ 20%]  (Warmup)
NA Chain 1: Iteration: 1350 / 4500 [ 30%]  (Warmup)
NA Chain 1: Iteration: 1800 / 4500 [ 40%]  (Warmup)
NA Chain 1: Iteration: 2250 / 4500 [ 50%]  (Warmup)
NA Chain 1: Iteration: 2700 / 4500 [ 60%]  (Warmup)
NA Chain 1: Iteration: 3001 / 4500 [ 66%]  (Sampling)
NA Chain 1: Iteration: 3450 / 4500 [ 76%]  (Sampling)
NA Chain 1: Iteration: 3900 / 4500 [ 86%]  (Sampling)
NA Chain 1: Iteration: 4350 / 4500 [ 96%]  (Sampling)
NA Chain 1: Iteration: 4500 / 4500 [100%]  (Sampling)
NA Chain 1: 
NA Chain 1:  Elapsed Time: 0.248 seconds (Warm-up)
NA Chain 1:                0.089 seconds (Sampling)
NA Chain 1:                0.337 seconds (Total)
NA Chain 1: 
NA 
NA SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
NA Chain 2: 
NA Chain 2: Gradient evaluation took 1e-05 seconds
NA Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.1 seconds.
NA Chain 2: Adjust your expectations accordingly!
NA Chain 2: 
NA Chain 2: 
NA Chain 2: Iteration:    1 / 4500 [  0%]  (Warmup)
NA Chain 2: Iteration:  450 / 4500 [ 10%]  (Warmup)
NA Chain 2: Iteration:  900 / 4500 [ 20%]  (Warmup)
NA Chain 2: Iteration: 1350 / 4500 [ 30%]  (Warmup)
NA Chain 2: Iteration: 1800 / 4500 [ 40%]  (Warmup)
NA Chain 2: Iteration: 2250 / 4500 [ 50%]  (Warmup)
NA Chain 2: Iteration: 2700 / 4500 [ 60%]  (Warmup)
NA Chain 2: Iteration: 3001 / 4500 [ 66%]  (Sampling)
NA Chain 2: Iteration: 3450 / 4500 [ 76%]  (Sampling)
NA Chain 2: Iteration: 3900 / 4500 [ 86%]  (Sampling)
NA Chain 2: Iteration: 4350 / 4500 [ 96%]  (Sampling)
NA Chain 2: Iteration: 4500 / 4500 [100%]  (Sampling)
NA Chain 2: 
NA Chain 2:  Elapsed Time: 0.242 seconds (Warm-up)
NA Chain 2:                0.103 seconds (Sampling)
NA Chain 2:                0.345 seconds (Total)
NA Chain 2:
print(data.nest.rstan.h, par = c("beta", "sigma", "sigma_S"))
NA Inference for Stan model: anon_model.
NA 2 chains, each with iter=4500; warmup=3000; thin=1; 
NA post-warmup draws per chain=1500, total post-warmup draws=3000.
NA 
NA          mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
NA beta[1] 42.39    0.13 5.42 31.60 38.77 42.40 45.95 52.81  1758    1
NA beta[2] 27.34    0.17 7.66 12.31 22.48 27.20 32.26 42.77  2089    1
NA beta[3] 40.91    0.17 7.74 25.75 35.93 40.89 45.73 56.32  2100    1
NA sigma    5.04    0.01 0.31  4.49  4.83  5.03  5.24  5.69  3058    1
NA sigma_S 11.51    0.05 2.62  7.59  9.65 11.13 12.86 17.92  2297    1
NA 
NA Samples were drawn using NUTS(diag_e) at Mon Jul 22 12:43:22 2024.
NA For each parameter, n_eff is a crude measure of effective sample size,
NA and Rhat is the potential scale reduction factor on split chains (at 
NA convergence, Rhat=1).
data.nest.rstan.h.df <-as.data.frame(extract(data.nest.rstan.h))
head(data.nest.rstan.h.df)
NA     beta.1   beta.2   beta.3    sigma   sigma_S      lp__
NA 1 37.97267 19.74559 46.25986 4.486942 11.265958 -355.4335
NA 2 43.04120 22.90106 41.98754 5.236119  9.358648 -350.9895
NA 3 41.08867 34.12170 42.33801 5.183215 12.442448 -354.5326
NA 4 50.68459 16.02613 25.41038 5.002917 10.957330 -357.6446
NA 5 47.11724 15.21674 35.28746 5.426139 11.288743 -357.4966
NA 6 43.36194 26.12687 34.56285 5.336729  8.833431 -355.4301

If you want to include finite-population standard deviations in the model you can use the following code.

rstanString4="
data{
   int n;
   int nA;
   int nSites;
   vector [n] y;
   matrix [nSites,nA] X;
   matrix [n,nSites] Z;
}

parameters{
   vector[nA] beta;
   vector[nSites] gamma;
   real<lower=0> sigma;
   real<lower=0> sigma_S;
   
}

model{
    vector [n] mu_site;
    vector [nSites] mu;

    // Priors
    beta ~ normal( 0 , 1000 );
    gamma ~ normal( 0 , 1000 );
    sigma ~ cauchy( 0 , 25 );
    sigma_S~ cauchy( 0 , 25 );

    mu_site = Z*gamma;
    y ~ normal( mu_site , sigma );
    mu = X*beta;
    gamma ~ normal(mu, sigma_S);
}

generated quantities {
    vector [n] mu_site;
    vector [nSites] mu;
    vector [n] y_err;
    real sd_y;
    vector [nSites] mu_site_err;
    real sd_site;
    real sd_A;
    
    mu_site = Z*gamma;
    y_err = mu_site - y;
    sd_y = sd(y_err);

    mu = X*beta;
    mu_site_err = mu - gamma;
    sd_site = sd(mu_site_err);

    sd_A = sd(beta);
}

"

## write the model to a text file
writeLines(rstanString4, con = "SDModel.stan")

#data list
dt.A <- ddply(data.nest,~Sites,catcolwise(unique))
X<-model.matrix(~A, dt.A)
Z<-model.matrix(~Sites-1, data.nest)
data.nest.list <- list(y=data.nest$y, X=X, Z=Z, n=nrow(data.nest),
   nSites=nrow(X),nA=ncol(X))

#parameters and chain details
params <- c('beta','sigma','sigma_S','sd_A','sd_site','sd_y')
adaptSteps = 1000
burnInSteps = 3000
nChains = 2
numSavedSteps = 3000
thinSteps = 1
nIter = burnInSteps+ceiling((numSavedSteps * thinSteps)/nChains)

data.nest.rstan.SD <- stan(data = data.nest.list, file = "SDModel.stan", chains = nChains, pars = params, iter = nIter, warmup = burnInSteps, thin = thinSteps)
NA 
NA SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
NA Chain 1: 
NA Chain 1: Gradient evaluation took 3.3e-05 seconds
NA Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.33 seconds.
NA Chain 1: Adjust your expectations accordingly!
NA Chain 1: 
NA Chain 1: 
NA Chain 1: Iteration:    1 / 4500 [  0%]  (Warmup)
NA Chain 1: Iteration:  450 / 4500 [ 10%]  (Warmup)
NA Chain 1: Iteration:  900 / 4500 [ 20%]  (Warmup)
NA Chain 1: Iteration: 1350 / 4500 [ 30%]  (Warmup)
NA Chain 1: Iteration: 1800 / 4500 [ 40%]  (Warmup)
NA Chain 1: Iteration: 2250 / 4500 [ 50%]  (Warmup)
NA Chain 1: Iteration: 2700 / 4500 [ 60%]  (Warmup)
NA Chain 1: Iteration: 3001 / 4500 [ 66%]  (Sampling)
NA Chain 1: Iteration: 3450 / 4500 [ 76%]  (Sampling)
NA Chain 1: Iteration: 3900 / 4500 [ 86%]  (Sampling)
NA Chain 1: Iteration: 4350 / 4500 [ 96%]  (Sampling)
NA Chain 1: Iteration: 4500 / 4500 [100%]  (Sampling)
NA Chain 1: 
NA Chain 1:  Elapsed Time: 0.249 seconds (Warm-up)
NA Chain 1:                0.115 seconds (Sampling)
NA Chain 1:                0.364 seconds (Total)
NA Chain 1: 
NA 
NA SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
NA Chain 2: 
NA Chain 2: Gradient evaluation took 1.1e-05 seconds
NA Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.11 seconds.
NA Chain 2: Adjust your expectations accordingly!
NA Chain 2: 
NA Chain 2: 
NA Chain 2: Iteration:    1 / 4500 [  0%]  (Warmup)
NA Chain 2: Iteration:  450 / 4500 [ 10%]  (Warmup)
NA Chain 2: Iteration:  900 / 4500 [ 20%]  (Warmup)
NA Chain 2: Iteration: 1350 / 4500 [ 30%]  (Warmup)
NA Chain 2: Iteration: 1800 / 4500 [ 40%]  (Warmup)
NA Chain 2: Iteration: 2250 / 4500 [ 50%]  (Warmup)
NA Chain 2: Iteration: 2700 / 4500 [ 60%]  (Warmup)
NA Chain 2: Iteration: 3001 / 4500 [ 66%]  (Sampling)
NA Chain 2: Iteration: 3450 / 4500 [ 76%]  (Sampling)
NA Chain 2: Iteration: 3900 / 4500 [ 86%]  (Sampling)
NA Chain 2: Iteration: 4350 / 4500 [ 96%]  (Sampling)
NA Chain 2: Iteration: 4500 / 4500 [100%]  (Sampling)
NA Chain 2: 
NA Chain 2:  Elapsed Time: 0.247 seconds (Warm-up)
NA Chain 2:                0.113 seconds (Sampling)
NA Chain 2:                0.36 seconds (Total)
NA Chain 2:
print(data.nest.rstan.SD, par = c('beta','sigma','sigma_S','sd_A','sd_site','sd_y'))
NA Inference for Stan model: anon_model.
NA 2 chains, each with iter=4500; warmup=3000; thin=1; 
NA post-warmup draws per chain=1500, total post-warmup draws=3000.
NA 
NA          mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
NA beta[1] 42.38    0.10 5.21 31.59 39.13 42.35 45.65 52.59  2861    1
NA beta[2] 27.42    0.13 7.35 12.38 22.67 27.53 32.19 42.08  3205    1
NA beta[3] 40.80    0.14 7.63 25.26 36.06 40.92 45.51 55.99  3090    1
NA sigma    5.04    0.01 0.31  4.48  4.83  5.03  5.25  5.70  3330    1
NA sigma_S 11.53    0.05 2.61  7.70  9.61 11.11 12.96 17.74  2592    1
NA sd_A    10.20    0.09 4.36  2.89  7.14  9.79 12.63 20.38  2148    1
NA sd_site 10.67    0.03 1.08  9.27  9.95 10.43 11.12 13.40  1385    1
NA sd_y     5.00    0.00 0.10  4.85  4.93  4.99  5.06  5.23  1269    1
NA 
NA Samples were drawn using NUTS(diag_e) at Mon Jul 22 12:43:55 2024.
NA For each parameter, n_eff is a crude measure of effective sample size,
NA and Rhat is the potential scale reduction factor on split chains (at 
NA convergence, Rhat=1).
data.nest.rstan.SD.df <-as.data.frame(extract(data.nest.rstan.SD))
head(data.nest.rstan.SD.df)
NA     beta.1   beta.2   beta.3    sigma  sigma_S     sd_A   sd_site     sd_y
NA 1 44.26601 29.36915 37.10326 5.296506 13.05594 7.450252  9.905245 4.890486
NA 2 42.18142 32.66489 37.72024 5.022314 12.72271 4.761357 10.325405 4.927700
NA 3 39.63720 28.41220 44.19112 4.902748  8.36598 8.121100  9.584279 4.940438
NA 4 39.76594 39.67555 47.83435 4.457138 13.66167 4.684609 11.376652 4.976266
NA 5 35.87241 34.38392 37.95117 5.496118 12.26217 1.791749 10.397791 4.946690
NA 6 38.79789 31.78442 45.31698 5.063085 10.51038 6.767781 10.070051 4.980664
NA        lp__
NA 1 -352.4128
NA 2 -352.8046
NA 3 -352.5025
NA 4 -357.9645
NA 5 -356.4320
NA 6 -354.2973

Data generation - second example

Now imagine a similar experiment in which we intend to measure a response (\(y\)) to one of treatments (three levels; “a1”, “a2” and “a3”). As with the previous design, we decided to establish a nested design in which there are sub-replicate (\(1\)m Quadrats) within each Site. In the current design, we have decided to further sub-replicate. Within each of the \(5\) Quadrats, we are going to randomly place \(2\times10\)cm pit traps. Now we have Sites nested within Treatments, Quadrats nested within Sites AND, Pits nested within Sites. The latter of these (Pits nested within Sites) are the observations (\(y\)). As this section is mainly about the generation of artificial data (and not specifically about what to do with the data), understanding the actual details are optional and can be safely skipped.

set.seed(123)
nTreat <- 3
nSites <- 15
nSitesPerTreat <- nSites/nTreat
nQuads <- 5
nPits <- 2
site.sigma <- 10 # sd within between sites within treatment
quad.sigma <- 10
sigma <- 7.5
n <- nSites * nQuads * nPits
sites <- gl(n=nSites,n/nSites,n, lab=paste("site",1:nSites))
A <- gl(nTreat, n/nTreat, n, labels=c('a1','a2','a3'))
a.means <- c(40,70,80)

#A<-gl(nTreat,nSites/nTreat,nSites,labels=c('a1','a2','a3'))
a<-gl(nTreat,1,nTreat,labels=c('a1','a2','a3'))
a.X <- model.matrix(~a, expand.grid(a))
a.eff <- as.vector(solve(a.X,a.means))
site.means <- rnorm(nSites,a.X %*% a.eff,site.sigma)

A <- gl(nTreat,nSites/nTreat,nSites,labels=c('a1','a2','a3'))
A.X <- model.matrix(~A, expand.grid(A))
#a.X <- model.matrix(~A, expand.grid(A=gl(nTreat,nSites/nTreat,nSites,labels=c('a1','a2','a3'))))
site.means <- rnorm(nSites,A.X %*% a.eff,site.sigma)

SITES <- gl(nSites,(nSites*nQuads)/nSites,nSites*nQuads,labels=paste('site',1:nSites))
sites.X <- model.matrix(~SITES-1)
quad.means <- rnorm(nSites*nQuads,sites.X %*% site.means,quad.sigma)

#SITES <- gl(nSites,1,nSites,labels=paste('site',1:nSites))
#sites.X <- model.matrix(~SITES-1)
#quad.means <- rnorm(nSites*nQuads,sites.X %*% site.means,quad.sigma)

QUADS <- gl(nSites*nQuads,n/(nSites*nQuads),n,labels=paste('quad',1:(nSites*nQuads)))
quads.X <- model.matrix(~QUADS-1)
#quads.eff <- as.vector(solve(quads.X,quad.means))
#pit.means <- rnorm(n,quads.eff %*% t(quads.X),sigma)
pit.means <- rnorm(n,quads.X %*% quad.means,sigma)

PITS <- gl(nPits*nSites*nQuads,1, n, labels=paste('pit',1:(nPits*nSites*nQuads)))
data.nest1<-data.frame(Pits=PITS,Quads=QUADS,Sites=rep(SITES,each=2), A=rep(A,each=nQuads*nPits),y=pit.means)
#data.nest1<-data.nest1[order(data.nest1$A,data.nest1$Sites,data.nest1$Quads),]
head(data.nest1)  #print out the first six rows of the data set
NA    Pits  Quads  Sites  A        y
NA 1 pit 1 quad 1 site 1 a1 61.79607
NA 2 pit 2 quad 1 site 1 a1 56.24699
NA 3 pit 3 quad 2 site 1 a1 42.40885
NA 4 pit 4 quad 2 site 1 a1 52.06672
NA 5 pit 5 quad 3 site 1 a1 73.71286
NA 6 pit 6 quad 3 site 1 a1 62.50529
ggplot(data.nest1, aes(y=y, x=1)) + geom_boxplot() + facet_grid(.~Quads)

Exploratory data analysis

Normality and Homogeneity of variance

#Effects of treatment
boxplot(y~A, ddply(data.nest1, ~A+Sites,numcolwise(mean, na.rm=T)))

#Site effects
boxplot(y~Sites, ddply(data.nest1, ~A+Sites+Quads,numcolwise(mean, na.rm=T)))

#Quadrat effects
boxplot(y~Quads, ddply(data.nest1, ~A+Sites+Quads+Pits,numcolwise(mean, na.rm=T)))

Conclusions:

  • there is no evidence that the response variable is consistently non-normal across all populations - each boxplot is approximately symmetrical.

  • there is no evidence that variance (as estimated by the height of the boxplots) differs between the five populations. More importantly, there is no evidence of a relationship between mean and variance - the height of boxplots does not increase with increasing position along the \(y\)-axis. Hence it there is no evidence of non-homogeneity.

  • it is a little difficult to assess normality/homogeneity of variance of quadrats since there are only two pits per quadrat. Nevertheless, there is no suggestion that variance increases with increasing mean.

Obvious violations could be addressed either by:

  • transform the scale of the response variables (to address normality, etc). Note transformations should be applied to the entire response variable (not just those populations that are skewed).

Model fitting

Frequentist for comparison

library(nlme)
d.lme <- lme(y ~ A, random=~1|Sites/Quads,data=data.nest1)
summary(d.lme)
NA Linear mixed-effects model fit by REML
NA   Data: data.nest1 
NA        AIC      BIC   logLik
NA   1137.994 1155.937 -562.997
NA 
NA Random effects:
NA  Formula: ~1 | Sites
NA         (Intercept)
NA StdDev:    10.38249
NA 
NA  Formula: ~1 | Quads %in% Sites
NA         (Intercept) Residual
NA StdDev:    8.441617 7.161177
NA 
NA Fixed effects:  y ~ A 
NA                Value Std.Error DF  t-value p-value
NA (Intercept) 41.38646  5.043341 75 8.206159  0.0000
NA Aa2         21.36271  7.132361 12 2.995180  0.0112
NA Aa3         39.14584  7.132361 12 5.488482  0.0001
NA  Correlation: 
NA     (Intr) Aa2   
NA Aa2 -0.707       
NA Aa3 -0.707  0.500
NA 
NA Standardized Within-Group Residuals:
NA         Min          Q1         Med          Q3         Max 
NA -2.11852502 -0.54600753 -0.03428581  0.53382436  2.26256392 
NA 
NA Number of Observations: 150
NA Number of Groups: 
NA            Sites Quads %in% Sites 
NA               15               75
anova(d.lme)
NA             numDF denDF  F-value p-value
NA (Intercept)     1    75 446.9150  <.0001
NA A               2    12  15.1037   5e-04

Full effect parameterisation

rstanString="
data{
   int n;
   int nSite;
   int nQuad;
   vector [n] y;
   int A2[n];
   int A3[n];
   int Site[n];
   int Quad[n];
}

parameters{
  real alpha0;
  real alpha2;
  real alpha3;
  real<lower=0> sigma;
  vector [nSite] beta_Site;
  real<lower=0> sigma_Site;
  vector [nQuad] beta_Quad;
  real<lower=0> sigma_Quad;
}
 
model{
    real mu[n];

    // Priors
    alpha0 ~ normal( 0 , 100 );
    alpha2 ~ normal( 0 , 100 );
    alpha3 ~ normal( 0 , 100 );
    beta_Site~ normal( 0 , sigma_Site );
    sigma_Site ~ cauchy( 0 , 25 );
    beta_Quad~ normal( 0 , sigma_Quad );
    sigma_Quad ~ cauchy( 0 , 25 );
    sigma ~ cauchy( 0 , 25 );
    
    for ( i in 1:n ) {
        mu[i] = alpha0 + alpha2*A2[i] + 
               alpha3*A3[i] + beta_Site[Site[i]] + beta_Quad[Quad[i]];
    }
    y ~ normal( mu , sigma );
}

"

## write the model to a text file
writeLines(rstanString, con = "fullModel2.stan")

A2 <- ifelse(data.nest1$A=='a2',1,0)
A3 <- ifelse(data.nest1$A=='a3',1,0)
data.nest.list <- with(data.nest1, list(y=y, A2=A2, A3=A3, Site=as.numeric(Sites),
   n=nrow(data.nest1), nSite=length(levels(Sites)),
   nQuad=length(levels(Quads)), Quad=as.numeric(Quads)))

params <- c('alpha0','alpha2','alpha3','sigma','sigma_Site', 'sigma_Quad')
burnInSteps = 3000
nChains = 2
numSavedSteps = 3000
thinSteps = 1
nIter = burnInSteps+ceiling((numSavedSteps * thinSteps)/nChains)

data.nest1.rstan.f <- stan(data = data.nest.list, file = "fullModel2.stan", chains = nChains, pars = params, iter = nIter, warmup = burnInSteps, thin = thinSteps)
NA 
NA SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
NA Chain 1: 
NA Chain 1: Gradient evaluation took 3.4e-05 seconds
NA Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.34 seconds.
NA Chain 1: Adjust your expectations accordingly!
NA Chain 1: 
NA Chain 1: 
NA Chain 1: Iteration:    1 / 4500 [  0%]  (Warmup)
NA Chain 1: Iteration:  450 / 4500 [ 10%]  (Warmup)
NA Chain 1: Iteration:  900 / 4500 [ 20%]  (Warmup)
NA Chain 1: Iteration: 1350 / 4500 [ 30%]  (Warmup)
NA Chain 1: Iteration: 1800 / 4500 [ 40%]  (Warmup)
NA Chain 1: Iteration: 2250 / 4500 [ 50%]  (Warmup)
NA Chain 1: Iteration: 2700 / 4500 [ 60%]  (Warmup)
NA Chain 1: Iteration: 3001 / 4500 [ 66%]  (Sampling)
NA Chain 1: Iteration: 3450 / 4500 [ 76%]  (Sampling)
NA Chain 1: Iteration: 3900 / 4500 [ 86%]  (Sampling)
NA Chain 1: Iteration: 4350 / 4500 [ 96%]  (Sampling)
NA Chain 1: Iteration: 4500 / 4500 [100%]  (Sampling)
NA Chain 1: 
NA Chain 1:  Elapsed Time: 3.081 seconds (Warm-up)
NA Chain 1:                2.472 seconds (Sampling)
NA Chain 1:                5.553 seconds (Total)
NA Chain 1: 
NA 
NA SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
NA Chain 2: 
NA Chain 2: Gradient evaluation took 1.9e-05 seconds
NA Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.19 seconds.
NA Chain 2: Adjust your expectations accordingly!
NA Chain 2: 
NA Chain 2: 
NA Chain 2: Iteration:    1 / 4500 [  0%]  (Warmup)
NA Chain 2: Iteration:  450 / 4500 [ 10%]  (Warmup)
NA Chain 2: Iteration:  900 / 4500 [ 20%]  (Warmup)
NA Chain 2: Iteration: 1350 / 4500 [ 30%]  (Warmup)
NA Chain 2: Iteration: 1800 / 4500 [ 40%]  (Warmup)
NA Chain 2: Iteration: 2250 / 4500 [ 50%]  (Warmup)
NA Chain 2: Iteration: 2700 / 4500 [ 60%]  (Warmup)
NA Chain 2: Iteration: 3001 / 4500 [ 66%]  (Sampling)
NA Chain 2: Iteration: 3450 / 4500 [ 76%]  (Sampling)
NA Chain 2: Iteration: 3900 / 4500 [ 86%]  (Sampling)
NA Chain 2: Iteration: 4350 / 4500 [ 96%]  (Sampling)
NA Chain 2: Iteration: 4500 / 4500 [100%]  (Sampling)
NA Chain 2: 
NA Chain 2:  Elapsed Time: 2.708 seconds (Warm-up)
NA Chain 2:                2.875 seconds (Sampling)
NA Chain 2:                5.583 seconds (Total)
NA Chain 2:
print(data.nest1.rstan.f, par = c('alpha0','alpha2','alpha3','sigma','sigma_Site', 'sigma_Quad'))
NA Inference for Stan model: anon_model.
NA 2 chains, each with iter=4500; warmup=3000; thin=1; 
NA post-warmup draws per chain=1500, total post-warmup draws=3000.
NA 
NA             mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
NA alpha0     41.57    0.14 5.72 30.27 37.98 41.42 45.26 52.74  1614    1
NA alpha2     21.23    0.21 8.21  5.01 16.18 21.30 26.33 37.63  1515    1
NA alpha3     38.58    0.19 7.88 21.98 33.85 38.54 43.57 54.66  1750    1
NA sigma       7.28    0.01 0.61  6.18  6.86  7.25  7.69  8.56  1892    1
NA sigma_Site 11.33    0.07 3.00  6.76  9.28 10.96 12.94 18.46  1937    1
NA sigma_Quad  8.60    0.03 1.16  6.53  7.80  8.54  9.34 11.06  1688    1
NA 
NA Samples were drawn using NUTS(diag_e) at Mon Jul 22 12:44:39 2024.
NA For each parameter, n_eff is a crude measure of effective sample size,
NA and Rhat is the potential scale reduction factor on split chains (at 
NA convergence, Rhat=1).
data.nest1.rstan.f.df <-as.data.frame(extract(data.nest1.rstan.f))
head(data.nest1.rstan.f.df)
NA     alpha0   alpha2   alpha3    sigma sigma_Site sigma_Quad      lp__
NA 1 42.75689 18.78058 40.83975 6.864756   9.238436   8.184025 -599.4890
NA 2 38.59912 22.03675 41.22345 7.177420   8.068274   7.236541 -601.9102
NA 3 49.30184 16.27629 25.40227 7.029283  12.095883   9.013823 -605.9202
NA 4 35.43777 24.99815 41.62737 8.155809  10.740356   9.575282 -610.5900
NA 5 46.74108 16.50691 35.65499 6.641079  10.477957   8.424534 -592.3729
NA 6 45.12399 15.84058 38.70849 6.232023  13.835963   7.301801 -610.0283

Matrix parameterisation

rstanString2="
data{
   int n;
   int nSite;
   int nQuad;
   int nA;
   vector [n] y;
   matrix [n,nA] X;
   int Site[n];
   int Quad[n];
   vector [nA] a0;
   matrix [nA,nA] A0;
}

parameters{
  vector [nA] alpha;
  real<lower=0> sigma;
  vector [nSite] beta_Site;
  real<lower=0> sigma_Site;
  vector [nQuad] beta_Quad;
  real<lower=0> sigma_Quad;
}
 
model{
    real mu[n];

    // Priors
    //alpha ~ normal( 0 , 100 );
    alpha ~ multi_normal(a0,A0);
    beta_Site ~ normal( 0 , sigma_Site );
    sigma_Site ~ cauchy( 0 , 25);
    beta_Quad ~ normal( 0 , sigma_Quad );
    sigma_Quad ~ cauchy( 0 , 25);
    sigma ~ cauchy( 0 , 25 );
    
    for ( i in 1:n ) {
        mu[i] = dot_product(X[i],alpha) + beta_Site[Site[i]] + beta_Quad[Quad[i]];
    }
    y ~ normal( mu , sigma );
}

"

## write the model to a text file
writeLines(rstanString2, con = "matrixModel2.stan")

X <- model.matrix(~A, data.nest)
nA <- ncol(X)
data.nest.list <- with(data.nest1, list(y=y, X=X, Site=as.numeric(Sites),
   Quad=as.numeric(Quads),
   n=nrow(data.nest1), nSite=length(levels(Sites)),
   nQuad=length(levels(Quads)),
   nA=nA,
   a0=rep(0,nA), A0=diag(100000,nA)))

params <- c('alpha','sigma','sigma_Site', 'sigma_Quad')
burnInSteps = 3000
nChains = 2
numSavedSteps = 3000
thinSteps = 1
nIter = burnInSteps+ceiling((numSavedSteps * thinSteps)/nChains)

data.nest1.rstan.m2 <- stan(data = data.nest.list, file = "matrixModel2.stan", chains = nChains, pars = params, iter = nIter, warmup = burnInSteps, thin = thinSteps)
NA 
NA SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 1).
NA Chain 1: 
NA Chain 1: Gradient evaluation took 4.7e-05 seconds
NA Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.47 seconds.
NA Chain 1: Adjust your expectations accordingly!
NA Chain 1: 
NA Chain 1: 
NA Chain 1: Iteration:    1 / 4500 [  0%]  (Warmup)
NA Chain 1: Iteration:  450 / 4500 [ 10%]  (Warmup)
NA Chain 1: Iteration:  900 / 4500 [ 20%]  (Warmup)
NA Chain 1: Iteration: 1350 / 4500 [ 30%]  (Warmup)
NA Chain 1: Iteration: 1800 / 4500 [ 40%]  (Warmup)
NA Chain 1: Iteration: 2250 / 4500 [ 50%]  (Warmup)
NA Chain 1: Iteration: 2700 / 4500 [ 60%]  (Warmup)
NA Chain 1: Iteration: 3001 / 4500 [ 66%]  (Sampling)
NA Chain 1: Iteration: 3450 / 4500 [ 76%]  (Sampling)
NA Chain 1: Iteration: 3900 / 4500 [ 86%]  (Sampling)
NA Chain 1: Iteration: 4350 / 4500 [ 96%]  (Sampling)
NA Chain 1: Iteration: 4500 / 4500 [100%]  (Sampling)
NA Chain 1: 
NA Chain 1:  Elapsed Time: 2.162 seconds (Warm-up)
NA Chain 1:                1.431 seconds (Sampling)
NA Chain 1:                3.593 seconds (Total)
NA Chain 1: 
NA 
NA SAMPLING FOR MODEL 'anon_model' NOW (CHAIN 2).
NA Chain 2: 
NA Chain 2: Gradient evaluation took 1.4e-05 seconds
NA Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.14 seconds.
NA Chain 2: Adjust your expectations accordingly!
NA Chain 2: 
NA Chain 2: 
NA Chain 2: Iteration:    1 / 4500 [  0%]  (Warmup)
NA Chain 2: Iteration:  450 / 4500 [ 10%]  (Warmup)
NA Chain 2: Iteration:  900 / 4500 [ 20%]  (Warmup)
NA Chain 2: Iteration: 1350 / 4500 [ 30%]  (Warmup)
NA Chain 2: Iteration: 1800 / 4500 [ 40%]  (Warmup)
NA Chain 2: Iteration: 2250 / 4500 [ 50%]  (Warmup)
NA Chain 2: Iteration: 2700 / 4500 [ 60%]  (Warmup)
NA Chain 2: Iteration: 3001 / 4500 [ 66%]  (Sampling)
NA Chain 2: Iteration: 3450 / 4500 [ 76%]  (Sampling)
NA Chain 2: Iteration: 3900 / 4500 [ 86%]  (Sampling)
NA Chain 2: Iteration: 4350 / 4500 [ 96%]  (Sampling)
NA Chain 2: Iteration: 4500 / 4500 [100%]  (Sampling)
NA Chain 2: 
NA Chain 2:  Elapsed Time: 2.231 seconds (Warm-up)
NA Chain 2:                1.191 seconds (Sampling)
NA Chain 2:                3.422 seconds (Total)
NA Chain 2:
print(data.nest1.rstan.m2, par = c('alpha','sigma','sigma_Site', 'sigma_Quad'))
NA Inference for Stan model: anon_model.
NA 2 chains, each with iter=4500; warmup=3000; thin=1; 
NA post-warmup draws per chain=1500, total post-warmup draws=3000.
NA 
NA             mean se_mean   sd  2.5%   25%   50%   75% 97.5% n_eff Rhat
NA alpha[1]   41.39    0.16 5.62 29.81 38.00 41.41 44.96 52.01  1183    1
NA alpha[2]   21.23    0.21 7.88  5.61 16.10 21.25 26.34 36.70  1399    1
NA alpha[3]   39.05    0.22 8.01 23.57 34.01 39.05 44.01 54.87  1303    1
NA sigma       7.29    0.01 0.61  6.23  6.87  7.26  7.68  8.54  1638    1
NA sigma_Site 11.44    0.08 3.14  6.76  9.34 11.00 12.98 18.57  1509    1
NA sigma_Quad  8.56    0.03 1.13  6.54  7.76  8.48  9.29 10.93  1704    1
NA 
NA Samples were drawn using NUTS(diag_e) at Mon Jul 22 12:45:19 2024.
NA For each parameter, n_eff is a crude measure of effective sample size,
NA and Rhat is the potential scale reduction factor on split chains (at 
NA convergence, Rhat=1).
data.nest1.rstan.m2.df <-as.data.frame(extract(data.nest1.rstan.m2))
head(data.nest1.rstan.m2.df)
NA    alpha.1  alpha.2  alpha.3    sigma sigma_Site sigma_Quad      lp__
NA 1 45.46429 15.67815 32.24059 8.029760  15.517948   8.471629 -611.0464
NA 2 43.55070 13.67668 34.26886 8.049030  12.398252   6.753086 -605.1953
NA 3 46.52918 16.82928 35.08262 8.001375  14.637084   6.799795 -613.9694
NA 4 47.98525 21.02618 28.33026 6.677800   8.867418   9.866074 -604.2998
NA 5 50.02583 18.74219 29.08627 6.694833  14.597826   6.932240 -593.2225
NA 6 40.88942 22.87433 39.15047 8.547485   9.473757   8.468906 -599.5212

References

Gelman, Andrew et al. 2006. “Prior Distributions for Variance Parameters in Hierarchical Models (Comment on Article by Browne and Draper).” Bayesian Analysis 1 (3): 515–34.
Gelman, Andrew, Daniel Lee, and Jiqiang Guo. 2015. “Stan: A Probabilistic Programming Language for Bayesian Inference and Optimization.” Journal of Educational and Behavioral Statistics 40 (5): 530–43.
Stan Development Team. 2018. RStan: The R Interface to Stan.” http://mc-stan.org/.