This tutorial will focus on the use of Bayesian estimation to fit simple linear regression models …
Keywords
Software, Statistics, Stan
This tutorial will focus on the use of Bayesian estimation to fit simple linear regression models. BUGS (Bayesian inference Using Gibbs Sampling) is an algorithm and supporting language (resembling R) dedicated to performing the Gibbs sampling implementation of Markov Chain Monte Carlo (MCMC) method. Dialects of the BUGS language are implemented within three main projects:
OpenBUGS - written in component pascal.
JAGS - (Just Another Gibbs Sampler) - written in C++.
STAN - a dedicated Bayesian modelling framework written in C++ and implementing Hamiltonian MCMC samplers.
Whilst the above programs can be used stand-alone, they do offer the rich data pre-processing and graphical capabilities of R, and thus, they are best accessed from within R itself. As such there are multiple packages devoted to interfacing with the various software implementations:
R2OpenBUGS - interfaces with OpenBUGS
R2jags - interfaces with JAGS
rstan - interfaces with STAN
This tutorial will demonstrate how to fit models in JAGS (Plummer (2004)) using the package R2jags (Su et al. (2015)) as interface, which also requires to load some other packages.
Overview
Item response theory (IRT) is a paradigm for investigating the relationship between an individual’s response to a single test item and their performance on an overall measure of the ability or trait that item was intended to measure. Many models exist in the IRT field for evaulating how well an item captures an underlying latent trait, but some of the most popular IRT models are logistic IRT models for dichotmous responses. In particular, the main types of models are:
1 1 parameter logistic model
2 2 parameter logistic model
3 3 parameter logistic model
Throughout this tutorial, I assume that the reader has some basic understanding of IRT model and working knowledge of a software implementation of the JAGS language. However, if this is not the case, excellent sources for learning IRT are Baker and Kim (2004), who provide a mathematically detailed introduction to IRT, and Hambleton, Swaminathan, and Rogers (1991), who give an intuitive introduction to the topic. For an in-depth description of how to implement different types of IRT models in OpenBUGS and JAGS, I also refer to this very nice review of Curtis et al. (2010) and this other online tutorial.
At the core of all the IRT models presented in this tutorial is the Item Response Function (IRF). The IRF estimates the probability of getting item \(j\) “correct” as a function of item characteristics and the \(i\)-th individual’s latent trait/ability level (\(\theta_i\)). These item response functions are defined by a logistic curve (i.e. an \(S\)-shape from \(0-1\)).
1 parameter logistic model (1PLM)
The 1PLM is used for data collected on \(n\) individuals who have each given responses on \(p\) different items. The items have binary outcomes, i.e. the items are scored as \(1\) if correct and \(0\) if not. The \(i\)-th individual in the sample is assumed to have a latent ability \(\theta_i\), and the \(i\)-th individual’s response on the \(j\)-th item is a random variable \(Y_{ij}\) with a Bernoulli distribution. The probability that the \(i\)-th individual correctly answers the \(j\)-th item (i.e. the probability that \(Y_{ij} = 1\)) is assumed to have the following IRF form:
where \(\delta_j\) is the difficulty parameter for item \(j\) of the test, and is assumed to be normally distributed according to some mean \(\mu_{\delta}\) and standard deviation \(\sigma_{\delta}\) which must be specified by the analyst. Each latent ability parameter\(\theta_i\) is also assumed to be distributed according to a standard normal distribution.
Load the data
I read in the data from the file wideformat.csv, which contains (simulated) data from \(n=1000\) individuals taking a \(5\)-item test. Items are coded \(1\) for correct and \(0\) for incorrect responses. When we get descriptives of the data, we see that the items differ in terms of the proportion of people who answered correctly, so we expect that we have some differences in item difficulty here.
Start the JAGS model (check the model, load data into the model, specify the number of chains and compile the model). Run the JAGS code via the R2jags interface and the jags function.
NA Compiling model graph
NA Resolving undeclared variables
NA Allocating nodes
NA Graph information:
NA Observed stochastic nodes: 5000
NA Unobserved stochastic nodes: 6007
NA Total graph size: 26016
NA
NA Initializing model
Plot the item characteristic curves
Item characteristic curves (ICC) are the logistic curves which result from the fitted models (e.g. estimated item difficulty, plugged into the item response function). Latent trait/ability is plotted on the \(x\)-axis (higher values represent hight ability). Probability of a “correct” answer (\(Y_{ij}=1\)) to an item is plotted on the \(y\)-axis.
#see average value of item difficultydiff<-m1.r2jags$BUGSoutput$sims.list$deltaapply(diff,2,mean)
NA [1] -2.8583972 -1.0620835 -0.2608953 -1.3853516 -2.2159276
#plot icc for each individual with respect to each of the 5 itemstheta<-apply(m1.r2jags$BUGSoutput$sims.list$theta, 2, mean)prob<-apply(m1.r2jags$BUGSoutput$sims.list$prob,c(2,3),mean)plot(theta,prob[,1], type ="n", ylab ="probability of correct response", xlab="ability",xlim =c(-2.5,1), ylim =c(0,1))lines(theta,prob[,1],col="red")lines(theta,prob[,2],col="blue")lines(theta,prob[,3],col="orange")lines(theta,prob[,4],col="green")lines(theta,prob[,5],col="black")legend("bottomright",legend =c("1","2","3","4","5"), lty =c(1), col=c("red","blue","orange","green","black"), bty ="n", cex =0.5)
We see that item \(3\) is the most difficult item (it’s curve is farthest to the right), and item \(1\) is the easiest (it’s curve is farthest to the left). The same conclusions can be drawn by checking the difficulty estimates above.
Plot the item information curves
Item information curves (IIC) show how much “information” about the latent trait ability an item gives. Mathematically, these are the \(1\)st derivatives of the ICCs or, equivalently, to the product of the probability of correct and incorrect response. Item information curves peak at the difficulty value (point where the item has the highest discrimination), with less information at ability levels farther from the difficulty estimate. Practically speaking, we can see how a very difficult item will provide very little information about persons with low ability (because the item is already too hard), and very easy items will provide little information about persons with high ability levels.
#plot iic for each individual with respect to each of the 5 itemsneg_prob<-1-probinformation<-prob*neg_probplot(theta,information[,1], type ="n", ylab ="information", xlab="ability",xlim =c(-2.5,1), ylim =c(0,0.3))lines(theta,information[,1],col="red")lines(theta,information[,2],col="blue")lines(theta,information[,3],col="orange")lines(theta,information[,4],col="green")lines(theta,information[,5],col="black")legend("bottomleft",legend =c("1","2","3","4","5"), lty =c(1), col=c("red","blue","orange","green","black"), bty ="n", cex =0.5)
Similar to the ICCs, we see that item \(3\) provides the most information about high ability levels (the peak of its IIC is farthest to the right) and item \(1\) and \(5\) provides the most information about lower ability levels (the peak of its IIC is farthest to the left). We have seen that all ICCs and IICs for the items have the same shape in the 1PL model (i.e. all items are equally good at providing information about the latent trait). In the 2PL and 3PL models, we will see that this does not have to be the case.
Next, we plot the information curve for the whole test. This is simply the sum of the individual IICs above. Ideally, we want a test which provides fairly good covereage of a wide range of latent ability levels. Otherwise, the test is only good at identifying a limited range of ability levels.
#plot iic for each individual with respect to whole testinformation_test<-apply(information,1,sum)plot(theta,information_test, type ="n", ylab ="information (test)", xlab="ability",xlim =c(-2.5,1), ylim =c(0,1.5))lines(theta,information_test,col="black",lty=2)
summary(information_test)
NA Min. 1st Qu. Median Mean 3rd Qu. Max.
NA 0.5472 0.5710 0.7669 0.7756 0.9305 1.0786
We see that this test provides the most information about low ability levels (the peak is around ability level \(-1.5\)), and less information about very high ability levels.
Assess fit
We perform posterior predictive checks to test whether individual items fit the 1PLM by comparing quantities computed from the predictions of the model with those from the observed data. If these match reasonably well, then there is indication that the model has a good fit.
library(bayesplot)library(ggplot2)Y.rep<-m1.r2jags$BUGSoutput$sims.list$Y.rep#Bar plot of y with yrep medians and uncertainty intervals superimposed on the barsppc_bars(Y[,1],Y.rep[1:8,,1]) +ggtitle("Item 1")
ppc_bars(Y[,2],Y.rep[1:8,,2]) +ggtitle("Item 2")
ppc_bars(Y[,3],Y.rep[1:8,,3]) +ggtitle("Item 3")
ppc_bars(Y[,4],Y.rep[1:8,,4]) +ggtitle("Item 4")
ppc_bars(Y[,5],Y.rep[1:8,,5]) +ggtitle("Item 5")
Plot ability scores
We can conclude by summarising and plotting the latent ability scores of the participants
#summary stats for theta across both iterations and individualssummary(theta)
NA Min. 1st Qu. Median Mean 3rd Qu. Max.
NA -2.074587 -0.483828 0.078138 0.001189 0.692185 0.763989
#histogram and kernel density plot of theta averaged across iterationsdens.theta<-density(theta, bw=0.3)hist(theta, breaks =5, prob = T)lines(dens.theta, lwd=2, col="red")
We see that the mean of ability scores is around \(0\), and the standard deviation about \(1\) (these are estimated ability scores are standardised).
2 parameter logistic model (2PLM)
In the 2PLM, the probability that the \(i\)-th individual correctly answers the \(j\)-th item (i.e. the probability that \(Y_{ij} = 1\)) is assumed to have the following IRF form:
where \(\alpha_j\) is the discrimination parameter for item \(j\) of the test, and is assumed to be positive and normally distributed (truncated above zero) according to some mean \(\mu_{\alpha}\) and standard deviation \(\sigma_{\alpha}\) which must be specified by the analyst. The item discriminability \(\alpha_j\) indicates how well an item is able to discriminate between persons with different ability levels. Item discriminability is reflected in the steepness of the slope of the ICC.
Fit the model
We fit the 2PLM to the data using the following JAGS code.
model2<-"model { for (i in 1:n){ for (j in 1:p){ Y[i , j] ~ dbern ( prob [i , j]) logit ( prob [i , j]) <- alpha [j] * ( theta [i] - delta [j]) } theta [i] ~ dnorm (0.0 , 1.0) } for (j in 1:p){ delta [j] ~ dnorm (m.delta , pr.delta ) alpha [j] ~ dnorm (m.alpha , pr.alpha ) T(0 , ) } pr.delta <- pow(s.delta , -2) pr.alpha <- pow(s.alpha , -2) m.delta ~ dnorm(0,5) m.alpha ~ dnorm(0,5) s.delta ~ dunif(0,10) s.alpha ~ dunif(0,10) for(i in 1:n){ for(j in 1:p){ Y.rep[i, j] ~ dbern(prob[i, j]) loglik_y[i, j]<-logdensity.bern(Y[i,j], prob[i, j]) } }}"## write the model to a text filewriteLines(model2, con ="model2PLM.txt")
Next, I define the nodes (parameters and derivatives) to monitor and the chain parameters.
Start the JAGS model (check the model, load data into the model, specify the number of chains and compile the model). Run the JAGS code via the R2jags interface and the jags function.
NA Compiling model graph
NA Resolving undeclared variables
NA Allocating nodes
NA Graph information:
NA Observed stochastic nodes: 5000
NA Unobserved stochastic nodes: 6014
NA Total graph size: 31024
NA
NA Initializing model
Plot the item characteristic curves
Item characteristic curves (ICC) are the logistic curves which result from the fitted models (e.g. estimated item difficulty, plugged into the item response function). Latent trait/ability is plotted on the \(x\)-axis (higher values represent hight ability). Probability of a “correct” answer (\(Y_{ij}=1\)) to an item is plotted on the \(y\)-axis.
discr<-m2.r2jags$BUGSoutput$sims.list$alphadiff<-m2.r2jags$BUGSoutput$sims.list$delta#see average value of item difficultyapply(diff,2,mean)
NA [1] -3.4898494 -1.4275914 -0.3210807 -1.8517091 -3.0020341
#see average value of item discriminabilityapply(discr,2,mean)
NA [1] 0.8300819 0.7204114 0.7766516 0.7243196 0.7142145
#plot icc for each individual with respect to each of the 5 itemstheta<-apply(m2.r2jags$BUGSoutput$sims.list$theta, 2, mean)prob<-apply(m2.r2jags$BUGSoutput$sims.list$prob,c(2,3),mean)plot(theta,prob[,1], type ="n", ylab ="probability of correct response", xlab="ability",xlim =c(-2.5,1), ylim =c(0,1))lines(theta,prob[,1],col="red")lines(theta,prob[,2],col="blue")lines(theta,prob[,3],col="orange")lines(theta,prob[,4],col="green")lines(theta,prob[,5],col="black")legend("bottomright",legend =c("1","2","3","4","5"), lty =c(1), col=c("red","blue","orange","green","black"), bty ="n", cex =0.5)
Unlike the ICCs for the 1PLM, the ICCs for the 2PLM do not all have the same shape. Item curves which are more “spread out” indicate lower discriminability (i.e. that individuals of a range of ability levels have some probability of getting the item correct). Compare this to an item with high discriminability (steep slope): for this item, we have a better estimate of the individual’s latent ability based on whether they got the question right or wrong. Because of the differing slopes, the rank-order of item difficulty changes across different latent ability levels. We can see that item \(3\) is still the most difficult item (i.e. lowest probability of getting correct for most latent trait values, up until about \(\theta=0.2\)). Items \(1\) and \(5\) are the easiest.
Plot the item information curves
Item information curves (IIC) show how much “information” about the latent trait ability an item gives. Mathematically, these are the \(1\)st derivatives of the ICCs or, equivalently, to the product of the probability of correct and incorrect response. Item information curves peak at the difficulty value (point where the item has the highest discrimination), with less information at ability levels farther from the difficulty estimate. Practially speaking, we can see how a very difficult item will provide very little information about persons with low ability (because the item is already too hard), and very easy items will provide little information about persons with high ability levels.
#plot iic for each individual with respect to each of the 5 itemsneg_prob<-1-probinformation<-prob*neg_probinformation2<-information*(apply(discr,2,mean))^2plot(theta,information2[,1], type ="n", ylab ="information", xlab="ability",xlim =c(-2.5,1), ylim =c(0,0.3))lines(theta,information2[,1],col="red")lines(theta,information2[,2],col="blue")lines(theta,information2[,3],col="orange")lines(theta,information2[,4],col="green")lines(theta,information2[,5],col="black")legend("bottomleft",legend =c("1","2","3","4","5"), lty =c(1), col=c("red","blue","orange","green","black"), bty ="n", cex =0.5)
The item IICs demonstrate that some items provide more information about latent ability for different ability levels. The higher the item discriminability estimate, the more information an item provides about ability levels around the point where there is a \(50\%\) chance of getting the item right (i.e. the steepest point in the ICC slope). For example, item \(3\) (orange) clearly provides the most information at high ability levels, around \(\theta=-0.5\), but almost no information about low ability levels (\(< -1\)) because the item is already too hard for those participants. In contrast, item \(1\) (red), which has low discriminability, doesn’t give very much information overall, but covers a wide range of ability levels.
Next, we plot the item information curve for the whole test. This is the sum of all the item IICs above.
#plot iic for each individual with respect to whole testinformation_test<-apply(information2,1,sum)plot(theta,information_test, type ="n", ylab ="information (test)", xlab="ability",xlim =c(-2.5,1), ylim =c(0,1.5))lines(theta,information_test,col="black",lty=2)
summary(information_test)
NA Min. 1st Qu. Median Mean 3rd Qu. Max.
NA 0.3242 0.3973 0.4500 0.4541 0.5182 0.7528
The IIC for the whole test shows that the test provides the most information for slightly-lower-than average ability levels (about \(\theta=-1\)), but does not provide much information about extremely high ability levels.
Assess fit
Next, we check how well the 2PLM fits the data.
Y.rep<-m2.r2jags$BUGSoutput$sims.list$Y.rep#Bar plot of y with yrep medians and uncertainty intervals superimposed on the barsppc_bars(Y[,1],Y.rep[1:8,,1]) +ggtitle("Item 1")
ppc_bars(Y[,2],Y.rep[1:8,,2]) +ggtitle("Item 2")
ppc_bars(Y[,3],Y.rep[1:8,,3]) +ggtitle("Item 3")
ppc_bars(Y[,4],Y.rep[1:8,,4]) +ggtitle("Item 4")
ppc_bars(Y[,5],Y.rep[1:8,,5]) +ggtitle("Item 5")
We can also compare the fit of the 1PLM and 2PLM using relative measures of fit or information criteria. These are computed based on the deviance and a penalty for model complexity called the effective number of parameters \(p\). Here we consider two Bayesian measures known as the Widely Applicable (WAIC) and Leave One Out (LOOIC) Information Criterion, which can be easily obtained through the functions waic and loo in the package loo.
Both criteria suggest that the 2PLM has a slightly better fit to the data.
Plot ability scores
Plot the density curve of the estimated ability scores
#summary stats for theta across both iterations and individualssummary(theta)
NA Min. 1st Qu. Median Mean 3rd Qu. Max.
NA -1.917888 -0.419394 0.078240 0.001822 0.608518 0.677354
#histogram and kernel density plot of theta averaged across iterationsdens.theta<-density(theta, bw=0.3)hist(theta, breaks =5, prob = T)lines(dens.theta, lwd=2, col="red")
We see that the mean of ability scores is around \(0\), and the standard deviation about \(1\) (these are estimated ability scores are standardised).
3 parameter logistic model (3PLM)
In the 3PLM, the probability that the \(i\)-th individual correctly answers the \(j\)-th item (i.e. the probability that \(Y_{ij} = 1\)) is assumed to have the following IRF form:
where \(\eta_j\) is the guessing parameter. Under this model, individuals with zero ability have a nonzero chance of endorsing any item, just by guessing randomly. The guessing parameter is reflected in the \(y\)-intercept (i.e. probability) of the ICC. The parameter is normally distributed according to some mean \(\mu_{\eta}\) and standard deviation \(\sigma_{\eta}\) which must be specified by the analyst.
Fit the model
We fit the 3PLM to the data using the following JAGS code.
model3<-"model { for (i in 1:n){ for (j in 1:p){ Y[i , j] ~ dbern ( prob [i , j]) logit ( prob.star [i , j]) <- alpha [j]*( theta [i] - delta [j]) prob [i , j] <- eta[j ] + (1 - eta[j]) * prob.star [i , j] } theta [i] ~ dnorm (0.0 , 1.0) } for (j in 1:p){ delta [j] ~ dnorm (m.delta , pr.delta ) alpha [j] ~ dnorm (m.alpha , pr.alpha ) T(0 , ) eta[j] ~ dbeta (a.eta , b.eta) } pr.delta <- pow(s.delta , -2) pr.alpha <- pow(s.alpha , -2) m.delta ~ dnorm(0,5) m.alpha ~ dnorm(0,5) s.delta ~ dunif(0,10) s.alpha ~ dunif(0,10) a.eta ~ dunif(0,100) b.eta ~ dunif(0,100) for(i in 1:n){ for(j in 1:p){ Y.rep[i, j] ~ dbern(prob[i, j]) loglik_y[i, j]<-logdensity.bern(Y[i,j], prob[i, j]) } }}"## write the model to a text filewriteLines(model3, con ="model3PLM.txt")
Next, I define the nodes (parameters and derivatives) to monitor and the chain parameters.
Start the JAGS model (check the model, load data into the model, specify the number of chains and compile the model). Run the JAGS code via the R2jags interface and the jags function.
NA Compiling model graph
NA Resolving undeclared variables
NA Allocating nodes
NA Graph information:
NA Observed stochastic nodes: 5000
NA Unobserved stochastic nodes: 6021
NA Total graph size: 41037
NA
NA Initializing model
Plot the item characteristic curves
Item characteristic curves (ICC) are the logistic curves which result from the fitted models (e.g. estimated item difficulty, plugged into the item response function). Latent trait/ability is plotted on the \(x\)-axis (higher values represent hight ability). Probability of a “correct” answer (\(Y_{ij}=1\)) to an item is plotted on the \(y\)-axis.
discr<-m3.r2jags$BUGSoutput$sims.list$alphadiff<-m3.r2jags$BUGSoutput$sims.list$deltagues<-m3.r2jags$BUGSoutput$sims.list$eta#see average value of item difficultyapply(diff,2,mean)
NA [1] -2.8361776 -0.6672983 0.3323608 -1.0648926 -2.1758250
#see average value of item discriminabilityapply(discr,2,mean)
NA [1] 0.9249227 1.1278880 1.4997368 0.8733855 0.8561385
#see average value of item guessingapply(gues,2,mean)
NA [1] 0.2446947 0.2360422 0.2150625 0.2438348 0.2395731
#plot icc for each individual with respect to each of the 5 itemstheta<-apply(m3.r2jags$BUGSoutput$sims.list$theta, 2, mean)prob<-apply(m3.r2jags$BUGSoutput$sims.list$prob,c(2,3),mean)plot(theta,prob[,1], type ="n", ylab ="probability of correct response", xlab="ability",xlim =c(-2.5,1), ylim =c(0,1))lines(theta,prob[,1],col="red")lines(theta,prob[,2],col="blue")lines(theta,prob[,3],col="orange")lines(theta,prob[,4],col="green")lines(theta,prob[,5],col="black")legend("bottomright",legend =c("1","2","3","4","5"), lty =c(1), col=c("red","blue","orange","green","black"), bty ="n", cex =0.5)
The slopes of the ICCs look very similar to those of the 2PLM. We can see that all items have \(y\)-intercepts greater than zero, so that even at very low ability levels, there is some chance of getting these items correct (via guessing).
Plot the item information curves
Item information curves (IIC) show how much “information” about the latent trait ability an item gives. Mathematically, these are the \(1\)st derivatives of the ICCs or, equivalently, to the product of the probability of correct and incorrect response. Item information curves peak at the difficulty value (point where the item has the highest discrimination), with less information at ability levels farther from the difficulty estimate. Practially speaking, we can see how a very difficult item will provide very little information about persons with low ability (because the item is already too hard), and very easy items will provide little information about persons with high ability levels.
Here I plot the IICs using points, rather than lines, to better display the patterns of the individuals, which vary substantially according to whether the item was correctly chosen due to chance or not.
#plot iic for each individual with respect to each of the 5 itemsneg_prob<-1-probinformation.p1<-neg_prob/probinformation.p2<-(prob-apply(gues,2,mean))^2/(1-apply(gues,2,mean))^2information3<-(apply(discr,2,mean))^2*(information.p2)*(information.p1)plot(theta,information3[,1], type ="n", ylab ="information", xlab="ability",xlim =c(-2.5,1), ylim =c(0,0.7))points(theta,information3[,1],col="red")points(theta,information3[,2],col="blue")points(theta,information3[,3],col="orange")points(theta,information3[,4],col="green")points(theta,information3[,5],col="black")legend("bottomleft",legend =c("1","2","3","4","5"), lty =c(1), col=c("red","blue","orange","green","black"), bty ="n", cex =0.5)
Next, we plot the item information curve for the whole test. This is the sum of all the item IICs above.
#plot iic for each individual with respect to whole testinformation_test<-apply(information3,1,sum)plot(theta,information_test, type ="n", ylab ="information (test)", xlab="ability",xlim =c(-2.5,1), ylim =c(0,1.5))points(theta,information_test,col="black",lty=2)
summary(information_test)
NA Min. 1st Qu. Median Mean 3rd Qu. Max.
NA 0.3793 0.4489 0.5062 0.6978 0.7956 1.5096
Assess fit
Next, we check how well the 3PLM fits the data.
Y.rep<-m3.r2jags$BUGSoutput$sims.list$Y.rep#Bar plot of y with yrep medians and uncertainty intervals superimposed on the barsppc_bars(Y[,1],Y.rep[1:8,,1]) +ggtitle("Item 1")
ppc_bars(Y[,2],Y.rep[1:8,,2]) +ggtitle("Item 2")
ppc_bars(Y[,3],Y.rep[1:8,,3]) +ggtitle("Item 3")
ppc_bars(Y[,4],Y.rep[1:8,,4]) +ggtitle("Item 4")
ppc_bars(Y[,5],Y.rep[1:8,,5]) +ggtitle("Item 5")
We can also compare the fit of the 1PLM, 2PLM and 3PLM using relative measures of fit or information criteria. These are computed based on the deviance and a penalty for model complexity called the effective number of parameters \(p\). Here we consider two Bayesian measures known as the Widely Applicable (WAIC) and Leave One Out (LOOIC) Information Criterion, which can be easily obtained through the functions waic and loo in the package loo.
Both criteria suggest that both 1PLM and 2PLM have a better fit to the data than 3PLM.
Plot ability scores
Plot the density curve of the estimated ability scores
#summary stats for theta across both iterations and individualssummary(theta)
NA Min. 1st Qu. Median Mean 3rd Qu. Max.
NA -1.7435119 -0.4170588 0.0471025 0.0006053 0.6298597 0.7189831
#histogram and kernel density plot of theta averaged across iterationsdens.theta<-density(theta, bw=0.3)hist(theta, breaks =5, prob = T)lines(dens.theta, lwd=2, col="red")
We see that the mean of ability scores is around \(0\), and the standard deviation about \(1\) (these are estimated ability scores are standardised).
Conclusions
The use of JAGS software to estimate IRT models allows the user to alter existing code to fit new variations of current models that cannot be fit in existing software packages. For example, longitudinal or multilevel data can easily be accommodated by small changes to existing JAGS code. The JAGS software takes care of the “grunt work” involved in estimating model parameters by constructing an MCMC algorithm to sample from the posterior distribution. Using JAGS frees the user to experiment with different models that may be more appropriate for specialised data than the models that can currently be fit in other software packages. Of course, more complicated models involve more parameters than simpler models, and the analyst must specify prior distributions for these new parameters. This is a small price to pay, however, for the flexibility that the Bayesian framework and JAGS software provide.
References
Baker, Frank B, and Seock-Ho Kim. 2004. Item Response Theory: Parameter Estimation Techniques. CRC Press.
Curtis, S McKay et al. 2010. “BUGS Code for Item Response Theory.”Journal of Statistical Software 36 (1): 1–34.
Hambleton, Ronald K, Hariharan Swaminathan, and H Jane Rogers. 1991. Fundamentals of Item Response Theory. Sage.
Plummer, Martyn. 2004. “JAGS: Just Another Gibbs Sampler.”
Su, Yu-Sung, Masanao Yajima, Maintainer Yu-Sung Su, and JAGS SystemRequirements. 2015. “Package ‘R2jags’.”R Package Version 0.03-08, URL Http://CRAN. R-Project. Org/Package= R2jags.