November 2020
If all the nuts you ever have to tighten are the same one or two sizes then you buy the best spanners in those sizes. You don’t use a monkey wrench that needs to be set each time. But if you’re encountering different sizes of nuts, fittings, and other objects that need tightening then you get a monkey wrench. In statistical modelling, Stan is the monkey wrench.
According to https://mc-stan.org/: “Stan is a state-of-the-art platform for statistical modelling and high-performance statistical computation.”
You type out your model in the Stan language and then Stan does all the work, including simulating the whole posterior distribution (MCMC), approximating the posterior (variational inference), and/or maximum likelihood estimation (optimization).
Suppose you are doing a regression, but the data raise complications: heavy tails, censored values, a scale that changes with the observations, a possible change point, and so on.
While tools exist for dealing with each of these issues, each tool has its own notation and limitations, and few can handle any of the other issues at the same time.
With Stan you can handle all these issues at the same time.
The Stan language is intuitive and sensible, but you will still need to look up most things the first time you want to use them.
The key bookmark is https://mc-stan.org/users/documentation/ which has everything you need.
I find myself visiting the Stan Functions Reference often to check the exact notation for the distribution I want to use.
For everything else, a standard internet search gets you very far. Stan is popular free software, so it has a large and active user base ready to help.
There are many helper packages that can do a lot of (or all) the work for you, including creating the model code for you to play with.
Stan currently has interfaces for R, Python, and C++, among others. We will focus on rstan as it’s the easiest for most statisticians.
There are many ways to use Stan in R; I will use Rmarkdown to demonstrate.
I use Rmarkdown for almost everything, including teaching, assessment, consultation, some research, and even making presentations like this one.
To use Rmarkdown you open RStudio, click File -> New File -> R Markdown…
To put in a Stan model you click on the Insert button at the top right of the code window, then click Stan.
Next is a very basic Stan model as it would appear in RStudio. Notice the play button in the top right corner of the Stan chunk, which compiles and runs the model.
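A minimal sketch of such a model (written to match the data list `x`, `n` and the parameter names `v`, `s`, `mu` that appear in the output later; the exact code on the original slide may differ):

```stan
// Student-t model for observed data x (a sketch, not the original slide's code)
data {
  int<lower=1> n;          // number of observations
  vector[n] x;             // observed values
}
parameters {
  real<lower=0> v;         // degrees of freedom
  real<lower=0> s;         // scale
  real mu;                 // location
}
model {
  x ~ student_t(v, mu, s); // likelihood; parameters get implicit flat priors
}
```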
The critical components are:

- `data`: declare the inputs you will pass from R
- `parameters`: declare the unknowns to be estimated
- `model`: specify the priors and likelihood
The optional components are:

- `functions`: your own custom functions
- `transformed data`: quantities computed once from the data
- `transformed parameters`: quantities recomputed from the parameters at every step
- `generated quantities`: post-processing, such as predictions or pointwise log-likelihoods
To be able to use rstan the way I do, you need a short setup chunk at the top of your document, along the lines shown below.
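A sketch of such a setup (the exact chunk isn't shown here; in particular, `mycores`, which the later code uses, is assumed to be defined this way):

```r
library(rstan)                       # R interface to Stan
rstan_options(auto_write = TRUE)     # cache compiled models between runs
mycores <- parallel::detectCores()   # cores for parallel chains (assumed definition)
options(mc.cores = mycores)          # let rstan use them by default
```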
Let’s apply the t model to some data.
```r
library(openxlsx)
sourcedata <- read.xlsx("IBM19891998.xlsx")     # daily IBM data
lret <- sourcedata$LogReturns                   # log returns
dates <- sourcedata$year + sourcedata$month/12 + sourcedata$day/365
library(viridisLite)
cols <- viridis(3)
par(mfrow = c(2, 1), mar = c(4, 4, 2, 0.2))
plot(dates, lret, type = "l", col = cols[1], lwd = 1,
     xlab = "Date", ylab = "Log return", main = "IBM")
hist(lret, 50, col = cols[2], density = 20,
     ylab = "Frequency", xlab = "Log return", main = "IBM")
```
```r
stan_data <- list(x = lret, n = length(lret))    # Create the data list
fitML1 <- optimizing(studentt, stan_data)        # ML estimation ('studentt' is the compiled model from the Stan chunk)
# This is the main simulation function:
fitBayes1 <- sampling(studentt, stan_data, chains = mycores)
# Type ?sampling for a lot of additional options
saveRDS(fitBayes1, "studenttIBM.Rds")            # Save the model fit
summary(fitBayes1)$summary                       # See results of simulation
traceplot(fitBayes1)                             # See simulation paths to check for stability
list_of_draws1 <- extract(fitBayes1)             # Extract simulations
```
| parameter | mean | se_mean | sd | 2.5% | 97.5% | Rhat |
|---|---|---|---|---|---|---|
| v | 4.263 | 0.01 | 0.371 | 3.604 | 5.067 | 1.000 |
| s | 0.013 | 0.00 | 0.000 | 0.012 | 0.013 | 1.002 |
| mu | 0.000 | 0.00 | 0.000 | 0.000 | 0.001 | 1.000 |
The Rhat measure should always be close to 1.
Rhat compares the variation between chains to the variation within chains. If these are the same then that is a good indication of convergence in distribution.
On the trace plot you should see the same patterns on the left as on the right to indicate that you have done enough burn-in.
You should also see the same sort of patterns in each chain.
When the chains are viewed together they should cover the whole posterior distribution.
```r
# Posterior predictive distribution: one simulated future value per posterior draw
postpred1 <- rt(length(list_of_draws1$mu), list_of_draws1$v) * list_of_draws1$s +
  list_of_draws1$mu
par(mar = c(4.4, 4.4, 0.4, 0.4))
plot(density(postpred1), xlab = "Future values", ylab = "Posterior density",
     main = "", col = "green", lwd = 2)
```
We don’t have to stick to standard distributions in Stan.
For example, we could use a better prior on the degrees of freedom (provided by Prof Abrie van der Merwe):
\[p(\nu)\propto \left[\psi'(0.5\nu)-\psi'(0.5(\nu+1))-2(\nu+3)\nu^{-1}(\nu+1)^{-2}\right]^{0.5}\]
where \(\psi'\) is the trigamma function.
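In Stan such a prior can be added directly to the log posterior with `target +=`. A sketch, reusing the t model's parameter `v` and Stan's built-in `trigamma` function:

```stan
model {
  x ~ student_t(v, mu, s);
  // Custom prior on the degrees of freedom (log scale):
  // p(v) propto [trigamma(v/2) - trigamma((v+1)/2) - 2(v+3)/(v(v+1)^2)]^0.5
  target += 0.5 * log(trigamma(0.5 * v) - trigamma(0.5 * (v + 1))
                      - 2 * (v + 3) / (v * square(v + 1)));
}
```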
A standard regression model is just fitting a normal distribution with a linear function for the mean parameter. From Part 1.1 of the Stan user manual:
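That example reads:

```stan
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;              // intercept
  real beta;               // slope
  real<lower=0> sigma;     // error scale
}
model {
  y ~ normal(alpha + beta * x, sigma);
}
```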
But with Stan we can fit any functions on any parameters (within reason, obviously; simpler is better).
Opportunistic potatoes are a pest that affects farms in many negative ways and needs to be controlled. This experiment attempts to find the economically optimal dose of a specific treatment for this purpose.
In the original experiment 8 measures were taken for various cultivars and doses, but we will restrict ourselves to only one measure and combine the cultivars for this discussion.
This research was done by Talana Cronje, UFS, on behalf of Potatoes South Africa.
They applied the treatment at various proportions of the recommended dose. Lower proportions are cheaper to implement.
Can we get the same answer more formally?
I propose the following model:
Given an unknown change point \(\gamma\), the model is formally defined as \(y_i\sim N(\beta_0+\beta_1(x_i-\gamma)I(x_i>\gamma)+\beta_2(x_i-\gamma)I(x_i<\gamma),\ \sigma^2)\).
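A sketch of how this could be written in Stan (variable names are assumed, and priors are left at their implicit defaults):

```stan
data {
  int<lower=1> n;
  vector[n] x;             // dose proportion
  vector[n] y;             // measured outcome
}
parameters {
  real b0;                 // level at the change point
  real b1;                 // slope above the change point
  real b2;                 // slope below the change point
  real<lower=min(x), upper=max(x)> gamma;  // unknown change point
  real<lower=0> sigma;
}
model {
  for (i in 1:n) {
    real m = b0 + (x[i] > gamma ? b1 : b2) * (x[i] - gamma);
    y[i] ~ normal(m, sigma);
  }
}
```

Because `gamma` is declared with bounds inside the observed range, Stan simply samples it along with the other parameters; no special change point machinery is needed.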
After fitting the model, we can obtain intervals or tail probabilities for any quantity of interest. For example, a two-sided posterior ‘p-value’ for whether a quantity differs from a target value:
```r
pvalfunc <- function(sims, target = 0) {
  # Two-sided posterior probability that the quantity is beyond the target
  2 * min(mean(sims < target), mean(sims > target))
}
```
For a given coverage, a shorter interval provides more useful information.
If the distribution of the quantity of interest is skew then the symmetric interval is not ideal.
We can calculate the shortest interval, also known as the highest posterior density (HPD) interval, quite easily though if we have simulations.
The process is simple: to get the shortest 95% interval, just work out the length of every 95% interval and see which one is the shortest!
```r
shortestinterval <- function(sims, alpha = 0.05) {
  # Coded by Sean van der Merwe, UFS
  sorted.sims <- sort(sims)
  nsims <- length(sims)
  gap <- round(nsims * (1 - alpha))    # number of draws the interval must span
  widths <- diff(sorted.sims, gap)     # widths of all candidate intervals
  interval <- sorted.sims[c(which.min(widths), which.min(widths) + gap)]
  return(interval)
}
```
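For example, the shortest 95% interval for the degrees of freedom in the earlier t model:

```r
shortestinterval(list_of_draws1$v)
```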
If we take logs of the claim values then the zeros are a problem. We can instead treat the zeros as censored below some small value such as 1 or 0.5.
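A sketch of how the censoring can be handled in Stan (the names and the normal-on-logs setup are illustrative, not the original model): the zeros contribute the probability of falling below the censoring point rather than a density.

```stan
data {
  int<lower=0> n_obs;          // positive claims
  int<lower=0> n_cens;         // zero claims, treated as censored
  vector[n_obs] y;             // log of the positive claim values
  real c;                      // log of the censoring point, e.g. log(0.5)
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  y ~ normal(mu, sigma);                          // observed part
  target += n_cens * normal_lcdf(c | mu, sigma);  // censored part
}
```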
Here we want to do a regression on the scale parameter as it changes with time and claimant properties.
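A sketch of what that might look like, with the scale kept positive via a log link (covariate and parameter names are assumed):

```stan
data {
  int<lower=1> n;
  int<lower=1> k;
  matrix[n, k] Z;          // time and claimant properties
  vector[n] y;
}
parameters {
  real mu;
  vector[k] delta;         // regression coefficients for the scale
}
model {
  y ~ normal(mu, exp(Z * delta));  // vectorized; exp keeps the scale positive
}
```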
Another advantage of Stan is efficient implementation of Leave-One-Out (LOO) cross-validation.
If we have multiple models we can compare their fit on the same set of data in a reliable way.
The most common uses would be to compare a complicated and simple model formulation, or to compare models with different sets of explanatory variables.
The steps are:

1. Calculate `log_lik` for the individual observations in your Stan models as a generated quantity (a sketch follows this list).
2. Load the loo library (https://cran.r-project.org/web/packages/loo/index.html).
3. Compare the models.
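For the t model above, step 1 might look like this (a sketch reusing the earlier names):

```stan
generated quantities {
  vector[n] log_lik;       // pointwise log-likelihood for loo
  for (i in 1:n)
    log_lik[i] = student_t_lpdf(x[i] | v, mu, s);
}
```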
```r
library(loo)                                              # load the comparison tools
log_lik_1 <- extract_log_lik(ModelFit1, merge_chains = FALSE)
log_lik_2 <- extract_log_lik(ModelFit2, merge_chains = FALSE)
r_eff_1 <- relative_eff(exp(log_lik_1), cores = mycores)  # relative effective sample sizes
r_eff_2 <- relative_eff(exp(log_lik_2), cores = mycores)
loo_1 <- loo(log_lik_1, r_eff = r_eff_1, cores = mycores)
loo_2 <- loo(log_lik_2, r_eff = r_eff_2, cores = mycores)
comp <- loo_compare(loo_1, loo_2)                         # compare expected predictive accuracy
print(comp, simplify = FALSE)
```
Thinning is the process of dropping some simulations from the output.
At some point a few decades ago it became popular to use high thinning values, like only using every hundredth simulated vector. In the early 2000s it became clear that this is almost always counter-productive, and yet people still keep doing it.
If your model is very complicated and you are doing massively long simulation chains then it may be useful to apply thinning of 2, i.e. keeping every second simulation vector, in order to speed up post-processing; but never more than that.
Almost all statisticians are just fine forgetting that this is even a thing.
We were asked to help rank 200 types of wheat according to some measures of how well they resist different pathogens.
While we did various standard analyses, the measurements were quite skew and seemed like they should fit a gamma model. The measurements were taken under varying conditions, such as site, year, and pathogen. Ultimately though we need a single measure of resistance per wheat strain under general conditions.
One possible approach is to model the measurements jointly, with wheat strain as fixed effect and the nuisance factors as random effects. We also have to account for differing variances under differing conditions.
The general form of the model is:
\(y_i\sim Gamma(\alpha_i,\lambda_i)\)
\(\alpha_i=\mu_i^2/\sigma^2_{g_i}\)
\(\lambda_i=\mu_i/\sigma^2_{g_i}\)
\(\mu_i=\exp\left(\eta_{g_i} + \beta_{s_i}\right)\)
\(\eta_j\sim N(0,\tau^2)\)
\(g\) indicates the group (condition combination) and \(s\) the subject (wheat strain), so the \(\exp(\beta_j)\) values, one per strain, are what we are interested in.
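A sketch of this model in Stan (index and variable names are assumed, and priors on `beta`, `sigma2`, and `tau` are omitted):

```stan
data {
  int<lower=1> n;                      // observations
  int<lower=1> G;                      // condition groups (site/year/pathogen)
  int<lower=1> S;                      // wheat strains
  int<lower=1, upper=G> g[n];          // group of each observation
  int<lower=1, upper=S> s[n];          // strain of each observation
  vector<lower=0>[n] y;
}
parameters {
  vector[G] eta;                       // random condition effects
  vector[S] beta;                      // strain effects
  vector<lower=0>[G] sigma2;           // group-specific variances
  real<lower=0> tau;
}
model {
  eta ~ normal(0, tau);
  for (i in 1:n) {
    real mu = exp(eta[g[i]] + beta[s[i]]);
    y[i] ~ gamma(square(mu) / sigma2[g[i]], mu / sigma2[g[i]]);
  }
}
```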
Stan is the flexible all-purpose solution for people who find themselves with unusual statistical modelling problems.
It can do much more advanced stuff than what was shown here. See their website for more information.
The help files are extensive. The reference documents are the most useful when you are trying to solve a specific problem.
As noted earlier, there are many helper packages that can do a lot of (or all) the work for you, including creating the model code for you to play with.
Fitting models with Stan can be a lot of fun because of the freedom it provides.