Flexible regression of bounded count data

Divan A. Burger, Sean van der Merwe, Emmanuel Lesaffre

Introduction

This presentation focusses on the problem of parametric regression where the dependent variable is a bounded count, that is, a non-negative integer with a known upper bound.

Source and authors

This presentation is drawn from our paper that is under review at “Statistical Modelling”.

Divan A. Burger\(^{1,2,3}\), Sean van der Merwe\(^{3}\), Emmanuel Lesaffre\(^{4}\)

  1. Syneos Health, Bloemfontein, Free State, South Africa
  2. Department of Statistics, University of Pretoria, Pretoria, South Africa
  3. Department of Mathematical Statistics and Actuarial Science, University of the Free State, Bloemfontein, South Africa
  4. I-BioStat, KU Leuven, Leuven, Belgium

Outline

  • The type of data
  • The basic models
  • The more interesting models (including ours)
  • The results

Examples of data

Our data

  • The example in the paper is the atorvastatin dataset from the study by Vrijens et al. (2006), which examined medication adherence
  • Assessed the impact of a pharmaceutical care program on adherence to once-daily atorvastatin, used to manage cholesterol levels
  • A control group received standard care, and an intervention group participated in a program providing personalized counselling, medication management, and adherence support
  • The study included 392 patients across both groups
  • Medication Event Monitoring System (MEMS) records medication package openings
  • Adherence was calculated monthly as the number of days the medication package was opened

Other examples

  • Test results (marks awarded), particularly smaller tests with granular marking
    • Suppose you have multiple markers and want to detect systematic bias
  • Olympic shooting scores
  • Ratings on a survey or scorecard
    • Opinion ratings
    • Performance ratings
    • Pain ratings in medicine
    • Product or App ratings

The basic models

Generalised Linear Models (GLMs)

A GLM revolves around two things:

  1. A distribution explaining the variation
  2. A link function \((g)\) connecting the expected value to the linear predictor

\[g(E[Y])=X\boldsymbol\beta \]

For continuous data we might assume that \(y_i \sim N(\mu_i, \sigma)\),

and that \(\mu_i=\mathbf{x}_i\boldsymbol\beta\) with \(g(\mu)=\mu\).
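As a minimal illustration (simulated data and illustrative coefficients, not from the paper), the Gaussian case in R:

```r
# Minimal sketch: Gaussian GLM with identity link on simulated data
set.seed(1)
n <- 100
x <- runif(n)
y <- rnorm(n, mean = 2 + 3 * x, sd = 1)   # mu_i = beta_0 + beta_1 x_i
fit <- glm(y ~ x, family = gaussian(link = "identity"))
coef(fit)                                  # estimates of (beta_0, beta_1)
```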

Binomial Generalised Linear Model

If \(y_i \sim Bin(m, p_i)\) then a good link function is the scaled logit: \(g(\mu)=\log \frac{\mu/m}{1-\mu/m}\)

It converts the bounded probability of success \(0 < p_i < 1\) to an unbounded \((-\infty,\infty)\) space for the linear predictor as follows:

\(logit\ p_i = \mathbf{x}_i\boldsymbol\beta\ \Rightarrow\ p_i = \frac{1}{1+\exp(-\mathbf{x}_i\boldsymbol\beta)}\)
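A minimal sketch of this model in R, again on simulated data (not the paper's analysis):

```r
# Minimal sketch: binomial GLM with logit link; m trials per observation
set.seed(2)
n <- 100
m <- 28                                    # e.g. days in a month
x <- runif(n)
p <- plogis(-1 + 2 * x)                    # inverse logit of the linear predictor
y <- rbinom(n, size = m, prob = p)
fit <- glm(cbind(y, m - y) ~ x, family = binomial(link = "logit"))
coef(fit)
```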

What aspect of the Gaussian (normal) model is missing here?

Implied variation

  • The binomial distribution has variance equal to \(m\ p(1-p)\). It does not have a separate parameter for the variation around what is expected \((m\ p)\).
  • The Poisson is similar in that it has variance of \(\lambda\), equal to the expected value \((\lambda)\), and is thus most suitable for unbounded count processes that meet its strict assumptions.
  • That is why unbounded counts are very often modelled with a negative binomial (type 2) distribution, which is closely related to the Poisson but with an extra parameter that allows for extra variance.
  • Specifically, if we say that \(\lambda\sim Gamma\) instead of being fixed then the negative binomial arises as a mixture distribution. This is very useful.
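A small simulation sketch of this mixture result (illustrative parameter values):

```r
# Sketch: a gamma-mixed Poisson reproduces the negative binomial (type 2)
set.seed(3)
r <- 4; mu <- 10                                 # NB2 size (dispersion) and mean
lambda <- rgamma(1e5, shape = r, rate = r / mu)  # E[lambda] = mu, Var = mu^2 / r
y_mix <- rpois(1e5, lambda)                      # Poisson with random rate
y_nb  <- rnbinom(1e5, size = r, mu = mu)         # direct NB2 draws
c(mean(y_mix), var(y_mix))                       # variance mu + mu^2/r > mu
c(mean(y_nb),  var(y_nb))                        # matches the mixture
```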

The more interesting models

Can the same principle be applied in the binomial case?

Beta-Binomial

  • If we define the success probability \(p\) of a binomial to follow a beta distribution then we arrive at the beta-binomial distribution.
  • One can then put a GLM on the beta distribution, just as in beta regression.
    • Beta regression is itself not easy, since the classical \((\alpha, \beta)\) parameters do not map directly onto a mean and a dispersion.
    • One possible transformation is \(\alpha=\mu v\) and \(\beta=(1-\mu)v\), where \(\mu\) is the mean and \(v\) acts as a precision parameter
    • Then we can set \(logit\ \mu_i = \mathbf{x}_i\boldsymbol\beta\)
  • This works, but it is cumbersome and beta regression has some issues of its own.
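To make the parameterisation concrete, a small simulation sketch of beta-binomial draws under the \(\mu\)-\(v\) transformation (illustrative values):

```r
# Sketch: beta-binomial via the mean/precision parameterisation
set.seed(4)
n <- 100
m <- 28
x <- runif(n)
mu <- plogis(-1 + 2 * x)                  # logit mu_i = x_i' beta
v  <- 10                                  # precision: larger v, less extra-binomial variation
p  <- rbeta(n, shape1 = mu * v, shape2 = (1 - mu) * v)
y  <- rbinom(n, size = m, prob = p)       # beta-binomial counts
```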

Binomial GLM with innovations

A much simpler approach is to use a standard distribution for innovations/errors/residuals, such as the normal or Student-\(t\), on the logit scale. This is what we have implemented and tested.

\[logit\ p_i \sim t(\nu, \mu_i, \sigma)\]

where \(\mu_i = \mathbf{x}_i\boldsymbol\beta\) for fixed effects regression, or with some fun added terms for mixed effects models.

  • Our data had irregular and unbalanced repeated sampling
    • People are not reliable so a lot of months have missing values
  • So including random effects per individual is essential to reliable inference
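A minimal JAGS sketch of this model with a random intercept per subject (illustrative priors and names, not the paper's exact code):

```r
model_string <- "
model {
  for (i in 1:N) {
    mu[i] <- inprod(X[i, ], beta[])                  # fixed-effects linear predictor
    logit_p[i] ~ dt(mu[i] + u[subject[i]], tau, nu)  # t innovations on the logit scale
    p[i] <- ilogit(logit_p[i])
    y[i] ~ dbin(p[i], m[i])                          # bounded count out of m[i] trials
  }
  for (j in 1:S) { u[j] ~ dnorm(0, tau_u) }          # subject random effects
  for (k in 1:K) { beta[k] ~ dnorm(0, 0.0001) }      # vague priors on coefficients
  tau   ~ dgamma(0.01, 0.01)                         # precision of the innovations
  tau_u ~ dgamma(0.01, 0.01)                         # precision of the random effects
  nu    ~ dunif(2, 100)                              # degrees of freedom
}
"
```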

The implementation and results

Let’s get practical…

Model fit procedure

  • The best tool for fitting this particular model is JAGS
    • JAGS stands for Just Another Gibbs Sampler
  • The fitting was done using the R package runjags (Denwood 2016)
  • JAGS uses clever Gibbs sampling to arrive at posterior simulations for all the parameters
  • Once you have posterior simulations you can calculate anything you want
    • Fits and predictions for a particular subject
    • Fits for the average subject
    • Predictions for a random future subject, with full uncertainty
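A sketch of the fitting call with runjags, reusing the model_string from the earlier sketch on simulated stand-in data:

```r
library(runjags)

# Simulated stand-in data: 30 subjects, 6 monthly counts out of 28 each
set.seed(6)
S <- 30; per <- 6; N <- S * per
subject <- rep(1:S, each = per)
X <- cbind(1, rnorm(N))                               # intercept + one covariate
u <- rnorm(S, 0, 0.5)
eta <- drop(X %*% c(-0.5, 1)) + u[subject] + rnorm(N, 0, 0.3)
y <- rbinom(N, size = 28, prob = plogis(eta))

jags_data <- list(y = y, m = rep(28, N), X = X, subject = subject,
                  N = N, K = ncol(X), S = S)
fit <- run.jags(model_string, data = jags_data,
                monitor = c("beta", "u", "tau", "tau_u", "nu"),
                n.chains = 3, burnin = 4000, sample = 5000)
summary(fit)
post <- as.matrix(combine.mcmc(fit))   # posterior draws, one column per parameter
```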

Real data model fits

Real data parameter estimates

Model comparison

To compare the models for real data, the following measures were employed (see the sketch after this list):

  • WAIC (Watanabe 2010) using the loo package in R
    • LOOIC is also produced by this package and is generally the preferred information criterion, since it is based on leave-one-out cross-validation
  • The Kullback-Leibler (K-L) divergence is used to evaluate model adequacy, identifying influential observations and outliers (Wang and Luo 2016)
  • Standardised residuals as a measure of goodness of fit based on the posterior predictive distribution (Hartig and Lohse 2022)
    • We also calculated p-values for uniformity (Kolmogorov-Smirnov test), overdispersion (parametric bootstrap test), and outliers (binomial test) based on the standardised residuals
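A sketch of the WAIC/LOOIC computation with the loo package, given a pointwise log-likelihood matrix (posterior draws by observations); the numbers below are stand-ins, not the paper's results:

```r
library(loo)

# Stand-in pointwise log-likelihood: rows = posterior draws, cols = observations
set.seed(5)
n_obs <- 50; n_draws <- 1000; m <- 28
y <- rbinom(n_obs, m, 0.7)
p_draws <- matrix(rbeta(n_draws * n_obs, 7, 3), n_draws, n_obs)  # stand-in draws of p_i
log_lik <- matrix(dbinom(rep(y, each = n_draws), size = m,
                         prob = as.vector(p_draws), log = TRUE),
                  n_draws, n_obs)

waic(log_lik)   # WAIC (Watanabe 2010)
loo(log_lik)    # PSIS-LOO; gives LOOIC (warns that r_eff is taken as 1)
```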

Real data residuals

Real data residual diagnostics

Considering the residual diagnostic test p-values below, only the binomial-logit-\(t\) model fully fits the data.

Prediction for example subject

Here we consider a simulated subject from a synthetic data set constructed to resemble the real data. The subject is fitted along with many others, but thanks to the random effects we obtain personalised predictions, which can extend beyond the fitted portion of the data.
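A sketch of how such a personalised prediction can be computed from the posterior draws (continuing the earlier runjags sketch; the subject index and covariate values are illustrative):

```r
# Posterior predictive draws for subject 7 at hypothetical covariates x_new,
# out of m = 28 trials, using columns of `post` from the earlier fit
x_new  <- c(1, 0.5)
eta    <- post[, "beta[1]"] * x_new[1] + post[, "beta[2]"] * x_new[2] + post[, "u[7]"]
sigma  <- 1 / sqrt(post[, "tau"])                     # t scale from the precision
logitp <- eta + sigma * rt(nrow(post), df = post[, "nu"])
y_new  <- rbinom(nrow(post), size = 28, prob = plogis(logitp))
quantile(y_new, c(0.025, 0.5, 0.975))                 # 95% prediction interval
```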

Simulation study

Two simulation studies were considered: first, 16 scenarios of regular data generated from the proposed models; then 3 levels of data contamination, i.e. mostly well-behaved data but with extreme outliers thrown in to try to break the models.

Results for selected parameter

For all the coefficients \((\beta_0,\dots,\beta_3)\) the coverage of the 95% interval was between 93% and 96%, using 1000 data sets per scenario.

Contamination study results

  • For no contamination, there is no difference between the models examined
  • For low (2%) contamination, there is only a small difference in RMSE
  • For high (10%) contamination, there is a clear difference in RMSE

Conclusion

Summary

  • Allowing for additional variation on the underlying (latent) scale beneath the observed counts results in more robust fits and better inference
  • Useful when data is not all neat and tidy
    • Disclaimer: if your data is neat and tidy, then stick to simpler approaches, as this method will not be able to separate the sources of variation perfectly

This presentation was created using the Reveal.js format in Quarto, using the RStudio IDE. Font and line colours follow UFS branding, and the background image was composited in GIMP from images generated by Copilot (DALL-E 3).

References

Denwood, M. J. 2016. “runjags: An R package providing interface utilities, model templates, parallel computing methods and additional distributions for MCMC models in JAGS.” Journal of Statistical Software 71 (9): 1–25.
Hartig, F., and L. Lohse. 2022. “DHARMa: Residual diagnostics for hierarchical (multi-level / mixed) regression models.” https://florianhartig.github.io/DHARMa/.
Vrijens, B., A. Belmans, K. Matthys, E. de Klerk, and E. Lesaffre. 2006. “Effect of intervention through a pharmaceutical care program on patient adherence with prescribed once-daily atorvastatin.” Pharmacoepidemiology and Drug Safety 15 (2): 115–21.
Wang, J., and S. Luo. 2016. “Augmented Beta rectangular regression models: A Bayesian perspective.” Biometrical Journal 58 (1): 206–21.
Watanabe, S. 2010. “Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory.” Journal of Machine Learning Research 11 (12): 3571–94.