A quantile regression model for bounded longitudinal data and survival data

Divan A. Burger, Sean van der Merwe, Janet van Niekerk, Emmanuel Lesaffre, Antoine Pironet

Introduction

In this study you are interested in how long people are going to keep taking the medication and how consistently they are taking the medication. Further, you know that these two behaviours interact, and you want to know what the effect of an intervention is on the joint behaviour.

While the example used today is not from South Africa, the work would be useful in South Africa.

Source and authors

This presentation is drawn from our paper that is accepted from publication in Statistical Methods in Medical Research.

Divan A. Burger\(^{1,2,3}\), Sean van der Merwe\(^{3}\), Janet van Niekerk\(^{2,4}\), Emmanuel Lesaffre\(^{5}\), Antoine Pironet\(^{6}\)

Syneos Health, Bloemfontein, Free State, South Africa
Department of Statistics, University of Pretoria, Pretoria, South Africa
Department of Mathematical Statistics and Actuarial Science, University of the Free State, Bloemfontein, South Africa
CEMSE Division, 4700 King Abdullah University of Science and Technology, Thuwal, 23955-6900, Kingdom of Saudi Arabia
I-BioStat, KU Leuven, Leuven, Belgium
AARDEX Group, 15/1, Rue Bois St Jean, 4102, Liège, Belgium

Abstract

This study introduces a novel joint modelling framework
- integrating quantile regression for longitudinal continuous proportions data
- with Cox-like regression for time-to-event analysis
Employing integrated nested Laplace approximation (INLA) for Bayesian inference
Particularly (but not exclusively) in the context of medication adherence and persistence

Abstract (continued)

Applying this model to a dataset of patients who underwent treatment with atorvastatin, we see the
Impact of targeted interventions on improving medication adherence and persistence
The value of dynamic prediction
Simulation studies validate the reliability of our modelling approach

The problem that inspired the research

Adherence to a full course of medication

Vrijens et al. (2006) evaluated the influence of a pharmaceutical care program on the adherence and persistence of patients to a prescribed regimen of once-daily atorvastatin, a medication for managing cholesterol levels and mitigating the risk of cardiovascular diseases.
The study divided participants into two cohorts:
- a control group that received standard care and
- an intervention group engaged in a pharmaceutical care program.
The study required scheduled follow-up visits over 12 months based on the dosing frequency prescribed.

The variable types

Patients’ adherence is treated as a continuous proportion outcome, tracked monthly.
- This adherence data is derived from electronically compiled dosing histories.
Persistence is a time-to-event outcome that measures the treatment regimen’s duration from initiation to discontinuation.
The initial day of the study was recorded as the day the patients first opened their pill containers, initiating the baseline period that lasted until the second visit to the pharmacy. The duration of this baseline period varied among individuals.

Interactions and covariates

People who are taking the medication (adherence) daily seldom go cold-turkey,
but people who barely remember to take a pill in a week have no problem going all the way to zero.
This relationship should be part of the model for how long they will keep taking the pills (persistence).
People who receive the intervention, explaining/reminding them of the importance of taking the pills, will hopefully be more likely to both adhere and persist, although not necessarily to the same extent.
There may be other measurable factors that influence adherence and persistence (covariates).

The proposed model

The Kumaraswamy distribution

\[f\left(y;\alpha_1,\alpha_2\right)=\alpha_1\alpha_2y^{\alpha_1-1}\left(1-y^{\alpha_1}\right)^{\alpha_2-1},\] where \(0<y<1\) and \(\alpha_1,\alpha_2>0\) are shape parameters (Mitnik and Baek 2013).

An alternative parameterization of the Kumaraswamy distribution involves expressing \(\alpha_1\) and \(\alpha_2\) in terms of the \(q^\text{th}\) quantile location parameter \(\kappa_q\), defined such that \(F^{-1}\left(q\right)=\kappa_q\), where \(F^{-1}\) is the inverse of the CDF (Burger et al. 2023).

In this parameterization, \(\alpha_1\) and \(\alpha_2\) are given by \(\alpha_1=\frac{\log\left(1-\left(1-q\right)^\frac{1}{\alpha_2}\right)}{\log\left(\kappa_q\right)}\) and \(\alpha_2=\frac{\log\left(1-q\right)}{\log\left(1-e^{-\psi}\right)}\), with \(\psi>0\) as the precision parameter.

Continuous proportions quantile regression

Suppose that \(y_{ij}\) is the continuous proportions outcome for patient \(i=1,\ldots,I\) and time point \(j=1,\ldots,J_i\). The regression model of the \(q^\text{th}\) conditional quantile of \(y_{ij}\) is written as \(\kappa_{q_{ij}} = g^{-1}(\eta_i\left(t_{j}\right)),\)

where \(\eta_{q,i}\left(t_{ij}\right) = \left(\beta_0 + b_{0_i}\right) + \left(\beta_{\text{time}} + b_{\text{time}_i} + \mathbf{z}_i^{\prime} {\boldsymbol\beta}_{\text{tx}}\right) t_{ij} + \mathbf{x}_i^{\prime} {\boldsymbol\beta}_{\text{cov}}\)

is the linear predictor corresponding to subject \(i\) at time point \(t_{ij}\).

\(g\) is the logit link function; \(\beta_0\) is the fixed intercept; \(\beta_{\text{time}}\) is the fixed coefficient for the effect of time; \({\boldsymbol\beta}_{\text{cov}}\) is a vector of fixed coefficients corresponding to the covariates in \(\mathbf{x}_i\); \(\mathbf{z}_i\) is a vector of covariates for subject \(i\) that interact with time, and \({\boldsymbol\beta}_{\text{tx}}\) is a vector of coefficients for these interactions; \(b_{0_i}\) and \(b_{\text{time}_i}\) are the random intercept and slope for subject \(i\).

Time-to-event submodel

The time-to-event submodel is formulated using the Cox proportional hazards model to analyse the time until an event of interest occurs, considering both the event time and censoring.
The hazard function for patient \(i\) at time \(t\) in the Cox model is given by: \[h_i\left(t\right) = h_0\left(t\right)\exp\left(\phi\eta_i\left(t\right) + \mathbf{w}_i^{'}{\boldsymbol\gamma}\right),\] where \(h_i\left(t\right)\) denotes the hazard function for patient \(i\) at time \(t\); \(h_0\left(t\right)\) is the baseline hazard function; \(\eta_i\left(t\right)\) represents the linear predictor derived from the longitudinal sub-model for patient \(i\) at time \(t\); and \(\phi\) measures the impact of the longitudinal process on the event risk; \(\mathbf{w}_i\) is a design matrix for patient \(i\) containing additional covariates; and \(\boldsymbol\gamma\) are coefficients.

The baseline hazard

In the INLA framework, the baseline hazard \(h_0\left(t\right)\) is discretised into intervals to allow flexibility in modelling the hazard over time. Specifically, for each interval \(k = 1, \ldots, K\), the hazard is assumed to be constant and is defined in terms of the RW2 process:

\[\delta_k = 2\delta_{k-1} - \delta_{k-2} + \epsilon_k,\] with \(\epsilon_k \sim N(0, \sigma^2)\). This formulation ensures a smoothing effect, reflecting the change from \(\delta_{k-1}\) to \(\delta_k\) based on the previous change (from \(\delta_{k-2}\) to \(\delta_{k-1}\)). Then \[h_0\left(t\right) = \exp(\delta_k) \quad \text{for } t \in (s_{k-1}, s_k],\] where \(\delta_k\) represents the log-baseline hazard at time \(t\) within the interval \(k\), and \((s_{k-1}, s_k]\) denotes the time interval.

The survival function

The survival function for patient \(i\), denoted as \(S_i\left(t\right)\), is defined as the probability of survival beyond time \(t\), expressed as the exponential of the negative cumulative hazard function: \[S_i\left(t\right) = \exp\left(-H_i\left(t\right)\right),\] where \(H_i\left(t\right)\) is the cumulative hazard function for patient \(i\). In the context of the RW2 model, the cumulative hazard is calculated as: \[H_i\left(t\right) = \sum_{k=1}^{K\left(t\right)} \left[\int_{s_{k-1}}^{\min\left(t, s_k\right)}\exp\left(\delta_k + \phi \eta_i\left(u\right)+\mathbf{w}_i^{'}{\boldsymbol\gamma}\right)du\right],\] where \(K\left(t\right)\) is the number of intervals up to time \(t\).

The model fitting

You might be thinking this seems a bit complicated, but I wrote a detailed code guide explaining every detail of the implementation.

Basics

Started with descriptives and illustrations to understand the data
Did Kaplan-Maier analysis, but the usefulness is very limited

We employ the INLA methodology (Rue, Martino, and Chopin 2009) to estimate the posterior distribution of the model parameters
- It fits large and complex models in seconds

The results

If plots appear badly sized please refresh (F5).

Coefficients

Partial dependence plots

Primary variables of interest are treatment and time
- These have strong interactions
- So these are the primary ‘axes’
In this study the averages are 61.1 for age, 11.4 for cardiac risk score, and 56.6 for cholesterol score
- These are of secondary interest
- So these are typically fixed at the averages
- Or illustrated at select values

Longitudinal adherence treatment effect

In the plots to follow each line has two intervals: a narrower interval representing the average subject, and a wider interval representing a random future subject.

By age

Hazard function

This boring plot is just a check that the modelled hazards are indeed proportional. If the estimate lines are parallel on the log scale then all is well, if not then we messed up the modelling. Parallel lines do not imply that the true hazards are proportional.

Survival curves

This plot is far more interesting as it shows the modelled persistence.

Making the results practical for a future person

Web interface

The company that owns the data (typically the one who developed the medication) can host an app on their website
That would allow a doctor or pharmacist to type in a patient’s record anonymously
And get reasonable predictions in seconds
- of how well the patient will take the pills (adherence) and
- how long a patient will keep taking the pills (persistance)

Mockup

Here is what the app interface could look like (“Here’s one I prepared earlier”)
Download from Janet’s Github Repository

Simulation studies

16 Scenarios comparing the models and alternatives for various parameter combinations
Additional scenarios with contamination to test robustness

Summary of one scenario

Parameter	Value	Bias	Coverage
\(\beta_0\)	1.5	-0.056	0.933
\(\beta_{\text{time}}\)	-0.4	0.010	0.933
\(\beta_{\text{bin}}\)	0.4	-0.023	0.957
\(\beta_{\text{cont}}\)	0.5	-0.012	0.937
\(\psi\)	0.3	0.007	0.917
\(\sigma_{b_0}\)	0.2	0.276	0.920
\(\sigma_{b_1}\)	0.2	0.267	0.970
\(\rho_{{b_0},{b_1}}\)	0.2	-0.211	0.957
\(\phi\)	-0.4	-0.010	0.940

Conclusion

Summary

We saw that it is both possible and sensible to model the joint processes together.
The data types and processes might be very different, but they are connected, so statistical creativity and ingenuity is needed.
As mathematically intricate as this model may seem, it is ultimately built on an understanding of the real world data generating process, and driven by the need to help address a real world problem.

This presentation was created using the Reveal.js format in Quarto, using the RStudio IDE. Font and line colours according to UFS branding, and background image by CoPilot (DALL-E_v3) using image editor GIMP.

References

Burger, D. A., S. van der Merwe, E. Lesaffre, P. C. le Roux, and M. J. Raath-Krüger. 2023. “A robust mixed-effects parametric quantile regression model for continuous proportions: Quantifying the constraints to vitality in cushion plants.” Statistica Neerlandica 77 (4): 444–70.

Mitnik, P. A., and S. Baek. 2013. “The Kumaraswamy distribution: median-dispersion re-parameterizations for regression modeling and simulation-based estimation.” Statistical Papers 54 (1): 177–92.

Rue, H., S. Martino, and N. Chopin. 2009. “Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations.” Journal of the Royal Statistical Society: Series B 71 (2): 319–92.

Vrijens, B., A. Belmans, K. Matthys, de Klerk, E., and E. Lesaffre. 2006. “Effect of intervention through a pharmaceutical care program on patient adherence with prescribed once-daily atorvastatin.” Pharmacoepidemiology and Drug Safety 15 (2): 115–21.