In this study you are interested in how long people are going to keep taking the medication and how consistently they are taking the medication. Further, you know that these two behaviours interact, and you want to know what the effect of an intervention is on the joint behaviour.
While the example used today is not from South Africa, the work would be useful in South Africa.
This presentation is drawn from our paper that is accepted from publication in Statistical Methods in Medical Research.
Divan A. Burger\(^{1,2,3}\), Sean van der Merwe\(^{3}\), Janet van Niekerk\(^{2,4}\), Emmanuel Lesaffre\(^{5}\), Antoine Pironet\(^{6}\)
\[f\left(y;\alpha_1,\alpha_2\right)=\alpha_1\alpha_2y^{\alpha_1-1}\left(1-y^{\alpha_1}\right)^{\alpha_2-1},\] where \(0<y<1\) and \(\alpha_1,\alpha_2>0\) are shape parameters (Mitnik and Baek 2013).
An alternative parameterization of the Kumaraswamy distribution involves expressing \(\alpha_1\) and \(\alpha_2\) in terms of the \(q^\text{th}\) quantile location parameter \(\kappa_q\), defined such that \(F^{-1}\left(q\right)=\kappa_q\), where \(F^{-1}\) is the inverse of the CDF (Burger et al. 2023).
In this parameterization, \(\alpha_1\) and \(\alpha_2\) are given by \(\alpha_1=\frac{\log\left(1-\left(1-q\right)^\frac{1}{\alpha_2}\right)}{\log\left(\kappa_q\right)}\) and \(\alpha_2=\frac{\log\left(1-q\right)}{\log\left(1-e^{-\psi}\right)}\), with \(\psi>0\) as the precision parameter.
Suppose that \(y_{ij}\) is the continuous proportions outcome for patient \(i=1,\ldots,I\) and time point \(j=1,\ldots,J_i\). The regression model of the \(q^\text{th}\) conditional quantile of \(y_{ij}\) is written as \(\kappa_{q_{ij}} = g^{-1}(\eta_i\left(t_{j}\right)),\)
where \(\eta_{q,i}\left(t_{ij}\right) = \left(\beta_0 + b_{0_i}\right) + \left(\beta_{\text{time}} + b_{\text{time}_i} + \mathbf{z}_i^{\prime} {\boldsymbol\beta}_{\text{tx}}\right) t_{ij} + \mathbf{x}_i^{\prime} {\boldsymbol\beta}_{\text{cov}}\)
is the linear predictor corresponding to subject \(i\) at time point \(t_{ij}\).
\(g\) is the logit link function; \(\beta_0\) is the fixed intercept; \(\beta_{\text{time}}\) is the fixed coefficient for the effect of time; \({\boldsymbol\beta}_{\text{cov}}\) is a vector of fixed coefficients corresponding to the covariates in \(\mathbf{x}_i\); \(\mathbf{z}_i\) is a vector of covariates for subject \(i\) that interact with time, and \({\boldsymbol\beta}_{\text{tx}}\) is a vector of coefficients for these interactions; \(b_{0_i}\) and \(b_{\text{time}_i}\) are the random intercept and slope for subject \(i\).
In the INLA framework, the baseline hazard \(h_0\left(t\right)\) is discretised into intervals to allow flexibility in modelling the hazard over time. Specifically, for each interval \(k = 1, \ldots, K\), the hazard is assumed to be constant and is defined in terms of the RW2 process:
\[\delta_k = 2\delta_{k-1} - \delta_{k-2} + \epsilon_k,\] with \(\epsilon_k \sim N(0, \sigma^2)\). This formulation ensures a smoothing effect, reflecting the change from \(\delta_{k-1}\) to \(\delta_k\) based on the previous change (from \(\delta_{k-2}\) to \(\delta_{k-1}\)). Then \[h_0\left(t\right) = \exp(\delta_k) \quad \text{for } t \in (s_{k-1}, s_k],\] where \(\delta_k\) represents the log-baseline hazard at time \(t\) within the interval \(k\), and \((s_{k-1}, s_k]\) denotes the time interval.
The survival function for patient \(i\), denoted as \(S_i\left(t\right)\), is defined as the probability of survival beyond time \(t\), expressed as the exponential of the negative cumulative hazard function: \[S_i\left(t\right) = \exp\left(-H_i\left(t\right)\right),\] where \(H_i\left(t\right)\) is the cumulative hazard function for patient \(i\). In the context of the RW2 model, the cumulative hazard is calculated as: \[H_i\left(t\right) = \sum_{k=1}^{K\left(t\right)} \left[\int_{s_{k-1}}^{\min\left(t, s_k\right)}\exp\left(\delta_k + \phi \eta_i\left(u\right)+\mathbf{w}_i^{'}{\boldsymbol\gamma}\right)du\right],\] where \(K\left(t\right)\) is the number of intervals up to time \(t\).
You might be thinking this seems a bit complicated, but I wrote a detailed code guide explaining every detail of the implementation.
If plots appear badly sized please refresh (F5).
In the plots to follow each line has two intervals: a narrower interval representing the average subject, and a wider interval representing a random future subject.
This boring plot is just a check that the modelled hazards are indeed proportional. If the estimate lines are parallel on the log scale then all is well, if not then we messed up the modelling. Parallel lines do not imply that the true hazards are proportional.
This plot is far more interesting as it shows the modelled persistence.
Parameter | Value | Bias | Coverage |
---|---|---|---|
\(\beta_0\) | 1.5 | -0.056 | 0.933 |
\(\beta_{\text{time}}\) | -0.4 | 0.010 | 0.933 |
\(\beta_{\text{bin}}\) | 0.4 | -0.023 | 0.957 |
\(\beta_{\text{cont}}\) | 0.5 | -0.012 | 0.937 |
\(\psi\) | 0.3 | 0.007 | 0.917 |
\(\sigma_{b_0}\) | 0.2 | 0.276 | 0.920 |
\(\sigma_{b_1}\) | 0.2 | 0.267 | 0.970 |
\(\rho_{{b_0},{b_1}}\) | 0.2 | -0.211 | 0.957 |
\(\phi\) | -0.4 | -0.010 | 0.940 |
This presentation was created using the Reveal.js format in Quarto, using the RStudio IDE. Font and line colours according to UFS branding, and background image by CoPilot (DALL-E_v3) using image editor GIMP.
2024/11/21 - Joint Model