Sean van der Merwe
The short link to open this interactive presentation on your own device and follow along is: https://seanvdm.co.za/present
Replace assumption of normality with skew-t
Yes, I have regrets
Burger, D.A., Schall, R. & van der Merwe, S. A robust method for the assessment of average bioequivalence in the presence of outliers and skewness. Pharm Res (2021). https://doi.org/10.1007/s11095-021-03110-z
Divan analysed 130 clinical trials and ran a large simulation study, concluding:
“The robust methodology is needed in a small but non-negligible proportion of bioequivalence trials: the robust method safeguards against occasional problematic outlier cases.”
Assuming we have a good regression model, except for those pesky outliers,
We want the best of both worlds:
There are many variations of skew-t distributions. My favourite is the simplest one:
Fernandez, C., & Mark F. J. Steel. (1998). On Bayesian Modeling of Fat Tails and Skewness. Journal of the American Statistical Association, 93(441), 359–371. https://doi.org/10.2307/2669632
\[f_t(w) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\ \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{w^2}{\nu} \right)^{-\frac{\nu+1}{2}}\]
functions { # Sean van der Merwe, UFS
  real skewt_lpdf(real y, real nu, real mu, real s, real xi) {
    real z;
    if (y >= mu) {
      z = (y - mu) / xi;  // right of the location: divide by xi (stretches the right tail when xi > 1)
    } else {
      z = (y - mu) * xi;  // left of the location: multiply by xi (compresses the left tail when xi > 1)
    }
    return student_t_lpdf(z | nu, 0, s) + log(2) - log(xi + 1/xi);
  }
}
data {
  int<lower=0> n; // number of observations
  vector[n] x;    // observations
}
parameters {
  real<lower=0.2> v; // degrees of freedom
  real<lower=0> s;   // scale parameter
  real mu;           // location parameter
  real<lower=0> xi;  // skewness parameter
}
model {
  // joint objective prior (the recommendation given at the end)
  target += -2*log(s) + log(v) - 3*log(v + 0.75) + log(xi) - 2*log(xi^2+1);
  x ~ skewt(v, mu, s, xi);
}
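One way to run this model from R is via rstan (a quick sketch, not part of the original code; it assumes the Stan program above is saved as `skewt.stan` and that `x` holds your observations):

```r
# Hypothetical usage sketch for the skew-t Stan model above
library(rstan)
x <- rnorm(50)  # placeholder data; replace with the real observations
fit <- stan("skewt.stan", data = list(n = length(x), x = x),
            chains = 4, iter = 2000)
print(fit, pars = c("v", "s", "mu", "xi"))
```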
The densities under consideration are all location-scale, which has many advantages.
\[y_i\sim i.i.d.\ \text{skew-t}(\nu, \mu, \sigma, \xi)\]
becomes
\[y_i\sim \text{skew-t}(\nu, \mu_i, \sigma, \xi)\]
where \(\mu_i=f(\mathbf{x}_i)\).
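For instance, a plain linear model is the special case

\[\mu_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip},\]

while \(f\) can just as well be a non-linear curve with a specific formula, as in the examples that follow.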
Probability of plant making it worse \(\approx\) 0, and predicted difference \(\approx\) -18.9.
Probability of plant making it worse \(\approx\) 0.129, and predicted difference \(\approx\) 15.5.
Probability of plant making it worse \(\approx\) 0, and predicted difference \(\approx\) 135.3.
They need to compare the effectiveness of control measures on wild stinkbugs. Wild stinkbugs (being wild and all) are very difficult to find, control, and study. The only data I received was how many bugs died during each of a few days.
By using the interval during which each bug died as an interval censored observation, we can fit lifetime distributions and compare treatments.
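As a rough illustration of the principle (not the actual stinkbug analysis): each bug that died in the interval \((a_i, b_i]\) contributes \(\log\{F(b_i)-F(a_i)\}\) to the log-likelihood. The sketch below assumes a simple Weibull lifetime fitted by maximum likelihood, with hypothetical daily death counts:

```r
# Interval-censored Weibull log-likelihood: each death in (left, right] contributes
# log(F(right) - F(left))
negloglik <- function(par, left, right) {
  shape <- exp(par[1]); scale <- exp(par[2])  # log-parametrisation keeps both positive
  -sum(log(pweibull(right, shape, scale) - pweibull(left, shape, scale)))
}
deaths <- c(2, 5, 4, 1)              # hypothetical counts of bugs dying on days 1 to 4
left  <- rep(0:3, times = deaths)    # a death on day d is censored to the interval (d-1, d]
right <- left + 1
fit <- optim(c(0, 1), negloglik, left = left, right = right)
exp(fit$par)                         # fitted shape and scale
```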
I was asked to analyse a time series with fun features: missing values, censoring (2+ types), outliers, and of course dependencies (which had the same issues).
Yes, this is possible:
I needed to fit curves with specific formulas:
When meaningful prior information is available then it must be incorporated. Doing so can help avoid silly conclusions in many real-world areas.
When meaningful prior information is not available, as is usually the case, then we can use objective or vague priors.
But what are good objective or vague priors for this skew-t?
We have a good idea of what are good priors for the symmetric t thanks to UFS researchers Abrie van der Merwe, Michael von Maltitz, Johan Meyer, and Piet Groenewald.
We focus on the 3 most relevant of the 6 they investigated:
Number 2 is their proposed choice (PM, reference prior), while 5 (exponential) and 6 (Juarez and Steel, 2010) are the most popular otherwise (although 6 is often applied with different \(d\) values, like 0.75, and 5 is often applied with different rate values, like 1 or 0.01).
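As a quick visual comparison (a sketch: the exponential form of prior 5 is taken from the description above, and the \(\nu\) component of the prior used in the Stan model earlier, \(\pi(\nu)\propto\nu(\nu+0.75)^{-3}\), stands in for the \(d=0.75\) case), one can plot how these choices spread their mass:

```r
# Sketch: compare the nu prior used in the Stan model with exponential priors
# at the rates mentioned (ignoring the lower bound of 0.2 used in the Stan code)
nu <- seq(0.2, 30, by = 0.1)
nu_prior <- function(nu) 2 * 0.75 * nu / (nu + 0.75)^3  # normalised over (0, Inf)
plot(nu, nu_prior(nu), type = "l", xlab = expression(nu), ylab = "prior density")
lines(nu, dexp(nu, rate = 1), lty = 2)
lines(nu, dexp(nu, rate = 0.01), lty = 3)
legend("topright", lty = 1:3,
       legend = c("nu/(nu + 0.75)^3 (normalised)", "Exp(1)", "Exp(0.01)"))
```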
Relative root median square error of the posterior median
Now we consider the coverage of the symmetric 95% interval (2.5% to 97.5%), as well as its median width.
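A small sketch of how these two metrics could be computed, assuming `draws` holds one row of posterior draws per simulated data set and `truth` is the true parameter value (illustrative names, not the original study code):

```r
# Coverage and median width of the 2.5%-97.5% posterior interval
interval_metrics <- function(draws, truth) {
  lims <- apply(draws, 1, quantile, probs = c(0.025, 0.975))  # 2 x (number of simulations)
  coverage <- mean(truth >= lims[1, ] & truth <= lims[2, ])
  median_width <- median(lims[2, ] - lims[1, ])
  c(coverage = coverage, median_width = median_width)
}
```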
But does the same apply for a skew t?
Spoiler: yes it does 😀
What is a good prior for the skewness parameter?
Before we continue, let us try to relate the skewness parameter to the empirical skewness.
We can simulate a sample from this skew-t density and then calculate the skewness of the sample.
# Skew-t simulation function by Sean van der Merwe, based on code of the 'skewt'
# package, for the variant defined by Fernandez and Steel (1998)
rskt <- function(n, nu, xi = 1, mu = 0, s = 1) {
  p <- runif(n)                        # uniform draws to invert
  result <- rep(NA_real_, n)
  probzero <- 1/(xi^2 + 1)             # P(Z <= 0) for the standardised skew-t
  pos <- p >= probzero
  result[!pos] <- 1/xi * qt(((xi^2 + 1) * p[!pos])/2, nu)
  result[pos] <- xi * qt((1 + 1/xi^2)/2 * (p[pos] - probzero) + 1/2, nu)
  result * s + mu                      # shift and scale to the location-scale form
}
# Sample skewness function by Sean van der Merwe
skewness <- function(x) {
  stdx <- x - mean(x)
  n <- length(x)
  (mean(stdx^3) / ((mean(stdx^2))^1.5)) * sqrt(n * (n - 1))/(n - 2)
}
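For example (illustrative values):

```r
set.seed(1)
skewness(rskt(100000, nu = 5, xi = 2))  # clearly positive when xi > 1
skewness(rskt(100000, nu = 5, xi = 1))  # close to zero in the symmetric case
```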
Since this transformed scale seems linearly related to the sample skewness of samples from our density of choice, it makes more sense to work on this scale.
First, we derive a new prior on this scale:
Second, we will also calculate errors on this scale, for ease of interpretation.
Consider some options:
First we implement the simulation study on 1000 samples of size 30 from a standard t density (no skewness), repeating this for each of the \(\nu\) options from 1 to 24.
Now we make \(\xi=4\):
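A sketch of how these simulated data sets could be generated with the `rskt` function above (fitting the Stan model to each sample is then the expensive part, omitted here):

```r
# Simulation-study data sets: 1000 samples of size 30 for each nu, without and with skewness
nus <- 1:24
sims_sym  <- lapply(nus, function(nu) replicate(1000, rskt(30, nu, xi = 1)))  # no skewness
sims_skew <- lapply(nus, function(nu) replicate(1000, rskt(30, nu, xi = 4)))  # xi = 4
# each list element is a 30 x 1000 matrix: 1000 samples of size 30 for one value of nu
```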
Addressing any of these requires being able to evaluate probabilities under the density itself, i.e. its distribution function.
Fernandez and Steel (1998) define the density as follows:
\[f(z|\xi)=\frac{2\xi}{\xi^2+1}\left[ f_t(z\xi)I_{(-\infty,0]}(z) + f_t(z/\xi)I_{[0,\infty)}(z)\right]\]
Thus, the distribution function must be obtained by integrating this density.
When \(z\leq 0\) the integration is simple, and the distribution function is just a scaled version of the Student-t CDF:
\[F(z|\xi) = \frac{ 2 {\xi} }{\xi^2+1} \int_{-\infty}^z f_t(u\xi) du = \frac{2}{\xi^2+1} F_t(z\xi) \]
And when \(z\geq 0\) the survival function is just a scaled version of the Student-t survival function:
\[\bar{F}(z|\xi) = \frac{2\xi}{\xi^2 + 1}\ \int_z^{\infty} f_t(u/\xi) du = \frac{2\xi^2}{\xi^2+1} \bar{F}_t(z/\xi) \]
For the CDF above the location and the survival function below the location we did the far more complicated integrations, just to be sure that we do in fact arrive at the correct, easy complementary results.
Distribution function above the location:
\[F(z|\xi) = 1 - \frac{2\xi^2\bar{F}_t(z/\xi)}{\xi^2+1}\]
Survival function below the location:
\[\bar{F}(z|\xi) = 1 - \frac{2 F_t(z\xi)}{\xi^2+1}\]
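These results translate directly into code. A sketch of the standardised distribution function (\(\mu=0\), \(\sigma=1\)) using R's Student-t CDF:

```r
# Skew-t distribution function implied by the results above (standardised case)
pskt <- function(z, nu, xi = 1) {
  ifelse(z <= 0,
         2 / (xi^2 + 1) * pt(z * xi, nu),
         1 - 2 * xi^2 / (xi^2 + 1) * pt(z / xi, nu, lower.tail = FALSE))
}
pskt(0, nu = 5, xi = 2)                    # exactly 1/(xi^2 + 1) = 0.2
mean(rskt(100000, nu = 5, xi = 2) <= 0)    # should be close to 0.2
```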
Thank you for your time and attention.
I really hope I could inspire some people, or at least broaden minds as to what is possible.
This is the prior I’m recommending:
\[\begin{aligned}\log\pi(\mu,\sigma,\nu,\xi) = & -2\log(\sigma) + \log(\nu) - 3\log(\nu + 0.75) \\ & + \log(\xi) - 2\log(\xi^2+1) + \text{constants}\end{aligned}\]
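For reference, the same joint log-prior as a small R function (matching the `target +=` line in the Stan model above; constants dropped):

```r
# Recommended joint log-prior, up to additive constants
log_prior <- function(mu, sigma, nu, xi) {
  -2 * log(sigma) + log(nu) - 3 * log(nu + 0.75) + log(xi) - 2 * log(xi^2 + 1)
}
```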
2022/12/01 - Robust inference