This work was suggested by Prof Jan Beirlant of KU Leuven, and extends his recent work on this topic.
Sometimes you might have reason to believe that the largest values in a sample have been discarded or distorted, and wish to adjust for this.
Such missing or altered values distort the far end of the observed tail of the distribution.
Let us have a look at the tail of a Pareto sample via the Pareto QQ Plot:
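As a minimal sketch (assuming a unit-scale Pareto sample with EVI 0.5 and \(n = 500\); these values are illustrative, not necessarily those used here), such a plot can be drawn in base R:
set.seed(1)
n <- 500
gamma_true <- 0.5
x <- runif(n)^(-gamma_true)              # unit-scale Pareto sample with EVI = gamma_true
exp_quant <- -log(1 - (1:n) / (n + 1))   # standard exponential quantiles
plot(exp_quant, log(sort(x)),
     xlab = "Standard exponential quantiles",
     ylab = "log of ordered sample",
     main = "Pareto QQ plot")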
It is clear that we have a good fit to the largest observations.
Consider now the case where the top values are missing. We explicitly exclude the top 10 values from the sample:
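Continuing the sketch above (reusing x and n from it), the top 10 order statistics can be removed before re-drawing the QQ plot:
x_trunc <- head(sort(x), -10)            # drop the 10 largest observations
n_t <- length(x_trunc)
plot(-log(1 - (1:n_t) / (n_t + 1)), log(x_trunc),
     xlab = "Standard exponential quantiles",
     ylab = "log of ordered sample (top 10 removed)",
     main = "Pareto QQ plot, top values missing")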
Now we observe a rather distorted plot.
Assume that the underlying distribution satisfies the max-domain of attraction condition, i.e. assume that the maximum of independent and identically distributed observations \(X_1, X_2, \ldots, X_n\) can be approximated by the generalized extreme value distribution. Mathematically: as \(n\to\infty\)
\[ \mathbb{P}\left( a_n^{-1} \left(\max\limits_{i=1,\ldots,n}X_i -b_n \right) \leq y \right) \hspace{2mm} \longrightarrow \hspace{2mm} G_{\gamma} (y) = \exp \left( - (1 + \gamma y)^{-1/\gamma}\right) \hspace{5mm} \text{ for } 1+\gamma\, y>0 \]
where \(b_n \in \mathbb{R}\), \(a_n >0\) and \(\gamma \in \mathbb{R}\) are the location, scale and shape parameters, respectively. The extreme value index (EVI) \(\gamma\) is a measure of the tail-heaviness of the distribution of \(X\), with a larger value of \(\gamma\) implying a heavier tail of \(F\).
First define
\[V_{j,n} := \log \frac{X_{(n-j+1)}}{X_{(n-j)}} = \log X_{(n-j+1)} - \log X_{(n-j)},\ \ j=1, \ldots,k,\]
then, if the EVI \(\gamma>0\) and \(m>0\) quantifies the missing information,
\[V_{j,n}\sim Exp\left(\frac{j+m}{\gamma}\right)\]
The log-likelihood function is then given by
\[ \ell(\gamma,m) = \sum_{j=1}^k \log (j+m) - k\log \gamma - \frac{1}{\gamma}\sum_{j=1}^k (j+m)\, v_{j,n} \]
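A direct translation of this log-likelihood into R could look as follows (a sketch; the function name loglik_standard and the starting values in the optimiser call are illustrative, not taken from the source code):
loglik_standard <- function(par, v) {
  # par = c(gamma, m); v = log-spacings v_{1,n}, ..., v_{k,n}
  gamma <- par[1]
  m     <- par[2]
  if (gamma <= 0 || m < 0) return(-Inf)
  k <- length(v)
  j <- seq_len(k)
  sum(log(j + m)) - k * log(gamma) - sum((j + m) * v) / gamma
}
# Maximise numerically, e.g.
# fit <- optim(c(0.5, 1), loglik_standard, v = v, control = list(fnscale = -1))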
The standard spacings can be generalised for any reasonable EVI using the exponential regression model from Matthys and Beirlant (2003). First define
\[W_{j,n} := \log \frac{X_{(n-j+1)} - X_{(n-k)}}{X_{(n-j)} - X_{(n-k)}} \ j=1, \ldots,k-1,\]
and \(k_{\gamma}(u)= \int_u^1 v^{\gamma -1}dv = \frac{1-u^{\gamma}}{\gamma}\), then
\[W_{j,n}\sim Exp\left((j+m)k_{\gamma}\left( {j+m \over k+m}\right)\right)\]
The log-likelihood function is then given by
\[ \ell(\gamma,m) = \sum_{j=1}^{k-1} \log (j+m) + \sum_{j=1}^{k-1} \log k_{\gamma}\left( \frac{j+m}{k+m}\right) - \sum_{j=1}^{k-1} (j+m)\, k_{\gamma}\left( \frac{j+m}{k+m}\right) w_{j,n} \]
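The generalised log-likelihood can be coded analogously (again a sketch with illustrative names; the helper k_gamma handles the \(\gamma \to 0\) limit explicitly):
k_gamma <- function(u, gamma) {
  # k_gamma(u) = (1 - u^gamma) / gamma, with limit -log(u) as gamma -> 0
  if (abs(gamma) < 1e-8) -log(u) else (1 - u^gamma) / gamma
}
loglik_generalised <- function(par, w) {
  # par = c(gamma, m); w = generalised spacings w_{1,n}, ..., w_{k-1,n}
  gamma <- par[1]
  m     <- par[2]
  if (m < 0) return(-Inf)
  k  <- length(w) + 1
  j  <- seq_len(k - 1)
  kg <- k_gamma((j + m) / (k + m), gamma)
  sum(log(j + m)) + sum(log(kg)) - sum((j + m) * kg * w)
}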
However, this approach needs considerably more information to be accurate.
Since this is a simulated sample, we can compare the estimates against the true parameter values.
What if we use a more general distribution that isn’t Pareto all the way, but has Pareto tails?
But now we have less information to work with, since we cannot use the whole sample.
If you are far enough into the tail and still have enough observations to work with, the method can still be applied.
Pulling \(m\) towards zero in a systematic way helps the estimation.
For the largest observed value \((j=1)\), let \(g(v)=-\log f(v) = -\log (1+m) + k\log \gamma + \left(\frac{1+m}{\gamma}\right)v\), then
\[\begin{aligned} \frac{\partial g}{\partial m} &= -(1+m)^{-1} + v\gamma^{-1} \\ \frac{\partial^2 g}{\partial m^2} &= (1+m)^{-2} \\ E_v\left[\frac{\partial^2 g}{\partial m^2}\right] &= (1+m)^{-2} \end{aligned}\]
So the log Jeffreys prior on \(m\) alone would be \(-\log (1+m)\), which is the prior I suggested with \(\lambda=1\) and \(a=1=j\) 😀
\[\begin{aligned} \frac{\partial^2 g}{\partial m\partial \gamma} &= -\gamma^{-2} \\ \frac{\partial g}{\partial \gamma} &= k\gamma^{-1} - (1+m)v\gamma^{-2} \\ \frac{\partial^2 g}{\partial \gamma^2} &= -k\gamma^{-2} + 2(1+m)v\gamma^{-3} \\ \end{aligned}\]
Giving an independence Jeffreys log prior of \(-\log (1+m) -\log\gamma + c_2\)
and a joint Jeffreys prior of
\[0.5\log\left[(1+m)^{-2}(2-k) - \gamma^{-2}\right]-\log\gamma + c_3\]
\[\begin{aligned} V_{j,n} &\sim Exp\left(\lambda_j\right) \\ \lambda_j &= (j+m)\gamma^{-1} \\ \gamma &\sim U(0, 1) \\ \log\pi(m) &= -\lambda m^a + c\text{ OR } -\lambda\log(m+a) + c \\ \text{where }\lambda,a &>0\text{ are prior hyper-parameters } \\ \text{and }c&\text{ is an unknown constant} \end{aligned}\]
The model is implemented in Stan via the rstan interface.
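A minimal sketch of how the standard-spacings model could be passed to rstan is shown below; the Stan code, data names and prior choice (exponential decay with \(\lambda = 0.1\), \(a = 1\)) are illustrative rather than the exact model used in the study:
library(rstan)
stan_code <- "
data {
  int<lower=1> k;
  vector<lower=0>[k] v;          // log-spacings v_{j,n}
  real<lower=0> lambda;          // prior hyper-parameter
  real<lower=0> a;               // prior hyper-parameter
}
parameters {
  real<lower=0, upper=1> evi;    // gamma, with an implicit uniform(0, 1) prior
  real<lower=0> m;               // missing-information parameter
}
model {
  target += -lambda * pow(m, a); // log pi(m) = -lambda * m^a (exponential decay)
  for (j in 1:k)
    v[j] ~ exponential((j + m) / evi);
}
"
# v: observed log-spacings from the k largest observed order statistics
x_desc <- sort(x, decreasing = TRUE)     # x from the earlier sketch
k <- 100
v <- log(x_desc[1:k]) - log(x_desc[2:(k + 1)])
fit <- stan(model_code = stan_code,
            data = list(k = k, v = v, lambda = 0.1, a = 1),
            chains = 4, iter = 2000)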
base_scenarios <- rbind(
  expand.grid(
    Distribution = c("GPD"),                    # vs c("Pareto", "Frechet")
    EVI_true     = c(0.2, 0, -0.2),             # true extreme value index
    m_true       = c(5, 10),                    # number of missing top values
    k            = c(100, 200, 300),            # number of top order statistics used
    N            = 500,                         # sample size
    lambda       = c(0.4, 0.2, 0.1, 0.05, 0.02, 0.01),  # prior hyper-parameter
    model        = "Exponential decay",         # vs "Inverse decay"
    a            = c(0.5, 1, 2),                # prior hyper-parameter
    spacings     = "Generalised"                # vs "Standard"
  ),
  expand.grid(
    ...
  )
)
Each of these scenarios is then repeated, with say 1000 simulated samples per scenario.
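A sketch of the replication loop (the helpers simulate_sample() and fit_model() are hypothetical placeholders for the actual simulation and fitting code):
n_rep <- 1000
results <- lapply(seq_len(nrow(base_scenarios)), function(i) {
  sc <- base_scenarios[i, ]
  replicate(n_rep, {
    x_sim <- simulate_sample(sc)   # hypothetical: draw a sample under scenario sc
    fit_model(x_sim, sc)           # hypothetical: fit the model and return summaries
  }, simplify = FALSE)
})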
For brevity we show the results for the generalised spacings only.
Standard spacings are more accurate (but they require certainty that the EVI is positive); the patterns are otherwise similar.
For this problem we recommend \(\log\pi(m) = -0.01 m^2\) (smallest absolute error), \(-0.1m\) (the prior used in the source paper), or the Jeffreys prior (most reliable).
Thank you for your time and attention.
This presentation was created in the Reveal.js format in Quarto, using the RStudio IDE. Font and line colours follow UFS branding, and the background image was generated with Midjourney and edited in GIMP.
2023/11/29 - Missing Extremes