Variance Of Sample Kurtosis

2018-07-10 R

Kurtosis

First we define the sample kurtosis function, as well as a matrix version.

kurtosis1 <- function(x) {
  xstand <- x - mean(x)
  kurt <- (mean((xstand)^4))/((mean((xstand)^2))^2)
  return(kurt)
}
kurtosis <- function(X) {   apply(X,2,kurtosis1)   }

Generate samples

We select the range of sample sizes and generate a matrix of samples arranged in columns, for each size.

samplesizes <- seq(minsize<-100,maxsize<-10000,stepsize<-50)
nsizes <- length(samplesizes)
nsample.persize <- 500
samples <- vector('list',nsizes)
for (i in 1:nsizes) {
  samples[[i]] <- matrix(rnorm(samplesizes[i]*nsample.persize),samplesizes[i])
}

We used 500 samples of each size ranging from 100 to 10000 in steps of 50.

Calculate Kurtosis

We use parallel processing to speed up the computation. Set cluster size to number of available cores.

library(parallel)
cl <- makeCluster(4)
nul <- clusterExport(cl,c('kurtosis1','kurtosis'))
system.time({
outputs <- parLapply(cl,samples,kurtosis)
})

##    user  system elapsed 
##    4.83   12.77   37.06

stopCluster(cl)

Variances

We calculate the variance for each sample size and do a regression on the results.

vars <- unlist(lapply(outputs,var))
lvars <- log(vars); logn <- log(samplesizes)
summary(m1 <- lm(lvars~logn))

## 
## Call:
## lm(formula = lvars ~ logn)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.180953 -0.043097  0.003116  0.044296  0.166688 
## 
## Coefficients:
##              Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)  3.141518   0.042761   73.47 <0.0000000000000002 ***
## logn        -0.996792   0.005152 -193.46 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06569 on 197 degrees of freedom
## Multiple R-squared:  0.9948, Adjusted R-squared:  0.9947 
## F-statistic: 3.743e+04 on 1 and 197 DF,  p-value: < 0.00000000000000022

b0 <- exp(coef(m1)[1])
plot(samplesizes,vars,type='l',log='xy')

The regression seems good.

Conclusion

The final estimated formula is \(v\approx\) 23.1389604 \(n^{-0.9967917}\).

Thus it is clear that the formula suggested on Wikipedia is correct.

That is \(v\approx \frac{24n(n-1)^2}{(n-3)(n-2)(n+3)(n+5)}\).

simulation-study

Sean van der Merwe

Coordinator of UFS Statistical Consultation Unit

Statistician