Extreme Value Theory [Introduction]

Extreme value theory (EVT) is a branch of statistics dealing with extreme deviations from the median of probability distributions. It seeks to assess, from a given ordered sample of a random variable, the probability of events more extreme than any previously observed. Extreme value analysis is widely used in disciplines such as structural engineering, finance, earth sciences, traffic prediction and geological engineering. EVT attempts to provide the best possible estimate of the tail area of a distribution. Even in the absence of useful historical data, it provides guidance on the kind of distribution to select so that extreme risks can be handled prudently.


There are two broad approaches to modelling extreme values. The older group is the block maxima models, which model the largest observations collected from large samples of identically distributed observations. The more modern group is the peaks-over-threshold (POT) models, which model all large observations that exceed a high threshold. POT models are generally considered the more useful in practice because they make more efficient use of the (often limited) data on extreme values.

  

Block Maxima Model

Distribution: Generalised Extreme Value (GEV) distribution, which encompasses the Gumbel, Fréchet and Weibull distributions (corresponding to ξ = 0, ξ > 0 and ξ < 0 respectively).
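For reference, the GEV distribution function can be written as below. The symbols ξ (shape), μ (location) and β (scale) are chosen to match the `xi`, `mu` and `beta` arguments used in the code later in this post; that this is the parameterisation in use is an assumption based on the standard textbook form.

```latex
G_{\xi,\mu,\beta}(x) =
\begin{cases}
\exp\left\{-\left[1 + \xi\,\dfrac{x-\mu}{\beta}\right]^{-1/\xi}\right\},
  & \xi \neq 0,\quad 1 + \xi\,\dfrac{x-\mu}{\beta} > 0,\\[2ex]
\exp\left\{-\exp\left(-\dfrac{x-\mu}{\beta}\right)\right\},
  & \xi = 0.
\end{cases}
```

The ξ = 0 (Gumbel) case is the limit of the first expression as ξ → 0.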

Implementation: Data are segmented into blocks of equal length and a series of block maxima/minima are generated to which the GEV is fitted.
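The segmentation step can be sketched in base R as follows. This is only an illustration under stated assumptions: the series `x`, the block length of 50 and the hand-written likelihood `gev_nll` are all hypothetical, and in practice the fitting functions in fExtremes would normally replace the `optim` call.

```r
set.seed(42)
x <- rnorm(5000)                  # hypothetical raw series
block_size <- 50                  # assumed block length
n_blocks <- length(x) %/% block_size

# One block per column; the maximum of each column is a block maximum
blocks <- matrix(x[seq_len(n_blocks * block_size)], nrow = block_size)
block_max <- apply(blocks, 2, max)

# Negative log-likelihood of the GEV with par = c(mu, beta, xi)
gev_nll <- function(par, z) {
  mu <- par[1]; beta <- par[2]; xi <- par[3]
  if (beta <= 0) return(1e10)     # penalise an invalid scale
  s <- (z - mu) / beta
  if (abs(xi) < 1e-8) {           # Gumbel limit
    return(length(z) * log(beta) + sum(s) + sum(exp(-s)))
  }
  core <- 1 + xi * s
  if (any(core <= 0)) return(1e10)  # data outside the support
  length(z) * log(beta) + (1 + 1 / xi) * sum(log(core)) + sum(core^(-1 / xi))
}

# Gumbel moment estimates as starting values, with a small positive shape
beta0 <- sd(block_max) * sqrt(6) / pi
start <- c(mean(block_max) - 0.5772 * beta0, beta0, 0.1)
fit <- optim(start, gev_nll, z = block_max)
gev_par <- setNames(fit$par, c("mu", "beta", "xi"))
print(gev_par)
```

For block minima, the same code can be applied to `-x` and the sign of the location estimate reversed afterwards.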

Interpretation: Model fit is assessed using four diagnostic plots:

  1. Probability plot: the fitted value of the c.d.f. is plotted against the empirical value of the c.d.f. for each data point.
  2. Quantile plot: the empirical quantile is plotted against the fitted quantile for each data point.
  3. Return level plot: the return level (with error bars) is plotted against the return period.
  4. Density plot: the fitted p.d.f. is superimposed on a histogram of the data.
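The first three diagnostics can be sketched in base R as below. The parameter values stand in for a fitted model and are purely hypothetical, `gev_p` and `gev_q` are helper names introduced here, and the error bars on the return level plot are omitted for brevity.

```r
# GEV c.d.f. and quantile function in the mu / beta / xi parameterisation
gev_p <- function(x, mu, beta, xi) {
  if (abs(xi) < 1e-8) return(exp(-exp(-(x - mu) / beta)))
  exp(-pmax(1 + xi * (x - mu) / beta, 0)^(-1 / xi))
}
gev_q <- function(p, mu, beta, xi) {
  if (abs(xi) < 1e-8) return(mu - beta * log(-log(p)))
  mu + beta / xi * ((-log(p))^(-xi) - 1)
}

set.seed(7)
mu <- 0.5; beta <- 2; xi <- 0.1              # hypothetical "fitted" values
z <- sort(gev_q(runif(200), mu, beta, xi))   # stand-in for observed block maxima
pp <- seq_along(z) / (length(z) + 1)         # empirical c.d.f. plotting positions

op <- par(mfrow = c(1, 3))
# Probability plot: empirical vs fitted c.d.f.
plot(pp, gev_p(z, mu, beta, xi), xlab = "Empirical", ylab = "Model",
     main = "Probability plot")
abline(0, 1, col = "blue")
# Quantile plot: fitted vs empirical quantiles
plot(gev_q(pp, mu, beta, xi), z, xlab = "Model", ylab = "Empirical",
     main = "Quantile plot")
abline(0, 1, col = "blue")
# Return level plot: level exceeded on average once every T blocks
T_grid <- c(2, 5, 10, 20, 50, 100)
rl <- gev_q(1 - 1 / T_grid, mu, beta, xi)
plot(T_grid, rl, log = "x", type = "b", xlab = "Return period (blocks)",
     ylab = "Return level", main = "Return level plot")
par(op)
```

Points lying close to the diagonal in the first two panels suggest an adequate fit.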


 

Threshold Exceedance Model

Distribution: Generalised Pareto Distribution (GPD)
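For reference, the GPD distribution function of the excess y = x − u0 over the threshold can be written as below, with ξ the shape and β the scale. This is the standard textbook form, and the symbol names are assumed to match the `xi` and `beta` arguments used in the code later in this post.

```latex
H_{\xi,\beta}(y) =
\begin{cases}
1 - \left(1 + \dfrac{\xi\,y}{\beta}\right)^{-1/\xi}, & \xi \neq 0,\\[2ex]
1 - \exp\left(-\dfrac{y}{\beta}\right), & \xi = 0,
\end{cases}
\qquad \beta > 0,\quad y \ge 0 \ \ (\text{and } y \le -\beta/\xi \text{ when } \xi < 0).
```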

Implementation:

  1. Choose a threshold u0 such that the GPD is a good model for (X − u0 | X > u0). Two methods are available for setting the threshold. The exploratory method is carried out prior to model estimation and is based on the mean residual life plot, which graphs the mean threshold excess against a range of threshold levels; u0 should be the lowest value of u above which the mean excess is approximately linear in u. The other method assesses the stability of parameter estimates for models fitted across a range of thresholds.
  2. Fit the GPD to the observed excesses x − u0.
  3. Use the fitted GPD, together with some model for the rate of exceedances X > u0, to provide estimates for return levels.
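The three steps above can be sketched in base R as follows. Everything here is illustrative: the exponential data, the 90% quantile used as u0, the hand-written likelihood `gpd_nll` and the helper `return_level` are assumptions made for the example, and the fitting functions in fExtremes would normally replace the `optim` call.

```r
set.seed(1)
x <- rexp(5000, rate = 0.5)   # hypothetical data with an exponential tail (xi near 0)

# Step 1 (exploratory): mean excess over a grid of candidate thresholds
u_grid <- quantile(x, probs = seq(0.50, 0.98, by = 0.02))
mean_excess <- sapply(u_grid, function(u) mean(x[x > u] - u))
plot(u_grid, mean_excess, type = "b", xlab = "Threshold u", ylab = "Mean excess",
     main = "Mean residual life plot")

u0 <- unname(quantile(x, 0.90))   # assumed threshold, for illustration only
y <- x[x > u0] - u0               # observed excesses over u0

# Step 2: fit the GPD to the excesses by maximum likelihood, par = c(beta, xi)
gpd_nll <- function(par, y) {
  beta <- par[1]; xi <- par[2]
  if (beta <= 0) return(1e10)     # penalise an invalid scale
  if (abs(xi) < 1e-8) return(length(y) * log(beta) + sum(y) / beta)
  core <- 1 + xi * y / beta
  if (any(core <= 0)) return(1e10)  # data outside the support
  length(y) * log(beta) + (1 + 1 / xi) * sum(log(core))
}
fit <- optim(c(mean(y), 0.1), gpd_nll, y = y)
beta_hat <- fit$par[1]
xi_hat <- fit$par[2]

# Step 3: combine the fit with the empirical exceedance rate to get the
# level exceeded on average once every m observations
zeta_u <- mean(x > u0)
return_level <- function(m) {
  if (abs(xi_hat) < 1e-8) return(u0 + beta_hat * log(m * zeta_u))
  u0 + beta_hat / xi_hat * ((m * zeta_u)^xi_hat - 1)
}
print(c(beta = beta_hat, xi = xi_hat, rl_1000 = return_level(1000)))
```

Refitting with several values of u0 and comparing the estimates of ξ is a simple version of the parameter stability check described in step 1.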

Two R packages that deal with extreme value theory are fExtremes and extRemes. I rely on the former for this code. The other libraries used are timeSeries and quantmod:

setwd("C:/University of Warwick/R experiments/Extreme Value Theory")
library(fExtremes)
library(timeSeries)
library(quantmod)


Let’s simulate from the normal, GEV and GPD distributions and visualise the differences.

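# Open a new graphics device (dev.new() is portable; windows() only exists on Windows)
dev.new()
par(mfrow=c(2,3))  # 2 x 3 grid: five distribution panels plus one overlay

# Panel 1: normal densities with increasing mean and standard deviation
n<-list()
n[[1]]<-rnorm(n=500000,mean=0,sd=1)
plot(density(n[[1]]),main="Normal Distribution",xlab='',cex.main=0.85,cex=0.75,col="blue",lwd=2)
grid()
for(i in 2:4){
  n[[i]]<-rnorm(n=500000,mean=i-0.5,sd=i/2+0.05)
  lines(density(n[[i]]),col=i,lwd=1.5)
}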

# Panel 2: Gumbel (GEV with xi = 0) for increasing location and scale
r<-list()
r[[1]]<-rgev(n=100000,xi=0,mu=0.5,beta=2)
plot(density(r[[1]]),main="Gumbel Distribution",xlab='',cex.main=0.85,cex=0.75,col='blue',lwd=2)
grid()
for(i in 2:4){
  r[[i]]<-rgev(n=100000,xi=0,mu=i-0.5,beta=i+1)
  lines(density(r[[i]]),col=i,lwd=1.5)
}

# Panel 3: Frechet (GEV with xi > 0) for increasing location and scale
f<-list()
f[[1]]<-rgev(n=100000,xi=0.05,mu=0,beta=0.2)
plot(density(f[[1]]),main="Frechet Distribution",xlab='',cex.main=0.85,cex=0.75,col='blue',lwd=2)
grid()
for(i in 2:4){
  f[[i]]<-rgev(n=100000,xi=0.05,mu=i-0.5,beta=i/10+0.15)
  lines(density(f[[i]]),col=i,lwd=1.5)
}

# Panel 4: Weibull (GEV with xi < 0) for increasing location and scale
w<-list()
w[[1]]<-rgev(n=100000,xi=-0.3,mu=0,beta=2)
plot(density(w[[1]]),main="Weibull Distribution",xlab='',cex.main=0.85,cex=0.75,col='blue',lwd=2)
grid()
for(i in 2:4){
  w[[i]]<-rgev(n=100000,xi=-0.3,mu=i-0.5,beta=i+1)
  lines(density(w[[i]]),col=i,lwd=1.5)
}

# Panel 5: GPD with xi = 0 for increasing location and scale
g<-list()
g[[1]]<-rgpd(n=100000,xi=0,mu=0.5,beta=1)
plot(density(g[[1]]),main="GPD Distribution",xlab='',cex.main=0.85,cex=0.75,col='blue',lwd=2)
grid()
for(i in 2:4){
  g[[i]]<-rgpd(n=100000,xi=0,mu=i/2+0.8,beta=i)
  lines(density(g[[i]]),col=i,lwd=1.5)
}

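# Panel 6: overlay the first sample from each family on a common x-axis
plot(density(n[[1]]),main="All Distributions",xlab='',cex.main=0.85,cex=0.75,col='blue',lwd=2,xlim=c(-5,20))
lines(density(g[[1]]),col=1,lwd=1.5)
lines(density(w[[1]]),col=2,lwd=1.5)
lines(density(f[[1]]),col=3,lwd=1.5)
lines(density(r[[1]]),col=6,lwd=1.5)  # col=4 is a shade of blue, too close to the normal curve
legend("topright",legend=c("Normal","GPD","Weibull","Frechet","Gumbel"),col=c("blue",palette()[c(1,2,3,6)]),lwd=2,cex=0.7)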
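The visual difference in the upper tails can also be put into numbers. The sketch below compares exceedance probabilities of a standard normal with those of a Gumbel distribution using closed-form expressions in base R. The unit parameters (mu = 0, beta = 1) and the evaluation points are assumptions chosen for illustration, not the values simulated above.

```r
x <- c(2, 3, 4, 5)

# P(X > x) for a standard normal
normal_tail <- pnorm(x, mean = 0, sd = 1, lower.tail = FALSE)

# P(X > x) for a Gumbel with mu = 0, beta = 1: 1 - exp(-exp(-x)),
# computed with expm1() to keep precision when the probability is tiny
gumbel_tail <- -expm1(-exp(-x))

tail_table <- data.frame(x, normal_tail, gumbel_tail,
                         ratio = gumbel_tail / normal_tail)
print(tail_table)
```

The growing ratio shows how much more probability the Gumbel places on large values than the normal does.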
