# Extreme Value Theory [Introduction]

* Extreme value theory * is a branch of statistics dealing with the extreme deviations from the median of probability distributions. It seeks to assess, from a given ordered sample of a given random variable, the probability of events that are more extreme than any observed prior. Extreme value analysis is widely used in many disciplines, ranging from structural engineering, finance, earth sciences, traffic prediction, geological engineering, etc. EVT is a tool which attempts to provide us with the best possible estimate of the tail area of the distribution. However, even in the absence of useful historical data, EVT provides guidance on the kind of distribution we should select so that extreme risks can be handled prudently.

Click here for the complete Wikipedia entry

There are two broad methods of modelling extreme values. The oldest group of models are the *block maxima models*; these are models for the largest observations collected from large samples of identically distributed observations. A more modern group of models are the *peaks-over-threshold (POT) models*; these are models for all large observations which exceed a high threshold. The POT models are generally considered to be the most useful for practical purposes.

**Block Maxima Model**

**Distribution**: Generalised Extreme Value Distribution (GEV) that encompasses the Gumbel,Frechet and Weibull distributions ( corresponding to ξ = 0, ξ > 0 and ξ < 0 respectively).

**Implementation: **Data are segmented into blocks of equal length and a series of block maxima/minima are generated to which the GEV is fitted.

**Interpretation:** Model fit is assessed using graphical analysis

*Probability plot:*the fitted value of the c.d.f. is plotted against the empirical value of the c.d.f. for each data point.*Quantile plo**t*: the empirical quantile is plotted against the fitted quantile for each data point.*Return level plot:*the return level (with error bars) is plotted against the return period.- Density plot: the fitted p.d.f. is supereimposed on a histogram of the data.

fff

**Threshold Exceedence Model**

**Distribution:** Generalised Pareto Distribution (GPD)

** Implementation: **

- Choose some threshold u0 so that GPD is a good model for (X − u0|X > u0). There are 2 methods available for setting the threshold level.
is carried out prior to model estimation and is based on mean residual life plots (graphs the means of threshold exceedences across various threshold levels). The threshold u0 should be the lowest value of u above which threshold exceedence means are roughly linear.*The exploratory method*assesses the stability of parameter estimates for models fit across a range of thresholds**The other method** - Fit the GPD to the observed excesses x − u0.
- Use the fitted GPD, together with some model for the rate of exceedances X > u0, to provide estimates for return levels.

The 2 libraries that deal with Extreme Value Theory are **fExtremes** and **extRemes. **I rely on the former package for this code. Other libraries used are

ggg

Let’s simulate from the normal ,GEV and GPD distributions and visualise the differences.

windows() par(mfrow=c(2,3)) n<-NULL n[[1]]<-rnorm(n=500000,mean=0,sd=1) plot(cex.main=0.85,xlab='',density(n[[1]]),main="Normal Distribution",cex=0.75,col="blue",lwd=2) grid() for(i in 2:4){ n[[i]]<-rnorm(n=500000,mean=i-0.5,sd=i/2+0.05) lines(density(n[[i]]),col=i,lwd=1.5) } r<-NULL r[[1]]=rgev(n=100000,xi=0,mu=0.5,beta=2) plot(cex.main=0.85,xlab='',density(r[[1]]),col='blue',main="Gumbel Distribution",lwd=2,cex=0.75) grid() for(i in 2:4){ r[[i]]=rgev(n=100000,xi=0,mu=i-0.5,beta=i+1) lines(density(r[[i]]),col=i,lwd=1.5) } f<-NULL f[[1]]=rgev(n=100000,xi=0.05,mu=0,beta=0.2) plot(cex.main=0.85,xlab='',density(f[[1]]),col='blue',main="Frechet Distribution",lwd=2,cex=0.75) grid() for(i in 2:4){ f[[i]]=rgev(n=100000,xi=0.05,mu=i-0.5,beta=i/10+0.15) lines(density(f[[i]]),col=i,lwd=1.5) } w<-NULL w[[1]]=rgev(n=100000,xi=-0.3,mu=0,beta=2) plot(xlab='',density(w[[1]]),col='blue',main="Weibull Distribution",lwd=2,cex=0.75,cex.main=0.85) grid() for(i in 2:4){ w[[i]]=rgev(n=100000,xi=-0.3,mu=i-0.5,beta=i+1) lines(density(w[[i]]),col=i,lwd=1.5) } g<-NULL g[[1]]=rgpd(n=100000,xi=0,mu=0.5,beta=1) plot(xlab='',density(g[[1]]),col='blue',main="GPD Distribution",lwd=2,cex=0.75,cex.main=0.85) grid() for(i in 2:4){ g[[i]]=rgpd(n=100000,xi=0,mu=i/2+0.8,beta=i) lines(density(g[[i]]),col=i,lwd=1.5) } plot(xlab='',density(n[[1]]),col='blue',main="All Distributions",lwd=2,cex=0.75,cex.main=0.85,xlim=c(-5,20)) lines(density(g[[1]]),col=1,lwd=1.5) lines(density(w[[1]]),col=2,lwd=1.5) lines(density(f[[1]]),col=3,lwd=1.5) lines(density(r[[1]]),col=4,lwd=1.5)