# Tag Archives: Joint Distribution of Stock returns

The basic implication of the CAPM is that the expected excess return of an asset is linearly related to the expected excess return on the market portfolio according to the following relation:

This is simply a specific instance of a generic factor pricing model in which the factor is an excess return of a surrogate market portfolio and the test assets are all excess returns of risky assets. The betas are defined by regression coefficients

and the model states that expected returns are linear in the betas :

From the expressions above, it is clear that there are two testable implications with regard to the validity of the CAPM:

[1] All regression intercepts should be individually equal to zero

[2] All regression intercepts should be jointly equal to zero

While there are numerous ways to estimate the model and evaluate the properties of its parameters, this post simply seeks to apply the Gibbons,Ross & Shanken methodology, in both its numerical and graphical incarnations, to a subset of the data. An attempt was made to download price and return data for the constituents of the SP500 since 1995. Data availability issues however constrained the number of assets under examination to 351 in total, with 216 monthly observations across said assets (as well as the index and T-Bill rate). The previous post summarised key return and risk statistics associated with each of these 351 assets with the help of the rpanel package (for control) and the PerformanceAnalytics package (for a host of measures). To implement the GRS test, one has to ensure that the number of test assets used in the process is less than the number of return observations.

For the sake of convenience the (updated) dashboards from the previous blog post are given below.

After estimating a conventional time-series regression for each risky asset, a dashboard of residual diagnostic plots can also be helpful.

dcdc

```#Residual diag
windows()
layout(matrix(c(1,1,2,3,1,1,4,5,6,6,7,7),byrow=T,nrow=3,ncol=4))

if (interactive()) {
draw <- function(panel) {
par(mai=c(0,0.3,0.3,0.2))
plot(main=paste('Time Series Regression :',colnames(monthly.ret)[panel\$asset],'\n','Alpha= ',round(ts.list\$alphas[panel\$asset],3),'|| Beta= ',round(ts.list\$betas[panel\$asset],3)),x=exm.ret,y=ex.ret[,panel\$asset],xlab='',ylab='',cex.main=0.85,cex.lab=0.7,cex.axis=0.8,lwd=1,cex=0.75)
legend('topleft',legend=c('Actual','Fitted'),fill=c('black','red'),border.col=NA,bg=NA,cex=0.7,ncol=2)
abline(ts.list\$fit[[panel\$asset]],col='red',lwd=2)

par(mai=c(0,0.15,0.3,0.2))
qqPlot(ts.list\$fit[[panel\$asset]],xlab='',ylab='',cex.lab=0.7,cex.axis=0.8,lwd=1,cex=0.75)

par(mai=c(0,0.15,0.3,0.2))
acf(main='',ts.list\$resid[[panel\$asset]],xlab='',ylab='',,cex.lab=0.7,cex.axis=0.8,lwd=1,cex=0.75)

par(mai=c(0,0.15,0.3,0.2))

par(mai=c(0,0.15,0.3,0.2))
plot(x=ts.list\$fitted[[panel\$asset]],y=ts.list\$resid[[panel\$asset]],xlab='',ylab='',cex.lab=0.7,cex.axis=0.8,lwd=1,cex=0.75)

par(mai=c(0.3,0.3,0.3,0.2))
plot(type='l',ts.list\$resid[[panel\$asset]],xlab='',ylab='',cex.lab=0.7,cex.axis=0.8,lwd=1,cex=0.75)
legend('topright',legend=c('Residuals'),fill=c('black'),border.col=NA,bg=NA,cex=0.7,ncol=1)

gq.p <- as.numeric(gqtest(ts.list\$fit[[panel\$asset]])[4])
bp <- as.numeric(bptest(ts.list\$fit[[panel\$asset]])[4])

sw <- as.numeric(shapiro.test(ts.list\$resid[[panel\$asset]])[2])
jb <- as.numeric(jarque.bera.test(ts.list\$resid[[panel\$asset]])[3])

dw<-as.numeric(durbinWatsonTest(ts.list\$fit[[panel\$asset]])[3])

an <- c('Alpha','Beta','G-Quandt','')
an1 <- c('B-Pagan','S-Wilk','J-Bera','D-Watson')

te <- cbind(an,rbind(round(ts.list\$alphas.p[panel\$asset],3),round(ts.list\$betas.p[panel\$asset],3),round(gq.p,3),''))
te1 <-cbind(an1,rbind(round(bp,3),round(sw,3),round(jb,3),round(dw,3)))
tab <- cbind(te,te1)

par(mai=c(0.2,0,0.3,0.1))
panel
}
panel<- rp.control(asset=1)
rp.slider(panel,asset,1,(ncol(monthly.ret)), action=draw,resolution=1,showvalue=TRUE)
}

```

The residual diagnostics dashboard covers the conventional issues of [1] fitted vs actual data,[2] normality, [3] residual autocorrelation, [4] heteroskedasticity, [5] stationarity and [6] a table of p-values that are colour coded to reflect rejection (red) or non-rejection (green) of the null hypotheses associated with the named measure for the selected asset, at the 10% significance level. Asset selection is once again done via the rpanel package.

So far we have only concerned ourselves with the first testable implication of the CAPM in the context of a time series regression, namely the estimation and visualisation of residual diagnostics for each of the 351 test assets. The significance of the parameters for each model is assessed by comparing the test statistics to critical values or the p-values to chosen significance levels. For those assets that have an alpha-p-value less (greater) than 0.1, one would (not) reject the null hypothesis that their pricing error was equal to zero at the 10% level of significance.

The second testable implication of the CAPM in a time series framework relates to the condition of pricing errors (alphas) being jointly equal to zero across all test assets. Gibbons,Ross and Shanken (GRS for short) provide a useful methodology to test this condition under assumptions of residual normality,homoscedasticity and independence. The GRS test statistic I tried to replicate takes the following functional form :

It appears that this statistic can be rewritten in such a way as to provide an intuitive graphical presentation of CAPM validity.More precisely, GRS show that the test statistic can be expressed in terms of  how far inside the ex post frontier the factor return is (excess market return in the CAPM).

Disequilibrium in markets implies that prices continue adjusting until the market clears. As prices move, so will asset returns and relative market values, affecting tangency- and market portfolio weights respectively. In the ideal CAPM universe, market and tangency portfolios will eventually converge and every investor will hold the tangency portfolio. Hence for the CAPM to hold, the market portfolio surrogate used in model estimation must not deviate too far, in a statistical sense, from the tangency portfolio.This code snippet calculates the Test statistic,p-value and plots the usual frontiers,assets and portfolios.

```#Joint test (use only 200 assets because n.obs > n.assets otherwise)

t.per <- nrow(monthly.ret)
n.ass <- 200
t.term <- (t.per-n.ass-1)/n.ass

alphas <- ts.list\$alphas[1:200]
res.cov <- NULL
for(i in 1:200)
{
res.cov<-cbind(res.cov,ts.list\$resid[[i]])
}
res.cov.m <- cov(res.cov)

term <- ((1+(mean(exm.ret)/apply(exm.ret,2,sd))^2)^(-1))
a.term <- t(alphas)%*%ginv(res.cov.m)%*%(alphas)

t.stat.g <- t.term*term*a.term
grs.pval<-pf(t.stat.g,200,15,lower.tail=T)

ret.set <- t(as.matrix(colMeans(cbind(bench.ret,monthly.ret[,1:200]))*100))
cov.set <- var(cbind(bench.ret,monthly.ret[,1:200])*100)

risky.asset.data <- list()
risky.asset.data\$mean.ret <- ret.set
risky.asset.data\$cov.matrix <- cov.set
risky.asset.data\$risk.free <- mean(rf.vec)

base<-Frontiers(risky.asset.data)
Frontier.Draw(risky.asset.data,base,rainbow(200),'new',lty=1,paste('Gibbons,Ross & Shanken Interpretation','\n','GRS statistic/pvalue :',round(t.stat.g,3),'/',round(grs.pval,3),'\n','For 200 assets'))
CAL.Draw(risky.asset.data,base,'black',lty=1)

x <- seq(0,30,by=0.01)
lin <- mean(rf.vec)+((((mean(bench.ret)*100)-mean(rf.vec))/(apply(bench.ret,2,sd)*100))*x)
lines(x=x,lin,col='gold',lwd=1.5)
points(x=apply(bench.ret,2,sd)*100,y=mean(bench.ret)*100,col='black',cex=1,pch=17)
text('Market\nReturn',x=apply(bench.ret,2,sd)*100,y=mean(bench.ret)*100,col='black',cex=0.7,pos=1)

```

The CAPM will always hold if the market proxy is mean variance efficient. For this condition to hold true, the surrogate for the market portfolio should lie on the capital market line, the efficient frontier when a risk free asset is introduced alongside the collection of risky assets. Since the market portfolio is not identifiable, the CAPM cannot be really tested. The market proxy used above, monthly returns to the SP 500 index, does not include factors such as [1] real estate and [2] human capital.

In the previous post we explored the univariate distribution of daily log returns for JPM and MS and concluded non normality for both, invalidating the correlation coefficient as a good measure of dependency between these financial assets. We proceeded to compare the joint distribution of stock returns with simulated joint distributions under a normality assumption. While the bi-variate normal distribution does a decent job in explaining the core of the joint distribution of asset returns, it fails to capture tail dependencies. This observation led us to fit a semi-parametric model to the data, where the tails of the distribution are modelled using a parametric GPD distribution while the core of the joint stock distribution is modelled using a non-parametric empirical distribution. From a visual inspection of the figures we concluded that the semi-parametric approach provides a better approximation for the actually observed joint distribution of stock returns than the bi-variate normal approach. In other words, simulating data from a fitted semi-parametric model can better explain the observed joint distribution of stock returns than simulated data from a bi-variate normal distribution.

Copulas are used to describe the dependence between random variables. Copulas are popular in statistical applications as they allow one to easily model and estimate the distribution of random vectors by estimating marginals and copula separately.

A flexible way to successfully model the joint behaviour of  financial returns, after modelling the marginal distribution (as per previous post), is with copulas. Let’s first visualise the empirical joint distribution of daily log returns.

```################################################################################
# Modelling Copula (Dependence structure between marginal distributions)
# ##############################################################################

# [1] Create the empirical copula for daily returns

asset1.gpd<-pspd(ret.mat.nz[,1],semiparam.1.fit)
asset2.gpd<-pspd(ret.mat.nz[,2],semiparam.2.fit)

emp.cop.p<-pempiricalCopula(asset1.gpd,asset2.gpd,N=15)
emp.cop.d<-dempiricalCopula(asset1.gpd,asset2.gpd,N=15)

jet.colors <- colorRampPalette( c("red", "darkred") )
nbcol <- 10
color <- jet.colors(nbcol)

windows()
layout(matrix(c(1,2,3,4),nrow=2,ncol=2))
par(mai=c(0.75,0.85,0.4,0.2))
plot(cex.main=0.85,cex.lab=0.75,cex.axis=0.75,cex=0.85,asset1.gpd,asset2.gpd,xlab=paste("Univariate random variable from\n",ticker[1],"semiparametric GPD"),ylab=paste("Univariate random variable from\n",ticker[2],"semiparametric GPD"))
par(mai=c(0.75,0.85,0.3,0.2))
contour(col=color,emp.cop.p,xlab=ticker[1],ylab=ticker[2],cex.main=0.85,cex.lab=0.75,cex.axis=0.75,cex=0.85)
par(mai=c(0.75,0.75,0.4,0.2))

x <- emp.cop.d\$x
y <- emp.cop.d\$y
z <- emp.cop.d\$z
nrz <- nrow(z)
ncz <- ncol(z)

zfacet <- z[-1, -1] + z[-1, -ncz] + z[-nrz, -1] + z[-nrz, -ncz]
facetcol <- cut(zfacet, nbcol)

par(mai=c(0.75,0.75,0.4,0.2))
contour(col=color,xlab=ticker[1],ylab=ticker[2],emp.cop.d)```

Created by Pretty R at inside-R.org

Now that we have visualised the joint distribution of asset returns, let’s proceed to find the Copula that best fits this observed structure. First let’s fit elliptical copulae (Gaussian, Student T, Cauchy) to the observed data and visualise the fit in 2 and 3 dimensions.

```# [2] Fitting the empirical copula with parametric copulae

n.obs<-nrow(ret.mat.nz)
margin.data<-cbind(asset1.gpd,asset2.gpd)
u=grid2d(x=emp.cop.p\$x)

# [2.a] Elliptical copulae

Gauss.cop.fit<-ellipticalCopulaFit(margin.data[,1],margin.data[,2],type='norm')
Gauss.r<-rellipticalCopula(n.obs,rho=Gauss.cop.fit\$par,param=NULL,type='norm')
Gauss.p<-pellipticalCopula(u,rho=Gauss.cop.fit\$par,param=NULL,type='norm',output='list')
Gauss.d<-dellipticalCopula(u,rho=Gauss.cop.fit\$par,param=NULL,type='norm',output='list')

g.sim.1<-qspd(Gauss.r[,1],semiparam.1.fit)
g.sim.2<-qspd(Gauss.r[,2],semiparam.2.fit)

Ts.cop.fit<-ellipticalCopulaFit(margin.data[,1],margin.data[,2],type='t')
Ts.r<-rellipticalCopula(n.obs,rho=Ts.cop.fit\$par[1],param=Ts.cop.fit\$par[2],type='t')
Ts.p<-pellipticalCopula(u,rho=Ts.cop.fit\$par[1],param=Ts.cop.fit\$par[2],type='t',output='list')
Ts.d<-dellipticalCopula(u,rho=Ts.cop.fit\$par[1],param=Ts.cop.fit\$par[2],type='t',output='list')

t.sim.1<-qspd(Ts.r[,1],semiparam.1.fit)
t.sim.2<-qspd(Ts.r[,2],semiparam.2.fit)

Cauchy.cop.fit<-ellipticalCopulaFit(margin.data[,1],margin.data[,2],type='cauchy')
Cauchy.r<-rellipticalCopula(n.obs,rho=Cauchy.cop.fit\$par[1],param=NULL,type='cauchy')
Cauchy.p<-pellipticalCopula(u,rho=Cauchy.cop.fit\$par[1],param=NULL,type='cauchy',output='list')
Cauchy.d<-dellipticalCopula(u,rho=Cauchy.cop.fit\$par[1],param=NULL,type='cauchy',output='list')

c.sim.1<-qspd(Cauchy.r[,1],semiparam.1.fit)
c.sim.2<-qspd(Cauchy.r[,2],semiparam.2.fit)

# Setting up the plot for Elliptical copulae

windows(width=7,height=7)
layout(matrix(c(1:16),nrow=4,ncol=4))

# Empirical Plots
x <- emp.cop.d\$x
y <- emp.cop.d\$y
z <- emp.cop.d\$z
nrz <- nrow(z)
ncz <- ncol(z)

zfacet <- z[-1, -1] + z[-1, -ncz] + z[-nrz, -1] + z[-nrz, -ncz]
facetcol <- cut(zfacet, nbcol)
jet.colors <- colorRampPalette( c("red", "darkred") )
nbcol <- 50
color <- jet.colors(nbcol)

par(mai=c(0,0.1,0.3,0.1))
plot(main="Empirical",cex.main=0.85,cex.lab=0.75,cex.axis=0.75,cex=0.85,asset1.gpd,asset2.gpd,yaxt='n',xaxt='n',col="darkred")
par(mai=c(0,0.1,0.05,0.1))
contour(xaxt='n',yaxt='n',col="red",emp.cop.p,xlab=ticker[1],ylab=ticker[2],cex.main=0.85,cex.lab=0.75,cex.axis=0.75,cex=0.85)
contour(xaxt='n',yaxt='n',col='darkred',emp.cop.d)

# Fitted Plots For Gaussian
x <- Gauss.d\$x
y <- Gauss.d\$y
z <- Gauss.d\$z
nrz <- nrow(z)
ncz <- ncol(z)

zfacet <- z[-1, -1] + z[-1, -ncz] + z[-nrz, -1] + z[-nrz, -ncz]
facetcol <- cut(zfacet, nbcol)
jet.colors <- colorRampPalette( c("cyan", "darkblue") )
nbcol <- 50
color <- jet.colors(nbcol)

par(mai=c(0,0.05,0.3,0.1))
plot(main="Fitted-Gaussian",cex.main=0.85,cex.lab=0.75,cex.axis=0.75,cex=0.85,Gauss.r[,1],Gauss.r[,2],yaxt='n',xaxt='n',col="royalblue")
par(mai=c(0,0.05,0.05,0.1))
contour(xaxt='n',yaxt='n',col="royalblue",Gauss.p,xlab=ticker[1],ylab=ticker[2],cex.main=0.85,cex.lab=0.75,cex.axis=0.75,cex=0.85)
contour(xaxt='n',yaxt='n',col="royalblue",Gauss.d)

# Fitted Plots for Student T

x <- Ts.d\$x
y <- Ts.d\$y
z <- Ts.d\$z
nrz <- nrow(z)
ncz <- ncol(z)

zfacet <- z[-1, -1] + z[-1, -ncz] + z[-nrz, -1] + z[-nrz, -ncz]
facetcol <- cut(zfacet, nbcol)
jet.colors <- colorRampPalette( c("cyan", "darkblue") )
nbcol <- 50
color <- jet.colors(nbcol)

par(mai=c(0,0.05,0.3,0.1))
plot(main="Fitted-Student T",cex.main=0.85,cex.lab=0.75,cex.axis=0.75,cex=0.85,Ts.r[,1],Ts.r[,2],yaxt='n',xaxt='n',col="royalblue")
par(mai=c(0,0.05,0.05,0.1))
contour(xaxt='n',yaxt='n',col="royalblue",Ts.p,xlab=ticker[1],ylab=ticker[2],cex.main=0.85,cex.lab=0.75,cex.axis=0.75,cex=0.85)
contour(xaxt='n',yaxt='n',col="royalblue",Ts.d)

# Fitted Plots for Cauchy

x <- Cauchy.d\$x
y <- Cauchy.d\$y
z <- Cauchy.d\$z
nrz <- nrow(z)
ncz <- ncol(z)

zfacet <- z[-1, -1] + z[-1, -ncz] + z[-nrz, -1] + z[-nrz, -ncz]
facetcol <- cut(zfacet, nbcol)
jet.colors <- colorRampPalette( c("cyan", "darkblue") )
nbcol <- 50
color <- jet.colors(nbcol)

par(mai=c(0,0.05,0.3,0.1))
plot(main="Fitted-Cauchy",cex.main=0.85,cex.lab=0.75,cex.axis=0.75,cex=0.85,Cauchy.r[,1],Cauchy.r[,2],yaxt='n',xaxt='n',col="royalblue")
par(mai=c(0,0.05,0.05,0.1))
contour(xaxt='n',yaxt='n',col="royalblue",Cauchy.p,xlab=ticker[1],ylab=ticker[2],cex.main=0.85,cex.lab=0.75,cex.axis=0.75,cex=0.85)
contour(xaxt='n',yaxt='n',col="royalblue",Cauchy.d)

# comparing actual with simulated data
windows()
par(mfrow=c(2,3),mar=c(6,4,5,0.5))

plot(ylim=c(x_min,x_max),xlim=c(x_min,x_max),xlab=ret.unl[1],ylab=ret.unl[2],x=ret.mat.nz[,1],y=ret.mat.nz[,2],pch=1,col="darkred",cex.main=0.95,cex.lab=0.95,cex=0.95,cex.axis=0.95,main=paste("Empirical data with rho of\n ",emp.corr))
grid()
abline(h=0,v=0,col="black")

plot(ylim=c(x_min,x_max),xlim=c(x_min,x_max),xlab=paste("Simulated ",ret.unl[1]),ylab=paste("Simulated ",ret.unl[2]),x=simulated.ret[,1],y=simulated.ret[,2],pch=1,col="darkblue",cex.main=0.95,cex.lab=0.95,cex=0.95,cex.axis=0.95,main="Simulated data from \n Bivariate Normal Distribution")
grid()
abline(h=0,v=0,col="black")

plot(ylim=c(x_min,x_max),xlim=c(x_min,x_max),xlab=colnames(semiparametric.sim.ret)[1],ylab=colnames(semiparametric.sim.ret)[2],cex.main=1,cex=1,cex.lab=0.95,cex.axis=0.85,x=semiparametric.sim.ret[,1],y=semiparametric.sim.ret[,2],main=paste("Simulated Return Data From \n","Semiparametric GPD"),col='darkblue',pch=10)
abline(h=0,v=0)
grid()

plot(x=g.sim.1,y=g.sim.2,xlab=paste("Simulated",ret.unl[1]),ylab=paste("Simulated ",ret.unl[2]),main="Simulated data using \n Gaussian Copula",col="darkblue",pch=1,cex.main=0.95,cex.lab=0.95,cex=0.95,cex.axis=0.95)
grid()
abline(h=0,v=0,col="black")

plot(x=t.sim.1,y=t.sim.2,xlab=paste("Simulated",ret.unl[1]),ylab=paste("Simulated ",ret.unl[2]),main="Simulated data using \n Student T Copula",col="darkblue",pch=1,cex.main=0.95,cex.lab=0.95,cex=0.95,cex.axis=0.95)
grid()
abline(h=0,v=0,col="black")

plot(x=c.sim.1,y=c.sim.2,xlab=paste("Simulated",ret.unl[1]),ylab=paste("Simulated ",ret.unl[2]),main="Simulated data using \n Cauchy Copula",col="darkblue",pch=1,cex.main=0.95,cex.lab=0.95,cex=0.95,cex.axis=0.95)
grid()
abline(h=0,v=0,col="black")```

Created by Pretty R at inside-R.org

While the red plots in the figure are representations of the empirical bi-variate distribution of asset returns,  the blue plots represent the different elliptical copulae fitted to the data. Clearly, the greater the alignment between the empirical contours and the model (red vs blue lines in the midsection of the figure), the greater the fit of the model. Both the Gaussian and the Student T copulae appear to provide a decent fit to the data. Another way to visualise this is to plot the joint distribution of stock returns and compare it against simulated data under different assumptions.

The figure above provides a simple summary of the main findings so far.

[1] From an empirical plot of MS returns against JPM returns, we can deduce that the joint distribution is centered around 0 but has tail dependencies not captured by a joint normal distribution, a typical assumption imposed by financial theories on asset returns.

[2] If we had imposed a normality assumption on financial asset returns and simulated from this bi-variate distribution, we would be in the position of the second plot. Clearly, modelling the empirical joint distribution (red) using a bi-variate normal distribution understates the tail dependencies.

[3] To capture the tail dependencies while preserving the ability of the joint normal distribution to explain the core of the actual returns distribution,we fitted a semi-parametric model : GPD in the tails and empirical in the centre. This puts us in the position of the third plot, which is a bit better at explaining tail behaviour than the bi-variate normal distribution but not by much.

[4] While the GPD may capture tail behaviour of marginal distributions, the implicit copula does not capture the observed tail dependencies of the data set. Instead of relying on the implicit copula for the dependency structure, we fitted a variety of elliptical copulae to the data set, placing us in the position of plots 4-6, which are clearly an improvement on previous simulations.