
The Fama-French 3 factor (FF3F) model extends the market model with two additional factors to explain security/portfolio returns: the size premium and the book-to-market (value) premium. The model was developed in response to prior research and empirical observations that indicated systematic outperformance of small firm / value stocks versus large firm / growth stocks and the market as a whole. From the previous post it is clear that variation in size produces a variation in average returns that is positively related to variation in market betas, the central risk-return relation posited by the CAPM. By contrast, variation in the book-to-market ratio produces a variation in average returns that is negatively aligned to variation in market betas, violating the central implication of risk-return models.

The Carhart 4 factor model extends the FF3F model with the addition of a momentum factor. The inclusion of this fourth factor is a response to studies showing that stocks with strong past performance continue to outperform stocks with poor past performance in the next period, earning an average excess return of 1% per month (Jegadeesh & Titman, 1993).
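Written out as time-series regressions, the three nested models compared in this post look as follows. This is a sketch in assumed notation: $R^e_{i,t}$ is the excess return of portfolio $i$, and $MKT_t$, $SMB_t$, $HML_t$ and $MOM_t$ are labels chosen here for the market, size, value and momentum factor returns.

```latex
\begin{aligned}
\text{CAPM:}\quad    & R^e_{i,t} = \alpha_i + \beta_i\,MKT_t + \varepsilon_{i,t} \\
\text{FF3F:}\quad    & R^e_{i,t} = \alpha_i + \beta_i\,MKT_t + s_i\,SMB_t + h_i\,HML_t + \varepsilon_{i,t} \\
\text{Carhart:}\quad & R^e_{i,t} = \alpha_i + \beta_i\,MKT_t + s_i\,SMB_t + h_i\,HML_t + m_i\,MOM_t + \varepsilon_{i,t}
\end{aligned}
```

In each case a model that prices the test portfolios correctly implies $\alpha_i = 0$, which is why the estimated intercepts are the quantity of interest.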

In some sense, the FF3F model is a synthesis of a string of anomalies that contradict CAPM predictions. The literature surrounding CAPM implications (and, more importantly, their contradictions) focuses on sorting stocks according to some characteristic (size, value, financial ratios), relating this sorted set of assets to their average returns and econometrically aligning them to market risk. To put the findings in context, the following summarises one of the many literature surveys easily found on the internet.


[Survey of contradictions]

:: Earnings/Price Ratio

The earliest blow to the CAPM came from Basu (1977), who showed that stocks with high E/P ratios earned significantly higher returns than stocks with low E/P ratios. Since differences in beta could not explain these return differences, the E/P effect is a direct contradiction of the CAPM.

:: Firm Size

The size effect, uncovered by Banz (1981), indicates that stocks of firms with low market capitalisations have higher average returns than large cap stocks. While smaller firms also carry greater market risk, and hence preserve the positive risk-return relation required by theory, the beta differences are not large enough to explain the observed return differences in line with CAPM predictions.

:: Long-Term Return Reversals

Defining ‘losers’ as stocks that have had poor returns over the past 3-5 years and ‘winners’ as those stocks that had high returns over a similar period, De Bondt and Thaler (1985) found that losers tend to have much higher average returns than winners over the next 3 to 5 years. The key question, whether or not the superior returns associated with loser stocks are aligned with higher systematic risk, was addressed in the early 1990s by Chopra, Lakonishok and Ritter (1992), who suggested that the variation in beta cannot account for this difference in average returns between losers and winners.

:: Book-to-Market Ratio

Rosenberg, Reid and Lanstein (1985) found that stocks with high book-to-market ratios have significantly higher returns than stocks with lower values for this ratio. This variable still produces significant dispersion in average returns that cannot be explained with the CAPM alone.

:: Leverage

Bhandari (1988) found that firms with high leverage (measured by the debt/equity ratio) have higher average returns than firms with less debt, even after controlling for size and market beta. Under the CAPM, the increased riskiness of a highly leveraged firm’s equity should already be reflected in a higher beta coefficient, so leverage should carry no additional explanatory power.

:: Momentum

Jegadeesh and Titman (1993) found that stock returns tend to exhibit short-term momentum. Momentum tends to be stronger for firms that have had poor recent performance. This pattern is the opposite of that found in the long-term overreaction studies, where long-term losers outperform long-term winners.

This blog post simply extends the FF3F model examined previously with a momentum factor (which can be downloaded for free from Kenneth French’s data library). A different subset of the 25 size/value sorted portfolios is used here: Jan 1932 to Dec 2002. The code is largely the same, with small amendments made to include the 4th factor in the regressions and additional plots for the Carhart model.

First, a collective plot of the excess returns across the 25 portfolios, as before but using a different subset.



Following a time series regression for the 3 models (CAPM, FF3F, Carhart), let’s summarise the estimated intercepts and goodness of fit statistics for the 25 portfolios, and provide a table of R-squared measures for the FF3F and Carhart models to better observe the differences.
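The time-series step described above can be sketched on simulated data. This is a minimal illustration, not the code used for the post: the factor series, the loadings and every object name below (`factors`, `specs`, `ts.summary`) are made up here, and real factor data would come from Kenneth French’s data library.

```r
# Simulated stand-ins for monthly factor returns (NOT the Kenneth French series)
set.seed(1)
n <- 240
factors <- data.frame(
  mkt = rnorm(n, 0.6, 4.5),  # excess market return
  smb = rnorm(n, 0.2, 3.0),  # size factor
  hml = rnorm(n, 0.3, 3.0),  # value factor
  mom = rnorm(n, 0.7, 4.0)   # momentum factor
)
# One hypothetical test portfolio with known loadings and a true alpha of zero
factors$ex.ret <- with(factors,
  1.1 * mkt + 0.8 * smb + 0.4 * hml - 0.2 * mom + rnorm(n, 0, 1.5))

# Nested specifications: CAPM, FF3F, Carhart
specs <- list(
  capm    = ex.ret ~ mkt,
  ff3f    = ex.ret ~ mkt + smb + hml,
  carhart = ex.ret ~ mkt + smb + hml + mom
)
fits <- lapply(specs, lm, data = factors)

# Collect the intercept (alpha), its p-value and R-squared for each model
ts.summary <- t(sapply(fits, function(f) {
  s <- summary(f)
  c(alpha   = unname(coef(f)[1]),
    alpha.p = s$coefficients[1, 4],
    rsq     = s$r.squared)
}))
print(round(ts.summary, 4))
```

Because the models are nested, R-squared cannot fall as factors are added, so the interesting comparison is how much it rises and how the intercepts change.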

#Store Rsq and alpha values in matrix across models
#NOTE: the Carhart object names were lost in extraction; Rsq.mat.car, alpha.mat.car, est.ff and est.car are assumed names
border.names <- list(paste('Size',1:5),paste('Value',1:5))
Rsq.mat.capm <- NULL
Rsq.mat.ff <- NULL
Rsq.mat.car <- NULL
for(i in 1:5){
	Rsq.mat.capm <- rbind(Rsq.mat.capm,SZ.BM$size.excess[[i]]$capm.rsq)
	Rsq.mat.ff <- rbind(Rsq.mat.ff,SZ.BM$size.excess[[i]]$ff.rsq)
	Rsq.mat.car <- rbind(Rsq.mat.car,SZ.BM$size.excess[[i]]$car.rsq)
}
dimnames(Rsq.mat.capm) <- border.names
dimnames(Rsq.mat.ff) <- border.names
dimnames(Rsq.mat.car) <- border.names
alpha.mat.capm <- NULL
alpha.mat.ff <- NULL
alpha.mat.car <- NULL
for(i in 1:5){
	alpha.mat.capm <- cbind(alpha.mat.capm,SZ.BM$size.excess[[i]]$ts.alpha)
	alpha.mat.ff <- cbind(alpha.mat.ff,SZ.BM$size.excess[[i]]$ff.alpha)
	alpha.mat.car <- cbind(alpha.mat.car,SZ.BM$size.excess[[i]]$car.alpha)
}
alpha.mat.capm <- t(alpha.mat.capm)
alpha.mat.ff <- t(alpha.mat.ff)
alpha.mat.car <- t(alpha.mat.car)
dimnames(alpha.mat.capm) <- border.names
dimnames(alpha.mat.ff) <- border.names
dimnames(alpha.mat.car) <- border.names
#Collective plots for alphas and Rsquared
for(i in 1:5){
	plot(cex.main=0.85,cex.lab=0.35,cex.axis=0.65,main=paste('Size Quintile',i),x=1:5,y=alpha.mat.capm[i,],type='b',col='darkgreen',lwd=1,xaxt='n',yaxt='n',ylab='',xlab='')
	#[calls adding the FF3F (blue) and Carhart (red) alphas and the zero line (darkgrey) are missing from the extracted source]
	legend(title='TS-Alphas','topleft',fill=c('darkgreen','blue','red','darkgrey'),legend=c('CAPM','FF3F','CARHART','Zero value'),ncol=1,bg='white',bty='n',cex=0.6)
}
for(i in 1:5){
	#[R-squared plotting code for size quintile i is missing from the extracted source]
}
#Tablemaker for Rsquared (TableMaker is the author's own helper from an earlier post)
est.ff <- matrix(round(Rsq.mat.ff,5),ncol=5)
est.ff <- apply(est.ff, 2, rev)
est.ff <- cbind(rev(rownames(Rsq.mat.ff)),est.ff)
TableMaker(row.h=1,est.ff,c('Portfolios',border.names[[2]]),strip=F,strip.col=c('red','green'),col.cut=0,alpha=0.7,border.col='lightgrey',text.col='black',header.bcol='red',header.tcol='white',title='Goodness of Fit for FF3F')
est.car <- matrix(round(Rsq.mat.car,5),ncol=5)
est.car <- apply(est.car, 2, rev)
est.car <- cbind(rev(rownames(Rsq.mat.car)),est.car)
TableMaker(row.h=1,est.car,c('Portfolios',border.names[[2]]),strip=F,strip.col=c('red','green'),col.cut=0,alpha=0.7,border.col='lightgrey',text.col='black',header.bcol='red',header.tcol='white',title='Goodness of Fit for Carhart')



The first row of line plots shows the estimated time series intercepts for all 3 models across size quintiles. Visual inspection suggests that the multifactor models produce lower values for the alpha term than the CAPM. Of course, p-values have been computed too, but I have not plotted them here. The second row of line plots shows how the goodness of fit measure (R-squared) varies according to the fitted models across the 25 double sorted portfolios. While the multifactor models tend to outperform the single factor model, especially in relation to small sized portfolios (size quintile 1 for instance), the distinction between the FF3F and Carhart models is less apparent from visual inspection, motivating a tabular representation of the same below the set of described plots. The Carhart model tends to do a bit better when it comes to smaller size and lower value portfolios.


As before, to illustrate the failure of the CAPM in explaining double sorted portfolios (but applied to a different subset of the original data), I plot the average excess returns against CAPM beta. Interpretation is as before, with a negative relation between risk and return within a size quintile and across growth/value portfolios signifying the failure of the single index model.


Finally, a set of plots charting average excess returns against model predictions is made to visualise the cross sectional fits.
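One way to build such an actual-versus-predicted comparison is sketched here for a single-factor model on simulated data. The assumptions are mine: the prediction is taken to be the time-series beta times the factor mean, the cross-sectional R-squared uses one common definition that may differ from the one behind the plots, and all object names are hypothetical.

```r
set.seed(2)
n.obs <- 360
n.port <- 25

# Simulated factor (hypothetical stand-in for the excess market return)
mkt <- rnorm(n.obs, 0.6, 4.5)
true.beta <- seq(0.6, 1.4, length.out = n.port)

# Portfolio excess returns generated with a true alpha of zero
ex.ret <- sapply(true.beta, function(b) b * mkt + rnorm(n.obs, 0, 2))

# First pass: one time-series regression per portfolio gives alpha and beta
first.pass <- apply(ex.ret, 2, function(y) coef(lm(y ~ mkt)))
alpha.hat <- first.pass[1, ]
beta.hat  <- first.pass[2, ]

# Model prediction of the average excess return: beta times the factor mean
actual    <- colMeans(ex.ret)
predicted <- beta.hat * mean(mkt)

# With an intercept in the first pass, the gap equals the estimated alpha
print(max(abs((actual - predicted) - alpha.hat)))

# Cross-sectional fit of the predictions against the actual averages
cs.rsq <- 1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
print(round(cs.rsq, 4))
```

Plotting `actual` against `predicted` with a 45 degree line then shows the pricing errors as vertical distances from that line.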

#Fitted vs Actual
#NOTE: the plotting calls inside each loop are missing from the extracted source; only the legends and the title survive.
#      Placing each legend after its loop, and the name SZ.car.rsq, are assumptions.

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~[Size Focus]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
#Fitted CAPM
for(i in 1:5){
	#[plotting code for size quintile i missing from source]
}
legend(title=paste('CAPM\n',paste('[Rsq',round(SZ.capm.rsq,4)*100,'%]'),'\n\nLines connect same\n Size Quintile'),'bottomright',pch=21:25,col=cols.m,legend=c('Smallest Size Portfolio',paste('Size Portfolio',2:4),'Largest Size Portfolio'),ncol=1,bg='white',bty='n',cex=0.7)
mtext(cex=0.85,adj=-0.7,side=3,paste('Actual versus Predicted Excess Returns for the period',sub.c.1[1],'-',sub.c.2[1]))

#Fitted 3 factor models
for(i in 1:5){
	#[plotting code for size quintile i missing from source]
}
legend(title=paste('Fama & French Model\n',paste('[Rsq',round(SZ.ff.rsq,4)*100,'%]'),'\n\nLines connect same\n Size Quintile'),'bottomright',pch=21:25,col=cols.m,legend=c('Smallest Size Portfolio',paste('Size Portfolio',2:4),'Largest Size Portfolio'),ncol=1,bg='white',bty='n',cex=0.7)
#mtext(font=2,line=2,cex=0.75,text='25 Size/BM sorted Portfolios : Actual average excess returns against model predictions',side=2)

#Fitted 4 factor models
for(i in 1:5){
	#[plotting code for size quintile i missing from source]
}
legend(title=paste('Carhart Model\n',paste('[Rsq',round(SZ.car.rsq,4)*100,'%]'),'\n\nLines connect same\n Size Quintile'),'bottomright',pch=21:25,col=cols.m,legend=c('Smallest Size Portfolio',paste('Size Portfolio',2:4),'Largest Size Portfolio'),ncol=1,bg='white',bty='n',cex=0.7)

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~[Value Focus]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
#Fitted CAPM
for(i in 1:5){
	#[plotting code for book-to-market quintile i missing from source]
}
legend(title=paste('CAPM\n',paste('[Rsq',round(SZ.capm.rsq,4)*100,'%]'),'\n\nLines connect same\n Book Market Quintile'),'bottomright',pch=21:25,col=cols.m,legend=c('Lowest Value Portfolio',paste('Value Portfolio',2:4),'Highest Value Portfolio'),ncol=1,bg='white',bty='n',cex=0.7)

#Fitted 3 factor models
for(i in 1:5){
	#[plotting code for book-to-market quintile i missing from source]
}
legend(title=paste('Fama & French Model\n',paste('[Rsq',round(SZ.ff.rsq,4)*100,'%]'),'\n\nLines connect same\n Book Market Quintile'),'bottomright',pch=21:25,col=cols.m,legend=c('Lowest Value Portfolio',paste('Value Portfolio',2:4),'Highest Value Portfolio'),ncol=1,bg='white',bty='n',cex=0.7)

#Fitted 4 factor models
for(i in 1:5){
	#[plotting code for book-to-market quintile i missing from source]
}
legend(title=paste('Carhart Model\n',paste('[Rsq',round(SZ.car.rsq,4)*100,'%]'),'\n\nLines connect same\n Book Market Quintile'),'bottomright',pch=21:25,col=cols.m,legend=c('Lowest Value Portfolio',paste('Value Portfolio',2:4),'Highest Value Portfolio'),ncol=1,bg='white',bty='n',cex=0.7)




The basic implication of the CAPM is that the expected excess return of an asset is linearly related to the expected excess return on the market portfolio according to the following relation:

$$E[R_i] - R_f = \beta_i \left(E[R_m] - R_f\right)$$

This is simply a specific instance of a generic factor pricing model in which the factor is an excess return of a surrogate market portfolio and the test assets are all excess returns of risky assets. The betas are defined by regression coefficients

$$R^e_{i,t} = \alpha_i + \beta_i R^e_{m,t} + \varepsilon_{i,t}, \qquad R^e_{i,t} \equiv R_{i,t} - R_{f,t}$$

and the model states that expected returns are linear in the betas:

$$E[R^e_i] = \beta_i E[R^e_m]$$
From the expressions above, it is clear that there are two testable implications with regard to the validity of the CAPM:

[1] All regression intercepts should be individually equal to zero

[2] All regression intercepts should be jointly equal to zero

While there are numerous ways to estimate the model and evaluate the properties of its parameters, this post simply seeks to apply the Gibbons, Ross & Shanken methodology, in both its numerical and graphical incarnations, to a subset of the data. An attempt was made to download price and return data for the constituents of the S&P 500 since 1995. Data availability issues, however, constrained the number of assets under examination to 351 in total, with 216 monthly observations across said assets (as well as the index and T-Bill rate). The previous post summarised key return and risk statistics associated with each of these 351 assets with the help of the rpanel package (for control) and the PerformanceAnalytics package (for a host of measures). To implement the GRS test, one has to ensure that the number of test assets used in the process is less than the number of return observations.

For the sake of convenience the (updated) dashboards from the previous blog post are given below.


After estimating a conventional time-series regression for each risky asset, a dashboard of residual diagnostic plots can also be helpful.


#Residual diag
#Uses rpanel, PerformanceAnalytics (chart.Histogram), lmtest (gqtest, bptest, dwtest) and tseries (jarque.bera.test)

if (interactive()) {
  draw <- function(panel) {
	#Fitted vs actual scatter for the selected asset
	plot(main=paste('Time Series Regression :',colnames(monthly.ret)[panel$asset],'\n','Alpha= ',round(ts.list$alphas[panel$asset],3),'|| Beta= ',round(ts.list$betas[panel$asset],3)),x=exm.ret,y=ex.ret[,panel$asset],xlab='',ylab='',cex.main=0.85,cex.lab=0.7,cex.axis=0.8,lwd=1,cex=0.75)

	#Residual histogram with kernel density and normal overlays
	chart.Histogram(border='black',ts.list$resid[[panel$asset]][,1,drop=T],methods = c( "add.density", "add.normal"),xlab='',ylab='',cex.main=0.85,cex.lab=0.7,cex.axis=0.8,lwd=1,cex=0.75)

	#Test p-values, extracted by name rather than by list position
	gq.p <- gqtest(ts.list$fit[[panel$asset]])$p.value
	bp <- bptest(ts.list$fit[[panel$asset]])$p.value
	sw <- shapiro.test(ts.list$resid[[panel$asset]])$p.value
	jb <- jarque.bera.test(ts.list$resid[[panel$asset]])$p.value
	dw <- dwtest(ts.list$fit[[panel$asset]])$p.value #assumed: dw is used below but its definition is missing from the extracted source

	an <- c('Alpha','Beta','G-Quandt','')
	an1 <- c('B-Pagan','S-Wilk','J-Bera','D-Watson')

	te <- cbind(an,rbind(round(ts.list$alphas.p[panel$asset],3),round(ts.list$betas.p[panel$asset],3),round(gq.p,3),''))
	te1 <-cbind(an1,rbind(round(bp,3),round(sw,3),round(jb,3),round(dw,3)))
	tab <- cbind(te,te1)
	#[remaining panels (autocorrelation, stationarity, colour coded table) are missing from the extracted source]

	panel #rpanel action functions must return the panel
  }
  panel<- rp.control(asset=1)
  rp.slider(panel,asset,1,(ncol(monthly.ret)), action=draw,resolution=1,showvalue=TRUE)
}

The residual diagnostics dashboard covers the conventional issues of [1] fitted vs actual data, [2] normality, [3] residual autocorrelation, [4] heteroskedasticity, [5] stationarity and [6] a table of p-values that are colour coded to reflect rejection (red) or non-rejection (green) of the null hypothesis associated with the named measure for the selected asset, at the 10% significance level. Asset selection is once again done via the rpanel package.
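The colour coding rule can be illustrated without the dashboard. The sketch below uses base R only and simulated data. It swaps in a Ljung-Box test for the Durbin-Watson measure and omits the heteroskedasticity tests, which need add-on packages, so it is not the table shown in the dashboard. All object names are hypothetical.

```r
set.seed(4)
# Simulated excess returns for the market and one asset (hypothetical data)
exm <- rnorm(216, 0.6, 4.5)
ex  <- 0.1 + 1.2 * exm + rnorm(216, 0, 2)

fit <- lm(ex ~ exm)
res <- residuals(fit)
co  <- summary(fit)$coefficients

# p-values for a handful of diagnostics available in base R
p.vals <- c(
  Alpha  = co[1, 4],                                            # H0: alpha = 0
  Beta   = co[2, 4],                                            # H0: beta = 0
  S.Wilk = shapiro.test(res)$p.value,                           # H0: normal residuals
  L.Box  = Box.test(res, lag = 12, type = 'Ljung-Box')$p.value  # H0: no autocorrelation
)

# Red marks rejection at the 10% level, green marks non-rejection
colour <- ifelse(p.vals < 0.10, 'red', 'green')
print(data.frame(p.value = round(p.vals, 3), colour = colour))
```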

So far we have only concerned ourselves with the first testable implication of the CAPM in the context of a time series regression, namely the estimation and visualisation of residual diagnostics for each of the 351 test assets. The significance of the parameters for each model is assessed by comparing the test statistics to critical values or the p-values to chosen significance levels. For those assets that have an alpha-p-value less (greater) than 0.1, one would (not) reject the null hypothesis that their pricing error was equal to zero at the 10% level of significance.

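The second testable implication of the CAPM in a time series framework relates to the condition of pricing errors (alphas) being jointly equal to zero across all test assets. Gibbons, Ross and Shanken (GRS for short) provide a useful methodology to test this condition under assumptions of residual normality, homoscedasticity and independence. The GRS test statistic I tried to replicate takes the following functional form:

$$\frac{T-N-1}{N}\left[1+\left(\frac{\bar{R}^e_m}{\hat{\sigma}(R^e_m)}\right)^2\right]^{-1}\hat{\alpha}'\hat{\Sigma}^{-1}\hat{\alpha} \;\sim\; F_{N,\,T-N-1}$$

where $T$ is the number of observations, $N$ the number of test assets, $\hat{\alpha}$ the vector of estimated intercepts, $\hat{\Sigma}$ the residual covariance matrix, and $\bar{R}^e_m$ and $\hat{\sigma}(R^e_m)$ the sample mean and standard deviation of the excess market return.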
It appears that this statistic can be rewritten in such a way as to provide an intuitive graphical presentation of CAPM validity. More precisely, GRS show that the test statistic can be expressed in terms of how far inside the ex post frontier the factor return (the excess market return in the CAPM) lies.
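For reference, the rewritten form is usually stated as follows. This is a sketch in assumed notation: $\mu_q/\sigma_q$ is the Sharpe ratio of the ex post tangency portfolio formed from the test assets together with the factor, $\mu_m/\sigma_m$ is the Sharpe ratio of the market proxy, $T$ is the number of observations and $N$ the number of test assets.

```latex
\frac{T-N-1}{N}\;
\frac{\left(\mu_q/\sigma_q\right)^2-\left(\mu_m/\sigma_m\right)^2}
     {1+\left(\mu_m/\sigma_m\right)^2}
\;\sim\; F_{N,\,T-N-1}
```

The numerator is the gap between the two squared Sharpe ratios. It is zero when the market proxy is itself the ex post tangency portfolio and grows as the proxy falls further inside the frontier.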


Disequilibrium in markets implies that prices continue adjusting until the market clears. As prices move, so will asset returns and relative market values, affecting tangency and market portfolio weights respectively. In the ideal CAPM universe, market and tangency portfolios will eventually converge and every investor will hold the tangency portfolio. Hence, for the CAPM to hold, the market portfolio surrogate used in model estimation must not deviate too far, in a statistical sense, from the tangency portfolio. This code snippet calculates the test statistic and p-value, and plots the usual frontiers, assets and portfolios.

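#Joint test (use only 200 assets because the test requires n.obs > n.assets)
library(MASS) #for ginv

t.per <- nrow(monthly.ret)
n.ass <- 200
t.term <- (t.per-n.ass-1)/n.ass

alphas <- ts.list$alphas[1:n.ass]
res.cov <- NULL
for(i in 1:n.ass){
	res.cov <- cbind(res.cov,ts.list$resid[[i]]) #loop body missing in source; assumed to collect residuals column-wise
}
res.cov.m <- cov(res.cov)

term <- ((1+(mean(exm.ret)/apply(exm.ret,2,sd))^2)^(-1))
a.term <- t(alphas)%*%ginv(res.cov.m)%*%(alphas)

t.stat.g <- t.term*term*a.term
grs.pval <- pf(t.stat.g,n.ass,t.per-n.ass-1,lower.tail=FALSE) #assumed: grs.pval is used below but its definition is missing from the source

ret.set <- t(as.matrix(colMeans(cbind(bench.ret,monthly.ret[,1:200]))*100))
cov.set <- var(cbind(bench.ret,monthly.ret[,1:200])*100)

#The list name and the name of its last element were lost in extraction; 'front.in' and 'risk.free' are assumed
front.in <- list()
front.in$mean.ret <- ret.set
front.in$cov.matrix <- cov.set
front.in$risk.free <- mean(rf.vec)

#Frontier.Draw is the author's own helper from an earlier post
Frontier.Draw(front.in,base,rainbow(200),'new',lty=1,paste('Gibbons,Ross & Shanken Interpretation','\n','GRS statistic/pvalue :',round(t.stat.g,3),'/',round(grs.pval,3),'\n','For 200 assets'))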

x <- seq(0,30,by=0.01)
lin <- mean(rf.vec)+((((mean(bench.ret)*100)-mean(rf.vec))/(apply(bench.ret,2,sd)*100))*x)
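The joint test can also be wrapped in a small function and checked on simulated data. This is a sketch under my own assumptions, not the code used for the plot: `grs.test` is a hypothetical name, a single factor is assumed, and the covariance and variance estimates use a divisor of T (the maximum likelihood form that gives an exact F distribution under normality) rather than the T-1 divisor that `cov` and `sd` apply.

```r
# GRS joint test of zero alphas for a single-factor model (sketch)
grs.test <- function(ex.ret, factor.ret) {
  ex.ret <- as.matrix(ex.ret)
  n.obs <- nrow(ex.ret)
  n.ass <- ncol(ex.ret)
  stopifnot(n.obs > n.ass + 1)                 # needed for a valid F distribution
  fit <- lm(ex.ret ~ factor.ret)               # multivariate lm: one column per asset
  alphas <- coef(fit)[1, ]
  sigma <- crossprod(residuals(fit)) / n.obs   # ML residual covariance (divisor T)
  f.var <- mean((factor.ret - mean(factor.ret))^2)
  sharpe.sq <- mean(factor.ret)^2 / f.var      # squared Sharpe ratio of the factor
  quad <- drop(t(alphas) %*% solve(sigma, alphas))
  stat <- (n.obs - n.ass - 1) / n.ass * quad / (1 + sharpe.sq)
  p.val <- pf(stat, n.ass, n.obs - n.ass - 1, lower.tail = FALSE)
  c(statistic = stat, p.value = p.val)
}

# Simulated example: 216 months, 20 assets (hypothetical data)
set.seed(3)
n.obs <- 216
n.ass <- 20
mkt <- rnorm(n.obs, 0.6, 4.5)
beta <- runif(n.ass, 0.5, 1.5)
noise <- matrix(rnorm(n.obs * n.ass, 0, 3), n.obs, n.ass)

ret.null <- outer(mkt, beta) + noise   # zero alphas: the one-factor model holds
ret.alt  <- ret.null + 1               # every asset mispriced by 1% per month

res.null <- grs.test(ret.null, mkt)
res.alt  <- grs.test(ret.alt, mkt)
print(round(rbind(null = res.null, alt = res.alt), 4))
```

Using `solve` instead of a generalised inverse makes the function fail loudly when the residual covariance matrix is singular, which is what happens when the number of assets approaches the number of observations.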


The CAPM will always hold if the market proxy is mean variance efficient. For this condition to hold true, the surrogate for the market portfolio should lie on the capital market line, the efficient frontier when a risk free asset is introduced alongside the collection of risky assets. Since the true market portfolio is not identifiable, the CAPM cannot really be tested. The market proxy used above, monthly returns to the S&P 500 index, does not include assets such as [1] real estate and [2] human capital.