In this series of posts I intend to replicate some of the methodologies applied by Kritzman et al. (2012) in their paper Regime Shifts: Implications for Dynamic Strategies, wherein:

  • a two-state hidden Markov model is fitted to observed (or transformed) measures of financial turbulence, inflation and economic growth to partition history into meaningful regimes.
  • the in-sample performance of a variety of risk premia is probed by comparing scaled differences in respective state means across regime variables.
  • the out-of-sample performance of risk premia is examined by [1] tilting portfolios dynamically according to the likelihood of a particular regime and by [2] comparing the results of asset allocation that is conscious of regime switching versus the static alternative.

The excellent (but sadly idle) Quantivity blog has replicated some parts of the paper with much more rigour than I am capable of. This is perhaps as good a time as any to mention some differences between my post, the original paper and Quantivity’s post (QP).

One point of departure between my post and QP’s relates to the library/package used to estimate the hidden Markov model. While QP used the RHmm package, I can only use the depmixS4 package (RHmm seems to be incompatible with newer versions of R… at least for me). As far as I can tell, the depmixS4 package does not support forecasting, which means that I will probably not be able to take a gander at the out-of-sample issues alluded to above.
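For what it is worth, the basic depmixS4 workflow that the later posts will rely on looks something like the sketch below; the turbulence series is a stand-in generated on the fly, and only the depmix()/fit()/posterior() calls matter here.

#Sketch: fit a two-state Gaussian hidden Markov model with depmixS4 (stand-in data)
library(depmixS4)

set.seed(1)
df <- data.frame(turb=as.numeric(arima.sim(list(ar=0.8), n=200)))   #illustrative series only

hmm.spec <- depmix(turb~1, data=df, nstates=2, family=gaussian())
hmm.fit <- fit(hmm.spec)            #EM estimation
summary(hmm.fit)                    #state-dependent means/sds and the transition matrix
head(posterior(hmm.fit))            #most likely state and state probabilities per observation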

Another difference relates to the risk premia used in the original paper versus my post.

[Table from the paper: the risk premia used by Kritzman et al. (2012)]

Instead of using these data series (shamelessly snipped from the paper), I simply downloaded a revised edhec data set (a version of which comes with the PerformanceAnalytics package) from the EDHEC-Risk homepage: http://www.edhec-risk.com/indexes/pure_style

These are simply returns on various hedge fund strategies, as opposed to deliberately constructed risk premia that are expected to respond to different states across economic/financial regime variables. A snapshot of the strategies follows, along with their most recent summary results:

[Table: snapshot of the EDHEC strategies and their most recent summary statistics]
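A similar snapshot can be produced from the (older) copy of the edhec data set that ships with the PerformanceAnalytics package; this is only a sketch, not the updated csv file used later in the post.

#Sketch: peek at the edhec data set bundled with PerformanceAnalytics
library(PerformanceAnalytics)

data(edhec)                         #monthly returns on 13 EDHEC hedge fund style indices
colnames(edhec)                     #strategy names
tail(edhec, 3)                      #most recent observations in the bundled copy
table.AnnualizedReturns(edhec)      #annualised return, standard deviation and Sharpe ratio per strategy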

 


 

[Data]

In terms of the data used in this project, it is useful to make a distinction at the outset between economic regime variables (which are used to partition history into different states in a meaningful way) and risk premia (simple returns in my case), which are used to assess performance across states for each economic regime variable.

 

–Economic Regime Variables–

[1] Financial Market Turbulence

The paper defines this variable as a condition in which asset prices behave in an unusual fashion given historical patterns of behaviour, and proposes to measure it using the Mahalanobis distance:

d(t) = (y(t) − μ)' Σ⁻¹ (y(t) − μ)

where y(t) is the vector of asset returns at time t, μ is the vector of historical average returns and Σ is the historical covariance matrix of returns.

Financial market turbulence comes in two flavours: equity and foreign exchange.

The Equity Turbulence Index (ETI) was constructed by [1] downloading S&P 500 daily returns, [2] estimating the mean and covariance using equally weighted historical returns over the past 10 years and [3] averaging the daily turbulence scores within each month. The Currency Turbulence Index (CTI) was constructed by [1] downloading daily currency returns (vs. USD) for 8 of the G-10 countries, [2] estimating the means and covariances using equally weighted historical returns over the past 3 years and [3] averaging the daily turbulence scores within each month.
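Before wiring up the rolling estimation below, note that the turbulence score itself is just a call to mahalanobis() followed by a monthly average. A toy sketch on simulated daily returns (illustrative data only, not part of the actual indices):

#Sketch: Mahalanobis turbulence scores on toy daily returns, averaged within each month
library(xts)

set.seed(42)
days <- seq(as.Date('2013-01-01'), as.Date('2013-12-31'), by='day')
rets <- xts(matrix(rnorm(length(days)*2, sd=0.01), ncol=2), order.by=days)

d <- mahalanobis(x=coredata(rets), center=colMeans(rets), cov=cov(rets))
daily.turb <- xts(d, order.by=index(rets))
monthly.turb <- apply.monthly(daily.turb, mean)    #one turbulence reading per month
head(monthly.turb)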

The code for downloading the relevant time series follows:

######################[a] Economic Regime Variables######################################

#[a.1] Equity Turbulence
symbol <- c('^GSPC')
beg <- as.Date('1970/01/01')
end <- as.Date('2014/01/01')

getSymbols(symbol,from=beg,to=end)
SPX.ret <- ROC(eval(parse(text=paste(substr(symbol,start=which(strsplit(symbol,split='^')[[1]]=='^')+1,stop=nchar(symbol)),'$GSPC.Close',sep=''))),n=1)
attr(SPX.ret,'type')='Equity'

#[a.2] Currency Turbulence
G9curr <- c('DEXUSAL','DEXUSUK','DEXCAUS','DEXNOUS','DEXJPUS','DEXUSNZ','DEXSDUS','DEXSZUS')
col.xts <- c('AUS','GBP','CAD','NOR','JAP','NZD','SWD','CHF')
inverted.idx <- grep('DEXUS',G9curr,value=F)
normal.idx <- index(G9curr)[-inverted.idx]

getSymbols(G9curr,src='FRED')
fx.xts <- do.call(merge,lapply(G9curr,function(x) eval(parse(text=x))))
names(fx.xts) <- col.xts

for(i in inverted.idx) fx.xts[,i] <- 1/eval(parse(text=G9curr[i]))
fx.xts <- ROC(fx.xts)
attr(fx.xts,'type')='Currency'


[2] Inflation

Inflation was measured using the Consumer Price Index for All Urban Consumers: All Items, available from the FRED website and directly accessible via R:

#[a.3] Inflation Turbulence
getSymbols('CPIAUCNS',src='FRED')
inf.xts <- ROC(CPIAUCNS['1946-01-01::',1],n=1)
attr(inf.xts,'type')='Inflation'

[3] Economic Growth

Economic growth was measured using quarterly percentage growth in real Gross Domestic Product:

#[a.4] Economic growth Turbulence
getSymbols('GDPC1',src='FRED')
ecg.xts <- ROC(GDPC1['1947-01-01::',1],n=1)
attr(ecg.xts,'type')='Growth'

Now that we have downloaded the necessary time series for our regime variables, let’s actually generate the Mahalanobis distances for the ETI and CTI. As far as I can tell, the inflation and economic growth variables are already in the desired form (simple returns), so there is no need for further transformations. After storing the regime variables in a list object, I apply a custom function to each element of that list:

#[1] Store return data in list object

	regime.variables<-list(Equity=SPX.ret,Currency=fx.xts,Inflation=inf.xts,Growth=ecg.xts)

#[2] Generate Turbulence indices where required (we focus on the final complete turbulence index where available)

	Turbulence.list <- lapply(regime.variables,function(x) GenMahalanobis(x)$complete.turbulence)

The GenMahalanobis function detects which regime variable is being passed to it, generates daily turbulence indices where appropriate and averages the result within each month:

#########################################################################################
# Rolling Mahalanobis function
#########################################################################################

GenMahalanobis <- function(inp.ret){
#Set up variables	

	#inp.ret <- inf.xts
	ret.xts <- na.omit(inp.ret)
	n.var <- dim(ret.xts)[2]
	map <- data.frame(Type=c('Equity','Currency'),Window=c(2500,750))
	estwind <- map[map$Type==attributes(ret.xts)$type,2]
	mahalanobis.list <- NULL

#Check which type of variable we have
	cond.eval <- eval(parse(text='attributes(ret.xts)$type'))
	if(cond.eval=='Equity'|| cond.eval=='Currency'){

		#Looping across variables and windows & storing results in list elements
			for(i in 1:n.var){
				window.beg <- 1
				window.idx <- 1
				window.end <- estwind

				mahalanobis.list[[i]] <- list()
				names(mahalanobis.list)[i] <- colnames(ret.xts)[i]

				temp <- NULL
				mahalanobis.complete <- NULL

				while(window.end<=dim(ret.xts)[1]){
		  		mahalanobis.list[[i]][[window.idx]] <- list()
					names(mahalanobis.list[[i]])[[window.idx]] <- sprintf('Window%i',window.idx)

					window.data <- eval(parse(text='window.beg:window.end'))
					temp <- mahalanobis(x=ret.xts[window.data,i],center=mean(ret.xts[window.data,i]),cov=cov(ret.xts[window.data,i]))

					mahalanobis.list[[i]][[window.idx]] <- xts(temp,order.by=as.Date(names(temp)))

    			window.beg = window.beg+estwind
					window.end = window.end+estwind
    			window.idx = window.idx+1
				}
			mahalanobis.list[[i]]$xts <- do.call(rbind,mahalanobis.list[[i]])
		}

	#Collect in single xts structure
		mahalanobis.list$distances <- NULL
		for(i in 1:n.var){
			mahalanobis.list$distances <- cbind(mahalanobis.list$distances,mahalanobis.list[[i]]$xts)
			colnames(mahalanobis.list$distances)[i] <-  colnames(ret.xts)[i]
		}

	#Average across variables & calculate mean within each month
		mahalanobis.list$daily.turbulence <- NULL
		mahalanobis.list$monthly.turbulence <- NULL
		mahalanobis.list$daily.turbulence <- xts(rowMeans(mahalanobis.list$distances),order.by=index(mahalanobis.list$distances))
		mahalanobis.list$complete.turbulence <- apply.monthly(x=mahalanobis.list$daily.turbulence,FUN=mean)
  	names(mahalanobis.list$daily.turbulence) <- names (mahalanobis.list$complete.turbulence) <- attributes(ret.xts)$type

}else{

	mahalanobis.list$complete.turbulence <- xts(mahalanobis(x=ret.xts,center=mean(ret.xts),cov=cov(ret.xts)),order.by=index(ret.xts))
  names(mahalanobis.list$complete.turbulence) <- attributes(ret.xts)$type

}

#return list object
return(mahalanobis.list)
}

#########################################################################################


— Risk Premia —

For this category of variables, I simply downloaded the updated csv file for the edhec data set, imported it into R and stored a ggplot of each strategy’s returns as an element of a list object:

#[7.1] Load and date the updated edhec dataset & store fund returns in list

orig.csv <- read.csv('edhecfinal.csv',header=T,stringsAsFactors=F)
edhec.xts <- xts(order.by=seq(as.Date('1997-01-01'),as.Date('2014-07-01'),by='1 month'),orig.csv[,-1])

n.funds <- ncol(edhec.xts)
fund.names <- names(edhec.xts)
temp.df <- data.frame(TimeIdx=index(edhec.xts),coredata(edhec.xts))

edhec.ret <- list()
	for(i in 1:n.funds){
		edhec.ret[[i]] <- list()
		names(edhec.ret)[i] <- fund.names[i]
		edhec.ret[[i]] <-EdhecRet(edhec.xts[,i])
	}

The EdhecRet function simply returns a ggplot of time series returns for the chosen strategy. The reason for storing these plots as elements of a list object is to simplify access to fund/regime results in the final dashboard of results. For example, to access the monthly return plot for the Short.Selling strategy, edhec.ret$Short.Selling yields:

[Plot: monthly returns of the Short.Selling strategy]

 

The code for that function is simply:

#########################################################################################
# Simple Fund Returns
#########################################################################################

EdhecRet <- function(xts.ret){
	loc <- environment()
	temp.df <- NULL
	temp.df <- data.frame(TimeIdx=as.Date(index(xts.ret)),coredata(xts.ret),stringsAsFactors=F)
	pp <- ggplot(data=temp.df,aes(x=TimeIdx,y=eval(parse(text=colnames(temp.df)))),environment=loc)+
						      theme_tufte(base_size=11)+
			            geom_line(size=0.5,color='skyblue4') +
			            labs(y='Values',title='Monthly Returns',x='')+
			        		theme(legend.position = "none")
	return(pp)
}
#########################################################################################

Now that we have downloaded or generated the necessary data series, the next posts shall deal with [1] fitting a two-state hidden Markov model to the economic regime variables and [2] assessing the in-sample performance of different strategies across the hidden states for each regime variable. Ultimately I wish to present a dashboard of results for a chosen fund/regime combination.


In terms of the main issue with Cochrane’s (2008) paper, we are asking: which (and how much) of dividend growth or returns is forecastable? Given the relationship between [a] returns, [b] dividend growth and [c] dividend yields, a null hypothesis in which returns are not forecastable must also account for dividend growth that is forecastable. Hence the need to evaluate the joint distribution of return and dividend growth coefficients in simulated datasets.

So far there are three tests with respect to return predictability (a sketch of how the associated percentage values are computed follows the list):

  1. the one-period return regression coefficient b(r)
  2. the one-period dividend growth regression coefficient b(d)
  3. the long-horizon return regression coefficient b(long.r) [this provides the same results as the long-horizon dividend growth regression but is more convenient to look at]
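As a rough illustration of how the percentage values quoted below are obtained, the share of simulated coefficients exceeding their sample counterparts can be read straight off the simulation output. This is only a sketch: it assumes the mcs.sample, betas and lr.beta objects created in the earlier posts of this series (reproduced further down this page) are in memory.

#Sketch: empirical p-values as the share of simulated estimates exceeding the sample estimates
sims <- mcs.sample[[2]]                        #matrix of simulated coefficients and t-stats

mean(sims[,'b[r]'] > betas[4])*100             #single-period return coefficient (roughly 19%)
mean(sims[,'b[d]'] > betas[5])*100             #single-period dividend growth coefficient (roughly 1-2%)
mean(sims[,'b[long.r]'] > lr.beta)*100         #long-horizon return coefficient (roughly 1-2%)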

In terms of the results for the single period case:

  • The return forecasting coefficient, taken alone, is not significant. Evidence against the null is weak, since only about 19% of simulated return forecasts exceed the sample counterpart and only around 11% of simulated t-statistics are more extreme than that of the sample.
  • The absence of dividend growth forecastability offers much more evidence against the null than the presence of return forecastability, lowering probability values to the single-digit range.

In terms of the long horizon case:

  • The long horizon return and dividend growth forecasts generate coefficients that are larger than their sample counterparts only 1-2% of the time (1.43% in my case), complementing the single period dividend growth forecast results.

 


 

[6] Once we have summarised both single-period and long horizon results, we can look at the source of power regarding single-period dividend growth and long horizon tests, as well as examine the effect of varying dividend yield autocorrelation on short- and long-run percentage values.

To investigate the source of the greater power afforded by [a] the single period dividend growth estimates and [b] the long horizon estimates, Cochrane (2008) plots the joint distribution of the return regression coefficient b(r) and the dividend yield regression coefficient φ. I have also included the small sample distribution of the long horizon return coefficient estimates for the cases where φ=0.941 and φ=0.99, the sample and fixed cases respectively. A simple call to the JointPlot() function with the fourth parameter set to ‘long horizon’ suffices:

#Long horizon results
windows()
par(mfrow=c(2,2))
JointPlot(mcs.sample,5000,initial.reg,'long horizon')
JointPlot(mcs.fixed,5000,initial.reg,'long horizon')


Resulting in the following set of plots:

[Plots: small-sample distributions of the long-horizon return coefficient and joint distributions of b(r) and φ, for φ = 0.941 (sample) and φ = 0.99 (fixed)]

Focusing on the joint distribution for which φ = 0.941:

  1. The triangle marks the null hypothesis and the circle marks the estimated coefficients.
  2. The golden line marks the line b(r) = 1 − ρφ + b(d)_smpl. Points above and to the right are samples in which b(d) exceeds its sample value.
  3. The black line is defined by b(r)/(1 − ρφ) = b(r)_smpl/(1 − ρφ_smpl). Points above and to the right are draws in which the long-run coefficient b(long.r) exceeds its sample value.
  4. Percentages are probabilities of belonging to the associated quadrant.
  5. A high return coefficient by itself is not uncommon (occurring just under 20% of the time).
  6. b(r) and φ estimates are negatively correlated across samples.
  7. We never see b(r) larger than in the data along with φ larger than in the data; the top right quadrant is empty.

These plots basically capture why single period dividend growth tests and long-horizon regression results give so many fewer rejections under the null than the single-period return test. While the single period return test yields probability values in the 10-20% range, the single-period dividend growth and long-run tests fall in the 1-2% range. It is the strong negative correlation of the b(r) and φ estimates across simulated samples that underlies the power of the long-horizon return and dividend-growth tests.
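That negative correlation can also be read off directly from the simulated estimates; a quick sketch, assuming the mcs.sample object generated in the earlier simulation step is loaded:

#Sketch: correlation between simulated b[r] and phi estimates across trials
sims <- mcs.sample[[2]]
cor(sims[,'b[r]'], sims[,'phi'])               #strongly negative across simulated samples
plot(sims[1:5000,'b[r]'], sims[1:5000,'phi'], pch=21, col='blue', xlab='b[r]', ylab='phi')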

The final part of this post shall address the effects of varying φ (the dividend yield autocorrelation) on the percentage of simulated estimates that are more extreme than their sample counterparts. A quick glance at the joint distribution plots above clearly suggests that as φ rises from 0.941 to 0.99, the cloud of points rises and more of them cross the sample estimates of [a] the single period dividend growth variable and [b] the long horizon return variable.

In the following code we specify a vector of phis (from 0.9 to 1.01), calculate the half-life implied by each member of that vector, create a list object to hold our Monte Carlo simulation results for each value of φ, and save the resulting list in another .Rdata file. I only run 5000 trials to save time. As before, the MonteCarloSimulation() function returns a list object with two elements, the first being a summary of the inputs and the second being a collection of simulation results. The mcs.list variable below only stores the second element.

 

#######Varying phi###################################################
# Vector of phis
phi.vec <- c(0.9,0.941,0.96,0.98,0.99,1.00,1.01)
half.life <- log(0.5)/log(phi.vec)
mcs.list<-list()
numT.temp <- 5000

#Save data [run once]
for(i in 1:length(phi.vec)){
	mcs.list[[i]] <- MonteCarloSimulation(num.trials=numT.temp,num.obs=100,phi=phi.vec[i],rho=0.9638,error.cov=res.cov)[[2]]
}
save(mcs.list,file='Dog2.Rdata')
#####################################################################


Now that we have saved a list object with 7 elements, each corresponding to the Monte Carlo results for a particular value of φ, let us load the data and apply to each element of that list a custom function that [a] extracts coefficients/estimates whose values exceed their sample counterparts and [b] expresses these as a fraction of the number of simulated samples (5000). The TableMaker() function is once again used to tabulate the results.

 

################Table of phis########################################
#Load and Plot
load('Dog2.Rdata')

p.br <-unlist(lapply(mcs.list,function(g)length(which(g[,'b[r]']>betas[4]))/numT.temp*100))
p.bd <- unlist(lapply(mcs.list,function(g)length(which(g[,'b[d]']>betas[5]))/numT.temp*100))
p.lbr <- unlist(lapply(mcs.list,function(g)length(which(g[,'b[long.r]']>lr.beta))/numT.temp*100))
p.lbd <- unlist(lapply(mcs.list,function(g)length(which(g[,'b[long.d]']>(lr.beta-1)))/numT.temp*100))

tbl <- matrix(c(phi.vec,p.br,p.bd,p.lbr,p.lbd,half.life),ncol=6)
colnames(tbl) <- c('phi','b[r]','b[d]','long b[r]','long b[d]','half life')

col.names <- colnames(tbl)
reg.table = tbl

est.tab <- round(reg.table,5)
est.tab <- apply(est.tab, 2, rev)
est.tab <- cbind(c(' '),est.tab)
TableMaker(row.h=1,est.tab,c('Effects of phi',col.names),strip=F,strip.col=c('green','blue'),col.cut=0.05,alpha=0.7,border.col='lightgrey',text.col='black',header.bcol='gold',header.tcol='black',title=c('The effects of dividend yield autocorrelation\n5000 Montecarlo trials per phi'))
#####################################################################

[Table: effects of dividend yield autocorrelation — percentage of simulated coefficients exceeding sample values and implied half-lives for each value of φ]

 

As φ rises:

  • The probability of the single-period return coefficient exceeding its sample value is little changed.
  • The probability of the single-period dividend growth coefficient exceeding its sample value increases quite strongly.
  • The probability values of long run coefficients are identical with respect to each other and similar to the single-period dividend growth case. The latter result corroborates the overlap between the two areas carved out by the coefficients in the joint distribution plot from before.
  • At φ = 1.01, nothing dramatic happens to the coefficients, but the half-life, which should be infinite (as per the paper), takes on a strange number in my case. This is probably just the half-life formula: log(0.5)/log(φ) is only meaningful for φ < 1; at φ = 1 it blows up, and for φ > 1 it mechanically returns a negative number even though the economic interpretation is an infinite (undefined) half-life.

The most important point here is that in all cases, that is to say for all reasonable values of φ, the single-period dividend-growth and long-run regression tests have a great deal more power than the single-period return regression in rejecting the notion that returns are unpredictable.

Continuing from the previous post, we now turn to the distribution of coefficients and t-statistics associated with the return and dividend growth regressions across simulated datasets. Following the outline at the beginning of this set of posts:

[4] Once we have collected regression results across 20000 simulated datasets, we can look at the probability of observing b(r) and b(d) pairs, as well as t-statistic pairs, that are more extreme than the original sample estimates.

Each Monte Carlo trial involves running the single period forecasting regressions of the VAR. I use the custom JointPlot() function to plot the joint distributions of return and dividend growth coefficients as well as their respective t-statistics. Implementation is quite simple:

#Single Period results
windows()
par(mfrow=c(2,2))
JointPlot(mcs.sample,5000,initial.reg,'single period')
JointPlot(mcs.fixed,5000,initial.reg,'single period')

While mcs.sample captures simulated results using original sample estimates (1927-2004), the mcs.fixed case uses phi=0.99 as per the paper. This yields the following set of plots, with the first row showing the joint distributions of return/dividend growth coefficients and t-stats using original sample estimates in the simulation, and the second row capturing the same information for the fixed case of phi=0.99:

[Plots: joint distributions of simulated return and dividend growth coefficients and t-statistics, for the sample and fixed φ cases]

Although we have simulated 20000 datasets, I only plot the results for 5000 simulations (for clarity and to keep the optician at bay). The black lines which meet at the red circle represent the original sample estimates. The blue circles represent simulated results. The percentages in each quadrant capture the portion of simulated results more extreme than the sample estimates. The green triangle marks the location of the null hypothesis (e.g. b(r)=0 and b(d)=−0.1 in the case of the original sample).

The original null hypothesis is that returns are unpredictable and that dividend growth is predictable, or more compactly that b(r)=0 and b(d)=−0.1 as per the identity. The null hypothesis is meant to help us answer the question: which (and how much) of return and dividend growth is forecastable? As far as I understand it (and correct me if I am wrong), evidence against the null hypothesis should consist of the following (accounting for statistical significance):

  1. return coefficient from simulated data > return coefficient from sample data.
  2. dividend growth coefficient from simulated data > dividend growth coefficient from sample data.

In light of the above and the joint distribution plots for the case of our original sample (first row of plots):

  1. If we ONLY look at the marginal distribution of the simulated return coefficients b(r), we find weak evidence against the null hypothesis, since the Monte Carlo draws produce a coefficient larger than the sample estimate around 19% of the time (top right + bottom right quadrants). Coefficients larger than the original sample estimate are to be seen even if the true underlying coefficient is zero.
  2. If we look at the joint distribution, most of the events with more extreme return coefficients are paired with dividend-growth forecast coefficients that are more negative than seen in the data (bottom right quadrant). In these events, both dividend growth AND returns are forecastable. Here prices are moving partially on forecasts of future dividend growth, and in the right direction.
  3. Dividend growth fails to be forecastable in only 1.27% of the samples generated under the null (top right quadrant). The absence of dividend-growth forecastability offers stronger evidence against the null.

If I understand this correctly, the dividend-growth variable is the dog that did not bark, given that it is the absence of its forecastability that provides stronger evidence against the null hypothesis than the presence of return forecastability. The absence of dividend growth forecastability can be thought of as evidence that return forecastability is economically significant. In other words, if we wanted to prove that returns are forecastable in the short run, the lack of dividend growth forecastability is more compelling than the presence of return forecastability for achieving that purpose.

 


[5] Once we have looked at the single period results, we can proceed with the long-horizon regression coefficients (which are mercifully not new estimates, but rather calculated from the single period estimates of the original sample, 1927-2004).

Long horizon estimates and tests provide a further avenue for delivering economically significant return forecasts. Conveniently, long horizon regression coefficients are not new estimates but are derived from the existing single-period results:

b(long.r) = b(r) / (1 − ρφ)

Once the long-run return coefficient has been calculated, the corresponding dividend growth coefficient can be derived from:

b(long.d) = b(long.r) − 1

These results reinforce the notion of economically significant return forecastability. The dividend yield volatility is almost exactly accounted for by return forecasts (b(long.r) ≈ 1), with essentially no contribution from dividend-growth forecasts (b(long.d) ≈ 0). In our case:

#Identical div/lrun results
rho <- mcs.sample[[1]]['rho']
phi <- mcs.sample[[1]]['phi']
lr.beta <- betas[4]/(1-rho*phi)
ld.beta <- lr.beta-1
betas.lr <- c(lr.beta,ld.beta)

numT <- mcs.sample[[1]]['num.trials']
pr <- length(which(mcs.sample[[2]][,'b[long.r]']>lr.beta))/numT*100
pd <- length(which(mcs.sample[[2]][,'b[long.d]']>(ld.beta)))/numT*100
pvals.lr <- c(pr,pd)

#Table
col.names <- c('Long-run coeff','%p values')
row.names <- c('Return','Dividend growth')
reg.table <- cbind(betas.lr,pvals.lr)
est.tab <- apply(reg.table, 2, rev)
est.tab <- cbind(rev(row.names),est.tab)
windows()
TableMaker(row.h=1,est.tab,c('Results',col.names),strip=F,strip.col=c('green','blue'),col.cut=0.05,alpha=0.7,border.col='lightgrey',text.col='black',header.bcol='lightblue',header.tcol='black',title=c('Long horizon estimates'))

This results in the following table:

[Table: long-horizon return and dividend growth coefficients with their percentage p-values]

 

Although the t-statistics and standard errors are missing from this table (because I do not know how they are determined), they would, according to the paper, be exactly identical for the two regressions. Hence one advantage of using long-horizon regression coefficients is that they obviate the need to choose between return and dividend growth tests (as was the case in the single period regressions). The last column shows our conventional probability values, essentially telling us how many forecasts are greater than the sample value under the unforecastable null hypothesis. There is a 1.43% chance of seeing a long-run forecast more extreme than seen in the data. Ultimately, the long-run return AND dividend growth regressions give essentially the same strong rejections as the short-run dividend-growth regression. As a consequence, we can tabulate the small-sample distribution of the test in a conventional histogram rather than a two-dimensional plot.

#histogram
windows()
par(mfrow=c(1,2))
lr<-hist(plot=FALSE,mcs.sample[[2]][,'b[long.r]'],breaks=100)
	  plot(xlim=c(-3,3),lr,freq=FALSE,col=ifelse(lr$breaks<=lr.beta,'lightgrey','red'),ylab='',xlab='long_b[r]',cex.main=0.85,cex.lab=0.75,cex.axis=0.75,cex=0.65,main=paste('phi=',phi))
	  abline(v=lr.beta,col='red')
	  text(x=2,y=0.85,paste('Data:',round(lr.beta,3),'\nProb :',pr),col='black',cex=0.6)

ld<-hist(plot=FALSE,mcs.sample[[2]][,'b[long.d]'],breaks=100)
	  plot(xlim=c(-3,3),ld,freq=FALSE,col=ifelse(ld$breaks<=(lr.beta-1),'lightgrey','red'),ylab='',xlab='long_b[d]',cex.main=0.85,cex.lab=0.75,cex.axis=0.75,cex=0.65,main=paste('phi=',phi))
	  abline(v=(lr.beta-1),col='red')
	  text(x=1,y=0.85,paste('Data:',round((lr.beta-1),3),'\n Prob :',pd),col='black',cex=0.6)

 

I think this reinforces the point of the long-run estimates being similar:

[Histograms: small-sample distributions of the long-run return and dividend growth coefficients]

The previous post outlined the dataset and custom functions used throughout the code, as well as some of the intuition behind Cochrane’s (2008) paper. Now we shall continue with the actual implementation of the simulation procedures and steps previously listed.

[1] Obtain original sample estimates (1927-2004) and the error covariance matrix by running the three VAR regressions. This step provides the inputs needed for the subsequent simulation.

After subsetting our original dataset to contain only data for the 1927-2004 date range across transformed variables, I ran the following single period forecasting regressions:

Idx.ret(t+1) = a + b·Div.yield(t) + ε(t+1)
Exc.ret(t+1) = a + b·Div.yield(t) + ε(t+1)
Div.growth(t+1) = a + b·Div.yield(t) + ε(t+1)

r(t+1) = a_r + b(r)·(d(t) − p(t)) + ε_r(t+1)
Δd(t+1) = a_d + b(d)·(d(t) − p(t)) + ε_d(t+1)
d(t+1) − p(t+1) = a_dp + φ·(d(t) − p(t)) + ε_dp(t+1)

Lowercase variables denote log-transformed variables as usual. The last three equations are the familiar VAR regressions mentioned in previous posts and provide the inputs necessary for the simulation. The first three regressions are just there for completeness’ sake (they are run by Cochrane as well). This set of equations basically replicates some of the information contained in Tables 1 & 2 of the paper, albeit without inflation adjustments. In the following code, I:

  • stored each of the above regressions as an element of a list object.
  • called the custom RegExtractor() function to extract regression summaries from this list object using the sapply() function.
  • tabulated the output using the TableMaker() function.

 

######################################################################################
# Initial single period regression
######################################################################################

#Single Period regressions
initial.reg <- list(
	ret_divyield=dyn$lm(Idx.ret~lag(Div.yield,-1),data=orig.extract),
	exret_divyield=dyn$lm(Exc.ret~lag(Div.yield,-1),data=orig.extract),
	dgrowth_divyield=dyn$lm(Div.growth~lag(Div.yield,-1),data=orig.extract),
	log_ret_divyield=dyn$lm(log.ret~lag(log.div.yield,-1),data=orig.extract),
	log_divgrowth_divyield=dyn$lm(log.div.growth~lag(log.div.yield,-1),data=orig.extract),
	log_divyield_divyield=dyn$lm(log.div.yield~lag(log.div.yield,-1),data=orig.extract)
)
reg.names <- names(initial.reg)

#Extract regression values
betas <- matrix(RegExtractor(initial.reg,'est')[,2],ncol=1)
tvals <- matrix(RegExtractor(initial.reg,'tval')[,2],ncol=1)
pvals <- matrix(RegExtractor(initial.reg,'pval')[,2],ncol=1)
rsq <- matrix(RegExtractor(initial.reg,'rsq'),ncol=1)*100
res <- matrix(RegExtractor(initial.reg,'res'),ncol=6,dimnames=list(NULL,reg.names))
res.cov <- cbind(res[,'log_divgrowth_divyield'],res[,'log_divyield_divyield'])
std.vals <- sapply(initial.reg,function(v) sd(fitted(v)))*100

#Table
col.names <- c('Beta','t-stats','p-vals','R2','Std(Fitted)')
reg.table = cbind(betas,tvals,pvals,rsq,std.vals)

est.tab <- round(reg.table,5)
est.tab <- apply(est.tab, 2, rev)
est.tab <- cbind(rev(reg.names),est.tab)
TableMaker(row.h=1,est.tab,c('Single Period',col.names),strip=F,strip.col=c('green','blue'),col.cut=0.05,alpha=0.7,border.col='lightgrey',text.col='black',header.bcol='gold',header.tcol='black',title=c('Regression on lagged variables\nAnnual data for 1926-2004'))


This results in the following:

[Table: single-period regression estimates, t-statistics, p-values, R² and Std(Fitted)]

The inputs used in the null hypothesis are:

  • b(r) = 0 and φ = 0.94125. The value for φ is simply the coefficient estimate from the log_divyield_divyield regression (last row of the table).
  • ρ = 0.9638 (from the Cochrane paper).
  • the dividend growth forecast coefficient follows from the identity as b(d) = ρφ − 1 + b(r) ≈ −0.1.
  • sample estimates of the error covariance matrix from the dividend growth and dividend yield regressions.

Such that the null takes the form of:

r(t+1) = 0·(d(t) − p(t)) + ε_r(t+1)
Δd(t+1) = (ρφ − 1)·(d(t) − p(t)) + ε_d(t+1) ≈ −0.1·(d(t) − p(t)) + ε_d(t+1)
d(t+1) − p(t+1) = φ·(d(t) − p(t)) + ε_dp(t+1), with φ = 0.94125 and ρ = 0.9638

 


[2 & 3] Each of the 20000 Monte Carlo trials (50000 in the paper) generates a dataset. Once we have 20000 simulated datasets, we run each of the VAR regressions on each simulated dataset and collect the regression estimates.

From the paper: “I simulate 20,000 artificial data sets from each null. For φ < 1, I draw the first observation d0 − p0 from the unconditional density d0 − p0 ∼ N(0, σ²(ε_dp)/(1 − φ²)). For φ ≥ 1, I start at d0 − p0 = 0. I then draw ε_d(t) and ε_dp(t) as random normals and simulate the system forward.”

  • Inputs from the previous step are fed into the MonteCarloSimulation() function, which calls the SimulateVAR() function to generate and return the simulated data.
  • With each simulated dataset, the MonteCarloSimulation() function continues with our conventional VAR regressions,
  • such that for each trial of the simulation we calculate and store the following: b(r), b(d), φ, the t-stats for each variable, b(long.r) and b(long.d).
  • Ultimately, the MonteCarloSimulation() function returns a list object with 2 elements (see the short snippet after this list).
  • The first element is a summary of the inputs used/provided by the user.
  • The second element contains the simulated results, with rows representing trials and columns representing different statistics (e.g. coefficients, t-stats).
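To make that structure concrete, the returned object can be poked at along the following lines once the simulation below has been run (a sketch):

#Sketch: inspect the two-element list returned by MonteCarloSimulation()
mcs.sample[[1]][c('num.trials','num.obs','phi','rho')]   #scalar inputs (the error covariance inputs are appended after these)
dim(mcs.sample[[2]])                                     #trials x 8 matrix of simulated statistics
colnames(mcs.sample[[2]])                                #b[r], b[d], phi, their t-stats, b[long.r], b[long.d]
head(mcs.sample[[2]], 3)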

 

In terms of the code, I have made it such that I only run the simulation once for the sample case and once for the fixed case of phi=0.99 (as Cochrane does this too) and save those results, along with the sample regressions and their stats (from step 1), in an .RData file.

 

# Run Montecarlo simulation
mcs.sample <- MonteCarloSimulation(num.trials=20000,num.obs=100,phi=betas[6],rho=0.9638,error.cov=res.cov)
mcs.fixed <- MonteCarloSimulation(num.trials=20000,num.obs=100,phi=0.99,rho=0.9638,error.cov=res.cov)

#Save all the sample estimates,and simulation results in a .Rdata file
save(initial.reg,betas,tvals,pvals,rsq,res.cov,mcs.sample,mcs.fixed,file='DogBark.Rdata')


By saving the simulation results we are also saving time. Loading the data back into R is as simple as:

 

#Load data
load('DogBark.Rdata')


So far we have run all the regressions and simulations intended and saved the resulting sample and simulation data in a convenient .RData file for simple access. The next post will examine the marginal and joint distributions of simulated parameter estimates and their respective test statistics, as well as long horizon issues and the effects of varying φ on the estimates.

In this post I will attempt to replicate some basic results from Cochrane’s (2008) paper, ‘The Dog That Did Not Bark: A Defense of Return Predictability’. As usual with the posts found on this blog, I do not claim to understand many of the nuances of the paper or the mathematics/statistical procedures behind it (in fact most of it is completely beyond me); it is nevertheless interesting to see how much one can tinker with and replicate using online resources for free.

The present value identity sketched in previous posts establishes a relationship between [1] dividend yield, [2] dividend growth and [3] return variables. If the state of the world were such that observed dividend yields were constant over time, then neither return nor dividend growth variables would be predictable. Since the state of the world is such that observed dividend yields vary over time, it must be the case that at least one of dividend growth or returns is predictable. Simple regressions that were run in previous posts establish several main findings (which I shall repeat lest I forget them again):

  1. The dividend price ratio predicts returns but NOT dividend growth.
  2. Long-horizon return regressions provide stronger evidence of return predictability than their single period counterparts.
  3. Returns, which should not be forecastable, are; and dividend growth, which should be forecastable, isn’t.
  4. On average, price-dividend ratios are moving in response to expected return news rather than cash flow news.
  5. Time-varying discount rates explain virtually all the variance of the dividend yield, while dividend growth forecasts explain essentially none of it.

As has been repeated quite often in these posts on time series predictability: if returns are not predictable, dividend growth must be predictable in order to generate the observed variation in dividend yields. The most basic (but nonetheless important) result of Cochrane’s paper that we are going to replicate is the finding that [1] the absence of dividend growth predictability gives stronger evidence than does the presence of return predictability and that [2] long-horizon return forecasts complement the single period results. There will be some discrepancies between Cochrane’s results and ours, which are probably due to [1] coding errors on my part, [2] not using real (inflation adjusted) variables, since they are a hassle to find online (…at least for me), and [3] the use of 20000 Monte Carlo trials instead of the 50000 used in the paper. Also, I wrote the code almost a year ago and have only come around to blogging about it now, hence many of the issues that may have cropped up at that time are lost on me at the moment. This part of the post simply outlines some of the intuitions and procedures behind Cochrane’s paper, as well as the dataset and custom functions used in the code. The subsequent post shall deal with the results obtained and perhaps compare them somewhat to those of the paper.

[Data]

The data used in the code comes from the Chicago Booth advanced investments seminar and contains [1] return, [2] dividend-yield, [3] dividend-growth and [4] T-bill rate variables. The data file I have goes from 1926 to 2011; the one from the linked site goes to 2012. We only need data from 1927 to 2004, so it doesn’t really matter. After converting the txt file to a csv file in Excel, let’s load it into R, transform some of the variables to the ones we need and subset the data to the date range used in the paper.

#####################################################################################
# Load and extract annual data
#####################################################################################

#Import csv file
orig.csv <- read.csv('ps1_data.csv',header=T)
orig.ts <- ts(data=orig.csv,start=c(1926),end=c(2011),frequency=1)
orig.extract <- window(orig.ts,start=c(1926),end=c(2004),frequency=1)
orig.extract <- cbind(orig.extract,orig.extract[,1]-orig.extract[,4],log(1+orig.extract[,1]),log(orig.extract[,2]),log(1+orig.extract[,3]))
colnames(orig.extract) <- c('Idx.ret','Div.yield','Div.growth','Tbill.ret','Exc.ret','log.ret','log.div.yield','log.div.growth')


[Functions]

Functions include SimulateVAR(), MonteCarloSimulation(), JointPlot(), RegExtractor() and TableMaker(), of which the last two are general helper functions to extract results from regression objects and to display tabular information from matrices, respectively. I don’t know why the code, when copied into WordPress, becomes oddly misaligned at times.

#####################################################################################
#Extract Regression Values
#####################################################################################
RegExtractor <- function(x,type)
{
 if(type=='est'){
  output <- t(sapply(x,function(v)coef(summary(v))[,1]))
 }
 else if (type=='stderr'){
  output <- t(sapply(x,function(v)coef(summary(v))[,2]))
 }
 else if(type=='tval'){
  output <- t(sapply(x,function(v)coef(summary(v))[,3]))
 }
 else if(type=='pval'){
  output <- t(sapply(x,function(v)coef(summary(v))[,4]))
 }
 else if(type=='rsq'){
  output <- t(sapply(x,function(v)(summary(v))$r.squared))
 }
 else if(type=='res'){
  output <- sapply(x,function(v)(resid(v)))
 }
return(output)
}
#####################################################################################


#####################################################################################
# Simulate the VAR
#####################################################################################

SimulateVAR <- function(num.obs,phi,rho,error.cov){
 	dp.sd = round(sqrt(var(error.cov))[2,2],3)
	dg.sd = round(sqrt(var(error.cov))[1,1],2)
	dp.shock = rnorm(num.obs+1,mean=0,sd=dp.sd)
	dg.shock = rnorm(num.obs,mean=0,sd=dg.sd)
	sim.data = matrix(data=0,nrow=num.obs,ncol=3,dimnames=list(NULL,c('sim.divyield','sim.dgrowth','sim.returns')))

	#draw the first dividend yield from its unconditional distribution; rnorm expects a standard deviation, i.e. sigma/sqrt(1-phi^2), not the variance
	ifelse(phi<1,dp.beg<-rnorm(1,mean=0,sd=dp.sd/sqrt(1-phi^2)),dp.beg<-0)
  
  sim.data[1,'sim.divyield'] <- phi*dp.beg+dp.shock[2]
  sim.data[1,'sim.dgrowth'] <- ((rho*phi)-1)*dp.beg+dg.shock[1]
	
	for(i in 2:num.obs){
  	sim.data[i,'sim.divyield'] <- phi*sim.data[(i-1),'sim.divyield']+dp.shock[i+1]
  	sim.data[i,'sim.dgrowth'] <- ((rho*phi)-1)*sim.data[(i-1),'sim.divyield']+dg.shock[i]
  }
 	sim.data[,'sim.returns'] <- dg.shock-(rho*dp.shock[2:(num.obs+1)])
	return(data.frame(sim.data))
}
#####################################################################################


#####################################################################################
# MonteCarlo
#####################################################################################

MonteCarloSimulation <- function(num.trials,num.obs,phi,rho,error.cov){
	coeff.names <- c('b[r]','b[d]','phi')
	montecarlo.results <- matrix(data=0,nrow=num.trials,ncol=8,dimnames=list(NULL,c(coeff.names,paste(coeff.names,'_tstat',sep=''),c('b[long.r]','b[long.d]'))))
  inputs <- c(num.trials=num.trials,num.obs=num.obs,phi=phi,rho=rho,error.cov=error.cov)

	for(i in 1:num.trials){
		simulated.data <- SimulateVAR(num.obs,phi,rho,error.cov)
		sim.reg <- list(
			dyn$lm(sim.returns~lag(sim.divyield,-1),data=ts(simulated.data)),
			dyn$lm(sim.dgrowth~lag(sim.divyield,-1),data=ts(simulated.data)),
			dyn$lm(sim.divyield~lag(sim.divyield,-1),data=ts(simulated.data))
		)
	  montecarlo.results[i,1:3] <- (RegExtractor(sim.reg,'est')[,2])
		montecarlo.results[i,4:6] <- (RegExtractor(sim.reg,'tval')[,2])
  	montecarlo.results[i,7] <- montecarlo.results[i,1]/(1-rho*montecarlo.results[i,3])
  	montecarlo.results[i,8] <- montecarlo.results[i,7]-1
		
		print(paste('Monte Carlo Simulation :',i))
	}
return(list(inputs,montecarlo.results))
}
#####################################################################################


######################################################################################
# Joint distribution plot
######################################################################################

JointPlot <- function(mc.object,num.points,initial.reg,type){
	
  #Define variables
	betas <- matrix(RegExtractor(initial.reg,'est')[,2],ncol=1)
  tvals <- matrix(RegExtractor(initial.reg,'tval')[,2],ncol=1)
  num.trials <- mc.object[[1]]['num.trials']
	rho <- mc.object[[1]]['rho']
	phi <- betas[6]
  lr.beta <- betas[4]/(1-rho*phi)
	
	if(type=='single period'){
		
		bottom.right <- length(mc.object[[2]][which(mc.object[[2]][,'b[r]']>betas[4] & mc.object[[2]][,'b[d]']<betas[5],)])/num.trials*100
	  top.right <- length(mc.object[[2]][which(mc.object[[2]][,'b[r]']>betas[4] & mc.object[[2]][,'b[d]']>betas[5],)])/num.trials*100
	  bottom.left <- length(mc.object[[2]][which(mc.object[[2]][,'b[r]']<betas[4] & mc.object[[2]][,'b[d]']<betas[5],)])/num.trials*100
	  top.left <- length(mc.object[[2]][which(mc.object[[2]][,'b[r]']<betas[4] & mc.object[[2]][,'b[d]']>betas[5],)])/num.trials*100
	
		plot(ylim=c(min(mc.object[[2]][1:num.points,'b[d]'])-0.15,max(mc.object[[2]][1:num.points,'b[d]'])+0.15),xlim=c(min(mc.object[[2]][1:num.points,'b[r]'])-0.15,max(mc.object[[2]][1:num.points,'b[r]'])+0.15),x=mc.object[[2]][1:num.points,'b[r]'],y=mc.object[[2]][1:num.points,'b[d]'],ylab='b[d]',xlab='b[r]',cex.main=0.85,cex.lab=0.75,cex.axis=0.75,cex=0.65,main=paste('Coefficients\nphi=',mc.object[[1]]['phi']),col='blue',pch=21)
  	abline(h=betas[5])
  	abline(v=betas[4])
		points(x=betas[4],y=betas[5],col='red',cex=1,pch=19)
		points(x=0,y=mc.object[[1]]['rho']*mc.object[[1]]['phi']-1+0,cex=1,pch=17,col='green')
	
		text(x=max(mc.object[[2]][1:num.points,'b[r]']),y=min(mc.object[[2]][1:num.points,'b[d]'])-0.1,paste(bottom.right,'%'),cex=0.75)
		text(x=max(mc.object[[2]][1:num.points,'b[r]']),y=max(mc.object[[2]][1:num.points,'b[d]'])+0.1,paste(top.right,'%'),cex=0.75)
		text(x=min(mc.object[[2]][1:num.points,'b[r]']),y=min(mc.object[[2]][1:num.points,'b[d]'])-0.1,paste(bottom.left,'%'),cex=0.75)
		text(x=min(mc.object[[2]][1:num.points,'b[r]']),y=max(mc.object[[2]][1:num.points,'b[d]'])+0.1,paste(top.left,'%'),cex=0.75)

		bottom.right <- length(mc.object[[2]][which(mc.object[[2]][,'b[r]_tstat']>tvals[4] & mc.object[[2]][,'b[d]_tstat']<tvals[5],)])/num.trials*100
		top.right <- length(mc.object[[2]][which(mc.object[[2]][,'b[r]_tstat']>tvals[4] & mc.object[[2]][,'b[d]_tstat']>tvals[5],)])/num.trials*100
		bottom.left <- length(mc.object[[2]][which(mc.object[[2]][,'b[r]_tstat']<tvals[4] & mc.object[[2]][,'b[d]_tstat']<tvals[5],)])/num.trials*100
		top.left <- length(mc.object[[2]][which(mc.object[[2]][,'b[r]_tstat']<tvals[4] & mc.object[[2]][,'b[d]_tstat']>tvals[5],)])/num.trials*100
	
		plot(ylim=c(min(mc.object[[2]][1:num.points,'b[d]_tstat'])-1,max(mc.object[[2]][1:num.points,'b[d]_tstat'])+1),xlim=c(min(mc.object[[2]][1:num.points,'b[r]_tstat'])-1,max(mc.object[[2]][1:num.points,'b[r]_tstat'])+1),x=mc.object[[2]][1:num.points,'b[r]_tstat'],y=mc.object[[2]][1:num.points,'b[d]_tstat'],ylab='tstat,b[d]',xlab='tstat,b[r]',cex.main=0.85,cex.lab=0.75,cex.axis=0.75,cex=0.65,main=paste('T-statistics\nphi=',mc.object[[1]]['phi']),col='blue',pch=21)
  	abline(h=tvals[5])
  	abline(v=tvals[4])
		points(x=tvals[4],y=tvals[5],col='red',cex=1,pch=19)

		text(x=max(mc.object[[2]][1:num.points,'b[r]_tstat']),y=min(mc.object[[2]][1:num.points,'b[d]_tstat'])-0.1,paste(bottom.right,'%'),cex=0.75)
		text(x=max(mc.object[[2]][1:num.points,'b[r]_tstat']),y=max(mc.object[[2]][1:num.points,'b[d]_tstat'])+0.1,paste(top.right,'%'),cex=0.75)
		text(x=min(mc.object[[2]][1:num.points,'b[r]_tstat']),y=min(mc.object[[2]][1:num.points,'b[d]_tstat'])-0.1,paste(bottom.left,'%'),cex=0.75)
		text(x=min(mc.object[[2]][1:num.points,'b[r]_tstat']),y=max(mc.object[[2]][1:num.points,'b[d]_tstat'])+0.1,paste(top.left,'%'),cex=0.75)

} else if(type=='long horizon'){
     
	  h <- hist(plot=FALSE,mc.object[[2]][,'b[long.r]'],breaks=100)
	  plot(xlim=c(-3,3),h,freq=FALSE,col=ifelse(h$breaks<=lr.beta,'lightgrey','red'),ylab='',xlab='b(r)/(1-rho*phi)',cex.main=0.85,cex.lab=0.75,cex.axis=0.75,cex=0.65,main=paste('phi=',mc.object[[1]]['phi']))
	  abline(v=lr.beta,col='red')
	  text(x=lr.beta+0.3,y=0.45,'Data',col='black',cex=0.75)

		bottom.right <- length(mc.object[[2]][which(mc.object[[2]][,'b[r]']>betas[4] & mc.object[[2]][,'phi']<phi,)])/num.trials*100
	  top.right <- length(mc.object[[2]][which(mc.object[[2]][,'b[r]']>betas[4] & mc.object[[2]][,'phi']>phi,)])/num.trials*100
	  bottom.left <- length(mc.object[[2]][which(mc.object[[2]][,'b[r]']<betas[4] & mc.object[[2]][,'phi']<phi,)])/num.trials*100
	  top.left <- length(mc.object[[2]][which(mc.object[[2]][,'b[r]']<betas[4] & mc.object[[2]][,'phi']>phi,)])/num.trials*100
	
    plot(ylim=c(min(mc.object[[2]][1:num.points,'phi'])-0.1,max(mc.object[[2]][1:num.points,'phi'])+0.1),xlim=c(min(mc.object[[2]][1:num.points,'b[r]'])-0.1,max(mc.object[[2]][1:num.points,'b[r]'])+0.1),x=mc.object[[2]][1:num.points,'b[r]'],y=mc.object[[2]][1:num.points,'phi'],ylab='phi',xlab='b[r]',cex.main=0.85,cex.lab=0.75,cex.axis=0.75,cex=0.65,main=paste('phi=',mc.object[[1]]['phi']),col='blue',pch=21)
    abline(h=phi)
  	abline(v=betas[4])
		points(x=betas[4],y=phi,col='red',cex=1,pch=19)
		points(x=0,y=mc.object[[1]]['phi'],cex=1,pch=17,col='green')
	  lines(x=1-rho*mc.object[[2]][,'phi']+betas[5],y=mc.object[[2]][,'phi'],lty='dotted',lwd=0.75,col='gold')
	  lines(x=betas[4]*(1-rho*mc.object[[2]][,'phi'])/(1-phi*rho),col='black',y=mc.object[[2]][,'phi'],lwd=0.75)
  
	  text(x=max(mc.object[[2]][1:num.points,'b[r]']),y=min(mc.object[[2]][1:num.points,'phi'])-0.06,paste(bottom.right,'%'),cex=0.75)
		text(x=max(mc.object[[2]][1:num.points,'b[r]']),y=max(mc.object[[2]][1:num.points,'phi'])+0.06,paste(top.right,'%'),cex=0.75)
		text(x=min(mc.object[[2]][1:num.points,'b[r]']),y=min(mc.object[[2]][1:num.points,'phi'])-0.06,paste(bottom.left,'%'),cex=0.75)
		text(x=min(mc.object[[2]][1:num.points,'b[r]']),y=max(mc.object[[2]][1:num.points,'phi'])+0.06,paste(top.left,'%'),cex=0.75)

}
}

################################################################################
#Table Maker
################################################################################

TableMaker <- function(row.h,core,header,strip,strip.col,col.cut,alpha,border.col,text.col,header.bcol,header.tcol,title)
{
	rows &lt;- nrow(core)+1
	cols &lt;- ncol(core)
  colours &lt;- adjustcolor(col=strip.col,alpha.f=alpha)
  idx &lt;- 1
	
	plot.new()
  plot.window(xlim=c(0,cols*11),ylim=c(0,rows*7))
  mtext(title,side=3,cex=0.7)
	
	for(i in 1:cols){
		for(j in 1:rows){
			if(j==1){
       rect((i-1)*11,rows*7-7,11*i,rows*7,density=NA,col=header.bcol,lty=1,border=border.col)
       text(((i-1)*11+11*i)/2,((rows*7-7)+rows*7)/2,header[i],cex=0.8,col=header.tcol,font=1)
			}
			else if((j>1) && (row.h==0))
      {
       rect((i-1)*11,(j-1)*7-7,11*i,(j-1)*7,density=NA,col=ifelse(strip,ifelse(core[j-1,i]<col.cut,colours[1],colours[2]),'white'),lwd=1.0,border=border.col)
       text(((i-1)*11+11*i)/2,(((j-1)*7-7)+(j-1)*7)/2,core[j-1,i],col='black',cex=0.75,font=1)
      }
			else if((j>1) && (row.h!=0) && (i==row.h[idx]))
      {
       rect((i-1)*11,(j-1)*7-7,11*i,(j-1)*7,density=NA,col=border.col,lwd=1.0,border=border.col)
       text(((i-1)*11+11*i)/2,(((j-1)*7-7)+(j-1)*7)/2,core[j-1,i],col='black',cex=0.75,font=1)
			}
			else if((j>1) && (row.h!=0) && (i!=row.h[idx]))
      {
       rect((i-1)*11,(j-1)*7-7,11*i,(j-1)*7,density=NA,col=ifelse(strip,ifelse(core[j-1,i]<col.cut,colours[1],colours[2]),'white'),lwd=1.0,border=border.col)
       text(((i-1)*11+11*i)/2,(((j-1)*7-7)+(j-1)*7)/2,core[j-1,i],col='black',cex=0.75,font=1)
      }
		}
     ifelse(idx<length(row.h),idx<-idx+1,idx<-length(row.h))

	}
}
#####################################################################################
#####################################################################################
#@@@@@@@@@@@@@@@@@@@@@@@@@@@Function Ends@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@#


[Intuition & Procedure]

Cochrane starts with the conventional VAR representation that links log returns, dividend yield and dividend growth variables, as seen before:

r(t+1) = a_r + b(r)·(d(t) − p(t)) + ε_r(t+1)
Δd(t+1) = a_d + b(d)·(d(t) − p(t)) + ε_d(t+1)
d(t+1) − p(t+1) = a_dp + φ·(d(t) − p(t)) + ε_dp(t+1)

The Campbell-Shiller (1988) linearisation of returns is then used to link the regression coefficients and errors of the VAR, such that one can infer the data, coefficients and errors for any one equation from those of the other two:

b(r) = 1 − ρφ + b(d)

ε_r(t+1) = ε_d(t+1) − ρ·ε_dp(t+1)

We are interested in learning something about the return predictability coefficient b(r) on the basis of what we can say about φ. In this framework, to generate a coherent null hypothesis with b(r)=0, one must assume a negative b(d) and then address the absence of this coefficient in the data. Essentially this hypothesis will be tested by simulating the VAR under the assumption of no return predictability, b(r)=0, and dividend growth predictability, b(d) negative. The simulation procedure is as follows (taken from the original paper and other sites such as this presentation):

[1] Obtain original sample estimates (1927-2004) and the error covariance matrix by running the three VAR regressions. This step provides the inputs needed for the subsequent simulation.

[2] Each of the 20000 Monte Carlo trials (50000 in the paper) generates a dataset simulated according to the following system (shamelessly snipped from the presentation):

d(t+1) − p(t+1) = φ·(d(t) − p(t)) + ε_dp(t+1)
Δd(t+1) = (ρφ − 1)·(d(t) − p(t)) + ε_d(t+1)
r(t+1) = ε_d(t+1) − ρ·ε_dp(t+1)

[3] Once we have 20000 simulated datasets, we run each of the VAR regressions on each simulated dataset and collect the regression estimates.

[4] Once we have collected regression results across 20000 simulated datasets, we can look at the probability of observing b(r) and b(d) pairs, as well as t-statistic pairs, that are more extreme than the original sample estimates.

[5] Once we have looked at the single period results, we can proceed with the long-horizon regression coefficients (which are mercifully not new estimates, but rather calculated from the single period estimates of the original sample, 1927-2004).

[6] Once we have summarised both single-period and long horizon results, we can look at the source of power regarding single-period dividend growth coefficients and long horizon tests, as well as examine the effect of varying dividend yield autocorrelation on short- and long-run percentage values.