A reader asked me recently why I believe that energy stock prices (e.g. XLE) are correlated with the front-month crude oil futures contract (QM). Actually, I don’t believe they are necessarily correlated; I only think they are “cointegrated”.
What is the difference between correlation and cointegration? If XLE and QM were really correlated, then whenever XLE went up one day, QM would likely go up on the same day too, and vice versa. Their daily (or weekly, or monthly) returns would rise or fall in synchrony. But that’s not what my analysis was about. I claim that XLE and QM are cointegrated. This means the two price series cannot wander off in opposite directions for very long without eventually coming back to a mean distance. It doesn’t mean that the two prices have to move in synchrony on a daily basis at all.
Two hypothetical graphs illustrate the difference. In the first graph, stock A and stock B are correlated: their prices move in the same direction almost every day.
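To make the distinction concrete, here is a minimal Python sketch on simulated data. Nothing in it comes from the XLE/QM analysis, and every parameter is hypothetical. In pair (a, b), the two legs share a large common daily shock, so their daily changes are strongly correlated. Each leg also has its own shock, though, so the spread a - b is a random walk. In pair (a, c), c equals a plus a constant plus independent noise. Its daily changes are only loosely correlated with a's, but the spread stays anchored around its mean.

```python
import numpy as np

def correlation_vs_cointegration(n=2000, seed=0):
    """Simulate two hypothetical pairs and summarize how they differ."""
    rng = np.random.default_rng(seed)
    # Pair (a, b): a large shared daily shock plus a smaller independent
    # shock per leg. Daily changes are strongly correlated, but the spread
    # a - b is itself a random walk, so it can drift arbitrarily far.
    common = rng.normal(0.0, 1.0, n)
    a = 500 + np.cumsum(common + 0.5 * rng.normal(0.0, 1.0, n))
    b = 500 + np.cumsum(common + 0.5 * rng.normal(0.0, 1.0, n))
    # Pair (a, c): c is a plus a constant $1 offset plus independent noise.
    # Daily changes are only loosely correlated, but c - a fluctuates
    # around $1 instead of drifting away.
    c = a + 1.0 + rng.normal(0.0, 2.0, n)

    def change_corr(x, y):
        # Correlation of daily price changes (a stand-in for daily returns)
        return np.corrcoef(np.diff(x), np.diff(y))[0, 1]

    return {
        "corr_ab": change_corr(a, b),
        "corr_ac": change_corr(a, c),
        "spread_std_ab": np.std(a - b),
        "spread_std_ac": np.std(c - a),
    }

stats = correlation_vs_cointegration()
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

With these parameters, you would expect corr_ab to be near 0.8 and corr_ac near 0.37. The cointegrated pair's spread should have a standard deviation near 2. The correlated pair's spread usually shows a much larger standard deviation, because a random-walk spread keeps wandering.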
Now consider stock A and stock C.
Stock C clearly doesn’t move in any correlated fashion with stock A: some days they move in the same direction, other days in opposite directions. Most days stock C doesn’t move at all! But notice that the spread in stock prices between C and A always returns to about $1 after a while. This is a manifestation of cointegration between A and C. In this instance, a profitable trade would be to buy A and short C at around day 10, then exit both positions at around day 19. Another profitable trade would be to buy C and short A at around day 31, then close out the positions around day 40.
Cointegration is the foundation upon which pair trading (“statistical arbitrage”) is built. If two stocks simply move in a correlated manner, there may never be any widening of the spread. Without a temporary widening of the spread in either direction, there is no opportunity to short (or buy) the spread, and no reason to expect the spread to revert to the mean either.
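The entry and exit logic in the A/C example can be sketched as a simple z-score rule on the spread. This is a hypothetical illustration, not the rule used in the original analysis. The lookback, entry and exit thresholds are arbitrary assumptions, and the spread is assumed to be computed already (e.g. price of C minus price of A).

```python
import numpy as np

def zscore_positions(spread, lookback=20, entry_z=2.0, exit_z=0.5):
    """Hypothetical mean-reversion rule on a precomputed spread series.

    Returns one position per bar: +1 = long the spread, -1 = short the
    spread, 0 = flat. The z-score compares the current spread with the mean
    and standard deviation of the previous `lookback` bars.
    """
    spread = np.asarray(spread, dtype=float)
    positions = np.zeros(len(spread))
    pos = 0
    for t in range(lookback, len(spread)):
        window = spread[t - lookback:t]
        std = window.std()
        if std > 0:
            z = (spread[t] - window.mean()) / std
            if pos == 0 and z > entry_z:
                pos = -1   # spread unusually wide: short it
            elif pos == 0 and z < -entry_z:
                pos = 1    # spread unusually narrow: buy it
            elif pos != 0 and abs(z) < exit_z:
                pos = 0    # spread back near its recent mean: exit
        positions[t] = pos
    return positions

# Toy spread: small oscillation around $1, then a one-day jump to $3
toy = [1.0 + 0.01 * (-1) ** i for i in range(30)] + [3.0, 1.0]
print(zscore_positions(toy)[28:])
```

On the toy series, the rule goes short on the day the spread jumps to $3. It exits the next day, once the spread is back near its recent mean. If the spread never widened, the rule would never trade, which is the point made above.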
For further reading:
Alexander, Carol (2001). Market Models: A Guide to Financial Data Analysis. John Wiley & Sons.
34 comments:
Interesting post. I've found the same kind of cointegration between OIH and USO/XLE.
I usually like to short OIH, because its intraday volatility is higher than XLE's, and go long XLE as a hedge, or vice versa.
It doesn't always cointegrate as you mentioned, but every now and then there is an opportunity, e.g. on short-covering days.
Yes, OIH is certainly an alternative to XLE. OIH is the most liquid oil services ETF. Frankly, I no longer remember why I chose XLE instead of OIH for the analysis. They both cointegrate with USO equally well.
Dear Mr. Chan,
Wonderful blog you have over here. I have been looking for something like this for a long time.
You might also try DVN as an alternative to XLE.
Cheers,
Max
Max,
Glad you like my articles, and thanks for your suggestion.
Hope to exchange more ideas with you in the future!
Ernie
Ernie and Yaser: For my trading, I've found XLE has a strong advantage over the alternatives. There is a single-stock futures (SSF) contract available on XLE, and using the SSF drops the margin requirement from 50% to 20%. This extra leverage is very useful in spread trading. It is difficult to capture spread profits without that leverage, because spread changes are small.
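To see why the lower margin matters, here is a rough sketch of the arithmetic using the 50% and 20% margin figures from this comment. It assumes, for simplicity, that margin is charged on the gross position and that the account size is a hypothetical $10,000. Actual broker and exchange margin rules differ.

```python
def max_gross_position(capital, margin_rate):
    """Largest gross position allowed, assuming margin is charged on gross exposure."""
    return capital / margin_rate

capital = 10_000.0  # hypothetical account size
for label, rate in [("stock, 50% margin", 0.50), ("SSF, 20% margin", 0.20)]:
    gross = max_gross_position(capital, rate)
    # Hypothetical spread move worth 1% of gross exposure
    pnl = 0.01 * gross
    print(f"{label}: max gross ${gross:,.0f}, P&L from a 1% move ${pnl:,.0f}")
```

On the same capital, the same small spread move earns 2.5 times as much with the SSF ($50,000 vs $20,000 gross), which is the leverage advantage described above.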
Hi Ernie et al.,
Just wondering if there are any traders out there who use correlation or cointegration on an intraday time scale for day-trading, for example taking data samples as often as every 15 seconds, or less often, like every 10 minutes. Is there any useful information at time scales that small? I would think it would depend heavily on the volatility/liquidity of the underlyings, so that enough margin could be made on the spread for such a strategy to be profitable. Just wondering if you have any experience or opinions on this.
Cheers,
Jack
Hi Jack,
Theoretically, cointegration is time-scale independent, so we cannot say a pair of stocks is cointegrated on a time scale of years but not minutes. However, it is meaningful to ask what the average mean-reversion time is. I have described elsewhere on this blog (see the Ornstein-Uhlenbeck formula) a good way to estimate this, and it will help you determine whether the pair of stocks is suitable for trading at the time scale of interest.
Ernie
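The Ornstein-Uhlenbeck approach mentioned here is commonly implemented by regressing the change in the spread on its previous level. Below is a minimal Python sketch of that discrete-time version. It assumes the spread series has already been formed; the function name and the simulated AR(1) test series are hypothetical.

```python
import numpy as np

def half_life(spread):
    """Estimate the mean-reversion half-life of a spread, in bars.

    Discretized Ornstein-Uhlenbeck: regress the change in the spread on
    its lagged level, diff(s)_t = lam * s_{t-1} + mu + noise. For a
    mean-reverting spread lam < 0, and half-life = -ln(2) / lam.
    """
    s = np.asarray(spread, dtype=float)
    lagged = s[:-1]
    delta = np.diff(s)
    lam, _mu = np.polyfit(lagged, delta, 1)  # slope, intercept
    if lam >= 0:
        return np.inf  # no evidence of mean reversion
    return -np.log(2) / lam

# Hypothetical mean-reverting spread: AR(1) with coefficient 0.9
rng = np.random.default_rng(1)
spread = np.zeros(5000)
for t in range(1, len(spread)):
    spread[t] = 0.9 * spread[t - 1] + rng.normal()
print(f"estimated half-life: {half_life(spread):.1f} bars")
```

With a true AR(1) coefficient of 0.9, lam is about -0.1, so the estimate should land near ln(2)/0.1, roughly 7 bars. The half-life is measured in bars of whatever size you sampled (15 seconds, 10 minutes, days). Comparing it with your intended holding period gives a rough sense of whether the pair suits that time scale.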
Hi Ernie,
I have been trading pairs in the Indian stock markets. I find your blog very informative and educational. I really appreciate your efforts in sharing in-depth knowledge on the subject.
Can you explain the cointegration method via a spreadsheet and, if possible, share the spreadsheet? I'd appreciate it if you could explain in a non-quantitative style. I want to learn to interpret the output of the cointegration test, i.e. whether the spread is mean-reverting or not for a given time frame.
Thanks
Bhumir
Hi Bhumir,
Thank you for your interest in my blog. Unfortunately, a cointegration test cannot easily be performed in Excel. I performed mine using Matlab. If you purchase my book, you will find sample code showing how to compute this.
Ernie
Hi Ernie, I have been reading your book. I must say it's very informative and it has helped me tremendously.
One question though about LeSage's cadf function when testing for co-integration. I notice that if you reverse the order of the y and x parameters (in cadf(y,x,p,nlag)), the resulting t-statistic can be very different for the same two sets of data.
Using your Matlab sample code 7_2.m as an example, if y is GLD and x is GDX, I get a t-statistic of -3.52. If y is GDX and x is GLD, I get -4.11. So what do I make out of this? Which result should I rely on to see if there's co-integration between the two sets of data? Or should I use both results (or the average of the two) as a guideline?
Thanks
Sam
Hi Sam,
Yes, indeed the results are different depending on which series you pick as the independent variable.
My rule of thumb is to be conservative: regard a pair as cointegrating only if both t-stats meet the criterion.
Ernie
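The order dependence Sam describes comes from the first step of the Engle-Granger procedure. Regressing y on x and regressing x on y give different hedge ratios, hence different residual series and different t-statistics. Below is a simplified Python sketch of the conservative "both orders must pass" rule. It is not LeSage's cadf. Unlike an augmented test, it includes no lagged difference terms, and the function names are hypothetical. The critical value is left as a parameter because it must come from Engle-Granger (MacKinnon) tables, not ordinary Dickey-Fuller ones. statsmodels' `coint` function is one library alternative that reports these critical values.

```python
import numpy as np

def eg_tstat(y, x):
    """Two-step Engle-Granger-style statistic with y as the dependent series.

    Step 1: OLS of y on x with an intercept gives the hedge ratio.
    Step 2: Dickey-Fuller regression on the residuals e with no lagged
    difference terms: diff(e)_t = gamma * e_{t-1} + noise.
    Returns (hedge_ratio, t_statistic_of_gamma).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    hedge, intercept = np.polyfit(x, y, 1)
    e = y - (hedge * x + intercept)
    lagged, delta = e[:-1], np.diff(e)
    gamma = (lagged @ delta) / (lagged @ lagged)
    resid = delta - gamma * lagged
    sigma2 = (resid @ resid) / (len(delta) - 1)   # one estimated parameter
    se = np.sqrt(sigma2 / (lagged @ lagged))
    return hedge, gamma / se

def both_orders(a, b, critical_value):
    """Conservative rule: call the pair cointegrated only if both orderings pass.

    critical_value must come from Engle-Granger (MacKinnon) tables; it is
    deliberately not hard-coded here.
    """
    _, t_ab = eg_tstat(a, b)   # a dependent, b independent
    _, t_ba = eg_tstat(b, a)   # b dependent, a independent
    return t_ab, t_ba, bool(t_ab < critical_value and t_ba < critical_value)
```

A more negative t-statistic is stronger evidence against a unit root in the residuals. On borderline pairs, one ordering may pass while the other fails, which is exactly the case the rule above treats as "not cointegrated".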
Hey Ernie, this is Peter from the University of Cape Town, South Africa. I am writing to ask whether correlation gives any meaningful evidence when one is testing for market integration. I am testing integration across African markets for my thesis and have used the Engle-Granger cointegration test. I thought it might be nice to include a correlation matrix as well, but I don't want to look stupid. Also, just to confirm: does it matter if I only use A as the independent variable and B as the dependent, without testing both ways?
Thanks
Please respond ASAP if possible!
Peter,
Including a correlation matrix will not convince anybody that the African markets are cointegrated. However, it might serve as a useful comparison of techniques.
Indeed cointegration tests are variable-order-dependent, esp. for borderline cases. Try both orders.
Ernie
Hi, Ernie:
I have a rather simple question about index tracking with a cointegration-optimal portfolio, following an earlier paper by Dunis & Ho (Cointegration portfolio of European Equities for Index Tracking). Suppose I find cointegration of the following form: ln(index)=2*ln(p1)+3*ln(p2), where p1 and p2 are the prices of constituent stocks in the index. The paper suggests using the "normalized" parameters as weights. Can you please explain what normalization means in that paper? I assume the weights are 2/5=0.4 and 3/5=0.6. Suppose assets 1 and 2 each have a return of 5%. Then the portfolio constructed with the 0.4/0.6 weights would give 5%*0.4+5%*0.6=5% return. However, take the original cointegration result ln(index)=2*ln(p1)+3*ln(p2) and first-difference it, so that both sides become returns. The index return should then be 2*5%+3*5%=25%. The portfolio is definitely not tracking the index. I am sure something is not right here... Thanks for your help.
Fuzhi
Hi Fuzhi,
You have to apply the normalized weights before computing returns, otherwise the two sides won't match. It would be like comparing the P&L of $1 capital with the P&L of $1M capital if you don't normalize by capital.
Ernie
Ernie:
I very much appreciate your reply. However, I am still a little confused. Could you please explain again how you would normalize the weights if this is the cointegration result you get: ln(index)=2*ln(p1)+3*ln(p2)
where "index" is the index price and "p1" and "p2" are the prices of constituent stocks in the index? It seems everything is in percentage-return terms and has nothing to do with the amount of capital.
Thanks.
Fuzhi
Fuzhi,
The 2 and 3 represent units of capital. So clearly we need to normalize them so that both sides have the same total capital, typically 1 unit.
In any case, I dislike using logs. I prefer raw prices so that the number of shares is fixed.
Ernie
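The arithmetic behind this exchange can be checked in a few lines. The sketch below uses the coefficients from Fuzhi's example (2 and 3) and assumes each constituent returns 5%. It shows that the un-normalized combination and the normalized portfolio differ by exactly the factor coeffs.sum() = 5. That factor is the total capital the raw coefficients represent, so comparing the two directly is like comparing P&L on 5 units of capital with P&L on 1 unit.

```python
import numpy as np

coeffs = np.array([2.0, 3.0])        # cointegration coefficients from the example
weights = coeffs / coeffs.sum()      # normalized to one unit of total capital: [0.4, 0.6]
returns = np.array([0.05, 0.05])     # hypothetical: both constituents return 5%

raw_combo = coeffs @ returns          # 2*5% + 3*5%: P&L on 5 units of capital
normalized_combo = weights @ returns  # 0.4*5% + 0.6*5%: P&L on 1 unit of capital
ratio = raw_combo / normalized_combo  # equals coeffs.sum(), the capital scale factor
print(raw_combo, normalized_combo, ratio)
```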
Ernie:
Thank you so much for your help.
Fuzhi
Dear Ernie, say I manage to identify a good cointegrated pair. My next question is about the hedge ratio. When I regress the price of A on B, versus B on A, I end up with two different hedge ratios. Which one should I choose? I need to use the residual to set a band for entry and exit, and depending on which hedge ratio I use, I end up with different entry and exit points.
Suny,
The eigenvector obtained from the Johansen test can be used to determine a unique linear combination (i.e. hedge ratio) of the 2 price series.
Ernie
Hi Ernie,
As you said in an earlier comment here, both S1 ~ S2 and S2 ~ S1 should pass the cointegration test.
1. Let us say S1 ~ S2 is cointegrated while the reverse order is not. Does that mean the pair is not cointegrated?
2. Let us say we have 5 stocks to trade. Which one should I use as the independent variable, with the other 4 as dependent, without trying so many combinations?
Hi Jeet,
1) This indicates the pair is borderline cointegrating. Trade at your own risk!
2) You should use the Johansen test: it will give you all good combinations of symbols with no unique "independent" variable.
Ernie
What should be the logic behind choosing the independent stocks and the dependent stock from a basket of stocks?
The Johansen test might give a result, but I am looking for the logic.
Jeet,
Logic can only be found if you have a fundamental economic understanding of the relationship between the assets. For example, if you believe that firms A and B are both big customers of firm C, you might argue that C's price should be the dependent variable.
However, I usually do not find it important to find out why a variable is independent: it makes no difference to the trading model.
Ernie
Dear Mr. Chan,
I used your files to run ex7_2.m for GLD and GDX.
The t-statistic is -9.72, not -3.36.
What's wrong with it?
By the way, your book is very good.
Thanks.
Anon,
Did you use my data file for the test? Did you set all the parameters for the cadf test to be the same as mine?
Ernie
Dear Mr. Chan,
I only copied and pasted (ex7_2, the jplv7 *.m files, and GLD/GDX).
Parameters?
Do I also need to change parameters, not just use those .m files?
Based on the results, only the t-statistic is wrong.
The other outputs are similar.
I tried GLD against GLD, i.e. two identical data sets, and the t-statistic is also -9. @@||
99,
There is no need to change the input parameters to the cadf function for testing cointegration.
Are you using the same input data as I used? Have you made sure the dates of those price series are ascending (most recent data on last row)?
Ernie
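The ascending-date check can be automated before running any test. Below is a minimal sketch; the helper name is hypothetical. It assumes the dates are numpy datetime64 values, yyyymmdd integers, or ISO "YYYY-MM-DD" strings, all of which sort chronologically. It reorders a price series so the oldest observation is on the first row and the most recent on the last.

```python
import numpy as np

def ascending_by_date(dates, prices):
    """Return (dates, prices) reordered oldest-first (most recent on the last row)."""
    dates = np.asarray(dates)
    prices = np.asarray(prices, dtype=float)
    order = np.argsort(dates, kind="stable")  # chronological for the date formats above
    return dates[order], prices[order]
```

Applying the same reordering to every series in a pair before testing also guarantees that the rows line up date by date, as long as both series cover the same dates.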
Dear Chan,
Hmm.... I used the GLD/GDX files from your server (data from 2006/05/23 to 2007/11/30).
I tested those files in "adftest.xls", and the "Dickey Fuller Test Statistic" is right.
Is my Matlab wrong? @.@||
Dear Chan,
I emailed your Gmail account with all the .m files and a screenshot of my Matlab screen.
If you have time, could you take a look?
Thanks, and sorry to disturb you.
99,
I did not receive your email (I checked the spam folder too). Could u pls resend?
Ernie
I am wondering what everyone feels is the most reliable cointegration test in Matlab. I have tried egci, adf and jci and get widely different results. To make matters worse, when I check with catalystcorner, many pairs show significantly different results there.
cbucks,
Have you tried the Johansen test?
Ernie
Cointegration is not the same as correlation.
Certainly. It can be proven that the Pearson correlation coefficient will be close to 1 only if the variance of each asset is small relative to the variance of the random-walk process that generates the data.
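A related, well-known effect makes the same point from another direction. Two independent random walks often show a sizeable correlation between their price levels, even though their daily changes are essentially uncorrelated and the pair is not cointegrated. Below is a minimal simulation sketch; all parameters and the function name are hypothetical.

```python
import numpy as np

def mean_abs_correlations(n_obs=500, n_trials=200, seed=0):
    """Compare price-level vs daily-change correlation for independent random walks."""
    rng = np.random.default_rng(seed)
    level_corrs, change_corrs = [], []
    for _ in range(n_trials):
        steps = rng.normal(size=(n_obs, 2))   # two independent daily shock series
        levels = np.cumsum(steps, axis=0)     # two independent random walks
        level_corrs.append(abs(np.corrcoef(levels[:, 0], levels[:, 1])[0, 1]))
        change_corrs.append(abs(np.corrcoef(steps[:, 0], steps[:, 1])[0, 1]))
    return np.mean(level_corrs), np.mean(change_corrs)

level_corr, change_corr = mean_abs_correlations()
print(f"mean |corr| of levels: {level_corr:.3f}, of changes: {change_corr:.3f}")
```

On typical runs, the mean absolute level correlation is several times larger than the change correlation, although the walks are unrelated by construction. This is one reason a high price correlation says little about whether a spread will mean-revert.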