Reversion to the mean, regression to the mean – is there a difference and who cares? Not everyone agrees that there is a difference – “reversion to the mean, also called regression to the mean” is a phrase frequently found on the internet.
The answer is that there are two different concepts involved, and that people mix the two up and commit serious errors as a result. So everyone who uses the concepts should care.
The reason is that one concept is a law, one a model. If you apply a model thinking that it is a law then you can go wrong. And if you don’t think that the law applies to you when it does then you can also go wrong.
Regression to the mean (more correctly called regression towards the mean) is the law. It has a strict statistical definition in terms of conditional expectations. Interestingly, Francis Galton, who introduced the term in the 19th century, first called it reversion before calling it regression. He also called it “regression to mediocrity”, by which he meant the mean. Regression is a law only in a probabilistic sense, but it is a law nevertheless.
Reversion to the mean is usually defined more loosely but we can tighten it up by defining it to be any tendency to move towards the mean that is not due to regression. So what remains now is to define regression and to illustrate the concepts.
Galton noticed that the offspring of sweet-peas tended to revert to the mean size when he measured the diameters of the seeds. Large seeds had offspring that tended to be smaller and small seeds had offspring that tended to be larger. This happens for purely statistical reasons and that is the essence of regression. The statistical effect can be quantified and thus regression is a statistical law.
Here is an example. In this distribution suppose you sample a point and it falls where the red region and the yellow region meet. If you sample another point is it more likely to be closer to the mean (yellow region) or further from the mean (red region)? Obviously it is more likely to be closer to the mean because there is more probability there. That is regression to the mean. It says that probabilistically extreme values are more likely to be followed by probabilistically less extreme values.
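Mathematically, for two measurements $X_1$ and $X_2$ with common mean $\mu$ and an observed first value $x > \mu$, this is expressed as

$$\mu \le E[X_2 \mid X_1 = x] < x$$

(with the inequalities reversed for $x < \mu$), which says that on average the new value will fall between the mean and the old value.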
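The claim that, on average, the new value falls between the mean and the old value can be checked by simulation. The following is a minimal sketch, not Galton's data: it assumes both values are standard normal (mean 0) with a hypothetical correlation `rho`, and the function name `mean_after_extreme` is invented for illustration.

```python
import random


def mean_after_extreme(rho, threshold, n=200_000, seed=0):
    """Average first and second values, given the first exceeded `threshold`.

    Assumption: both values are standard normal with correlation `rho`.
    """
    rng = random.Random(seed)
    firsts, seconds = [], []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        # The second value shares a fraction rho of the first; the rest is fresh noise.
        x2 = rho * x1 + (1.0 - rho * rho) ** 0.5 * rng.gauss(0.0, 1.0)
        if x1 > threshold:
            firsts.append(x1)
            seconds.append(x2)
    return sum(firsts) / len(firsts), sum(seconds) / len(seconds)


first, second = mean_after_extreme(rho=0.5, threshold=1.0)
print(f"mean of extreme first values: {first:.2f}")
print(f"mean of the values that follow: {second:.2f}")
```

With `rho=0.5` the follow-up values should average out somewhere between 0 (the mean) and the average of the extreme first values. With `rho=0.0` (independent draws) they should average close to the mean itself.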
Reversion to the mean says that extreme values are more likely to be followed by less extreme values. Note that the word probabilistically has been removed. Reversion is not about probabilistic extremes. Because of this it is not a law. It is a model. Unfortunately it seems that people regard it as a law. They say “what goes up must come down” therefore reversion to the mean must occur. There is no such law (it’s not even true for gravity).
As an example, consider P/E ratios in a price bubble where the ratios are above their long term mean. People say things like “due to mean reversion the P/E ratios will come down.” They are quoting mean reversion as though it is a law. It is not a law. It is a model or a hypothesis. You need to justify the model before you can make the claim that mean reversion will hold. Maybe the high P/E ratio is due to a drop in interest rates that may last decades or forever. It is not due to probabilistic extremes and so regression to the mean does not apply.
So whenever you claim that reversion to the mean will apply you have to also justify the claim. A statement such as "due to mean reversion the price will decrease" is equivalent to saying "due to the expectation of price decreasing the price will decrease." A better statement is to say "under the mean reversion hypothesis (MRH) the price will decrease. The MRH is believed to hold since ..."
This is where many people go wrong. Books and the internet are full of stock trading schemes that worked in the past but which, strangely, stop working. The reason may be regression to the mean.
Suppose you look at some stock market history data and pick out some trading schemes that have done well. There is a lot of probabilistic noise in the markets and the schemes that have done well will have some element of success due to chance alone. All schemes will have some element of chance and your biased selection process will pick the ones with the most extreme chances. When you then trade the schemes they will regress to the mean and perform worse in the future. That is a law. You cannot cheat the law (unless you are lucky).
A good example of this is if you have 100 monkeys and you choose one to be your portfolio manager based on its performance in the past. The top monkey may have beaten Warren Buffett in the past 5 years but we assume that this is entirely due to luck. So naturally it will regress to the mean and not outperform in the future.
If there is some element of skill in the trading schemes then that element will not be removed by regression to the mean. Only the probabilistic element regresses. The skill element may, of course, revert to the mean. But there is no law saying that it will or will not.
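The monkey example, and the distinction between luck and skill, can be sketched in a few lines. This is an illustration under assumed numbers only: monthly returns are modelled as normal noise with standard deviation 1 around each monkey's skill, and `skill_sd`, `trials` and the function name are hypothetical choices, not anything from real market data.

```python
import random


def top_monkey_past_and_future(n_monkeys=100, n_months=60, skill_sd=0.0,
                               trials=200, seed=1):
    """Pick the best past performer each trial; return its average past
    and average future mean monthly return across all trials."""
    rng = random.Random(seed)

    def mean_return(skill):
        # Mean of n_months noisy monthly returns centred on `skill`.
        return sum(rng.gauss(skill, 1.0) for _ in range(n_months)) / n_months

    past_best, future_best = [], []
    for _ in range(trials):
        # skill_sd = 0.0 means every monkey has zero skill (pure luck).
        skills = [rng.gauss(0.0, skill_sd) for _ in range(n_monkeys)]
        past = [mean_return(s) for s in skills]
        best = max(range(n_monkeys), key=past.__getitem__)
        past_best.append(past[best])
        future_best.append(mean_return(skills[best]))
    return sum(past_best) / trials, sum(future_best) / trials


for sd in (0.0, 0.1):
    past, future = top_monkey_past_and_future(skill_sd=sd)
    print(f"skill_sd={sd}: past={past:.3f} future={future:.3f}")
```

With no skill, the top monkey's future return should sit near zero even though its past looked excellent. With some skill in the mix, the future return should stay positive but fall short of the past: only the luck component regresses.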
Since reversion to the mean is a model and requires justification before it can be used, we require tests and measures for reversion. An example is the Variance Ratio (VR), which measures mean reversion.
Suppose we have a sequence of monthly values that we wish to test such as the NZD/USD exchange rates shown. This is an interesting example because I just read a quote "... assuming the New Zealand dollar will return to its long-run average level of $US0.617." That quote assumes mean reversion.
The Variance Ratio is defined as VR(m) = the sum of squared differences between values m months apart, divided by m times the same sum for values 1 month apart. The division scales the ratio so that VR(1) = 1.
The idea behind this measure is that if there is mean reversion then differences between months over long periods of time tend to cancel out since the sequence has to wander back to the mean. So the variance of yearly changes tends to be smaller than 12 times the variance of monthly changes.
If there is a trend in the data then the VR tends to be higher than 1. If the data is a random walk then the VR stays near 1 for all m. If there is mean reversion then the deviations tend to cancel each other out and so the VR falls towards zero as m increases. Thus a plot of VR vs m shows visually how much mean reversion there is in the data.
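The three cases can be reproduced on simulated data. The sketch below makes some assumptions that the text does not specify: it uses the mean (rather than the sum) of squared differences over overlapping windows, it does not remove any drift, and the three test series (`random_walk`, `mean_reverting`, `trending`) are hypothetical constructions rather than the exchange rate data.

```python
import random


def variance_ratio(values, m):
    """Mean squared m-step difference, scaled by m times the 1-step version."""
    def mean_sq_diff(lag):
        diffs = [values[t + lag] - values[t] for t in range(len(values) - lag)]
        return sum(d * d for d in diffs) / len(diffs)
    return mean_sq_diff(m) / (m * mean_sq_diff(1))


def simulate(kind, n=20_000, seed=2):
    """Generate a hypothetical series of the requested kind."""
    rng = random.Random(seed)
    x, step, series = 0.0, 0.0, []
    for _ in range(n):
        shock = rng.gauss(0.0, 1.0)
        if kind == "random_walk":
            x += shock                    # independent steps
        elif kind == "mean_reverting":
            x = 0.9 * x + shock           # pulled back towards 0 each step
        elif kind == "trending":
            step = 0.5 * step + shock     # steps tend to repeat their direction
            x += step
        else:
            raise ValueError(f"unknown kind: {kind}")
        series.append(x)
    return series


for kind in ("random_walk", "mean_reverting", "trending"):
    series = simulate(kind)
    print(kind, [round(variance_ratio(series, m), 2) for m in (1, 12, 60)])
```

VR(1) is 1 by construction. For larger m the random walk should stay near 1, the mean-reverting series should fall well below 1, and the trending series should rise above 1.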
For the exchange rate data the plot is this. It looks as if there is no mean reversion and lots of trending up to 50-month intervals, after which some signs of reversion start to appear. After 100 months the amount of reversion levels out. So there appears to be some tendency for mean reversion over a 100-month period. The exchange rate may return to 0.617 but the author of the quote wasn't expecting it to take 8 years.
August 2009 Update: Below is the exchange rate up to August 2009. It did revert to 0.617! But was the reversion due to the "law of mean reversion" or was it due to the financial crisis? The latter, surely.
The main point of modeling reversion to the mean is to use the model to predict future behaviour. There are three elements to this: the mean, the rate of reversion to the mean, and the volatility around the mean. An example of a model which uses these three elements is the Ornstein-Uhlenbeck (OU) model.
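To make the three elements concrete, here is a minimal sketch of the OU model in its usual form, dX = theta * (mu - X) dt + sigma dW, where mu is the mean, theta the rate of reversion and sigma the volatility. The parameter values in the usage lines are purely hypothetical (only 0.617 echoes the quote above) and the function names are invented for illustration.

```python
import math
import random


def ou_expected_next(x, mu, theta, dt):
    """Expected value after time dt: the gap to the mean shrinks by exp(-theta*dt)."""
    return mu + (x - mu) * math.exp(-theta * dt)


def simulate_ou(x0, mu, theta, sigma, dt, n_steps, seed=3):
    """Simulate an OU path using the exact one-step transition."""
    rng = random.Random(seed)
    decay = math.exp(-theta * dt)
    # Standard deviation of the random part of one step of length dt.
    step_sd = sigma * math.sqrt((1.0 - decay * decay) / (2.0 * theta))
    path = [x0]
    for _ in range(n_steps):
        path.append(mu + (path[-1] - mu) * decay + step_sd * rng.gauss(0.0, 1.0))
    return path


# Hypothetical parameters: time in years, monthly steps.
mu, theta, sigma, dt = 0.617, 0.5, 0.1, 1.0 / 12.0
print(f"expected value one month after 0.90: {ou_expected_next(0.90, mu, theta, dt):.4f}")
path = simulate_ou(0.90, mu, theta, sigma, dt, n_steps=120)
print(f"simulated value after 10 years: {path[-1]:.3f}")
```

Note that the pull towards `mu` is built into the model: the simulation reverts because it was told to, which is not evidence that any real series does.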
Models such as the OU can be useful for measuring and using mean reversion. But they have a danger in that they assume mean reversion exists – they are not models that can prove mean reversion.
An example of a model that gives rise to mean reversion is the cointegration model. Two processes are cointegrated if they are tied together in a certain way (so that a linear combination of them has certain desirable properties that neither has on its own). Cointegration can give rise to mean reversion. Cointegration can be tested for and verified, thereby verifying that mean reversion exists.
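The idea can be illustrated with a toy pair of series that share a common random walk. This is only a sketch under strong assumptions: the pair is simulated, the combining coefficient (2.0) is known in advance rather than estimated, and a variance ratio is used as a rough indicator instead of a formal cointegration test such as Engle-Granger. All names are hypothetical.

```python
import random


def variance_ratio(values, m):
    """Mean squared m-step difference, scaled by m times the 1-step version."""
    def mean_sq_diff(lag):
        diffs = [values[t + lag] - values[t] for t in range(len(values) - lag)]
        return sum(d * d for d in diffs) / len(diffs)
    return mean_sq_diff(m) / (m * mean_sq_diff(1))


def cointegrated_pair(n=20_000, seed=4):
    """Two series driven by one shared random walk plus small private noise."""
    rng = random.Random(seed)
    common, a, b = 0.0, [], []
    for _ in range(n):
        common += rng.gauss(0.0, 1.0)                  # shared wandering component
        a.append(common + rng.gauss(0.0, 0.1))
        b.append(2.0 * common + rng.gauss(0.0, 0.1))
    return a, b


a, b = cointegrated_pair()
# The combination b - 2a cancels the shared random walk, leaving only noise.
spread = [bi - 2.0 * ai for ai, bi in zip(a, b)]
print(f"VR(12) of a:      {variance_ratio(a, 12):.2f}")
print(f"VR(12) of b:      {variance_ratio(b, 12):.2f}")
print(f"VR(12) of spread: {variance_ratio(spread, 12):.2f}")
```

Each series on its own should behave like a random walk (VR near 1), while the combination should show a VR far below 1, which is the mean reversion that cointegration gives rise to.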