Classic: Avoid the Dangers of Data-Mining, Part 1

The following was published at RealMoney on 5/28/2004:

----------------------------------------

Investing Strategies

- Data-mining attempts to get data to give a sharp answer when one may not be present.
- Technical analysis can involve data-mining.
- Chance can make a method look better than it is.

Investors often get pitched quantitative methods for investing. These methods can be either fundamental or technical in nature and often have shown great results on a pro forma basis in the past, but when ordinary investors (and often, professional investors) try them out, they don’t work as well in practice. Why?

There are many reasons, but in my opinion, there’s one main reason: data-mining. I’ll define data-mining and give you practical ways to avoid it whether you apply quantitative methods or create new quantitative investment methods.

 

Data-Mining Defined

I never got my doctorate, but I did complete my field in econometrics in grad school. One of the things that they drilled into us was the danger of overinterpreting your data. As a mythical economist supposedly once said, “If I torture the data enough, I can make it confess to anything.”

When a quantitative analyst mines data, he repeatedly tests new hypotheses against the same data set. When the analyst finds an economically or statistically significant relationship, he stops testing alternative hypotheses. He may start to optimize the hypothesis that gave a significant result.

Data-mining, or as some call it, specification searching, attempts to get the data to give a sharp answer when no sharp answer may be present. Financial data are messy; there is a lot of noise and often not much signal. Every time data get analyzed, there is a small but significant probability that noise in the data will be interpreted as a signal. Overinterpreting the data increases the odds that what the analyst thought was signal was actually noise.
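A quick simulation makes the point concrete. In the sketch below (parameters are illustrative), 200 "strategies" have returns that are pure noise, so none has any true edge; yet picking the one with the largest t-statistic reliably turns up a strategy that clears the usual single-test significance bar:

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_strategies = 250, 200

# 200 strategy return histories that are pure noise: no true edge anywhere
returns = rng.normal(0.0, 0.01, size=(n_strategies, n_obs))

# t-statistic of the mean return for each strategy
t_stats = returns.mean(axis=1) / (returns.std(axis=1, ddof=1) / np.sqrt(n_obs))
best_t = float(np.abs(t_stats).max())

# A single test clears the usual 5% bar (|t| > 1.96) only 1 time in 20,
# but the best of 200 such tests on noise almost always clears it
```

This is the mechanism behind "noise interpreted as signal": the more hypotheses tested against the same data, the more impressive the best accidental result looks.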

 

Examples of Data-Mining

As examples, consider Michael O’Higgins’ Beating the Dow, which introduced and popularized his “Dogs of the Dow” theory, or James P. O’Shaughnessy’s What Works on Wall Street. In each of these books, different hypotheses were tweaked to find a method that would have produced the best result in the past.

The basic idea underlying the “Dogs of the Dow” theory has merit: Buy cheap, large-cap stocks. But in testing multiple theories, the cheapness metric was varied. Which is best: a low price-to-book, price-to-earnings, price-to-sales, or price-to-cash-flow ratio, a low absolute price, or a high dividend yield? Another factor that varied was how many stocks would be picked: the top 10, the top five, the top one, or even the second-best? How often would the strategy get rebalanced: annually, quarterly, or monthly? With this many permutations, the strategy that ended up performing best likely did so accidentally.
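Counting the permutations shows how large the search really is. A minimal sketch, using a hypothetical grid of the kinds of tweaks described above (the specific lists are illustrative, not a reconstruction of any particular back-test):

```python
from itertools import product

# Hypothetical search grid mirroring the tweaks discussed in the text
metrics = ["price/book", "price/earnings", "price/sales",
           "price/cash flow", "low price", "dividend yield"]
selection = ["top 10", "top 5", "top 1", "second best"]
rebalance = ["annually", "quarterly", "monthly"]

variants = list(product(metrics, selection, rebalance))
n_variants = len(variants)  # 6 * 4 * 3 = 72 distinct back-tests of one "idea"
```

Even at a 5% false-positive rate per test, a grid of 72 variants is nearly guaranteed to produce at least one version that looks like a winner by chance alone.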

What Works on Wall Street also contained some good core ideas (although it was a bit misnamed; it should have been titled, What Has Worked on Wall Street, but that would not have sold as well). Its core theory: Buy cheap stocks that have positive price and earnings momentum. But in this theory, the cheapness metric also varied, along with the methods for analyzing momentum — enough that more than 50 different theories got tested. The basic idea is sound, but again, the variation with the best result won only by accident.

 

… And Technical Analysis

Bloomberg has a back-testing technical analysis function [BTST]. It takes eight different technical analysis methods and shows how each would have performed in the past for a given security. Even if some of the methods had validity, if an analyst fed the BTST function a stream of random data instead of a real price series, the function would likely flag one of the methods as profitable.
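That claim is easy to check with a simulation. The sketch below uses simple moving-average crossover rules as stand-ins (these are not Bloomberg's actual eight methods), runs them on random-walk prices that by construction contain no exploitable pattern, and keeps only the best rule on each series, as a data-miner would:

```python
import numpy as np

rng = np.random.default_rng(7)

def ma_pnl(price, fast, slow):
    """Cumulative log return of a long/flat moving-average crossover rule."""
    rets = np.diff(np.log(price))
    f = np.convolve(price, np.ones(fast) / fast, mode="valid")
    s = np.convolve(price, np.ones(slow) / slow, mode="valid")
    k = min(len(f), len(s))                      # align both MAs at the end
    long_flat = (f[-k:] > s[-k:]).astype(float)  # 1 = long, 0 = in cash
    # trade on yesterday's signal against today's return (no look-ahead)
    return float(np.sum(long_flat[:-1] * rets[-(k - 1):]))

RULES = [(5, 20), (10, 50), (20, 100), (50, 200)]

def best_rule_pnl(price):
    """Report only the best-performing rule, as a data-miner would."""
    return max(ma_pnl(price, f, s) for f, s in RULES)

# 100 independent random-walk price series: nothing here is predictable
best = [best_rule_pnl(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 1000))))
        for _ in range(100)]
avg_best = float(np.mean(best))
# Each individual rule has zero expected profit on a random walk, yet the
# *selected best* rule averages a positive profit: pure selection bias
```

The upward bias comes entirely from reporting the winner after the fact, which is exactly what a back-testing screen does when it flags the most profitable method.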

Another area where I have seen abuse is in “services” that offer to identify “rolling stocks,” i.e., stocks that seem to oscillate between two predictable boundaries. This gives the potential for an investor to make quick and easy profits by buying at the low boundary and selling at the high boundary. The trouble here is that it is easy to identify stocks that have traveled in boundaries in the past, but the past is usually a poor predictor of the future. Results from following advice like this should be random at best, with the danger that your losses could increase if the conditions that created the temporary stability shift.

 

Data-Mining in Modern Portfolio Theory

Why do stocks always seem to do better than bonds in the long run? How much better should they be expected to do? These questions frame what is called the Equity Premium Puzzle. Academics who use data-mining assume that past is prologue and that initial valuation levels have no impact on the results for their forecast period. Back in 1999, I often commented that since 1926, we’d seen only one and a half full cycles of the equity markets. Naive estimates of the equity premium were popular among academics and practitioners then. We had not seen a second major bear market like that of the 1930s. The bear market of 2000-2002 has adjusted my view, but I am not convinced that valuation levels have returned to normal.

There are many societal and political factors that affect how much better stocks will do than bonds. People do not have infinite investment horizons; they will need at least some of the money at some point in their lives, so long-term total return averages are not indicative of what average investors are likely to achieve. Valuations matter, as do the current yields of bonds. Neglecting equity valuations and bond yields when doing asset allocation work will lead asset allocators to overweight stocks and bonds, which have done well historically but are unlikely to do as well over the next 10 years as the historic averages.

In a past job, I was a quantitative analyst for an asset manager that had a life insurance company as a client. There were a variety of derivative investments that got pitched to us that used diversification of different credit risks as a means for reducing risk. Often I would be shown a correlation matrix of past returns that showed high reductions in volatility from mixing different risky asset classes. I would ask the quantitative analysts on the sell side how stable the correlation matrix was, given how highly correlated most risky fixed-income asset classes were in 1998 during the Long Term Capital Management crisis, and afterward in the recovery. Most of the time, they hadn’t considered the question.

 

A Big Warning Sign

Anytime you see an analysis that relies on a correlation matrix of returns through some sort of mean-variance framework, be careful. My favorite target here tends to be a fund-of-funds, whether of the CTA, hedge, or mutual fund variety. There are several reasons for that.

First, there usually aren’t enough data to estimate the correlation matrix. Inexperienced practitioners do so anyway, without realizing that they need, at minimum, one data period for each unique correlation coefficient that they calculate. For example, for a correlation matrix of 10 return series, you would need at least 46 periods of data, and really, you would want more than 70 to gain sufficient statistical credibility on a historical level.
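The shortage of data also shows up mechanically: with T periods of returns, the demeaned sample covariance (and hence correlation) matrix of N series has rank at most T - 1, so with fewer periods than series the matrix is literally singular. A minimal sketch, with illustrative numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
n_assets, n_periods = 10, 8  # fewer observations than assets

returns = rng.normal(0.0, 0.01, size=(n_periods, n_assets))
cov = np.cov(returns, rowvar=False)  # 10 x 10 sample covariance matrix
rank = int(np.linalg.matrix_rank(cov))

# rank is at most n_periods - 1 = 7 < 10: the matrix is singular, and any
# mean-variance "optimum" built on it is an artifact of too little data
```

A singular or near-singular matrix is exactly the input that makes optimizers produce extreme, unstable allocations.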

Second, even if there are enough data to calculate correlation coefficients that are statistically credible, the financial processes that produce the correlation coefficients aren’t stable. Past correlation coefficients are poor predictors of future correlation.

Third, “past performance may not be indicative of future returns.” This is not only true of the level of returns, but also of the variation of returns. It should not surprise anyone, then, that ratios of historical average return to the variability of return aren’t good predictors of a manager’s future ability to obtain returns with low variability of results. In short, Sharpe ratios (or reward-to-variability ratios) are, in my opinion, poor predictors of the ability of a manager or asset class to produce return and mitigate risk. Efficient frontier analyses draw pretty pictures, but they usually do not produce asset allocations that optimize the future risk/return tradeoff when the parameters are estimated from historical data.
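The non-persistence of Sharpe ratios is easy to demonstrate. In the sketch below (parameters are illustrative), 200 managers have identical true skill; their realized Sharpe ratios still spread out widely by luck, and the first-half ranking carries essentially no information about the second half:

```python
import numpy as np

rng = np.random.default_rng(42)
n_managers, n_months = 200, 120

# Every manager has the *same* true mean and volatility: identical skill
rets = rng.normal(0.005, 0.04, size=(n_managers, n_months))

def sharpe(r):
    """Monthly reward-to-variability ratio (risk-free rate ignored)."""
    return r.mean(axis=1) / r.std(axis=1, ddof=1)

first_half, second_half = rets[:, :60], rets[:, 60:]
corr = float(np.corrcoef(sharpe(first_half), sharpe(second_half))[0, 1])

# With identical true skill, the correlation between past and future
# realized Sharpe ratios hovers near zero: past rank predicts nothing
```

When realized Sharpe ratios are this noisy, selecting managers (or asset classes) on historical reward-to-variability is closer to selecting on luck than on skill.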

Another data-mining villain is returns-based style analysis, which assumes that a manager’s true style can be discerned from the correlations of his returns with a variety of different asset class indices. Leaving aside the problems of multicollinearity and the inability to develop confidence intervals on the constrained regression, short historical data series might give a clear view of the past, but they are poor at predicting how a manager will perform in the future. In short, past correlations are poor predictors of future returns.

With academic financial research, it is good to remember that only the survivors get published, and surviving requires statistical or economic significance, either of which can occur for reasons of structure or chance. Data-mining allows marginal academics an opportunity to publish.

In the second part of this column, I will review some practical ways to assess quantitative methods and sidestep data-mining.