The Problem of Publishing in the Social Sciences

One of the troubles with the way academic research in the social (and biological) sciences is set up is that there is a bias toward publishing research that is statistically significant. Here are some of the problems:

  1. If honestly done, there is value in publishing research that says there doesn’t seem to be any relationship between the variable being studied and the cofactors. If nothing else, it would tell future researchers that that avenue has already been checked. Try another idea.
  2. It encourages quiet specification searches, where the researcher tries out a number of different variables or functional forms until he gets one with significant t-coefficients. Try enough models, and one will eventually hit the 95% significance threshold.
  3. What is statistically significant is sometimes not practically significant. The result might be statistically significantly different from the null hypothesis, but be so small that it lacks real-world importance. For example, learning that a compound increases cancer risk by one billionth should not be significant enough to merit attention.
  4. Researchers are people just like you and me, and all of the foibles of behavioral finance apply to them. They want tenure, promotions, respect from colleagues and students, and they don’t want to be let go. They have biases in the selection of research and the framing of hypotheses. For example, we can’t assume that stock price movements have infinite variance, because then Black-Scholes and many other option pricing formulas don’t work. The Normal distribution and its close cousins become a crutch that allows papers to get published.
  5. Once an idea becomes a researcher’s “baby,” he tends to nurture it until a lot of contrary evidence comes in. (I’ve seen it.)
  6. Famous researchers tend to get more slack than those who are not well-known. I would trot out as my example returns-based style analysis, which was proposed by William Sharpe. When I ran into it, one of the first things I noticed was that there were no error bounds on the calculations, and that the cofactors were all highly correlated with each other. The paper didn’t get much traction in the academic world, but was an instant hit in the manager selection consultant community. A FAJ paper in 1998 (I think) came up with approximate error bounds, and proved it useless, but it is still used by some consultants today. (I have many stories on that one; it is the only time that I wrote a pseudo-academic paper in my career, to keep some overly slick consultants from bamboozling my bosses.)
  7. Data sets are usually smaller than one would like, and the collection of raw data is expensive. Sample sizes can get so small that relying on the results of subsamples for various cofactors can be unreliable. This is a particular problem in the media when they publish the summary results on drug trials, but don’t catch how small the samples were. People get excited over results that may very well get overturned in the next study.
  8. Often companies fund research, and they have an interest in the results. That can bias things two ways: a) A drug company wants its proposed drug approved by the FDA. A researcher finding borderline results could be incented to look a little harder in order to get the result his patron is looking for. b) A finance professor could stumble across a new profitable anomaly to trade on. That paper ends up not getting published, and he goes to work for a major hedge fund.
  9. The same can be true of government-funded research. Subtle pressure can be brought on researchers to adjust their views. Politically motivated economists can ignore a lot of relevant data while serving their masters, and this is true on the right and the left.
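Point 2 can be made concrete with a quick simulation. Under the null hypothesis a p-value is uniform on [0, 1], so a researcher who quietly tries 20 unrelated specifications and keeps any that clears p < 0.05 will find a “significant” result about 64% of the time. This is a sketch of the arithmetic, not a claim about any particular study:

```python
import numpy as np

rng = np.random.default_rng(42)

# Under the null hypothesis, a p-value is uniform on [0, 1].  Simulate a
# researcher who tries 20 unrelated model specifications and keeps any
# that clears p < 0.05: at least one "hit" occurs about 64% of the time.
trials = 100_000
p_values = rng.uniform(0.0, 1.0, size=(trials, 20))
hit_rate = (p_values < 0.05).any(axis=1).mean()
print(f"simulated hit rate: {hit_rate:.3f}  (theory: {1 - 0.95**20:.3f})")
```

The theoretical figure is just 1 − 0.95²⁰ ≈ 0.64, which is why a lone “significant” coefficient from an undisclosed specification search tells you almost nothing.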


The reason that I write this is not to denigrate academic research; I use it in my investing, but I try to be careful about what I accept.

Now, recently, I took a little heat for making a comment that I thought that the unadjusted CPI or median CPI was a better predictor of the unadjusted CPI than the “core” CPI. So, I went over to the database at FRED (St. Louis Fed) and downloaded the three series. I regressed unadjusted CPI inflation over the following six months on six-month-lagged unadjusted, median, and core CPI data. I made sure that the data periods were non-overlapping, and long enough that data corrections would induce little bias. I constrained the weights on my three independent variables to sum to one, since I am trying to figure out which one gets the most weight. My data set had 80 non-overlapping six-month observations stretching back to 1967. Well, here are the results:

  • Intercept: -0.0002 (good, it should be close to zero)
  • Unadjusted CPI: 0.1720 (prob-value 12.3%)
  • “Core” CPI: -0.1665 (prob-value 11.2%)
  • Median CPI: 0.9945 (no prob-value because of the constraint imposed)
  • Prob-value on the F-test: 24.3% (ouch)
  • Adjusted R-squared: 1.10% (double ouch)

What does this tell me? Not much. The regression as a whole is not significant at a 95% level. Does the median CPI (from the Cleveland Fed) better predict the unadjusted CPI than the “core” or unadjusted CPI? Maybe, but with these results, who can tell? It is fair to say that core CPI does not possess any special ability to forecast unadjusted CPI over a six-month horizon.
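For anyone who wants to replicate this kind of exercise, the sum-to-one constraint can be imposed by substitution: since the three weights must add to one, subtract one regressor from both sides and solve an ordinary least-squares problem in the remaining two. Here is a minimal sketch with synthetic stand-in data (the series names and the data-generating numbers are my illustration, not the actual FRED series):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the three six-month lagged CPI series; the
# actual exercise used unadjusted, "core", and median CPI from FRED.
n = 80
unadj = rng.normal(0.02, 0.01, n)
core = unadj + rng.normal(0.0, 0.004, n)
median = unadj + rng.normal(0.0, 0.003, n)
future = 0.2 * unadj + 0.8 * median + rng.normal(0.0, 0.005, n)

# Impose w_unadj + w_core + w_median = 1 by substitution: regress
# (future - median) on (unadj - median) and (core - median); the
# median weight is whatever is left over.
X = np.column_stack([np.ones(n), unadj - median, core - median])
y = future - median
(intercept, w_unadj, w_core), *_ = np.linalg.lstsq(X, y, rcond=None)
w_median = 1.0 - w_unadj - w_core

print(f"intercept={intercept:.4f}  weights: unadjusted={w_unadj:.3f}, "
      f"core={w_core:.3f}, median={w_median:.3f}")
```

Note that the constrained variable (here the median) gets no standard error from the regression output, which is why no prob-value was reported for it above.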

From basic statistics, we already know that the median is a more robust estimator of central tendency than the mean when the underlying distribution is not known. We also know that arbitrarily tossing out components (as “core” does) because they are more volatile (and higher) than the others will not necessarily estimate central tendency better. Instead, it may bias the estimate.
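As a toy illustration of that robustness (with made-up component inflation rates, not real CPI data), one extreme component drags the mean well away from the bulk of the data, while the median barely moves:

```python
import numpy as np

# Made-up component inflation rates with one outlier (say, energy).
components = np.array([2.0, 2.1, 2.2, 2.3, 2.4, 9.5])

print(np.mean(components))    # dragged toward the outlier: ~3.42
print(np.median(components))  # stays with the bulk of the data: 2.25
```

The median sidesteps the outlier without having to decide in advance which components to throw away.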

So, be wary of the received opinion of economists who are in the public view. Our ability to use past inflation measures to predict future inflation measures is poor at best, and “core” measures don’t help in the explanation.