The Problem of Publishing in the Social Sciences

Posted on 27 February 2008 by David Merkel

One of the troubles with the way that academic research in the social (and biological) sciences is set up is that there is a bias toward publishing research that is statistically significant. Here are some of the problems:

If honestly done, there is value in publishing research that says there doesn’t seem to be any relationship between the variable being studied and the cofactors. If nothing else, it would tell future researchers that that avenue has been checked already. Try another idea.
It encourages quiet specification searches, where the researcher tries out a number of different variables or functional forms until he gets one with significant t-statistics. Try enough models, and one will eventually clear the 95% significance threshold by chance alone.
What is statistically significant is sometimes not really significant. The result might be statistically significantly different from the null hypothesis, but be so small that it lacks real significance. E.g., learning that a compound increases cancer risk by one billionth should not be significant enough to merit attention.
Researchers are people just like you and me, and all of the foibles of behavioral finance apply to them. They want tenure, promotions, and respect from colleagues and students, and they don’t want to be let go. They have biases in the selection of research and the framing of hypotheses. For example, we can’t assume that stock price movements have infinite variance, because then Black-Scholes and many other option formulas don’t work. The Normal distribution and its close cousins become a crutch that allows for papers to get published.
Once an idea becomes a researcher’s “baby”, they tend to nurture it until a lot of contrary evidence comes in. (I’ve seen it.)
Famous researchers tend to get more slack than those who are not well-known. I would trot out as my example here returns-based style analysis, which was proposed by William Sharpe. When I ran into it, one of the first things I noticed was that there were no error bounds on the calculations, and that the cofactors were all highly correlated with each other. The paper didn’t get much traction in the academic world, but was an instant hit in the manager selection consultant community. A FAJ paper in 1998 (I think) came up with approximate error bounds, and proved it useless, but it is still used by some consultants today. (I have many stories on that one; it is the only time in my career that I wrote a pseudo-academic paper, to keep some overly slick consultants from bamboozling my bosses.)
Data sets are usually smaller than one would like, and the collection of raw data is expensive. Sample sizes can get so small that the results of subsamples for various cofactors are unreliable. This is a particular problem in the media when they publish the summary results on drug trials, but don’t catch how small the samples were. People get excited over results that may very well get overturned in the next study.
Often companies fund research, and they have an interest in the results. That can bias things two ways: a) A drug company wants their proposed drug approved by the FDA. A researcher finding borderline results could be incented to look a little harder in order to get the result his patron is looking for. b) A finance professor could stumble across a new profitable anomaly to trade on. That paper ends up not getting published, and he goes to work for a major hedge fund.
The same can be true of government-funded research. Subtle pressure can be brought on researchers to adjust their views. Politically motivated economists can ignore a lot of relevant data while serving their masters, and this is true on the right and the left.
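
To put a rough number on the specification-search problem in the list above, here is a minimal sketch. It assumes every model tried is an independent test of a true null hypothesis, so each p-value is uniform on [0, 1]; real specification searches reuse the same data, so the tests are correlated and the exact figures will differ. The function names are illustrative only.

```python
import random


def chance_of_false_positive(num_models: int, alpha: float = 0.05) -> float:
    """Probability that at least one of num_models independent null tests
    comes out 'significant' at level alpha."""
    return 1.0 - (1.0 - alpha) ** num_models


def simulate_search(num_models: int, alpha: float = 0.05,
                    trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo check of the formula above.

    Each trial is one researcher trying num_models specifications on data
    with no real effect. Under a true null, each p-value is drawn as
    uniform on [0, 1] (an assumption of independence between models).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.random() < alpha for _ in range(num_models)):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for k in (1, 5, 14, 20):
        print(k, round(chance_of_false_positive(k), 3),
              round(simulate_search(k), 3))
```

Under these assumptions, a researcher who quietly tries fourteen specifications has better than even odds of finding at least one "significant" result where nothing is there.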

The reason that I write this is not to denigrate academic research; I use it in my investing, but I try to be careful about what I accept.

Now, recently, I took a little heat for commenting that I thought the unadjusted CPI or median CPI was a better predictor of the unadjusted CPI than the “core” CPI. So, I went over to the database at FRED (St. Louis Fed), and downloaded the three series. I regressed unadjusted CPI data for the next six months on six-month lagged unadjusted, median, and core CPI data. I made sure that the data periods were non-overlapping, and long enough that data corrections would induce little bias. I constrained the weights on my three independent variables to sum to one, since I am trying to figure out which one gets the most weight. My data set had 80 non-overlapping six-month observations stretching back to 1967. Well, here are the results:

Intercept: -0.0002 (good, it should be close to zero)
Unadjusted CPI: 0.1720 (prob-value 12.3%)
“Core” CPI: -0.1665 (prob-value 11.2%)
Median CPI: 0.9945 (no prob-value because of the constraint imposed)
Prob-value on the F-test: 24.3% (ouch)
Adjusted R-squared: 1.10%. (double ouch)
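
The post does not say how the sum-to-one constraint was imposed. One standard way, sketched below as an assumption rather than the method actually used, is substitution: write the median weight as one minus the other two, then run an ordinary regression of (target minus median) on (unadjusted minus median) and (core minus median). That would also explain why the median weight has no prob-value of its own, and the reported weights do sum to one (0.1720 - 0.1665 + 0.9945 = 1.0000). The data here are synthetic with made-up weights of 0.2, 0.3, and 0.5; nothing below reproduces the FRED series or the numbers above.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients via the normal equations.
    Each row must already include a leading 1.0 for the intercept."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def constrained_weights(unadj, core, median, target):
    """Fit target = a + w_u*unadj + w_c*core + w_m*median
    subject to w_u + w_c + w_m = 1, by substituting w_m = 1 - w_u - w_c."""
    rows = [[1.0, u - m, c - m] for u, c, m in zip(unadj, core, median)]
    y = [t - m for t, m in zip(target, median)]
    a, w_u, w_c = ols(rows, y)
    return a, w_u, w_c, 1.0 - w_u - w_c


# Synthetic illustration only: 80 noise-free observations with known weights.
rng = random.Random(1)
n = 80
unadj = [rng.gauss(0.02, 0.01) for _ in range(n)]
core = [rng.gauss(0.02, 0.01) for _ in range(n)]
median = [rng.gauss(0.02, 0.01) for _ in range(n)]
target = [0.2 * u + 0.3 * c + 0.5 * m for u, c, m in zip(unadj, core, median)]

a, w_u, w_c, w_m = constrained_weights(unadj, core, median, target)
print(round(a, 6), round(w_u, 4), round(w_c, 4), round(w_m, 4))
```

Because the third weight is derived from the other two, only two slope coefficients are actually estimated, which is why a standard regression output would report test statistics for just those two.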

What does this tell me? Not much. The regression as a whole is not significant at a 95% level. Does the median CPI (from the Cleveland Fed) better predict the unadjusted CPI than the “core” or unadjusted CPI? Maybe, but with these results, who can tell? It is fair to say that core CPI does not possess any special ability to forecast unadjusted CPI over a six-month horizon.

From basic statistics, we already know that the median is a more robust estimator of central tendency than the mean, when the underlying distribution is not known. We also know that tossing out data (“core”) arbitrarily because they are more volatile (and higher) than the other components will not necessarily estimate central tendency better. Instead, it may bias the estimate.
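
A toy illustration of the robustness point, using hypothetical component price changes that are not drawn from any CPI release. The `drop_highest` helper is a crude stand-in for excluding volatile items; it is not how the official core CPI is computed.

```python
from statistics import mean, median

# Hypothetical annualized price changes (percent) for index components.
calm = [1.8, 2.0, 2.1, 2.3, 2.4]
with_spike = calm + [15.0]  # one component jumps sharply


def summarize(changes):
    """Return the mean and median of a list of price changes."""
    return {"mean": mean(changes), "median": median(changes)}


def drop_highest(changes, k=1):
    """Discard the k largest changes (toy version of excluding
    'volatile' components)."""
    return sorted(changes)[:len(changes) - k]


print(summarize(calm))
print(summarize(with_spike))
print(summarize(drop_highest(with_spike)))
```

The spike roughly doubles the mean while the median barely moves. Dropping the spiking component pulls the mean back down, but if that component's rise is persistent rather than noise, the trimmed figure understates the true central tendency, which is the bias the paragraph above warns about.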

So, be wary of the received opinion of economists that are in the public view. Our ability to use past inflation measures to predict future inflation measures is poor at best, and “core” measures don’t help in the explanation.

2 thoughts on “The Problem of Publishing in the Social Sciences”

Excellent piece. A few comments:

– A couple of years back, a social scientist published a paper that looked at the distribution of p-values in published studies. Not surprisingly, there was a significant “overage” of studies with p-values that were “barely” significant (i.e., at the 4.99% level) and very few that just missed the cutoff. Under the null, the distribution should be approximately uniform. Having done this research gig a while, it looks to my jaundiced eye like many of the researchers “trained” their data (you call it specification searching). It’s an old joke: if your first pass gives you results with p-values of (e.g.) 6% or 7%, you can usually find a tweak (additional regressors, a transformation, eliminating a few “bad” observations) that’ll get you over the hump.

– We studied a paper in grad school that looked at the distribution of the R-squared. It looked at the problem that arises because we have such large data sets (# obs) and so many potential regressors (think of all the data in Compustat – basically the universe of financial statement variables on disk). It turns out that an R-squared of 6-7% comes up quite often with a large enough data set even if regressors are chosen by chance. Given that so many studies get R-squared in that range, it makes you think.

– While the #3 effect is common, it just increases my admiration for folks like Gene Fama. He made his early career off market efficiency and the CAPM, and then 20 years later got into his size/book-market stuff that cast doubt on his earlier stuff.

By the way – love the blog. Your stuff is highly relevant to the classes I teach (securities analysis and a student-run investment portfolio class). I require it as reading for my investment students. It’s good to see someone playing out their faith in the marketplace and doing both well and good.

Unknown, good points. I cut my teeth on the 1969 stock splits article by Fama, Fisher, Jensen and Roll. Distinguished men all. Roll’s critique of the CAPM is still a classic. And, yes, Fama has demonstrated intellectual flexibility that many others have not. I like it when the data forces me to abandon old notions; then again, the markets are fluid, and good money management requires a balance between fixed principles and recognizing changed conditions.
