Where to Find Data

Here’s another letter from a reader:

Hi David,

I have been a longtime reader of your blog but am writing for the first time. To me, a key part of the investment process for a generalist investor has to be a way to efficiently screen stocks to generate investment ideas and also to measure historical returns and fundamentals for various industry groups under various economic conditions. I am curious as to what data sources you use in your own work for historical stock market and fundamental data. Do you pull this into your own database, and do you use Excel or a statistical package for any quantitative backtests for your screens?

In a previous job I used FactSet to pull historical monthly pricing and quarterly fundamental data for a universe of over 100 regulated utility stocks (both current and past public firms). I also taught myself a fair bit of statistics along the way, including logistic regressions and discriminant analysis, in order to backtest different models for identifying outperformers, dividend growth/cuts, etc. Unfortunately I had to do this all in Excel, which made the whole process pretty painful, right from cleaning up outliers, sorting, etc. I guess for simple queries of stock performance and tracking various fundamental metrics over time it would work to use Excel.

One motivation for asking is that I hope to one day become an investing blogger myself, and am wondering if there are low-cost ways of accessing this kind of data. Additionally, I am always interested in real-world methods people follow to prune the thousands of possible stocks down to a smaller, more promising subset that they can spend more time analyzing on a fundamental basis. To me the hallmark of a successful investor is the willingness to turn over many investing stones until a promising idea is found.

I am a tightwad when it comes to paying up for data or software.  I use the following:

  • FRED
  • Yahoo Finance
  • Value Line via my local library
  • AAII Stock Investor (A screening package, but more than that)
  • The Wall Street Journal
  • Bloomberg.com
  • FINRA TRACE — bond data
  • Bureau of Labor Statistics
  • Federal Reserve (but not FRED)
  • Microsoft Excel
  • If I need to do something complex, I can use the open source statistics package R.

AAII Stock Investor and Value Line are my main screeners.  I pay $100/year for AAII, and nothing for Value Line.  Oh, my library gives me Morningstar for free as well.  Both subscriptions are very full, and very useful as well.
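For readers who would rather script a screen than run one inside a package, here is a minimal sketch in Python of what a simple screen does. It is an illustration only: the tickers, the figures, and the three thresholds are all made up for the example, and they are not the criteria used in AAII Stock Investor, Value Line, or my own work.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stock:
    ticker: str
    pe: Optional[float]              # trailing P/E; None if earnings are missing or negative
    debt_to_equity: Optional[float]  # None if not reported
    dividend_yield: float            # as a fraction, e.g. 0.03 for 3%


def screen(universe: List[Stock],
           max_pe: float = 15.0,
           max_debt_to_equity: float = 1.0,
           min_yield: float = 0.02) -> List[Stock]:
    """Keep stocks that pass all three (hypothetical) tests, cheapest P/E first."""
    passed = [
        s for s in universe
        if s.pe is not None and 0 < s.pe <= max_pe
        and s.debt_to_equity is not None and s.debt_to_equity <= max_debt_to_equity
        and s.dividend_yield >= min_yield
    ]
    return sorted(passed, key=lambda s: s.pe)


# Made-up universe; in practice these rows would come from an exported data file.
UNIVERSE = [
    Stock("AAA", 12.0, 0.8, 0.030),
    Stock("BBB", 25.0, 0.4, 0.040),   # fails: P/E too high
    Stock("CCC", None, 0.3, 0.050),   # fails: no usable P/E
    Stock("DDD", 9.0, 0.5, 0.025),
    Stock("EEE", 10.0, 2.0, 0.030),   # fails: too much debt
]

for stock in screen(UNIVERSE):
    print(stock.ticker, stock.pe)
```

The point of a screen like this is only to shrink the list. The survivors still need to be read up on one at a time.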

Now all that said, though it is important to be able to access the data, developing the ability to interpret it is far more important.  There can be too much rigor in trying to analyze quantitative data.  You need to identify the three most significant variables that affect the result being analyzed and focus on analyzing them.  Most investment questions can be analyzed through the three most important variables.

Though I do backtests occasionally, I am happier to stick with theory, and base my actions off that.  Backtests are fraught with all sorts of bias, and basically say that the future will be like the past, only more so.
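One of those biases is easy to show concretely. Look-ahead bias creeps in when a backtest uses a figure on a date before it was actually published. The sketch below, in Python, contrasts a naive lookup keyed on the fiscal period end with one keyed on the report date. The record layout, the dates, and the earnings figures are invented for illustration.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Fundamental(NamedTuple):
    period_end: date   # last day of the fiscal quarter
    report_date: date  # day the figure became public
    eps: float         # earnings per share for the quarter


def latest_by_period_end(records: List[Fundamental], as_of: date) -> Optional[Fundamental]:
    """Naive lookup: uses any quarter that has ended, even if not yet reported."""
    eligible = [r for r in records if r.period_end <= as_of]
    return max(eligible, key=lambda r: r.period_end) if eligible else None


def latest_known(records: List[Fundamental], as_of: date) -> Optional[Fundamental]:
    """Point-in-time lookup: uses only figures already published on as_of."""
    eligible = [r for r in records if r.report_date <= as_of]
    return max(eligible, key=lambda r: r.period_end) if eligible else None


# Invented data: Q1 ended March 31 but was not reported until May 10.
RECORDS = [
    Fundamental(date(2012, 12, 31), date(2013, 2, 15), 1.10),
    Fundamental(date(2013, 3, 31), date(2013, 5, 10), 0.60),
]
AS_OF = date(2013, 4, 15)

print("naive:", latest_by_period_end(RECORDS, AS_OF).eps)
print("point-in-time:", latest_known(RECORDS, AS_OF).eps)
```

On the April 15 screen date the naive lookup already sees the weak first quarter, which no investor could have known at the time. A backtest built that way will look smarter than it could have been in practice.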

It would be great to have Bloomberg, FactSet, and some off-the-shelf statistics/programming package that integrates with them.  But life is tough, and we don’t always have that luxury, so we have to seek out data on the cheap, and analyze it cheaply also.

That’s how I do it now, but if I get more clients, I will start paying up for data and software.

1 Comment

  • cig says:

    http://www.quandl.com/ is a great aggregator of publicly available data.

    Also, when it comes to data, here are two things you surely know but that may be a useful reminder for your readers:

    – A lot of providers operate on a junk-in, junk-out basis, and while people love data, nobody likes to verify it, so plain wrong data is quite common (Yahoo is really hit and miss, for instance). People will happily build theories about what they find in the data when often it’s just a bug in their dataset (or processes).

    – When backtesting over long periods, you’ll find things that happened because at the time few people had the data that is now broadly accessible…

    Also, are you not possibly breaking some rules if you use the public Bloomberg site or a library’s Value Line subscription for professional fund management? (I would expect vendor terms to have exclusions against professional use.)