Information and money go hand in hand. Share markets are unpredictable, but they are not inherently random: they are driven by information. Unfavorable quarterly reports or profitable business ventures are determined by forces operating in the real world. With the right information at hand, such events would, in principle, be predictable.


The world share market is a dynamic, ever-changing beast. Overall, it is considered to be “efficient”, meaning that share prices already reflect all of the information available about them. Well, that’s the theory, but it doesn’t stop the financial sector from spending billions deciding what to buy and what to sell.


Data as a service

Data from the global share market is available from subscription services, but is also available for free. Both Google and Yahoo track and archive global financial data.


Download trading data from Yahoo

This R function downloads financial data from Yahoo and converts it to a time-series.
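As a rough, offline sketch of the conversion step only (not the original function, and no download is performed), the snippet below assumes the prices have already been fetched as CSV text with the columns `Date,Open,High,Low,Close,Adj Close,Volume`; treat that layout as an assumption. It keeps the adjusted close and sorts the rows by date. The name `csv_to_series` and the sample rows are hypothetical.

```r
# Hypothetical helper: turn Yahoo-style CSV text into a date-ordered series.
# ASSUMPTION: columns are Date,Open,High,Low,Close,Adj Close,Volume.
csv_to_series <- function(csv_text) {
  df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
  # read.csv() converts the header "Adj Close" into the name "Adj.Close"
  df$Date <- as.Date(df$Date)
  # Rows may arrive newest-first, so put them in chronological order
  df <- df[order(df$Date), c("Date", "Adj.Close")]
  rownames(df) <- NULL
  df
}

# Made-up sample rows, for illustration only
sample_csv <- "Date,Open,High,Low,Close,Adj Close,Volume
2001-01-04,10.5,11.0,10.2,10.8,10.8,1000
2001-01-03,10.0,10.6,9.9,10.5,10.5,1200
2001-01-02,10.1,10.3,9.8,10.0,10.0,900"

series <- csv_to_series(sample_csv)
print(series)
```

Sorting explicitly costs nothing and means later steps can rely on the first row being the oldest price.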


Now let’s download some tech stock data…



Now we have the adjusted close values for all the stocks in stocks.matrix. It looks like this:


As we can see, only IBM was listed in 1980.
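The gaps for the other stocks fall out naturally when per-stock series are joined on their dates. Here is a minimal sketch with made-up tickers (`OldCo`, `NewCo`) and prices: with `all = TRUE`, base R's `merge()` keeps every date from both tables, so a stock that was not trading yet gets `NA` on the earlier dates.

```r
# Hypothetical per-stock series; OldCo trades from 1980, NewCo lists in 1990
old_co <- data.frame(Date  = as.Date(c("1980-01-02", "1980-01-03", "1990-01-02")),
                     OldCo = c(10, 10.5, 11))
new_co <- data.frame(Date  = as.Date("1990-01-02"),
                     NewCo = 5)

# Outer join on Date: every date is kept, missing prices become NA
wide <- merge(old_co, new_co, by = "Date", all = TRUE)
print(wide)
```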

Before we can plot our stock data, we need to convert it from a “wide” table to a “long” table:
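One way to do this conversion in base R, shown as a hypothetical sketch rather than the post's own code, is to stack the price columns into a single column and repeat the dates alongside them. Each row of the long table then holds one (date, stock, price) observation, which lets ggplot2 map `Stock` to a colour. The data below is made up.

```r
# Made-up wide table: one column per stock, NA before a stock was listed
wide <- data.frame(Date  = as.Date(c("1980-01-02", "1990-01-02")),
                   OldCo = c(10, 11),
                   NewCo = c(NA, 5))

stocks <- c("OldCo", "NewCo")
# stack() puts all OldCo prices first, then all NewCo prices,
# so the dates are repeated once per stock in the same order
long <- data.frame(Date = rep(wide$Date, times = length(stocks)),
                   stack(wide[stocks]))
names(long) <- c("Date", "Price", "Stock")

# Drop the rows for dates on which a stock did not exist yet
long <- long[!is.na(long$Price), ]
rownames(long) <- NULL
print(long)
```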



Now we can use the ggplot2 package to make a nice graph…



And the result:


Tech stocks since 1980 – downloaded from Yahoo


Download trading data from Quandl

It often makes sense to use a data aggregator such as Quandl. Quandl provides access to Google and Yahoo financial archives, as well as data from many other providers. And it’s got a handy R package to handle the API for you.





Let’s get animated!

The animated graph below shows the returns we could have earned by investing $1 in the share market in 2001. It was generated by downloading global stock market data from Yahoo and cleaning it up a bit with the spike smoother function below (the data contained a few glitches). The GIF was generated with the R animation package.
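The value of a $1 investment on any date is simply that day's price divided by the price on the day the money went in. A minimal sketch, assuming a vector of adjusted closes (which fold splits and dividends into the price) and using the first non-missing price as the starting point, so stocks that listed later still start at $1. The function name `growth_of_one_dollar` and the prices are hypothetical.

```r
# Value of $1 invested at the first available price
growth_of_one_dollar <- function(prices) {
  first <- prices[which(!is.na(prices))[1]]  # first price we could buy at
  prices / first
}

prices <- c(20, 25, 15, 30)  # made-up adjusted closes
value <- growth_of_one_dollar(prices)
print(value)  # 1.00 1.25 0.75 1.50
```

For a wide table of several stocks, the same function can be applied column by column, e.g. with `sapply()`.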


animated graph of investment returns since 2001

What could you have earned by now from a $1 investment in 2001?

A few notes on data quality

Data quality is an important issue with both paid and free data providers. Errors can be introduced if events such as share splits or buybacks are not accounted for. Paid services may pay more attention to eliminating errors; however, the issue can never be overcome completely.
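To see why an unhandled split matters, consider a hypothetical 2-for-1 split: the quoted price roughly halves overnight even though shareholders lose nothing, so an unadjusted series shows a bogus drop of about 50%. A minimal back-adjustment sketch divides every pre-split price by the split ratio; the prices, split day and ratio below are made up.

```r
raw <- c(100, 102, 104, 53, 54)  # made-up closes; split takes effect on day 4
split_day   <- 4
split_ratio <- 2                 # 2-for-1: each old share becomes two

# Naive day-4 "return" from the unadjusted series: roughly -49%
naive_return <- raw[split_day] / raw[split_day - 1] - 1

# Back-adjust: express all pre-split prices in post-split shares
adjusted <- raw
pre <- seq_len(split_day - 1)
adjusted[pre] <- raw[pre] / split_ratio
print(adjusted)  # 50 51 52 53 54
```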


Scrubbing spikes from Yahoo

While the quality of data from Yahoo is generally good, it seems to struggle with a small subset of historical prices. Price fluctuations are normal, but a price that jumps by orders of magnitude for a single day and then snaps back should raise suspicion.

Several techniques can be applied to deal with this situation. For very choppy data, you might choose to apply a moving average. For one-off spikes, it usually makes sense to remove the spike and keep everything else. Here’s an example of a simple spike scrubber for R.
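Below is a minimal sketch of both ideas, not the author's original scrubber. The hypothetical `scrub_spikes` treats a point as a one-off spike when it is more than `threshold` times larger than both neighbours, or more than `threshold` times smaller than both, and replaces it with the mean of those neighbours; the default threshold of 5 is an arbitrary assumption. `moving_average` is a centred moving average built on `stats::filter()`.

```r
# Replace single-point spikes with the mean of their two neighbours.
# ASSUMPTION: prices are positive; threshold = 5 is an arbitrary default.
scrub_spikes <- function(x, threshold = 5) {
  n <- length(x)
  if (n < 3) return(x)
  out <- x
  for (i in 2:(n - 1)) {
    prev <- x[i - 1]
    nxt  <- x[i + 1]
    if (anyNA(c(prev, x[i], nxt))) next
    up   <- x[i] / prev > threshold && x[i] / nxt > threshold
    down <- prev / x[i] > threshold && nxt / x[i] > threshold
    if (up || down) out[i] <- (prev + nxt) / 2
  }
  out
}

# Centred moving average over k points (use an odd k); ends are NA
moving_average <- function(x, k = 5) {
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 2))
}

raw_prices   <- c(10, 10.2, 1000, 10.4, 10.3, 0.01, 10.5)  # made-up glitches
clean_prices <- scrub_spikes(raw_prices)
print(clean_prices)
print(moving_average(clean_prices, k = 3))
```

Because each point is compared with the original neighbours, two consecutive bad prices will not be caught; a glitch lasting several days needs a wider window or a different test.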