This post is the first in a series that will explore forecasting stock prices with machine learning (ML) methods (for a quick intro to ML, see my previous post). If you've spent any time with me, you'll know that I'm rarely kind about papers that attempt to forecast equity prices using machine learning. Here, however, we are simply using equity price time series as a good example of a non-stationary process.

Price prediction, or more generally building models that generate alpha, is not the best use of ML in the quantitative trading process. Far from it. At some point I'll dedicate a post to the architecture of a quantitative trading system. For now, let's just say that portfolio optimisation, mixing the outputs of prediction models, and algorithmic execution are tasks better suited to ML.

All the same, it is possible to forecast long-term price movements with ML. This series is based on work for a paper I published earlier this year, "Automated trading with performance weighted random forests and seasonality". In it I demonstrated the power of generating ML models online and proposed a novel and highly successful way to combine the predictions of multiple models.

Before we get to the nitty-gritty of combining model outputs, we need to cover some housekeeping essentials. First, we'll look at the input data and how to turn it into useful features for our models. Without data we're nothing, so this step is arguably the most important. Next, we'll look at how to measure the performance of our prediction systems. This includes several important and all-too-often forgotten metrics for judging the long-term success (or failure) of a model. Finally, we'll get to the fun stuff and begin training some ML models. We'll start simple and add layers of complexity, justifying each one along the way.

I'll leave it at that for now. Below is a list of the posts to come, and I'll hyperlink each item once it's written:

- Part 2 – Data and features
- Part 3 – Performance metrics
- Part 4 – Standard methods
- Part 5 – Ensemble methods
- Part 6 – Incorporating “online” performance weighting
- Part 7 – Summary