Category Archives: Artificial Intelligence

Predicting Stock Prices with Machine Learning – Part 1- Introduction

This post is the first in a series that will explore the forecasting of stock prices using machine learning (ML) methods (for a quick intro to ML see my previous post). If you’ve spent any reasonable amount of time with me then you’ll know that I tend not to talk too kindly of papers that attempt to forecast equity prices using machine learning. However, here we are simply using equity price time-series as a great example of a non-stationary process.

Price prediction or, more generally, models for generating alpha are not the best use of ML in the quantitative trading process. Far from it. At some point I’ll dedicate a post solely to describing the architecture of a quantitative trading system but for now let’s just say that portfolio optimisation, the mixture of prediction models and algorithmic execution are tasks better suited to ML.

All the same, it is possible to forecast long term price movement with ML. This series of posts will be based on work for a paper that I published earlier this year called Automated trading with performance weighted random forests and seasonality where I demonstrated the power of the online generation of ML models and suggested a novel and highly successful way to combine the predictions of multiple models.

Before we get to the nitty-gritty of combining model outputs, we first need to cover some housekeeping essentials . Initially, we’ll look at the input data and how we turn this into useful features for our model. Without data, we’re nothing so this step is arguably our most important. Next we’ll go on to look out how to measure the performance of our prediction systems including a number of important and all-too-often forgotten metrics for understanding the long term success (or not) of our model. Finally we’ll get to the fun stuff and begin to train some ML models. We’ll start simple and add layers of complexity with associated justifications along the way.

I leave it at that for now. Below is a list of the posts to come and I’ll hyperlink the items as I get them written:

  • Part 2 – Data and features
  • Part 3 – Performance Metrics
  • Part 4 – Standard Methods
  • Part 5 – Ensemble Methods
  • Part 6 – Incorporating “online” performance weighting
  • Part 7 – Summary

What is machine learning? A simple introduction

Even amongst practitioners, there is no truly well accepted definition for machine learning. So, I’ll provide two:

  • Pioneer machine learning researcher Arthur Samuel defined machine learning as: “the field of study that gives computers the ability to learn without being explicitly programmed”. This definition is beautiful in its simplicity though lacks a little formality. So, with a little more structure,..
  • Tom Mitchell states that “a computer program is said to learn from experience E, with respect to some task T, and some performance measure P, if its performance on T as measured by P improves with experience E”.

Let’s reinforce the definitions with an example. A classic practical application is the email spam filter. The email program watches which emails the user does or does not mark as spam and, based on that, learns how to better filter future spam automatically. In the parlance of Tim Mitchell’s definition, classifying the emails as spam or not span is the task, T, watching the user label emails as spam or not spam is the experience, E, and the fraction emails correctly classified could be the perform measure, P.

There are a great number of machine learning algorithms and, as such, they are often divided into three main types: supervised, unsupervised and reinforcement learning algorithms.

Supervised Learning – machine learning with labels

Before providing a definition, let’s start with an example. Imagine you want to predict the stopping distance for cars given the speed that the car is travelling. The graph below shows some data from the “cars” dataset in R.

distance vs speed

A supervised learning algorithm would allow us to use the data available to make a general rule for making  predictions about future distances for speeds that we have not yet witnessed. In our two-variable example, this is the well-know task of fitting a line to the data. Eyeballing the data, we could conceivably fit a linear model (red) or a polynomial model (blue) shown below.

distance vs speed LINdistance vs speed POL

Fitting these models is an example of a supervised learning algorithm. The term supervised learning refers to the fact the algorithm requires a dataset for training that contain the “right” answers. That is to say, in our example, for every datapoint on the cars speed, we also had the corresponding data for the actual stopping distance.

The cars example is also a case of a regression problem, where we are predicting a continuous valued output (the distance).

Another type of supervised learning task is classification. Again, let’s set the scene with an example.

class

The figure above shows data from the well-known iris database. It shows a scatter plot of the sepal length vs. petal length of a number of iris plants. The points are coloured by species. Here, the machine learning task is to predict the species given new petal and sepal measurements. Which species would you label the new data-point in black? What makes this a classification task is that the variable to be predicted (species) is discrete valued.

 Unsupervised Learning – machine learning without labels

With unsupervised learning, the data contain no labels and the machine learning algorithm is tasked with finding structure in the data.

One very common type of unsupervised learning is known as clustering and is used to for categorisation of google news. Each day, google algorithms crawl the web for news stories and use clustering algorithms to group similar stories together. Other pertinent examples of clustering include: organising computer clusters, social network analysis and market segmentation.

There are a number of other unsupervised learning algorithms and indeed a number of other types of machine learning than we have not touched upon in this post. If you’ve found this page interesting and have been inspired to leaner more, I recommend the following books:

Also, for some great advice on the practical application of machine learning methods as well as a detailed derivation of some common algorithms, I strongly recommend Andrew Ng’s Coursera course .

So you want to predict the stock market…

At least once a week, a Google Scholar Alert pops into my inbox featuring another bored academic that though they’d try using their “expertise” to “predict the stock market”. Said paper tends to follow this structure:

  • Starts with a paragraph about how humans are obsolete and that the market is being run by computers.
  • Next comes the ‘novel’ machine learning technique, which is really just a well known algo. (SVM, neural net., logistic regression) with an adaptive learning rate.
  • Some (usually 5) simple technical analysis indicators are used on unprocessed daily price data to create some features for the new super-algo.
  • Said algorithm is applied to predict whether price(t+1) > price(t) or vice versa.
  • Results report prediction accuracy of ~65% (And we know they tried 100 different stocks before settling on the 3 that they reported results for).
  • There are usually no out of sample results at all!

Some will stop here, reporting that they’ve cracked it and that their technique beats all others. If we’re lucky, however, they’ll go on to assume that they can trade at the close price of each day, buying when they predict upward movement and selling before a fall with a constant size strategy, reporting annual returns of 40%.

Sigh…

Putting aside the lack of out of sample results and non-existant estimation of transaction costs, they’re really missing the point. Attempting to predict the price return over the next 24 hours is not the best way to go about using machine learning to create investment opportunities.

Predicting the direction of price movement is certainly an essential part of automated trading models. But it’s not the only part. A simple automated trading model is better based around event detection: bull, bear or neutral markets, insider trading or institutional trading. Detecting such schemes can easily be approached with either binary classification of hypothesis testing.

In binary classification, a feature matrix, X of size (N*K), is used to predict a binary vector, Y of size N. In this case we would interpret each row as representing one day with K features that we hope relate in some way to the occurrence of the market event that we are trying to predict, Y. Once in this format, binary classification is a simple matter of applying one of the many machine learning algorithms to map each row, Xi to Yi. If our out of sample predictions are consistent, we have a means of quickly identifying regimes and adjusting our trading strategy accordingly. However, given that this kind of binary classification is a standard machine learning problem, it suffers all the pitfalls of overfitting.

For hypothesis testing, we make an assumption about the distribution of an observable variable before applying a test for low probability events. As an example, let’s say we’re into high-frequency volatility trading and we’re looking for unusual activity in a stock. We can describe the number of orders arriving per second with a Poisson distribution, X, and the individual order volume * price with a Gamma distribution, Y. Now we can fit X and Y through maximum likelihood estimation and calculate the probability, p, of the market acting “as normal” each second. Taking p<0.05 as a rare event, if we see more than 5 such events in a 100 second period we might be tempted to make a move. It’s important to bare in mind, however, that hypothesis testing assumes structure in data and thus requires a stationary distribution for decent results.

These kinds of techniques produce far more stable predictions than forecasting daily returns and provide us with information upon which it is much easier to trade.