Out of Sample Data – How the Human Can Add Value to the Automated Trading Process
First, I need to describe over-fitting, more commonly known as curve-fitting. Curve-fitting is creating a model that fits your sample data too “perfectly” and will not generalize well to new, unseen data. In trading, this means your model fits the historical data too closely and will likely fail or struggle to adapt to new live data.
Here are two visuals I found to help illustrate this idea of curve-fitting.
How can we avoid curve fitting?
The simplest and best way to avoid (or reduce our risk of) curve-fitting is to use “Out of Sample” data. We simply designate a portion of our historical data (say the last 30% of the test period) to act as our unseen or “Out-of-Sample” data.
We then go about our normal process designing/testing/optimizing rules for trading or investing using only the first 70% of the test period or the “In-Sample” data.
After finding a satisfactory trading method we pull out our Out of Sample data and test on the last 30% of our test period.
It is often said that if the model performs similarly in both the in and out of sample data then we can have increased confidence the model generalizes well enough to new data.
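The 70/30 split described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular platform does it: the function name `split_last_fraction` and the toy `prices` list are my own assumptions, and the only requirement is that the data is already in chronological order.

```python
def split_last_fraction(prices, oos_fraction=0.30):
    """Split chronologically ordered data into (in_sample, out_of_sample).

    The last `oos_fraction` of the data is held back as Out-of-Sample.
    Hypothetical helper for illustration only.
    """
    if not 0.0 < oos_fraction < 1.0:
        raise ValueError("oos_fraction must be between 0 and 1")
    n_oos = int(round(len(prices) * oos_fraction))
    cut = len(prices) - n_oos
    return prices[:cut], prices[cut:]


# Toy data: 100 bars, oldest first (stand-in for real price history).
prices = list(range(100))
in_sample, out_of_sample = split_last_fraction(prices, 0.30)
print(len(in_sample), len(out_of_sample))
```

All design, testing, and optimization would then touch only `in_sample`; `out_of_sample` stays untouched until a candidate method is finished.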
No need to bring up selection bias or data mining here, but I will certainly cover it in another post/video series.
How can the human add value to the automated trading process?
The intelligent reader will question why we chose 30%, and why we used the last portion of the data (as opposed to, say, the first 27% or the last 15%).
The trader can add value and increase a trading model’s success by controlling the
- Date Ranges of entire test
- Out of sample location
- Percentage of out of sample data
I have always heard that good science is often mostly attributable to good experimental design. In the trader’s case, good science would be setting up a proper test by choosing an appropriate test period, out of sample location, and out of sample percent.
Trading Data Example
Let’s take a look at the S&P500 from 2004 to 2017. In the chart below I have designated the last 40% of the data to be our Out of Sample data.
This means we would create a trading model on the data from 2004 to roughly 2011 – the blue In Sample data. However, 2011 to present day (red Out of Sample) has been largely straight up.
If we build a long strategy that avoids most of 2008 via some rule or filter, it may do well on our Out of Sample data simply because the underlying market went straight up!
You can see the importance of intelligently selecting your test period and out of sample period’s location and size.
What if we used the first 40% of the data as our Out of Sample data? This provides a few benefits. First, it allows us to build our trading model on the most recent data or the last 60% of the data set – in our case 2009 to 2017.
Many traders will argue that they prefer to build their models on the most recent data, as it is most likely to resemble the live data they will soon experience. They still test Out of Sample, but on older data: in our case 2004 to 2008, or the first 40%.
Now how did I arrive at 40%? I simply looked at the chart and selected a percentage that would capture the financial crisis. My thought process is that if we train a model on 2009 to 2017, test it on 2004 to 2008, and it performs similarly in both periods, then we have likely uncovered some persistent edge that generalizes over two unique sets of data. The two unique sets are our In-Sample (2009 to 2017) and our Out-of-Sample (2004 to 2008).
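To make the "location" choice concrete, here is a rough sketch of a split that can take the Out-of-Sample block from either end of the data. The function `split_by_location` and its `location` argument are hypothetical names I made up for this example, and the ten-bar list is toy data rather than the S&P500 series discussed above.

```python
def split_by_location(bars, oos_fraction, location="end"):
    """Return (in_sample, out_of_sample) for chronologically ordered bars.

    location="end"   -> the most recent data is held out
    location="start" -> the oldest data is held out
    Hypothetical helper for illustration only.
    """
    if not 0.0 < oos_fraction < 1.0:
        raise ValueError("oos_fraction must be between 0 and 1")
    n_oos = int(round(len(bars) * oos_fraction))
    if location == "end":
        cut = len(bars) - n_oos
        return bars[:cut], bars[cut:]
    if location == "start":
        return bars[n_oos:], bars[:n_oos]
    raise ValueError("location must be 'start' or 'end'")


# Toy data: ten bars, oldest first.
bars = list(range(10))
in_sample, out_of_sample = split_by_location(bars, 0.40, location="start")
print(in_sample)      # the most recent 60% of the bars
print(out_of_sample)  # the oldest 40% of the bars
```

With `location="start"` the model is built on the newest bars and validated on the oldest ones, which is the arrangement described in the paragraph above.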
Selecting a proper location and percentage is mission critical. You want to design your test to be as difficult as possible to pass – try to break your system in the testing process. If you do not, then the market will surely break it once you start live trading!
Testing design and set up is undoubtedly where the human still adds value to the automated trading process. Build Alpha allows users to leverage computational power in system design, validation, and testing; however, the test set-up in BA is still an area where a smarter, more thoughtful trader can capture an edge over his competitors and add robustness to the output.
Bad Examples of OOS Data
Below I have some examples of terrible experiment design to help drive the point home. Both present Out of Sample tests that are fairly simple to “pass”. Passing OOS testing is not the goal. Creating robust strategies is. That requires OOS tests that are difficult to pass (not what is pictured below). Please watch the video above for an explanation.
The main takeaway is that the human can still add value to the automated trading process by proper test/experiment design. That is why BuildAlpha software allows the trader/money manager to adjust everything (or nothing) from
- Test Period
- Out of Sample percent
- Out of Sample location
- In-Sample minimum number of trades
- Out of Sample minimum number of trades
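As a rough sketch of how these settings might be grouped and used, the Python below collects them into one object and checks the minimum trade counts. To be clear, this is not BuildAlpha's API: the `ExperimentDesign` class, its field names, the `has_enough_trades` function, and all the numbers are assumptions invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ExperimentDesign:
    """Hypothetical container for the test settings listed above."""
    test_start: str               # e.g. "2004-01-01"
    test_end: str                 # e.g. "2017-12-31"
    oos_fraction: float           # Out of Sample percent, as a fraction
    oos_location: str             # "start" or "end"
    min_trades_in_sample: int
    min_trades_out_of_sample: int


def has_enough_trades(design, n_in_sample, n_out_of_sample):
    """True only if both periods meet their minimum trade counts."""
    return (n_in_sample >= design.min_trades_in_sample
            and n_out_of_sample >= design.min_trades_out_of_sample)


# Made-up thresholds and trade counts, purely for illustration.
design = ExperimentDesign("2004-01-01", "2017-12-31", 0.40, "start", 100, 40)
print(has_enough_trades(design, 250, 12))
```

The point of the minimum trade counts is that a strategy with only a handful of Out of Sample trades has not really been tested, no matter how good those few trades look.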
I hope this was helpful – catch you in the next one,
For a more in-depth discussion check out