Improving Strategies

A crazy cool way to use Build Alpha. I have to admit that I did not come up with this idea, but it was suggested to me by a potential Build Alpha user.

He was wondering if Build Alpha could help come up with some rules for when he should avoid trading his existing strategy, or even when to fade it. Heck, any improvement is a plus, right?

**Please note Build Alpha now accepts data in this format: mm/dd/yyyy, hh:mm, open, high, low, close, volume, OI. Please refer to the buildalpha.com/demo page for instructions on adding your own data.**

*I say we found one strategy, but we actually found tons that would be an improvement to his original strategy. He and I only spoke specifically about one, so that's why in the video I slip and say we found one strategy. I did not feel like making a new video to clarify this minor point.*

He had a day trading system and compiled profit and loss results for that system in the following (Build Alpha accepted) format: date, time, open, high, low, close, volume. (*Note: Build Alpha now accepts the time column as intraday capabilities are becoming fully operational.*)

Below is his sample file. We purposely left the open, high, and low columns as all 0's. The close column contains the end-of-day P&L from his original strategy.
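
To make the setup concrete, here is a minimal pandas sketch of how such a file could be built from a daily P&L series. The input file and column names here are hypothetical; the only real requirement is that the close column carries the day's P&L and the other price columns are zeroed.

import pandas as pd

# Hypothetical input: a CSV with a Date column (mm/dd/yyyy) and a PnL column (daily strategy P&L)
pnl = pd.read_csv("strategy_pnl.csv")

out = pd.DataFrame()
out['Date']   = pnl['Date']
out['Open']   = 0              # open, high, and low are intentionally all 0's
out['High']   = 0
out['Low']    = 0
out['Close']  = pnl['PnL']     # the close column holds the end-of-day P&L
out['Volume'] = 0

# Write it out in Build Alpha's date, open, high, low, close, volume layout (no headers)
out.to_csv("strategy_as_market1.csv", header=False, index=False)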

We then set Build Alpha to have a maximum one-bar holding period, to ONLY enter on the next bar's open, and to ONLY exit on the next bar's close. I will explain why this is in a minute.

We then chose the underlying symbol the original strategy was built on as Market 2. For example, his original strategy trades ES (S&P500 Emini futures), so we only select Build Alpha signals calculated on Market 2, which is set to ES.

So now if Build Alpha calculates a rule on ES, like close[0] <= square root(high[0] * low[0]), then we would "buy" the next bar's open of Market 1 (again his results – which are 0) and "sell" the next bar's close of his results, which is the original strategy's P&L for that day. This would essentially say that if this rule is true, then go ahead with a green light to trade the original strategy the next day. If the rule is not true, then don't trade the original strategy the next day. Ideally, we can find rules that increase risk-adjusted returns for the original strategy (which we did).

Now, what is even cooler: if we set Build Alpha to find short strategies, we would essentially be "fading" his original strategy, or finding rules for when to go opposite it.

Build Alpha found some good short/"fade" rules to use as well. Here is an example that did quite well fading his original strategy (even out of sample – the highlighted section).

After emailing him the results here is what he had to say in his email response:

“There are 2028 negative periods in my data with a gross loss of -1,217,880.26. That’s the theoretical maximum a short rule can achieve, if it were to find all losses. Your graph seems to show 380,000 short rule profits. That’s already 31% of all losses. If I don’t trade on these days, my net profit would go up by 380,000, a 46% increase.”

I thought this was a really unique way to use Build Alpha and I wanted to share. I think the same analysis can be done on strategies with longer holding periods too. I would just import daily marked to market results of the original strategy and Build Alpha would essentially find rules of when to hedge your strategy or fade it for a day or two. I think this is certainly a unique approach to add some alpha to performance.

Anyways, thanks for reading as always and keep a lookout for some MAJOR upgrades coming to Build Alpha very soon!

Thanks,

Dave


$70,000 in profits in just 3 months. A Build Alpha testimonial

This is hands down the best email I have received since launching the BuildAlpha software a little over six months ago. It is a thank you note sent from a Build Alpha user, Madhur, who licensed the software back in March 2017 and has grown his account about $70,000 in that span (or about +55.92%).

Below is a photo of the email he sent and his account statements verifying his amazing first three months of trading while using the software.

What is amazing is that Madhur is/was a discretionary or hand trader! That's right, he's found a way to combine what he was already doing with the systematic edges that the Build Alpha software can find to further increase his OVERALL edge in the markets.

I love this story for so many reasons. First, it shows that finding a trading edge is vital regardless of whether you choose to automate your execution or not. Second, Build Alpha is a trading tool and not necessarily a system trading tool (albeit geared toward system traders, no doubt). Third, Madhur found a unique way to incorporate the old with the new to make something better – a lesson for all traders (myself included).

I have received tons of Build Alpha success and thank you notes, but none as specific as this one. It is hard to market user success without proof; otherwise you all would have your doubts (rightfully so). But after receiving this I cannot help but feel proud and share it.

Congrats to Madhur, who can be found on Twitter here: @MadhuryAlba, and to the other successful Build Alpha users out there! Thanks for pushing my development of the software and continuing to support me with this pursuit.

Also, please read all the disclaimers. I am not guaranteeing that if you license the software and poof 3 months later you are up big. No one can promise anything in this game – I just wanted to share a story that put a smile on my face and makes all the development hours worth it!

Thanks for reading

Dave

Free Friday #19 – Long/Short Small Caps and June Update

This Free Friday, Free Friday #19, is a user submission! It is a long/short strategy for $IWM – the Russell 2000 ETF. Both the long and the short strategy only have two rules each and only hold for 1 day. Below I’ve posted the long strategy on the left and the short strategy on the right. Short edges have certainly been difficult to find over the past few years in the US equity indexes on a daily time frame, but one hopes they’ll pay for the effort when/if things turn south!

Both strategies were tested from 2002 to 2017 using 35% out of sample data. All performance is based on only a simple 100 shares per trade. *1 S&P500 futures contract is equivalent to about 500 $SPY shares for reference*

There is also $SPY (green plot) and $TLT (gold plot) plotted to see how the strategies would have performed on these markets as well; the strategy maintains profitability in both cases.

The long strategy rules are simple and all trades exit at the next day’s open.

  1. Day of month is greater than 5. (Today is June 30, 2017, so today's day number is 30.)
  2. High[3] <= Low[7]

The short strategy rules are simple as well and all trades exit at the next day's open. (A rough pandas sketch of both rule sets follows the lists below.)

  1. Close[3] > Low[6]
  2. Close[0] > 8 Period Simple Moving Average
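
For those following along in Python, here is a rough pandas sketch of both rule sets. It assumes a daily bar dataframe for $IWM read from a hypothetical IWM.csv with Date, Open, High, Low, Close columns; the signals only flag the entry conditions and ignore fills and costs.

import pandas as pd

df = pd.read_csv("IWM.csv", parse_dates=['Date'], index_col='Date')

# Long entry: day of month greater than 5 and High[3] <= Low[7]
long_signal = (df.index.day > 5) & (df['High'].shift(3) <= df['Low'].shift(7))

# Short entry: Close[3] > Low[6] and Close above its 8-period simple moving average
short_signal = (df['Close'].shift(3) > df['Low'].shift(6)) & \
               (df['Close'] > df['Close'].rolling(8).mean())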

Below is a photo of the long/short equity performance for this simple portfolio.

I also want to add an update to some of the Free Friday strategies. Things were pretty quiet for most of the futures strategies other than the equity index strategies this month.

Strategies #5, #6, and #16 were the only futures strategies that traded, so I wanted to show their June performance below.

Nasdaq #5: +$1,640.00
Russell Futures #6: +$680.00
S&P500 Futures #16: +$862.50

Again, all are just trading 1 contract for demonstration purposes and were posted publicly months ago. You can see the strategies on twitter here: @dburgh

Thanks as always and have a Happy Fourth of July,

Dave


Free Friday #18 – Building a Strategy with Open Interest

As always, happy Friday!

This week I was asked by a Build Alpha user if he could build strategies using a contract's open interest. Open interest is just the total number of outstanding contracts held by market participants at the end of each day. So it is intuitive that as contracts are opened or closed, the change might be telling of how traders are positioning.

This is a detailed and advanced post. Build Alpha is all point and click, but this is certainly a way more advanced blog post showing how a more sophisticated user can utilize the software.

I have to admit this is not something I have looked at previously, so I was quite intrigued. I pulled some open interest data from TradeStation and saved it.

I then went on to create columns I through M below. Columns I-L are momentum measures (N-period change) of Open Interest. For example, column J holds the Open Interest change over the past 5 days and column K holds the Open Interest change over the past 10 days. Column M holds the 3-bar sum of the 1-period momentum of Open Interest. The data can be seen below opened in Excel (I know, who uses Excel anymore?).

In order to use the above data in Build Alpha, we need to format two separate files. First, we create a date, open, high, low, close, volume file of the actual S&P500 futures data (columns A, C, D, E, F, G). I copied and pasted those columns to a new sheet, reformatted the date to YYYY/MM/DD, removed the headers, and saved it as a .csv file. Pictured below…

I then copied and pasted our dates and custom Open Interest data to a new Excel sheet. This time, instead of date, open, high, low, close, volume, we use the date, 1-period OI change, 5-period OI change, 10-period OI change, 20-period OI change, and 3-bar sum of OI as our six columns.
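
If you would rather skip Excel, here is a rough pandas sketch of the same preparation. The input file and column names (ES_with_OI.csv, OpenInterest, and so on) are assumptions for illustration; the point is simply mapping the open interest measures into the six Build Alpha columns.

import pandas as pd

# Hypothetical export containing Date, Open, High, Low, Close, Volume, OpenInterest
raw = pd.read_csv("ES_with_OI.csv", parse_dates=['Date'])

oi = raw['OpenInterest']
mom1  = oi.diff(1)                      # 1-period change in open interest
mom5  = oi.diff(5)                      # 5-period change
mom10 = oi.diff(10)                     # 10-period change
mom20 = oi.diff(20)                     # 20-period change
sum3  = oi.diff(1).rolling(3).sum()     # 3-bar sum of the 1-period change

# Market 1 file: the actual S&P500 futures data
raw[['Date','Open','High','Low','Close','Volume']].to_csv(
    "ES_market1.csv", header=False, index=False, date_format='%Y/%m/%d')

# Market 2 file: open interest measures mapped into the open/high/low/close/volume slots
market2 = pd.DataFrame({'Date': raw['Date'], 'Open': mom1, 'High': mom5,
                        'Low': mom10, 'Close': mom20, 'Volume': sum3})
market2[['Date','Open','High','Low','Close','Volume']].dropna().to_csv(
    "ES_OI_market2.csv", header=False, index=False, date_format='%Y/%m/%d')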

We can now pass this data into Build Alpha and build strategies using the Intermarket capabilities. However, in this case, our intermarket or Market 2 will be this custom open interest data and not some other asset.

The two videos below show exactly how I did this process in case you didn’t follow my Thursday night, two glasses of scotch deep, blog writing.

There is a different strategy displayed in the video above, but I promised some guys on twitter I’d share the strategy I posted Thursday. So below is the actual Free Friday #18. It holds for one day and trades when these conditions are true:

  1. Momentum(OpenInterest,20)[0] <= Momentum(OpenInterest,20)[5]
  2. Momentum(OpenInterest,20)[0] > Momentum(OpenInterest,5)[1]
  3. Momentum(OpenInterest,10)[0] > Momentum(OpenInterest,10)[1]

**S&P500 Futures strategy built on open interest data only and tested across Nasdaq, Dow Jones, and Russell Futures. Results just based on 1 contract**

So the last rule in Build Alpha would appear as Low2[0] > Low2[1], translated as "the low of Market 2 is greater than the low of Market 2 one bar ago." However, if you remember, we created a custom data series for Market 2, and in the low column we inserted the 10-period momentum of open interest!

Like I said, this is a confusing post, but it's a really neat idea of how creative you can be with this software. The possibilities of things we can test are immense.

Furthermore, when Build Alpha calculates RSI or Hurst, for example, using the close price of Market 2 (our intermarket selected) it will actually calculate RSI or Hurst on 20 bar momentum of the Open Interest (what we passed in for the close column)! You can also use the custom indicator/rule builder on these custom data columns.

You can also run strategies built on custom data like this through all the robustness and validation tests as well.

All in all, thanks for reading. I thought this was a cool idea taking system development to a whole new level.

Cheers,

Dave


Free Friday #17 – Would you trade this?

As always, happy Friday!

In this Free Friday post, I want to pose a poll question. After reading the post and viewing the graphs please respond to the poll below and I will publish the results in another post later next week.

The question is… would you trade this strategy?

First, let’s go over the strategy. The strategy was designed using GBPAUD spot data and only has three rules to determine entry. The simulation to create this strategy (and hundreds of other strategies) took less than 2 minutes.

  1. Vix[0] > Vix[1]  – Remember [1] means 1 bar ago
  2. High[4] <= Close[6]
  3. Low[6] <= High[8]

The strategy has two exit criteria: a 1.5 times 20-period ATR profit target and a 1.0 times 20-period ATR stop loss.
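
Here is a rough pandas sketch of the entry conditions and the ATR-based exit levels. It assumes df is a daily GBPAUD OHLC dataframe and vix is a VIX close series aligned to the same dates (both are assumptions), and it uses a plain rolling average of true range for the ATR, which may differ slightly from Build Alpha's calculation.

import pandas as pd

# Assumed inputs: df = daily GBPAUD bars with Open/High/Low/Close, vix = aligned VIX closes
prev_close = df['Close'].shift(1)
true_range = pd.concat([df['High'] - df['Low'],
                        (df['High'] - prev_close).abs(),
                        (df['Low'] - prev_close).abs()], axis=1).max(axis=1)
atr20 = true_range.rolling(20).mean()

# Entry rules: Vix[0] > Vix[1], High[4] <= Close[6], Low[6] <= High[8]
entry = (vix > vix.shift(1)) & \
        (df['High'].shift(4) <= df['Close'].shift(6)) & \
        (df['Low'].shift(6) <= df['High'].shift(8))

# Exit levels for a short entry (target below, stop above), assuming a fill at the next bar's open
entry_price   = df['Open'].shift(-1)
profit_target = entry_price - 1.5 * atr20
stop_loss     = entry_price + 1.0 * atr20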

Here are some simple performance measures:

  • January 1, 2003 to May 1, 2017 (Last 30% Out of Sample)
  • Profit $147,626.20
  • Drawdown $8,289.70
  • Win Rate 54.50%
  • Trades 198
  • Sharpe 1.78
  • T-Test 3.76

Here is the strategy’s equity curve on GBPAUD. You can see the short strategy continues to perform in the out of sample period (highlighted portion of the blue line).

I’ve also plotted how the strategy performed on three other markets. It remains profitable on Crude Oil futures, Canadian Dollar futures, and AUDUSD spot. Generally, we like to see profitability across markets and assets. However, how good is good enough to pass the test?

Next, I want to share the randomized Monte Carlo test. This test re-trades the strategy 1000 times but randomizes the exit for each entry signal. It is a test to see if we have curve-fit our exits and if our entry is strong enough to remain profitable with random exits. We can see the randomized Monte Carlo test maintains general profitability. Some fare better and some worse.

Next, I want to share the Noise Test. This test adds and subtracts random amounts of noise (a percentage of ATR) to user-selected amounts of data, creating 100 new price series with differing amounts of noise. The test then re-trades the strategy on these 100 new price series to see if profitability is maintained on price series with differing amounts of noise. You can see here that as we change the noise, performance degrades a bit and there are some signs of curve-fitting to the noise of the original price series.
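
For intuition, here is a toy sketch of the idea behind a noise test: perturb a random subset of bars by a random fraction of ATR and re-run the same rules on each perturbed series. The parameters and the run_backtest call are placeholders, not Build Alpha's internals.

import numpy as np

def add_noise(prices, atr, pct_of_bars=0.5, max_pct_of_atr=0.3):
    """Return a copy of `prices` with random noise (a fraction of ATR)
    added to a randomly chosen subset of bars."""
    mask = np.random.rand(len(prices)) < pct_of_bars                       # bars to perturb
    shocks = np.random.uniform(-max_pct_of_atr, max_pct_of_atr, len(prices)) * np.asarray(atr)
    return prices + np.where(mask, shocks, 0.0)

# Re-trade the strategy on, say, 100 noisy copies of the close series and compare results
# results = [run_backtest(add_noise(df['Close'], atr20)) for _ in range(100)]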

Next, I want to share the forward simulator or variance testing results. In this test, we simulate the strategy forward but assume the winning percentage will degrade by 5% (a user-defined % in the test settings). This is a useful test because things are never as rosy as our backtest results. Now we can get an idea of how things can play out in the future if the strategy were to win x% less than it did in our backtest. This is good for setting expectations of where we expect to be in the next N trades as well.

Risk of Ruin was set to $10,000 for this test. So interpreting these results… if the winning percentage in the future is 5% lower than our backtest, then 23% of our simulations will have a drawdown of $10,000 or more.
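
Here is a rough simulation sketch of that idea: draw trades at a win rate 5% below the backtest's and count how many simulated paths suffer a $10,000 drawdown. The wins/losses inputs are placeholders for the strategy's actual winning and losing trade amounts; this is only meant to illustrate the mechanics, not reproduce Build Alpha's test.

import numpy as np

def drawdown_risk(wins, losses, win_rate, n_trades, dd_limit, n_sims=1000):
    """Fraction of simulated trade sequences whose max drawdown reaches dd_limit."""
    ruined = 0
    for _ in range(n_sims):
        is_win = np.random.rand(n_trades) < win_rate
        pnl = np.where(is_win,
                       np.random.choice(wins, n_trades),
                       np.random.choice(losses, n_trades))
        equity = np.cumsum(pnl)
        drawdown = np.maximum.accumulate(equity) - equity
        if drawdown.max() >= dd_limit:
            ruined += 1
    return ruined / float(n_sims)

# Example: the backtest won 54.5% of trades; assume 5% worse going forward
# risk = drawdown_risk(wins, losses, win_rate=0.545 - 0.05, n_trades=198, dd_limit=10000)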

This is all the information I want to provide for this poll. There are plenty more tests and information we can gather (like E-Ratio), but I want to avoid paralysis by analysis. Build Alpha licenses come with access to a 20+ video library where I explain what I look for in all the tests and features offered by Build Alpha.

If you answered yes to this poll and have a BA license then you can now generate trade-able code for MetaTrader4 in addition to the original TradeStation, MultiCharts, and NinjaTrader.


I also hope all you ES (S&P500) traders that email me caught the dip last Friday like the first Free Friday strategy did (pictured below). I posted this strategy on Twitter in 2016 and it only holds for 1 day. I know a few of you have adjusted the logic and I’m hoping you caught the whole move to new highs!

Luck in Trading and Favorable Distributions

The role of luck in (algorithmic) trading is ever present. Trading is undoubtedly a field that experiences vast amounts of randomness compared to mathematical proofs or chess, for example.

That being said, a smart trader must be conscious of the possibility of outcomes and not just a single outcome. I spoke about this in my Chatwithtraders.com/103 interview, but I want to reiterate the point as I am often asked about it to this day.

The point I want to make is that it is very important to understand the distribution your trading strategy comes from and not just make decisions off the single backtest’s results.

In the interview I spoke about the graph below, which shows two different trading systems that have very similar backtests. The black line on the left represents system A's backtest and the black line on the right represents system B's backtest. For our purposes, let's assume the two individual backtest results are "similar" enough.

The colorful lines on the left are system A simulated out (using a variety of methods such as Monte Carlo, bootstrapping, etc.) and the colorful lines on the right are system B simulated out using the same method. These are the possible outcomes or paths that system A and system B can take when applied to new data (theoretically – read the disclaimers about trading).

These graphs are the “distributions of outcomes” so many successful traders speak about. This picture makes it quite obvious which system you would want to trade even though system A and system B have very comparable backtests (black lines).

*There are many ways to create these “test” distributions but I will not get into specifics as BuildAlpha does quite a few of them*
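
For intuition only, one common way to build such a distribution is to bootstrap-resample the strategy's historical trade P&L and stack the resulting equity curves. This is a generic illustration (the trade list is a placeholder), not Build Alpha's exact method.

import numpy as np

def bootstrap_equity_curves(trade_pnl, n_paths=500):
    """Resample historical trade P&L with replacement to build a distribution of equity curves."""
    trade_pnl = np.asarray(trade_pnl)
    paths = []
    for _ in range(n_paths):
        resampled = np.random.choice(trade_pnl, size=len(trade_pnl), replace=True)
        paths.append(np.cumsum(resampled))
    return np.array(paths)    # shape: (n_paths, n_trades)

# paths_a = bootstrap_equity_curves(system_a_trades)   # system_a_trades is a placeholder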

This second example below demonstrates this point in another way but incorporates the role luck can have on your trading. Let’s say the blue line is the single backtest from System A (blue distribution is all possibilities). The single green line is the single backtest from System B (green possibilities).

In this graph, you can see that System A (part of the blue possibilities) was lucky and performed way better than most of the possibilities and of course better than the single backtest for System B.

You can also see that System B (part of the green possibilities) was extremely unlucky and performed way worse than most of the green possibilities.

Moving forward… do you want to count on Mother Market to give system A the same extremely favorable luck? Or do you want to bet on system B's luck evening out?

I always assume I will be close to the average of the distribution moving forward which would put us at the peaks of both of these possibilities or distributions… if that is the assumption then the choice is clear.


Build Alpha licenses now come with an instructional video series or course that goes over all the features and how to use the statistical tests the software offers. It makes spotting systems and their related distributions even easier.

Thanks for reading,
Dave

3 Simple Ways To Reduce The Risk Of Curve-fitting

Curve-fitting is almost certain death for a trading or investment strategy.

So, what is curve-fitting?

Well, you know when you test a trading or investment hypothesis, fall in love with the historical results, and then the idea fails to generate similar (or even positive) returns once you decide to trade it live? That is curve-fitting.

Most of the pitfalls of system trading, and trading in general, can be avoided or mitigated by following these three simple techniques or rules of thumb.

  1. Use out of sample data! Out of sample data is simply withholding some of the data in your “test” period for further evaluation. For example, you have ten years of historical data and opt to put the last 30% in your back pocket. You develop a great trading strategy on the first seven years of the data set and then whip out your “out of sample” data (remaining 30% from your back pocket) and validate your findings. If the strategy fails to produce similar results in the out of sample data then you can be almost certain you have curve-fit to the first seven years of your data set. Below is a chart of a strategy built using Build Alpha that highlights the out of sample period. You like to see similar growth (and characteristics) in both the in-sample and out-of-sample data.

2. Make sure your strategy has enough occurrences or trades. This can be simply explained with a coin flip example. If you flip a coin 10 times and it lands on heads 7 times you cannot be certain you do or do not have a rigged coin. However, if you flip a coin 10,000 times and it lands on heads 7,000 times you can have very high confidence it is indeed a rigged coin. In trading, if a strategy has 30 trades then it is unlikely you would have high confidence that what you have found is legitimate. However, if a strategy holds up over 1,000 or 3,000 plus trades then you can have higher confidence you’ve discovered true edge.

Below is a photo of only 30 coin flips, and below that is a photo of six different trials of 100,000 coin flips. You can see that after a large number of occurrences things tend to converge toward the true expectation (also known as the Law of Large Numbers). A quick simulation sketch follows the list below.

3. Validate your strategy across other markets. If a strategy works on only one market it has a higher chance of being curve-fit to the data set than if a strategy performs profitably on a handful of markets. I am not saying that a strategy that only works on one market is curve-fit – as there are many nuances, different players, and idiosyncrasies that exist within each market. However, if a strategy performs across markets then you can certainly have higher confidence that it is not curve-fit.
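
Returning to point 2, a quick throwaway simulation makes the Law of Large Numbers point concrete: the heads ratio of a fair coin can wander far from 0.5 over 30 flips but settles very close to it over 100,000 flips.

import numpy as np

def heads_ratio(n_flips):
    """Simulate n fair coin flips and return the fraction that land heads."""
    flips = np.random.rand(n_flips) < 0.5
    return flips.mean()

print("30 flips:     ", heads_ratio(30))         # can easily stray far from 0.5
print("100,000 flips:", heads_ratio(100000))     # converges toward the true 0.5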

Thanks for reading,

Dave

Python Tips – Reading Text Files, Working with Dates, and the Command Line

In this post let’s talk about two Python tutorials I put together. The first one goes over how to read in a text file, format dates, and create new columns inside a data frame. A data frame is a structure that stores your data in a convenient “table” for easy access. There are a few parts, but I will break down the code below.

The first thing we will do is import the pandas library and call its built-in read_csv function. The read_csv function's first input is the name of the file you desire to read in and store in your pandas data frame. The delimiter option allows you to specify the character that separates the text fields within your file.

import pandas as pd
df = pd.read_csv("ES.txt",delimiter=',')

Just like that, we have read a text file into a pandas data frame that we can now work with. However, if we were to plot our data frame (closing prices) now the x-axis would simply be the number of bars as we did not specify an index column. In trading and time series analysis it is often nice to have dates as your x-axis.

In the next few lines of code, I import a built-in python library that can read string dates (“12/30/2007”) and convert them into Python “DateTime” objects. To simplify this… we convert dates into Python dates.

I actually accomplish this by setting the built-in pandas index column to a list of newly Python formatted dates. I essentially loop through each string date, convert it, and add it to our data frame’s index.

I then delete the original string Dates.

from dateutil import parser
df.index = [parser.parse(d) for d in df['Date']]
del df['Date']

Now we can plot our closing prices and our x-axis will be dates.

df['Close'].plot()

In the code below I create a new column called “Range”. Notice how Python understands I want to do the calculation on all of the highs and lows inside our dataframe without me specifying so!

df['Range'] = df['High'] - df['Low']

Finally, the line below plots our Close and Range in two separate plots. This is from a previous tutorial video.

df[['Close','Range']].plot(subplots=True)

The second part of this tutorial is to make our lives easier. Let’s say that we wanted to run that last program on a bunch of different stocks whenever we wanted. It would be quite annoying to open up the file or notebook and change the filename in our read_csv function every time.

Instead what we can do is create a filename variable and put the filename variable inside the read_csv function. Ideally, this filename variable could be dynamically set with user input.

This code is tricky and has a few moving parts. Below is the code and then I will explain what we did.

symbol = "ES"

import sys,getopt
myopts,args = getopt.getopt(sys.argv[1:],"s")

for o,a in myopts:

if o == '-s':symbol = str(a).upper()

filename = "%s.csv" % symbol
df = pd.read_csv(filename,delimiter=',')

First, we created a symbol variable that will accept our user input. Second, we imported some built-in libraries and called the getopt function to read user input. We also specified that our desired input would be preceded by the “s” option.

We then wrote a simple for loop to read through all the command line inputs (which in this example is only one, but this template will allow you to create multiple command line input options). We then said, “if the command line option is  ‘s’ then set symbol to whatever follows it”. We also morphed “whatever follows it” into an upper case, string variable.

We then set our filename variable and proceeded to read our text file into our data frame (df) as before.

This is complicated, but a major time saver. Please review the video, as the extra 3 minutes might save you hours of your life by utilizing tricks like this!

Remember for those of you who don’t want to learn programming you can use research tools like Build Alpha to save even more time.

Best,

David


Free Friday #16 – Market Regime Switching Models

Happy Friday!

For this Free Friday edition I want to talk about market regimes or market filters. I have a very simple intermarket filter or regime monitor to share.

The idea with market regimes or filters is to identify a condition or set of conditions that alters the market’s characteristics or risk profile. Ideally, you could find a bull and bear regime that would enable you to go long when in the bull regime and get into cash or go short when in the bear regime.

The simple regime filter I want to share was found using Build Alpha’s Intermarket signals. It only uses one rule and creates a clear bullish and bearish regime.

The rule says that if the Close of the Emini S&P500 divided by the Close of the US 10 Yr Note is less than or equal to the 10-day simple moving average of the Emini S&P500 divided by the 10-day simple moving average of the US 10 Yr Note, then we are in the bull regime.

Here it is in pseudo-code, assuming the Emini S&P500 is Market 1 and the US 10 Yr Note is Market 2.

Bull = Close1/Close2 <= SMA(Market1,10) / SMA(Market2,10)
Bear = Close1/Close2 >  SMA(Market1,10) / SMA(Market2,10)
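
Here is a rough pandas sketch of the same filter. The file names and the assumption that both series line up on the same dates are mine for illustration; the regime definition follows the rule above.

import pandas as pd

# Hypothetical daily files for the Emini S&P500 and US 10 Yr Note futures
es    = pd.read_csv("ES.csv", parse_dates=['Date'], index_col='Date')['Close']
notes = pd.read_csv("ZN.csv", parse_dates=['Date'], index_col='Date')['Close']

ratio     = es / notes
sma_ratio = es.rolling(10).mean() / notes.rolling(10).mean()

bull = ratio <= sma_ratio        # bull regime per the rule above
bear = ~bull

# Compare next-day ES point changes inside each regime
next_day_change = es.diff().shift(-1)
print("Bull:", next_day_change[bull].mean(), next_day_change[bull].std())
print("Bear:", next_day_change[bear].mean(), next_day_change[bear].std())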

Let’s verify with some numbers that we have a discernible difference in market activity before I start flashing some charts at you.

Here are the S&P500’s descriptive statistics when in the bull regime:

  • Average Daily Return: 1.20
  • Std Dev Daily Return: 17.49
  • Annualized Information Rate: 1.09

Here are the S&P500’s descriptive statistics when in the bear regime:

  • Average Daily Return: -0.34
  • Std Dev Daily Return: 12.11
  • Annualized Information Rate: -0.44

This would definitely qualify as something of interest. Let's take a look at the equity curve going long when ES, the Emini S&P500 futures, enters the bull regime.

It actually performed quite well with no other rules or adjustments only trading 1 contract since early 2002. It even looks to have started to go parabolic in the out of sample data (last 30% highlighted).

Build Alpha now offers another check for validity: the ability to test strategy rules across other markets. This is very important when determining how well a rule generalizes to new (and different) data. The user can select whatever markets to compare against, but in the example below I chose the other US equity index futures contracts. You can see Nasdaq futures in gold, Russell futures in green, and Dow Jones futures in red.

Now back to our Free Friday regime filter… Wouldn’t it be cool if the US 10 Yr Note performed well while Emini S&P500 was in the bear regime? That way instead of divesting from the S&P500 and going into cash we could invest in US 10 Yr Notes until our bull regime returned.

Well, guess what… the US 10 Yr Note Futures do perform better in the bear regime we’ve identified.

The best part is… Build Alpha now lets you test market regime switching strategies.

That is, invest in one market when the regime is good and invest in another market when the regime changes. This ability smoothed our overall equity curve and increased the profit by about 50%! Below is an equity curve going long Emini S&P500 in the bull regime and going long US 10 Yr Note Futures when the regime turns bearish.
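
Continuing the sketch from earlier in this post, the switching logic itself is only a couple of lines: hold ES point changes while the bull regime is on and 10 Yr Note point changes otherwise. Using the prior day's regime here is my own conservative assumption so the rule only acts on information already known.

# Hold ES in the bull regime and the 10 Yr Note in the bear regime
es_change    = es.diff()
notes_change = notes.diff()

in_bull_yesterday = bull.shift(1).fillna(False).astype(bool)
switched_pnl      = es_change.where(in_bull_yesterday, notes_change)
switched_equity   = switched_pnl.cumsum()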

Free Friday #15 – Downloading Custom Data for Build Alpha using Python

Happy Friday!

For this Free Friday edition, I am going to do something new. I am going to make this slightly educational and give away some code.

I get tons of questions every week, but they mainly fall into two categories. The first question is in regards to adding custom data to Build Alpha. You can add intraday data, weekly data, custom bar type data, sentiment data, or even simple single stock data. The second question is in regards to using or learning Python.

In this post, I will attempt to “kill two birds with one stone” and show a simple Python code to download stock data from the Yahoo Finance API.

In fact, we will use Python to pull and save data ready formatted for Build Alpha for all 30 Dow stocks in less than 30 seconds.

You can view the entire script later in this post or in the video below.

The first few lines are simple import statements pulling in public code that we can reuse, including the popular pandas library.

import pandas as pd
from datetime import datetime
from pandas_datareader import data

I then define a function that downloads the data using the built-in DataReader function of the pandas_datareader library. I also adjust the open, high, low and close prices by the split ratio at every bar. This ensures we have a consistent time series if a stock has undergone a split, for example. **Please note other checks could be recommended like verifying high > open and high > close and high > low, but I have left these up to Yahoo in this post**. I then end the function returning a pandas data frame that contains our downloaded data. This get_data function will be valuable later in the code.

def get_data(symbol, start_date, end_date):
    # Download daily bars from Yahoo Finance via pandas_datareader
    dat = data.DataReader(symbol, "yahoo", start_date, end_date)
    # Adjust every price column by the Adj Close / Close ratio to account for splits
    dat['Ratio'] = dat['Adj Close'] / dat['Close']
    dat['Open']  = dat['Open']  * dat['Ratio']
    dat['High']  = dat['High']  * dat['Ratio']
    dat['Low']   = dat['Low']   * dat['Ratio']
    dat['Close'] = dat['Close'] * dat['Ratio']
    return dat

I then go ahead and put all 30 Dow tickers in a Python list named DJIA. I also create our start and end dates for which we desire to download data.

DJIA = ["AAPL","AXP","BA","CAT","CSCO","CVX","KO","DD","XOM","GE",
        "GS","HD","IBM","INTC","JNJ","JPM","MCD","MMM","MRK","MSFT",
        "NKE","PFE","PG","TRV","UNH","UTX","V","VZ","WMT","DIS"]
start = datetime(2007,1,1)
end   = datetime.today()

Finally, and the guts of this code, I loop through all 30 of our tickers, calling the get_data function on each one. After downloading the first one, AAPL in our case, I open a file named AAPL.csv, loop through the downloaded price series retrieved from our get_data function, and write each bar to the file. I then close AAPL.csv before downloading the second symbol, AXP in our case. This process is repeated for each and every symbol. The result: 30 seconds to download 30 stocks' worth of data! Each symbol's data is saved in a file named Symbol.csv.

for ticker in DJIA:
    # Download the adjusted price series and write it out in
    # Build Alpha's date, open, high, low, close, volume layout
    DF = get_data(ticker, start, end)
    fh = open("%s.csv" % ticker, 'w+')
    for i, date in enumerate(DF.index):
        fh.write("%s,%.2f,%.2f,%.2f,%.2f,%d\n" % (date.strftime('%Y%m%d'),
                 DF['Open'][i], DF['High'][i], DF['Low'][i],
                 DF['Close'][i], DF['Volume'][i]))
    fh.close()

Now to the second part. Using this data in BuildAlpha is as simple as clicking on settings and searching for your desired file. I’ve attached a photo below that shows how the trader/money manager can now run tests on the newly downloaded AAPL data using the symbol “User Defined 1”. Pictures below for clarity.

I'm showing a strategy created for $AAPL stock, but it is only to prove this Python code and Build Alpha feature work. There is major selection bias in creating a strategy on a stock that has basically been in a major uptrend for 90%+ of its existence. That being said, in a later post I will show a new Build Alpha feature that allows you to test strategies across different symbols to make sure the strategy holds up on both correlated and uncorrelated securities. Either way, here is the AAPL strategy (a rough pandas sketch of the buy rules follows the lists below).

Buy Rules:

  1. Today's Low > Open of 3 Days Ago
  2. Today's 14 Period Stochastic > Yesterday's 14 Period Stochastic
  3. Today's Upper Keltner Channel > Yesterday's Upper Keltner Channel

Exit Rules:

  1. Two Day Maximum Hold
  2. 1.00 * 20 Period ATR Stop Loss
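
Here is a loose pandas sketch of the three buy rules, using common textbook definitions of the 14-period stochastic %K and a 20-period Keltner channel (Build Alpha's exact parameters and formulas may differ). It assumes DF holds daily AAPL bars, e.g. DF = get_data("AAPL", start, end) from the code above.

import pandas as pd

low14  = DF['Low'].rolling(14).min()
high14 = DF['High'].rolling(14).max()
stoch  = 100 * (DF['Close'] - low14) / (high14 - low14)       # 14-period stochastic %K

prev_close = DF['Close'].shift(1)
true_range = pd.concat([DF['High'] - DF['Low'],
                        (DF['High'] - prev_close).abs(),
                        (DF['Low'] - prev_close).abs()], axis=1).max(axis=1)
upper_keltner = DF['Close'].ewm(span=20).mean() + 2 * true_range.rolling(20).mean()

buy = (DF['Low'] > DF['Open'].shift(3)) & \
      (stoch > stoch.shift(1)) & \
      (upper_keltner > upper_keltner.shift(1))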

I like this strategy because it is convex. We limit the downside, but let the market give us as much as possible in 2 days. Below is the equity graph with the highlighted part being out of sample and based on 1 share as this is just for demonstration purposes!

Out of Sample Data – How the Human Can Add Value to the Automated Trading Process

First, I need to describe over-fitting, more commonly known as curve-fitting. Curve-fitting is creating a model that too "perfectly" fits your sample data and will not generalize well to new, unseen data. In trading, this can be thought of as your model fitting the historical data so closely that it will surely fail or struggle to adapt to new live data.

Here are two visuals I found to help illustrate this idea of curve-fitting.

So how can we avoid curve-fitting? The simplest and best way to avoid curve-fitting is to use “Out of Sample” data.

We simply designate a portion of our historical data (say the last 30% of the test period) to act as our unseen or “Out of Sample” data.

We then go about our normal process designing/testing rules for trading or investing using only the first 70% of the test period or the “In Sample” data.

After finding a satisfactory trading method we pull out our Out of Sample data and test on the last 30% of our test period.

It is often said that if the model performs similarly in both the in and out of sample data then we can have increased confidence the model generalizes well enough to new data.

No need to bring up selection bias or data mining here, but I will certainly cover it in another post/video series.

How can the human add value to the automated trading process? The intelligent reader will question why we chose 30%, and why the last portion of the data (as opposed to the first 30%).

The test period, out of sample location, and the percentage of out of sample data chosen are still very critical to the trading model's success.

I have always heard that good science is often mostly attributable to good experimental design. In the trader’s case, good science would be setting up a proper test by choosing an appropriate test period, out of sample location, and out of sample percent.

Let’s take a look at the S&P500 from 2004 to 2017. In the chart below I have designated the last 40% of the data to be our Out of Sample data.

This means we would create a trading model on the data from 2004 to roughly 2011 – the blue In Sample data. However,  2011 to present day (red Out of Sample) has been largely straight up.

If we build a long strategy that avoids most of 2008 via some rule or filter, it may certainly do well on our Out of Sample data simply because the underlying market went straight up!

You can see the importance of intelligently selecting your test period and out of sample period’s location and size.

What if we used the first 40% of the data as our Out of Sample data? This provides a few benefits. First, it allows us to build our trading model on the most recent data or the last 60% of the data set – in our case 2009 to 2017.

Many traders will argue that they prefer to build their models on the most recent data as it is most likely the most similar to the live data they will soon experience. They then obviously test Out of Sample but just use older data and in our case 2004 to 2008 or the first 40%.

Now how did I know 40%? I simply looked at the chart and selected a percentage that would capture the financial crisis. My thought process is that if we train a model from 2009 to 2017 and then test it on 2004 to 2008 and it performs similarly in both periods then we surely have uncovered some persistent edge that generalizes over two unique sets of data. The two unique sets being our In Sample (2009 to 2017) and our Out of Sample (2004 to 2008).
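
Mechanically, carving the data up this way is trivial; the thinking behind where to put the split is the hard part. A minimal sketch, assuming df is a daily bar dataframe sorted by date:

split = int(len(df) * 0.40)
out_of_sample = df.iloc[:split]     # oldest 40% held back for validation (2004 to 2008 here)
in_sample     = df.iloc[split:]     # most recent 60% used to build the model (2009 to 2017 here)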

Selecting a proper location and percentage is mission critical. You want to design your test to be as difficult as possible to pass – try to break your system in the testing process. If you do not, then the market will surely break it once you start live trading!

Testing design and set up is undoubtedly where the human still adds value to the automated trading process. Build Alpha allows users to leverage computational power in system design, validation, and testing; however, the test set-up in BA is still an area where a smarter, more thoughtful trader can capture an edge over his competitors and add robustness to the output.

Below I have some photos of some terrible experiment design to help drive the point home. Both present fairly simple out of sample tests to “pass”. Please watch the video above for an explanation.

The main takeaway is that the human can still add value to the automated trading process by proper test/experiment design. That is why BuildAlpha software allows the trader/money manager to adjust everything (or nothing) from out of sample percent, out of sample location, test periods, the minimum number of trades In Sample, and the minimum number of trades in the Out of Sample periods.

I hope this was helpful – catch you in the next one,

Dave


Visualizing Data with Python

In this post I will go over a few different ways to manipulate price data to create visuals to aid in the investing and trading research process. I have attached a ten minute YouTube video that has explanations, etc. However, this post also attempts to briefly walk you through the Python code.

First, we will use some Python code to download some free data from the Yahoo Finance API. The code below creates a function called “get_data” that downloads and adjusts price data for a specified symbol over a specified period of time. I then download and store $SPY and $VIX data into a pandas dataframe.

import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
from pandas_datareader import data
import seaborn as sns

print "Start Time: ", datetime.today().now()

def get_data(symbol, start_date, end_date):

dat = data.DataReader(symbol, "yahoo", start_date, end_date)
dat['Ratio'] = dat['Adj Close'] / dat['Close']
dat['Open'] = dat['Open'] * dat['Ratio']
dat['High'] = dat['High'] * dat['Ratio']
dat['Low'] = dat['Low'] * dat['Ratio']
dat['Close'] = dat['Close'] * dat['Ratio']
return dat

Ticker1 = get_data("SPY",datetime(2005,1,1),datetime.today())
Ticker2 = get_data("^VIX",datetime(2005,1,1),datetime.today())

df = pd.DataFrame(index=Ticker1.index)

df['spy'] = Ticker1['Close']
df['vix'] = Ticker2['Close']

This next piece of code shows two ways to accomplish the same thing – a graph of both SPY and VIX. Both will create the desired plots, but in later posts we will build on why it is important to know how to plot the same graph in two different ways.

1.
df[['spy','vix']].plot(subplots=True)

2.
fig, ax = plt.subplots(figsize=(12,6))
ax = plt.subplot(211)
plt.plot(df['spy'],label='spy')
plt.legend(loc=0)

ax = plt.subplot(212)
plt.plot(df['vix'],label='vix',color='g')
plt.legend(loc=0)

The first method is simple and straightforward. The second method creates a "figure" and "axis". We then use plt.subplot to specify how many rows, how many columns, and which chart we are working with. For example, ax = plt.subplot(212) means we want a display with 2 rows and 1 column, and we want to work with the 2nd graph. plt.subplot(743) would be 7 rows, 4 columns, working with the 3rd graph (of 28). You can also use commas to specify it like this: plt.subplot(7,4,3).

Anyways, here is the output.

The next task is to mark these graphs whenever some significant event happens. In this example, I show code that marks each time SPY falls 10 points or more below its 20 period simple moving average. I then plot SPY and mark each occurrence with a red diamond. I also added a line of code that prints a title, “Buying Opportunities?”, on our chart.

df['MovAvg'] = Ticker1['Close'].rolling(20).mean()
# Bar positions where SPY closed 10 or more points below its 20-period moving average
markers = [idx for idx,close in enumerate(df['spy']) if df['MovAvg'][idx] - close >= 10]
plt.suptitle("Buying Opportunities?")
plt.plot(df['spy'],marker='D',markerfacecolor='r',markevery=markers)

This code creates a python list named markers. In this list we loop through our SPY data and if our condition is true (price is 10 or more points below the moving average) we store the bar number in our markers list. In the plot line we specify the shape of our marker as a diamond using ‘D’, give it the color red using ‘r’, and mark each point in our markers list using the markevery option. The output of this piece of the code is below.

Next, and simply, I show some code on how to shade an area of the chart. This may be important if you are trying to specify different market regimes and want to visualize when one started or ended. In this example I use the financial crisis and arbitrarily defined it by the dates October 2007 to March 2009. The code below is extremely simple and we only introduce the axvspan function. It takes a start and stopping point of where shading should exist. The code and output are below.

fig, ax = plt.subplots()
ax.plot(df['spy'],label='spy')
ax.axvspan(datetime(2007,10,1), datetime(2009,3,9), alpha=0.5, color='red')

Personally, I do not like shading graphs; I prefer changing the line colors. There are a few ways to do this, but this is the simplest workaround for this post. I create two empty lists for our x and y values named marked_dates and marked_prices. These will contain the points we want to plot with an alternate color. I then loop through the SPY data and say, if the date is within our financial crisis window, then add the date to our x list and the price to our y list. I do this with the code below.

marked_dates = []
marked_prices = []

# Collect the dates and prices that fall inside the financial crisis window
for date, close in zip(df.index, df['spy']):
    if date >= datetime(2007,10,1) and date <= datetime(2009,3,9):
        marked_dates.append(date)
        marked_prices.append(close)

I then plot our original price series and then also plot our new x's and y's overlapping the original series. The new x's and y's are colored red, whereas our original price series is plotted in the default blue. The code and output are below.

fig,ax = plt.subplots()
ax.plot(df['spy'],label='spy')
ax.plot(marked_dates,marked_prices,color='r',label='financial crisis')
plt.legend(loc=0)

That’s it for this post, but I hope this info helps you in visualizing your data. Please let me know if you enjoy these Python tutorial type posts and I will keep doing them – I know there is a huge interest in Python due to its simplicity.

Also, I understand there may be simpler or more “pythonic” ways to accomplish some of these things. I am often writing this code with intentions of simplifying the code for mass understanding, unaware of the better ways, or attempting to build on these blocks in later posts.

Cheers,

Dave


It has been brought to my attention that Yahoo Finance has changed their API and this code will no longer work. However, we can simply change the get_data function to the code below to call from the Google Finance API

def get_data(symbol, start_date, end_date):
    dat = data.DataReader(symbol, "google", start_date, end_date)
    dat = dat.dropna()   # Google's data has some gaps, so drop NaN rows
    return dat

Google adjusts their data so we do not have to. So I removed those lines. I also swapped out ‘yahoo’ for ‘google’ in the DataReader function parameters. Google’s data is also not as clean so I added a line to drop NaN values. That’s it. Simple adjustment to change data sources.

Thanks for reading,
Dave