Visualizing Data with Python
In this post I will go over a few different ways to manipulate price data to create visuals to aid in the investing and trading research process. I have attached a ten minute YouTube video that has explanations, etc. However, this post also attempts to briefly walk you through the Python code.
First, we will use some Python code to download some free data from the Yahoo Finance API. The code below creates a function called “get_data” that downloads and adjusts price data for a specified symbol over a specified period of time. I then download and store $SPY and $VIX data into a pandas dataframe.
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
from pandas_datareader import data
import seaborn as sns
print "Start Time: ", datetime.today().now()
def get_data(symbol, start_date, end_date):
dat = data.DataReader(symbol, "yahoo", start_date, end_date)
dat['Ratio'] = dat['Adj Close'] / dat['Close']
dat['Open'] = dat['Open'] * dat['Ratio']
dat['High'] = dat['High'] * dat['Ratio']
dat['Low'] = dat['Low'] * dat['Ratio']
dat['Close'] = dat['Close'] * dat['Ratio']
return dat
Ticker1 = get_data("SPY",datetime(2005,1,1),datetime.today())
Ticker2 = get_data("^VIX",datetime(2005,1,1),datetime.today())
df = pd.DataFrame(index=Ticker1.index)
df['spy'] = Ticker1['Close']
df['vix'] = Ticker2['Close']
This next piece of code is two ways to accomplish the same thing – a graph of both SPY and VIX. Both will create the desired plots, but in later posts we will build on why it is important to know how to plot the same graph in two different ways.
1.df[['spy','vix']].plot(subplots=True)
2.fig, ax = plt.subplots(figsize=(12,6))
ax = plt.subplot(211)
plt.plot(df['spy'],label='spy')
plt.legend(loc=0)
ax = plt.subplot(212)
plt.plot(df['vix'],label='vix',color='g')
plt.legend(loc=0)
The first method is simple and straight forward. The second method creates a “figure” and “axis”. We then use plt.subplot to specify how many rows, columns, and which chart we are working with. For example, ax = plt.subplot(212) means we want to set our axis to our display that has 2 rows, 1 column, and we want to work with our 2nd graph. plt.subplot(743) would be 7 rows, 4 columns, and work with the 3rd graph (of 28). You can also use commas to specify like this plt.subplot(7,4,3).
Anyways, here is the output.
The next task is to mark these graphs whenever some significant event happens. In this example, I show code that marks each time SPY falls 10 points or more below its 20 period simple moving average. I then plot SPY and mark each occurrence with a red diamond. I also added a line of code that prints a title, “Buying Opportunities?”, on our chart.
df['MovAvg'] = Ticker1['Close'].rolling(20).mean()
markers = [idx for idx,close in enumerate(df['spy']) if df['MovAvg'][idx] - close >= 10]
plt.suptitle("Buying Opportunities?")
plt.plot(df['spy'],marker='D',markerfacecolor='r',markevery=markers)
This code creates a python list named markers. In this list we loop through our SPY data and if our condition is true (price is 10 or more points below the moving average) we store the bar number in our markers list. In the plot line we specify the shape of our marker as a diamond using ‘D’, give it the color red using ‘r’, and mark each point in our markers list using the markevery option. The output of this piece of the code is below.
Next, and simply, I show some code on how to shade an area of the chart. This may be important if you are trying to specify different market regimes and want to visualize when one started or ended. In this example I use the financial crisis and arbitrarily defined it by the dates October 2007 to March 2009. The code below is extremely simple and we only introduce the axvspan function. It takes a start and stopping point of where shading should exist. The code and output are below.
fig, ax = plt.subplots()
ax.plot(df['spy'],label='spy')
ax.axvspan(datetime(2007,10,1), datetime(2009,3,9), alpha=0.5, color='red')
Personally I do not like the shading of graphs, but prefer the changing of the lines colors. There are a few ways to do this, but this is the simplest work around for this post. I create two empty lists for our x and y values named marked_dates and marked_prices. These will contain the points we want to plot with an alternate color. I then loop through the SPY data and say if date is within our financial crisis window then add the date to our x list and add the price to our y list. I do this with the code below.
marked_dates = []
marked_prices = []
for date,close in zip(df.index,df['spy']):
if date >= datetime(2007,10,1) and date <= datetime(2009,3,9):
marked_dates.append(date)
marked_prices.append(close)
I then plot our original price series and then also plot our new x’s and y’s to overlap our original series. The new x’s and y’s are colored red whereas our original price series is plotted with default blue. The code and output is below.
fig,ax = plt.subplots()
ax.plot(df['spy'],label='spy')
ax.plot(marked_dates,marked_prices,color='r',label='financial crisis')
plt.legend(loc=0)
That’s it for this post, but I hope this info helps you in visualizing your data. Please let me know if you enjoy these Python tutorial type posts and I will keep doing them – I know there is a huge interest in Python due to its simplicity.
Also, I understand there may be simpler or more “pythonic” ways to accomplish some of these things. I am often writing this code with intentions of simplifying the code for mass understanding, unaware of the better ways, or attempting to build on these blocks in later posts.
Cheers,
Dave
It has been brought to my attention that Yahoo Finance has changed their API and this code will no longer work. However, we can simply change the get_data function to the code below to call from the Google Finance API
def get_data(symbol,start_date,end_date):
dat = data.DataReader(symbol,"google",start_date,end_date)
dat = dat.dropna()
return dat
Google adjusts their data so we do not have to. So I removed those lines. I also swapped out ‘yahoo’ for ‘google’ in the DataReader function parameters. Google’s data is also not as clean so I added a line to drop NaN values. That’s it. Simple adjustment to change data sources.
Thanks for reading,
Dave
Thanks for the post! I decided to pick up with my dip into the python waters again this week. Your post is well-timed.
For some reason, in the ‘buying opportunities’ part of the code, I got the right display except that vix was also plotted below it. Not sure why! An exercise to figure out…
Anytime. To be honest, it is probably something simple like forgetting to re-comment out the first group of plots.
Thanks for sharing the post, very helpful and useful!
Awesome to hear. Thanks for reading!