Python Tips – Reading Text Files, Working with dates, the command line

In this post let’s talk about two Python tutorials I put together. The first one goes over how to read in a text file, format dates, and create new columns inside a data frame. A data frame is a structure that stores your data in a convenient “table” for easy access. There are a few parts, but I will break down the code below.

The first thing we will do is import pandas library and call the built-in read_csv function. The read_csv function’s first input is the name of the file you desire to read in and store in your pandas data frame. The delimiter option allows you to specify the character that separates your text fields within your file.

import pandas as pd
df = pd.read_csv("ES.txt",delimiter=',')

Just like that we have read a text file into a pandas dataframe that we can now work with. However, if we were to plot our data frame (closing prices) now the x-axis would simply be the number of bars as we did not specify an index column. In trading and time series analysis it is often nice to have dates as your x-axis.

In the next few lines of code  I import a bulit-in python library that can read string dates ("12/30/2007") and convert them into Python "DateTime" objects. To simplify this... we convert dates into Python dates.

I actually accomplish this by setting the built-in pandas index column to a list of newly Python formatted dates. I essentially loop through each string date, convert it, and add it to our data frame's index.

I then delete the original string Dates.

from dateutil import parser
df.index = [parser.parse(d) for d in df['Date']]
del df['Date']

Now we can plot our closing prices and our x-axis will be dates.

df['Close'].plot()

In the code below I create a new column called “Range”. Notice how Python understands I want to do the calculation on all of the highs and lows inside our dataframe without me specifying so!

df['Range'] = df['High'] - df['Low']

Finally, the line below plots our Close and Range in two separate plots. This is from a previous tutorial video.

df[['Close','Range']].plot(subplots=True)

The second part of this tutorial is to make our lives easier. Let’s say that we wanted to run that last program on a bunch of different stocks whenever we wanted. It would be quite annoying to open up the file or notebook and change the filename in our read_csv function every time.

Instead what we can do is create a filename variable and put the filename variable inside the read_csv function. Ideally, this filename variable could be dynamically set with user input.

This code is tricky and has a few moving parts. Below is the code and then I will explain what we did.

symbol = "ES"

import sys,getopt
myopts,args = getopt.getopt(sys.argv[1:],"s")

for o,a in myopts:

    if o == '-s':

      symbol = str(a).upper()

filename = "%s.csv" % symbol
df = pd.read_csv(filename,delimiter=',')

First, we created a symbol variable that will accept our user input. Second, we imported some built-in libraries and called the getopt function to read user input. We also specified that our desired input would be preceded by the "s" option.

We then wrote a simple for loop to read through all the command line inputs (which in this example is only one, but this template will allow you to create multiple command line input options). We then said, "if the command line option is 's' then set symbol to whatever follows it". We also morphed "whatever follows it" into an upper case, string variable.

We then set our filename variable and proceeded to read our text file into our data frame (df) as before.

This is complicated, but a major time saver. Please review the video as the extra 3 minutes might save you hours of our life by utilizing tricks like this!

Remember for those of you who don't want to learn programming you can use research tools like Build Alpha to save even more time.

Best,

David

Leave a Reply

Your email address will not be published. Required fields are marked *