-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Fix invalid data loading in csvdir bundle #2223
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @ASzot thanks for the PR! I have two questions for you below. Also, it looks like tests failed on this run;. You can check if tests pass locally by running:
nosetests tests.data.bundles.test_csvdir:CSVDIRBundleTestCase
zipline/data/bundles/csvdir.py
Outdated
fname = [fname for fname in files | ||
if '%s.csv' % symbol in fname][0] | ||
except IndexError: | ||
dfr = read_csv(os.path.join(csvdir, fname), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we remove the second call to read_csv()
below on line 185? We're already calling it here and exiting early if we don't have the file we want.
zipline/data/bundles/csvdir.py
Outdated
except IndexError: | ||
dfr = read_csv(os.path.join(csvdir, fname), | ||
parse_dates=[0], | ||
infer_datetime_format=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why wouldn't we want to infer the datetime format?
Calculating the name of CSV file from the symbol name is not done corretly. The current method would match a filename to a symbol if it ended in the symbol concatenated with
.csv
. This means that if there are two stocks Ford (symbol F) and Regions Financial Corporation (symbol RF). Then the calculated filename to load for Ford could be RF.csv which is clearly not the data file for Ford.This is clearly a bug.