Database access #4

Open
AlexS12 opened this issue Dec 9, 2017 · 4 comments

Comments

@AlexS12
Member

AlexS12 commented Dec 9, 2017

Currently, some functions under flight_safety/queries access the database, filter it according to given criteria, and return a DataFrame with suitable column types.

Issues:

  • The filtering strategy is not very flexible, as it only covers the case studied for the talks: accidents for far_parts 121 and 125. Coding more and more functions is not an option, and passing the filtering options as arguments would also lead to tons of code. We need a viable alternative here.
  • It seems that pandas dtype inference is not working well (probably due to data quality), and in any case some object columns must be converted to categorical. Is there a better way than declaring every type at the beginning of the script?

Would something like SQLAlchemy (https://www.sqlalchemy.org/) or PyPika (https://github.com/kayak/pypika) help here?

@astrojuanlu
Member

Regarding the filtering: the more functions you provide, the closer you get to reimplementing SQL... SQLAlchemy won't be of much help here, since it is intended as an abstraction layer over several databases and, if I understand correctly, our data will always be in SQLite. For arbitrary queries, I think users will have to learn SQL anyway.
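
To make the "just use SQL" option concrete, here is a minimal sketch of passing a parameterized query straight to pandas. It assumes the standard library `sqlite3` module and `pandas.read_sql_query`; the `events` table, its columns, and the sample rows are hypothetical stand-ins, not the project's real schema.

```python
import sqlite3

import pandas as pd

# Hypothetical in-memory stand-in for the project's SQLite database.
# Table name, column names and rows are illustrative assumptions only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (ev_id TEXT, ev_type TEXT, far_part TEXT);
    INSERT INTO events VALUES
        ('a1', 'ACC', '121'),
        ('a2', 'INC', '121'),
        ('a3', 'ACC', '125'),
        ('a4', 'ACC', '091');
    """
)

# The filter lives in the query itself; values are bound as parameters
# (sqlite3 uses the "?" placeholder style) instead of being formatted in.
query = """
    SELECT ev_id, ev_type, far_part
    FROM events
    WHERE ev_type = ? AND far_part IN (?, ?)
    ORDER BY ev_id
"""
accidents = pd.read_sql_query(query, conn, params=("ACC", "121", "125"))
conn.close()

print(accidents)
```

With this approach a new study case only needs a new query string, not a new Python function.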

If pandas dtype inference is not working well, some manual work will be needed... This is unavoidable and someone has to do it. A series of transformations should be written to prepare the data and present it with a clean schema. Welcome to the "data wrangling" world 🤠
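
One possible shape for such a transformation step, as a rough sketch: keep the schema in a single place and apply it right after loading, so the filtering functions do not have to deal with dtypes. The column names and the `apply_schema` helper below are hypothetical, and unparseable values are simply coerced to missing, which may or may not be what we want for the real data.

```python
import pandas as pd

# Hypothetical schema, kept in one place instead of at the top of each script.
# Column names are illustrative assumptions, not the real database columns.
CATEGORICAL_COLUMNS = ["ev_type", "far_part"]
DATE_COLUMNS = ["ev_date"]
NUMERIC_COLUMNS = ["inj_tot_f"]


def apply_schema(df):
    """Return a copy of df with the declared column types applied."""
    out = df.copy()
    for col in CATEGORICAL_COLUMNS:
        if col in out.columns:
            out[col] = out[col].astype("category")
    for col in DATE_COLUMNS:
        if col in out.columns:
            # Values that cannot be parsed become NaT instead of raising.
            out[col] = pd.to_datetime(out[col], errors="coerce")
    for col in NUMERIC_COLUMNS:
        if col in out.columns:
            # Values that cannot be parsed become NaN instead of raising.
            out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


# Small made-up frame that mimics badly inferred (all-object) columns.
raw = pd.DataFrame(
    {
        "ev_type": ["ACC", "INC", "ACC"],
        "far_part": ["121", "121", "125"],
        "ev_date": ["2017-12-09", "not a date", "2018-01-02"],
        "inj_tot_f": ["0", "n/a", "2"],
    }
)
clean = apply_schema(raw)
print(clean.dtypes)
```

The original frame is left untouched, so the raw values remain available for checking what was coerced.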

@AlexS12
Member Author

AlexS12 commented Dec 9, 2017

This is related to #2, but given that we have to convert the database from Access anyway, would it be a good approach to convert it to an HDF5 file with suitable dtypes? That would remove the dtype inference from the filtering functions, and we wouldn't use SQL anymore.

@astrojuanlu
Member

I don't have experience reading multi-table HDF5 files and not everyone loves the format, so I cannot comment here.

@AlexS12
Member Author

AlexS12 commented Dec 9, 2017

Thanks for the link! Moving this kind of data from a SQL database to an HDF5 file does not seem a sensible option.

@AlexS12 AlexS12 mentioned this issue Jan 2, 2018
1 task