Currently, some functions under flight_safety/queries provide access to the database. They filter it according to fixed criteria and return a DataFrame with suitable column types.
Issues:
The filtering strategy is not very flexible: it only covers the case study for the talks, which is accidents for far_parts 121 and 125. Writing more and more functions is not an option, and passing the filtering options as arguments would also require tons of code. We need a viable alternative here.
It seems that pandas dtype inference is not working well (probably due to data quality), and in any case some object columns must be converted to categorical. Is there a better way than declaring every dtype at the beginning of the script?
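One way to avoid scattering dtype declarations through every script is to keep a single mapping from column name to dtype next to the query code and apply it once with `DataFrame.astype`. A minimal sketch follows; the column names, `SCHEMA` and `apply_schema` are hypothetical, not part of the current flight_safety code.

```python
import pandas as pd

# Hypothetical column names and dtypes for illustration only; the real
# flight_safety tables may use different names.
SCHEMA = {
    "ev_type": "category",   # few distinct labels -> categorical
    "far_part": "category",
    "ev_year": "Int64",      # nullable integer, tolerates missing years
}


def apply_schema(df: pd.DataFrame, schema: dict = SCHEMA) -> pd.DataFrame:
    """Cast the columns of ``df`` that appear in ``schema`` to the declared dtypes."""
    present = {col: dtype for col, dtype in schema.items() if col in df.columns}
    return df.astype(present)


# Toy data standing in for a query result.
raw = pd.DataFrame({
    "ev_type": ["ACC", "INC", "ACC"],
    "far_part": ["121", "125", "121"],
    "ev_year": [2001, None, 2003],
})
clean = apply_schema(raw)
```

Because `astype` raises on values it cannot convert, genuinely dirty columns still need cleaning before the schema is applied.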
Regarding the filtering: the more functions you provide, the closer you get to reimplementing SQL... SQLAlchemy won't be of much help here, since it is intended to serve as an abstraction layer over several databases, and if I understand correctly, our data will always be in SQLite. For arbitrary queries, I think users will have to learn SQL anyway.
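If users write SQL anyway, the package could expose a single thin helper instead of one function per use case. A sketch using only `sqlite3` and `pandas.read_sql_query`, assuming a table called `events` with `ev_type` and `far_part` columns (table, columns and the `run_query` name are illustrative):

```python
import sqlite3

import pandas as pd


def run_query(sql: str, con: sqlite3.Connection, params=None) -> pd.DataFrame:
    """Run arbitrary SQL against SQLite and return the result as a DataFrame.

    Filter values go through ``params`` ("?" placeholders) rather than string
    formatting, which avoids quoting mistakes and SQL injection.
    """
    return pd.read_sql_query(sql, con, params=params)


# In-memory demo with a hypothetical "events" table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (ev_id TEXT, ev_type TEXT, far_part TEXT)")
con.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("a1", "ACC", "121"), ("a2", "INC", "121"),
     ("a3", "ACC", "091"), ("a4", "ACC", "125")],
)

# The study case, expressed as one query: accidents for far_parts 121 and 125.
# IN needs one placeholder per value.
accidents = run_query(
    "SELECT ev_id, far_part FROM events "
    "WHERE ev_type = ? AND far_part IN (?, ?) ORDER BY ev_id",
    con,
    params=("ACC", "121", "125"),
)
```

New filters then cost one SQL string rather than one new Python function.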
If pandas dtype inference is not working well, some manual work will be needed... This is unavoidable and someone has to do it. A series of transformations should be written to prepare the data and present it with a clean schema. Welcome to the "data wrangling" world 🤠
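Such a series of transformations can be written as small functions chained with `DataFrame.pipe`, so each cleaning step can be tested on its own. A minimal sketch with made-up column names and dirty values; the helper names are hypothetical:

```python
import pandas as pd


def strip_blanks(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Trim whitespace in text columns and turn empty strings into NaN."""
    df = df.copy()
    for col in columns:
        s = df[col].str.strip()
        df[col] = s.mask(s == "")
    return df


def coerce_numeric(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Parse columns as numbers; unparseable entries such as "unk" become NaN."""
    df = df.copy()
    for col in columns:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    return df


def to_categories(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Store low-cardinality text columns as categoricals."""
    return df.astype({col: "category" for col in columns})


# Toy "dirty" data; values and column names are made up for illustration.
raw = pd.DataFrame({
    "ev_type": [" ACC", "INC ", "", "ACC"],
    "ev_year": ["2001", "2002", "unk", " 2003 "],
})

clean = (
    raw.pipe(strip_blanks, ["ev_type", "ev_year"])
       .pipe(coerce_numeric, ["ev_year"])
       .pipe(to_categories, ["ev_type"])
)
```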
This is related to #2, but given that we have to convert the database from Access anyway, would it be a good approach to convert it to an HDF5 file with the suitable dtypes? That would remove the dtype inference from the filtering functions, and we wouldn't need SQL anymore.
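The dtype argument is easy to check: SQLite has no categorical column type, so a categorical column survives a round trip only as plain text and has to be re-cast after every read. A dtype-preserving store such as HDF5 (pandas `HDFStore`, which needs the optional PyTables package) would keep it. The sketch below only demonstrates the loss on the SQLite side, with illustrative column names:

```python
import sqlite3

import pandas as pd

# A frame that already carries the desired dtypes (illustrative columns).
df = pd.DataFrame({
    "ev_type": pd.Categorical(["ACC", "INC", "ACC"]),
    "ev_year": [2001, 2002, 2003],
})

# Round-trip through SQLite: the categories are written as plain TEXT,
# so the categorical dtype is gone when the table is read back.
con = sqlite3.connect(":memory:")
df.to_sql("events", con, index=False)
back = pd.read_sql_query("SELECT * FROM events", con)

print(df.dtypes["ev_type"], "->", back.dtypes["ev_type"])
```

pandas' HDF5 `table` format also supports on-disk filtering through `HDFStore.select(..., where=...)` on columns declared as `data_columns`, which could cover part of the filtering use case without SQL.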
Would something like SQLAlchemy (https://www.sqlalchemy.org/) or PyPika (https://github.com/kayak/pypika) help here?