[DataCatalog]: Pretty printing #3913
Comments
Possibly a couple of bottlenecks for meaningful progress here are
|
Notice also that, despite being somewhat limited in IPython, on Jupyter we can go crazy. See what Dask and Xarray do: https://tutorial.dask.org/00_overview.html |
This can be solved if #3932 is done, but the latter is a much more complex task, and it's unclear whether/when it will be done. |
This feels like a duplicate of #1721 rather than just being related. Can we merge these tickets and create something more actionable as sub-tasks?
That sounds like the most obvious task we could start with? cc: @noklam @astrojuanlu |
I think that comment from #1721 referred to Agreed with having either this or #1721 be the main parent issue, closing the other, and opening sub-issues for specific classes, namely
|
TL;DR I agree we should keep one of the two tickets only.
I am not sure #3932 is the full solution to this. The use case of (de)serialisation of I think I am holding my I have the use case of interacting with a big Kedro project in mind here. I agree we should keep one of the two tickets only. This ticket does not necessarily contradict #1721; it could be part of the solution to this use case, i.e. top-level information that I would love to know:
If I need further details of the datasets, I will then access them through a specific API (it shouldn't print all dataset details when I do
|
Sorry @noklam I didn't quite get if you're in favour or against, could you clarify? |
@astrojuanlu TL;DR, I think we should close one of the tickets. I don't want to get into the solution yet since I think the scope is not clear enough yet. (Something needs to be done for
The former should be handled separately, while "printing" in my mind is specific to interactive workflows (a.k.a. notebooks 95% of the time). I think there are 2 approaches:
This is actually not too bad, but unfortunately framework and component are competing here. For example |
We can extend the built-in @astrojuanlu In notebooks, you can do quite a bit by providing an HTML representation. Scikit-learn also does this. If this is not super urgent, I would be happy to prototype something, as I have some recent experience working with the pretty printer (and, because doing so is quite non-trivial, if you've never done it before). |
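To make the HTML representation idea concrete, here is a minimal sketch. It relies only on the `_repr_html_` hook that Jupyter looks for on displayed objects. `HtmlCatalog` and its `_datasets` attribute are hypothetical names made up for illustration; this is not Kedro's `DataCatalog` and not a proposed implementation.

```python
from html import escape


class HtmlCatalog:
    """Hypothetical toy catalog, used only to illustrate _repr_html_."""

    def __init__(self, datasets):
        # Mapping of dataset name -> dataset object (any Python object here).
        self._datasets = dict(datasets)

    def _repr_html_(self):
        # Jupyter calls this hook, if present, to render the object as HTML.
        rows = "".join(
            f"<tr><td>{escape(name)}</td><td>{escape(type(ds).__name__)}</td></tr>"
            for name, ds in sorted(self._datasets.items(), key=lambda kv: kv[0])
        )
        return (
            f"<p>{len(self._datasets)} dataset(s)</p>"
            "<table><tr><th>name</th><th>type</th></tr>"
            f"{rows}</table>"
        )


# Outside a notebook the hook can still be called directly to inspect the markup.
html_out = HtmlCatalog({"cars": [1, 2], "a<b": {}})._repr_html_()
print(html_out)
```

Names are escaped so that a dataset name containing markup characters cannot break the table. A plain-text `__repr__` would still be needed for terminals, since the HTML hook is only used by rich front ends.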
A summary of what we suggest:
|
Related issue: kedro-org/kedro-plugins#769 |
Description
Compiling the catalog at runtime hinders users' ability to assess its structure and contents effectively. They express the need for an improved visual representation of the catalog when printing.
We propose:
A catalog.print() / catalog.__repr__ function specifically tailored to improve the visual representation of the catalog when printed or displayed.
Relates to #1721
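As a rough illustration of the kind of `__repr__` proposed above, the sketch below prints a short summary (the total count plus a count per dataset type) instead of dumping every dataset. `ToyCatalog`, `ToyCSV` and `ToyParquet` are hypothetical stand-ins invented for this example; the real `DataCatalog` internals are not assumed here.

```python
from collections import Counter


class ToyCatalog:
    """Hypothetical stand-in for a data catalog; not Kedro's DataCatalog."""

    def __init__(self, datasets):
        # Mapping of dataset name -> dataset object.
        self._datasets = dict(datasets)

    def __repr__(self):
        # Summarise: total count first, then one line per dataset type.
        counts = Counter(type(ds).__name__ for ds in self._datasets.values())
        lines = [f"ToyCatalog with {len(self._datasets)} dataset(s)"]
        for type_name, n in sorted(counts.items()):
            lines.append(f"  {type_name}: {n}")
        return "\n".join(lines)


class ToyCSV:
    """Placeholder dataset type (hypothetical)."""


class ToyParquet:
    """Placeholder dataset type (hypothetical)."""


catalog = ToyCatalog(
    {"cars": ToyCSV(), "planes": ToyCSV(), "trains": ToyParquet()}
)
summary = repr(catalog)
print(summary)
```

Keeping the summary to a few lines matches the request in the thread that printing the catalog should not list every dataset's details; those could be reached through a separate, more specific call.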
Context
There is no particular requirement from the user side on how the output should look. Users mention that they expect something better than the following to help them understand what's in the catalog when debugging:
"I have Kedro jupyter notebook opening. I have a catalog object. Maybe we could have a nicer representation when you do that. You know how many data sets are available, things like that potentially."
"pretty printing (print(catalog) should give something understandable than a long, a minima catalog.list() and maybe details on the dataset)"
Current catalog printing output
Catalog datasets printing output