Pretty printing: AbstractDataset.__repr__
#3980
Comments
I've prototyped two different approaches for printing:
I'm curious about what you think. Does it feel good enough? Do we want more or less information provided? Which approach seems better?
One caveat if you are doing
The non-indented version was preferred to the indented one, closing #3991
I think this is great! Inspired by @astrojuanlu's post about the sklearn (provided slack link, linen not working for me) I had a think about what we could do with
The main bits of novelty are:
Code
import importlib
import re
import sys

import yaml

import kedro
from kedro.io.data_catalog import DataCatalog

# Catalog entry used for the demo, written as inline YAML.
csv = """
cars:
  type: pandas.CSVDataset
  filepath: data/01_raw/company/cars.csv
  load_args:
    sep: ","
    na_values: ["#NA", NA]
  save_args:
    index: False
    date_format: "%Y-%m-%d %H:%M"
    decimal: .
"""

catalog = DataCatalog.from_config(yaml.safe_load(csv))

def find_shortest_import_path(class_path):
    parts = class_path.split('.')
    class_name = parts[-1]
    # Iterate over the possible module paths
    for i in range(1, len(parts)):
        module_path = ".".join(parts[:i])
        try:
            module = importlib.import_module(module_path)
            # If the module contains the class name, return the shortest path
            if hasattr(module, class_name):
                return f"{module_path}.{class_name}"
        except ModuleNotFoundError:
            continue
    # If no shorter path found, return the full class path
    return class_path

def generate_docs_url(class_path: str) -> str:
    base_module = 'kedro_datasets'
    if class_path.startswith('kedro_datasets'):
        try:
            split_class_path = find_shortest_import_path(class_path).split('.')
            module_import = split_class_path[0]
            module_install = base_module.replace('_', '-')
            version = importlib.import_module('kedro_datasets').__version__
            short_class_path = f"{split_class_path[1]}.{split_class_path[2]}"
            prefix = "https://docs.kedro.org/projects"
            module_loc = f"{module_install}-{version}"
            class_loc = f"{module_import}.{short_class_path}.html"
            url = f"{prefix}/{module_install}/en/{module_loc}/api/{class_loc}"
            return url
        except ModuleNotFoundError:
            return None
    return None

import rich
from rich import box
from rich.layout import Layout
from rich.syntax import Syntax
from rich.table import Table


def build_rich_repr(ds: kedro.io.AbstractDataset):
    load_args = yaml.safe_dump({"load_args": getattr(ds, '_load_args')}).strip()
    save_args = yaml.safe_dump({"save_args": getattr(ds, '_save_args')}).strip()
    # First paragraph of the docstring, with whitespace runs and backticks flattened.
    # Raw string so that \s is passed to the regex engine unchanged.
    help_message = re.sub(r"(\n|\t|\s{1,5})", " ", ds.__doc__.split('\n\n')[0].replace('`', ''))
    class_path = f"{ds.__module__}.{ds.__class__.__name__}"
    docs_url = generate_docs_url(class_path)
    import_statement = f"from {ds.__module__} import {ds.__class__.__name__}"
    layout = Layout()
    # Use the light theme inside Jupyter/IPython kernels.
    theme = dict(theme="default") if 'ipykernel' in sys.modules else dict()
    syntax_args = dict(padding=1, line_numbers=True, word_wrap=True, **theme)
    load_args_r = Syntax(load_args, "yaml", **syntax_args)
    save_args_r = Syntax(save_args, "yaml", **syntax_args)
    import_statement_r = Syntax(import_statement, "python", **syntax_args)
    t = Table(
        "Attribute", "Value",
        padding=1,
        show_header=False,
        box=box.SIMPLE
    )
    t.add_row("Class documentation", f'[b][link={docs_url}]{class_path}[/link][/b]')
    t.add_row("Docstring snippet", help_message)
    t.add_row("Load arguments", load_args_r)
    t.add_row("Save arguments", save_args_r)
    t.add_row("Import statement\n[i](useful for REPL testing)[/]", import_statement_r)
    return t

rich.print(build_rich_repr(catalog.datasets.cars))
Would yield this in Jupyter:
Or this in IPython:
Or this in the default macOS terminal (the hyperlink doesn't work here):
One other piece - does your implementation, @ElenaKhaustova, expose encrypted database connection strings? Would this YAML:
render out the constructed connection string or not?
What is exposed depends on the implementation of
So none of the credentials or connection strings are exposed.
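To illustrate the point about credentials, here is a minimal sketch of a repr that is built only from what a describe-style method returns. `SketchDataset` and its attributes are hypothetical and are not Kedro's actual classes; the only assumption carried over from the thread is that the printed representation is derived from the dataset's own description, so anything left out of that description never reaches the output.

```python
from typing import Any


class SketchDataset:
    """Hypothetical stand-in for a dataset; not Kedro's AbstractDataset."""

    def __init__(self, table_name: str, credentials: dict[str, str]):
        self._table_name = table_name
        # Sensitive: the connection string is stored but must never be printed.
        self._connection_str = credentials["con"]

    def _describe(self) -> dict[str, Any]:
        # Only non-sensitive attributes are listed here.
        return {"table_name": self._table_name}

    def __repr__(self) -> str:
        # Build the repr exclusively from _describe(), skipping None values.
        args = ", ".join(
            f"{key}={value!r}"
            for key, value in self._describe().items()
            if value is not None
        )
        return f"{type(self).__module__}.{type(self).__name__}({args})"


ds = SketchDataset("cars", {"con": "postgresql://user:secret@host/db"})
print(repr(ds))
```

With this design, whether a secret can leak is decided in one place: the describe method of each dataset.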
Solved in #3987
Description
Parent ticket: #3913
Implement __repr__ for AbstractDataset for better dataset representation and printing, and further use it within DataCatalog.__repr__.
Context
#3913 (comment)
#1721
Possible Implementation
__str__ method for AbstractDataset based on the dataset's _describe, which can be adjusted and moved to __repr__.
_describe for MemoryDataset, LambdaDataset, SharedMemoryDataset, and CachedDataset if needed.
pprint.PrettyPrinter.
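The items above could be combined roughly as follows. This is a sketch only: `format_description` is a hypothetical helper, the dictionary stands in for whatever a dataset's _describe would return, and the `PrettyPrinter` settings are illustrative choices rather than what was merged.

```python
import pprint


def format_description(class_name: str, description: dict) -> str:
    """Hypothetical helper: render a _describe()-style dict as a single-line repr."""
    printer = pprint.PrettyPrinter(indent=1, width=60, compact=True, sort_dicts=False)
    parts = []
    for key, value in description.items():
        if value is None:
            # Unset attributes add noise, so they are skipped.
            continue
        parts.append(f"{key}={printer.pformat(value)}")
    return f"{class_name}({', '.join(parts)})"


description = {
    "filepath": "data/01_raw/company/cars.csv",
    "load_args": {"sep": ","},
    "save_args": {"index": False},
    "version": None,
}
text = format_description("CSVDataset", description)
print(text)
# CSVDataset(filepath='data/01_raw/company/cars.csv', load_args={'sep': ','}, save_args={'index': False})
```

Passing `sort_dicts=False` keeps the keys in the order the dataset declared them, which tends to read more naturally than alphabetical order.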