Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docstring guidelines #1943

Closed
MarcoGorelli opened this issue Feb 5, 2025 · 9 comments · Fixed by #1957
Closed

Docstring guidelines #1943

MarcoGorelli opened this issue Feb 5, 2025 · 9 comments · Fixed by #1957
Labels
documentation Improvements or additions to documentation

Comments

@MarcoGorelli
Copy link
Member

MarcoGorelli commented Feb 5, 2025

In #1942 I mentioned a desire to keep the package size down. One way to do this is by reducing overly-long docstrings.

Here are some general guidelines:

  • The examples should be clear and to-the-point
  • Most docstrings should just be:
    • import dataframe library (usually pandas, pyarrow, or duckdb)
    • import narwhals
    • construct native dataframe. Make it as small as possible: 2 rows and 2 columns should be plenty for most docstrings
    • call nw.from_native and apply operation
    • only show a single example
  • Variety between docstrings is encouraged, they don't all need to follow the same exact template. They can use different libraries, some can end with to_native, some can use selectors or nw.all...this is all OK. Just keep it minimal
  • Essential functionality (such as DataFrame.filter, DataFrame.with_columns, DataFrame.select), it's OK to have even 2-3 examples showing different ways to use it
  • For examples which use null values, use a library with proper support for them (i.e. not pandas with classical data types). Such docstrings should also link to the pandas null handling page

Also, the reason many of the current docstrings are so long is because that's how I'd originally encouraged them to be. I think that was a mistake on my part, I need to acknowledge this, and as the project has evolved, it's time to revisit

@MarcoGorelli MarcoGorelli added the documentation Improvements or additions to documentation label Feb 5, 2025
@dangotbanned
Copy link
Member

dangotbanned commented Feb 5, 2025

Have you thought about omitting the Returns: ... section, when it doesn't add context to/simplify the return annotation?

In (#1924) I'm taking inspiration from these methods, but they seem to state the same thing 3 times:

def to_pandas(self: Self) -> pd.DataFrame:
"""Convert this DataFrame to a pandas DataFrame.
Returns:
A pandas DataFrame.

def to_polars(self: Self) -> pl.DataFrame:
"""Convert this DataFrame to a polars DataFrame.
Returns:
A polars DataFrame.

narwhals/narwhals/dataframe.py

Lines 2014 to 2018 in 46fec73

def to_arrow(self: Self) -> pa.Table:
r"""Convert to arrow table.
Returns:
A new PyArrow table.

All of the above all seem less useful than cases like below:

Returns:
Object of class that user started with.

Returns:
DataFrame, LazyFrame, Series, or original object, depending
on which combination of parameters was passed.

Returns:
Decorated function.

narwhals/narwhals/functions.py

Lines 2211 to 2212 in 46fec73

Returns:
A "when" object, which `.then` can be called on.

@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented Feb 5, 2025

That would be nice, but unfortunately it seems we need the Returns section for the type annotations to appear in the rendered docstrings https://narwhals-dev.github.io/narwhals/api-reference/expr/

If we had a way to get the return type annotation to appear in the signature in the rendered docstrings, then maybe it'd be ok to remove the 'returns' in the docstrings


TBH it might not be a bad idea to use Sphinx / readthedocs for the API reference, and keep mkdocs for the user guide, like how Polars does

@dangotbanned
Copy link
Member

but unfortunately it seems we need the Returns section for the type annotations to appear in the rendered docstrings

Interesting 🤔

So are you saying mkdocs needs to have both of these, if you only need the signature?

Image

Image

def from_native(
native_object: IntoFrameT | IntoSeriesT | IntoFrame | IntoSeries | T,
*,
strict: bool | None = None,
pass_through: bool | None = None,
eager_only: bool = False,
series_only: bool = False,
allow_series: bool | None = None,
) -> LazyFrame[IntoFrameT] | DataFrame[IntoFrameT] | Series[IntoSeriesT] | T:
"""Convert `native_object` to Narwhals Dataframe, Lazyframe, or Series.
Arguments:
native_object: Raw object from user.
Depending on the other arguments, input object can be:
- a Dataframe / Lazyframe / Series supported by Narwhals (pandas, Polars, PyArrow, ...)
- an object which implements `__narwhals_dataframe__`, `__narwhals_lazyframe__`,
or `__narwhals_series__`
strict: Determine what happens if the object can't be converted to Narwhals:
- `True` or `None` (default): raise an error
- `False`: pass object through as-is
**Deprecated** (v1.13.0):
Please use `pass_through` instead. Note that `strict` is still available
(and won't emit a deprecation warning) if you use `narwhals.stable.v1`,
see [perfect backwards compatibility policy](../backcompat.md/).
pass_through: Determine what happens if the object can't be converted to Narwhals:
- `False` or `None` (default): raise an error
- `True`: pass object through as-is
eager_only: Whether to only allow eager objects:
- `False` (default): don't require `native_object` to be eager
- `True`: only convert to Narwhals if `native_object` is eager
series_only: Whether to only allow Series:
- `False` (default): don't require `native_object` to be a Series
- `True`: only convert to Narwhals if `native_object` is a Series
allow_series: Whether to allow Series (default is only Dataframe / Lazyframe):
- `False` or `None` (default): don't convert to Narwhals if `native_object` is a Series
- `True`: allow `native_object` to be a Series
Returns:
DataFrame, LazyFrame, Series, or original object, depending
on which combination of parameters was passed.
"""

I would have thought they were independent

@FBruzzesi
Copy link
Member

Should we just add these guidelines to the contributing guide? If I am new to a repo/project, I would not be searching for issues to know how to write docs 🙈

@dangotbanned
Copy link
Member

@MarcoGorelli not sure how this works, but...

Found an example function with overloads and no returns section (via https://mkdocstrings.github.io/#used-by)

https://textual.textualize.io/api/logger/#textual.work

Image

Image

@MarcoGorelli
Copy link
Member Author

thanks - so, if there's overloads, then the signature shows, but not if there's not (e.g. https://textual.textualize.io/api/logger/#textual.on)

looks like for on the return type isn't visible at all from the docs?

@dangotbanned
Copy link
Member

thanks - so, if there's overloads, then the signature shows, but not if there's not (e.g. textual.textualize.io/api/logger#textual.on)

looks like for on the return type isn't visible at all from the docs?

Ah that's odd.
I only mentioned overloads in reference to it being complex - didn't expect it to factor in here.

Probably worth a look through some more of the used by to see if someone else has figured it out

I think that style of the signature could be an improvement for from_native()

@DeaMariaLeon
Copy link
Member

it might not be a bad idea to use Sphinx / readthedocs for the API reference, and keep mkdocs for the user guide, like how Polars does

That's something I dislike about polars docs. You search something in the guide and you miss the API part. You have to go to the API site. IMHO not great when you are learning about it or you only use it occasionally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants