Commitment to keep package size down #1942
Comments
This might only be a negligible win, but I think you could remove these by adding Lines 211 to 213 in 78f8c0a
That rule seems like a poor fit for narwhals/stable/v1/__init__.py:2325: error: Unused "type: ignore" comment [unused-ignore]
Little bit of #1942 When combined with another change, a lot of cases might become one-liners (#1657 (comment))
This is an interesting one. Thinking about it, I'm not sure how much trying to save on whitespace characters and docstrings would help, compared to other packages which take about 100MB after installation. I very much appreciate narwhals being very lightweight, but I'd say as long as it doesn't have any dependencies, it's VERY lightweight, almost independent of saving on little things. As for including features only if a use case is there, I think of it as a double-edged sword kinda thing:
So I guess the tl;dr here for me is:
but I wouldn't worry about the package having a 5MB download size.
Thanks for your comments, much appreciated! Maybe we indeed don't need a strict cap, but I would like to keep closely monitoring size - I don't think any dataframe library started out thinking they'd get 400MB wheels, but it is where things tend to go if unchecked (seriously, the PySpark 4.0 wheel is >400MB, wut 🤯 https://pypi.org/project/pyspark/4.0.0.dev2/#files). I'd still like to suggest slowing down on new features, so that we can focus on:
We can then resume adding features (filling out the
Narwhals started off with the objective of being a lightweight compatibility layer. As we've been adding features and supported backends, the package size has been growing.
There's a lot of essential dataframe functionality, and a lot of libraries that people want to support, so some increase in size since the earliest days is expected. But we do need to monitor it, it does need to stay under control, and Narwhals does need to stay lightweight.
Commitment: I'd like to suggest a hard commitment that:
pip install narwhals
. This includes some cached files which Python generates, but still, I think it's good to monitor the overall size). It's currently 3789 kB.
#1886 will probably increase our size a bit more. I think that's OK, as Ibis is a library that a few maintainers have said that they want to support. But it does bring us closer to the limits.
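For anyone who wants to keep an eye on the number locally, here is a minimal sketch of one way to measure an on-disk package size, with and without the `__pycache__` files Python generates. The helper name `directory_size_bytes` is made up for illustration, and this is not necessarily how the 3789 kB figure above was obtained. The demo runs on a throwaway directory rather than a real install.

```python
import os
import tempfile
from pathlib import Path


def directory_size_bytes(root, include_pycache=True):
    """Sum the sizes of all regular files under ``root`` (hypothetical helper)."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        if not include_pycache:
            # Prune bytecode cache directories in place so os.walk skips them.
            dirnames[:] = [d for d in dirnames if d != "__pycache__"]
        for name in filenames:
            total += (Path(dirpath) / name).stat().st_size
    return total


# Self-contained demo on a fake package directory (not a real install).
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "fake_pkg"
    (pkg / "__pycache__").mkdir(parents=True)
    (pkg / "__init__.py").write_bytes(b"x" * 1000)
    (pkg / "__pycache__" / "__init__.cpython-312.pyc").write_bytes(b"y" * 500)
    with_cache = directory_size_bytes(pkg)
    without_cache = directory_size_bytes(pkg, include_pycache=False)

print(with_cache, without_cache)  # 1500 1000
```

To apply it to a real install, point `root` at the installed package directory in `site-packages`. Reporting both numbers makes it clearer how much of the total is cached bytecode rather than source.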
Some strategies to reduce size are:
name namespaces to lower code duplication #1876 is a nice example, and I think there's more opportunities to do this
is_duplicated just the negation of is_unique?
Series.hist is fine because it's been requested by Marimo, so it has a clear use case. Anything without a clear use case, I think we may need to put the brakes on, at least until Request for contributions: Ibis support #1886 is resolved
Any help towards this goal would be appreciated - thank you, and thank you to everyone who has contributed in any way to Narwhals 🙏
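To make the is_duplicated / is_unique point concrete, here is a rough sketch of the kind of deduplication being suggested: one method defined as the negation of the other, so only one backend-specific implementation is needed. Plain Python lists stand in for a series here, and both functions are hypothetical illustrations, not the Narwhals implementation. Null handling is ignored.

```python
from collections import Counter


def is_unique(values):
    """Return a mask that is True where the value occurs exactly once."""
    counts = Counter(values)
    return [counts[v] == 1 for v in values]


def is_duplicated(values):
    """Defined purely as the negation of is_unique, with no separate logic."""
    return [not flag for flag in is_unique(values)]


data = [1, 2, 2, 3]
print(is_unique(data))      # [True, False, False, True]
print(is_duplicated(data))  # [False, True, True, False]
```

Whether this holds for every backend (for example around nulls) is an assumption that would need checking before removing any per-backend code.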