Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce core dependencies, split in optional dependencies #2265

Merged
merged 2 commits into from
Sep 27, 2024
Merged

Reduce core dependencies, split in optional dependencies #2265

merged 2 commits into from
Sep 27, 2024

Conversation

EwoutH
Copy link
Member

@EwoutH EwoutH commented Sep 1, 2024

This PR introduces a more flexible dependency management system for Mesa, allowing users to choose which additional components they need while maintaining core functionality.

Motivation

This change allows for more tailored installations, reducing unnecessary dependencies and potential conflicts, while ensuring core functionality remains intact.

Key Changes

  1. Updated pyproject.toml to define new optional dependency groups:
    • [network]: Network-related dependencies
    • [viz]: Visualization dependencies
    • [rec]: Recommended dependencies (includes network and viz)
    • [all]: All dependencies, including developer dependencies
    • Additional groups for development, examples, and documentation
  2. Updated README.md with new installation instructions for optional dependencies.
  3. Core dependencies now only include essential packages (numpy, pandas, tqdm).

Installation Options

Users can now customize their Mesa installation:

# Basic installation with core dependencies
pip install -U --pre mesa

# Install with recommended dependencies
pip install -U --pre mesa[rec]

# Install with specific additional dependencies
pip install -U --pre mesa[network,viz]

# Install all dependencies (including dev dependencies)
pip install -U --pre mesa[all]

Impact

  • Existing projects using only core Mesa functionality should not be affected.
  • Projects using network or visualization features may need to update their installation to include the appropriate optional dependencies.

Points to Review

  1. Are the chosen dependency groups ([network], [viz], [rec], [all]) appropriate and sufficient?
  2. Is the README clear about the new installation options?
  3. Are there any parts of the documentation or examples that need updating to reflect these changes?

Closes #626

Copy link

Performance benchmarks:

Model Size Init time [95% CI] Run time [95% CI]
BoltzmannWealth small 🔵 +1.1% [+0.0%, +2.1%] 🔵 -0.9% [-1.0%, -0.7%]
BoltzmannWealth large 🔵 -0.3% [-1.0%, +0.5%] 🟢 -10.5% [-14.6%, -7.2%]
Schelling small 🔵 -1.5% [-2.0%, -1.0%] 🔵 -2.6% [-3.2%, -2.0%]
Schelling large 🔵 -2.0% [-2.8%, -1.1%] 🟢 -8.9% [-10.8%, -6.8%]
WolfSheep small 🔵 -2.3% [-2.7%, -2.0%] 🔵 -1.9% [-2.2%, -1.6%]
WolfSheep large 🔵 -1.0% [-1.9%, -0.0%] 🔵 -3.6% [-5.1%, -2.4%]
BoidFlockers small 🟢 -5.3% [-5.7%, -4.9%] 🟢 -4.3% [-5.0%, -3.6%]
BoidFlockers large 🟢 -4.8% [-5.1%, -4.5%] 🔵 -2.7% [-3.3%, -2.0%]

@quaquel
Copy link
Member

quaquel commented Sep 20, 2024

  1. Do we still want to allow direct imports for model and agents (like from mesa import Agent, Model)?

Yes, I am strongly in favor of a simple high-level API for the core functionality. Why should a novice user bother with the internal module structure of MESA?

  1. Do we still want to allow direct imports like for space and time (like from mesa import RandomActivation, SingleGrid)?

Again, yes. The main issue is what is part of the default set of things that users might use for 80% of the models?

3 Do we want to allow direct imports for DataCollector and batch_run, if the required dependencies are installed? If so, what > do we do if that fails, throw a warning? What if it's the intended behavior for the user?

As far as I know, during coding dependencies are not checked so while running an import error will automatically be raised.

  1. Any suggestions on how the dependency groups in pyproject.toml are structured?

I'll try to look at this asap.

  1. Did I miss any documentation, tutorials, etc.

This PR is getting a bit big in terms of affected files. Might it be possible to split it into two: update toml, update docs?

@EwoutH
Copy link
Member Author

EwoutH commented Sep 20, 2024

Thanks for the review!

On point 2 and 3, the biggest issue arises from the updated __init__.py:

 __all__ = [
     "Model",
     "Agent",
     "time",
     "space",
-    "DataCollector",
-    "batch_run",
-    "experimental",
 ]

DataCollector, batch_run and experimental are removed because they contain dependencies that are note installed by default anymore. Not everyone might need a DataCollector, batch_run or experimental feature, so that's the idea.

The thing is, ideally you don't import those three if they are not used. As far as I have found out, you have two options:

  1. Always attempt to import them
    a. Warning if they can't be. But it's weird behavior, because why get a warning from a module you might not even need?
    b. Do it silently. This may be confusing, why does it work sometime and other times not?
  2. Try some lazy import evaluation. But that's difficult, since it's running in __init__.py. Haven't found a satisficing solution.

@quaquel
Copy link
Member

quaquel commented Sep 20, 2024

I quickly checked the toml file. It seems sensible not to always install, e.g., networks or Solara. I am less sure about pandas. In my view this is such a standard default library, why not always include this. TQDM is very lightweight as far as I know, so why not always also install this? My personal view on this, in general, is that this fine sliceing of dependencies is a bit odd given Python's overall batteries included philosophy. In particular, if dependencies are stable and readily available (so not requiring conda forge or so), I rarely see the point of not simply installing everything.

@EwoutH
Copy link
Member Author

EwoutH commented Sep 20, 2024

I think there is a middle way feasible here. Don't underestimate how heavy pandas is though. One example is that they currently don't have Python 3.13 wheels (months after the betas, and two weeks before stable release), and building (in CI) takes a full 4 minutes. But I agree there not that many scenarios you don't need it.

Including tqdm and pandas in the core could be a way to go, to enable batch_run and DataCollector by default. Curious what the rest thinks about that.

@rht
Copy link
Contributor

rht commented Sep 20, 2024

I'm not sure of the performance impact, but one could import pandas inside of the DataCollector class instead of globally within the datacollection.py file. This way, if an import error happens, you could provide a helpful error message saying that pandas needs to be installed. Needs to see how Solara does it with their FigureMatplotlib component (Matplotlib is not a dependency of Solara). Also, Solara has recently separated the heavy server code from their core package. I read it somewhere in a thread, that it is possibly solara-ui and solara-server, but couldn't find an evidence of this after reading their pyproject.toml.

@EwoutH
Copy link
Member Author

EwoutH commented Sep 20, 2024

I'm not sure of the performance impact, but one could import pandas inside of the DataCollector class instead of globally within the datacollection.py file.

I think we can do this quite elegantly:

class DataCollector:
    ....

    @cached_property
    def _pd(self):
        """Lazy-loaded pandas module using cached_property."""
        try:
            import pandas as pd
            return pd
        except ImportError as e:
            raise ImportError(
                "Pandas is required for this operation. Please install it using 'pip install pandas'."
            ) from e
            
    def get_model_vars_dataframe(self):
        ...
        return self._pd.DataFrame(self.model_vars)

Also, in the datacollector, Pandas is generally only used on methods that are ran after the model is done, like get_model_vars_dataframe. The lazy import allows to not install in on - let's say - a super computer, and do the analysis afterwards.

However, I'm also still considering making tqdm and pandas required dependencies again.

@rht
Copy link
Contributor

rht commented Sep 21, 2024

I was stating pandas just as an illustration, as it applies to other optional dependencies such as matplotlib. This is how Solara does it for Altair: https://github.com/widgetti/solara/blob/2cf59d9bb40ed7d10d976c9c44c292d61e01ec89/solara/components/figure_altair.py#L24.

Copy link
Member

@tpike3 tpike3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EwoutH This is awesome!

I had one question on the import and thinking through latest and stable readthedocs pages. Not sure the best answer, except may be calling explicitly that there is a difference.

My thought on the discussion is and to @quaquel's comments of "batteries included" can you set it up so all is installed by default (e.g. if no specification install all or rec) this is good for new users but for more advanced users they can specify specific dependencies they want. (e.g. which sub categories they want)

@EwoutH
Copy link
Member Author

EwoutH commented Sep 22, 2024

Thanks for reviewing! I answered your comment on the docs.

I will try to update the introduction tutorial, but we probably have to rewrite large parts of it. I did a small update in #2315, but removing the schedulers requires completely rewriting the Agent activation parts.

@tpike3
Copy link
Member

tpike3 commented Sep 22, 2024

Thanks for reviewing! I answered your comment on the docs.

I will try to update the introduction tutorial, but we probably have to rewrite large parts of it. I did a small update in #2315, but removing the schedulers requires completely rewriting the Agent activation parts.

Ha just saw #2315 and went right into the code so didn't see everyone else comments, as such I updated my comment.

Thanks for all this work, this is such great progress!

@tpike3
Copy link
Member

tpike3 commented Sep 22, 2024

Just want to make sure you know the readthedocs build is failing due to datacollecter; it also seems there is a bunch of collisions on the names

image

Think about what to do with that
@EwoutH
Copy link
Member Author

EwoutH commented Sep 26, 2024

Okay, I vastly simplified this PR, by keeping Pandas and tqdm core dependencies.

Now only the pyproject.toml and Readme are updated. Please review.

@EwoutH EwoutH removed the breaking Release notes label label Sep 26, 2024
Copy link
Member

@quaquel quaquel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry one last question about cookiecutter. But yes this looks fine and clear now.

@Corvince
Copy link
Contributor

Okay, I vastly simplified this PR, by keeping Pandas and tqdm core dependencies.

Now only the pyproject.toml and Readme are updated. Please review.

Its indeed much better now, although pandas is really a mixed bag. Previously I would have said everyone who uses mesa should probably also use pandas anyway, but nowadays with polars, ibis and others this is much less true. And, as far as I remember, we only use pandas in one place, to convert the data collected data into a dataframe. I don't think that is really worth the dependency. But lets maybe do that in a separate PR, to move this one forward.

@quaquel
Copy link
Member

quaquel commented Sep 27, 2024

Okay, I vastly simplified this PR, by keeping Pandas and tqdm core dependencies.
Now only the pyproject.toml and Readme are updated. Please review.

Its indeed much better now, although pandas is really a mixed bag. Previously I would have said everyone who uses mesa should probably also use pandas anyway, but nowadays with polars, ibis and others this is much less true. And, as far as I remember, we only use pandas in one place, to convert the data collected data into a dataframe. I don't think that is really worth the dependency. But lets maybe do that in a separate PR, to move this one forward.

I am inclined, as part of the broader overhoal of data collection, to no longer include any to_dataframe style helper method and leave this completely to the user. For example, as long as we store agent level data in a dict of dicts (dict[agent.unique_id] = {attr1:list, attr2:list} or something similar that is easy to convert to your dataframe library of choice, we don't need to have pandas as a dependency.

@EwoutH
Copy link
Member Author

EwoutH commented Sep 27, 2024

although pandas is really a mixed bag

Fully agree

But lets maybe do that in a separate PR, to move this one forward.

Also agree


I'm merging, follow-up PRs can be made as everyone sees fit.

@EwoutH EwoutH merged commit df51862 into main Sep 27, 2024
10 checks passed
@EwoutH EwoutH added the enhancement Release notes label label Sep 27, 2024
@EwoutH EwoutH deleted the opt branch September 27, 2024 09:51
README.md Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Release notes label maintenance Release notes label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove all library requirements
5 participants