Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor water demand and supply for readability and separation-of-concerns #319

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

Wegatriespython
Copy link

Refactored Water Demand and Supply For Readability and Separation of Concerns

Short Desc : model/water/data/demands.py and model/water/data/water_supply.py are drop in replacements for water_supply_legacy and demands_legacy. They have been refactored to take in rule based input which gives the parameters and data information and the files them

Long Desc :
PR aims to clean up code in demands and water_supply. Notably address the following issues :

  • Repeated logic for make_df operations for different technologies with slight differences.
  • Hardcoding data into the code
  • Messy control flow and poor readability

It addresses these things by moving all data to rules files which contain instructions based on the specifics of the input data along with information like units. water_supply and demands now use these rules and no longer hard code data specifics. Structural Pattern matching introduced in python 3.10 is used to handle repeated logic and clear up control flow for readability over nested if-else statements. Requires dropping support for python <3.10

How to review

Required: Review changes in implementation, read the diff, check the tests set up for issues. Verify documentation and ask for clarifying questions on the new workflow to help augment documentation.

PR checklist

  • Continuous integration checks all ✅
  • Add or expand tests; coverage checks both ✅
  • Add, expand, or update documentation.
  • Update doc/whatsnew.

Comment on lines +229 to +237
match len(unique_prefixes):
case 0:
return eval(expr, {}, {})
case 1:
local_context = {unique_prefixes[0]: df_processed}
case 2 if df_processed2 is not None:
local_context = {unique_prefixes[0]: df_processed, unique_prefixes[1]: df_processed2}
case _:
raise ValueError(f"Expression '{expr}' uses more than two different dataframes: {unique_prefixes}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's great to see you're already using these :)

However, this feature was only introduced with Python 3.10 and message-ix-models still supports 3.9. So if you want to keep these match statements, you'll have to apply markers that this function (and similar functions plus all functions referencing them) only work with Python 3.10 and above. Alternatively, you can try to replicate this using if statements.

@khaeru
Copy link
Member

khaeru commented Mar 26, 2025

Thanks for this contribution, @Wegatriespython, including a detailed PR description. Without yet looking into the particular changes, one reaction:

Requires dropping support for python <3.10.

This is contrary to our Upstream version policy, so we can't do this.

Some options, in no particular order:

  • Revise the code to avoid language features that are not available in still-supported versions of Python. I find this a handy reference for this purpose.
  • Ensure the module(s) containing certain language features are not imported until/unless used, and:
    • Decorate the code that imports them using the minimum_version utility (see the code for examples).
    • Mark tests appropriately so that they XFAIL on Python 3.9.
    • Declare clearly in the documentation that the code is only usable with certain version(s) of Python.
  • Postpone the PR until ca. October/November of this year, when Python 3.9 will reach EOL and we will drop support for it.

(I see @glatterf42 was quicker than me, as usual 😅)

@@ -135,7 +136,7 @@ def shares(
def apply_act_cap_multiplier(
df: pd.DataFrame,
hold_cost: pd.DataFrame,
cap_fact_parent: Optional[pd.DataFrame] = None,
cap_fact_parent: pd.DataFrame = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're going to have a default of None for cap_fact_parent, your type hint cannot be just pd.DataFrame. None and pd.Dataframe are not compatible. If mypy were enabled for the water files, as it should be, it would tell you as much.
Optional[pd.DataFrame] essentially means pd.DataFrame or None are both okay (which they should be given your choice of default).
It's then up to the code below to figure out how to handle the None default.

Alternatively, you could set the default to pd.DataFrame() (and empty dataframe). You would likely still have to handle this default case separately as en empty dataframe has no column names, no index, etc.

@khaeru khaeru changed the title Refactored Water Demand and Supply For Readability and Separation of Concerns Refactor water demand and supply for readability and separation-of-concerns Mar 26, 2025
@khaeru khaeru added enh New features or functionality water MESSAGEix-Nexus (water) variant labels Mar 26, 2025
Copy link
Member

@glatterf42 glatterf42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, our CI is failing for message-ix-models because a PR was merged that erroneously increased the complexity of the cool_tech() function. I will therefore create a PR to fix this, which will need to make changes to the water_for_ppl file. Please rebase this PR on top of main after we merge this fixing PR (which should hopefully happen today already).

In preparation for this, I noticed several changes for which I don't see any reason. Originally, I though single comments might suffice, but as they became more, I made this a review. This is not complete, though, and the other files will requires a similarly thorough check still before being merged.

Comment on lines 937 to 942
if param_name in multip_list:
df_param_share = apply_act_cap_multiplier(
df_param, hold_cost, cap_fact_parent, param_name
)
else:
df_param_share = df_param
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, these single comments are becoming enough to count as a small review, even though this is not intended to be a complete review.

Why are you changing this? The lines we had before did the exact same thing except they reduce the complexity measurement of the cool_tech() function. This is something I will have to open a dedicated PR for because the function is too complex for our own code quality standards.
It also increases the number of test cases needed to completely cover this function.
I'm against this change unless you have a good reason for it.

@@ -658,7 +657,7 @@ def cool_tech(context: "Context") -> dict[str, pd.DataFrame]:
year_list = [2020, 2010, 2030, 2050, 2000, 2080, 1990]

for year in year_list:
log.debug(f"cool_tech() for year '{year}'")
print(year)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you replacing the log.debug() functionality with print()?
Having print() everywhere will bloat the output a user receives, while log.debug() will only print() its message when the logging level requires it, leaving a much better experience for most users. Same below for log.warning().

f"Warning: Some combinations are still missing even after trying all "
f"years: {still_missing}"
print(
f"Warning: Some combinations are still missing even after trying all years: {still_missing}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above for log.debug().

@glatterf42
Copy link
Member

One note about the CI checks: PRs from forks do not get access to a repository's secrets per default to prevent exposing them to just anyone. We figured out a workaround for this since our tests need a GAMS license, which is stored in a secret: when a PR from a fork is opened, it needs to be labelled "safe to test" in order to run the tests. If you want the tests to pass (and you're able to label your PRs, which you should be, otherwise please let me know), please apply this label to your PRs :)

@@ -181,8 +180,7 @@ def apply_act_cap_multiplier(
df = df.merge(cap_fact_parent, how="left")
df["value"] *= df["cap_fact"] * 1.2 # flexibility
df.drop(columns="cap_fact", inplace=True)
# remove if there are Nan values, but write a log that inform on the parameter and
# the head of the data
# remove if there are Nan values, but write a log that inform on the parameter and the head of the data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here and in all comments, it was the policy to keep also commented text within the black limits of 88 characters. Although black might not detect it automatically. At least I was asked to do to in previous PRs :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, that's also true! We don't use black anymore, but ruff should detect the same thing. This makes me wonder: @Wegatriespython, do you know that you can set up ruff and mypy (our main code quality tools) to run automatically whenever you hit "save"?
This will make your life much easier in the long run and avoid cases like this :)

@glatterf42
Copy link
Member

Another general comment (sorry if this is a lot, but I figure it's best to let you know early-on): we have a documentation page with a guide for contributing and we would like everyone, including team members, to follow that. We frequently have to remind even experienced team members about this, but I wish that weren't the case and you could be a great example that it needn't be :)
In particular, we want to use the seven rules of a great Git commit message. This means we want the commit message to start with a capitalized verb and explain what the commit is doing. In this case, "Demand fully refactored and tested" should become something like "Refactor and test demand". For "Refactoring attempt 1", I'm not even sure what else this could be without looking at the changes made because the commit message is so vague.

Bonus tip for keeping a clear git history and avoiding commit messages like "refactor 1", "refactor 2", etc: Make one commit (e.g. "Refactor demand"), and when you find you need to fix/expand just that one commit, you git add your files and run git commit --fixup <hash of the original commit>. Then, you do a git rebase -i HEAD~<N>, where N is the number of commits since the original commit (but including the original commit).

Please let me know if you have any questions about this :)

@Wegatriespython
Copy link
Author

Dear all thank you for your quick engagement.

I realised there are many issues with this PR, the main one is that I was supposed to make a seperate branch not merge into main, it was my oversight along with other things.

I also think water_for_ppl.py is an old version from either a pre-merge branch its not supposed to be here, again my oversight. Can you tell me what I should do to make this my branch on the main repo and then ill start working on the highligted issues?

Thank you for your patience and attention

@adrivinca
Copy link
Contributor

adrivinca commented Mar 26, 2025

Thanks Paul and Frido for the comments!
maybe to clarify, the PR is not ready for review yet. I think Vignesh clicked on Frido as reviewer by mistake as I told him to tag me.
so @glatterf42 very appreciated feedback, especially at the beginning. But we can ping you when things are ready so that your time spent on this is minimized :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enh New features or functionality safe to test water MESSAGEix-Nexus (water) variant
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants