Refactor water demand and supply for readability and separation-of-concerns #319

Wegatriespython · 2025-03-26T08:43:19Z

Refactored Water Demand and Supply For Readability and Separation of Concerns

Short Desc: model/water/data/demands.py and model/water/data/water_supply.py are drop-in replacements for water_supply_legacy and demands_legacy. They have been refactored to take rule-based input, which provides the parameter and data information that the files then use.

Long Desc:
This PR aims to clean up the code in demands and water_supply. In particular, it addresses the following issues:

Repeated logic for make_df operations for different technologies with slight differences.
Hardcoding data into the code
Messy control flow and poor readability

It addresses these issues by moving all data to rules files, which contain instructions based on the specifics of the input data along with information such as units. water_supply and demands now use these rules and no longer hardcode data specifics. Structural pattern matching, introduced in Python 3.10, is used instead of nested if-else statements to handle repeated logic and make the control flow easier to read. This requires dropping support for Python < 3.10.
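The rules-file idea described above can be illustrated with a minimal sketch. The rule entries, their keys (`operation`, `factor`, `value`), the technology names and the functions `evaluate_rule` and `build_rows` are all hypothetical, since the actual rules files are not shown in this PR text; in the real modules the resulting rows would presumably be passed on to `make_df`. Dispatch here uses if/elif rather than `match`, so it also runs on Python 3.9.

```python
from typing import Any

# Hypothetical rule entries: the real rules files, keys and technology
# names used in the PR are not shown here.
RULES: list[dict[str, Any]] = [
    {"technology": "tech_a", "unit": "km3/year", "operation": "scale", "factor": 1.0},
    {"technology": "tech_b", "unit": "km3/year", "operation": "scale", "factor": 0.5},
    {"technology": "tech_c", "unit": "km3/year", "operation": "constant", "value": 2.0},
]


def evaluate_rule(rule: dict[str, Any], base_value: float) -> float:
    """Compute one value from a rule; if/elif keeps this Python 3.9-compatible."""
    op = rule["operation"]
    if op == "scale":
        return base_value * rule["factor"]
    elif op == "constant":
        return rule["value"]
    raise ValueError(f"Unknown operation {op!r} for {rule['technology']!r}")


def build_rows(rules: list[dict[str, Any]], base_value: float) -> list[dict[str, Any]]:
    """One generic loop replaces per-technology copies of near-identical code."""
    return [
        {
            "technology": r["technology"],
            "unit": r["unit"],
            "value": evaluate_rule(r, base_value),
        }
        for r in rules
    ]
```

Adding a technology then means adding a rule entry rather than another near-duplicate code block.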

How to review

Required: Review the changes in the implementation, read the diff, and check the test setup for issues. Verify the documentation and ask clarifying questions about the new workflow to help improve the documentation.

PR checklist

Continuous integration checks all ✅
Add or expand tests; coverage checks both ✅
Add, expand, or update documentation.
Update doc/whatsnew.

glatterf42 · 2025-03-26T09:09:07Z

message_ix_models/model/water/data/demand_rules.py

+    match len(unique_prefixes):
+        case 0:
+            return eval(expr, {}, {})
+        case 1:
+            local_context = {unique_prefixes[0]: df_processed}
+        case 2 if df_processed2 is not None:
+            local_context = {unique_prefixes[0]: df_processed, unique_prefixes[1]: df_processed2}
+        case _:
+            raise ValueError(f"Expression '{expr}' uses more than two different dataframes: {unique_prefixes}")


It's great to see you're already using these :)

However, this feature was only introduced with Python 3.10, and message-ix-models still supports 3.9. So if you want to keep these match statements, you'll have to mark this function (and similar functions, plus all functions referencing them) as working only with Python 3.10 and above. Alternatively, you can try to replicate this using if statements.
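A minimal sketch of the if-based alternative for the `match` block above. The function name `eval_expression` is hypothetical, and since the diff is truncated, it is an assumption that the original continues by evaluating `expr` against `local_context`. The DataFrame arguments are typed as `Any` here; in the real code they would be `pd.DataFrame`.

```python
from typing import Any


def eval_expression(
    expr: str,
    unique_prefixes: list[str],
    df_processed: Any,  # pd.DataFrame in the real code
    df_processed2: Any = None,  # optional second pd.DataFrame
) -> Any:
    """Evaluate `expr` with up to two named frames in scope (Python 3.9-compatible)."""
    n = len(unique_prefixes)
    if n == 0:
        # Expression references no dataframe at all
        return eval(expr, {}, {})
    elif n == 1:
        local_context = {unique_prefixes[0]: df_processed}
    elif n == 2 and df_processed2 is not None:
        local_context = {
            unique_prefixes[0]: df_processed,
            unique_prefixes[1]: df_processed2,
        }
    else:
        # Same fall-through as `case _`: also reached with two prefixes
        # but no second frame
        raise ValueError(
            f"Expression '{expr}' uses more than two different dataframes: "
            f"{unique_prefixes}"
        )
    # Assumption: the original function finishes like this
    return eval(expr, {}, local_context)
```

Each `case` maps onto one branch, including the guard `case 2 if df_processed2 is not None`, which becomes a compound `elif` condition.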

khaeru · 2025-03-26T09:12:54Z

Thanks for this contribution, @Wegatriespython, including a detailed PR description. Without yet looking into the particular changes, one reaction:

Requires dropping support for python <3.10.

This is contrary to our Upstream version policy, so we can't do this.

Some options, in no particular order:

Revise the code to avoid language features that are not available in still-supported versions of Python. I find this a handy reference for this purpose.
Ensure the module(s) containing certain language features are not imported until/unless used, and:
- Decorate the code that imports them using the minimum_version utility (see the code for examples).
- Mark tests appropriately so that they XFAIL on Python 3.9.
- Declare clearly in the documentation that the code is only usable with certain version(s) of Python.
Postpone the PR until ca. October/November of this year, when Python 3.9 will reach EOL and we will drop support for it.
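The second option above can be sketched with the standard library alone. The decorator `requires_python` and the function `load_rules_module` are hypothetical and are not the package's own `minimum_version` utility, whose exact usage is not shown here. The key point is the deferred import: a module containing `match` statements is only compiled when the decorated function runs, so importing the caller stays safe on Python 3.9.

```python
import functools
import importlib
import sys
from typing import Any, Callable


def requires_python(major: int, minor: int) -> Callable:
    """Hypothetical decorator: refuse to run the wrapped function on older Pythons."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            if sys.version_info < (major, minor):
                raise NotImplementedError(
                    f"{func.__name__}() requires Python >= {major}.{minor}"
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@requires_python(3, 10)
def load_rules_module(module_name: str) -> Any:
    # Deferred import: the target module is only compiled here, so a
    # `match` statement in it cannot raise SyntaxError at import time of
    # this file on Python 3.9.
    return importlib.import_module(module_name)
```

On the test side, pytest offers a conditional marker such as `pytest.mark.xfail(sys.version_info < (3, 10), reason="requires Python 3.10")`.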

(I see @glatterf42 was quicker than me, as usual 😅)

glatterf42 · 2025-03-26T09:12:58Z

message_ix_models/model/water/data/water_for_ppl.py

@@ -135,7 +136,7 @@ def shares(
 def apply_act_cap_multiplier(
    df: pd.DataFrame,
    hold_cost: pd.DataFrame,
-    cap_fact_parent: Optional[pd.DataFrame] = None,
+    cap_fact_parent: pd.DataFrame = None,


If you're going to have a default of None for cap_fact_parent, your type hint cannot be just pd.DataFrame. None and pd.DataFrame are not compatible. If mypy were enabled for the water files, as it should be, it would tell you as much.
Optional[pd.DataFrame] essentially means pd.DataFrame or None are both okay (which they should be given your choice of default).
It's then up to the code below to figure out how to handle the None default.

Alternatively, you could set the default to pd.DataFrame() (an empty DataFrame). You would likely still have to handle this default case separately, as an empty DataFrame has no column names, no index, etc.
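A minimal sketch of the `Optional` approach, assuming a simplified version of the merge-and-scale step shown in the diff. The function name `scale_by_cap_factor` and the example columns `node` and `year` are hypothetical, and the real `apply_act_cap_multiplier` takes more arguments. `Optional[...]` is used rather than `pd.DataFrame | None`, because that union syntax fails at runtime in annotations on Python 3.9.

```python
from typing import Optional

import pandas as pd


def scale_by_cap_factor(
    df: pd.DataFrame, cap_fact_parent: Optional[pd.DataFrame] = None
) -> pd.DataFrame:
    """Hypothetical simplification of the cap_fact_parent handling."""
    if cap_fact_parent is None:
        # The None default is handled explicitly: return an unchanged copy
        return df.copy()
    # Left merge on the columns both frames share
    out = df.merge(cap_fact_parent, how="left")
    out["value"] *= out["cap_fact"] * 1.2  # flexibility margin, as in the diff
    return out.drop(columns="cap_fact")
```

With this signature, mypy accepts both call styles, and the `None` branch makes the default case explicit.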

glatterf42

Currently, our CI is failing for message-ix-models because a PR was merged that erroneously increased the complexity of the cool_tech() function. I will therefore create a PR to fix this, which will need to make changes to the water_for_ppl file. Please rebase this PR on top of main after we merge this fixing PR (which should hopefully happen today already).

In preparation for this, I noticed several changes for which I don't see any reason. Originally, I thought single comments might suffice, but as they grew in number, I made this a review. This is not complete, though, and the other files will still require a similarly thorough check before being merged.

glatterf42 · 2025-03-26T09:15:57Z

message_ix_models/model/water/data/water_for_ppl.py

+        if param_name in multip_list:
+            df_param_share = apply_act_cap_multiplier(
+                df_param, hold_cost, cap_fact_parent, param_name
+            )
+        else:
+            df_param_share = df_param


Okay, these single comments are becoming enough to count as a small review, even though this is not intended to be a complete review.

Why are you changing this? The lines we had before did exactly the same thing, except that they reduced the complexity measurement of the cool_tech() function. Reducing that complexity is something I will have to open a dedicated PR for, because the function is too complex for our own code quality standards.
It also increases the number of test cases needed to completely cover this function.
I'm against this change unless you have a good reason for it.

glatterf42 · 2025-03-26T09:17:37Z

message_ix_models/model/water/data/water_for_ppl.py

@@ -658,7 +657,7 @@ def cool_tech(context: "Context") -> dict[str, pd.DataFrame]:
    year_list = [2020, 2010, 2030, 2050, 2000, 2080, 1990]

    for year in year_list:
-        log.debug(f"cool_tech() for year '{year}'")
+        print(year)


Why are you replacing the log.debug() functionality with print()?
Having print() everywhere will bloat the output a user receives, while log.debug() only emits its message when the configured logging level allows it, leaving a much better experience for most users. The same applies to log.warning() below.
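A minimal sketch of the difference, using only the standard library `logging` module. The logger name `water_sketch`, the function `report_years` and the `ListHandler` helper are hypothetical, for illustration only. It uses `%`-style arguments instead of f-strings, so each message is only formatted if the record is actually emitted.

```python
import logging

# Hypothetical logger name; module code would use logging.getLogger(__name__)
log = logging.getLogger("water_sketch")


def report_years(year_list: list[int], still_missing: list[str]) -> None:
    for year in year_list:
        # Only emitted when the logger's effective level is DEBUG or lower
        log.debug("cool_tech() for year '%s'", year)
    if still_missing:
        # Emitted at the default WARNING level, unlike the debug lines above
        log.warning(
            "Some combinations are still missing even after trying all years: %s",
            still_missing,
        )


class ListHandler(logging.Handler):
    """Collect emitted messages in a list, e.g. for inspection in tests."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())
```

At WARNING level a user only sees the missing-combinations message; switching the logger to DEBUG reveals the per-year lines without any code change.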

glatterf42 · 2025-03-26T09:17:49Z

message_ix_models/model/water/data/water_for_ppl.py

-            f"Warning: Some combinations are still missing even after trying all "
-            f"years: {still_missing}"
+        print(
+            f"Warning: Some combinations are still missing even after trying all years: {still_missing}"


Same question as above for log.debug().

glatterf42 · 2025-03-26T09:24:20Z

One note about the CI checks: PRs from forks do not get access to a repository's secrets by default, to prevent exposing them to just anyone. We figured out a workaround for this since our tests need a GAMS license, which is stored in a secret: when a PR from a fork is opened, it needs to be labelled "safe to test" in order to run the tests. If you want the tests to pass (and you're able to label your PRs, which you should be, otherwise please let me know), please apply this label to your PRs :)

adrivinca · 2025-03-26T09:35:09Z

message_ix_models/model/water/data/water_for_ppl.py

@@ -181,8 +180,7 @@ def apply_act_cap_multiplier(
        df = df.merge(cap_fact_parent, how="left")
        df["value"] *= df["cap_fact"] * 1.2  # flexibility
        df.drop(columns="cap_fact", inplace=True)
-    # remove if there are Nan values, but write a log that inform on the parameter and
-    # the head of the data
+    # remove if there are Nan values, but write a log that inform on the parameter and the head of the data


I think the policy was to keep comment text within black's 88-character line limit as well, here and in all other comments, although black might not detect this automatically. At least, I was asked to do so in previous PRs :)

Good point, that's also true! We don't use black anymore, but ruff should detect the same thing. This makes me wonder: @Wegatriespython, do you know that you can set up ruff and mypy (our main code quality tools) to run automatically whenever you hit "save"?
This will make your life much easier in the long run and avoid cases like this :)

glatterf42 · 2025-03-26T09:35:14Z

Another general comment (sorry if this is a lot, but I figure it's best to let you know early on): we have a documentation page with a guide for contributing, and we would like everyone, including team members, to follow it. We frequently have to remind even experienced team members about this, but I wish that weren't the case, and you could be a great example that it needn't be :)
In particular, we want to use the seven rules of a great Git commit message. This means we want the commit message to start with a capitalized verb and explain what the commit is doing. In this case, "Demand fully refactored and tested" should become something like "Refactor and test demand". For "Refactoring attempt 1", I'm not even sure what else this could be without looking at the changes made because the commit message is so vague.

Bonus tip for keeping a clear git history and avoiding commit messages like "refactor 1", "refactor 2", etc.: make one commit (e.g. "Refactor demand"), and when you find you need to fix or expand just that one commit, git add your files and run git commit --fixup <hash of the original commit>. Then run git rebase -i --autosquash HEAD~<N>, where N is the number of commits since the original commit (including the original commit); --autosquash moves the fixup commit next to the original and marks it to be squashed into it.

Please let me know if you have any questions about this :)

Wegatriespython · 2025-03-26T09:45:58Z

Dear all, thank you for your quick engagement.

I realise there are many issues with this PR. The main one is that I was supposed to create a separate branch rather than merge into main; that was my oversight, along with other things.

I also think water_for_ppl.py is an old version from a pre-merge branch; it's not supposed to be here, again my oversight. Can you tell me what I should do to make this my branch on the main repo? Then I'll start working on the highlighted issues.

Thank you for your patience and attention

adrivinca · 2025-03-26T10:52:02Z

Thanks Paul and Frido for the comments!
Maybe to clarify: the PR is not ready for review yet. I think Vignesh selected Frido as reviewer by mistake, as I had told him to tag me.
So, @glatterf42, your feedback is much appreciated, especially at this early stage. But we can ping you when things are ready so that the time you spend on this is minimized :)


Wegatriespython added 12 commits March 26, 2025 09:24

Refactoring attempt 1

679ab28

Refactoring attempt 1

4090ffc

Refactoring attempt 2

90054f8

Supply & Demand

f7341ca

Supply DSL Partial Working(removed DSL engine)

61ed052

Entering the hard part

b57d279

Water Supply Refactored and Tested

fd967dc

Demand pt3 refactored and tested

fe7bdbc

Demand fully refactored and tested

fa06059

Demand & Supply Refactored and Tested. Files Cleaned Up

40dcaba

Demand & Supply Refactored and Tested. Files Cleaned Up

87fea15

Demand & Supply Refactored and Tested. Files Cleaned Up. TOML updated

75f1f12

Wegatriespython self-assigned this Mar 26, 2025

Wegatriespython requested review from adrivinca and awais307 as code owners March 26, 2025 08:43

glatterf42 added the safe to test label Mar 26, 2025

glatterf42 reviewed Mar 26, 2025


khaeru changed the title ~~Refactored Water Demand and Supply For Readability and Separation of Concerns~~ Refactor water demand and supply for readability and separation-of-concerns Mar 26, 2025

khaeru added enh New features or functionality water MESSAGEix-Nexus (water) variant labels Mar 26, 2025

glatterf42 requested changes Mar 26, 2025


adrivinca reviewed Mar 26, 2025


Wegatriespython added 2 commits March 27, 2025 13:23

Restore water_for_ppl.py from upstream/main

22bcfb5

Restore message_ix_models/data/water/demands/harmonized/ZMB/all_rates…

bcb2257

…_SSP2.csv from upstream/main
