-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor water demand and supply for readability and separation-of-concerns #319
base: main
Are you sure you want to change the base?
Conversation
match len(unique_prefixes): | ||
case 0: | ||
return eval(expr, {}, {}) | ||
case 1: | ||
local_context = {unique_prefixes[0]: df_processed} | ||
case 2 if df_processed2 is not None: | ||
local_context = {unique_prefixes[0]: df_processed, unique_prefixes[1]: df_processed2} | ||
case _: | ||
raise ValueError(f"Expression '{expr}' uses more than two different dataframes: {unique_prefixes}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's great to see you're already using these :)
However, this feature was only introduced with Python 3.10 and message-ix-models still supports 3.9. So if you want to keep these match
statements, you'll have to apply markers that this function (and similar functions plus all functions referencing them) only work with Python 3.10 and above. Alternatively, you can try to replicate this using if
statements.
Thanks for this contribution, @Wegatriespython, including a detailed PR description. Without yet looking into the particular changes, one reaction:
This is contrary to our Upstream version policy, so we can't do this. Some options, in no particular order:
(I see @glatterf42 was quicker than me, as usual 😅) |
@@ -135,7 +136,7 @@ def shares( | |||
def apply_act_cap_multiplier( | |||
df: pd.DataFrame, | |||
hold_cost: pd.DataFrame, | |||
cap_fact_parent: Optional[pd.DataFrame] = None, | |||
cap_fact_parent: pd.DataFrame = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're going to have a default of None
for cap_fact_parent
, your type hint cannot be just pd.DataFrame
. None
and pd.Dataframe
are not compatible. If mypy were enabled for the water files, as it should be, it would tell you as much.
Optional[pd.DataFrame]
essentially means pd.DataFrame
or None
are both okay (which they should be given your choice of default).
It's then up to the code below to figure out how to handle the None
default.
Alternatively, you could set the default to pd.DataFrame()
(and empty dataframe). You would likely still have to handle this default case separately as en empty dataframe has no column names, no index, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, our CI is failing for message-ix-models because a PR was merged that erroneously increased the complexity of the cool_tech()
function. I will therefore create a PR to fix this, which will need to make changes to the water_for_ppl
file. Please rebase this PR on top of main
after we merge this fixing PR (which should hopefully happen today already).
In preparation for this, I noticed several changes for which I don't see any reason. Originally, I though single comments might suffice, but as they became more, I made this a review. This is not complete, though, and the other files will requires a similarly thorough check still before being merged.
if param_name in multip_list: | ||
df_param_share = apply_act_cap_multiplier( | ||
df_param, hold_cost, cap_fact_parent, param_name | ||
) | ||
else: | ||
df_param_share = df_param |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, these single comments are becoming enough to count as a small review, even though this is not intended to be a complete review.
Why are you changing this? The lines we had before did the exact same thing except they reduce the complexity measurement of the cool_tech()
function. This is something I will have to open a dedicated PR for because the function is too complex for our own code quality standards.
It also increases the number of test cases needed to completely cover this function.
I'm against this change unless you have a good reason for it.
@@ -658,7 +657,7 @@ def cool_tech(context: "Context") -> dict[str, pd.DataFrame]: | |||
year_list = [2020, 2010, 2030, 2050, 2000, 2080, 1990] | |||
|
|||
for year in year_list: | |||
log.debug(f"cool_tech() for year '{year}'") | |||
print(year) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you replacing the log.debug()
functionality with print()
?
Having print()
everywhere will bloat the output a user receives, while log.debug()
will only print()
its message when the logging level requires it, leaving a much better experience for most users. Same below for log.warning()
.
f"Warning: Some combinations are still missing even after trying all " | ||
f"years: {still_missing}" | ||
print( | ||
f"Warning: Some combinations are still missing even after trying all years: {still_missing}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as above for log.debug()
.
One note about the CI checks: PRs from forks do not get access to a repository's secrets per default to prevent exposing them to just anyone. We figured out a workaround for this since our tests need a GAMS license, which is stored in a secret: when a PR from a fork is opened, it needs to be labelled "safe to test" in order to run the tests. If you want the tests to pass (and you're able to label your PRs, which you should be, otherwise please let me know), please apply this label to your PRs :) |
@@ -181,8 +180,7 @@ def apply_act_cap_multiplier( | |||
df = df.merge(cap_fact_parent, how="left") | |||
df["value"] *= df["cap_fact"] * 1.2 # flexibility | |||
df.drop(columns="cap_fact", inplace=True) | |||
# remove if there are Nan values, but write a log that inform on the parameter and | |||
# the head of the data | |||
# remove if there are Nan values, but write a log that inform on the parameter and the head of the data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think here and in all comments, it was the policy to keep also commented text within the black limits of 88 characters. Although black might not detect it automatically. At least I was asked to do to in previous PRs :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, that's also true! We don't use black
anymore, but ruff
should detect the same thing. This makes me wonder: @Wegatriespython, do you know that you can set up ruff
and mypy
(our main code quality tools) to run automatically whenever you hit "save"?
This will make your life much easier in the long run and avoid cases like this :)
Another general comment (sorry if this is a lot, but I figure it's best to let you know early-on): we have a documentation page with a guide for contributing and we would like everyone, including team members, to follow that. We frequently have to remind even experienced team members about this, but I wish that weren't the case and you could be a great example that it needn't be :) Bonus tip for keeping a clear git history and avoiding commit messages like "refactor 1", "refactor 2", etc: Make one commit (e.g. "Refactor demand"), and when you find you need to fix/expand just that one commit, you Please let me know if you have any questions about this :) |
Dear all thank you for your quick engagement. I realised there are many issues with this PR, the main one is that I was supposed to make a seperate branch not merge into main, it was my oversight along with other things. I also think water_for_ppl.py is an old version from either a pre-merge branch its not supposed to be here, again my oversight. Can you tell me what I should do to make this my branch on the main repo and then ill start working on the highligted issues? Thank you for your patience and attention |
Thanks Paul and Frido for the comments! |
…_SSP2.csv from upstream/main
Refactored Water Demand and Supply For Readability and Separation of Concerns
Short Desc : model/water/data/demands.py and model/water/data/water_supply.py are drop in replacements for water_supply_legacy and demands_legacy. They have been refactored to take in rule based input which gives the parameters and data information and the files them
Long Desc :
PR aims to clean up code in demands and water_supply. Notably address the following issues :
It addresses these things by moving all data to rules files which contain instructions based on the specifics of the input data along with information like units. water_supply and demands now use these rules and no longer hard code data specifics. Structural Pattern matching introduced in python 3.10 is used to handle repeated logic and clear up control flow for readability over nested if-else statements. Requires dropping support for python <3.10
How to review
Required: Review changes in implementation, read the diff, check the tests set up for issues. Verify documentation and ask for clarifying questions on the new workflow to help augment documentation.
PR checklist