Generic statistics #510

Draft: wants to merge 4 commits into main


jsmariegaard (Member)

New method `statistics` in the `generic` module, which calculates min, max and mean for any dfs file.
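The following sketch illustrates the kind of computation such a method performs. It computes element-wise min, max and mean over time steps in a single pass. It relies on several assumptions that the PR does not confirm. Statistics are taken per element across time. Each time step arrives as a 1-D NumPy array, for example one item read with `.ReadItemTimeStep`. dfs delete values are ignored. `timestep_statistics` is a hypothetical name and is not the signature of the method in this PR.

```python
import numpy as np


def timestep_statistics(timesteps):
    """Hypothetical helper: single-pass element-wise min, max and mean.

    Assumes every item in `timesteps` is a 1-D array of the same shape
    (one time step of one item); dfs delete values are NOT filtered out.
    """
    it = iter(timesteps)
    try:
        first = np.asarray(next(it), dtype=float)
    except StopIteration:
        raise ValueError("no time steps to aggregate") from None
    vmin = first.copy()
    vmax = first.copy()
    total = first.copy()
    n = 1
    for x in it:
        x = np.asarray(x, dtype=float)
        np.minimum(vmin, x, out=vmin)  # running element-wise minimum
        np.maximum(vmax, x, out=vmax)  # running element-wise maximum
        total += x  # running sum, divided by n at the end for the mean
        n += 1
    return vmin, vmax, total / n


# Toy data: one array per time step
data = [
    np.array([10, 15, 22]),  # t=0
    np.array([12, 18, 25]),  # t=1
    np.array([20, 21, 23]),  # t=2
]
vmin, vmax, vmean = timestep_statistics(data)
```

Only one time step is held in memory at a time, plus three accumulator arrays of the same size. This is what would make the approach usable on dfs files that are too large to load at once.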

ecomodeller (Member)

@jsmariegaard still relevant?

jsmariegaard (Member, Author)

> @jsmariegaard still relevant?

Yes, I think this functionality would be of great value. I'm afraid I don't have time to finish it myself, though, and I think it would be easier to start a new branch and copy the relevant parts of this PR over. Maybe a task for @otzi5300?

ecomodeller (Member)

It would be fantastic if we could support arbitrary aggregation using, e.g., `functools.reduce` (https://realpython.com/python-reduce-function/), but I don't know whether it is possible.
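The following shows one possible shape for the reduce idea. `aggregate` is a hypothetical helper that folds any `reducer(acc, x)` over the time steps with `functools.reduce`, then applies an optional `finalize` step. It is a sketch of an API design under assumptions, not something that exists in the `generic` module. The data are toy arrays, one per time step.

```python
from functools import reduce

import numpy as np


def aggregate(timesteps, reducer, initial, finalize=lambda acc: acc):
    """Hypothetical API sketch: fold `reducer(acc, x)` over time steps.

    `initial` is the starting accumulator; `finalize` maps the final
    accumulator to the result (identity by default).
    """
    return finalize(reduce(reducer, timesteps, initial))


data = [
    np.array([10, 15, 22]),  # t=0
    np.array([12, 18, 25]),  # t=1
    np.array([20, 21, 23]),  # t=2
]
n_elements = len(data[0])

# Element-wise maximum: the accumulator is the running max array,
# and np.maximum already has the (acc, x) -> new_acc signature.
vmax = aggregate(data, np.maximum, np.full(n_elements, -np.inf))

# Element-wise mean: the accumulator is (running sum, number of steps).
vmean = aggregate(
    data,
    lambda acc, x: (acc[0] + x, acc[1] + 1),
    (np.zeros(n_elements), 0),
    finalize=lambda acc: acc[0] / acc[1],
)
```

Min, max, mean and exceedance counts all fit this pattern because each can be updated one time step at a time. Statistics that need the full time series at once, such as exact percentiles, do not fit it directly.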

jsmariegaard (Member, Author)

Yeah, that would be cool. It also reminds me of our array API ideas.

ecomodeller (Member)

E.g. exceedance probability (according to 🤖):

```python
import numpy as np
from functools import reduce

# Example data: each entry is one time step (t=0, t=1, ...), e.g. as read with `.ReadItemTimeStep`
data = [
    np.array([10, 15, 22]),  # t=0
    np.array([12, 18, 25]),  # t=1
    np.array([20, 21, 23]),  # t=2
]

thresholds = np.array([10, 15, 20])

def reducer(acc, x):
    exceedance_counts, total_count = acc
    # For each threshold, count the values in this time step that exceed it
    # (builds a new array instead of mutating the initial accumulator in place)
    exceedance_counts = exceedance_counts + (x[:, None] > thresholds).sum(axis=0)
    total_count += len(x)
    return (exceedance_counts, total_count)

initial_counts = np.zeros(len(thresholds), dtype=int)
initial = (initial_counts, 0)

exceedance_counts, total_count = reduce(reducer, data, initial)
# Fraction of all values (pooled over time steps and elements) above each threshold
exceedance_probs = exceedance_counts / total_count

results = {t: prob for t, prob in zip(thresholds, exceedance_probs)}
```
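The snippet above pools every value from every time step into one probability per threshold. Sometimes the goal is instead the fraction of time steps in which each element exceeds a threshold, which gives a per-element map rather than a single number. The same reduce pattern adapts to that case. The sketch below uses a single illustrative threshold of 20, which is an assumption and does not come from the thread.

```python
from functools import reduce

import numpy as np

data = [
    np.array([10, 15, 22]),  # t=0
    np.array([12, 18, 25]),  # t=1
    np.array([20, 21, 23]),  # t=2
]
threshold = 20  # illustrative value, not from the PR


def count_exceedances(acc, x):
    counts, n_steps = acc
    # Per element: add 1 where this time step exceeds the threshold.
    # Returns new objects rather than mutating the accumulator.
    return counts + (x > threshold), n_steps + 1


initial = (np.zeros(len(data[0]), dtype=int), 0)
counts, n_steps = reduce(count_exceedances, data, initial)
# Fraction of time steps in which each element exceeds the threshold
exceedance_freq = counts / n_steps
```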
