Description
Is your feature request related to a problem? Please describe.
We often have apply functions that look like the following (the grouping isn't important here), which end by appending a set of `ScmRun`s:

```python
import scmdata
from scmdata import ScmRun

def f(run: ScmRun) -> ScmRun:
    # Each call of f performs its own append
    return scmdata.run_append(
        [
            run.set_meta("col", True),
            run.set_meta("col", False),
        ]
    )

# df is an existing ScmRun
df.groupby("variable").apply(f)
```
Rather than performing n + 1 appends (one for each of the n calls of `f` and one to combine the results), a single append could be performed if `f` returns a list of `ScmRun` objects and the results are lazily appended together at the end of the groupby operation. `f` would then become:
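```python
from scmdata import ScmRun

def f(run: ScmRun) -> list[ScmRun]:
    # No append here: return the pieces and let groupby combine them once
    return [
        run.set_meta("col", True),
        run.set_meta("col", False),
    ]
```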
This should result in a small performance improvement when there are many groups.
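The append counts can be illustrated without scmdata itself. The sketch below is a toy model, not scmdata code: `ToyRun` is a hypothetical stand-in for `ScmRun`, `CountingAppend` is a hypothetical stand-in for the proposed `run_append` (it accepts a mix of runs and lists of runs), and `apply_groups` is a hypothetical stand-in for the groupby machinery. It only shows that n groups cost n + 1 appends in the current pattern and a single append in the proposed one.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Union


@dataclass
class ToyRun:
    """Hypothetical stand-in for ScmRun: just a list of metadata rows."""

    rows: List[dict] = field(default_factory=list)

    def set_meta(self, col: str, value) -> "ToyRun":
        # Return a new object with ``col`` set on every row; self is unchanged.
        return ToyRun([{**row, col: value} for row in self.rows])


class CountingAppend:
    """Stand-in for the proposed run_append: counts its calls and accepts a
    mix of runs and lists of runs, flattening one level before combining."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, runs: Sequence[Union[ToyRun, List[ToyRun]]]) -> ToyRun:
        self.calls += 1
        flat: List[ToyRun] = []
        for item in runs:
            flat.extend(item if isinstance(item, list) else [item])
        return ToyRun([row for run in flat for row in run.rows])


def apply_groups(groups, func, append):
    # Groupby-style apply: call ``func`` per group, then combine all results
    # with one final append. Results are passed through unchanged.
    return append([func(group) for group in groups])


groups = [ToyRun([{"variable": v}]) for v in ("a", "b", "c")]

# Current pattern: every call of f appends, plus one final append (n + 1).
eager_append = CountingAppend()

def f_eager(run):
    return eager_append([run.set_meta("col", True), run.set_meta("col", False)])

eager = apply_groups(groups, f_eager, eager_append)

# Proposed pattern: f returns a list, so only the final append runs (1).
lazy_append = CountingAppend()

def f_lazy(run):
    return [run.set_meta("col", True), run.set_meta("col", False)]

lazy = apply_groups(groups, f_lazy, lazy_append)

print(eager_append.calls, lazy_append.calls)  # 4 1
```

With three groups the current pattern calls the append function four times and the proposed one calls it once, while both produce the same six rows. The saving is one append per group, which is why any benefit is expected mainly when there are many groups.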
Describe the solution you'd like
Update `run_append` to handle appending runs of type `list[BaseScmRun | list[BaseScmRun]]`, i.e. a list containing a mix of `ScmRun`s and lists of `ScmRun`s.
This wouldn't require much, if any, change to the groupby code other than updating documentation.
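One possible shape for the `run_append` change is a flattening pre-processing step. This is a minimal sketch, assuming `run_append` keeps its existing logic for a flat list of runs and only gains this step; `flatten_runs` is a hypothetical helper name. It deliberately flattens a single level, matching the proposed `list[BaseScmRun | list[BaseScmRun]]` input type.

```python
from typing import List, Sequence, TypeVar, Union

T = TypeVar("T")


def flatten_runs(runs: Sequence[Union[T, List[T]]]) -> List[T]:
    """Flatten one level of nesting: each item is a run or a list of runs.

    Hypothetical helper; nested lists deeper than one level are not expanded.
    """
    flat: List[T] = []
    for item in runs:
        if isinstance(item, list):
            flat.extend(item)  # a list of runs returned by an apply function
        else:
            flat.append(item)  # a single run
    return flat


# Strings stand in for ScmRun objects here
print(flatten_runs(["run_a", ["run_b", "run_c"], "run_d"]))
# ['run_a', 'run_b', 'run_c', 'run_d']
```

Because the groupby code already passes the collected apply results to `run_append`, handling the nesting there keeps the groupby change to documentation, and other callers that build up lists of runs could use the same input form.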
Describe alternatives you've considered
Handling the return value of apply functions differently depending on whether it is an `ScmRun` or a list of `ScmRun`s. The proposed solution is more flexible, as similar functionality may be useful in other places.