Lazy append in groupby.apply #231

@lewisjared

Is your feature request related to a problem? Please describe.

We often have apply functions that look like the following (the grouping isn't important here) and end with an appending of a set of S

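import scmdata
from scmdata import ScmRun


def f(run: ScmRun) -> ScmRun:
    # Append two copies of the group that differ only in the "col" metadata
    return scmdata.run_append(
        [
            run.set_meta("col", True),
            run.set_meta("col", False),
        ]
    )


# df is assumed to be an existing ScmRun
df.groupby("variable").apply(f)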

Rather than performing n + 1 appends for n groups (one for each call of f and one to combine the results), a single append could be performed if f returns a list of ScmRun objects and the results are lazily appended together at the end of the groupby operation.

f would become:

def f(run) -> list[ScmRun]:
    return [
        run.set_meta("col", True),
        run.set_meta("col", False),
    ]

This should result in a small performance improvement for the case where there are lots of groups.
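To make the append counts concrete, here is a minimal sketch that uses plain Python lists as stand-ins for ScmRun objects. The `append` function, the `calls` counter and the `groups` data are all hypothetical and only illustrate the difference between appending inside each call and appending once at the end; they are not scmdata APIs.

```python
from typing import Dict, List

calls: Dict[str, int] = {"n": 0}  # counts how often the stand-in append runs


def append(runs: List[list]) -> list:
    """Stand-in for an append: concatenates plain lists and counts calls."""
    calls["n"] += 1
    return [row for run in runs for row in run]


def eager_f(run: list) -> list:
    # Appends inside every call, like the first version of f
    return append([run, run])


def lazy_f(run: list) -> List[list]:
    # Returns the pieces and leaves the append to the caller
    return [run, run]


groups = [[1], [2], [3]]  # three hypothetical groups

calls["n"] = 0
eager = append([eager_f(g) for g in groups])
eager_calls = calls["n"]  # one append per group plus the final combine

calls["n"] = 0
pieces = [piece for g in groups for piece in lazy_f(g)]
lazy = append(pieces)
lazy_calls = calls["n"]  # a single append at the end

print(eager_calls, lazy_calls)  # prints "4 1"
```

Both paths produce the same combined result; only the number of append operations differs, which is where any saving would come from.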

Describe the solution you'd like

Update run_append to handle appending runs of type list[BaseScmRun | list[BaseScmRun]], i.e. a list that mixes ScmRuns and lists of ScmRuns.

This wouldn't require much, if any, change to the groupby code other than updating documentation.
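One way the mixed input could be handled is to flatten one level of nesting before the existing append logic runs. The sketch below is only an illustration of that idea: `flatten_runs` is a hypothetical helper, not part of scmdata, and strings stand in for run objects so the example runs without the library.

```python
from typing import Iterable, List, TypeVar, Union

T = TypeVar("T")


def flatten_runs(items: Iterable[Union[T, List[T]]]) -> List[T]:
    """Flatten one level of nesting: single items are kept, lists are expanded."""
    flat: List[T] = []
    for item in items:
        if isinstance(item, list):
            # A list of runs, e.g. the return value of a lazy apply function
            flat.extend(item)
        else:
            # A single run
            flat.append(item)
    return flat


# Strings stand in for ScmRun objects here
mixed = ["run_a", ["run_b", "run_c"], "run_d"]
print(flatten_runs(mixed))  # prints ['run_a', 'run_b', 'run_c', 'run_d']
```

If run_append applied a step like this to its input first, the groupby code could pass the per-group results through unchanged, whether each result is a single run or a list of runs.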

Describe alternatives you've considered

Handling the return value of an apply function differently depending on whether it is a ScmRun or a list of ScmRuns. The proposed solution is more flexible, as similar functionality may be useful in other places.
