stats: std population or sample calculation on 1 value #231

@slabasan

Calling th.stats.std(ttk, cols) may result in aggregation of a single row. In Thicket, we call .agg(np.std), which calculates the standard deviation with a delta degrees of freedom (ddof) of 1. In other words, it divides by n-1 (where n is the number of elements). This is statistically appropriate for estimating the population standard deviation from a sample. However, with only one element, both the sum of squared deviations and the divisor n-1 are zero, so the calculation becomes 0/0 and the result is NaN.
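The single-value case can be reproduced without Thicket. The following is a minimal sketch that calls pandas' `Series.std` directly with an explicit `ddof=1`; the column name `A` is only illustrative:

```python
import math

import pandas as pd

# One-row frame, standing in for an aggregation over a single value.
df = pd.DataFrame({"A": [1.0]})

# Sample standard deviation (ddof=1): the divisor is n - 1 = 0.
sample_std = df["A"].std(ddof=1)

print(sample_std)              # nan
print(math.isnan(sample_std))  # True
```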

One alternative is to set ddof=0, which divides by n instead of n-1 and yields a standard deviation of 0 for a single element:

   import pandas as pd
   import numpy as np

   # Single-row frame: the case that yields NaN when ddof=1
   df = pd.DataFrame({'A': [1]})

   # Calculate standard deviation with ddof=0 (population formula, divides by n)
   result = df.agg(lambda x: np.std(x, ddof=0))
   print(result)

For standard deviation, it may be appropriate to have an option to toggle between population and sample calculation.
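One way such a toggle could look is sketched below. The helper `grouped_std`, its `population` flag, and the `node`/`time` column names are hypothetical and are not part of Thicket's API; the sketch relies only on pandas' `GroupBy.std(ddof=...)`:

```python
import pandas as pd


def grouped_std(df, by, column, population=False):
    """Hypothetical helper: per-group standard deviation with a toggle.

    population=False -> sample formula (ddof=1, divides by n - 1)
    population=True  -> population formula (ddof=0, divides by n)
    """
    ddof = 0 if population else 1
    return df.groupby(by)[column].std(ddof=ddof)


# Group "a" has four values; group "b" has a single value.
df = pd.DataFrame(
    {
        "node": ["a", "a", "a", "a", "b"],
        "time": [1.0, 2.0, 3.0, 4.0, 7.0],
    }
)

sample = grouped_std(df, "node", "time")                       # "b" is NaN
population = grouped_std(df, "node", "time", population=True)  # "b" is 0.0

print(sample)
print(population)
```

Defaulting to the sample formula would keep the current behavior unchanged, so callers would opt in to the population formula only when a single-value group should report 0 instead of NaN.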

Labels: area-stats (Issues and PRs related to Thicket's stats subpackage)
