Open
Labels: area-stats (Issues and PRs related to Thicket's stats subpackage)
Description
Calling th.stats.std(ttk, cols) may end up aggregating a single row. In thicket, we call .agg(np.std), which here calculates the sample standard deviation, i.e. with a delta degrees of freedom (ddof) of 1 (the pandas default; NumPy's own default for np.std is ddof=0). In other words, it divides by n-1, where n is the number of elements. This is statistically appropriate for estimating the population standard deviation from a sample. However, with only one element the sum of squared deviations is 0 and n-1 is also 0, so the calculation becomes 0/0 and the result is NaN.
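A minimal reproduction of the single-row case, using plain pandas rather than thicket itself (the DataFrame and variable names are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical one-row frame standing in for a single-row aggregation
single = pd.DataFrame({"A": [1.0]})

# pandas defaults to ddof=1: divides by n-1 = 0, giving NaN
sample_std = single["A"].std()

# Same data with ddof=0: divides by n = 1, giving 0.0
population_std = single["A"].std(ddof=0)

# NumPy on the raw array defaults to ddof=0, so it also gives 0.0
numpy_std = np.std(single["A"].to_numpy())

print(sample_std, population_std, numpy_std)
```

This only illustrates the ddof difference; it does not go through th.stats.std.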
One alternative is to set ddof=0, which divides by n instead of n-1 and yields a standard deviation of 0 for a single element:
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1]})
# Calculate standard deviation with ddof=0
result = df.agg(lambda x: np.std(x, ddof=0))
print(result)
For standard deviation, it may be appropriate to have an option to toggle between population and sample calculation.
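A rough sketch of what such a toggle could look like. The function name column_std and the population parameter are hypothetical and are not part of thicket's current API; the sketch works on a plain pandas DataFrame:

```python
import pandas as pd


def column_std(df, cols, population=False):
    """Hypothetical helper: standard deviation of the selected columns.

    population=False -> sample std (ddof=1, divides by n-1)
    population=True  -> population std (ddof=0, divides by n)
    """
    ddof = 0 if population else 1
    return df[cols].std(ddof=ddof)


# Single-row input: the sample std is NaN, the population std is 0.0
one_row = pd.DataFrame({"A": [1.0], "B": [4.0]})
one_row_sample = column_std(one_row, ["A", "B"])
one_row_population = column_std(one_row, ["A", "B"], population=True)

# Two-row input: both variants are defined but differ
two_rows = pd.DataFrame({"A": [1.0, 3.0]})
two_row_sample = column_std(two_rows, ["A"])
two_row_population = column_std(two_rows, ["A"], population=True)
```

Keeping the sample calculation as the default would preserve the current behavior, and callers who expect single-row aggregations could opt into the population calculation.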