weightedcalcs is a pandas-based Python library for calculating weighted means, medians, standard deviations, and more.
- Plays well with pandas.
- Support for weighted means, medians, quantiles, standard deviations, and distributions.
- Support for grouped calculations, using DataFrameGroupBy objects.
- Raises an error when your data contains null values.
- Full test coverage.
pip install weightedcalcs
Every weighted calculation in weightedcalcs begins with an instance of the weightedcalcs.Calculator class. Calculator takes one argument: the name of your weighting variable. So if you're analyzing a survey where the weighting variable is called "resp_weight", you'd do this:
import weightedcalcs as wc
calc = wc.Calculator("resp_weight")
Currently, weightedcalcs.Calculator supports the following calculations:
- calc.mean(my_data, value_var): The weighted arithmetic average of value_var.
- calc.quantile(my_data, value_var, q): The weighted quantile of value_var, where q is between 0 and 1.
- calc.median(my_data, value_var): The weighted median of value_var, equivalent to .quantile(...) where q=0.5.
- calc.std(my_data, value_var): The weighted standard deviation of value_var.
- calc.distribution(my_data, value_var): The weighted proportions of value_var, interpreting value_var as categories.
- calc.count(my_data): The weighted count of all observations, i.e., the total weight.
- calc.sum(my_data, value_var): The weighted sum of value_var.
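To make these definitions concrete, here is a minimal, library-free sketch of four of the quantities in plain Python. It is not the weightedcalcs implementation; the helper names (weighted_count, weighted_sum, weighted_mean, weighted_distribution) and the sample data are made up for illustration.

```python
from collections import defaultdict


def weighted_count(weights):
    """Total weight: the sum of all observation weights."""
    return sum(weights)


def weighted_sum(values, weights):
    """Sum of each value multiplied by its weight."""
    return sum(v * w for v, w in zip(values, weights))


def weighted_mean(values, weights):
    """Weighted sum divided by the total weight."""
    return weighted_sum(values, weights) / weighted_count(weights)


def weighted_distribution(categories, weights):
    """Share of the total weight that falls in each category."""
    totals = defaultdict(float)
    for category, weight in zip(categories, weights):
        totals[category] += weight
    total_weight = weighted_count(weights)
    return {category: t / total_weight for category, t in totals.items()}


# Hypothetical survey responses (not from the ACS example in this README)
incomes = [10.0, 20.0, 30.0]
statuses = ["Married", "Divorced", "Married"]
weights = [1.0, 1.0, 2.0]

print(weighted_count(weights))                   # 4.0
print(weighted_sum(incomes, weights))            # 90.0
print(weighted_mean(incomes, weights))           # 22.5
print(weighted_distribution(statuses, weights))  # {'Married': 0.75, 'Divorced': 0.25}
```

The sketch leaves out the quantile and standard deviation, since their exact conventions (interpolation, population versus sample variance) are not spelled out here.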
The my_data parameter above should be one of the following:
- A pandas DataFrame object
- A pandas DataFrame.groupby object
- A plain Python dictionary where the keys are column names and the values are equal-length lists.
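As an illustration of the dictionary shape described in the last item, and of what a grouped calculation produces, the following sketch computes a weighted mean per group in plain Python. It does not call weightedcalcs; the column names (region, income, resp_weight) and the grouped_weighted_mean helper are hypothetical.

```python
from collections import defaultdict


def grouped_weighted_mean(data, group_var, value_var, weight_var):
    """Weighted mean of value_var within each distinct value of group_var.

    `data` is a dict mapping column names to equal-length lists.
    """
    columns = (data[group_var], data[value_var], data[weight_var])
    if len({len(col) for col in columns}) != 1:
        raise ValueError("all columns must have the same length")

    weighted_totals = defaultdict(float)  # sum of value * weight per group
    weight_totals = defaultdict(float)    # sum of weight per group
    for group, value, weight in zip(*columns):
        weighted_totals[group] += value * weight
        weight_totals[group] += weight

    return {g: weighted_totals[g] / weight_totals[g] for g in weight_totals}


# Hypothetical column-oriented data: keys are column names, values are lists
survey = {
    "region": ["east", "east", "west", "west"],
    "income": [10.0, 30.0, 20.0, 40.0],
    "resp_weight": [1.0, 3.0, 2.0, 2.0],
}

result = grouped_weighted_mean(survey, "region", "income", "resp_weight")
print(result)  # {'east': 25.0, 'west': 30.0}
```

With the library itself, the equivalent grouped result would come from passing a DataFrame.groupby object to the calculator rather than from a hand-written loop like this one.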
Below is a basic example of using weightedcalcs to find what percentage of Wyoming residents are married, divorced, et cetera:
import pandas as pd
import weightedcalcs as wc
# Load the 2015 American Community Survey person-level responses for Wyoming
responses = pd.read_csv("examples/data/acs-2015-pums-wy-simple.csv")
# `PWGTP` is the weighting variable used in the ACS's person-level data
calc = wc.Calculator("PWGTP")
# Get the distribution of marriage-status responses
calc.distribution(responses, "marriage_status").round(3).sort_values(ascending=False)
# -- Output --
# marriage_status
# Married 0.425
# Never married or under 15 years old 0.421
# Divorced 0.097
# Widowed 0.046
# Separated 0.012
# Name: PWGTP, dtype: float64
See this notebook for examples of other calculations, including grouped calculations.
Max Ghenis has created a version of the example notebook that can be run directly in your browser, via Google Colab.
