Daffy - DataFrame Column Validator

Description

Working with DataFrames often means passing them through multiple transformation functions, making it easy to lose track of their structure over time. Daffy adds runtime validation and documentation to your DataFrame operations through simple decorators. You declare the expected columns and types directly on your function definitions:

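from daffy import df_in, df_out

@df_in(columns=["price", "bedrooms", "location"])
@df_out(columns=["price_per_room", "price_category"])
def analyze_housing(houses_df):
    # Transform raw housing data into price analysis (pandas assumed; the 100_000 cutoff is illustrative)
    analyzed_df = houses_df.copy()
    analyzed_df["price_per_room"] = analyzed_df["price"] / analyzed_df["bedrooms"]
    analyzed_df["price_category"] = (analyzed_df["price_per_room"] > 100_000).map({True: "high", False: "low"})
    return analyzed_df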

Like type hints for DataFrames, Daffy helps you catch structural mismatches early and keeps your data pipeline documentation synchronized with the code. It is compatible with both Pandas and Polars.
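
To make the idea of entry and exit validation concrete, here is a minimal standard-library sketch of how a decorator pair could check columns before and after a function runs. It is not Daffy's implementation: `Frame`, `check_in`, `check_out`, and the error message format are hypothetical names invented for this illustration, and `Frame` only stands in for a real DataFrame by exposing `.columns`.

```python
import functools


class Frame:
    """Hypothetical stand-in for a DataFrame: it only exposes .columns."""

    def __init__(self, columns):
        self.columns = list(columns)


def _check(frame, expected, where):
    # Collect every expected column that the frame does not have.
    missing = [c for c in expected if c not in frame.columns]
    if missing:
        raise AssertionError(f"{where}: missing columns {missing}, got {frame.columns}")


def check_in(columns):
    """Validate the first positional argument before the function runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(frame, *args, **kwargs):
            _check(frame, columns, f"{func.__name__} input")
            return func(frame, *args, **kwargs)
        return wrapper
    return decorator


def check_out(columns):
    """Validate the return value after the function runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            _check(result, columns, f"{func.__name__} output")
            return result
        return wrapper
    return decorator


@check_in(columns=["price", "bedrooms"])
@check_out(columns=["price_per_room"])
def analyze(frame):
    # Pretend transformation: the result gains one derived column.
    return Frame(frame.columns + ["price_per_room"])
```

Calling `analyze(Frame(["price", "bedrooms", "location"]))` passes both checks, while `analyze(Frame(["price"]))` raises an `AssertionError` at entry, before the function body runs. Failing at the function boundary is what makes a structural mismatch easy to trace back to the step that caused it.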

Key Features

Validate DataFrame columns at function entry and exit points
Support regex patterns for matching column names (e.g., "r/column_\d+/")
Check data types of columns
Control strictness of validation (allow or disallow extra columns)
Works with both Pandas and Polars DataFrames
Project-wide configuration via pyproject.toml
Integrated logging for DataFrame structure inspection
Enhanced type annotations for improved IDE and type checker support
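
The regex and strictness features above can be pictured with a small standard-library sketch. This is an illustration under stated assumptions, not Daffy's code: the helper name `find_problems` is hypothetical, a spec written as `r/.../` is assumed to be a regular expression that must match a whole column name, and strict mode is assumed to mean that any column not matched by a spec is reported.

```python
import re


def find_problems(specs, columns, strict=False):
    """Return a list of problems; an empty list means the columns pass."""
    problems = []
    matched = set()
    for spec in specs:
        if spec.startswith("r/") and spec.endswith("/"):
            # Assumed convention: strip the r/ ... / wrapper and match the full name.
            pattern = re.compile(spec[2:-1])
            hits = [c for c in columns if pattern.fullmatch(c)]
        else:
            # Plain specs must equal a column name exactly.
            hits = [c for c in columns if c == spec]
        if not hits:
            problems.append(f"missing: {spec}")
        matched.update(hits)
    if strict:
        # In strict mode, columns that no spec accounted for are an error.
        extra = [c for c in columns if c not in matched]
        if extra:
            problems.append(f"unexpected: {extra}")
    return problems
```

With these assumptions, `find_problems(["id", r"r/column_\d+/"], ["id", "column_1", "column_2"])` returns an empty list, and `find_problems(["id"], ["id", "name"], strict=True)` reports `name` as unexpected, whereas the same call without `strict` accepts the extra column.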

Documentation

Usage Guide - Detailed usage instructions
Development Guide - Guide for contributing to Daffy
Changelog - Version history and release notes

Installation

Install with your favorite Python dependency manager:

pip install daffy

Quick Start

from daffy import df_in, df_out

@df_in(columns=["Brand", "Price"])  # Validate input DataFrame columns
@df_out(columns=["Brand", "Price", "Discount"])  # Validate output DataFrame columns
def apply_discount(cars_df):
    cars_df = cars_df.copy()
    cars_df["Discount"] = cars_df["Price"] * 0.1
    return cars_df

License

MIT
