diff --git a/CONTRIBUTION_PLAN.md b/CONTRIBUTION_PLAN.md new file mode 100644 index 0000000000000..305b3dc82d0f3 --- /dev/null +++ b/CONTRIBUTION_PLAN.md @@ -0,0 +1,154 @@ +## 1. Basic Information + +- **Project Name**: pandas +- **GitHub URL**: [https://github.com/pandas-dev/pandas](https://github.com/pandas-dev/pandas) + +- **Primary Language(s)**: + - Python + - C (some extensions) + - Cython (for performance-critical code) + +- **What is the project used for?** + `pandas` is a fast, powerful, and flexible open-source data analysis and manipulation library for Python. It provides high-level data structures such as `DataFrame` and `Series`, which are essential for working with structured data, time series, and large datasets in a convenient and efficient way. + +--- + +## 2. Contribution Guidelines + +- **Are there clear steps in a `CONTRIBUTING.md` file?** + ✅ No + +- **Is there a Code of Conduct?** + ✅ Yes. The project includes a [Code of Conduct](https://github.com/pandas-dev/pandas/blob/main/CODE_OF_CONDUCT.md) to ensure respectful and inclusive participation. + +- **Is a CLA (Contributor License Agreement) needed?** + ❌ No CLA is currently required for contributions to pandas. + +- **Are first-time contributors welcomed?** + ✅ Yes. The project welcomes first-time contributors and provides good first issues with the `good first issue` label to help newcomers get started. + +--- + +## 3. Environment Setup + +- **How do you set up the project locally?** + +```bash +# Step 1: Clone your fork +git clone https://github.com/<your-username>/pandas.git +cd pandas + +# Step 2: Set up a virtual environment +python -m venv venv +source venv/bin/activate # On Windows: venv\Scripts\activate + +# Step 3: Upgrade pip +pip install --upgrade pip + +# Step 4: Install development dependencies +pip install -r requirements-dev.txt + +# Step 5: Build the C extensions (optional but recommended) +python setup.py build_ext --inplace + +# Step 6: Run tests to verify setup +pytest pandas +``` + + + +## 4. Making a Contribution + +- **Open Issue Chosen:** + [ENH: the error behaviour in pandas operation should be consistent – rename errors are ignore by default whereas drop errors are raise #61687](https://github.com/pandas-dev/pandas/issues/61687) + +- **Issue Summary:** + In pandas, operations like `rename` silently ignore missing labels by default (unless `errors='raise'`), but operations like `drop` will raise an error for missing labels. This inconsistency in error-handling behavior is confusing. The feature request suggests making error behavior consistent across pandas APIs. + +### Steps to Resolve the Issue: +1. Clarify API behavior by reviewing pandas functions (`rename`, `drop`, etc.) and documenting their default `errors` behavior. +2. Decide on a consistent default approach—either always ignore or always raise—ideally matching common use cases or enhancing explicitness. +3. Implement the change consistently: + - Modify default parameters or error-checking logic in `DataFrame.rename()`, `Series.rename()`, `drop()`, `set_index()`, etc. +4. Update documentation in both docstrings and user guides to reflect the standardized behavior. +5. Add unit tests in files like `pandas/tests/frame/test_rename_drop_errors.py`, covering: + - Missing labels with default behavior. + - Explicit use of `errors='ignore'` and `errors='raise'`. +6. Validate changes by running `pytest pandas/tests/` and the full test suite. + +--- + +## 5. Create a Pull Request Plan + +### Pull Request Workflow: + +1. Create a feature branch: + ```bash + git checkout -b standardize-errors-behavior + ``` + +2. Make code changes in `pandas/core/generic.py`, potentially `indexing.py`, and documentation files. + +3. Stage and commit changes: + ```bash + git add pandas/core/generic.py pandas/core/indexing.py pandas/*.rst pandas/tests/frame/test_rename_drop_errors.py + git commit -m "ENH: Standardize default errors behavior across rename/drop (#61687)" + ``` + +4. Push the branch to your fork: + ```bash + git push origin standardize-errors-behavior + ``` + +5. Open a Pull Request: + - Base: `pandas-dev/pandas:main` + - Head: `yourusername/standardize-errors-behavior` + +--- + +### Example PR Title: +``` +ENH: Standardize default errors behavior across rename and drop (#61687) +``` + +--- + +### PR Description: +``` +This PR addresses [#61687](https://github.com/pandas-dev/pandas/issues/61687), which highlights an inconsistency in pandas default error handling: `rename()` silently ignores missing labels unless `errors='raise'` is specified, while methods like `drop()` raise errors for missing labels by default. + +### Changes: +- Updated DataFrame/Series `rename()` to default to `errors='raise'` for consistency. +- Updated documentation and docstrings to reflect this change. +- Added unit tests to validate both default and explicit `errors` behavior for `rename` and `drop`. + +### Rationale: +This enhances API consistency and helps avoid silent failures, making pandas operations more reliable. + +### Testing: +- Tests covering missing-label scenarios with default/explicit `errors`. +- Run: + ```bash + pytest pandas/tests/frame/test_rename_drop_errors.py + ``` + Plus full suite with: + ```bash + pytest + ``` +``` + +--- + +### Testing the Fix + +- Run scoped tests: + ```bash + pytest pandas/tests/frame/test_rename_drop_errors.py + ``` + +- Run full test suite: + ```bash + pytest + ``` + +- Confirm that default errors behavior is consistent and that explicit usage of `errors='ignore'/'raise'` works correctly. \ No newline at end of file