Skip to content

Create contribution plan_1 #61695

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
154 changes: 154 additions & 0 deletions CONTRIBUTION_PLAN.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,154 @@
## 1. Basic Information

- **Project Name**: pandas
- **GitHub URL**: [https://github.com/pandas-dev/pandas](https://github.com/pandas-dev/pandas)

- **Primary Language(s)**:
- Python
- C (some extensions)
- Cython (for performance-critical code)

- **What is the project used for?**
`pandas` is a fast, powerful, and flexible open-source data analysis and manipulation library for Python. It provides high-level data structures such as `DataFrame` and `Series`, which are essential for working with structured data, time series, and large datasets in a convenient and efficient way.

---

## 2. Contribution Guidelines

- **Are there clear steps in a `CONTRIBUTING.md` file?**
✅ No

- **Is there a Code of Conduct?**
✅ Yes. The project includes a [Code of Conduct](https://github.com/pandas-dev/pandas/blob/main/CODE_OF_CONDUCT.md) to ensure respectful and inclusive participation.

- **Is a CLA (Contributor License Agreement) needed?**
❌ No CLA is currently required for contributions to pandas.

- **Are first-time contributors welcomed?**
✅ Yes. The project welcomes first-time contributors and provides good first issues with the `good first issue` label to help newcomers get started.

---

## 3. Environment Setup

- **How do you set up the project locally?**

```bash
# Step 1: Clone your fork
git clone https://github.com/<your-username>/pandas.git
cd pandas

# Step 2: Set up a virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate

# Step 3: Upgrade pip
pip install --upgrade pip

# Step 4: Install development dependencies
pip install -r requirements-dev.txt

# Step 5: Build the C extensions (optional but recommended)
python setup.py build_ext --inplace

# Step 6: Run tests to verify setup
pytest pandas
```



## 4. Making a Contribution

- **Open Issue Chosen:**
[ENH: the error behaviour in pandas operation should be consistent – rename errors are ignore by default whereas drop errors are raise #61687](https://github.com/pandas-dev/pandas/issues/61687)

- **Issue Summary:**
In pandas, operations like `rename` silently ignore missing labels by default (unless `errors='raise'`), but operations like `drop` will raise an error for missing labels. This inconsistency in error-handling behavior is confusing. The feature request suggests making error behavior consistent across pandas APIs.

### Steps to Resolve the Issue:
1. Clarify API behavior by reviewing pandas functions (`rename`, `drop`, etc.) and documenting their default `errors` behavior.
2. Decide on a consistent default approach—either always ignore or always raise—ideally matching common use cases or enhancing explicitness.
3. Implement the change consistently:
- Modify default parameters or error-checking logic in `DataFrame.rename()`, `Series.rename()`, `drop()`, `set_index()`, etc.
4. Update documentation in both docstrings and user guides to reflect the standardized behavior.
5. Add unit tests in files like `pandas/tests/frame/test_rename_drop_errors.py`, covering:
- Missing labels with default behavior.
- Explicit use of `errors='ignore'` and `errors='raise'`.
6. Validate changes by running `pytest pandas/tests/` and the full test suite.

---

## 5. Create a Pull Request Plan

### Pull Request Workflow:

1. Create a feature branch:
```bash
git checkout -b standardize-errors-behavior
```

2. Make code changes in `pandas/core/generic.py`, potentially `indexing.py`, and documentation files.

3. Stage and commit changes:
```bash
git add pandas/core/generic.py pandas/core/indexing.py pandas/*.rst pandas/tests/frame/test_rename_drop_errors.py
git commit -m "ENH: Standardize default errors behavior across rename/drop (#61687)"
```

4. Push the branch to your fork:
```bash
git push origin standardize-errors-behavior
```

5. Open a Pull Request:
- Base: `pandas-dev/pandas:main`
- Head: `yourusername/standardize-errors-behavior`

---

### Example PR Title:
```
ENH: Standardize default errors behavior across rename and drop (#61687)
```

---

### PR Description:
```
This PR addresses [#61687](https://github.com/pandas-dev/pandas/issues/61687), which highlights an inconsistency in pandas default error handling: `rename()` silently ignores missing labels unless `errors='raise'` is specified, while methods like `drop()` raise errors for missing labels by default.

### Changes:
- Updated DataFrame/Series `rename()` to default to `errors='raise'` for consistency.
- Updated documentation and docstrings to reflect this change.
- Added unit tests to validate both default and explicit `errors` behavior for `rename` and `drop`.

### Rationale:
This enhances API consistency and helps avoid silent failures, making pandas operations more reliable.

### Testing:
- Tests covering missing-label scenarios with default/explicit `errors`.
- Run:
```bash
pytest pandas/tests/frame/test_rename_drop_errors.py
```
Plus full suite with:
```bash
pytest
```
```

---

### Testing the Fix

- Run scoped tests:
```bash
pytest pandas/tests/frame/test_rename_drop_errors.py
```

- Run full test suite:
```bash
pytest
```

- Confirm that default errors behavior is consistent and that explicit usage of `errors='ignore'/'raise'` works correctly.
Loading