Fix critical bugs and improve robustness across codebase #65

Open
giocaizzi wants to merge 2 commits into main from
claude/senior-code-review-K14uu

@giocaizzi
Owner

Summary

This PR addresses 15 actionable findings from a comprehensive code review, focusing on critical correctness issues, robustness improvements, and performance optimizations. The changes are intended to make the codebase production-ready for v1.0.0.

Key Changes

Critical Fixes

  • Replace assertions with explicit error handling (assert statements are stripped when Python runs with the -O flag):

    • interpolate.py: Replace assertion with explicit check
    • idw.py: Replace assertions with RuntimeError for model state validation
    • plot_2d.py, plot_3d.py: Replace assertions with RuntimeError for result validation
    • preprocessor.py: Replace assertion with explicit check for downsampling resolution
  • Fix einsum axis labeling in 3D plotting (plot_3d.py):

    • Correct misleading einsum from "ZXY->XYZ" to "ZYX->XYZ" to match pykrige's actual (Z, Y, X) output convention
    • Add clarifying comment about axis ordering
  • Add validation for grid resolution (core/types.py):

    • Add __post_init__ validation to reject zero or negative resolution values
    • Prevents silent failures with empty arrays from np.arange
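
The assertion replacement in the first item above can be illustrated with a minimal sketch. The class IDWSketch and its attribute names are hypothetical stand-ins, not the project's actual idw.py code; the point is only that an explicit raise survives python -O, whereas an assert would not.

```python
class IDWSketch:
    """Hypothetical stand-in for a model that must be fitted before predicting."""

    def __init__(self) -> None:
        self._fitted = False

    def fit(self) -> "IDWSketch":
        # A real model would store training data here; the sketch only flips a flag.
        self._fitted = True
        return self

    def predict(self) -> str:
        # An `assert self._fitted` at this point would be removed under `python -O`.
        # The explicit check below is executed in every interpreter mode.
        if not self._fitted:
            raise RuntimeError("Model must be fitted before calling predict().")
        return "prediction"
```

Calling IDWSketch().predict() raises RuntimeError regardless of optimization flags, while IDWSketch().fit().predict() returns normally.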

Robustness Improvements

  • Handle empty DataFrames (core/griddata.py):

    • Add guard in GridDataSpecs.from_dataframe() to reject empty DataFrames with clear error message
  • Document NaN handling (modelling/utils.py):

    • Add docstring notes to normalize() and standardize() explaining that NaN values are silently propagated (pandas skipna=True default)
  • Fix single-ID downsampling plot crash (plotting/downsampling.py):

    • Add squeeze=False to plt.subplots() to always return 2D array, preventing IndexError when num_rows=1 and num_cols=1
  • Move loop-invariant code outside loop (plotting/downsampling.py):

    • Move empty subplot visibility toggling after the loop (was running on every iteration)
  • Move DataFrame copy outside loop (plotting/plot_2d.py):

    • Move points_df = gd_reversed.data.copy() outside the plotting loop to avoid redundant copies
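
The NaN behaviour documented for normalize() and standardize() can be sketched as follows. The helper normalize_sketch is a hypothetical min-max scaler written for illustration, not the project's implementation; it only assumes that the statistics are computed with pandas reductions, which skip NaN by default.

```python
import pandas as pd


def normalize_sketch(series: pd.Series) -> pd.Series:
    """Min-max scale a Series (hypothetical helper, not the project's normalize())."""
    # Series.min() and Series.max() use skipna=True by default,
    # so the statistics ignore NaN entries.
    lo, hi = series.min(), series.max()
    # Arithmetic on a NaN entry yields NaN, so missing values stay missing.
    return (series - lo) / (hi - lo)


values = pd.Series([1.0, float("nan"), 3.0])
scaled = normalize_sketch(values)
```

Here the first and last entries are scaled to 0.0 and 1.0, and the middle entry remains NaN without any warning being raised.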

API & Design Improvements

  • Allow default model parameters (modelling/interpolate.py):

    • Change interpolate() to default model_params to {} when neither model_params nor model_params_grid is provided
    • Removes requirement to pass empty dict for models with default parameters
    • Update test to verify default behavior works
  • Document SklearnModel registry exclusion (modelling/models/__init__.py):

    • Add docstring note explaining why SklearnModel is intentionally excluded from MODEL_REGISTRY
  • Document Modeler construction behavior (modelling/modeler.py):

    • Add docstring note clarifying that model is fitted immediately during construction
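
The new default for model_params could look roughly like the sketch below. The function resolve_params_sketch and its return values are assumptions made for illustration; the actual signature and control flow of interpolate() are not shown in this PR description.

```python
from __future__ import annotations

from typing import Any


def resolve_params_sketch(
    model_params: dict[str, Any] | None = None,
    model_params_grid: dict[str, list[Any]] | None = None,
) -> tuple[str, dict[str, Any]]:
    """Pick a fitting mode and parameters (hypothetical helper)."""
    if model_params_grid is not None:
        # A parameter grid was supplied: a search over it would follow.
        return "grid_search", model_params_grid
    # Neither argument given: fall back to an empty dict so the model
    # is built with its own default parameters.
    params = {} if model_params is None else model_params
    return "single_fit", params
```

With this shape, resolve_params_sketch() yields ("single_fit", {}), so callers no longer need to pass an empty dict explicitly.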

Performance Optimizations

  • Cache grid properties (core/grid3d.py):

    • Convert grid, normalized_grid, mesh, and normalized_mesh from @property to @cached_property
    • Eliminates redundant recomputation of meshgrid and arange arrays on every access
    • Safe because GridAxis is frozen (immutable)
  • Eliminate redundant DataFrame copies (modelling/preprocessor.py):

    • Remove .copy() calls in _normalize_xyz() and _standardize_v() since data is already copied at pipeline entry
    • Reduces memory overhead for large datasets
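
The caching change can be sketched with functools.cached_property on a frozen dataclass. AxisSketch and its fields are hypothetical and much simpler than Grid3D; the sketch only shows that the value is computed once and that caching is safe when the inputs cannot change.

```python
from dataclasses import dataclass
from functools import cached_property


@dataclass(frozen=True)
class AxisSketch:
    """Hypothetical immutable axis definition (not the project's GridAxis)."""

    start: float
    stop: float
    step: float

    @cached_property
    def points(self) -> list[float]:
        # Computed on first access, then stored in the instance __dict__.
        # cached_property writes to __dict__ directly, so it also works
        # on frozen dataclasses (as long as __slots__ is not used).
        n = int(round((self.stop - self.start) / self.step))
        return [self.start + i * self.step for i in range(n)]


axis = AxisSketch(0.0, 1.0, 0.25)
first = axis.points
second = axis.points  # same object, no recomputation
```

With a plain @property, each access would build a new list; with @cached_property, first and second are the same object.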

Testing

  • Add test for grid resolution validation (zero and negative values)
  • Add test for empty DataFrame rejection
  • Update test_interpolate_no_params_raises to test_interpolate_default_params to verify new default behavior
  • All existing tests pass with changes
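
The resolution validation covered by the first test above might look like the following sketch. GridResolutionSketch and its field names x, y, z are assumptions; the real GridResolution in core/types.py may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridResolutionSketch:
    """Hypothetical resolution container with positivity checks."""

    x: float
    y: float
    z: float

    def __post_init__(self) -> None:
        # Runs right after the generated __init__; reading attributes is
        # allowed on a frozen dataclass, only assignment is blocked.
        for name in ("x", "y", "z"):
            value = getattr(self, name)
            if value <= 0:
                raise ValueError(
                    f"Resolution {name} must be positive, got {value}"
                )
```

Constructing GridResolutionSketch(1.0, 0.0, 1.0) or GridResolutionSketch(1.0, 1.0, -0.5) raises ValueError at construction time, before any grid arrays are built.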

Notes

  • A comprehensive code review document (REVIEW.md) has been added

https://claude.ai/code/session_01D3HyJ8x8oE2sGrgtNZ85zE

claude added 2 commits March 18, 2026 00:26
Comprehensive review covering architecture, correctness, performance,
testing, and API design. Identifies 15 findings across 4 severity levels
with specific file/line references and recommended fixes.

https://claude.ai/code/session_01D3HyJ8x8oE2sGrgtNZ85zE
…rmance, and API design

CRITICAL:
- Replace assert with explicit RuntimeError/ValueError in 5 files (safe under python -O)
- Tighten _apply_downsampling return type annotation to pd.DataFrame
- Fix misleading einsum axis labels in plot_3d ("ZXY->XYZ" to "ZYX->XYZ")
- Add GridResolution.__post_init__ validation rejecting zero/negative values

HIGH:
- Guard empty DataFrame in GridDataSpecs.from_dataframe
- Document NaN handling behavior in normalize/standardize
- Vectorize hull filtering with shapely.contains_xy (~100-1000x speedup)
- Document fit-on-construction design in Modeler docstring

MEDIUM:
- Cache grid/mesh properties with functools.cached_property in Grid3D
- Remove redundant DataFrame copies in Preprocessor._normalize_xyz/_standardize_v
- Hoist DataFrame copy outside per-slice loop in plot_2d_model

LOW:
- Document SklearnModel exclusion from MODEL_REGISTRY
- Move loop-invariant subplot hiding out of per-ID loop in downsampling
- Add squeeze=False to plt.subplots in plot_downsampling (fixes single-ID crash)
- Default model_params to {} when neither param argument provided

Tests: Add 4 new validation tests, update 1 existing test. All 95 tests pass.

https://claude.ai/code/session_01D3HyJ8x8oE2sGrgtNZ85zE
Copilot AI review requested due to automatic review settings March 20, 2026 09:33

Copilot AI left a comment

Pull request overview

This PR addresses a set of correctness, robustness, and performance findings across py3dinterpolations to improve production readiness for the v1.0.0 release.

Changes:

  • Replaces runtime assert usage with explicit exceptions, and adds validation for invalid grid resolutions / empty inputs.
  • Improves plotting and preprocessing robustness/performance (e.g., subplots handling, moving loop-invariant work, caching grid computations).
  • Updates/extends tests to cover the new behaviors and defaults.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated no comments.

File | Description
tests/modelling/test_interpolate.py | Updates test to confirm interpolate() defaults model params to {}.
tests/core/test_types.py | Adds tests for rejecting zero/negative GridResolution.
tests/core/test_griddata.py | Adds test for rejecting empty DataFrames in GridDataSpecs.from_dataframe().
py3dinterpolations/plotting/plot_3d.py | Replaces assertion with RuntimeError and updates axis transpose logic for volume plotting.
py3dinterpolations/plotting/plot_2d.py | Replaces assertion with RuntimeError; moves DataFrame copy out of loop.
py3dinterpolations/plotting/downsampling.py | Fixes single-ID subplot indexing and moves empty-subplot hiding after loop.
py3dinterpolations/modelling/utils.py | Documents NaN propagation behavior in normalization/standardization helpers.
py3dinterpolations/modelling/preprocessor.py | Removes redundant DataFrame copies; replaces assert with RuntimeError; adjusts downsampling typing.
py3dinterpolations/modelling/models/idw.py | Replaces asserts with explicit RuntimeError when predicting before fit.
py3dinterpolations/modelling/models/__init__.py | Documents why SklearnModel is excluded from the registry.
py3dinterpolations/modelling/modeler.py | Documents that fitting happens during Modeler construction.
py3dinterpolations/modelling/interpolate.py | Allows default model params when neither params nor param grid provided; replaces assert with explicit check.
py3dinterpolations/core/types.py | Adds GridResolution.__post_init__ validation for positive resolutions.
py3dinterpolations/core/griddata.py | Adds explicit guard for empty DataFrames in GridDataSpecs.from_dataframe().
py3dinterpolations/core/grid3d.py | Caches grid/mesh computations; vectorizes hull filtering with shapely.contains_xy.
REVIEW.md | Adds an internal review document summarizing findings and recommendations.
Comments suppressed due to low confidence (2)

py3dinterpolations/modelling/preprocessor.py:169

  • _apply_downsampling() is annotated as returning pd.DataFrame, but the built-in statistic branches return grouped_df[["V"]].mean()/max()/..., which are pd.Series per pandas typing. This will likely fail mypy (strict) and is misleading for callers. Either adjust the return annotation (and the Callable type) to match the actual Series return, or change the implementation to consistently return a DataFrame for all branches (including custom callables) without altering the expected downstream shape after groupby(...)[["V"]].apply(...).
def _apply_downsampling(
    grouped_df: pd.DataFrame,
    downsampling_func: DownsamplingStatistic | str | Callable[..., pd.DataFrame],
) -> pd.DataFrame:
    """Apply a downsampling statistic to a grouped DataFrame."""
    if callable(downsampling_func) and not isinstance(downsampling_func, str):
        return downsampling_func(grouped_df)

    stat = DownsamplingStatistic(downsampling_func)
    match stat:
        case DownsamplingStatistic.MEAN:
            return grouped_df[["V"]].mean()
        case DownsamplingStatistic.MAX:
            return grouped_df[["V"]].max()
        case DownsamplingStatistic.MIN:
            return grouped_df[["V"]].min()
        case DownsamplingStatistic.MEDIAN:
            return grouped_df[["V"]].median()
        case DownsamplingStatistic.SUM:
            return grouped_df[["V"]].sum()
        case DownsamplingStatistic.QUANTILE75:
            return grouped_df[["V"]].quantile(0.75)

py3dinterpolations/plotting/plot_3d.py:46

  • values is transposed to (X, Y, Z) via np.einsum("ZYX->XYZ", ...), but Grid3D.mesh is built with np.meshgrid(..., indexing="xy"), which yields mesh arrays shaped (Y, X, Z). Flattening these together will misalign coordinates and voxel values in go.Volume. Consider either (a) transposing to (Y, X, Z) to match the existing mesh layout, or (b) switching Grid3D.mesh/normalized_mesh to indexing="ij" so both mesh and values are consistently (X, Y, Z).
    # pykrige outputs (Z, Y, X) -> transpose to (X, Y, Z)
    values = np.einsum("ZYX->XYZ", modeler.result.interpolated)

    data: list[go.Volume | go.Scatter3d] = [
        go.Volume(
            x=modeler.grid.mesh["X"].flatten(),
            y=modeler.grid.mesh["Y"].flatten(),
            z=modeler.grid.mesh["Z"].flatten(),
            value=values.flatten(),
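
The axis-ordering concern raised in this comment can be checked with a small NumPy sketch. The array sizes and variable names are made up for illustration, and the mesh is built from integer indices rather than real coordinates; the sketch assumes, as the comment states, that the model output is ordered (Z, Y, X) and that the mesh uses indexing="xy".

```python
import numpy as np

nx, ny, nz = 4, 3, 2

# Stand-in for the interpolated result, ordered (Z, Y, X).
interpolated = np.arange(nz * ny * nx).reshape(nz, ny, nx)

# Transpose to (X, Y, Z), as in the reviewed snippet.
as_xyz = np.einsum("ZYX->XYZ", interpolated)  # shape (4, 3, 2)

# Transpose to (Y, X, Z), option (a) in the comment.
as_yxz = np.einsum("ZYX->YXZ", interpolated)  # shape (3, 4, 2)

# With indexing="xy", 3D meshgrid outputs have shape (ny, nx, nz).
mesh_x, mesh_y, mesh_z = np.meshgrid(
    np.arange(nx), np.arange(ny), np.arange(nz), indexing="xy"
)

# Look up each flattened mesh point in the original (Z, Y, X) array.
expected = interpolated[mesh_z.flatten(), mesh_y.flatten(), mesh_x.flatten()]
```

In this sketch only as_yxz has the same shape as the mesh arrays, and its flattened values line up with the flattened mesh coordinates. Switching the mesh to indexing="ij" would instead make the (X, Y, Z) transpose the matching one.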


3 participants