Fix critical bugs and improve robustness across codebase#65
Open
Fix critical bugs and improve robustness across codebase#65
Conversation
Comprehensive review covering architecture, correctness, performance, testing, and API design. Identifies 15 findings across 4 severity levels with specific file/line references and recommended fixes. https://claude.ai/code/session_01D3HyJ8x8oE2sGrgtNZ85zE
…rmance, and API design
CRITICAL:
- Replace assert with explicit RuntimeError/ValueError in 5 files (safe under python -O)
- Tighten _apply_downsampling return type annotation to pd.DataFrame
- Fix misleading einsum axis labels in plot_3d (ZXY->ZYX)
- Add GridResolution.__post_init__ validation rejecting zero/negative values
HIGH:
- Guard empty DataFrame in GridDataSpecs.from_dataframe
- Document NaN handling behavior in normalize/standardize
- Vectorize hull filtering with shapely.contains_xy (~100-1000x speedup)
- Document fit-on-construction design in Modeler docstring
MEDIUM:
- Cache grid/mesh properties with functools.cached_property in Grid3D
- Remove redundant DataFrame copies in Preprocessor._normalize_xyz/_standardize_v
- Hoist DataFrame copy outside per-slice loop in plot_2d_model
LOW:
- Document SklearnModel exclusion from MODEL_REGISTRY
- Move loop-invariant subplot hiding out of per-ID loop in downsampling
- Add squeeze=False to plt.subplots in plot_downsampling (fixes single-ID crash)
- Default model_params to {} when neither param argument provided
Tests: Add 4 new validation tests, update 1 existing test. All 95 tests pass.
https://claude.ai/code/session_01D3HyJ8x8oE2sGrgtNZ85zE
There was a problem hiding this comment.
Pull request overview
This PR addresses a set of correctness, robustness, and performance findings across py3dinterpolations to improve production readiness for the v1.0.0 release.
Changes:
- Replaces runtime
assertusage with explicit exceptions, and adds validation for invalid grid resolutions / empty inputs. - Improves plotting and preprocessing robustness/performance (e.g., subplots handling, moving loop-invariant work, caching grid computations).
- Updates/extends tests to cover the new behaviors and defaults.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| tests/modelling/test_interpolate.py | Updates test to confirm interpolate() defaults model params to {}. |
| tests/core/test_types.py | Adds tests for rejecting zero/negative GridResolution. |
| tests/core/test_griddata.py | Adds test for rejecting empty DataFrames in GridDataSpecs.from_dataframe(). |
| py3dinterpolations/plotting/plot_3d.py | Replaces assertion with RuntimeError and updates axis transpose logic for volume plotting. |
| py3dinterpolations/plotting/plot_2d.py | Replaces assertion with RuntimeError; moves DataFrame copy out of loop. |
| py3dinterpolations/plotting/downsampling.py | Fixes single-ID subplot indexing and moves empty-subplot hiding after loop. |
| py3dinterpolations/modelling/utils.py | Documents NaN propagation behavior in normalization/standardization helpers. |
| py3dinterpolations/modelling/preprocessor.py | Removes redundant DataFrame copies; replaces assert with RuntimeError; adjusts downsampling typing. |
| py3dinterpolations/modelling/models/idw.py | Replaces asserts with explicit RuntimeError when predicting before fit. |
| py3dinterpolations/modelling/models/init.py | Documents why SklearnModel is excluded from the registry. |
| py3dinterpolations/modelling/modeler.py | Documents that fitting happens during Modeler construction. |
| py3dinterpolations/modelling/interpolate.py | Allows default model params when neither params nor param grid provided; replaces assert with explicit check. |
| py3dinterpolations/core/types.py | Adds GridResolution.__post_init__ validation for positive resolutions. |
| py3dinterpolations/core/griddata.py | Adds explicit guard for empty DataFrames in GridDataSpecs.from_dataframe(). |
| py3dinterpolations/core/grid3d.py | Caches grid/mesh computations; vectorizes hull filtering with shapely.contains_xy. |
| REVIEW.md | Adds an internal review document summarizing findings and recommendations. |
Comments suppressed due to low confidence (2)
py3dinterpolations/modelling/preprocessor.py:169
_apply_downsampling()is annotated as returningpd.DataFrame, but the built-in statistic branches returngrouped_df[["V"]].mean()/max()/..., which arepd.Seriesper pandas typing. This will likely fail mypy (strict) and is misleading for callers. Either adjust the return annotation (and theCallabletype) to match the actualSeriesreturn, or change the implementation to consistently return aDataFramefor all branches (including custom callables) without altering the expected downstream shape aftergroupby(...)[["V"]].apply(...).
def _apply_downsampling(
grouped_df: pd.DataFrame,
downsampling_func: DownsamplingStatistic | str | Callable[..., pd.DataFrame],
) -> pd.DataFrame:
"""Apply a downsampling statistic to a grouped DataFrame."""
if callable(downsampling_func) and not isinstance(downsampling_func, str):
return downsampling_func(grouped_df)
stat = DownsamplingStatistic(downsampling_func)
match stat:
case DownsamplingStatistic.MEAN:
return grouped_df[["V"]].mean()
case DownsamplingStatistic.MAX:
return grouped_df[["V"]].max()
case DownsamplingStatistic.MIN:
return grouped_df[["V"]].min()
case DownsamplingStatistic.MEDIAN:
return grouped_df[["V"]].median()
case DownsamplingStatistic.SUM:
return grouped_df[["V"]].sum()
case DownsamplingStatistic.QUANTILE75:
return grouped_df[["V"]].quantile(0.75)
py3dinterpolations/plotting/plot_3d.py:46
valuesis transposed to (X, Y, Z) vianp.einsum("ZYX->XYZ", ...), butGrid3D.meshis built withnp.meshgrid(..., indexing="xy"), which yields mesh arrays shaped (Y, X, Z). Flattening these together will misalign coordinates and voxel values ingo.Volume. Consider either (a) transposing to (Y, X, Z) to match the existing mesh layout, or (b) switchingGrid3D.mesh/normalized_meshtoindexing="ij"so both mesh and values are consistently (X, Y, Z).
# pykrige outputs (Z, Y, X) -> transpose to (X, Y, Z)
values = np.einsum("ZYX->XYZ", modeler.result.interpolated)
data: list[go.Volume | go.Scatter3d] = [
go.Volume(
x=modeler.grid.mesh["X"].flatten(),
y=modeler.grid.mesh["Y"].flatten(),
z=modeler.grid.mesh["Z"].flatten(),
value=values.flatten(),
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR addresses 15 actionable findings from a comprehensive code review, focusing on critical correctness issues, robustness improvements, and performance optimizations. The changes ensure the codebase is production-ready for v1.0.0.
Key Changes
Critical Fixes
Replace assertions with explicit error handling (
assertstatements are stripped withpython -Oflag):interpolate.py: Replace assertion with explicit checkidw.py: Replace assertions with RuntimeError for model state validationplot_2d.py,plot_3d.py: Replace assertions with RuntimeError for result validationpreprocessor.py: Replace assertion with explicit check for downsampling resolutionFix einsum axis labeling in 3D plotting (
plot_3d.py):"ZXY->XYZ"to"ZYX->XYZ"to match pykrige's actual(Z, Y, X)output conventionAdd validation for grid resolution (
core/types.py):__post_init__validation to reject zero or negative resolution valuesnp.arangeRobustness Improvements
Handle empty DataFrames (
core/griddata.py):GridDataSpecs.from_dataframe()to reject empty DataFrames with clear error messageDocument NaN handling (
modelling/utils.py):normalize()andstandardize()explaining that NaN values are silently propagated (pandasskipna=Truedefault)Fix single-ID downsampling plot crash (
plotting/downsampling.py):squeeze=Falsetoplt.subplots()to always return 2D array, preventing IndexError when num_rows=1 and num_cols=1Move loop-invariant code outside loop (
plotting/downsampling.py):Move DataFrame copy outside loop (
plotting/plot_2d.py):points_df = gd_reversed.data.copy()outside the plotting loop to avoid redundant copiesAPI & Design Improvements
Allow default model parameters (
modelling/interpolate.py):interpolate()to defaultmodel_paramsto{}when neithermodel_paramsnormodel_params_gridis providedDocument SklearnModel registry exclusion (
modelling/models/__init__.py):SklearnModelis intentionally excluded fromMODEL_REGISTRYDocument Modeler construction behavior (
modelling/modeler.py):Performance Optimizations
Cache grid properties (
core/grid3d.py):grid,normalized_grid,mesh, andnormalized_meshfrom@propertyto@cached_propertyGridAxisis frozen (immutable)Eliminate redundant DataFrame copies (
modelling/preprocessor.py):.copy()calls in_normalize_xyz()and_standardize_v()since data is already copied at pipeline entryTesting
test_interpolate_no_params_raisestotest_interpolate_default_paramsto verify new default behaviorNotes
https://claude.ai/code/session_01D3HyJ8x8oE2sGrgtNZ85zE