Allow normalizers to skip NaN values #333
Merged
Hi,
I have a dataset with a lot of missing data, and I wanted to use the `HotDeckImputer` with the `Gower` kernel to fill in the blanks. I had preprocessed my dataset so that `null`s were converted to `NAN` or `?`, depending on the data type.

The problem was then that `Gower` expects continuous features to have been normalized. I wanted to use the `MinMaxNormalizer` to do this on the continuous features in the dataset, but it doesn't handle `NAN`: essentially every value is normalized to zero.

I updated the `MinMaxNormalizer` and the `MaxAbsoluteScaler` to skip `NAN` values, compute the min/max or the maximum absolute value over only the finite values, and leave the `NAN` values where they were in the original dataset.

Being new to ML, I wasn't sure if this was a valid approach for using the normalizers together with the Gower imputer - feedback welcome!
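
To illustrate the idea for reviewers, here is a minimal Python sketch of NaN-skipping normalization. It is not the code in this PR: the function names `min_max_skip_nan` and `max_abs_skip_nan` are hypothetical, the sketch treats only NaN as missing, and mapping a constant column to the lower bound is an assumption made for the example.

```python
import math


def min_max_skip_nan(values, low=0.0, high=1.0):
    """Scale non-NaN values to [low, high]; NaN entries stay where they are."""
    present = [v for v in values if not math.isnan(v)]
    if not present:
        # Nothing to fit on: return the column unchanged.
        return list(values)
    lo, hi = min(present), max(present)
    span = hi - lo
    scaled = []
    for v in values:
        if math.isnan(v):
            scaled.append(v)  # keep the missing marker for the imputer
        elif span == 0:
            scaled.append(low)  # assumption: constant column maps to `low`
        else:
            scaled.append(low + (v - lo) / span * (high - low))
    return scaled


def max_abs_skip_nan(values):
    """Divide non-NaN values by the largest absolute value; keep NaN in place."""
    present = [abs(v) for v in values if not math.isnan(v)]
    max_abs = max(present) if present else 0.0
    if max_abs == 0:
        # All values are zero or missing: nothing to scale.
        return list(values)
    return [v if math.isnan(v) else v / max_abs for v in values]


column = [2.0, float("nan"), 4.0, 6.0]
print(min_max_skip_nan(column))
print(max_abs_skip_nan([-4.0, float("nan"), 2.0]))
```

The statistics are fitted on the observed values only, so the missing entries do not distort the range and are still present for the imputer to fill afterwards.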