@27pchrisl
Contributor

Hi,

I have a dataset with a lot of missing data, and I wanted to use the HotDeckImputer with the Gower kernel to fill in the blanks. I had preprocessed my dataset so that nulls were converted to NAN or ? depending on the data type.

The problem is that Gower expects continuous features to be normalized. I wanted to use the MinMaxNormalizer for this on the continuous features in the dataset, but it doesn't handle NAN: essentially every value is normalized to zero.

I updated the MinMaxNormalizer and the MaxAbsoluteScaler to skip NAN values: they now compute the min/max or the maximum absolute value from only the finite values, and leave the NAN values where they were in the original dataset.
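
A minimal sketch of the idea described above, written in Python for illustration only. It is not the code from this pull request and not the Rubix ML API; the function names `min_max_skip_nan` and `max_abs_skip_nan` are hypothetical, and the handling of constant or all-missing columns is an assumption.

```python
import math

def min_max_skip_nan(column, lo=0.0, hi=1.0):
    """Scale the finite values of a column to [lo, hi]; leave NaN in place."""
    finite = [x for x in column if math.isfinite(x)]
    if not finite:
        # Nothing to fit on: return the column unchanged (assumed behavior).
        return list(column)
    cmin, cmax = min(finite), max(finite)
    span = cmax - cmin
    out = []
    for x in column:
        if not math.isfinite(x):
            out.append(x)    # missing value stays where it was
        elif span == 0:
            out.append(lo)   # constant column: map to the lower bound (assumed)
        else:
            out.append(lo + (x - cmin) / span * (hi - lo))
    return out

def max_abs_skip_nan(column):
    """Divide finite values by the largest finite absolute value; keep NaN."""
    max_abs = max((abs(x) for x in column if math.isfinite(x)), default=0.0)
    if max_abs == 0:
        return list(column)
    return [x / max_abs if math.isfinite(x) else x for x in column]
```

With this approach the statistics are fitted on the observed values only, so a single missing entry no longer affects how the rest of the column is scaled, and the imputer still sees the missing entries afterwards.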

Being new to ML, I wasn't sure if this was a valid approach for using the normalizers together with the Gower imputer - feedback welcome!

@andrewdalpino changed the base branch from master to 3.0 on May 23, 2024 17:47
@andrewdalpino
Member

Targeting the ML 3.0 release with this, since it can be construed as a backwards compatibility break.

@andrewdalpino merged commit 646b1a2 into RubixML:3.0 on Dec 26, 2024
@github-actions bot locked and limited conversation to collaborators on Dec 26, 2024