Version 4.0.0 #186

Merged
riley-harper merged 142 commits into main from v4-dev
Apr 7, 2025


riley-harper commented Mar 6, 2025

This is a tracking PR for changes in version 4.0.0, and the issues that those changes are related to.

Changes

ccdavis and others added 30 commits November 14, 2024 15:12
… and test all threshold matrix members against that set of params. Still has a failure.
…oesn't give good results making no matches in the test data, so precision is NaN.
…split used to test all thresholds isn't a good one.
We can just pass the list of model_parameters from the config file to this
function.
This will make this piece of code easier to understand and test.
…rch setting

One of these tests is failing because we haven't implemented this logic in the
_get_model_parameters() function yet.
riley-harper and others added 18 commits December 13, 2024 21:25
Allow setting the checkpoint directory through SparkConnection
This is an alpha release of 4.0.0. It's a pre-release, so pip shouldn't
download it unless you specifically request it. Until we go to 4.0.0 for real,
the last official release will be 3.8.0.
This module has been deprecated for more than a year and is ready for removal.
pyspark.ml.feature.Interaction provides the same interface, and users should
use that class instead.
This is an old, deprecated way of specifying blocking.
Now that blocking_steps isn't supported, it's simpler to inline this private
helper function.
This has been deprecated in favor of the current column_mappings format.
This documentation was unfortunately using the old, deprecated form. So I've
updated it to use the new form instead.
Remove deprecated code for version 4
To support backwards compatibility, there is a "use_legacy_toml_parser"
argument. Setting this tells load_conf_file() to use the toml library.
Use tomli instead of the toml package by default
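The parser switch described in this commit can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `load_toml_config` is a hypothetical stand-in for `load_conf_file()`, and only the `use_legacy_toml_parser` argument name comes from the commit message. The sketch prefers the standard-library `tomllib` (Python 3.11+), which has the same `load()` interface as `tomli`, and falls back to `tomli` on older interpreters.

```python
try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same interface


def load_toml_config(path, use_legacy_toml_parser=False):
    """Hypothetical loader: parse a TOML file into a dict.

    Assumption: the legacy path simply delegates to the older `toml` package.
    """
    if use_legacy_toml_parser:
        # Imported lazily so the legacy package is only needed on request.
        import toml

        with open(path, encoding="utf-8") as f:
            return toml.load(f)

    # tomllib/tomli require the file to be opened in binary mode.
    with open(path, "rb") as f:
        return tomllib.load(f)
```

Keeping the legacy parser behind an opt-in flag lets existing config files that rely on the old parser's behavior keep working, while new users get the stricter parser by default.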
In some rare cases with very large inputs, mcc() could return values outside of
the range [-1, 1] due to floating-point precision limitations. To fix this,
I've just added a clamp() function and called it to force the return value into
the acceptable range.
Fix a bug where model_metrics.mcc() < -1.0
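The clamping fix described in this commit can be sketched as follows. This is an illustration only, not the project's implementation: the signatures of `clamp()` and `mcc()`, the confusion-matrix argument names, and the choice to return NaN when the denominator is zero are all assumptions made for the example.

```python
import math


def clamp(value: float, minimum: float, maximum: float) -> float:
    """Force value into the closed interval [minimum, maximum]."""
    return max(minimum, min(value, maximum))


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, clamped to [-1, 1].

    Arguments are counts of true positives, true negatives,
    false positives, and false negatives (hypothetical names).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumption: treat the undefined case as NaN.
        return math.nan

    # Rounding in the square root and the division can push this ratio
    # slightly past +1 or -1 for very large counts.
    raw = (tp * tn - fp * fn) / denominator
    return clamp(raw, -1.0, 1.0)
```

Clamping only changes results that are already outside the valid range, so values strictly inside (-1, 1) pass through untouched.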
riley-harper marked this pull request as draft March 7, 2025 14:44
riley-harper and others added 8 commits March 7, 2025 20:01
So far, this has information on model parameter searches.
Because of the changes on main, needed to regenerate the Sphinx docs.
Since this is now deprecated, replace most of the references to
training.param_grid with equivalent references to
training.model_parameter_search.
Update docs for training.param_grid
This is the beta pre-release for version 4. At this point, we expect all
feature work and breaking changes to be done. There may be bug fixes,
documentation improvements, and code cleanup still happening, but the general
behavior should be pretty close to stable if all goes well.
riley-harper marked this pull request as ready for review April 7, 2025 15:55
riley-harper merged commit c2a6a7a into main Apr 7, 2025
6 checks passed
riley-harper deleted the v4-dev branch April 7, 2025 16:19