Merged
… and test all threshold matrix members against that set of params. Still has a failure.
…oesn't give good results, making no matches in the test data, so precision is NaN.
…s given to the thresholding eval.
…split used to test all thresholds isn't a good one.
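These commits evaluate every member of the threshold matrix against one fixed set of model parameters. The following is a minimal sketch of that idea. It assumes the model has already produced match probabilities and treats each matrix member as a single probability cutoff. hlink's real threshold matrix also involves threshold_ratio, which this sketch ignores. The names ConfusionCounts, confusion_counts, and sweep_thresholds are hypothetical and are not hlink's API.

```python
from typing import NamedTuple, Sequence


class ConfusionCounts(NamedTuple):
    true_pos: int
    false_pos: int
    true_neg: int
    false_neg: int


def confusion_counts(
    probabilities: Sequence[float], labels: Sequence[int], threshold: float
) -> ConfusionCounts:
    """Count outcomes when every probability >= threshold is predicted a match."""
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have the same length")
    tp = fp = tn = fn = 0
    for prob, label in zip(probabilities, labels):
        predicted_match = prob >= threshold
        if predicted_match and label == 1:
            tp += 1
        elif predicted_match:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def sweep_thresholds(
    probabilities: Sequence[float], labels: Sequence[int], thresholds: Sequence[float]
) -> dict[float, ConfusionCounts]:
    """Evaluate each threshold against the same fixed set of predictions."""
    return {t: confusion_counts(probabilities, labels, t) for t in thresholds}
```

For example, with probabilities [0.9, 0.7, 0.4, 0.1] and labels [1, 0, 1, 0], a cutoff of 0.95 predicts no matches at all, so true_pos + false_pos is 0 and precision is undefined. This is the NaN case mentioned above.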
We can just pass the list of model_parameters from the config file to this function.
This will make this piece of code easier to understand and test.
…rch setting
One of these tests is failing because we haven't implemented this logic in the _get_model_parameters() function yet.
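The idea of passing the list of model_parameters straight from the config file can be sketched as a pure function that is easy to unit test without Spark. This is a hypothetical illustration, not hlink's _get_model_parameters(). It assumes each entry is a dict with a "type" key naming the model. It also assumes that under a grid search, every other key maps to a list of candidate values. The strategy names "explicit" and "grid" are assumptions as well.

```python
import itertools
from typing import Any


def expand_model_parameters(
    model_parameters: list[dict[str, Any]], strategy: str = "explicit"
) -> list[dict[str, Any]]:
    """Turn the config's model_parameters list into concrete parameter sets.

    "explicit": each entry is already one parameter set; return copies as-is.
    "grid": each non-"type" value is a list of candidates; emit every combination.
    """
    if strategy == "explicit":
        return [dict(entry) for entry in model_parameters]
    if strategy != "grid":
        raise ValueError(f"unsupported strategy: {strategy!r}")

    expanded = []
    for entry in model_parameters:
        model_type = entry["type"]
        names = [key for key in entry if key != "type"]
        # Cartesian product over the candidate lists, in the entry's key order.
        for values in itertools.product(*(entry[name] for name in names)):
            expanded.append({"type": model_type, **dict(zip(names, values))})
    return expanded
```

Because the function takes a plain list instead of reading the whole config, a test can pass in a small literal list and check the expanded output directly.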
Allow setting the checkpoint directory through SparkConnection
This is an alpha release of 4.0.0. It's a pre-release, so pip shouldn't download it unless you specifically request it. Until we go to 4.0.0 for real, the last official release will be 3.8.0.
This module has been deprecated for more than a year and is ready for removal. pyspark.ml.feature.Interaction provides the same interface, and users should use that class instead.
This is an old, deprecated way of specifying blocking.
Now that blocking_steps isn't supported, it's simpler to inline this private helper function.
This has been deprecated in favor of the current column_mappings format.
This documentation was unfortunately using the old, deprecated form. So I've updated it to use the new form instead.
Remove deprecated code for version 4
To support backwards compatibility, there is a "use_legacy_toml_parser" argument. Setting this tells load_conf_file() to use the toml library.
Use tomli instead of the toml package by default
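The default-to-tomli behavior with an opt-in legacy parser can be sketched as follows. This sketch assumes Python 3.11+, where the standard library's tomllib has the same API as tomli, and falls back to the tomli package on older Pythons. parse_config_text is a hypothetical helper. It only mirrors the use_legacy_toml_parser flag described above and does not reproduce the real signature of load_conf_file().

```python
try:
    import tomllib  # Python 3.11+: stdlib module with the tomli API
except ModuleNotFoundError:  # older Pythons: use the tomli package itself
    import tomli as tomllib


def parse_config_text(text: str, use_legacy_toml_parser: bool = False) -> dict:
    """Parse TOML config text with tomli/tomllib unless legacy parsing is requested."""
    if use_legacy_toml_parser:
        # Import lazily so the old toml package is only required when asked for.
        import toml

        return toml.loads(text)
    return tomllib.loads(text)
```

When reading from a file, note that tomllib.load() and tomli.load() require the file to be opened in binary mode ("rb"), unlike the old toml package.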
In some rare cases with very large inputs, mcc() could return values outside of the range [-1, 1] due to floating-point precision limitations. To fix this, I've just added a clamp() function and called it to force the return value into the acceptable range.
Fix a bug where model_metrics.mcc() < -1.0
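The clamping fix can be sketched with the standard Matthews correlation coefficient formula, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The argument order and the choice to return NaN when a marginal total is zero are assumptions for this illustration. They are not necessarily how hlink's model_metrics.mcc() behaves.

```python
import math


def clamp(value: float, minimum: float, maximum: float) -> float:
    """Force value into the closed interval [minimum, maximum]."""
    return max(minimum, min(value, maximum))


def mcc(true_pos: int, true_neg: int, false_pos: int, false_neg: int) -> float:
    """Matthews correlation coefficient, clamped to [-1, 1].

    Returns NaN when any marginal total is zero, since MCC is undefined then.
    """
    denominator = math.sqrt(
        (true_pos + false_pos)
        * (true_pos + false_neg)
        * (true_neg + false_pos)
        * (true_neg + false_neg)
    )
    if denominator == 0:
        return math.nan
    numerator = true_pos * true_neg - false_pos * false_neg
    # Rounding error on very large inputs can push the ratio just past +/-1.
    return clamp(numerator / denominator, -1.0, 1.0)
```

Clamping at the very end keeps the formula itself unchanged and only corrects the tiny floating-point overshoot.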
So far, this has information on model parameter searches.
Add docs for Model Exploration
Because of the changes on main, needed to regenerate the Sphinx docs.
Since this is now deprecated, replace most of the references to training.param_grid with equivalent references to training.model_parameter_search.
Update docs for training.param_grid
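The following is a hypothetical sketch of rewriting a deprecated training.param_grid setting into the training.model_parameter_search form. The mapping used here is an assumption to check against the hlink docs: param_grid = true becomes {strategy = "grid"} and param_grid = false becomes {strategy = "explicit"}. translate_param_grid is an illustrative name, not an hlink function.

```python
from typing import Any


def translate_param_grid(training: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a training config with param_grid rewritten.

    ASSUMPTION: param_grid = true maps to {strategy = "grid"} and
    param_grid = false maps to {strategy = "explicit"}.
    """
    updated = dict(training)
    if "param_grid" not in updated:
        return updated
    if "model_parameter_search" in updated:
        raise ValueError("set only one of param_grid and model_parameter_search")
    use_grid = updated.pop("param_grid")
    updated["model_parameter_search"] = {"strategy": "grid" if use_grid else "explicit"}
    return updated
```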
This is the beta pre-release for version 4. At this point, we expect all feature work and breaking changes to be done. There may be bug fixes, documentation improvements, and code cleanup still happening, but the general behavior should be pretty close to stable if all goes well.
This is a tracking PR for changes in version 4.0.0, and the issues that those changes are related to.
Changes
…core.model_metrics module with functions for computing metrics on model confusion matrices. Include more metrics and information on the raw confusion matrices in model exploration output. Closes #179 (Add F-measure to the computed model metrics, and include the raw confusion matrix in the output).
…core.classifier functions to not interact with threshold and threshold_ratio. The caller should ensure that the passed dictionary only contains parameters for the model to be trained. Closes #172 (Don't handle threshold and threshold_ratio in core.classifier.choose_classifier()).
…core.threshold and simplify the parameters required for a few of the functions. Closes #174 (Simplify the interface to linking/core/threshold.py).
…checkpoint_dir argument to SparkConnection. Closes #181 (Don't set the Spark checkpoint directory to the tmp directory).
…hlink.linking.transformers.interaction_transformer module #98 and closes #127 (Remove deprecation warnings and associated code for previous config structures).
…tomli as the default TOML parser. Closes #45 (Use a different TOML package).
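Confusion-matrix metrics such as the F-measure added in #179 can be sketched as small standalone functions. The names and signatures below are hypothetical and are not the actual core.model_metrics API. Returning NaN on a zero denominator is an assumption chosen to match the NaN precision described in the commit messages above.

```python
import math


def _safe_ratio(numerator: float, denominator: float) -> float:
    """Divide, returning NaN instead of raising when the denominator is zero."""
    return numerator / denominator if denominator != 0 else math.nan


def precision(true_pos: int, false_pos: int) -> float:
    # Undefined (NaN) when the model predicts no matches at all.
    return _safe_ratio(true_pos, true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    # Undefined (NaN) when the data contains no true matches.
    return _safe_ratio(true_pos, true_pos + false_neg)


def f_measure(true_pos: int, false_pos: int, false_neg: int) -> float:
    """F1 score, computed as 2TP / (2TP + FP + FN).

    This equals the harmonic mean 2PR / (P + R) whenever precision and
    recall are both defined and not both zero.
    """
    return _safe_ratio(2 * true_pos, 2 * true_pos + false_pos + false_neg)
```

For example, precision(0, 0) is NaN, which is the situation where the model makes no matches in the test data.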