Training time finished without completing a successful trial. Either no trial completed or the metric for all completed trials are NaN or Infinity #7389

@jackpotcityco

Hello,

  • ML.NET: 3.0.1
  • CPU: i7-12800H
  • 24 MB Intel® Smart Cache
  • RAM: 64 GB

I encounter an error that occurs intermittently, and I can't understand why, when I train an experiment created with:
mlContext.Auto().CreateRegressionExperiment

(I have over 20 GB of free RAM when the error occurs)

General Exception:
Message: One or more errors occurred.
   at System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification)
   at Microsoft.ML.AutoML.AutoMLExperiment.Run()
   at Microsoft.ML.AutoML.RegressionExperiment.Execute(IDataView trainData, IDataView validationData, ColumnInformation columnInformation, IEstimator`1 preFeaturizer, IProgress`1 progressHandler)
Inner Exception:
Training time finished without completing a successful trial. Either no trial completed or the metric for all completed trials are NaN or Infinity
   at Microsoft.ML.AutoML.AutoMLExperiment.d__24.MoveNext()

I will explain step by step what I have done:

Step 1:
I have filled the IDataViews below, where each row has 50 feature columns and a "Label" column holding the ground-truth target.

(Pseudo code) Each IDataView has 51 columns and the following number of rows:

             IDataView trainData          (Has 175000 rows)
             IDataView hold_out_data      (Has 75000 rows)

Every feature value in every row is a float and has been checked for validity with this function:

bool IsValid(float value)
{
    // A valid number is not NaN and not Infinity
    return !float.IsNaN(value) && !float.IsInfinity(value);
}
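The check above covers the feature values. Since the error message talks about the metric being NaN or Infinity, the same kind of check could also be run over the "Label" column of both IDataViews. This is only a minimal sketch, assuming the column values are available as an IEnumerable<float> (for example through the GetColumn<float> extension method in Microsoft.ML.Data). The ValueScan class and its Scan method are hypothetical names used for illustration and are not part of ML.NET:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of ML.NET): summarizes one column of floats.
public static class ValueScan
{
    // Counts NaN and Infinity values and tracks the range of the finite values.
    // If there are no finite values, Min stays +Infinity and Max stays -Infinity.
    public static (long Count, long NaNCount, long InfinityCount, float Min, float Max)
        Scan(IEnumerable<float> values)
    {
        long count = 0, nan = 0, inf = 0;
        float min = float.PositiveInfinity;
        float max = float.NegativeInfinity;

        foreach (var v in values)
        {
            count++;
            if (float.IsNaN(v)) { nan++; continue; }
            if (float.IsInfinity(v)) { inf++; continue; }
            if (v < min) min = v;
            if (v > max) max = v;
        }

        return (count, nan, inf, min, max);
    }
}

// Possible usage, assuming a float column named "Label":
//   var summary = ValueScan.Scan(trainData.GetColumn<float>("Label"));
//   Console.WriteLine(summary);
```

Running it on both trainData and hold_out_data would show whether the label column itself contains invalid values; the Min and Max fields are there only as a quick look at the value range.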

Step 2:
I now pass those 2 IDataViews to the function below to train the model. However, training always stops after 0-3 seconds (I also tried calling the function in a loop, shuffling the data on each iteration, but that does not help) and produces the error shown above, which is:
Training time finished without completing a successful trial. Either no trial completed or the metric for all completed trials are NaN or Infinity

As shown, I have set the training time to 240 seconds. Increasing it does not seem to help, since training stops after 0-3 seconds.
Why does this happen when all feature values are valid floats?

If I use fewer rows, for example 150,000, the error usually does not occur and training works fine.
To be clear, though, I also have other cases where I use more than those 250,000 rows and training succeeds.

So the error appears to occur at random, and the message does not say where or why it happens. How can I find the cause and solve this problem?

Thank you!

        void Model_Training(IDataView trainData, IDataView hold_out_data)
        {
            var mlContext = new MLContext();
            var cts = new CancellationToken();
            ExperimentBase<RegressionMetrics, RegressionExperimentSettings> regression_Experiment = null;
            regression_Experiment = mlContext.Auto().CreateRegressionExperiment(new RegressionExperimentSettings
            {
                MaxExperimentTimeInSeconds = 240,
                CacheBeforeTrainer = CacheBeforeTrainer.Off,
                CacheDirectoryName = "C:/Aintelligence/temp/cache",
                MaximumMemoryUsageInMegaByte = 16384,
                OptimizingMetric = RegressionMetric.RSquared,
                CancellationToken = cts
            });

            // Progress handler for regression
            var regressionProgressHandler = new Progress<RunDetail<RegressionMetrics>>(ph =>
            {
                if (ph.ValidationMetrics != null) { progress(Math.Round(ph.ValidationMetrics.RSquared, 3), ph.TrainerName, ph.ValidationMetrics, ph.Model); }
            });
            void progress(double metricValue, string TrainerName, object ValidationMetrics, ITransformer Model)
            {
                //Log this info
                var logInfo = (TrainerName, ValidationMetrics, Model);
            }
            try
            {
                //Do something with the results
                var results = regression_Experiment.Execute(trainData, hold_out_data, labelColumnName: "Label", progressHandler: regressionProgressHandler);
            }
            catch (Exception ex)
            {
                //Log this error
                string str = $"General Exception:\nMessage: {ex.Message}\n{ex.StackTrace}\n{(ex.InnerException != null ? $"Inner Exception:\n{ex.InnerException.Message}\n{ex.InnerException.StackTrace}\n" : "")}";
            }
        }
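The catch block above records only the outer exception and its first inner exception. The GetResultCore frame in the stack trace suggests the outer exception may be an AggregateException, which can hold more than one inner exception, so logging the whole chain might reveal more detail. A minimal sketch of that idea follows; the ExceptionDump class and its Describe method are hypothetical names, not part of ML.NET or AutoML:

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of ML.NET): renders an exception and
// every nested inner exception, indented by nesting depth.
public static class ExceptionDump
{
    public static string Describe(Exception ex)
    {
        var sb = new StringBuilder();
        Append(sb, ex, 0);
        return sb.ToString();
    }

    private static void Append(StringBuilder sb, Exception ex, int depth)
    {
        var indent = new string(' ', depth * 2);
        sb.Append(indent).Append(ex.GetType().Name).Append(": ").AppendLine(ex.Message);

        // StackTrace is null for exceptions that were never thrown.
        if (ex.StackTrace != null)
        {
            sb.Append(indent).AppendLine(ex.StackTrace.Trim());
        }

        if (ex is AggregateException agg)
        {
            // An AggregateException may wrap several failures; visit all of them.
            foreach (var inner in agg.InnerExceptions)
            {
                Append(sb, inner, depth + 1);
            }
        }
        else if (ex.InnerException != null)
        {
            Append(sb, ex.InnerException, depth + 1);
        }
    }
}

// Possible usage inside the catch block:
//   string str = ExceptionDump.Describe(ex);
```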
