feat: Add support for MetricDefinitions in ModelTrainer #5202

Merged: chad119 merged 7 commits into master from chadchc-test on Jun 12, 2025

chad119 (Collaborator) commented Jun 10, 2025

Issue #, if available:
Issue: #5018

Description of changes:

  • Adds support for MetricDefinitions in ModelTrainer, following the existing .with_xxx pattern.
  • Enables usage like the following:
  from sagemaker.modules.train import ModelTrainer
  from sagemaker.modules.configs import MetricDefinition

  metric_definitions = [
      MetricDefinition(
          name="loss",
          regex="Loss: (.*?);",
      )
  ]

  model_trainer = ModelTrainer(
      ...
  ).with_metric_definitions(metric_definitions)
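
To illustrate what the regex in the example above would capture, here is a minimal local sketch using Python's standard `re` module. The log lines are invented for illustration, and the `extract_metric` helper is hypothetical (it is not part of the SDK). The regex engine that SageMaker applies to training logs may differ in details from Python's `re`, so treat this as a way to sanity-check a pattern before submitting a job, not as a description of the service's behavior.

```python
import re

# Regex taken from the MetricDefinition in the PR description.
LOSS_REGEX = r"Loss: (.*?);"

# Hypothetical training log lines, invented for this illustration.
sample_log = [
    "Epoch 1/3 - Loss: 0.9132; Accuracy: 0.61;",
    "Epoch 2/3 - Loss: 0.4521; Accuracy: 0.78;",
    "Saving checkpoint",
    "Epoch 3/3 - Loss: 0.2310; Accuracy: 0.88;",
]


def extract_metric(lines, pattern):
    """Return the first capture group, parsed as float, for each matching line."""
    compiled = re.compile(pattern)
    values = []
    for line in lines:
        match = compiled.search(line)
        if match:
            # The non-greedy group stops at the first ';' after "Loss: ".
            values.append(float(match.group(1)))
    return values


losses = extract_metric(sample_log, LOSS_REGEX)
print(losses)
```

The non-greedy `(.*?)` followed by `;` matters here: a greedy `(.*)` would run to the last semicolon on the line and capture the accuracy text as well, which would not parse as a number.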

Testing done:

  • Added unit tests and tested end to end for sanity
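
As a rough sketch of what a unit test for a `.with_xxx`-style setter could check, the example below uses stand-in classes rather than the SDK itself. `MetricDefinitionSketch` and `TrainerSketch` are hypothetical names; the assumption that the setter stores the list and returns the same instance is inferred from the chained usage in the description, not from the actual implementation or the tests added in this PR.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MetricDefinitionSketch:
    """Hypothetical stand-in for the SDK's MetricDefinition (name and regex only)."""
    name: str
    regex: str


@dataclass
class TrainerSketch:
    """Hypothetical stand-in for ModelTrainer; not the SDK class."""
    metric_definitions: Optional[List[MetricDefinitionSketch]] = None

    def with_metric_definitions(self, metric_definitions):
        # Assumed behavior: store the definitions and return self for chaining.
        self.metric_definitions = metric_definitions
        return self


def test_with_metric_definitions_sets_field_and_chains():
    definitions = [MetricDefinitionSketch(name="loss", regex="Loss: (.*?);")]
    trainer = TrainerSketch()

    result = trainer.with_metric_definitions(definitions)

    # The setter returns the same object, so calls can be chained.
    assert result is trainer
    # The definitions are stored unchanged.
    assert trainer.metric_definitions == definitions
    assert trainer.metric_definitions[0].name == "loss"


test_with_metric_definitions_sets_field_and_chains()
```

A test of this shape can run under pytest or as a plain script, since it only relies on bare `assert` statements.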

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)
  • If adding any dependency in requirements.txt files, I have spell-checked it and ensured it exists on PyPI

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

chad119 requested a review from a team as a code owner June 10, 2025 20:20
chad119 requested a review from Aditi2424 June 10, 2025 20:20

codecov bot commented Jun 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.19%. Comparing base (70b2f9a) to head (7616328).
Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5202      +/-   ##
==========================================
+ Coverage   86.18%   86.19%   +0.01%     
==========================================
  Files         446      446              
  Lines       43267    43277      +10     
==========================================
+ Hits        37290    37304      +14     
+ Misses       5977     5973       -4     

chad119 merged commit 0215512 into master Jun 12, 2025
14 checks passed
chad119 deleted the chadchc-test branch June 12, 2025 20:17
Aditi2424 added a commit that referenced this pull request Jun 23, 2025
* change: update jumpstart region_config, update image_uri_configs 06-12-2025 07:18:12 PST

* feat: Add support for MetricDefinitions in ModelTrainer (#5202)

* feat: Add support for MetricDefinitions in ModelTrainer

* style fix

* Update model_trainer.py to generate the doc

* resolve unit test failed

* solve another unit test error

---------

Co-authored-by: Chad Chiang

* prepare release v2.247.0

* update development version to v2.247.1.dev0

* change: update image_uri_configs 06-19-2025 07:18:34 PST

* prepare release v2.247.1

* update development version to v2.247.2.dev0

* documentation: update theme and add new pages

* Add template files for custom logo and navbar

---------

Co-authored-by: sagemaker-bot
Co-authored-by: Chad Chiang
Co-authored-by: ci <ci>
Co-authored-by: adishaa
Aditi2424 added a commit that referenced this pull request Jun 23, 2025
* change: update jumpstart region_config, update image_uri_configs 06-12-2025 07:18:12 PST

* feat: Add support for MetricDefinitions in ModelTrainer (#5202)

* feat: Add support for MetricDefinitions in ModelTrainer

* style fix

* Update model_trainer.py to generate the doc

* resolve unit test failed

* solve another unit test error

---------

Co-authored-by: Chad Chiang

* prepare release v2.247.0

* update development version to v2.247.1.dev0

* change: update image_uri_configs 06-19-2025 07:18:34 PST

* prepare release v2.247.1

* update development version to v2.247.2.dev0

* change: relax protobuf to <6.32 (#5211)

---------

Co-authored-by: sagemaker-bot
Co-authored-by: Chad Chiang
Co-authored-by: ci <ci>
Co-authored-by: parknate@
3 participants