Skip to content

Conversation

@anyangml
Copy link
Collaborator

@anyangml anyangml commented May 7, 2025

The original design uses fixed datasets for LAMs. However, since different LAMs have varying GPU memory footprints during inference, they can trigger out-of-memory (OOM) errors with different structure sizes.

This PR introduces a new procedure with the following changes:

  • The maximum number of atoms that a LAM can handle on a given device is dynamically determined using binary search.
  • The primitive cells of the test structures are expanded on-the-fly to approach the maximum allowed number of atoms, ensuring convergence.

TODO:

  • add UTs

@codecov
Copy link

codecov bot commented May 7, 2025

Codecov Report

Attention: Patch coverage is 95.06173% with 4 lines in your changes missing coverage. Please review.

Project coverage is 65.99%. Comparing base (18630d9) to head (c7e4b9c).
Report is 14 commits behind head on main.

Files with missing lines Patch % Lines
...alculator/inference_efficiency/efficiency_utils.py 93.22% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #264      +/-   ##
==========================================
+ Coverage   64.40%   65.99%   +1.59%     
==========================================
  Files          33       35       +2     
  Lines        1458     1538      +80     
  Branches      170      182      +12     
==========================================
+ Hits          939     1015      +76     
- Misses        482      485       +3     
- Partials       37       38       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@anyangml anyangml requested a review from caic99 May 7, 2025 03:55
@caic99 caic99 requested a review from Copilot May 7, 2025 03:58
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@anyangml anyangml requested a review from Copilot May 7, 2025 04:08
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a dynamic method to determine the maximum number of atoms processable by a model for the efficiency test and expands test structures on-the-fly accordingly. Key changes include:

  • Adding a new OOM test atom and dynamic binary search for maximum atoms.
  • Updating the inference functions to apply on-the-fly structure expansion.
  • Introducing utility functions in efficiency_utils.py for factorization and binary search operations.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
lambench/tasks/calculator/inference_efficiency/inference_efficiency.py Updated inference function to dynamically expand atoms and include a binary search for max atoms.
lambench/tasks/calculator/inference_efficiency/efficiency_utils.py Added utility functions for binary search, factorization, and OOM error handling.

@anyangml anyangml requested a review from caic99 May 7, 2025 07:52
@caic99 caic99 requested a review from Copilot May 7, 2025 07:55
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a new efficiency testing procedure to dynamically determine the maximum number of atoms a LAM can process without running into OOM errors. Key changes include:

  • Adding unit tests for efficiency utilities (find_even_factors and binary_search_max_natoms) in the tests folder.
  • Integrating binary search logic for maximum natoms into the inference workflow and updating the run_one_inference API.
  • Modifying metrics calculation code that applies cell-level error computation.

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
tests/tasks/calculator/test_efficiency_utils.py Introduces tests for factor finding and binary search max_natoms functionality.
lambench/tasks/calculator/inference_efficiency/inference_efficiency.py Updates run_inference and run_one_inference to leverage dynamic natom scaling using binary search.
lambench/tasks/calculator/inference_efficiency/efficiency_utils.py Provides new helper functions for OOM detection, dynamic scaling, and factor balancing.
lambench/metrics/vishelper/metrics_calculations.py Adjusts the method to apply an instability error function over DataFrame cells.
Comments suppressed due to low confidence (2)

lambench/tasks/calculator/inference_efficiency/efficiency_utils.py:9

  • Typo found in the docstring: 'Perfrom' should be corrected to 'Perform'.
    Perfrom force field prediction for one system, return energy, forces and stress.

lambench/tasks/calculator/inference_efficiency/efficiency_utils.py:81

  • [nitpick] The parameter name 'safe_guard' could be more descriptive; consider renaming it to 'max_iterations' for improved clarity.
def binary_search_max_natoms(model: ASEModel, atoms: Atoms, upper_limit: int = 1000, safe_guard: int = 15) -> int:

@anyangml anyangml merged commit f758454 into main May 9, 2025
4 checks passed
@anyangml anyangml deleted the feat/redesign-efficiency-tests branch May 9, 2025 06:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants