[QEff. Finetuning]: Enhance test cases to match intermediate step level loss/metrics #531

Merged 12 commits on Aug 18, 2025

@quic-akuruvil quic-akuruvil commented Aug 5, 2025

Enable test cases that match intermediate step-level loss and metrics in single-device and DDP setups.

A nested dictionary maps the reference losses for each test scenario. The scenarios and their reference values are listed in a separate reference file.
The scenarios currently cover single-device testing for the following models:
Llama and Bert on the Alpaca and GSM8K datasets.

Reference data is based on SDK 1.21.0.23.
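
As an illustration of the nested-dictionary idea described above, here is a minimal sketch. The layout (scenario, then world size, then rank, then metric), the key names, the helper `get_reference`, and every number are assumptions made for this example; they are not the PR's actual reference file or values.

```python
# Hypothetical layout: scenario -> world size -> rank -> metric lists.
# All keys and numbers below are placeholders, not the PR's reference values.
REFERENCE_SKETCH = {
    "example_scenario": {
        "description": "Placeholder scenario for illustration",
        "world_size_1": {
            "rank_0": {
                "train_step_losses": [1.5, 1.25, 1.0],
                "eval_step_losses": [1.75, 1.5],
            },
        },
    },
}


def get_reference(data, scenario_key, world_size, rank, metric):
    """Return one reference list, or raise KeyError naming the missing lookup."""
    try:
        return data[scenario_key][f"world_size_{world_size}"][f"rank_{rank}"][metric]
    except KeyError as exc:
        raise KeyError(
            f"No reference for scenario '{scenario_key}' "
            f"(WS: {world_size}, Rank: {rank}, metric: {metric})"
        ) from exc


losses = get_reference(REFERENCE_SKETCH, "example_scenario", 1, 0, "train_step_losses")
print(losses)
```

Keeping the lookup in one helper means a missing scenario, world size, or rank fails with a message that names the exact combination, rather than a bare `KeyError` from deep inside a test.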

@quic-akuruvil quic-akuruvil marked this pull request as ready for review August 5, 2025 13:54

@quic-meetkuma quic-meetkuma left a comment

Overall looks good, just address minor comments and we are good to merge.

@quic-akuruvil quic-akuruvil force-pushed the test_cases branch 2 times, most recently from d76ef7f to c31b88a, August 13, 2025 05:55

@quic-meetkuma quic-meetkuma left a comment

LGTM, thanks Ann for making this change.

f"{name} length mismatch for scenario '{scenario_key}' (WS: {current_world_size}, Rank: {current_rank}). "
f"Expected {len(ref_list)} elements, but got {len(actual_list)}."
)
max_diff = np.max(np.abs(np.array(ref_list) - np.array(actual_list)))

Contributor

On a mismatch, this reports only the max diff. It should instead report: 1) the step numbers at which the deviation occurs, and 2) the difference in value at each of those steps. np.isclose() will give the deviating indices. Before that, np.allclose() can be used to check whether the assertion passes.

Contributor Author

Added step-wise details for the deviation.
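
The step-wise reporting suggested in this thread could look roughly like the sketch below. It is not the code merged in the PR: the function name `report_deviations`, the default tolerances, and the zero-based step indexing are all assumptions. It uses `np.allclose()` as the pass/fail check and `np.isclose()` to locate the deviating steps, as the review comment proposes.

```python
import numpy as np


def report_deviations(ref_list, actual_list, rtol=1e-5, atol=1e-8):
    """Return (step, reference, actual, abs_diff) for each deviating step.

    Hypothetical helper; tolerances are illustrative defaults, and steps
    are zero-based indices into the lists.
    """
    ref = np.asarray(ref_list, dtype=float)
    actual = np.asarray(actual_list, dtype=float)
    if ref.shape != actual.shape:
        raise ValueError(
            f"Expected {ref.shape[0]} elements, but got {actual.shape[0]}."
        )

    # Fast path: everything is within tolerance of the reference.
    if np.allclose(actual, ref, rtol=rtol, atol=atol):
        return []

    # np.isclose treats its second argument as the reference value.
    close = np.isclose(actual, ref, rtol=rtol, atol=atol)
    bad_steps = np.flatnonzero(~close)
    return [
        (int(i), float(ref[i]), float(actual[i]), float(abs(actual[i] - ref[i])))
        for i in bad_steps
    ]


deviations = report_deviations([1.0, 2.0, 3.0], [1.0, 2.5, 3.0], atol=1e-3)
for step, expected, got, diff in deviations:
    print(f"step {step}: expected {expected}, got {got}, diff {diff}")
```

A test could then assert that the returned list is empty and include the formatted deviations in the assertion message, so a failure shows every offending step instead of a single maximum.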

REFERENCE_DATA = {
    # Scenario 1: Single-device llama 3.2-1B training on Alpaca dataset.
    "llama_config_alpaca_single_device": {
        "description": "Baseline for Llama on Alpaca single-device",

@quic-swatia quic-swatia Aug 14, 2025

Please add the complete model ID here and in other configs as well.

Signed-off-by: Ann Kuruvilla <[email protected]>
@quic-akuruvil quic-akuruvil merged commit d37233e into quic:main Aug 18, 2025
4 checks passed