Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the cuda benchmark parameter value in the examples #129

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

yuanchen8911
Copy link
Collaborator

@yuanchen8911 yuanchen8911 commented Jun 3, 2024

  1. The number of bodies in the cuda benchmark should be a multiple of 256. Replace the existing values of 4226000 with 4226048 in the examples.
> Compute 8.0 CUDA device: [NVIDIA A100-SXM4-40GB MIG 1g.5gb]
Warning: "number of bodies" specified 4226000 is not a multiple of 256.
Rounding up to the nearest multiple: 4226048.
  1. Remove mig+mps/sharing-demo/sharing-demo-job-1.yaml, which is not used.

Copy link
Member

@elezar elezar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -3,7 +3,7 @@ apiVersion: batch/v1
kind: Job
metadata:
namespace: sharing-demo
name: sharing-demo-job
name: sharing-demo-job-1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose one question I have is whether we should also rename the quickstart pod?

Also, does this require a readme update?

Copy link
Collaborator

@klueska klueska Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file was never meant to be run. It was just for showing during the demo at Kubecon NA 2023. The name of the job is the same because in the demo I flip screens and show how the spec got expanded.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no such an example in the quickstart folder. The README in mig+msps doesn't mention sharing-demo-job-1.yaml. Perhaps we can add a description later.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is only here because it's what I started off showing here:
https://youtu.be/1QfShSQLsbs?si=9rFmRUMi-e_puwO2&t=1128

Its not in the README because by the time I get to the README in the demo I don't need this file anymore.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you find it confusing to have in here but never meant to be run, I'd just remove it rather than update it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could cause confusion to users. I've removed the file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the folder name mig+mps misleading too? There's a mig+ts configuration in the example. How about renaming the folder to mig?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose is to demonstrate mig+mps, the other configs are for comparison.

Signed-off-by: Yuan Chen <[email protected]>

Remove job-1

Signed-off-by: Yuan Chen <[email protected]>
@yuanchen8911 yuanchen8911 changed the title Fix issues in the examples Fix an issue in the examples Jun 4, 2024
@yuanchen8911 yuanchen8911 changed the title Fix an issue in the examples Fix the cuda benchmark parameter value in the examples Jun 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants