Conversation

@nvrohanv
Contributor

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

# 4096 = 256 * 16
# moe_max_num_tokens: 4096
load_balancer: /mnt/recipes/deepseek-r1/trtllm/wide_ep/eplb.yaml
load_balancer: /mnt/recipes/deepseek-r1/trtllm/agg/wide_ep/eplb.yaml

Is this assuming some specific PVC mounting path?

Contributor Author

Yes, this actually comes from the Slurm guides. In a separate PR we should remove those and turn all of these into k8s examples (if we want to keep the Slurm examples, we need to fully separate them out).

values:
- "true"
mainContainer:
image: rohanv672/dynamo:0.5.1-trtllm-ssh

Will need to update this obviously 🙂

Contributor Author

Yes, testing with the actual dynamo container today.

initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3000

We should probably fix this to a more reasonable number.

Contributor Author

What do you think is a good value? Depending on PVC speed it can end up taking around an hour, so I'm thinking 1.5 hours, with a comment saying it depends on your PVC speed.
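
For context on the numbers in this thread, here is a rough sketch of the probe arithmetic. It assumes the usual Kubernetes behavior where a probe gives up after about `initialDelaySeconds + failureThreshold * periodSeconds`; the helper names are hypothetical and not part of this PR.

```python
import math


def max_wait_seconds(failure_threshold, period_seconds, initial_delay_seconds=0):
    """Approximate time before the probe is considered failed for good."""
    return initial_delay_seconds + failure_threshold * period_seconds


def failure_threshold_for(budget_seconds, period_seconds, initial_delay_seconds=0):
    """Smallest failureThreshold that covers the given time budget."""
    remaining = budget_seconds - initial_delay_seconds
    return math.ceil(remaining / period_seconds)


# Values from the snippet under review.
current_budget = max_wait_seconds(3000, 10, 30)

# Proposed budget from the discussion: 1.5 hours = 5400 seconds.
proposed_threshold = failure_threshold_for(5400, 10, 30)

print(f"current budget: {current_budget / 3600:.2f} h")
print(f"failureThreshold for 1.5 h: {proposed_threshold}")
```

Under these assumptions the current settings allow a bit over 8 hours, and a 1.5 hour budget works out to a `failureThreshold` of 537 (540 if the initial delay is ignored).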

requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
  - matchExpressions:
    - key: nvidia.com/gpu.present

Is this necessary given we also specify it in the limits?

Contributor Author

good point

metadata:
  name: trtllm-test-compute-domain
spec:
  numNodes: 9

We should add a comment here and in prefill/decode that these need to match.

Or, ideally, you wouldn't need to specify it ahead of time, and the compute domain would grow/shrink accordingly.

Contributor Author

Once Grove adds support for automatic DRA we'll remove this. Will add the comment.
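
Until the value can be derived automatically, a small pre-deploy check could catch a mismatch. This is only a sketch: it assumes "match" means the compute domain's `numNodes` equals the total node count across the prefill and decode workers, and both the helper name and the 1 + 8 split are illustrative, not taken from this PR.

```python
def check_compute_domain_size(num_nodes, component_node_counts):
    """Raise if the compute domain size differs from the workers' total.

    num_nodes: the numNodes value of the compute domain.
    component_node_counts: mapping of component name to its node count
    (hypothetical input; how these are read from the manifests is up to you).
    """
    total = sum(component_node_counts.values())
    if total != num_nodes:
        raise ValueError(
            f"compute domain numNodes={num_nodes} but components use {total} nodes"
        )
    return total


# Illustrative split only: 1 prefill node + 8 decode nodes = 9.
verified_total = check_compute_domain_size(9, {"prefill": 1, "decode": 8})
print(f"node counts match: {verified_total}")
```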

name: trtllm-disagg-multinode
spec:
pvcs:
- name: modelcache-pvc

Is there a reason we need the PVC here and in each component? Also where is this PVC defined?

Contributor Author

Yeah, I'll be adding the same model_downloader setup as biswas.

3 participants