
Distributed group information for MOE layer #20632

@preddy5


📚 Documentation

Thank you for maintaining this amazing repository.

I am integrating MoE layers into my model architecture, which I train with Lightning.
I am using the megablocks implementation because it is widely adopted. One of the variables required to enable moe_expert_model_parallelism is the distributed group information (https://github.com/databricks/megablocks/blob/main/megablocks/layers/memory_test.py#L97C5-L97C10). Is there a way to access this information in a LightningModule before model initialization?

I would appreciate any guidance you can provide on how to access the group variable, even if it is not straightforward with the current lightning API. Thank you very much for your time and help!

Regards,
Pradyumna.

cc @lantiga @Borda
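
One possible direction, offered as a sketch under stated assumptions rather than a confirmed Lightning recipe: defer building the MoE layers until the distributed environment exists, then read the group from torch.distributed. The helper names below (`expert_parallel_rank_groups`, `get_expert_parallel_group`) and the `expert_parallel_size` parameter are hypothetical and only for illustration. The contiguous rank layout is also an assumption, not something megablocks or Lightning prescribes.

```python
from typing import List, Optional


def expert_parallel_rank_groups(world_size: int, expert_parallel_size: int) -> List[List[int]]:
    """Split ranks 0..world_size-1 into contiguous groups (hypothetical layout)."""
    if expert_parallel_size <= 0 or world_size % expert_parallel_size != 0:
        raise ValueError("world_size must be a positive multiple of expert_parallel_size")
    return [
        list(range(start, start + expert_parallel_size))
        for start in range(0, world_size, expert_parallel_size)
    ]


def get_expert_parallel_group(expert_parallel_size: Optional[int] = None):
    """Return a process group for expert parallelism, or None when not distributed.

    Intended to be called only after the process group has been initialized.
    """
    # Imported lazily so the pure-Python helper above works without PyTorch.
    import torch.distributed as dist

    if not (dist.is_available() and dist.is_initialized()):
        return None  # single-process run: there is no group to hand over

    world_size = dist.get_world_size()
    if expert_parallel_size is None or expert_parallel_size == world_size:
        return dist.group.WORLD  # default group spanning every rank

    my_group = None
    # new_group is collective: every rank creates every subgroup, in the same order.
    for ranks in expert_parallel_rank_groups(world_size, expert_parallel_size):
        group = dist.new_group(ranks=ranks)
        if dist.get_rank() in ranks:
            my_group = group
    return my_group
```

In a LightningModule, a helper like this could be called from configure_model() (or setup()) instead of __init__, since Lightning runs both hooks after the strategy has set up the distributed environment. The returned group would then be passed to the megablocks arguments alongside moe_expert_model_parallelism=True. The field is assumed here to be named expert_parallel_group, which is worth checking against the installed megablocks version.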

Labels: docs (Documentation related), needs triage (Waiting to be triaged by maintainers)
