
Releases: flagos-ai/Megatron-LM-FL

v0.1.0+megatron0.15.0rc7

25 Mar 14:49


Megatron-LM-FL Release Notes

Highlights

This release establishes the foundational architecture for Megatron-LM-FL, introducing a unified multi-platform backend, CI/CD infrastructure, training optimizations, and support for multiple hardware platforms.


Multi-Platform Support

  • Unified platform abstraction (#11): Refactored CUDA-specific APIs into a PlatformXXX abstraction layer with automatic platform detection, flexible registration, and seamless execution across CUDA, CPU, and other devices.
  • MUSA platform support (#14): Added support for Moore Threads (mthreads) chips via the MUSA backend.
  • TXDA platform support (#22): Added support for Tsingmicro chips via the TXDA backend.
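As an illustration only, the registration-plus-detection pattern described for the platform abstraction could look roughly like the sketch below. It is not the project's actual API: `Platform`, `register_platform`, `detect_platform`, and the priority scheme are hypothetical names chosen for this example, and only the CPU and CUDA entries are shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Platform:
    """Hypothetical descriptor for one backend (not the real PlatformXXX classes)."""
    name: str
    is_available: Callable[[], bool]  # probe that reports whether the backend is usable
    priority: int = 0                 # higher wins during automatic detection


_REGISTRY: Dict[str, Platform] = {}


def register_platform(platform: Platform) -> None:
    """Add or replace a platform entry in the registry."""
    _REGISTRY[platform.name] = platform


def detect_platform(preferred: Optional[str] = None) -> Platform:
    """Return the requested platform, or the highest-priority available one."""
    if preferred is not None:
        platform = _REGISTRY.get(preferred)
        if platform is None or not platform.is_available():
            raise RuntimeError(f"platform {preferred!r} is not registered or not available")
        return platform
    candidates = [p for p in _REGISTRY.values() if p.is_available()]
    if not candidates:
        raise RuntimeError("no registered platform is available")
    return max(candidates, key=lambda p: p.priority)


def _cuda_available() -> bool:
    """Probe CUDA through PyTorch if it is installed; otherwise report False."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


# CPU is always usable and acts as the fallback; CUDA is preferred when present.
register_platform(Platform("cpu", lambda: True, priority=0))
register_platform(Platform("cuda", _cuda_available, priority=10))
```

Under this assumed design, a new backend would only need to register one more `Platform` entry with its own availability probe, and callers would ask `detect_platform()` for the active platform instead of calling device-specific APIs directly.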

Training Optimizations

  • Engram optimization (#15): Added DeepSeek Engram optimizations including independent communication groups, optional CPU offloading of optimizer states, and adaptation for overlap_moe_expert_parallel_comm.

CI/CD

  • CI/CD workflow infrastructure (#8): Added reusable GitHub Actions workflows for unit and functional tests, a platform configuration layer (configs/cuda.yml) for declarative platform-specific settings, and a CI Dockerfile (docker/Dockerfile.fl.ci).
  • FlagScale integration tests (#12): Added cross-repo CI/CD that automatically triggers downstream FlagScale functional training tests on push/PR events, ensuring compatibility between Megatron-LM-FL and FlagScale.

Core & Documentation

  • Plugin refactor (#2): Preserved policy changes in megatron/core (hetero, dualpipev, attention) and partially abstracted them under megatron/plugin, keeping all other modules and CI/CD untouched for upstream compatibility.
  • Documentation polish (#3, #4): Improved README and QuickStart guide.


Initial Preview

02 Feb 02:21
ebde3e2


Pre-release

🚀 Release Notes: v0.15.0rc7+fl.0.1.0

⚠️ ALPHA RELEASE - UNSTABLE

This is the first public release of the project, based on NVIDIA Megatron Core v0.15.0rc7. It is currently in an unstable alpha state. Expect bugs, breaking changes, and incomplete features. Use in production environments at your own risk.


🌟 What's New

This release marks the initial foundation of the project. We've focused on establishing the core architecture and basic functionality.

🛠 Known Issues

Since this is a "first-light" build, please be aware of the following:

  • Stability: Unexpected crashes may occur under heavy load.
  • Incomplete Features: Some features are visible but not yet functional.
  • Documentation: README and API docs are still a work in progress.

🧪 Feedback Wanted

Help us make the stable release better! If you encounter a bug or have a suggestion:

  1. Check the Issues tab to see if it's already known.
  2. If not, please open a new issue with the label bug or feedback.