[megatron] Support MiniMaxAI/MiniMax-M2.1 #7262
Conversation
Summary of Changes (Gemini Code Assist)
This pull request expands the Megatron framework with comprehensive support for the MiniMaxAI/MiniMax-M2.1 large language model. The changes integrate the new model definition, implement specialized attention and bridge classes to handle its architectural features (QK layernorm and Mixture of Experts (MoE) weight mapping), and update the documentation. These modifications ensure that MiniMax-M2.1 can be used and tested within the existing system.
Code Review
This pull request adds support for the MiniMax-M2.1 model to Megatron. The changes include updating documentation, registering the new model, and implementing a custom bridge for state dictionary conversion. The implementation introduces a new MinimaxM2Bridge to handle model-specific logic, particularly for MoE layers and QK layernorm.
I've found a critical bug in gpt_bridge.py that could cause a NameError for models other than minimax_m2. I've also suggested a minor refactoring in minimax_m2.py to improve code conciseness. Overall, the changes are well-structured to support the new model.
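To make the failure mode concrete, here is a hypothetical reduction of the kind of bug described above. The function, variable, and string values are invented for illustration and do not come from gpt_bridge.py:

```python
# Hypothetical reduction of the reported bug pattern: a variable bound only
# inside a model-specific branch is read unconditionally afterwards, so every
# other model type fails at the return statement.
def _get_qk_layernorm_key(model_type: str) -> str:
    if model_type == 'minimax_m2':
        qk_layernorm_key = 'self_attn.q_norm'  # only bound on this branch
    # other model types fall through without binding qk_layernorm_key
    # raises UnboundLocalError (a NameError subclass) otherwise:
    return qk_layernorm_key
```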
/gemini review
Code Review
This pull request adds support for the MiniMaxAI/MiniMax-M2.1 model, including the code for its architecture, its configuration, and documentation. The changes also include some beneficial refactoring in gpt_bridge.py to better handle model-specific variations. My review focuses on improving dependency management by suggesting a less restrictive version constraint for transformers, and on enhancing code clarity in the new model implementation with an additional comment.
```python
q_layernorm = submodules.q_layernorm
k_layernorm = submodules.k_layernorm
submodules.q_layernorm = IdentityOp
submodules.k_layernorm = IdentityOp
super().__init__(config, submodules, *args, **kwargs)
submodules.q_layernorm = q_layernorm
submodules.k_layernorm = k_layernorm
```
The logic of temporarily setting q_layernorm and k_layernorm to IdentityOp is a clever way to work around the base class's initialization. For better maintainability, please add a comment explaining why this is done. It will help future developers understand the purpose of this code block more quickly.
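For instance, the suggested comment could look like this in context. This is a sketch based on the quoted diff; the explanation of the base class's behavior is an assumption drawn from the review comment above, not from the PR's actual code:

```python
# Sketch of the quoted diff with the requested explanatory comment added.
# The parent __init__ presumably builds real q/k layernorm modules from
# `submodules`; swapping in IdentityOp temporarily skips that construction,
# and the originals are restored afterwards so `submodules` is left unchanged.
q_layernorm = submodules.q_layernorm
k_layernorm = submodules.k_layernorm
submodules.q_layernorm = IdentityOp
submodules.k_layernorm = IdentityOp
super().__init__(config, submodules, *args, **kwargs)
submodules.q_layernorm = q_layernorm
submodules.k_layernorm = k_layernorm
```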
/gemini review
Code Review
This pull request adds support for the MiniMax-M2 and M2.1 models, including their conversion to Megatron format. The changes are well-structured, with new model-specific logic encapsulated in swift/megatron/model/gpt/minimax_m2.py. The refactoring in swift/megatron/model/gpt_bridge.py to introduce helper methods like get_hf_mlp_prefix and _set_qk_layernorm is a great improvement for extensibility and maintainability. The documentation and tests have also been updated accordingly. I have one suggestion to improve the design by moving a model-specific hack out of a generic testing function.
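Before turning to the inline comment below, here is a rough illustration of how hooks like get_hf_mlp_prefix let a model-specific bridge override only the parts that differ. The method signature and the HF checkpoint keys are assumptions for illustration, not the PR's actual code:

```python
class GPTBridge:
    def get_hf_mlp_prefix(self, layer_idx: int) -> str:
        # Generic prefix for a layer's MLP weights in the HF checkpoint.
        return f'model.layers.{layer_idx}.mlp'


class MinimaxM2Bridge(GPTBridge):
    def get_hf_mlp_prefix(self, layer_idx: int) -> str:
        # A MoE model can keep its experts under a different key; overriding
        # only the prefix keeps the rest of the weight mapping shared.
        # 'block_sparse_moe' is a placeholder, not MiniMax-M2's actual key.
        return f'model.layers.{layer_idx}.block_sparse_moe'
```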
```python
# router to bfloat16
for n, m in mg_language_model.named_modules():
    if n.endswith('router'):
        m.to(hf_model.dtype)
```
This hardcoded loop to change the dtype of 'router' modules is model-specific logic, likely for MiniMax-M2. Placing it inside the generic test_convert_precision function makes this utility less reusable and harder to maintain as more models with special requirements are added.
Consider moving this logic to a model-specific preparation step. For instance, you could introduce a prepare_for_test method in the GPTBridge class, which can be overridden by model-specific bridges like MinimaxM2Bridge to handle such preparations before precision testing.
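A minimal sketch of that suggestion, using the method name proposed above; the signatures and body are illustrative assumptions:

```python
class GPTBridge:
    def prepare_for_test(self, mg_language_model, hf_model) -> None:
        # Default hook: no model-specific preparation before precision tests.
        pass


class MinimaxM2Bridge(GPTBridge):
    def prepare_for_test(self, mg_language_model, hf_model) -> None:
        # Cast router modules to the HF model's dtype so the conversion
        # precision check compares like for like; this would replace the
        # hardcoded loop currently in test_convert_precision.
        for name, module in mg_language_model.named_modules():
            if name.endswith('router'):
                module.to(hf_model.dtype)
```

test_convert_precision would then call bridge.prepare_for_test(mg_language_model, hf_model) before comparing outputs, keeping the generic test free of model-specific branches.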
MiniMax-AI/MiniMax-M2#48