Fix SFT + GA (gradient accumulation) #2897
Open
+44 −6
Description
Fixes an issue where SFT combined with gradient accumulation would fail due to missing parameter sharding.
FIXES: b/472309528
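For illustration, here is a minimal JAX sketch of the gradient-accumulation pattern involved. This is not the PR's code: `loss_fn`, the shapes, and the pytree layout are all assumptions; it only shows why the gradient accumulator has to mirror the parameters' structure (and, in a sharded run, their sharding).

```python
import jax
import jax.numpy as jnp

def loss_fn(params, batch):
    # Toy regression loss; stands in for the SFT loss. Illustrative only.
    pred = batch["x"] @ params["w"]
    return jnp.mean((pred - batch["y"]) ** 2)

def accumulated_grads(params, microbatches):
    """Average gradients over the leading (accumulation) axis of `microbatches`."""
    def step(acc, batch):
        grads = jax.grad(loss_fn)(params, batch)
        # In a sharded run the accumulator should be constrained to the
        # params' sharding here (e.g. via jax.lax.with_sharding_constraint);
        # a missing constraint of this kind is the sort of gap this PR fixes.
        return jax.tree_util.tree_map(jnp.add, acc, grads), None

    zeros = jax.tree_util.tree_map(jnp.zeros_like, params)
    acc, _ = jax.lax.scan(step, zeros, microbatches)
    num_steps = microbatches["x"].shape[0]
    return jax.tree_util.tree_map(lambda g: g / num_steps, acc)

params = {"w": jnp.ones((4, 1))}
microbatches = {
    "x": jnp.ones((5, 8, 4)),   # 5 accumulation steps, microbatch size 8
    "y": jnp.zeros((5, 8, 1)),
}
grads = jax.jit(accumulated_grads)(params, microbatches)
```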
Tests
Added an integration test covering SFT + gradient accumulation.
Manually tested via:

```bash
python3 -m MaxText.sft_trainer MaxText/configs/base.yml run_name=mattdavidow-train-base base_output_directory=$output_dir dataset_path=$dataset steps=5 enable_checkpointing=False enable_goodput_recording=False use_sft=True gradient_accumulation_steps=5
```
Checklist
Before submitting this PR, please make sure (put X in square brackets):
`gemini-review` label.