Add support for Magcache #12744
Conversation
@leffff could you review as well if possible?

Hi @AlanPonnachan @sayakpaul

@leffff, thank you for your review. To address this, I am implementing a Calibration Mode. My plan is to add a `calibrate=True` flag to `MagCacheConfig`. Users can then simply run one calibration pass for their specific model/scheduler, copy the output ratios, and pass them into `mag_ratios`. I am working on this update now and will push the changes shortly!
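Conceptually, the calibration pass records how much the magnitude of the model's residual output changes from one denoising step to the next. A rough, illustrative sketch of that measurement follows; it is based on the general MagCache idea rather than the code in this PR, and the helper name and tensor shapes are made up for the example:

```python
import torch

def magnitude_ratio(residual_t: torch.Tensor, residual_prev: torch.Tensor) -> float:
    """Illustrative only: MagCache-style calibration tracks how strongly the
    residual output changes in magnitude from one step to the next."""
    return (residual_t.abs().mean() / residual_prev.abs().mean()).item()

# Toy stand-ins for per-step residual outputs collected during a calibration run.
torch.manual_seed(0)
residuals = [torch.randn(1, 16, 64) * s for s in (1.0, 1.4, 1.35, 1.2)]

# The first step has no predecessor, so its ratio is conventionally 1.0.
ratios = [1.0] + [magnitude_ratio(cur, prev) for prev, cur in zip(residuals, residuals[1:])]
print([round(r, 2) for r in ratios])
```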
Sounds great!

Thanks for the thoughtful discussions here @AlanPonnachan and @leffff! I will leave my two cents below:

Ccing @DN6 to get his thoughts here, too.
Thanks @sayakpaul and @leffff for the feedback! I have updated the PR to address these points. Instead of a standalone utility script, I integrated the calibration logic directly into the hook configuration for better usability.

Ready for review!
Looks great! Could you please provide a usage example?

And to be sure it works, please provide generations for SD3.5 Medium, Flux, and Wan T2V 2.1 1.3B. I also believe, since caching is suitable for all tasks, we could also try Kandinsky 5.0 Video Pro I2V: kandinskylab/Kandinsky-5.0-I2V-Pro-sft-5s-Diffusers
1. Usage Example

```python
import torch
from diffusers import FluxPipeline
from diffusers.hooks import MagCacheConfig, apply_mag_cache

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
).to("cuda")

# CALIBRATION STEP
config = MagCacheConfig(calibrate=True, num_inference_steps=4)
apply_mag_cache(pipe.transformer, config)
pipe("A cat playing chess", num_inference_steps=4)
# Logs: [1.0, 1.37, 0.97, 0.87]

# INFERENCE STEP
config = MagCacheConfig(mag_ratios=[1.0, 1.37, 0.97, 0.87], num_inference_steps=4)
apply_mag_cache(pipe.transformer, config)
pipe("A cat playing chess", num_inference_steps=4)
```

2. Benchmark Results

I validated the implementation on Flux, SD 3.5, and Wan 2.1 using a T4 Colab environment.

3. Generations

Attached below are the outputs for the successful runs.
Here is the Colab notebook used to generate the benchmarks above. It includes the full setup, memory optimizations (sequential offloading/dummy embeds), and the execution logs:
What does this PR do?
This PR adds support for MagCache (Magnitude-aware Cache), a training-free inference acceleration method for diffusion models, specifically targeting Transformer-based architectures like Flux.
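For context, a rough illustrative sketch of the per-step decision that magnitude-aware caching makes is shown below. It is simplified from the general idea (accumulate an error estimate from the precomputed magnitude ratios and reuse the cached residual while that estimate stays under a threshold); the class, names, and exact formula are illustrative and are not the hook code in this PR:

```python
class MagCacheSkipper:
    """Illustrative sketch of a magnitude-aware skip decision (not the PR's hook code)."""

    def __init__(self, mag_ratios, threshold=0.06, max_consecutive_skips=3):
        self.mag_ratios = mag_ratios              # per-step ratios from calibration
        self.threshold = threshold                # max tolerated accumulated error
        self.max_consecutive_skips = max_consecutive_skips
        self.accumulated_error = 0.0
        self.consecutive_skips = 0

    def should_skip(self, step: int) -> bool:
        if step == 0:
            return False  # nothing is cached yet, so the first step always runs
        # Estimate the extra error from reusing the cached residual at this step.
        self.accumulated_error += abs(1.0 - self.mag_ratios[step])
        if self.accumulated_error <= self.threshold and self.consecutive_skips < self.max_consecutive_skips:
            self.consecutive_skips += 1
            return True   # reuse the cached residual instead of running the transformer
        # Otherwise run the full forward pass and reset the bookkeeping.
        self.accumulated_error = 0.0
        self.consecutive_skips = 0
        return False


skipper = MagCacheSkipper(mag_ratios=[1.0, 1.37, 0.97, 0.87])
print([skipper.should_skip(t) for t in range(4)])  # -> [False, False, True, False]
```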
This implementation follows the `ModelHook` pattern (similar to `FirstBlockCache`) to integrate seamlessly into Diffusers.

Key features:

- `MagCacheConfig`: configuration class to control the threshold, retention ratio, and skipping limits.
- Calibration mode via a `calibrate=True` flag. When enabled, the hook runs full inference and calculates/prints the magnitude ratios for the specific model and scheduler. This makes MagCache compatible with any transformer model (e.g., Hunyuan, Wan, SD3), not just Flux.
- `mag_ratios` must be explicitly provided in the config (or calibration enabled).
- `FLUX_MAG_RATIOS` is provided as a constant for convenience, derived from the official implementation.

Fixes #12697 (Magcache Support).
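For illustration, a rough sketch of what a configuration object with these knobs could look like is given below; only `calibrate`, `mag_ratios`, and `num_inference_steps` appear verbatim in the usage example above, while the remaining field names and defaults are assumptions rather than the PR's actual signature:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MagCacheConfig:
    # Precomputed per-step magnitude ratios; required unless calibrate=True.
    mag_ratios: Optional[List[float]] = None
    # Run a full, uncached pass and log the measured ratios for this model/scheduler.
    calibrate: bool = False
    # Number of scheduler steps the ratios correspond to.
    num_inference_steps: int = 50
    # Hypothetical tuning knobs implied by the feature list above (names and defaults assumed):
    threshold: float = 0.06          # max accumulated error before forcing a full forward pass
    retention_ratio: float = 0.2     # fraction of early steps that are never skipped
    max_consecutive_skips: int = 3   # cap on back-to-back skipped steps
```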
Who can review?
@sayakpaul