Hello,
I ran an experiment comparing two models, the base qwen2.5-7b and qwen2.5-7b-MLA, and found no difference in either speed or KV cache utilization.
qwen2.5-7b-MLA was converted using your script: `bash scripts/convert/qwen2.5-7B-Instruct.sh`.
Both models were hosted with vLLM; the sketch below shows roughly how I measured throughput.
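For reference, this is a minimal sketch of my measurement setup using vLLM's offline API. The prompt set, token counts, and model path are placeholders, not my exact configuration:

```python
import time
from vllm import LLM, SamplingParams

# Placeholder prompt set; the real run used a larger, more varied batch.
prompts = ["Explain multi-head latent attention in one paragraph."] * 64
sampling_params = SamplingParams(temperature=0.0, max_tokens=256)

# Swap in the converted qwen2.5-7b-MLA checkpoint for the second run.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")

start = time.perf_counter()
outputs = llm.generate(prompts, sampling_params)
elapsed = time.perf_counter() - start

# Output throughput = total generated tokens / wall-clock time.
generated_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"output throughput: {generated_tokens / elapsed:.1f} tok/s")

# KV cache utilization was read from vLLM's periodic
# "GPU KV cache usage" log lines during generation.
```

Both runs used identical prompts and sampling parameters, and the numbers came out essentially the same.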
Results for base qwen2.5-7b:

Results for qwen2.5-7b-MLA:
Could you please specify how exactly you measured the increase in output throughput and the KV cache savings?