No performance difference with vLLM: qwen2.5-7b vs qwen2.5-7b-MLA

Hello,

I ran an experiment comparing two models, base qwen2.5-7b and qwen2.5-7b-MLA, and found no difference in either speed or KV cache utilization.

qwen2.5-7b-MLA was converted using your script: `bash scripts/convert/qwen2.5-7B-Instruct.sh`
Both models were hosted with vLLM.

Results for base qwen2.5-7b:
<img width="1409" height="667" alt="Image" src="https://github.com/user-attachments/assets/61939045-3b38-46c8-918c-7d834678d32f" />

<img width="1409" height="667" alt="Image" src="https://github.com/user-attachments/assets/74398a32-78e4-4bf3-8f24-8ad4992a9c80" />

Results for qwen2.5-7b-MLA:

<img width="1409" height="667" alt="Image" src="https://github.com/user-attachments/assets/23174376-b96e-4106-a6bf-30f09d05b3c5" />

<img width="1409" height="667" alt="Image" src="https://github.com/user-attachments/assets/5ddc945a-0308-4b30-bd98-5a06383e9dd1" />

Could you please specify exactly how you measured the increase in output throughput and the KV cache savings?
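
For reference, here is a rough sketch of the per-token KV cache arithmetic behind this question. The GQA numbers (28 layers, 4 KV heads, head dimension 128) are assumed from the published Qwen2.5-7B config. The MLA values (`kv_lora_rank=512`, `rope_head_dim=64`) are purely hypothetical placeholders, since the real ones depend on the conversion script and should be read from the converted model's `config.json`. Both function names are illustrative and not part of any library.

```python
def kv_bytes_per_token_gqa(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    """Bytes of KV cache per token for standard MHA/GQA attention."""
    # Each layer stores one K and one V tensor of num_kv_heads * head_dim values.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def kv_bytes_per_token_mla(num_layers, kv_lora_rank, rope_head_dim, dtype_bytes=2):
    """Bytes of KV cache per token for MLA, assuming a DeepSeek-style layout."""
    # Each layer stores one compressed KV latent plus one shared RoPE key.
    return num_layers * (kv_lora_rank + rope_head_dim) * dtype_bytes


# Assumed Qwen2.5-7B values: 28 layers, 4 KV heads, head_dim 128, bf16 (2 bytes).
gqa_bytes = kv_bytes_per_token_gqa(num_layers=28, num_kv_heads=4, head_dim=128)

# Hypothetical MLA dimensions; replace with the values from the converted config.
mla_bytes = kv_bytes_per_token_mla(num_layers=28, kv_lora_rank=512, rope_head_dim=64)

print(f"GQA: {gqa_bytes} bytes/token")
print(f"MLA: {mla_bytes} bytes/token")
print(f"MLA / GQA ratio: {mla_bytes / gqa_bytes:.3f}")
```

If the serving backend does not use an MLA-specific cache layout for the converted model, I would not expect this difference to show up in the measured utilization, which may be what the screenshots above reflect.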
