
[RTX 4090 24G, Qwen2.5-7B out of VRAM] Can the conversion be done step by step or layer by layer? #41

@yangxianpku

Description


Hardware: RTX 4090 24G

Converting the Qwen2.5-7B-Instruct model on a 4090 GPU fails with an out-of-memory error. It looks like the model weights are loaded in full; could the conversion support step-by-step, or even per-layer, loading so that TransMLA can also run on GPUs with little VRAM?
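For illustration, a minimal sketch of what per-layer conversion could look like, keeping only one decoder layer on the GPU at a time. This is not TransMLA's actual API: `convert_layer` is a hypothetical placeholder for whatever weight transformation the converter applies to each block.

```python
import torch
from transformers import AutoModelForCausalLM

# Load everything on CPU first, so the 24G card never holds the full model.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    dtype=torch.bfloat16,  # the log below warns that torch_dtype is deprecated
)

def convert_layer(layer):
    """Hypothetical stand-in for TransMLA's per-layer conversion."""
    ...

# Stream decoder layers through the GPU one at a time.
for layer in model.model.layers:
    layer.to("cuda")
    convert_layer(layer)
    layer.to("cpu")
    torch.cuda.empty_cache()
```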

The error message is as follows:

```
(base) yangxianpku@ubuntu:~/Repos/TransMLA$ bash scripts/qwen2.5-7B-Instruct.sh

============================================================
Original Model

torch_dtype is deprecated! Use dtype instead!
Loading checkpoint shards: 100%|██████████████████████████████| 4/4 [00:02<00:00, 1.90it/s]
Token indices sequence length is longer than the specified maximum sequence length for this model (299078 > 131072). Running this sequence through the model will result in indexing errors
Evaluating original model's ppl:   0%|          | 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/yangxianpku/Repos/TransMLA/transmla/converter.py", line 132, in <module>
    main(args)
  File "/home/yangxianpku/Repos/TransMLA/transmla/converter.py", line 68, in main
    dataset_ppl = evaluate_ppl(model, tokenizer.pad_token_id, test_loader, message)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anacodna3/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/yangxianpku/Repos/TransMLA/transmla/utils.py", line 233, in evaluate_ppl
    logits = model(**batch, use_cache=False).logits
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anacodna3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anacodna3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anacodna3/lib/python3.11/site-packages/transformers/utils/generic.py", line 918, in wrapper
    output = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anacodna3/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 449, in forward
    outputs: BaseModelOutputWithPast = self.model(
                                       ^^^^^^^^^^^
  File "/usr/local/anacodna3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anacodna3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anacodna3/lib/python3.11/site-packages/transformers/utils/generic.py", line 1072, in wrapper
    outputs = func(self, *args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anacodna3/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 384, in forward
    hidden_states = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "/usr/local/anacodna3/lib/python3.11/site-packages/transformers/modeling_layers.py", line 94, in __call__
    return super().__call__(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anacodna3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anacodna3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anacodna3/lib/python3.11/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anacodna3/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 249, in forward
    hidden_states = self.mlp(hidden_states)
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anacodna3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anacodna3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/anacodna3/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 46, in forward
    down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.31 GiB. GPU 0 has a total capacity of 23.65 GiB of which 297.06 MiB is free. Including non-PyTorch memory, this process has 23.35 GiB memory in use. Of the allocated memory 20.57 GiB is allocated by PyTorch, and 2.32 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
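As a possible workaround until per-layer conversion is supported: the OOM happens during the perplexity evaluation, so letting accelerate offload part of the model to CPU, plus enabling the allocator option the error message itself suggests, might already get the script through on 24 GB. A minimal sketch using the standard transformers/accelerate offloading API, not TransMLA's own loading path (the `max_memory` numbers are guesses, not tested values):

```python
import os

# Allocator option suggested by the OOM message; set it before torch is imported.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch
from transformers import AutoModelForCausalLM

# Cap GPU weights below the 24 GiB card and spill the rest to CPU RAM;
# accelerate then streams offloaded layers through the GPU during forward.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    dtype=torch.bfloat16,
    device_map="auto",
    max_memory={0: "20GiB", "cpu": "64GiB"},
)
```

This makes each forward pass slower, since offloaded weights are copied to the GPU on demand, but it keeps peak GPU memory under the cap, which should be enough for the `evaluate_ppl` call that fails above.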
