Description
I have performed transfer learning using the base model Llama-3.2-1B-Instruct.
Variant 1. When running the resulting Llama-3.2-1B-transmla model for inference, I get errors like:
Head size 576 is not supported by FlashAttention. Supported head sizes are: [32, 64, 96, 128, 160, 192, 224, 256].
Head size 576 is not supported by PagedAttention. Supported head sizes are: [32, 64, 80, 96, 112, 120, 128, 192, 256].
Command used to create Llama-3.2-1B-transmla:
python transmla/converter.py \
    --model-path models/Llama-3.2-1B-Instruct/ \
    --save-path output/Llama-3.2-1B-transmla \
    --dtype bf16 \
    --cal-dataset wikitext2 \
    --cal-nsamples 128 \
    --cal-max-seqlen 256 \
    --cal-batch-size 8 \
    --ppl-eval-batch-size 8 \
    --freqfold auto \
    --collapse auto \
    --qk-mqa-dim 64 \
    --q-lora-rank 512 \
    --kv-lora-rank 512
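As far as I can tell, the head size 576 in the error is the MLA head size kv_lora_rank + qk_mqa_dim seen by the attention backend (this is my assumption; a quick check in Python):

```python
# Assumption: the attention backend sees an MLA head size of
# kv_lora_rank + qk_mqa_dim for the converted model.
kv_lora_rank = 512  # --kv-lora-rank
qk_mqa_dim = 64     # --qk-mqa-dim

head_size = kv_lora_rank + qk_mqa_dim
print(head_size)   # 576 -> not in the FlashAttention / PagedAttention supported lists

# With the Variant 2 parameters below, 192 + 64 = 256, which both backends support.
print(192 + 64)    # 256
```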
The Python environment satisfies the project requirements:
vllm==0.8.4
transformers==4.52.4
datasets==4.2.0
accelerate==1.3.0
datatrove==0.6.0
tensorboardX==2.6.4
Variant 2. When I changed the conversion parameters to qk-mqa-dim=64, q-lora-rank=192, kv-lora-rank=192
(so that the head size, 192 + 64 = 256, falls within the supported sizes), I ran into another error:
ModuleNotFoundError: No module named 'transformers_modules.Llama-3'
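My guess is that the dots in the save-path directory name break transformers' dynamic module loading, since the module name appears truncated at 'Llama-3'. A minimal sketch of what I would try, using a hypothetical renamed copy of the output directory that contains no dots:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Guess: the dots in "Llama-3.2-1B-transmla" truncate the dynamic module name
# to "transformers_modules.Llama-3". This path is a hypothetical renamed copy
# of output/Llama-3.2-1B-transmla with the dots removed.
model_path = "output/Llama-3_2-1B-transmla"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,      # converted model ships custom MLA modeling code
    torch_dtype=torch.bfloat16,  # matches --dtype bf16 used during conversion
)
```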
I also noticed differences in config.json.
Original model:
architectures: LlamaForCausalLM
model_type: llama
num_key_value_heads: 8
Converted model:
architectures: LlamaMLAForCausalLM
model_type: deepseek_v3
num_key_value_heads: 32
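For completeness, this is how I diffed the two configs (a minimal sketch; the paths are the ones from the conversion command above):

```python
import json

# Paths taken from the conversion command; adjust if the models live elsewhere.
with open("models/Llama-3.2-1B-Instruct/config.json") as f:
    original = json.load(f)
with open("output/Llama-3.2-1B-transmla/config.json") as f:
    converted = json.load(f)

# Print every key whose value differs or exists on only one side.
for key in sorted(set(original) | set(converted)):
    if original.get(key) != converted.get(key):
        print(f"{key}: {original.get(key)!r} -> {converted.get(key)!r}")
```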
Variant 3. When I changed the conversion parameters to qk-mqa-dim=64, q-lora-rank=192, kv-lora-rank=192 and added --deepseek-style,
I ran into yet another error:
ValueError: Following weights were not initialized from checkpoint.
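To see which weights the error refers to, I would compare the tensor names stored in the checkpoint with the parameter names the model expects (a rough sketch, assuming the converted model is saved as safetensors and uses custom modeling code via trust_remote_code):

```python
import glob
from safetensors import safe_open
from transformers import AutoConfig, AutoModelForCausalLM

model_path = "output/Llama-3.2-1B-transmla"

# Collect all tensor names present in the checkpoint shards.
ckpt_keys = set()
for shard in glob.glob(f"{model_path}/*.safetensors"):
    with safe_open(shard, framework="pt") as f:
        ckpt_keys.update(f.keys())

# Build the model skeleton (random init) to get the expected parameter names.
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
expected = {name for name, _ in model.named_parameters()}

print("missing from checkpoint:", sorted(expected - ckpt_keys)[:20])
print("unexpected in checkpoint:", sorted(ckpt_keys - expected)[:20])
```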
Any help or comments would be appreciated.