Description
I have performed transfer learning using the base model Llama-3.2-1B-Instruct.
Variant 1. When running the resulting Llama-3.2-1B-transmla model for inference, I get errors like:
Head size 576 is not supported by FlashAttention. Supported head sizes are: [32, 64, 96, 128, 160, 192, 224, 256].
Head size 576 is not supported by PagedAttention. Supported head sizes are: [32, 64, 80, 96, 112, 120, 128, 192, 256].
Command used to create Llama-3.2-1B-transmla:
python transmla/converter.py \
    --model-path models/Llama-3.2-1B-Instruct/ \
    --save-path output/Llama-3.2-1B-transmla \
    --dtype bf16 \
    --cal-dataset wikitext2 \
    --cal-nsamples 128 \
    --cal-max-seqlen 256 \
    --cal-batch-size 8 \
    --ppl-eval-batch-size 8 \
    --freqfold auto \
    --collapse auto \
    --qk-mqa-dim 64 \
    --q-lora-rank 512 \
    --kv-lora-rank 512
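As far as I can tell, the head size 576 in the error is the MLA head size kv_lora_rank + qk_mqa_dim seen by the attention backend (this is my assumption; a quick check in Python):

```python
# Assumption: the attention backend sees an MLA head size of
# kv_lora_rank + qk_mqa_dim for the converted model.
kv_lora_rank = 512  # --kv-lora-rank
qk_mqa_dim = 64     # --qk-mqa-dim

head_size = kv_lora_rank + qk_mqa_dim
print(head_size)   # 576 -> not in the FlashAttention / PagedAttention supported lists

# With the Variant 2 parameters below, 192 + 64 = 256, which both backends support.
print(192 + 64)    # 256
```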
The Python environment satisfies the project requirements:
vllm==0.8.4
transformers==4.52.4
datasets==4.2.0
accelerate==1.3.0
datatrove==0.6.0
tensorboardX==2.6.4
Variant 2. When I changed the conversion parameters to qk-mqa-dim=64, q-lora-rank=192, kv-lora-rank=192
(so that the head size, 192 + 64 = 256, falls within the supported sizes), I ran into another error:
ModuleNotFoundError: No module named 'transformers_modules.Llama-3'
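My guess is that the dots in the save-path directory name break transformers' dynamic module loading, since the module name appears truncated at 'Llama-3'. A minimal sketch of what I would try, using a hypothetical renamed copy of the output directory that contains no dots:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Guess: the dots in "Llama-3.2-1B-transmla" truncate the dynamic module name
# to "transformers_modules.Llama-3". This path is a hypothetical renamed copy
# of output/Llama-3.2-1B-transmla with the dots removed.
model_path = "output/Llama-3_2-1B-transmla"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,      # converted model ships custom MLA modeling code
    torch_dtype=torch.bfloat16,  # matches --dtype bf16 used during conversion
)
```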
I also noticed differences in config.json.
Original model:
architectures: LlamaForCausalLM
model_type: llama
num_key_value_heads: 8
Converted model:
architectures: LlamaMLAForCausalLM
model_type: deepseek_v3
num_key_value_heads: 32
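For completeness, this is how I diffed the two configs (a minimal sketch; the paths are the ones from the conversion command above):

```python
import json

# Paths taken from the conversion command; adjust if the models live elsewhere.
with open("models/Llama-3.2-1B-Instruct/config.json") as f:
    original = json.load(f)
with open("output/Llama-3.2-1B-transmla/config.json") as f:
    converted = json.load(f)

# Print every key whose value differs or exists on only one side.
for key in sorted(set(original) | set(converted)):
    if original.get(key) != converted.get(key):
        print(f"{key}: {original.get(key)!r} -> {converted.get(key)!r}")
```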
Variant 3. When I changed the conversion parameters to qk-mqa-dim=64, q-lora-rank=192, kv-lora-rank=192 and added --deepseek-style,
I ran into yet another error:
ValueError: Following weights were not initialized from checkpoint.
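To see which weights the error refers to, I would compare the tensor names stored in the checkpoint with the parameter names the model expects (a rough sketch, assuming the converted model is saved as safetensors and uses custom modeling code via trust_remote_code):

```python
import glob
from safetensors import safe_open
from transformers import AutoConfig, AutoModelForCausalLM

model_path = "output/Llama-3.2-1B-transmla"

# Collect all tensor names present in the checkpoint shards.
ckpt_keys = set()
for shard in glob.glob(f"{model_path}/*.safetensors"):
    with safe_open(shard, framework="pt") as f:
        ckpt_keys.update(f.keys())

# Build the model skeleton (random init) to get the expected parameter names.
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
expected = {name for name, _ in model.named_parameters()}

print("missing from checkpoint:", sorted(expected - ckpt_keys)[:20])
print("unexpected in checkpoint:", sorted(ckpt_keys - expected)[:20])
```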
Any help or comments would be appreciated.