
Conversation


@sssshhhhhh sssshhhhhh commented Jan 7, 2026

Some assertions were wrong, and MockModel's init didn't set some variables, causing the wrong branches to be taken. Also swapped the dims so that D_HEAD == NUM_HEADS no longer holds. Possibly fixes #1968

The failing test is expected, since MQA doesn't support knorm. This branch should also use _merge_time_and_head_dims, because kv_heads can be 1 in cases where the dims shouldn't be merged (tensor parallelism / T5 relative position):

if (_num_heads_kv == 1) { // MQA (Multi-Query Attention)
  if (values_padder)
    values_padder->add_padding(fused_proj);
  ops::Split(2, {_d_head, _d_head})(fused_proj, keys_proj, values_proj);
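
For illustration only (not CTranslate2 code): a minimal, self-contained sketch of roughly what the ops::Split(2, {_d_head, _d_head}) call above does in the MQA branch, splitting a fused KV projection of shape [batch, time, 2 * d_head] into keys and values along the last dimension. The flat-buffer layout and names here are assumptions made for the sketch, ignoring padding and the real storage abstraction.

  // Sketch: split a contiguous [batch, time, 2 * d_head] buffer into
  // keys [batch, time, d_head] and values [batch, time, d_head].
  #include <cstddef>
  #include <iostream>
  #include <vector>

  int main() {
    const std::size_t batch = 2, time = 3, d_head = 4;
    std::vector<float> fused(batch * time * 2 * d_head);
    for (std::size_t i = 0; i < fused.size(); ++i)
      fused[i] = static_cast<float>(i);

    std::vector<float> keys(batch * time * d_head);
    std::vector<float> values(batch * time * d_head);

    for (std::size_t b = 0; b < batch; ++b) {
      for (std::size_t t = 0; t < time; ++t) {
        const float* row = fused.data() + (b * time + t) * 2 * d_head;
        float* k_row = keys.data() + (b * time + t) * d_head;
        float* v_row = values.data() + (b * time + t) * d_head;
        for (std::size_t d = 0; d < d_head; ++d) {
          k_row[d] = row[d];           // first d_head channels -> keys
          v_row[d] = row[d_head + d];  // last d_head channels -> values
        }
      }
    }

    std::cout << "keys[0] = " << keys[0]
              << ", values[0] = " << values[0] << "\n";
    return 0;
  }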


jordimas commented Jan 8, 2026

Thanks. Yes, this is similar to the fix master...jordimas:CTranslate2:cross_attention that I started working on for #1968.

@sssshhhhhh sssshhhhhh (Author) commented

Nice. By the way, layer norm / RMSNorm can be done in place; transformer.cc already does this in multiple places. Also, what do you think about an attention refactor along these lines, merging attention_layer.cc/attention.cc/flash_attention.cc: https://github.com/sssshhhhhh/CTranslate2/blob/rocm/src/layers/rocm_attention.cc
The FA2 path is falling behind (its support wasn't great in the first place) and duplicates a lot of steps. If FA2 were limited to a softmax(QK)V op instead of being a separate layer, I think it would be simpler. To match current FA2 support it would need to handle both the BNTH and BTNH QKV projection layout paths, and head replication / RoPE would move into the dot-product attention op.
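
For illustration only (not the proposed refactor or CTranslate2's API): a naive standalone sketch of a softmax(QK^T / sqrt(d))V op that also does KV head replication inside the op, in the MQA/GQA sense described above. The function name, nested-vector layout, and example shapes are assumptions made for the sketch.

  #include <algorithm>
  #include <cmath>
  #include <cstddef>
  #include <iostream>
  #include <vector>

  // q: [num_heads][t_q][d], k/v: [num_heads_kv][t_k][d]; returns [num_heads][t_q][d].
  // Each query head h reads KV head h / (num_heads / num_heads_kv), so MQA
  // (num_heads_kv == 1) and GQA fall out of the same loop.
  std::vector<std::vector<std::vector<float>>> dot_product_attention(
      const std::vector<std::vector<std::vector<float>>>& q,
      const std::vector<std::vector<std::vector<float>>>& k,
      const std::vector<std::vector<std::vector<float>>>& v) {
    const std::size_t num_heads = q.size();
    const std::size_t num_heads_kv = k.size();
    const std::size_t group = num_heads / num_heads_kv;
    const std::size_t t_q = q[0].size();
    const std::size_t t_k = k[0].size();
    const std::size_t d = q[0][0].size();
    const float scale = 1.f / std::sqrt(static_cast<float>(d));

    std::vector<std::vector<std::vector<float>>> out(
        num_heads,
        std::vector<std::vector<float>>(t_q, std::vector<float>(d, 0.f)));

    for (std::size_t h = 0; h < num_heads; ++h) {
      const std::size_t h_kv = h / group;  // KV head replication inside the op
      for (std::size_t i = 0; i < t_q; ++i) {
        // scores_j = q_i . k_j * scale, then softmax over j
        std::vector<float> scores(t_k);
        float max_score = -1e30f;
        for (std::size_t j = 0; j < t_k; ++j) {
          float s = 0.f;
          for (std::size_t c = 0; c < d; ++c)
            s += q[h][i][c] * k[h_kv][j][c];
          scores[j] = s * scale;
          max_score = std::max(max_score, scores[j]);
        }
        float denom = 0.f;
        for (std::size_t j = 0; j < t_k; ++j) {
          scores[j] = std::exp(scores[j] - max_score);
          denom += scores[j];
        }
        // out_i = sum_j softmax_j * v_j
        for (std::size_t j = 0; j < t_k; ++j)
          for (std::size_t c = 0; c < d; ++c)
            out[h][i][c] += (scores[j] / denom) * v[h_kv][j][c];
      }
    }
    return out;
  }

  int main() {
    // 4 query heads sharing 1 KV head (MQA), 2 timesteps, d = 3.
    std::vector<std::vector<std::vector<float>>> q(
        4, {{1.f, 0.f, 0.f}, {0.f, 1.f, 0.f}});
    std::vector<std::vector<std::vector<float>>> k(
        1, {{1.f, 0.f, 0.f}, {0.f, 1.f, 0.f}});
    std::vector<std::vector<std::vector<float>>> v(
        1, {{1.f, 2.f, 3.f}, {4.f, 5.f, 6.f}});
    auto out = dot_product_attention(q, k, v);
    std::cout << "out[0][0][0] = " << out[0][0][0] << "\n";
    return 0;
  }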

@sssshhhhhh sssshhhhhh closed this Jan 8, 2026
