
Conversation

@wassname
Contributor

It was a good idea to mask here, but it breaks on all kinds of models, e.g. "Qwen/Qwen3-4B-Instruct-2507" and "zai-org/GLM-4.1V-9B-Thinking", and I can't work out how to fix it easily (even using the attention mask is complicated, as some models reshape the hidden state and so on). It might be worth disabling it.
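For context, a rough sketch of the kind of position_ids-based masking being discussed, assuming hidden states of shape [batch, seq_len, dim] and one common left-padding convention for position_ids; both assumptions, and the function and argument names, are illustrative rather than taken from this PR.

```python
import torch

def mask_from_position_ids(hidden_states: torch.Tensor,
                           position_ids: torch.Tensor) -> torch.Tensor:
    # Assumes padded positions carry position id 0 (one convention among
    # several). Note that the first real token, which also has id 0, gets
    # masked as well -- one example of how easily this heuristic goes wrong.
    keep = (position_ids != 0).unsqueeze(-1).to(hidden_states.dtype)
    return hidden_states * keep
```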

@thiswillbeyourgithub
Contributor

Am I correct in thinking that this code is only relevant if we use batch generation?

@wassname
Contributor Author

Perhaps. In truth, I haven't pinned down exactly which cases this happens in, or the best solution. I think some models might add position_ids even with batch_size==1, but I'm not sure.

@wassname
Contributor Author

Another approach here would be to take the attention mask from the inputs when it is present (rather than working it out from the position_ids), but this also leads to shape errors in some models.

I'm not even sure we need to mask the padding tokens if an attention mask is provided, so perhaps this section can be safely removed.
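For illustration, a minimal sketch of that alternative, assuming standard [batch, seq_len, dim] hidden states and an inputs dict from the tokenizer; the names are placeholders, and the shape guard only skips the masking when it cannot apply rather than actually fixing the mismatch.

```python
import torch

def mask_padding(hidden_states: torch.Tensor, inputs: dict) -> torch.Tensor:
    """Zero out padded positions using the attention mask from the inputs,
    skipping the masking entirely when the shapes do not line up."""
    attention_mask = inputs.get("attention_mask")
    if attention_mask is None:
        return hidden_states
    if attention_mask.shape != hidden_states.shape[:2]:
        # Some models reshape the hidden state, so a per-token mask no longer
        # applies; fall back to leaving the hidden states untouched.
        return hidden_states
    return hidden_states * attention_mask.unsqueeze(-1).to(hidden_states.dtype)
```

Falling back to doing nothing is consistent with the suggestion above that the masking may not be needed at all when an attention mask is provided.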

