
Conversation

@wassname
Contributor

It was a good idea to mask here, but it breaks on all kinds of models, e.g. "Qwen/Qwen3-4B-Instruct-2507" and "zai-org/GLM-4.1V-9B-Thinking", and I can't work out how to fix it easily (even using the attention mask is complicated, as some models reshape the hidden state and so on). It might be worth disabling it.
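For context, a rough sketch of the kind of position_ids-based masking being discussed, assuming hidden states of shape [batch, seq_len, dim] and one common left-padding convention for position_ids; both assumptions, and the function and argument names, are illustrative rather than taken from this PR.

```python
import torch

def mask_from_position_ids(hidden_states: torch.Tensor,
                           position_ids: torch.Tensor) -> torch.Tensor:
    # Assumes padded positions carry position id 0 (one convention among
    # several). Note that the first real token, which also has id 0, gets
    # masked as well -- one example of how easily this heuristic goes wrong.
    keep = (position_ids != 0).unsqueeze(-1).to(hidden_states.dtype)
    return hidden_states * keep
```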

@thiswillbeyourgithub
Contributor

Am I correct in thinking that this code is only relevant if we use batch generation?

@wassname
Contributor Author

Perhaps. In truth, I haven't pinned down exactly which cases this happens in, or the best solution. I think some models might add position_ids even with batch_size==1, but I'm not sure.

@wassname
Contributor Author

Another approach here would be to take the attention mask from the inputs when it is present (rather than working it out from the position_ids), but this also leads to shape errors in some models.

I'm not even sure we need to mask the padding tokens if an attention mask is provided, so perhaps this section can be safely removed.
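For illustration, a minimal sketch of that alternative, assuming standard [batch, seq_len, dim] hidden states and an inputs dict from the tokenizer; the names are placeholders, and the shape guard only skips the masking when it cannot apply rather than actually fixing the mismatch.

```python
import torch

def mask_padding(hidden_states: torch.Tensor, inputs: dict) -> torch.Tensor:
    """Zero out padded positions using the attention mask from the inputs,
    skipping the masking entirely when the shapes do not line up."""
    attention_mask = inputs.get("attention_mask")
    if attention_mask is None:
        return hidden_states
    if attention_mask.shape != hidden_states.shape[:2]:
        # Some models reshape the hidden state, so a per-token mask no longer
        # applies; fall back to leaving the hidden states untouched.
        return hidden_states
    return hidden_states * attention_mask.unsqueeze(-1).to(hidden_states.dtype)
```

Falling back to doing nothing is consistent with the suggestion above that the masking may not be needed at all when an attention mask is provided.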

