Implementation of FILIP embedding model includes padding vectors in similarity computation #31

@hyojeongyunn

Description

Hello, and thank you for your work on this repository.

I have a question regarding the implementation of the FILIP embedding model in this repository.

The original FILIP paper mentions that padding vectors are excluded from the similarity computation to prevent performance degradation:

"Unlike Khattab & Zaharia (2020), we discard the padded tokens and use average instead summation of token-wise maximum similarities when computing the image-text alignment, which enhances the cross-modal representation learning and stabilizes training."

However, based on my reading of the code here, padding vectors appear to be included in the similarity calculation.
The implementation uses top-k selection in the `get_weighted_dense_logits` function of the FILIP model.
If the top-k value (an input argument of `get_weighted_dense_logits`) is larger than the number of real (non-padded) tokens in a given text/image sample, padding vectors can end up in the similarity calculation.
And in principle, selecting the top-k vectors is not the same as dropping the vectors for padded tokens, as the toy sketch below illustrates.

https://github.com/Sense-GVT/DeCLIP/blob/main/experiments/filip_experiments/yfcc15m/yfcc15m_vit_filip/config.yaml#L22
https://github.com/Sense-GVT/DeCLIP/blob/main/prototype/model/filip.py#L71-L106
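
To make the concern concrete, here is a small toy sketch (not the repository's code; the shapes, names, and masking convention are my own assumptions) showing that a top-k average over all text positions and an average over only the non-padded tokens generally give different scores:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d = 8
n_img, n_txt, n_real = 4, 6, 3            # 3 real text tokens, 3 padding tokens

img = F.normalize(torch.randn(n_img, d), dim=-1)   # image patch embeddings
txt = F.normalize(torch.randn(n_txt, d), dim=-1)   # text token embeddings (incl. padding)
txt_mask = torch.tensor([True] * n_real + [False] * (n_txt - n_real))

sim = txt @ img.t()                        # (n_txt, n_img) token-wise cosine similarity
per_token_max = sim.max(dim=1).values      # for each text token, max over image patches

# (a) top-k over all text positions: with top_k > n_real, padding rows get selected
top_k = 5
score_topk = per_token_max.topk(top_k).values.mean()

# (b) what the paper describes: drop padded tokens, then average
score_masked = per_token_max[txt_mask].mean()

print(score_topk.item(), score_masked.item())   # the two values generally differ
```

In this toy setup the top-k average mixes in two of the padding rows, which is exactly the behaviour I would like to confirm (or rule out) for the actual implementation.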

I would like to confirm whether my understanding is correct. If padding vectors are indeed included in the similarity computation, could you clarify the reason behind this design choice?

Thank you for your time and support!
