
📋 Ameli: Enhancing Multimodal Entity Linking with Fine-Grained Attributes


This repository is the official implementation of Ameli: Enhancing Multimodal Entity Linking with Fine-Grained Attributes.

Please use the following citation:

@inproceedings{yao-etal-2024-ameli,
    title = "Ameli: Enhancing Multimodal Entity Linking with Fine-Grained Attributes",
    author = "Yao, Barry  and
      Wang, Sijia  and
      Chen, Yu  and
      Wang, Qifan  and
      Liu, Minqian  and
      Xu, Zhiyang  and
      Yu, Licheng  and
      Huang, Lifu",
    editor = "Graham, Yvette  and
      Purver, Matthew",
    booktitle = "Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = mar,
    year = "2024",
    address = "St. Julian{'}s, Malta",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.eacl-long.172",
    pages = "2816--2834",
    abstract = "We propose attribute-aware multimodal entity linking, where the input consists of a mention described with a text paragraph and images, and the goal is to predict the corresponding target entity from a multimodal knowledge base (KB) where each entity is also accompanied by a text description, visual images, and a collection of attributes that present the meta-information of the entity in a structured format. To facilitate this research endeavor, we construct Ameli, encompassing a new multimodal entity linking benchmark dataset that contains 16,735 mentions described in text and associated with 30,472 images, and a multimodal knowledge base that covers 34,690 entities along with 177,873 entity images and 798,216 attributes. To establish baseline performance on Ameli, we experiment with several state-of-the-art architectures for multimodal entity linking and further propose a new approach that incorporates attributes of entities into disambiguation. Experimental results and extensive qualitative analysis demonstrate that extracting and understanding the attributes of mentions from their text descriptions and visual images play a vital role in multimodal entity linking. To the best of our knowledge, we are the first to integrate attributes in the multimodal entity linking task. The programs, model checkpoints, and the dataset are publicly available at https://github.com/VT-NLP/Ameli.",
}

Requirements

To install requirements:

conda create -n ameli -y python=3.8 && conda activate ameli
conda install pytorch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -U "ray[default]"
pip install -r requirements.txt
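
As an optional sanity check (not part of the original instructions), the following short Python snippet confirms that the pinned PyTorch and torchvision builds, and CUDA support, are visible from the ameli environment:

# check_env.py -- optional sanity check for the ameli environment
import torch
import torchvision

print("torch:", torch.__version__)              # expected: 1.9.0
print("torchvision:", torchvision.__version__)  # expected: 0.10.0
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))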

Dataset - Ameli

You can download the dataset here:
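
The exact file layout depends on the release. Purely as an illustration, assuming the mention split is distributed as a JSON list in which each record holds a text description and image paths (the file name ameli_mentions.json and the field names below are hypothetical), a minimal inspection script could look like this:

# inspect_dataset.py -- illustration only; file and field names are hypothetical
import json
from collections import Counter

with open("ameli_mentions.json", "r", encoding="utf-8") as f:
    mentions = json.load(f)

print("number of mentions:", len(mentions))
# Count how many images accompany each mention (field name is an assumption).
images_per_mention = Counter(len(m.get("image_paths", [])) for m in mentions)
print("images per mention:", dict(images_per_mention))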

Pre-trained Models

checkpoint

Evaluation

To evaluate a trained model on the test split, run:

python entity_disambiguation_v2.py --mode=test --checkpoint_dir=#PATH_TO_CHECKPOINT

Troubleshooting

If some modules cannot be found, prefix the python command with PYTHONPATH=. so that the repository root is on Python's module search path, for example: PYTHONPATH=. python entity_disambiguation_v2.py --mode=test --checkpoint_dir=#PATH_TO_CHECKPOINT


Training (Optional)

To train the disambiguation model, run one of the following configurations:

python entity_disambiguation_v2.py --candidate_mode=standard --dataset_class=v3 --model_attribute=B6 --lr=0.001 --batch_size=32 --train_dir=#PATH
python entity_disambiguation_v2.py --candidate_mode=standard --model_attribute=A6 --lr=0.001 --batch_size=32 --train_dir=#PATH

Dataset Build (Optional)

TODO

Note: We plan to clean up the code to make it easier to run and understand.

Contributing

📋 Our dataset is licensed under the CC BY 4.0. The associated codes are licensed under Apache License 2.0.
