📋 Ameli: Enhancing Multimodal Entity Linking with Fine-Grained Attributes
This repository is the official implementation of Ameli: Enhancing Multimodal Entity Linking with Fine-Grained Attributes.
Please use the following citation:
@inproceedings{yao-etal-2024-ameli,
    title = "Ameli: Enhancing Multimodal Entity Linking with Fine-Grained Attributes",
    author = "Yao, Barry and
      Wang, Sijia and
      Chen, Yu and
      Wang, Qifan and
      Liu, Minqian and
      Xu, Zhiyang and
      Yu, Licheng and
      Huang, Lifu",
    editor = "Graham, Yvette and
      Purver, Matthew",
    booktitle = "Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = mar,
    year = "2024",
    address = "St. Julian{'}s, Malta",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.eacl-long.172",
    pages = "2816--2834",
    abstract = "We propose attribute-aware multimodal entity linking, where the input consists of a mention described with a text paragraph and images, and the goal is to predict the corresponding target entity from a multimodal knowledge base (KB) where each entity is also accompanied by a text description, visual images, and a collection of attributes that present the meta-information of the entity in a structured format. To facilitate this research endeavor, we construct Ameli, encompassing a new multimodal entity linking benchmark dataset that contains 16,735 mentions described in text and associated with 30,472 images, and a multimodal knowledge base that covers 34,690 entities along with 177,873 entity images and 798,216 attributes. To establish baseline performance on Ameli, we experiment with several state-of-the-art architectures for multimodal entity linking and further propose a new approach that incorporates attributes of entities into disambiguation. Experimental results and extensive qualitative analysis demonstrate that extracting and understanding the attributes of mentions from their text descriptions and visual images play a vital role in multimodal entity linking. To the best of our knowledge, we are the first to integrate attributes in the multimodal entity linking task. The programs, model checkpoints, and the dataset are publicly available at https://github.com/VT-NLP/Ameli.",
}
To install requirements:
conda create -n ameli -y python=3.8 && conda activate ameli
conda install pytorch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -U "ray[default]"
pip install -r requirements.txt
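After installation, a quick check such as the one below (a minimal sketch, not part of the repository's scripts) can confirm that the expected PyTorch/torchvision versions and CUDA support are in place:

```python
# Sanity check for the installed environment (illustrative only).
import torch
import torchvision

print("PyTorch:", torch.__version__)            # expected: 1.9.0
print("torchvision:", torchvision.__version__)  # expected: 0.10.0
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```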
You can download the dataset here:
-
The dataset format and structure are explained in Section K.3 (Dataset Format) in the Appendix of our paper (https://aclanthology.org/2024.eacl-long.172).
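For a first look at the data, a snippet along the lines of the one below may help. The file name (ameli_train.json) and the assumption that each split is a JSON list of records are hypothetical; adjust them to the actual layout described in Section K.3.

```python
import json

# Hypothetical path and layout: adjust to the files and fields described
# in Section K.3 (Dataset Format) of the paper.
with open("ameli/ameli_train.json", "r", encoding="utf-8") as f:
    mentions = json.load(f)

print("Number of mention records:", len(mentions))
# Inspect the first record's keys to discover the actual schema.
print("Fields in the first record:", sorted(mentions[0].keys()))
```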
To evaluate with a trained checkpoint:
python entity_disambiguation_v2.py --mode=test --checkpoint_dir=#PATH_TO_CHECKPOINT
If some modules cannot be found, prefix the python command with PYTHONPATH=.
To train the entity disambiguation model (two configurations, differing in --dataset_class and --model_attribute):
python entity_disambiguation_v2.py --candidate_mode=standard --dataset_class=v3 --model_attribute=B6 --lr=0.001 --batch_size=32 --train_dir=#PATH
python entity_disambiguation_v2.py --candidate_mode=standard --model_attribute=A6 --lr=0.001 --batch_size=32 --train_dir=#PATH
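If you prefer to launch both training configurations from a single script, a wrapper such as the sketch below works. The script name and flags are taken verbatim from the commands above, while the wrapper itself is purely illustrative and not part of the repository.

```python
import os
import subprocess

# Placeholder, as in the commands above: point this at your training directory.
TRAIN_DIR = "#PATH"

# Extra arguments that distinguish the two configurations shown above.
CONFIGS = [
    ["--candidate_mode=standard", "--dataset_class=v3", "--model_attribute=B6"],
    ["--candidate_mode=standard", "--model_attribute=A6"],
]

for extra_args in CONFIGS:
    cmd = [
        "python", "entity_disambiguation_v2.py",
        *extra_args,
        "--lr=0.001", "--batch_size=32", f"--train_dir={TRAIN_DIR}",
    ]
    # PYTHONPATH=. mirrors the note in the evaluation section above.
    subprocess.run(cmd, check=True, env={**os.environ, "PYTHONPATH": "."})
```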
TODO
Note: We plan to clean up the code to make it easier to run and understand.
📋 Our dataset is licensed under CC BY 4.0. The associated code is licensed under the Apache License 2.0.