To set up your environment, run the following commands:
conda create -n mixlora python=3.8 -y
conda activate mixlora
sh setup.sh

Please download the dataset from Vision-Flan.
The evaluation dataset we used can be downloaded from here.
Specify image_folder and data_path in the fine-tuning scripts according to your data preparation.
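As a sketch, the corresponding arguments inside the fine-tuning script might look like the lines below; the flag names follow the LLaVA training conventions this codebase builds on, and the paths are placeholders for wherever you stored the Vision-Flan annotations and images:

--data_path ./data/vision_flan/annotation.json \
--image_folder ./data/vision_flan/images \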
To fine-tune the model, run the following command:
sh scripts/finetune_mixlora.sh <routing-type> <num-experts> <num-rank>
- <routing-type>: Specify the type of routing (input for instance-based IFS routing alone, input_lora_a_param for combined instance-based IFS routing and CFS routing).
- <num-experts>: Specify the number of factors.
- <num-rank>: Specify the rank.
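For example, an illustrative invocation with combined IFS and CFS routing (the factor count and rank here are arbitrary example values, not recommended settings):

sh scripts/finetune_mixlora.sh input_lora_a_param 4 8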
The projector weights mm_projector.bin can be downloaded from the original LLaVA repo.
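Since the codebase builds on LLaVA, the downloaded projector is presumably passed to the training script through LLaVA's --pretrain_mm_mlp_adapter argument (an assumption; check the fine-tuning script for the exact flag), for example:

--pretrain_mm_mlp_adapter ./checkpoints/mm_projector.bin \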
The trained model checkpoints can be found here.
To run inference on all the multimodal tasks:
sh scripts/run_eval.sh <model-path> <data-dir>
- <model-path>: Specify the path to the model
- <data-dir>: Specify the path to the evaluation dataset
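For instance, with placeholder paths (substitute your own checkpoint and evaluation data locations):

sh scripts/run_eval.sh ./checkpoints/mixlora-7b ./data/eval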
To run inference on MME:
sh scripts/run_eval_mme.sh <model-path> <data-dir>
- <model-path>: Specify the path to the model
- <data-dir>: Specify the path to the MME dataset
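Again with placeholder paths (point <data-dir> at your local copy of the MME benchmark):

sh scripts/run_eval_mme.sh ./checkpoints/mixlora-7b ./data/MME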
The codebase is built upon LLaVA. We would like to thank the authors for publicly releasing their code.
@article{shen2024multimodal,
title={Multimodal Instruction Tuning with Conditional Mixture of LoRA},
author={Shen, Ying and Xu, Zhiyang and Wang, Qifan and Cheng, Yu and Yin, Wenpeng and Huang, Lifu},
journal={arXiv preprint arXiv:2402.15896},
year={2024}
}
