DeepVul is a multi-task transformer-based model designed to jointly predict gene essentiality and drug response using gene expression data. The model uses a shared feature extractor to learn robust biological representations that can be fine-tuned for downstream tasks, such as gene knockout effect prediction or treatment sensitivity profiling.
- 🚀 Features
- 📦 Installation
- 📊 Datasets
- ⚙️ Hyperparameters
- 🏃 Running the Model
- 🧠 Additional Info
- 📄 Citation
## 🚀 Features

- Joint prediction of gene essentiality and drug response
- Shared transformer encoder for multi-task learning
- Flexible modes: pre-training only, fine-tuning only, or both
- Compatible with public omics and pharmacogenomic datasets
- Fully configurable via command-line arguments
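To make the multi-task design concrete, here is a minimal, hypothetical sketch of a shared transformer encoder feeding two task-specific heads. Class and attribute names (`MultiTaskModel`, `essentiality_head`, `drug_head`) and all dimensions are illustrative, not DeepVul's actual implementation:

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """Shared encoder with two task heads (illustrative sketch, not the repo's code)."""

    def __init__(self, n_genes=100, hidden=64, nhead=2, num_layers=2,
                 n_essentiality=10, n_drugs=5):
        super().__init__()
        # Project each scalar expression value into the model dimension.
        self.embed = nn.Linear(1, hidden)
        layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=nhead, dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Task-specific heads share the same pooled representation.
        self.essentiality_head = nn.Linear(hidden, n_essentiality)
        self.drug_head = nn.Linear(hidden, n_drugs)

    def forward(self, expr):                    # expr: (batch, n_genes)
        h = self.embed(expr.unsqueeze(-1))      # (batch, n_genes, hidden)
        h = self.encoder(h).mean(dim=1)         # pooled shared representation
        return self.essentiality_head(h), self.drug_head(h)

model = MultiTaskModel()
ess, drug = model(torch.randn(4, 100))
print(ess.shape, drug.shape)  # torch.Size([4, 10]) torch.Size([4, 5])
```

Because both heads read the same pooled representation, gradients from either task update the shared encoder, which is what lets the two prediction problems regularize each other.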
## 📦 Installation

Make sure you have conda installed, then create and activate the environment:

```bash
conda env create --file condaenv.yml
conda activate condaenv
```

## 📊 Datasets

To run DeepVul, download the following datasets and place them in the `data/` directory:
| Dataset | Description | Source |
|---|---|---|
| Gene Expression | TPM-log transformed gene expression data | Download |
| Gene Essentiality | CRISPR-Cas9 knockout effect scores | Download |
| Drug Response | PRISM log-fold change drug response | Download |
| Sanger Essentiality | CERES gene effect data from Sanger | Download |
| Somatic Mutation | Mutation profiles for CCLE lines | Download |
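Since the expression input is described as log-transformed TPM, a quick sketch of the usual preprocessing may help; the gene/cell-line names here are made up, and a common convention (assumed, not confirmed by the repo) is `log2(TPM + 1)`:

```python
import numpy as np
import pandas as pd

# Toy expression matrix: rows are cell lines, columns are genes (names illustrative).
tpm = pd.DataFrame({"GENE_A": [0.0, 3.0], "GENE_B": [1.0, 7.0]},
                   index=["LINE_1", "LINE_2"])

# log2(TPM + 1): compresses the dynamic range and keeps zeros at zero.
log_tpm = np.log2(tpm + 1)
print(log_tpm.loc["LINE_2", "GENE_B"])  # 3.0
```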
## ⚙️ Hyperparameters

DeepVul supports flexible training via CLI arguments:
| Parameter | Default | Description |
|---|---|---|
| `--pretrain_batch_size` | 20 | Batch size during pre-training |
| `--finetuning_batch_size` | 20 | Batch size during fine-tuning |
| `--hidden_state` | 500 | Size of transformer hidden layers |
| `--pre_train_epochs` | 20 | Number of pre-training epochs |
| `--fine_tune_epochs` | 20 | Number of fine-tuning epochs |
| `--opt` | Adam | Optimizer type |
| `--lr` | 0.0001 | Learning rate |
| `--dropout` | 0.1 | Dropout rate |
| `--nhead` | 2 | Number of attention heads |
| `--num_layers` | 2 | Number of transformer encoder layers |
| `--dim_feedforward` | 2048 | Feedforward network size |
| `--fine_tuning_mode` | freeze-shared | Whether to freeze shared layers during fine-tuning |
| `--run_mode` | pre-train / fine-tune / both | Execution mode |
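A table of flags like this typically maps onto an `argparse` parser. The sketch below shows how a few of these options could be declared; it is illustrative only (the repo's actual parser, and the default for `--run_mode`, are assumptions here):

```python
import argparse

def build_parser():
    """Hypothetical DeepVul-style CLI parser; defaults mirror the table above."""
    p = argparse.ArgumentParser(description="DeepVul training options")
    p.add_argument("--pretrain_batch_size", type=int, default=20)
    p.add_argument("--hidden_state", type=int, default=500)
    p.add_argument("--lr", type=float, default=0.0001)
    p.add_argument("--nhead", type=int, default=2)
    # choices restricts --run_mode to the three execution modes.
    p.add_argument("--run_mode", choices=["pre-train", "fine-tune", "both"],
                   default="both")  # default assumed, not documented
    return p

args = build_parser().parse_args(["--run_mode", "pre-train", "--lr", "0.001"])
print(args.run_mode, args.lr)  # pre-train 0.001
```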
## 🏃 Running the Model

```bash
cd src

# Pre-training only
python run_deepvul.py --run_mode pre-train ...

# Fine-tuning only
python run_deepvul.py --run_mode fine-tune ...

# Pre-training followed by fine-tuning
python run_deepvul.py --run_mode both ...
```

Customize the CLI options as needed based on your experiment setup.
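The `freeze-shared` fine-tuning mode implies keeping the shared encoder fixed while only the task head trains. In PyTorch that is typically done by disabling gradients on the shared parameters; the module names below are hypothetical, not DeepVul's actual attributes:

```python
import torch.nn as nn

# Stand-in model: a shared encoder plus a task head (names illustrative).
model = nn.ModuleDict({
    "shared_encoder": nn.Linear(8, 8),
    "task_head": nn.Linear(8, 1),
})

# "freeze-shared": shared parameters stop receiving gradient updates.
for param in model["shared_encoder"].parameters():
    param.requires_grad = False

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['task_head.weight', 'task_head.bias']
```

In practice the optimizer is then built only over the still-trainable parameters, e.g. `filter(lambda p: p.requires_grad, model.parameters())`.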
## 🧠 Additional Info

- Source code for the model architecture, training, and evaluation is located in the `src/` directory.
- If you encounter issues or have questions, please open a GitHub Issue or contact the maintainers.
- Model interpretation and evaluation scripts are included in the repo.
## 📄 Citation

If you use DeepVul in your work, please cite:

```bibtex
@article{JararwehDeepVul,
  author = {Jararweh, Ala and Bach, My Nguyen and Arredondo, David and Macaulay, Oladimeji and Dicome, Mikaela and Tafoya, Luis and Hu, Yue and Virupakshappa, Kushal and Boland, Genevieve and Flaherty, Keith and Sahu, Avinash},
  title = {DeepVul: A Multi-Task Transformer Model for Joint Prediction of Gene Essentiality and Drug Response},
  elocation-id = {2024.10.17.618944},
  year = {2025},
  doi = {10.1101/2024.10.17.618944},
  publisher = {Cold Spring Harbor Laboratory},
  URL = {https://www.biorxiv.org/content/early/2025/10/15/2024.10.17.618944},
  eprint = {https://www.biorxiv.org/content/early/2025/10/15/2024.10.17.618944.full.pdf},
  journal = {bioRxiv}
}
```