A comprehensive implementation of adversarial attacks on ImageNet classifiers, demonstrating the vulnerability of deep neural networks to carefully crafted perturbations.
This project implements and evaluates three types of adversarial attacks on standard pretrained image classifiers:
- Fast Gradient Sign Method (FGSM)
- Projected Gradient Descent (PGD)
- Localized Patch Attacks
The attacks are tested on ResNet-34 and DenseNet-121 models trained on ImageNet-1K, showing how even small perturbations can significantly degrade model performance.
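To make the attack mechanics concrete, below is a minimal PyTorch sketch of the three attack types against a pretrained torchvision ResNet-34. It is illustrative only: the function names, hyperparameters, and the assumption that inputs lie in [0, 1] are ours, and the repository's actual implementation may differ (e.g., in normalization handling, targeted vs. untargeted losses, or patch placement).

```python
# Illustrative sketch only -- function names and hyperparameters are assumptions, not this repo's API.
import torch
import torch.nn.functional as F
from torchvision import models


def fgsm_attack(model, x, y, eps):
    # Single-step FGSM: move every pixel by eps in the direction that increases the loss.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()


def pgd_attack(model, x, y, eps, alpha, steps):
    # PGD: iterated FGSM steps of size alpha, projected back onto the L-infinity eps-ball around x.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()


def patch_attack(model, x, y, mask, alpha, steps):
    # Localized patch attack: gradient updates are restricted to a binary mask,
    # and pixels inside the patch may take any value in [0, 1] (no eps constraint).
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv.detach() + alpha * grad.sign() * mask).clamp(0, 1)
    return x_adv.detach()


# Example usage with a pretrained torchvision ResNet-34.
# Inputs are assumed to lie in [0, 1]; in practice ImageNet normalization is usually
# folded into a wrapper around the model so the eps budget is defined in pixel space.
model = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1).eval()
x = torch.rand(1, 3, 224, 224)                      # placeholder image batch
y = torch.tensor([207])                             # placeholder ImageNet label
mask = torch.zeros_like(x)
mask[:, :, 96:128, 96:128] = 1.0                    # 32x32 patch region

x_fgsm = fgsm_attack(model, x, y, eps=4 / 255)
x_pgd = pgd_attack(model, x, y, eps=4 / 255, alpha=1 / 255, steps=10)
x_patch = patch_attack(model, x, y, mask, alpha=0.05, steps=50)
```

PGD is essentially FGSM applied iteratively with a projection step, which is why it is generally the stronger of the two norm-bounded attacks at the same perturbation budget.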
- Implementation of FGSM, PGD, and patch-based adversarial attacks
- Support for both ResNet-34 and DenseNet-121 architectures
- Comprehensive evaluation metrics, including top-1 and top-5 accuracy (see the sketch after this list)
- Visualization tools for comparing original and adversarial examples
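As a reference for how the top-1 and top-5 numbers can be computed, here is a small self-contained sketch; the function name and signature are hypothetical and not taken from this repository's code.

```python
# Hypothetical evaluation helper; not taken from this repository's code.
import torch


@torch.no_grad()
def evaluate(model, loader, device="cpu"):
    # Compute top-1 and top-5 accuracy of `model` over a DataLoader of (image, label) batches.
    model.eval()
    top1 = top5 = total = 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        logits = model(images)
        _, top5_preds = logits.topk(5, dim=1)            # indices of the 5 highest-scoring classes
        correct = top5_preds.eq(labels.view(-1, 1))      # (batch, 5) comparison against the true label
        top1 += correct[:, 0].sum().item()
        top5 += correct.any(dim=1).sum().item()
        total += labels.size(0)
    return top1 / total, top5 / total
```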
# Clone the repository
git clone https://github.com/ftaghiyev/Jailbreaking-Deep-Models.git
cd Jailbreaking-Deep-Models
# Install dependencies
pip install -r requirements.txt

Our experiments demonstrate significant degradation in model performance across all attack methods. For detailed results and analysis, please refer to the full report.