Custom Vision Transformer (ViT) model fine-tuned for 5 specific classes using google/vit-base-patch16-224.
This project:
- Uses the pre-trained `google/vit-base-patch16-224` model
- Modifies it from 1000 classes to 5 custom classes: `my_cat`, `my_dog`, `my_car`, `my_house`, `my_phone`
- Trains it on your custom image dataset
- Tests it on your own photos
The base google/vit-base-patch16-224 model is pre-trained on ImageNet with 1000 general classes (like "cat", "dog", "car", etc.). This project customizes it for 5 specific classes:
Before (Base Model):
- 1000 output classes (ImageNet categories)
- Generic labels like "Egyptian cat", "golden retriever", "sports car"
- Classifier head: `Linear(768, 1000)`
After (Custom Model):
- 5 output classes: `my_cat`, `my_dog`, `my_car`, `my_house`, `my_phone`
- Personalized labels for your specific objects
- Classifier head: `Linear(768, 5)`
- Model Architecture Modification (`model_custom.py`):
  - Loads the pre-trained ViT model (weights preserved)
  - Replaces the final classification layer from 1000 → 5 outputs
  - Updates label mappings (`id2label`, `label2id`)
  - Keeps all pre-trained feature extraction layers (transfer learning)
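The head replacement described above can be sketched with the `transformers` API. This is a minimal offline sketch: it builds the same ViT-base architecture from a `ViTConfig`, whereas `model_custom.py` itself would load the pre-trained weights with `from_pretrained("google/vit-base-patch16-224", ...)`.

```python
from transformers import ViTConfig, ViTForImageClassification

labels = ["my_cat", "my_dog", "my_car", "my_house", "my_phone"]
id2label = {i: name for i, name in enumerate(labels)}
label2id = {name: i for i, name in enumerate(labels)}

# model_custom.py would call ViTForImageClassification.from_pretrained(...)
# to keep the pre-trained weights; building from a config keeps this sketch
# runnable without downloading the checkpoint.
config = ViTConfig(num_labels=len(labels), id2label=id2label, label2id=label2id)
model = ViTForImageClassification(config)

print(model.classifier)  # the new 5-way head: Linear(in_features=768, out_features=5, bias=True)
```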
- Fine-Tuning (`train.py`):
  - Freezes most layers, trains only the new classifier head
  - Uses your custom images to learn class-specific features
  - Adapts the model to recognize your specific objects
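The freeze-the-backbone step can be illustrated in a few lines. This is a sketch of the idea only, not the actual `train.py` code, and it builds a fresh model from a config so it runs standalone.

```python
from transformers import ViTConfig, ViTForImageClassification

# Sketch only: train.py would operate on the saved custom model instead.
model = ViTForImageClassification(ViTConfig(num_labels=5))

# Freeze every backbone parameter; only the classifier head stays trainable.
for param in model.vit.parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)  # 768 * 5 weights + 5 biases = 3845
```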
Key Benefits:
- Leverages pre-trained features (no training from scratch)
- Fast training (only classifier head needs learning)
- Personalized for your specific objects
- Requires less data than training from scratch
- Base Model: Vision Transformer (ViT) with patch size 16×16, 224×224 input
- Feature Dimension: 768 (hidden size)
- Modification: Final linear layer changed from `768 → 1000` to `768 → 5`
- Training: Fine-tuning with a custom dataset using the Hugging Face `Trainer`
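The specs above fix the token layout the transformer sees; a quick sanity check of the arithmetic:

```python
# ViT-base/16 at 224x224 input, per the specs above
image_size, patch_size, hidden = 224, 16, 768

n_patches = (image_size // patch_size) ** 2  # 14 x 14 = 196 patches
seq_len = n_patches + 1                      # plus the [CLS] token -> 197
print(n_patches, seq_len)  # 196 197
```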
Install dependencies:

```bash
pip install -r requirements.txt
```

Run the script to create a custom model with your 5 classes:

```bash
python model_custom.py
```

This creates a custom model in `./custom_vit_model` with 5 classes instead of 1000.
Your classes:
- my_cat
- my_dog
- my_car
- my_house
- my_phone
Organize your images in this structure:
```
data/
  my_cat/
    image1.jpg
    image2.jpg
    ...
  my_dog/
    image1.jpg
    ...
  my_car/
    ...
  my_house/
    ...
  my_phone/
    ...
```
Important:
- Folder names must match the class names exactly: `my_cat`, `my_dog`, `my_car`, `my_house`, `my_phone`
- Use common formats: .jpg, .jpeg, .png, .bmp, .gif
- Aim for at least 50-100 images per class for best results
- The `data/` folder structure has been created for you - just add your images!
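The folder-to-label mapping implied by this layout can be sketched as follows. `collect_samples` is a hypothetical helper for illustration; the actual loading logic lives in `train.py`.

```python
from pathlib import Path

# Class names and extensions from the rules above.
CLASSES = ["my_cat", "my_dog", "my_car", "my_house", "my_phone"]
EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}

def collect_samples(data_dir):
    """Return (image_path, label_id) pairs from data_dir/<class_name>/."""
    samples = []
    for label_id, name in enumerate(CLASSES):
        class_dir = Path(data_dir) / name
        if not class_dir.is_dir():
            continue
        for path in sorted(class_dir.iterdir()):
            if path.suffix.lower() in EXTENSIONS:
                samples.append((path, label_id))
    return samples
```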
```bash
python train.py --data_dir ./data --epochs 5 --batch_size 8
```

Parameters:
- `--data_dir`: Directory with class subdirectories (default: `./data`)
- `--model_path`: Path to the custom model (default: `./custom_vit_model`)
- `--output_dir`: Where to save the trained model (default: `./trained_model`)
- `--epochs`: Number of training epochs (default: 5)
- `--batch_size`: Batch size (default: 8, reduce if out of memory)
- `--learning_rate`: Learning rate (default: 2e-5)

Example with custom parameters:

```bash
python train.py --data_dir ./data --epochs 10 --batch_size 16 --learning_rate 2e-5
```

Single image:

```bash
python test.py --image my_photo.jpg
```

All images in a directory:

```bash
python test.py --directory ./my_test_photos
```

Use a different model:
```bash
python test.py --image photo.jpg --model_path ./my_trained_model
```

Project structure:

```
huggingface-image-project/
├── requirements.txt       # Dependencies
├── model_custom.py        # Create custom 5-class model
├── train.py               # Training script
├── test.py                # Testing script
├── README.md              # This file
├── custom_vit_model/      # Created by model_custom.py
├── trained_model/         # Created by train.py
└── data/                  # Your training images
    ├── my_cat/
    ├── my_dog/
    ├── my_car/
    ├── my_house/
    └── my_phone/
```
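Under the hood, single-image prediction presumably looks something like the sketch below. The `predict` helper is hypothetical; `test.py` is the authoritative implementation.

```python
import torch
from PIL import Image
from transformers import ViTForImageClassification, ViTImageProcessor

def predict(image_path, model_dir="./trained_model"):
    """Hypothetical sketch of single-image inference; see test.py for the real logic."""
    processor = ViTImageProcessor.from_pretrained(model_dir)
    model = ViTForImageClassification.from_pretrained(model_dir)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return model.config.id2label[int(logits.argmax(-1))]
```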
Quick start:

```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Create custom model
python model_custom.py

# 3. Add your images to data/ subdirectories
#    - data/my_cat/your_cat_images.jpg
#    - data/my_dog/your_dog_images.jpg
#    - data/my_car/your_car_images.jpg
#    - data/my_house/your_house_images.jpg
#    - data/my_phone/your_phone_images.jpg

# 4. Train the model
python train.py --data_dir ./data --epochs 5

# 5. Test your photos
python test.py --image my_photo.jpg
```

Tips:
- More data = better accuracy: Use at least 50-100 images per class
- Image quality: Use clear, well-lit images
- Variety: Include different angles, backgrounds, lighting conditions
- Batch size: Reduce if you run out of memory (try 4 or 8)
- Epochs: Start with 5, increase if validation accuracy is still improving
- Balance: Try to have roughly equal number of images per class
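The balance tip above is easy to check before training. `class_counts` is a hypothetical helper, not part of the project scripts:

```python
from pathlib import Path

def class_counts(data_dir="./data"):
    """Count files in each class subfolder to spot imbalance (hypothetical helper)."""
    return {
        sub.name: sum(1 for f in sub.iterdir() if f.is_file())
        for sub in sorted(Path(data_dir).iterdir())
        if sub.is_dir()
    }
```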
No images found:
- Check that your data directory structure matches the expected format
- Verify folder names match exactly: `my_cat`, `my_dog`, `my_car`, `my_house`, `my_phone`
- Ensure images have supported extensions (.jpg, .png, etc.)
Out of memory:
- Reduce `--batch_size` (try 4 or 8)
- Use smaller images or resize before training
Low accuracy:
- Add more training images per class
- Ensure images are clear and representative
- Try training for more epochs
- Check that test images are similar to training data
Model not found:
- Make sure you've run `python model_custom.py` before training
- Check that `./custom_vit_model` exists
- Python 3.8+
- PyTorch 2.0+
- Transformers 4.30+
- See `requirements.txt` for the full list
This project uses the google/vit-base-patch16-224 model from Hugging Face.