LLaVA-Shot: Zero-Shot Sentinel-2 Classification

Zero-shot land use classification for Sentinel-2 satellite imagery using LLaVA vision-language models. Initial benchmark: 38% accuracy on a 10-class EuroSAT test (50 samples, 5 per class) using task-focused prompting.

📊 See RESULTS.md for detailed results and methodology.

Overview

This project evaluates LLaVA's capability for zero-shot classification of Sentinel-2 satellite imagery without any satellite-specific training. Using task-focused prompts with the full 10-class EuroSAT taxonomy and True Color RGB composites, the model performs well on a few land use types (River, Industrial, Residential) and poorly on others.

Key Results (Initial Test: 50 samples across 10 classes)

38% overall accuracy on 10-class EuroSAT benchmark
100% precision on River and Industrial classes
Zero training required - no fine-tuning or training data
~3 seconds per image on M4 Max with llava:13b

Note: These are preliminary results from a small test set (5 samples per class). Larger-scale validation is needed to confirm performance.
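To give a sense of how much uncertainty a 50-sample test carries, here is a minimal sketch that computes a Wilson score interval for the overall accuracy. The function name `wilson_interval` is illustrative and not part of this repository; 19 correct predictions is simply 38% of 50.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# 38% of 50 samples corresponds to 19 correct predictions.
low, high = wilson_interval(19, 50)
print(f"Approximate 95% interval for accuracy: {low:.1%} to {high:.1%}")
```

The same function can be applied per class with n=5, where the intervals become very wide; this is why the per-class figures should be read as indicative only.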

Best Use Cases

✅ Strong Performance:

River detection (80% recall, 100% precision)
Industrial building identification (60% recall, 100% precision)
Residential area detection (100% recall, 45% precision)

❌ Challenging:

Fine-grained vegetation discrimination (AnnualCrop vs. PermanentCrop vs. Pasture vs. Forest)
Large water body detection (SeaLake: 0% recall)
Comprehensive 10-class land use classification
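The recall and precision figures above follow from simple counts. The sketch below shows the arithmetic with 5 samples per class; the true-positive and false-negative counts are implied by the reported recall, while the false-positive count for Residential (6) is an assumption chosen because 5/11 rounds to 45%. The exact counts are not stated here and may differ in RESULTS.md.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Counts inferred from the reported percentages (assumed, 5 samples per class).
counts = {
    "River": dict(tp=4, fp=0, fn=1),        # 4 of 5 found, no false alarms
    "Industrial": dict(tp=3, fp=0, fn=2),   # 3 of 5 found, no false alarms
    "Residential": dict(tp=5, fp=6, fn=0),  # all 5 found, 6 other images mislabeled
}

for name, c in counts.items():
    precision, recall = precision_recall(**c)
    print(f"{name}: precision={precision:.0%}, recall={recall:.0%}")
```

High recall with low precision, as for Residential, indicates a class the model over-predicts: it catches every true example but also assigns the label to images from other classes.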

Quick Start

Prerequisites

Apple Silicon Mac (M3/M4 recommended) or Linux/Windows with GPU
Python 3.12+
Ollama for local LLaVA inference
EuroSAT dataset (optional, for benchmarking)

Installation

# Clone repository
git clone https://github.com/ecohydro/llava_shot.git
cd llava_shot

# Install with uv (recommended)
uv pip install -e .

# Or with pip
pip install -e .

Install Ollama and LLaVA

# Install Ollama
brew install ollama  # macOS
# or download from https://ollama.ai/

# Start Ollama service
ollama serve

# Pull LLaVA model (in a new terminal)
ollama pull llava:13b  # Recommended for benchmarking
# or
ollama pull llava:7b   # Faster, lower accuracy

Run Benchmark

# Quick test (5 samples per class, 10-class taxonomy)
python scripts/benchmark_eurosat.py --n-per-class 5 --prompt-style eurosat10

# Larger test (20 samples per class)
python scripts/benchmark_eurosat.py --n-per-class 20 --prompt-style eurosat10

# Test specific classes only (using EuroSAT class names)
python scripts/benchmark_eurosat.py --n-per-class 10 --classes Industrial River Residential

Note: The EuroSAT dataset must be downloaded separately. See EuroSAT for download instructions, and place it in the eurostat/ directory.
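The `--n-per-class` option implies a stratified draw: the same number of patches from every class. How the benchmark script does this is not shown here, so the following is only a sketch under assumptions: `sample_per_class` and its `files_by_class` argument are hypothetical names, and a fixed seed is used so that repeated runs select the same patches.

```python
import random


def sample_per_class(
    files_by_class: dict[str, list[str]],
    n_per_class: int,
    seed: int = 0,
) -> list[tuple[str, str]]:
    """Draw up to n_per_class filenames from each class, reproducibly."""
    rng = random.Random(seed)
    samples: list[tuple[str, str]] = []
    for class_name in sorted(files_by_class):
        files = sorted(files_by_class[class_name])  # sort so the draw is order-independent
        k = min(n_per_class, len(files))            # tolerate classes with few files
        samples.extend((class_name, name) for name in rng.sample(files, k))
    return samples


# Toy file listing; real EuroSAT filenames may follow a different pattern.
listing = {
    "River": [f"River_{i}.tif" for i in range(1, 21)],
    "Forest": [f"Forest_{i}.tif" for i in range(1, 21)],
}
for class_name, filename in sample_per_class(listing, n_per_class=5):
    print(class_name, filename)
```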

Benchmark Dataset

We use EuroSAT for rigorous validation:

27,000 labeled Sentinel-2 patches (64×64 pixels, 13 bands)
5,400 test samples with expert ground truth
10 land use classes from across Europe:
- AnnualCrop, Forest, HerbaceousVegetation, Highway, Industrial
- Pasture, PermanentCrop, Residential, River, SeaLake

This provides real, expert-labeled ground truth rather than synthetic labels.

Approach: Task-Focused Prompting

Instead of explaining spectral theory, we define LLaVA's role explicitly with the full 10-class EuroSAT taxonomy:

"""You are a land cover classifier analyzing a Sentinel-2 satellite image.

**CLASSIFICATION TASK:**
Classify this image into ONE of the 10 EuroSAT land use classes below.

**EUROSAT CLASSES:**
1. **AnnualCrop** - Annual cropland
2. **Forest** - Areas with dense tree cover
3. **HerbaceousVegetation** - Natural grasslands, meadows
4. **Highway** - Major roads, highways
5. **Industrial** - Industrial buildings, factories
6. **Pasture** - Managed grassland for grazing
7. **PermanentCrop** - Orchards, vineyards
8. **Residential** - Houses, residential buildings
9. **River** - Rivers and streams
10. **SeaLake** - Seas, lakes, large water bodies

**OUTPUT FORMAT:**
CLASS: [exact class name from above]
CONFIDENCE: [high/medium/low]
"""

Key Design Principles:

✅ Clear task definition
✅ Explicit class taxonomy with visual cues
✅ Structured output format
✅ No spectral theory or band math
✅ Simple True Color RGB input

Project Structure

llava_shot/
├── src/llava_shot/
│   ├── data/
│   │   ├── download.py           # Sentinel-2 scene download (AWS STAC)
│   │   └── eurosat_loader.py     # EuroSAT benchmark loader
│   ├── processing/
│   │   ├── bands.py              # Band reading and resampling
│   │   ├── composites.py         # RGB composite generation
│   │   └── indices.py            # NDVI, NDWI, NDBI calculation
│   ├── classification/
│   │   ├── llava_interface.py    # Ollama API wrapper
│   │   └── task_prompts.py       # Task-focused prompts
│   └── validation/
│       └── metrics.py            # Accuracy, confusion matrix
├── scripts/
│   ├── benchmark_eurosat.py      # Main benchmark script
│   ├── quickstart.py             # Download and visualize scenes
│   └── demo_classification.py    # Interactive classification demos
├── data/
│   ├── raw/                      # Downloaded Sentinel-2 scenes
│   ├── processed/                # Generated composites
│   └── validation_eurosat/       # Benchmark results
├── eurosat/                      # EuroSAT dataset (not in git)
├── RESULTS.md                    # Detailed results and analysis
└── README.md                     # This file

Usage Examples

Benchmark on EuroSAT

from llava_shot.data.eurosat_loader import EuroSATLoader
from llava_shot.classification.llava_interface import LLaVAClassifier
from llava_shot.classification.task_prompts import get_task_prompt

# Load EuroSAT dataset
loader = EuroSATLoader()
samples = loader.sample_dataset(n_per_class=5)

# Initialize classifier
classifier = LLaVAClassifier(model="llava:13b")
prompt = get_task_prompt("classifier")

# Classify samples
for eurosat_class, filename, simple_class in samples:
    bands, metadata = loader.load_patch(eurosat_class, filename)
    rgb = loader.create_rgb_composite(bands, "true_color")

    response = classifier.classify(rgb, prompt, temperature=0.1)
    print(f"Ground truth: {simple_class}, Prediction: {response}")
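The internals of `LLaVAClassifier` are not shown in this README. As a rough sketch of what a call to a local Ollama server can look like, the code below builds a request for Ollama's `/api/generate` endpoint, which accepts base64-encoded images for multimodal models. The helper names `build_payload` and `classify_image` are hypothetical, and the URL assumes Ollama's default port (11434); the image is expected as already-encoded PNG or JPEG bytes.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumes the default Ollama port


def build_payload(
    image_bytes: bytes,
    prompt: str,
    model: str = "llava:13b",
    temperature: float = 0.1,
) -> dict:
    """Build the JSON body for a single non-streaming generate request."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # return one JSON object instead of a token stream
        "options": {"temperature": temperature},
    }


def classify_image(image_bytes: bytes, prompt: str, **kwargs) -> str:
    """Send the request to a running Ollama server and return the model's text reply."""
    body = json.dumps(build_payload(image_bytes, prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as reply:
        return json.loads(reply.read())["response"]
```

`classify_image` requires `ollama serve` to be running with the model already pulled. A low temperature such as 0.1, as used in the example above, makes replies more repeatable across runs.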

Download and Classify Sentinel-2 Scene

from llava_shot.data.download import download_sentinel2_scene
from llava_shot.processing.composites import CompositeGenerator
from llava_shot.classification.llava_interface import LLaVAClassifier

# Download scene
scene_dir = download_sentinel2_scene(
    bbox=[-120.5, 34.4, -120.3, 34.5],  # Santa Barbara
    date_range=("2024-09-01", "2024-09-30"),
    max_cloud_cover=20
)

# Generate True Color composite
generator = CompositeGenerator(scene_dir)
rgb = generator.generate_composite("true_color", target_resolution=10)

# Classify
classifier = LLaVAClassifier(model="llava:13b")
result = classifier.classify_land_cover(rgb)
print(result)

Performance & Limitations

Strengths

No training data or fine-tuning required
Fast inference (~3 sec/image on consumer hardware)
Excellent precision on River (100%) and Industrial (100%) classes
Works with standard True Color RGB composites

Limitations

38% accuracy on 10-class EuroSAT falls well short of supervised methods (90%+)
Cannot distinguish fine-grained vegetation types reliably
Complete failures on AnnualCrop, Forest, and SeaLake (0% recall each)
Limited to RGB visual information; cannot leverage 13-band spectral data
Fine distinctions typically rely on spectral indices (NDVI, EVI), which are not available to LLaVA from an RGB composite

See RESULTS.md for detailed analysis.

Development Status

Current Phase: Benchmarking Complete ✅

Project structure and dependencies
Sentinel-2 data download (AWS STAC)
Band processing and RGB composites
Spectral index calculation (NDVI, NDWI, NDBI, EVI)
LLaVA integration via Ollama
Task-focused prompt engineering
EuroSAT benchmark dataset integration
Validation metrics and confusion matrix
Results documentation

Next Steps:

Few-shot learning experiments
Larger patch sizes (128×128, 256×256)
Multi-temporal analysis
Comparison with specialized satellite vision models

Contributing

This is a research project evaluating zero-shot satellite image classification. Contributions are welcome!

Areas for improvement:

Prompt engineering for better grassland/forest discrimination
Few-shot learning implementations
Integration with other vision-language models
Comparison benchmarks

Citation

If you use this project in your research, please cite:

@software{llava_shot_2025,
  title = {LLaVA-Shot: Zero-Shot Sentinel-2 Classification},
  author = {Caylor, Kelly},
  year = {2025},
  url = {https://github.com/ecohydro/llava_shot}
}

License

TBD

References

EuroSAT: Helber et al., "EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification", IEEE JSTARS 2019
Sentinel-2: ESA Sentinel-2 Mission
LLaVA: Liu et al., "Visual Instruction Tuning", NeurIPS 2023
Ollama: https://ollama.ai/

Acknowledgments

Developed on M4 Max MacBook Pro (128GB RAM) with local LLaVA inference using Metal Performance Shaders. EuroSAT dataset provided by the German Research Center for Artificial Intelligence (DFKI).
