Zero-shot land use classification for Sentinel-2 satellite imagery using LLaVA vision-language models. Initial benchmark: 38% accuracy on 10-class EuroSAT test (50 samples, 5 per class) using task-focused prompting.
See RESULTS.md for detailed results and methodology.
This project evaluates LLaVA's capability for zero-shot classification of Sentinel-2 satellite imagery without any satellite-specific training. Using task-focused prompts with the full 10-class EuroSAT taxonomy and True Color RGB composites, the model performs well on a few visually distinctive land use types (River, Industrial, Residential) and struggles with vegetation classes and large water bodies.
- 38% overall accuracy on 10-class EuroSAT benchmark
- 100% precision on River and Industrial classes
- Zero training required - no fine-tuning or training data
- ~3 seconds per image on M4 Max with llava:13b
Note: These are preliminary results from a small test set (5 samples per class). Larger-scale validation needed to confirm performance.
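To give a sense of how much uncertainty a 50-image test set carries, the sketch below computes a 95% Wilson score interval for the reported accuracy. It assumes 19 correct predictions out of 50 (38% of 50); the function name `wilson_interval` is illustrative and is not part of this repository.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Return the Wilson score interval for a binomial proportion."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# Assumption: 38% of 50 samples corresponds to 19 correct predictions.
low, high = wilson_interval(19, 50)
print(f"95% interval: {low:.1%} to {high:.1%}")
```

Under these assumptions the interval spans roughly 26% to 52%, so the headline figure should be read as a rough estimate rather than a precise benchmark score.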
Strong Performance:
- River detection (80% recall, 100% precision)
- Industrial building identification (60% recall, 100% precision)
- Residential area detection (100% recall, 45% precision)
Challenging:
- Fine-grained vegetation discrimination (AnnualCrop vs. PermanentCrop vs. Pasture vs. Forest)
- Large water body detection (SeaLake: 0% recall)
- Comprehensive 10-class land use classification
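The recall and precision figures above are per-class metrics. As a minimal sketch of how they can be computed from paired label lists, the example below uses a hypothetical helper, `precision_recall`, and toy labels that were made up for illustration (they are not benchmark output). The toy data is arranged so that Residential is always found but also over-predicted, which is the pattern consistent with 100% recall and about 45% precision (5 correct out of 11 predictions).

```python
def precision_recall(y_true, y_pred, label):
    """Return (precision, recall) for one class from paired label lists."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    predicted = sum(p == label for p in y_pred)  # TP + FP
    actual = sum(t == label for t in y_true)     # TP + FN
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall


# Toy labels, invented for illustration only (not benchmark output).
y_true = ["Residential"] * 5 + ["Industrial"] * 5 + ["Highway"] * 5
y_pred = (
    ["Residential"] * 5                          # all Residential found
    + ["Industrial"] * 3 + ["Residential"] * 2   # 2 Industrial missed
    + ["Residential"] * 4 + ["Highway"] * 1      # 4 Highway missed
)

for name in ("Residential", "Industrial"):
    p, r = precision_recall(y_true, y_pred, name)
    print(f"{name}: precision={p:.0%}, recall={r:.0%}")
```

High precision with modest recall (as for Industrial here) means the model rarely uses the label wrongly but misses some true cases; the reverse pattern means the label is used as a catch-all.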
- Apple Silicon Mac (M3/M4 recommended) or Linux/Windows with GPU
- Python 3.12+
- Ollama for local LLaVA inference
- EuroSAT dataset (optional, for benchmarking)
# Clone repository
git clone https://github.com/ecohydro/llava_shot.git
cd llava_shot
# Install with uv (recommended)
uv pip install -e .
# Or with pip
pip install -e .
# Install Ollama
brew install ollama # macOS
# or download from https://ollama.ai/
# Start Ollama service
ollama serve
# Pull LLaVA model (in a new terminal)
ollama pull llava:13b # Recommended for benchmarking
# or
ollama pull llava:7b # Faster, lower accuracy
# Quick test (5 samples per class, 10-class taxonomy)
python scripts/benchmark_eurosat.py --n-per-class 5 --prompt-style eurosat10
# Larger test (20 samples per class)
python scripts/benchmark_eurosat.py --n-per-class 20 --prompt-style eurosat10
# Test specific classes only (using EuroSAT class names)
python scripts/benchmark_eurosat.py --n-per-class 10 --classes Industrial River Residential
Note: The EuroSAT dataset must be downloaded separately. See EuroSAT for download instructions. Place it in the eurosat/ directory.
We use EuroSAT for rigorous validation:
- 27,000 labeled Sentinel-2 patches (64×64 pixels, 13 bands)
- 5,400 test samples with expert ground truth
- 10 land use classes from across Europe:
  - AnnualCrop, Forest, HerbaceousVegetation, Highway, Industrial
  - Pasture, PermanentCrop, Residential, River, SeaLake
This provides real, expert-labeled ground truth rather than synthetic labels.
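The benchmark commands draw a fixed number of patches per class (`--n-per-class`). The sketch below shows one way such class-balanced sampling could work, given a mapping from class name to file names. `sample_per_class` and the file names are hypothetical; the repository's `EuroSATLoader.sample_dataset` may be implemented differently.

```python
import random


def sample_per_class(files_by_class, n_per_class, seed=0):
    """Draw up to n_per_class files from each class, reproducibly."""
    rng = random.Random(seed)  # fixed seed so repeated runs pick the same patches
    samples = []
    for class_name in sorted(files_by_class):
        files = sorted(files_by_class[class_name])
        k = min(n_per_class, len(files))
        for filename in rng.sample(files, k):
            samples.append((class_name, filename))
    return samples


# Hypothetical file listing for two classes.
files_by_class = {
    "River": [f"River_{i}.tif" for i in range(1, 21)],
    "Forest": [f"Forest_{i}.tif" for i in range(1, 21)],
}
samples = sample_per_class(files_by_class, n_per_class=5)
print(len(samples), "patches sampled")
```

Fixing the seed keeps the evaluated subset stable across runs, which matters when comparing prompts or models on only a handful of images per class.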
Instead of explaining spectral theory, we define LLaVA's role explicitly with the full 10-class EuroSAT taxonomy:
"""You are a land cover classifier analyzing a Sentinel-2 satellite image.
**CLASSIFICATION TASK:**
Classify this image into ONE of the 10 EuroSAT land use classes below.
**EUROSAT CLASSES:**
1. **AnnualCrop** - Annual cropland
2. **Forest** - Areas with dense tree cover
3. **HerbaceousVegetation** - Natural grasslands, meadows
4. **Highway** - Major roads, highways
5. **Industrial** - Industrial buildings, factories
6. **Pasture** - Managed grassland for grazing
7. **PermanentCrop** - Orchards, vineyards
8. **Residential** - Houses, residential buildings
9. **River** - Rivers and streams
10. **SeaLake** - Seas, lakes, large water bodies
**OUTPUT FORMAT:**
CLASS: [exact class name from above]
CONFIDENCE: [high/medium/low]
"""
Key Design Principles:
- Clear task definition
- Explicit class taxonomy with visual cues
- Structured output format
- No spectral theory or band math
- Simple True Color RGB input
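A structured output format is only useful if the reply is parsed defensively, since a language model may add markdown, brackets, or extra prose. The following is a minimal sketch of such a parser; `parse_response` is a hypothetical helper and not necessarily how this repository handles replies.

```python
import re

EUROSAT_CLASSES = [
    "AnnualCrop", "Forest", "HerbaceousVegetation", "Highway", "Industrial",
    "Pasture", "PermanentCrop", "Residential", "River", "SeaLake",
]


def parse_response(text):
    """Extract (class_name, confidence) from a reply; None for missing fields."""
    # Tolerate whitespace, asterisks, and brackets between the key and the value.
    class_match = re.search(r"CLASS:[\s*\[]*([A-Za-z]+)", text, re.IGNORECASE)
    conf_match = re.search(
        r"CONFIDENCE:[\s*\[]*(high|medium|low)", text, re.IGNORECASE
    )
    predicted = None
    if class_match:
        candidate = class_match.group(1).lower()
        # Accept only names from the taxonomy, matched case-insensitively.
        predicted = next(
            (c for c in EUROSAT_CLASSES if c.lower() == candidate), None
        )
    confidence = conf_match.group(1).lower() if conf_match else None
    return predicted, confidence


print(parse_response("CLASS: River\nCONFIDENCE: high"))
```

Returning `None` for an unrecognized class lets the benchmark count malformed replies as errors instead of silently mapping them to a class.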
llava_shot/
├── src/llava_shot/
│   ├── data/
│   │   ├── download.py            # Sentinel-2 scene download (AWS STAC)
│   │   └── eurosat_loader.py      # EuroSAT benchmark loader
│   ├── processing/
│   │   ├── bands.py               # Band reading and resampling
│   │   ├── composites.py          # RGB composite generation
│   │   └── indices.py             # NDVI, NDWI, NDBI calculation
│   ├── classification/
│   │   ├── llava_interface.py     # Ollama API wrapper
│   │   └── task_prompts.py        # Task-focused prompts
│   └── validation/
│       └── metrics.py             # Accuracy, confusion matrix
├── scripts/
│   ├── benchmark_eurosat.py       # Main benchmark script
│   ├── quickstart.py              # Download and visualize scenes
│   └── demo_classification.py     # Interactive classification demos
├── data/
│   ├── raw/                       # Downloaded Sentinel-2 scenes
│   ├── processed/                 # Generated composites
│   └── validation_eurosat/        # Benchmark results
├── eurosat/                       # EuroSAT dataset (not in git)
├── RESULTS.md                     # Detailed results and analysis
└── README.md                      # This file
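The layout lists `llava_interface.py` as an Ollama API wrapper. For readers unfamiliar with Ollama, the sketch below shows roughly what a single image request to a local Ollama server looks like, using only the standard library. It assumes Ollama's default local endpoint and its `/api/generate` route; `build_request` and `generate` are hypothetical names, and the repository's wrapper may be structured differently.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(image_bytes, prompt, model="llava:13b", temperature=0.1):
    """Build a non-streaming generate request with one base64-encoded image."""
    payload = {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
        "options": {"temperature": temperature},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(image_bytes, prompt, **kwargs):
    """Send the request and return the reply text (requires `ollama serve`)."""
    request = build_request(image_bytes, prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

Here `image_bytes` would be the encoded PNG or JPEG bytes of the RGB composite. Keeping request construction separate from sending makes the payload easy to inspect without a running server.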
from llava_shot.data.eurosat_loader import EuroSATLoader
from llava_shot.classification.llava_interface import LLaVAClassifier
from llava_shot.classification.task_prompts import get_task_prompt
# Load EuroSAT dataset
loader = EuroSATLoader()
samples = loader.sample_dataset(n_per_class=5)
# Initialize classifier
classifier = LLaVAClassifier(model="llava:13b")
prompt = get_task_prompt("classifier")
# Classify samples
for eurosat_class, filename, simple_class in samples:
    bands, metadata = loader.load_patch(eurosat_class, filename)
    rgb = loader.create_rgb_composite(bands, "true_color")
    response = classifier.classify(rgb, prompt, temperature=0.1)
    print(f"Ground truth: {simple_class}, Prediction: {response}")

from llava_shot.data.download import download_sentinel2_scene
from llava_shot.processing.composites import CompositeGenerator
from llava_shot.classification.llava_interface import LLaVAClassifier
# Download scene
scene_dir = download_sentinel2_scene(
    bbox=[-120.5, 34.4, -120.3, 34.5],  # Santa Barbara
    date_range=("2024-09-01", "2024-09-30"),
    max_cloud_cover=20
)
# Generate True Color composite
generator = CompositeGenerator(scene_dir)
rgb = generator.generate_composite("true_color", target_resolution=10)
# Classify
classifier = LLaVAClassifier(model="llava:13b")
result = classifier.classify_land_cover(rgb)
print(result)
- No training data or fine-tuning required
- Fast inference (~3 sec/image on consumer hardware)
- Excellent precision on River (100%) and Industrial (100%) classes
- Works with standard True Color RGB composites
- 38% accuracy on 10-class EuroSAT falls well short of supervised methods (90%+)
- Cannot distinguish fine-grained vegetation types reliably
- Complete failures on AnnualCrop, Forest, and SeaLake (0% recall each)
- Limited to RGB visual information; cannot leverage 13-band spectral data
- Fine distinctions require spectral indices (NDVI, EVI) beyond LLaVA's capabilities
See RESULTS.md for detailed analysis.
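For context on the spectral indices mentioned in the limitations, NDVI is defined as (NIR - Red) / (NIR + Red); for Sentinel-2 these are typically bands B8 and B4. The sketch below applies the formula to single reflectance values. The `ndvi` helper and the sample reflectances are illustrative assumptions and are not taken from `indices.py`.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel's reflectances."""
    total = nir + red
    if total == 0:
        return 0.0  # avoid division by zero on no-data pixels
    return (nir - red) / total


# Illustrative reflectance values, not measurements from the benchmark.
print(ndvi(0.45, 0.05))  # high NIR, low red: strongly positive (vegetation-like)
print(ndvi(0.02, 0.03))  # NIR below red: negative (water-like)
```

Because the index depends on the near-infrared band, it carries information that is absent from a True Color RGB composite, which is one reason an RGB-only model has difficulty separating vegetation classes.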
Current Phase: Benchmarking Complete
- Project structure and dependencies
- Sentinel-2 data download (AWS STAC)
- Band processing and RGB composites
- Spectral index calculation (NDVI, NDWI, NDBI, EVI)
- LLaVA integration via Ollama
- Task-focused prompt engineering
- EuroSAT benchmark dataset integration
- Validation metrics and confusion matrix
- Results documentation
Next Steps:
- Few-shot learning experiments
- Larger patch sizes (128×128, 256×256)
- Multi-temporal analysis
- Comparison with specialized satellite vision models
This is a research project evaluating zero-shot satellite image classification. Contributions welcome!
Areas for improvement:
- Prompt engineering for better grassland/forest discrimination
- Few-shot learning implementations
- Integration with other vision-language models
- Comparison benchmarks
If you use this project in your research, please cite:
@software{llava_shot_2025,
title = {LLaVA-Shot: Zero-Shot Sentinel-2 Classification},
author = {Caylor, Kelly},
year = {2025},
url = {https://github.com/ecohydro/llava_shot}
}
TBD
- EuroSAT: Helber et al., "EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification", IEEE JSTARS 2019
- Sentinel-2: ESA Sentinel-2 Mission
- LLaVA: Liu et al., "Visual Instruction Tuning", NeurIPS 2023
- Ollama: https://ollama.ai/
Developed on M4 Max MacBook Pro (128GB RAM) with local LLaVA inference using Metal Performance Shaders. EuroSAT dataset provided by the German Research Center for Artificial Intelligence (DFKI).