This project develops a machine learning model to estimate chlorophyll-a concentration using Sentinel-2 satellite imagery combined with in-situ measurement data. The workflow includes data preprocessing, model training and validation, and spatial prediction across multiple time periods.
Data cleaning is the essential first step of this project.
Because of data-sharing restrictions and an unclear disclaimer on the raw data, only the final cleaned dataset is included in the repository, under the data/ folder.
Once data-sharing permissions are clarified, the detailed cleaning procedures and scripts will also be made publicly available.
The Jupyter notebook data_preview.ipynb in the notebooks folder is used to preview the data points on a map and to select which of them can be used to train the model. It is for inspection only and does not affect the final result.
A Random Forest (RF) model was developed using in-situ measurements collected on 2023-07-06 and the corresponding Sentinel-2 imagery acquired on the same date. To make the training process easier to follow, the in-situ measurements (ch_train_test_0706.csv) and the Sentinel-2 image (raw_masked_image_0706.tif) used to train and test the model are stored locally in the data folder.
All source code related to model training and validation is available in the src/ directory.
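As a rough illustration of the train-and-validate step described above, here is a minimal sketch using scikit-learn. It is not the code in src/. The column layout of ch_train_test_0706.csv is not documented here, so the sketch uses synthetic band values and a made-up target, and the split ratio and hyperparameters are assumptions rather than the project's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the matched samples: 200 points x 4 reflectance bands.
# (Assumption: the real features come from Sentinel-2 bands at the sample sites.)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 0.3, size=(200, 4))
# Hypothetical chlorophyll-a target tied to a band ratio, plus noise.
y = 20.0 * X[:, 2] / (X[:, 1] + 0.05) + rng.normal(0.0, 0.5, size=200)

# Hold out 25% of the samples for testing (assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the Random Forest regressor (hyperparameters are illustrative only).
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Evaluate on the held-out samples.
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
print(f"R2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

With the real data, X and y would instead be read from the CSV file, using whichever columns hold the band values and the measured chlorophyll-a concentration.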
After training and testing, the Random Forest model was applied to Sentinel-2 images acquired from July to October 2023 to estimate spatial and temporal variations in chlorophyll-a concentration. To avoid downloading large datasets locally, the geemap Python package is used. The full workflow for applying the model is shown in the Jupyter notebook S2_image_apply_model.ipynb in the notebooks folder. The helper module that supports the final model application is class_prediction.py in the src folder.
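The core idea of applying a trained regressor to an image can be sketched with NumPy alone. This is not the implementation in class_prediction.py, and it leaves out the geemap data access entirely. The function name predict_image, the (bands, rows, cols) array layout, the use of NaN for masked pixels, and the stand-in model class are all assumptions made for illustration.

```python
import numpy as np


def predict_image(model, image, nodata=np.nan):
    """Apply a fitted regressor to every valid pixel of a (bands, rows, cols) array."""
    bands, rows, cols = image.shape
    # Reshape to one row per pixel and one column per band.
    pixels = image.reshape(bands, rows * cols).T
    # Assumption: masked pixels are NaN in at least one band.
    valid = ~np.isnan(pixels).any(axis=1)
    out = np.full(rows * cols, nodata, dtype="float32")
    if valid.any():
        out[valid] = model.predict(pixels[valid])
    # Restore the 2-D image shape.
    return out.reshape(rows, cols)


class BandMeanModel:
    """Hypothetical stand-in with a predict() method, so the sketch runs without a trained model."""

    def predict(self, X):
        return X.mean(axis=1)


image = np.arange(24, dtype="float32").reshape(3, 2, 4)  # 3 bands, 2 x 4 pixels
image[:, 0, 0] = np.nan  # simulate one masked pixel
chl_map = predict_image(BandMeanModel(), image)
print(chl_map.shape)
```

Any object with a scikit-learn style predict() method, such as a fitted RandomForestRegressor, could be passed in place of the stand-in model.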
The predicted chlorophyll-a concentrations can be visualized using the Jupyter Notebook provided in the notebooks/ folder. The module being used is stored in class_prediction.py in the src folder.
These visualizations demonstrate the spatial patterns and temporal dynamics of chlorophyll-a across the study area.
The data and results presented in this repository are provided for research and educational purposes only.