# Automate model compatibility checks #907
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #907      +/-   ##
==========================================
- Coverage   99.63%   97.72%   -1.91%
==========================================
  Files         103      104       +1
  Lines        8238     8399     +161
==========================================
  Hits         8208     8208
- Misses         30      191     +161
```
Force-pushed: f633916 → 924e812
Force-pushed: 9b1ea17 → bd74716
Projects for compat includes also non-trainable backends
Force-pushed: bd74716 → d08a04b
Pull Request Overview
This PR introduces automated model compatibility and reproducibility checks for Annif models through a new GitHub Actions workflow. The implementation enables systematic verification that code changes don't break model backward compatibility or training reproducibility.
Key changes:
- New GitHub Actions workflow (`model-compatibility.yml`) that runs compatibility and consistency checks on workflow dispatch or push events
- Python script (`check_models_compatability_consistency.py`) that downloads models from Hugging Face Hub, evaluates them, compares metrics against baselines, and reports significant differences
- Two configuration files defining project setups for compatibility testing (8 projects including ensemble backends) and consistency testing (8 projects focusing on base backends)
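The workflow file itself is not rendered in this conversation view. A minimal sketch of what such a workflow could look like, inferring the script invocation and `--ci` option from the description in this PR (step names, action versions, and the install command are assumptions, not the actual file contents):

```yaml
# Hypothetical sketch of .github/workflows/model-compatibility.yml;
# the actual file in this PR may differ.
name: Model compatibility
on:
  workflow_dispatch:  # manual trigger from the Actions page

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install Annif and dependencies
        run: pip install .
      - name: Run compatibility checks
        run: python tests/check_models_compatability_consistency.py compatibility --ci
      - name: Run consistency checks
        run: python tests/check_models_compatability_consistency.py consistency --ci
```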
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 17 comments.
| File | Description |
|---|---|
| `.github/workflows/model-compatibility.yml` | GitHub Actions workflow orchestrating the compatibility checks with steps for environment setup and running both compatibility and consistency tests |
| `tests/check_models_compatability_consistency.py` | Python script implementing the core logic for downloading models/metrics, training, evaluation, comparison, and uploading results to Hugging Face Hub |
| `tests/projects-compatibility.cfg` | Configuration defining 8 projects (including yake-fi and ensemble-fi) for backward compatibility testing against existing trained models |
| `tests/projects-consistency.cfg` | Configuration defining 8 projects for reproducibility testing through retraining and metric comparison |
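As an illustration of the configuration format, an Annif project entry in such a `.cfg` file typically looks like the following. This specific entry is a hypothetical example, not copied from the PR's files (only the `yake-fi` project name appears in the review above):

```ini
# Hypothetical entry in the style of tests/projects-compatibility.cfg;
# the actual projects and their parameters in this PR may differ.
[yake-fi]
name=YAKE Finnish
language=fi
backend=yake
analyzer=voikko(fi)
vocab=yso
```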
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>



This pull request introduces automated model compatibility and reproducibility checks for the backends, ensuring that changes to the codebase do not introduce significant metric regressions.
Key changes include:
**Continuous Integration and Automation:**

- New GitHub Actions workflow (`.github/workflows/model-compatibility.yml`) that runs model compatibility and reproducibility checks on the `workflow_dispatch` trigger, executing the `tests/check_models_compatability_consistency.py` script with the `--ci` option.

**Testing Infrastructure and Scripts:**
The script functions as follows in the two check modes:

- `compatibility` mode/subcommand: downloads previously trained models and their baseline evaluation metrics from Hugging Face Hub, evaluates the models with the current code, and compares the new metrics against the baselines.
- `consistency` mode/subcommand: retrains the models and compares the resulting metrics against the baseline metrics.
- A metric difference is considered significant when the relative change (`abs(prev_value - new_value) / abs(prev_value)`) exceeds a mode-specific threshold; the consistency threshold is 0.03, larger than the compatibility one to allow for non-determinism in training.
- When run with the `--ci` option and differences are detected, the script exits with code 1, failing the GH Actions job.
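The significance check described above can be sketched roughly as follows. The function name and the zero-baseline handling are my own choices, not the script's actual implementation; only the relative-change formula comes from this PR.

```python
def is_significant_difference(prev_value: float, new_value: float,
                              threshold: float) -> bool:
    """Return True if the relative metric change exceeds the threshold.

    Hypothetical helper mirroring the comparison described above;
    the real script's function names and signatures may differ.
    """
    if prev_value == 0:
        # Avoid division by zero; treat any change from zero as significant.
        return new_value != 0
    return abs(prev_value - new_value) / abs(prev_value) > threshold


# With the consistency threshold of 0.03, a drop from 0.50 to 0.48 is a
# 4% relative change and would be flagged.
print(is_significant_difference(0.50, 0.48, 0.03))   # → True
print(is_significant_difference(0.50, 0.495, 0.03))  # → False
```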
The `upload` subcommand of the script uploads the newly trained models and their evaluation metrics to the HFH repo, thus "resetting" the state.
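A hypothetical invocation might look like this; the original post showed the actual command, and any additional options it used are not reproduced here:

```shell
# Hypothetical invocation; the real command and its options may differ.
python tests/check_models_compatability_consistency.py upload
```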
The `upload` argument can be changed to `compatibility` or `consistency` for running the script in those modes.

**Configuration for Model Checks:**
- New `tests/projects-compatibility.cfg` and `tests/projects-consistency.cfg` configuration files, which define the sets of Annif projects (models) to be checked for compatibility and consistency, respectively. The first configuration also includes projects of non-trainable backends.

This testing is probably best used via the workflow dispatch trigger on the GH Actions workflow page, which also allows checking the status:
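Assuming the GitHub CLI (`gh`) is available, the dispatch and status check can also be done from the command line (the workflow file name is taken from this PR):

```shell
# Trigger the workflow manually via the GitHub CLI.
gh workflow run model-compatibility.yml

# Check the status of recent runs of this workflow.
gh run list --workflow=model-compatibility.yml
```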
TODO: