@juhoinkinen juhoinkinen commented Oct 30, 2025

This pull request introduces automated model compatibility and reproducibility checks for the backends, ensuring that changes to the codebase do not introduce significant metric regressions.

Key changes include:

Continuous Integration and Automation:

  • Added a new GitHub Actions workflow (.github/workflows/model-compatibility.yml) that runs the model compatibility and reproducibility checks when triggered via workflow_dispatch. The workflow executes the tests/check_models_compatability_consistency.py script with the --ci option.

Testing Infrastructure and Scripts:

The script functions as follows in the two check modes:

  1. Download existing models and metrics from a Hugging Face Hub repository which is set via a repository GH Actions secret.
  2. Depends on mode:
    • In compatibility mode/subcommand:
      • evaluate the downloaded models with the current Annif code and compare to previous evaluation metrics.
    • In consistency mode/subcommand:
      • train new models with the current Annif code
      • evaluate the trained models and compare to previous evaluation metrics
  3. Flag all significant differences found in the comparison. The default threshold on the relative difference (= abs(prev_value - new_value) / abs(prev_value)) is 0.01 for compatibility and 0.03 for consistency (the larger value allows for non-determinism in training).
  • When run with the --ci option, the script exits with code 1 if it detects differences, which fails the GH Actions job.
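
The comparison in step 3 and the --ci exit behaviour can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names, the metric names and values, the strict `>` comparison, and the handling of a zero baseline are all assumptions. Only the relative-difference formula and the 0.01 / 0.03 defaults come from the description above.

```python
def relative_difference(prev_value: float, new_value: float) -> float:
    """Return abs(prev_value - new_value) / abs(prev_value)."""
    if prev_value == 0:
        # Assumption: a zero baseline only matches a zero new value.
        return 0.0 if new_value == 0 else float("inf")
    return abs(prev_value - new_value) / abs(prev_value)


def find_significant_differences(previous: dict, current: dict, threshold: float) -> dict:
    """Return {metric: (prev, new, rel_diff)} for metrics whose relative
    difference exceeds the threshold."""
    flagged = {}
    for name, prev_value in previous.items():
        if name not in current:
            continue  # assumption: metrics missing from the new run are skipped
        rel = relative_difference(prev_value, current[name])
        if rel > threshold:  # assumption: strict comparison
            flagged[name] = (prev_value, current[name], rel)
    return flagged


def exit_code(flagged: dict, ci: bool) -> int:
    """Exit code 1 only when running in CI mode and differences were found."""
    return 1 if (ci and flagged) else 0


# Illustrative (made-up) metric values, not real evaluation results.
previous_metrics = {"F1@5": 0.50, "NDCG": 0.60}
current_metrics = {"F1@5": 0.48, "NDCG": 0.601}
flagged = find_significant_differences(previous_metrics, current_metrics, threshold=0.01)
for metric, (prev, new, rel) in flagged.items():
    print(f"{metric}: {prev} -> {new} (relative difference {rel:.3f})")
```

With these made-up values only F1@5 is flagged: its relative difference is about 0.04, while NDCG changes by less than 0.002.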

The upload subcommand of the script uploads the newly trained models and their evaluation metrics to the HFH repo, thus "resetting" the state:

python tests/check_models_compatibility_consistency.py upload --hf_repo <repo-id-to-upload>

In the above command, upload can be changed to compatibility or consistency for running in those modes.
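
For readers unfamiliar with the subcommand layout, here is one way such a command line could be wired up. It is a sketch using argparse; the real script may use a different library or structure. The subcommand names, --hf_repo, and --ci come from the description above, whereas `build_parser`, the `--threshold` flag, and making --hf_repo required are hypothetical.

```python
import argparse

# Default thresholds as stated in the PR description.
DEFAULT_THRESHOLDS = {"compatibility": 0.01, "consistency": 0.03}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Model compatibility and consistency checks (sketch)"
    )
    subparsers = parser.add_subparsers(dest="mode", required=True)
    for mode in ("compatibility", "consistency", "upload"):
        sub = subparsers.add_parser(mode)
        # Assumption: the repository id is mandatory for every subcommand.
        sub.add_argument("--hf_repo", required=True, help="Hugging Face Hub repo id")
        if mode != "upload":
            sub.add_argument("--ci", action="store_true",
                             help="exit with code 1 when differences are found")
            # Hypothetical flag for overriding the default threshold.
            sub.add_argument("--threshold", type=float,
                             default=DEFAULT_THRESHOLDS[mode])
    return parser


args = build_parser().parse_args(["consistency", "--hf_repo", "user/repo", "--ci"])
print(args.mode, args.hf_repo, args.ci, args.threshold)
```

Giving each check mode its own subparser lets the modes carry different defaults (0.01 versus 0.03) without extra branching at call time.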

Configuration for Model Checks:

  • Added tests/projects-compatibility.cfg and tests/projects-consistency.cfg configuration files, which define the set of Annif projects (models) to be checked for compatibility and consistency, respectively. The first configuration is for projects of non-trainable backends.
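
As an illustration of what such a projects file defines, the sketch below parses a small INI-style configuration and lists the project ids with their backends. The example content is hypothetical and is not copied from the files added in this PR. It only follows the general shape of Annif project definitions, where the section name is the project id and a `backend` key names the backend.

```python
import configparser

# Hypothetical example content; the real .cfg files in this PR may differ.
EXAMPLE_CFG = """
[yake-fi]
name=YAKE Finnish
language=fi
backend=yake
vocab=yso
analyzer=snowball(finnish)

[ensemble-fi]
name=Ensemble Finnish
language=fi
backend=ensemble
vocab=yso
sources=yake-fi
"""


def list_projects(cfg_text: str) -> dict:
    """Map each project id (section name) to its backend."""
    # Interpolation is disabled so values are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return {section: parser[section]["backend"] for section in parser.sections()}


projects = list_projects(EXAMPLE_CFG)
for project_id, backend in projects.items():
    print(f"{project_id}: {backend}")
```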

This testing is probably best run via the workflow dispatch trigger on the GH Actions workflow page, which also allows checking the status: Model Compatibility Check

TODO:

  • Remove trigger on pushes to main or the feature branch.

@juhoinkinen juhoinkinen added this to the 1.5 milestone Oct 30, 2025
@juhoinkinen juhoinkinen added the maintenance and github_actions labels Oct 30, 2025
codecov bot commented Oct 30, 2025

Codecov Report

❌ Patch coverage is 0% with 161 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.72%. Comparing base (5bf0b9e) to head (307616f).
⚠️ Report is 2 commits behind head on main.

Files with missing lines | Patch % | Lines
tests/check_models_compatibility_consistency.py | 0.00% | 161 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #907      +/-   ##
==========================================
- Coverage   99.63%   97.72%   -1.91%     
==========================================
  Files         103      104       +1     
  Lines        8238     8399     +161     
==========================================
  Hits         8208     8208              
- Misses         30      191     +161     

@juhoinkinen juhoinkinen force-pushed the issue906-automate-model-compatibility-checks branch from f633916 to 924e812 Compare October 31, 2025 10:09
@juhoinkinen juhoinkinen force-pushed the issue906-automate-model-compatibility-checks branch 2 times, most recently from 9b1ea17 to bd74716 Compare October 31, 2025 15:34
@juhoinkinen juhoinkinen force-pushed the issue906-automate-model-compatibility-checks branch from bd74716 to d08a04b Compare November 12, 2025 13:11
Copilot AI left a comment

Pull Request Overview

This PR introduces automated model compatibility and reproducibility checks for Annif models through a new GitHub Actions workflow. The implementation enables systematic verification that code changes don't break model backward compatibility or training reproducibility.

Key changes:

  • New GitHub Actions workflow (model-compatibility.yml) that runs compatibility and consistency checks on workflow dispatch or push events
  • Python script (check_models_compatability_consistency.py) that downloads models from Hugging Face Hub, evaluates them, compares metrics against baselines, and reports significant differences
  • Two configuration files defining project setups for compatibility testing (8 projects including ensemble backends) and consistency testing (8 projects focusing on base backends)

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 17 comments.

File | Description
.github/workflows/model-compatibility.yml | GitHub Actions workflow orchestrating the compatibility checks with steps for environment setup and running both compatibility and consistency tests
tests/check_models_compatability_consistency.py | Python script implementing the core logic for downloading models/metrics, training, evaluation, comparison, and uploading results to Hugging Face Hub
tests/projects-compatibility.cfg | Configuration defining 8 projects (including yake-fi and ensemble-fi) for backward compatibility testing against existing trained models
tests/projects-consistency.cfg | Configuration defining 8 projects for reproducibility testing through retraining and metric comparison

juhoinkinen and others added 8 commits November 13, 2025 15:03
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@juhoinkinen juhoinkinen marked this pull request as ready for review December 9, 2025 13:50