
Addressing differences in the backend score scales #862

@juhoinkinen

Description

Currently, the simple ensemble in Annif combines the scores from different backends by a plain weighted average. However, the score scales of the backends are not necessarily uniform. For example, a score of 0.8 from MLLM might mean a suggestion is good, while the same score from Bonsai could mean it is very good, yet in the lower score range the MLLM scores could be "more representative". Simply weighting the scores from different algorithms when combining them is therefore not necessarily optimal.
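
As a rough illustration (not Annif's actual ensemble code), a plain weighted average treats equal numeric scores from different backends as equally strong evidence, regardless of how each backend's scale should be interpreted:

```python
import numpy as np

def weighted_average(scores, weights):
    """Combine per-subject score vectors from several backends
    with a plain weighted average (illustrative sketch only)."""
    scores = np.asarray(scores, dtype=float)    # shape: (n_backends, n_subjects)
    weights = np.asarray(weights, dtype=float)  # one weight per backend
    return weights @ scores / weights.sum()

# Made-up scores for one subject: 0.8 from MLLM ("good") and 0.8 from
# Bonsai ("very good") contribute identically to the combined score.
print(weighted_average([[0.8], [0.8]], weights=[1.0, 1.0]))  # -> [0.8]
```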

The score distributions of some algorithms for the JYX documents were plotted quite some time ago (see this Slack thread); see the plots below. Notably, the fastText scores are much lower than the scores of the other algorithms. (I assume MLLM gives results similar to Maui.)

[Five plots of per-backend score distributions for the JYX documents]

An approach tried out in the GermEval task was to not just average the base suggestions' scores, but to first raise each score to some power x and then multiply by a weight w: score**x * w. The exponent x can vary per backend and can be optimized using hyperopt. This is similar to the NN ensemble, where all scores are square-rooted (x=0.5), except that here the exponent is made backend-specific.
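
A minimal sketch of that combination, assuming made-up backend values, weights and exponents (in practice the exponents would be tuned with hyperopt together with the weights):

```python
import numpy as np

def exponentiated_average(scores, weights, exponents):
    """Raise each backend's scores to a backend-specific exponent x,
    then combine them with a weighted average (score**x * w)."""
    scores = np.asarray(scores, dtype=float)        # (n_backends, n_subjects)
    weights = np.asarray(weights, dtype=float)      # one weight per backend
    exponents = np.asarray(exponents, dtype=float)  # one exponent per backend
    transformed = scores ** exponents[:, None]      # score**x, per backend
    return weights @ transformed / weights.sum()    # then weight and average

# An exponent below 1 stretches a backend's compressed low scores upward
# (e.g. fastText), while x=1 leaves a backend's scale unchanged.
combined = exponentiated_average(
    scores=[[0.05, 0.02],   # hypothetical fastText scores for two subjects
            [0.80, 0.30]],  # hypothetical Bonsai scores for the same subjects
    weights=[1.0, 1.0],
    exponents=[0.3, 1.0],
)
print(combined)
```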

There is now the exponentiate-scores branch including those changes.
