WIP LLM ranking/scoring backend #859
base: main
Conversation
Co-authored-by: Copilot
Co-authored-by: Osma Suominen <osma.suominen@helsinki.fi>
This is mainly a workaround for the differing model names in Ollama and HFH (Hugging Face Hub), which complicates tokenizer selection.
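For illustration only, a minimal sketch of what such a workaround could look like: a hand-maintained mapping from Ollama model tags to Hugging Face Hub repository IDs, used solely for loading the matching tokenizer. The mapping entries and the `resolve_tokenizer` helper are hypothetical and are not the PR's actual code.

```python
from transformers import AutoTokenizer

# Hypothetical mapping: Ollama model tags -> Hugging Face Hub repo IDs.
# Needed because the two ecosystems name the same model differently,
# so a tokenizer cannot be looked up directly by the Ollama name.
OLLAMA_TO_HFH = {
    "llama3.1:8b": "meta-llama/Llama-3.1-8B-Instruct",
    "mistral:7b": "mistralai/Mistral-7B-Instruct-v0.3",
}


def resolve_tokenizer(model_name: str) -> AutoTokenizer:
    """Load a tokenizer for a model that may be named in Ollama style."""
    hfh_name = OLLAMA_TO_HFH.get(model_name, model_name)
    return AutoTokenizer.from_pretrained(hfh_name)
```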
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files:

@@            Coverage Diff             @@
##             main     #859      +/-   ##
==========================================
- Coverage   99.64%   97.56%   -2.09%
==========================================
  Files          99      100       +1
  Lines        7349     7509     +160
==========================================
+ Hits         7323     7326       +3
- Misses         26      183     +157

☔ View full report in Codecov by Sentry.
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Pull Request Overview
This PR adds exponentiated weighted averaging to suggestions and implements an LLM-based ensemble backend for ranking and scoring.
- Extend `SuggestionBatch.from_averaged` to accept an optional `exponents` parameter for score exponentiation (a rough sketch of the idea follows this list).
- Introduce `BaseLLMBackend` and `LLMEnsembleBackend` with OpenAI/AzureOpenAI integration and parallel prompt processing.
- Register the new `llm_ensemble` backend in the backend factory.
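As a rough illustration of the first point (not the PR's actual implementation), exponentiated weighted averaging raises each source's scores to a per-source exponent before taking the weighted mean, so one source's score distribution can be sharpened or flattened relative to the others. A minimal numpy sketch, where the function name, array shapes, and the default-exponent behavior are assumptions:

```python
import numpy as np


def exponentiated_weighted_average(score_arrays, weights, exponents=None):
    """Combine per-source score arrays (each of shape n_docs x n_subjects).

    Hypothetical sketch: each source's scores are raised to its exponent
    (default 1.0, i.e. plain weighted averaging) before the weighted mean.
    """
    if exponents is None:
        exponents = [1.0] * len(score_arrays)
    weights = np.asarray(weights, dtype=float)
    stacked = np.stack(
        [np.power(scores, exp) for scores, exp in zip(score_arrays, exponents)]
    )
    # Weighted mean over the sources axis
    return np.average(stacked, axis=0, weights=weights)
```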
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| annif/suggestion.py | Added exponents parameter and updated averaging logic/docstring. |
| annif/backend/llm_ensemble.py | New LLM ensemble backend: API calls, prompt handling, ensemble logic. |
| annif/backend/__init__.py | Registered llm_ensemble backend. |
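To make the backend's role concrete, here is a minimal, hypothetical sketch of LLM-based scoring of candidate subject labels with the OpenAI client and parallel prompt processing. The prompt wording, the `score_labels`/`score_batch` helpers, the model name, and the JSON response format are assumptions for illustration; this is not the code in annif/backend/llm_ensemble.py.

```python
import json
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI()  # or AzureOpenAI(...) with the corresponding credentials


def score_labels(text: str, labels: list[str], model: str = "gpt-4o-mini") -> list[float]:
    """Ask the LLM to rate each candidate label's relevance to the text (0..1)."""
    prompt = (
        "Score how well each subject label describes the document on a 0-1 scale.\n"
        f"Document:\n{text[:2000]}\n\nLabels: {json.dumps(labels)}\n"
        'Reply with a JSON object like {"scores": [0.9, 0.1, ...]} in label order.'
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
        temperature=0.0,
    )
    scores = json.loads(response.choices[0].message.content)["scores"]
    # Guard against malformed output: pad or trim to the expected length
    return (list(scores) + [0.0] * len(labels))[: len(labels)]


def score_batch(texts: list[str], labels_batch: list[list[str]]) -> list[list[float]]:
    """Send the per-document prompts in parallel."""
    with ThreadPoolExecutor(max_workers=4) as executor:
        return list(executor.map(score_labels, texts, labels_batch))
```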
Comments suppressed due to low confidence (2)
annif/suggestion.py:125
- [nitpick] Update the docstring for `from_averaged` to include a description of the new `exponents` parameter and its default behavior.
"""Create a new SuggestionBatch where the subject scores are the
annif/backend/llm_ensemble.py:263
- [nitpick] Add a brief docstring to `_get_labels_batch` to clarify its behavior and inputs, improving code readability.
def _get_labels_batch(self, suggestion_batch: SuggestionBatch) -> list[list[str]]:
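A sketch of what such a docstring, and the behavior the signature suggests, might look like. The iteration over `suggestion_batch` and the `self._subject_label` helper are hypothetical stand-ins for however the backend actually walks the batch and resolves subject IDs to labels:

```python
def _get_labels_batch(self, suggestion_batch: SuggestionBatch) -> list[list[str]]:
    """Return the suggested subject labels for each document in the batch.

    For every document's suggestions in ``suggestion_batch``, the suggested
    subject IDs are mapped to their human-readable labels, preserving order,
    so the LLM prompt can refer to labels instead of numeric IDs.
    """
    # ``self._subject_label`` is a hypothetical helper standing in for
    # the project's subject-ID-to-label lookup.
    return [
        [self._subject_label(suggestion.subject_id) for suggestion in suggestions]
        for suggestions in suggestion_batch
    ]
```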
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Closes #856.