@bond005 bond005 commented Jan 16, 2026

  1. Incorrect signatures in examples/calculate_baseline_metrics.ipynb are fixed.
  2. The rag-bench package is refactored, and errors in it are corrected, including:
    • found_ids in the evaluate_rag_results function was processed incorrectly, because it was represented as a string instead of a list of ints;
    • question types were not taken into account when calculating metrics, although the evaluator's logic requires them;
    • and others.
  3. In addition to the Jupyter notebook, a dedicated Python script, examples/calculate_baseline_metrics.py, is provided to make it easier to run the baseline and obtain metrics.
  4. A dedicated readme.md for the rag-bench package is added.
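The found_ids fix in item 2 amounts to normalizing a value that may arrive as a serialized string into a proper list of ints before comparison. A minimal sketch of that kind of normalization, assuming a hypothetical helper (the actual rag-bench code may structure this differently):

```python
import ast


def parse_found_ids(raw):
    """Normalize a found_ids value to a list of ints.

    The value may already be a list of ints, or it may be a string
    such as "[3, 17, 42]" left over from an earlier serialization step.
    Hypothetical helper illustrating the bug class described in item 2.
    """
    if isinstance(raw, str):
        # Safely parse the string representation back into a Python list.
        raw = ast.literal_eval(raw)
    return [int(x) for x in raw]
```

Without such a conversion, comparing a string like `"[3, 17, 42]"` against integer document IDs silently yields zero matches, which is why the retrieval metrics were wrong before the fix.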

Items 1 and 2 are essential for correct use of the benchmark: without those bug fixes, it is impossible to use it to evaluate RAG pipelines. Items 3 and 4 simply improve the benchmark's usability.
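The second bullet of item 2 says metrics must be aggregated per question type, as the evaluator's logic expects. A minimal sketch of that aggregation, assuming a hypothetical result structure with "question_type" and "score" keys (the real rag-bench schema may differ):

```python
from collections import defaultdict


def metrics_by_question_type(results):
    """Average a per-example score separately for each question type.

    `results` is a list of dicts with "question_type" and "score" keys.
    Hypothetical structure illustrating item 2's point that metrics
    should be computed per question type rather than over all examples.
    """
    grouped = defaultdict(list)
    for item in results:
        grouped[item["question_type"]].append(item["score"])
    # One mean score per question type.
    return {qtype: sum(scores) / len(scores) for qtype, scores in grouped.items()}
```

Aggregating over all examples at once would let a dominant question type mask failures on rarer ones, which is the behavior the fix removes.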
