# Ragit Benchmarks

TODO

## 1. Rerank Summary

A testset consists of tuples of (summaries, query, relevant summaries).

It runs `rerank_summary.pdl` on the testset and counts how many tuples it gets correct. The problem is that it's tough to tell whether a summary is relevant or not: a summary that seems relevant to one person might not seem relevant to another.
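The counting step can be sketched as follows. This is a minimal illustration, not the benchmark's actual code: the tuple layout, the `count_correct` helper, and the `keyword_rerank` stand-in (used here instead of a real `rerank_summary.pdl` call) are all hypothetical. It also assumes a tuple counts as correct only when the selected summaries exactly match the expected ones, which the text above does not specify.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical shape of one test tuple:
# (summaries, query, indices of the summaries considered relevant).
TestCase = Tuple[List[str], str, Set[int]]
Reranker = Callable[[List[str], str], Set[int]]


def count_correct(cases: List[TestCase], rerank: Reranker) -> int:
    """Count the tuples where the reranker picks exactly the expected summaries."""
    correct = 0
    for summaries, query, expected in cases:
        # Assumption: "correct" means an exact match of the selected set.
        if set(rerank(summaries, query)) == expected:
            correct += 1
    return correct


def keyword_rerank(summaries: List[str], query: str) -> Set[int]:
    """Toy stand-in for the model call: select summaries sharing a word with the query."""
    words = set(query.lower().split())
    return {i for i, s in enumerate(summaries) if words & set(s.lower().split())}


cases: List[TestCase] = [
    (["how to build ragit", "license text"], "build", {0}),
    (["install guide", "changelog"], "install", {0, 1}),
]
score = count_correct(cases, keyword_rerank)
print(f"{score}/{len(cases)} tuples correct")
```

The second tuple shows the labeling problem: whether the changelog is relevant to an "install" query is a judgment call, and the score changes depending on who labeled it.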

## 2. End-to-End

A testset consists of a knowledge-base and a set of multiple-choice questions with their answers.

It first runs the testset without RAG, then with RAG.

It counts the number of questions that it answered correctly with RAG but incorrectly without RAG.
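As a rough sketch of that metric, assuming the answers from each run are collected into dictionaries keyed by a question id (the function name `count_rag_gains` and the data layout are hypothetical, not taken from the benchmark code):

```python
from typing import Dict


def count_rag_gains(
    answers: Dict[str, str],
    without_rag: Dict[str, str],
    with_rag: Dict[str, str],
) -> int:
    """Count questions answered correctly with RAG but incorrectly without it."""
    return sum(
        1
        for qid, gold in answers.items()
        # A missing answer is treated as incorrect in either run.
        if with_rag.get(qid) == gold and without_rag.get(qid) != gold
    )


# Multiple-choice answer keys and the choices picked in each run (made-up data).
answers = {"q1": "A", "q2": "C", "q3": "B"}
without_rag = {"q1": "A", "q2": "B", "q3": "D"}
with_rag = {"q1": "A", "q2": "C", "q3": "A"}

gains = count_rag_gains(answers, without_rag, with_rag)
print(f"{gains} question(s) fixed by RAG")
```

Questions that the model already answers correctly without RAG (like `q1` here) do not contribute, so the count isolates what retrieval adds rather than what the model already knows.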
