Commit cf11be8
feat(retrieval): page-level BM25 + SPLADE retrieval over OCR'd reports
Self-contained PEP 723 script that indexes OCR'd .mmd reports page by page
and ranks pages for a query, to measure whether a method retrieves the page
containing a given KPI. BM25 + SPLADE on one pyserini/Lucene stack sharing a
single docid space; ColBERTv2 deferred to a later script.
1 parent a43a856
1 file changed
Lines changed: 566 additions & 0 deletions
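The commit describes page-level retrieval: split each OCR'd .mmd report into pages, give every page a docid, rank pages for a query, and check whether the page holding a given KPI shows up near the top. The sketch below illustrates only the BM25 half of that idea in plain Python. It does not use pyserini or Lucene, does not cover SPLADE, and is not the committed script. Several details are hypothetical: the form-feed page separator (`PAGE_SEP`), the `<report>_pNNNN` docid scheme, the helper names, and the BM25 parameters (k1=0.9, b=0.4, chosen for illustration). The point of a shared docid scheme is that a BM25 run and a SPLADE run over the same pages can be scored against the same gold page id.

```python
import math
import re
from collections import Counter

# Hypothetical: assume pages inside an .mmd file are separated by a form feed.
PAGE_SEP = "\f"


def split_pages(report_id: str, mmd_text: str, sep: str = PAGE_SEP) -> dict[str, str]:
    """Map hypothetical page docids like 'acme2023_p0001' to page text."""
    pages = mmd_text.split(sep)
    return {f"{report_id}_p{i:04d}": page for i, page in enumerate(pages, start=1)}


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; a real Lucene analyzer also stems and drops stopwords.
    return re.findall(r"\w+", text.lower())


class BM25:
    """Classic Okapi BM25 over page docids (illustrative, not pyserini)."""

    def __init__(self, docs: dict[str, str], k1: float = 0.9, b: float = 0.4):
        self.k1, self.b = k1, b
        self.tfs = {docid: Counter(tokenize(text)) for docid, text in docs.items()}
        self.lens = {docid: sum(tf.values()) for docid, tf in self.tfs.items()}
        self.avgdl = sum(self.lens.values()) / max(len(self.lens), 1)
        df: Counter = Counter()
        for tf in self.tfs.values():
            df.update(tf.keys())  # document frequency: count each term once per page
        n = len(self.tfs)
        # Lucene-style idf, which stays positive even for very common terms.
        self.idf = {t: math.log(1 + (n - d + 0.5) / (d + 0.5)) for t, d in df.items()}

    def score(self, query: str, docid: str) -> float:
        tf, dl = self.tfs[docid], self.lens[docid]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:  # f > 0 implies dl > 0, so avgdl > 0 here
                norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def rank(self, query: str, k: int = 10) -> list[tuple[str, float]]:
        scored = ((docid, self.score(query, docid)) for docid in self.tfs)
        # Highest score first; ties broken by docid for a deterministic order.
        return sorted(scored, key=lambda item: (-item[1], item[0]))[:k]


def hit_at_k(ranking: list[tuple[str, float]], gold_docid: str, k: int) -> bool:
    """True if the page containing the KPI appears in the top k results."""
    return any(docid == gold_docid for docid, _ in ranking[:k])


if __name__ == "__main__":
    docs: dict[str, str] = {}
    docs.update(split_pages(
        "acme2023",
        "Letter from the CEO\fScope 1 emissions were 12,400 tCO2e\fBoard of directors",
    ))
    docs.update(split_pages("beta2023", "Scope 3 categories overview\fWater use and waste"))
    bm25 = BM25(docs)
    ranking = bm25.rank("scope 1 emissions", k=3)
    print(ranking[0][0])
    print(hit_at_k(ranking, "acme2023_p0002", k=1))
```

With this toy corpus, the page that mentions "Scope 1 emissions" ranks first because it matches all three query terms. The page that mentions only "Scope 3" ranks second. The same `hit_at_k` check would apply unchanged to a SPLADE ranking, provided that run emits the same page docids.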