[8.x] Vector rescoring oversamples k instead of num_candidates #119887

carlosdelest · 2025-01-09T17:15:32Z

It makes more sense to apply rescoring to an oversampled k instead of num_candidates: rescoring just a fraction of the candidates is more performant and still offers good recall, especially when k is small compared to the number of candidates.

The API changes to use oversample instead of num_candidates_factor:

GET msmarco-v2-bbq/_search
{
    "query": {
        "knn": {
            "field": "emb",
            "query_vector": [...],
            "k": 10,
            "num_candidates": 100,
            "rescore_vector": {
                "oversample": 2.5
            }
        }
    }
}

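This means rescoring the top k * oversample of the num_candidates retrieved on each shard, and returning the top k of them. In the request above, that is 25 of the 100 candidates per shard, from which the top 10 are returned.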
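To make the per-shard behavior concrete, here is a minimal Python sketch of the selection logic described above. It is not the Elasticsearch implementation: the function names (rescore_window, rescore_shard), the exact_score callback, and the choice to round k * oversample up are all assumptions made for illustration.

```python
import math


def rescore_window(k: int, oversample: float) -> int:
    """Number of candidates to rescore per shard.

    Assumption: a fractional k * oversample is rounded up.
    """
    return math.ceil(k * oversample)


def rescore_shard(approx_hits, exact_score, k, num_candidates, oversample):
    """Hypothetical per-shard rescoring step.

    approx_hits: list of (doc_id, approx_score), best approximate score first.
    exact_score: callable returning the full-precision score for a doc_id.
    """
    # The approximate kNN search keeps at most num_candidates hits per shard.
    candidates = approx_hits[:num_candidates]
    # Only the top k * oversample of those candidates are rescored.
    window = min(rescore_window(k, oversample), len(candidates))
    rescored = [(doc_id, exact_score(doc_id)) for doc_id, _ in candidates[:window]]
    # Re-rank by the exact score and return the top k.
    rescored.sort(key=lambda hit: hit[1], reverse=True)
    return rescored[:k]


if __name__ == "__main__":
    # 100 candidates whose approximate order is the reverse of the exact order.
    hits = [(doc_id, 1.0 - doc_id * 0.01) for doc_id in range(100)]
    top = rescore_shard(hits, exact_score=float, k=10, num_candidates=100, oversample=2.5)
    print(rescore_window(10, 2.5), [doc_id for doc_id, _ in top])
```

With k = 10 and oversample = 2.5, only 25 of the 100 candidates are rescored. Under the previous approach, the factor scaled num_candidates rather than k.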

Follow up to #116663

We start with 8.x and will backport to main, as this introduces an incompatible API change that we want to land in 8.x first so BwC and rest compat tests pass in main.

elasticsearchmachine · 2025-01-09T19:55:59Z

Pinging @elastic/es-search-relevance (Team:Search Relevance)

carlosdelest added 3 commits January 9, 2025 18:07

Use oversample to modify k instead of num_candidates for rescoring

0f956b4

Renaming typo

ab484b5

Fix test

401fede

carlosdelest added the :Search Relevance/Vectors, Team:Search Relevance, auto-backport, v9.0.0, v8.18.0, and >non-issue labels Jan 9, 2025

carlosdelest marked this pull request as ready for review January 9, 2025 19:55

carlosdelest requested a review from benwtrent January 9, 2025 19:55
