Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[8.x] Vector rescoring oversamples k instead of num_candidates #119887

Open
wants to merge 3 commits into
base: 8.x
Choose a base branch
from

Conversation

carlosdelest
Copy link
Member

It makes more sense to apply rescoring to an oversampled k instead of num_candidates, as rescoring just a fraction of the candidates will be more performant and offer good recall, specially for smaller k sizes compared to number of candidates.

API changes so we use oversample instead of num_candidates_factor:

GET msmarco-v2-bbq/_search
{
    "query": {
        "knn": {
            "field": "emb",
            "query_vector": [...],
            "k": 10,
            "num_candidates": 100,
            "rescore_vector": {
                "overseample": 2.5
            }
        }
    }
}

This will mean rescoring k * oversample from the num_candidates retrieved on each shard, and returning the top k out of them.

Follow up to #116663

We start with 8.x and will backport to main, as this introduces an incompatible API change that we want to land in 8.x first so BwC and rest compat tests pass in main.

@carlosdelest carlosdelest added :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch auto-backport Automatically create backport pull requests when merged v9.0.0 v8.18.0 >non-issue labels Jan 9, 2025
@carlosdelest carlosdelest marked this pull request as ready for review January 9, 2025 19:55
@carlosdelest carlosdelest requested a review from benwtrent January 9, 2025 19:55
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged >non-issue :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v8.18.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants