[RFC] Explainability for Hybrid query #905

Open
martin-gaievski opened this issue Sep 13, 2024 · 7 comments


@martin-gaievski
Member

Introduction

This document describes the design for Explainability in Hybrid Query. The feature was requested in GitHub issue #658.

Overview

Hybrid search combines multiple query types, like keyword and neural search, to improve search relevance. In 2.11 the team released the hybrid query as part of the neural-search plugin. The main responsibility of the hybrid query is to return the scores of multiple queries, normalized and combined.

Score normalization and combination are decoupled from the actual query execution and score collection and are done in a search pipeline processor. That is different from traditional queries, and it makes it non-trivial to enable existing OpenSearch debugging and troubleshooting tools like explain. Currently there is no way for a user to check which part of the hybrid query contributes to the final normalized document score.

Problem Statement

Users need visibility into how each sub-query result contributes to the final result of the hybrid query. The Explain API is not enabled for hybrid query, and on top of that we may need a special format for these results.

Requirements

Functional Requirements

At a high level, the user needs to understand how each sub-query contributes to the final result. This should include the following information:

  • details of how raw scores got transformed into normalized scores
  • raw scores of each individual sub query
  • extra information for source score, similar to today’s explain
  • used normalization techniques including parameters and weights

At a high level, the response may look like the following (note that it is scoped to a single doc_id):

"docid": "123456abcd",
"processor_level_explain": {
   "score_normalization": [
        {
           "type": "normalization",
           "technique": "min_max",
           "raw_scores_per_query": [0.5, 8.45, 0],
           "processed_scores_per_query": [0.5, 0.95, 0.001]
        },
        {
           "type": "combination",
           "technique": "arithmetic_mean",
           "raw_scores_per_query": [0.5, 0.95, 0.001],
           "processed_scores_per_query": [0.78]
        }
   ]
},
"query_phase_explain": {                                
   "query1_shard_level_score": {},
   "query2_shard_level_score": {}
}
  • The provided information should be relevant to search hits. If a document matches some sub-query but is dropped from the final result, that should be clearly indicated in the explanation response.

Non functional requirements

  • There should be no regression in query execution performance when explainability is disabled or not active
  • A “by query” request for large queries should not fail with a timeout exception
  • Extensible solution: we should be able to add new details if needed (pagination, inner hits) and to support new normalization/combination techniques and processors (ideally this works out of the box, but the minimal requirement is to provide clear interfaces to processors/techniques)
  • Minimal changes in user experience

Current state

There are no tools that provide this information to the user in any form.

The consistent way forward is to use the existing “explain” API: that is what users expect, and it has a lot of pros.
Today, however, the hybrid query returns an “Explain is not supported” response with code 500 when it is called with the standard explain parameter.

There are two types of explain calls:

  • Search query: the search query is executed normally, and every document in the search hits list gets an explanation.
  • By document id: results are returned for a particular doc id (passed by the user), and the search omits the query phase. If the doc id is not a hit, this returns a no-match response.

Explain at the query level

The following diagram shows the flow of a search query with explain for non-hybrid query types:

Explain_for_HQ_current_flow_v2

Notes:

  • first place where results from both 4 and 7 are available is SearchPhaseController

The following is an example of a search + explain request and response:

GET /myindex/_search?search_pipeline=nlp-search-pipeline&explain=true

"hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 1.9431856,
        "hits": [
            {
                "_shard": "[index-test][0]",
                "_node": "qnphyXiURve0A4xLRaRjSA",
                "_index": "index-test",
                "_id": "3",
                "_score": 1.9431856,
                "_source": {
                    "name": "Why would he go to all that effort for a free pack of ranch dressing?",
                    "category": "story",
                    "price": 10
                },
                "_explanation": {
                    "value": 1.9431856,
                    "description": "sum of:",
                    "details": [
                        {
                            "value": 0.94318557,
                            "description": "weight(name:effort in 0) [PerFieldSimilarity], result of:",
                            "details": [
                                {
                                    "value": 0.94318557,
                                    "description": "score(freq=1.0), computed as boost * idf * tf from:",
                                    "details": [
                                        {
                                            "value": 2.2,
                                            "description": "boost",
                                            "details": []
                                        },
                                        {
                                            "value": 0.98082924,
                                            "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                                            "details": [
                                                {
                                                    "value": 1,
                                                    "description": "n, number of documents containing term",
                                                    "details": []
                                                },
                                                {
                                                    "value": 3,
                                                    "description": "N, total number of documents with field",
                                                    "details": []
                                                }
                                            ]
                                        },
                                        {
                                            "value": 0.43710023,
                                            "description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                                            "details": [
                                                {
                                                    "value": 1.0,
                                                    "description": "freq, occurrences of term within document",
                                                    "details": []
                                                },
                                                {
                                                    "value": 1.2,
                                                    "description": "k1, term saturation parameter",
                                                    "details": []
                                                },
                                                {
                                                    "value": 0.75,
                                                    "description": "b, length normalization parameter",
                                                    "details": []
                                                },
                                                {
                                                    "value": 15.0,
                                                    "description": "dl, length of field",
                                                    "details": []
                                                },
                                                {
                                                    "value": 13.666667,
                                                    "description": "avgdl, average length of field",
                                                    "details": []
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        },
                        {
                            "value": 1.0,
                            "description": "price:[10 TO 70]",
                            "details": []
                        }
                    ]
                }
            },
            {
                "_shard": "[index-test][0]",
                "_node": "qnphyXiURve0A4xLRaRjSA",
                "_index": "index-test",
                "_id": "1",
                "_score": 1.0,
                "_source": {
                    "name": "A West Virginia university women 's basketball team , officials , and a small gathering of fans are in a West Virginia arena .",
                    "category": "novel",
                    "price": 20
                },
                "_explanation": {
                    "value": 1.0,
                    "description": "sum of:",
                    "details": [
                        {
                            "value": 1.0,
                            "description": "price:[10 TO 70]",
                            "details": []
                        }
                    ]
                }
            }
        ]
    }

Explain by doc id

This API was designed as a faster alternative to explain by query.

Explain_for_HQ_current_flow_explain_by_docid

Explain by doc_id is called with the following URL:

GET /myindex/_explain/docid12345678

Challenges

Different parts of the hybrid query results come from different stages of query execution. To have complete data for the query results we need access to both shard-level and coordinator-level data.

The existing Explain API of OpenSearch works at the shard level. There is nothing comparable to Explain at the coordinator level or for search processors.

In the worst-case scenario we can output only one type of explain data, shard level or processor level. In that case I would prefer processor-level data because:

  • this is critical information in context of hybrid search
  • it’s not available today in any way including potential workarounds
  • shard level data can be obtained via existing Explain API, this can be a workaround

Possible solutions

  1. Modify standard Explain API
  2. Explain with new response processor
  3. New API
  4. Explain with a new Fetch sub phase

The following diagram shows all solution options at a high level; options 2 and 4 are similar at that level of abstraction.

All_solutions_high_level

Option 1: Standard Explain with custom merge in FetchSearch phase

The solution is based on the following:

  • Add an explain field to the QuerySearchResult object
  • The PhaseResults processor adds normalization explain information to the new field
  • The Fetch phase stays unchanged and collects scores at the shard level
  • Modify the merge method of FetchSearchPhase. In SearchPhaseController.merge the results of the Query and Fetch phases are merged; that includes the existing explain from the Fetch phase and the newly added explanations from the Query phase.
  • Reuse SearchHit.explain to publish processor explain results

Explain_for_HQ_merge_in_query_featch_phase_flow

Pros:

  • no API changes, same UX except for explain by doc id
  • we get shard-level explain data for free
  • search is already part of the flow
  • transferable in case hybrid query becomes part of core

Cons:

  • hard to pass information from the processor; explain works as the pre-fetch phase at the coordinator level
  • some results may be irrelevant (the normalization processor reshuffles individual query results)
  • explain by doc_id is not supported
  • specific to PhaseResults processors
  • hard to extend; all changes will require adjustments in OpenSearch core
  • added latency and limitations because query-level explain data becomes part of the transport request/response

Current implementation of Explain:

  • the query implements explain in its Weight class
  • a special fetch sub phase collects the result from the query and publishes it to the search hits

The explain section can have two parts: one with shard-level score calculation details, similar to what the system has today, and a new one with details of how scores are normalized.

What the response will look like:

"_explanation": {
                    "value": 1.0,
                    "description": "hybrid query score: ",
                    "details": [
                        {
                            "value": 1.0,
                            "description": "normalized scores: ",
                            "details": [
                                {
                                    "value": 0.315,
                                    "description": "normalization technique min_max",
                                    "details": [
                                        {
                                            "value": 0.315,
                                            "description": "query A max score [1.0], min score [0.81], source score [0.943], min_max score [0.315]",
                                            "details": []
                                        },
                                        {
                                            "value": 1.0,
                                            "description": "query B max score [0.943], min score [0.943], source score [0.943], min_max score [1.0]",
                                            "details": []
                                        }
                                    ]
                                },
                                {
                                    "value": 1.0,
                                    "description": "combination technique arithmetic_mean",
                                    "details": [
                                        {
                                            "value": 1.0,
                                            "description": "combining scores [1.0, 0.943] with weights [0.5, 0.5], final doc score [1.0]",
                                            "details": []
                                        } 
                                    ]
                                }
                            ]
                        },
                        {
                            "value": 1.94318557,
                            "description": "low-level source score: ",
                            "details": [
                                {
                                    "value": 0.94318557,
                                    "description": "weight(name:effort in 0) [PerFieldSimilarity], result of:",
                                    "details": [
                                        {
                                            "value": 0.94318557,
                                            "description": "score(freq=1.0), computed as boost * idf * tf from:",
                                            "details": [
                                                {
                                                    "value": 2.2,
                                                    "description": "boost",
                                                    "details": []
                                                },
                                                {
                                                    "value": 0.98082924,
                                                    "description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                                                    "details": [
                                                        {
                                                            "value": 1,
                                                            "description": "n, number of documents containing term",
                                                            "details": []
                                                        },
                                                        {
                                                            "value": 3,
                                                            "description": "N, total number of documents with field",
                                                            "details": []
                                                        }
                                                    ]
                                                },

Option 2: Explain with new response processor [Recommended]

We can create a customized Explain solution specific to hybrid query. We can use the response processor approach and modify SearchHits for each document. Explain information from the normalization process can be shared between processors using the existing pipeline state mechanism.

Pros:

  • most parts are easier to implement compared to other options: the search code path with the normalization processor is triggered by core, and no processor-specific logic goes to core
  • document scores from the shard level are available
  • most of the infrastructure exists in core: the pipeline context exists, and the response processor has access to SearchHits
  • smooth UX based on the existing explain flag
  • no limitation on explain object size/format; the pipeline context is in memory at the coordinator node
  • easy to migrate in case hybrid query becomes part of core

Cons:

  • adds an extra configuration step (new response processor)
  • limitations on the format; we’ll be using the existing “explain” object structure
  • explain by doc_id is not supported

Because this is the recommended option, detailed diagrams are provided in a later section.

Option 3: New profile API

Brand new API specifically for profiling hybrid query.

GET /_plugins/_neural/profile?search_pipeline=nlp-search-pipeline
{
    "query": {
        "hybrid": {
            "queries": [
                {
                    "match": {
                        "name": "effort"
                    }
                },
                {
                    "range": {
                        "price": {
                            "gte": 10,
                            "lte": 70
                        }
                    }
                }
            ]
        }
    },
    "size": 20
}

Pros:

  • full control of interface: both detailed format and by doc id are feasible
  • no burden of being backward compatible
  • extensible in future

Cons:

  • complex implementation: a way to trigger the search flow, including the processor and query-specific code, has to be developed from scratch
  • all edge cases for search should be re-implemented (pagination, aggregations etc.)
  • learning curve for users
  • limited to neural search

Option 4: Explain with a new Fetch sub phase

This option has been suggested during the design review.

Main idea:

  • similar to Option 2, the normalization processor puts normalization details into the pipeline context
  • instead of a response processor we create a new fetch sub phase

Pros:

  • it is executed after the core fetch sub phase that gets explanations, so it has access to shard-level explanations
  • more flexible because it’s registered in the plugin
  • smoother UX as registration is done by the system; no extra steps are required from the user
  • less effort, as no AppSec review is required

Cons:

  • limitations on the format, we’ll be using existing “explain” object structure
  • explain by doc_id not supported

The main question is whether the pipeline context is available in the fetch sub phase and, if not, how much effort it takes to change that.

Current findings:
Fetch processors are executed from FetchPhase:
https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/search/fetch/FetchPhase.java#L181-L182
The only argument they receive is FetchContext:
https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/search/fetch/FetchContext.java

The pipeline context is not part of FetchContext and is not reachable from there, so it cannot be easily passed.

Solution Comparison

We drop Option 4 because it is not technically feasible. Results from the normalization processor cannot be passed to the fetch phase, and that is the major blocker.

Let’s compare the solutions by how well they fulfill the requirements:

  • Completeness of information about hybrid scores
    • All options provide the required information. A new API should be slightly better, but the other options also provide complete information.
  • Performance
    • There is no clear leader or outlier among the proposed options. With the new response processor approach (Option 2) all explain objects are stored in memory on the coordinator node, local to the new processor. For Options 1 and 3 objects must be serialized, sent between nodes, and deserialized.
  • Extensibility
    • The new API approach (Option 3) can be extended as needed, but it is limited to the neural-search plugin and its processors.
    • The response processor approach (Option 2) can be extended with new details and is generic enough to carry explain information from other processors: other phase results processors or even other processor types, such as request processors.
    • Option 1 with a custom merge is the least extensible: any change in the plugin requires changes in core, and support for other processor types is problematic.
  • User Experience
    • Approaches based on the existing Explain API (Options 1 and 2) promise good user adoption because of the well-known interface. Both have the drawback of not supporting “explain by doc id”.
    • The new API option (Option 3) can have an “explain by doc id” mode, but it will work only partially. Behind the scenes we will need to execute the full hybrid query, and “explain by doc id” is a result of post-processing. There will also be a learning curve for users and potential complaints about why “explain” doesn’t work the same way as in other queries.
    • For the new response processor (Option 2) additional setup is required to add it to the search pipeline. Part of the phased delivery plan is to make this transparent to the user by adding a “dependent processors” mechanism in core.

One more aspect of the evaluation is the amount of engineering effort.
The new API option requires much more effort because of the new API endpoint and the mechanism for triggering search and pipeline processor flows. The other two options, which are based on Explain, are comparable in terms of effort.

High Level Design

Based on the recommended direction (Option 2), the following is the high-level flow diagram. The example uses 2 data nodes with 2 shards each, purely for illustration.

Explain_for_HQ_Response_processor_flow

Key Design Decisions

  • For any option based on Explain we need to enable explain functionality at the Hybrid Query level to get the shard-level details. It’s a straightforward task; the only thing to decide is the format of the high-level wrapper for explanations from other queries.
  • To get explain data to the new response processor we’re going to use the Pipeline Context; it’s the only object that can be accessed by both phase results and response processors. It is essentially a Java <String, Object> map that is passed between processors in order of execution. Processors can change the content of the Pipeline Context.
  • We need to make calls to the new response processor idempotent to avoid data corruption in case the user has configured the same processor twice. At the same time, if the processor isn’t configured the response will be the same as today, without processor-level explain.
  • For the recommended option (new response processor) no extra measures are required to keep results isolated between different search requests. A new PipelineContext object is created for each search request.
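
The hand-off and idempotency decisions above can be sketched as follows. This is a minimal illustration, assuming a plain map stands in for the pipeline context; the class name, the `hybrid.explain` key, the string document keys, and the choice to remove the entry after publishing are all hypothetical and not taken from the plugin or from core.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a plain map stands in for the per-request pipeline context. */
final class ExplainHandOffSketch {
    // Hypothetical context key; the RFC does not define the real key name.
    static final String EXPLAIN_KEY = "hybrid.explain";

    /** Normalization step: stores per-document explain text only when explain is requested. */
    static void normalize(Map<String, Object> context, boolean explain) {
        if (!explain) {
            return; // nothing is stored, so the response step becomes a no-op
        }
        Map<String, String> perDoc = new HashMap<>();
        perDoc.put("shard0:doc3", "min_max normalization, arithmetic_mean combination");
        context.put(EXPLAIN_KEY, perDoc);
    }

    /** Response step: appends processor-level details to each hit at most once. */
    @SuppressWarnings("unchecked")
    static void publish(Map<String, Object> context, Map<String, List<String>> hitExplanations) {
        // Assumption: removing the entry makes a second configured instance a no-op.
        Object stored = context.remove(EXPLAIN_KEY);
        if (stored == null) {
            return;
        }
        Map<String, String> perDoc = (Map<String, String>) stored;
        perDoc.forEach((docKey, text) -> {
            List<String> details = hitExplanations.get(docKey);
            if (details != null) {
                details.add(text); // the list must be mutable
            }
        });
    }
}
```

Calling `publish` twice against the same context adds the processor-level entry only once, which is the behavior wanted when a user configures the same response processor twice.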

Short Term/Long Term implementation

Setting up new response processor:

Short Term

  • When we add the new response processor we will rely on the user to configure it. The user must already have a search pipeline in order to use the hybrid query, so the response processor should be added to that pipeline.

Long Term

  • Processors can be executed automatically by the system if we introduce a dependency between them. One processor defines a “depends on” relation to another processor. When the system executes a processor it checks for such dependents and executes them. Processor calls should be idempotent to avoid errors when a processor is set up more than once. Such a mechanism should be implemented in core.

Issue for adding the processor dependency mechanism to core: opensearch-project/OpenSearch#15921
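
One possible shape for such a dependency mechanism is sketched below. It is only an assumption about how core could resolve dependents: the class and method names are hypothetical, processors are reduced to their names, and the sketch schedules each processor once instead of relying on idempotent calls.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: none of these names exist in OpenSearch core. */
final class DependentProcessorsSketch {
    // processor name -> names of processors that the system should run after it
    private final Map<String, List<String>> dependents = new LinkedHashMap<>();

    void declareDependent(String processor, String dependent) {
        dependents.computeIfAbsent(processor, k -> new ArrayList<>()).add(dependent);
    }

    /** Expands the user-configured list with dependents; each processor is scheduled once. */
    List<String> executionOrder(List<String> configured) {
        Set<String> order = new LinkedHashSet<>();
        for (String name : configured) {
            visit(name, order);
        }
        return new ArrayList<>(order);
    }

    private void visit(String name, Set<String> order) {
        if (!order.add(name)) {
            return; // already scheduled, also guards against dependency cycles
        }
        for (String dependent : dependents.getOrDefault(name, List.of())) {
            visit(dependent, order);
        }
    }
}
```

With `normalization-processor` declaring `processor_explain_publisher` as a dependent, the resulting order is the same whether or not the user also lists the publisher explicitly.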

Metrics

New metrics are not required because the new functionality is an on-demand, expert-level debug tool. We can add a basic counter at the response processor level to track the number of times it is called.

Potential Issues

Known limitations and Future extensions

There is a risk that explain slows down the search request. We can’t optimize the way Explain by doc id does because a hybrid query must be executed in full.
A related factor is how complete the explain information should be. Because we need both shard-level explanations and coordinator-level data from the processor, execution needs more resources and will not be as fast as the existing explain.

The manual steps for setting up the new response processor should be eliminated in the future by the “depends on” processor mechanism. This is a two-way door.

Solution LLD

Main steps we need to take

Enable explain for Hybrid Query

This step is needed for all options to get shard-level scores before normalization.

We need to go over each sub-query and call its explain method:

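    // Sketch of explain() in the hybrid query's Weight; `weights` holds one Weight per sub-query.
    // Needs java.util.ArrayList, java.util.List and org.apache.lucene.search.Explanation.
    public Explanation explain(LeafReaderContext context, int doc) throws IOException {
        boolean match = false;
        // Assumed here to be the highest sub-query score, as in Lucene's DisjunctionMaxQuery.
        float max = Float.NEGATIVE_INFINITY;
        List<Explanation> subsOnMatch = new ArrayList<>();
        List<Explanation> subsOnNoMatch = new ArrayList<>();
        for (Weight wt : weights) {
            Explanation e = wt.explain(context, doc);
            if (e.isMatch()) {
                match = true;
                subsOnMatch.add(e);
                max = Math.max(max, e.getValue().floatValue());
            } else if (!match) {
                // no-match details are only reported when no sub-query matches
                subsOnNoMatch.add(e);
            }
        }
        if (match) {
            return Explanation.match(max, "combination of:", subsOnMatch);
        } else {
            return Explanation.noMatch("no matching clause", subsOnNoMatch);
        }
    }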

Modify Normalization Processor

  • read the explain flag
  • modify processor methods to get access to the pipeline context
  • if explain is enabled, add the processor execution details to the pipeline context. Store them per document, because later the response processor needs to find the details for each document.
  • we don’t have a global id for a document, only the docId, which is unique only within a shard. We address this by using docId + shardId as the unique key for the document.
  • build a generic framework for adding explains from different workflow steps. This allows each existing and new technique and processor to be plugged in easily. Three elements of explain make sense:
    • general description of the technique. It’s a P2, a detailed version of the pipeline configuration
    • normalization part of explain
    • combination part of explain
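
The docId + shardId key from the list above could look like the following sketch. The type and method names are hypothetical, the explanation is reduced to a string, and records require Java 16 or newer.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: hypothetical names, not the plugin's classes. */
final class DocKeySketch {
    /** A Lucene doc id is unique only within a shard, so the shard id is part of the key. */
    record DocIdAtShard(int docId, String shardId) {}

    private final Map<DocIdAtShard, String> explainByDoc = new HashMap<>();

    void put(int docId, String shardId, String explanation) {
        explainByDoc.put(new DocIdAtShard(docId, shardId), explanation);
    }

    /** Returns null when no processor-level explain was stored for this document. */
    String get(int docId, String shardId) {
        return explainByDoc.get(new DocIdAtShard(docId, shardId));
    }
}
```

A record is used because it provides value-based `equals` and `hashCode`, so two documents with the same docId on different shards never collide in the map.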

The following diagram shows the flow for the normalization processor:

LLD_flow_normalization_processor

The following diagram shows the new methods that will be added to the normalizer and combiner worker classes and to the lower-level normalization and combination technique classes.

LLD_explain_in_technique_class_diagram

  • describe - gives a text description of the technique without exact scores for a particular request or document. It’s similar to toString()
  • explain - gives a detailed description at the query level, including scores for documents.

Why is the explain part of the interface different between normalization and combination?

TL;DR
The interfaces of the techniques are different: normalization takes results from all shards, while combination accepts an array of scores for a single document. This is a deal breaker for a common explain signature that carries per-document scores.

Details

The processes of score normalization and combination are fundamentally different when it comes to runtime data with scores.

For normalization we need the scores for the same query from all shards. The responsibility of the technique is to work with a single document score. The score normalizer class is pretty light, and all heavy lifting is done in the technique. That’s why it makes sense to put explain in the technique class.

Combination needs the scores for the same document from all sub-queries. For one document id the data comes from a single shard. All heavy lifting is done in the score combiner: it groups all scores by doc id. The technique class is responsible for combining all scores of one document; it’s practically pure mathematical calculation without knowledge of OpenSearch abstractions.

This does not apply to the description, though: the technique description is static.
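
The difference in inputs can be illustrated with a small sketch. The method names and signatures are assumptions made for this example, not the plugin's interfaces; the fallback to 1.0 when all scores are equal is also an assumption, chosen to agree with query B in the Option 1 sample response.

```java
import java.util.List;
import java.util.Locale;

/** Illustrative only: method names and signatures are assumptions. */
final class TechniqueExplainSketch {
    /** Static description; it needs no scores, so it can look the same for every technique. */
    static String describeMinMax() {
        return "min_max";
    }

    /** Normalization input: all scores of ONE sub-query (across shards) plus the score to normalize. */
    static float minMax(List<Float> scoresOfOneQuery, float sourceScore) {
        float min = Float.MAX_VALUE;
        float max = -Float.MAX_VALUE;
        for (float score : scoresOfOneQuery) {
            min = Math.min(min, score);
            max = Math.max(max, score);
        }
        // Assumed fallback when every score is the same.
        return max == min ? 1.0f : (sourceScore - min) / (max - min);
    }

    /** Combination input: the normalized scores of ONE document, one entry per sub-query. */
    static float arithmeticMean(float[] scoresOfOneDoc, float[] weights) {
        float weightedSum = 0f;
        float weightSum = 0f;
        for (int i = 0; i < scoresOfOneDoc.length; i++) {
            weightedSum += scoresOfOneDoc[i] * weights[i];
            weightSum += weights[i];
        }
        return weightSum == 0f ? 0f : weightedSum / weightSum;
    }

    /** Explain text for combination can be built from a single document's data alone. */
    static String explainArithmeticMean(float[] scoresOfOneDoc, float[] weights) {
        return String.format(Locale.ROOT, "%s combination of %d scores, final doc score [%.3f]",
            "arithmetic_mean", scoresOfOneDoc.length, arithmeticMean(scoresOfOneDoc, weights));
    }
}
```

The normalization method cannot produce its explain value without the scores of every other document for the same sub-query, while the combination method only ever sees one document, which is why the two explain signatures diverge.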

Create new Response Processor

  • add a factory class for the new response processor; no params are needed

The following is an example of a request that creates a search pipeline with the existing normalization processor (used by hybrid query) and the new response processor.

{
    "description": "Post processor for hybrid search",
    "phase_results_processors": [
        {
            "normalization-processor": {
                "normalization": {
                    "technique": "min_max"
                }
            }
        }
    ],
    "response_processors": [
        {
            "processor_explain_publisher": {}
        }
    ]
}
  • read the explain flag
  • if processor explain data is not present, skip (this makes the call idempotent and a NO_OP when the explain flag is not passed)
  • read processor explain data from the pipeline context
  • iterate over search hits from the query results; for each hit find its processor-level explain and merge it with the shard-level explanation
  • use docId + shard id to look up the explain per doc
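
The merge step in the list above can be sketched as follows. `Explain` and `Hit` are simplified stand-ins for Lucene's Explanation and OpenSearch's SearchHit, the `shardId/docId` string key is a hypothetical encoding, and the sketch assumes every hit already carries a shard-level explanation. Records require Java 16 or newer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative only: simplified stand-ins, not OpenSearch or Lucene types. */
final class ExplainMergeSketch {
    record Explain(float value, String description, List<Explain> details) {}

    record Hit(int docId, String shardId, float score, Explain explanation) {}

    /** Wraps processor-level and shard-level explanations under one top-level node per hit. */
    static List<Hit> merge(List<Hit> hits, Map<String, Explain> processorExplainByDoc) {
        if (processorExplainByDoc == null || processorExplainByDoc.isEmpty()) {
            return hits; // explain not requested, or data already consumed: no-op
        }
        List<Hit> merged = new ArrayList<>();
        for (Hit hit : hits) {
            // Hypothetical key format combining shard id and doc id.
            Explain processorLevel = processorExplainByDoc.get(hit.shardId() + "/" + hit.docId());
            if (processorLevel == null) {
                merged.add(hit); // keep the shard-level explanation unchanged
                continue;
            }
            Explain combined = new Explain(hit.score(), "hybrid query score:",
                List.of(processorLevel, hit.explanation()));
            merged.add(new Hit(hit.docId(), hit.shardId(), hit.score(), combined));
        }
        return merged;
    }
}
```

Hits without processor-level data pass through untouched, so the response stays the same as today when the normalization processor stored nothing in the pipeline context.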

LLD_flow_response_processor

I’ve done a POC to prove this LLD works: https://github.com/martin-gaievski/neural-search/tree/poc/explain_for_hybrid_v2

References

Feedback Required

We greatly value feedback from the community to ensure that this proposal addresses real-world use cases effectively. Here are a few specific points where your input would be particularly helpful:

  • Is the explanation provided sufficient?
    We want to ensure that the level of detail in the explanation is sufficient for practical, real-life use cases. If you'd like to see additional details or more comprehensive results, please let us know.

  • "Explain by doc ID" usage in hybrid queries:
    If you're currently using the "explain by doc ID" functionality, we’d love to hear about its relevance to your workflows, especially when it comes to hybrid queries. Is this feature crucial for your use case, or would you prefer other alternatives?

  • Additional response setup concerns:
    Does setting up an additional response processor introduce challenges for your system? If so, please share specific issues or concerns so that we can better understand how this impacts your setup.

@yuye-aws
Member

Good to see this RFC. This could also help in opensearch-project/ml-commons#2612.

@yuye-aws
Member

If we support explain for hybrid query, can we also support explain for nested neural and neural sparse query?

@yuye-aws
Member

yuye-aws commented Oct 16, 2024

Considering user experience, supporting hybrid query explain just like the bm25 is the best option. I am also considering how to let the user know the search relevance score in nested query. If you are having a design review meeting, feel free to invite me. Thank you @martin-gaievski !

@martin-gaievski
Member Author

If we support explain for hybrid query, can we also support explain for nested neural and neural sparse query?

Those are unrelated functionalities. For hybrid query we're adding explain info only for the part related to score normalization and combination. Individual queries must add their own support for explain. For instance, knn and thus neural queries do not support explain, and that needs to be addressed by the k-NN owners.

@martin-gaievski
Member Author

Considering user experience, supporting hybrid query explain just like the bm25 is the best option. I am also considering how to let the user know the search relevance score in nested query. If you are having a design review meeting, feel free to invite me. Thank you @martin-gaievski !

Can you elaborate on what "explain just like the bm25" means? One limitation that will always be there is that for traditional queries, like most bm25-based ones, scores are calculated only at the shard level, while for hybrid query they are calculated both at the shard and at the coordinator level. That's what prevents us from adding explain by doc id, so it's not going to be exactly as in bm25.

We actually had a design review for explain about a month ago; we tend to publish the RFC after the design has been reviewed.

@yuye-aws
Member

Can you elaborate on what the "explain just like the bm25" mean?

I mean the way explain works for the match query: https://opensearch.org/docs/latest/query-dsl/full-text/match/.

@yuye-aws
Member

Those are unrelated functionalities.

Agreed that hybrid query and nested query are not actually related. I'm just curious what happens if a user wants to explain a hybrid nested query.

Projects
Status: New
Status: 2.19.0