[FEATURE] Allow user-defined functions for score normalization and combination in hybrid queries #994
Labels: enhancement, Features, hybrid search
Is your feature request related to a problem?
OpenSearch currently provides built-in score normalization and combination for hybrid queries, as outlined in the Normalization Processor documentation. However, users need a way to define their own functions for these operations rather than relying solely on OpenSearch's internal mechanisms.
User-defined functions would provide more flexibility and control over how scores are normalized and combined, especially for advanced use cases where the built-in functionality does not suffice.
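For context, the two steps in question can be approximated outside OpenSearch. The following is a minimal Python sketch, assuming min-max normalization followed by a weighted arithmetic mean; the function names, the sample scores, and the choice to treat a missing score as 0.0 are illustrative assumptions, not OpenSearch APIs or documented behavior.

```python
from typing import Dict, List

Scores = Dict[str, float]  # document id -> raw score from one sub-query


def min_max_normalize(scores: Scores) -> Scores:
    """Rescale one sub-query's scores into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: when all scores are equal, map every document to 1.0.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_mean_combine(per_query: List[Scores], weights: List[float]) -> Scores:
    """Combine normalized scores with a weighted arithmetic mean."""
    total_weight = sum(weights)
    doc_ids = set().union(*per_query)
    return {
        doc_id: sum(
            w * q.get(doc_id, 0.0)  # assumption: missing score counts as 0.0
            for q, w in zip(per_query, weights)
        ) / total_weight
        for doc_id in doc_ids
    }


# Hypothetical raw scores from a keyword sub-query and a semantic sub-query.
keyword = {"a": 12.0, "b": 6.0, "c": 3.0}
semantic = {"a": 0.2, "b": 0.8}

combined = weighted_mean_combine(
    [min_max_normalize(keyword), min_max_normalize(semantic)],
    weights=[0.3, 0.7],
)
ranking = sorted(combined, key=combined.get, reverse=True)
```

A user-defined function would replace either step while keeping the same inputs and outputs.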
What solution would you like?
Introduce the ability for users to define custom functions for score normalization and combination in hybrid queries. These functions could be implemented using:
Function Score Queries: Use the existing OpenSearch function score query framework to define custom scoring logic, including:
Painless Scripts: Allow users to use OpenSearch's Painless scripting language to define custom functions for score normalization and combination. Painless is a lightweight, secure, and performant scripting language that integrates seamlessly with OpenSearch.
The Painless scripting language can be used to define custom functions for complex logic, including:
We could go further and support invoking external scripts (e.g., Python or SQL) for more sophisticated logic where the internal scripting options are not sufficient. This would allow users to execute pre-defined models or complex scoring algorithms that are managed externally.
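One way to picture the proposal is a pipeline that accepts the normalization and combination steps as parameters. The sketch below is written in Python purely for illustration; `run_hybrid`, `l2_normalize`, and `max_combine` are hypothetical names, and nothing here reflects an existing OpenSearch or Painless interface.

```python
import math
from typing import Callable, Dict, List

Scores = Dict[str, float]                 # document id -> score
Normalizer = Callable[[Scores], Scores]   # user-defined normalization hook
Combiner = Callable[[List[float]], float] # user-defined combination hook


def run_hybrid(
    sub_query_scores: List[Scores],
    normalize: Normalizer,
    combine: Combiner,
) -> Scores:
    """Apply caller-supplied normalization and combination functions."""
    normalized = [normalize(scores) for scores in sub_query_scores]
    doc_ids = set().union(*normalized)
    # Assumption: a document missing from a sub-query contributes 0.0.
    return {
        doc_id: combine([n.get(doc_id, 0.0) for n in normalized])
        for doc_id in doc_ids
    }


def l2_normalize(scores: Scores) -> Scores:
    """Example user function: divide each score by the L2 norm."""
    norm = math.sqrt(sum(s * s for s in scores.values()))
    if norm == 0:
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: s / norm for doc_id, s in scores.items()}


def max_combine(values: List[float]) -> float:
    """Example user function: keep the best score across sub-queries."""
    return max(values)


result = run_hybrid(
    [{"a": 3.0, "b": 4.0}, {"b": 1.0}],
    normalize=l2_normalize,
    combine=max_combine,
)
```

Keeping the hooks as plain functions over scores means the same contract could, in principle, be satisfied by a function score query, a Painless script, or an external script.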
Benefits:
Use Case:
For example, a user may want to combine results from multiple models (e.g., semantic search and traditional keyword search) and apply a custom score normalization function. Using a Painless script, the user could adjust the score of each result based on a combination of the model score and some external business logic (e.g., boosting certain results based on document metadata or user preferences).
Alternatively, a user might prefer to use a Python script to implement a more complex machine learning model for score normalization, offering them the flexibility to include custom ranking logic, external data, or ML-based techniques.
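The use case above could look roughly like the following Python sketch. It assumes a z-score normalization squashed through a sigmoid, a weighted sum of keyword and semantic scores, and a metadata-driven boost; the `featured` field, the weights, and the 1.2 boost factor are hypothetical values chosen only for illustration.

```python
import math
from statistics import mean, pstdev
from typing import Any, Dict

Scores = Dict[str, float]


def zscore_sigmoid_normalize(scores: Scores) -> Scores:
    """Custom normalization: z-score each value, then map it into (0, 1)."""
    if not scores:
        return {}
    values = list(scores.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # Assumption: identical scores all map to the sigmoid midpoint.
        return {doc_id: 0.5 for doc_id in scores}
    return {
        doc_id: 1.0 / (1.0 + math.exp(-(s - mu) / sigma))
        for doc_id, s in scores.items()
    }


def boosted_hybrid_score(
    keyword: Scores,
    semantic: Scores,
    metadata: Dict[str, Dict[str, Any]],
    keyword_weight: float = 0.4,
    semantic_weight: float = 0.6,
    featured_boost: float = 1.2,
) -> Scores:
    """Combine both model scores, then apply business logic from metadata."""
    k = zscore_sigmoid_normalize(keyword)
    s = zscore_sigmoid_normalize(semantic)
    combined = {}
    for doc_id in set(k) | set(s):
        base = keyword_weight * k.get(doc_id, 0.0) + semantic_weight * s.get(doc_id, 0.0)
        # Hypothetical rule: boost documents flagged as featured.
        boost = featured_boost if metadata.get(doc_id, {}).get("featured") else 1.0
        combined[doc_id] = base * boost
    return combined


scores = boosted_hybrid_score(
    keyword={"a": 10.0, "b": 10.0},
    semantic={"a": 0.9, "b": 0.9},
    metadata={"a": {"featured": True}},
)
```

In this example both documents tie on model scores, so the metadata boost alone decides the order.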