[FEATURE] Allow user-defined functions for score normalization and combination in hybrid queries #994

martin-gaievski · 2024-11-19T05:36:18Z

Is your feature request related to a problem?

Currently, OpenSearch provides internal functionality for score normalization and combination in hybrid queries, as outlined in the Normalization Processor documentation. However, there is a need to allow users to define their own custom functions for these operations instead of solely relying on OpenSearch's internal mechanisms.

The ability to define user-specific functions will provide more flexibility and control over how scores are normalized and combined, especially for advanced use cases where the built-in functionality may not suffice.
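For context, the kind of built-in behavior being discussed can be approximated as min-max normalization per sub-query followed by a weighted arithmetic mean. The sketch below is a simplified illustration, not the plugin's actual code: the function names, the handling of equal scores, and the choice to treat a document missing from a sub-query as scoring 0 are all assumptions.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one sub-query's scores (doc id -> score) to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: when every hit has the same score, map them all to 1.0.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def arithmetic_mean_combine(
    per_query: List[Dict[str, float]], weights: List[float]
) -> Dict[str, float]:
    """Combine normalized sub-query scores with a weighted arithmetic mean."""
    total_weight = sum(weights)
    # Union of all document ids seen by any sub-query.
    doc_ids = set().union(*per_query)
    return {
        doc_id: sum(
            # Assumption: a document missing from a sub-query contributes 0.
            w * q.get(doc_id, 0.0)
            for q, w in zip(per_query, weights)
        )
        / total_weight
        for doc_id in doc_ids
    }


keyword = min_max_normalize({"a": 10.0, "b": 5.0, "c": 0.0})
semantic = min_max_normalize({"a": 0.2, "b": 0.8})
combined = arithmetic_mean_combine([keyword, semantic], [0.5, 0.5])
```

The request in this issue is to let users replace either of these two steps with their own logic.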

What solution would you like?

Introduce the ability for users to define custom functions for score normalization and combination in hybrid queries. These functions could be implemented using:

Function Score Queries: Use the existing OpenSearch function score query framework to define custom scoring logic, including:
- Weighting: Apply custom weights to individual components of a query.
- Decay functions: Implement decay functions based on relevance or other factors.
- Custom logic: Create complex scoring models based on attributes, fields, or external parameters.

Painless Scripts: Allow users to use OpenSearch's Painless scripting language to define custom functions for score normalization and combination. Painless is a lightweight, secure, and performant scripting language that integrates seamlessly with OpenSearch.

The Painless scripting language can be used to define custom functions for complex logic, including:
- Custom arithmetic: Implement custom arithmetic for combining or normalizing scores.
- Conditional scoring: Apply conditional logic based on query results or document attributes.
- Advanced normalization: Create custom normalization schemes to adjust scores based on user-defined rules.
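To make the idea concrete, here is a rough sketch of what pluggable normalization and combination functions might look like, written in Python rather than Painless purely for illustration. Everything in it is hypothetical: `run_hybrid`, `z_score_normalize`, and `max_combine` are names invented for this example and do not correspond to any existing OpenSearch API.

```python
import math
from typing import Callable, Dict, List

Scores = Dict[str, float]  # doc id -> score for one sub-query


def z_score_normalize(scores: Scores) -> Scores:
    """Hypothetical custom normalization: standardize to zero mean, unit variance."""
    if not scores:
        return {}
    mean = sum(scores.values()) / len(scores)
    variance = sum((s - mean) ** 2 for s in scores.values()) / len(scores)
    std = math.sqrt(variance)
    if std == 0:
        # All scores identical: every document sits exactly at the mean.
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - mean) / std for doc_id, s in scores.items()}


def max_combine(per_query: List[Scores]) -> Scores:
    """Hypothetical custom combination: keep each document's best sub-query score."""
    combined: Scores = {}
    for scores in per_query:
        for doc_id, score in scores.items():
            if doc_id not in combined or score > combined[doc_id]:
                combined[doc_id] = score
    return combined


def run_hybrid(
    per_query: List[Scores],
    normalize: Callable[[Scores], Scores],
    combine: Callable[[List[Scores]], Scores],
) -> Scores:
    """Apply a user-supplied normalization to each sub-query, then combine."""
    return combine([normalize(scores) for scores in per_query])


result = run_hybrid(
    [{"a": 1.0, "b": 3.0}, {"b": 2.0, "c": 6.0}],
    normalize=z_score_normalize,
    combine=max_combine,
)
```

The point of the design is that the pipeline only fixes the order of the two steps, while the user supplies the functions themselves.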

We could go further and add support for invoking external scripts (e.g., Python or SQL) for even more sophisticated logic, in cases where the internal scripting options are not sufficient. This could allow users to execute pre-defined models or complex scoring algorithms that are managed externally.

Benefits:

- Flexibility: Users can create highly customized scoring and normalization logic, tailored to their specific use case.
- Advanced Control: Users gain fine-grained control over how scores are combined or normalized, leading to more relevant and accurate search results.
- Extensibility: Users can leverage external tools (e.g., machine learning models, databases, or business rules) for scoring, creating a unified search pipeline.
- Consistency: Streamline the process of integrating custom scoring logic across different search types and models, ensuring uniformity in query results.

Use Case:

For example, a user may want to combine results from multiple models (e.g., semantic search and traditional keyword search) and apply a custom score normalization function. Using a Painless script, the user could adjust the score of each result based on a combination of the model score and some external business logic (e.g., boosting certain results based on document metadata or user preferences).

Alternatively, a user might prefer to use a Python script to implement a more complex machine learning model for score normalization, offering them the flexibility to include custom ranking logic, external data, or ML-based techniques.
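As a minimal sketch of the metadata-boosting scenario described above, the function below multiplies a document's combined score by a boost factor when its metadata matches a user preference. The function name, the `category` metadata field, and the multiplicative boost are all assumptions made for illustration, and the logic is shown in Python rather than Painless.

```python
from typing import Dict, Mapping


def apply_business_boost(
    combined: Mapping[str, float],
    metadata: Mapping[str, Mapping[str, object]],
    preferred_category: str,
    boost: float = 1.2,
) -> Dict[str, float]:
    """Boost documents whose (hypothetical) `category` field matches a preference."""
    boosted: Dict[str, float] = {}
    for doc_id, score in combined.items():
        # Documents without metadata are left unboosted.
        category = metadata.get(doc_id, {}).get("category")
        boosted[doc_id] = score * boost if category == preferred_category else score
    return boosted


boosted = apply_business_boost(
    combined={"a": 0.5, "b": 0.75, "c": 0.25},
    metadata={"a": {"category": "shoes"}, "b": {"category": "hats"}},
    preferred_category="shoes",
    boost=2.0,
)
```

Here document `a` is boosted past `b` because it matches the preferred category, while `c`, which has no metadata, keeps its original score.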

navneet1v · 2024-11-27T01:17:10Z

@martin-gaievski I remember ML Commons has pre and post functions that can run on the embeddings. We should see how they are doing it.

dblock · 2024-12-09T17:09:37Z

[Catch All Triage - 1, 2, 3, 4]

martin-gaievski added the untriaged and enhancement labels Nov 19, 2024

martin-gaievski changed the title ~~[FEATURE] Allow User-Defined Functions for Score Normalization and Combination in Hybrid Queries~~ [FEATURE] Allow user-defined functions for score normalization and combination in hybrid queries Nov 21, 2024

dblock removed the untriaged label Dec 9, 2024

minalsha added the hybrid search and Features (introduces a new unit of functionality that satisfies a requirement) labels Jan 8, 2025

minalsha assigned vibrantvarun Jan 9, 2025
