[FEATURE] Allow user-defined functions for score normalization and combination in hybrid queries #994

Open
martin-gaievski opened this issue Nov 19, 2024 · 2 comments
Labels
enhancement Features Introduces a new unit of functionality that satisfies a requirement hybrid search

Comments

@martin-gaievski
Member

Is your feature request related to a problem?

Currently, OpenSearch provides internal functionality for score normalization and combination in hybrid queries, as outlined in the Normalization Processor documentation. However, there is a need to allow users to define their own custom functions for these operations instead of solely relying on OpenSearch's internal mechanisms.

The ability to define user-specific functions will provide more flexibility and control over how scores are normalized and combined, especially for advanced use cases where the built-in functionality may not suffice.

What solution would you like?

Introduce the ability for users to define custom functions for score normalization and combination in hybrid queries. These functions could be implemented using:

  • Function Score Queries: Use the existing OpenSearch function score query framework to define custom scoring logic, including:

    • Weighting: Apply custom weights to individual components of a query.
    • Decay functions: Implement decay functions based on relevance or other factors.
    • Custom logic: Create complex scoring models based on attributes, fields, or external parameters.
  • Painless Scripts: Allow users to use OpenSearch's Painless scripting language to define custom functions for score normalization and combination. Painless is a lightweight, secure, and performant scripting language that integrates seamlessly with OpenSearch.

    The Painless scripting language can be used to define custom functions for complex logic, including:

    • Custom arithmetic: Implement custom arithmetic for combining or normalizing scores.
    • Conditional scoring: Apply conditional logic based on query results or document attributes.
    • Advanced normalization: Create custom normalization schemes to adjust scores based on user-defined rules.
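To make the proposal concrete, here is a minimal Python sketch of what pluggable normalization and combination functions could look like. It is not an existing OpenSearch API: the `Normalizer` and `Combiner` aliases, `combine_hybrid`, and the example functions are hypothetical names chosen for illustration, and the sketch assumes each sub-query returns a mapping of document ID to raw score.

```python
import math
from typing import Callable, Sequence

# Hypothetical type aliases for user-defined functions (not an OpenSearch API).
Normalizer = Callable[[Sequence[float]], list[float]]
Combiner = Callable[[Sequence[float]], float]


def min_max(scores: Sequence[float]) -> list[float]:
    """Rescale scores to [0, 1], in the spirit of min-max normalization."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption of this sketch: identical scores all map to 1.0.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def log_scaled(scores: Sequence[float]) -> list[float]:
    """Example of a custom scheme: dampen large scores with log1p."""
    top = max(scores)
    if top <= 0:
        return [0.0 for _ in scores]
    return [math.log1p(max(s, 0.0)) / math.log1p(top) for s in scores]


def weighted_mean(weights: Sequence[float]) -> Combiner:
    """Build a combiner that takes a weighted arithmetic mean of sub-query scores."""
    def combine(scores: Sequence[float]) -> float:
        return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    return combine


def combine_hybrid(
    sub_results: Sequence[dict[str, float]],
    normalize: Normalizer,
    combine: Combiner,
) -> dict[str, float]:
    """Normalize each sub-query's scores, then combine them per document."""
    normalized = []
    for result in sub_results:
        doc_ids = list(result)
        scores = normalize([result[d] for d in doc_ids]) if doc_ids else []
        normalized.append(dict(zip(doc_ids, scores)))
    all_ids = set().union(*normalized)
    # A document missing from a sub-query contributes 0.0 for that sub-query.
    return {d: combine([n.get(d, 0.0) for n in normalized]) for d in all_ids}
```

With this shape, swapping in custom logic is a matter of passing a different callable, for example `combine_hybrid([keyword_hits, semantic_hits], log_scaled, max)`. A Painless-based implementation would presumably expose the same two extension points as scripts rather than Python callables.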

We could go further and support invoking external scripts (e.g., Python or SQL) for more sophisticated logic in cases where the internal scripting options are not sufficient. This would allow users to execute pre-defined models or complex scoring algorithms that are managed externally.

Benefits:

  • Flexibility: Users can create highly customized scoring and normalization logic, tailored to their specific use case.
  • Advanced Control: Users gain fine-grained control over how scores are combined or normalized, leading to more relevant and accurate search results.
  • Extensibility: Users can leverage external tools (e.g., machine learning models, databases, or business rules) for scoring, creating a unified search pipeline.
  • Consistency: Streamline the process of integrating custom scoring logic across different search types and models, ensuring uniformity in query results.

Use Case:

For example, a user may want to combine results from multiple models (e.g., semantic search and traditional keyword search) and apply a custom score normalization function. Using a Painless script, the user could adjust the score of each result based on a combination of the model score and some external business logic (e.g., boosting certain results based on document metadata or user preferences).
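The logic of this use case can be sketched in Python as follows. The `Hit` fields, the `business_score` function, and the default weight and boost values are all hypothetical and chosen only for illustration; the sketch assumes both scores have already been normalized to [0, 1].

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    keyword_score: float   # assumed already normalized to [0, 1]
    semantic_score: float  # assumed already normalized to [0, 1]
    category: str          # document metadata used by the business rule


def business_score(
    hit: Hit,
    preferred_categories: set[str],
    semantic_weight: float = 0.6,  # illustrative default, not a recommendation
    boost: float = 1.2,            # illustrative default, not a recommendation
) -> float:
    """Weighted blend of both model scores, boosted for preferred metadata."""
    base = (
        semantic_weight * hit.semantic_score
        + (1.0 - semantic_weight) * hit.keyword_score
    )
    if hit.category in preferred_categories:
        base *= boost
    # Cap at 1.0 so boosted scores stay on the same scale as unboosted ones.
    return min(base, 1.0)


def rerank(hits: list[Hit], preferred_categories: set[str]) -> list[Hit]:
    """Order hits by the custom score, highest first."""
    return sorted(
        hits,
        key=lambda h: business_score(h, preferred_categories),
        reverse=True,
    )
```

In a Painless version, the same rule would read the preferred categories from script parameters and the document metadata from the hit's fields.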

Alternatively, a user might prefer to use a Python script to implement a more complex machine learning model for score normalization, offering them the flexibility to include custom ranking logic, external data, or ML-based techniques.
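As a rough illustration of the Python variant, the sketch below maps two raw scores to a single value in (0, 1) with a logistic function. The coefficients and intercept are placeholders, not fitted values; in practice they would come from a model trained on relevance judgments. The function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def calibrated_score(
    keyword_score: float,
    semantic_score: float,
    coef: tuple[float, float] = (0.08, 3.0),  # placeholder, not fitted
    intercept: float = -2.0,                  # placeholder, not fitted
) -> float:
    """Combine raw sub-query scores with a logistic model.

    Unlike min-max normalization, this does not depend on the other hits
    in the result set, only on the raw scores of the document itself.
    """
    z = coef[0] * keyword_score + coef[1] * semantic_score + intercept
    return sigmoid(z)
```

One design consequence worth noting: a function like this needs the raw score of every sub-query for each document, so an extension point would have to pass per-document scores rather than only the already-normalized values.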

@martin-gaievski martin-gaievski changed the title [FEATURE] Allow User-Defined Functions for Score Normalization and Combination in Hybrid Queries [FEATURE] Allow user-defined functions for score normalization and combination in hybrid queries Nov 21, 2024
@navneet1v
Collaborator

@martin-gaievski I remember MLcommons has pre- and post-processing functions that can run on the embeddings. We should look at how they do it.

@dblock dblock removed the untriaged label Dec 9, 2024
@dblock
Member

dblock commented Dec 9, 2024

[Catch All Triage - 1, 2, 3, 4]

@minalsha minalsha added hybrid search Features Introduces a new unit of functionality that satisfies a requirement labels Jan 8, 2025
5 participants