fix(search): make graphql query autoCompleteForMultiple to show exact matches first #11586

deepgarg-visa · 2024-10-10T15:19:55Z

This PR is targeting the below Issue

Checklist

The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
Links to related issues (if applicable)
Tests for the changes have been added/updated (if applicable)
Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

…tches first

…ry_result # Conflicts: # metadata-io/src/main/java/com/linkedin/metadata/search/elasticsearch/query/request/AutocompleteRequestHandler.java

david-leifker · 2024-10-11T03:13:48Z

...ava/com/linkedin/metadata/search/elasticsearch/query/request/AutocompleteRequestHandler.java

    if (field != null && !field.isEmpty()) {
-      return ImmutableList.of(field);
+      return ImmutableList.of(Pair.of(field, "2.0"));


This is hardcoded to 2.0 and looks like it treats URN matches and non-URN matches the same. Previously it weighted URNs higher which probably doesn't make sense for auto-complete purposes, but now the conditional on line 220 is the same for both conditions. So we should probably simplify the conditional and likely remove the boostScore entirely?

As I checked, we can't remove the boostscore entirely. We need to have higher boost score for ngram fields for non-urn fields in "multi_match" clause to get the expected result. I would suggest that we can have boostscore for non-urn fields from their respective searchable spec(which I already did) and urn field can have only default boost score (1.0), as you have rightly mention that it doesn't make sense for auto-complete purposes.

Please find the expected multi_match clause

{ "multi_match": { "query": "aucdl", "fields": [ "name.delimited^1.0", "name.ngram^10.0", "name.ngram._2gram^10.0", "name.ngram._3gram^10.0", "name.ngram._4gram^10.0", "urn.delimited^1.0", "urn.ngram^1.0", "urn.ngram._2gram^1.0", "urn.ngram._3gram^1.0", "urn.ngram._4gram^1.0" ], "type": "phrase", "operator": "OR", "slop": 0, "prefix_length": 0, "max_expansions": 50, "zero_terms_query": "NONE", "auto_generate_synonyms_phrase_query": true, "fuzzy_transpositions": true, "boost": 1.0 } }

In case of function at line 266, we can put condition to have default boostscore of 10.0 for non-urn field and boostscore of 1.0 for urn field ?

Ok, there are two things to consider here, the relative weights between the fields being used for the query AND the relative weights of the subfields (delimited, ngram, etc). Most likely we should be applying a multiplier. The URN while important as a unique identifier for general search is not as important here so the urn field weight is likely 1.0.

Next lets consider a multiplier for ngram. I am not seeing these being applied ->

datahub/metadata-service/configuration/src/main/resources/application.yaml

Line 240 in b74ba11

wordGram:

The field weights for ngrams should be impacted by that configuration and it doesn't appear to be based on. the example query above.

Generally the weight formula for ngram subfields should be something like:
<field annotation weight> * <configuration ngram factor>

Hi @david-leifker, As suggested done the following changes

add "match" clause for exact matches

make URN field default boostscore to 1.0

make ngram subfields weight as per the formula suggested

add a testcase in class "SampleDataFixtureTestBase.java"

add testdata in containerindex_v2.json.gz

reverted to "bool_prefix" type in "multi_match" clause

simplify the condition at line 222

Please find the final query

{ "multi_match": { "query": "container", "fields": [ "name.delimited^1.0", "name.ngram^10.0", "name.ngram._2gram^12.0", "name.ngram._3gram^15.0", "name.ngram._4gram^18.0", "urn.delimited^1.0", "urn.ngram^1.0", "urn.ngram._2gram^1.0", "urn.ngram._3gram^1.0", "urn.ngram._4gram^1.0" ], "type": "bool_prefix", "operator": "OR", "slop": 0, "prefix_length": 0, "max_expansions": 50, "zero_terms_query": "NONE", "auto_generate_synonyms_phrase_query": true, "fuzzy_transpositions": true, "boost": 1 } }, { "match": { "name.keyword": { "query": "container", "operator": "OR", "prefix_length": 0, "max_expansions": 50, "fuzzy_transpositions": true, "lenient": false, "zero_terms_query": "NONE", "auto_generate_synonyms_phrase_query": true, "boost": 10 } } }

david-leifker

Just the one comment and wanted to point out the 3 failing tests (x2 for elasticsearch & opensearch) which likely need to be updated or taken into consideration.

metadata-io-search > elasticsearch-testcontainers > com.linkedin.metadata.search.elasticsearch.SampleDataFixtureElasticSearchTest > testGroupAutoComplete FAILED
    java.lang.AssertionError: Expected 1 results for `T` found 0 expected [true] but found [false]
        at org.testng.Assert.fail(Assert.java:111)
        at org.testng.Assert.failNotEquals(Assert.java:1578)
        at org.testng.Assert.assertTrue(Assert.java:57)
        at com.linkedin.metadata.search.fixtures.SampleDataFixtureTestBase.lambda$testGroupAutoComplete$23(SampleDataFixtureTestBase.java:915)
        at java.base/java.lang.Iterable.forEach(Iterable.java:75)
        at com.linkedin.metadata.search.fixtures.SampleDataFixtureTestBase.testGroupAutoComplete(SampleDataFixtureTestBase.java:909)
...
metadata-io-search > elasticsearch-testcontainers > com.linkedin.metadata.search.elasticsearch.SampleDataFixtureElasticSearchTest > testGroupAutoComplete FAILED
    java.lang.AssertionError: Expected 1 results for `T` found 0 expected [true] but found [false]
        at org.testng.Assert.fail(Assert.java:111)
        at org.testng.Assert.failNotEquals(Assert.java:1578)
        at org.testng.Assert.assertTrue(Assert.java:57)
        at com.linkedin.metadata.search.fixtures.SampleDataFixtureTestBase.lambda$testGroupAutoComplete$23(SampleDataFixtureTestBase.java:915)
...
metadata-io-search > elasticsearch-testcontainers > com.linkedin.metadata.search.elasticsearch.SampleDataFixtureElasticSearchTest > testUserAutoComplete FAILED
    java.lang.AssertionError: Expected at least 1 results for `D` found 0 expected [true] but found [false]
        at org.testng.Assert.fail(Assert.java:111)
        at org.testng.Assert.failNotEquals(Assert.java:1578)
        at org.testng.Assert.assertTrue(Assert.java:57)
        at com.linkedin.metadata.search.fixtures.SampleDataFixtureTestBase.lambda$testUserAutoComplete$24(SampleDataFixtureTestBase.java:935)
        at java.base/java.lang.Iterable.forEach(Iterable.java:75)
        at com.linkedin.metadata.search.fixtures.SampleDataFixtureTestBase.testUserAutoComplete(SampleDataFixtureTestBase.java:929)

912 tests completed, 6 failed, 71 skipped

> Task :metadata-io:test FAILED

…y_result' into fix_autoCompleteForMultiple_query_result

deepgarg-visa · 2024-10-13T15:49:04Z

Hi @david-leifker , "phrase" type in multi_match clause is not handling query with one and two characters very well. More than that its working fine. Fix the testcases accordingly. But needs your input regarding this

david-leifker · 2024-10-14T15:39:35Z

@deepgarg-visa - those test cases are important, autocomplete needs to properly handle < 3 character queries and is responsible for this function for type-ahead use cases in the UI. We need a solution which supports less than 3 characters and this may involve combining both of the queries (the previous one with the new one) so that both use-cases can work.

deepgarg-visa · 2024-10-15T14:46:13Z

@deepgarg-visa - those test cases are important, autocomplete needs to properly handle < 3 character queries and is responsible for this function for type-ahead use cases in the UI. We need a solution which supports less than 3 characters and this may involve combining both of the queries (the previous one with the new one) so that both use-cases can work.

@david-leifker, I have reverted the changes in testcases

david-leifker

This looks great. Thank you!

deepgarg-visa · 2024-10-15T15:28:52Z

@david-leifker, could you please mark this issue resolved:
#11411

… matches first (datahub-project#11586)

fix(search): make graphql query autoCompleteForMultiple show exact ma…

f5181a8

…tches first

github-actions bot added product PR or Issue related to the DataHub UI/UX community-contribution PR or Issue raised by member(s) of DataHub Community labels Oct 10, 2024

Merge branch 'refs/heads/master' into fix_autoCompleteForMultiple_que…

3910999

…ry_result # Conflicts: # metadata-io/src/main/java/com/linkedin/metadata/search/elasticsearch/query/request/AutocompleteRequestHandler.java

vercel bot deployed to Preview October 10, 2024 16:14 View deployment

Merge branch 'master' into fix_autoCompleteForMultiple_query_result

39ff93a

vercel bot deployed to Preview October 10, 2024 17:53 View deployment

david-leifker self-assigned this Oct 11, 2024

david-leifker reviewed Oct 11, 2024

View reviewed changes

david-leifker requested changes Oct 11, 2024

View reviewed changes

fix(search): fix testcases

0ae7f57

deepgarg-visa closed this Oct 13, 2024

deepgarg-visa reopened this Oct 13, 2024

deepgarg-visa and others added 3 commits October 13, 2024 19:46

Merge branch 'master' into fix_autoCompleteForMultiple_query_result

97f788a

Merge branch 'master' into fix_autoCompleteForMultiple_query_result

1f8a7ce

Merge remote-tracking branch 'origin/fix_autoCompleteForMultiple_quer…

0d38a8b

…y_result' into fix_autoCompleteForMultiple_query_result

vercel bot deployed to Preview October 13, 2024 15:06 View deployment

deepgarg-visa requested a review from david-leifker October 14, 2024 13:28

deepgarg-visa and others added 2 commits October 15, 2024 19:29

Merge branch 'master' into fix_autoCompleteForMultiple_query_result

ba9bcba

implement review comments and code refactor

071d032

vercel bot deployed to Preview October 15, 2024 14:33 View deployment

david-leifker approved these changes Oct 15, 2024

View reviewed changes

david-leifker merged commit b4b9142 into datahub-project:master Oct 15, 2024
45 checks passed

deepgarg-visa deleted the fix_autoCompleteForMultiple_query_result branch October 17, 2024 12:40

epatotski pushed a commit to acryldata/datahub that referenced this pull request Oct 22, 2024

fix(search): make graphql query autoCompleteForMultiple to show exact…

f7da5d6

… matches first (datahub-project#11586)

sleeperdeep pushed a commit to sleeperdeep/datahub that referenced this pull request Dec 17, 2024

fix(search): make graphql query autoCompleteForMultiple to show exact…

ec25585

… matches first (datahub-project#11586)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(search): make graphql query autoCompleteForMultiple to show exact matches first #11586

fix(search): make graphql query autoCompleteForMultiple to show exact matches first #11586

deepgarg-visa commented Oct 10, 2024 •

edited

Loading

david-leifker Oct 11, 2024

deepgarg-visa Oct 11, 2024 •

edited

Loading

david-leifker Oct 14, 2024

deepgarg-visa Oct 15, 2024 •

edited

Loading

david-leifker left a comment

deepgarg-visa commented Oct 13, 2024 •

edited

Loading

david-leifker commented Oct 14, 2024

deepgarg-visa commented Oct 15, 2024 •

edited

Loading

david-leifker left a comment

deepgarg-visa commented Oct 15, 2024

fix(search): make graphql query autoCompleteForMultiple to show exact matches first #11586

fix(search): make graphql query autoCompleteForMultiple to show exact matches first #11586

Conversation

deepgarg-visa commented Oct 10, 2024 • edited Loading

Checklist

david-leifker Oct 11, 2024

Choose a reason for hiding this comment

deepgarg-visa Oct 11, 2024 • edited Loading

Choose a reason for hiding this comment

david-leifker Oct 14, 2024

Choose a reason for hiding this comment

deepgarg-visa Oct 15, 2024 • edited Loading

Choose a reason for hiding this comment

david-leifker left a comment

Choose a reason for hiding this comment

deepgarg-visa commented Oct 13, 2024 • edited Loading

david-leifker commented Oct 14, 2024

deepgarg-visa commented Oct 15, 2024 • edited Loading

david-leifker left a comment

Choose a reason for hiding this comment

deepgarg-visa commented Oct 15, 2024

deepgarg-visa commented Oct 10, 2024 •

edited

Loading

deepgarg-visa Oct 11, 2024 •

edited

Loading

deepgarg-visa Oct 15, 2024 •

edited

Loading

deepgarg-visa commented Oct 13, 2024 •

edited

Loading

deepgarg-visa commented Oct 15, 2024 •

edited

Loading