-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(search): make graphql query autoCompleteForMultiple to show exact matches first #11586
fix(search): make graphql query autoCompleteForMultiple to show exact matches first #11586
Conversation
…ry_result # Conflicts: # metadata-io/src/main/java/com/linkedin/metadata/search/elasticsearch/query/request/AutocompleteRequestHandler.java
if (field != null && !field.isEmpty()) { | ||
return ImmutableList.of(field); | ||
return ImmutableList.of(Pair.of(field, "2.0")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is hardcoded to 2.0 and looks like it treats URN matches and non-URN matches the same. Previously it weighted URNs higher which probably doesn't make sense for auto-complete purposes, but now the conditional on line 220 is the same for both conditions. So we should probably simplify the conditional and likely remove the boostScore entirely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I checked, we can't remove the boostscore entirely. We need to have higher boost score for ngram fields for non-urn fields in "multi_match" clause to get the expected result. I would suggest that we can have boostscore for non-urn fields from their respective searchable spec(which I already did) and urn field can have only default boost score (1.0), as you have rightly mention that it doesn't make sense for auto-complete purposes.
Please find the expected multi_match clause
{ "multi_match": { "query": "aucdl", "fields": [ "name.delimited^1.0", "name.ngram^10.0", "name.ngram._2gram^10.0", "name.ngram._3gram^10.0", "name.ngram._4gram^10.0", "urn.delimited^1.0", "urn.ngram^1.0", "urn.ngram._2gram^1.0", "urn.ngram._3gram^1.0", "urn.ngram._4gram^1.0" ], "type": "phrase", "operator": "OR", "slop": 0, "prefix_length": 0, "max_expansions": 50, "zero_terms_query": "NONE", "auto_generate_synonyms_phrase_query": true, "fuzzy_transpositions": true, "boost": 1.0 } }
In case of function at line 266, we can put condition to have default boostscore of 10.0 for non-urn field and boostscore of 1.0 for urn field ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, there are two things to consider here, the relative weights between the fields being used for the query AND the relative weights of the subfields (delimited, ngram, etc). Most likely we should be applying a multiplier. The URN while important as a unique identifier for general search is not as important here so the urn field weight is likely 1.0.
Next lets consider a multiplier for ngram. I am not seeing these being applied ->
wordGram: |
The field weights for ngrams should be impacted by that configuration and it doesn't appear to be based on. the example query above.
Generally the weight formula for ngram subfields should be something like:
<field annotation weight>
* <configuration ngram factor>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @david-leifker, As suggested done the following changes
- add "match" clause for exact matches
- make URN field default boostscore to 1.0
- make ngram subfields weight as per the formula suggested
- add a testcase in class "SampleDataFixtureTestBase.java"
- add testdata in containerindex_v2.json.gz
- reverted to "bool_prefix" type in "multi_match" clause
- simplify the condition at line 222
Please find the final query
{ "multi_match": { "query": "container", "fields": [ "name.delimited^1.0", "name.ngram^10.0", "name.ngram._2gram^12.0", "name.ngram._3gram^15.0", "name.ngram._4gram^18.0", "urn.delimited^1.0", "urn.ngram^1.0", "urn.ngram._2gram^1.0", "urn.ngram._3gram^1.0", "urn.ngram._4gram^1.0" ], "type": "bool_prefix", "operator": "OR", "slop": 0, "prefix_length": 0, "max_expansions": 50, "zero_terms_query": "NONE", "auto_generate_synonyms_phrase_query": true, "fuzzy_transpositions": true, "boost": 1 } }, { "match": { "name.keyword": { "query": "container", "operator": "OR", "prefix_length": 0, "max_expansions": 50, "fuzzy_transpositions": true, "lenient": false, "zero_terms_query": "NONE", "auto_generate_synonyms_phrase_query": true, "boost": 10 } } }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just the one comment and wanted to point out the 3 failing tests (x2 for elasticsearch & opensearch) which likely need to be updated or taken into consideration.
metadata-io-search > elasticsearch-testcontainers > com.linkedin.metadata.search.elasticsearch.SampleDataFixtureElasticSearchTest > testGroupAutoComplete FAILED
java.lang.AssertionError: Expected 1 results for `T` found 0 expected [true] but found [false]
at org.testng.Assert.fail(Assert.java:111)
at org.testng.Assert.failNotEquals(Assert.java:1578)
at org.testng.Assert.assertTrue(Assert.java:57)
at com.linkedin.metadata.search.fixtures.SampleDataFixtureTestBase.lambda$testGroupAutoComplete$23(SampleDataFixtureTestBase.java:915)
at java.base/java.lang.Iterable.forEach(Iterable.java:75)
at com.linkedin.metadata.search.fixtures.SampleDataFixtureTestBase.testGroupAutoComplete(SampleDataFixtureTestBase.java:909)
...
metadata-io-search > elasticsearch-testcontainers > com.linkedin.metadata.search.elasticsearch.SampleDataFixtureElasticSearchTest > testGroupAutoComplete FAILED
java.lang.AssertionError: Expected 1 results for `T` found 0 expected [true] but found [false]
at org.testng.Assert.fail(Assert.java:111)
at org.testng.Assert.failNotEquals(Assert.java:1578)
at org.testng.Assert.assertTrue(Assert.java:57)
at com.linkedin.metadata.search.fixtures.SampleDataFixtureTestBase.lambda$testGroupAutoComplete$23(SampleDataFixtureTestBase.java:915)
...
metadata-io-search > elasticsearch-testcontainers > com.linkedin.metadata.search.elasticsearch.SampleDataFixtureElasticSearchTest > testUserAutoComplete FAILED
java.lang.AssertionError: Expected at least 1 results for `D` found 0 expected [true] but found [false]
at org.testng.Assert.fail(Assert.java:111)
at org.testng.Assert.failNotEquals(Assert.java:1578)
at org.testng.Assert.assertTrue(Assert.java:57)
at com.linkedin.metadata.search.fixtures.SampleDataFixtureTestBase.lambda$testUserAutoComplete$24(SampleDataFixtureTestBase.java:935)
at java.base/java.lang.Iterable.forEach(Iterable.java:75)
at com.linkedin.metadata.search.fixtures.SampleDataFixtureTestBase.testUserAutoComplete(SampleDataFixtureTestBase.java:929)
912 tests completed, 6 failed, 71 skipped
> Task :metadata-io:test FAILED
…y_result' into fix_autoCompleteForMultiple_query_result
Hi @david-leifker , "phrase" type in multi_match clause is not handling query with one and two characters very well. More than that its working fine. Fix the testcases accordingly. But needs your input regarding this |
@deepgarg-visa - those test cases are important, autocomplete needs to properly handle < 3 character queries and is responsible for this function for type-ahead use cases in the UI. We need a solution which supports less than 3 characters and this may involve combining both of the queries (the previous one with the new one) so that both use-cases can work. |
@david-leifker, I have reverted the changes in testcases |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. Thank you!
@david-leifker, could you please mark this issue resolved: |
This PR is targeting the below Issue
#11411
Checklist