ISSUE-263: Improve HL for JOINS + Phrase Linking. #264
@aksm @alliomeria this adds phrase linking to our advanced search and makes the JOIN (OCR + ADO) highlight more realistic by switching to OR just for the highlight (our AND may have matched because some terms matched metadata and others matched OCR, so in total ALL matched). It also makes us Solr 9.x aware by setting the original highlight method as the default. Still, I'm STILL highlighting pieces of the OCR: "Wild pumpkin" will link that exact phrase, but if wild and pumpkin also appear in the same sentence, those will be bolded too. Solr is returning that, so I need to find a way of post-processing it so our friends and partners don't come back and ask me to do it later.
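To make the AND-for-matching, OR-for-highlighting idea concrete, here is a minimal Python sketch of the request parameters. It is not the module's PHP code. It assumes the highlight query is sent through Solr's `hl.q` parameter, and the helper name `build_solr_params` and the field name `ocr_text` are hypothetical.

```python
def build_solr_params(terms, highlight_fields):
    """Build match and highlight parameters for a JOIN-style search.

    Assumption: the highlight query travels in Solr's `hl.q` parameter;
    the PR itself may wire this differently.
    """
    def quote(term):
        # Multi-word terms are sent as exact phrases.
        return f'"{term}"' if " " in term else term

    quoted = [quote(t) for t in terms]
    return {
        # Every term must match somewhere (metadata or OCR).
        "q": " AND ".join(quoted),
        "hl": "true",
        # Any term may be highlighted wherever it appears.
        "hl.q": " OR ".join(quoted),
        # Solr 9 defaults to the unified highlighter; request the original one.
        "hl.method": "original",
        "hl.fl": ",".join(highlight_fields),
    }


params = build_solr_params(["wild pumpkin", "harvest"], ["ocr_text"])
print(params["q"])
print(params["hl.q"])
```

With this split, a page whose OCR contains only one of the terms can still show a highlight, even though the overall match required all of them.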
DiegoPino added the Drupal Views, JSON Integration with VIEWS, Typed Data and Search, Strawberry Flavor, Post Processing data extracted that goes into Solr, Search API, F around and find out, UI/UX, Experience labels on Apr 6, 2023
This is important, especially when dealing with additional conditions and negations on the actual JOIN query.
Why? Because some people might want ONLY the exact terms passed via the input to be highlighted, while others might like the serendipity that stemming/ngram/partial words (basically trusting Solr) can provide. In any case, if the new option (setting highlight_backend_use_keys) is enabled, I will make an extra effort to avoid "duplicating" words that are already present in the input. E.g. for the input "Diego is tired" tired, diego and tired are removed as individual words from the actual linked highlight. As you requested, @alliomeria!
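The de-duplication described above can be sketched roughly as follows. This is an illustrative Python version, not the module's implementation; the function name `highlight_keys` and the simple quote-based parsing are assumptions.

```python
import re


def highlight_keys(user_input):
    """Return the phrases and words to link, dropping single words
    that already occur inside a quoted phrase."""
    # Quoted segments are treated as exact phrases.
    phrases = re.findall(r'"([^"]+)"', user_input)
    # Whatever is left outside the quotes is treated as loose words.
    loose = re.sub(r'"[^"]*"', " ", user_input).split()
    phrase_words = {w.lower() for p in phrases for w in p.split()}

    keys = list(phrases)
    for word in loose:
        # Skip loose words already covered by a phrase, and repeats.
        if word.lower() not in phrase_words and word not in keys:
            keys.append(word)
    return keys


print(highlight_keys('"Diego is tired" tired'))
print(highlight_keys('"Diego is tired" diego coffee'))
```

In the first call only the phrase survives; in the second, `coffee` is kept because it does not appear inside the phrase.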
Learn this, remember this, never forget. UNDOCUMENTED EVERYWHERE
Makes queries in chunks. Also, weird Drupal: if the query has any processing, it is statically cached, and even though "there are docs" saying that $query->getOriginalQuery() will give me one I can reuse for a further query (in this case, offsetting the results), the clone eventually gets polluted. Cloning over and over also adds memory, because the results are referenced in each clone. So the trick here is to call $query->setProcessingLevel(QueryInterface::PROCESSING_NONE); to avoid any static processing. This aggregation is now faster: we only get what we need. It is the ONLY way of keeping AND queries from excluding either Metadata or OCR, because no page will match EXACTLY all queries, nor will the metadata.
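The chunking pattern itself, independent of Drupal, might look like the sketch below. It is plain Python under stated assumptions: `run_query(offset, limit)` is a hypothetical callable standing in for "build a fresh, unprocessed query and execute it", so no state from an earlier chunk can leak into the next one.

```python
def fetch_all(run_query, chunk_size=50):
    """Collect every result by paging with a fresh query per chunk.

    `run_query(offset, limit)` is a hypothetical callable that builds a
    new query each time and returns a list of results.
    """
    results = []
    offset = 0
    while True:
        chunk = run_query(offset, chunk_size)
        results.extend(chunk)
        # A short (or empty) chunk means there is nothing left to fetch.
        if len(chunk) < chunk_size:
            break
        offset += chunk_size
    return results


# Stand-in backend with 120 fake documents.
docs = list(range(120))
all_docs = fetch_all(lambda offset, limit: docs[offset:offset + limit])
print(len(all_docs))
```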
The actual fields we don't want to be highlighted (e.g. the aggregated one), to avoid extra backend time/effort/processing.
Because we now send this as a preprocessor option (for the advanced highlight), we already did the deed of cleaning up the fields. This is a mix of $hl->setRequireFieldMatch(TRUE); but still against a LIST of fields we are passing, which allows us, for example, to remove the expensive-to-process aggregated field from the highlight (while still matching against it) AND still get highlights directly from Flavors afterwards.
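As a rough illustration of highlighting against an explicit field list, the Python sketch below drops the expensive field from Solr's `hl.fl` while leaving the set of searched fields untouched. The field names and the helper `highlight_field_params` are hypothetical, and how the module actually combines this with setRequireFieldMatch is not shown here.

```python
def highlight_field_params(searched_fields, skip_highlight):
    """Keep matching on every searched field, highlight only some of them."""
    # Preserve order, drop the fields that are too costly to highlight.
    hl_fields = [f for f in searched_fields if f not in skip_highlight]
    return {
        "searched": list(searched_fields),
        "hl.fl": ",".join(hl_fields),
    }


# Hypothetical field names for metadata, OCR, and the aggregated field.
fields = ["title", "ocr_text", "aggregated_fulltext"]
hl = highlight_field_params(fields, {"aggregated_fulltext"})
print(hl["hl.fl"])
```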
Why cache a query result? And why does this happen only sometimes? The event subscribers DON'T HAVE THIS ISSUE. Is it because the number of results is large? Anyway, making this general: paging simply does not work if I don't add PROCESSING_NONE + $query = $query->getOriginalQuery();
…umber I believe relevancy IS the key but hey, not my repos, not my fleeeees!
see #263
WIP, more soon