ISSUE-263: Improve HL for JOINS + Phrase Linking. #264

Merged: 12 commits merged on Apr 14, 2023


DiegoPino
Member

see #263

@aksm @alliomeria this adds phrase linking to our advanced search, makes JOIN (OCR + ADO) highlighting more realistic by switching to OR just for the highlight (because our AND may have matched with some terms found in the metadata and others in the OCR, so that only in total did ALL terms match), and makes this Solr 9.x aware by setting the original highlight method as the default.
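A minimal Python sketch (not the module's PHP) of why the JOIN highlight falls back to OR. It assumes a plain word tokenizer in place of Solr's analysis chain; the sample metadata, OCR pages, and the helpers `and_matches` and `or_highlight_query` are hypothetical.

```python
import re


def tokens(text):
    """Lowercased word tokens; a crude stand-in for Solr analysis."""
    return set(re.findall(r"\w+", text.lower()))


def and_matches(docs, terms):
    """AND semantics evaluated over the union of all joined documents."""
    seen = set().union(*(tokens(d) for d in docs))
    return all(t.lower() in seen for t in terms)


def or_highlight_query(terms):
    """Build an OR clause used only for highlighting (phrases get quoted)."""
    return " OR ".join(f'"{t}"' if " " in t else t for t in terms)


metadata = "Wild pumpkin harvest, 1923"
ocr_pages = ["The field was full of pumpkins.", "A wild storm came."]
terms = ["wild", "harvest", "storm"]

# The joined ADO (metadata + OCR) satisfies the AND query in aggregate...
print(and_matches([metadata, *ocr_pages], terms))
# ...but no single OCR page contains every term, so an AND highlight
# against individual pages would come back empty.
print(any(and_matches([p], terms) for p in ocr_pages))
print(or_highlight_query(terms))
```

The OR clause only drives the highlighting. Which ADOs match is still decided by the original AND query.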

Still, I'm STILL highlighting pieces of the OCR: "Wild pumpkin" will link that exact phrase, but if wild and pumpkin appear separately in the same sentence, those will be bolded too. Solr is returning that, so I need to find a way of post-processing it so our friends and partners don't come back and ask me to do it later.
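One possible post-processing approach, sketched in Python. It assumes Solr's default `<em>`/`</em>` highlight markers and plain-text snippets. The approach strips every marker, then re-wraps only exact occurrences of the phrase. `keep_phrase_highlights` is hypothetical, and it deliberately drops stemmed matches such as "wild pumpkins" that Solr may have highlighted.

```python
import re

# Assumption: the default highlight markers were not overridden.
MARKER = re.compile(r"</?em>")


def keep_phrase_highlights(snippet, phrase):
    """Unwrap every Solr highlight, then re-highlight only exact phrase matches."""
    plain = MARKER.sub("", snippet)
    words = [re.escape(w) for w in phrase.split()]
    # Words of the phrase must be adjacent (only whitespace between them).
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"<em>{m.group(0)}</em>", plain)


snippet = ("The <em>wild</em> <em>pumpkin</em> grew near a "
           "<em>wild</em> dog and one <em>pumpkin</em>.")
print(keep_phrase_highlights(snippet, "Wild pumpkin"))
# The <em>wild pumpkin</em> grew near a wild dog and one pumpkin.
```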

WIP, more soon

DiegoPino self-assigned this Apr 6, 2023
DiegoPino added the Drupal Views (JSON Integration with VIEWS), Typed Data and Search, Strawberry Flavor (Post Processing data extracted that goes into Solr), Search API, F around and find out, and UI/UX (Experience) labels Apr 6, 2023
DiegoPino added this to the 1.1.0 milestone Apr 6, 2023
DiegoPino added 11 commits April 9, 2023 17:41
This is important, especially when dealing with additional conditions and negations on the actual JOIN query.
Why? Because some people might want ONLY the exact terms passed via the input to be highlighted, and others might like the serendipity that stemming/ngram/partial words (basically trusting Solr) might provide. In any case, if the new option (setting highlight_backend_use_keys) is enabled, I will make an extra effort to avoid "duplicating" words that might already be present in the input. E.g. the phrase "Diego is tired" will remove diego / tired as individual words from the actual linked highlight. As you requested, @alliomeria!
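A small Python sketch of the de-duplication described for `highlight_backend_use_keys`. It assumes the input keys contain double-quoted phrases plus bare words. `dedupe_highlight_keys` and this quoting convention are assumptions, not the module's actual parser.

```python
import re

PHRASE = re.compile(r'"([^"]+)"')


def dedupe_highlight_keys(keys):
    """Return quoted phrases plus the bare words not already covered by a phrase."""
    phrases = PHRASE.findall(keys)
    bare = PHRASE.sub(" ", keys).split()
    covered = {w.lower() for p in phrases for w in p.split()}
    kept, seen = [], set()
    for word in bare:
        lw = word.lower()
        # Skip words already inside a phrase, and repeated bare words.
        if lw in covered or lw in seen:
            continue
        seen.add(lw)
        kept.append(word)
    return phrases + kept


print(dedupe_highlight_keys('"Diego is tired" diego tired pumpkin'))
# ['Diego is tired', 'pumpkin']
```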
Learn this, remember this, never forget. UNDOCUMENTED EVERYWHERE
Makes queries in chunks. Also, weird Drupal behavior: if the query has any processing, it is statically cached, and even though "there are docs" saying that $query->getOriginalQuery() will give me one I can reuse for a further query (in this case offsetting the results), eventually the clone gets polluted. Cloning over and over adds memory because the results are referenced in each clone. So the trick here is $query->setProcessingLevel(QueryInterface::PROCESSING_NONE); to avoid any static processing. This aggregation is now faster: we only get what we need. It is the ONLY way to keep AND queries from ending up excluding either Metadata or OCR, because no page will match EXACTLY all queries, nor will the metadata.
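A Python sketch of the chunked paging idea. It assumes a hypothetical `run_query(offset, limit)` callable that builds a brand-new, unprocessed request on every call, which is the rough analogue of starting from `PROCESSING_NONE` instead of reusing a processed clone. OCR page hits are assumed to carry a hypothetical `parent_id` pointing at their ADO.

```python
from typing import Callable, Iterator, List

Hit = dict


def iter_chunks(run_query: Callable[[int, int], List[Hit]],
                chunk_size: int = 100) -> Iterator[List[Hit]]:
    """Yield result chunks until the backend runs out of hits."""
    offset = 0
    while True:
        # run_query is expected to build a fresh request on each call,
        # never a mutated or cached clone of an already executed query.
        chunk = run_query(offset, chunk_size)
        if not chunk:
            return
        yield chunk
        if len(chunk) < chunk_size:
            return
        offset += chunk_size


def aggregate_parents(run_query, chunk_size=100):
    """Collect the distinct parent ADO ids across all chunks."""
    parents = set()
    for chunk in iter_chunks(run_query, chunk_size):
        parents.update(hit["parent_id"] for hit in chunk)
    return parents


# Fake backend: 250 OCR page hits spread across three ADOs.
hits = [{"page": i, "parent_id": f"ado-{i % 3}"} for i in range(250)]
offsets_seen = []


def run_query(offset, limit):
    offsets_seen.append(offset)
    return hits[offset:offset + limit]


print(sorted(aggregate_parents(run_query)))
```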
the actual fields we don't want highlighted (e.g. the aggregated one), to avoid extra backend time/effort/processing
because we now send this as a preprocessor option (for the advanced highlight), we already did the deed of cleaning up the fields.

This is a mix of $hl->setRequireFieldMatch(TRUE); but still against a LIST of fields we are passing (allowing us, for example, to remove the expensive-to-process aggregated field from the highlight while still matching against it) AND still get highlights directly from Flavors afterwards.
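A sketch of roughly what that corresponds to in raw Solr request parameters. `hl`, `hl.fl`, `hl.requireFieldMatch` and `hl.method` are standard Solr highlighting parameters. Solr 9 defaults `hl.method` to `unified`, which is why `original` is pinned here. The helper and the field names are hypothetical.

```python
from urllib.parse import urlencode


def highlight_params(query_fields, exclude, method="original"):
    """Highlight an explicit list of fields; excluded fields (e.g. the
    aggregated one) can still be searched, they are just never highlighted."""
    hl_fields = [f for f in query_fields if f not in exclude]
    return {
        "hl": "true",
        "hl.method": method,              # pin the highlighter (Solr 9 default is unified)
        "hl.fl": ",".join(hl_fields),     # explicit list, aggregated field left out
        "hl.requireFieldMatch": "true",   # only highlight terms matched in that same field
    }


params = highlight_params(
    ["title", "ocr_text", "aggregated_fulltext"],  # hypothetical field names
    exclude={"aggregated_fulltext"},
)
print(urlencode(params))
```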
Why cache a query result? And why does this happen only sometimes? The event subscribers DON'T HAVE THIS ISSUE. Is it because the number of results is large?
Anyway, making this general. Paging simply does not work unless I add:
- PROCESSING_NONE + $query = $query->getOriginalQuery();
…umber

I believe relevancy IS the key but hey, not my repos, not my fleeeees!
DiegoPino merged commit 50144f6 into 1.1.0 Apr 14, 2023

1 participant