Optimize search start and search end index computation while finding … #134

Open
wants to merge 1 commit into base: main


amansaryal

While finding the current token, WordTokenizer would run through all MentionSpans in the text to determine the span index closest to the cursor on either side. This is rather wasteful and, in the case of getSearchEndIndex(), even unnecessary.

The idea in getSearchStartIndex() is to use nextSpanTransition() to walk from the start of the text up to the cursor, one batch of spans at a time, and keep track of the closest span end that lies at or before the cursor.

For getSearchEndIndex, nextSpanTransition() does exactly what it needs without ever looking at any MentionSpans.
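
To illustrate the getSearchEndIndex() change without an Android dependency, here is a minimal sketch in plain Java. Everything in it is a stand-in: spans are modeled as `{start, end}` int pairs, and the local `nextSpanTransition` helper only imitates the documented behavior of `Spanned.nextSpanTransition` (first offset greater than `start` where a span begins or ends, or `limit`). The class and method names are hypothetical and are not part of the PR.

```java
final class SearchEndSketch {

    // Stand-in for Spanned.nextSpanTransition(start, limit, type): the first offset
    // greater than start where a span begins or ends, or limit if there is none.
    // Each span is modeled as {start, end}; this is an assumption, not Android code.
    static int nextSpanTransition(int[][] spans, int start, int limit) {
        int next = limit;
        for (int[] span : spans) {
            for (int boundary : span) {
                if (boundary > start && boundary < next) {
                    next = boundary;
                }
            }
        }
        return next;
    }

    // The removed loop: scan every span for the closest start at or after the cursor.
    static int searchEndByScan(int[][] spans, int cursor, int length) {
        int closestAfterCursor = length;
        for (int[] span : spans) {
            int start = span[0];
            if (start < closestAfterCursor && start >= cursor) {
                closestAfterCursor = start;
            }
        }
        return closestAfterCursor;
    }

    // The replacement: a single transition lookup starting at the cursor.
    static int searchEndByTransition(int[][] spans, int cursor, int length) {
        return nextSpanTransition(spans, cursor, length);
    }

    public static void main(String[] args) {
        int[][] spans = {{2, 6}, {10, 15}};
        System.out.println(searchEndByScan(spans, 7, 20));       // 10
        System.out.println(searchEndByTransition(spans, 7, 20)); // 10
    }
}
```

In this model the two versions agree when the cursor sits strictly between spans. Because the transition lookup only reports offsets greater than its start argument, a span that begins exactly at the cursor, or one that contains the cursor, gives a different answer here, so those cases may deserve a test against the real API.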

…token.

Earlier, the code would run through all MentionSpans in the text to determine the span index closest to the cursor on either side. This is rather wasteful and, in the case of getSearchEndIndex(), even unnecessary.

The idea is to use text.nextSpanTransition to iterate sequentially over batches of spans and break out of the loop as soon as the closest index is reached.
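
The batched walk for getSearchStartIndex() can be modeled the same way. This is a rough sketch under stated assumptions, not the PR's code: spans are `{start, end}` int pairs, `nextSpanTransition` and `getSpans` are local stand-ins for the `Spanned` methods (the overlap rule used for `getSpans` is a simplification for non-empty spans and ranges), and the running result is assumed to start at 0.

```java
import java.util.ArrayList;
import java.util.List;

final class SearchStartSketch {

    // Stand-in for Spanned.nextSpanTransition: first boundary greater than start, or limit.
    static int nextSpanTransition(int[][] spans, int start, int limit) {
        int next = limit;
        for (int[] span : spans) {
            for (int boundary : span) {
                if (boundary > start && boundary < next) {
                    next = boundary;
                }
            }
        }
        return next;
    }

    // Stand-in for Spanned.getSpans: spans that overlap [queryStart, queryEnd).
    // Simplified overlap rule; assumed, not copied from Android.
    static List<int[]> getSpans(int[][] spans, int queryStart, int queryEnd) {
        List<int[]> result = new ArrayList<>();
        for (int[] span : spans) {
            if (span[0] < queryEnd && span[1] > queryStart) {
                result.add(span);
            }
        }
        return result;
    }

    // Baseline: scan every span for the closest end at or before the cursor.
    static int searchStartByScan(int[][] spans, int cursor) {
        int closestLastSpanEnd = 0;
        for (int[] span : spans) {
            if (span[1] > closestLastSpanEnd && span[1] <= cursor) {
                closestLastSpanEnd = span[1];
            }
        }
        return closestLastSpanEnd;
    }

    // Batched walk: hop from transition to transition and stop once past the cursor.
    static int searchStartByTransitions(int[][] spans, int cursor, int length) {
        int closestLastSpanEnd = 0;
        int nextSpanStart;
        for (int index = 0; index < cursor; index = nextSpanStart) {
            nextSpanStart = nextSpanTransition(spans, index, length);
            for (int[] span : getSpans(spans, index, nextSpanStart)) {
                if (span[1] > closestLastSpanEnd && span[1] <= cursor) {
                    closestLastSpanEnd = span[1];
                }
            }
        }
        return closestLastSpanEnd;
    }

    public static void main(String[] args) {
        int[][] spans = {{2, 6}, {10, 15}};
        System.out.println(searchStartByScan(spans, 8));            // 6
        System.out.println(searchStartByTransitions(spans, 8, 20)); // 6
    }
}
```

Note that the stand-in `nextSpanTransition` visits every span on each call, so this model says nothing about real-world speed; it only checks that the two strategies return the same index for the sample inputs.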
Comment on lines +380 to 396
// iterate over all spans before the cursor
// we do this by finding the next span transition and looking back to find the closest span to the cursor
int nextSpanStart;
for (int searchStartIndex = 0; searchStartIndex < cursor; searchStartIndex = nextSpanStart) {

    // find the next span transition
    nextSpanStart = text.nextSpanTransition(searchStartIndex, text.length(), MentionSpan.class);

    // get the spans from searchStartIndex to nextSpanStart
    // of these, we find the closest span to the cursor
    MentionSpan[] closestLastSpans = text.getSpans(searchStartIndex, nextSpanStart, MentionSpan.class);
    for (MentionSpan span : closestLastSpans) {
        int end = text.getSpanEnd(span);
        if (end > closestLastSpanEnd && end <= cursor) {
            closestLastSpanEnd = end;
        }
    }
Contributor

Not sure that this is actually faster given that the new code has more calls to get the spans for different substrings and to compute next span transitions (which internally will loop through all the spans, e.g. see this).

It is also worth noting that the new code is a fair bit longer/more complicated.

I'd like to understand the context for changing this. What is the reason for these changes? Are you running into measurable performance issues in your app? Have you benchmarked these methods? More info would be helpful. If we're going to make the code more complicated, we need to understand what we're getting in return.

Comment on lines -414 to -421
MentionSpan[] spans = text.getSpans(0, text.length(), MentionSpan.class);
int closestAfterCursor = text.length();
for (MentionSpan span : spans) {
    int start = text.getSpanStart(span);
    if (start < closestAfterCursor && start >= cursor) {
        closestAfterCursor = start;
    }
}
Contributor

This is nifty, I didn't know about the nextSpanTransition(..) method when I wrote this many years ago, very nice! 👍

2 participants