Hey Ryan, yeah, I agree this seems pretty brittle. The assumption in compromise is that the text is correct, but maybe we should allow for sloppier inputs when the meaning is clear. One thing you can do is swap out the sentence tokenizer for a more tolerant one. The sentence tokenizer is here if you wanted to fork it. Not sure what the best approach is.
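A lighter-weight alternative to forking the tokenizer might be to clean up the text before it reaches compromise. Here is a minimal sketch, assuming the only problem is a missing space after punctuation. The helper name addMissingSpaces is made up for illustration and is not part of compromise, and the rules are deliberately conservative so that strings like "example.com" or "e.g." are left alone.

```javascript
// Hypothetical pre-processing helper, NOT a compromise API.
// It re-inserts a space that was dropped after punctuation.
function addMissingSpaces(text) {
  return text
    // Sentence end: lowercase letter, then . ! or ?, then an uppercase letter.
    // Requiring the case change avoids touching "example.com" or "e.g.".
    .replace(/(\p{Ll})([.!?])(?=\p{Lu})/gu, '$1$2 ')
    // Clause punctuation: a letter, then , ; or :, then another letter.
    // Requiring letters on both sides avoids touching numbers like "1,000".
    .replace(/(\p{L})([,;:])(?=\p{L})/gu, '$1$2 ');
}

console.log(addMissingSpaces('Hello.Welcome.'));
console.log(addMissingSpaces('I have apples,bananas and pears.'));
```

The cleaned string can then be passed to nlp() as usual, at which point match() should be able to see "Welcome" as its own word. The trade-off is that this changes the input text, so character offsets will no longer line up with the original.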
-
I have some text where the user has accidentally left out a space between punctuation and words. Is Compromise able to detect words separated like this?
For example:
Hello.Welcome.
I have apples,bananas and pears.
Example code:
The result of nlpProcessedText.document is:
Then if I wanted to use the match() function to search for the word "Welcome", it cannot find it because it only recognises "Hello.Welcome" as a word.
Is this a bug, or is there a way to use Compromise to extract the separate words from this example text?