replaceWordWith doesn't work if word has spelling error (underlined in red) #95

Open
dallanmc opened this issue Dec 3, 2021 · 1 comment

dallanmc commented Dec 3, 2021

As per the title, words that are underlined in red (indicating a spelling error, i.e. that they are not in the dictionary) will not be replaced via a replaceWordWith comment.

How to reproduce:
Create a blank document and type the following sentence:

This is a testg document

*Note that the word "test" has been deliberately misspelled.

Now add a comment on the word "testg" and in the comment type

replaceWordWith(name)

where "name" is an attribute in your data context.

Now generate the document. You will see that the output is unchanged from the input and that the word replacement has not occurred.

Now go back to the doc and correct the spelling error (either by removing the "g" or by adding "testg" to the dictionary).
Re-run the doc processor and you will see that the word is now replaced by the value of the "name" attribute in the context.

I looked into the code and this is happening because of the proofErr elements in the XML. Docx-stamper expects the word's run to come immediately after the commentRangeStart; any other element straight after it throws the scan off and the comment is ignored.

			<w:commentRangeStart w:id="0"/>
			<w:proofErr w:type="spellStart"/>
			<w:r>
				<w:t>testg</w:t>
			</w:r>
			<w:commentRangeEnd w:id="0"/>
			<w:proofErr w:type="spellEnd"/>

One potential fix is to ignore the proofErr elements when processing the comments. This can be done by changing the getCommentAround method of the CommentUtil class:

<snip>
				for (Object contentElement : parent.getContent()) {

					// ignore ProofErr elements. These indicate spelling mistakes
					if (XmlUtils.unwrap(contentElement) instanceof ProofErr) {
						continue;
					}

					// so first we look for the start of the comment
					if (XmlUtils.unwrap(contentElement) instanceof CommentRangeStart) {
						possibleComment = (CommentRangeStart) contentElement;
					}

</snip>

I have tested this and it works.
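
To illustrate why skipping the proofErr elements is enough, here is a minimal, self-contained model of the scan. It is not the docx-stamper code: the `CommentScanModel` class and its `Kind` enum are hypothetical stand-ins for the sibling elements of a paragraph, and the state machine only assumes the behaviour described above, namely that any unexpected sibling between commentRangeStart and the run resets the search.

```java
import java.util.List;

/** Simplified model of the comment scan; NOT the docx-stamper implementation. */
final class CommentScanModel {

    /** Hypothetical stand-ins for the sibling elements inside a paragraph. */
    enum Kind { COMMENT_RANGE_START, PROOF_ERR, RUN, COMMENT_RANGE_END }

    /**
     * Returns true if a RUN is found directly between COMMENT_RANGE_START and
     * COMMENT_RANGE_END. Any other sibling in between resets the search,
     * unless skipProofErr is set, in which case PROOF_ERR siblings are ignored.
     */
    static boolean findsCommentedRun(List<Kind> siblings, boolean skipProofErr) {
        boolean started = false;
        boolean sawRun = false;
        for (Kind kind : siblings) {
            if (skipProofErr && kind == Kind.PROOF_ERR) {
                continue; // the proposed fix: spelling markers are invisible to the scan
            }
            if (kind == Kind.COMMENT_RANGE_START) {
                started = true;
                sawRun = false;
            } else if (started && !sawRun && kind == Kind.RUN) {
                sawRun = true;
            } else if (started && sawRun && kind == Kind.COMMENT_RANGE_END) {
                return true;
            } else {
                started = false; // unexpected sibling: restart the search
                sawRun = false;
            }
        }
        return false;
    }
}
```

Feeding it the sibling order from the XML above (commentRangeStart, proofErr, run, commentRangeEnd, proofErr) finds no commented run when `skipProofErr` is false and finds one when it is true.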

Obviously this issue is not a big deal if you are aware of the constraint (you can, after all, just add the misspelled word to the dictionary), but if someone else is authoring the templates, they will be scratching their heads at this and wondering why their word to be replaced isn't actually being replaced.

I can, of course, create a pull request for this, if people think the fix above is the correct approach.


dallanmc commented Jan 18, 2022

Just an update to the above: I decided to add a pre-processor (code that processes the WordprocessingMLPackage document object before docx-stamper does its thing).

The pre-processor basically strips out every single ProofErr object in the document:

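import java.util.List;

import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.wml.ContentAccessor;
import org.docx4j.wml.ProofErr;

// IPreProcessor and getProofErrsFromObject are not shown in this comment.
public class SpellCheckPreProcessor implements IPreProcessor {
    @Override
    public void process(WordprocessingMLPackage document) {
        // Collect every ProofErr (spelling/grammar marker) in the document.
        List<ProofErr> proofErrsFromDocument = getProofErrsFromObject(document);

        // Detach each marker from its parent so the run directly follows commentRangeStart.
        for (ProofErr proofErr : proofErrsFromDocument) {
            ((ContentAccessor) proofErr.getParent()).getContent().remove(proofErr);
        }
    }
}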
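
The getProofErrsFromObject helper is not shown above. As a separate illustration of the same idea, the sketch below strips proofErr elements from raw WordprocessingML using only the JDK's DOM API rather than docx4j objects. The `ProofErrStripper` class and its method names are assumptions made for this example; it is a different technique from the pre-processor above, not a reconstruction of it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Illustrative only: removes every w:proofErr element from a WordprocessingML DOM. */
final class ProofErrStripper {

    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Detaches all proofErr elements from the document and returns how many were removed. */
    static int strip(Document doc) {
        NodeList found = doc.getElementsByTagNameNS(W_NS, "proofErr");
        // The NodeList is live, so take a snapshot before removing anything.
        Node[] snapshot = new Node[found.getLength()];
        for (int i = 0; i < snapshot.length; i++) {
            snapshot[i] = found.item(i);
        }
        for (Node proofErr : snapshot) {
            proofErr.getParentNode().removeChild(proofErr);
        }
        return snapshot.length;
    }

    /** Parses an XML string with namespace support; wraps any parser failure. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

Applied to the paragraph fragment from the first comment, this leaves the run as the element immediately after commentRangeStart, which is the layout the comment scan expects.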

I also created a pre-processor to merge runs that share a style. I found that certain documents I was dealing with would not process properly because the variables were split across different runs even though the style was exactly the same. The style merge pre-processor goes through the whole document and merges adjacent runs into a single run if their styles are the same, turning this:

						<w:r w:rsidR="007653C0">
							<w:rPr>
								<w:b/>
								<w:sz w:val="22"/>
							</w:rPr>
							<w:t>${</w:t>
						</w:r>
						<w:r w:rsidR="001B08AA">
							<w:rPr>
								<w:b/>
								<w:sz w:val="22"/>
							</w:rPr>
							<w:t>firstName}</w:t>
						</w:r>

into this:

						<w:r w:rsidR="007653C0">
							<w:rPr>
								<w:b/>
								<w:sz w:val="22"/>
							</w:rPr>
							<w:t>${firstName}</w:t>
						</w:r>

Another advantage of this style merge pre-processor is that it lets you select multiple words for a replaceWordWith comment. So you can have a comment around "first name" instead of needing it to be "firstName". It may not seem like a big deal, but it means there is less explaining to do to document authors and it gives them more freedom.
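
The merge step itself can be sketched with a simplified model. Here a run is reduced to a style key plus its text; the `RunMergeModel` class, the `Run` type, and the string style key are all hypothetical, whereas a real pre-processor would have to compare the w:rPr of docx4j run objects. Adjacent runs with equal style keys are concatenated, and runs with different styles are left alone.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of merging adjacent runs; not the actual pre-processor. */
final class RunMergeModel {

    /** Hypothetical stand-in for a w:r element: a style key plus its text. */
    static final class Run {
        final String style;
        final String text;

        Run(String style, String text) {
            this.style = style;
            this.text = text;
        }
    }

    /** Returns a new list in which adjacent runs with equal style are merged. */
    static List<Run> mergeAdjacent(List<Run> runs) {
        List<Run> merged = new ArrayList<>();
        for (Run run : runs) {
            int last = merged.size() - 1;
            if (last >= 0 && merged.get(last).style.equals(run.style)) {
                // Same style as the previous run: append the text to it.
                Run previous = merged.get(last);
                merged.set(last, new Run(previous.style, previous.text + run.text));
            } else {
                merged.add(run);
            }
        }
        return merged;
    }
}
```

With two bold, size-22 runs holding "${" and "firstName}", the model yields a single run with the text "${firstName}", matching the before and after XML above.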
