draft: parser interpretation improvements #1487

Draft
wants to merge 5 commits into
base: master

missinglink

@missinglink missinglink commented Sep 1, 2020

draft: parser interpretation improvements

This will be paired with changes to pelias/parser, which will need to be merged first.

cases.push(['Kaschk Be', { subject: 'Kaschk Be' }, true]);
cases.push(['Kaschk Ber', { subject: 'Kaschk Ber' }, true]);
cases.push(['Kaschk Berl', { subject: 'Kaschk Berl' }, true]);
cases.push(['Kaschk Berli', { subject: 'Kaschk Berli' }, true]);

@missinglink missinglink Sep 1, 2020

I'd like to test that Pelias still finds the venue with a partially complete city name, as before:

Screenshot 2020-09-01 at 11 11 34

// this constant defines a lower boundary for the solution score returned
// by the Pelias parser. Any solutions which scored lower than this value
// will simply have their entire body returned as the $subject
const MIN_ACCEPTABLE_SCORE = 0.3;

Users would probably appreciate this being tunable via a config option.
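
A minimal sketch of what a tunable threshold could look like. The config key `minAcceptableScore`, the helper names, and the shape of the `solution` object are all assumptions made for illustration; none of them are taken from this PR or from pelias/config.

```javascript
// Hypothetical sketch: `minAcceptableScore`, `resolveMinScore` and
// `interpret` are illustrative names, not part of this PR.
const DEFAULT_MIN_ACCEPTABLE_SCORE = 0.3;

// Read the threshold from a plain config object, falling back to the
// default when the value is missing or is not a finite number.
function resolveMinScore(config) {
  const value = config && config.minAcceptableScore;
  return Number.isFinite(value) ? value : DEFAULT_MIN_ACCEPTABLE_SCORE;
}

// Return the parsed fields when the solution scores at or above the
// threshold; otherwise return the entire input text as the subject.
// The `{ score, fields }` shape of `solution` is an assumption.
function interpret(text, solution, config) {
  const minScore = resolveMinScore(config);
  if (!solution || solution.score < minScore) {
    return { subject: text };
  }
  return solution.fields;
}
```

With an empty config the default of 0.3 applies, so a solution scoring 0.2 falls back to `{ subject: text }`; lowering the hypothetical `minAcceptableScore` to 0.1 would let the same solution through.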

@missinglink

Paired with pelias/parser#120 these tests all pass 🎉

@@ -100,33 +142,31 @@ function parse (clean) {
.map((c, i) => (mask[i] !== 'P') ? c : ' ')
.join('');

// same as $body above but with consecutive whitespace squashed and trimmed.
const normalizedBody = t.section.map(sp => sp.body).join(' ').replace(/\s+/g, ' ').trim();

This is not quite the same as above since it includes postalcode classifications.
Need to give this some more consideration before merging 🤔
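
To illustrate the difference, here is a rough sketch using a made-up input. The sample text, the mask string, and the `sections` array are hypothetical stand-ins for the tokenizer state, and the helper names are not from the codebase; only the `map`/`join` and `replace`/`trim` expressions mirror the diff above.

```javascript
// Hypothetical input: 'P' marks characters classified as a postalcode,
// the other letters stand in for arbitrary other classifications.
const text = 'Kaschk  10178 Berlin';
const mask = 'VVVVVV  PPPPP AAAAAA';

// Hypothetical stand-in for `t.section`: each section exposes a `body`.
const sections = [{ body: 'Kaschk' }, { body: ' 10178 Berlin' }];

// Blank out every character whose mask entry is 'P' (as in the diff).
function maskPostalcode(text, mask) {
  return text.split('').map((c, i) => (mask[i] !== 'P') ? c : ' ').join('');
}

// Collapse consecutive whitespace into one space and trim both ends.
function squash(str) {
  return str.replace(/\s+/g, ' ').trim();
}

// Join the section bodies and squash whitespace (as in the diff).
function normalizeSections(sections) {
  return squash(sections.map(sp => sp.body).join(' '));
}

console.log(squash(maskPostalcode(text, mask))); // 'Kaschk Berlin'
console.log(normalizeSections(sections));        // 'Kaschk 10178 Berlin'
```

The masked body drops the postalcode while the normalized body keeps it, which is the mismatch described above.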

@missinglink

Added a commit, ac26263, to improve parsing of addresses where the unit number was previously being included in the $admin section of the parse:

Screenshot 2020-09-03 at 17 06 57

@missinglink missinglink force-pushed the interpreting_pelias_parser branch from ac26263 to 639b3a9 Compare September 21, 2020 12:19
@missinglink

rebased onto master

@missinglink

linking pelias/pelias#894 as it may be covered by this work
