Automatically choose preposition-article contraction in Portuguese #294

Open
wants to merge 2 commits into
base: master

Member

@1ec5 1ec5 commented Dec 4, 2019

Issue

In #283 (review), @xendez and @jppcel contributed new translations for Portuguese and pointed out that certain names would need a different preposition-article contraction depending on the grammatical gender of the name. This PR adapts the French grammatical system that @yuryleb implemented in #252.

I came up with the list of rules and tests by querying OpenStreetMap for name tags on highway ways in Lisbon and Rio de Janeiro. (I’m not sure if those two cities are representative of road type designations elsewhere in Lusophone countries, but we can always manually add more road type designations.) I isolated the road type designations by stripping all but the first word of each multiword name, then removing duplicates, given names, and acronyms. Finally, I looked up the grammatical gender of each word in the English and Portuguese Wiktionaries and identified a limited set of lexical patterns. (Hopefully my limited Spanish didn’t bias the final rules in any way.)
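
For reference, the isolation step described above could be sketched roughly as follows. This is only an illustration, not the script actually used: `collectDesignations` and the `excluded` set of given names are hypothetical, and treating any all-uppercase first word as an acronym is an assumed heuristic.

```javascript
// Hypothetical helper: reduces a list of road names to candidate
// road type designations (the first word of each multiword name).
function collectDesignations(names, excluded) {
  var seen = new Set();
  names.forEach(function (name) {
    var words = name.trim().split(/\s+/);
    // Single-word names have no separate designation to isolate.
    if (words.length < 2) return;
    var first = words[0];
    // Assumed heuristic: all-uppercase tokens such as "BR-101" are acronyms or refs.
    if (first === first.toUpperCase()) return;
    var key = first.toLowerCase();
    // Given names are removed via a manually curated set.
    if (excluded.has(key)) return;
    // The Set removes duplicates.
    seen.add(key);
  });
  return Array.from(seen).sort();
}

var sample = ['Rua Augusta', 'Avenida da Liberdade', 'Rua do Ouro',
  'BR-101 Norte', 'Maria Pia', 'Chiado'];
console.log(collectDesignations(sample, new Set(['maria'])));
// prints [ 'avenida', 'rua' ]
```

The grammatical gender of each remaining word would still have to be looked up by hand.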

Because I started with road names, these rules are unlikely to work as well on the place names that one would see in the destination and waypoint_name variables. But again, it’s just a start.

A starter list of abbreviations has also been added based on some common abbreviations I found on street names in Lisbon and Rio de Janeiro.

Before merging, it’d be great to get feedback on the following:

  • Does it make sense to apply these rules to all instances of “em” and “a” followed by destination, junction_name, way_name, or waypoint_name?
  • What prepositions should we fall back on when we’re unable to determine the grammatical gender lexically? (There seems to be a lot of ambiguity in Portuguese due to etymology that isn’t obvious from the spelling.)
  • Are there any cases where a particular road type designation shouldn’t be preceded by an article?
  • Are there any adjectives that can be either gender and commonly precede the road type designation (akin to “gran” in Spanish)?
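
To make the first two questions concrete, here is a minimal sketch of the behavior in question. It assumes a plain lookup table keyed by the first word of the name, which is a simplification and not the format of the rules in languages/grammar/pt.json. The names `contractions`, `genders`, and `contract` are hypothetical, and falling back to the bare preposition is just one possible answer to the fallback question.

```javascript
// Standard contractions of "em" and "a" with the definite article.
var contractions = {
  em: { masculine: 'no', feminine: 'na' },
  a: { masculine: 'ao', feminine: 'à' }
};

// Illustrative genders for a few road type designations (not the PR's full list).
var genders = {
  rua: 'feminine',
  avenida: 'feminine',
  largo: 'masculine',
  beco: 'masculine'
};

// Hypothetical helper: picks the contraction from the first word of the name.
function contract(preposition, name) {
  var first = name.trim().split(/\s+/)[0].toLowerCase();
  var known = Object.prototype.hasOwnProperty.call(genders, first);
  // Assumed fallback: keep the bare preposition when the gender is unknown.
  var word = known ? contractions[preposition][genders[first]] : preposition;
  return word + ' ' + name;
}

console.log(contract('em', 'Rua Augusta'));   // prints "na Rua Augusta"
console.log(contract('a', 'Largo do Carmo')); // prints "ao Largo do Carmo"
console.log(contract('em', 'Campo Grande'));  // prints "em Campo Grande"
```

The last call shows the gap: “campo” is masculine, but because it is missing from the table the name falls through to the bare preposition.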

Tasklist

  • Collect road type designations
  • Look up grammatical genders
  • Devise grammar rules
  • Add changelog entry
  • Test with osrm-frontend (?)
  • Review

Requirements / Relations

Depends on #283.

/cc @danpaz

@1ec5 1ec5 requested a review from danpaz December 4, 2019 04:01
@1ec5 1ec5 self-assigned this Dec 4, 2019
languages/grammar/pt.json Outdated (4 resolved review threads)
@@ -110,6 +114,8 @@ var abbreviations = {
'hu': abbreviationsHu,
'lt': abbreviationsLt,
'nl': abbreviationsNl,
'pt-BR': abbreviationsPt,
Contributor

Maybe also add an item for "generic" Portuguese, 'pt': abbreviationsPt,?

Member Author

This PR copies the abbreviations object to two keys for consistency with the translation files, but the client is expected to perform some sort of locale matching. Even if we were to set pt, the environment’s locale may be something else like pt-AO or even pt-US. There are plenty of libraries that can perform locale matching, such as locale-utils.

I’m not necessarily opposed to setting the language-only locale, but I think we’d want to do so consistently for all languages and resource types, and I’m not sure that would be feasible for a situation like zh.

Contributor

Actually, I don't see a problem with using this grammar for all Portuguese dialects: the grammar expressions will match only Portuguese street names, even if pt-US is used in the US with English names. Or is there a difference in Portuguese article usage between pt-BR and pt-US?

Member Author

Right, I think abbreviations and grammar rules should be available regardless of the specific Portuguese locale in use, but it should be up to the client to choose a country to default to. For now the two locales share the same abbreviations and grammar rules, but that wouldn’t necessarily be the case in the future for all languages, so I’d be hesitant to create an expectation that clients can look up grammars without performing locale matching first, which they have to do when getting a translated instruction.

If it’s a major inconvenience for clients to perform locale matching themselves, then we could have this library depend on locale-utils, but there are larger libraries with more robust locale matching and I wouldn’t want to force clients to use the more rudimentary logic in locale-utils.
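
As a rough illustration of the client-side locale matching discussed in this thread, a client might do something like the following before looking up abbreviations or grammar rules. This is a hand-rolled sketch, not the API of locale-utils or of this library: `matchLocale`, the `defaults` map, and the list of available locales are all assumptions made for the example.

```javascript
// Hypothetical matcher: exact match first, then the client's preferred
// regional default for the language, then any locale in the same language.
function matchLocale(requested, available, defaults) {
  if (available.indexOf(requested) !== -1) return requested;
  var language = requested.split('-')[0].toLowerCase();
  // The client, not the library, decides which country to default to.
  var preferred = defaults && defaults[language];
  if (preferred && available.indexOf(preferred) !== -1) return preferred;
  for (var i = 0; i < available.length; i++) {
    if (available[i].split('-')[0].toLowerCase() === language) return available[i];
  }
  return null; // nothing in this language is available
}

var available = ['en', 'pt-BR', 'pt-PT']; // assumed set of keys
console.log(matchLocale('pt-AO', available, { pt: 'pt-PT' })); // prints "pt-PT"
console.log(matchLocale('pt-US', available));                  // prints "pt-BR"
console.log(matchLocale('fr', available));                     // prints null
```

A more robust matcher would also handle script subtags, which is where a bare language code like zh gets difficult.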

@1ec5 1ec5 force-pushed the 1ec5-pt-2019-04-02 branch from 5971e65 to b641a6b Compare December 4, 2019 19:02
@1ec5 1ec5 changed the base branch from 1ec5-pt-2019-04-02 to master December 4, 2019 20:42