Add emoji flags to languages #30

Open
jmsv opened this issue Jun 13, 2018 · 5 comments


@jmsv
Owner

jmsv commented Jun 13, 2018

As mentioned on #25

Command line interface could have an -e flag for displaying relevant emojis alongside languages and maybe words

Very low priority feature, but might be fun to implement and use
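A minimal sketch of how such a switch might be wired up with argparse. Only the `-e` flag comes from the suggestion above; the `--emoji` long name, the `words` argument and the `build_parser` helper are hypothetical and are not taken from ety's actual CLI.

```python
import argparse


def build_parser():
    """Build a hypothetical parser with an opt-in emoji switch."""
    parser = argparse.ArgumentParser(prog="ety")
    # Hypothetical positional argument: the words to look up.
    parser.add_argument("words", nargs="+", help="words to look up")
    # Off by default, so existing output stays unchanged.
    parser.add_argument(
        "-e", "--emoji", action="store_true",
        help="show an emoji next to each language",
    )
    return parser


args = build_parser().parse_args(["-e", "potato"])
print(args.emoji, args.words)
```

Using `store_true` keeps the feature opt-in, which fits a low-priority cosmetic option.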

@alxwrd
Collaborator

alxwrd commented Jun 13, 2018

Reference: in Unicode, a flag is displayed with a pair of regional indicator symbols, and the code point of each symbol is the code point of the corresponding capital letter plus 127397. [source]

So chr(ord("A") + 127397) = 🇦

>>> chr(ord("G") + 127397) + chr(ord("B") + 127397)
'🇬🇧'
>>> chr(ord("F") + 127397) + chr(ord("R") + 127397)
'🇫🇷'
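The same trick as a small helper, as a sketch only: the name `flag_from_alpha2` and the input validation are assumptions added for illustration. The offset is written as U+1F1E6 (REGIONAL INDICATOR SYMBOL LETTER A) minus `ord("A")`, which is where 127397 comes from.

```python
# Distance from an ASCII capital letter to its regional indicator symbol.
REGIONAL_INDICATOR_OFFSET = 0x1F1E6 - ord("A")  # == 127397


def flag_from_alpha2(code):
    """Return the pair of regional indicator symbols for a two-letter code."""
    # Validate before upper-casing: some non-ASCII letters expand to two
    # characters when upper-cased and would slip through a length check.
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        raise ValueError("expected two ASCII letters, got %r" % (code,))
    return "".join(chr(ord(c) + REGIONAL_INDICATOR_OFFSET) for c in code.upper())


print(flag_from_alpha2("GB"), flag_from_alpha2("fr"))
```

Whether the result renders as a flag depends on the font and platform; the function only builds the code point pair.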

The iso-639-3.json already has these country codes, so this could be a feature of the Language class.

>>> import ety
>>> fr = ety.Language("fra")
>>> fr.emoji
'🇫🇷'
  {
    "name": "French",
    "type": "living",
    "scope": "individual",
    "iso6393": "fra",
    "iso6392B": "fre",
    "iso6392T": "fra",
    "iso6391": "fr"
  }

@hugovk
Contributor

hugovk commented Dec 30, 2018

Adding 127397 to each code letter is a neat trick (I'd never seen it before!), but there's a bit of a problem here.

ISO 3166-1 alpha-2 is for countries, and is the code used for mapping flags.

ISO 639 is for languages.

It's fine for many, like French, but there are a few which don't have the same code in each ISO.

And it's a bit tricky to choose a flag for a language, as some countries use many languages, and some languages are used by many countries. (This is also a problem in UX, see for example http://www.flagsarenotlanguages.com/blog/why-flags-do-not-represent-language/)

Demo

iso-639-3.json now only contains 3-char language codes (ISO 639-3, e.g. "fra") and no longer contains the 2-char codes (ISO 639-1, e.g. "fr"), so this demo uses the pycountry library (pip install pycountry) to get the ISO 639-1 code from the ISO 639-3 code, then assumes it is also an ISO 3166-1 alpha-2 country code and returns the corresponding flag:

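import pycountry
...
class Language(object):
...
    @property
    def emoji(self):
        try:
            # ISO 639-3 code -> two-letter ISO 639-1 code, e.g. "fra" -> "FR"
            alpha_2 = pycountry.languages.get(alpha_3=self.iso).alpha_2.upper()
        except AttributeError:
            # No matching language, or the language has no two-letter code
            return None
        # Treat the language code as if it were an ISO 3166-1 country code
        return chr(ord(alpha_2[0]) + 127397) + chr(ord(alpha_2[1]) + 127397)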

Then running this:

import ety
from ety.data import langs

for code in langs:
    lang = ety.Language(code)
    if lang.emoji is not None:
        print(lang.emoji, lang)

Gives:

🇦🇦 Afar
🇦🇧 Abkhazian
🇦🇫 Afrikaans
🇦🇰 Akan
🇦🇲 Amharic
🇦🇷 Arabic
🇦🇳 Aragonese
🇦🇸 Assamese
🇦🇻 Avaric
🇦🇪 Avestan
🇦🇾 Aymara
🇦🇿 Azerbaijani
🇧🇦 Bashkir
🇧🇲 Bambara
🇧🇪 Belarusian
🇧🇳 Bengali
🇧🇮 Bislama
🇧🇴 Tibetan
🇧🇸 Bosnian
🇧🇷 Breton
🇧🇬 Bulgarian
🇨🇦 Catalan
🇨🇸 Czech
🇨🇭 Chamorro
🇨🇪 Chechen
🇨🇺 Church Slavic
🇨🇻 Chuvash
🇰🇼 Cornish
🇨🇴 Corsican
🇨🇷 Cree
🇨🇾 Welsh
🇩🇦 Danish
🇩🇪 German
🇩🇻 Dhivehi
🇩🇿 Dzongkha
🇪🇱 Modern Greek (1453-)
🇪🇳 English
🇪🇴 Esperanto
🇪🇹 Estonian
🇪🇺 Basque
🇪🇪 Ewe
🇫🇴 Faroese
🇫🇦 Persian
🇫🇯 Fijian
🇫🇮 Finnish
🇫🇷 French
🇫🇾 Western Frisian
🇫🇫 Fulah
🇬🇩 Scottish Gaelic
🇬🇦 Irish
🇬🇱 Galician
🇬🇻 Manx
🇬🇳 Guarani
🇬🇺 Gujarati
🇭🇹 Haitian
🇭🇦 Hausa
🇸🇭 Serbo-Croatian
🇭🇪 Hebrew
🇭🇿 Herero
🇭🇮 Hindi
🇭🇴 Hiri Motu
🇭🇷 Croatian
🇭🇺 Hungarian
🇭🇾 Armenian
🇮🇬 Igbo
🇮🇴 Ido
🇮🇮 Sichuan Yi
🇮🇺 Inuktitut
🇮🇪 Interlingue
🇮🇦 Interlingua (International Auxiliary Language Association)
🇮🇩 Indonesian
🇮🇰 Inupiaq
🇮🇸 Icelandic
🇮🇹 Italian
🇯🇻 Javanese
🇯🇦 Japanese
🇰🇱 Kalaallisut
🇰🇳 Kannada
🇰🇸 Kashmiri
🇰🇦 Georgian
🇰🇷 Kanuri
🇰🇰 Kazakh
🇰🇲 Khmer
🇰🇮 Kikuyu
🇷🇼 Kinyarwanda
🇰🇾 Kirghiz
🇰🇻 Komi
🇰🇬 Kongo
🇰🇴 Korean
🇰🇯 Kuanyama
🇰🇺 Kurdish
🇱🇴 Lao
🇱🇦 Latin
🇱🇻 Latvian
🇱🇮 Limburgan
🇱🇳 Lingala
🇱🇹 Lithuanian
🇱🇧 Luxembourgish
🇱🇺 Luba-Katanga
🇱🇬 Ganda
🇲🇭 Marshallese
🇲🇱 Malayalam
🇲🇷 Marathi
🇲🇰 Macedonian
🇲🇬 Malagasy
🇲🇹 Maltese
🇲🇳 Mongolian
🇲🇮 Maori
🇲🇸 Malay (macrolanguage)
🇲🇾 Burmese
🇳🇦 Nauru
🇳🇻 Navajo
🇳🇷 South Ndebele
🇳🇩 North Ndebele
🇳🇬 Ndonga
🇳🇪 Nepali (macrolanguage)
🇳🇱 Dutch
🇳🇳 Norwegian Nynorsk
🇳🇧 Norwegian Bokmål
🇳🇴 Norwegian
🇳🇾 Nyanja
🇴🇨 Occitan (post 1500)
🇴🇯 Ojibwa
🇴🇷 Oriya (macrolanguage)
🇴🇲 Oromo
🇴🇸 Ossetian
🇵🇦 Panjabi
🇵🇮 Pali
🇵🇱 Polish
🇵🇹 Portuguese
🇵🇸 Pushto
🇶🇺 Quechua
🇷🇲 Romansh
🇷🇴 Romanian
🇷🇳 Rundi
🇷🇺 Russian
🇸🇬 Sango
🇸🇦 Sanskrit
🇸🇮 Sinhala
🇸🇰 Slovak
🇸🇱 Slovenian
🇸🇪 Northern Sami
🇸🇲 Samoan
🇸🇳 Shona
🇸🇩 Sindhi
🇸🇴 Somali
🇸🇹 Southern Sotho
🇪🇸 Spanish
🇸🇶 Albanian
🇸🇨 Sardinian
🇸🇷 Serbian
🇸🇸 Swati
🇸🇺 Sundanese
🇸🇼 Swahili (macrolanguage)
🇸🇻 Swedish
🇹🇾 Tahitian
🇹🇦 Tamil
🇹🇹 Tatar
🇹🇪 Telugu
🇹🇬 Tajik
🇹🇱 Tagalog
🇹🇭 Thai
🇹🇮 Tigrinya
🇹🇴 Tonga (Tonga Islands)
🇹🇳 Tswana
🇹🇸 Tsonga
🇹🇰 Turkmen
🇹🇷 Turkish
🇹🇼 Twi
🇺🇬 Uighur
🇺🇰 Ukrainian
🇺🇷 Urdu
🇺🇿 Uzbek
🇻🇪 Venda
🇻🇮 Vietnamese
🇻🇴 Volapük
🇼🇦 Walloon
🇼🇴 Wolof
🇽🇭 Xhosa
🇾🇮 Yiddish
🇾🇴 Yoruba
🇿🇦 Zhuang
🇿🇭 Chinese
🇿🇺 Zulu

Some clear mismatches:

🇦🇫 Afrikaans
🇦🇷 Arabic
🇧🇪 Belarusian
🇧🇷 Breton
🇨🇦 Catalan
🇨🇭 Chamorro
🇰🇼 Cornish
🇨🇾 Welsh
🇪🇪 Ewe
🇮🇪 Interlingue
🇸🇻 Swedish
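To make the clashes concrete, here is a small sketch that names the country each of these flags actually belongs to. The `CLASHES` table and `describe_clash` helper are illustrative names, and the table is typed by hand for this list only: each two-letter language code is paired with the country that holds the same letters in ISO 3166-1 alpha-2.

```python
OFFSET = 0x1F1E6 - ord("A")  # 127397

# Two-letter language code -> (language, country using the same letters
# as its ISO 3166-1 alpha-2 code).
CLASHES = {
    "af": ("Afrikaans", "Afghanistan"),
    "ar": ("Arabic", "Argentina"),
    "be": ("Belarusian", "Belgium"),
    "br": ("Breton", "Brazil"),
    "ca": ("Catalan", "Canada"),
    "ch": ("Chamorro", "Switzerland"),
    "kw": ("Cornish", "Kuwait"),
    "cy": ("Welsh", "Cyprus"),
    "ee": ("Ewe", "Estonia"),
    "ie": ("Interlingue", "Ireland"),
    "sv": ("Swedish", "El Salvador"),
}


def describe_clash(language_code):
    """Say which country's flag the two-letter language code produces."""
    language, country = CLASHES[language_code]
    flag = "".join(chr(ord(c) + OFFSET) for c in language_code.upper())
    return "%s %s: this is the flag of %s" % (flag, language, country)


for code in CLASHES:
    print(describe_clash(code))
```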

@jmsv
Owner Author

jmsv commented Dec 30, 2018

Are you sure they're mismatches? A few of those just look like their country code is derived from their native languages - I come from Devon, UK (next to Cornwall) so the first thing I noticed was that Cornish for 'Cornwall' is 'Kernow', which probably explains its 🇰🇼 code. Similarly, Welsh is 'Cymraeg' or something in Welsh - should explain its 🇨🇾 code.

Looking further into it, these seem to all be the two-char ISO 639-1 codes, rather than the three-char ISO 639-3 codes used by this library.

so tl;dr: your code looks good to me! feel free to PR it with a CLI arg to enable it!

@hugovk
Contributor

hugovk commented Dec 30, 2018

I'm sure they're mismatches. Languages != countries.

"🇰🇼 Cornish"

That is not the Cornish flag, it's the flag of Kuwait.

| Language | ISO 639-3 alpha-3 language code | ISO 639-1 alpha-2 language code | Flag |
|---|---|---|---|
| Cornish | cor | kw | |

| Country | ISO 3166-1 alpha-2 country code | Flag |
|---|---|---|
| Kuwait | KW | 🇰🇼 |

"🇨🇾 Welsh"

That is not the Welsh flag, it's the flag of Cyprus.

| Language | ISO 639-3 alpha-3 language code | ISO 639-1 alpha-2 language code | Flag |
|---|---|---|---|
| Welsh | cym | cy | |

| Country | ISO 3166-1 alpha-2 country code | Flag |
|---|---|---|
| Cyprus | CY | 🇨🇾 |

"🇸🇻 Swedish"

That is not the Swedish flag, it's the flag of El Salvador.

| Language | ISO 639-3 alpha-3 language code | ISO 639-1 alpha-2 language code | Flag |
|---|---|---|---|
| Swedish | swe | sv | |

| Country | ISO 3166-1 alpha-2 country code | Flag |
|---|---|---|
| El Salvador | SV | 🇸🇻 |

@jmsv
Owner Author

jmsv commented Dec 30, 2018

Oops sorry my mistake, you're right - for some reason I'm seeing different things on different devices: Chrome on my laptop displays letters that seem to map to ISO 639-1s and on Chrome on my phone I can see the wrong flags you mentioned 🤔
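Seeing letters on one device and flags on another is consistent with how these emoji are encoded: each one is just two regional indicator symbols, and a renderer without a flag glyph for the pair will likely fall back to showing the letters. As a sketch, the letters behind any flag can be recovered by reversing the offset. The helper name `letters_from_flag` is made up for this example.

```python
FIRST = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
LAST = 0x1F1FF   # REGIONAL INDICATOR SYMBOL LETTER Z


def letters_from_flag(flag):
    """Map each regional indicator symbol back to its ASCII capital letter."""
    letters = []
    for symbol in flag:
        code_point = ord(symbol)
        if not FIRST <= code_point <= LAST:
            raise ValueError("not a regional indicator symbol: %r" % (symbol,))
        letters.append(chr(code_point - FIRST + ord("A")))
    return "".join(letters)


# The "Cornish" flag from the demo decodes to Kuwait's country code.
print(letters_from_flag("\U0001F1F0\U0001F1FC"))
```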

Maybe there's a free dataset somewhere mapping ISO 639-3 codes to flag emojis we could use?
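Until such a dataset turns up, one option would be a small hand-curated table. The sketch below is purely illustrative: `LANGUAGE_TO_COUNTRY` and `language_flag` are hypothetical names, the two entries are examples rather than real data from any source, and choosing a single country for a language is an editorial decision, as discussed above.

```python
OFFSET = 0x1F1E6 - ord("A")  # 127397

# Hypothetical, hand-picked examples: ISO 639-3 language code ->
# ISO 3166-1 alpha-2 country code.
LANGUAGE_TO_COUNTRY = {
    "fra": "FR",  # French -> France
    "swe": "SE",  # Swedish -> Sweden (not "SV", which is El Salvador)
}


def language_flag(iso6393):
    """Return a flag for the language, or None when no country was chosen."""
    country = LANGUAGE_TO_COUNTRY.get(iso6393)
    if country is None:
        return None
    return "".join(chr(ord(c) + OFFSET) for c in country)


print(language_flag("swe"), language_flag("cor"))
```

Languages without an entry, such as Cornish ("cor") here, simply get no flag instead of a wrong one.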
