Change default locale to an UTF-8 one #26

Open
jitakirin opened this issue Jan 17, 2018 · 4 comments

@jitakirin

I came across #12 and I understand why additional locales are not installed, however..

Would it be possible to change the default locale from POSIX to C.UTF-8? The current default causes problems with Python programs, for example:

#!/usr/bin/env python3
print('↗')

when run will produce:

Traceback (most recent call last):
  File "./z.py", line 2, in <module>
    print('\u2197')
UnicodeEncodeError: 'ascii' codec can't encode character '\u2197' in position 0: ordinal not in range(128)
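
For reference, the same failure can be reproduced without touching the locale by encoding the character explicitly. This is only a rough sketch of what print runs into when stdout is ASCII-only; `can_encode` is a helper name made up for this illustration, not something from Python or the image.

```python
def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


arrow = "\u2197"  # the arrow character printed by the script above

# ASCII only covers code points 0-127, so the arrow cannot be encoded.
print(can_encode(arrow, "ascii"))
# UTF-8 can represent any Unicode code point.
print(can_encode(arrow, "utf-8"))
```
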
@davidcassany
Contributor

@jitakirin Which base image are you using?

I think the easiest workaround is to set the PYTHONIOENCODING=UTF-8 environment variable, which ensures that UTF-8 is used for standard input and output regardless of what Python guesses from the system. The python images in the official library set LANG=C.UTF-8; however, I have tested that with the tumbleweed base image and it appears not to be sufficient, even though the output of locale in the image looks fine. I have the feeling this is more a Python issue to be solved within Python images rather than in base images. I suggest you just use

ENV PYTHONIOENCODING UTF-8

in your Dockerfile, or set it in whatever tool you use to create derived images.
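
To check that the variable has the intended effect, something like the following could be run inside the image. It is a minimal sketch, assuming a Python 3 interpreter is available; `stdout_encoding_with` is a hypothetical helper that starts a child interpreter with extra environment variables and returns the stdout encoding the child reports.

```python
import os
import subprocess
import sys


def stdout_encoding_with(extra_env):
    """Run a child interpreter with extra environment variables set and
    return the encoding it reports for sys.stdout."""
    env = dict(os.environ, **extra_env)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


# With the override in place the child should report a UTF-8 codec
# (the exact spelling of the name may differ between Python versions).
print(stdout_encoding_with({"PYTHONIOENCODING": "UTF-8"}))
```

Calling the helper with an empty dict instead shows what the interpreter picks when left to guess from the environment.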

Also, I am a bit concerned about setting any kind of default locale in a container, since distros are starting to use systemd tools to set locales and, at least at the moment, we don't expect to have systemd running within the container (i.e. the localectl command won't work in our base images).

Hope it helps

@cyphar
Member

cyphar commented Jan 18, 2018

I believe this is related to this known Python issue. Not sure if there's much we can really do (it's not clear how we should set the locale and whether/how it should be inherited from the host it's running on).

@jitakirin
Author

@davidcassany Sorry, I should've mentioned, I use opensuse:tumbleweed image. Setting PYTHONIOENCODING=UTF-8 actually works without doing anything else (like installing locales), which makes my sceptical mind think it's too good to be true and that there's probably a flip side to it 😉 but so far I cannot find anything, so hey, I think I'll go with that at the moment 😀

@cyphar I think you are spot on, it appears to be because python detects the encoding as ascii from the locale (or lack thereof), i.e.:

$ python3 -c "import sys; print(sys.getfilesystemencoding())"
ascii

Setting PYTHONIOENCODING overrides that guess.
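
A slightly fuller view of what Python guessed can be had by printing the other encodings next to the filesystem one. This is just a sketch; `encoding_report` is a name I made up, and the values depend entirely on the environment it runs in.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python derived from the current environment."""
    return {
        # Encoding used for file names and command-line arguments.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding the locale module reports for text data
        # (False means: do not call setlocale as a side effect).
        "preferred": locale.getpreferredencoding(False),
        # Encoding used by print(); this is what PYTHONIOENCODING overrides.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


for name, value in encoding_report().items():
    print(name, value)
```
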

One thing I'm still wondering about: with PEP 538 implemented, Python would indeed attempt to alleviate this, but I'm not sure it would work with the current tumbleweed image. IIUC Python 3.7 will essentially attempt to set LC_CTYPE to a UTF-8 based locale if one is available. That works in an image with glibc-locale installed:

$ LC_CTYPE=C.UTF-8 python3 -c "import sys; print(sys.getfilesystemencoding())"
utf-8
$ LC_CTYPE=C.UTF-8 python3 -c 'print("\342\206\227")'
â

However, this doesn't work in the base tumbleweed image, as it appears C.UTF-8 is not there:

$ LC_CTYPE=C.UTF-8 python3 -c "import sys; print(sys.getfilesystemencoding())"
ascii
$ LC_CTYPE=C.UTF-8 python3 -c 'print("\342\206\227")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

and:

$ LC_CTYPE=C.UTF-8 locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=
LC_CTYPE=C.UTF-8
LC_NUMERIC="POSIX"
...
LC_ALL=

So I suppose one final question is, shouldn't C.UTF-8 at least be part of the base image?
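
The availability checks above can also be done from Python itself. The sketch below tries the three coercion targets that PEP 538 lists (C.UTF-8, C.utf8, UTF-8) and reports the first one the C library accepts. It only approximates what the interpreter does at startup; `locale_available` and `first_available` are hypothetical helper names, not part of any library.

```python
import locale

# Coercion target locales listed in PEP 538, in the order they are tried.
PEP538_TARGETS = ("C.UTF-8", "C.utf8", "UTF-8")


def locale_available(name):
    """Return True if the C library accepts `name` for LC_CTYPE."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        # Put the original setting back so the check has no side effects.
        locale.setlocale(locale.LC_CTYPE, saved)


def first_available(candidates=PEP538_TARGETS):
    """Return the first candidate locale that can be set, or None."""
    for name in candidates:
        if locale_available(name):
            return name
    return None


print(first_available())
```

Going by the output above, I would expect this to print None in the base tumbleweed image and C.UTF-8 once glibc-locale is installed.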

@davidcassany
Contributor

> So I suppose one final question is, shouldn't C.UTF-8 at least be part of the base image?

@jitakirin We are currently discussing it. It is not obvious, since the C.UTF-8 locale (like all the others) comes from the glibc-locale package, which is not and will not be part of the docker base images. The problem with this package is that it is damn heavy (about 117MB), so we are discussing several options for including some UTF-8 locale in the base image.
