Piping Closure compiler stderr output to Python with Unicode characters on Windows problem #4159
I cannot tell from the example Can you confirm that the input file, |
Actually, could you just attach 2 files to this issue?
Here are the input files: a.zip
The test case does not produce any JavaScript output from closure-compiler. Python attempts to capture the stderr error message from the Closure process, but Python fails internally because it cannot decode the stderr bytes that Closure is outputting, and so it does not produce any output to the calling a.py file. Executing the following Python file instead

import subprocess
ret = subprocess.run(['npx', 'google-closure-compiler','--charset=UTF8','--js','a.js','--js_output_file','o.js'], encoding='iso-8859-1', stderr=subprocess.PIPE, shell=True)
print(ret.stderr)

does not throw an exception, and instead causes Python to print the stderr as expected:
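The difference between the two `encoding=` values can be reproduced without closure-compiler at all. The following is a minimal sketch, assuming (as suspected in this thread, not confirmed) that the stderr stream contains the single byte 0xe1 for "á". The `raw` bytes and the `try_decode` helper are made up for illustration.

```python
# Hypothetical stand-in for the stderr bytes: "á" written as the single
# ISO-8859-1 byte 0xe1 (an assumption based on this report).
raw = b"console.log('\xe1');"


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or a short description of the failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"decode failed at byte {exc.start}: {exc.reason}"


# UTF-8 rejects a lone 0xe1 that is not followed by continuation bytes.
as_utf8 = try_decode(raw, "utf-8")
# ISO-8859-1 maps every byte value to a character, so it never fails.
as_latin1 = try_decode(raw, "iso-8859-1")

print(as_utf8)
print(as_latin1)
```

Because ISO-8859-1 accepts any byte sequence, the absence of an exception with `encoding='iso-8859-1'` does not by itself prove that the bytes were produced in that encoding.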
What I want to know is this: Is closure-compiler actually generating an invalid character sequence to
One thing that could be happening is that the
Thanks for providing the
If this problem is in some way actually tied to Windows, we're unlikely to fix it ourselves, as none of the core team uses Windows when working on closure-compiler.
Thank you for supplying the
First, confirm that my terminal / OS is using UTF-8:
$ echo $LANG
en_US.UTF-8
$ echo á |xxd
00000000: c3a1 0a
Yep. Now confirm that the character is correct in
$ xxd a.js
00000000: 6966 2028 3420 3d3d 204e 614e 2920 636f if (4 == NaN) co
00000010: 6e73 6f6c 652e 6c6f 6728 27c3 a127 293b nsole.log('..');
00000020: 0d0a ..
Yep. Now run the compiler with the options as described in earlier comments and save its
$ java -jar $ccjar --charset=UTF8 --js a.js --js_output_file o.js 2> err.out
$ xxd err.out
00000000: 612e 6a73 3a31 3a34 3a20 5741 524e 494e a.js:1:4: WARNIN
00000010: 4720 2d20 5b4a 5343 5f53 5553 5049 4349 G - [JSC_SUSPICI
00000020: 4f55 535f 4e41 4e5d 2043 6f6d 7061 7269 OUS_NAN] Compari
00000030: 736f 6e20 6167 6169 6e73 7420 4e61 4e20 son against NaN
00000040: 6973 2061 6c77 6179 7320 6661 6c73 652e is always false.
00000050: 2044 6964 2079 6f75 206d 6561 6e20 6973 Did you mean is
00000060: 4e61 4e28 293f 0a20 2031 7c20 6966 2028 NaN()?. 1| if (
00000070: 3420 3d3d 204e 614e 2920 636f 6e73 6f6c 4 == NaN) consol
00000080: 652e 6c6f 6728 27c3 a127 293b 0d0a 2020 e.log('..');..
00000090: 2020 2020 2020 205e 5e5e 5e5e 5e5e 5e0a ^^^^^^^^.
000000a0: 0a30 2065 7272 6f72 2873 292c 2031 2077 .0 error(s), 1 w
000000b0: 6172 6e69 6e67 2873 290a arning(s).
Yep. We again see "c3" and "a1" used as the 2-byte encoding, in the bytes at positions 0x87 and 0x88. The Java jar executing on Linux is definitely generating stderr using UTF-8 encoding.
Probably the closure-compiler you're running has been converted from a jar file to a native Windows binary using Graal, because I think that's what the google/closure-compiler-npm code that generates the NPM release tries to make the default. I'm not sure whether the different behavior you see is the result of Windows behavior, the behavior of Java on Windows (as emulated by Graal), or something else.
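The same check can be done from Python instead of reading the hex dump by eye. This is a rough sketch: `classify` is a hypothetical helper, `linux_excerpt` is copied from the row at offset 0x80 of the dump above, and `windows_guess` is only an assumption about what the Windows build might emit.

```python
def classify(raw: bytes) -> str:
    """Report whether raw bytes are plain ASCII, valid UTF-8, or neither."""
    if raw.isascii():
        return "ascii"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid utf-8"
    return "utf-8"


# Bytes from the hex dump row starting at offset 0x80 (up to the CR LF).
linux_excerpt = bytes.fromhex("652e 6c6f 6728 27c3 a127 293b 0d0a")

# Assumption: the same text with "á" written as one ISO-8859-1 byte.
windows_guess = "e.log('á');\r\n".encode("iso-8859-1")

# Absolute offset of the 2-byte UTF-8 sequence for "á" in err.out.
offset = 0x80 + linux_excerpt.find("á".encode("utf-8"))

print(classify(linux_excerpt), hex(offset))
print(classify(windows_guess))
```

Running `classify` on the raw stderr bytes captured on Windows (with no `encoding=` argument passed to `subprocess.run`) would show directly which of the two cases applies there.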
One simplification/note to the bug test case is that the original

import subprocess
subprocess.run(['npx', 'google-closure-compiler','--charset=UTF8','--js','a.js','--js_output_file','o.js'], encoding='utf-8', stderr=subprocess.PIPE, shell=True)

although this bug does not relate to

import subprocess
subprocess.run(['npx', 'google-closure-compiler','--js','a.js','--js_output_file','o.js'], encoding='utf-8', stderr=subprocess.PIPE, shell=True)

It is expected that the issue does not occur on Linux or macOS, since those OSes default to UTF-8 widely. In my Windows shell I have changed my active codepage to UTF-8, i.e.
See chcp 65001. However, this change does not affect the bug, so this is not a Windows terminal/console issue; the cause lies somewhere in the libraries involved, either in Closure or elsewhere. We successfully worked around this in Emscripten code by specifying a directive
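Independently of whatever workaround was used, a caller can avoid the exception by capturing stderr as raw bytes and decoding it afterwards. The following is a minimal sketch of that idea, not the fix adopted in Emscripten. `run_capture_stderr` is a hypothetical helper, and the child process is a small Python one-liner standing in for closure-compiler so that the example runs anywhere.

```python
import subprocess
import sys


def run_capture_stderr(cmd, encodings=("utf-8", "iso-8859-1")):
    """Run cmd, capture stderr as bytes, and decode with the first encoding that fits."""
    # No encoding= argument here, so proc.stderr is bytes and cannot raise on decode.
    proc = subprocess.run(cmd, stderr=subprocess.PIPE)
    for enc in encodings:
        try:
            return proc.returncode, proc.stderr.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Only reached if the caller passes a list without a catch-all such as ISO-8859-1.
    return proc.returncode, proc.stderr.decode("utf-8", errors="replace"), "utf-8 (lossy)"


# Stand-in child: writes "ok á\n" to stderr with "á" as the single byte 0xe1.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.buffer.write(bytes([0x6f, 0x6b, 0x20, 0xe1, 0x0a]))",
]

returncode, text, used_encoding = run_capture_stderr(child)
print(returncode, repr(text), used_encoding)
```

Trying UTF-8 first means that output from a correctly configured process is decoded faithfully, and the ISO-8859-1 fallback only applies when the bytes are not valid UTF-8.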
STR:
a.py
a.js
generates an error
My impression here is that Closure has emitted the ISO-8859-1 encoding value of á to stderr, which has the hex value of 0xe1. However, the encoding='utf-8' argument in Python expects the stderr to be printed out as UTF-8.
I could not find a command line directive in https://github.com/google/closure-compiler/wiki/Flags-and-Options to help control Closure stdout/stderr output encoding.
Which encoding does Closure use for stdout/stderr printing? Is it ISO-8859-1 by intent? Or should it have been UTF-8 and Closure accidentally printed out ISO-8859-1?