Misc #409
@@ -376,7 +376,7 @@ def convert(args):
#
# Write out the converted file
#
tree_from_model.write(outputfile, encoding="utf-8")
The XML declaration is optional in XML 1.0, which is the default for IMSC and TTML.
Well, I really don't know the standards. I check the TTML with https://github.com/skynav/ttt
The result is that, without the XML declaration, I get encoding errors on some diacritical marks and French symbols (ç ...).
Is UTF-8 the default when no XML declaration exists? If so, do you think this is a ttt issue, where it fails to default to UTF-8? Or should the output be scanned for non-ASCII characters to trigger the XML declaration?
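For reference, a minimal sketch of how the standard library's `xml.etree.ElementTree` behaves here, assuming the `write` call in the diff is a plain `ElementTree.write`. The `serialize` helper and the sample element are hypothetical and not part of ttconv. With `encoding="utf-8"` and no `xml_declaration` argument, no declaration is emitted and non-ASCII characters are written as raw UTF-8 bytes; passing `xml_declaration=True` forces the declaration.

```python
import io
import xml.etree.ElementTree as ET


def serialize(tree, xml_declaration=None):
    """Hypothetical helper: serialize a tree to UTF-8 bytes in memory."""
    buf = io.BytesIO()
    tree.write(buf, encoding="utf-8", xml_declaration=xml_declaration)
    return buf.getvalue()


# Illustrative element only; real ttconv output is a full TTML document.
root = ET.Element("p")
root.text = "qui est métisse,"
tree = ET.ElementTree(root)

# Default behaviour for UTF-8: no XML declaration is written.
without_decl = serialize(tree)

# Explicitly requesting the declaration.
with_decl = serialize(tree, xml_declaration=True)

print(without_decl)
print(with_decl)
```

Both outputs are well-formed XML 1.0 and contain the same UTF-8 bytes for "é"; the only difference is the leading declaration, which a consumer may use as an encoding hint.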
@ngaullier Can you provide a sample file that fails TTV validation? Looking at code I wrote that uses TTV, I force UTF-8:
args.add("--force-encoding");
args.add("UTF-8");
There is also a known issue with BOM (skynav/ttt#193) but it is probably not relevant here.
samplefrench.zip
PYTHONPATH=src/main/python python3 src/main/python/ttconv/tt.py convert -i /mnt/d/samplefrench.stl -o /mnt/d/samplefrench.ttml
Timed Text Verifier (TTV) [7.2-SNAPSHOT] Copyright (c) 2013-21 Skynav, Inc.
[E]:Malformed US-ASCII at byte offset 815 of one byte.
On "é" of métisse:
10:00:45:06-10:00:47:07 SGN.SN.EBN.CS.VP=00.0012.FF.00.20C
[DOUBLE HEIGHT][ALPHA CYAN] <<qui est métisse,>>|| (30)
Yes, I also experienced the TTV/BOM issue, but personally I don't like BOMs anyway!
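As an illustration of the kind of failure reported above (not a reproduction of TTV's internals), the following sketch shows what happens when UTF-8 bytes are decoded as US-ASCII. The `first_non_ascii_offset` helper is hypothetical, and the sample string is the subtitle text quoted above rather than the full TTML file, so the offset differs from the 815 in the TTV log.

```python
def first_non_ascii_offset(payload):
    """Hypothetical helper: byte offset of the first byte that is not valid US-ASCII."""
    try:
        payload.decode("ascii")
    except UnicodeDecodeError as err:
        return err.start
    return None


text = "<<qui est métisse,>>"
data = text.encode("utf-8")  # "é" becomes the two bytes 0xC3 0xA9

offset = first_non_ascii_offset(data)
print(offset, data[offset:offset + 2])

# Decoding the same bytes as UTF-8 succeeds and round-trips the text.
assert data.decode("utf-8") == text
```

The bytes themselves are valid; only the decoder's assumed encoding is wrong, which is consistent with the error disappearing when UTF-8 is forced or declared.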
@ngaullier Can you separate each of the features into a separate pull request?
Just to confirm before proceeding: you mean exactly 4 pull requests, one for each of my bullet points above?
Yes, I suggest we start with WebVTT: add support for 'align'
OK, so one at a time, as I understand you. Anyway, the TTML change is currently questionable, so it may not be a good idea to open a new PR for that one at the moment.
I am pretty sure that TTV defaults to US-ASCII.
Understood, UTF-8 should be the default. I will submit a new issue or PR to TTV.
+1
Use case: output SDH TTML (basic, not EBU) and WebVTT from an EBU STL input.
Add features: