Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misc #409

Closed
wants to merge 5 commits into from
Closed

Misc #409

wants to merge 5 commits into from

Conversation

ngaullier
Copy link
Contributor

Use case: output SDH TTML (basic, not EBU) and WebVTT from an EBU STL input.

Add features:

  • STL reader : disable EBU style, force position to bottom
  • TTML : fix missing xml declaration
  • WebVTT : add support for 'align'
  • misc missing documentation

@@ -376,7 +376,7 @@ def convert(args):
#
# Write out the converted file
#
tree_from_model.write(outputfile, encoding="utf-8")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The XML declaration is optional in XML 1.0, which is the default for IMSC and TTML.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I really don't know about the standards, I check the ttml with https://github.com/skynav/ttt
The result is that, without the xml declaration, I get encoding errors on some diacritical marks and french symbols (ç ...).
Is utf-8 the default when no xml declaration exists -> do you think it is a ttt issue which fails to defaults to utf-8 ? or should the particular encoding characters be detected to trigger the xml declaration ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ngaullier Can you provide a sample file that fails TTV validation? Looking at code I wrote that uses TTV, I force UTF-8:

      args.add("--force-encoding");
      args.add("UTF-8");

There is also a known issue with BOM (skynav/ttt#193) but it is probably not relevant here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

samplefrench.zip
PYTHONPATH=src/main/python python3 src/main/python/ttconv/tt.py convert -i /mnt/d/samplefrench.stl -o /mnt/d/samplefrench.ttml

Timed Text Verifier (TTV) [7.2-SNAPSHOT] Copyright (c) 2013-21 Skynav, Inc.
[E]:Malformed US-ASCII at byte offset 815 of one byte.

On "é" of métisse:
10:00:45:06-10:00:47:07 SGN.SN.EBN.CS.VP=00.0012.FF.00.20C
[DOUBLE HEIGHT][ALPHA CYAN] <<qui est métisse,>>|| (30)

Yes, I also experienced ttv/BOM issue, but personnaly I don't like BOMs anyway!

@palemieux
Copy link
Contributor

@ngaullier Can you separate each of the features in a separate pull request?

@ngaullier
Copy link
Contributor Author

Just to confirm before to proceed: you mean exactly 4 pull requests for each my bullet points above ?

@palemieux
Copy link
Contributor

Just to confirm before to proceed: you mean exactly 4 pull requests for each my bullet points above ?

Yes, I suggest we start with WebVTT : add support for 'align'

@ngaullier
Copy link
Contributor Author

Ok, so one at a time as I read you. Anyway the ttml is currently questionnable so maybe not a good idea to start a new PR for this one at the moment.

@palemieux
Copy link
Contributor

samplefrench.zip

I am pretty sure that TTV defaults to US-ASCII.

@ngaullier
Copy link
Contributor Author

Understood, utf-8 should be the default. I will submit a new issue or PR to ttv.
I let you review the vtt at first, then I will submit the PR for STL.
Concerning the slight README updates, maybe it is better if you can handle/commit it yourself as a PR seems overkill and you may want to arrange the wording differently.
So now I think we can close this "misc" issue. Thanks for your reviews.

@palemieux
Copy link
Contributor

Concerning the slight README updates, maybe it is better if you can handle/commit it yourself as a PR seems overkill and you may want to arrange the wording differently.
So now I think we can close this "misc" issue. Thanks for your reviews.

+1

@ngaullier ngaullier closed this Nov 10, 2023
@ngaullier ngaullier deleted the misc branch March 19, 2024 09:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants