Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial implementation for unic-ucd-unihan #225

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

eyeplum
Copy link
Member

@eyeplum eyeplum commented Jun 7, 2018

This is a partial implementation of ucd-unihan #224 .

Changed areas

  • gen
  • New subcrate ucd/unihan

Notes

One other thing to consider is - as Unihan is a CJK centric module in the Unicode standard - maybe we could make this crate an optional subcrate of the rust-unic super crate and user needs to opt-in explicitly to use it.


This change is Reviewable

@eyeplum eyeplum changed the title Sample implementation for unic-ucd-unihan Sample partial implementation for unic-ucd-unihan Jun 7, 2018
@CAD97
Copy link
Collaborator

CAD97 commented Jun 7, 2018

Failures are unrelated; I'll fix those as soon as I get a chance this weekend.

At a glance-over, this looks good; I'll do a more detailed pass this weekend.

@behnam
Copy link
Member

behnam commented Aug 20, 2018

Thanks for building this, @eyeplum!

I'm in the process of moving source data files out of the repository, to be imported as submodules. (The download+unzip scripts will be in Python, I guess.)

Having the data externally, we won't have to deal with downloading/unzipping ourselves, which would be much better for this repo. Also, allowing easier addition of other sources and models.

If you like, we can rebase and try to land this work, and drop the data retrieving parts later. Or, we can just wait for the external sources and drop the data source work from this PR. What do you think?

@behnam behnam self-requested a review August 20, 2018 06:05
@eyeplum
Copy link
Member Author

eyeplum commented Aug 20, 2018

Sure, wait for the external sources sounds like a better option 👍

@behnam behnam added C: ucd Unicode Character Database A: lib-impl Library Implementation A: lib-api Library API labels Aug 20, 2018
@LuoZijun
Copy link

LuoZijun commented Oct 4, 2018

Ping ?

@behnam
Copy link
Member

behnam commented Jan 6, 2019

In #247, I have added the complete Unicode UCD data package, under the new address: /external/unicode/ucd/data/, which also contains a Unihan/ directory with all of its txt files. (See https://github.com/open-i18n/data-unicode-ucd/tree/master/data/Unihan)

So, there's no data package anymore, and you just need to write the gen rules for the tables, and implement the algorithm in a new component.

Also, as a reminder, since the new source data files are imported as submodules, you need to do git submodule update --init to get the files, before running /etc/generate-tables.sh.

Copy link
Member

@behnam behnam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 50 files at r1.
Reviewable status: 3 of 51 files reviewed, 3 unresolved discussions (waiting on @eyeplum and @behnam)


.gitignore, line 4 at r2 (raw file):

*.rs.bk
*.rs.rustfmt
*.zip

We shouldn't need this, either, as we don't download files anymore.


data/Cargo.toml, line 22 at r2 (raw file):

# Parsing zip files in UCD
zip = "0.3"
 

Neither these.


data/sources.toml, line 23 at r2 (raw file):

"Unihan.zip"                    = "Unihan.zip"

This file is gone.

@behnam behnam added this to the UNIC-0.9 milestone Jan 6, 2019
@eyeplum
Copy link
Member Author

eyeplum commented Jan 7, 2019

Hey @behnam , thanks for the review!
I have updated the pull request to read Unihan data from ./external/unicode/ucd/data/Unihan/ and migrated related code changes to Rust 2018.

@eyeplum eyeplum changed the title Sample partial implementation for unic-ucd-unihan Partial implementations for unic-ucd-unihan Jan 7, 2019
Copy link
Member Author

@eyeplum eyeplum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 38 files reviewed, 3 unresolved discussions (waiting on @behnam and @eyeplum)


.gitignore, line 4 at r2 (raw file):

Previously, behnam (Behnam Esfahbod ❄) wrote…

We shouldn't need this, either, as we don't download files anymore.

Done.


data/Cargo.toml, line 22 at r2 (raw file):

Previously, behnam (Behnam Esfahbod ❄) wrote…
# Parsing zip files in UCD
zip = "0.3"
 

Neither these.

Done.


data/sources.toml, line 23 at r2 (raw file):

Previously, behnam (Behnam Esfahbod ❄) wrote…

This file is gone.

Done.

@eyeplum
Copy link
Member Author

eyeplum commented Mar 1, 2019

Hey @behnam , I'm planning to merge this in this weekend. Although the functionalities are very limited at the moment, I imagine this might be somewhat useful for users that needs Unihan. I'm planning to map most of the Unihan tables later this year, hopefully I will have enough time to make it happen.

As of now, before I start mapping more Unihan contents, I'm thinking about tackling the Unicode 11.0 upgrade for rust-unic first. Mainly because Unicode 12.0 is coming and I think it would be nice for us to at least update to Unicode 11.0 so future updates will be more manageable. I may need some help planning the work as there seems to be a lot involved. I will probably create a new issue so we can have more detailed discussions there.

What do you think?

unic/Cargo.toml Outdated Show resolved Hide resolved
The feature is named "unihan" and a user needs to opt-in explicitly to use it
@eyeplum
Copy link
Member Author

eyeplum commented Mar 6, 2019

Hey @behnam and @CAD97 , are you happy if I merge this in?

@eyeplum eyeplum changed the title Partial implementations for unic-ucd-unihan Initial implementation for unic-ucd-unihan Mar 6, 2019
@CAD97 CAD97 requested a review from behnam March 6, 2019 19:18
@mozillazg
Copy link

Is there any news?

@asg0451
Copy link

asg0451 commented Jul 1, 2021

Is this still being worked on?

@eyeplum
Copy link
Member Author

eyeplum commented Jul 2, 2021

@asg0451, I'm not planning to merge this anytime soon, you could try it out in my fork if you are interested in Unihan https://github.com/eyeplum/rust-unic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: lib-api Library API A: lib-impl Library Implementation C: ucd Unicode Character Database
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants