Adding support for Language Identification #16

Open
gabriele-tomassetti opened this issue Mar 21, 2020 · 6 comments
Comments

@gabriele-tomassetti
Member

fastText would let us implement effective language identification with little space and few resources. This would be useful for content that does not declare a language, and also for content that contains multiple languages.

@theolivenbaum
Contributor

@gabriele-tomassetti @ftomassetti
I'm the maintainer of an open-source C# NLP library that has two models for language detection:
https://github.com/curiosity-ai/catalyst/
If you want, I can either port the code from there or add it as a dependency to cover this need.

@gabriele-tomassetti
Member Author

Thanks for your offer to help on this issue, too.
Honestly, I was mostly looking at this issue as an excuse to work on an NLP library, but if your library can do it better and sooner, I see no reason not to use it.

I think we only have two requirements:

  • We need to add it as a dependency rather than port the code, because there is no need to take on more code to maintain if we can avoid it.
  • We need to make sure that the library does not require much space. Otherwise, I think we would need to ship this functionality as a separate NuGet package.

@theolivenbaum
Contributor

We could expose it as a callback that the user provides, and just add an example on the Wiki showing how to use it with Catalyst, for instance.
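
To make the callback idea concrete, here is a rough, language-neutral sketch in Python. None of these names come from this project or from Catalyst: `LanguageDetector`, `Article`, `resolve_language`, and `toy_detector` are all hypothetical, and the keyword-based detector is only a placeholder for a real model. The point is just that the library would own the hook, while the caller supplies the detection logic and whatever dependency it needs.

```python
from typing import Callable, Optional

# Hypothetical callback type: takes the article text and returns a
# language code (e.g. "en"), or None when the detector cannot decide.
LanguageDetector = Callable[[str], Optional[str]]


class Article:
    """Hypothetical stand-in for the library's parsed-article type."""

    def __init__(self, text: str, declared_language: Optional[str] = None):
        self.text = text
        self.language = declared_language


def resolve_language(
    article: Article, detector: Optional[LanguageDetector] = None
) -> Optional[str]:
    """Return the declared language, falling back to the user's callback."""
    # A language declared by the content itself always wins.
    if article.language:
        return article.language
    # Without a callback the library adds no detection logic of its own.
    if detector is None:
        return None
    return detector(article.text)


def toy_detector(text: str) -> Optional[str]:
    """Placeholder detector based on a few stop words; not a real model."""
    words = set(text.lower().split())
    if words & {"the", "and", "is"}:
        return "en"
    if words & {"il", "che", "è"}:
        return "it"
    return None


# Usage: the caller plugs in any detector, such as a wrapper around an NLP library.
print(resolve_language(Article("the cat is here"), toy_detector))
```

With this shape, the core package would stay small, and the Wiki example would only need to show a callback that wraps the chosen detection library.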

@gabriele-tomassetti
Member Author

That's a really smart idea. I will work on it.

@yeelut

yeelut commented Dec 21, 2024

Any update on this topic?

@gabriele-tomassetti
Member Author

Sorry for the late reply. I am going to work on it in the first quarter of the year. I will update this issue when I have something available.
