mustaszewski/nkjp

Training, development and test data for morphosyntactic tagging of Polish.

The data have been used to train POS models with ixa-pipe-ml (see https://github.com/ixa-ehu/ixa-pipe-ml).

The three data files are based on the manually annotated 1-million-word subcorpus of the National Corpus of Polish (NKJP) and correspond to splits of 80% (training), 10% (development) and 10% (testing), respectively. The files are tab-separated plain text files with one token and its corresponding morphosyntactic tag per line; blank lines indicate sentence boundaries.
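
The file format described above can be read with a few lines of Python. This is a minimal sketch, not part of the repository: the function name `read_sentences` is hypothetical, the sample tokens and tags are invented for illustration (they are not taken from the data files), and it assumes each non-blank line holds exactly one token, a tab, and one tag.

```python
from typing import Iterable, Iterator, List, Tuple

# A sentence is a list of (token, tag) pairs.
Sentence = List[Tuple[str, str]]


def read_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield sentences from tab-separated token/tag lines.

    Blank lines are treated as sentence boundaries.
    """
    sentence: Sentence = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Blank line: close the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        # Split on the first tab only (assumption: token<TAB>tag).
        token, tag = line.split("\t", 1)
        sentence.append((token, tag))
    # Emit the last sentence if the input does not end with a blank line.
    if sentence:
        yield sentence


# Invented sample in the described format (illustrative only).
sample = (
    "Ala\tsubst:sg:nom:f\n"
    "ma\tfin:sg:ter:imperf\n"
    "kota\tsubst:sg:acc:m2\n"
    ".\tinterp\n"
    "\n"
    "Tak\tqub\n"
    ".\tinterp\n"
)
sentences = list(read_sentences(sample.splitlines()))
```

To read one of the actual files, pass an open file handle instead of the sample, for example `read_sentences(open(path, encoding="utf-8"))`, where `path` points to the training, development or test file. UTF-8 is an assumption here; adjust the encoding if the files use a different one.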

The original NKJP data is available under the GNU GPL v3.0 and can be downloaded from http://clip.ipipan.waw.pl/NationalCorpusOfPolish.