This tool is a spelling checker for Modern Turkish. It detects spelling errors and corrects them using a list of common misspellings and by matching words against a Turkish dictionary.
Cython, Java, C++, C, Swift, Js, and C# versions of this library are also available in their own repositories.
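To make the idea of combining a misspelling list with dictionary matching concrete, here is a minimal, self-contained sketch. It is not the library's implementation: the function names `edit_distance` and `correct_word`, the `max_distance` parameter, and the tiny `DICTIONARY` and `MISSPELLINGS` sets are all assumptions made for illustration.

```python
from typing import Dict, Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def correct_word(word: str, dictionary: Iterable[str],
                 misspellings: Dict[str, str], max_distance: int = 1) -> str:
    """Return a correction for word, or word unchanged if nothing suitable is found."""
    words = sorted(dictionary)  # sorted so that ties are broken deterministically
    if word in words:
        return word  # already a dictionary word
    if word in misspellings:
        return misspellings[word]  # known misspelling, direct lookup
    best = min(words, key=lambda w: edit_distance(word, w), default=None)
    if best is not None and edit_distance(word, best) <= max_distance:
        return best  # closest dictionary word within the allowed distance
    return word


# Toy data for illustration only, not the library's real resources.
DICTIONARY = {"doktor", "ilaç", "yazdı", "yalnız"}
MISSPELLINGS = {"yanlız": "yalnız"}

corrected = [correct_word(w, DICTIONARY, MISSPELLINGS) for w in "dıktor olaç yazdı".split()]
print(" ".join(corrected))
```

In this sketch, `yanlız` is two edits away from `yalnız`, so the distance-based step alone would miss it with `max_distance=1`; the misspelling list catches it directly.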
To check if you have a compatible version of Python installed, use the following command:
python -V
You can find the latest version of Python on the official Python website.
Install the latest version of Git.
pip3 install NlpToolkit-SpellChecker
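After installing, one way to confirm that the package is visible to your interpreter is to query its installed version. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical and not part of the toolkit.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("NlpToolkit-SpellChecker")
if version is None:
    print("NlpToolkit-SpellChecker is not installed")
else:
    print("NlpToolkit-SpellChecker", version)
```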
To work on the code, create a fork from the GitHub page. Clone it to your local machine with Git, or run the line below on Ubuntu:
git clone <your-fork-git-link>
A directory called SpellChecker will be created. Alternatively, clone the original repository to explore the code:
git clone https://github.com/starlangsoftware/TurkishSpellChecker-Py.git
Steps for opening the cloned project:
- Start the IDE
- Select File | Open from the main menu
- Choose the DataStructure-PY file
- Select the open as project option
- After a couple of seconds, the project will be downloaded.
SpellChecker finds spelling errors and corrects them in Turkish. There are two types of spell checker available:
- SimpleSpellChecker
  - To instantiate this, a FsmMorphologicalAnalyzer is needed.
    fsm = FsmMorphologicalAnalyzer()
    spellChecker = SimpleSpellChecker(fsm)
- NGramSpellChecker
  - To create an instance of this, both a FsmMorphologicalAnalyzer and an NGram are required.
  - FsmMorphologicalAnalyzer can be instantiated as follows:
    fsm = FsmMorphologicalAnalyzer()
  - NGram can be either trained from scratch or loaded from an existing model.
    - Training from scratch:
      corpus = Corpus("corpus.txt")
      ngram = NGram(corpus.getAllWordsAsArrayList(), 1)
      ngram.calculateNGramProbabilities(LaplaceSmoothing())
      There are many smoothing methods available. For other smoothing methods, check here.
    - Loading from an existing model:
      ngram = NGram("ngram.txt")
    - For further details, please check here.
  - Afterwards, NGramSpellChecker can be created as below:
    spellChecker = NGramSpellChecker(fsm, ngram)
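The training step above applies Laplace smoothing to the n-gram counts. As a rough illustration of what add-one smoothing does for a unigram model, here is a self-contained sketch. It is not the NGram library's code: the function `laplace_unigram` is hypothetical, and reserving one extra vocabulary slot for unseen words is a choice made for this sketch.

```python
from collections import Counter
from typing import Callable, List


def laplace_unigram(tokens: List[str]) -> Callable[[str], float]:
    """Build an add-one (Laplace) smoothed unigram probability function."""
    counts = Counter(tokens)
    total = len(tokens)
    # One extra slot stands for any unseen word (an assumption of this sketch).
    vocabulary_size = len(counts) + 1

    def probability(word: str) -> float:
        # Every word, seen or not, gets one extra pseudo-count.
        return (counts[word] + 1) / (total + vocabulary_size)

    return probability


tokens = "doktor ilaç yazdı doktor geldi".split()
p = laplace_unigram(tokens)

print(p("doktor"))  # seen twice: (2 + 1) / (5 + 5) = 0.3
print(p("ilaç"))    # seen once:  (1 + 1) / (5 + 5) = 0.2
print(p("olaç"))    # unseen:     (0 + 1) / (5 + 5) = 0.1
```

Because unseen words still receive a small non-zero probability, a spell checker that ranks candidate corrections by such scores can compare any pair of candidates without hitting zero probabilities.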
Spell correction can be done as follows:
sentence = Sentence("Dıktor olaç yazdı")
corrected = spellChecker.spellCheck(sentence)
print(corrected)
Output:
Doktor ilaç yazdı
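When several sentences need correcting, the same call can be applied in a loop. The helper below is a hypothetical convenience, not part of the library. It only assumes that the checker's `spellCheck` method returns something printable, as the example above suggests, and it uses a stand-in checker so that the sketch runs without the library installed.

```python
from typing import Callable, Iterable, List


def correct_all(spell_check: Callable, sentences: Iterable) -> List[str]:
    """Run spell_check on every sentence and collect the results as strings."""
    return [str(spell_check(sentence)) for sentence in sentences]


def fake_spell_check(sentence: str) -> str:
    """Stand-in for a real checker: fixes two hard-coded words."""
    fixes = {"Dıktor": "Doktor", "olaç": "ilaç"}
    return " ".join(fixes.get(word, word) for word in sentence.split())


results = correct_all(fake_spell_check, ["Dıktor olaç yazdı", "Doktor ilaç yazdı"])
print(results)
```

With the library itself, the equivalent call would presumably be `correct_all(spellChecker.spellCheck, [Sentence("Dıktor olaç yazdı")])`, using the objects created in the steps above.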