EMT stands for Emacs MacOS Tokenizer
.
This package use macOS’s built-in NLP tokenizer to tokenize and operate on CJK words in Emacs.
- macOS 10.15 or later
- Emacs 26.1 or later, built with dynamic module support (use
--with-modules
during compilation)
If you enable emt-mode
and the module cannot be found, it will prompt whether to automatically download it from GitHub. Or you can manually retrieve the pre-built module from the releases section and place the dylib
file in the emacs-macos-tokenizer-lib-path
(by default, it is located at modules/libEMT.dylib
within your personal configuration folder, normally ~/.emacs.d/modules/libEMT.dylib
).
Current version of the dynamic module is v2.0.0, make sure you have updated to latest module.
- Install Xcode.
- Build the module using
emt-compile-module
, which compiles and copies the module toemt-lib-path
.
If you enconter the folloing error:
No such module “PackageDescription”
run the following command and try again:
sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer
Install with straight
and use-package
:
(use-package emt
:straight (:host github :repo "roife/emt"
:files ("*.el" "module/*" "module"))
:hook (after-init . emt-mode))
Caches for results of tokenization if non-nil. Default is t
.
The size of LRU cache. Default is 50
.
The path to the directory of dynamic library for emt. Default is ~/.emacs.d/modules/libEMT.dylib
.
It remaps forward-word
, backward-word
, kill-word
and backward-kill-word
to use emt’s version.
It calls emt-ensure
, which load dynamic modeuls and set emt-mode-map
.
Return the word at point. If current point is at bound of a word, return the one forward.
Return the word at point. If current point is at bound of a word, return the one backward.
Compile and copy the module to emt-lib-path
.
It takes an optional argument path
, which is the path to the directory of dynamic library. By default, path
is set to emt-lib-path
.
Download dynamic module from https://github.com/roife/emt/releases/download/<VERSION>/libEMT.dylib
.
If PATH is non-nil, download the module to PATH.
Load dynamic module.
Split string into a list of words.
Return a list of word bounds (a cons of the beginning position and the ending position of a word)
CJK compatible version of forward-word
.
CJK compatible version of backward-word
.
CJK compatible version of kill-word
.
CJK compatible version of backward-kill-word
.
CJK compatible version of mark-word
.
- https://github.com/Master-Hash/ewt-rs : Tokenizer for Windows via WinRT API/ICU, also cross-platform capable.
This package is inspired by jieba.el which is a Chinese tokenizer for Emacs using jieba
.
The dynamic module uses emacs-swift-module, which provides an interface for writing Emacs dynamic modules in Swift.