diff --git a/README.rst b/README.rst index cfb898a2..be506ba4 100644 --- a/README.rst +++ b/README.rst @@ -27,7 +27,7 @@ Library for performing speech recognition, with support for several engines and Speech recognition engine/API support: -* `CMU Sphinx `__ (works offline) +* `CMU Sphinx `__ (works offline) * Google Speech Recognition * `Google Cloud Speech API `__ * `Wit.ai `__ @@ -123,14 +123,14 @@ The installation instructions on the PyAudio website are quite good - for conven PyAudio `wheel packages `__ for common 64-bit Python versions on Windows and Linux are included for convenience, under the ``third-party/`` `directory `__ in the repository root. To install, simply run ``pip install wheel`` followed by ``pip install ./third-party/WHEEL_FILENAME`` (replace ``pip`` with ``pip3`` if using Python 3) in the repository `root directory `__. -PocketSphinx-Python (for Sphinx users) +PocketSphinx (for Sphinx users) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -`PocketSphinx-Python `__ is **required if and only if you want to use the Sphinx recognizer** (``recognizer_instance.recognize_sphinx``). +`PocketSphinx `__ is **required if and only if you want to use the Sphinx recognizer** (``recognizer_instance.recognize_sphinx``). -PocketSphinx-Python `wheel packages `__ for 64-bit Python 2.7, 3.4, and 3.5 on Windows are included for convenience, under the ``third-party/`` `directory `__. To install, simply run ``pip install wheel`` followed by ``pip install ./third-party/WHEEL_FILENAME`` (replace ``pip`` with ``pip3`` if using Python 3) in the SpeechRecognition folder. +PocketSphinx `wheel packages `__ for 64-bit Python 3.10 on Windows, Linux, and Mac OS X are available on PyPI. To install, simply run ``pip install --pre pocketsphinx``. -On Linux and other POSIX systems (such as OS X), follow the instructions under "Building PocketSphinx-Python from source" in `Notes on using PocketSphinx `__ for installation instructions. 
+For other versions of Python or other operating systems, follow the instructions under "Building PocketSphinx from source" in `Notes on using PocketSphinx `__. Note that the versions available in most package repositories are outdated and will not work with the bundled language data. Using the bundled wheel packages or building from source is recommended. diff --git a/reference/pocketsphinx.rst b/reference/pocketsphinx.rst index 831a7780..880f2e81 100644 --- a/reference/pocketsphinx.rst +++ b/reference/pocketsphinx.rst @@ -28,39 +28,30 @@ Here is a simple Bash script to install all of them, assuming you've downloaded Once installed, you can simply specify the language using the ``language`` parameter of ``recognizer_instance.recognize_sphinx``. For example, French would be specified with ``"fr-FR"`` and Mandarin with ``"zh-CN"``. -Building PocketSphinx-Python from source +Building PocketSphinx from source ---------------------------------------- -For Windows, it is recommended to install the precompiled Wheel packages in the ``third-party`` directory. These are provided because building Pocketsphinx on Windows requires a lot of work, and can take hours to download and install all the surrounding software. +For Windows, it is recommended to install the precompiled Wheel packages from PyPI using ``pip``. These are provided because building Pocketsphinx on Windows requires a lot of work, and can take hours to download and install all the surrounding software. -For Linux and other POSIX systems (like OS X), you'll want to build from source. It should take less than two minutes on a fast machine. +For Linux and other POSIX systems (like OS X), some precompiled packages are available, but in many cases you'll need to build from source. It should take less than two minutes on a fast machine. * On any Debian-derived Linux distributions (like Ubuntu and Mint): - 1. 
Run ``sudo apt-get install python python-all-dev python-pip build-essential swig git libpulse-dev libasound2-dev`` for Python 2, or ``sudo apt-get install python3 python3-all-dev python3-pip build-essential swig git libpulse-dev libasound2-dev`` for Python 3. - 2. Run ``pip install pocketsphinx`` for Python 2, or ``pip3 install pocketsphinx`` for Python 3. + 1. Run ``sudo apt-get install python3 python3-all-dev python3-pip build-essential``. + 2. Run ``pip3 install --pre pocketsphinx``. * On OS X: - 1. Run ``brew install swig git python`` for Python 2, or ``brew install swig git python3`` for Python 3. - 2. Install PocketSphinx-Python using Pip: ``pip install pocketsphinx``. + 1. Run ``brew install git python3`` for Python 3. + 2. Install PocketSphinx-Python using Pip: ``pip install --pre pocketsphinx``. * If this gives errors when importing the library in your program, try running ``brew link --overwrite python``. * On other POSIX-based systems: - 1. Install `Python `__, `Pip `__, `SWIG `__, and `Git `__, preferably using a package manager. - 2. Install PocketSphinx-Python using Pip: ``pip install pocketsphinx``. -* On Windows: - 1. Install `Python `__, `Pip `__, `SWIG `__, and `Git `__, preferably using a package manager. + 1. Install `Python `__, `Pip `__, and `Git `__, preferably using a package manager. + 2. Install PocketSphinx-Python using Pip: ``pip install --pre pocketsphinx``. +* On Windows (FIXME: it is not clear this is still correct, Visual Studio URLs change constantly): + 1. Install `Python `__ and `Pip `__ 2. Install the necessary `compilers suite `__ (`here's a PDF version `__ in case the link goes down) for compiling modules for your particular Python version: * `Microsoft Visual C++ Compiler for Python 2.7 `__ for Python 2.7. - * `Visual Studio 2015 Community Edition `__ for Python 3.5. - * The installation process for Python 3.4 is outlined in the article above. - 3. 
Add the folders containing the Python, SWIG, and Git binaries to your ``PATH`` environment variable. - * My ``PATH`` environment variable looks something like: ``C:\Users\Anthony\Desktop\swigwin-3.0.8;C:\Program Files\Git\cmd;(A BUNCH OF OTHER PATHS)``. - 4. Reboot to apply changes. - 5. Download the full PocketSphinx-Python source code by running ``git clone --recursive --depth 1 https://github.com/cmusphinx/pocketsphinx-python`` (downloading the ZIP archive from GitHub will not work). - 6. Run ``python setup.py install`` in the PocketSphinx-Python source code folder to compile and install PocketSphinx. - 7. Side note: when I build the precompiled Wheel packages, I skip steps 5 and 6 and do the following instead: - * For Python 2.7: ``C:\Python27\python.exe setup.py bdist_wheel``. - * For Python 3.4: ``C:\Python34\python.exe setup.py bdist_wheel``. - * For Python 3.5: ``C:\Users\Anthony\AppData\Local\Programs\Python\Python35\python.exe setup.py bdist_wheel``. - * The resulting packages are located in the ``dist`` folder of the PocketSphinx-Python project directory. + * `Visual Studio 2015 Community Edition `__ for Python 3.5 and above. + 3. Run ``pip install --pre pocketsphinx`` to download, compile and install PocketSphinx. + 4. Alternately, run ``pip wheel --pre pocketsphinx`` to download and build a wheel of PocketSphinx. Notes on the structure of the language data ------------------------------------------- @@ -80,17 +71,18 @@ Notes on building the language data from source * All of the following points assume a Debian-derived Linux Distibution (like Ubuntu or Mint). * To work with any complete, real-world languages, you will need quite a bit of RAM (16 GB recommended) and a fair bit of disk space (20 GB recommended). -* `SphinxBase `__ is needed for all language model file format conversions. 
We use it to convert between ``*.dmp`` DMP files (an obselete Sphinx binary format), ``*.lm`` ARPA files, and Sphinx binary ``*.lm.bin`` files: - * Install all the SphinxBase build dependencies with ``sudo apt-get install build-essential automake autotools-dev autoconf libtool``. - * Download and extract the `SphinxBase source code `__. - * Follow the instructions in the README to install SphinxBase. Basically, run ``sh autogen.sh --force && ./configure && make && sudo make install`` in the SphinxBase folder. +* `PocketSphinx `__ is needed for all language model file format conversions. We use its ``pocketsphinx_lm_convert`` tool to convert between ``*.dmp`` DMP files (an obsolete Sphinx binary format), ``*.lm`` ARPA files, and Sphinx binary ``*.lm.bin`` files: + * Install all the build dependencies with ``sudo apt-get install build-essential cmake``. + * Download and extract the `PocketSphinx source code `__. + * Follow the instructions in the README to install PocketSphinx. Basically, run ``cmake -S . -B build && cmake --build build && sudo cmake --build build --target install`` in the PocketSphinx folder. + * Alternately you can build it in a Docker container using the provided Dockerfile, which is left as an exercise to the reader. * Pruning (getting rid of less important information) is useful if language model files are too large. We can do this using `IRSTLM `__: * Install all the IRSTLM build dependencies with ``sudo apt-get install build-essential automake autotools-dev autoconf libtool`` * Download and extract the `IRSTLM source code `__. * Follow the instructions in the README to install IRSTLM. Basically, run ``sh regenerate-makefiles.sh --force && ./configure && make && sudo make install`` in the IRSTLM folder. - * If the language model is not in ARPA format, convert it to the ARPA format. To do this, ensure that SphinxBase is installed and run ``sphinx_lm_convert -i LANGUAGE_MODEL_FILE_GOES_HERE -o language-model.lm -ofmt arpa``. 
+ * If the language model is not in ARPA format, convert it to the ARPA format. To do this, ensure that PocketSphinx is installed and run ``pocketsphinx_lm_convert -i LANGUAGE_MODEL_FILE_GOES_HERE -o language-model.lm -ofmt arpa``. * Prune the model using IRSTLM: run ``prune-lm --threshold=1e-8 t.lm pruned.lm`` to prune with a threshold of 0.00000001. The higher the threshold, the smaller the resulting file. - * Convert the model back into binary format if it was originally not in ARPA format. To do this, ensure that SphinxBase is installed and run ``sphinx_lm_convert -i language-model.lm -o LANGUAGE_MODEL_FILE_GOES_HERE``. + * Convert the model back into binary format if it was originally not in ARPA format. To do this, ensure that PocketSphinx is installed and run ``pocketsphinx_lm_convert -i language-model.lm -o LANGUAGE_MODEL_FILE_GOES_HERE``. * US English: ``/speech_recognition/pocketsphinx-data/en-US/`` is taken directly from the contents of `PocketSphinx's US English model `__. * International French: ``/speech_recognition/pocketsphinx-data/fr-FR/``: * ``/speech_recognition/pocketsphinx-data/fr-FR/language-model.lm.bin`` is ``fr-small.lm.bin`` from the `Sphinx French language model `__. @@ -98,14 +90,14 @@ Notes on building the language data from source * ``/speech_recognition/pocketsphinx-data/fr-FR/acoustic-model/`` contains all of the files extracted from ``cmusphinx-fr-5.2.tar.gz`` in the `Sphinx French acoustic model `__. * To get better French recognition accuracy at the expense of higher disk space and RAM usage: 1. Download ``fr.lm.gmp`` from the `Sphinx French language model `__. - 2. Convert from DMP (an obselete Sphinx binary format) to ARPA format: ``sphinx_lm_convert -i fr.lm.gmp -o french.lm.bin``. + 2. Convert from DMP (an obsolete Sphinx binary format) to ARPA format: ``pocketsphinx_lm_convert -i fr.lm.gmp -o french.lm.bin``. 3. 
Replace ``/speech_recognition/pocketsphinx-data/fr-FR/language-model.lm.bin`` with ``french.lm.bin`` created in the previous step. * Mandarin Chinese: ``/speech_recognition/pocketsphinx-data/zh-CN/``: * ``/speech_recognition/pocketsphinx-data/zh-CN/language-model.lm.bin`` is generated as follows: 1. Download ``zh_broadcastnews_64000_utf8.DMP`` from the `Sphinx Mandarin language model `__. - 2. Convert from DMP (an obselete Sphinx binary format) to ARPA format: ``sphinx_lm_convert -i zh_broadcastnews_64000_utf8.DMP -o chinese.lm -ofmt arpa``. + 2. Convert from DMP (an obsolete Sphinx binary format) to ARPA format: ``pocketsphinx_lm_convert -i zh_broadcastnews_64000_utf8.DMP -o chinese.lm -ofmt arpa``. 3. Prune with a threshold of 0.00000004 using ``prune-lm --threshold=4e-8 chinese.lm chinese.lm``. - 4. Convert from ARPA format to Sphinx binary format: ``sphinx_lm_convert -i chinese.lm -o chinese.lm.bin``. + 4. Convert from ARPA format to Sphinx binary format: ``pocketsphinx_lm_convert -i chinese.lm -o chinese.lm.bin``. 5. Replace ``/speech_recognition/pocketsphinx-data/zh-CN/language-model.lm.bin`` with ``chinese.lm.bin`` created in the previous step. * ``/speech_recognition/pocketsphinx-data/zh-CN/pronounciation-dictionary.dict`` is ``zh_broadcastnews_utf8.dic`` from the `Sphinx Mandarin language model `__. * ``/speech_recognition/pocketsphinx-data/zh-CN/acoustic-model/`` contains all of the files extracted from ``zh_broadcastnews_16k_ptm256_8000.tar.bz2`` in the `Sphinx Mandarin acoustic model `__. @@ -114,7 +106,7 @@ Notes on building the language data from source * ``/speech_recognition/pocketsphinx-data/it-IT/language-model.lm.bin`` is generated as follows: 1. Download ``cmusphinx-it-5.2.tar.gz`` from the `Sphinx Italian language model `__. 2. Extract ``/etc/voxforge_it_sphinx.lm`` from ``cmusphinx-it-5.2.tar.gz`` as ``italian.lm``. - 3. Convert from ARPA format to Sphinx binary format: ``sphinx_lm_convert -i italian.lm -o italian.lm.bin``. + 3. 
Convert from ARPA format to Sphinx binary format: ``pocketsphinx_lm_convert -i italian.lm -o italian.lm.bin``. 4. Replace ``/speech_recognition/pocketsphinx-data/it-IT/language-model.lm.bin`` with ``italian.lm.bin`` created in the previous step. * ``/speech_recognition/pocketsphinx-data/it-IT/pronounciation-dictionary.dict`` is ``/etc/voxforge_it_sphinx.dic`` from ``cmusphinx-it-5.2.tar.gz`` (from the `Sphinx Italian language model `__). * ``/speech_recognition/pocketsphinx-data/it-IT/acoustic-model/`` contains all of the files in ``/model_parameters`` extracted from ``cmusphinx-it-5.2.tar.gz`` (from the `Sphinx Italian language model `__). diff --git a/speech_recognition/__init__.py b/speech_recognition/__init__.py index 39d042af..94d83a2a 100644 --- a/speech_recognition/__init__.py +++ b/speech_recognition/__init__.py @@ -772,7 +772,7 @@ def recognize_sphinx(self, audio_data, language="en-US", keyword_entries=None, g Sphinx can also handle FSG or JSGF grammars. The parameter ``grammar`` expects a path to the grammar file. Note that if a JSGF grammar is passed, an FSG grammar will be created at the same location to speed up execution in the next run. If ``keyword_entries`` are passed, content of ``grammar`` will be ignored. - Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the Sphinx ``pocketsphinx.pocketsphinx.Decoder`` object resulting from the recognition. + Returns the most likely transcription if ``show_all`` is false (the default). Otherwise, returns the Sphinx ``pocketsphinx.Decoder`` object resulting from the recognition. Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if there are any issues with the Sphinx installation. 
""" @@ -782,13 +782,16 @@ def recognize_sphinx(self, audio_data, language="en-US", keyword_entries=None, g # import the PocketSphinx speech recognition module try: - from pocketsphinx import pocketsphinx, Jsgf, FsgModel - + from pocketsphinx import Jsgf, FsgModel except ImportError: raise RequestError("missing PocketSphinx module: ensure that PocketSphinx is set up correctly.") except ValueError: raise RequestError("bad PocketSphinx installation; try reinstalling PocketSphinx version 0.0.9 or better.") - if not hasattr(pocketsphinx, "Decoder") or not hasattr(pocketsphinx.Decoder, "default_config"): + try: + from pocketsphinx import Decoder + except ImportError: + raise RequestError("bad PocketSphinx installation; try reinstalling PocketSphinx version 0.0.9 or better.") + if not hasattr(Decoder, "default_config"): raise RequestError("outdated PocketSphinx installation; ensure you have PocketSphinx version 0.0.9 or better.") if isinstance(language, str): # directory containing language data @@ -808,12 +811,12 @@ def recognize_sphinx(self, audio_data, language="en-US", keyword_entries=None, g raise RequestError("missing PocketSphinx phoneme dictionary file: \"{}\"".format(phoneme_dictionary_file)) # create decoder object - config = pocketsphinx.Decoder.default_config() + config = Decoder.default_config() config.set_string("-hmm", acoustic_parameters_directory) # set the path of the hidden Markov model (HMM) parameter files config.set_string("-lm", language_model_file) config.set_string("-dict", phoneme_dictionary_file) config.set_string("-logfn", os.devnull) # disable logging (logging causes unwanted output in terminal) - decoder = pocketsphinx.Decoder(config) + decoder = Decoder(config) # obtain audio data raw_data = audio_data.get_raw_data(convert_rate=16000, convert_width=2) # the included language models require audio to be 16-bit mono 16 kHz in little-endian format
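Review note: the import check in the ``recognize_sphinx`` hunk above can be exercised without PocketSphinx installed. The following is a minimal sketch of that check, not the code in this patch. ``RequestError`` is a local stand-in for ``speech_recognition.RequestError``, and ``load_decoder`` and its ``import_module`` parameter are hypothetical names introduced only so a fake module can be injected. It also collapses the patch's two ``try`` blocks into a single attribute lookup, and the error messages are shortened.

```python
import importlib


class RequestError(Exception):
    """Stand-in for speech_recognition.RequestError (assumption for this sketch)."""


def load_decoder(import_module=importlib.import_module):
    """Return the top-level ``Decoder`` class from ``pocketsphinx``.

    ``import_module`` is a hypothetical injection point; by default the real
    importer is used, so behaviour depends on the local installation.
    """
    try:
        module = import_module("pocketsphinx")
    except ImportError:
        raise RequestError("missing PocketSphinx module: ensure that PocketSphinx is set up correctly.")

    # Newer PocketSphinx releases expose Decoder at the package top level,
    # which is what the patched import statement relies on.
    decoder = getattr(module, "Decoder", None)
    if decoder is None:
        raise RequestError("bad PocketSphinx installation; try reinstalling PocketSphinx.")

    # Same capability check as in the patch: Decoder must offer default_config.
    if not hasattr(decoder, "default_config"):
        raise RequestError("outdated PocketSphinx installation.")
    return decoder
```

Calling ``load_decoder()`` with no arguments performs the real import, so it raises ``RequestError`` on a machine without PocketSphinx. Passing a callable that returns a fake module lets each of the three branches be tested in isolation.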