Commit

Merge branch 'master' into ruff
cclauss authored Dec 31, 2024
2 parents 0420286 + dd67b7c commit f4cf7bc
Showing 28 changed files with 1,007 additions and 256 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/unittests.yml
@@ -44,16 +44,16 @@ jobs:
      - name: Install Python dependencies (Ubuntu, <=3.12)
        if: matrix.os == 'ubuntu-latest' && matrix.python-version != '3.13'
        run: |
          python -m pip install .[dev,audio,pocketsphinx,whisper-local,openai,groq]
          python -m pip install .[dev,audio,pocketsphinx,google-cloud,whisper-local,faster-whisper,openai,groq]
      - name: Install Python dependencies (Ubuntu, 3.13)
        if: matrix.os == 'ubuntu-latest' && matrix.python-version == '3.13'
        run: |
          python -m pip install standard-aifc setuptools
          python -m pip install --no-build-isolation .[dev,audio,pocketsphinx,openai,groq]
          python -m pip install --no-build-isolation .[dev,audio,pocketsphinx,google-cloud,openai,groq]
      - name: Install Python dependencies (Windows)
        if: matrix.os == 'windows-latest'
        run: |
          python -m pip install .[dev,whisper-local,openai,groq]
          python -m pip install .[dev,whisper-local,faster-whisper,google-cloud,openai,groq]
      - name: Test with unittest
        run: |
          pytest --doctest-modules -v speech_recognition/recognizers/ tests/
10 changes: 8 additions & 2 deletions Makefile
@@ -1,6 +1,12 @@
lint:
	# ignore errors for long lines and multi-statement lines
	@pipx run flake8 --ignore=E501,E701,W503 .
	@pipx run flake8 --ignore=E501,E701,W503 --extend-exclude .venv,venv,build --doctests .

rstcheck:
	@pipx run rstcheck --ignore-directives autofunction README.rst reference/*.rst
	# PyPI does not support Sphinx directives and roles
	@pipx run rstcheck README.rst
	@pipx run rstcheck[sphinx] --ignore-directives autofunction reference/*.rst

distribute:
	@pipx run build
	@pipx run twine check dist/*
24 changes: 19 additions & 5 deletions README.rst
@@ -97,6 +97,7 @@ To use all of the functionality of the library, you should have:
* **FLAC encoder** (required only if the system is not x86-based Windows/Linux/OS X)
* **Vosk** (required only if you need to use Vosk API speech recognition ``recognizer_instance.recognize_vosk``)
* **Whisper** (required only if you need to use Whisper ``recognizer_instance.recognize_whisper``)
* **Faster Whisper** (required only if you need to use Faster Whisper ``recognizer_instance.recognize_faster_whisper``)
* **openai** (required only if you need to use OpenAI Whisper API speech recognition ``recognizer_instance.recognize_openai``)
* **groq** (required only if you need to use Groq Whisper API speech recognition ``recognizer_instance.recognize_groq``)

@@ -151,14 +152,20 @@ You also have to install Vosk Models:

`Here <https://alphacephei.com/vosk/models>`__ are models available for download. You have to place them in the models folder of your project, like "your-project-folder/models/your-vosk-model"

Google Cloud Speech Library for Python (for Google Cloud Speech API users)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Google Cloud Speech Library for Python (for Google Cloud Speech-to-Text API users)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`Google Cloud Speech library for Python <https://cloud.google.com/speech-to-text/docs/quickstart>`__ is required if and only if you want to use the Google Cloud Speech API (``recognizer_instance.recognize_google_cloud``).
The library `google-cloud-speech <https://pypi.org/project/google-cloud-speech/>`__ is **required if and only if you want to use Google Cloud Speech-to-Text API** (``recognizer_instance.recognize_google_cloud``).
You can install it with ``python3 -m pip install SpeechRecognition[google-cloud]``.
(ref: `official installation instructions <https://cloud.google.com/speech-to-text/docs/transcribe-client-libraries#client-libraries-install-python>`__)

If not installed, everything in the library will still work, except calling ``recognizer_instance.recognize_google_cloud`` will raise an ``RequestError``.
**Prerequisite**: Create local authentication credentials for your Google account

According to the `official installation instructions <https://cloud.google.com/speech-to-text/docs/quickstart>`__, the recommended way to install this is using `Pip <https://pip.readthedocs.org/>`__: execute ``pip install google-cloud-speech`` (replace ``pip`` with ``pip3`` if using Python 3).
* Digest: `Before you begin (Transcribe speech to text by using client libraries) <https://cloud.google.com/speech-to-text/docs/transcribe-client-libraries#before-you-begin>`__
* `Set up Speech-to-Text <https://cloud.google.com/speech-to-text/docs/before-you-begin>`__
* `User credentials (Set up ADC for a local development environment) <https://cloud.google.com/docs/authentication/set-up-adc-local-dev-environment#local-user-cred>`__

Currently only `V1 <https://cloud.google.com/speech-to-text/docs/quickstart>`__ is supported. (`V2 <https://cloud.google.com/speech-to-text/v2/docs/quickstart>`__ is not supported)
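The prerequisite above relies on Application Default Credentials (ADC). As a rough illustration only, the sketch below checks whether an ADC file is present before ``recognize_google_cloud`` is called. The helper name ``find_adc_file`` is hypothetical and not part of SpeechRecognition. The paths are the well-known locations that Google's ADC documentation describes for ``gcloud auth application-default login``. A missing file does not necessarily mean authentication will fail, because ADC can also come from an attached service account.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_adc_file(env: Optional[Mapping[str, str]] = None,
                  home: Optional[Path] = None) -> Optional[Path]:
    """Return the first ADC credentials file found, or None.

    Hypothetical helper for illustration; not part of SpeechRecognition.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else home

    candidates = []
    # An explicitly configured credentials file takes priority.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        candidates.append(Path(explicit))
    # Well-known location on Linux and macOS.
    candidates.append(
        home / ".config" / "gcloud" / "application_default_credentials.json"
    )
    # Well-known location on Windows.
    appdata = env.get("APPDATA")
    if appdata:
        candidates.append(
            Path(appdata) / "gcloud" / "application_default_credentials.json"
        )

    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_adc_file()
    print(found or "No ADC file found; try: gcloud auth application-default login")
```

The ``env`` and ``home`` parameters exist only so the lookup can be exercised without touching the real environment.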

FLAC (for some systems)
~~~~~~~~~~~~~~~~~~~~~~~
@@ -173,6 +180,13 @@ Whisper is **required if and only if you want to use whisper** (``recognizer_ins

You can install it with ``python3 -m pip install SpeechRecognition[whisper-local]``.

Faster Whisper (for Faster Whisper users)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The library `faster-whisper <https://pypi.org/project/faster-whisper/>`__ is **required if and only if you want to use Faster Whisper** (``recognizer_instance.recognize_faster_whisper``).

You can install it with ``python3 -m pip install SpeechRecognition[faster-whisper]``.
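Because these backends are optional extras, it can help to check which of them are importable before calling the matching ``recognize_*`` method. The following is a minimal sketch, not something this commit adds. The helper names and the mapping are illustrative, and the mapping lists only the top-level import name of each extra's main package (``whisper`` for openai-whisper, ``faster_whisper`` for faster-whisper, ``google.cloud.speech`` for google-cloud-speech).

```python
import importlib.util

# Extra name (as used in SpeechRecognition[...]) -> module it should make importable.
EXTRA_TO_MODULE = {
    "whisper-local": "whisper",
    "faster-whisper": "faster_whisper",
    "google-cloud": "google.cloud.speech",
}


def is_importable(module_name: str) -> bool:
    """Return True if the module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package (e.g. "google") raises ModuleNotFoundError.
        return False


def missing_extras(mapping=EXTRA_TO_MODULE):
    """Return the sorted names of extras whose module cannot be found."""
    return sorted(
        extra for extra, module in mapping.items() if not is_importable(module)
    )


if __name__ == "__main__":
    for extra in missing_extras():
        print(f"python3 -m pip install SpeechRecognition[{extra}]")
```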

OpenAI Whisper API (for OpenAI Whisper API users)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

4 changes: 2 additions & 2 deletions examples/audio_transcribe.py
@@ -33,9 +33,9 @@
    print("Could not request results from Google Speech Recognition service; {0}".format(e))

# recognize speech using Google Cloud Speech
GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""INSERT THE CONTENTS OF THE GOOGLE CLOUD SPEECH JSON CREDENTIALS FILE HERE"""
# Before run, create local authentication credentials (``gcloud auth application-default login``)
try:
    print("Google Cloud Speech thinks you said " + r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS))
    print("Google Cloud Speech thinks you said " + r.recognize_google_cloud(audio))
except sr.UnknownValueError:
    print("Google Cloud Speech could not understand audio")
except sr.RequestError as e:
4 changes: 2 additions & 2 deletions examples/extended_results.py
@@ -37,10 +37,10 @@
    print("Could not request results from Google Speech Recognition service; {0}".format(e))

# recognize speech using Google Cloud Speech
GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""INSERT THE CONTENTS OF THE GOOGLE CLOUD SPEECH JSON CREDENTIALS FILE HERE"""
# Before run, create local authentication credentials (``gcloud auth application-default login``)
try:
    print("Google Cloud Speech recognition results:")
    pprint(r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS, show_all=True))  # pretty-print the recognition result
    pprint(r.recognize_google_cloud(audio, show_all=True))  # pretty-print the recognition result
except sr.UnknownValueError:
    print("Google Cloud Speech could not understand audio")
except sr.RequestError as e:
4 changes: 2 additions & 2 deletions examples/microphone_recognition.py
@@ -32,9 +32,9 @@
    print("Could not request results from Google Speech Recognition service; {0}".format(e))

# recognize speech using Google Cloud Speech
GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""INSERT THE CONTENTS OF THE GOOGLE CLOUD SPEECH JSON CREDENTIALS FILE HERE"""
# Before run, create local authentication credentials (``gcloud auth application-default login``)
try:
    print("Google Cloud Speech thinks you said " + r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS))
    print("Google Cloud Speech thinks you said " + r.recognize_google_cloud(audio))
except sr.UnknownValueError:
    print("Google Cloud Speech could not understand audio")
except sr.RequestError as e:
6 changes: 3 additions & 3 deletions examples/special_recognizer_features.py
@@ -35,11 +35,11 @@


# recognize preferred phrases using Google Cloud Speech
GOOGLE_CLOUD_SPEECH_CREDENTIALS = r"""INSERT THE CONTENTS OF THE GOOGLE CLOUD SPEECH JSON CREDENTIALS FILE HERE"""
# Before run, create local authentication credentials (``gcloud auth application-default login``)
try:
    print("Google Cloud Speech recognition for \"numero\" with different sets of preferred phrases:")
    print(r.recognize_google_cloud(audio_fr, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS, preferred_phrases=["noomarow"]))
    print(r.recognize_google_cloud(audio_fr, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS, preferred_phrases=["newmarrow"]))
    print(r.recognize_google_cloud(audio_fr, preferred_phrases=["noomarow"]))
    print(r.recognize_google_cloud(audio_fr, preferred_phrases=["newmarrow"]))
except sr.UnknownValueError:
    print("Google Cloud Speech could not understand audio")
except sr.RequestError as e:
36 changes: 11 additions & 25 deletions reference/library-reference.rst
@@ -227,20 +227,10 @@ Returns the most likely transcription if ``show_all`` is false (the default). Ot

Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection.

``recognizer_instance.recognize_google_cloud(audio_data: AudioData, credentials_json: Union[str, None] = None, language: str = "en-US", preferred_phrases: Union[Iterable[str], None] = None, show_all: bool = False) -> Union[str, Dict[str, Any]]``
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
``recognizer_instance.recognize_google_cloud(audio_data: AudioData, credentials_json_path: Union[str, None] = None, **kwargs) -> Union[str, Dict[str, Any]]``
-------------------------------------------------------------------------------------------------------------------------------------------------------------

Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using the Google Cloud Speech API.

This function requires a Google Cloud Platform account; see the `Google Cloud Speech API Quickstart <https://cloud.google.com/speech/docs/getting-started>`__ for details and instructions. Basically, create a project, enable billing for the project, enable the Google Cloud Speech API for the project, and set up Service Account Key credentials for the project. The result is a JSON file containing the API credentials. The text content of this JSON file is specified by ``credentials_json``. If not specified, the library will try to automatically `find the default API credentials JSON file <https://developers.google.com/identity/protocols/application-default-credentials>`__.

The recognition language is determined by ``language``, which is a BCP-47 language tag like ``"en-US"`` (US English). A list of supported language tags can be found in the `Google Cloud Speech API documentation <https://cloud.google.com/speech/docs/languages>`__.

If ``preferred_phrases`` is an iterable of phrase strings, those given phrases will be more likely to be recognized over similar-sounding alternatives. This is useful for things like keyword/command recognition or adding new phrases that aren't in Google's vocabulary. Note that the API imposes certain `restrictions on the list of phrase strings <https://cloud.google.com/speech/limits#content>`__.

Returns the most likely transcription if ``show_all`` is False (the default). Otherwise, returns the raw API response as a JSON dictionary.

Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the credentials aren't valid, or if there is no Internet connection.
.. autofunction:: speech_recognition.recognizers.google_cloud.recognize

``recognizer_instance.recognize_wit(audio_data: AudioData, key: str, show_all: bool = False) -> Union[str, Dict[str, Any]]``
----------------------------------------------------------------------------------------------------------------------------
@@ -300,29 +290,25 @@ Returns the most likely transcription if ``show_all`` is false (the default). Ot

Raises a ``speech_recognition.UnknownValueError`` exception if the speech is unintelligible. Raises a ``speech_recognition.RequestError`` exception if the speech recognition operation failed, if the key isn't valid, or if there is no internet connection.

``recognizer_instance.recognize_whisper(audio_data: AudioData, model: str="base", show_dict: bool=False, load_options: Dict[Any, Any]=None, language:Optional[str]=None, translate:bool=False, **transcribe_options):``
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Performs speech recognition on ``audio_data`` (an ``AudioData`` instance), using Whisper.

The recognition language is determined by ``language``, an uncapitalized full language name like "english" or "chinese". See the full language list at https://github.com/openai/whisper/blob/main/whisper/tokenizer.py

model can be any of tiny, base, small, medium, large, tiny.en, base.en, small.en, medium.en. See https://github.com/openai/whisper for more details.
``recognizer_instance.recognize_whisper(audio_data: AudioData, model: str="base", show_dict: bool=False, load_options=None, **transcribe_options)``
---------------------------------------------------------------------------------------------------------------------------------------------------

If show_dict is true, returns the full dict response from Whisper, including the detected language. Otherwise returns only the transcription.
.. autofunction:: speech_recognition.recognizers.whisper_local.whisper.recognize

You can translate the result to english with Whisper by passing translate=True
``recognizer_instance.recognize_faster_whisper(audio_data: AudioData, model: str="base", show_dict: bool=False, **transcribe_options)``
---------------------------------------------------------------------------------------------------------------------------------------

Other values are passed directly to whisper. See https://github.com/openai/whisper/blob/main/whisper/transcribe.py for all options
.. autofunction:: speech_recognition.recognizers.whisper_local.faster_whisper.recognize
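Since ``recognize_whisper`` and ``recognize_faster_whisper`` share the ``audio_data`` and ``model`` parameters shown in the signatures above, a caller could try one and fall back to the other. The sketch below is illustrative only. ``transcribe_locally`` is a hypothetical helper, and it assumes (not confirmed by this diff) that a recognizer method raises ``ImportError`` when its optional backend is not installed.

```python
def transcribe_locally(recognizer, audio, model="base", **transcribe_options):
    """Try Faster Whisper first, then fall back to Whisper.

    Hypothetical helper; `recognizer` is expected to behave like
    speech_recognition.Recognizer, but any object with these methods works.
    """
    for method_name in ("recognize_faster_whisper", "recognize_whisper"):
        method = getattr(recognizer, method_name, None)
        if method is None:
            # Older library versions may not provide this method.
            continue
        try:
            return method(audio, model=model, **transcribe_options)
        except ImportError:
            # Assumption: a missing optional backend surfaces as ImportError.
            continue
    raise RuntimeError(
        "No local Whisper backend available; install the "
        "faster-whisper or whisper-local extra."
    )
```

With the real library this would be called as ``transcribe_locally(sr.Recognizer(), audio)``, where ``audio`` is an ``AudioData`` instance obtained from ``recognizer.record(...)``.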

``recognizer_instance.recognize_openai(audio_data: AudioData, model = "whisper-1", **kwargs)``
----------------------------------------------------------------------------------------------

.. autofunction:: speech_recognition.recognizers.openai.recognize
.. autofunction:: speech_recognition.recognizers.whisper_api.openai.recognize

``recognizer_instance.recognize_groq(audio_data: AudioData, model = "whisper-large-v3-turbo", **kwargs)``
---------------------------------------------------------------------------------------------------------

.. autofunction:: speech_recognition.recognizers.groq.recognize
.. autofunction:: speech_recognition.recognizers.whisper_api.groq.recognize

``AudioSource``
---------------
6 changes: 6 additions & 0 deletions setup.cfg
@@ -1,17 +1,23 @@
[options.extras_require]
dev =
    numpy
    pytest
    pytest-randomly
    respx
    rstcheck
    ruff

audio =
    PyAudio >= 0.2.11
pocketsphinx =
    pocketsphinx < 5
google-cloud =
    google-cloud-speech
whisper-local =
    openai-whisper
    soundfile
faster-whisper =
    faster-whisper
openai =
    httpx < 0.28
    openai