Skip to content

Commit

Permalink
Resolve part of CI issues (#1560)
Browse files Browse the repository at this point in the history
  • Loading branch information
eaidova authored Dec 19, 2023
1 parent 66d16d4 commit e846450
Show file tree
Hide file tree
Showing 4 changed files with 405 additions and 106 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
"\n",
"Speaker diarization is the process of partitioning an audio stream containing human speech into homogeneous segments according to the identity of each speaker. It can enhance the readability of an automatic speech transcription by structuring the audio stream into speaker turns and, when used together with speaker recognition systems, by providing the speaker’s true identity. It is used to answer the question \"who spoke when?\"\n",
"\n",
"![image.png](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/_images/asr_sd_diagram.png)\n",
"![image.png](https://developer-blogs.nvidia.com/wp-content/uploads/2022/09/speaker-diarization.png)\n",
"\n",
"With the increasing number of broadcasts, meeting recordings and voice mail collected every year, speaker diarization has received much attention by the speech community. Speaker diarization is an essential feature for a speech recognition system to enrich the transcription with speaker labels.\n",
"\n",
Expand Down Expand Up @@ -49,7 +49,7 @@
},
"outputs": [],
"source": [
"%pip install -q librosa>=0.8.1 \"ruamel.yaml>=0.17.8,<0.17.29\" --extra-index-url https://download.pytorch.org/whl/cpu torch torchvision torchaudio git+https://github.com/eaidova/[email protected] openvino>=2023.1.0"
"%pip install -q \"librosa>=0.8.1\" \"matplotlib<3.8\" \"ruamel.yaml>=0.17.8,<0.17.29\" --extra-index-url https://download.pytorch.org/whl/cpu torch torchvision torchaudio git+https://github.com/eaidova/[email protected] openvino>=2023.1.0"
]
},
{
Expand Down Expand Up @@ -98,7 +98,6 @@
"```python\n",
"\n",
"## login to huggingfacehub to get access to pre-trained model\n",
"[back to top ⬆️](#Table-of-contents:)\n",
"from huggingface_hub import notebook_login, whoami\n",
"\n",
"try:\n",
Expand Down Expand Up @@ -336,7 +335,7 @@
"## Convert model to OpenVINO Intermediate Representation format\n",
"[back to top ⬆️](#Table-of-contents:)\n",
"\n",
"For best results with OpenVINO, it is recommended to convert the model to OpenVINO IR format. OpenVINO supports PyTorch via ONNX conversion. We will use `torch.onnx.export` for exporting the ONNX model from PyTorch. We need to provide initialized model's instance and example of inputs for shape inference. We will use `mo.convert_model` functionality to convert the ONNX models. The `mo.convert_model` Python function returns an OpenVINO model ready to load on the device and start making predictions. We can save it on disk for the next usage with `openvino.runtime.serialize`."
"For best results with OpenVINO, it is recommended to convert the model to OpenVINO IR format. OpenVINO supports PyTorch via ONNX conversion. We will use `torch.onnx.export` for exporting the ONNX model from PyTorch. We need to provide initialized model's instance and example of inputs for shape inference. We will use `ov.convert_model` functionality to convert the ONNX models. The `mo.convert_model` Python function returns an OpenVINO model ready to load on the device and start making predictions. We can save it on disk for the next usage with `ov.save_model`."
]
},
{
Expand Down Expand Up @@ -567,7 +566,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
Expand All @@ -581,7 +580,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
"version": "3.8.10"
},
"vscode": {
"interpreter": {
Expand All @@ -598,4 +597,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}
2 changes: 1 addition & 1 deletion notebooks/212-pyannote-speaker-diarization/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

Speaker diarization is the process of partitioning an audio stream containing human speech into homogeneous segments according to the identity of each speaker. It can enhance the readability of an automatic speech transcription by structuring the audio stream into speaker turns and, when used together with speaker recognition systems, by providing the speaker’s true identity. It is used to answer the question "who spoke when?". Speaker diarization is an essential feature for a speech recognition system to enrich the transcription with speaker labels.

![image.png](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/_images/asr_sd_diagram.png)
![image.png](https://developer-blogs.nvidia.com/wp-content/uploads/2022/09/speaker-diarization.png)

In this tutorial, we consider how to build speaker diarization pipeline using `pyannote.audio` and OpenVINO. `pyannote.audio` is an open-source toolkit written in Python for speaker diarization. Based on PyTorch deep learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines. You can find more information about pyannote pre-trained models in [model card](https://huggingface.co/pyannote/speaker-diarization), [repo](https://github.com/pyannote/pyannote-audio) and [paper](https://arxiv.org/abs/1911.01255).

Expand Down
Loading

0 comments on commit e846450

Please sign in to comment.