OCI Speech is an OCI AI service that applies automatic speech recognition technology to transform audio-based content into text. Developers can easily make API calls to integrate OCI Speech’s pre-trained models into their applications. OCI Speech can be used for accurate, text-normalized, time-stamped transcription via the console and REST APIs as well as command-line interfaces or SDKs. You can also use OCI Speech in an OCI Data Science notebook session. With OCI Speech, you can filter profanities, get confidence scores for both single words and complete transcriptions, and more.
This repository contains the demos developed for the OCI Speech Service.
Using the OCI Speech Service you can easily get accurate transcriptions of the speech contained in audio files.
You can take a set of audio files (wav or flac format), upload them to a bucket in OCI Object Storage, and get back JSON files containing the transcriptions within a few minutes.
In this repository you will find examples and demos showing how to use the OCI Python SDK to easily transcribe audio files.
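As a quick orientation, the sketch below shows how a transcription job might be submitted with the OCI Python SDK (oci.ai_speech). It is a minimal illustration, not the exact code of the demos: the compartment OCID, namespace, bucket names, and object names are placeholders, and model class names may vary slightly between SDK versions.

```python
import oci

# Load the default OCI configuration from $HOME/.oci/config
config = oci.config.from_file()
speech_client = oci.ai_speech.AIServiceSpeechClient(config)

# Placeholders: replace with your own compartment, namespace and buckets
COMPARTMENT_ID = "ocid1.compartment.oc1..xxxx"
NAMESPACE = "my_namespace"
INPUT_BUCKET = "speech_input"
OUTPUT_BUCKET = "speech_output"

job_details = oci.ai_speech.models.CreateTranscriptionJobDetails(
    display_name="demo-job",
    compartment_id=COMPARTMENT_ID,
    input_location=oci.ai_speech.models.ObjectListInlineInputLocation(
        object_locations=[
            oci.ai_speech.models.ObjectLocation(
                namespace_name=NAMESPACE,
                bucket_name=INPUT_BUCKET,
                object_names=["audio1.wav", "audio2.wav"],
            )
        ],
    ),
    output_location=oci.ai_speech.models.OutputLocation(
        namespace_name=NAMESPACE,
        bucket_name=OUTPUT_BUCKET,
        prefix="transcriptions",
    ),
    model_details=oci.ai_speech.models.TranscriptionModelDetails(
        domain="GENERIC", language_code="en-US"
    ),
)

job = speech_client.create_transcription_job(job_details).data
print("Submitted transcription job:", job.id)
```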
OCI Speech supports the following languages:
- English
- US English
- Spanish
- Italian
- French
- German
- Hindi
- Portuguese
The demos in this repository have been tested with the English and Italian languages.
OCI Speech supports not only the wav format, but also: mp3, ogg, oga, webm, ac3, aac, mp4a, flac, amr.
The demos in this repository have been tested with the wav and flac formats.
- demo1: a command-line demo that takes a list of wav/flac files from a local directory, transcribes the audio, and outputs the results to the screen and to a csv file
- demo2: a UI, built with Streamlit, that enables you to upload a set of audio files and get back the transcriptions; it supports the wav and flac formats
- demo3: computes the Word Error Rate (WER) of the transcriptions (see the sketch after this list)
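demo3 is based on the standard WER definition: the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The snippet below is a self-contained illustration of that computation; it is not necessarily the exact code used in demo3.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (S + D + I) / N, via word-level edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()

    # Dynamic-programming (Levenshtein) edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution / match
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 errors / 6 words ≈ 0.33
```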
I have provided shell files (.sh) to show how to correctly launch the demos.
In demo1 you can see how to:
- copy a set of wav files to Object Storage
- launch an OCI Speech transcription job
- wait for the job to complete
- extract the transcriptions from the produced JSON files
- save the transcriptions to a csv file (a sketch of these steps follows)
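The following sketch shows the upload and extract-to-csv steps of this flow with the OCI Object Storage client and pandas. Bucket names, the local audio directory, and the JSON field names ("transcriptions"/"transcription") are assumptions here; the actual demo code may differ.

```python
import glob
import json
import os

import oci
import pandas as pd

config = oci.config.from_file()
os_client = oci.object_storage.ObjectStorageClient(config)
namespace = os_client.get_namespace().data

INPUT_BUCKET = "speech_input"    # placeholder bucket names
OUTPUT_BUCKET = "speech_output"

# 1. copy a set of wav files from a local directory to Object Storage
for path in glob.glob("./audio/*.wav"):
    with open(path, "rb") as f:
        os_client.put_object(namespace, INPUT_BUCKET, os.path.basename(path), f)

# 2./3. launch the transcription job and wait for completion
#       (see the job-submission sketch above and the polling sketch under SpeechClient)

# 4./5. extract the transcriptions from the produced JSON files and save them to csv
rows = []
listing = os_client.list_objects(namespace, OUTPUT_BUCKET, prefix="transcriptions").data
for obj in listing.objects:
    body = os_client.get_object(namespace, OUTPUT_BUCKET, obj.name).data.content
    result = json.loads(body)
    # "transcriptions" / "transcription" are the JSON field names assumed here
    for t in result.get("transcriptions", []):
        rows.append({"file": obj.name, "transcription": t.get("transcription", "")})

pd.DataFrame(rows).to_csv("transcriptions.csv", index=False)
```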
In demo2 you can see how to:
- create a UI for OCI Speech, using Streamlit
- launch a transcription job
- extract the transcriptions from the produced JSON files (see the Streamlit sketch after this list)
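A minimal Streamlit skeleton for such a UI could look like the following; the transcription step itself is left as a placeholder (see the sketches above), and the widget labels and layout are illustrative only.

```python
import streamlit as st

st.title("OCI Speech demo")

# Let the user upload one or more wav/flac files
uploaded = st.file_uploader(
    "Upload audio files (wav/flac)", type=["wav", "flac"], accept_multiple_files=True
)

if uploaded and st.button("Transcribe"):
    with st.spinner("Uploading files and running the transcription job..."):
        # Placeholder: upload the files to Object Storage, launch the job,
        # wait for completion and read back the JSON results
        # (see the other sketches in this README for these steps)
        results = {f.name: "..." for f in uploaded}
    for name, text in results.items():
        st.subheader(name)
        st.write(text)
```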
In SpeechClient:
- how to wait for the job to complete (see the polling sketch below)
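A simple way to wait is to poll get_transcription_job until the job reaches a terminal lifecycle state. The sketch below assumes the terminal states SUCCEEDED, FAILED, and CANCELED; it is an illustration, not the exact SpeechClient code.

```python
import time

import oci


def wait_for_job(speech_client, job_id, poll_seconds=10):
    """Poll an OCI Speech transcription job until it reaches a terminal state."""
    while True:
        job = speech_client.get_transcription_job(job_id).data
        state = job.lifecycle_state
        print("Job state:", state)
        # Terminal lifecycle states assumed here
        if state in ("SUCCEEDED", "FAILED", "CANCELED"):
            return job
        time.sleep(poll_seconds)
```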
In utils:
- check audio file sampling rate
- clean a remote bucket
- copy files to/from Object Storage (see the sketch after this list)
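As an illustration, bucket cleanup and download helpers along these lines could be written with the Object Storage client; the function names and bucket handling below are placeholders, not the actual utils code.

```python
import oci

config = oci.config.from_file()
os_client = oci.object_storage.ObjectStorageClient(config)
namespace = os_client.get_namespace().data


def clean_bucket(bucket_name):
    """Delete every object in the bucket (first page of results only, as a sketch)."""
    listing = os_client.list_objects(namespace, bucket_name).data
    for obj in listing.objects:
        os_client.delete_object(namespace, bucket_name, obj.name)


def download_object(bucket_name, object_name, local_path):
    """Copy a single object from Object Storage to a local file."""
    response = os_client.get_object(namespace, bucket_name, object_name)
    with open(local_path, "wb") as f:
        f.write(response.data.content)
```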
Regarding audio sampling rates:
- For all languages, 16 kHz is supported.
- For some languages (English, Spanish, ...), 8 kHz is also supported.
If you want to check the sampling rate of your files, you can use the utility provided here.
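For example, soundfile can report the sampling rate of a wav/flac file; the snippet below is a minimal illustration with a placeholder file name.

```python
import soundfile as sf


def check_sampling_rate(path):
    """Print the sampling rate and channel count of a wav/flac file."""
    info = sf.info(path)
    print(f"{path}: {info.samplerate} Hz, {info.channels} channel(s)")


check_sampling_rate("audio/sample1.wav")  # placeholder file name
```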
To be able to use OCI Speech and the demos provided, some configuration is needed.
If you want to launch the demos from your laptop, you need to have created an API key pair and set it up in the $HOME/.oci directory.
For more details on the needed configuration, see the Wiki.
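As a quick sanity check of the configuration, you can load and validate it with the SDK; the snippet below assumes the default config file location ($HOME/.oci/config).

```python
import oci

# Load the key pair and settings from the default location ($HOME/.oci/config)
config = oci.config.from_file()

# Raise an error early if something is missing or malformed
oci.config.validate_config(config)
print("Configured for region:", config["region"])
```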
The demos use the following Python libraries:
- oci
- ocifs
- Streamlit
- soundfile
- tqdm
- Pandas
The steps needed to create a dedicated conda environment are listed in the Wiki page.
The sentence that you see last in the picture is from the book "La misura del tempo" by G. Carofiglio, p. 119 in the Italian edition.
This repository has been linked in the Oracle Technology Organization GitHub repo. See here