An example web application using the HTML5 media API JavaScript to send and receive audio from the GCP API. Demonstrates how to capture an audio device, record audio, and convert the audio into a format that GCP speech to text API will recognize, upload to GCP storage, transcribe using speech to text API, and play the response. All from a web br…

prasanna-ML-expert/aws-lex-browser-audio-capture

 
 

Speech<->Text NLP on browser using Google API

  • Record audio using the HTML5 media audioControl browser API
  • Play back the recorded audio in the browser
  • Send the recorded 16 kHz WAV audio to cloud storage
  • Transcribe speech -> text
  • Use DialogFlow to get the intent and derive the fulfillment text
  • Convert the fulfillment text back to speech (MP3 audio)
  • Play the MP3 audio buffer in the browser
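
As an illustration of the "16 kHz WAV" step above, here is a minimal sketch of packing mono samples into a 16-bit PCM WAV buffer. The function name `encodeWav` is hypothetical and is not taken from this repository. It assumes the samples are floats in the range -1 to 1 and have already been resampled to 16 kHz.

```javascript
// Hypothetical helper (not part of this repository): wraps mono float samples
// in a 44-byte RIFF/WAVE header followed by 16-bit little-endian PCM data.
function encodeWav(samples, sampleRate = 16000) {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeString = (offset, text) => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeString(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);                 // remaining file size
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);                           // fmt chunk size
  view.setUint16(20, 1, true);                            // format: PCM
  view.setUint16(22, 1, true);                            // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true);  // byte rate
  view.setUint16(32, bytesPerSample, true);               // block align
  view.setUint16(34, 16, true);                           // bits per sample
  writeString(36, 'data');
  view.setUint32(40, dataSize, true);

  // Clamp each sample to [-1, 1] and scale it to the signed 16-bit range.
  samples.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, scaled, true);
  });

  return buffer;
}
```

In a browser, the returned buffer could be wrapped in a `Blob` with type `audio/wav` before uploading.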

Setup

Demo

https://aqueous-dawn-66602.herokuapp.com/example/index.html

Usage

Conversation

The conversation object provides an abstraction on top of the GCP API and makes it easy to manage conversation state (Passive, Listening, Recording, Speaking) and perform silence detection.

Create the conversation object

var conversation = new LexAudio.conversation(
  {lexConfig: {botName: 'BOT_NAME'}},
  function (state) {
    // Called on each state change.
  },
  function (data) {
    // Called with the LexRuntime.PostContent response.
  },
  function (error) {
    // Called on error.
  },
  function (timeDomain) {
    // Called with audio time domain data
    // (useful for rendering the recorded audio levels).
  }
);
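
The time domain callback can drive a simple level meter. The sketch below assumes the data arrives as unsigned bytes centered on 128, as produced by the Web Audio `AnalyserNode.getByteTimeDomainData` method. That format is an assumption about this library, and `peakLevel` is a hypothetical helper name.

```javascript
// Hypothetical helper: returns the peak audio level as a number from 0 to 1.
// Assumes each value is an unsigned byte where 128 represents silence.
function peakLevel(timeDomain) {
  let peak = 0;
  for (const value of timeDomain) {
    const centered = Math.abs(value - 128) / 128;
    if (centered > peak) {
      peak = centered;
    }
  }
  return peak;
}
```

The result could be used, for example, to set the width of a level bar each time the callback fires.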

Start the conversation

conversation.advanceConversation();

Advances the conversation from Passive to Listening. By default, silence detection is used to transition to Sending, and the conversation keeps cycling through Listening, Sending, and Speaking until the dialog state is Fulfilled, ReadyForFulfillment, or Failed. Here are the conversation state transitions.

                                       onPlaybackComplete and ElicitIntent | ConfirmIntent | ElicitSlot
                                         +--------------------------------------------------------+
                                         |                                                        |
   +---------+                     +-----v-----+                     +---------+            +----------+
   |         | advanceConversation |           | advanceConversation |         | onResponse |          |
   | Passive +-------------------> | Listening +-------------------> | Sending +----------> | Speaking |
   |         |                     |           | onSilence           |         |            |          |
   +----^----+                     +-----------+                     +---------+            +----------+
        |                                                                                         |
        +-----------------------------------------------------------------------------------------+
           onPlaybackComplete and Fulfilled | ReadyForFulfillment | Failed | no silence detection
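
The diagram can also be read as a small transition function. The following is only a sketch of the transitions as drawn above, not the library's actual implementation. The names `TRANSITIONS`, `CONTINUE_STATES`, and `nextState` are hypothetical.

```javascript
// Hypothetical transition table mirroring the diagram above.
const TRANSITIONS = {
  Passive: { advanceConversation: 'Listening' },
  Listening: { advanceConversation: 'Sending', onSilence: 'Sending' },
  Sending: { onResponse: 'Speaking' },
};

// Dialog states after which the conversation loops back to Listening.
const CONTINUE_STATES = ['ElicitIntent', 'ConfirmIntent', 'ElicitSlot'];

function nextState(current, event, options = {}) {
  const { dialogState, silenceDetection = true } = options;

  if (current === 'Speaking' && event === 'onPlaybackComplete') {
    // Without silence detection, the conversation always returns to Passive.
    const keepGoing = silenceDetection && CONTINUE_STATES.includes(dialogState);
    return keepGoing ? 'Listening' : 'Passive';
  }

  const next = (TRANSITIONS[current] || {})[event];
  return next || current; // Events that do not apply leave the state unchanged.
}
```

For example, `nextState('Speaking', 'onPlaybackComplete', {dialogState: 'ElicitSlot'})` yields `'Listening'`, while a `Fulfilled` dialog state yields `'Passive'`.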

Setting silence detection to false allows you to manually transition out of the Passive and Listening states by calling conversation.advanceConversation().

var conversation = new LexAudio.conversation({silenceDetection: false, lexConfig:{botName: 'BOT_NAME'}}, ... );

You can pass silence detection configuration values to tune the silence detection algorithm. The time value is how long the audio must stay silent (in milliseconds) before the conversation advances. The amplitude value is a threshold between -1 and 1: audio above the threshold is considered "noise", and audio below it is considered "silence". Here is the complete configuration object. Everything except botName has a default value.

{
  silenceDetection: true, 
  silenceDetectionConfig: {
    time: 1500,
    amplitude: 0.2
  },
  lexConfig:{
    botName: 'BOT_NAME',
    botAlias: '$LATEST',
    contentType: 'audio/x-l16; sample-rate=16000',
    userId: 'userId',
    accept: 'audio/mpeg'
  }
}
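
To show how the time and amplitude values interact, here is a minimal sketch of threshold-based silence detection. It is an illustration under stated assumptions, not the library's own algorithm: `createSilenceDetector` is a hypothetical name, samples are assumed to be floats between -1 and 1, and the sample's absolute value is compared against the threshold.

```javascript
// Hypothetical helper: calls onSilence once no sample has exceeded the
// amplitude threshold for at least `time` milliseconds.
function createSilenceDetector({ time = 1500, amplitude = 0.2 } = {}, onSilence) {
  let lastNoiseAt = null;

  // `samples` is one frame of audio; `now` is a timestamp in milliseconds.
  return function analyse(samples, now) {
    if (lastNoiseAt === null) {
      lastNoiseAt = now;
    }

    let noisy = false;
    for (const sample of samples) {
      if (Math.abs(sample) > amplitude) {
        noisy = true;
        break;
      }
    }

    if (noisy) {
      lastNoiseAt = now; // Noise restarts the silence timer.
      return false;
    }

    if (now - lastNoiseAt >= time) {
      lastNoiseAt = now; // Restart the timer so onSilence is not fired on every frame.
      onSilence();
      return true;
    }
    return false;
  };
}
```

Raising the amplitude makes the detector more tolerant of background noise, and lowering the time makes it react faster to pauses in speech.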

Browser support

This example code has been tested in the latest versions of:

  • Chrome
  • Firefox
  • Safari (on macOS)

Languages

  • JavaScript 92.2%
  • HTML 2.6%
  • CSS 2.6%
  • Python 2.6%