VoiceML: <Stream>

The <Stream> action supports near real-time, bi-directional streaming of audio between a call and your application. Use <Stream> together with the enfonica.voice.v1beta1.Streams.StreamCall bi-directional streaming gRPC method.

Usage

Establishing a stream takes two steps: first you create the gRPC stream, then you connect the call to that stream. Once the stream is established, audio (and some other commands) can be exchanged in both directions.

Creating the stream

The resource name of a stream is of the format projects/<Project ID>/streams/<Stream ID>, where:

  • Project ID is the name of the project that parents the stream. This must match the Project ID associated with the call that you are connecting to the stream.
  • Stream ID is an ephemeral user-generated identifier. It must be unique and can be up to 36 characters.
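
As a minimal sketch of the naming rule above, the following Python helper builds a stream resource name from a project ID and a stream ID. The helper name new_stream_name is hypothetical and not part of any Enfonica SDK. Generating the stream ID as a UUID is an assumption: its string form is exactly 36 characters, which fits the stated limit, but any unique identifier within that limit should serve.

```python
import uuid
from typing import Optional

MAX_STREAM_ID_LENGTH = 36  # limit stated in the documentation above


def new_stream_name(project_id: str, stream_id: Optional[str] = None) -> str:
    """Build a name of the form projects/<Project ID>/streams/<Stream ID>.

    Hypothetical helper for illustration only.
    """
    if stream_id is None:
        # Assumption: a random UUID is unique enough for an ephemeral stream.
        stream_id = str(uuid.uuid4())
    if not stream_id or len(stream_id) > MAX_STREAM_ID_LENGTH:
        raise ValueError("stream ID must be 1 to 36 characters long")
    return f"projects/{project_id}/streams/{stream_id}"
```

The same name would then be used twice: once in the setup message when creating the stream, and once in the body of the <Stream> action.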

To create the stream, establish a bi-directional gRPC stream with the enfonica.voice.v1beta1.Streams.StreamCall method. Then write an enfonica.voice.v1beta1.Streams.StreamCallRequest message with the setup field specified to the request stream. The setup field should contain the stream resource name and the requested audio configuration.

Connecting to the stream

Once you have created the stream, connect the call to it using the <Stream> action. Use the same stream resource name that you used when creating the stream. For example:

<Response>
    <Stream NextUri="stream-complete.php">projects/my-project/streams/my-stream-id-d3ba2fed</Stream>
</Response>
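
If your application generates this VoiceML dynamically, the response can be assembled with a standard XML library so that the stream name and NextUri are escaped correctly. The following is a sketch using Python's xml.etree.ElementTree. The function name stream_response is hypothetical, and how the document is served to the platform is outside its scope.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def stream_response(stream_name: str, next_uri: Optional[str] = None) -> str:
    """Return a VoiceML document containing a single <Stream> action."""
    response = ET.Element("Response")
    # NextUri is optional, so only emit the attribute when a value is given.
    attributes = {"NextUri": next_uri} if next_uri else {}
    stream = ET.SubElement(response, "Stream", attributes)
    # The body of <Stream> is the resource name of the stream to connect to.
    stream.text = stream_name
    return ET.tostring(response, encoding="unicode")


print(stream_response(
    "projects/my-project/streams/my-stream-id-d3ba2fed",
    next_uri="stream-complete.php",
))
```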

When the call has connected to the stream, you will receive the gRPC message enfonica.voice.v1beta1.Streams.StreamCallResponse with call_connected specified. Once you receive this message, you will receive a steady stream of responses containing audio and other events, and any audio that you send will be played back to the call.

Exchanging audio

Audio received from the call is sent to you with the gRPC message enfonica.voice.v1beta1.Streams.StreamCallResponse with output_audio specified. The audio will be encoded in the format specified during stream setup, and will be sent in chunks of a sensible size.

To send audio to the call, use the gRPC message enfonica.voice.v1beta1.Streams.StreamCallRequest with input_audio specified. Input audio is buffered, with a maximum buffer size of 1 minute. If the maximum buffer size is exceeded, the overflow is silently discarded. While the input audio buffer is empty, the call hears silence.

When the input audio buffer transitions to an empty state, the gRPC message enfonica.voice.v1beta1.Streams.StreamCallResponse is sent with input_audio_buffer_empty specified, indicating that all buffered audio has been played back to the call. To clear the input buffer, specify clear within the enfonica.voice.v1beta1.Streams.InputAudio message.
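
The buffering rules above can be illustrated with a small local model. This is a toy simulation written for this page, not Enfonica code: the class InputAudioBufferModel and its methods are hypothetical, and it assumes the 1 minute limit is measured in playback time.

```python
from dataclasses import dataclass


@dataclass
class InputAudioBufferModel:
    """Toy model of the documented input audio buffer (illustration only)."""

    capacity_ms: int = 60_000  # documented maximum of 1 minute
    buffered_ms: int = 0

    def push(self, duration_ms: int) -> int:
        """Buffer audio and return how many milliseconds were accepted."""
        accepted = min(duration_ms, self.capacity_ms - self.buffered_ms)
        self.buffered_ms += accepted
        return accepted  # the remainder is silently discarded

    def play(self, elapsed_ms: int) -> bool:
        """Advance playback; return True when the buffer transitions to empty."""
        was_empty = self.buffered_ms == 0
        self.buffered_ms = max(0, self.buffered_ms - elapsed_ms)
        return not was_empty and self.buffered_ms == 0

    def clear(self) -> None:
        """Drop all buffered audio, modelling the clear field."""
        self.buffered_ms = 0


buffer = InputAudioBufferModel()
print(buffer.push(45_000))  # 45000: fits entirely
print(buffer.push(30_000))  # 15000: the other 15 seconds overflow
print(buffer.play(60_000))  # True: the buffer has just become empty
print(buffer.play(1_000))   # False: already empty, the call hears silence
```

In the model, a True result from play corresponds to the point at which input_audio_buffer_empty would be sent. A practical consequence is that an application sending long audio should pace its writes rather than send everything at once, since overflow is dropped without an error.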

Passing data to the call

At any time during the stream, you may send the gRPC message enfonica.voice.v1beta1.Streams.StreamCallRequest with update_request_parameters specified. This updates the parameters map that is sent with the subsequent VoiceML request to NextUri after the stream has closed. Note that NextUri must be specified when connecting the call to the stream with <Stream>; otherwise, these parameters are discarded.

Closing the stream

If the call hangs up, the gRPC response stream will be closed and you should immediately close the request stream.

To stop the stream and resume VoiceML execution, close the gRPC request stream. The API will then immediately close the gRPC response stream.
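
In the Python gRPC library, a bi-directional streaming call takes an iterator of request messages, and the request stream is closed when that iterator is exhausted. One way to control this from application code is a queue-backed iterator, sketched below. The RequestQueue class is a hypothetical helper, and the items placed on it stand in for StreamCallRequest messages.

```python
import queue
from typing import Any, Iterator

_CLOSE = object()  # sentinel meaning "close the request stream"


class RequestQueue:
    """Queue-backed request iterator (hypothetical helper)."""

    def __init__(self) -> None:
        self._queue: "queue.Queue[Any]" = queue.Queue()

    def send(self, request: Any) -> None:
        """Enqueue a request message to be written to the stream."""
        self._queue.put(request)

    def close(self) -> None:
        """Ask the iterator to finish, which closes the request stream."""
        self._queue.put(_CLOSE)

    def __iter__(self) -> Iterator[Any]:
        while True:
            item = self._queue.get()  # blocks until a request is available
            if item is _CLOSE:
                return
            yield item
```

With a generated gRPC stub (not shown here, and its exact name is an assumption), the queue would be passed as the request iterator for StreamCall. Calling close() then covers both cases described above: stopping the stream deliberately, and closing the request stream after the response stream ends because the call hung up.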

Attributes

The <Stream> action supports the following attributes.

Attribute    Allowed Values                  Default
NextUri      any relative or absolute URI    -

NextUri

The NextUri attribute specifies the URI to redirect to after you close the stream. Any parameters specified during the stream using the enfonica.voice.v1beta1.Streams.StreamCallRequest.update_request_parameters field will be sent to this URI.

If NextUri is not specified, VoiceML execution continues with the actions specified in the previous response.

Body

The body of an action is the content nested within the action. The following is supported for <Stream>.

Type          Description
plain text    The resource name of the stream to connect to.