VoiceML: <Say>
The <Say> action synthesizes text into speech and plays it back on the call.
Attributes
The <Say> action supports the following attributes.
Attribute | Allowed Values | Default |
---|---|---|
Voice | Voice name | Google-en-AU-Wavenet-B |
Voice
The Voice attribute specifies which of the available text-to-speech voices to use. If it is omitted, the default voice, Google-en-AU-Wavenet-B, is used.
Body
The body of an action is the content nested within the action. The following is supported for <Say>.
Type | Description |
---|---|
plain text | The text to synthesize into speech. |
The maximum size of the input text is 5000 characters.
Pricing
Enfonica uses high-quality speech synthesis provided by Google Cloud Text-to-Speech.
Tier | Pricing |
---|---|
First 10M characters | $0.0019 per 100 characters |
10M+ characters | Talk to sales |
Text-to-speech is billed at the end of a call based on the total number of characters synthesized, rounded up to whole blocks of 100 characters. For example, if 180 characters were used, two blocks are billed and the text-to-speech cost for that call is $0.0038.
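The billing rule above can be sketched in a few lines of Python. This is only an illustration: the function name `estimate_tts_cost` is hypothetical, the rate is taken from the first pricing tier in the table, and usage beyond 10M characters is not modelled.

```python
from decimal import Decimal

# Rate from the pricing table: $0.0019 per block of 100 characters
# (first tier only; volumes above 10M characters are priced separately).
RATE_PER_BLOCK = Decimal("0.0019")
BLOCK_SIZE = 100


def estimate_tts_cost(total_characters: int) -> Decimal:
    """Estimate the text-to-speech cost of one call (hypothetical helper)."""
    if total_characters < 0:
        raise ValueError("total_characters must not be negative")
    # Ceiling division: a partially used block is billed as a whole block.
    blocks = -(-total_characters // BLOCK_SIZE)
    return blocks * RATE_PER_BLOCK


print(estimate_tts_cost(180))  # 0.0038, matching the example above
```

`Decimal` is used instead of `float` so that small currency amounts are not affected by binary rounding.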
Using text-to-speech in place of audio URIs
Anywhere VoiceML accepts the URI of an audio file for playback, you can use text-to-speech instead. Use the tts scheme followed by the URL-encoded text that you want to synthesize. This allows you to use text-to-speech for attributes like WhisperAudioUri and ScreenAudioUri.
"To say this with text-to-speech, use the following audio URI:"
tts:To%20say%20this%20with%20text-to-speech%2C%20use%20the%20following%20audio%20URI%3A
To specify the voice, you can use the voice query parameter. For example:
tts:Custom%20voice?voice=Google-en-AU-Neural2-D
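If you generate these URIs in code, the encoding step might look like the following Python sketch. The helper name `build_tts_uri` is hypothetical, and the sketch assumes that plain percent-encoding (spaces as `%20`, as in the examples above) is what the platform expects.

```python
from typing import Optional
from urllib.parse import quote


def build_tts_uri(text: str, voice: Optional[str] = None) -> str:
    """Build a tts: audio URI from plain text (hypothetical helper)."""
    # safe="" percent-encodes every reserved character, including "/" and "?",
    # so the text cannot be mistaken for the start of the query string.
    uri = "tts:" + quote(text, safe="")
    if voice is not None:
        uri += "?voice=" + quote(voice, safe="")
    return uri


print(build_tts_uri("Custom voice", voice="Google-en-AU-Neural2-D"))
# tts:Custom%20voice?voice=Google-en-AU-Neural2-D
```

The resulting string can then be used as the value of an attribute such as WhisperAudioUri or ScreenAudioUri.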
Examples
Example 1: Play some speech
The following example says "Hello world" on the call.
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say>Hello world</Say>
</Response>
Example 2: Play some speech with a British accent
The following example says "Hello world" on the call in a British accent.
<?xml version="1.0" encoding="UTF-8"?>
<Response>
  <Say Voice="Google-en-GB-Neural2-C">Hello world</Say>
</Response>
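When the text to speak comes from user data, it can help to generate responses like the ones above programmatically so that characters such as `&` and `<` are escaped correctly. The following Python sketch uses only the standard library; the helper name `build_say_response` is hypothetical, and it assumes the 5000-character limit from the Body section applies to the text before XML escaping.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MAX_SAY_CHARACTERS = 5000  # maximum input text size stated in the Body section


def build_say_response(text: str, voice: Optional[str] = None) -> str:
    """Return a VoiceML document containing a single <Say> action."""
    # Assumption: the limit is counted on the raw, unescaped text.
    if len(text) > MAX_SAY_CHARACTERS:
        raise ValueError("text exceeds the 5000-character limit for <Say>")
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    if voice is not None:
        say.set("Voice", voice)
    say.text = text  # ElementTree escapes &, < and > when serializing
    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(build_say_response("Hello world", voice="Google-en-GB-Neural2-C"))
```

The output is equivalent to Example 2, serialized on a single line after the XML declaration.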