VoiceML requests

When Cloud Voice needs instructions for a call, it sends a VoiceML request to one of the handler URIs configured for that call. The request is an HTTP POST with a JSON body. Your server replies with a VoiceML XML document, and Cloud Voice executes the actions in that document on the call. When an action that has a NextUri finishes, Cloud Voice sends the next VoiceML request, and the cycle repeats until the call ends.

VoiceML requests vs. webhooks

VoiceML requests are synchronous request/response: your server must reply with a VoiceML XML document, and Cloud Voice acts on that response. They are sent to the call's handler URIs — for incoming calls, the URI configured on the phone number; for outgoing calls, the handlerUris field of Call.options.

Voice Webhooks are fire-and-forget event notifications — you return any 2xx status code, and the response body is ignored. They are sent for events such as CALL_STATE_UPDATE, FAX_READY, RECORDING_READY, and TRANSCRIPTION_READY.

Request

Cloud Voice sends a VoiceML request as:

  • Method: POST
  • Headers:
    • Content-Type: application/json; charset=utf-8
    • X-Enfonica-Event: CALL
    • X-Enfonica-Signature: HMAC-SHA256 signature for authenticity verification. The signing procedure is the same as for voice webhooks.
  • Body: a JSON-serialized enfonica.voice.v1beta1.CallRequest.
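The signing procedure is documented with voice webhooks and is not restated here. As a rough sketch only, a verification step might look like the following. It assumes the header carries the hex-encoded HMAC-SHA256 of the raw request body, keyed with a shared secret. Both the encoding and the signed payload are assumptions to confirm against the webhook documentation, and the function name is illustrative.

```python
import hashlib
import hmac


def signature_is_valid(secret: bytes, raw_body: bytes, header_value: str) -> bool:
    """Check an X-Enfonica-Signature header value against the raw request body.

    Assumption (not confirmed by this page): the signature is the hex digest
    of HMAC-SHA256 computed over the unmodified request body.
    """
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    received = header_value.strip().lower()
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature matched.
    return hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8"))
```

Verify against the bytes exactly as received, before any JSON parsing or re-serialization, since re-encoding the body can change it.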

Request body

The body has two fields:

  • call: A JSON-serialized enfonica.voice.v1beta1.Call representing the call at the time of the request. The state field reflects the call's state at the moment Cloud Voice issued the request: for example, STARTING for an incoming call that has not yet been answered, or IN_PROGRESS after an <Input> collected digits on an answered call.
  • parameters: A string-to-string map of return values from the action that most recently completed. Empty (or absent) on the first VoiceML request of a call. On subsequent requests, contains the parameters documented on the action that produced them (see Return parameters below).
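Because parameters can be empty or missing entirely, a handler usually normalizes it before use. A minimal sketch of parsing the body, where the CallRequest dataclass and parse_call_request are illustrative names and not part of any Enfonica SDK:

```python
import json
from dataclasses import dataclass, field


@dataclass
class CallRequest:
    """Illustrative container for the two documented body fields."""
    call: dict
    parameters: dict = field(default_factory=dict)


def parse_call_request(raw_body: bytes) -> CallRequest:
    payload = json.loads(raw_body.decode("utf-8"))
    # parameters may be empty or absent on the first request of a call,
    # so fall back to an empty map in both cases.
    return CallRequest(call=payload["call"], parameters=payload.get("parameters") or {})
```

With this shape, `request.call["state"]` and `request.parameters.get("digits")` can be read without checking whether parameters was sent.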

Return parameters

The following actions populate parameters on the next VoiceML request:

  • <Call>: action, callStatus, callEndpoint
  • <Input>: action, digits
  • <Record>: action, recording, estimatedDurationSeconds
  • <Stream>: any parameters set via update_request_parameters

Other actions (such as <Say>, <Play>, <Wait>) do not populate parameters.

Example: first request

The first VoiceML request for an incoming PSTN call. The call has not been answered yet, so state is STARTING and parameters is empty.

{
  "call": {
    "name": "projects/my-project/calls/abc123def456",
    "to": "+61399998888",
    "from": "+61400111222",
    "isPrivate": false,
    "transport": "PSTN",
    "direction": "INCOMING",
    "state": "STARTING",
    "createTime": "2026-05-15T14:23:01.234Z",
    "startTime": "2026-05-15T14:23:01.234Z",
    "createMethod": "INCOMING_CALL",
    "fromLocation": {
      "regionCode": "AU",
      "administrativeArea": "Victoria",
      "locality": "Melbourne"
    },
    "fromZone": "Melbourne"
  },
  "parameters": {}
}

Example: follow-up request after <Input>

The same call after your application returned a VoiceML response containing <Input>, and the caller pressed 1#. The call is now answered, so state is IN_PROGRESS and answerTime is set. The parameters map carries the values documented on <Input>.

{
  "call": {
    "name": "projects/my-project/calls/abc123def456",
    "to": "+61399998888",
    "from": "+61400111222",
    "isPrivate": false,
    "transport": "PSTN",
    "direction": "INCOMING",
    "state": "IN_PROGRESS",
    "createTime": "2026-05-15T14:23:01.234Z",
    "startTime": "2026-05-15T14:23:01.234Z",
    "answerTime": "2026-05-15T14:23:03.456Z",
    "createMethod": "INCOMING_CALL",
    "fromLocation": {
      "regionCode": "AU",
      "administrativeArea": "Victoria",
      "locality": "Melbourne"
    },
    "fromZone": "Melbourne"
  },
  "parameters": {
    "action": "INPUT",
    "digits": "1"
  }
}
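A handler would typically branch on these values. The sketch below maps the collected digit to a menu choice. The department names and the function name are hypothetical; only the action and digits keys and the "INPUT" value come from the example above.

```python
from typing import Optional

# Hypothetical menu: which department each digit selects.
MENU = {"1": "sales", "2": "support", "3": "general"}


def choose_department(parameters: dict) -> Optional[str]:
    """Return the selected department, or None if no valid choice was made."""
    if parameters.get("action") != "INPUT":
        # Not a follow-up to <Input>, e.g. the first request of the call.
        return None
    return MENU.get(parameters.get("digits", ""))
```

Returning None for both "no input yet" and "unrecognized digit" keeps the caller simple: in either case it can replay the menu.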

Response

Your server replies with:

  • Status: 200 OK. Any 2xx status is accepted. Redirects (3xx) are followed.
  • Header: Content-Type indicating XML, for example application/xml.
  • Body: a VoiceML document — a <Response> element wrapping zero or more VoiceML actions.

Example response

<?xml version="1.0" encoding="UTF-8"?>
<Response>
    <Input MaxDigitCount="1" NextUri="/menu">
        <Say>Press 1 for sales, 2 for support, or 3 for everything else.</Say>
    </Input>
</Response>
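Building the document with an XML library rather than string concatenation avoids escaping mistakes in prompt text. A minimal sketch using Python's standard library, where build_menu_response is an illustrative helper that reproduces the structure of the example above:

```python
import xml.etree.ElementTree as ET


def build_menu_response(prompt: str, next_uri: str) -> bytes:
    """Build a <Response> containing one <Input> that wraps a <Say> prompt."""
    response = ET.Element("Response")
    input_el = ET.SubElement(
        response, "Input", {"MaxDigitCount": "1", "NextUri": next_uri}
    )
    say = ET.SubElement(input_el, "Say")
    say.text = prompt  # special characters such as & are escaped on output
    return ET.tostring(response, encoding="UTF-8", xml_declaration=True)
```

The returned bytes can be written directly as the HTTP response body with an XML Content-Type.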

Failure handling

A non-2xx response, a network failure, or a body that cannot be parsed as VoiceML is treated as a failure. Cloud Voice will try the next handler URI (if multiple are configured) and then retry. The retry schedule is the same as for voice webhooks — see Webhooks → Response.
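One way to reduce failures caused by application bugs is to wrap the handler so that an exception or a malformed document is replaced with a known-good fallback. This is a sketch of one possible design, not a recommendation from this page: safe_respond and the fallback wording are hypothetical, and swallowing errors this way means Cloud Voice will not move on to another handler URI, which may or may not be what you want.

```python
import logging
import xml.etree.ElementTree as ET

# Hypothetical fallback document, used when the real handler misbehaves.
FALLBACK_XML = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b"<Response><Say>Sorry, something went wrong. Please try again later.</Say></Response>"
)


def safe_respond(handler, raw_body: bytes) -> bytes:
    """Call handler(raw_body); return FALLBACK_XML if it fails or emits bad XML."""
    try:
        document = handler(raw_body)
        # Reject output that is not well-formed XML rooted at <Response>.
        if ET.fromstring(document).tag != "Response":
            raise ValueError("root element is not <Response>")
        return document
    except Exception:
        logging.exception("VoiceML handler failed; returning fallback document")
        return FALLBACK_XML
```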

Lifecycle

A typical call goes through this loop:

  1. An event triggers the first request — an incoming call arrives, or an outgoing call created via the API reaches the point where it needs instructions.
  2. Cloud Voice POSTs a CallRequest to the first handler URI. parameters is empty.
  3. Your server replies 200 OK with a <Response> document.
  4. Cloud Voice executes the actions in order.
  5. When an action with a NextUri finishes, Cloud Voice POSTs the next CallRequest. parameters contains the return values from the action that just completed.
  6. If execution reaches the end of the document without a NextUri or other continuation, there may be no further CallRequest; the call flow ends after the final action completes.
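The loop above can be sketched as a small handler built on Python's standard http.server. This is an illustration under assumptions, not a production server: the port, the prompt wording in the follow-up reply, and the names respond, VoiceMLHandler, and serve are all hypothetical, and signature verification is omitted.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from xml.sax.saxutils import escape

XML_HEADER = b'<?xml version="1.0" encoding="UTF-8"?>'
MENU_XML = XML_HEADER + (
    b'<Response><Input MaxDigitCount="1" NextUri="/menu">'
    b"<Say>Press 1 for sales, 2 for support, or 3 for everything else.</Say>"
    b"</Input></Response>"
)


def respond(raw_body: bytes) -> bytes:
    """Map one CallRequest body to one VoiceML document."""
    parameters = json.loads(raw_body).get("parameters") or {}
    if not parameters:
        # Step 2: first request of the call, so play the menu.
        return MENU_XML
    # Step 5: follow-up request carrying the <Input> return values.
    digits = escape(parameters.get("digits", ""))
    say = f"<Response><Say>You pressed {digits}.</Say></Response>"
    # No NextUri here, so no further request is expected (step 6).
    return XML_HEADER + say.encode("utf-8")


class VoiceMLHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = respond(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the handler until interrupted."""
    HTTPServer(("", port), VoiceMLHandler).serve_forever()
```

Calling `serve()` starts the listener. Keeping `respond` as a pure function of the request body makes the call flow testable without a network connection.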

Cookies

Cloud Voice maintains cookies across the VoiceML requests of the controlling call. You can use cookies to keep state across a conversation, either by setting cookies directly or by using the built-in session state of your programming language or framework.
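As an illustration of setting a cookie directly, the sketch below counts how many times a caller has been prompted. It assumes cookies arrive in the standard Cookie request header and are set with the standard Set-Cookie response header. The cookie name attempts and the function name are hypothetical.

```python
from http.cookies import SimpleCookie
from typing import Optional, Tuple


def next_attempt(cookie_header: Optional[str]) -> Tuple[int, str]:
    """Return the incremented attempt count and a Set-Cookie header value."""
    incoming = SimpleCookie(cookie_header or "")
    raw = incoming["attempts"].value if "attempts" in incoming else "0"
    # Treat a missing or non-numeric value as zero previous attempts.
    previous = int(raw) if raw.isdigit() else 0

    outgoing = SimpleCookie()
    outgoing["attempts"] = str(previous + 1)
    outgoing["attempts"]["path"] = "/"
    return previous + 1, outgoing["attempts"].OutputString()
```

A handler would pass in the request's Cookie header, send the returned string as the Set-Cookie header of its response, and use the count to decide, for example, when to stop replaying a menu.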