Personal Voice Call Assistant: VoiceXML and SIP in a Distributed Environment

Michael Pucher
Donau-City Straße 1/3
A - 1220 Vienna
Julia Tertyshnaya
Donau-City Straße 1/3
A - 1220 Vienna
Florian Wegscheider
Donau-City Straße 1/3
A - 1220 Vienna


In this paper we introduce the architecture of a distributed service platform that integrates speech, web and voice-over-IP technologies, and we describe how a specific service can be built using these technologies. The electronic assistant is an advanced voice-based service that answers incoming calls and takes messages, consulting the user's calendar and address book. A novel contribution is the dynamic creation of the dialog (in the form of VoiceXML pages) from calendar and other data residing in the platform's databases. This gives the caller up-to-date information at call time.

The described platform provides basic services (such as call control) and allows developers to stack services, facilitating rapid service creation. The assistant illustrates how several fairly basic building blocks can be combined into a powerful end-user application.

Using our assistant as a guiding example, the first part of this paper details the platform's goals and architecture. The second part describes the dynamic generation of user-friendly and intuitive VoiceXML pages and grammars from the calendar database, which proved to be an intricate task in itself.


VoiceXML[1], SIP, CPL, Parlay, CCXML


The deployment of new mobile communication technologies such as UMTS and GPRS will have a strong impact on the Internet. Today we already access the Internet not only from a PC, but also via mobile phones, palmtops and other devices. New applications that combine several basic services such as telephony, e-mail, web browsing or instant messaging will emerge, and they will include security and Quality of Service mechanisms.

We developed such an Internet-based telecom application using two technologies. The first is VoiceXML, which applies web development paradigms to voice dialog development. The second is the Session Initiation Protocol (SIP).

The use of these two technologies leads to a certain convergence between the Web and the still sparse Voice Web. Through VoiceXML voice interfaces, people can easily access Web resources, while SIP is emerging as the key technology for both Internet telephony and the core network of UMTS.


2.1 Scenarios

Consider the following scenario (Figure 1). In the morning A calls B (because of some problem). The platform knows that B is not available and transfers A to B's assistant. Via the assistant A finds out that B will not be back before tomorrow. A leaves a message for B and asks to be connected to a colleague of B because the problem is urgent.

Later, B calls his assistant and modifies his calendar: the trip will take a day longer. B listens to A's message and calls A back immediately.

Figure 1: Usage Scenario

2.2 Architecture

The architecture consists of the following components, which are shown in Figure 2:

- an external Calendar that stores calendar data for all users;
- a Calendar service that creates VoiceXML snippets from calendar data;
- a VoiceXML Data Management storage system;
- a CPL interpreter that redirects calls to the assistant;
- an Address book that provides access rights to the calendar;
- a Call setup service that creates and redirects calls;
- an external VoiceXML platform that parses VoiceXML and performs speech recognition and text-to-speech (TTS) synthesis;
- finally, the Voice Call Assistant, which provides the core logic.
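The routing step in the scenario of Figure 1 can be made concrete with a small sketch. When B's calendar shows an event covering the time of A's call, the call goes to B's assistant instead of to B. The Python code below is neither CPL nor the platform's code. It shows only the kind of decision a CPL script on this platform could encode. The `Event` type, the `route_call` function and the `sip:assistant-<user>@example.org` addressing scheme are hypothetical, and the access rights held in the Address book are ignored.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    """A calendar entry, reduced to the fields the routing decision needs."""
    title: str
    start: datetime
    end: datetime


def is_available(events: list[Event], now: datetime) -> bool:
    """Treat the callee as available if no calendar event covers `now`."""
    return not any(e.start <= now < e.end for e in events)


def route_call(callee: str, events: list[Event], now: datetime) -> str:
    """Return the SIP URI a call to `callee` should be sent to.

    The example.org addresses are a hypothetical naming scheme.
    """
    if is_available(events, now):
        return f"sip:{callee}@example.org"
    # Callee is away: redirect the call to the callee's voice call assistant.
    return f"sip:assistant-{callee}@example.org"


if __name__ == "__main__":
    trip = Event("business trip", datetime(2002, 6, 10), datetime(2002, 6, 12))
    print(route_call("b", [trip], datetime(2002, 6, 10, 9)))
```

Run as a script, this prints `sip:assistant-b@example.org`, because the trip covers the time of the call. The sketch treats availability purely as a function of the calendar. A real deployment could also take presence or explicit user settings into account.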

Figure 2: System Architecture


The generation process creates dynamic VoiceXML prompts in three steps. First, it extracts information from the calendar event response, such as the event's start and end time, location and title. Second, it finds a template by walking through the decision tree, guided by request type, event duration, iteration number and user access rights. Third, it fills the template slots using regular expressions. The architecture is shown in Figure 3.
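The template lookup can be sketched in Python as a walk through a nested dictionary whose leaves are templates. The tree below holds only the two example templates listed in this section, and its key order follows their paths. Several details are assumptions not taken from the paper: the function names, the rule that an event lasting 24 hours or more counts as "whole-day", and the `next-iteration` label. The `iterative` path segment is passed through unchanged because its meaning is not explained here.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Hypothetical decision tree: inner keys mirror the path segments of the
# example templates; leaves are the template strings themselves.
TEMPLATE_TREE = {
    "owner": {
        "iterative": {
            "getEventsForDay": {
                "first-iteration": {
                    "normal": "Today you have [title] from [start_time] "
                              "till [end_time] [location]",
                    "whole-day": "Today you have [title] for whole day [location]",
                },
            },
        },
    },
}


def duration_class(start: datetime, end: datetime) -> str:
    """Assumption: an event lasting 24 hours or more counts as whole-day."""
    return "whole-day" if end - start >= timedelta(hours=24) else "normal"


def iteration_class(iteration: int) -> str:
    """Only "first-iteration" occurs in the examples; "next-iteration" is hypothetical."""
    return "first-iteration" if iteration == 0 else "next-iteration"


def find_template(role: str, request: str, iteration: int,
                  start: datetime, end: datetime,
                  mode: str = "iterative") -> tuple[str, str]:
    """Walk the tree one decision at a time and return (path, template).

    Raises KeyError if no branch matches, e.g. for an unknown role.
    """
    keys = [role, mode, request, iteration_class(iteration),
            duration_class(start, end)]
    node = TEMPLATE_TREE
    for key in keys:
        node = node[key]
    return "/".join(keys) + "/", node


if __name__ == "__main__":
    path, template = find_template("owner", "getEventsForDay", 0,
                                   datetime(2002, 6, 10, 10),
                                   datetime(2002, 6, 10, 12))
    print(path)
    print(template)
```

For a two-hour meeting on the owner's first request, the walk ends at `owner/iterative/getEventsForDay/first-iteration/normal/`. A missing branch raises an error instead of silently producing an unsuitable prompt, which keeps gaps in the template tree visible during development.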

Figure 3: Calendar Parser Architecture

Examples of templates are:

Path: owner/iterative/getEventsForDay/first-iteration/normal/
Template: Today you have [title] from [start_time] till [end_time] [location]

Path: owner/iterative/getEventsForDay/first-iteration/whole-day/
Template: Today you have [title] for whole day [location]
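Slot replacement with a regular expression, followed by wrapping the resulting prompt into a VoiceXML page, might look like the minimal Python sketch below. It applies the bracketed slot syntax of the templates above. The function names, the whitespace cleanup for empty slots and the single-`<prompt>` page layout are illustrative assumptions. The pages the platform actually generates also contain grammars and dialog logic, which this sketch omits.

```python
from __future__ import annotations

import re
from xml.sax.saxutils import escape

# A slot is a word in square brackets, e.g. [start_time].
SLOT = re.compile(r"\[(\w+)\]")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace every [slot] with its value; slots without a value become empty."""
    text = SLOT.sub(lambda m: values.get(m.group(1), ""), template)
    # Collapse the double or trailing spaces an empty slot leaves behind.
    return " ".join(text.split())


def to_vxml(prompt: str) -> str:
    """Wrap one prompt in a minimal VoiceXML 2.0 document."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
        "  <form>\n"
        "    <block>\n"
        f"      <prompt>{escape(prompt)}</prompt>\n"
        "    </block>\n"
        "  </form>\n"
        "</vxml>\n"
    )


if __name__ == "__main__":
    text = fill_template(
        "Today you have [title] from [start_time] till [end_time] [location]",
        {"title": "a project meeting", "start_time": "10 a.m.",
         "end_time": "noon", "location": "in room 3"},
    )
    print(to_vxml(text))
```

Here the filled prompt reads "Today you have a project meeting from 10 a.m. till noon in room 3". Escaping the prompt keeps the generated page well-formed even if a calendar title contains characters such as `&` or `<`.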


This paper has shown how VoiceXML and other standardized technologies can be used to create applications for next-generation telecom networks. We introduced an extraction method that can be used to build voice interfaces for closed-domain, dynamic data structures.

We also presented an extensible and scalable architecture that uses high-level, standardized interfaces and languages (Parlay, VoiceXML, CPL). Beyond the call assistant described here, this architecture allows easy creation of various applications and deployment of multiple scenarios based on combinations of the generic elements mentioned above. Integrating CCXML or a similar call control language would make it possible to implement even more functionality through a standardized, easy-to-understand scripting interface.


This work was supported within the Austrian competence center program Kplus.


1. W3C: VoiceXML Version 2.0, Working Draft, 24 April 2002.