Using VoiceXML for Telephony Applications

András Micsik
MTA SZTAKI, Department of Distributed Systems
H-1111 Lágymányosi utca 11.
Budapest, Hungary
Roland Alton-Scheidl
Public Voice Lab
A-1040 Operngasse 22-24
Vienna, Austria
Mathias Kimpl
Public Voice Lab
A-1040 Operngasse 22-24
Vienna, Austria


VoiceXML can become the new lingua franca for telephony applications by standardizing the description of audio dialogs. PublicVoiceXML is the name of both an open source software package and an EU trial project. Within this project, an open source voice browser compliant with VoiceXML 2.0 was implemented, and several example applications were built with mixed voice and web interfaces. These applications are currently being tried and evaluated by community radio stations in Austria.


VoiceXML, voice browser, telephony applications

1. Introduction

VoiceXML [1] is an XML-based dialog markup language for interactive voice response applications. The language features include call control, speech synthesis and audio output, recognition of spoken and DTMF key input, and recording of spoken input. VoiceXML may take on a key role in telephony and other voice-driven applications, and it can also help to extend Web-based services with voice interaction. VoiceXML became a Candidate Recommendation of the World Wide Web Consortium in January 2003.
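To make these features concrete, the following sketch shows what a very small VoiceXML 2.0 document can look like. It is embedded in a Python string only so that it can be checked for well-formedness with the standard library. The dialog itself (a welcome prompt followed by a four-digit DTMF field) and the helper name `element_names` are illustrative assumptions, not part of the PublicVoiceXML distribution.

```python
import xml.etree.ElementTree as ET

# A minimal VoiceXML 2.0 document (illustrative): play a prompt, then
# collect a four-digit DTMF code using the built-in "digits" grammar type.
HELLO_VXML = """<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="login">
    <block>
      <prompt>Welcome to the demo service.</prompt>
    </block>
    <field name="pin" type="digits?length=4">
      <prompt>Please enter your four digit code.</prompt>
      <noinput>I did not hear anything. <reprompt/></noinput>
      <filled>
        <prompt>Thank you.</prompt>
      </filled>
    </field>
  </form>
</vxml>
"""


def element_names(document: str) -> list[str]:
    """Parse the document and return its element names in document order,
    with the XML namespace prefix stripped."""
    root = ET.fromstring(document.encode("utf-8"))
    return [el.tag.split("}", 1)[-1] for el in root.iter()]


if __name__ == "__main__":
    print(element_names(HELLO_VXML))
```

A voice browser would fetch such a document over HTTP in the same way a web browser fetches an HTML page, and then interpret it on the telephone line.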

The main objectives of the EU-funded PublicVoiceXML [2] project are to provide a reference implementation of a voice browser and to develop and try out example voice-driven applications. The partners of the project are Public Voice Lab, MTA SZTAKI DSD (Department of Distributed Systems at the Computer and Automation Research Institute of the Hungarian Academy of Sciences) and Team Teichenberg.

The recently released voice browser and its demo applications are presented in the following sections.

2. PublicVoiceXML

PublicVoiceXML is also the name of the voice browser implemented within the project. PublicVoiceXML is designed to be used by SMEs, and it is targeted at low-cost telephony hardware. The software works with ISDN using CAPI; analog lines are currently not supported. The browser is written in C++ and is built on top of the OpenVXI VoiceXML interpreter, which is also free software.

A pre-release running on Windows platforms was published on SourceForge in December 2002. The beta version for Linux will be available in the first quarter of 2003.

PublicVoiceXML supports all "must-have" tags of the VoiceXML 2.0 specification except those for speech recognition. The project has decided to test the level of compliance by creating an implementation report for W3C, and will use the results as feedback for further development.

Figure 1. Architecture of PublicVoiceXML voice browser

3. Voice and web applications

The PublicVoiceXML project implemented several applications in order to demonstrate and test the usefulness and usability of the VoiceXML language and our voice browser. Most of the example applications are targeted at radio stations, because Team Teichenberg, our testing partner in this project, has strong connections to community radio stations that are willing to test-drive our applications.

The project team collected and analysed ideas and needs of community radio stations, and developed mock-ups together with these stations for the user interfaces of selected applications.

3.1 InterviewBox

InterviewBox is a tool that enables reporters and journalists to conduct interviews using any phone set. The reporter calls the number of the InterviewBox and identifies herself, and the recording is started. After this, the microphone of the phone set is used to record the reporter's questions and the interviewee's answers. At the end of the interview the reporter hangs up the phone, and the recorded sound file is stored on the server, where it is immediately available for further processing or broadcasting. A web interface is used for managing and accessing recorded interviews.

InterviewBox makes it possible for anyone to record quick interviews using a cellular phone, practically anywhere and without any preparation.
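The call flow described above can be sketched as a single VoiceXML form that is generated on the server. The sketch below is an illustration under stated assumptions, not the project's actual dialog: it assumes the reporter identifies herself with a numeric code entered on the keypad (the text does not say how identification works), and the function name `build_interview_dialog`, the upload URL, and the ten-minute default limit are hypothetical. It relies on the VoiceXML 2.0 rule that audio recorded up to a hangup remains available in the record variable, so a handler for `connection.disconnect.hangup` can still submit it.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_interview_dialog(upload_url: str, max_seconds: int = 600) -> str:
    """Return a VoiceXML 2.0 document for an InterviewBox-like call.

    upload_url and max_seconds are hypothetical parameters of this sketch.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "interview"})

    # Attributes shared by both <submit> elements: send the code and the
    # recording to the server as a multipart POST.
    submit_attrs = {
        "next": upload_url,
        "namelist": "pin interview",
        "method": "post",
        "enctype": "multipart/form-data",
    }

    # The reporter ends the interview by hanging up, so the recording is
    # submitted from the hangup event handler.
    catch = ET.SubElement(form, "catch", {"event": "connection.disconnect.hangup"})
    ET.SubElement(catch, "submit", submit_attrs)

    # Step 1 (assumption): identification with a DTMF code.
    pin = ET.SubElement(form, "field", {"name": "pin", "type": "digits"})
    ET.SubElement(pin, "prompt").text = "Please enter your reporter code."

    # Step 2: record questions and answers through the phone's microphone.
    record = ET.SubElement(
        form,
        "record",
        {"name": "interview", "beep": "true", "maxtime": f"{max_seconds}s"},
    )
    ET.SubElement(record, "prompt").text = (
        "Recording starts after the beep. Hang up when you are done."
    )
    # If the time limit is reached before the caller hangs up, submit as well.
    filled = ET.SubElement(record, "filled")
    ET.SubElement(filled, "submit", submit_attrs)

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_interview_dialog("http://example.org/interviewbox/upload.php"))
```

The server-side script behind the upload URL would then store the received audio file and make it visible in the web interface.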

Figure 2. Mr. Colosanti (left) being interviewed by the project manager Mr. Alton-Scheidl (right) using InterviewBox

3.2 PresentBox

PresentBox is an online ticket offering service of a kind often used by radio stations. People can call a number, listen to the available 'goodies' (tickets, books, etc.), and leave their names, addresses and phone numbers for the one they would like to get. Offerings are collected at the studio, where operators insert new sound advertisements into the queue. Both a voice and a web interface are available for users to select tickets and other goodies.

The voice and web interfaces are quite different from each other. On the web interface, users see a list of the goodies and may click on any item to apply for it. On the voice interface, listeners have to go through the audio messages describing each goodie in the specified order, and apply for a goodie right after its message has been played. The forms users fill in to apply are also different: on the web it is useful to collect logically separate entities (e.g. phone number, e-mail) in separate fields of the form, while on the phone interface it proved to be more comfortable for users to answer several questions with a single recorded answer.
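The sequential nature of the voice interface can be illustrated with a small generator that turns an ordered list of goodies into a chain of VoiceXML menus. This is a sketch under assumptions rather than the deployed dialog: the key assignment (1 to apply, 2 for the next item), the `Goodie` class, the function name `build_goodie_dialog` and the URLs are all hypothetical. Each application is collected as one recorded answer, as described above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

VXML_NS = "http://www.w3.org/2001/vxml"


@dataclass
class Goodie:
    ident: str      # hypothetical identifier, must be usable in an XML id
    audio_url: str  # recorded message describing the goodie


def build_goodie_dialog(goodies: list[Goodie], apply_url: str) -> str:
    """Return a VoiceXML 2.0 document that plays the goodies in order."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})

    # Menus come first in document order, so the call starts with the
    # first goodie (or with the goodbye form if the list is empty).
    for index, goodie in enumerate(goodies):
        if index + 1 < len(goodies):
            next_target = f"#goodie_{goodies[index + 1].ident}"
        else:
            next_target = "#goodbye"
        menu = ET.SubElement(vxml, "menu", {"id": f"goodie_{goodie.ident}"})
        prompt = ET.SubElement(menu, "prompt")
        audio = ET.SubElement(prompt, "audio", {"src": goodie.audio_url})
        audio.tail = " Press 1 to apply for this item, or 2 to hear the next one."
        ET.SubElement(menu, "choice", {"dtmf": "1", "next": f"#apply_{goodie.ident}"})
        ET.SubElement(menu, "choice", {"dtmf": "2", "next": next_target})

    # One application form per goodie: a single recorded answer instead of
    # separate fields for name, address and phone number.
    for goodie in goodies:
        form = ET.SubElement(vxml, "form", {"id": f"apply_{goodie.ident}"})
        record = ET.SubElement(
            form, "record", {"name": "contact", "beep": "true", "maxtime": "60s"}
        )
        ET.SubElement(record, "prompt").text = (
            "After the beep, please say your name, address and phone number."
        )
        filled = ET.SubElement(record, "filled")
        ET.SubElement(
            filled,
            "submit",
            {
                "next": f"{apply_url}?goodie={goodie.ident}",
                "namelist": "contact",
                "method": "post",
                "enctype": "multipart/form-data",
            },
        )

    goodbye = ET.SubElement(vxml, "form", {"id": "goodbye"})
    block = ET.SubElement(goodbye, "block")
    ET.SubElement(block, "prompt").text = "There are no more items. Goodbye."
    ET.SubElement(block, "exit")

    return ET.tostring(vxml, encoding="unicode")
```

On the web, the same list would simply be rendered as links in any order the user likes; on the phone, the playing order set by the administrators determines what a caller hears first.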

There are also web and phone interfaces for administrators. With the phone interface they can easily record the intro, help messages, and messages for new goodies. On the web interface they can manage goodies and their playing order, and see the list of applications for each goodie.

3.3 Forum

There are many types and styles of message boards and online discussions on the web. The PublicVoiceXML project extended the functionality of one such system, called net.board, towards a bi-modal (phone and web) discussion area. Comments made on the phone can be listened to through the web interface like any other sound file. Comments made on the web are converted into speech on the phone interface using a text-to-speech converter.

3.4 Payline

Collecting a payment through a phone call is a practical solution for micropayments. If a user wants access to a document, image or piece of music on the web, she has to call a number. The cost of this phone call settles the payment for the selected item, and the user hears the password or code needed to access that item.

A JSP and Java based environment for such a payline mechanism has also been built as a demo using the PublicVoiceXML software.
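As a rough illustration of the mechanism, the following sketch generates a one-time access code and the VoiceXML page that reads it to the caller. It is written in Python for consistency with the other examples here, whereas the demo itself is JSP and Java based; the function names, the six-digit code format and the wording of the prompt are assumptions of this sketch. Billing happens on the telephone network and is outside its scope.

```python
import secrets
from xml.sax.saxutils import escape


def new_access_code(length: int = 6) -> str:
    """Return a random numeric code (hypothetical format) for one purchase."""
    return "".join(secrets.choice("0123456789") for _ in range(length))


def payline_vxml(item_title: str, code: str) -> str:
    """Return a VoiceXML 2.0 page that reads the code to the caller twice."""
    # Separate the digits so that a speech synthesizer reads them one by one.
    spoken = ", ".join(code)
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        Thank you. Your access code for {escape(item_title)} is {spoken}.
        Once more: {spoken}.
      </prompt>
      <disconnect/>
    </block>
  </form>
</vxml>
"""


if __name__ == "__main__":
    print(payline_vxml("Song of the week", new_access_code()))
```

The web application would store the generated code together with the item so that it can be checked when the user enters it on the web page.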

4. Experiences and future work

Most of these applications were built on a basic framework developed within the project. This framework uses PHP as middleware, a PostgreSQL database (accessed through the PEAR database abstraction layer), and the Smarty template engine for the generation of HTML and VoiceXML pages. This results in clean PHP code and in the separation of presentation from business logic. HTML templates can be edited by designers without the need for programming skills, and VoiceXML scripts can be fine-tuned without touching PHP code.
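The separation described here can be sketched in a few lines. The example below uses Python's `string.Template` purely as a stand-in for Smarty and a hard-coded list in place of the database query; the template texts, the `apply.php` link and the function names are hypothetical. The point is only that one set of data feeds two independent templates, one per modality.

```python
from string import Template
from xml.sax.saxutils import escape

# Presentation: one template set per modality, editable without touching
# the code below (stand-ins for Smarty templates).
HTML_PAGE = Template("<ul>\n$items\n</ul>")
HTML_ITEM = Template('  <li><a href="apply.php?id=$ident">$title</a></li>')

VXML_PAGE = Template(
    '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">\n'
    "  <form>\n    <block>\n$items\n    </block>\n  </form>\n</vxml>"
)
VXML_ITEM = Template("      <prompt>Item $position: $title.</prompt>")


def load_goodies() -> list[dict]:
    """Business logic placeholder: would normally query the database."""
    return [
        {"ident": 1, "title": "Concert tickets"},
        {"ident": 2, "title": "Books & CDs"},
    ]


def render_html(goodies: list[dict]) -> str:
    """Render the goodies as an HTML list of links."""
    items = "\n".join(
        HTML_ITEM.substitute(ident=g["ident"], title=escape(g["title"]))
        for g in goodies
    )
    return HTML_PAGE.substitute(items=items)


def render_vxml(goodies: list[dict]) -> str:
    """Render the same goodies as spoken prompts in a VoiceXML form."""
    items = "\n".join(
        VXML_ITEM.substitute(position=n, title=escape(g["title"]))
        for n, g in enumerate(goodies, start=1)
    )
    return VXML_PAGE.substitute(items=items)


if __name__ == "__main__":
    data = load_goodies()
    print(render_html(data))
    print(render_vxml(data))
```

Keeping two hand-written templates, rather than deriving both outputs from one generic description, leaves room for the web and phone dialogs to differ in structure.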

InterviewBox had two major trials. The first took place at the IST 2002 Event in Copenhagen in November 2002, the biggest European fair for research on Information Society Technologies. The project ran the official radio for this event, which broadcast interviews made with cellular phones.

Based on the experience of the first trial, Radio Orange 94.0, Public Voice Lab and Team Teichenberg managed the reporting on the general elections in Austria on November 24, 2002. In previous elections the production of reports was rather complicated because live inbound telephone interviews taking place at different locations had to be coordinated manually. This time each interviewer team had its own phone number to call when accessing InterviewBox. Producers and editors in the radio studio could easily handle the load of audio material by pre-listening to the finished interviews and creating playlists for the on-air programme. Other community radio stations in Austria were able to download the recorded files from an official website and integrate them into their broadcasts.

PresentBox and Forum are currently being tested and evaluated by Radio Orange 94.0, a community radio station in Vienna.

During the implementation of these examples we studied the tempting possibility of creating a generic form and dialog description from which both HTML and VoiceXML can be generated. We found that, in our examples, the differences in the nature of these interfaces and in user needs argue strongly against this solution.

5. Acknowledgements

This work is supported by the EU under contract IST-2001-34546.

6. References

  1. Voice Extensible Markup Language (VoiceXML) Version 2.0.
  2. PublicVoiceXML project homepage.