First Asian Network-Based Speech-to-Speech Translation System

July 29, 2009

The MASTAR Project of the National Institute of Information and Communications Technology (NICT), led by Dr. Satoshi Nakamura, has developed the first network-based speech-to-speech translation system for Asian languages in collaboration with members of the Asian Speech Translation Advanced Research Consortium (A-STAR). The system builds on the members' contributions to ASTAP, where they have been standardizing the interfaces and data formats used to connect speech translation modules across countries over the Internet. NICT and the other A-STAR member laboratories are carrying out a field test to demonstrate the new system.

Background

As part of its R&D on speech-to-speech translation technology, NICT, together with the A-STAR Consortium members, has been working within ASTAP to standardize how translation modules are interconnected over the Internet. Based on this standardization activity, NICT and the other A-STAR members will launch “the first Asian network-based speech-to-speech translation system,” which can perform real-time, location-free, multi-party communication between speakers of Japanese, Chinese, Korean, Thai, Indonesian, Malay, Vietnamese, Hindi, and English.

Details

The new system allows each A-STAR member to provide one or more spoken language technology modules, namely Automatic Speech Recognition (ASR), Machine Translation (MT), and Text-to-Speech (TTS), through the STML web servers developed by NICT. Through collaboration with the other A-STAR members, the system's coverage was extended to eight Asian languages (Japanese, Chinese, Korean, Thai, Indonesian, Malay, Vietnamese, and Hindi) plus English. The client applications run on a handheld mobile terminal, enabling portable speech-to-speech translation. Up to four clients can take part in multi-party speech translation at the same time. In addition, the system domain covers 20,000 travel expressions, plus additional named entities (NE) from major Asian countries (e.g., tourist areas: Bulkuksa in Korea, Wat-pra-kaew in Thailand; attractions: Wayang kulit in Indonesia, Khatak in India, etc.).
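The modular design described above amounts to a cascade: each utterance passes through ASR in the speaker's language, MT into each listener's language, and TTS in that listener's language. The minimal Python sketch below illustrates that flow under stated assumptions: the names `ModuleRegistry`, `translate_utterance`, and `relay` are hypothetical, the remote STML web servers are replaced by in-process stub functions, and nothing here reflects the actual STML message format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical module signatures. In the real system each module runs on a
# remote STML web server; here they are plain in-process functions.
ASR = Callable[[bytes], str]   # speech audio -> recognized text
MT = Callable[[str], str]      # source-language text -> target-language text
TTS = Callable[[str], bytes]   # target-language text -> synthesized audio

MAX_PARTIES = 4  # speaker plus listeners; the release states a maximum of four clients


@dataclass
class ModuleRegistry:
    """Hypothetical lookup table of the modules contributed by each member."""
    asr: Dict[str, ASR]
    mt: Dict[Tuple[str, str], MT]
    tts: Dict[str, TTS]


def translate_utterance(reg: ModuleRegistry, audio: bytes, src: str, tgt: str) -> bytes:
    """Cascade one utterance through ASR -> MT -> TTS."""
    text = reg.asr[src](audio)
    if src != tgt:  # a listener who shares the speaker's language skips MT
        text = reg.mt[(src, tgt)](text)
    return reg.tts[tgt](text)


def relay(reg: ModuleRegistry, audio: bytes, speaker_lang: str,
          listener_langs: List[str]) -> List[bytes]:
    """Deliver one speaker's utterance to every other party in their own language."""
    if 1 + len(listener_langs) > MAX_PARTIES:
        raise ValueError(f"at most {MAX_PARTIES} parties can take part at once")
    return [translate_utterance(reg, audio, speaker_lang, lang) for lang in listener_langs]


# Toy stand-ins: "audio" is just UTF-8 text, and MT uses a one-entry phrase table.
toy = ModuleRegistry(
    asr={"ja": lambda audio: audio.decode("utf-8")},
    mt={("ja", "en"): lambda text: {"konnichiwa": "hello"}.get(text, text)},
    tts={"ja": lambda text: text.encode("utf-8"),
         "en": lambda text: text.encode("utf-8")},
)
```

With these toy modules, `relay(toy, b"konnichiwa", "ja", ["en", "ja"])` returns `[b"hello", b"konnichiwa"]`: the English listener's copy goes through all three stages, while the Japanese listener's copy skips MT. Keying MT modules by (source, target) mirrors the idea that different members may host different translation directions.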

Future Aspects

NICT will continue its standardization work within the ASTAP SNLP EG in order to support more languages.

Terminology

Asian Speech Translation Advanced Research Consortium (A-STAR)
The Asian Speech Translation Advanced Research Consortium was established in June 2006 by NICT and the Advanced Telecommunications Research Institute International (ATR) of Japan, with six founding members: Japan (NICT and ATR), Korea (the Electronics and Telecommunications Research Institute - ETRI), Thailand (the National Electronics and Computer Technology Center - NECTEC), Indonesia (Badan Pengkajian Dan Penerapan Teknologi, or the Agency for the Assessment and Application of Technology - BPPT), China (the National Laboratory of Pattern Recognition of the Institute of Automation, Chinese Academy of Sciences - NLPR-CASIA), and India (the Centre for Development of Advanced Computing - C-DAC). In 2008, Vietnam (the Institute of Information Technology - IOIT) and Singapore (the Institute for Infocomm Research - I2R) joined the Consortium.

ASTAP
The Asia-Pacific Telecommunity Standardization Program. Launched by the Asia-Pacific Telecommunity (APT) in February 1998 to promote and coordinate expert activity in telecommunications standardization across the Asia-Pacific region.

STML
Speech Translation Markup Language. STML servers, located at multiple sites, provide speech-to-speech translation technologies such as ASR, MT, and TTS over the Internet. STML development was originally launched at ATR with support from the Special Coordination Funds for Promoting Science and Technology of the Ministry of Education, Culture, Sports, Science and Technology. The technology was later transferred to NICT for further development.

SNLP EG
The Speech and Natural Language Processing Expert Group. One of the Expert Groups within ASTAP, focusing on the standardization of speech-to-speech translation and Asian language resources.
Dr. Satoshi Nakamura, Director of the MASTAR Project at NICT, serves as rapporteur, and Dr. Jun Park of ETRI, Korea, serves as co-rapporteur.

Appendix

The First Asian Network-Based Speech-to-Speech Translation System

The Asian Speech Translation Advanced Research Consortium (A-STAR) is an international consortium established with the goal of making network-based speech-to-speech translation in the Asian region a reality. Established in June 2006 by NICT/ATR (Japan), A-STAR originally started with six members: Japan (NICT/ATR), Korea (ETRI), Thailand (NECTEC), Indonesia (BPPT), China (NLPR-CASIA), and India (C-DAC). In 2008, Vietnam (IOIT) and Singapore (I2R) also joined the consortium.

A-STAR was founded to create a basic infrastructure for spoken language communication that overcomes the language barriers in the Asia-Pacific region. The consortium is working collaboratively to collect Asian language corpora, create common speech recognition and translation dictionaries, develop Web service speech translation modules for the various Asian languages, and standardize the interfaces and data formats that allow speech translation modules from different countries to connect with one another. As part of this effort, the consortium created an expert group within APT ASTAP (the Asia-Pacific Telecommunity Standardization Program). This group is dedicated to developing a draft standard for an interface and data format that enables speech translation modules located across the Asia-Pacific region to connect with each other through the Internet. These research activities have also been adopted as an APEC TEL (Telecommunications and Information) project.

On July 29, 2009, A-STAR launched the "first Asian network-based speech-to-speech translation system" that can perform real-time, location-free, multi-party communication between speakers of different Asian languages. The highlights of this system are:

Languages: Eight A-STAR member research groups participated in the experiments, covering nine languages: eight major Asian languages (Japanese, Chinese, Korean, Thai, Indonesian, Malay, Vietnamese, Hindi) and English.
Language Technologies: Each A-STAR member contributes one or more of the following spoken language technologies: automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS) through STML web servers. Currently, the system performs ASR for 8 different languages, TTS for 9 languages, and MT for 72 language pairs.
Handheld Device: The client applications are implemented on a handheld mobile terminal device (VAIO), which allows portable speech-to-speech translation.
User Access: Any client user can access all available A-STAR ASR/MT/TTS STML servers in real time from anywhere, and up to four parties can take part in a translated conversation at the same time.
Travel Domain: The system domain covers 20,000 travel expressions, plus additional named entities (NE) from major Asian countries (e.g., tourist areas: Bulkuksa in Korea, Wat-pra-kaew in Thailand; attractions: Wayang kulit in Indonesia, Khatak in India, etc.).
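The MT figure of 72 language pairs is consistent with counting each translation direction separately among the nine supported languages, which gives 9 × 8 = 72 ordered pairs (unordered pairs would give only 36). The release does not state how pairs were counted, so the short Python check below is an illustration under that assumption.

```python
from itertools import permutations

# The nine languages listed in the highlights above.
LANGUAGES = ["Japanese", "Chinese", "Korean", "Thai", "Indonesian",
             "Malay", "Vietnamese", "Hindi", "English"]

# Assumption: each MT direction (source -> target) counts as its own pair,
# so the total is n * (n - 1) ordered pairs rather than n * (n - 1) / 2.
directed_pairs = list(permutations(LANGUAGES, 2))
print(len(directed_pairs))
```

Counting directions separately fits a modular setup in which, for example, Japanese-to-Thai and Thai-to-Japanese may be served by different MT modules.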

The consortium is continuing its activities internationally, and partners are still being sought for languages not only in Asia but around the world. This is expected to accelerate research toward the realization of practical speech translation systems.

Fig.1