Integration of a voice recognition system in a social robot
Articles
Overview
published in
- CYBERNETICS AND SYSTEMS Journal
publication date
- June 2011
start page
- 215
end page
- 245
issue
- 4
volume
- 42
International Standard Serial Number (ISSN)
- 0196-9722
Electronic International Standard Serial Number (EISSN)
- 1087-6553
abstract
- Human-Robot Interaction (HRI) is one of the main fields in the study and research of robotics. Within this field, dialog systems and interaction by voice play a very important role. When speaking about natural human-robot dialog, we assume that the robot is able to accurately recognize the utterance that the human wants to transmit verbally, and even its semantic meaning, but this is not always achieved. In this paper we describe the steps and requirements that we went through in order to endow the personal social robot Maggie, developed at the University Carlos III of Madrid, with the capability of understanding the natural language spoken by any human. We have analyzed the different possibilities offered by current software/hardware alternatives by testing them in real environments. We have obtained accurate data on speech recognition capabilities in different environments, using the most modern audio acquisition systems and analyzing less typical parameters such as user age, sex, intonation, volume and language. Finally, we propose a new model to classify recognition results as accepted or rejected, based on a second ASR opinion. This new approach takes into account the pre-calculated success rate in noise intervals for each recognition framework, which decreases the rates of false positives and false negatives.
Classification
subjects
- Robotics and Industrial Informatics
keywords
- robot audition; automatic speech recognition; asr; voice recognition; speech recognition; maggie; personal robot; social robot; human-robot interaction; human-computer interaction; dialog; microphone system; audition system; natural language understanding; natural language processing; computing confidence score