Non-verbal sounds (NVS) constitute an appealing communicative channel for transmitting a message during a dialog. They offer two main benefits: they are not tied to any particular language, and they can convey a message in a short time. NVS have been used successfully in robotics, cell phones, and science fiction films. However, in-depth studies on how to model NVS are lacking. For instance, most systems for NVS expression are ad hoc solutions that focus on communicating the most prominent emotion. Only a small number of papers have proposed a more general model or dealt directly with the expression of pure communicative acts, such as affirmation, denial, or greeting. In this paper we propose a system, referred to as the sonic expression system (SES), that generates NVS on the fly by adapting the sound to the context of the interaction. The system is designed to be used by social robots during human-robot interactions. It is based on a model that includes several acoustic features from the amplitude, frequency, and time domains. To evaluate the capabilities of the system, nine categories of communicative acts were created. By means of an online questionnaire, 51 participants classified the utterances according to their meaning: agreement, hesitation, denial, hush, question, summon, encouragement, greetings, and laughing. The results showed that markedly different NVS generated by our SES can be used for communication.
sound synthesis; human-robot interaction; electrosonic mode; social robots; non-verbal sounds; sonic mode; quasons; emotions; robot; speech