The objectives of this WP are to develop methods and techniques for the automatic detection and interpretation of non-verbal social signals in the context of multi-party natural interaction between humans and virtual humans in a serious gaming-like scenario.
Sensors, including cameras, microphones, and tactile, position, and (neuro)physiological sensors, are increasingly becoming part of sensor-extended PCs, sensor-equipped environments, smart rooms and homes, and augmented, mixed, and virtual reality environments. The availability of these sensors introduces the opportunity for context-dependent detection and interpretation of a user's or visitor's interaction-relevant social signals and activities, such as persuasion, rapport, and disagreement.
In a joint conversational activity between humans, or between humans and virtual humans, social signals regulate the flow of conversation, for instance floor changes and turn taking. These social signals are conveyed by, among other channels, head movements, body movements, and hand and arm gestures. Detecting and interpreting such social signals and activities allows virtual humans participating in the joint conversational activity to interact naturally with the user by reactively and proactively providing (context-dependent) feedback, including adaptation of the environment. In general, social signals provide cues to the affective state, cognitive state, and social awareness of the user. Detection of such social signals is relevant in many applications, including e-inclusion, social well-being, and assisted living.
The main research question
The main research question of this work package is: "To what extent can computers detect social human behavior?" This question is, of course, far too broad, both in terms of which social human behaviors are considered and the environment in which the behavior takes place. We consider a smart-environment setting with multi-party natural interaction between humans and virtual humans in a serious gaming-like scenario. Moreover, we focus on the multimodal combination of nonverbal human behavior and audio, in particular social signals conveyed by body movements and speech. When convenient, other sensors, such as pressure sensors in chairs, will also be taken into consideration.
A data-driven approach will be taken to the design and evaluation of the detectors. The first one and a half years will be dedicated mostly to building a corpus of annotated social human behaviors; the requirements, context, and scenarios will be defined in close cooperation with the other WPs. The annotations will also be used to determine which nonverbal behaviors can be interpreted as social signals independently of context (non-ambiguous signals). To determine the role of context in the interpretation of intrinsically ambiguous signals, a perception experiment will be conducted.
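To make the data-driven detector pipeline concrete, the following is a minimal sketch, not the project's actual method: annotated multimodal examples are fused into feature vectors and a simple nearest-centroid classifier is learned from them. All feature names (head-nod rate, gesture energy, pitch variance, speech rate), labels, and values below are hypothetical placeholders; the real feature sets and learning techniques will follow from the annotated corpus.

```python
# Sketch of a data-driven multimodal social-signal detector (hypothetical
# features and labels; the actual detectors will be designed from the corpus).
from statistics import mean

def fuse(body_feats, speech_feats):
    """Concatenate per-modality feature vectors into one multimodal vector."""
    return list(body_feats) + list(speech_feats)

def train_centroids(examples):
    """examples: list of (fused_vector, label) pairs from an annotated corpus.
    Returns one mean feature vector (centroid) per social-signal label."""
    by_label = {}
    for vec, label in examples:
        by_label.setdefault(label, []).append(vec)
    return {label: [mean(col) for col in zip(*vecs)]
            for label, vecs in by_label.items()}

def detect(centroids, vec):
    """Assign the label of the nearest centroid (squared Euclidean distance)."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(vec, centroids[label]))
    return min(centroids, key=dist)

# Toy annotated data: (head-nod rate, gesture energy) + (pitch var, speech rate)
examples = [
    (fuse([0.9, 0.8], [0.7, 0.6]), "rapport"),
    (fuse([0.8, 0.9], [0.6, 0.7]), "rapport"),
    (fuse([0.1, 0.2], [0.9, 0.9]), "disagreement"),
    (fuse([0.2, 0.1], [0.8, 1.0]), "disagreement"),
]
centroids = train_centroids(examples)
print(detect(centroids, fuse([0.85, 0.8], [0.65, 0.6])))  # → rapport
```

The fusion step stands in for the planned multimodal combination of body-movement and speech cues; context-dependent interpretation of ambiguous signals would require conditioning such a classifier on additional context features, which is exactly what the perception experiment is meant to inform.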