Citation info: Ekman, I. and Kajastila, R. (2009) "Localisation Cues Affect Emotional Judgements – Results from a User Study on Scary Sound", Proc. AES 35th Conference on Audio for Games, February 2009, London, UK. CD-ROM.
Inger Ekman, Raine Kajastila
Department of Media Technology,
Helsinki University of Technology
P.O.Box 5400
FIN-02015 HUT
Abstract. The current paradigm for creating emotional impact in game sound is to carefully choose which sounds to play. This paper takes an alternative approach, suggesting that the emotional impact of sounds can be affected by choosing how to play those sounds. We describe a novel concept for emotional sound design - emotional fine-tuning - and show how it is possible to systematically influence the emotional impact of a single sound sample. A controlled user study with 8 subjects found that changing the reproduction of a sample so that source localization of the sound is challenged increases its perceived scariness compared to the same sound with a clearly detectable source. The work extends experimental research on emotion perception in sound. It has practical implications for sound design in games and other interactive media.
1 INTRODUCTION
Sound holds a power to influence how people feel. Consequently, both music and non-musical sounds have been incorporated into many forms of media representation. Sound has held a strong role in feature film even before the advent of the "talkies" in the early 1930s. More recently, it has also established its value as a tool for creating emotion in computer games.
The technical potential of game sound is already on par with that of the feature film. However, design strategies for sound in computer games are still struggling with how to combine expressiveness with interactivity. The obvious challenge resides in how to make sound that is both responsive to user interaction and emotionally effective. Traditional cinematic techniques for emotional sound rely on a single static soundtrack and linear story progression. When events are known beforehand, sound can be tailored with precision. In games, emotional moments arise as a result of user action, which makes the final result much harder to predict. On top of that, many of the emotions in gaming arise not so much from an understanding of the underlying story as from the player's evaluation of events in terms of their personal progress in the game. Gameplay emotions (see [1]) are tied to the successes and failures of user action. Consequently, emotion in games is neither strictly predetermined nor static. In fact, in terms of emotional sound design, games and film rely on different, even partly contradictory strategies [2].
The unique challenge for game sound is therefore how to build up powerful emotional impact while maintaining the flexibility to respond dynamically to user action. We tackle this question with a novel approach: seeking systematic ways to influence the emotional effect of sound by manipulating how individual sound samples are played. We apply findings from emotional psychology and propose a way to utilize these for emotional fine-tuning of sounds. The approach extends current game sound design practice with nuanced emotional control over individual sound samples. We demonstrate our point with a case study, subtly adjusting the scariness of sound samples by manipulating the parameters of spatial reproduction.
The paper is organised as follows: The next section presents currently employed strategies for game sound and proposes how these can be enriched by findings from emotional psychology. Next, the experiment with scary sound is presented. Finally, we conclude with a discussion of the implications of our results and make suggestions for future directions.
1 EMOTIONAL FINE-TUNING OF GAME SOUND
The current approach in game sound design is to use a set of predefined sound samples, often assigned to specific sources within the game world. These static samples are triggered, mixed and manipulated dynamically, depending on game parameters. The soundscape can, for example, respond to changes in the game world (different time of day, weather circumstances), player location and movement within the game world (accommodating various listening positions and listening environments) and variables related to gameplay (providing clues, signalling presence of threat).
1.1 A Slider Controlling Emotion
We propose to extend the above techniques to involve parametric control of the emotional content of sound samples. Especially with non-musical sound, emotion has been viewed as a property of the sound sample. Thus, the sound of laughter has been considered happy and the sound of thunder ominous, much in a categorical fashion. But the emotional subtlety of everyday sounds suggests much finer nuance than mere categorical choice. If this is the case, emotionality could be designed with much richer nuance than merely choosing which sound to play.
The proposed goal is to tap into the subtle meaning embedded in sound, making it possible to slightly shift the emotional tone of a sound. The goal is not finding parameters that would turn the emotional effect of thunder from ominous to happy. Instead, we suggest a way to modify the sound of thunder so that it is made more or less ominous depending on what the game situation calls for. Think of it as an emotional slider, which allows cranking up or down the emotional potential contained within a sample, to make the final sound fit the interaction.
1.2 Similar Approaches - Parametric Music
Currently, the closest resemblance to our suggested approach is to be found in the domain of music. Interactive music in games already involves a degree of parametric control. In particular, the emotional impact of a piece can be altered, e.g. by changing the tempo, harmonization and instrumentation of the piece [3]. Further, higher-level modifiers such as reverberation also influence the feeling associated with a piece [4]. However, many of the aforementioned tools are applicable only to specific musical traditions (that is, the tonal music tradition).
In particular, for everyday sounds (and musical styles relying on such sounds), there are no known solutions that would allow similar flexibility and emotional fine-tuning. The reason, we believe, is that the emotional effect of everyday sounds is not well understood.
1.3 Potential Parameters for Influencing Emotion of Everyday Sounds
Studies in psychology suggest that the emotional content of a sound is determined at various levels of meaning-making. In particular, several findings from emotion psychology indicate that affective evaluations are made precognitively and that at least part of the emotional content is determined by low-level factors, before reasoning. Zajonc [5] was among the first to point this out in his famous essay “Feeling and Thinking: Preferences Need No Inferences”. This view is supported by subsequent results, and researchers suggest there exists such a thing as unconscious emotion [6]. For example, Öhman [7] has demonstrated that people can become frightened of pictures they never even realize they see.
Unconsciously made affective judgements provide the raw stuff of emotion, and they also influence the interpretation of simultaneously ongoing conscious processes. For example, when playing computer games, most players do not pay specific attention to the sound, but focus instead on gaming. Nevertheless, the sound is likely to cause some affective reactions, which will tint ongoing attended-to events with emotional significance. Particularly interesting for sound design is the theory that emotional processes are influenced by the very ease of perceptual processing [8]. There is indication that perceptually fluent information is inherently perceived as positive, and vice versa. This suggests where to look for emotionally salient manipulations of sound. It is further probable that the evaluation of some sounds has a biological motivation. This appears to be the case with the startle response and similar functions that relate to survival. However, evolution may influence the perception of sound in less obvious ways as well. For example, Huron [9] suggests that the perceived cuteness of sounds may be an evolutionary adaptation that promotes parenting.
2 EXPERIMENT WITH FINE-TUNING THE SCARINESS OF SCARY SOUND
We demonstrate the concept of emotional fine-tuning with a case study of scary sound, using localization cues to manipulate the scariness of four different sound samples. The theoretical foundation resides in the evolutionary link between fear emotions and promoting survival. In short, our assumption is that the scariness of a (scary) sound is causally related to how well it affords localizing a potentially harmful source. This effect is evolutionarily motivated: in potentially dangerous environments, it is important to be able to make fast and accurate decisions in order to avoid possibly dangerous situations. The less information available, the more threatening the situation should be. Following this reasoning, we further assume that compromising a listener's capability to localize a sound will lead to increased fear.
The specific questions of our experiment are:
- Does the front-back location of a sound influence how scary a sound is judged to be?
- Does the quality of spatial cues in a sound influence how scary it is judged to be?
2.1 Subjects
Eight subjects (two female) participated in the test. We used naive subjects, and no training was given for the listening task. This is a realistic setting for testing the effect in question, considering the anticipated consumer experience with game sound.
2.2 Stimuli
Individual stimuli for comparison were pairs of the same sound file. This removes the methodological issues surrounding comparisons between different sounds.
The two configurations for localisation cues are as follows:
- Spread. To produce a less defined sound source, groups of three loudspeakers were used in reproduction. The three loudspeakers were positioned about 30 degrees apart, thus creating a sound source with a theoretical width of 60 degrees. To avoid the undesired comb-filter effect produced when multiple loudspeakers reproduce the same sound, each sound signal was convolved with 100 ms white noise bursts, thus creating a less correlated signal for each loudspeaker. The convolution process gives the reproduced sound the desired qualities: the sound is perceived to emanate from a larger and less defined area, but still from a certain direction. Generally it is assumed that any wide-band and incoherent signal is perceived to emanate from the whole area between the loudspeakers used. Relatively short sound bursts may nevertheless create perceptually narrower sound sources, although this is not the case with longer signal lengths [10].
- Point. The pointlike sources were created by playing the sound only from one loudspeaker. To ensure the pointlike stimulus sound corresponded to the wider sound, it was also convolved with white noise.
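The decorrelation step described for the two configurations can be illustrated with a short Python sketch. It is not the processing chain used in the study (which was implemented in Pure Data); it assumes an independent, unit-energy 100 ms noise burst for each loudspeaker feed, and the names `noise_burst` and `decorrelate`, the 8 kHz sample rate and the noise stand-in for a stimulus are all hypothetical choices made to keep the example small.

```python
import numpy as np

def noise_burst(duration_s, sample_rate, rng):
    """Return a unit-energy white noise burst of the given duration."""
    n = int(round(duration_s * sample_rate))
    burst = rng.standard_normal(n)
    return burst / np.sqrt(np.sum(burst ** 2))

def decorrelate(signal, n_channels, sample_rate, rng, burst_s=0.1):
    """Convolve one mono signal with an independent noise burst per channel."""
    return np.stack([
        np.convolve(signal, noise_burst(burst_s, sample_rate, rng))
        for _ in range(n_channels)
    ])

rng = np.random.default_rng(0)
sample_rate = 8000                        # assumed; low rate keeps the example fast
mono = rng.standard_normal(sample_rate)   # 1 s of noise standing in for a stimulus

spread = decorrelate(mono, 3, sample_rate, rng)  # feeds for three loudspeakers
point = decorrelate(mono, 1, sample_rate, rng)   # feed for a single loudspeaker

# Pairwise correlation between the three spread feeds (rows are channels).
corr = np.corrcoef(spread)
print(np.round(corr, 2))
```

The off-diagonal entries of `corr` indicate how strongly the loudspeaker feeds remain correlated after the convolution; low values are what the spread configuration relies on to avoid comb filtering.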
The choice of sound samples was motivated by the particular effect we are investigating: in order to modify scariness, we had to ensure we were using sounds that have the potential to be scary in the first place. Four different sound files of approximately 4 s length were used. The sound samples used are reminiscent of vocalizations made by large predators, which motivates the importance of localizing the threat. All sound samples can categorically be classed as 'scary'. This interpretation was also verified prior to the main listening task by asking each listener to describe each of the four sounds in a separate questionnaire.
2.3 Facilities and Procedure
Listening tests were performed in an acoustically dry, large space designed for use in audio research. The walls and ceiling are made of absorbing material, while the floor is concrete. The space is 790 m³ in volume (floor area 12 m × 11 m and room height 6 m) and equipped with a setup of 24 loudspeakers located approximately on the surface of a hemisphere. The loudspeakers are virtually positioned at equal distance from the centre of the room by using delays. Before the user experiment, each loudspeaker configuration was carefully tuned to create an equal sound pressure level at the listening position. Measurements used a microphone positioned at the listening spot in the middle of the room, and loudspeaker input signals were equalized accordingly. The system was implemented with Pure Data and the experiment was operated from a laptop located in the middle of the room.
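As an illustration of how delays can place loudspeakers at a virtually equal distance, the following Python sketch delays every loudspeaker so that its sound arrives together with that of the farthest one. The distances, the 48 kHz sample rate and the speed of sound of 343 m/s are assumptions for the example, not values from the study, and `alignment_delays` is a hypothetical helper name.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def alignment_delays(distances_m, sample_rate):
    """Return per-loudspeaker delays in samples.

    Each loudspeaker is delayed by the extra travel time of the
    farthest loudspeaker, so all arrivals coincide at the listener.
    """
    farthest = max(distances_m)
    return [
        round((farthest - d) / SPEED_OF_SOUND * sample_rate)
        for d in distances_m
    ]

# Hypothetical distances from the listening position, in metres.
distances = [4.0, 4.5, 5.2, 4.8]
delays = alignment_delays(distances, sample_rate=48000)
print(delays)
```

The farthest loudspeaker receives no delay and nearer ones receive progressively more. Level matching, which the study handled by measurement and equalization, is not covered by this sketch.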
Stimulus sounds were played using loudspeakers positioned at ear level. The test included 2 × 2 (front/back × spread/point) configurations tested against each other. When these four setups were compared, they formed six comparison pairs (a setup was never compared against itself). The test consisted of 96 individual comparisons, resulting in 16 comparisons between any two different setups.
Pointlike sources were played using one of four speakers in the corners. Spread sources used four alternative sets of three loudspeakers each, two from the front and two from the back. (See Figure 1 for pointlike, Figure 2 for spread sounds.) Stimuli were divided between setups so that in each category, equal numbers of sounds were played coming from the left and from the right.
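The pairing arithmetic above (four setups, six pairs, 16 comparisons per pair, 96 in total) can be checked with a few lines of Python. The setup labels and the shuffled presentation order are assumptions for the sketch; how the 16 repetitions were divided across the four sound samples and left/right positions is not specified here and is not modelled.

```python
from itertools import combinations, product
import random

locations = ["front", "back"]
widths = ["point", "spread"]

# The 2 x 2 design gives four loudspeaker setups.
setups = [f"{loc}-{width}" for loc, width in product(locations, widths)]

# Every unordered pair of different setups; a setup never meets itself.
pairs = list(combinations(setups, 2))

REPEATS_PER_PAIR = 16
trials = [pair for pair in pairs for _ in range(REPEATS_PER_PAIR)]

# Assumed randomised presentation order, seeded for reproducibility.
random.Random(0).shuffle(trials)

print(len(setups), len(pairs), len(trials))
```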
To create an immersive soundscape, a background sound was introduced. We used two monophonic recordings of nature ambience of different lengths, looped to create a continuous non-repetitive background. The ambience was reproduced through loudspeakers at various elevations. Slight reverberation was added to the ambient sound to create a diffuse sound field.
Figure 1. Loudspeaker setups used in reproducing pointlike stimulus sounds.
Figure 2. Loudspeaker setups used in reproducing spread stimulus sounds.
The listening test used subjective evaluations, and subjects were asked to indicate which of two presented stimulus sounds they perceived as more scary. Before the actual listening test, subjects were introduced to each of the four sound samples, and answered brief questionnaires regarding their interpretations of the sounds (e.g. what they thought it sounded like, and how the sound made them feel). The listening test used repeated two-alternative forced-choice comparisons made directly on the computer. Subjects were asked to judge which one of the presented sounds was scarier, choosing between two instances of the same sample played through different speaker setups. All playing of sounds was initiated by the subjects. During each judgement, the two sounds under comparison could be played at will in free order. No time limit was given for comparisons and subjects were encouraged to take a short break whenever they felt like it. The whole test procedure lasted approximately 30-40 minutes.
To reduce the influence of visual information on judgments of sounds, the large roof lights were switched off, leaving the large black space illuminated only by the light of the monitor screen and a small lamp.
2.4 Results
The judgements for each subject were collated into ratios of wins/losses for each type of loudspeaker configuration within the six comparison categories under investigation. In the case of no effect, the distribution between two given alternatives would be random, and choices would approach an equal split between both sides. The statistical analysis evaluates whether the balance of judgements in any comparison deviates reliably from a random result.
Table 1. Win and loss balances for comparisons of different loudspeaker configurations. Statistically significant results are marked as follows: one star (*) when p < 0.05; three stars (***) when p < 0.001.
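One simple way to ask whether a win/loss balance deviates from chance is an exact two-sided binomial test against a 50/50 split. The Python sketch below illustrates that idea only; it is not the analysis reported in this paper, which gives t statistics, and the win count used is hypothetical. The helper name `binomial_two_sided_p` is likewise an assumption.

```python
from math import comb

def binomial_two_sided_p(wins, n):
    """Exact two-sided binomial p-value against a 50/50 chance split.

    Sums the probability of every outcome that is no more likely
    than the observed one.
    """
    pmf = [comb(n, i) / 2 ** n for i in range(n + 1)]
    threshold = pmf[wins] * (1 + 1e-9)  # tolerance for floating-point ties
    return min(1.0, sum(p for p in pmf if p <= threshold))

# Hypothetical tally: one setup judged scarier in 12 of 16 comparisons.
wins, n = 12, 16
p_value = binomial_two_sided_p(wins, n)
print(f"{wins}/{n} wins, p = {p_value:.4f}")
```

A test of this kind treats all judgements as independent, which ignores that each subject contributes several of them; a within-subject analysis avoids that simplification.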
The number of wins in each category, for all subjects, is presented in Table 1. Win/loss ratios deviating from the random balance are marked to indicate the level of statistical significance.
The results show that pointlike sounds from the front were perceived as less scary than pointlike sounds from the back. However, this difference is removed when using spread sound sources. It is possible that spreading the sources makes it harder to distinguish whether a sound comes from the front or the back and that this explains the difference in ratings. On the other hand, spreading the source in the front may increase its scariness to such an extent that the front-back effect loses its impact. The latter assumption is supported by comparisons between sounds played only from the back. There, we find a difference in scariness ratings depending simply on whether the source of the sound was pointlike or spread. Pointlike sources were perceived as significantly less scary than less defined sources (t=3.55, p=0.001).
The scariness factors associated with sound from the back and with sound from a spread source seem to work together. Importantly, the effects can also be in contradiction, in which case they cancel each other. Point-front sounds are clearly less scary than back-spread sounds. However, when the sounds in front are spread and the sounds in the back are made pointlike, the difference in scariness disappears (see Figure 3). The shift is statistically significant (paired within-subject comparisons; t=3.28; p=0.014).
Figure 3. When sounds in front are spread and sound in the back is turned pointlike, there is a shift in the scariness associated with front and back sound, respectively.
This is an important finding and it concerns the utility of spatial cues for emotional design: whereas the design may sometimes require that a sound is played from a certain location, our results show that a gradual change in scariness can also be achieved without modifying the location, by modifying the spatial cues associated with that particular location.
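A paired within-subject comparison of this kind can be sketched in Python as a paired t statistic over per-subject scores. The values below are hypothetical and serve only to show the computation; they are not the study data, and `paired_t` is an illustrative helper name.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(a, b):
    """Return the paired t statistic and degrees of freedom for a vs b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical per-subject proportions of "back sound scarier" choices
# for eight subjects in two comparison categories.
back_spread_vs_front_point = [0.75, 0.81, 0.69, 0.88, 0.62, 0.75, 0.94, 0.69]
back_point_vs_front_spread = [0.50, 0.56, 0.44, 0.62, 0.50, 0.38, 0.56, 0.50]

t, df = paired_t(back_spread_vs_front_point, back_point_vs_front_spread)
print(f"t({df}) = {t:.2f}")
```

With eight subjects there are 7 degrees of freedom, for which the two-tailed critical value at the 0.05 level is about 2.365.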
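To make the idea of a gradual control concrete, the following Python sketch maps a single spread amount between 0 (pointlike) and 1 (fully spread over three loudspeakers) to loudspeaker gains while keeping total power constant. This is an assumption-laden illustration: the study compared only the two endpoints, so the effect of intermediate settings on scariness is untested, and the constant-power rule assumes the three feeds are decorrelated so that their powers add. The function name `spread_gains` is hypothetical.

```python
from math import sqrt

def spread_gains(amount):
    """Map a spread amount in [0, 1] to (side, centre, side) gains.

    amount = 0 plays the centre loudspeaker only (pointlike source);
    amount = 1 drives all three loudspeakers equally (spread source).
    The squared gains always sum to 1.
    """
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must lie between 0 and 1")
    side = sqrt(amount / 3.0)
    centre = sqrt(1.0 - 2.0 * amount / 3.0)
    return side, centre, side

for amount in (0.0, 0.5, 1.0):
    gains = spread_gains(amount)
    print(amount, [round(g, 3) for g in gains])
```

In a game engine, `amount` could be driven by a gameplay variable, which is one possible reading of the emotional slider proposed in Section 1.1.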
3 DISCUSSION
Whereas much of the emotional effect is linked to the identity of a sound, and depends on what is being played, decisions about how a sound is played can also affect the sound's emotional impact. We describe a novel concept for emotional sound design - emotional fine-tuning - and show how it is possible to systematically influence the emotional impact of a single sound sample. The approach presented in this paper views emotion as a dynamic, parametrically controlled feature in interactive applications. This acknowledges the unique nature of games as a medium and the demand for content that is both interactive and emotionally powerful. Parametric tuning of emotion would allow, for example, modifying the emotional feel of a game on the fly, either in response to automatically detected user behaviour, or simply by letting the player change the emotional tone or intensity of the game to match his/her preferences.
We have detailed the theoretical background of emotionality in sound perception. In particular, the concept of unconscious emotion provides a good starting point for developing parametric controls for emotional fine-tuning. Our case study demonstrates one case of such fine-tuning, showing how to subtly adjust the perceived scariness of sound. The control was implemented and validated in a user test with eight subjects.
This study is ongoing work, but its results are already applicable. The present study may seem limited in that it is only controlling a single quality - the scariness of a sound. Nevertheless, having even this one parameter for parametrically adjusting emotions can prove valuable for practical designs. We have chosen our case emotion with the game industry in mind. Most contemporary games rely in some way on feelings related to the scary. From suspense to care, worry to outright terror, all these feelings have their biological source in one basic emotion, fear. In terms of game emotions, fright is one of the main building blocks of the gaming experience.
Our work has demonstrated the feasibility of a novel approach and also provided pointers for where to look for potential parameters to extend the present system. Taking a broader perspective, the tool we envision for nuanced emotional sound would involve many more parameters. Finding them will be a laborious process of discovery, implementation and validation. It is probable that sound designers are already intuitively exploiting these rules, but the parameters remain unexplicated and without parametric control. The literature on perceptual fluency further suggests that some of the parameters may in fact be very simple, but others will undoubtedly be tricky to implement, or rely on complex mechanisms of the human brain. Successfully working out these issues will call for a collaborative effort between psychologists, engineers and designers alike.
ACKNOWLEDGMENTS
The research was made possible by funding from the Academy of Finland, projects no. [119092] and no. [111509], and the European Research Council under the European Community's Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement no. [203636].
REFERENCES
[1] B. Perron, “A Cognitive Psychological Approach to Gameplay Emotions” Changing Views: Worlds in Play. Proc. DiGRA 2005 Conference, Vancouver, Canada (2005).
[2] I. Ekman, “Psychologically Motivated Techniques for Emotional Sound in Computer Games” Proc. AudioMostly, Piteå, Sweden, pp. 20–26 (2008).
[3] J. Berg and J. Wingstedt, “Relations Between Selected Musical Parameters and Expressed Emotions: Extending the Potential of Computer Entertainment” Proc. ACM Conference on Advances in Computer Entertainment Technology, Valencia, Spain, pp. 164–171 (2005).
[4] D. Västfjäll, P. Larsson, and M. Kleiner, “Emotion and Auditory Virtual Environments: Affect-Based Judgments of Music Reproduced with Virtual Reverberation Times” CyberPsychology & Behavior, vol. 5, no. 1, pp. 19–32 (2002).
[5] R. B. Zajonc, “Feeling and Thinking: Preferences Need No Inferences” American Psychologist, vol. 35, pp. 151–175 (1980).
[6] P. Winkielman and K. Berridge, “Unconscious Emotion” Current Directions in Psychological Science, vol. 13, no. 3, pp. 120–123 (2004).
[7] A. Öhman, “The Role of the Amygdala in Human Fear: Automatic Detection of Threat” Psychoneuroendocrinology, vol. 30, pp. 953–958 (2005).
[8] R. Reber, N. Schwarz, and P. Winkielman, “Processing Fluency and Aesthetic Pleasure: Is Beauty in the Perceiver's Processing Experience?” Personality and Social Psychology Review, vol. 8, no. 4, pp. 364–382 (2004).
[9] D. Huron, “The Plural Pleasures of Music” Proc. 2004 Music and Music Science Conference, Kungliga Musikhögskolan & KTH (Royal Institute of Technology), pp. 1–13 (2005).
[10] T. Hirvonen and V. Pulkki, “Perceived Spatial Distribution and Width of Horizontal Ensemble of Independent Noise Signals as Function of Waveform and Sample Length” Proc. 124th AES Convention, Amsterdam, Netherlands (2008).