Citation info: Ekman, I. (2009) "Modelling the Emotional Listener: Making Psychological Processes Audible", Proc. AudioMostly 2009, September, Glasgow, Great Britain.
Modelling the Emotional Listener: Making Psychological Processes Audible
Inger Ekman
Center for Knowledge and Innovation Research,
Helsinki School of Economics, Finland
inger.ekman@hse.fi
Abstract. There is an increasing tendency to use procedural strategies for the creation and manipulation of sound in computer games. This development is motivating a shift in the design process: meaning is no longer tied to a specific asset or asset type, but is instead linked to the procedural manipulation of the sound material. The design phase for this type of game sound often includes modelling sound within the virtual environment in terms of a source, a medium and a listener. This way of thinking about sound is not new; however, its relation to emotional expression has not been investigated before. In particular, the listener has not been modelled as a perceptual entity whose perception changes to reflect psychological states and processes. This study addresses the lack of research in this area. We identify four psychological processes that influence what and how sounds are heard: attention, emotion, multimodal perception and internal sound. We also provide a detailed investigation of a special case of psychological process: the perceptual distortions people suffer during extreme stress. Drawing on this empirical data, we form a listener model capable of expressing the avatar's psychological qualities through sound manipulation. The listener model is described, along with examples of how to apply it in practice.
1 Introduction
Game sound designs need not strive for realism. Nevertheless, in many games the designer's ideal is to create a virtual world where both the sound and the visuals (and in rare cases other senses, too) mimic the physical world as closely as possible. This desire is particularly strong in genres where the gameplay relies on mediated forms of real-life skills such as accurate perception. In these games, a realistic portrayal of the virtual environment is a prerequisite for engaging play. An illustrative example of this kind of genre is the First Person Shooter, which strives to create a highly detailed aural landscape. Perhaps it is because of this need that the genre can be credited as the driving force in both computer graphics and sound development.
The creation of sound for such a game environment usually involves modelling sound in terms of its physical properties. Several ongoing trends in game sound are heightening the importance of such an organization outside this genre as well. In particular, technology development and the increased availability of real-time Digital Signal Processing suggest a shift from organizing sounds mainly as asset types (sound effects, music, dialogue) towards a more physically motivated notion of sound and its propagation in space. This shift in conceptualizing game sound is tightly linked to the constant efforts to create more realistic portrayals of the sonic virtual environment. In particular, game design is being strongly influenced by developments in two areas in which meaning is encoded into sound by processes rather than by the choice of assets: sound synthesis and 3D spatial modelling. These advances introduce the capability to both create and manipulate sound in real time, allowing sonic events in the physical world to be mimicked far more closely. True real-time processing also introduces the possibility of modelling changes accurately in response to user action. In order to understand, and thus intentionally design, the information content of such interactions, the designer must necessarily attribute meaning to the processes that are ultimately responsible for what the player hears, conceptually linking algorithms with expression.
The natural way of thinking about sound in these interactions is to consider sound in terms of its physical properties as this provides a way of ascribing meaning (about the game world, proximity and type of sound) to the sonic processes the player will encounter. Modelling sound in a virtual environment includes modelling of the source (sound origination), modelling of the environment (sound medium) and modelling of the listener (observer). This is indeed the way much of sound design is conducted today. Typically, game sound engines and programming interfaces provide the means for (at least) the following functions to physically model sound propagation in space [11][24]:
- The positioning of sound sources in virtual space.
- Playing sounds relative to a certain listener position.
- Means for dynamically expressing the relation between listener and sound source, e.g. in terms of a sound's spatial behaviour (attenuation, Doppler effect).
- Elementary modelling of the virtual spaces and their effect on the sound (e.g. reverb).
Depending both on processing capacity and available memory, the level of fidelity varies between systems. Even so, the omnipresence of said functions makes it evident that modelling sound in physical terms, as source and medium, is already incorporated in the field of game sound.
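To make the "listener as spatial reference point" concrete, the two relational functions in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the API of any particular engine: the distance model is a clamped inverse-distance curve in the style of those found in 3D audio APIs such as OpenAL, the Doppler factor is the textbook formula for motion along the line joining source and listener, and the function names and defaults (reference distance 1, speed of sound 343 m/s) are hypothetical choices.

```python
def inverse_distance_gain(distance, ref_distance=1.0, rolloff=1.0):
    """Clamped inverse-distance attenuation: unity gain up to ref_distance,
    then falling off with distance (rolloff scales how fast)."""
    d = max(distance, ref_distance)  # never amplify sources closer than ref_distance
    return ref_distance / (ref_distance + rolloff * (d - ref_distance))


def doppler_factor(v_listener, v_source, speed_of_sound=343.0):
    """Ratio of heard to emitted frequency for motion along the line joining
    source and listener. Positive velocities mean 'moving toward the other'."""
    return (speed_of_sound + v_listener) / (speed_of_sound - v_source)


print(inverse_distance_gain(4.0))             # 0.25
print(round(doppler_factor(0.0, 20.0), 3))    # 1.062: source approaching, pitch rises
```

Note that everything here depends only on the listener's position and velocity; nothing about the listener's psychological state enters the calculation.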
The aim of this paper is to investigate more closely the third component of such a physically motivated model: the listener. Current design practices, and the technological tools used by the industry, incorporate the listener only as a physical point in space, a spatial reference point. This work aims at broadening listener modelling to include the psychological aspects of hearing. In order to truly mimic realistic perception of sound in 3D environments, it is necessary to understand the role of the listener not only as a physical point in space, but as a psychologically active interpreter of the environment. The particular focus will be on how the psychological properties of the listener can be expressed by modelling the perceptual properties of a virtual listener in a source–medium–listener model of game audio. This is uncharted territory, both in terms of design and technology development.
The paper is structured as follows: The next section introduces the premises for a psychological listener model in game sound. Chapter 3 discusses in detail what kinds of psychological processes influence hearing and the subjective soundscape, drawing on experimental research in psychoacoustics and perception. In Chapter 4, I provide a case example of a markedly altered psychological state, namely extreme stress, and how it is characterized in terms of perceptual distortions. Based on these findings, in Chapter 5, I provide a breakdown of the listener model and consider functions whereby the avatar's internal state can be externalized through sound design. I conclude with a discussion of the implications of a psychological listener model, both in terms of practical design and in relation to academic theory of game sound.
2 The Ears of the Player
Most systems for game sound consider the listener as a spatial reference point. The traditional model defines a reference point for calculating where in the game world the “microphone” is, i.e. the access point within the virtual world that determines which sounds the player will be able to hear. Often the model also includes the listener's spatial orientation, which is used to determine the 3D location of a sound and its attenuation depending on the viewing angle and the distance to the sound source.
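The orientation part of this traditional model can be sketched as follows, assuming a simplified top-down (2D) world and a stereo output. The names `azimuth` and `equal_power_pan` are hypothetical, and the equal-power panning law is one common choice among several; real engines typically use full 3D vectors and HRTF or multichannel panning instead.

```python
import math


def azimuth(listener_pos, listener_yaw, source_pos):
    """Angle of the source relative to the listener's facing direction, in
    radians, wrapped to [-pi, pi). 0 = straight ahead, positive = to the left
    (counter-clockwise on the x-y plane)."""
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    angle = math.atan2(dy, dx) - listener_yaw
    return (angle + math.pi) % (2 * math.pi) - math.pi


def equal_power_pan(az):
    """Map an azimuth to (left, right) gains with constant total power.
    Sources behind the listener fold onto the front (no front/back cue)."""
    p = (1.0 - math.sin(az)) / 2.0     # 0 = hard left, 0.5 = centre, 1 = hard right
    theta = p * math.pi / 2
    return math.cos(theta), math.sin(theta)


# Listener at the origin facing +x; a source at (0, 5) is directly to the left.
left, right = equal_power_pan(azimuth((0.0, 0.0), 0.0, (0.0, 5.0)))
print(round(left, 3), round(right, 3))   # 1.0 0.0
```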
However, modelling the listener would allow for much more than this. One way of extending such a listener model would be to describe the physical properties of the listener in more detail. For example, having water in the ears after swimming will change how incoming sound is heard. Similarly, wearing a thick hat would influence hearing. This is, however, not the focus of this paper. Instead, I will look at a previously neglected aspect of listener modelling, moving from the physical properties of the ears to what goes on inside the listener's head.
The potential to model psychological processes arises whenever the avatar perceptually represents the player within the game. This situation is not unlike the First Person perspective in the visual domain. Here, the player character, as a virtual embodiment, is presented as capable of perception, and the act of perceiving is what brings the representations to the player. In the visual domain, this means that the player sees what the character is seeing. It is as if the player were sitting in the character’s head, watching and interpreting incoming perceptual data. Similarly, in First Person Audition, it is suggested to the player that incoming sound (which they hear through the speakers or headphones) corresponds to what the player character perceives (sounds passing through the character’s ears).
In a real-world situation, the hearing (and vision) of a person is highly influenced by a multitude of psychological processes. Consider the following quote by film theorist James Lastra:
[T]here is no “Innocent Ear.” There is never a fullness to perception that is somehow “lost” by focusing on a portion of the event, by using the event for certain purposes, or simply by perceiving with some particular goal, say understanding, in mind. [18]
We often take for granted that it is the sounds that are present in an environment that define and shape the auditory experience. However, as Lastra points out, all sounds must be heard to be perceived, and perception is by necessity subjective. There is no “pure” sound percept that is free from interpretation, for the process is inherently interpretative. Thus, the auditory experience is always subjectivized, tinted by the not-so-innocent ear of a listener, her motivations and preconceptions. The action of listening defines the auditory experience.
A common demonstration is the cocktail party effect [6]. This is the name given to the psychoacoustic phenomenon whereby listeners can zoom in on a conversation in a crowded room, segregating a conversational stream from the background noise of all other conversations. Notice that the cocktail party effect is purely subjective and the event of zooming in on any of the conversations is a psychological/attentional process. Thus, nothing in the room changes, but the individual's aural experience would seem to differ from case to case, depending on what they are listening to.
First-person audition effectively positions the player within the character's head, and thus inside these processes. Some perceptual processing is thus transferred, or outsourced, to the game character. Acknowledging this psychological layer extends listener modelling beyond reflecting the physical location and orientation of the avatar. It defines the character's ears (and eyes) not only as a physical, but also as a psychological access point to the world.
Let us continue our thought experiment with the cocktail party effect. Suppose we were designing sounds for a game, where an important conversation takes place in a crowded room. Since we want to ensure that the player hears the important discussion, it is quite natural to make the sound of the discussion stand out in such a way that the player is more likely to hear it, no matter what they are occupied with in the game world.
Someone may say that in this case we are choosing an unrealistic portrayal. However, what the designer is doing actually illustrates the psychological process of attentional source selection that listeners employ in natural situations. The only difference is that in the game, the effort of selection is taken from the player, and sound is given an attentional bias already at the level of the game character. Instead of relying on the player to zoom in on the specific conversation, the choice to provide an acoustic close-up and make the conversation audible no matter what is used to reflect the avatar's focus of attention. I have previously suggested that film sound uses a similar process to account for the passive viewing position and to increase immersion. The apparently unrealistic sound conventions are illustrative of attention, both expressing and facilitating a focus on important storytelling events: the cocktail party effect is being outsourced [9]. The design thus lures our attention to events that might otherwise go unheard:
Acoustic close-ups make us perceive sounds which are included in the accustomed noise of day-to-day life, but which we never hear as individual sounds because they are drowned in the general din. Possibly they even have an effect on us but this effect never becomes conscious. If a close-up picks out such a sound and thereby makes us aware of its effect, then at the same time its influence on the action will have been made manifest. [2]
The cocktail party effect is the result of a voluntary attentional process. However, attention and judgement are also affected by a multitude of other psychological processes. Following the same approach as with the cocktail party effect, it is possible to selectively play sounds to reflect the psychological and emotional state of the listener, again outsourcing the interpretation of sounds, this time to reflect emotion.
Already the microphone has crossed the threshold of the lips, slipped into the interior world of man, moved into the hiding places of the voices of consciousness, of the refrains of memory, of the screams of nightmares and of words never spoken. Echo chambers are already translating not just the space of a set but the distances within the soul. [10]
The above two quotes, by film theorists Béla Balázs and Jean Epstein respectively, both concern film sound. The design process of film sound differs from that of game sound. However, the notion of outsourcing interpretation can be extended to game sound quite readily, by means of listener modelling. This holds especially for games already utilizing a source–medium–listener sound design. The psychologically informed listener model takes into account that what is presented as the sounds of the environment is not the scene itself from a given point in space, but the scene as heard if listened to by (selectively) attentive ears. In conceptual terms, the listener model consists of an informed strategy for handling the listening point. In practice, the model can manifest as a processing layer between the in-game "microphone" and sound output, or it may simply underlie the general design and thus provide rules for exercising selectivity and making modifications. We will consider in more detail what kinds of strategies may be employed for this purpose in Chapter 5. First, however, we will look at what psychological processes are included in listening, and also present some empirical data to inform our model.
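The first option, a processing layer between the in-game "microphone" and the output, might look roughly like the Python sketch below. It assumes that the engine has already applied source and medium modelling and hands over a list of tagged sound events; `SoundEvent`, `ListenerModel`, the category tags and the `attention_focus` rule are all hypothetical names chosen for illustration. Each stage receives the avatar's state and may modify or drop an event.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class SoundEvent:
    name: str
    category: str      # hypothetical tag, e.g. "dialogue", "ambience", "own_gunshot"
    gain: float        # linear gain after physical (source-medium) modelling


# A listener stage maps (avatar state, event) to a modified event, or None to drop it.
Stage = Callable[[dict, SoundEvent], Optional[SoundEvent]]


@dataclass
class ListenerModel:
    stages: List[Stage] = field(default_factory=list)

    def process(self, state: dict, events: List[SoundEvent]) -> List[SoundEvent]:
        out = []
        for ev in events:
            for stage in self.stages:
                ev = stage(state, ev)
                if ev is None:          # a stage decided the avatar does not hear it
                    break
            if ev is not None:
                out.append(ev)
        return out


def attention_focus(state: dict, ev: SoundEvent) -> SoundEvent:
    """Hypothetical 'outsourced cocktail party' rule: boost what the avatar
    attends to, duck everything else."""
    if ev.name == state.get("focus"):
        return replace(ev, gain=min(1.0, ev.gain * 2.0))
    return replace(ev, gain=ev.gain * 0.5)


model = ListenerModel([attention_focus])
mix = model.process({"focus": "secret_talk"},
                    [SoundEvent("secret_talk", "dialogue", 0.3),
                     SoundEvent("crowd", "ambience", 0.8)])
print([(e.name, round(e.gain, 2)) for e in mix])   # [('secret_talk', 0.6), ('crowd', 0.4)]
```

Keeping each psychological effect in its own stage makes it easy to switch effects on and off as the avatar's state changes, and to order them explicitly.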
A note of caution is in order here. In the upcoming chapters I will use examples of empirically validated research to argue the case for a listener model. However, I am by no means implying that the designer should restrict himself to mimicking real-world effects. On the contrary, many of the phenomena described here may not be ideally expressive unless exaggerated or stylized to match the overall aesthetic of the game. Listener modelling nevertheless provides an explicit and pragmatic approach to formulating sound manipulations in terms of psychological processes, regardless of whether the designer chooses a realistic or an exaggerated style for those manipulations.
3. The Subjective Listener
Let us consider in more detail how the psychology of listeners—their intentions, emotions, and thoughts—influence hearing. I will now consider separately four major influences contributing to the subjective auditory landscape: attention, emotion, multimodal processing and internal sounds.
3.1. Attention
Attention is a key factor in determining not only what sounds are heard, but also how they are heard. Cusack and Carlyon [7] provide a review of how attentional processes influence perception, demonstrating that perceptual organization can affect the perception of pitch, timbre and rhythm, and influence the listener's ability to make temporal judgements. They also suggest that attention works by a hierarchical decomposition of the soundscape, in which attentional selection processes, level by level, 'unlock' access to more intricate acoustic detail as a listener traverses the hierarchy: Concentrating on a single source within the environment (a band playing music) opens up the possibility of focusing on a single instrument (the guitar). Then, only when attention is fixed on the guitar is it possible to pay detailed attention to the individual sound material within that stream (fret noise, string sound). If attention is lost (a friend asks a question), re-focusing on the guitar fret sounds requires traversing the hierarchy again; the detail is not accessible from the main level. [7]
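One way a designer might encode this hierarchical decomposition is as a tree that the avatar's attention can only descend one level at a time. The Python sketch below is an illustrative data structure, not a model proposed by Cusack and Carlyon: the class name, the nested-dictionary scene format and the rule that any distraction resets focus to the top level are simplifying assumptions.

```python
class AttentionHierarchy:
    """Hierarchical attentional selection: detail at a level becomes
    accessible only after attending to its parent."""

    def __init__(self, tree):
        self.tree = tree      # nested dicts, e.g. {"band": {"guitar": {"fret noise": {}}}}
        self.path = []        # current focus, outermost level first

    def _children(self):
        node = self.tree
        for name in self.path:
            node = node[name]
        return node

    def focus(self, name):
        """Descend one level; only children of the current focus are reachable."""
        if name not in self._children():
            raise ValueError(f"{name!r} is not accessible from {self.path or 'the top level'}")
        self.path.append(name)

    def distract(self):
        """Attention is captured elsewhere: the listener drops back to the top level."""
        self.path.clear()

    def audible_detail(self):
        """Streams the listener can currently single out."""
        return list(self._children())


scene = {"band": {"guitar": {"fret noise": {}, "string sound": {}}, "drums": {}},
         "friend": {}}
h = AttentionHierarchy(scene)
h.focus("band")
h.focus("guitar")
print(h.audible_detail())    # ['fret noise', 'string sound']
h.distract()                 # a friend asks a question
print(h.audible_detail())    # ['band', 'friend']: the hierarchy must be traversed again
```

In a game, `audible_detail()` could decide which fine-grained layers (for example, fret noise samples) are mixed in at all, so that detail appears only while the avatar stays focused.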
Attention thus influences not only what, but also how things are heard. Further, the listener's intentions guide the traversal of the "listening hierarchy" and so determine what detail the listener will and can attend to at each moment. Obviously, there are many strategies for listening to a sound. Tuuri, Mustonen, and Pirhonen [25] identify eight listening modes, each focusing on different aspects of the sound (see Table 1). The modes vary in level of detail and in the type of attention they incorporate, and consequently also in the types of judgement they afford. All in all, considering the various attentional factors at play in sound perception, it becomes clear that the listener's subjective stance has significant effects on the judgement of actual sounds.
Table 1. Listening modes. Adapted from Tuuri, Mustonen, and Pirhonen [25].
| Type | Mode | Description |
| Preconscious | Reflexive | Automatic fast responses |
| | Connotative | Associations, connotations |
| Source oriented | Causal | Concerns the origin of the sound, ecological |
| | Empathetic | Emotional meaning of sound |
| Content oriented | Functional | Reason for sound, recognizing/rationalizing about its function |
| | Semantic | What does the sound mean (language, semantics) |
| | Critical | Critical considerations of the sound's appropriateness, authenticity, etc. |
| Quality oriented | Reduced | Focus on the sound itself, acoustic quality |
3.2. Emotion
It is not only the voluntary direction of attention that shapes our auditory experience. Emotion is an example of a spontaneous process that influences sound perception. First of all, emotion directs attention automatically. Vuilleumier provides an excellent review of the effects of emotion on attention. To summarize, the more emotionally salient a stimulus is, the easier it is to detect and, conversely, the harder it is to ignore. The effect is particularly prominent for threatening stimuli, an understandable bias that makes the organism more likely to react to threatening situations and take action. [26]
Also, if the listener is in an emotional state, this is reflected in his or her perceptual processes. Emotional priming appears to influence memory recall [23], which may in turn influence the associations and interpretations made by the listener. Further, Spreckelmeyer et al. [22] show evidence that emotionally congruent events are processed more swiftly than noncongruent stimuli. Congruent, here, means that the event carries emotional content similar to the perceiver's own emotional state. Thus, it seems the listener's emotional state may bias processing in favour of incoming sound that is in line with that state. [22]
Preference, cultural, and social factors compose a frame of reference in evaluating heard sounds. For example, studies on noise perception have shown that attitudes influence judgements of loudness. Sounds that are judged as pleasant are estimated as being softer than, for example, aircraft noise, which is usually experienced as unpleasant. [19]
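A designer wishing to reflect these emotional biases could, for instance, rank incoming sounds by an emotional priority and nudge their loudness by valence. The Python sketch below is a hypothetical heuristic, not a model fitted to the cited studies: the per-sound `salience` and `valence` tags, the threat bonus, the congruence weight and the 3 dB attitude offset are all invented tuning values. It only encodes the qualitative directions described above: salient and threatening sounds are harder to ignore, mood-congruent sounds are favoured, and pleasant sounds are judged softer than unpleasant ones.

```python
from dataclasses import dataclass


@dataclass
class TaggedSound:
    name: str
    gain: float        # linear gain
    salience: float    # 0..1, hypothetical designer-assigned emotional salience
    valence: float     # -1 (unpleasant/threatening) .. +1 (pleasant)
    threat: bool = False


def emotional_priority(sound, listener_valence, threat_bonus=0.5, congruence_weight=0.3):
    """Hypothetical attention score: salience, plus a bonus for threats, plus a
    bonus for sounds whose valence matches the listener's current mood."""
    congruence = 1.0 - abs(sound.valence - listener_valence) / 2.0   # 1 = same, 0 = opposite
    score = sound.salience + congruence_weight * congruence
    if sound.threat:
        score += threat_bonus
    return score


def perceived_gain(sound, attitude_db=3.0):
    """Hypothetical loudness bias: pleasant sounds pushed down and unpleasant
    ones pushed up, by at most attitude_db decibels."""
    db_offset = -attitude_db * sound.valence
    return sound.gain * 10 ** (db_offset / 20.0)


sounds = [TaggedSound("birdsong", 0.5, 0.3, 0.8),
          TaggedSound("growl", 0.5, 0.6, -0.9, threat=True),
          TaggedSound("wind", 0.5, 0.1, 0.0)]
ranked = sorted(sounds, key=lambda s: emotional_priority(s, listener_valence=-0.5), reverse=True)
print([s.name for s in ranked])   # ['growl', 'birdsong', 'wind']
```

The priority score could feed the voice-limiting or ducking logic of the mix, so that a frightened avatar "hears" the growl first.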
3.3. Multimodal Perception
The interpretation of sound is also continuously influenced by data from other senses. In general, a likely result of the simultaneity of events is that visual perceptions (and in some cases other senses as well) will influence the perception of sound, and vice versa. Given how fundamental the audiovisual combination is to film and computer games, producing scientific evidence of an interaction between the two senses may seem almost superfluous. Even so, it is worth noting that a majority of the multimodal effects encountered in games and film can be explained by two perceptual illusions: the ventriloquism phenomenon and the McGurk effect.
The ventriloquism phenomenon, originally demonstrated by Howard and Templeton [15], refers to the ability of a visual stimulus to "capture" a sound, so that the sound is perceived as emanating from the location suggested by the picture. The McGurk effect, described by McGurk and MacDonald [19], shows that when a speech sound and a video clip of a speaker are presented simultaneously, each modality modulates the perception of the other: the visible lip movements can change which syllable is heard. Together, these two effects underpin much of the toolbox of audiovisual expression, and explain why synchronized sound effects are so essential in bringing life to the events on screen.
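In a listener model, visual capture could be approximated by pulling a sound's rendered direction toward a synchronized visual event. The Python sketch below is a hypothetical approximation of the ventriloquism effect, not a psychophysical model: the 30° capture window and the `strength` parameter are invented tuning values, and the caller is assumed to pass only visual events that it already judges to be synchronous with the sound.

```python
import math


def ventriloquism_capture(audio_az, visual_az, window=math.radians(30), strength=1.0):
    """Shift the perceived sound direction toward a synchronized visual event when
    the two directions are close enough. Angles in radians; window and strength
    are hypothetical tuning values (strength 1.0 = full capture)."""
    diff = (visual_az - audio_az + math.pi) % (2 * math.pi) - math.pi   # shortest signed angle
    if abs(diff) > window:
        return audio_az                  # too far apart: the picture does not capture the sound
    return audio_az + strength * diff


print(round(math.degrees(ventriloquism_capture(0.0, math.radians(10))), 6))   # 10.0
print(math.degrees(ventriloquism_capture(0.0, math.radians(90))))            # 0.0
```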
Multimodal processing also extends the influence of emotional congruence discussed above. De Gelder shows that emotional processing biases are amplified when the same emotion is perceived simultaneously through both the auditory and the visual channel [12].
3.4. Internal Sound: Bodily Sounds and Hallucinations
The subjective soundscape is also defined by the inevitable mixing of perceived sound with the multitude of sounds made by the body, some of which are practically inaudible to anyone other than the subject herself. These include bone-conducted sounds of bodily contact with external objects as well as internal sound-producing processes. An illustrative example is the sound of eating: consuming a crispy cracker may effectively mask softer sounds in the environment. Less audible, but nevertheless important to the subjective experience, are the sounds of breathing, heartbeat and nervous activity. This is how John Cage describes his visit to the anechoic chamber at Harvard:
[I]n that silent room, I heard two sounds, one high and one low. Afterward I asked the engineer in charge why, if the room was so silent, I had heard two sounds. He said, “Describe them.” I did. He said, “The high one was your nervous system in operation. The low one was your blood in circulation.” [5]
Self-produced sound also introduces an interesting facet to multimodal perception. Ballas points out that for self-produced sound there is an exceptionally tight match between the sound and the haptic sense of the activity [3]. Therefore, it is likely that our sense of self-produced sounds is in fact a multimodal percept combining both the tactile and kinaesthetic sensations of producing the sound, and the sound itself.
Subjective sound perception may also involve the internal mental representation of thoughts and memories; indeed, these are often represented in film as dialogue or sound flashbacks. Finally, the personal soundscape may involve various auditory hallucinations. The figments of our imagination need not be outrageous, even if they certainly can be. Examples of rather ordinary auditory hallucinations are the ringing in the ears commonly referred to as tinnitus, and the earworm, a song that persistently plays in one's head.
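Two of the internal sounds described above lend themselves to simple procedural treatment: a heartbeat layer whose rate follows the avatar's (hypothetical) heart-rate variable, and ducking of external sound while a loud self-produced, bone-conducted sound such as chewing is active. The Python sketch below illustrates both under stated assumptions; in particular, `duck_external` is a crude gain-based stand-in for masking, and its 12 dB maximum is an invented value.

```python
def heartbeat_times(bpm, duration_s, start_s=0.0):
    """Onset times (seconds) for a heartbeat layer at the given rate.
    Computed from the beat index to avoid accumulating rounding error."""
    interval = 60.0 / bpm
    times, i = [], 0
    while i * interval < duration_s:
        times.append(start_s + i * interval)
        i += 1
    return times


def duck_external(external_gain, self_sound_level, max_duck_db=12.0):
    """Crude masking stand-in: the louder a self-produced sound (0..1), the more
    external sound is attenuated, up to max_duck_db decibels."""
    level = max(0.0, min(1.0, self_sound_level))
    return external_gain * 10 ** (-max_duck_db * level / 20.0)


print(heartbeat_times(120, 2.0))          # [0.0, 0.5, 1.0, 1.5]
print(round(duck_external(1.0, 1.0), 3))  # 0.251: external sound about 12 dB down while chewing
```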
4. A Case Example—Perceptual Distortions During Extreme Stress
The previous section considered various factors involved in shaping the subjective experience of sound. However, these isolated facts form only fragments of the larger picture. In order to illustrate the extent of subjective perception, I will next provide a case example of how these factors combine to shape the holistic experience. The example considers the special case of extreme stress and how it influences hearing.
4.1. Extreme stress
Extreme stress can alter a person's perception quite drastically. Psychologists working with law enforcement officers and soldiers report that in highly demanding conditions, people are likely to experience severe perceptual distortions. Typical effects include changes in awareness of time (slow-motion time, fast-motion time), changes in vision (tunnel vision, or its opposite visual clarity) and audition (decreased or increased auditory sensitivity). I will here draw primarily on two prominent studies investigating law enforcement officers' responses to officer-involved shootings, one by David Klinger [17] and the other by Alexis Artwohl [1]. Both studies consisted of interviews with officers who had been involved in shootings.
Table 2 presents a summary of perceptual distortions in 113 officer-involved shootings, from Klinger [17]. In total, perceptual distortions were reported in 107 (95%) of the cases. The numbers are comparable to those of the similar study by Artwohl [1].
Interestingly, the single most commonly experienced distortion was diminished sound. “Auditory blunting” occurred in 82% of the cases. On the other hand, in 20% of the cases, sounds were perceived as exceptionally strong or clear [17]. For comparison, Artwohl's study reported experiences of "diminished sound" in 84% and "intensified sound" in 16% of the studied events [1]. Next I shall consider in more detail what these effects entail and when they appear.
Table 2. Officers’ perceptual distortions during shooting incidents. Adapted from Klinger [17].
| Type of distortion | At any time | Prior to firing | Upon firing |
| Tunnel vision | 51% | 31% | 27% |
| Heightened visual detail | 56% | 37% | 35% |
| Both visual distortions | 15% | 10% | 11% |
| Auditory blunting | 82% | 42% | 70% |
| Auditory acuity | 20% | 10% | 5% |
| Both aural distortions | 9% | 0% | 9% |
| Slow motion | 56% | 43% | 40% |
| Fast motion | 23% | 12% | 17% |
| Both time distortions | 2% | 0% | 2% |
| Other | 13% | 6% | 9% |
| Total | 95% | 88% | 94% |
4.2. Auditory Blunting
Auditory blunting refers to the loss of sound detail. It may be an inability to hear very loud sounds one would ordinarily hear. Artwohl [1] also mentions subjects hearing sounds in a modified way, sounding muffled, or distant. The following quote from a subject interviewed in Artwohl's study provides an account of auditory blunting:
If it hadn't been for the recoil, I wouldn't have known my gun was working. Not only didn't I hear the shots, but afterwards my ears weren't even ringing. (Subject quoted in Artwohl's report [1])
Of the subjects in Artwohl's study, the vast majority reporting diminished sound stated that it was their own gunshots that were muted. In many cases, the sounds weren't completely lost, instead shots sounded like pop guns or the gunshots were just not perceived as loud as the subjects felt they should have been. The processes underlying auditory perceptual changes are still not completely understood. Some of the literature suggests the effects are due to the rapid discharge of stress hormones (e.g., adrenalin) that occurs when the sympathetic nervous system is activated by the brain’s perception of a life-threatening situation [14]. The end effect of such a stressful event appears to be a very crude prioritizing of cortical function, whereby autonomic attentional selection ensures the organism is fully concentrated on survival. Grossman and Christensen write:
Our brains must constantly tune out sensory data or we would be overwhelmed. In extreme stress situations, this screening process can be even more intense, as we tune out all senses except the one we need for survival. [13]
There also appears to be a mechanism in which loud noises are physically and mechanically muted or silenced for a brief moment. Subjects report not having heard the sound, nor experiencing the usual ringing in the ears afterward.
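The reported qualities of blunted sound (quieter, muffled, distant, shots like "pop guns") suggest a simple signal-processing treatment: combined attenuation and low-pass filtering, scaled by a single blunting amount. The Python sketch below uses a standard one-pole low-pass filter. The mapping from `amount` to cutoff (20 kHz down to 500 Hz) and gain (0 to -24 dB) is a hypothetical design choice, not something derived from the interview data.

```python
import math


def one_pole_lowpass(samples, cutoff_hz, sample_rate=48000):
    """One-pole low-pass filter: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    y, out = 0.0, []
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out


def blunt(samples, amount, sample_rate=48000):
    """Hypothetical auditory blunting: amount 0 leaves the sound nearly unaltered
    (20 kHz cutoff, unity gain); amount 1 makes it muffled (500 Hz) and 24 dB quieter."""
    amount = max(0.0, min(1.0, amount))
    cutoff = 20000.0 * (1.0 - amount) + 500.0 * amount
    gain = 10 ** (-24.0 * amount / 20.0)
    return [gain * s for s in one_pole_lowpass(samples, cutoff, sample_rate)]


bright = [1.0, -1.0] * 200    # test signal at the Nyquist frequency
print(max(abs(s) for s in blunt(bright, 0.0)) > max(abs(s) for s in blunt(bright, 1.0)))   # True
```

Applying `blunt` only to events tagged as the avatar's own gunshots would mirror the selective muting that most of Artwohl's subjects reported.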
4.3. Auditory Acuity
As seen in Table 2, auditory acuity is relatively rare. Neither Klinger nor Artwohl provides any description or explanation of what auditory acuity might entail. Discussing these studies, however, Grossman and Christensen [13] provide a particularly haunting example of what this category might entail. They write about an officer who, blinded and immobilized, lies on the floor and, as the perpetrator walks closer, perceives the approaching footsteps in hyper-clear detail. Explaining such accounts, they suggest that in situations where there are no visuals (such as when the person is in the dark), the brain may allocate all available resources to the auditory modality, neglecting the poor visuals in favour of sound [13].
Interestingly, 8% of the subjects experienced both auditory blunting and increased acuity. This suggests that the two seemingly contradictory effects can co-exist, and that they probably focus selectively on different types of sounds.
4.4. Predictability of Perceptual Distortions
Forming a deeper understanding of changes in auditory perception during stress involves examining how predictable the effects are. Predictability would also benefit modelling: to reproduce the changes, we need some means of mapping these very subjective experiences to identifiable external events. The reports suggest a few regularities in perceptual distortions. We have already seen that some contextual factors (e.g. darkness) may explain the onset of auditory acuity. Table 2 also suggests a temporal development related to whether effects set in before firing or only at the moment of firing. Looking at the temporal aspect of auditory blunting vs. acuity in Klinger's study, we notice that auditory perception was already altered before firing in more than half the cases: a diminution of sound was experienced in 42% of the shootings and an intensification in 10% (52% in total, since no case reported both before firing). During shooting, cases of auditory blunting increased dramatically, to 70% of all subjects.
A possible explanation for this development resides in the change in stress between the two conditions. Similarly, Grossman and Christensen [13] identify three degrees of auditory muting, and link them to different levels of stress. These are listed below in Table 3 together with the level of stress at which the effect is typically encountered. Stress levels are given as stress-induced heart rates. Note that these heart rates are assumed to indicate purely stress-induced arousal, not physical exercise (physical exercise resulting in the same heart rates typically does not entail perceptual distortion).
Table 3. Type of auditory blunting at different levels of stress. Stress level given as stress-induced heart rate elevation (BPM: beats per minute).
| Stress level | Heart Rate (BPM) | Auditory blunting | Situations encountered |
| High | < 115 | Selectively muting only own weapon sounds. | Common at lower levels of stress, experienced not only in deadly-force encounters, but also by e.g. hunters. |
| Very high | 115–145 | Muting of all gunshot sounds, but not other sounds. | Apparently the most typical response in stressful, high-action situations encountered by law enforcement officers. |
| Extreme | > 145 | Muting of all sound. | Extreme stress reaction, encountered only at times of severe arousal. |
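Because Table 3 ties each muting pattern to a heart-rate band, it translates directly into a lookup that a game could drive from a stress variable. The Python sketch below makes three assumptions that the table does not settle: boundary values (115 and 145 BPM) are assigned to the "very high" band; the table gives no lower bound for the "high" band, so a hypothetical onset of 100 BPM is used; and the category names are hypothetical tags. As the note above Table 3 stresses, the input should represent stress-induced arousal, not exertion.

```python
def blunting_rule(stress_hr_bpm, onset_bpm=100):
    """Sound categories muted at a given stress-induced heart rate, after Table 3.
    onset_bpm is a hypothetical lower bound (not given in the source); boundary
    handling at 115 and 145 BPM is also an assumption."""
    if stress_hr_bpm < onset_bpm:
        return set()                                # no stress-induced blunting
    if stress_hr_bpm < 115:
        return {"own_gunshot"}                      # "High": own weapon only
    if stress_hr_bpm <= 145:
        return {"own_gunshot", "other_gunshot"}     # "Very high": all gunshots
    return {"*"}                                    # "Extreme": all sound


def is_muted(category, stress_hr_bpm):
    rule = blunting_rule(stress_hr_bpm)
    return "*" in rule or category in rule


print(is_muted("other_gunshot", 110), is_muted("other_gunshot", 130))   # False True
```

Rather than switching abruptly, a designer could also crossfade between bands as the stress variable moves, using a blunting effect for the muted categories instead of hard muting.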
Finally, Klinger's study revealed certain correlations indicating how auditory effects coincide with other perceptual distortions. The links between sound and temporal distortion are particularly interesting: fast-motion time seems to coincide with auditory acuity, whereas slow-motion time is characterized by auditory blunting or muffled sound.
5. The listener position and sound design
What, then, is a listener model? It is a collection of strategies for handling incoming sound at the listener position. The aim of these strategies is to create an audible representation of the listening character's internal mental processes. Subjective listening is essentially a narrative tool: it describes the internal state of the character. One would expect to find pointers on how to use this technique in film, and indeed they are there. Let us look at a few examples to understand how it is used. Bordwell and Thompson [4] provide this example from the film Possessed, by Curtis Bernhardt:
The central character is gradually falling deeper into mental illness; in one scene she is alone, very distraught, in her room on a rainy night. We begin to hear things as she does; the ticking of the clock and dripping of raindrops gradually magnify in volume. Here the shift in fidelity functions to suggest a psychological state. [4]
Another example comes from the film Saving Private Ryan. David Sonnenschein quotes sound designer Gary Rydstrom on the sound manipulations done at a specific point in the story:
When Tom Hanks is suffering a momentary hearing loss from shell shock, the visuals don't change much, but we did a radical thing with the sound by shutting down the outside world. You get distorted bits and pieces of the outside, but mostly just this seashell roar as if you're inside his head. […] There's also a rising tone, like a teakettle boiling up to that point, then it snaps back to reality. [21]
The two examples above come from film sound; however, similarly powerful sound portrayals can be, and have been, employed in game sound as well. For example, in Max Payne 2: The Fall of Max Payne (Rockstar Games, 2003) the main character can enter a special mode, bullet time, which allows slow-motion action. This special state is characterized by a pitching down of all environment sounds, which is suggestive of a special mental state. Another game, Prince of Persia: The Sands of Time (Ubisoft, 2003), similarly filters background sounds and introduces a windy, breath-like sound during moments in which the main character manipulates time (rewinding time and slow motion). In Scarface: The World is Yours (Sierra Entertainment, 2006), the main character can gain moments of temporary invincibility in a so-called rage mode. This rage mode is characterized by manipulations of background sounds and the addition of a dialogue track of the main character's cursing. The subjective view is further enhanced by the visuals: the screen is tinted red throughout the special condition. These examples demonstrate that subjective sound can also be used in games as a signifier of the player character's mental state.
Table 4. Sound manipulations applicable to the listener model.
| Effect category | Manipulation | Motivation | Description | Example |
| Sound selection | Controlling playback, sound prioritizing | Attention, multimodal perception | Sounds chosen for playback according to an internal model of attentional focus. Prioritizing the sound queue depending on current activity and field of view. | In a crowded soundscape, play only sounds relating to the current activity (e.g. manipulation of machinery) to suggest that the character is deeply concentrated on the task at hand. |
| Volume control | Mixing | Attention, perceptual distortions | Relative sound levels determined by the internal model. Signifying attentional processes or emotional impact; expressing the subjective impact of sounds through silence or loudness. | Raise the volume of environmental sounds and lower the character's own sounds (footsteps, weapon handling) to suggest an oppressive, overpowering environment and the weakness of the player character. |
| Transformation | Filters | Perceptual distortions | Signifying altered states, particularly extreme emotions (diminished hearing). | Low-pass filter all environmental sounds when the character is severely hurt to suggest that the character is in a critical state and almost dead. |
| Additional sounds | Controlling playback | Body sounds, hallucination | Emotional impact through the introduction of bodily sounds (breathing, heartbeat). Hallucinations, memories. Internal dialogue. | Add a channel of chaotic giggles or whispering when there is no-one in sight to suggest the character is losing his/her mind. |
| Sound localization | 3D positioning, sound source clustering | Attention, perceptual distortions | Grouping object locations depending on how they are attended to (clustering non-attended sounds to the same location). Emotional impact by violating source stability. | Abruptly move a few prominent ambient sounds from place to place to create a sense of disorientation and bewilderment. |
| Sound spatialization | Virtual acoustics | Attention, hallucination, perceptual distortions | Exaggerated reverb and echo to signify altered states. | Use sounds with conspicuously little reverberation to suggest that they are heard inside the head. |
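The Transformation row can be prototyped with a very simple filter. The sketch below assumes mono floating-point samples in a Python list and uses a one-pole low-pass filter, a deliberately crude stand-in for the filters an audio engine would normally provide. The names `one_pole_lowpass` and `environment_cutoff`, the health scale of 0 to 1, the critical threshold of 0.2 and the 800 Hz "hurt" cutoff are all hypothetical tuning choices.

```python
import math
from typing import List


def one_pole_lowpass(samples: List[float], cutoff_hz: float,
                     sample_rate: int = 48_000) -> List[float]:
    """First-order (6 dB/octave) low-pass filter over a block of mono samples."""
    # Smoothing coefficient derived from the cutoff frequency.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out: List[float] = []
    y = 0.0  # filter state; starts at silence
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def environment_cutoff(health: float, critical: float = 0.2,
                       normal_hz: float = 20_000.0,
                       hurt_hz: float = 800.0) -> float:
    """Cutoff for environment sounds: muffled once health (0..1) drops below `critical`."""
    return hurt_hz if health < critical else normal_hz
```

In use, each block of environment audio would be passed through `one_pole_lowpass` with the cutoff returned by `environment_cutoff` for the character's current health. A real implementation would keep the filter state between blocks and smooth the cutoff change to avoid clicks.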
5.1. Constructing the Listener Model
Modelling the listener is unlike determining the receiving position for sounds in the game (placing the microphone) or modelling a physical property obstructing the listening point (describing what a thick hat does to sound). Instead, the listener model describes processes that take place at the physical listener position. Attentional and emotional processes can only affect sounds that physically reach the listener's ear, but their effect differs from that of physical models. Importantly, not all sounds received at this point are handled equally.
The listener model can influence incoming sounds in a number of ways. It may affect all of them, or its effects may focus selectively on certain types of sound. Table 4 summarizes aspects of sound manipulation and how they relate to the listener model. The table identifies six effect categories that can be used, alone or in combination, to mimic or exaggerate the psychological functions detailed in the previous sections: sound selection, volume control, transformation, additional sounds, sound localization and sound spatialization. For each category, the table gives an example of how the technique might be used to express a subjective listening position. These auditory manipulations are not meant to work in isolation from other aspects of representation, however; to achieve the desired effect, other aspects of the design (visuals, gameplay) must also be considered.
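The distinction between effects that apply to all incoming sounds and effects that target only certain types of sound can be expressed as a chain of rules, each pairing a sound-type predicate with a manipulation. The sketch below is a hypothetical structure covering two of the six categories: sound selection (dropping a sound) and volume control (scaling its gain). The `SoundEvent` fields, the category labels and the helper names are assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class SoundEvent:
    name: str
    category: str       # hypothetical labels such as "own", "environment", "gunshot"
    gain: float = 1.0   # linear gain before the listener model is applied


# A rule returns the (possibly modified) event, or None to drop it from playback.
Rule = Callable[[SoundEvent], Optional[SoundEvent]]


def only(categories: Iterable[str], effect: Rule) -> Rule:
    """Restrict `effect` to the given categories; other sounds pass through."""
    targets = set(categories)

    def rule(ev: SoundEvent) -> Optional[SoundEvent]:
        return effect(ev) if ev.category in targets else ev

    return rule


def scale_gain(factor: float) -> Rule:
    """Volume control: multiply the event's gain."""
    return lambda ev: replace(ev, gain=ev.gain * factor)


def drop(ev: SoundEvent) -> Optional[SoundEvent]:
    """Sound selection: remove the event from playback."""
    return None


def apply_listener_model(events: List[SoundEvent],
                         rules: List[Rule]) -> List[SoundEvent]:
    """Run every event through the rule chain and return what the character hears."""
    heard = []
    for ev in events:
        current: Optional[SoundEvent] = ev
        for rule in rules:
            current = rule(current)
            if current is None:
                break
        if current is not None:
            heard.append(current)
    return heard


# Table 4, volume control: louder environment, quieter own sounds.
oppressive = [only({"environment"}, scale_gain(1.5)), only({"own"}, scale_gain(0.5))]
heard = apply_listener_model(
    [SoundEvent("wind", "environment"), SoundEvent("footstep", "own")], oppressive)
```

An effect that should influence every incoming sound is simply a rule used without the `only` wrapper.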
Listener models may or may not strive for natural representations. Since the motivation of listener modelling is primarily expressive, it is perhaps not necessary to aim for accurate models; instead, one can adopt whatever means best serve a given narrative goal. Nevertheless, as the case of extreme stress illustrates, even extreme manipulations of sound can be consistent with requirements for realism (the effects do happen to ordinary listeners in the real world).
In terms of application, it is entirely up to the designer where and when to invoke the subjective listening model. It may be in constant use, or it may be employed only at certain moments during the game. In particular, hints of subjective sound experience may be useful in situations where the avatar is in an altered or critical state, since it is beneficial to be able to signal this state to the player. Here, the linkages and regularities identified for extreme stress suggest a few possible event-based triggers for invoking the distortions in games. Notably, a stress level may be a useful way to model at what point in the game, or in which situations, perceptual distortions kick in. The links between temporal distortions and auditory effects may also prove important, especially in games that involve time manipulation in their gameplay. For these, the studies on perceptual distortion may provide an explicit connection through which temporal manipulations are made audible.
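A stress level of this kind can be kept as a single number that game events push up and that relaxes over time. The sketch below expresses it, hypothetically, as a stress-induced heart rate so that the Table 3 thresholds (115 and 145 BPM) can be reused directly. The class name `StressModel`, the resting baseline of 70 BPM, the 10-second half-life and the per-event increments are illustrative assumptions, not values from the cited studies.

```python
import math


class StressModel:
    """Hypothetical stress tracker expressed as a stress-induced heart rate (BPM)."""

    def __init__(self, baseline_bpm: float = 70.0, half_life_s: float = 10.0):
        self.baseline = baseline_bpm  # assumed resting value
        self.bpm = baseline_bpm
        self.decay_rate = math.log(2) / half_life_s

    def on_event(self, bpm_increase: float) -> None:
        """Event-based trigger, e.g. a near miss or an enemy appearing."""
        self.bpm += bpm_increase

    def update(self, dt_s: float) -> None:
        """Relax exponentially toward the baseline over dt_s seconds."""
        excess = self.bpm - self.baseline
        self.bpm = self.baseline + excess * math.exp(-self.decay_rate * dt_s)

    def band(self) -> str:
        """Stress band using the Table 3 thresholds (115 and 145 BPM)."""
        if self.bpm > 145:
            return "extreme"
        if self.bpm >= 115:
            return "very high"
        return "high or below"
```

Because the decay is exponential, a single large event produces only a brief spike, while a rapid series of smaller events keeps the character in a higher band, and with it a stronger perceptual distortion.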
The inclusion of a listener model in the sound design provides a conceptual approach that influences many of the central decisions in sound design. In particular, it provides an expressive alternative alongside the current paradigm of aiming for physical fidelity. On the technical level, however, it works in much the same way: a listener model is all about the selection, modification and playback of sounds. Only the rationale differs. For example, in most contemporary games, handling a growing number of sound sources is a challenge. Whereas physical fidelity provides one basis for sound source selection, attentional and emotional processes provide an alternative basis for pruning the soundscape depending on the player character's state. Similarly, the listener model may inform sound source clustering (representing several nearby sources as one sound point), prioritizing and culling. Tsingos, Gallo and Drettakis [24] discuss the limitations on the number of simultaneous audio channels; their solution involves a psychoacoustically informed model for dynamic mixing. I propose to go further and add an interpretive layer as well.
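Attention-driven pruning and clustering can reuse the voice-limiting machinery engines already have; only the priority changes. In the minimal sketch below, each source's priority is its physically based power at the listener multiplied by a hypothetical attention weight in [0, 1] supplied by the listener model. Sources that do not fit the voice budget are merged into a single cluster at their power-weighted centroid. This is a simple ranking scheme chosen for illustration, not the perceptual clustering algorithm of Tsingos, Gallo and Drettakis [24], and `Source` and `select_voices` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Source:
    name: str
    position: Vec3
    power: float      # physically based power estimate at the listener (linear)
    attention: float  # hypothetical weight from the listener model, 0..1


def select_voices(sources: List[Source],
                  budget: int) -> Tuple[List[Source], Optional[Source]]:
    """Keep the `budget` highest-priority sources and merge the rest into one cluster."""
    # Priority combines physical level with how much the character attends to the source.
    ranked = sorted(sources, key=lambda s: s.power * s.attention, reverse=True)
    kept, rest = ranked[:budget], ranked[budget:]
    total = sum(s.power for s in rest)
    if total <= 0.0:
        return kept, None
    # Power-weighted centroid of the unattended sources; powers of
    # incoherent sources add, so the cluster carries their summed power.
    centroid = tuple(
        sum(s.position[i] * s.power for s in rest) / total for i in range(3)
    )
    return kept, Source("cluster", centroid, total, 0.0)
```

With such a priority, a quiet source the character is focused on can outrank a louder but unattended one, a decision that a purely physical priority would never make.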
6. Conclusions
This work has investigated a previously overlooked component of game sound: the virtual listener. Current design practices, and the technological tools used by the industry, incorporate the listener only as a physical point in space, a spatial reference point. I suggest broadening listener modelling to include the psychological aspects of hearing. The avatar represents a psychologically active interpreter of the environment. It is viewed here as an active listener to which some of the perceptual capabilities ordinarily involved in everyday listening are outsourced. My argument is that in a game, psychological processes can be expressed by making the perceptual processes of the avatar audible (or visible). By manipulating incoming sounds (those heard by the player "inside" the character's head), we can reflect the psychological, emotional and attentional processes of a game character. Thus, by perceptually filtering the sounds of the game world, we can express psychological properties of the character through how game sound is heard and manipulated.
The listener model provides a conceptual link between the psychological state of the listener (a narrative property) and the procedures and rules that determine how sounds in the game world are handled. The selective outsourcing of emotional and attentional processes is essentially a way of adding expression. The technique can be easily proceduralized and is readily applicable to games.
The manipulation of sounds due to listener modelling can influence incoming sounds in a number of ways. Modifications may affect all of them, or they may focus selectively on certain types of sound. The decisions to attenuate or highlight sounds provide a tool for expression. I have provided an example of how such an expressive process could be initiated by examining listener processes during situations of extreme stress. Incidentally, while the “extreme conditions” that give rise to such perceptual distortions are rarely met by the ordinary person, they are the raw stuff of many games in which the main challenge is the repeated encounter with an opposing deadly force. The material presented here, based on two studies recounting real-life extreme situations as experienced by law enforcement officers, already provides a sufficient body of information to start modelling these effects.
From a practical viewpoint, this study touches on several of the most fundamental steps in the sound design process, such as writing a strategy for handling multiple simultaneous sound sources: how to prioritize sounds and balance the mix. Perceptual distortion and listener modelling are by no means the only way to go, but where they suit the general aesthetic of the game, perceptual processes can provide a non-arbitrary, psychologically motivated basis for design decisions.
On the theoretical side, a recurring point in the discussion of game sound has been that games striving for realism in their sound design often achieve this apparent realism through means that are physically unjustified [8][9][16]. The theoretical framework and the concept of subjective listener modelling presented here may partly justify these practices, explaining the apparent incoherence through an additional layer of listener interpretation.
7. Acknowledgements
I wish to thank Cumhur Erkut and Petri Lankoski for their valuable comments on draft versions of this article.
References
[1] Artwohl, A. Perceptual and memory distortion during officer-involved shootings, Research Forum, FBI Law Enforcement Bulletin (2002). Available at: http://findarticles.com/p/articles/mi_m2194/is_10_71/ai_93915941/
[2] Balázs, B. Theory of the film: Sound, In Weiss, E. and Belton, J. (eds.) Film Sound, Theory and Practice, Columbia University Press, 116–125 (1985)
[3] Ballas, J. Self-produced Sound: Tightly Binding Haptics and Audio, In Oakley, I. and Brewster, S. (eds.) HAID 2007, Lecture Notes in Computer Science 4813, 1–8 (2007)
[4] Bordwell, D. and Thompson, K. Fundamental Aesthetics of Sound in the Cinema, In Weiss, E. and Belton, J. (eds.) Film Sound, Theory and Practice, Columbia University Press, 181–199 (1985)
[5] Cage, J. A Year from Monday: New Lectures and Writings, Wesleyan University Press (1967)
[6] Cherry, E. C. Some experiments on the recognition of speech, with one and with two ears, Journal of the Acoustical Society of America 25(5), 975–979 (1953)
[7] Cusack, R. and Carlyon, R. Auditory Perceptual Organization Inside and Outside the Laboratory, In Neuhoff, J. (ed.) Ecological Psychoacoustics, Elsevier Academic Press, 15–48 (2004)
[8] Ekman, I. Understanding Sound Effects in Computer Games, In Proc. Digital Arts and Cultures 2005, Copenhagen, Denmark (2005)
[9] Ekman, I. Psychologically Motivated Techniques for Emotional Sound in Computer Games, In Proc. AudioMostly 2008, 3rd Conference on Interaction with Sound, Piteå, Sweden, 20–26 (2008)
[10] Epstein, J. Slow-Motion Sound, In Weiss, E. and Belton, J. (eds.) Film Sound, Theory and Practice, Columbia University Press, 143–144 (1985)
[11] Gal, V.; Le Prado, C.; Merland, J.; Natkin, S. and Vega, L. Processes and tools for sound design in computer games, In International Computer Music Conference (ICMC), Gothenburg, Sweden (2002)
[12] Gelder, B. de; Morris, J. and Dolan, R. Unconscious fear influences emotional awareness of faces and voices, PNAS 102(51), 18682–18687 (2005)
[13] Grossman, D. and Christensen, L. On Combat, PPCT Research Publications (2004)
[14] Grossman, D. and Siddle, B. Psychological Effects of Combat, In Kurtz, L. R. (ed.) Encyclopedia of Violence, Peace, and Conflict, Volume 3, Academic Press, Orlando, FL (1999)
[15] Howard, I. and Templeton, W. Human Spatial Orientation, Wiley, London (1966)
[16] Jørgensen, K. What are Those Grunts and Growls Over There? Computer Game Audio and Player Action, Ph.D. dissertation, Copenhagen University (2007)
[17] Klinger, D. Police Responses to Officer-Involved Shootings, National Institute of Justice, Department of Justice (2002). Available at: http://www.ncjrs.gov/pdffiles1/nij/grants/192286.pdf
[18] Lastra, J. Reading, Writing, and Representing Sound, In Altman, R. (ed.) Sound Theory, Sound Practice, Routledge, 65–86 (1992)
[19] McGurk, H. and MacDonald, J. Hearing lips and seeing voices, Nature 264, 746–748 (1976)
[20] Namba, S. and Kuwano, S. Environmental Acoustics: Psychological Assessment of Noise, In Neuhoff, J. (ed.) Ecological Psychoacoustics, Elsevier Academic Press, 175–190 (2004)
[21] Sonnenschein, D. Sound Design: The Expressive Power of Music, Voice and Sound Effects in Cinema, Michael Wiese Productions (2001)
[22] Spreckelmeyer, K.; Kutas, M.; Urbach, T.; Altenmüller, E. and Münte, T. Neural processing of vocal emotion and identity, Brain and Cognition 69(1), 121–126 (2009)
[23] Tobias, B.; Kihlstrom, J. and Schacter, D. Emotion and Implicit Memory, In Christianson, S.-A. (ed.) The Handbook of Emotion and Memory: Research and Theory, Lawrence Erlbaum, 67–92 (1992)
[24] Tsingos, N.; Gallo, E. and Drettakis, G. Breaking the 64 Spatialized Sources Barrier, Gamasutra Features, May 29 (2003). Available at: http://www.gamasutra.com/view/feature/2850/breaking_the_64_spatialized_.php
[25] Tuuri, K.; Mustonen, M.-S. and Pirhonen, A. Same sound – Different meanings: A novel scheme for modes of listening, In Proc. AudioMostly 2007, Ilmenau, Germany, 13–18 (2007)
[26] Vuilleumier, P. How brains beware: neural mechanisms of emotional attention, Trends in Cognitive Sciences 9(12), 585–594 (2005)