AUDITORY SCENE ANALYSIS

(from H. Purwins, B. Blankertz, and K. Obermayer. Computing auditory perception. Organised Sound 5(3), 2000)

The Binding Problem

We are constantly exposed to a chaos of diverse sensory impressions. How can we identify an object in the environment? How can certain sensory impressions form a 'Gestalt' according to certain criteria and provide us with information about that object? This is the binding problem. The 'Gestalt' concept originated with Ehrenfels (1890) and Mach (1886), who initially presented musical examples; visual perception was investigated only later. From the 1970s on, computer-supported sound synthesis and analysis promoted the application of Gestalt theory to auditory perception, which is comprehensively reviewed in Bregman (1990).

Grouping Principles

In the following, we introduce principles that aid binding in auditory perception (Fig. 1; Bregman 1990).

The principle of 'proximity' refers to distances between auditory features with respect to their onsets, pitch, and loudness. Features that are grouped together are close to each other and far from the elements of other groups. Temporal proximity and pitch proximity compete with each other. For example, the slow sequence of notes A-B-A-B... (Fig. 1 A1), which contains large pitch jumps, is perceived as one stream. The same sequence played very fast (Fig. 1 A2) splits into two perceptual streams, one consisting of the A's and the other of the B's.

'Similarity' is closely related to proximity but refers to properties of a sound that cannot easily be identified with a single physical dimension, such as timbre (Bregman 1990: 198).

The principle of 'good continuation' attributes smoothly varying frequency, loudness, or spectrum to a single, changing sound source, whereas abrupt changes indicate the appearance of a new source. In the experiment of Bregman and Dannenbring (1973) (Fig. 1 B), high (H) and low (L) tones alternate. If the tones are connected by glissandi (Fig. 1 B1), both are grouped into a single stream. If the high and low tones remain unconnected (Fig. 1 B2), the H's and the L's each form a separate stream. 'Good continuation' is the continuous limit of 'proximity'.
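A stimulus of the kind described above can be sketched as a per-sample frequency track that either glides between the alternating tones or leaves the link segment silent. This is a minimal illustration under our own assumptions: the function names, the 1000 Hz and 400 Hz tones, the 100 ms tone and 20 ms link durations, and the glide being linear in Hz are hypothetical and are not the parameters of Bregman and Dannenbring (1973). A real stimulus would also need amplitude ramps at tone onsets and offsets, which are omitted here.

```python
import math


def contour(high_hz, low_hz, n_tones, tone_s, link_s, sr, connected):
    """Per-sample frequency track for alternating H-L-H-L... tones.

    Between consecutive tones there is a link segment of link_s seconds:
    a glide linear in Hz if connected, otherwise silence (None).
    """
    freqs = []
    n_tone, n_link = round(tone_s * sr), round(link_s * sr)
    for k in range(n_tones):
        f = high_hz if k % 2 == 0 else low_hz
        freqs.extend([f] * n_tone)
        if k == n_tones - 1:
            break  # no link after the last tone
        nxt = low_hz if k % 2 == 0 else high_hz
        for s in range(n_link):
            if connected:
                freqs.append(f + (nxt - f) * s / n_link)
            else:
                freqs.append(None)
    return freqs


def synthesize(freqs, sr):
    """Sine oscillator driven by a frequency track; None means silence.

    Phase is accumulated sample by sample, so the waveform has no phase
    jumps when the frequency changes during a glide.
    """
    phase, out = 0.0, []
    for f in freqs:
        if f is None:
            out.append(0.0)
        else:
            phase += 2 * math.pi * f / sr
            out.append(math.sin(phase))
    return out


SR = 16000  # assumed sample rate
connected = synthesize(contour(1000.0, 400.0, 8, 0.1, 0.02, SR, True), SR)
unconnected = synthesize(contour(1000.0, 400.0, 8, 0.1, 0.02, SR, False), SR)
```

Accumulating phase, rather than computing `sin(2*pi*f(t)*t)`, is what keeps a time-varying frequency well defined; the naive formula would distort the glides.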

The principle of 'closure' completes fragmentary features that already have a 'good Gestalt'. For example, ascending and descending glissandi are interrupted by rests (Fig. 1 C2), and three temporally separated lines are heard one after the other. Then noise is added during the rests (Fig. 1 C1). The noise is loud enough that it would mask the glissandi if they continued through it. Remarkably, the interrupted glissandi are now perceived as continuous. They have a 'good Gestalt': the fragments before and after each rest are close in frequency, so they are easily completed by a perceived good continuation. This completion can be understood as an auditory compensation for masking.
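The two closure conditions differ only in what fills the gaps, which the following sketch makes explicit. It is a hypothetical illustration, not the original stimulus: it uses a single ascending glide (log-linear in frequency), and the function name, gap timing, and the noise level `noise_rms` are our own assumptions; whether a given noise actually masks the glide depends on its spectrum and level, which are not calibrated here. Note that the oscillator phase keeps advancing during the gaps, so the glide resumes exactly where an uninterrupted glide would be.

```python
import math
import random


def interrupted_glide(f_start, f_end, dur_s, period_s, gap_s, sr,
                      fill_with_noise, noise_rms=2.0, seed=0):
    """Glide from f_start to f_end, deleted for the last gap_s of each period_s.

    If fill_with_noise, the gaps carry Gaussian white noise with standard
    deviation noise_rms (relative to the sine's peak of 1). This level is an
    illustrative assumption, not a calibrated masking level.
    """
    rng = random.Random(seed)
    n = round(dur_s * sr)
    phase, out = 0.0, []
    for i in range(n):
        t = i / sr
        f = f_start * (f_end / f_start) ** (t / dur_s)  # log-linear glide
        phase += 2 * math.pi * f / sr  # phase advances even inside gaps
        in_gap = (t % period_s) >= period_s - gap_s
        if not in_gap:
            out.append(math.sin(phase))
        elif fill_with_noise:
            out.append(rng.gauss(0.0, noise_rms))
        else:
            out.append(0.0)
    return out


SR = 16000  # assumed sample rate
silent = interrupted_glide(400.0, 1600.0, 1.0, 0.25, 0.05, SR, fill_with_noise=False)
noisy = interrupted_glide(400.0, 1600.0, 1.0, 0.25, 0.05, SR, fill_with_noise=True)
```

Outside the gaps the two signals are sample-for-sample identical, so any difference in perceived continuity can only come from the gap content.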

The principle of 'common fate' groups frequency components together when similar changes occur in them synchronously, e.g. synchronous onsets, glides, or vibrato. Chowning (1980; Fig. 1 D) performed the following experiment. First, three pure tones are played, and a chord containing the three pitches is heard. Then the full set of harmonics for three vowels ('oh', 'ah', and 'eh') is added, with the given frequencies as fundamentals but without frequency fluctuations. This is heard not as a mixture of voices but as a complex sound in which the three pitches are not clear. Finally, the three sets of harmonics are differentiated from one another by their patterns of fluctuation, and we then hear three vocal sounds being sung at three different pitches.
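The key manipulation in the experiment above is that all harmonics of one voice share the same fluctuation, while different voices fluctuate differently. A minimal sketch of that idea follows. It is not Chowning's synthesis method: it uses equal-amplitude harmonics instead of vowel spectra and a sinusoidal vibrato as a stand-in for his fluctuation patterns, and the function names, fundamentals, vibrato rates, and depth are hypothetical values chosen for illustration.

```python
import math


def harmonic_complex(f0, n_harm, dur_s, sr, vib_rate_hz=0.0, vib_depth=0.0):
    """Equal-amplitude harmonics of f0 sharing one vibrato.

    Every harmonic's frequency is scaled by the same factor at each sample,
    so all components change synchronously ('common fate'). vib_depth is the
    relative frequency deviation (0.01 means +/-1 %); 0 gives a steady tone.
    """
    n = round(dur_s * sr)
    phases = [0.0] * n_harm
    out = []
    for i in range(n):
        t = i / sr
        factor = 1.0 + vib_depth * math.sin(2 * math.pi * vib_rate_hz * t)
        s = 0.0
        for h in range(n_harm):
            phases[h] += 2 * math.pi * (h + 1) * f0 * factor / sr
            s += math.sin(phases[h])
        out.append(s / n_harm)  # keep each complex within [-1, 1]
    return out


def mix(*signals):
    """Average equally long signals sample by sample."""
    return [sum(samples) / len(signals) for samples in zip(*signals)]


SR = 8000                     # assumed sample rate
F0S = (220.0, 277.0, 330.0)   # hypothetical fundamentals (Hz)
RATES = (5.0, 5.7, 6.3)       # hypothetical vibrato rates, one per voice (Hz)

# Stage 2: all harmonics present, no fluctuations.
steady = mix(*(harmonic_complex(f, 10, 0.5, SR) for f in F0S))
# Stage 3: each voice gets its own, internally shared, vibrato.
fluctuating = mix(*(harmonic_complex(f, 10, 0.5, SR, vib_rate_hz=r, vib_depth=0.01)
                    for f, r in zip(F0S, RATES)))
```

Ten harmonics of the highest fundamental stay below the 4 kHz Nyquist limit even at the vibrato peak, so the sketch does not alias at this sample rate.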

Other important topics in auditory perception are attention and learning. In a cocktail-party environment, we can focus on one speaker: our attention selects that speaker's stream. Also, whenever some aspect of a sound changes while the rest remains relatively unchanged, that aspect draws the listener's attention (the 'figure-ground phenomenon'). As an example of learning, the illusory continuity of a tune through an interrupting noise (cf. Fig. 1 C) is even stronger when the tune is more familiar (Bregman 1990: 401).

REFERENCES

Bregman, A. S. 1990. Auditory Scene Analysis. Cambridge, MA: MIT Press.

Bregman, A. S. and Dannenbring, G. 1973. The effect of continuity on auditory stream segregation. Perception & Psychophysics 13: 308-12.

Chowning, J. M. 1980. Computer synthesis of the singing voice. In Sound Generation in Winds, Strings, Computers. Stockholm: Royal Swedish Academy of Music, Publication No. 29.

Ehrenfels, C. von 1890. Über Gestaltqualitäten. Vierteljahresschrift Wiss. Philos. 14: 249-92.

Mach, E. 1886. Beiträge zur Analyse der Empfindungen. Jena.


