Modern Acoustics and Signal Processing
Jens Blauert, Jonas Braasch (Editors)
The Technology of Binaural Understanding
Modern Acoustics and Signal Processing
Editor-in-Chief
William M. Hartmann, East Lansing, USA
Series Editors
Yoichi Ando, Kobe, Japan
Whitlow W. L. Au, Kane’ohe, USA
Arthur B. Baggeroer, Cambridge, USA
Christopher R. Fuller, Blacksburg, USA
William A. Kuperman, La Jolla, USA
Joanne L. Miller, Boston, USA
Alexandra I. Tolstoy, McLean, USA
More information about this series at http://www.springer.com/series/3754
The ASA Press
The ASA Press imprint represents a collaboration between the Acoustical Society
of America and Springer dedicated to encouraging the publication of important new
books in acoustics. Published titles are intended to reflect the full range of research
in acoustics. ASA Press books can include all types of books published by Springer
and may appear in any appropriate Springer book series.
Editorial Board
Mark F. Hamilton (Chair), University of Texas at Austin
James Cottingham, Coe College
Diana Deutsch, University of California, San Diego
Timothy F. Duda, Woods Hole Oceanographic Institution
Robin Glosemeyer Petrone, Threshold Acoustics
William M. Hartmann (Ex Officio), Michigan State University
Darlene R. Ketten, Boston University
James F. Lynch (Ex Officio), Woods Hole Oceanographic Institution
Philip L. Marston, Washington State University
Arthur N. Popper (Ex Officio), University of Maryland
Martin Siderius, Portland State University
G. Christopher Stecker, Vanderbilt University School of Medicine
Ning Xiang, Rensselaer Polytechnic Institute
Editors
Jens Blauert
Institut für Kommunikationsakustik
Ruhr-Universität Bochum
Bochum, Nordrhein-Westfalen, Germany

Jonas Braasch
School of Architecture
Rensselaer Polytechnic Institute
Troy, NY, USA

ISSN 2364-4915 ISSN 2364-4923 (electronic)
Modern Acoustics and Signal Processing
ISBN 978-3-030-00385-2 ISBN 978-3-030-00386-9 (eBook)
https://doi.org/10.1007/978-3-030-00386-9
© Springer Nature Switzerland AG 2020
This work is subject to copyright. All rights are reserved by the Publishers, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publishers, the authors, and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publishers nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made. The publishers remain neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Sound, devoid of meaning, would not matter to us. It is the information sound
conveys that helps the brain to understand its environment. Sound and its
underlying meaning are always associated with time and space. There is no sound
without spatial properties, and the brain always organizes this information within a
temporal–spatial framework. This book is devoted to understanding the importance
of meaning for spatial and related further aspects of hearing, including cross-modal
inference.
People, when exposed to acoustic stimuli, do not react directly to what they
hear but rather to what the things they hear mean to them.
This semiotic maxim may not always apply, for instance, when the reactions are
reflexive. But, where it does apply, it poses a major challenge to the builders of
models of the auditory system. Take, for example, an auditory model that is meant
to be implemented on a robotic agent for autonomous search-&-rescue actions. Or
think of a system that can perform judgments on the sound quality of
multimedia-reproduction systems. It becomes immediately clear that such a system
needs
• Cognitive capabilities, including substantial inherent knowledge
• The ability to integrate information across different sensory modalities
To realize these functions, the auditory system provides a pair of sensory organs,
the two ears, and the means to perform adequate preprocessing of the signals
provided by the ears. This is realized in the subcortical parts of the auditory system.
In the title of a prior book,¹ the term Binaural Listening is used to indicate a focus
on sub-cortical functions. Psychoacoustics and auditory signal processing
contribute substantially to this area.
¹ The Technology of Binaural Listening, J. Blauert (ed.), Springer and ASA Press, 2013.
The preprocessed signals are then forwarded to the cortical parts of the auditory
system where, among other things, recognition, classification, localization, scene
analysis, assignment of meaning, quality assessment, and action planning take
place. Also, information from different sensory modalities is integrated at this level.
Between sub-cortical and cortical regions of the auditory system, numerous
feedback loops exist that ultimately support the high complexity and plasticity of the
auditory system.
The current book concentrates on these cognitive functions. Instead of
processing signals, processing symbols is now the predominant modeling task.
Substantial contributions to the field draw upon the knowledge acquired by
cognitive psychology. The keyword Binaural Understanding in the book title
characterizes this shift.
Both books, The Technology of Binaural Listening and the current one, have
been stimulated and supported by AABBA, an open research group devoted to the
development and application of models of binaural hearing.²
The current book is dedicated to technologies that help explain, facilitate, apply,
and support various aspects of binaural understanding. It is organized into five
parts, each containing three to six chapters in order to provide a comprehensive
overview of this emerging area. Each chapter was thoroughly reviewed by at least
two anonymous, external experts.
The first part deals with the psychophysical and physiological effects of Forming
and Interpreting Aural Objects as well as the underlying models. The fundamental
concepts of reflexive and reflective auditory feedback are introduced. Mechanisms
of binaural attention and attention switching are covered, as well as how auditory
Gestalt rules facilitate binaural understanding. A general blackboard architecture is
introduced as an example of how machines can learn to form and interpret aural
objects to simulate human cognitive listening.
The second part, Configuring and Understanding Aural Space, focuses on the
human understanding of complex three-dimensional environments—covering the
psychological and biological fundamentals of auditory space formation. This part
further addresses the human mechanisms used to process information and interact
in complex reverberant environments, such as concert halls and forests, and
additionally examines how the auditory system can learn to understand and adapt to
these environments.
The third part is dedicated to Processing Cross-Modal Inference and highlights
the fundamental human mechanisms used to integrate auditory cues with cues from
other modalities to localize and form perceptual objects. This part also provides a
general framework for understanding how complex multimodal scenes can be
simulated and rendered.
² https://www.kfs.oeaw.ac.at/index.php?option=com_content&view=article&id=1072&Itemid=920&lang=de [last access August 30, 2019].
The fourth part, Evaluating Aural-Scene Quality and Speech Understanding,
focuses on the object-forming aspects of binaural listening and understanding. It
addresses cognitive mechanisms involved in both the understanding of speech and
the processing of nonverbal information such as Sound Quality and
Quality-of-Experience. The aesthetic judgment of rooms is also discussed in this context.
Models that simulate underlying human processes and performance are covered in
addition to techniques for rendering virtual environments that can then be used to test
these models.
The fifth part deals with the Application of Cognitive Mechanisms to Audio
Technology. It highlights how cognitive mechanisms can be utilized to create
spatial auditory illusions using binaural and other 3D-audio technologies. Further, it
covers how cognitive binaural technologies can be applied to improve human
performance in auditory displays and to develop new auditory technologies for
interactive robots. The book concludes with the application of cognitive binaural
technologies to the next generation of hearing aids.
Bochum, Germany Jens Blauert
Troy, USA Jonas Braasch
Contents
Forming and Interpreting Aural Objects: Effects and Models
Reflexive and Reflective Auditory Feedback
Jens Blauert and Guy J. Brown
Auditory Gestalt Rules and Their Application
Sarinah Sutojo, Joachim Thiemann, Armin Kohlrausch and Steven van de Par
Selective Binaural Attention and Attention Switching
Janina Fels, Josefa Oberem and Iring Koch
Blackboard Systems for Cognitive Audition
Christopher Schymura and Dorothea Kolossa
Configuring and Understanding Aural Space
Formation of Three-Dimensional Auditory Space
Piotr Majdak, Robert Baumgartner and Claudia Jenny
Biological Aspects of Perceptual Space Formation
Michael Pecka, Christian Leibold and Benedikt Grothe
Auditory Spatial Impression in Concert Halls
Tapio Lokki and Jukka Pätynen
Auditory Room Learning and Adaptation to Sound Reflections
Bernhard U. Seeber and Samuel Clapp
Room Effect on Musicians’ Performance
Malte Kob, Sebastià V. Amengual Garí and Zora Schärer Kalkandjiev
Binaural Modeling from an Evolving-Habitat Perspective
Jonas Braasch
Processing Cross-Modal Inference
Psychophysical Models of Sound Localisation with Audiovisual Interactions
Catarina Mendonça
Cross-Modal and Cognitive Processes in Sound Localization
M. Torben Pastore, Yi Zhou and William A. Yost
Spatial Soundscape Superposition and Multimodal Interaction
Michael Cohen and William L. Martens
Evaluating Aural-Scene Quality and Speech Understanding
Binaural Evaluation of Sound Quality and Quality of Experience
Alexander Raake and Hagen Wierstorf
The Language of Rooms: From Perception to Cognition to Aesthetic Judgment
Stefan Weinzierl, Steffen Lepa and Martin Thiering
Modeling the Aesthetics of Audio-Scene Reproduction
John Mourjopoulos
A Virtual Testbed for Binaural Agents
Jens Blauert
Binaural Technology for Machine Speech Recognition and Understanding
Richard M. Stern and Anjali Menon
Modeling Binaural Speech Understanding in Complex Situations
Mathieu Lavandier and Virginia Best
Applying Cognitive Mechanisms to Audio Technology
Creating Auditory Illusions with Spatial-Audio Technologies
Rozenn Nicol
Creating Auditory Illusions with Binaural Technology
Karlheinz Brandenburg, Florian Klein, Annika Neidhardt, Ulrike Sloma and Stephan Werner
Toward Cognitive Usage of Binaural Displays
Yôiti Suzuki, Akio Honda, Yukio Iwaya, Makoto Ohuchi and Shuichi Sakamoto
Audition as a Trigger of Head Movements
Benjamin Cohen-Lhyver, Sylvain Argentieri and Bruno Gas