Table Of ContentDavid Zhang · Kebin Wu
Pathological
Voice
Analysis
Pathological Voice Analysis
(cid:129)
David Zhang Kebin Wu
Pathological Voice Analysis
DavidZhang KebinWu
TheChineseUniversity HuaweiTechnologies
ofHongKong(Shenzhen) Beijing,China
ShenzhenInstituteofArtificial
IntelligenceandRoboticsforSociety
Guangdong,China
ISBN978-981-32-9195-9 ISBN978-981-32-9196-6 (eBook)
https://doi.org/10.1007/978-981-32-9196-6
©SpringerNatureSingaporePteLtd.2020
Thisworkissubjecttocopyright.AllrightsarereservedbythePublisher,whetherthewholeorpartofthe
materialisconcerned,specificallytherightsoftranslation,reprinting,reuseofillustrations,recitation,
broadcasting,reproductiononmicrofilmsorinanyotherphysicalway,andtransmissionorinformation
storageandretrieval,electronicadaptation,computersoftware,orbysimilarordissimilarmethodology
nowknownorhereafterdeveloped.
Theuseofgeneraldescriptivenames,registerednames,trademarks,servicemarks,etc.inthispublication
doesnotimply,evenintheabsenceofaspecificstatement,thatsuchnamesareexemptfromtherelevant
protectivelawsandregulationsandthereforefreeforgeneraluse.
The publisher, the authors, and the editorsare safeto assume that the adviceand informationin this
bookarebelievedtobetrueandaccurateatthedateofpublication.Neitherthepublishernortheauthorsor
theeditorsgiveawarranty,expressedorimplied,withrespecttothematerialcontainedhereinorforany
errorsoromissionsthatmayhavebeenmade.Thepublisherremainsneutralwithregardtojurisdictional
claimsinpublishedmapsandinstitutionalaffiliations.
ThisSpringerimprintispublishedbytheregisteredcompanySpringerNatureSingaporePteLtd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface
The human voice contains rich information about the speaker, such as identity,
gender, health, situation, and emotion. However, the biomedical value of voice is
lessaddressedcomparedwithitsbiometricsapplicationssuchasspeechrecognition
and speaker identification. In this book, we present the importance and value of
voiceinaidingdiagnosisthroughpathologicalvoiceanalysis.
We systematically introduce our research works on pathological voice analysis
fromthefollowingthreeaspects:(1)areviewonpathologicalvoiceanalysisanda
guidelineonvoiceacquisitionforclinicalapplication;(2)designappropriatesignal
processingalgorithmsforpathologicalvoice;and(3)extractbiomedicalinformation
invoicetoimprovediseasedetection,suchasdictionary-basedfeaturelearningand
multi-audio fusion. Experimental results have shown the superiority of these tech-
niques. These proposed methods can be used in applications of voice quality
assessment, disease detection, disease severity prediction, and even the analysis of
other similar signals. This book will be useful to researchers, professionals, and
postgraduate students working in the field of speech signal processing, pattern
recognition, biomedical engineering, etc. This book also will be very meaningful
forinterdisciplinaryresearch.
The book is organized as follows: In Chap. 1, the development of pathological
voiceanalysisissystematicallyreviewed.Thenimportantfactorsintheacquisition
of pathological voice, sampling rate in particular, are discussed in Chap. 2. After
that, two of the widely used signal processing steps in pathological voice analysis,
whicharepitchestimationandglottalclosureinstant(GCI)detection,arestudiedin
Chaps. 3 and 4, respectively. In Chap. 5, feature learning based on spherical
K-means is proposed, in contrast to the traditional handcrafted features. In
Chaps.6and7,weinvestigatemulti-audiofusioninpathologicalvoiceanalysisso
as to make full use of the multiple audios collected for each subject. Finally, we
summarizethebookandgiveashortintroductiontofutureworksthatmaycontrib-
utetothefurtherdevelopmentofpathologicalvoiceanalysisinChap.8.
Ourteamhasbeenworkingonpathologicalvoiceanalysisformorethan5years.
We appreciate the related grant supports from the GRF fund of the HKSAR
v
vi Preface
Government,ResearchprojectsfromShenzhenInstitutesofbothBigDataandArti-
ficial Intelligence & Robotics for Society, and the National Natural Science Foun-
dation of China (NSFC) (61020106004, 61332011, 61272292, and 61271344).
Besides, we thank Prof. Guangming Lu and Prof. Zhenhua Guo for their valuable
suggestionsaboutourresearchonpathologicalvoiceanalysis.
Guangdong,China DavidZhang
Beijing,China KebinWu
April2020
Contents
1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 PathologicalVoiceAnalysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 ComputerizedVoiceAnalysis. . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.2 BiomedicalValueofVoice. . . . . . . . . . . . . . . . . . . . . . . 5
1.2.3 PresentSituationofComputerizedVoiceAnalysis. . . . . . . 11
1.2.4 OpenChallengesinComputerizedVoiceAnalysis. . . . . . . 17
1.2.5 FutureDirectionsinComputerizedVoiceAnalysis. . . . . . 20
1.2.6 Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2 PathologicalVoiceAcquisition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.2 DatabaseDescription. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.3 ExperimentalResults. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.3.1 Metric1:InformationEntropy(SignalLevel). . . . . . . . . . 35
2.3.2 Metric2:ReconstructionError(SignalLevel). . . . . . . . . . 36
2.3.3 Metric3:FeatureCorrelation(FeatureLevel). . . . . . . . . . 37
2.3.4 Metric4:ClassificationAccuracy(SystemLevel). . . . . . . 38
2.3.5 Metric5:ComputationalCost(HardwareLevel). . . . . . . . 39
2.3.6 Metric6:StorageCost(HardwareLevel). . . .. . . .. . . . .. 40
2.3.7 DiscussionandGuidelineontheSamplingRate
Selection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.4 Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3 PitchEstimation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.2 RelatedWorks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
vii
viii Contents
3.3 HarmonicsEnhancementforPitchEstimation. . . . . . . . . . . .. . . . 52
3.3.1 MotivationforHarmonicsEnhancement. . . . . . . . . . . . . . 52
3.3.2 TheoreticalAnalysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.3.3 Implementation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.3.4 In-DepthAnalysisoftheProposedAlgorithm. . . . . . . . . . 60
3.4 ExperimentalResult. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.4.1 BasicExperimentalSetting. . . . . . . . . . . . . . . . . . . . . . . . 65
3.4.2 PerformancefortheNoisySpeeches. . . . . . . . . . . . . . . . . 66
3.4.3 ExtensiontoMusic. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.4.4 Discussion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.5 Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
4 GlottalClosureInstantsDetection. . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.2 RelatedWorks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.2.1 ClassicalGCIDetectionMethods. . . . . . . . . . . . . . . . . . . 79
4.2.2 Limitations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.3 GCIDetectionUsingTKEO. . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.3.1 TKEOandItsRelationshipwithGCI. . . . . . . . . . . . . . . . 82
4.3.2 AbsoluteTKEOwithScaleParameterk. . . . . . . . . . . . . . 83
4.3.3 MultiresolutionCombination. . . . . . . . . . . . . . . . . . . . . . 84
4.3.4 GMAT:GCIDetectionBasedonMultiresolution
AbsoluteTKEO. .. . . . . . . . . . . . . .. . . . . . . . . . . . . .. . 86
4.4 ExperimentalResults. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.4.1 BasicExperimentalSettings. . . . . . . . . . . . . . . . . . . . . . . 88
4.4.2 PerformanceComparison. . . . . . . . . . . . . . . . . . . . . . . . . 90
4.4.3 GCIsDetectioninPathologicalSpeechIdentification. . . . . 97
4.5 Discussion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.5.1 ParameterSensitivity. . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.5.2 GMATandLowFrequencyNoise. . . . . . . . . . . . . . . . . . 101
4.5.3 RelationBetweenGMATandMSM. . . . . . . . . . . . . . . . . 101
4.5.4 DifferentPoolingMethodsinGMAT. . . . . . . . . . . . . . . . 102
4.6 Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5 FeatureLearning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5.2 RelatedWorks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.3 ProposedFeatureLearningMethod. . . . . . . . . . . . . . . . . . . . . . . 110
5.3.1 Preprocessing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.3.2 Mel-Spectrogram. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.3.3 DictionaryLearningBasedonSphericalK-Means. . . . . . . 112
5.3.4 EncodingandPooling. . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Contents ix
5.4 ExperimentalResults. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.4.1 DatasetandExperimentalSetup. . . . . . . . . . . . . . . . . . . . 113
5.4.2 PDDetectionwithLearnedandHand-CraftedFeatures. . . 114
5.4.3 AblationStudy. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.5 Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
6 JointLearningforVoiceBasedDiseaseDetection. . . . . . . . . . . . . . . 123
6.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
6.2 RelatedWorks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.2.1 Notation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
6.2.2 RidgeRegression. . . .. . . . . . . . . . . . .. . . . . . . . . . . .. . 125
6.2.3 Low-RankRidgeRegression. . . . . . . . . . . . . . . . . . . . . . 126
6.2.4 ε‐DraggingTechnique. . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.3 ProposedMethod. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
6.3.1 JointLearningwithLabelRelaxedLow-RankRidge
Regression(JOLL4R). . . . . . . . . . . . . . . . . . . . . . . . . . . 128
6.3.2 AlgorithmtoSolveJOLL4R. .. . . . . . .. . . . . .. . . . . . .. 129
6.3.3 Classification. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
6.4 ExperimentalResults. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
6.4.1 DatasetandExperimentalSetup. . . . . . . . . . . . . . . . . . . . 134
6.4.2 TheDetectionofPatientswithCordectomy. . . . . . . . . . . . 135
6.4.3 TheDetectionofPatientswithFrontolateralResection. . . . 135
6.4.4 AblationStudy. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
6.5 Discussions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6.5.1 VisualizationofJOLL4R. . . . . . . . . . . . . . . . . . . . . . . . . 139
6.5.2 ClassifierinJOLL4R. . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
6.5.3 ExtensiontoFusionofMoreAudios. . . . . . . . . . . . . . . . . 141
6.6 Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
7 RobustMulti-ViewDiscriminativeLearningforVoiceBased
DiseaseDetection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
7.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
7.2 ProposedMethod. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
7.2.1 Notation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
7.2.2 TheObjectiveFunction. . . . . . . . . . . . . . . . . . . . . . . . . . 149
7.2.3 OptimizationofROME-DLR. . . . . . . . . . . . . . . . . . . . . . 152
7.2.4 TheClassificationRuleofROME-DLR. . . . . . . . . . . . . . 155
7.3 ExperimentalResults. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
7.3.1 DatasetandExperimentalSetup. . . . . . . . . . . . . . . . . . . . 156
7.3.2 TheDetectionofRLNP. . . . . . . . . . . . . . . . . . . . . . . . . . 157
7.3.3 TheDetectionofSD. . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
7.3.4 AblationStudy. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
x Contents
7.4 Discussions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
7.5 Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
8 BookReviewandFutureWork. . . . . . .. . . . . . . . . .. . . . . . . . . . .. 167
8.1 BookRecapitulation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
8.2 FutureWork. . . .. . . . . .. . . . . . .. . . . . . .. . . . . . .. . . . . . .. . 169
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171