Brigham Young University
BYU ScholarsArchive
Theses and Dissertations
2015-12-01
Facilitating Corpus Annotation by Improving Annotation Aggregation
Paul L. Felt
Brigham Young University - Provo
BYU ScholarsArchive Citation
Felt, Paul L., "Facilitating Corpus Annotation by Improving Annotation Aggregation" (2015). Theses and
Dissertations. 5678.
https://scholarsarchive.byu.edu/etd/5678
Facilitating Corpus Annotation by Improving
Annotation Aggregation

Paul L. Felt

A dissertation submitted to the faculty of
Brigham Young University
in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Eric K. Ringger, Chair
Kevin Seppi
Christophe Giraud-Carrier
Deryle W. Lonsdale
Quinn Snell

Department of Computer Science
Brigham Young University
December 2015

Copyright © 2015 Paul L. Felt
All Rights Reserved
ABSTRACT

Facilitating Corpus Annotation by Improving
Annotation Aggregation

Paul L. Felt
Department of Computer Science, BYU
Doctor of Philosophy
Annotated text corpora facilitate the linguistic investigation of language as well as the automation of natural language processing (NLP) tasks. NLP tasks include problems such as spam email detection, grammatical analysis, and identifying mentions of people, places, and events in text. However, constructing high-quality annotated corpora can be expensive. Cost can be reduced by employing low-cost internet workers in a practice known as crowdsourcing, but the resulting annotations are often inaccurate, decreasing the usefulness of a corpus. This inaccuracy is typically mitigated by collecting multiple redundant judgments and aggregating them (e.g., via majority vote) to produce high-quality consensus answers.
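
To make the aggregation step concrete, the following is a minimal sketch of majority-vote label aggregation in Python. The function name, the data layout, and the tie-breaking behavior are illustrative assumptions for this sketch, not details drawn from the dissertation itself.

    from collections import Counter

    def majority_vote(annotations):
        """Collapse redundant crowd judgments into one consensus label per item.

        `annotations` (hypothetical layout) maps each item id to the list of
        labels assigned by different annotators. Ties are broken arbitrarily
        by Counter.most_common.
        """
        return {item: Counter(labels).most_common(1)[0][0]
                for item, labels in annotations.items()}

    # Example: three annotators label two documents.
    votes = {
        "doc1": ["spam", "spam", "ham"],
        "doc2": ["ham", "ham", "spam"],
    }
    print(majority_vote(votes))  # {'doc1': 'spam', 'doc2': 'ham'}

Majority vote treats every annotator as equally reliable; the models developed in this dissertation relax exactly that assumption.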
We improve the quality of consensus labels inferred from imperfect annotations in a number of ways. We show that transfer learning can be used to derive benefit from outdated annotations which would typically be discarded. We show that, contrary to popular preference, annotation aggregation models that take a generative data modeling approach tend to outperform those that take a conditional approach. We leverage this insight to develop CSLDA, a novel annotation aggregation model that improves on the state of the art for a variety of annotation tasks. When data does not permit generative data modeling, we identify a conditional data modeling approach based on vector-space text representations that achieves state-of-the-art results on several unusual semantic annotation tasks. Finally, we identify a family of models capable of aggregating annotation data containing heterogeneous annotation types such as label frequencies and labeled features. We present a multiannotator active learning algorithm for this model family that jointly selects an annotator, data items, and annotation type.
Keywords: crowdsourcing, corpus annotation, semantic embeddings, LDA, rich prior knowledge
ACKNOWLEDGMENTS

First and foremost, thank you, Stephanie. You have been tirelessly supportive from beginning to end. You have been a sounding board for ideas, an emotional coach, a copy editor, and a friend through the long process of completing a dissertation. In many ways an acknowledgment seems like an insufficient way to describe your role; the reality is closer to joint authorship. Thanks also go to my children, Jane, Nathaniel, Gabriel, and Mackay, who have grudgingly allowed me to work many late nights, but never without reminding me that time is precious. Similar thanks go to my parents, Doug and Shelley, as well as my parents-in-law, Scott and Jane. Without their constant support this work would not have been possible.
My advisor, Dr. Eric Ringger, has been instrumental in helping me develop the mental
tools necessary to do this work. His remarkable attention to detail and rigor of thought were what
originally attracted me to the field of natural language processing. I have benefited greatly from
his passion for exploring the unknown and expanding the intellectual, geographical, and culinary
horizons of himself and those around him. His role in this work cannot be overstated. Dr. Kevin
Seppi has been similarly influential, consistently willing to drop whatever he was doing in order
to discuss a new idea.
I would also like to thank Dr. Jordan Boyd-Graber for his close collaboration on the ideas
that went into Chapter 5 related to the CSLDA model. My committee also deserves a good deal of
thanks for repeatedly providing valuable ideas and feedback, giving perspective, and helping
my work stay focused in useful directions. Also, many thanks go to fellow students who have,
surprisingly, been among my most effective teachers; in particular, Robbie Haertel, Dan Walker,
Kevin Black, and Jeff Lund.
Thanks go to the Fulton Supercomputing Lab for providing the computational resources
supporting a number of the experiments in this work.
Finally, this work was partly supported by the collaborative NSF Grants IIS-1409739 (BYU) and IIS-1409287 (UMD). Any opinions, findings, conclusions, or recommendations expressed here are those of the authors and do not necessarily reflect the view of the sponsor.
Table of Contents

1 Introduction
  1.1 Overview of Corpus Annotation
  1.2 Opportunities to Improve Corpus Annotation
  1.3 Thesis Statement
  1.4 Dissertation Organization

2 Using Transfer Learning to Assist Exploratory Corpus Annotation
  2.1 Exploratory Corpus Annotation (ECA)
  2.2 Previous Work
    2.2.1 Transfer Learning
  2.3 ECA as Transfer Learning
    2.3.1 Baselines: TGTTRAIN and ALLTRAIN
    2.3.2 STACK
    2.3.3 AUGMENT
  2.4 Experiments
  2.5 Conclusions and Future Work

3 MOMRESP: A Bayesian Model for Multi-Annotator Document Labeling
  3.1 Introduction
  3.2 Previous Work
  3.3 Methods
    3.3.1 Model
    3.3.2 Inference
    3.3.3 Class Correspondence Correction
    3.3.4 Loss Functions
  3.4 Experiments
    3.4.1 Class Correspondence Correction
    3.4.2 Inferred Label Accuracy
    3.4.3 Annotator Error Estimation
    3.4.4 Failure Cases
  3.5 Conclusions and Future Work

4 Early Gains Matter: A Case for Preferring Generative over Discriminative Crowdsourcing Models
  4.1 Introduction
  4.2 Previous Work
  4.3 Models
    4.3.1 Log-linear data model (LOGRESP)
    4.3.2 Multinomial data model (MOMRESP)
    4.3.3 A Generative-Discriminative Pair
  4.4 Mean-field Variational Inference (MF)
    4.4.1 LOGRESP Inference
    4.4.2 MOMRESP Inference
    4.4.3 Model priors and implementation details
  4.5 Experiments with Simulated Annotators
    4.5.1 Simulating Annotators
    4.5.2 Datasets and Features
    4.5.3 Validating Mean-field Variational Inference
    4.5.4 Discriminative (LOGRESP) versus Generative (MOMRESP)
  4.6 Experiments with Human Annotators
  4.7 Conclusions and Future Work

5 Making the Most of Crowdsourced Document Annotations: Confused Supervised LDA
  5.1 Modeling Annotators and Abilities
  5.2 Latent Representations that Reflect Labels and Confusion
    5.2.1 Leveraging Data
    5.2.2 Confused Supervised LDA (CSLDA)
    5.2.3 Stochastic EM
    5.2.4 Hyperparameter Optimization
  5.3 Experiments
    5.3.1 Human-generated Annotations
    5.3.2 Synthetic Annotations
    5.3.3 Joint vs. Pipeline Inference
    5.3.4 Error Analysis
  5.4 Additional Related Work
  5.5 Conclusion and Future Work

6 Semantic Annotation Aggregation with Conditional Crowdsourcing Models and Word Embeddings
  6.1 Introduction
  6.2 Background
    6.2.1 Data-aware annotation models
    6.2.2 Word and Document Representations
  6.3 Experiments
    6.3.1 Datasets
    6.3.2 Comparison with lexical methods
    6.3.3 When lexical methods do not apply
    6.3.4 Summary of experiments
  6.4 Sentiment dataset error analysis
  6.5 Additional Related Work
  6.6 Conclusions and Future Work

7 Learning from Measurements in Crowdsourcing Models: Inferring Ground Truth from Diverse Annotation Types
  7.1 Introduction
  7.2 Background on Measurements
  7.3 Multi-annotator Measurements Architecture
  7.4 Per-annotator Normal Measurement Model for Classification
    7.4.1 Implementation Considerations
  7.5 Experiments
    7.5.1 Baselines
    7.5.2 Simulated Data
    7.5.3 Sentiment Classification
  7.6 Model Extensions
    7.6.1 Active Measurement Selection
    7.6.2 Labeled Location Measurements
  7.7 Additional Related Work
  7.8 Conclusion and Future Work

8 Conclusions and Future Work

9 Appendix: Supplementary Material for Early Gains Matter: A Case for Preferring Generative over Discriminative Crowdsourcing Models

10 Appendix: Supplementary Material for Learning from Measurements in Crowdsourcing Models
  10.1 Introduction
  10.2 Variational Inference
    10.2.1 Joint Probability
    10.2.2 Mean field updates
    10.2.3 Lower Bound
  10.3 Calculating Expected Values
  10.4 Properties of Expectations

11 Appendix: Deriving the majority vote procedure from an item response model under limiting assumptions

References