Technical Report
UCAM-CL-TR-743
ISSN 1476-2986
Number 743
Computer Laboratory
Optimising the speed and accuracy
of a Statistical GLR Parser
Rebecca F. Watson
March 2009
15 JJ Thomson Avenue
Cambridge CB3 0FD
United Kingdom
phone +44 1223 763500
http://www.cl.cam.ac.uk/
© 2009 Rebecca F. Watson
This technical report is based on a dissertation submitted
September 2007 by the author for the degree of Doctor of
Philosophy to the University of Cambridge, Darwin College.
Technical reports published by the University of Cambridge
Computer Laboratory are freely available via the Internet:
http://www.cl.cam.ac.uk/techreports/
Abstract
The focus of this thesis is to develop techniques that optimise both the speed and
accuracy of a unification-based statistical GLR parser. However, we can apply these
methods within a broad range of parsing frameworks. We first aim to optimise the
level of tag ambiguity resolved during parsing, given that we employ a front-end
PoS tagger. This work provides the first broad comparison of tag models as we
consider both tagging and parsing performance. A dynamic model achieves the
best accuracy and provides a means to overcome the trade-off between tag error
rates in single-tag-per-word input and the increase in parse ambiguity over
multiple-tag-per-word input. The second line of research describes a novel modification to
the inside-outside algorithm, whereby multiple inside and outside probabilities are
assigned for elements within the packed parse forest data structure. This algorithm
enables us to compute a set of ‘weighted GRs’ directly from this structure. Our
experiments demonstrate substantial increases in parser accuracy and throughput
for weighted GR output.
Finally, we describe a novel confidence-based training framework that can, in
principle, be applied to any statistical parser whose output is defined in terms of its
consistency with a given level and type of annotation. We demonstrate that a semisupervised
variant of this framework outperforms both Expectation-Maximisation
(when both are constrained by unlabelled partial-bracketing) and the extant (fully
supervised) method. These novel training methods utilise data automatically extracted
from existing corpora. Consequently, they require no manual effort on the
part of the grammar writer, facilitating grammar development.
Acknowledgements
I would first like to thank Ted Briscoe, who was an excellent supervisor. He has
helped to guide this thesis with his invaluable insight and I have appreciated his
patience and enthusiasm. Without his easy-going nature and constant support and
direction this thesis would not have been completed as and when it was. Most
importantly, he always reminded me to enjoy my time at Cambridge and have a
nice glass of wine whenever possible! I would also like to thank John Carroll who
even at a distance has managed to provide a great deal of support and was always
available when I needed help or advice.
People from the NLIP group and administrative staff at the Computer Laboratory
were also very helpful. I enjoyed my many talks with Anna Ritchie, Ben Medlock
and Bill Hollingsworth. I will miss their moral support and I’m grateful that fate
locked us in a room together for so many years! Thanks also to Gordon Royle
and other staff at the University of Western Australia who supported me while I
completed my research during visits to that university at home in Perth.
I also greatly appreciated the feedback I received during my PhD viva. Both of my
examiners, Stephen Clark and Anna Korhonen, provided helpful and thoughtful
suggestions which improved the overall quality of this work’s presentation.
This research would not have been possible without the financial support of both the
Overseas Research Students Awards Scheme and the Poynton Scholarship awarded
by the Cambridge Australia Trust in collaboration with the Cambridge Commonwealth
Trust.
On a personal note, I would like to thank my family: my parents and my sister
Kathryn, who were always available to talk and provided a great deal of support.
Finally, special thanks go to my partner James, who moved across the world to
support me during my PhD. He made our home somewhere I didn’t mind working
on weekends.
Contents
1 Introduction 13
1.1 Natural Language Parsing 13
1.1.1 Problem Definition 13
1.1.2 Corpus-based Estimation 14
1.1.3 Statistical Approaches 15
1.2 Research Background 18
1.3 Available Resources 18
1.3.1 Corpora 18
1.3.2 Evaluation 23
1.4 Research Goals 26
1.5 Thesis Summary 26
1.5.1 Contributions of this Thesis 26
1.5.2 Outline of Subsequent Chapters 27

2 LR Parsers 28
2.1 Introduction 28
2.2 Finite Automata 29
2.2.1 NFA 29
2.2.2 DFA 31
2.3 LR Parsers 32
2.3.1 LR Parsing Model 33
2.3.2 Types of LR Parsers 34
2.3.3 Parser Actions 34
2.3.4 LR Table 35
2.3.5 Parsing Program 38
2.3.6 Table Construction 38
2.4 GLR Parsing 43
2.4.1 Relationship to the LR Parsing Framework 43
2.4.2 Table Construction 43
2.4.3 Graph-structured Stack 44
2.4.4 Parse Forest 47
2.4.5 LR Parsing Program 47
2.4.6 Output 50
2.4.7 Modifications to the Algorithm 50
2.5 Statistical GLR (SGLR) Parsing 50
2.5.1 Probabilistic Approaches 51
2.5.2 Estimating Action Probabilities 51
2.6 RASP 53
2.6.1 Grammar 53
2.6.2 Training 58
2.6.3 Parser Application 58
2.6.4 Output Formats 61

3 Part-of-speech Tag Models 65
3.1 Previous Work 65
3.1.1 PoS Taggers and Parsers 65
3.1.2 Tag Models 67
3.1.3 HMM PoS Taggers 68
3.2 RASP’s Architecture 70
3.2.1 Processing Stages 70
3.2.2 PoS Tagger 70
3.3 Part-of-speech Tag Models 73
3.3.1 Part-of-speech Tag Files 73
3.3.2 Thresholding over Tag Probabilities 74
3.3.3 Top-ranked Parse Tags 75
3.3.4 Highest Count Tags 76
3.3.5 Weighted Count Tags 77
3.3.6 Gold Standard Tags 77
3.3.7 Summary 77
3.4 Part-of-speech Tagging Performance 77
3.4.1 Evaluation 77
3.4.2 Results 80
3.5 Parser Performance 81
3.5.1 Evaluation 81
3.5.2 Results 82
3.6 Discussion 84

4 Efficient Extraction of Weighted GRs 86
4.1 Inside-Outside Algorithm (IOA) 87
4.1.1 Background 87
4.1.2 The Standard Algorithm 88
4.1.3 Extension to LR Parsers 92
4.2 Extracting Grammatical Relations 94
4.2.1 Modification to Local Ambiguity Packing 94
4.2.2 Extracting Grammatical Relations 95
4.2.3 Problem: Multiple Lexical Heads 97
4.2.4 Problem: Multiple Parse Forests 100
4.3 The EWG Algorithm 101
4.3.1 Inside Probability Calculation and GR Instantiation 102
4.3.2 Outside Probability Calculation 105
4.3.3 Related Work 107
4.4 EWG Performance 107
4.4.1 Comparing Packing Schemes 108
4.4.2 Efficiency of EWG 108
4.4.3 Data Analysis 109
4.4.4 Accuracy of EWG 110
4.5 Application to Parse Selection 110
4.6 Discussion 111

5 Confidence-based Training 112
5.1 Motivation 113
5.2 Research Background 114
5.2.1 Unsupervised Training 114
5.2.2 Semisupervised Training 114
5.3 Extant Parser Training and Resources 118
5.3.1 Corpora 119
5.3.2 Extant Parser Training 120
5.3.3 Evaluation 121
5.3.4 Baseline 121
5.4 Confidence-based Training Approaches 121
5.4.1 Framework 121
5.4.2 Confidence Measures 124
5.4.3 Self-training 125
5.5 Experimentation 125
5.5.1 Semisupervised Training 125
5.5.2 Unsupervised Training 130
5.6 Discussion 132

6 Conclusion 135

References 139
List of Figures
1.1 Tree and GR parser output for the sentence The dog barked. 14
1.2 Example sentence from Susanne. 19
1.3 Example bracketed corpus training instance from Susanne. 19
1.4 Example annotated corpus training instance from Susanne. 20
1.5 Example annotated training instance from the GDT. 20
1.6 Example sentence from the WSJ. 21
1.7 Example bracketed corpus training instance from the WSJ. 21
1.8 Example sentence from PARC 700 Dependency Bank. 22
1.9 Example of sentences from DepBank. 22
2.1 NFA for the RE (a|b)∗ab. 30
2.2 DFA for the RE (a|b)∗ab. 32
2.3 Algorithm to simulate a DFA. 32
2.4 Components of an LR parser. 33
2.5 Grammar G₁. 36
2.6 DFA for G₁. 37
2.7 LR(0) items for the rule S → NP VP. 39
2.8 Grammar G₂. 44
2.9 Grammar NFA for G₂. 45
2.10 Example parses for G₂. 46
2.11 Example graph-structured stack for G₂. 49
2.12 Example metagrammar rule. 55
2.13 The GR subsumption hierarchy. 57
2.14 Simplified parse forest within the extant parser. 60
2.15 Example syntactic tree output. 62
2.16 Example n-best GR and weighted GR output. 64
3.1 RASP processing pipeline. 71
3.2 Example lexical entries in the tag dictionary. 71
3.3 Example mapping from PoS tag to terminal category. 72
3.4 PoS tag output for We all walked up the hill. 73
3.5 SINGLE-TAG and ALL-TAG PoS tags example. 74
3.6 MULT-SYS PoS tags example. 75
4.1 The inside (e) and outside (f) regions for node Nᵢ. 89
4.2 Calculation of inside probabilities for node Nᵢ. 90
4.3 Calculation of outside probabilities for node Nᵢ. 90