Table Of Content

UUnniivveerrssiittyy ooff PPeennnnssyyllvvaanniiaa SScchhoollaarrllyyCCoommmmoonnss Publicly Accessible Penn Dissertations 2013 MMooddeellss ffoorr IImmpprroovveedd TTrraaccttaabbiilliittyy aanndd AAccccuurraaccyy iinn DDeeppeennddeennccyy PPaarrssiinngg Emily Pitler University of Pennsylvania, [email protected] Follow this and additional works at: https://repository.upenn.edu/edissertations Part of the Computer Sciences Commons RReeccoommmmeennddeedd CCiittaattiioonn Pitler, Emily, "Models for Improved Tractability and Accuracy in Dependency Parsing" (2013). Publicly Accessible Penn Dissertations. 912. https://repository.upenn.edu/edissertations/912 This paper is posted at ScholarlyCommons. https://repository.upenn.edu/edissertations/912 For more information, please contact [email protected]. MMooddeellss ffoorr IImmpprroovveedd TTrraaccttaabbiilliittyy aanndd AAccccuurraaccyy iinn DDeeppeennddeennccyy PPaarrssiinngg AAbbssttrraacctt Automatic syntactic analysis of natural language is one of the fundamental problems in natural language processing. Dependency parses (directed trees in which edges represent the syntactic relationships between the words in a sentence) have been found to be particularly useful for machine translation, question answering, and other practical applications. For English dependency parsing, we show that models and features compatible with how conjunctions are represented in treebanks yield a parser with state-of-the-art overall accuracy and substantial improvements in the accuracy of conjunctions. For languages other than English, dependency parsing has often been formulated as either searching over trees without any crossing dependencies (projective trees) or searching over all directed spanning trees. The former sacrifices the ability to produce many natural language structures; the latter is NP-hard in the presence of features with scopes over siblings or grandparents in the tree. This thesis explores alternative ways to simultaneously produce crossing dependencies in the output and use models that parametrize over multiple edges. Gap inheritance is introduced in this thesis and 6 5 quantifies the nesting of subtrees over intervals. The thesis provides O(n ) and O(n ) edge-factored parsing algorithms for two new classes of trees based on this property, and extends the latter to include grandparent factors. This thesis then defines 1-Endpoint-Crossing trees, in which for any edge that is crossed, all other edges that cross that edge share an endpoint. This property covers 95.8% or more of dependency parses across a variety of languages. A crossing-sensitive factorization introduced in this thesis generalizes a commonly used third-order factorization (capable of scoring triples of edges simultaneously). This thesis provides exact dynamic programming algorithms that find the optimal 1-Endpoint-Crossing 4 tree under either an edge-factored model or this crossing-sensitive third-order model in O(n ) time, orders of magnitude faster than other mildly non-projective parsing algorithms and identical to the parsing time for projective trees under the third-order model. The implemented parser is significantly more accurate than the third-order projective parser under many experimental settings and significantly less accurate on none. DDeeggrreeee TTyyppee Dissertation DDeeggrreeee NNaammee Doctor of Philosophy (PhD) GGrraadduuaattee GGrroouupp Computer and Information Science FFiirrsstt AAddvviissoorr Mitchell P. Marcus SSeeccoonndd AAddvviissoorr Sampath Kannan KKeeyywwoorrddss algorithms, dependency, natural language processing, parsing, syntax, trees SSuubbjjeecctt CCaatteeggoorriieess Computer Sciences This dissertation is available at ScholarlyCommons: https://repository.upenn.edu/edissertations/912 MODELSFORIMPROVEDTRACTABILITYANDACCURACYINDEPENDENCY PARSING EmilyPitler ADISSERTATION in ComputerandInformationScience PresentedtotheFacultiesoftheUniversityofPennsylvania in PartialFulfillmentoftheRequirementsforthe DegreeofDoctorofPhilosophy 2013 SupervisorofDissertation Co-SupervisorofDissertation MitchellP.Marcus SampathKannan Professor,ComputerandInformationScience Professor,ComputerandInformationScience GraduateGroupChairperson ValTannen,Professor,ComputerandInformationScience DissertationCommittee: MichaelCollins,Professor,ComputerScience,ColumbiaUniversity ChrisCallison-Burch,AssistantProfessor,ComputerandInformationScience MarkLiberman,Professor,Linguistics BenTaskar,AssociateProfessor,ComputerandInformationScience MODELS FOR IMPROVED TRACTABILITY AND ACCURACY IN DEPENDENCY PARSING COPYRIGHT 2013 EmilyPitler Acknowledgements Mitch Marcus and Sampath Kannan were wonderful advisors and I am truly grateful to both of them for taking me on as a student. Mitch’s patience and unwavering support allowed me the freedom to find this topic for this thesis. His wide-ranging knowledge of both linguistics and computer science was incredibly valuable. Sampath inspired me to strive for clean solutions to problems. His clarity helped me to identify the crucial pieces ofanyresultandgreatlyimprovedthematerialpresentedhere. Iamgratefultomythesiscommittee: ChrisCallison-Burch,MikeCollins,MarkLiber- man, and Ben Taskar. I have benefited from Chris’s advice throughout graduate school, at both Penn and JHU. I have tried to emulate his clear presentation style in both my talks and my writing. Mike’s work in parsing served as inspiration for me for much of the work in this thesis. His detailed questions have both improved this document and pointed the way towards future directions. Mark’s questions at CLunch over the years always brought up subtle but important details. Discussions of treebank representations in this document are largely due to Mark’s influence. Ben helped me identify interesting directions at a for- mative stage in this work. Throughout the process, Ben encouraged me to simultaneously investigateboththeoreticalandempiricalissues. Alargenumberofpeoplecontributedtomyundertakingthisdissertationandfollowing through. Thanks especially to Jerry Berry and Stephen Rose at Thomas Jefferson High School for Science and Technology for introducing me to programming and computer science; Charles Yang, at Yale and Penn, for introducing me to computational linguistics; DanaAngluinandBrianScassellatiatYaleforencouragingmetopursuegraduateschool; iii and Ken Church, Dekang Lin, and Ani Nenkova for guiding me as I started research. I benefited tremendously from two summers with Ken: he patiently taught me how to write papers and how to approach problems from multiple perspectives. Dekang has been very supportive of me and my research ever since working together at a JHU CLSP Summer Workshop in 2009. Dekang introduced me to research in parsing and applications of parsing. Ani Nenkova and Annie Louis were wonderful collaborators. Annie Louis was my constant partner throughout graduate school; it was a pleasure to go through courses, research, internships, and other milestones together. Thanks to Katerina Fragkiadaki, Jenny Gillenwater, Arun Raghavan, and David Weiss for providing feedback on drafts and prac- ticetalks;ShaneBergsma,JasonEisner,AravindJoshi,JulieLegate,RyanMcDonald,Slav Petrov,GiorgioSatta,andDavidYarowskyforproductiveconversationsthatimpactedthis thesis; Terry Koo for his assistance with the dpo3 parser; Daniel Zeman for his assistance withtheHamleDTtreebankconversionsoftware;MikeFelkerforremovingadministrative hurdles; Cheryl Hickey for all her support; the students, staff, and volunteers at White- Williams Scholars/Philadelphia Futures for a rewarding time and memories; Drew Hilton, Santosh Nagarakatte, and Arun Raghavan for long games of bridge; Katie Gibson for bak- ingdeliciouscupcakesandfudge;andotherstudentsIhadthepleasureofoverlappingwith at Penn, including Adam Aviv, Alexis Baird, John Blitzer, Chris Casinghino, Erwin Chan, SanjianChen,NikhilDinesh,MarkDredze,RyanGabbard,KuzmanGanchev,JoaõGraça, MichaelGreenberg,KaiHong,LiangHuang,MarieJacob,SarahJohnstone,AndrewKing, Alex Kulesza, Junyi Li, Constantine Lignos, Xi Lin, Ellie Pavlick, Alex Roderer, Sudeepa Roy,BenSapp,ParthaPratimTalukdar,AndrewWest,andQiuyeZhao. Iamdeeplygratefultomycollegefriends: Alison,Betsy,Caroline,Charlie,Guy,Illana, Jackie, Joel, Julia, Lekshmi, Michael, and Samarth for their visits and encouragement. Samarth and I were classmates from middle school through undergrad and we wrote our first paper together. I would like to thank my brother Will, my grandparents, and Sudipto fortheirsteadfastbeliefinme. Mostimportantly,Iamthankfultomyparentsfornurturing mycuriosityandfortheirloveandsupport. iv ABSTRACT MODELSFORIMPROVEDTRACTABILITYANDACCURACYINDEPENDENCY PARSING EmilyPitler MitchellP.Marcus SampathKannan Automaticsyntacticanalysisofnaturallanguageisoneofthefundamentalproblemsinnat- ural language processing. Dependency parses (directed trees in which edges represent the syntacticrelationshipsbetweenthewordsinasentence)havebeenfoundtobeparticularly usefulformachinetranslation,questionanswering,andotherpracticalapplications. For English dependency parsing, we show that models and features compatible with how conjunctions are represented in treebanks yield a parser with state-of-the-art overall accuracyandsubstantialimprovementsintheaccuracyofconjunctions. ForlanguagesotherthanEnglish,dependencyparsinghasoftenbeenformulatedasei- thersearchingovertreeswithoutanycrossingdependencies(projectivetrees)orsearching over all directed spanning trees. The former sacrifices the ability to produce many natural language structures; the latter is NP-hard in the presence of features with scopes over siblingsorgrandparentsinthetree. Thisthesisexploresalternativewaystosimultaneouslyproducecrossingdependencies intheoutputandusemodelsthatparametrizeovermultipleedges. Gap inheritance is introduced in this thesis and quantifies the nesting of subtrees over intervals. The thesis provides O(n6) and O(n5) edge-factored parsing algorithms for two new classes of trees based on this property, and extends the latter to include grandparent factors. Thisthesisthendefines1-Endpoint-Crossingtrees,inwhichforanyedgethatiscrossed, all other edges that cross that edge share an endpoint. This property covers 95.8% or more of dependency parses across a variety of languages. A crossing-sensitive factorization introduced in this thesis generalizes a commonly used third-order factorization (capable of v scoringtriplesofedgessimultaneously). This thesis provides exact dynamic programming algorithms that find the optimal 1- Endpoint-Crossingtreeundereitheranedge-factoredmodelorthiscrossing-sensitivethird- order model in O(n4) time, orders of magnitude faster than other mildly non-projective parsing algorithms and identical to the parsing time for projective trees under the third- order model. The implemented parser is significantly more accurate than the third-order projectiveparserundermanyexperimentalsettingsandsignificantlylessaccurateonnone. vi Contents Acknowledgements iii 1 Introduction 1 1.1 Background: Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2 Scoring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2.1 Factorizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2.2 Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.2.3 ParsingwithUnlabeledDataandRelevantFactorizations . . . . . . 7 1.3 Searching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.3.1 ProjectiveTrees . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.3.2 Arborescences . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.3.3 ExistingDefinitionsofMildlyNon-projectiveTrees . . . . . . . . 10 1.3.4 ClassesofTreesProposedinthisThesis . . . . . . . . . . . . . . . 10 1.4 FactorizationsthatFacilitateSearch . . . . . . . . . . . . . . . . . . . . . 12 1.5 ThesisContributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2 Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factoriza- tions 16 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.2 ConversiontoDependencyRepresentations . . . . . . . . . . . . . . . . . 18 2.3 ImplicationsofRepresentationsontheScopeofFactorization . . . . . . . 22 2.3.1 Edge-basedScoring . . . . . . . . . . . . . . . . . . . . . . . . . 22 vii

Description:

Mark's questions at CLunch over the years always brought up subtle but important details. Discussions of treebank representations in this document.

Models for Improved Tractability and Accuracy in Dependency Parsing PDF

169 Pages·2014·1.13 MB·English

by Emily Pitler

Checking for file health...

Save to my drive

Quick download

Download

Download Models for Improved Tractability and Accuracy in Dependency Parsing PDF Free - Full Version

by Emily Pitler| 2014| 169 pages| 1.13| English

Download Models for Improved Tractability and Accuracy in Dependency Parsing by Emily Pitler in PDF format completely FREE. No registration required, no payment needed. Get instant access to this valuable resource on PDFdrive.to!

Free Download PDF

About Models for Improved Tractability and Accuracy in Dependency Parsing

Mark's questions at CLunch over the years always brought up subtle but important details. Discussions of treebank representations in this document.

Detailed Information

Author:	Emily Pitler
Publication Year:	2014
Pages:	169
Language:	English
File Size:	1.13
Format:	PDF
Price:	FREE

Download Free PDF

Safe & Secure Download - No registration required

Why Choose PDFdrive for Your Free Models for Improved Tractability and Accuracy in Dependency Parsing Download?

100% Free: No hidden fees or subscriptions required for one book every day.
No Registration: Immediate access is available without creating accounts for one book every day.
Safe and Secure: Clean downloads without malware or viruses
Multiple Formats: PDF, MOBI, Mpub,... optimized for all devices
Educational Resource: Supporting knowledge sharing and learning

Frequently Asked Questions

Is it really free to download Models for Improved Tractability and Accuracy in Dependency Parsing PDF?

Yes, on https://PDFdrive.to you can download Models for Improved Tractability and Accuracy in Dependency Parsing by Emily Pitler completely free. We don't require any payment, subscription, or registration to access this PDF file. For 3 books every day.

How can I read Models for Improved Tractability and Accuracy in Dependency Parsing on my mobile device?

After downloading Models for Improved Tractability and Accuracy in Dependency Parsing PDF, you can open it with any PDF reader app on your phone or tablet. We recommend using Adobe Acrobat Reader, Apple Books, or Google Play Books for the best reading experience.

Is this the full version of Models for Improved Tractability and Accuracy in Dependency Parsing?

Yes, this is the complete PDF version of Models for Improved Tractability and Accuracy in Dependency Parsing by Emily Pitler. You will be able to read the entire content as in the printed version without missing any pages.

Is it legal to download Models for Improved Tractability and Accuracy in Dependency Parsing PDF for free?

https://PDFdrive.to provides links to free educational resources available online. We do not store any files on our servers. Please be aware of copyright laws in your country before downloading.

The materials shared are intended for research, educational, and personal use in accordance with fair use principles.