LEARNING PROBABILISTIC MODELS
OF WORD SENSE DISAMBIGUATION
Approved by:
Dr. Dan Moldovan
Dr. Rebecca Bruce
Dr. Weidong Chen
Dr. Frank Coyle
Dr. Margaret Dunham
Dr. Mandyam Srinath
LEARNING PROBABILISTIC MODELS
OF WORD SENSE DISAMBIGUATION
A Dissertation Presented to the Graduate Faculty of the
School of Engineering and Applied Science
Southern Methodist University
in
Partial Fulfillment of the Requirements
for the degree of
Doctor of Philosophy
with a
Major in Computer Science
by
Ted Pedersen
(B.A., Drake University)
(M.S., University of Arkansas)
May 16, 1998
ACKNOWLEDGMENTS
I am indebted to Dr. Rebecca Bruce for sharing freely of her time, knowledge,
and insights throughout this research. Certainly none of this would have been possible
without her.
Dr. Weidong Chen, Dr. Frank Coyle, Dr. Maggie Dunham, Dr. Dan Moldovan,
and Dr. Mandyam Srinath have all made important contributions to this dissertation.
They are also among the main reasons why my time at SMU has been both happy
and productive.
I am also grateful to Dr. Janyce Wiebe, Lei Duan, Mehmet Kayaalp, Ken McK-
eever, and Tom O’Hara for many valuable comments and suggestions that influenced
the direction of this research.
This work was supported by the Office of Naval Research under grant number
N00014-95-1-0776.
Pedersen, Ted B.A., Drake University
M.S., University of Arkansas
Learning Probabilistic Models
of Word Sense Disambiguation
Advisor: Professor Dan Moldovan
Doctor of Philosophy degree conferred May 16, 1998
Dissertation completed May 16, 1998
Selecting the most appropriate sense for an ambiguous word is a common
problem in natural language processing. This dissertation pursues corpus–based ap-
proaches that learn probabilistic models of word sense disambiguation from large
amounts of text. These models consist of a parametric form and parameter esti-
mates. The parametric form characterizes the interactions among the contextual
features and the sense of the ambiguous word. Parameter estimates describe the
probability of observing different combinations of feature values. These models dis-
ambiguate by determining the most probable sense of an ambiguous word given the
context in which it occurs.
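The disambiguation rule just described can be sketched as a tiny Naive Bayes style classifier that picks the most probable sense given the context. The senses, features, and probability values below are invented for illustration; they are not parameter estimates from this dissertation.

```python
# Minimal sketch: disambiguation as choosing the most probable sense given
# the context. The prior and likelihood tables here are hypothetical.
import math

# p(sense) and p(feature | sense) for an ambiguous word such as "interest"
prior = {"money": 0.6, "attention": 0.4}
likelihood = {
    "money":     {"bank": 0.30, "rate": 0.40, "story": 0.05},
    "attention": {"bank": 0.05, "rate": 0.05, "story": 0.40},
}

def disambiguate(context):
    """Return the sense maximizing log p(s) + sum of log p(f | s)."""
    best, best_score = None, float("-inf")
    for sense, p_s in prior.items():
        score = math.log(p_s)
        for feature in context:
            # tiny floor for unseen features, standing in for smoothing
            score += math.log(likelihood[sense].get(feature, 1e-6))
        if score > best_score:
            best, best_score = sense, score
    return best

print(disambiguate(["bank", "rate"]))   # -> money
```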
This dissertation presents several enhancements to existing supervised methods
of learning probabilistic models of disambiguation from sense–tagged text. A new
search strategy, forward sequential, guides the selection process through the space
of possible models. Each model considered for selection is judged by a new class of
evaluation metric, the information criteria. The combination of forward sequential
search and Akaike’s Information Criteria is shown to consistently select highly ac-
curate models of disambiguation. The same search strategy and evaluation criterion
also serve as the basis of the Naive Mix, a new supervised learning algorithm that
is shown to be competitive with leading machine learning methodologies. In these
comparisons the Naive Bayesian classifier also fares well, which seems surprising since
it is based on a model where the parametric form is simply assumed. However, an
explanation for this success is presented in terms of learning rates and bias–variance
decompositions of classification error.
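The combination of forward sequential search and an information criterion can be sketched as a greedy loop that adds whichever candidate term most improves AIC (2k − 2 ln L) and stops when no addition helps. This is only an outline of the strategy: the scoring function below is a stand-in for the log-likelihood of a candidate decomposable model, and counting one parameter per selected term is a simplification.

```python
# Illustrative sketch of forward sequential search guided by AIC.
# aic = 2 * (number of parameters) - 2 * (log-likelihood).
def aic(log_likelihood, n_params):
    return 2 * n_params - 2 * log_likelihood

def forward_sequential_search(candidates, log_likelihood):
    """Greedily grow a model, at each step adding the candidate term
    that most lowers AIC; stop when no addition lowers it further."""
    model = []
    best_aic = aic(log_likelihood(model), len(model))
    while True:
        best_term = None
        for term in candidates:
            if term in model:
                continue
            trial = model + [term]
            trial_aic = aic(log_likelihood(trial), len(trial))
            if trial_aic < best_aic:
                best_aic, best_term = trial_aic, term
        if best_term is None:
            return model, best_aic
        model.append(best_term)

# Toy scoring function: each term contributes a fixed log-likelihood gain.
gains = {"a": 5.0, "b": 3.0, "c": 0.5}
model, score = forward_sequential_search(
    ["a", "b", "c"], lambda m: sum(gains[t] for t in m))
print(model)   # -> ['a', 'b']  ('c' does not pay for its extra parameter)
```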
Unfortunately, sense–tagged text only exists in small quantities and is expensive
to create. This substantially limits the portability of supervised learning approaches
to word sense disambiguation. This bottleneck is addressed by developing unsuper-
vised methods that learn probabilistic models from raw untagged text. However,
such text does not contain enough information to automatically select a parametric
form. Instead, one must simply be assumed. Given a form, the senses of ambiguous
words are treated as missing data and their values are imputed via the Expecta-
tion Maximization algorithm and Gibbs Sampling. Here the parametric form of the
Naive Bayesian classifier is employed. However, this methodology is appropriate for
any parametric form in the class of decomposable models. Several local–context,
frequency–based feature sets are also developed and shown to be appropriate for
unsupervised learning of word senses from raw untagged text.
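The EM side of this methodology can be sketched as follows for the Naive Bayes parametric form: the sense of each instance is treated as missing data, and EM alternates between imputing sense posteriors (E-step) and re-estimating p(S) and p(F|S) from expected counts (M-step). The contexts, smoothing constant, and random initialization below are invented for illustration; this is not the dissertation's implementation, and the Gibbs Sampling variant is not shown.

```python
# Minimal EM sketch for Naive Bayes with missing sense labels.
import math
import random

def em_naive_bayes(instances, num_senses, iters=20, seed=0):
    """instances: list of feature lists. Returns per-instance sense posteriors."""
    rng = random.Random(seed)
    features = sorted({f for inst in instances for f in inst})
    n = len(instances)
    # Random soft assignments to start; real systems use several restarts.
    post = []
    for _ in range(n):
        row = [rng.random() + 0.1 for _ in range(num_senses)]
        z = sum(row)
        post.append([x / z for x in row])
    for _ in range(iters):
        # M-step: expected-count estimates of p(S) and p(F | S), lightly smoothed
        prior = [sum(post[i][j] for i in range(n)) / n for j in range(num_senses)]
        cond = []
        for j in range(num_senses):
            counts = {f: 0.01 for f in features}
            for i, inst in enumerate(instances):
                for f in inst:
                    counts[f] += post[i][j]
            total = sum(counts.values())
            cond.append({f: c / total for f, c in counts.items()})
        # E-step: impute the missing senses as posteriors under current parameters
        for i, inst in enumerate(instances):
            logs = [math.log(prior[j]) + sum(math.log(cond[j][f]) for f in inst)
                    for j in range(num_senses)]
            m = max(logs)
            w = [math.exp(v - m) for v in logs]
            z = sum(w)
            post[i] = [x / z for x in w]
    return post

contexts = [["bank", "rate"], ["bank", "rate"], ["story", "film"], ["story", "film"]]
posteriors = em_naive_bayes(contexts, num_senses=2)
```

Since the E-step depends only on the current parameters and an instance's features, instances with identical contexts always receive identical posteriors.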
TABLE OF CONTENTS
ACKNOWLEDGMENTS .................................................... iii
LIST OF FIGURES.......................................................... x
LIST OF TABLES ........................................................... xiii
CHAPTER
1. INTRODUCTION ..................................................... 1
1.1. Word Sense Disambiguation ....................................... 2
1.2. Learning from Text ............................................... 3
1.2.1. Supervised Learning........................................ 5
1.2.2. Unsupervised Learning ..................................... 6
1.3. Basic Assumptions ................................................ 7
1.4. Chapter Summaries ............................................... 7
2. PROBABILISTIC MODELS ........................................... 10
2.1. Inferential Statistics ............................................... 10
2.1.1. Maximum Likelihood Estimation ........................... 11
2.1.2. Bayesian Estimation ....................................... 14
2.2. Decomposable Models ............................................. 15
2.2.1. Examples .................................................. 17
2.2.2. Decomposable Models as Classifiers......................... 22
3. SUPERVISED LEARNING FROM SENSE–TAGGED TEXT ........... 24
3.1. Sequential Model Selection ........................................ 25
3.1.1. Search Strategy ............................................ 26
3.1.2. Evaluation Criteria......................................... 29
3.1.2.1. Significance Testing ............................... 30
3.1.2.2. Information Criteria .............................. 33
3.1.3. Examples .................................................. 35
3.1.3.1. FSS AIC ......................................... 35
3.1.3.2. BSS AIC ......................................... 37
3.2. Naive Mix ........................................................ 39
3.3. Naive Bayes....................................................... 43
4. UNSUPERVISED LEARNING FROM RAW TEXT .................... 45
4.1. Probabilistic Models............................................... 46
4.1.1. EM Algorithm ............................................. 47
4.1.1.1. General Description............................... 47
4.1.1.2. Naive Bayes description ........................... 49
4.1.1.3. Naive Bayes example.............................. 51
4.1.2. Gibbs Sampling ............................................ 57
4.1.2.1. General Description............................... 58
4.1.2.2. Naive Bayes description ........................... 60
4.1.2.3. Naive Bayes example.............................. 63
4.2. Agglomerative Clustering.......................................... 70
4.2.1. Ward’s minimum–variance method ......................... 71
4.2.2. McQuitty’s similarity analysis .............................. 72
5. EXPERIMENTAL DATA .............................................. 74
5.1. Words ............................................................ 74
5.2. Feature Sets ...................................................... 75
5.2.1. Supervised Learning Feature Set............................ 75
5.2.2. Unsupervised Learning Feature Sets ........................ 80
5.2.3. Feature Sets and Event Distributions ....................... 83
6. SUPERVISED LEARNING EXPERIMENTAL RESULTS .............. 92
6.1. Experiment 1: Sequential Model Selection ......................... 92
6.1.1. Overall Accuracy........................................... 93
6.1.2. Model Complexity ......................................... 96
6.1.3. Model Selection as a Robust Process........................ 96
6.1.4. Model selection for Noun interest .......................... 99
6.2. Experiment 2: Naive Mix.......................................... 104
6.3. Experiment 3: Learning Rate...................................... 109
6.4. Experiment 4: Bias Variance Decomposition ....................... 113
7. UNSUPERVISED LEARNING EXPERIMENTAL RESULTS ........... 119
7.1. Assessing Accuracy in Unsupervised Learning...................... 120
7.2. Analysis 1: Probabilistic Models................................... 124
7.2.1. Methodological Comparison ................................ 127
7.2.2. Feature Set Comparison.................................... 130
7.3. Analysis 2: Agglomerative Clustering .............................. 135
7.3.1. Methodological Comparison ................................ 138
7.3.2. Feature Set Comparison.................................... 143
7.4. Analysis 3: Gibbs Sampling and McQuitty’s Similarity Analysis ... 145
8. RELATED WORK..................................................... 151
8.1. Semantic Networks ................................................ 152
8.2. Machine Readable Dictionaries .................................... 154
8.3. Parallel Translations .............................................. 155
8.4. Sense–Tagged Corpora ............................................ 157
8.5. Raw Untagged Corpora ........................................... 160
9. CONCLUSIONS ....................................................... 163
9.1. Supervised Learning............................................... 163
9.1.1. Contributions .............................................. 163
9.1.2. Future Work ............................................... 165
9.2. Unsupervised Learning ............................................ 168
9.2.1. Contributions .............................................. 169
9.2.2. Future Work ............................................... 170
REFERENCES .............................................................. 174
LIST OF FIGURES
Figure Page
2.1. Saturated Model (CVRTS) ........................................... 18
2.2. Decomposable Model (CSV)(RST) ................................... 19
2.3. Model of Independence (C)(V)(R)(T)(S) .............................. 21
2.4. Naive Bayes Model (CS)(RS)(TS)(VS) ............................... 22
4.1. E–Step Iteration 1 .................................................... 52
4.2. M–Step Iteration 1: pˆ(S), pˆ(F₁|S), pˆ(F₂|S) ............................ 53
4.3. E–Step Iteration 2 .................................................... 54
4.4. E–Step Iteration 2 .................................................... 55
4.5. M–Step Iteration 2: pˆ(S), pˆ(F₁|S), pˆ(F₂|S) ............................ 55
4.6. E–Step Iteration 3 .................................................... 56
4.7. E–Step Iteration 3 .................................................... 57
4.8. Stochastic E–Step Iteration 1.......................................... 64
4.9. Stochastic M–step Iteration 1: pˆ(S), pˆ(F₁|S), pˆ(F₂|S) .................. 65
4.10. E–Step Iteration 2 .................................................... 66
4.11. Stochastic E–Step Iteration 2.......................................... 67
4.12. Stochastic M–step Iteration 2: pˆ(S), pˆ(F₁|S), pˆ(F₂|S) .................. 68
4.13. Stochastic E–Step Iteration 3.......................................... 69
4.14. Stochastic E–Step Iteration 3.......................................... 69
4.15. Matrix of Feature Values, Dissimilarity Matrix......................... 71