Lecture Notes in
Statistics
Edited by D. Brillinger, S. Fienberg, J. Gani,
J. Hartigan, and K. Krickeberg
34
Douglas E. Critchlow
Metric Methods for Analyzing
Partially Ranked Data
Springer-Verlag Berlin Heidelberg GmbH
Author
Douglas E. Critchlow
Department of Statistics, Purdue University
West Lafayette, Indiana 47907, USA
Mathematics Subject Classification (1980): 62A05, 62F07
ISBN 978-0-387-96288-7 ISBN 978-1-4612-1106-8 (eBook)
DOI 10.1007/978-1-4612-1106-8
Library of Congress Cataloging-in-Publication Data. Critchlow, Douglas Edward. Metric methods
for analyzing partially ranked data. (Lecture notes in statistics; 34) Bibliography: p. Includes index.
1. Ranking and selection (Statistics) 2. Metric spaces. I. Title. II. Series: Lecture notes in statistics
(Springer-Verlag); v. 34.
QA278.75.C75 1985 519.5 85-25044
This work is subject to copyright. All rights are reserved, whether the whole or part of the material
is concerned, specifically those of translation, reprinting, re-use of illustrations, broadcasting,
reproduction by photocopying machine or similar means, and storage in data banks. Under
§ 54 of the German Copyright Law where copies are made for other than private use, a fee is
payable to "Verwertungsgesellschaft Wort", Munich.
© by Springer-Verlag Berlin Heidelberg 1985
Originally published by Springer-Verlag Berlin Heidelberg New York in 1985
2147/3140-543210
To my parents
PREFACE
A full ranking of n items is simply an ordering of all these items,
of the form: first choice, second choice, ..., n-th choice. If two
judges each rank the same n items, statisticians have used various
metrics to measure the closeness of the two rankings, including
Kendall's tau, Spearman's rho, Spearman's footrule, Ulam's metric,
Hamming distance, and Cayley distance. These metrics have been
employed in many contexts, in many applied statistical and scientific
problems.
This monograph presents general methods for extending these metrics
to partially ranked data. Here "partially ranked data" refers, for
instance, to the situation in which there are n distinct items, but
each judge specifies only his first through k-th choices, where k < n.
More complex types of partially ranked data are also investigated.
Group theory is an important tool for extending the metrics.
Full rankings are identified with elements of the permutation group,
whereas partial rankings are identified with points in a coset
space of the permutation group. The problem thus becomes one of
extending metrics on the permutation group to metrics on a coset space
of the permutation group. To carry out the extensions, two novel
methods -- the so-called Hausdorff and fixed vector methods -- are
introduced and implemented, which exploit this group-theoretic
structure.
Various data-analytic applications of metrics on fully ranked
data have been presented in the statistical literature. These can
be extended now to applications of metrics on partially ranked data,
and are illustrated by analyses of real data sets. The applications
include fitting probability models to partially ranked data,
multidimensional scaling for partially ranked data, and testing for
significant differences between two populations of rankers.
* * *
It is a pleasure to acknowledge the special contribution to this
monograph of a wonderful scholar and friend, Persi Diaconis. Persi has
been an invaluable source of ideas and encouragement. I thank him for
many hours of stimulating conversation, on the uses of mathematics in
applied statistical problems.
David Pickard read the entire manuscript with great care and
insight, and provided many pages of useful suggestions. I have enjoyed
several enlightening discussions with him, and am sincerely grateful
for all of his help.
I am indebted to Peter Huber for suggesting several intriguing
areas of investigation. I thank both him and Donald Anderson for their
thoughtful reading of the manuscript, and for helpful comments for
improving the presentation.
Cheryl Waller did a superb job of typing the manuscript, and
handled my innumerable revisions with amazing efficiency and
cheerfulness.
Financial support from National Science Foundation Grant
MCS80-24649 is acknowledged gratefully.
Finally, I want to thank the statistics departments at Harvard
University, Stanford University, and Purdue University, for providing
very substantial encouragement and assistance, during the various
stages of evolution of this monograph.
TABLE OF CONTENTS
I. INTRODUCTION AND OUTLINE
II. METRICS ON FULLY RANKED DATA
    A. Permutations: Some Important Conventions
    B. Metrics on Permutations: Discussion and Examples
    C. The Requirement of Right-Invariance
III. METRICS ON PARTIALLY RANKED DATA: THE CASE WHERE EACH JUDGE LISTS HIS k FAVORITE ITEMS OUT OF n
    A. The Coset Space $S_n/S_{n-k}$
    B. The Hausdorff Metrics on $S_n/S_{n-k}$
    C. The Fixed Vector Metrics on $S_n/S_{n-k}$
IV. METRICS ON OTHER TYPES OF PARTIALLY RANKED DATA
    A. The Coset Space $S_n/S$, Where $S = S_{n_1} \times S_{n_2} \times \cdots \times S_{n_r}$
    B. The Hausdorff Metrics on $S_n/S$
    C. The Fixed Vector Metrics on $S_n/S$
    D. Hausdorff Distances between Different Types of Partially Ranked Data: A Complete Proof of the Main Theorem
    E. The Tied Ranks Approach to Metrizing Partially Ranked Data
        1. A Description of the Tied Ranks Approach
        2. Relations among the Tied Ranks, Hausdorff, and Fixed Vector Metrics
        3. Limitations of the Tied Ranks Approach
V. DISTRIBUTIONAL PROPERTIES OF THE METRICS
    A. Exact Distributions
    B. Asymptotic Distributions
VI. DATA ANALYSIS, USING THE METRICS
    A. Fitting Probability Models to Partially Ranked Data
        1. Mallows' Model for Fully Ranked Data
        2. The Extension of Mallows' Model to Partially Ranked Data
        3. A Likelihood Ratio Interpretation of the Triangle Inequality
        4. Maximum Likelihood Estimation for the Model
        5. A Goodness-of-Fit Result
        6. An Example: The Educational Testing Service Word Association Data
    B. Multidimensional Scaling for Partially Ranked Data
        1. An Example, Using Leann Lipps Birch's Cracker Preference Data
    C. Two Sample Problems for Partially Ranked Data
        1. A Two-Sample Test Based on the Minimal Spanning Tree
        2. A Two-Sample Test Based on the Nearest Neighbors Graph
APPENDIX A - THE EXISTENCE OF FIXED VECTORS
APPENDIX B - FORTRAN SUBROUTINES FOR EVALUATING THE METRICS ON $S_n/S_{n-k}$ AND $S_n/S$
APPENDIX C - FORTRAN SUBROUTINES FOR FITTING MALLOWS' MODEL TO PARTIALLY RANKED DATA
APPENDIX D - TABLES OF THE DISTRIBUTIONS OF THE METRICS ON $S_n/S_{n-k}$
APPENDIX E - COMPARISON OF EXACT AND ASYMPTOTIC DISTRIBUTIONS
BIBLIOGRAPHY
INDEX OF NOTATION
CHAPTER I - INTRODUCTION AND OUTLINE
There are many instances of partially ranked data, where several
items are ranked according to some criterion, but the ordering is not
complete. In its simplest form, such data arises when there are n
distinct items, and each judge lists in order his k favorite items,
where k < n. An example with n = 5 and k = 3 is afforded by the
Detroit Area study, which asks people to specify the first, second, and
third most important out of five named parts of marriage [B4]. More
complex types of partially ranked data are also possible. The General
Social Survey [D1] lists thirteen qualities that a child could possess,
and from this list, respondents are asked to choose the most desirable
quality, the two next most desirable qualities, the least desirable
quality, and the two next least desirable qualities.
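To give a sense of scale for these two survey designs, the number of distinct responses a judge could give can be counted as a multinomial coefficient. The short sketch below is only an illustration: the function name `count_partial_rankings` and the encoding of each design as a list of group sizes (the unranked leftover items forming one group) are assumptions made here, not notation from the monograph.

```python
from math import factorial


def count_partial_rankings(group_sizes):
    """Count distinct responses when n items are split into ordered groups.

    group_sizes lists how many items fall in each group; items inside a
    group are not ordered relative to each other (hypothetical encoding).
    """
    n = sum(group_sizes)
    count = factorial(n)
    for size in group_sizes:
        # Each intermediate quotient is an integer (a multinomial coefficient).
        count //= factorial(size)
    return count


# Detroit Area study: first, second, third choice, then 2 unranked items.
detroit = count_partial_rankings([1, 1, 1, 2])

# General Social Survey: most desirable (1), next two (2), the middle
# seven left unranked (7), next two least desirable (2), least desirable (1).
gss = count_partial_rankings([1, 2, 7, 2, 1])

print(detroit, gss)
```

Under this encoding the Detroit design allows 60 distinct responses and the General Social Survey design allows 308880.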
A number of interesting statistical questions arise from looking
at such data. How does one measure the degree of association between
two judges' partial rankings, and is the association "statistically
significant"? What is a reasonable probability model for all of the
respondents' rankings, and how can one test whether it fits the data?
Does the data point to a significant difference between two distinct
subpopulations of rankers?
This monograph approaches such questions by a novel method, which
uses metrics on the permutation group and on coset spaces of the
permutation group. It begins by studying, in Chapter II, some
procedures which have already been developed for the case of fully
ranked data. In particular, suppose that two individuals each
provide a full ranking of the same set of items (i.e., a first choice,
a second choice, ... , a last choice). Statisticians have several ways
to measure the closeness of two such rankings, including Kendall's
tau, Spearman's rho, Spearman's footrule, Hamming distance, Ulam's
distance, and Cayley's distance. In Chapter II, these "measures of
association" for fully ranked data are identified, mathematically,
with metrics on the permutation group $S_n$, and their invariance
properties are investigated. This is a review of some material in
Diaconis' important monograph [D2].
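For readers who want something concrete before Chapter II, the six measures can be sketched in a few lines of Python. This is a minimal illustration using the definitions as they are commonly stated, not the monograph's own code or conventions. It assumes each ranking is a rank vector (entry i is the rank given to item i+1), takes Spearman's rho to be the Euclidean distance between rank vectors, and uses function names chosen here for illustration.

```python
from bisect import bisect_left
from itertools import combinations
from math import sqrt


def kendall(p, q):
    """Number of item pairs that the two rankings order differently."""
    return sum(1 for i, j in combinations(range(len(p)), 2)
               if (p[i] - p[j]) * (q[i] - q[j]) < 0)


def footrule(p, q):
    """Spearman's footrule: sum of absolute rank differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def spearman_rho(p, q):
    """Euclidean distance between rank vectors (assumed normalization)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def hamming(p, q):
    """Number of items given different ranks."""
    return sum(a != b for a, b in zip(p, q))


def cayley(p, q):
    """n minus the number of cycles of the permutation sending p[i] to q[i]."""
    mapping = dict(zip(p, q))
    seen, cycles = set(), 0
    for start in mapping:
        if start not in seen:
            cycles += 1
            x = start
            while x not in seen:
                seen.add(x)
                x = mapping[x]
    return len(p) - cycles


def ulam(p, q):
    """n minus the longest increasing subsequence of q read in p's order."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    tails = []  # tails[k] = smallest tail of an increasing run of length k+1
    for x in (q[i] for i in order):
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(p) - len(tails)


p = (1, 2, 3, 4, 5)
q = (2, 1, 3, 5, 4)
for d in (kendall, footrule, spearman_rho, hamming, cayley, ulam):
    print(d.__name__, d(p, q))
```

The quadratic-time pair count in `kendall` is chosen for readability; it is adequate for the small n typical of ranking data.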
We would like to extend these "measures of association" to
partially ranked data. There are various tricks which can be used
to extend some of the metrics, and these are discussed in Sections
III.C, IV.C, and IV.E; but the focus of this monograph is on a general
procedure which enables us to extend all of the metrics.
In brief, the method works as follows. The set of all
partial rankings of k out of n items can be identified with
a quotient space $S_n/S_{n-k}$ of the permutation group, consisting of all
right cosets of the subgroup $S_{n-k} = \{\pi \in S_n : \pi(i) = i \ \forall\, i = 1, \ldots, k\}$.
The problem then becomes one of extending metrics on $S_n$ to metrics
on the coset space $S_n/S_{n-k}$ of partial rankings. This is accomplished
by calculating the so-called "induced Hausdorff metrics" on $S_n/S_{n-k}$.
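For very small n the idea can be checked by brute force. The sketch below treats a top-k list as the set of all full rankings that agree with it, then takes the usual Hausdorff distance between two such sets: the larger of the two directed "farthest nearest-neighbour" distances. It assumes rank vectors in which entry i is the rank given to item i+1, and it assumes that under this convention a coset is exactly the set of full rankings consistent with the top-k list. The helper names `completions` and `hausdorff` are illustrative and are not taken from the monograph.

```python
from itertools import permutations


def footrule(p, q):
    """Spearman's footrule between two rank vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))


def completions(top_k, n):
    """Yield every full rank vector consistent with a top-k list.

    top_k holds item labels (1..n) in order of preference; the remaining
    items are placed in ranks k+1..n in every possible order.
    """
    rest = [item for item in range(1, n + 1) if item not in top_k]
    for tail in permutations(rest):
        ranks = [0] * n
        for rank, item in enumerate(list(top_k) + list(tail), start=1):
            ranks[item - 1] = rank
        yield tuple(ranks)


def hausdorff(A, B, d):
    """Hausdorff distance between finite sets A and B under the metric d."""
    A, B = list(A), list(B)
    a_to_b = max(min(d(a, b) for b in B) for a in A)
    b_to_a = max(min(d(a, b) for a in A) for b in B)
    return max(a_to_b, b_to_a)


n = 4
near = hausdorff(completions((1, 2), n), completions((2, 1), n), footrule)
far = hausdorff(completions((1, 2), n), completions((3, 4), n), footrule)
print(near, far)
```

With n = 4 and k = 2, swapping the two listed items gives a footrule-based distance of 2, while two judges with disjoint top-two lists are at distance 8. Enumeration grows like (n-k)! per coset, so this serves only as a sanity check on small cases.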
Chapter III presents the results of these calculations: each
of the six aforementioned "measures of association" for fully ranked
data has a natural extension to partially ranked data. Later chapters
explore the appropriate generalizations to more complex types of
partially ranked data, and the distributional properties of the metrics.