Analyzing Rater Agreement
Manifest Variable Methods
Analyzing Rater Agreement
Manifest Variable Methods
Alexander von Eye
Michigan State University
Eun Young Mun
University of Alabama at Birmingham
LAWRENCE ERLBAUM ASSOCIATES, PUBLISHERS
Mahwah, New Jersey London
Camera ready copy for this book was provided by the authors.
Copyright © 2005 by Lawrence Erlbaum Associates, Inc.
All rights reserved. No part of this book may be reproduced in any form,
by photostat, microform, retrieval system, or any other means, without
prior written permission of the publisher.
Lawrence Erlbaum Associates, Inc., Publishers
10 Industrial Avenue
Mahwah, New Jersey 07430
Cover design by Kathryn Houghtaling Lacey
Library of Congress Cataloging-in-Publication Data
Eye, Alexander von.
Analyzing rater agreement: manifest variable methods / Alexander
von Eye, Eun Young Mun.
p. cm.
Includes bibliographical references and index.
ISBN 0-8058-4967-X (alk. paper)
1. Multivariate analysis. 2. Acquiescence (Psychology)—Statistical
methods. I. Mun, Eun Young. II. Title.
QA278.E94 2004
519.5'35—dc22 2004043344
CIP
Books published by Lawrence Erlbaum Associates are printed on acid-
free paper, and their bindings are chosen for strength and durability.
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
Disclaimer:
This eBook does not include the ancillary media that was
packaged with the original printed version of the book.
Contents
Preface ix
1. Coefficients of Rater Agreement 1
1.1 Cohen's K (kappa) 1
1.1.1 K as a Summary Statement for the Entire Agreement Table 2
1.1.2 Conditional K 8
1.2 Weighted K 10
1.3 Raw Agreement, Brennan and Prediger's Kn, and a
Comparison with Cohen's K 13
1.4 The Power of K 17
1.5 Kendall's W for Ordinal Data 19
1.6 Measuring Agreement among Three or More Raters 22
1.7 Many Raters or Many Comparison Objects 25
1.8 Exercises 27
2. Log-Linear Models of Rater Agreement 31
2.1 A Log-Linear Base Model 32
2.2 A Family of Log-Linear Models for Rater Agreement 34
2.3 Specific Log-Linear Models for Rater Agreement 35
2.3.1 The Equal-Weight Agreement Model 35
2.3.2 The Weight-by-Response-Category Agreement Model 40
2.3.3 Models with Covariates 41
2.3.3.1 Models for Rater Agreement with Categorical Covariates 42
2.3.3.2 Models for Rater Agreement with Continuous Covariates 48
2.3.4 Rater Agreement plus Linear-by-Linear Association for
Ordinal Variables 54
2.3.5 Differential Weight Agreement Model with Linear-by-Linear
Interaction plus Covariates 59
2.4 Extensions 63
2.4.1 Modeling Agreement among More than Two Raters 63
2.4.1.1 Estimation of Rater-Pair-Specific Parameters 64
2.4.1.2 Agreement among Three Raters 67
2.4.2 Rater-Specific Trends 68
2.4.3 Generalized Coefficients K 70
2.5 Exercises 75
3. Exploring Rater Agreement 79
3.1 Configural Frequency Analysis: A Tutorial 80
3.2 CFA Base Models for Rater Agreement Data 85
3.2.1 CFA of Rater Agreement Data Using the Main Effect Base
Model 85
3.2.2 Zero Order CFA of Agreement Tables 87
3.2.3 CFA of Rater Agreement Data under Consideration of
Linear-by-Linear Association for Ordinal Variables 91
3.2.4 Using Categorical Covariates in CFA 93
3.3 Fusing Explanatory and Exploratory Research: Groups
of Types 97
3.4 Exploring the Agreement among Three Raters 100
3.5 What Else Is Going on in the Table: Blanking out
Agreement Cells 103
3.5.1 CFA of Disagreement Cells 104
3.5.2 Testing Hypotheses about Disagreement 111
3.6 Exercises 112
4. Correlation Structures 115
4.1 Intraclass Correlation Coefficients 116
4.2 Comparing Correlation Matrices Using LISREL 123
4.3 Exercises 129
5. Computer Applications 131
5.1 Using SPSS to Calculate Cohen's K 132
5.2 Using SYSTAT to Calculate Cohen's K 134
5.3 Programs for Weighted K 135
5.3.1 Using SAS to Calculate Cohen's K and Weighted K 135
5.3.2 Other Programs for Weighted K 137
5.4 Using Lem to Model Rater Agreement 142
5.4.1 Specifying the Equal Weight and the Weight-by-Response-
Category Agreement Models 142
5.4.2 Models with Covariates 146
5.4.2.1 Models with Categorical Covariates 147
5.4.2.2 Models with Continuous Covariates 149
5.4.3 Linear-by-Linear Association Models of Rater Agreement 149
5.4.4 Models of Agreement among More than Two Raters 151
5.4.5 Models of Rater-Specific Trends 151
5.5 Using Configural Frequency Analysis to Explore
Patterns of Agreement 152
5.5.1 First Order CFA (Main Effects Only) 152
5.5.2 Zero Order CFA 156
5.5.3 First Order CFA with One Continuous Covariate 158
5.5.4 CFA of the Agreement in Two Groups; No Gender-Association
Base Model 161
5.5.5 CFA of the Agreement among Three Raters 164
5.6 Correlation Structures: LISREL Analyses 166
5.7 Calculating the Intraclass Correlation Coefficient 170
6. Summary and Outlook 173
References 177
Author Index 185
Subject Index 187
Preface
Agreement among raters is of great importance in many domains, both
academic and nonacademic. In the Olympic Games, the medals and ranking
in gymnastics, figure skating, synchronized swimming, and other
disciplines are based on the ratings of several judges. Extreme judgements
are often discarded from the pool of scores used for the ranking. In
medicine, diagnoses are often provided by more than one doctor, to make
sure the proposed treatment is optimal. In criminal trials, a group of jurors
is used, and sentencing depends, among other things, on the complete
agreement among the jurors. In observational studies, researchers increase
reliability by discussing discrepant ratings. Restaurants receive Michelin
stars only after several test-eaters agree on the chef's performance. There
are many more examples.
We believe that this book will appeal to a broad range of students
and researchers, in particular in psychology, biostatistics, medical
research, education, anthropology, sociology, and many other areas in
which ratings are provided by multiple sources. A large number of models
are presented, with examples drawn from many of these fields and
disciplines.
This text describes four approaches to the statistical analysis of
rater agreement. The first approach, covered in chapter 1, involves
calculating coefficients that allow one to summarize agreement in a single
score. Five coefficients are reviewed that differ in (1) the scale level
of the rating categories they can analyze; (2) the assumptions made when
specifying a chance model, that is, the model with which the observed
agreement is compared; (3) whether significance tests exist; and
(4) whether they allow one to place weights on rating categories.
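To make the idea of a chance-corrected agreement coefficient concrete, here is a minimal sketch of Cohen's K for two raters, computed from a rater-by-rater contingency table. The function name and the example counts are ours, chosen for illustration; they are not taken from the book.

```python
import numpy as np

def cohens_kappa(table):
    """Cohen's kappa from a square rater-by-rater contingency table."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    p_obs = np.trace(table) / n                                    # observed agreement
    p_exp = (table.sum(axis=0) * table.sum(axis=1)).sum() / n**2   # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical 3x3 table: rater A in rows, rater B in columns
table = [[20, 5, 0],
         [3, 15, 2],
         [0, 4, 11]]
print(round(cohens_kappa(table), 3))  # 0.643
```

Here the raters agree on 46 of 60 objects (raw agreement .767), but roughly .347 agreement is expected by chance under independence, so kappa corrects the summary score downward.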
The second approach, presented in chapter 2, involves estimating
log-linear models. These are typically more complex than coefficients of
rater agreement, and allow one to test specific hypotheses about the
structure of a cross-classification of two or more raters' judgements.
Such cross-classifications often display characteristics, such as
trends, that help interpret the joint frequency distribution of two or more
raters. This text presents a family of log-linear models and discusses
submodels, that is, special cases.
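As a taste of the second approach, the sketch below fits a simple equal-weight agreement model, log m_ij = lambda + lambda_i(A) + lambda_j(B) + delta * I(i = j), to a hypothetical 3x3 table by plain Poisson iteratively reweighted least squares. The function, the dummy coding, and the data are our own stand-ins, not the book's worked examples.

```python
import numpy as np

def fit_poisson_loglinear(y, X, n_iter=50):
    """Poisson log-linear fit via iteratively reweighted least squares (IRLS)."""
    # Warm start from a rough log-scale regression to keep IRLS stable
    beta = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)[0]
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        z = X @ beta + (y - mu) / mu   # working response
        XtW = X.T * mu                 # Poisson weights are the fitted means
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

# Hypothetical 3x3 cross-classification of raters A (rows) and B (columns)
table = np.array([[20.0, 5.0, 0.0],
                  [3.0, 15.0, 2.0],
                  [0.0, 4.0, 11.0]])
k = table.shape[0]
rows, cols = [idx.ravel() for idx in np.indices((k, k))]
X = np.column_stack(
    [np.ones(k * k)]
    + [(rows == i).astype(float) for i in range(k - 1)]  # row main effects
    + [(cols == j).astype(float) for j in range(k - 1)]  # column main effects
    + [(rows == cols).astype(float)]  # delta: one common weight for all diagonal cells
)
beta = fit_poisson_loglinear(table.ravel(), X)
print("agreement parameter delta =", round(beta[-1], 2))  # positive: diagonal exceeds chance
```

A positive delta indicates that the diagonal (agreement) cells hold more cases than the independence base model predicts; setting delta to zero recovers that base model.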
The third approach, in chapter 3, involves exploring cross-