Table of Contents

WILEY SERIES IN PROBABILITY AND STATISTICS
Established by WALTER A. SHEWHART and SAMUEL S. WILKS
Editors: David J. Balding, Noel A. C. Cressie, Garrett M. Fitzmaurice, Iain M. Johnstone, Geert Molenberghs, David W. Scott, Adrian F. M. Smith, Ruey S. Tsay, Sanford Weisberg
Editors Emeriti: Vic Barnett, J. Stuart Hunter, Joseph B. Kadane, Jozef L. Teugels
A complete list of the titles in this series appears at the end of this volume.

DATA ANALYSIS
What Can Be Learned From the Past 50 Years
Peter J. Huber
Professor of Statistics, retired
Klosters, Switzerland

WILEY
A JOHN WILEY & SONS, INC., PUBLICATION

Copyright © 2011 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data:
Huber, Peter J.
Data analysis : what can be learned from the past 50 years / Peter J. Huber.
p. cm. -- (Wiley series in probability and statistics ; 874)
Includes bibliographical references and index.
ISBN 978-1-118-01064-8 (hardback)
1. Mathematical statistics--History. 2. Mathematical statistics--Philosophy. 3. Numerical analysis--Methodology. I. Title.
QA276.15.H83 2011
519.509--dc22
2010043284

Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1

CONTENTS

Preface

1 What is Data Analysis?
  1.1 Tukey's 1962 paper
  1.2 The Path of Statistics

2 Strategy Issues in Data Analysis
  2.1 Strategy in Data Analysis
  2.2 Philosophical issues
    2.2.1 On the theory of data analysis and its teaching
    2.2.2 Science and data analysis
    2.2.3 Economy of forces
  2.3 Issues of size
  2.4 Strategic planning
    2.4.1 Planning the data collection
    2.4.2 Choice of data and methods
    2.4.3 Systematic and random errors
    2.4.4 Strategic reserves
    2.4.5 Human factors
  2.5 The stages of data analysis
    2.5.1 Inspection
    2.5.2 Error checking
    2.5.3 Modification
    2.5.4 Comparison
    2.5.5 Modeling and Model fitting
    2.5.6 Simulation
    2.5.7 What-if analyses
    2.5.8 Interpretation
    2.5.9 Presentation of conclusions
  2.6 Tools required for strategy reasons
    2.6.1 Ad hoc programming
    2.6.2 Graphics
    2.6.3 Record keeping
    2.6.4 Creating and keeping order

3 Massive Data Sets
  3.1 Introduction
  3.2 Disclosure: Personal experiences
  3.3 What is massive? A classification of size
  3.4 Obstacles to scaling
    3.4.1 Human limitations: visualization
    3.4.2 Human-machine interactions
    3.4.3 Storage requirements
    3.4.4 Computational complexity
    3.4.5 Conclusions
  3.5 On the structure of large data sets
    3.5.1 Types of data
    3.5.2 How do data sets grow?
    3.5.3 On data organization
    3.5.4 Derived data sets
  3.6 Data base management and related issues
    3.6.1 Data archiving
  3.7 The stages of a data analysis
    3.7.1 Planning the data collection
    3.7.2 Actual collection
    3.7.3 Data access
    3.7.4 Initial data checking
    3.7.5 Data analysis proper
    3.7.6 The final product: presentation of arguments and conclusions
  3.8 Examples and some thoughts on strategy
  3.9 Volume reduction
  3.10 Supercomputers and software challenges
    3.10.1 When do we need a Concorde?
    3.10.2 General Purpose Data Analysis and Supercomputers
    3.10.3 Languages, Programming Environments and Data-based Prototyping
  3.11 Summary of conclusions

4 Languages for Data Analysis
  4.1 Goals and purposes
  4.2 Natural languages and computing languages
    4.2.1 Natural languages
    4.2.2 Batch languages
    4.2.3 Immediate languages
    4.2.4 Language and literature
    4.2.5 Object orientation and related structural issues
    4.2.6 Extremism and compromises, slogans and reality
    4.2.7 Some conclusions
  4.3 Interface issues
    4.3.1 The command line interface
    4.3.2 The menu interface
    4.3.3 The batch interface and programming environments
    4.3.4 Some personal experiences
  4.4 Miscellaneous issues
    4.4.1 On building blocks
    4.4.2 On the scope of names
    4.4.3 On notation
    4.4.4 Book-keeping problems
  4.5 Requirements for a general purpose immediate language

5 Approximate Models
  5.1 Models
  5.2 Bayesian modeling
  5.3 Mathematical statistics and approximate models
  5.4 Statistical significance and physical relevance
  5.5 Judicious use of a wrong model
  5.6 Composite models
  5.7 Modeling the length of day
  5.8 The role of simulation
  5.9 Summary of conclusions

6 Pitfalls
  6.1 Simpson's paradox
  6.2 Missing data
    6.2.1 The Case of the Babylonian Lunar Six
    6.2.2 X-ray crystallography
  6.3 Regression of Y on X or of X on Y?

7 Create order in data
  7.1 General considerations
  7.2 Principal component methods
    7.2.1 Principal component methods: Jury data
  7.3 Multidimensional scaling
    7.3.1 Multidimensional scaling: the method
    7.3.2 Multidimensional scaling: a synthetic example
    7.3.3 Multidimensional scaling: map reconstruction
  7.4 Correspondence analysis
    7.4.1 Correspondence analysis: the method
    7.4.2 Kültepe eponyms
    7.4.3 Further examples: marketing and Shakespearean plays
  7.5 Multidimensional scaling vs. Correspondence analysis
    7.5.1 Hodson's grave data
    7.5.2 Plato data

Data Analysis: What Can Be Learned From the Past 50 Years PDF

235 Pages·10.516 MB·English

by Peter J. Huber

#Computers #Cybernetics: Artificial Intelligence

Detailed Information

Author:	Peter J. Huber
Pages:	235
Language:	English
File Size:	10.516 MB
Format:	PDF
Price:	FREE
