Post-Shrinkage Strategies
in Statistical and
Machine Learning for
High-Dimensional Data
This book presents post-estimation and prediction strategies for a host of useful statistical
models with applications in data science. It combines statistical learning and machine learning
techniques in a unique and optimal way. It is well known that machine learning methods are
subject to many issues relating to bias, and consequently the mean squared error and prediction
error may explode. For this reason, we suggest shrinkage strategies that control the bias by
combining a submodel selected by a penalized method with a model containing many features.
Further, the suggested shrinkage methodology can be successfully implemented for
high-dimensional data analysis. Many researchers in statistics and the medical sciences work
with big data, which they must analyze through statistical modeling; estimating the model
parameters accurately is an important part of that analysis. This book may serve as a repository
of improved estimation strategies for statisticians. It will help researchers and practitioners in
their teaching and advanced research, and it is an excellent textbook for advanced undergraduate
and graduate courses involving shrinkage, statistical learning, and machine learning.
• The book succinctly reveals the bias inherent in machine learning methods and successfully
provides tools, tricks, and tips for dealing with it.
• Expertly sheds light on the fundamental reasoning for model selection and post-estimation
using shrinkage and related strategies.
• This presentation is fundamental because shrinkage and related methods are appropriate for
model selection and estimation problems, and there is growing interest in this area to bridge
the gap between competing strategies.
• Application of these strategies to real-life data sets from many walks of life.
• Analytical results are fully corroborated by numerical work, and numerous worked examples
are included in each chapter with numerous graphs for data visualization.
• The presentation and style of the book make it accessible to a broad audience. It offers
rich, concise expositions of each strategy and clearly describes how to use each estimation
strategy for the problem at hand.
• This book emphasizes that statistics and statisticians can play a dominant role in solving Big
Data problems and will put them at the forefront of scientific discovery.
• The book contributes novel methodologies for high-dimensional data analysis (HDDA) and
will open the door to continued research in this active area.
• The practical impact of the proposed work stems from its wide applications. The developed
computational packages will aid in analyzing a broad range of applications in many walks of life.
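To make the core idea concrete, here is a minimal sketch (not the authors' code; names and the selection threshold are illustrative) of a positive-part Stein-type post-shrinkage estimator: the full-model least-squares fit is shrunk toward a submodel fit that keeps only a selected subset of features, with the amount of shrinkage driven by a distance statistic between the two fits.

```python
# Illustrative sketch of a positive-part shrinkage estimator combining a
# full-model OLS fit with a submodel OLS fit. All names are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
beta_true = np.r_[np.ones(3), np.zeros(p - 3)]   # sparse truth
y = X @ beta_true + rng.standard_normal(n)

# Full-model estimator: OLS on all p features.
beta_fm, *_ = np.linalg.lstsq(X, y, rcond=None)

# Submodel estimator: OLS on a selected subset (here chosen by magnitude of
# the full-model fit, standing in for a penalized selector such as LASSO).
active = np.abs(beta_fm) > 0.25
beta_sm = np.zeros(p)
beta_sm[active], *_ = np.linalg.lstsq(X[:, active], y, rcond=None)

# Stein-type shrinkage: pull the full-model fit toward the submodel, with
# shrinkage governed by a normalized distance statistic T_n between them.
p2 = int((~active).sum())                        # number of dropped features
diff = beta_fm - beta_sm
sigma2 = max(np.var(y - X @ beta_fm), 1e-12)     # residual variance estimate
T_n = n * (diff @ diff) / sigma2
shrink = 1.0 - (p2 - 2) / max(T_n, 1e-12)
shrink_plus = max(shrink, 0.0)                   # positive-part variant
beta_ps = beta_sm + shrink_plus * diff           # post-shrinkage estimator
```

When the submodel is nearly correct, `T_n` is small, `shrink_plus` is driven toward zero, and the estimator collapses to the submodel fit; when the dropped features matter, `T_n` is large and the estimator stays close to the full-model fit.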
Post-Shrinkage Strategies
in Statistical and
Machine Learning for
High-Dimensional Data
Syed Ejaz Ahmed
Feryaal Ahmed
Bahadır Yüzbaşı
Designed cover image: © Askhat Gilyakhov
First edition published 2023
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
CRC Press is an imprint of Taylor & Francis Group, LLC
© 2023 Syed Ejaz Ahmed, Feryaal Ahmed and Bahadır Yüzbaşı
Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot as-
sume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have
attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright hold-
ers if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged,
please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or
utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including pho-
tocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission
from the publishers.
For permission to photocopy or use material electronically from this work, access
www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923,
978-750-8400. For works that are not available on CCC please contact [email protected]
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for
identification and explanation without intent to infringe.
ISBN: 978-0-367-76344-2 (hbk)
ISBN: 978-0-367-77205-5 (pbk)
ISBN: 978-1-003-17025-9 (ebk)
DOI: 10.1201/9781003170259
Typeset in CMR10
by KnowledgeWorks Global Ltd.
Publisher’s note: This book has been prepared from camera-ready copy provided by the authors.
Dedicated in loving memory to Don Fraser and Kjell Doksum.
Contents
Preface xiii
Acknowledgments xv
Author/editor biographies xvii
List of Figures xix
List of Tables xxiii
Contributors xxvii
Abbreviations xxix
1 Introduction 1
1.1 Least Absolute Shrinkage and Selection Operator . . . . . . . . . . . . . . . 4
1.2 Elastic Net . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Adaptive LASSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Smoothly Clipped Absolute Deviation . . . . . . . . . . . . . . . . . . . . . 6
1.5 Minimax Concave Penalty . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.6 High-Dimensional Weak-Sparse Regression Model . . . . . . . . . . . . . . . 7
1.7 Estimation Strategies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.7.1 Pretest Estimation Strategy . . . . . . . . . . . . . . . . . . . . . . . 8
1.7.2 Shrinkage Estimation Strategy . . . . . . . . . . . . . . . . . . . . . 8
1.8 Asymptotic Properties of Non-Penalty Estimators . . . . . . . . . . . . . . 9
1.8.1 Bias of Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.8.2 Risk of Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.9 Organization of the Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2 Introduction to Machine Learning 13
2.1 What is Learning? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Unsupervised Learning: Principal Component Analysis and k-Means Clustering . . . 14
2.2.1 Principal Component Analysis (PCA) . . . . . . . . . . . . . . . . . 14
2.2.2 k-Means Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.3 Extension: Unsupervised Text Analysis . . . . . . . . . . . . . . . . 17
2.3 Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3.1 Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3.2 Multivariate Adaptive Regression Splines (MARS) . . . . . . . . . . 19
2.3.3 k Nearest Neighbours (kNN) . . . . . . . . . . . . . . . . . . . . . . 20
2.3.4 Random Forest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.3.5 Support Vector Machine (SVM) . . . . . . . . . . . . . . . . . . . . 23
2.3.6 Linear Discriminant Analysis (LDA) . . . . . . . . . . . . . . . . . . 24
2.3.7 Artificial Neural Network (ANN) . . . . . . . . . . . . . . . . . . . . 25
2.3.8 Gradient Boosting Machine (GBM) . . . . . . . . . . . . . . . . . . 27
2.4 Implementation in R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.5 Case Study: Genomics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.5.1 Data Exploration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.5.2 Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3 Post-Shrinkage Strategies in Sparse Regression Models 33
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2 Estimation Strategies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2.1 Least Squares Estimation Strategies . . . . . . . . . . . . . . . . . . 36
3.2.2 Maximum Likelihood Estimator . . . . . . . . . . . . . . . . . . . . 36
3.2.3 Full Model and Submodel Estimators . . . . . . . . . . . . . . . . . 37
3.2.4 Shrinkage Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.3 Asymptotic Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.3.1 Asymptotic Distributional Bias . . . . . . . . . . . . . . . . . . . . . 42
3.3.2 Asymptotic Distributional Risk . . . . . . . . . . . . . . . . . . . . . 44
3.4 Relative Risk Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.4.1 Risk Comparison of β̂₁^FM and β̂₁^SM . . . . . . . . . . . . . . . . . 47
3.4.2 Risk Comparison of β̂₁^FM and β̂₁^S . . . . . . . . . . . . . . . . . . 47
3.4.3 Risk Comparison of β̂₁^S and β̂₁^SM . . . . . . . . . . . . . . . . . . 48
3.4.4 Risk Comparison of β̂₁^PS and β̂₁^FM . . . . . . . . . . . . . . . . . 49
3.4.5 Risk Comparison of β̂₁^PS and β̂₁^S . . . . . . . . . . . . . . . . . . 49
3.4.6 Mean Squared Prediction Error . . . . . . . . . . . . . . . . . . . . . 50
3.5 Simulation Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.5.1 Strong Signals and Noises . . . . . . . . . . . . . . . . . . . . . . . . 51
3.5.2 Signals and Noises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.5.3 Comparing Shrinkage Estimators with Penalty Estimators . . . . . . 55
3.6 Prostate Cancer Data Example . . . . . . . . . . . . . . . . . . . . . . . . 65
3.6.1 Classical Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.6.2 Shrinkage and Penalty Strategies . . . . . . . . . . . . . . . . . . . . 71
3.6.3 Prediction Error via Bootstrapping . . . . . . . . . . . . . . . . . . . 74
3.6.4 Machine Learning Strategies . . . . . . . . . . . . . . . . . . . . . . 77
3.7 R-Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.8 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4 Shrinkage Strategies in High-Dimensional Regression Models 91
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.2 Estimation Strategies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4.3 Integrating Submodels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.3.1 Sparse Regression Model. . . . . . . . . . . . . . . . . . . . . . . . . 95
4.3.2 Overfitted Regression Model . . . . . . . . . . . . . . . . . . . . . . 95
4.3.3 Underfitted Regression Model . . . . . . . . . . . . . . . . . . . . . . 96
4.3.4 Non-Linear Shrinkage Estimation Strategies . . . . . . . . . . . . . 96
4.4 Simulation Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.5 Real Data Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.5.1 Eye Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.5.2 Expression Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.5.3 Riboflavin Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.6 R-Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.7 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
5 Shrinkage Estimation Strategies in Partially Linear Models 109
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.1.1 Statement of the Problem . . . . . . . . . . . . . . . . . . . . . . . . 110
5.2 Estimation Strategies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
5.3 Asymptotic Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.4 Simulation Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.4.1 Comparing with Penalty Estimators . . . . . . . . . . . . . . . . . . 117
5.5 Real Data Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
5.5.1 Housing Prices (HP) Data . . . . . . . . . . . . . . . . . . . . . . . . 126
5.5.2 Investment Data of Turkey . . . . . . . . . . . . . . . . . . . . . . . 127
5.6 High-Dimensional Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
5.6.1 Real Data Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.7 R-Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
5.8 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
6 Shrinkage Strategies: Generalized Linear Models 147
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
6.2 Maximum Likelihood Estimation . . . . . . . . . . . . . . . . . . . . . . . . 149
6.3 A Gentle Introduction to the Logistic Regression Model . . . . . . . . . . . 150
6.3.1 Statement of the Problem . . . . . . . . . . . . . . . . . . . . . . . 150
6.4 Estimation Strategies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
6.4.1 The Shrinkage Estimation Strategies . . . . . . . . . . . . . . . . . . 153
6.5 Asymptotic Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
6.6 Simulation Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
6.6.1 Penalized Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
6.7 Real Data Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
6.7.1 Pima Indians Diabetes (PID) Data . . . . . . . . . . . . . . . . . . . 173
6.7.2 South Africa Heart-Attack Data . . . . . . . . . . . . . . . . . . . . 175
6.7.3 Orinda Longitudinal Study of Myopia (OLSM) Data . . . . . . . . . 175
6.8 High-Dimensional Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
6.8.1 Simulation Experiments . . . . . . . . . . . . . . . . . . . . . . . . . 179
6.8.2 Gene Expression Data . . . . . . . . . . . . . . . . . . . . . . . . . . 181
6.9 A Gentle Introduction to Negative Binomial Models . . . . . . . . . . . . . 181
6.9.1 Sparse NB Regression Model . . . . . . . . . . . . . . . . . . . . . . 186
6.10 Shrinkage and Penalized Strategies . . . . . . . . . . . . . . . . . . . . . . . 186
6.11 Asymptotic Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
6.12 Simulation Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
6.13 Real Data Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
6.13.1 Resume Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
6.13.2 Labor Supply Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
6.14 High-Dimensional Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
6.15 R-Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
6.16 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213