DEVELOPMENT AND APPLICATION OF
STATISTICAL METHODS FOR
PROGNOSIS RESEARCH
By
KYM IRIS ERIKA SNELL
A thesis submitted to the University of Birmingham
for the degree of
DOCTOR OF PHILOSOPHY
School of Health and Population Sciences
University of Birmingham
May 2015
University of Birmingham Research Archive
e-theses repository
This unpublished thesis/dissertation is copyright of the author and/or third
parties. The intellectual property rights of the author or third parties in respect
of this work are as defined by The Copyright, Designs and Patents Act 1988 or
as modified by any successor legislation.
Any use made of information contained in this thesis/dissertation must be in
accordance with that legislation and must be properly acknowledged. Further
distribution or reproduction in any format is prohibited without the permission
of the copyright holder.
ABSTRACT
A pivotal component of prognosis research is the prediction of future outcome risk. This
thesis applies, develops and evaluates novel statistical methods for development and
validation of risk prediction (prognostic) models. In the first part, a literature review of
published prediction models shows that the Cox model remains the most common approach
for developing a model using survival data; however, this approach avoids modelling the baseline
hazard and therefore restricts individualised absolute risk predictions. Flexible parametric survival models
are shown to address this by flexibly modelling the baseline hazard, thereby enabling
individualised risk predictions over time. Clinical application reveals discrepant mortality rates
for different hip replacement procedures, and identifies common issues when developing
models using clinical trial data.
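The contrast drawn above is that a flexible parametric model estimates the baseline explicitly, so an absolute risk can be computed for any patient at any time. As a minimal sketch, assuming the Royston-Parmar form on the log cumulative hazard scale, ln H(t|x) = s(ln t; γ) + xβ, where s is a restricted cubic spline in log time, an individualised survival probability follows as S(t|x) = exp(-exp(s(ln t; γ) + xβ)). The Python below uses an unorthogonalised spline basis; the function names, knot positions, γ coefficients and linear predictor values are hypothetical placeholders, not estimates from this thesis.

```python
import math


def rcs_basis(x, knots):
    """Unorthogonalised restricted cubic spline basis in x (here x = ln t).

    knots: sorted knot positions on the log-time scale, boundary knots first
    and last. Returns [x, v_1(x), ..., v_m(x)] for the m interior knots; each
    v_j is piecewise cubic between the boundary knots and linear beyond them.
    """
    k_min, k_max = knots[0], knots[-1]

    def pos3(u):
        # Truncated power function (u)_+^3
        return max(u, 0.0) ** 3

    basis = [x]
    for k in knots[1:-1]:
        lam = (k_max - k) / (k_max - k_min)
        basis.append(pos3(x - k) - lam * pos3(x - k_min) - (1 - lam) * pos3(x - k_max))
    return basis


def survival(t, lin_pred, gammas, knots):
    """Individualised survival S(t | x) = exp(-exp(s(ln t; gamma) + x'beta)).

    gammas: [gamma_0, gamma_1, ..., gamma_{m+1}] spline coefficients.
    lin_pred: the patient's linear predictor x'beta.
    """
    z = rcs_basis(math.log(t), knots)
    log_cum_hazard = gammas[0] + sum(g * b for g, b in zip(gammas[1:], z)) + lin_pred
    return math.exp(-math.exp(log_cum_hazard))


if __name__ == "__main__":
    # Hypothetical knots (0.5, 3 and 10 years) and coefficients, chosen so that
    # ln H(t) increases with t; for illustration only.
    knots = [math.log(0.5), math.log(3.0), math.log(10.0)]
    gammas = [-3.0, 1.0, 0.05]
    for lp in (0.0, 0.5):
        print(lp, [round(survival(t, lp, gammas, knots), 3) for t in (1, 5)])
```

Because the baseline is an explicit function of time here, a prediction at any horizon needs only the spline coefficients, the knots and the patient's linear predictor; with a Cox model the baseline hazard must instead be estimated separately before absolute risks can be reported.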
In the second part, univariate and multivariate random-effects meta-analyses are proposed
to summarise a model’s performance across multiple validation studies. The multivariate
approach accounts for correlation in multiple statistics (e.g. C-statistic and calibration slope),
and allows joint predictions about expected model performance in applied settings. This
allows competing implementation strategies (e.g. regarding baseline hazard choice) to be
compared and ranked. A simulation study also provides recommendations for the scales on
which to combine performance statistics to best satisfy the between-study normality
assumption in random-effects meta-analysis.
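The univariate meta-analysis step described above can be illustrated for a single performance statistic. A minimal sketch, assuming the C-statistic is pooled on the logit scale (chosen here for illustration), with delta-method standard errors se/(c(1-c)), and using the DerSimonian-Laird estimator of the between-study variance τ² for simplicity. Other τ² estimators such as REML, and the multivariate model that jointly pools the C-statistic and calibration slope, are not shown. The function names and input values are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def dl_meta_analysis(estimates, std_errors):
    """DerSimonian-Laird random-effects meta-analysis on the input scale.

    Requires at least two studies.
    Returns (pooled estimate, standard error of pooled estimate, tau^2).
    """
    k = len(estimates)
    w = [1 / se ** 2 for se in std_errors]  # inverse-variance weights
    sw = sum(w)
    mu_fe = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)  # moment estimate, truncated at zero
    w_re = [1 / (se ** 2 + tau2) for se in std_errors]
    mu_re = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    return mu_re, math.sqrt(1 / sum(w_re)), tau2


def pool_c_statistics(c_stats, c_ses):
    """Pool C-statistics on the logit scale and back-transform the summary.

    Logit-scale standard errors use the delta method: se / (c * (1 - c)).
    Returns (pooled C-statistic, tau^2 on the logit scale).
    """
    y = [logit(c) for c in c_stats]
    se = [s / (c * (1 - c)) for c, s in zip(c_stats, c_ses)]
    mu, _, tau2 = dl_meta_analysis(y, se)
    return inv_logit(mu), tau2


if __name__ == "__main__":
    # Hypothetical C-statistics and standard errors from four validation studies
    pooled, tau2 = pool_c_statistics([0.70, 0.75, 0.82, 0.68], [0.02, 0.03, 0.025, 0.02])
    print(round(pooled, 3), round(tau2, 4))
```

Because pooling happens on the logit scale, τ² describes between-study heterogeneity on that scale; whether that scale best satisfies the normality assumption is the question the simulation study addresses.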
ACKNOWLEDGEMENTS
This PhD was funded by the MRC Midland Hub for Trials Methodology Research, without
which I would not have been able to complete this research.
I would like to express my gratitude to my supervisors, Prof. Richard Riley and Prof. Lucinda
Billingham. Richard, I consider myself so lucky to have had the opportunity to work with you.
You have always believed in me, built up my confidence, sent opportunities my way and
helped me find my passion and shape my career. I cannot truly thank you enough for the
endless support and guidance you have offered.
I would also like to thank the following people I have been lucky enough to work with:
Thomas Debray for input and feedback on several chapters, Joie Ensor for all the
discussions and feedback on chapters, Jon Deeks for his support in the last few months
leading up to submission, and not forgetting Karen Biddle and Anne Walker for helping with
anything and everything that they could. Thanks also go to my colleagues in Health and
Population Sciences for all their encouragement.
To my family, thank you for all your love and support over the years. Mum, Dad, Matt, Jay
and Dawn, thanks for always believing in me and supporting me in everything I do. I am so
lucky to have the family I do and words cannot express how much I love you all.
Last but certainly not least, thanks go to my friends, old and new: Lozz, Hannah, Ruby,
Elena, and the biggest thanks of all to a friend who has been there every single day of this
journey, Dani. Since we started our PhDs together, you have been the best friend and my
biggest support. We’ve been through it all together, you’ve been there to celebrate the highs
and help me through the lows. You have made the last three and a half years an amazing
experience. I am so thankful to have you in my life and proud of all we have achieved.
TABLE OF CONTENTS
Chapter 1: Introduction ...................................................................................................... 1
1.1 Introduction to research area .................................................................................... 1
1.2 What is prognosis research? ..................................................................................... 2
1.2.1 Framework for prognosis research .................................................................... 5
1.3 Logistic regression .................................................................................................... 6
1.3.1 Example prognostic model developed using logistic regression ........................ 8
1.4 Survival analysis ........................................................................................................ 9
1.4.1 Functions in survival data ................................................................................. 11
1.4.2 Cox proportional hazard model ........................................................................ 14
1.4.3 Parametric models ........................................................................................... 16
1.4.4 Flexible parametric models .............................................................................. 18
1.4.5 Non-proportional hazards ................................................................................. 23
1.4.6 Example prognostic model developed using a flexible parametric survival model ........................ 23
1.5 Model development considerations ......................................................................... 24
1.6 Validating a prognostic model ................................................................................. 27
1.6.1 Internal validation ............................................................................................. 28
1.6.2 External validation ............................................................................................ 31
1.6.3 Validation statistics ........................................................................................... 32
1.7 Presentation of prognostic models for clinical decision making .............................. 38
1.8 Importance of improving methodology in prognosis research ................................. 39
1.9 Aims and overview of thesis .................................................................................... 42
Chapter 2: Hip replacement surgery in osteoarthritis patients .................................... 45
2.1 Introduction .............................................................................................................. 45
2.2 Background to hip replacement procedures ............................................................ 45
2.2.1 Cemented procedures ...................................................................................... 46
2.2.2 Uncemented procedures .................................................................................. 47
2.2.3 Hybrid procedures ............................................................................................ 48
2.2.4 Birmingham Hip Resurfacing ............................................................................ 48
2.3 Data ......................................................................................................................... 49
2.4 Objectives ................................................................................................................ 51
2.4.1 Clinical objectives ............................................................................................. 51
2.4.2 Statistical objectives ......................................................................................... 51
2.5 Methods ................................................................................................................... 52
2.5.1 Data cleaning, inclusion and exclusion criteria ................................................. 52
2.5.2 Summary of data .............................................................................................. 52
2.5.3 Analysis of primary outcomes .......................................................................... 53
2.5.4 Assessing the proportional hazards assumption .............................................. 54
2.5.5 Number of knots for the baseline hazard function ............................................ 55
2.5.6 Analysis of secondary outcomes ...................................................................... 55
2.6 Results ..................................................................................................................... 56
2.6.1 Summary of data for cemented and uncemented THRs .................................. 56
2.6.2 Proportional hazards assumption ..................................................................... 57
2.6.3 Number of knots for the baseline hazard function ............................................ 59
2.6.4 Primary outcome analyses ............................................................................... 61
2.6.5 Secondary analyses ......................................................................................... 73
2.7 Discussion ............................................................................................................... 81
2.7.1 Summary of clinical findings ............................................................................. 81
2.7.2 Statistical advantages of flexible parametric models in this dataset ................ 84
2.7.3 Potential pitfalls and situations when Royston-Parmar models are not required ..................... 87
2.7.4 Further work ..................................................................................................... 87
2.8 Conclusion ............................................................................................................... 89
Chapter 3: Estimating the baseline hazard and absolute risk in multivariable prediction models: a review of current practice ........ 91
3.1 Introduction and objectives ...................................................................................... 91
3.2 Method .................................................................................................................... 93
3.2.1 Identifying a set of articles for review ............................................................... 93
3.2.2 Inclusion/exclusion criteria ............................................................................... 93
3.2.3 Review process ................................................................................................ 94
3.2.4 Evaluation of relevant articles .......................................................................... 94
3.3 Results .................................................................................................................... 96
3.3.1 Identification of relevant articles ....................................................................... 96
3.3.2 Summary of articles included in the review ...................................................... 97
3.3.3 Development data description and size ......................................................... 102
3.3.4 Model development methods ......................................................................... 105
3.3.5 Reporting of results ........................................................................................ 107
3.3.6 Modelling the baseline hazard and reporting absolute risk predictions .......... 108
3.3.7 Validation ....................................................................................................... 119