From Statistical Physics to Data-Driven
Modelling
with Applications to Quantitative Biology
Simona Cocco
Rémi Monasson
Francesco Zamponi
Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© Simona Cocco, Rémi Monasson, and Francesco Zamponi 2022
The moral rights of the authors have been asserted
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America
British Library Cataloguing in Publication Data
Data available
Library of Congress Control Number: 2022937922
ISBN 978–0–19–886474–5
DOI: 10.1093/oso/9780198864745.001.0001
Printed and bound by
CPI Group (UK) Ltd, Croydon, CR0 4YY
Links to third party websites are provided by Oxford in good faith and
for information only. Oxford disclaims any responsibility for the materials
contained in any third party website referenced in this work.
Contents
1 Introduction to Bayesian inference
1.1 Why Bayesian inference?
1.2 Notations and definitions
1.3 The German tank problem
1.4 Laplace’s birth rate problem
1.5 Tutorial 1: diffusion coefficient from single-particle tracking
2 Asymptotic inference and information
2.1 Asymptotic inference
2.2 Notions of information
2.3 Inference and information: the maximum entropy principle
2.4 Tutorial 2: entropy and information in neural spike trains
3 High-dimensional inference: searching for principal components
3.1 Dimensional reduction and principal component analysis
3.2 The retarded learning phase transition
3.3 Tutorial 3: replay of neural activity during sleep following task learning
4 Priors, regularisation, sparsity
4.1 L_p-norm based priors
4.2 Conjugate priors
4.3 Invariant priors
4.4 Tutorial 4: sparse estimation techniques for RNA alternative splicing
5 Graphical models: from network reconstruction to Boltzmann machines
5.1 Network reconstruction for multivariate Gaussian variables
5.2 Boltzmann machines
5.3 Pseudo-likelihood methods
5.4 Tutorial 5: inference of protein structure from sequence data
6 Unsupervised learning: from representations to generative models
6.1 Autoencoders
6.2 Restricted Boltzmann machines and representations
6.3 Generative models
6.4 Learning from streaming data: principal component analysis revisited
6.5 Tutorial 6: online sparse principal component analysis of neural assemblies
7 Supervised learning: classification with neural networks
7.1 The perceptron, a linear classifier
7.2 Case of few data: overfitting
7.3 Case of many data: generalisation
7.4 A glimpse at multi-layered networks
7.5 Tutorial 7: prediction of binding between PDZ proteins and peptides
8 Time series: from Markov models to hidden Markov models
8.1 Markov processes and inference
8.2 Hidden Markov models
8.3 Tutorial 8: CG content variations in viral genomes
References
Index
Preface
Today’s science is characterised by an ever-increasing amount of data, due to instru-
mental and experimental progress in monitoring and manipulating complex systems
made of many microscopic constituents. While this tendency is true in all fields of
science, it is perhaps best illustrated in biology. The activity of neural populations,
composed of hundreds to thousands of neurons, can now be recorded in real time
and specifically perturbed, offering unique access to the underlying circuitry and
its relationship with functional behaviour and properties. Massive sequencing has
permitted us to build databases of coding DNA or protein sequences from a huge variety
of organisms, and exploiting these data to extract information about the structure,
function, and evolutionary history of proteins is a major challenge. Other examples
abound in immunology, ecology, development, etc.
How can we make sense of such data, and use them to enhance our understanding
of biological, physical, chemical, and other systems? Mathematicians, statisticians,
theoretical physicists, computer scientists, computational biologists, and others have
developed sophisticated approaches over recent decades to address this question. The
primary objective of this textbook is to introduce these ideas at the crossroads between
probability theory, statistics, optimisation, statistical physics, inference, and machine
learning. The mathematical details necessary to deeply understand the methods, as
well as their conceptual implications, are provided. The second objective of this book
is to provide practical applications of these methods, which will allow students to
really assimilate the underlying ideas and techniques. The principle is that students
are given a data set, asked to write their own code based on the material seen during
the theory lectures, and to analyse the data. Each application should correspond to a
two- to three-hour tutorial.
Most of the applications we propose here are related to biology, as they were part of
a course for Master of Science students specialising in biophysics at the Ecole Normale
Supérieure. The book’s companion website1 contains all the data sets necessary for
the tutorials presented in the book. It should be clear to the reader that the tutorials
proposed here are arbitrary and merely reflect the research interests of the authors.
Many more illustrations are possible! Indeed, our website presents further applications
to “pure” physical problems, e.g. coming from atomic physics or cosmology, based on
the same theoretical methods.
1 https://github.com/StatPhys2DataDrivenModel/DDM_Book_Tutorials
Little background in statistical inference is needed to benefit from this book. We
expect the material presented here to be accessible to MSc students not only in physics,
but also in applied maths and computational biology. Readers will need basic knowledge
of programming (Python or some equivalent language) for the applications, and of
mathematics (functional and linear analysis, algebra, probability). One of our major
goals is that students will be able to understand the mathematics behind the methods,
and not act as mere consumers of statistical packages. We pursue this objective
without emphasis on mathematical rigour, but with a constant effort to develop in-
tuition and show the deep connections with standard statistical physics. While the
content of the book can be thought of as a minimal background for scientists in the
contemporary data era, it is by no means exhaustive. Our objective will be truly ac-
complished if readers then actively seek to deepen their experience and knowledge by
reading advanced machine learning or statistical inference textbooks.
As mentioned above, a large part of what follows is based on the course we gave
at ENS from 2017 to 2021. We are grateful to A. Di Gioacchino, F. Aguirre-Lopez,
and all the course students for carefully reading the manuscript and pointing out
typos and errors. We are also deeply indebted to Jean-François Allemand and Maxime
Dahan, who first thought that such a course, covering subjects not always part of the
standard curriculum in physics, would be useful, and who strongly supported us. We
dedicate the present book to the memory of Maxime, who tragically passed away four
years ago.
Paris, January 2022.
Simona Cocco1, Rémi Monasson1,2 and Francesco Zamponi1
1 Ecole Normale Supérieure, Université PSL & CNRS
2 Department of Physics, Ecole Polytechnique
1 Introduction to Bayesian inference
This first chapter presents basic notions of Bayesian inference, starting with the def-
initions of elementary objects in probability, and Bayes’ rule. We then discuss two
historically motivated examples of Bayesian inference, in which a single parameter has
to be inferred from data.
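Anticipating the formal definitions of section 1.2, recall Bayes’ rule, which expresses the posterior probability of a parameter θ given observed data D as the product of the likelihood and the prior, normalised by the evidence:
P(θ | D) = P(D | θ) P(θ) / P(D).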
1.1 Why Bayesian inference?
Most systems in nature are made of small components, interacting in a complex way.
Think of sand grains in a dune, of molecules in a chemical reactor, or of neurons in a
brain area. Techniques to observe and quantitatively characterise these systems, or at
least part of them, are routinely developed by scientists and engineers, and allow one
to ask fundamental questions, see figure 1.1:
• What can we say about the future evolution of these systems? About how they will
respond to some perturbation, e.g. to a change in the environmental conditions?
Or about the behaviour of the subparts not accessible to measurements?
• What are the underlying mechanisms explaining the collective properties of these
systems? How do the small components interact together? What is the role played
by stochasticity in the observed behaviours?
The goal of Bayesian inference is to answer those questions based on observations,
which we will refer to as data in the following. In the Bayesian framework, both the
Fig. 1.1 A. A large complex system includes many components (black dots) that interact
together (arrows). B. An observer generally has access to a limited part of the system and
can measure the behaviour of the components therein, e.g. their characteristic activities over
time.
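To make the Bayesian programme concrete before the formal development, here is a minimal Python sketch in the spirit of the book’s tutorials (a toy illustration, not one of them; the coin-flip setting and all variable names are ours). It infers a single parameter, the unknown bias theta of a coin, by evaluating Bayes’ rule on a grid of candidate values:

import numpy as np

rng = np.random.default_rng(0)
theta_true = 0.7
flips = rng.random(50) < theta_true        # 50 coin flips; True counts as heads

theta = np.linspace(0.001, 0.999, 999)     # grid of candidate parameter values
prior = np.ones_like(theta)                # flat prior P(theta)
n_heads, n_flips = flips.sum(), flips.size

# Likelihood P(D | theta) of the observed counts for i.i.d. Bernoulli flips
likelihood = theta**n_heads * (1 - theta)**(n_flips - n_heads)

# Bayes' rule: posterior = likelihood x prior, normalised by the evidence P(D)
posterior = likelihood * prior
posterior /= np.trapz(posterior, theta)

print(f"posterior mean of theta: {np.trapz(theta * posterior, theta):.3f}")

With a flat prior the posterior mean is close to the empirical frequency of heads, and the posterior concentrates around theta_true as the number of flips grows, which is the asymptotic regime studied in chapter 2.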