Intelligent Systems Reference Library 109
Achim Zielesny
From Curve Fitting
to Machine Learning
An Illustrative Guide to Scientific Data
Analysis and Computational Intelligence
Second Edition
Intelligent Systems Reference Library
Volume 109
Series editors
Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
Lakhmi C. Jain, Bournemouth University, Fern Barrow, Poole, UK, and
University of Canberra, Canberra, Australia
About this Series
The aim of this series is to publish a Reference Library, including novel advances
and developments in all aspects of Intelligent Systems in an easily accessible and
well structured form. The series includes reference works, handbooks, compendia,
textbooks, well-structured monographs, dictionaries, and encyclopedias. It contains
well integrated knowledge and current information in the field of Intelligent
Systems. The series covers the theory, applications, and design methods of
Intelligent Systems. Virtually all disciplines such as engineering, computer science,
avionics, business, e-commerce, environment, healthcare, physics and life science
are included.
More information about this series at http://www.springer.com/series/8578
Achim Zielesny
Institut für biologische und chemische
Informatik
Westfälische Hochschule
Recklinghausen
Germany
ISSN 1868-4394 ISSN 1868-4408 (electronic)
Intelligent Systems Reference Library
ISBN 978-3-319-32544-6 ISBN 978-3-319-32545-3 (eBook)
DOI 10.1007/978-3-319-32545-3
Library of Congress Control Number: 2016936957
© Springer International Publishing Switzerland 2011, 2016
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made.
Printed on acid-free paper
This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG Switzerland
To my parents
Preface
Preface to the first edition
The analysis of experimental data has been at the heart of science from its beginnings. But it
was the advent of digital computers in the second half of the 20th century that revolutionized
scientific data analysis in two ways: Tedious pencil-and-paper work could
be successively transferred to the emerging software applications, so sweat and tears
turned into automated routines. Along with automation, the manageable data
volumes could be dramatically increased due to the exponential growth of computational
memory and speed. Moreover, highly non-linear and complex data analysis
problems came within reach that were completely unfeasible before. Non-linear
curve fitting, clustering and machine learning belong to these modern techniques
that entered the agenda and considerably widened the range of scientific data analysis
applications. Last but not least, they are a further step towards computational
intelligence.
The goal of this book is to provide an interactive and illustrative guide to these
topics. It concentrates on the road from two-dimensional curve fitting to multidimensional
clustering and machine learning with neural networks or support vector
machines. Along the way, topics like mathematical optimization or evolutionary
algorithms are touched upon. All concepts and ideas are outlined in a clear-cut manner
with graphically depicted plausibility arguments and a little elementary mathematics.
Difficult mathematical and algorithmic details are consistently omitted for the
sake of simplicity but are accessible through the cited literature. The major topics are
extensively outlined with exploratory examples and applications. The primary goal
is to be as illustrative as possible, addressing problems and pitfalls rather than
hiding them. The character of an illustrative cookbook is complemented by specific
sections that address more fundamental questions like the relation between machine
learning and human intelligence. These sections may be skipped without affecting
the main road, but they will open up possibly interesting insights beyond the mere
data massage.
All topics are completely demonstrated with the aid of the computing platform
Mathematica and the Computational Intelligence Packages (CIP), a high-level function
library developed with Mathematica’s programming language on top of Mathematica’s
algorithms. CIP is open source, so the detailed code of every method is
freely accessible. All examples and applications shown throughout the book may
be used and customized by the reader without any restrictions. This leads to an
interactive environment which allows individual manipulations, from the rotation of
3D graphics or the evaluation of different settings up to tailored enhancements for
specific functionality.
The book tries to be as introductory as possible, calling only for a basic mathematical
background of the reader, a level that is typically taught in the first year of
scientific education. The target readerships are students of (computer) science and
engineering as well as scientific practitioners in industry and academia who deserve
an illustrative introduction to these topics. Readers with programming skills may
easily port and customize the provided code. The majority of the examples and applications
originate from teaching efforts or from providing solutions. The outline of the
book is as follows:
• The introductory chapter 1 provides necessary basics that underlie the discussions
of the following chapters, like an initial motivation for the interplay of data
and models with respect to the molecular sciences, mathematical optimization
methods or data structures. The chapter may be skipped at first sight but should
be consulted if things become unclear in a subsequent chapter.
• The main chapters that describe the road from curve fitting to machine learning
are chapters 2 to 4. The curve fitting chapter 2 outlines the various aspects of
adjusting linear and non-linear model functions to experimental data. A section
about mere data smoothing with cubic splines complements the fitting discussions.
• The clustering chapter 3 sketches the problems of assigning data to different
groups in an unsupervised manner with clustering methods. Unsupervised clustering
may be viewed as a logical first step towards supervised machine learning,
and may be able to construct predictive systems on its own. Machine learning
methods may also need clustered data to produce successful results.
• The machine learning chapter 4 comprises supervised learning techniques, in
particular multiple linear regression, three-layer feed-forward neural networks
and support vector machines. Adequate data preprocessing, the use of these
techniques for regression and classification tasks, and the recurring pitfalls and
problems are introduced and thoroughly discussed.
• The discussions chapter 5 supplements the topics of the main road. It collects
some open issues neglected in the previous chapters and opens up the scope with
more general sections about the possible discovery of new knowledge or the
emergence of computational intelligence.
The scientific fields touched in the present book are extensive and in addition
constantly and progressively refined. Therefore it is inevitable to neglect an awful lot
of important topics and aspects. The concrete selection always mirrors an author’s
preferences as well as his personal knowledge and overview. Since the missing parts
unfortunately exceed the selected ones and people always have strong feelings about
what is of importance, the final statement has to be a request for indulgence.
Recklinghausen, April 2011 Achim Zielesny
Preface to the second edition
The first edition was kindly reviewed as a useful introductory cookbook for the
novice reader. The second edition tries to keep this character and resists the temptation
to heavily expand topics or lift the discussion to more subtle academic levels.
Besides numerous minor additions and corrections throughout the whole book
(together with the unavoidable introduction of some new errors), the only substantial
extension of the second edition is the addition of Multiple Polynomial Regression
(MPR) in order to support the discussions concerning the method crossover
from linear and near-linear up to highly non-linear machine learning approaches.
As a consequence, several examples and applications have been reworked to improve
readability and line of reasoning. Also, the construction of minimal predictive
models is outlined in an updated and more comprehensible manner.
The second edition is based on the extended version 2.0 of the Computational Intelligence
Packages (CIP), which now allows parallelized calculations that often lead to
considerably improved performance with multiple (or multicore) processors.
Specific parallelization notes are given throughout the book, the description of CIP
is extended accordingly, and reworked examples and applications now make use of
the new functionality.
With this second edition the book hopefully strengthens its original intent to provide
a clear and straight introduction to the fascinating road from curve fitting to
machine learning.
Recklinghausen, February 2016 Achim Zielesny
Acknowledgements
Certain authors, speaking of their works, say, "My book", "My commentary", "My
history", etc. They resemble middle-class people who have a house of their own, and always
have "My house" on their tongue. They would do better to say, "Our book", "Our
commentary", "Our history", etc., because there is in them usually more of other people’s
than their own.
Pascal
Acknowledgements to the first edition
I would like to thank Lhoussaine Belkoura, Manfred L. Ristig and Dietrich Woermann
who kindled my interest for data analysis and machine learning in chemistry
and physics a long time ago.
My mathematical colleagues Heinrich Brinck and Soeren W. Perrey contributed
a lot - may it be in deep canyons, remote jungles or at our institute’s coffee kitchen.
To them and my IBCI collaborators Mirco Daniel and Rebecca Schultz as well as the
GNWI team with Stefan Neumann, Jan-Niklas Schäfer, Holger Schulte and Thomas
Kuhn I am deeply thankful.
The cooperation with Christoph Steinbeck was very fruitful and an exceptional
pleasure: I owe a lot to his support and kindness.
Karina van den Broek, Mareike Dörrenberg, Saskia Faassen, Jenny Grote, Jennifer
Makalowski, Stefanie Kleiber and Andreas Truszkowski corrected the manuscript
with benevolence and strong commitment: Many thanks to all of them.
Last but not least I want to express deep gratitude and love to my companion
Daniela Beisser who not only had to bear an overworked book writer but supported
all stages of the book and its contents with great passion.
Every book is a piece of collaborative work but all mistakes and errors are of
course mine.