Technische Universität München
Department of Mathematics
Master's Thesis
Approximation and Analysis of Probability
Densities using Radial Basis Functions
Fabian Fröhlich
Supervisor: Prof. Dr. Dr. Fabian Theis
Advisor: Dr. Jan Hasenauer
Submission Date: ...
I assure the single-handed composition of this Master's thesis, supported only by the
declared resources.
Garching,
Zusammenfassung
In this thesis we treat a novel method for the approximation of probability densities.
Although the method is applicable to a broad spectrum of problems, we will focus on
probability densities that arise in the analysis of dynamical systems. The probability
densities arising there result from the uncertainties in parameter estimation. This class
of problems is of particular interest, as the analysis of the probability densities can be
non-trivial even for problems with low-dimensional parameter spaces. For problems with
nonlinear correlations between parameters or heavy-tailed probability densities of the
parameters, classical methods for the approximation of marginals, such as Markov chain
Monte Carlo combined with kernel density estimators, perform only moderately well.
It is known that meshfree approximation schemes, such as interpolation with radial basis
functions, are well suited for the approximation of high-dimensional, irregularly distributed
data. This makes them a promising alternative to Markov chain Monte Carlo methods.
In this thesis we will present an alternative approach that approximates the probability
densities of the parameters by a linear combination of Gaussian radial basis functions.
The resulting approximant can be regarded as a mixture of Gaussian densities, so that the
expressions for the marginals and moments of the approximant can be stated in closed form.
In contrast to the interpolation of irregularly distributed data, we are not restricted to
fixed interpolation nodes. This opens up the possibility of also optimizing the choice of
nodes. We will show the optimality properties of certain lattices and subsequently derive
from them an adaptive method for the choice of nodes. Furthermore, we will introduce
several improvements that significantly increase the approximation accuracy of the
adaptive procedure.
Using the example of a bivariate mixture of two Gaussian densities, we were able to show
that the L_2 approximation error for marginals obtained with radial basis functions is
several orders of magnitude smaller than that of kernel density estimators for a limited
number of function evaluations. Moreover, we were able to successfully apply the method
to problems with up to 4-dimensional parameter spaces. Due to restrictions on the
dimensionality of the parameter space, we see the main field of application of the method
in low-dimensional problems of high complexity.
Abstract
In this thesis we will present a novel method for the approximation of probability densi-
ties. Although the method is applicable to a wide range of problems, we will focus on
the approximation of probability densities that arise in the estimation of parameters of
dynamical systems. These probability densities reflect the uncertainty of the parame-
ter estimates. This class of problems is of special interest, as the analysis of the arising
densities can easily be non-trivial, even for problems with low-dimensional parameter
spaces. In the presence of nonlinear correlations between parameters or heavy tails in the
probability density of the parameters, classical approximation methods for marginals, such
as Markov chain Monte Carlo combined with kernel density estimators, perform only
moderately well.
Approximation schemes using radial basis function networks are known to perform well
for the interpolation of high-dimensional scattered data. This makes them a promis-
ing alternative to kernel density estimation methods. In this thesis we will present a
novel approach which approximates the probability densities of parameters with a linear
combination of Gaussian radial basis functions. As the resulting approximation can be
seen as a mixture of Gaussians, the expressions for marginals and moments of the
approximant can be given in closed form.
In contrast to scattered data interpolation and kernel density estimation, one is not limited
to fixed interpolation nodes. This facilitates the optimization of the choice of interpolation
nodes. We will show the optimality properties of certain lattices and propose a novel
algorithm for the generation of lattices restricted to superlevel sets. Subsequently we will
motivate and present an adaptive method for the generation of nodes based on interacting
particles. Moreover, we will introduce several improvements to the adaptive method that
significantly increase the approximation accuracy.
Based on the example of a bivariate mixture of two Gaussians, we will show that our
method yields an L_2 approximation error for marginals that is several orders of magnitude
lower compared to approximations using Markov chain Monte Carlo combined with kernel
density estimators. Moreover, we will show that the method works for dynamical systems
with up to 4-dimensional parameter spaces. Due to limitations in terms of the dimensionality
of the parameter space, we currently envision the method to be mainly applied to low-
dimensional problems of high computational complexity.
Contents
1 Introduction 1
1.1 Parameter Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Maximum Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Identifiability . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.3 Bayesian Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.4 Marginals, Moments and Profiles . . . . . . . . . . . . . . . . . . . 6
1.2 Mathematical Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.1 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.2 Markov Chain Monte Carlo . . . . . . . . . . . . . . . . . . . . . . 8
1.2.3 Kernel Density Estimate . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Contributions of this Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4 Example of Gaussian Mixture . . . . . . . . . . . . . . . . . . . . . . . . . 11
2 Interpolation using Radial Basis Functions 13
2.1 Interpolation Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Positive Definite Functions . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Error Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3.1 The Native Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3.2 Error Estimates for the Native Space . . . . . . . . . . . . . . . . . 18
2.4 Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4.1 Trade-off Principle . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4.2 Finding the optimal Shape Parameter . . . . . . . . . . . . . . . . . 20
2.4.3 Improved Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.5 Non-negativity of the Density . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.5.1 Generalized Hermite Interpolation . . . . . . . . . . . . . . . . . . . 24
2.5.2 Compactly Supported Radial Basis Functions . . . . . . . . . . . . 26
2.5.3 Moving Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.5.4 Scaled Moving Least Squares . . . . . . . . . . . . . . . . . . . . . 29
2.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3 Generation of Centers 31
3.1 Connection to Voronoi Cells . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2 Lattices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.2.1 Restriction of Interpolation Area . . . . . . . . . . . . . . . . . . . 36
3.2.2 Algorithm to generate restricted lattices . . . . . . . . . . . . . . . 36
3.2.3 Discussion of the Algorithm . . . . . . . . . . . . . . . . . . . . . 38
3.2.4 Choice of Lattice . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.2.5 Reparametrisation . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2.6 Local Reparametrisation . . . . . . . . . . . . . . . . . . . . . . . . 42
3.3 Adaptive Refinement . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.3.1 Self Organizing Potential . . . . . . . . . . . . . . . . . . . . . . . . 44
3.3.2 Original Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.4 Improvements to the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 46
3.4.1 Modified Interpolation of D_p . . . . . . . . . . . . . . . . . . . . 46
3.4.2 Modified Density Function . . . . . . . . . . . . . . . . . . . . . . 48
3.4.3 Improved Stopping Criterion . . . . . . . . . . . . . . . . . . . . . . 53
3.4.4 Extension to Higher Dimensions . . . . . . . . . . . . . . . . . . . . 57
3.4.5 Extension to Unknown Domains . . . . . . . . . . . . . . . . . . . . 57
3.4.6 Modified Spawning . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.4.7 Iterative Reparametrisation . . . . . . . . . . . . . . . . . . . . . . 61
3.4.8 Local Kernel Shape . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4 Analysis of the Interpolant 65
4.1 Formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.2 Numerical Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5 Application of the Algorithm 73
5.1 Reversible First Order Reaction . . . . . . . . . . . . . . . . . . . . . . . . 73
5.2 Enzymatic Catalysation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.3 Stochastic Gene Expression . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6 Conclusion and Outlook 85
6.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.2 Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
Bibliography 88
Chapter 1
Introduction
The probability density function p(x) of a continuous random variable X is a fundamental
concept in probability theory. The function p(x) describes the relative likelihood that X
takes the value x. For a random variable X, the probability that X takes values in any
measurable set A is given as follows:
P(X ∈ A) = ∫_A p(x) dµ.
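For instance, for a standard normal random variable X and A = [0, 1], this probability can be checked numerically; the following Python sketch (scipy is assumed here purely for illustration) compares the quadrature value of the integral with the closed-form difference of the cumulative distribution function:

    from scipy import integrate, stats

    # Density p(x) of a standard normal random variable X
    p = stats.norm(loc=0.0, scale=1.0).pdf

    # P(X in A) for A = [0, 1]: quadrature of p over A vs. the closed form
    prob_quad, _ = integrate.quad(p, 0.0, 1.0)
    prob_cdf = stats.norm.cdf(1.0) - stats.norm.cdf(0.0)
    print(prob_quad, prob_cdf)  # both approximately 0.3413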
In statistics, one often deals with the problem where one is given a set of samples {ξ^k}_{k=1}^N, where

ξ^k ∼ X  (i.i.d.)
are independent realizations of the random variable X and wants to estimate the cor-
responding probability density of X from these samples. Methods commonly used to
achieve this can be divided into two distinct categories. There are parametric estima-
tion techniques where it is assumed that the samples are drawn from a known parametric
family of distributions. In this context the goal is to estimate the parameters of the
specific member of the parametric family. In the setting of nonparametric density es-
timation, one does not make any strong assumptions about the probability distribution,
except the existence of a corresponding density. Two basic references covering parametric,
as well as nonparametric density estimation are the books (Ramsay and Scott, 1993) and
(Silverman, 1986).
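To make the distinction concrete, the following toy sketch in Python (numpy and scipy are assumptions of this illustration; the thesis itself does not prescribe an implementation) fits a parametric Gaussian to samples and contrasts it with a nonparametric kernel density estimate:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    samples = rng.normal(loc=2.0, scale=0.5, size=1000)

    # Parametric: assume the samples come from a Gaussian family and
    # estimate its parameters (mean and standard deviation)
    mu_hat, sigma_hat = samples.mean(), samples.std(ddof=1)
    parametric_pdf = stats.norm(mu_hat, sigma_hat).pdf

    # Nonparametric: Gaussian kernel density estimate, no family assumed
    kde = stats.gaussian_kde(samples)

    x = np.linspace(0.0, 4.0, 201)
    max_gap = np.max(np.abs(parametric_pdf(x) - kde(x)))
    print(max_gap)  # small here, since the true density is in fact Gaussian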
In this thesis we will consider a related problem, where we are given a density function p(x)
for which we have no closed-form expression but which is numerically evaluable up to some
scaling factor. To analyze this density, it is usually necessary to give an approximation
to the density and carry out the analysis on the approximant. One possible approach for
this problem is to generate independent, identically distributed samples from the random
variable associated to the density p(x) and use classical methods for density estimation.
In this thesis we will investigate an alternative approach that enforces certain regularity
conditionsonthesamplingschemeandemploysaparametricdensityestimationtechnique
that only assumes a certain smoothness of the density function.
In Chapter 1 we will introduce the problem and present the commonly used techniques for
the analysis of the probability density. In Chapter 2 we will present the relevant theory
for the approximation using radial basis functions. Subsequently Chapter 3 treats the
problem of generating optimal interpolation nodes and Chapter 4 gives the closed-form
expressions for the analysis of the approximant. In Chapter 5 we apply the proposed
method to several models for dynamical and stochastic systems. Finally we present our
conclusions and an outlook on the method in Chapter 6.
1.1 Parameter Inference
Given a set of (multi-dimensional) time-resolved data points

D = {(t_k, ȳ(t_k))}_{k=1}^{n_t},

where t = (t_1, ..., t_{n_t})^T is the vector containing the time points at which the measurements
ȳ(t_k) were taken. We are interested in explaining the measured data with the help of a
model that is governed by ordinary differential equations:

Σ(θ) :   ċ(t) = f(t, c, θ),   c(0) = c_0(θ),
         y(t; θ) = h(c, θ).
Here c(t) : [0,∞) → R^{n_c} is the function describing the time-dependent evolution of
the physical states of the system, such as the concentration of a certain substance. These
physical states follow the differential equations described by the parameters θ ∈ Ω_θ ⊆ R^{n_θ}_+,
the vector field f : [0,∞) × R^{n_c} × Ω_θ → R^{n_c} and the initial conditions c_0(θ) : Ω_θ → R^{n_c}.
Normally it is not possible to directly measure the physical states of a system, hence it is
necessary to introduce the function y(t; θ) : [0,∞) × R^{n_c} → R^{n_y} describing the observable
quantities of the system. The physical states c(t) and the observables y(t) are brought
into relationship by the filter h(c, θ) : R^{n_c} × Ω_θ → R^{n_y}, which in many cases simply is a
projection. In the following, we will assume that all involved functions have sufficient
smoothness for existence and uniqueness of solutions.
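A minimal computational sketch of such a system Σ(θ) is given below. The concrete model (a reversible conversion between two species) and all names, in particular f, h and observe, are hypothetical choices for this illustration, with scipy assumed as the ODE solver:

    import numpy as np
    from scipy.integrate import solve_ivp

    def f(t, c, k1, k2):
        # Vector field of a reversible conversion A <-> B, theta = (k1, k2)
        a, b = c
        return [-k1 * a + k2 * b, k1 * a - k2 * b]

    def h(c):
        # Observation filter: only the second species is measurable (a projection)
        return c[1]

    def observe(theta, timepoints, c0):
        # Solve Sigma(theta) for the states c(t) and return the observables y(t; theta)
        sol = solve_ivp(f, (0.0, timepoints[-1]), c0, t_eval=timepoints, args=theta)
        return np.array([h(c) for c in sol.y.T])

    t = np.linspace(0.0, 10.0, 25)
    y_model = observe(theta=(1.0, 0.5), timepoints=t, c0=[1.0, 0.0])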
1.1.1 Maximum Likelihood
Unless stated otherwise, we will assume that the data D can be explained by the
system Σ(θ) under normally distributed measurement noise:
ȳ_j(t_k) = y_j(t_k; θ^true) + ε_j^k,   y(t; θ^true) sol. to Σ(θ^true),   ε_j^k ∼ N(0, σ_jk(θ^true)).
Hence we can state the conditional probability density to observe the measured data given
a set of parameters θ and the model Σ(θ):
p(D|θ) = ∏_{k=1}^{n_t} ∏_{j=1}^{n_y} 1/(σ_jk √(2π)) · exp(−(ȳ_j(t_k) − y_j(t_k; θ))² / (2σ_jk²)).    (1.1.1)
This expression is called the likelihood. It is possible to interpret it as a function of θ and
to find its maximizer. The resulting maximum-likelihood estimator (MLE)

θ̂^MLE = argmax_{θ ∈ Ω_θ} p(D|θ)
gives the parameter that maximizes the probability to observe the data D. The MLE
returns a single point estimate and does not assess the uncertainty of the returned value. Al-
though it is possible to compute confidence intervals (Joshi et al., 2006) and thus quantify
the uncertainty of the estimator, we still need to find a way to evaluate this uncertainty.
To do this in a sophisticated fashion we will introduce the concept of identifiability.
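In practice one works with the logarithm of (1.1.1) to avoid numerical underflow, and computes θ̂^MLE by minimizing the negative log-likelihood. A sketch under the assumptions of the previous snippet (the hypothetical observe function, a single observable, a constant noise level sigma, and scipy's optimizer with box bounds standing in for Ω_θ):

    import numpy as np
    from scipy.optimize import minimize

    def log_likelihood(theta, timepoints, y_measured, c0, sigma):
        # Logarithm of the likelihood (1.1.1) for one observable, constant sigma
        y_model = observe(theta, timepoints, c0)
        residuals = (y_measured - y_model) / sigma
        n = y_measured.size
        return -0.5 * np.sum(residuals**2) - n * np.log(sigma * np.sqrt(2.0 * np.pi))

    # Synthetic data generated at the "true" parameters, perturbed by noise
    rng = np.random.default_rng(1)
    sigma = 0.05
    y_data = observe((1.0, 0.5), t, [1.0, 0.0]) + sigma * rng.standard_normal(t.size)

    result = minimize(
        lambda theta: -log_likelihood(theta, t, y_data, [1.0, 0.0], sigma),
        x0=np.array([0.3, 0.3]),
        bounds=[(1e-6, 10.0), (1e-6, 10.0)],  # stand-in for the domain Omega_theta
    )
    theta_mle = result.x  # maximizer of p(D | theta)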
1.1.2 Identifiability
For this thesis, it is sufficient to introduce two different kinds of identifiability. First
there is structural identifiability:

Definition 1.1 ((Hasenauer and Theis, 2012)). Global structural identifiability
A parameter θ_i is globally structurally identifiable if for almost any θ ∈ Ω_θ

∀t : y(t, θ) = y(t, θ′)  ⟹  θ = θ′.
Structural identifiability does not depend on the measured data but is a property of the
model. It tells us whether the observables of the model are injective almost everywhere
with respect to the parameters. Hence the only possibility to remove structural non-
identifiabilities is to make changes to the model.
Second, there is practical identifiability, which depends on the available data as well as
on the model and the employed estimator. As practical identifiability is not clearly defined
in the literature, we will give an independent definition here:
Definition 1.2 ((Hasenauer and Theis, 2012)). Practical identifiability (MLE)
A parameter θ_i is practically identifiable from data D if there exist θ_min, θ_max ∈ Ω_θ such
that

∄ θ_i < θ_i,min : p(D|θ) > exp(−∆_α/2) p(D|θ̂^MLE)
and ∄ θ_i > θ_i,max : p(D|θ) > exp(−∆_α/2) p(D|θ̂^MLE).    (1.1.2)
For a system to be practically identifiable, we look at the spread of the probability density
above a certain threshold. This is illustrated in Figure 1.1.1, where one example of an
identifiable parameter and one of a non-identifiable parameter is given. Practical identifiability
of a system essentially depends on the choice of this threshold. Therefore we should make
a reasonable choice. When we have a likelihood as in (1.1.1), we can use a result from
linear regression models which is also a good approximation for ODE models (Raue et al.,
2009):
2(−log p(D|θ^true) + log p(D|θ̂^MLE)) ∼ χ²(·|n_θ).
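The threshold ∆_α in (1.1.2) is then chosen as the α-quantile of this χ² distribution with n_θ degrees of freedom; a one-line sketch (again assuming scipy, with illustrative values):

    from scipy.stats import chi2

    n_theta = 2    # dimensionality of the parameter space (illustrative value)
    alpha = 0.95   # desired confidence level (illustrative value)
    delta_alpha = chi2.ppf(alpha, df=n_theta)  # threshold Delta_alpha in (1.1.2)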
Description:System auftreten, fokussieren. for the interpolation of high-dimensional scattered data In statistics, one often deals with the problem where one is given a set of samples . the measurement is taken and inference is carried out.