Table Of ContentOn Higher-Order Probabilistic Subrecursion
(Long Version)∗
Flavien Breuvart Ugo Dal Lago Agathe Herrou
7
1 Abstract
0
We study the expressive power of subrecursive probabilistic higher-order calculi. More
2
specifically,weshowthatendowingaveryexpressivedeterministiccalculuslikeG¨odel’sTwith
n
various forms of probabilistic choice operators may result in calculi which are not equivalent
a as for the class of distributions they give rise to, although they all guarantee almost-sure
J
termination. Along the way, we introduce a probabilistic variation of the classic reducibility
7
technique, and we prove that the simplest form of probabilistic choice leaves the expressive
1
power of T essentially unaltered. The paper ends with some observations about functional
expressivity: expectedly, all the considered calculi represent precisely the functions which T
]
O itself represents.
L
s. 1 Introduction
c
[
Probabilistic models are more and more pervasive in computer science and are among the most
1 powerfulmodelingtoolsinmanyareaslikecomputervision[20],machinelearning[19]andnatural
v
language processing [17]. Since the early times of computation theory [8], the very concept of an
6
algorithm has been itself generalised from a purely deterministic process to one in which certain
8
7 elementary computation steps can have a probabilistic outcome. This has further stimulated
4 researchin computation and complexity theory [11], but also in programming languages [21].
0 Endowingprogramswithprobabilisticprimitives(e.g. anoperatorwhichmodelssamplingfrom
.
1 a distribution) poses a challenge to programming language semantics. Already for a minimal, im-
0 perative probabilistic programming language, giving a denotational semantics is nontrivial [16].
7 When languages also have higher-order constructs, everything becomes even harder [14] to the
1
pointofdisruptingmuchofthebeautifultheoryknowninthedeterministiccase[1]. Thishasstim-
:
v ulated research on denotational semantics of higher-order probabilistic programming languages,
Xi with some surprising positive results coming out recently [9, 4].
Notmuchisknownaboutthe expressivepowerofprobabilistic higher-ordercalculi,asopposed
r
a to the extensive literature on the same subject about deterministic calculi (see, e.g. [24, 23]).
What happens to the class of representable functions if one enrich, say, a deterministic λ-calculus
X with certain probabilistic choice primitives? Are the expressive power or the good properties
of X somehow preserved? These questions have been given answers in the case in which X is
the pure, untyped, λ-calculus [6]: in that case, univesality continues to hold, mimicking what
happensinTuringmachines[22]. ButwhatifX isoneofthemanytypedλ-calculiensuringstrong
normalisation for typed terms [12]?
Butletusdoastepback,first: whenshouldahigher-orderprobabilisticprogrambeconsidered
terminating? The questioncanbegivenasatisfactoryanswerbeinginspiredby,e.g.,recentworks
on probabilistic termination in imperative languages and term rewrite systems [18, 2]: one could
ask the probability ofdivergenceto be 0, calledalmost sure termination property,orthe stronger
positive almost sure termination, in which one requires the averagenumber of evaluation steps to
be finite. That termination is desirable property, even in a probabilistic setting can be seen, e.g.
∗TheauthorsarepartiallysupportedbyANRproject14CE250005ELICAandANRproject12IS02001PACE.
1
in the fieldof languageslike Church and Anglican,in whichprogramsare oftenassumedto be
almost surely terminating, e.g. when doing inference by MH algorithms [13].
In this paper, we initiate a study on the expressive power of terminating higher-order calculi,
in particular those obtained by endowing G¨odel’s T with various forms of probabilistic choice
operators. In particular, three operators will be analyzed in this paper:
• A binary probabilistic operator ⊕ such that for every pair of terms M,N, the term M ⊕N
evaluates to either M or N, each with probability 1. This is a rather minimal option, which,
2
however,guarantees universality if applied to the untyped λ-calculus [6] (and, more generally,
to universal models of computation [22]).
• A combinator R, which evaluates to any natural number n ≥0 with probability 1 . This is
2n+1
the natural generalization of ⊕ to sampling from a distribution having countable rather than
finite support. This apparently harmless generalization (which is absolutely non-problematic
inauniversalsetting)hasdramaticconsequencesinasubrecursivescenario,aswewilldiscover
soon.
• A combinatorX such that for everypair of values V,W, the term XV W evaluates to either W
or V(XV W), eachwith probability 1. The operatorX canbe seen as a probabilistic variation
2
onP C F ’sfixpointcombinator. As such,Xispotentiallyproblematictotermination,givingrise
to infinite trees.
This way, various calculi can be obtained, like T⊕, namely a minimal extension of T, or the full
calculus T⊕,R,X, in which the three operators are all available. In principle, the only obvious fact
about the expressive power of the above mentioned operators is that both R and X are at least as
expressiveas ⊕: binarychoice canbe easily expressedby either R orX. Less obviousbut still easy
to prove is the equivalence between R and X in presence of a recursive operator (see Section 3.3).
But how about, say, T⊕ vs. TR?
Traditionally, the expressivities of such languages are compared by looking at the set of func-
tions f ∶ N → N defined by typable programs M ∶ NAT → NAT. However, in probabilistic setting,
programs M ∶ NAT → NAT computes functions from natural numbers into distributions of nat-
ural numbers. In order to fit usual criterions, we need to fix a notion of observation. There
are at least two relevant notions of observations, corresponding to two randomised programming
paradigms, namely Las Vegas and Monte Carlo observations. The main question, then, consists
inunderstandinghowtheobtainedclassesrelatetoeachother,andtothe classofT-representable
functions. Alongtheway,however,wemanagetounderstandhowtocapturetheexpressivepower
of probabilistic calculi per se. This paper’s contributions can be summarised as follows:
• We first take a look at the full calculus T⊕,R,X, and prove that it enforces almost-sure termi-
nation, namely that the probability of termination of any typable term is 1. This is done by
appropriatelyadaptingthewell-knownreducibilitytechnique[12]toaprobabilisticoperational
semantics. We then observe that while T⊕,R,X cannot be positively almost surely terminating,
T⊕ indeed is. This already shows that there must be a gap in expressivity. This is done in
Section 3.
• In Section 4, we look more precisely at the expressive power of T⊕, proving that the mere
presence of probabilistic choice does not add much to the expressive power of T: in a sense,
probabilistic choice can be “lifted up” to the ambient deterministic calculus.
• We look at other fragments of T⊕,R,X and at their expressivity. More specifically, we will prove
that (the equiexpressive)TR and TX representprecisely what T⊕ cando at the limit, in a sense
which will be made precise in Section 3. This part, which is the most challenging, is done in
Section 5.
• Section6 is devotedto provingthatboth forMonte Carlo andfor Las Vegas observations,the
class of representable functions of TR coincides with the T-representable ones.
2 Probabilistic Choice Operators, Informally
AnytermofG¨odel’sTcanbeseenasapurelydeterministiccomputationalobjectwhosedynamics
is finitary, due to the well-knownstrong normalizationtheorem (see, e.g., [12]). In particular, the
2
apparent non-determinism due to multiple redex occurrences is completely harmless because of
confluence. Inthispaper,indeed,weevenneglectthisproblem,andworkwithareductionstrategy,
namely weak call-by-value reduction (keeping in mind that all what we will say also holds in call-
by-name). Evaluationof a T-termM of type NAT canbe seen as a finite sequence of terms ending
in the normal form n of M (see Figure 1). More generally, the unique normal form of any term
T term M will be denoted as M . Noticeably, T is computationally very powerful. In particular,
the T-representablefunctions from N to N coincide with the functions which are provablytotal in
Peano’s arithmetic [12]. J K
As we already mentioned, the most natural way to enrich deterministic calculi and turn them
into probabilistic ones consists in endowing their syntax with one or more probabilistic choice
operators. Operationally,eachof them models the essentially stochastic process of sampling from
a distribution and proceeding depending on the outcome. Of course, one has many options here
as for which of the various operators to grab. The aim of this work is precisely to study to which
extend this choice have consequences on the overallexpressive power of the underlying calculus.
Suppose, for example, that T is endowed with the binary probabilistic choice operator ⊕ de-
scribed in the Introduction, whose evaluation corresponds to tossing a fair coin and choosing one
of the two arguments accordingly. The presence of ⊕ has indeed an impact on the dynamics of
the underlying calculus: the evaluation of any term M is not deterministic anymore, but can be
modeled as a finitely branching tree (see, e.g. Figure 3 for such a tree when M is (3⊕4)⊕2).
The fact that all branches of this tree have finite height (and the tree is thus finite) is intuitive,
and a proof of it can be given by adapting the well-known reducibility proof of termination for T.
In this paper, we in fact prove much more, and establish that T⊕ can be embedded into T.
If ⊕ is replaced by R, the underlying tree is not finitely branching anymore, but, again, there
is not (at least apparently) any infinitely long branch, since each of them can somehow be seen
as a T computation (see Figure 2 for an example). What happens to the expressive power of the
obtained calculus? Intuition tells us that the calculus should not be too expressive viz. T⊕. If
⊕ is replaced by X, on the other hand, the underlying tree is finitely branching, but its height
can well be infinite. Actually, X and R are easily shown to be equiexpressive in presence of higher-
order recursion, as we show in Section 3.3. On the other hand, for R and ⊕, no such encoding is
available. Nonetheless, TR can still be somehow encoded embedded into T (just that we need an
infinite structure) as we will detail in Section 5. From this embeding, we can show that applying
Monte Carlo or Las Vegas algorithmon T⊕,X,R results do not add any expressive power to T, This
is done in Section 6.
3 The Full Calculus T⊕,R,X
All along this paper, we work with a calculus T⊕,R,X whose terms are the ones generated by the
following grammar:
M,N,L ∶∶ x ∣ λx.M ∣ M N ∣ M,N ∣ π ∣ π
= ⟨ ⟩ 1 2
∣ rec ∣ 0 ∣ S ∣ M ⊕N ∣ R ∣ X.
Please observe the presence of the usual constructs from the untyped λ-calculus, but also of
primitive recursion, constants for natural numbers, pairs, and the three choice operators we have
described in the previous sections.
As usual, terms are taken modulo α-equivalence. Terms in which no variable occurs free are,
as usual, dubbed closed, and are collected in the set T⊕,R,X. A value is simply a closed term from
C
the following grammar
U,V ∶∶ λx.M ∣ π ∣ π ∣ U,V ∣ rec ∣ 0 ∣ S ∣ S V ∣ X,
= 1 2 ⟨ ⟩
and the set of values is T⊕,R,X. Extended values are (not necessarily closed) terms generated by
V
the same grammar as values with the addition of variables. Closed terms that are not values are
3
called reducible and their set is denoted T⊕,R,X. A context is a term with a unique hole:
R
C ∶= ⋅ ∣ λx.C ∣ C M ∣ M C ∣ ⟨C,M⟩ ∣ ⟨M,C⟩ ∣ C⊕M ∣ M ⊕C
We write T⊕,R,X forLtMhe set of all such contexts.
L⋅M
Termination of G¨odel’s T is guaranteed by the presence of types, which we also need here.
Types are expressions generated by the following grammar
A,B ∶∶=NAT ∣ A→B ∣ A×B.
Environmental contexts areexpressionsofthe formΓ=x1∶A1,...,xn∶An, whiletyping judgments
are of the form Γ⊢M∶A. Typing rules are given in Figure 5. From now on, only typable terms
will be considered. We denote by T⊕,R,X A the set ofterms of type A, and similarly for T⊕,R,X A
( ) C ( )
and T⊕,R,X A . We use the shortcut n for values of type NAT: 0 is already part of the language of
V ( )
terms, while n+1 is simply S n.
3.1 Operational Semantics
While evaluating terms in a deterministic calculus ends up in a value, the same process leads to
a distribution of values when performed on terms in a probabilistic calculus. Formalizing all this
requiressomecare,butcanbe donefollowingoneofthe manydefinitionsfromthe literature(e.g.,
[6]).
GivenacountablesetX,adistribution onX isaprobabilistic(sub)distributionoverelements
L
of X:
L,M,N ∈D(X)={f ∶X → 0,1 ∑ f x ≤1
[ ]∣ ( ) }
x∈X
We are especially concerned with distributions over terms here. In particular, a distribution of
type A is simply an element of D T⊕,R,X A . The set D T⊕,R,X is ranged over by metavari-
( ( )) ( V )
ables like , , . We will use the pointwise order on distributions, which turns them into an
U V W ≤
ωCPO. Moreover, we use the following notation for Dirac’s distributions over terms: M ∶=
{ }
M ↦1
. Thesupportofadistributionisindicatedas ; wealsodefinethereducible
{N ↦0 if M ≠N} ∣M∣
andVvhaaluveesaunppoborvtisoufrsaagnmdenntastausra∣Ml m∣Rea∶n=i∣nMg:∣∩foTr⊕Ra,Rn,Xyand∣MD∣VX∶=a∣Mnd∣∩YT⊕V,XR,X,.tNheontionsYlikxeMRanxd
M M∈ ⊆ M =M
if x Y and Y x 0 otherwise. ( ) ( ) ( )
∈ M =
( )
Assyntacticsugar,weuseintegralnotationstomanipulatedistributions,i.e.,foranyfamilyof
distributions(NM)M∈T⊕,R,X ∶D(T⊕,R,X)T⊕,R,X,theexpression∫MNM.dM standsfor∑M∈T⊕,R,XM(M)⋅
(by abuse of notation, we may define only for M , since the others are not used
NM NM ∈ M
∣ ∣
anyway). Thenotationcanbeeasilyadapted,e.g.,tofamiliesofrealnumbers(pM)M∈T⊕,R,X andto
other kinds of distributions. We indicate as C the push-forward distribution C M dM
M ∫M{ }
induced by a context C, and as the norm 1dM of . Remark, finally, that we have
the useful equality M d∣∣MM.∣∣The intLegra∫MlMcan be maMnipulated as usual integraLls wMhich
M = ∫M{ }
respect the less usual equation:
dN dN dM (1)
∫(∫MNMdM)LN = ∫M(∫NM LN )
Reduction rules of T⊕,R,X are given by Figure 6. For simplicity, we use the notation M →?
M
for M → , i.e., M → whenever M is reducible and M whenever M is a value. The
M M M=
{ } { }
notation permit to rewrite rule r- as:
∈
( )
∀M ∈ M, M →?NM
∣ ∣ (r-∈)
→ .dM
M ∫MNM
4
M →⋯→n 3⊕4 ⊕2 X S 3
( )
1 1
Figure 1: A Reduction in T 1 1 2 2
2 2 3 S X S 3
( )
R 3⊕4 2 12 12
4 SS X S 3
1 1 ( )
12 41 18 116 2 2 12 12
0 1 2 3 4 3 4 5 ⋱
Figure 2:A Reduction in TR Figure 3:A Reduction in T⊕ Figure 4:A Reduction in TX
Γ,x A M B Γ M A B Γ N A
∶ ⊢ ∶ ⊢ ∶ → ⊢ ∶
Γ,x A x A Γ λx.M A B Γ M N B
∶ ⊢ ∶ ⊢ ∶ → ⊢ ∶
Γ 0 NAT Γ S NAT NAT Γ rec A NAT A A NAT A
⊢ ∶ ⊢ ∶ → ⊢ ∶ ×( → → )× →
Γ M A Γ N B
⊢ ∶ ⊢ ∶
Γ M,N A B Γ π1 A B A Γ π2 A B B
⊢⟨ ⟩∶ × ⊢ ∶( × )→ ⊢ ∶( × )→
Γ M A Γ N A
⊢ Γ∶ M⊕N ∶⊢A ∶ Γ R∶NAT Γ X∶ A A ×A A
⊢ ⊢ ⊢ ( → ) →
Figure 5: Typing Rules.
M
(r-β) →M (r-@L)
λx.M V M V x M V V
[ / ]} M
( ) →{ →
N M M
→N (r-@R) →M (r-⟨⋅⟩L) →M (r-⟨⋅⟩R)
M N M M,N ,N V,M V,
N M M
→ ⟨ ⟩→⟨ ⟩ ⟨ ⟩→⟨ ⟩
(r-rec0) (r-recS)
rec U,V,0 U rec U,V,S n V n rec U,V,n
⟨ ⟩→{ } ⟨ ⟩→{ ( ⟨ ⟩)}
(r-π1) (r-π2) (r-R)
π1 ⟨V,U⟩→{V} π2 ⟨V,U⟩→{U} R→{n↦ 2n1+1}n∈
N
(r-⊕) (r-X)
M 1 V X V,W 1
M⊕N →{N ↦↦ 212} X⟨V,W⟩→{ ( W⟨ ⟩) ↦↦ 212}
M , M V , = V
∀ ∈ MR NM ∀ ∈ MV NV
∣ ∣ → ∣ ∣ { } (r-∈)
.dM
M→∫MNM
Figure 6: Operational Semantics.
5
Notice that the reduction → (resp. →?) is deterministic. We can easily define →n (resp. →≤n)
as the nth exponentiation of → (resp. →?) and →∗ as the reflexive and transitive closure of →
(and →?). In probabilistic systems, we might wantto consider infinite reductions such as the ones
induced by X λx.x 0 which reduces to 0 , but in an infinite number of steps. Remark that
( ) { }
for any value V, and whenever → , it holds that V V . As a consequence, we can
M N M ≤N
( ) ( )
proceed as follows:
Definition1. LetM beatermandlet Mn n∈N betheuniquedistributionfamilysuchthatM →≤n
( )
. The evaluation of M is the value distribution
Mn
M ∶={V ↦nl→im∞Mn(V)}∈D(T⊕V,R,X).
The success of is its probaJbiliKty of normalization, which is formally defined as the norm of its
M
evaluation,i.e.,Succ M ∶= M . M∆nV standsfor V ↦Mn V −Mn−1 V ,thedistributions
( ) ∣∣ ∣∣ { ( ) ( )}
of values reachable in exactly n steps. The averagereduction length from M is then
J K
M ∶= ∑ n⋅ M∆nV ∈ N ∪ +∞
[ ] ( ∣∣ ∣∣) { }
n
Notice that, by Rule r- , the evaluation is continuous:
∈
( )
M dM.
M =∫
M
Any term M of type NAT → NAT repreJsentKs a funJctioKn g ∶ N → D N iff for every n,m it holds
that g n m M n m . ( )
=
( )( ) ( )
J K
3.2 Almost-Sure Termination
We now have all the necessary ingredients to specify a quite powerful notion of probabilistic com-
putation. When, precisely, should such a process be considered terminating? Do all probabilistic
branches (see figures 1-4) need to be finite? Or should we stay more liberal? The literature on
the subjectisunanimouslypointingtothenotionofalmost-sure termination: aprobabilisticcom-
putation should be considered terminating if the set of infinite computation branches, although
not necessarily empty, has null probability [18, 10, 15]. This has the following incarnation in our
setting:
Definition 2. A term M is said to be almost-surely terminating (AST) iff Succ M 1.
=
( )
This section is concerned with proving that T⊕,R,X indeed guarantees almost-sure termination.
This will be done by adapting Girard-Tait’s reducibility technique. Before giving the main result,
some auxiliary lemmas are necessary.
Lemma 3. If it exist, the reduction of any integral dM → , is the integral dM
L ∫MNM L ∫MLM
such that → for any M .
NM LM ∈ M
∣ ∣
Proof. • Let such that → for any M . Then in order to derive
(LM)M∈∣M∣ NM LM ∈ ∣M∣
→ , the only rule we can apply is r- so that there is ′ such
NM LM ( ∈) (LM,N)M∈∣M∣,N∈∣NM∣
that LM = ∫NML′M,NdN and N →? L′M,N. Notice that for N ∈ ∣NM∣ ∩ ∣NM′∣, the de-
terminism of →? gives that ′ ′ , thus we can define ′ without ambiguity for
LM,N = LM′,N LN
any N dM . Then we have N →? ′ and ′ dN
∈ ⋃M∈∣M∣∣NM∣ = ∣∫MNM ∣ LN ∫∫MNMdMLN =
′ dN dM dM (we use Eq (1)). Thus, by applying rule r- we get
∫M(∫NMLN ) = ∫MLM ( ∈)
dM → dM.
∫MNM ∫MLM
• Conversely,weassumethat dM → . Inordertoderiveit,theonlyrulewecanapplyis
∫MNM L
r- sothatthereis ′ suchthat ′ dN ′ dN dM
( ∈) (LN)N∈∣∫MNMdM∣ L=∫(∫MNMdM)LN =∫M(∫NM LN )
and N →?L′N. We conclude by setting LM ∶=∫NM L′NdN.
6
Lemma 4. The nth reductions of any integral dM →n , is (if it exists) the integral
L ∫MNM L
dM such that →n for any M .
∫MLM NM LM ∈∣M∣
Proof. By induction on n:
• If n 0 then .
= LM =N
• Otherwise, dM → ′→n−1 . ByLemma 3,the firststepis possible iff ′ ′ dM
∫MNM L L L =∫MLM
with → ′ for any M . By IH, the remaining steps are then possible iff
NdMM suLchMthat ′ →n−1∈ ∣M∣for any M . L =
∫MLM LM LM ∈∣M∣
Lemma 5. If (M N)→n L+W then there is M,N ∈D(T⊕,R,X) and U,V ∈D(T⊕V,R,X) such that
M →∗ M+U, N →∗N +V and U V →∗L′+W.
( )
Proof. By induction on n.
Trivial if n 0.
=
If n 1, we proceed differently depending whether M and N are values or not.
≥
• If N is not a value then N →N′ and M N′ →n−1L+W.
( )
aNnodticLe =th∫aNt′(LMN′NdN′)′∶=in∫Nsu′{cMh aNw′}adyNth′.atUsfoinrgaLneymNm′a∈4∣,Nw′e∣,c(aMn dNec′o)m→p≤ons−e1WLN=′∫+NW′WNN′.′dBNy′
induction we have N′ →∗ NN′ +VN′ and M →∗ MN′ +UN′ with (UN′ VN′)→∗ L′N′ +WN′ for
allN′∈∣N∣′. Summing upwe haveM∶=∫N′MN′dN′, U ∶=∫N′UN′dN′, N ∶=∫N′NN′dN′ and
V ∶=∫N′VN′dN′.
• If N is a value and M →M′ then M′ N →n−1L+W. Thus we can decompose the equation
( )
similarly along ′ and apply our IH.
M
• If both M and N are values it is trivial since M and N .
U = V =
{ } { }
Lemma 6. For every m,n N and every , D T⊕,R,X and , , D T⊕,R,X , whenever
M →m M+U and N →nN +∈ V, we have UMVN≤∈M(N . ) U V W ∈ ( V )
Proof. By induction on m+n. J K J K
Trivial if m+n=0.
If n 1, we proceed differently depending whether M and N are values or not.
≥
• If N is not a value then N →N′→n−1N +V.
UsingLemma4,wecandecomposeN =∫N′NN′dN′ andV =∫N′VN′dN′ insuchawaythatfor
any N′ ∈ N′, N′ →≤n−1 NN′ +VN′ (with →≤n−1 dubbing either = or →n−1 depending whether
∣ ∣
N′ is a value or not).
Then we get
U V =sU (∫N′VN′dN′){=∫N′ U VN′ dN′≤∫N′ M N′ dN′= M N′ .
J K J K J K J K
• If N is a value and M → M′ then M′ →m−1 M+U. Thus we can decompose the equation
similarly along ′ and apply our IH.
M
• If both M and N are values it is trivial since M and N .
U = V =
{ } { }
Thefollowingis acrucialintermediatesteptowardsTheorem8,the mainresultofthis section.
Lemma 7. For any M,N: M N M N . In particular, if the application M N is almost-
=
surely terminating, so are M and N.
J K JJ K J KK
Proof. ≤ There is Ln,Wn n≥1 such that M N →nLn+Wn for all n and such that M N =
lim ( ). (Applyin)g Lemma (5 )gives , , ′ D T⊕,R,X
n Wn Mn Nn Ln ∈
and (Un,V)n ∈D(T⊕V,R,X) such that M →∗ Mn +Un, N →∗ Nn +Vn and (Un Vn) →∗ LJ′n( +WKn).
Thus M and N with leading to the required inequality.
⋁nUn≤ ⋁nVn≤ Wn≤ Un Vn
J K J K J K
7
≥ Thereis Mn,Nn,Un,Vn n≥1suchthatM →nMn+UnandN →nNn+Vnforallnandsuch
( ) ( )
that M lim and N lim . Thisleadstotheequality M N lim .
= n Un = n Vn = m,n Um Vn
( ) ( )
Finally, by Lemma 6, for any m,n, each approximant of is below M N , so is their
Um Vn
sup. J K J K JJ K J KK J K
J K J K
Theorem 8. The full system T⊕,R,X is almost-surely terminating (AST), i.e.,
∀M ∈T⊕,R,X, Succ M =1.
( )
Proof. The proof is is based on the the notion of a reducible term, which is given as follows by
induction on the structure of types:
RedNAT∶= M ∈T⊕,R,X NAT M is AST
{ ( )∣ }
RedA→B ∶= {M ∣∀V ∈RedA∩T⊕V,R,X, (M V)∈RedB}
RedA×B ∶= M π1 M ∈RedA, π2 M ∈RedB
{ ∣( ) ( ) }
Then we can observe that:
• The reducibility candidates over Red are →-saturated:
A
byinductiononAwecanindeedshowthatifM → then Red iffM Red .
M M ⊆ A ∈ A
• Trivial for A NAT. ∣ ∣
=
• If A B →C: then for all value V Red , M V → V , thus by IH M V Red
= ∈ B M ∈ C
( ) ( ) ( )
iff V M′ V M′ Red ; which exactly means that Red iff
M = ∈ M ⊆ C M ⊆ B→C
∣ ∣ { ∣ ∣ ∣} ∣ ∣
M Red .
∈ B→C
• If A = A1×A2: then for i ∈ {1,2}, (πi M) → (πi M) so that by IH, (πi M) ∈ RedAi iff
π Red ; which exactly means that Red iff M Red .
• The(reidMuci)b⊆ility caAnididates over Red are preci∣sMely∣⊆the ASAT1×Ate2rms M∈suchAt1h×aAt2 M Red :
A ⊆ A
this goes by induction on A.
• Trivial for A NAT. J K
=
• Let M Red :
∈ B→C
remark that there is a value V Red , thus M V Red and M V is AST by IH;
∈ B ∈ C
( ) ( )
using Lemma 7 we get M AST and it is easy to see that if U M then U for
∈ ∈ M
∣ ∣ ∣ ∣
some M →∗ so that U Red by saturation.
M ∈ B→C
Conversely, let M be AST with M Red and let V Red JbeKa value:
⊆ B→C ∈ B
∣ ∣
by IH, for any U M Red J wKe have U V AST with an evaluation supported by
∈ ⊆ B→C
∣ ∣ ( )
elements of Red ; by Lemma 7 M V M V meaning that M V is AST and has
C =
an evaluation suppoJrtedK by elements of Red , so that we can conc(lude b)y IH.
C
• Let M Red : J K JJ K K
∈ A1×A2
then π M Red and π M is AST by IH; using Lemma 7 we get M AST and it is
( 1 )∈ A1 ( 1 )
easy to see that if U M then U for some M →∗ so that U Red by
∈ ∣ ∣ ∈ ∣M∣ M ∈ A1→A2
saturation.
J K
Conversely, let M be AST with M Red and let i 1, 2:
∣ ∣⊆ A1→A2 ∈{ }
byIH,foranyU M Red wehave π U AST withanevaluationsupportedby
elements of Red ∈;∣by L∣e⊆mma A71J→πA2KM π (Mi )meaning that π M is AST and has
an evaluation suApipoJrteKd by elementis of R=edi , so that we can conc(luide b)y IH.
Ai
• Every term M such that x1 ∶ A1,..J.,xn ∶KAnJ ⊢JM K∶KB is a candidate in the sense that if
V Red for every 1 i n, then M V x ,...,V x Red :
i∈ Ai ≤ ≤ [ 1/ 1 n/ n]∈ B
byinduction onthe type derivation. The onlydifficult casesarethe recursion,the application,
and the X:
• For the operator rec: We have to show that if U ∈RedA and V ∈RedNAT→A→A then for all
n N , rec U V n Red . We proceed by induction on n:
∈ ∈ A
• If n( 0: rec U)V 0→ U Red and we conclude by saturation.
= ⊆ A
• Otherwise: recU V (n{+}1)→V n rec U V n Red since rec U V n Red by
∈ A ∈ A
IH and since n∈RedN and V ∈RedN(→A→A, we c)onclude by satu(ration. )
8
• For the application: we have to show that if M Red and N Red then M N
∈ A→B ∈ A ∈
( )
Red . But since N Red , this means that it is AST and for every V N , M V
B ∈ A ∈ ∈
∣ ∣ ( )
Red . In particular,by Lemma 7, we have M N M N so that M N is AST and
B =
M N M V Red . ( J )K
• ∣For the o∣p⊆e⋃raVt∈o∣JrNXK∣:∣we have∣⊆to shoBw that fJor anyKvaJlue UJ RKKed and V Red if holds
∈ A→A ∈ A
tJhat XKU V Red J. K
∈ A
By a(n easy)induction on n, Un V Red since U0 V V Red and Un+1 V
∈ A = ∈ B =
U Un V Red whenever U(n V R)ed and U Red .
∈ B ∈ B ∈ B→B
Mo(reover),byaneasyinductiononnwehave X U V = 2n1+1 Un (X U V) +∑i≤n 2i1+1 Ui V :
q y
• trivial for n 0, J K J K
=
• we knowthat X U V = 21 V +21 V(X U V) andwe get V(X U V) = V X U V =
2n1+1 Un+1(X UJ V) +K ∑i≤nJ 2Ki1+1 UJi+1V byKLemma 7 anJd IH, whicKh iJs suJfficientKKto
conclqude. y q y
At the limit, we get X U V = ∑i∈N 2i1+1 Ui V . We can then conclude that (X U V)
is AST (since each of the Ui V Red qare AyST and 1 1) and that M N
J ( K ) ∈ B ∑i 2i+1 = ∣ ∣ =
Ui V Red .
⋃i ⊆ A
∣ ∣ J K
This concluqdes thye proof.
Almost-sure termination could howeverbe seen as too weak a property: there is no guarantee
abouttheaveragecomputationlength. Forthisreason,anotherstrongernotionisoftenconsidered,
namely positive almost-sure termination:
Definition 9. A term M is said to be positively almost-surely terminating (or PAST) iff the
average reduction length M is finite.
[ ]
G¨odel’s T, when paired with R, is combinatorially too powerful to guarantee positive almost
sure termination:
Theorem 10. T⊕,R,X is not positively almost-surely terminating.
Proof. Let Expo∶=rec 2 λxy.rec y λx.S y ∶NAT→NAT be the program computing n↦2n+1
2inn+t1im; tehe2na+v1e.raTgheerned(uEcxtpioo(nRl)en∶gNtAhTisist(choumsp)utin)g, with probability 2n1+1 the number 2n+1 in time
2n+1
[Expo R] = ∑2n+1 = +∞.
n
3.3 On Fragments of T⊕,R,X: a Roadmap
ThecalculusT⊕,R,X containsatleastfourfragments,namelyG¨odel’sTandthethreefragmentsT⊕,
TR andTX correspondingtothethree probabilisticchoiceoperatorsweconsider. Itisthennatural
to ask oneselves how these fragments relate to each other as for their respective expressive power.
At the end of this paper, we will have a very clear picture in front of us.
The firstsuchresultis the equivalence betweenthe apparentlydualfragmentsTR andTX. The
embeddings are in fact quite simple: getting X from R only requires “guessing” the number of
iterations via R and then use rec to execute them; conversely, capturing R from X is even easier:
it corresponds to counting the recursive loops done by some executions of X:
Proposition 11. TR and TX are both equiexpressive with T⊕,R,X.
Proof. The calculus TR embeds the full system T⊕,R,X via the encoding:1
M ⊕N ∶= rec λz.N,λxyz.M,R 0; X ∶= λxy.rec y,λz.x,R .
⟨ ⟩ ⟨ ⟩
1Notice that the dummyabstractions onz and the 0 at the end ensurethe correct reduction order by making
λz.N avalue.
9
The fragment TX embeds the full system T⊕,R,X via the encoding:
M ⊕N ∶= X λxy.M,λy.N 0; R ∶= X S,0 .
⟨ ⟩ ⟨ ⟩
In both cases,the embedding is compositionaland preservestypes. That the two embeddings are
correct can be proved easily, see [3].
Notice how simulating X by R requiresthe presenceof recursion,while the converseis nottrue.
The implications of this fact are intriguing, but lie outside the scope of this work.
In the following, we will no longer consider TX nor T⊕,R,X but only TR, keeping in mind that
all these are equiexpressive due to Proposition11. The rest of this paper, thus, will be concerned
withunderstandingtherelativeexpressivepowerofthethreefragmentsT,T⊕,andTR. Cananyof
the(obvious)strictsyntactical inclusionsbetweenthembeturnedintoastrictsemantic inclusion?
Are the three systems equiexpressive?
In order to compare probabilistic calculi to deterministic ones, several options are available.
Themostcommononeistoconsidernotionsofobservationsovertheprobabilisticoutputs;thiswill
bethepurposeofSection6. Inthissection,wewilllookatwhetheritispossibletodeterministically
represent the distributions computed by the probabilistic calculus at hand. We say that the
distribution M∈D N is finitely represented by2 f ∶N →B , if there exists a q such that for every
( )
k q it holds that f k 0 and
≥ =
( ) k↦f k
M=
{ ( )}
Moreover, the definition can be extended to families of distributions by requiring the
Mn n
existence of f ∶ N ×N →B , q∶N →N such that ∀k≥q n ,f n,k =0 an(d )
( ) ( ) ( )
∀n, Mn= k↦f n,k .
{ ( )}
In this case, we say that the representation is parameterized.
WewillseeinSection4thatthedistributionscomputedbyT⊕ areexactlythe(parametrically)
finitely representable by T terms. In TR, however, distributions are more complex (infinite, non-
rational). That is why only a characterization in terms of approximations is possible. More
specifically, a distribution D NAT is said to be functionally represented by two functions
M ∈
f ∶ N ×N →B and g ∶N →N iff fo(r ev)ery n∈N and for every k ≥g n it holds that f n,k =0
( ) ( ) ( )
and
1
k∑∈N ∣ M(k) − f(n,k) ∣ ≤ n.
Inotherwords,thedistribution canbeapproximatedarbitrarilywell,anduniformly,byfinitely
M
representableones. Similarly,wecandefineaparameterizedversionofthisdefinitionatfirstorder.
InSection5 , weshow thatdistributions generatedby TR terms areindeed uniformlimits over
those of T⊕; using our result on T⊕ this give their (parametric) functional representability in the
deterministic T.
4 Binary Probabilistic Choice
This section is concerned with two theoretical results on the expressive power of T⊕. Taken
together, they tell us that this fragment is not far from T.
4.1 Positive Almost-Sure Termination
TheaveragenumberofstepstonormalformisinfiniteinT⊕,R,X. Wewillprovethat,onthecontrary,
T⊕ is positive almost-surely terminating. This will be done by adapting (and strengthening!) the
reducibility-based result from Section 3.2.
2Here we denote B for binomial numbers n (where m,n ∈ N) and BIN for their representation in system T
2m
encodedbypairsofnaturalnumbers.
10