Knowledge, Probability, and Adversaries
Joseph Y. Halpern, IBM Almaden Research Center, San Jose, CA 95120, halpern@ibm.com
Mark R. Tuttle, Digital Equipment Corporation, Cambridge Research Lab, Cambridge, MA 02139, tuttle@crl.dec.com
Abstract. What should it mean for an agent to know or believe an assertion is true with probability .99? Different papers [FH88, FZ88a, HMT88] give different answers, choosing to use quite different probability spaces when computing the probability that an agent assigns to an event. We show that each choice can be understood in terms of a betting game. This betting game itself can be understood in terms of three types of adversaries influencing three different aspects of the game. The first selects the outcome of all nondeterministic choices in the system; the second represents the knowledge of the agent's opponent in the betting game (this is the key place the papers mentioned above differ); the third is needed in asynchronous systems to choose the time the bet is placed. We illustrate the need for considering all three types of adversaries with a number of examples. Given a class of adversaries, we show how to assign probability spaces to agents in a way most appropriate for that class, where "most appropriate" is made precise in terms of this betting game. We conclude by showing how different assignments of probability spaces (corresponding to different opponents) yield different levels of guarantees in probabilistic coordinated attack.
This paper appeared in Journal of the ACM, 40(4):917–962, September 1993.
1 Introduction
In nearly every field of research concerned with systems of interacting agents, be it distributed computing, artificial intelligence, or economics, people have found it useful to think about these systems in terms of knowledge. In game theory, for example, a player's strategy typically takes into account the knowledge the player acquires about the other players' strategies. In all of these fields, an important subclass of interactions involves probability. For example, a player in a game might toss a coin in order to determine its next move. In such contexts, it is natural to find oneself reasoning, at least informally, about knowledge and probability and their interaction. This sort of reasoning is quite common in computer science, such as when reasoning about probabilistic primality-testing algorithms. Such an algorithm might guarantee that if the input n is a composite number, then with high probability the algorithm will find a "witness" that can be used to verify that n is composite. Loosely speaking, we reason: if an agent runs this algorithm on input n and the algorithm fails to find such a witness, then the agent knows that n is almost certainly prime, since the agent is guaranteed that the algorithm would almost certainly have found a witness had n been composite.
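This style of reasoning can be made concrete with a standard witness-based test. The sketch below uses the Miller-Rabin witness predicate as one common instance of the kind of algorithm described; the function names and the choice of predicate are ours, not the paper's.

```python
import random

def is_witness(a, n):
    """True iff a is a Miller-Rabin witness to the compositeness of odd n > 3."""
    d, s = n - 1, 0
    while d % 2 == 0:            # write n - 1 = 2^s * d with d odd
        d, s = d // 2, s + 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return False
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return False
    return True                  # a certifies that n is composite

def probably_prime(n, trials=20):
    """If no witness turns up in `trials` independent draws, conclude that n is
    'almost certainly' prime; an odd composite n evades detection in all draws
    with probability at most 4**(-trials)."""
    return not any(is_witness(random.randrange(2, n - 1), n)
                   for _ in range(trials))
```

Since a prime has no witnesses at all, a "composite" verdict is certain; only the "prime" verdict carries the probabilistic caveat discussed above.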
A number of recent papers have tried to formalize this sort of reasoning about knowledge and probability. Fagin and Halpern [FH88] present an abstract model for knowledge and probability in which they assign to each agent-state pair a probability space to be used when computing the probability, according to that agent at that state, that a formula φ is true.¹ In their framework, the problem of modeling knowledge and probability reduces to choosing this assignment of probability spaces. Although they show that more than one choice may be reasonable, they do not tell us how to make this choice. One particular (and quite natural) choice is made in [FZ88a],² and some arguments are presented for its appropriateness; another is made in [HMT88] and used to analyze interactive proof systems. It is not initially clear, however, which choice is most appropriate.
In this paper, we clarify the issues involved in choosing the right assignment of probability spaces. We argue that no single assignment is appropriate in all contexts: the right way to think about these assignments is in terms of strategies for a betting game, and different assignments can be viewed as most appropriate in the contexts of betting against different opponents in this game. Thinking in terms of probability, the truth of a statement such as "event E will occur with probability α" depends on the assignment of probability spaces. Thinking in terms of games, if the opponent can influence the occurrence of an event E in any way, then the truth of a statement such as "event E will occur with probability α" depends on the extent to which the opponent can influence E, and the success of any strategy depending on the occurrence of E depends on the power of the opponent. We establish a correspondence between assignments of probability spaces and powers of opponents, and hence establish a correspondence between assignments and winning strategies in this betting game against these opponents.
We find, however, there is more to the betting game than just the opponent you are betting against. Roughly speaking, the setting in which the game is played is also of great importance. We identify three aspects of the game and its environment that capture all that is relevant, and model them in terms of three types of adversaries, each playing a fundamentally different role. We briefly describe these adversaries and their roles here, and explore them in greater depth in the rest of the paper.

¹Related results about their model appear in [FH91, FHM90].

²In [FZ87, FZ88b], other definitions of knowledge involving probability are proposed, in order to analyze an interactive proof of quadratic residuosity, but these definitions are primarily concerned with accounting for the limited computational power of agents in the system.
When we analyze probabilistic protocols, we do so in terms of probability distributions on the runs or executions of the protocol. When we say that a protocol is correct with probability .99, we are trying to say that the protocol will do the right thing in 99 percent of the runs. In fact, a statement like "99 percent of the runs" does not usually make sense, since we usually have probability distributions on subsets of the runs, but not on the entire set of runs. For example, consider any probabilistic primality-testing algorithm [Rab80, SS77]. For each fixed input (the number to be tested), the coins tossed during the algorithm induce a probability space on the set of runs of the algorithm with that input, but we do not have a distribution on the set of all runs because we are not willing to assume a distribution on the inputs. When we say that the algorithm works with probability .99, we really mean that for every choice of input the algorithm is correct in .99 of the runs with that input. The choice of input is a nonprobabilistic choice, and the coin tosses are probabilistic choices. The role of the first type of adversary in our framework is to distinguish between these two types of choices: this adversary factors out all nonprobabilistic choices in the system, so that for each adversary the remaining probabilistic choices induce a natural probability distribution on the set of runs with that adversary.
The probability on the runs can be viewed as giving us an a priori probability of an event, before the protocol is run. However, the probability an agent places on runs will in general change over time, as a function of information received by the agent in the course of the execution of the protocol. New subtleties arise in analyzing this probability.
Consider a situation with three agents p_1, p_2, and p_3. Agent p_3 tosses a fair coin at time 0 and observes the outcome at time 1, but agents p_1 and p_2 never learn the outcome. What is the probability according to p_1 that the coin lands heads? Clearly at time 0, before the coin is tossed, it should be 1/2. What about at time 1? There is one argument that says the answer should be 1/2. After all, agent p_1 does not learn any more about the coin as a result of its having been tossed, so why should its probability change? Another argument says that after the coin has been tossed, it does not make sense to say that the probability of heads is 1/2. The coin has either landed heads or it hasn't, so the probability of the coin landing heads is either 0 or 1 (although agent p_1 does not know which). This point of view appears in a number of papers in the philosophical literature (for example, [Fra80, Lew80]). Interestingly, the same issue arises in quantum mechanics, in Schrödinger's famous cat-in-the-box thought experiment (see [Pag82] for a discussion).
We claim that these two choices of probability are best explained in terms of betting games (assuming honest players). At time 0, agent p_1 should certainly be willing to accept an offer from either p_2 or p_3 to bet $1 for a payoff of $2 if the coin lands heads (assuming p_1 is risk neutral³). Half the time the coin will land heads and p_1 will be $1 ahead, and half the time the coin will land tails and p_1 will lose $1, but on average p_1 will come out even. On the other hand, p_1 is clearly not willing to accept such an offer from p_3 at time 1 (since p_3 can tell at time 1 whether it is going to win the bet, and since p_3 is presumably willing to offer the bet only when it will win), although p_1 is still willing to accept this bet from p_2. The point here is that if you do not want to lose money in a betting game, not only is your knowledge important, but also the knowledge of the opponent offering the bet. Betting games are not played in isolation!

³Informally, an agent is said to be risk neutral if it is willing to accept all bets where its expected winnings are nonnegative.
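To see why the identity of the opponent matters, one can compute p_1's expected winnings directly. A minimal sketch in our own notation, using exact rational arithmetic:

```python
from fractions import Fraction

outcomes = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}  # the fair coin

def net_payoff(outcome):
    """p_1 stakes $1 and is paid $2 if the coin lands heads."""
    return 1 if outcome == "heads" else -1

# p_2 never learns the outcome, so the bet is on the table in every outcome:
value_vs_p2 = sum(p * net_payoff(o) for o, p in outcomes.items())

# At time 1, p_3 knows the outcome and offers the bet only when p_1 loses:
value_vs_p3 = sum(p * net_payoff(o) for o, p in outcomes.items()
                  if net_payoff(o) < 0)
```

Against p_2 the bet is fair (expected value 0), while every bet the informed p_3 actually offers is a sure loss for p_1, so p_1's expected winnings from accepting p_3's offers are negative.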
Thus, the role played by the second type of adversary in our framework is to model the knowledge of the opponent offering a bet to an agent at a given point in the run. We sometimes identify the opponent with an adversary of this type. One obvious choice of opponent is to assume you are playing against someone whose knowledge is identical to your own. This is what decision theorists implicitly do when talking about an agent's posterior probabilities [BG54]; it is also how we can understand the choice of probability space made in [FZ88a]. By way of contrast, the choice in [HMT88] corresponds to playing someone who has complete knowledge about the past and knows the outcome of the coin toss; this corresponds to the viewpoint that says that when the coin has landed, the probability of heads is either 0 or 1 (although you may not know which).
A further complication arises when analyzing asynchronous systems. In this case, there is a precise sense in which an agent does not even know exactly when the event to which it would like to assign a probability is being tested (for example, when a bet is being placed). Thus we need to consider a third type of adversary in asynchronous systems, whose role is to choose this time. To illustrate the need for this third type of adversary, we give an example of an asynchronous system where there are a number of plausible answers to the question "What is the probability the most recent coin toss landed heads?" It turns out that the different answers correspond to different adversaries choosing the times to perform the test in different ways. We remark that the case of asynchronous systems is also considered in [FZ88a]. We can understand the assignment of "confidence" made there as corresponding to playing against a certain class of adversaries of this third type.
As the preceding examples suggest, each of the earlier definitions of probabilistic knowledge that appear in the literature can be understood as the "best" definition for a particular choice of adversaries. On the other hand, we show that every choice of adversaries gives rise to a definition of probabilistic knowledge that is "best" for that choice. To make this precise, we formalize our intuition that the probability an agent assigns to an event is related to the payoff the agent is willing to accept in a betting game with an opponent. We define a particular betting game, and show that once we fix the choice of adversaries, there is a definition of probabilistic knowledge that is best in terms of doing as well as possible against an opponent whose knowledge is modeled by the second adversary. We show how this definition corresponds to a strategy that enables the agent to break even (at least) in the game, and how any other definition with this property corresponds to an overly conservative strategy that assumes the opponent is more powerful than it really is. These results form the technical core of our paper.
The rest of the paper is organized as follows. In the next section, Section 2, we provide a formal model of a system of agents such as a distributed system. In Section 3 we consider the problem of putting a probability on the runs of a system; this is where we need the first type of adversary, to factor out the nondeterministic choices. In Section 4 we start to consider the issue of how probability should change over time. In Section 5 we consider the choices that must be made in a general definition of probabilistic knowledge. In Section 6 we consider particular choices of probability assignments that seem reasonable in synchronous systems. Here we consider the second type of adversary, representing the knowledge of the opponent in the betting game. In Section 7, we consider asynchronous systems, where we also have to consider the third type of adversary. In Section 8 we apply our ideas to analyzing the coordinated attack problem, showing how different notions of probability correspond to different levels of guarantees in coordinated attack. The paper ends with two appendices. In Appendix A we give the proofs of the results claimed in the paper, and in Appendix B we discuss some interesting secondary observations related to the rest of the paper.
2 Modeling systems
As the examples in the introduction show, the analysis of a probabilistic system typically depends on the choice of a probability distribution on the runs or executions of the system. In this section, we fix a model of computation that defines these runs, and in later sections we will vary the probability distributions associated with these runs. Our model is actually the model given in [HF89], which is itself a simplification of the model given in [HM90]. Both models are heavily influenced by models for distributed computation.
Consider an arbitrary system of n interacting agents p_1, ..., p_n. Intuitively, a run of a system is a complete description of one of the possible interactions of the agents. Such an interaction is uniquely determined by the sequence of global states through which the system passes as a result of the interaction. A global state is modeled as a tuple consisting of each process' local state, together with the state of the environment where, loosely speaking, the environment is intended to capture everything relevant to the state of the system that cannot be deduced from the agents' local states. Formally, a global state is an (n+1)-tuple (s_e, s_1, ..., s_n) of local states, where s_i is the local state of agent p_i, and s_e is the state of the environment. A run of the system is a mapping r from times to global states. We assume for the sake of convenience that times are natural numbers. A system is a set R of runs, intuitively the set of all possible interactions of the system agents. We denote the global state at time k in run r by r(k), the local state of p_i in r(k) by r_i(k), and the state of the environment by r_e(k). We refer to the ordered pair (r, k) consisting of a run r and a time k as a point. Later in the paper, we will assume that the entire history of a run r up to time k is encoded in the environment's state in r(k). This allows us to think of the runs of a system in terms of a computation tree: nodes of the tree are global states (and correspond to a set of points), and the paths in the tree are the runs of the system. We say that a run r' extends a point (r, k) if r and r' pass through the same global states up to time k; that is, r(k') = r'(k') for 0 ≤ k' ≤ k.
A fact is considered to be true or false of a point.⁴ We identify a fact φ with the set of points at which φ is true, and write (r, k) |= φ iff φ is true at (r, k). In a system R, a fact φ is said to be a fact about the run if, given two points of the same run, φ is either true at both points or false at both points. Similarly, a fact φ is said to be a fact about the global state if, given two points with the same global state, φ is true at both points or false at both points.
We now define what it means for an agent to know a fact φ at a point (r, k) of a system R. Intuitively, r_i(k) captures all of agent p_i's information at (r, k). We say p_i considers a point (r', k') possible at (r, k), and write (r', k') ~_i (r, k), if p_i has the same local state at both points; that is, if r'_i(k') = r_i(k). We use K_i(r, k) to denote {(r', k') | (r', k') ~_i (r, k)}, the set of points agent p_i considers possible at (r, k). Following [HM90] (and many other papers since then), we say p_i knows φ at (r, k) if φ is true at all points p_i considers possible at (r, k). This means p_i knows φ at (r, k) if φ is guaranteed to hold given the information recorded in p_i's local state at (r, k). More formally, we denote the fact that p_i knows φ at (r, k) by (r, k) |= K_i φ, and define (r, k) |= K_i φ iff (r', k') |= φ for all (r', k') ∈ K_i(r, k). While this definition of knowledge depends heavily on the system R (it restricts the set of points an agent considers possible at a given point), the system will always be clear from context and we omit explicit reference to R in our notation.

⁴In Section 5 we define a logical language for describing such facts. Formally, a fact is the interpretation of a formula in such a language. See [HM90] for a complete formal treatment of the syntax and semantics of such a language.
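These definitions are easy to mechanize for a finite system. The toy sketch below (our own encoding, not from the paper) models the coin example of the introduction: p_3 tosses at time 0 and sees the outcome at time 1, while p_1 and p_2 never do.

```python
def make_run(outcome):
    """A run maps times to global states (s_e, s_1, s_2, s_3); the history is
    encoded in the environment's state, and only p_3 sees the outcome."""
    return {0: ("init", "-", "-", "-"),
            1: (outcome, "-", "-", outcome)}

R = {"h": make_run("h"), "t": make_run("t")}   # one run per coin outcome

def K(i, r, k):
    """K_i(r, k): the points p_i considers possible at the point (r, k)."""
    return {(r2, k2) for r2 in R for k2 in R[r2]
            if R[r2][k2][i] == R[r][k][i]}

def knows(i, fact, r, k):
    """(r, k) |= K_i fact  iff  fact holds at every point in K_i(r, k)."""
    return all(fact(r2, k2) for (r2, k2) in K(i, r, k))

def heads(r, k):                                # a fact about the run
    return r == "h"
```

As expected, p_3 knows heads at time 1 of the heads run but not at time 0, and p_1 never knows it.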
3 Probability on runs
In order to discuss the probability of events in a distributed system, we must specify a probability space. In this section we show that in order to place a reasonable probability distribution on the runs of a system, it is necessary to postulate the existence of the first type of adversary sketched in the introduction.
Consider the simple system consisting of a single agent that tosses a fair coin once and halts. This system consists of two runs, one in which the coin comes up heads and one in which the coin comes up tails. The coin toss induces a very natural distribution on the two runs: each is assigned probability 1/2.
Now consider the system (suggested by Moshe Vardi; a variant appears in [FZ88a]) consisting of two agents, p_1 and p_2, where p_1 has an input bit and two coins, one fair coin landing heads with probability 1/2 and one biased coin landing heads with probability 2/3. If the input bit is 0, p_1 tosses the fair coin once and halts. If the input bit is 1, p_1 tosses the biased coin and halts. This system consists of four runs of the form ⟨b, c⟩, where b is the value of the input bit and c is the outcome of the coin toss. What is the appropriate probability distribution on the runs of this system? For example, what is the probability of heads?
Clearly the conditional probability of heads given that the input bit is 0 should be 1/2, while the conditional probability of heads given the input bit is 1 should be 2/3. But what is the unconditional probability of heads? If we are given a distribution on the inputs, then it is easy to answer this question. If we assume, for example, that 0 and 1 are equally likely as input values, then we can compute that the probability of heads is (1/2)(1/2) + (1/2)(2/3) = 7/12. If we are not given a distribution on the inputs, then the question has no obvious answer. It is tempting to assume, therefore, that such a distribution exists. Often, however, assuming a particular fixed distribution on inputs leads to results about a system that are simply too weak to be of any use. Knowing an algorithm produces the correct answer in .99 of its runs when all inputs are equally likely is of no use when the algorithm is used in the context of a different distribution on the inputs.
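For the record, the computation just described, done with exact rationals; note that the uniform input distribution is an added assumption, not part of the system:

```python
from fractions import Fraction

p_heads_given_bit = {0: Fraction(1, 2),   # input bit 0: the fair coin
                     1: Fraction(2, 3)}   # input bit 1: the biased coin

# Assume, only for the sake of this computation, that inputs are uniform:
input_dist = {0: Fraction(1, 2), 1: Fraction(1, 2)}

p_heads = sum(input_dist[b] * p for b, p in p_heads_given_bit.items())
```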
To overcome this problem, one might be willing to assume the existence of some fixed but unknown distribution on the inputs. Proving that an algorithm produces the correct answer in .99 of the runs in the context of an unknown distribution, however, is no easier than proving that for each fixed input the algorithm is correct in .99 of the runs, since it is always possible for the unknown distribution to place all the probability on the input for which the algorithm performs particularly poorly. Here the advantage of viewing the system as a single probability space is lost, since this is precisely the proof technique one would use when no distribution is assumed in the first place. Moreover, assuming the existence of some unknown distribution on the inputs simply moves all problems arising from nondeterminism up one level. Although we have a distribution on the space of input values, we have no distribution on the space of probability distributions.
This discussion leads us to conclude that some choices in a distributed system must be viewed as inherently nondeterministic (or, perhaps better, nonprobabilistic), and that it is inappropriate, both philosophically and pragmatically, to model probabilistically what is inherently nondeterministic. But then how can we reason probabilistically about a system involving both nondeterministic and probabilistic choices? Our solution, which is essentially a formalization of the standard approach taken in the literature, is to factor out initial nondeterministic events, and view the system as a collection of subsystems, each with its natural probability distribution. In the coin tossing example above, we would consider two probability spaces, one corresponding to the input bit being 0 and the other corresponding to the input bit being 1. The probability of heads is 1/2 in the first space and 2/3 in the second.⁵
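The factored view is just as easy to write down: one probability space per value of the nondeterministic input bit, with questions about the probability of heads answered inside each subsystem separately. A sketch in our own notation:

```python
from fractions import Fraction

# One probability space per nondeterministic choice of the input bit b;
# each run is a pair (b, c) with c the outcome of the coin toss.
subsystems = {
    0: {(0, "h"): Fraction(1, 2), (0, "t"): Fraction(1, 2)},  # fair coin
    1: {(1, "h"): Fraction(2, 3), (1, "t"): Fraction(1, 3)},  # biased coin
}

def p_heads(bit):
    """The probability of heads within the subsystem for a fixed input bit."""
    return sum(p for (b, c), p in subsystems[bit].items() if c == "h")
```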
We want to stress that although this example may seem arti(cid:17)cial(cid:2) analogous examples
frequently arise in the literature(cid:9) In a probabilistic primality(cid:30)testing algorithm (cid:13)Rab(cid:14)(cid:7)(cid:2) SS(cid:26)(cid:26)(cid:15)(cid:2)
for example(cid:2) we do not want to assume a probability distribution on the inputs(cid:9) We want to
know that for each choice of input(cid:2) the algorithm gives the right answer with high probability(cid:9)
Rabin(cid:19)s primality(cid:30)testing algorithm (cid:13)Rab(cid:14)(cid:7)(cid:15) is based on the existence of a polynomial(cid:30)time
computable predicate Pn(cid:20)a(cid:21) with the following properties(cid:25) (cid:20)(cid:5)(cid:21) if n is composite(cid:2) then at least
(cid:10)(cid:4)(cid:24)ofthe a(cid:4) f(cid:5)(cid:5)(cid:0)(cid:0)(cid:0)(cid:5)n(cid:6)(cid:5)gcause Pn(cid:20)a(cid:21)tobe true(cid:2) and (cid:20)(cid:6)(cid:21)if n is prime(cid:2) then nosuch a causes
n to be true(cid:9) Rabin(cid:19)s algorithm generates a polynomial number of a(cid:19)s at random(cid:9) If Pn(cid:20)a(cid:21) is
true for any of the a(cid:19)s generated(cid:2) then the algorithm outputs (cid:22)composite(cid:23)(cid:18) otherwise it outputs
(cid:22)prime(cid:23)(cid:9) Property(cid:20)(cid:6)(cid:21)guaranteesthatifthealgorithmoutputs(cid:22)composite(cid:23)(cid:2)thennisde(cid:17)nitely
composite(cid:9) If the algorithm outputs (cid:22)prime(cid:23)(cid:2) then there is a chance that n is not prime(cid:2) but
property(cid:20)(cid:5)(cid:21)guaranteesthatthisisveryrarelythecase(cid:25) ifnis indeed composite(cid:2)thenwithhigh
probability the algorithm outputs (cid:22)composite(cid:23)(cid:9) If the algorithm outputs (cid:22)prime(cid:23)(cid:2) therefore(cid:2)
it might seem natural to say that n is prime with high probability(cid:18) but(cid:2) of course(cid:2) this is not
(cid:5)Often(cid:5) even in the presence of nondeterminism(cid:5) we can impose a meaningful distribution on the runs of a
system without factoring the system into subsystems(cid:5) but the resulting distribution still may not capture all
of our intuition(cid:8) The problem in the preceding example is that probabilistic events (cid:13)the coin toss(cid:14) depend on
nonprobabilistic events (cid:13)the input bit(cid:14)(cid:8) Suppose(cid:5) however(cid:5) the agent tosses a fair coin regardless of the input
bit's value. Now it is natural to assign probability 1/2 to each of the events {(1,h),(0,h)} and {(1,t),(0,t)}
that the coin lands heads and tails, respectively. Consider, however, the situation (discussed in [FH88, HMT88])
where an agent performs a given action a iff the input bit is 1 and the coin landed heads, or the input bit is 0
and the coin landed tails. It is natural to argue that the probability the agent performs the action a is also 1/2:
if the input bit is 1 then with probability 1/2 the coin will land heads and a will be performed; and if the input
bit is 0 then with probability 1/2 the coin will land tails and a will be performed. Unfortunately, our "natural"
distribution on the runs of the system does not support this line of reasoning, since this distribution does not
assign a probability to the set {(1,h),(0,t)} corresponding to the performance of a. In fact, if we could assign
a probability to this set, then we would have to consider it a measurable set. Using this information, we could
then prove that the sets {(1,h),(1,t)} and {(0,h),(0,t)} are also measurable sets. This means that we would
have to assign a probability to having the input bit set to 0 or 1, but the setting of the input bit was assumed
to be nondeterministic! Again, however, if we factor out this initial nondeterminism, we can view the system
as two subsystems with obvious associated probability distributions, and within each subsystem the action a is
performed with probability 1/2. This is precisely what the reasoning underlying our intuition is implicitly doing.
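This factoring can be checked with a small calculation. The sketch below (our own illustration, not from the paper) fixes the nondeterministic input bit, leaving a genuine probability space over coin outcomes in each subsystem, and shows that a then has probability 1/2 in each:

```python
from fractions import Fraction

# The full system has four "runs" (input_bit, coin_outcome). The input
# bit is nondeterministic, so no probability is assigned to it. Fixing
# the bit yields one subsystem per input value, each a genuine
# probability space over the coin outcomes alone.
coin = {"h": Fraction(1, 2), "t": Fraction(1, 2)}

def performs_a(input_bit, outcome):
    # The agent performs a iff bit 1 and heads, or bit 0 and tails.
    return (input_bit == 1 and outcome == "h") or \
           (input_bit == 0 and outcome == "t")

# Within each subsystem the probability of a is well defined, and is 1/2.
for bit in (0, 1):
    prob_a = sum(p for outcome, p in coin.items() if performs_a(bit, outcome))
    print(bit, prob_a)  # prints "0 1/2" then "1 1/2"
```

The set {(1,h),(0,t)} never needs a probability in the full system; only its trace inside each subsystem does.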
[Figure 1: A "labeled" computation tree.]
quite right. The input n is either prime or it is not; it does not make sense to say that it is
prime with high probability. On the other hand, it does make sense to say that the algorithm
gives the correct answer with high probability. The natural way to make this statement precise
is to partition the runs of the algorithm into a collection of subsystems, one for each possible
input, and prove that the algorithm gives the right answer with high probability in each of these
subsystems, where the probability on the runs in each subsystem is generated by the random
choices made by the algorithm. While for a fixed composite input n there may be a few runs where the algorithm
incorrectly outputs "prime", in almost all runs it will give the correct output.
In many contexts of interest, the choice of input is not the only source of nondeterminism
in the system. Later nondeterministic choices may also be made throughout a run. In asyn-
chronous distributed systems, for example, it is common to view the choice of the next pro-
cessor to take a step or the next message to be delivered as a nondeterministic choice. Similar
arguments to those made above can be used to show that we need to factor out these nonde-
terministic choices in order to use the probabilistic choices (coin tosses) to place a well-defined
probability on the set of runs. A common technique for factoring out these nondeterministic
choices is to assume the existence of a scheduler deterministically choosing (as a function of the
history of the system up to that point) the next processor to take a step (cf. [Rab82, Var85]).
It is standard practice to fix some class of schedulers, perhaps the class of "fair" schedulers
or "polynomial-time" schedulers, and argue that for every scheduler in this class the system
satisfies some condition.
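As a minimal sketch (the names here are ours, not from the paper), a scheduler in this style is just a deterministic function from finite histories to the next processor to move; once it is fixed, the processors' coin tosses are the only remaining source of randomness:

```python
# A scheduler deterministically picks the next processor to take a step,
# as a function of the history (the sequence of steps taken so far).
def round_robin_scheduler(history, processors):
    # A trivially "fair" scheduler: cycle through the processors in order.
    return processors[len(history) % len(processors)]

# Driving a run: with the scheduler fixed, the only remaining choices are
# the probabilistic ones made by the processors themselves.
history = []
procs = ["p1", "p2", "p3"]
for _ in range(6):
    history.append(round_robin_scheduler(history, procs))
print(history)  # each processor scheduled twice, in round-robin order
```

A class of schedulers (fair, polynomial-time, and so on) is then just a set of such functions, and a correctness claim quantifies over that set.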
As we now show, if we view all nondeterministic choices as under the control of some
adversary taken from some class of adversaries, then there is a straightforward way to view the
set of runs of a system as a collection of probability spaces, one for each adversary. By fixing
an adversary we factor out the nondeterministic choices and are left with a purely probabilistic
system, with the obvious distribution on the runs determined by the probabilistic choices made
during the runs. This is essentially the approach taken in [FZ88a].
Once we fix an adversary A, we can view the runs of the system with this adversary as
a (labeled) computation tree T_A (see Figure 1). As in Section 2, nodes of the tree are global
states and paths in the tree are runs. Now, however, edges of the tree are labeled with positive
real numbers such that for every node the values labeling the node's outgoing edges sum to 1.
Intuitively, the value labeling an outgoing edge of node s represents the probability the system
makes the corresponding transition from node s. Given a finite path in the tree, the probability
of the set of runs extending this finite path is simply the product of the probabilities labeling
the edges in this finite path.
It is natural to view this computation tree T_A as a probability space, a tuple (R_A, X_A, μ_A),
where R_A is the set of runs in T_A, X_A consists of subsets of R_A that are measurable (that is, the
ones to which a probability can be assigned; these are generated by starting with sets of runs
with a common finite prefix and closing under countable union and complementation), and
μ_A is a probability function defined on sets in X_A so that the probability of the set of all runs with a
common prefix is the product of the probabilities labeling the edges of the prefix. If we restrict
attention to finite runs (as is done in [FZ88a]), then it is easy to see that each individual run
is measurable, so that X_A consists of all possible subsets of R_A. Moreover, in the case of finite
runs, the probability of a run is just the product of the transition probabilities along the edges
of the run.
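Concretely, the measure μ_A on these prefix-generated sets can be computed directly from the edge labels: the probability of the set of runs extending a finite path is the product of the transition probabilities along it. A small sketch under assumed names (the dictionary encoding of the tree is ours):

```python
from fractions import Fraction

# A labeled computation tree: each node maps its children to the
# transition probabilities on the connecting edges. The labels on any
# node's outgoing edges sum to 1.
tree = {
    "s0": {"s1": Fraction(1, 2), "s2": Fraction(1, 2)},
    "s1": {"s3": Fraction(1, 3), "s4": Fraction(2, 3)},
}

def prefix_probability(path):
    """Probability of the set of runs extending the finite path,
    i.e. the product of the edge labels along it."""
    prob = Fraction(1)
    for parent, child in zip(path, path[1:]):
        prob *= tree[parent][child]
    return prob

print(prefix_probability(["s0", "s1", "s4"]))  # 1/2 * 2/3 = 1/3
```

In the finite-run case, taking the path all the way to a leaf gives the probability of that individual run.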
It is occasionally useful to view this computation tree T_A as consisting of two components:
the tree structure (that is, the unlabeled graph itself), and the assignment of transition proba-
bilities to the edges of the tree. Given an unlabeled tree T_A, we define a transition probability
assignment for T_A to be a mapping τ assigning transition probabilities to the edges of T_A. We
will use the notation T_A at times to refer to the unlabeled tree, to the labeled tree, and to the
induced probability space; which is meant should be clear from context.
We define a probabilistic system to consist of a collection of labeled computation trees (which
we view as separate probability spaces), one for each adversary A in some set A. We assume
that the environment component in each global state in T_A encodes the adversary A and the
entire past history of the run. This technical assumption ensures that different nodes in the
same computation tree have different global states, and that we cannot have the same global
state in two different computation trees. Given a point c, we denote the computation tree
containing c by T(c). Our technical assumption guarantees that T(c) is well-defined.
The choice of the appropriate set A of adversaries against which the system runs is typi-
cally made by the system designer when specifying correctness conditions for the system. An
adversary might be limited to choosing the initial input of the agents (in which case the set of
possible adversaries would correspond to the set of possible inputs), as is the case in the context
of primality-testing algorithms in which an agent receives a single number (the number to be
tested) as input. On the other hand, an adversary may also determine the order in which agents
are allowed to take steps, the order in which messages arrive, or the order in which processors
fail. One might also wish to restrict the computational power of the adversary to polynomial
time. What is the most appropriate set of powers to assign to the adversary? It depends on
the application.
4 Probability at a point
In the preceding section, we showed how to use the notion of an adversary to impose a meaning-
ful probability distribution on the runs of a system. In order to define an agent's probabilistic
knowledge at a given point, however, we seem to require a probability distribution on the points
of the system, and not the runs. An agent's probability distribution on points must certainly
be related to the distribution on runs if it is to be at all meaningful. Nevertheless, the two
distributions may be quite different. For example, just as an agent's knowledge varies over time,
we would expect the probability an agent assigns to an event to vary over time. In contrast,
the distribution over runs can be viewed as a static distribution. Furthermore, depending on
which of the two distributions we use, we can be led to quite different analyses of a protocol.
The purpose of this section is to make clear the distinction between distributions over runs and
distributions over points.
To begin, consider the Coordinated Attack problem [Gra78]. Two generals A and B must
decide whether to attack a common enemy, but we require that any attack be a coordinated at-
tack; that is, A attacks iff B attacks. Unfortunately, they can communicate only by messengers
who may be captured by the enemy. It is known that it is impossible for the generals to coordi-
nate an attack under such conditions [Gra78, HM90]. Suppose we relax this condition, however,
and require only that the generals coordinate their attack with high probability [FH88, FZ88a].
To eliminate all nondeterminism, let us assume general A tosses a fair coin to determine whether
to attack, and let us assume the probability a messenger is lost to the enemy is 1/2. Let us
assume further that the system is completely synchronous. Our new correctness condition is
that the condition "A attacks iff B attacks" holds with probability .99.
Consider the following two-step solution CA1 to the problem. At round 0, A tosses a coin
and sends 10 messengers to B iff the coin landed heads. At round 1, B sends a messenger to
tell A whether it has learned the outcome of the coin toss. At round 2, A attacks iff the coin
landed heads (regardless of what it hears from B) and B attacks iff at round 1 it learned that
the coin landed heads. It is not hard to see that if we put the natural probability space on the
set of runs, then with probability at least .99 (taken over the runs) A attacks iff B attacks: if
the coin lands tails then neither attacks, and if the coin lands heads then with probability at
least .99 at least one of the ten messengers sent from A to B at round 0 avoids capture and
both generals attack.
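The .99 bound for this protocol can be checked by an exact calculation over the runs (a sketch with our own variable names; the parameters are as in the text):

```python
from fractions import Fraction

half = Fraction(1, 2)
p_lost = half            # each messenger is captured with probability 1/2
n_messengers = 10

# The attack is coordinated iff: the coin lands tails (neither attacks),
# or the coin lands heads and at least one of the 10 messengers from A
# reaches B (both attack).
p_all_lost = p_lost ** n_messengers
p_coordinated = half + half * (1 - p_all_lost)

print(p_coordinated)     # 2047/2048, comfortably above .99
```

The only uncoordinated runs are those in which the coin lands heads and all 10 messengers are captured, which happens with probability 1/2^11.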
This is very different, however, from saying that at all times both generals know that with
probability at least .99 the attack will be coordinated. To see this, consider the state just before
attacking in which A has decided to attack but has received a message from B saying that B
has not learned the outcome of the coin toss. At this point, A is certain the attack will not be
coordinated. Although we have not yet given a formal definition of how to compute an agent's
probability at a given point, it seems unreasonable for an agent to believe with high probability
that an event will occur when information available to the agent guarantees it will not occur.
On the other hand, consider the solution CA2, differing from the preceding one only in
that B does not try to send a messenger to A at round 1 informing A about whether B has
learned the outcome of the coin toss. An easy argument shows that in this protocol, at all
times both generals have confidence (in some sense of the word) at least .99 that the attack
will be coordinated. Consider B, for example, after having failed to receive a message from
A. B reasons that either A's coin landed tails and neither general will attack, which would
happen with probability 1/2, or A's coin landed heads and all messengers were lost, which
would happen with probability 1/2^11; and hence the conditional probability that the attack
will be coordinated given that B received no messages from A is at least .99.
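B's conditional-probability argument can likewise be checked exactly (again a sketch with our own names):

```python
from fractions import Fraction

half = Fraction(1, 2)

# B receives no message in two disjoint ways:
p_tails = half                        # A's coin lands tails: coordinated
p_heads_all_lost = half * half ** 10  # heads, all 10 messengers lost: not

p_no_message = p_tails + p_heads_all_lost
p_coordinated_given_no_message = p_tails / p_no_message

print(p_coordinated_given_no_message)  # 1024/1025, above .99
```

A symmetric (and even simpler) argument applies to B after it does receive a message, and to A at every point, since A never hears from B in CA2.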
As the preceding discussion shows, in a protocol which has a certain property P with high
probability taken over the runs, an agent may still find itself in a state where it knows perfectly
well that P does not (and will not) hold. While correctness conditions P for problems arising in
computer science have typically been stated in terms of a probability distribution on the runs,
it might be of interest to consider protocols where an agent knows P with high probability at