Time and the Prisoner’s Dilemma
Yishay Mor and Jeffrey S. Rosenschein
Institute of Computer Science
Hebrew University
Givat Ram, Jerusalem, Israel
ph: 011-972-2-658-5353
fax: 011-972-2-658-5439
email: [email protected], jeff@cs.huji.ac.il
Abstract

This paper examines the integration of computational complexity into game theoretic models. The example focused on is the Prisoner’s Dilemma, repeated for a finite length of time. We show that a minimal bound on the players’ computational ability is sufficient to enable cooperative behavior.

In addition, a variant of the repeated Prisoner’s Dilemma game is suggested, in which players have the choice of opting out. This modification enriches the game and suggests dominance of cooperative strategies.

Competitive analysis is suggested as a tool for investigating sub-optimal (but computationally tractable) strategies and game theoretic models in general. Using competitive analysis, it is shown that for bounded players, a sub-optimal strategy might be the optimal choice, given resource limitations.

Keywords: Conceptual and theoretical foundations of multiagent systems; Prisoner’s Dilemma

Introduction

Alice and Bob have been arrested as suspects for murder, and are interrogated in separate rooms. If they both admit to the crime, they get 15 years of imprisonment. If both do not admit to the crime, they can only be convicted for a lesser crime, and get 3 years each. However, if one of them admits and the other does not, the defector becomes a state’s witness and is released, while the other serves 20 years.

This is a “Prisoner’s Dilemma” (PD) game, a type of interaction that has been widely studied in Political Science, the Social Sciences, Philosophy, Biology, Computer Science, and of course in Game Theory. The feature of PD that makes it so interesting is that it is analogous to many situations of interaction between autonomous parties. The PD game models most situations in which both parties can benefit from playing cooperatively, but each party can get a higher gain from not cooperating when the opponent does. See Figure 1.

             B
           D        C
  A  D  (P, P)  (T, S)
     C  (S, T)  (R, R)

  T > R > P > S        2R > T + S

  Figure 1: Prisoner’s Dilemma game matrix (each cell shows (A’s payoff, B’s payoff))

As an example, consider two software agents, A and B, sent by their masters onto the Internet to find as many articles about PD as they can. The agents meet, and identify that they have a common goal. Each agent can benefit from receiving information from the other, but sending information has a cost. The agents agree to send packets of information to each other simultaneously. Assume that sending an empty packet costs $1, sending a useful packet costs $2, and receiving a useful packet is worth $3. This interaction is precisely a PD game with S = −2, P = −1, R = 1 and T = 2.
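To make the example’s arithmetic concrete, the following sketch (ours, not from the paper; the constants mirror the dollar amounts above) derives the payoff matrix from the packet costs and checks the PD conditions of Figure 1:

    # A minimal sketch: derive the PD payoffs of the packet-exchange
    # example from its costs, and verify the conditions of Figure 1.
    SEND_EMPTY = 1    # cost of sending an empty packet (defecting)
    SEND_USEFUL = 2   # cost of sending a useful packet (cooperating)
    RECEIVE = 3       # value of receiving a useful packet

    def payoff(my_move, other_move):
        """Payoff of one exchange; moves are 'C' (useful) or 'D' (empty)."""
        cost = SEND_USEFUL if my_move == 'C' else SEND_EMPTY
        gain = RECEIVE if other_move == 'C' else 0
        return gain - cost

    S, P = payoff('C', 'D'), payoff('D', 'D')
    R, T = payoff('C', 'C'), payoff('D', 'C')
    assert (S, P, R, T) == (-2, -1, 1, 2)
    assert T > R > P > S and 2 * R > T + S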
Under the assumption of rationality, the PD game has only one equilibrium: both players defect. This result is valid even for any finite sequence of games—both players can deduce that the opponent will defect in the last round, therefore they cannot be “punished” for defecting in the one-before-last round, and by backward induction it becomes common knowledge that both players will defect in every round of the repeated game.

The “always defect” equilibrium of PD is in a sense paradoxical; it contradicts some of our basic intuitions about intelligent behavior, and stands in contrast to psychological evidence (Rapoport et al. 1962). The root of this paradox is the assumption of rationality, which implies unlimited computational power; it is precisely the unlimited computational power of rational agents that both allows and requires them to perform the unlimited backward induction in the repeated PD.
In reality, both natural and artificial agents have limited resources. In this paper we show that once these limitations are incorporated into the interaction, cooperative behavior becomes possible and reasonable.

The idea of bounding agents’ rationality is not new. The novelty of the approach presented here is in its straightforwardness. The bound on rationality is measured in the most standard scale of Computer Science: computation time. We assume that agents need time for computations, and that the game is repeated for a finite length of time, rather than a fixed number of iterations. These assumptions are sufficient to create cooperative equilibria.

These results are interesting from two points of view: the system (or environment) designer’s perspective, and the agent designer’s perspective. From the system designer’s point of view, it gives guidelines as to how to create a cooperation-encouraging environment. From the agent designer’s point of view, it enables him to design a strategy that will impose cooperation on his agent’s opponent.

Related Work

A thorough and comprehensive survey of the basic literature on bounded rationality and repeated PD appears in (Kalai 1990). Axelrod (Axelrod & Hamilton 1981; Axelrod 1984) reports on his famous computer tournament and analyzes social systems accordingly.

Most of the work on this subject has centered on various automata as models of bounded rationality; (Papadimitriou 1992; Gilboa & Samet 1989; Fortnow & Whang 1994) and others (see (Kalai 1990) for an extensive bibliography) deal with finite state automata with a limited number of states, and (Megiddo & Wigderson 1986) examines Turing machines with a bounded number of states. The main drawback of the automata approach is that cooperative behavior is usually achieved by “exhausting” the machine—designing a game pattern that is so complex that the machine has to use all its computational power to follow it. Such a pattern is highly non-robust and will collapse in the presence of noise.

Papadimitriou, in (Papadimitriou 1992), analyzes a 3-player variant of the PD game. This game is played in two stages: first, every player chooses a partner, then if two players choose each other, they play PD. Two sections of this paper deal with a similar variation on the repeated PD game, in which players have the possibility of opting out.

Several researchers (see (Nowak & Sigmund 1993; Binmore & Samuelson 1994) for examples and further bibliography) took Axelrod’s lead and investigated evolutionary models of games. Most, if not all, of these works studied populations of deterministic or stochastic automata with a small number of states. The reason for limiting the class of computational models investigated was mainly pragmatic: the researchers had limited resources of computer space and time at their disposal. We hope that this paper might give a sounder motivation for the focus on “simple” or “fast” strategies. We claim that the same limitations that hold for researchers hold for any decision maker, and should be treated as an inherent aspect of the domain.

The task of finding papers on the Internet, as presented in the example above, can be seen as a distributed search problem. The power of cooperation in such problems has been studied in (Hogg & Hubermann 1993).

Other examples of domains for which work of the type presented here is relevant can be found in (Fikes et al. 1995) and in (Foner 1995). Foner’s system explicitly relies on autonomous agents’ cooperation. He describes a scenario in which agents post ads on a network, advertising their wish to buy or sell some item. Ads (or, in fact, the agents behind them) find partners by communicating with other agents, and sharing with them information on ad-location acquired in previous encounters. As Foner himself states:

  Such behavior is altruistic, in that any given agent has little incentive to remember prior ads, though if there is uniformity in the programming of each agent, the community as a whole will benefit, and so will each individual agent.

Game theory predicts that such behavior will not prevail. The results presented in this paper suggest that the computational incompetence of the agents can be utilized to make sure that it does.

Outline of the Paper

The section “Finite Time Repeated Prisoner’s Dilemma” presents and examines the finite time repeated PD game. The main result of this section is in Theorem 4, which shows weak conditions for the existence of a cooperative equilibrium.

The following section introduces the possibility of opting out, parting from an unsuitable partner. In that section we mainly develop tools for dealing with opting out, and show conditions under which opting out is a rational choice.

In the section “Sub-Optimal Strategies” we use competitive analysis to show that, in a sense, opting out strengthens the cooperative players. Without it, a non-cooperative player can force his opponent into non-cooperative behavior. The possibility of opting out changes the balance of forces, allowing a cooperative player to force an opponent into cooperative behavior.
Finite Time Repeated Prisoner’s Dilemma

In this section we deal with a game of PD that is repeated for a finite time (which we call the FTPD game). Previous work (Kalai 1990) focused on finite or infinite iterated PD (IPD). The basic idea is that two players play PD for N rounds. In each round, once both have made their move (effectively simultaneously), they get the payoffs defined by the PD game matrix. In our version of the game, the players play PD for a fixed amount of (discrete) time. At each tick of the clock, if both made a move, they get the PD payoff. However, if either player did not make his move yet, nothing happens (both players get 0). See Figure 2. Readers who are familiar with the game theoretic literature on PD might notice a problem that this perturbation entails. If H = 0, it creates a new equilibrium point, thus technically eliminating the paradox (this is, in fact, why the payoff when both players wait is labeled H and not defined as 0). However, this is a technical problem, and can be overcome by technical means, for instance by setting P = 0 or H = −ε. See (Mor 1995) for further discussion.

             B
           D        C        W
  A  D  (P, P)  (T, S)  (0, 0)
     C  (S, T)  (R, R)  (0, 0)
     W  (0, 0)  (0, 0)  (H, H)

  H ≤ 0

  Figure 2: FTPD Game Payoff Matrix (each cell shows (A’s payoff, B’s payoff))

Rules of the FTPD Game:

• 2 players play PD repeatedly for N clock ticks, N given as input to players.

• At each round, players can choose C (cooperate) or D (defect). If they choose neither, W (wait) is chosen for them by default.

• The payoff for each player is his total payoff over N rounds (clock ticks).

Theorem 1 If all players are (unboundedly) rational, and P > 0 > H, the FTPD game is reduced to the standard IPD game.

Proof. The W row (column) is dominated by the D row (column), and thus can be eliminated.
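The rules above are easy to state operationally. The following is a minimal sketch (ours, not from the paper) of the FTPD clock loop, with S, P, R, T taking the example values from the introduction and H ≤ 0 as in Figure 2:

    # A minimal sketch of the FTPD clock loop described by the rules above.
    S, P, R, T, H = -2, -1, 1, 2, -0.5

    PAYOFF = {('D', 'D'): (P, P), ('D', 'C'): (T, S),
              ('C', 'D'): (S, T), ('C', 'C'): (R, R), ('W', 'W'): (H, H)}

    def play_ftpd(move_a, move_b, n_ticks):
        """move_a, move_b: tick -> 'C', 'D', or None (no move made yet)."""
        total_a = total_b = 0
        for tick in range(n_ticks):
            a = move_a(tick) or 'W'   # W (wait) is assigned by default
            b = move_b(tick) or 'W'
            # If exactly one player waits, nothing happens: both get 0.
            pay_a, pay_b = PAYOFF.get((a, b), (0, 0))
            total_a += pay_a
            total_b += pay_b
        return total_a, total_b

    # Two players that always cooperate collect R on every tick:
    assert play_ftpd(lambda t: 'C', lambda t: 'C', 10) == (10 * R, 10 * R)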
We now proceed to define our notion of bounded rationality, and examine its influence on the game’s outcome. Theorem 2 is presented to enable the reader to compare our approach to the more standard one of automata models.

Definition 1 A Complexity Bounded (CB) player is one with the following bound on rationality: each “compare” action takes the player (at least) one clock tick.[1]

Unless mentioned otherwise, we will assume this is the only bound on players’ rationality, and that each “compare” requires exactly one clock tick.

Theorem 2 The complexity bound on rationality is weaker than restricting players to Turing Machines. That is to say, any strategy realizable by a Turing Machine can be played by CB players.

Proof. Assuming that every read/write a Turing Machine performs takes one clock tick, a Turing Machine is by definition complexity bounded.

From now on we will deal only with complexity bounded players. Furthermore, we must limit either design time or memory to be finite (or enumerable).[2] Otherwise, our bound on rationality becomes void: a player can “write down” strategies for any N beforehand, and “jump” to the suitable one as soon as it receives N.

The main objective of this section is to show how the complexity bound on rationality leads to the possibility of cooperative behavior. To do this, we must first define the concepts of equilibrium and cooperativeness.

Definition 2 A Nash equilibrium in an n-player game is a set of strategies, Σ = {σ1, ..., σn}, such that, given that for all i player i plays σi, no player j can get a higher payoff by playing a strategy other than σj.

Definition 3 A Cooperative Equilibrium is a pair of strategies in Nash equilibrium, such that, when played one against the other, they will result in a payoff of R ∗ N to both players.

[1] Formally, we have to say explicitly what we mean by a “compare”. The key idea is that any computational process takes time for the player. Technically, we will say that a CB player can perform at most k binary XORs in one clock tick, and k < log₂ N.

[2] Papadimitriou (Papadimitriou 1992) makes a distinction between design complexity and decision complexity. In our model, decision complexity forces a player to play W, while design complexity is not yet handled. The choice of strategy is made at design time; switching from Sa to Sb at decision time can be phrased as a strategy Sc: “play Sa, if ... switch to Sb”.
Definition 4 A Cooperative Strategy is one that participates in a cooperative equilibrium.

Theorem 3 A cooperative player will Wait or Defect only if his opponent Defected or Waited at an earlier stage of the game.

Proof. If the player is playing against its counterpart in the cooperative equilibrium and it Waits, its payoff is no more than (N − 1) ∗ R, in contradiction to the definition of cooperative equilibrium.

For any other player, as long as the opponent plays C, the cooperative player cannot distinguish it from its equilibrium counterpart.

Theorem 4 is the main result of this section.

Theorem 4 If R > 0, then there exists a cooperative equilibrium of the FTPD game.

Proof. Consider the strategy GRIM:

1. Begin by playing C, continue doing so as long as the opponent does.

2. If the opponent plays anything other than C, switch to playing D for the remainder of the game.[3]

Note that GRIM requires one “compare” per round, so having it played by both players results in N rounds played, and a payoff of N ∗ R for each player.

Assume that both players decide to play GRIM. We have to show that no player can gain by changing his strategy.

If player A plays D in round k, k < N, then player B plays D from that round on. A can gain from playing D if and only if he plays D in the Nth round only. In order to do so, he has to be able to count to N. Since N is given as input at the beginning of the game, A can’t design a strategy that plays GRIM N − 1 rounds and then plays D. He must compare some counter to N − 2 before he defects, to make sure he gains by defection. Doing so, he causes B to switch to playing D. Therefore, his maximal payoff is:

  (N − 1) ∗ R + (P − R)

A will change his strategy if and only if P − 2R > 0. We assumed that R > P. If R > 0, then 2R > R, and A will not switch. Else, by assumption, 2R > T + S, so A will switch only if P > T + S, and will not switch if T − P > −S.

[3] The strategy GRIM assumes a player can react to a Wait or Defect in the next round, i.e., without waiting himself. This can be done, for example, if the strategy is stated as an “if” statement: “If opponent played C, play C, else, play D.” This strategy requires one “compare” per round.
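To make the accounting of “compares” concrete, here is a minimal sketch (ours, not from the paper) of GRIM as the single “if” of Footnote 3, together with the deviation check from the proof above, using the example payoffs R = 1, P = −1:

    # GRIM, stated as in Footnote 3; it costs one "compare" per round.
    def grim_move(opponent_played_c_so_far):
        # "If opponent played C, play C, else, play D."
        return 'C' if opponent_played_c_so_far else 'D'

    # Deviating from mutual GRIM caps the deviator's payoff at
    # (N - 1) * R + (P - R), which is below N * R whenever P < 2R:
    N, R, P = 100, 1, -1
    assert (N - 1) * R + (P - R) < N * R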
set of strategies under consideration. Once opting out
can’tdesignastrategythatplaysGRIMN−1rounds
is introduced, this is no longer necessary.
and then plays D. He must compare some counter
The possibility of opting out also makes the game
to N −2 before he defects, to make sure he gains by
less vulnerable to noise, and provides fertile ground
defection. Doing so, he causes B to switch to playing
for studying learning in the PD game context. These
D. Therefore, his maximal payoff is:
issues are beyond the scope of the current paper;
see (Mor 1995) for further discussion.
(N −1)∗R+(P −R)
One last motivation for this line of work is soci-
A will change his strategy if and only if P −2R > 0. ological: “breaking up a relationship” is a common
We assumed that R>P. If R>0, then 2R>R, and way of punishing defectors in repeated human inter-
A will not switch. Else, by assumption, 2R > T +S, actions. Vanberg and Congleton, in (Vanberg & Con-
soAwillswitchonlyif P >T+S,andwillnotswitch gleton 1992), claim that opting out (“Exit” in their
if T −P >−S. terminology) is the moral choice, and show that Opt-
for-Tatis moresuccessfulthanTit-for-TatinAxelrod-
3The strategy GRIM assumes a player can react to a type tournaments (Axelrod 1984).
WaitorDefectinthenextround,i.e.,withoutwaitinghim- WestartbycomparingtheOPDgametotraditional
self. Thiscanbedone,forexample,ifthestrategyisstated approaches, namely to games played by rational play-
as an “if” statement: “If opponent played C, play C, else, ers and to the IPD game.
play D.” This strategy requires one“compare” perround.
Theorem 5 If all players are (unboundedly) rational, and P ≥ Q ≥ 0 (and Q̂ < 0, H < 0), the OPD game is reduced to the standard IPD game.

Proof. The W row (column) is dominated by the D row (column), and thus can be eliminated.

Playing O will result in a payoff of Q, Q < P, and a switch in partner. Since all players are rational, this is equivalent to remaining with the same partner. A player cannot get a higher payoff by playing O, hence the O row and column are eliminated.

Theorem 6 In the IPD game with opting out, if P > Q > Q̂ and H < 0, then the only Nash equilibrium is <D^N, D^N>.

Proof. The standard reasoning of backward induction works in this game; see (Mor 1995) for the full proof.

From now on we will deal only with the OPD game. For simplicity’s sake, we will assume Q = 0, and Q̂ = H = −ε.

Let us define the full context of the game.

Rules of the OPD Game:

1. A population of 2K players is divided into pairs.

2. For every pair of players <i, j>, every clock tick, each player outputs an action α, α ∈ {C, D, O}. If he does not output any of these, W is assigned by default.

3. If either outputs O, the pair is split and both get Q, regardless of the other player’s action. Otherwise, if both play C or D, they get the PD payoff, and continue playing with one another. In any other case, both get 0 and remain paired.

4. Both players can observe their payoff for the previous round. We assume both do, and therefore ignore this monitoring action in our computations, i.e., assume this is done in 0 time.

5. Every t clock ticks, all unpaired players are randomly matched.

6. The payoff to a player is the sum of payoffs he gets over N clock ticks.
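As an illustration, the following is a minimal sketch (ours, not from the paper) of one clock tick of the OPD game under these rules, with Q = 0 and Q̂ = H = −ε as above; where rule 3 and Figure 3 differ on the payoff when both opt out, the sketch follows Figure 3 and assigns Q̂:

    # A sketch of one OPD clock tick (rules 2-3) and of rematching (rule 5).
    import random

    S, P, R, T = -2, -1, 1, 2            # example PD payoffs
    EPS = 0.01
    Q, Q_HAT, H = 0, -EPS, -EPS          # Q = 0, Q-hat = H = -eps

    PD = {('D', 'D'): (P, P), ('D', 'C'): (T, S),
          ('C', 'D'): (S, T), ('C', 'C'): (R, R)}

    def tick(pairs, unpaired, action, payoff):
        """pairs: list of (i, j); action: player -> 'C', 'D', 'O' or 'W';
        payoff: dict mapping each player to his accumulated payoff."""
        still_paired = []
        for i, j in pairs:
            a, b = action(i), action(j)
            if 'O' in (a, b):                  # rule 3: either opts out
                pay = Q_HAT if (a, b) == ('O', 'O') else Q
                unpaired += [i, j]             # the pair is split
                pays = (pay, pay)
            elif (a, b) in PD:                 # both played C or D
                pays = PD[(a, b)]
                still_paired.append((i, j))
            else:                              # someone (or both) waited
                pay = H if (a, b) == ('W', 'W') else 0
                pays = (pay, pay)
                still_paired.append((i, j))
            payoff[i] += pays[0]
            payoff[j] += pays[1]
        return still_paired

    def rematch(unpaired):
        """Rule 5: every t ticks, unpaired players are randomly matched."""
        random.shuffle(unpaired)
        return [(unpaired[k], unpaired[k + 1])
                for k in range(0, len(unpaired) - 1, 2)]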
We can now make a qualitative statement: opting out can be a rational response to a defection. This is the intuition behind Theorem 7. In the next section we will attempt to quantify this claim.

Theorem 7 The expected payoff when playing against a cooperative opponent is higher than when playing against an unknown one.

Proof. (Sketch—see (Mor 1995) for a more detailed proof.) Whatever A’s move is for the first round, his payoff against a cooperative opponent is at least as high as against an unknown one. If he plays anything other than C, his opponent becomes unknown, and the proof is complete. If he plays C, we continue by induction.

Theorem 8 If there is a positive probability of at least one of the other players being cooperative, and rematching is instantaneous, then a player in the OPD game has a positive expected gain from opting out whenever the opponent waits.[4]

Proof. This theorem follows directly from Theorem 3 and Theorem 7. The full proof can be found in (Mor 1995).

Sub-Optimal Strategies

In considering whether to opt out or not, player A has to assess his expected payoff against his current opponent B, the probability B will opt out given A’s actions, and his expected payoff after opting out. Such calculations require extensive computational resources, and thus carry a high cost for a CB player. Furthermore, they require vast amounts of prior knowledge about the different players in the population. Although complete information is a standard assumption in many game theoretic paradigms, it is infeasible in most realistic applications to form a complete probabilistic description of the domain.

As an alternative to the optimizing approach, we examine satisfying strategies.[5] Instead of maximizing their expected payoff, satisfying players maximize their worst case payoff. The intuition behind this is that if maximizing expected payoff is too expensive computationally, the next best thing to do is to ensure the highest possible “security level,” protecting oneself best against the worst (max-min).

[4] The problematic condition is instantaneous rematching. There are 2 necessary and sufficient conditions for this to happen:

1. Opting out can be done in the same round the opponent waits.

2. If a player opted out in this round, he will be playing against a (possibly new) opponent in the next round, i.e., there is no “transition time” from one opponent to another.

The second condition is unjustifiable in any realistic setting. The first condition returns to a player’s ability to “watch the opponent” without waiting, mentioned in Footnote 3. In (Mor 1995) we show different assumptions that make this condition possible.

[5] Herbert Simon (Simon 1969; 1983) coined the term “Satisficing” as an alternative to “Maximizing.” Although our approach is close in spirit to his, it differs in its formalism. Therefore we prefer to use a slightly different term.
In order to evaluate satisfying strategies, we use the method of competitive analysis. Kaniel (Kaniel 1994) names Sleator and Tarjan (Sleator & Tarjan 1984) as the initiators of this approach. The idea is to use the ratio between the satisfying strategy’s payoff and that of a maximizing strategy as a quantifier of the satisfying strategy’s performance.

We begin by defining the concepts introduced above.

Definition 5 The Security Level of a strategy S is the lowest payoff a player playing S might get. Formally, if Γ is the set of all possible populations of players, and µ(S) is the expected payoff of S, then the security level SL(S) is:

  SL(S) = min_{γ∈Γ} {µ(S) | the population is γ}

Definition 6

• A Maximizing player is one that plays in a way that maximizes his expected payoff.

• A Satisfying player is one that plays in such a way that maximizes his security level.

Definition 7 Let A be a satisfying player, and let S be A’s strategy. Let h(S) be the expected payoff of A, had he been a maximizing player. The Competitive ratio of S is:

  CR(S) = SL(S) / h(S)

The following two theorems attempt to justify examination of satisfying rather than maximizing strategies.

Theorem 9 If there is a probability of at least q > 0 of being matched with a cooperative player at any stage of the game, then a player in the OPD game can ensure himself a security level of N ∗ R − const.

Proof. Consider the strategy Opt-for-Tat (OFT). This is the strategy of opting out whenever the opponent does not cooperate and cooperating otherwise. The expected number of times a player A playing OFT will opt out is 1/q, after which he will be matched with a cooperative player, and receive R for each remaining round of the game.

Actually, it is easy to show that const = (1/q) ∗ [(r + 1)R − S], where r is the expected number of rounds a player has to wait for a rematch (Mor 1995).

Theorem 10 If there is a probability of at least q > 0 of being matched with a cooperative player, and all players in the population are satisfying or optimizing players, a player in the OPD game cannot receive a payoff higher than N ∗ R + const.
Proof. Assume there exists a strategy θ that offers a player A playing it a payoff greater than N ∗ (R + β), β > 0. This means that in some rounds A defects and receives T. However, as soon as A defects, he identifies himself as a θ player. His opponent can infer that playing against A he will receive a payoff lower than the security level N ∗ R − const, and will opt out. Let r be the expected number of rounds a player waits for a rematch (a defection gains T in the round it is played, but forfeits R in that round and in each of the r rounds spent waiting for a rematch). If (r + 1) ∗ R > T then A loses every time he defects, and will get a payoff < N ∗ R. If (r + 1) ∗ R < T then “defect always” is the only equilibrium strategy, and the probability of being rematched with a cooperative player becomes 0 (in contradiction to our assumption).

From Theorems 9 and 10 we get that as N and q grow, the competitive ratio of a satisfying strategy in this game approaches 1. If the time needed to compute the optimizing strategy is proportional to N, a satisfying strategy is de facto optimal for a CB player.
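Making the limit explicit (our restatement, with const₁ and const₂ denoting the constants of Theorems 9 and 10 respectively):

\[
CR(\mathit{OFT}) \;=\; \frac{SL(\mathit{OFT})}{h(\mathit{OFT})}
\;\ge\; \frac{N R - \mathrm{const}_1}{N R + \mathrm{const}_2}
\;\longrightarrow\; 1 \quad \text{as } N \to \infty,
\qquad \mathrm{const}_1 = \frac{1}{q}\big[(r+1)R - S\big].
\]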
Related work revisited

The possibility of opting out makes cooperative, non-vengeful strategies even stronger. This tool can be further developed into a strategy-designing tool (Mor 1995). We wish to demonstrate this claim using the examples presented in the introduction of this paper.

In domains like the paper-searching agents or Foner’s ad-agents, cooperative information sharing is a desirable, if not required, behavior. We propose the following guidelines for designers of such an environment (a sketch of the suggested protocol appears after the list):

• Ensure a large enough initial proportion of cooperative agents in the domain (e.g., by placing them there as part of the system). Make the existence of these agents known to the users of the system.

• Advise the users to use the following protocol in informational transactions:

  – Split the information being transferred into small packets.

  – Use an OFT strategy, i.e., keep on sending packets as long as the opponent does; break up and search for a new opponent as soon as a packet does not arrive in time.

• Inform the users of the satisfying properties of this strategy.
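A minimal sketch (ours) of this protocol; send_packet, receive_packet (returning None on timeout), and find_new_partner stand for hypothetical primitives of the underlying messaging layer:

    # Hypothetical sketch of the Opt-for-Tat information-sharing protocol.
    def share_information(packets, partner,
                          send_packet, receive_packet, find_new_partner):
        received = []
        for packet in packets:            # information split into small packets
            send_packet(partner, packet)  # keep sending as long as the partner does
            reply = receive_packet(partner)
            if reply is None:             # a packet did not arrive in time:
                partner = find_new_partner()   # break up, find a new partner
                continue
            received.append(reply)
        return received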
As shown in the section “Sub-Optimal Strategies,” it is reasonable to assume that a large proportion of the users will implement cooperative strategies in their agents.
Conclusions

We introduced the finite time repeated PD game, and the notion of complexity bounded players. In doing so, we encapsulated both the player’s utility and his inductive power into one parameter: his payoff in the game. Cooperative equilibria arise as an almost inherent characteristic of this model, as can be seen in Theorem 4. Furthermore, the common knowledge of limited computational power enables an agent to control his opponent’s strategy. If the opponent spends too much time on computations, he is probably planning to defect.

In the sections that followed, we introduced and studied opting out in the PD game. We discussed both theoretical and intuitive motivations for this variation on the standard game description. In the section “Sub-Optimal Strategies,” we used the tool of competitive analysis to show that the possibility of opting out makes cooperative, non-vengeful strategies even stronger. We then demonstrated the usefulness of these results with relation to the examples presented in the introduction.

Acknowledgments

This research has been partially supported by the Israeli Ministry of Science and Technology (Grant 032-8284) and by the Israel Science Foundation (Grant 032-7517).

References

Axelrod, R., and Hamilton, W. 1981. The evolution of cooperation. Science 211(4489):1390–1396.

Axelrod, R. 1984. The Evolution of Cooperation. New York: Basic Books.

Binmore, K., and Samuelson, L. 1994. Drifting to equilibrium. Unpublished manuscript.

Fikes, R.; Engelmore, R.; Farquhar, A.; and Pratt, W. 1995. Network-based information brokers. In The AAAI Spring Workshop on Information Gathering from Distributed, Heterogeneous Environments.

Foner, L. N. 1995. Clustering and information sharing in an ecology of cooperating agents. In The AAAI Spring Workshop on Information Gathering from Distributed, Heterogeneous Environments.

Fortnow, L., and Whang, D. 1994. Optimality and domination in repeated games with bounded players. Technical report, Department of Computer Science, University of Chicago, Chicago.

Gilboa, I., and Samet, D. 1989. Bounded vs. unbounded rationality: The tyranny of the weak. Games and Economic Behavior 1:213–221.

Hogg, T., and Hubermann, B. A. 1993. Better than the best: The power of cooperation. In Nadel, L., and Stein, D., eds., 1992 Lectures in Complex Systems, volume V of SFI Studies in the Sciences of Complexity. Reading, MA: Addison-Wesley. 163–184.

Kalai, E. 1990. Bounded rationality and strategic complexity in repeated games. In Ichiishi, T.; Neyman, A.; and Tauman, Y., eds., Game Theory and Applications. San Diego: Academic Press. 131–157.

Kaniel, R. 1994. On the equipment rental problem. Master’s thesis, Hebrew University.

Kreps, D.; Milgrom, P.; Roberts, J.; and Wilson, R. 1982. Rational cooperation in finitely repeated Prisoners’ Dilemma. Journal of Economic Theory 27(2):245–252.

Megiddo, N., and Wigderson, A. 1986. On play by means of computing machines. In Conference on Theoretical Aspects of Reasoning about Knowledge, 259–274.

Mor, Y. 1995. Computational approaches to rational choice. Master’s thesis, Hebrew University. In preparation.

Nowak, M., and Sigmund, K. 1993. A strategy of win-stay lose-shift that outperforms tit-for-tat in the prisoner’s dilemma game. Nature 364:56–58.

Papadimitriou, C. H. 1992. On players with a bounded number of states. Games and Economic Behavior 4:122–131.

Rapoport, A.; Chammah, A.; Dwyer, J.; and Gyr, J. 1962. Three-person non-zero-sum nonnegotiable games. Behavioral Science 7:38–58.

Simon, H. A. 1969. The Sciences of the Artificial. Cambridge, Massachusetts: The MIT Press.

Simon, H. A. 1983. Models of Bounded Rationality. Cambridge, Massachusetts: The MIT Press.

Sleator, D. D., and Tarjan, R. E. 1984. Amortized efficiency of list rules. STOC 16:488–492.

Vanberg, V. J., and Congleton, R. D. 1992. Rationality, morality, and exit. American Political Science Review 86(2):418–431.