A refined analysis of the Poisson channel in the high-photon-efficiency regime

Ligong Wang, Member, IEEE and Gregory W. Wornell, Fellow, IEEE

arXiv:1401.5767v2 [cs.IT] 23 Apr 2014

This work was supported in part by the DARPA InPho program under Contract No. HR0011-10-C-0159, and by AFOSR under Grant No. FA9550-11-1-0183. This paper was presented in part at the 2012 IEEE Information Theory Workshop (ITW) in Lausanne, Switzerland, and will be presented in part at the 2014 IEEE International Symposium on Information Theory (ISIT) in Honolulu, HI, USA.
L. Wang is with the Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: [email protected]).
G. W. Wornell is with the Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: [email protected]).

Abstract: We study the discrete-time Poisson channel under the constraint that its average input power (in photons per channel use) must not exceed some constant E. We consider the wideband, high-photon-efficiency extreme where E approaches zero, and where the channel’s “dark current” approaches zero proportionally with E. Improving over a previously obtained first-order capacity approximation, we derive a refined approximation, which includes the exact characterization of the second-order term, as well as an asymptotic characterization of the third-order term with respect to the dark current. We also show that pulse-position modulation is nearly optimal in this regime.

Index Terms: Optical communication, Poisson channel, channel capacity, wideband, low SNR.

I. INTRODUCTION

We consider the discrete-time memoryless Poisson channel whose input x is in the set $\mathbb{R}_0^+$ of nonnegative reals and whose output y is in the set $\mathbb{Z}_0^+$ of nonnegative integers. Conditional on the input X = x, the output Y has a Poisson distribution of mean (λ + x), where λ ≥ 0 is called the “dark current” and is a constant that does not depend on the input X. We denote the Poisson distribution of mean ξ by $\mathrm{Pois}_\xi(\cdot)$ so that
\[ \mathrm{Pois}_\xi(y) = e^{-\xi}\,\frac{\xi^{y}}{y!}, \qquad y \in \mathbb{Z}_0^+. \tag{1} \]
With this notation the channel law W(·|·) is
\[ W(y|x) = \mathrm{Pois}_{\lambda+x}(y), \qquad x \in \mathbb{R}_0^+,\ y \in \mathbb{Z}_0^+. \tag{2} \]

This channel models pulse-amplitude modulated optical communication where the transmitter sends light signals in coherent states (which are usually produced using laser devices), and where the receiver employs direct detection (i.e., photon counting) [1]. The channel input x describes the expected number of signal photons (i.e., photons that come from the input light signal rather than noise) to be detected in the pulse duration, and is proportional to the light signal’s intensity,^1 the pulse duration, the channel’s transmissivity, and the detector’s efficiency; see [1] for details. The channel output y is the actual number of photons that are detected in the pulse duration. The dark current λ is the average number of extraneous counts that appear in y. We note that, although the name “dark current” is traditionally used in the Poisson-channel literature, these extraneous counts usually include both the detector’s “dark clicks” (which are where the name “dark current” comes from) and photons in background radiation.

^1 We assume that the pulses are square, which is usually the case in practice. If they are not square, the light signal’s intensity should be averaged over the pulse duration.

We impose an average-power constraint^2 on the input
\[ \mathsf{E}[X] \le E \tag{3} \]
for some E > 0.

^2 Here “power” is in discrete time, means expected number of detected signal photons per channel use, and is proportional to the continuous-time physical power times the pulse duration.

In applications like free-space optical communications, the cost of producing and successfully transmitting photons is high, hence high photon-information efficiency (the amount of information transmitted per photon, which we henceforth call simply “photon efficiency”) is desirable. As we later demonstrate, this can be achieved in the wideband regime, where the pulse duration of the input approaches zero and, assuming that the continuous-time average input power is fixed, where E approaches zero proportionally with the pulse duration. Note that in this regime the average number of detected background photons or dark clicks also tends to zero proportionally with the pulse duration. Hence we have the linear relation
\[ \lambda = cE, \tag{4} \]
where c is some nonnegative constant that does not change with E. In practice, asymptotic results in this regime are useful in scenarios where E is small and where λ is comparable to or much smaller than E. Scenarios where E is small but λ is much larger are better captured by the model where λ stays constant while E tends to zero; see [2, Proposition 2].

We denote the capacity (in nats per channel use) of the channel (2) under power constraint (3) with dark current (4) by C(E,c), then
\[ C(E,c) = \sup_{\mathsf{E}[X] \le E} I(X;Y), \tag{5} \]
where the mutual information is computed from the channel law (2) and is maximized over input distributions satisfying (3), with dark current λ given by (4). As we shall see, our results on the asymptotic behavior of C(E,c) hold irrespectively of whether a peak-power constraint
\[ X \le A \quad \text{with probability } 1 \tag{6} \]
is imposed or not, as long as A is positive and does not approach zero together with E.

We now formally define the maximum achievable photon efficiency $C_{\mathrm{PE}}(E,c)$, where the subscript notation serves as a reminder that this quantity is photon efficiency and not capacity:
\[ C_{\mathrm{PE}}(E,c) \triangleq \frac{C(E,c)}{E}. \tag{7} \]

Various capacity results for the discrete-time Poisson channel have been obtained [2]–[6]. Among them, [2, Proposition 1] considers the same scenario as the present paper and asserts that
\[ \lim_{E \downarrow 0} \frac{C(E,c)}{E \log\frac{1}{E}} = 1, \qquad c \in [0,\infty). \tag{8} \]
In other words, the maximum photon efficiency satisfies
\[ C_{\mathrm{PE}}(E,c) = \log\frac{1}{E} + o\!\left(\log\frac{1}{E}\right), \qquad c \in [0,\infty). \tag{9} \]
Furthermore, the proof of [2, Proposition 1] shows that the limit (8) is achievable using on-off keying. In the Appendix we provide a proof for the converse part of (8) that is much simpler than the original proof in [2].^3

^3 For the achievability part of (8), we note that its proof can be simplified by letting the decoder ignore multiple photons, rather than considering all possible values of Y as in [2]. This can be seen via the achievability proofs in the current paper, which do ignore multiple photons at the receiver, and which yield stronger results than those in [2].

The approximation in (9) can be compared to the maximum photon efficiency achievable on the pure-loss bosonic channel [7]. This is a quantum optical communication channel that attenuates the input optical signal, but that does not add any noise to it. The transmitter and the receiver of this channel may employ any structure permitted by quantum physics. When the transmitter is restricted to sending coherent optical states, and when the receiver is restricted to ideal direct detection (with no dark counts), the pure-loss bosonic channel becomes the Poisson channel (2) with λ = 0. We denote by $C_{\mathrm{PE\text{-}bosonic}}(E)$ the maximum photon efficiency of the pure-loss bosonic channel under an average-power constraint that is equivalent to (3). The value of $C_{\mathrm{PE\text{-}bosonic}}(E)$ can be easily computed using the explicit capacity formula [7, Eq. (4)], which yields
\[ C_{\mathrm{PE\text{-}bosonic}}(E) = \log\frac{1}{E} + 1 + o(1). \tag{10} \]

Comparing (9) and (10) shows the following:
• For the pure-loss bosonic channel in the wideband regime, coherent-state inputs and direct detection are optimal up to the first-order term in photon efficiency (or, equivalently, in capacity). For example, they achieve infinite capacity per unit cost [8], [9].
• The dark current does not affect this first-order term.

In the present paper, we refine the analysis in [2] in two aspects. First, we provide a more accurate approximation for $C_{\mathrm{PE}}(E,c)$ that contains higher-order terms, and that reflects the influence of the dark current, i.e., of the constant c. Second, we identify near-optimal modulation schemes that facilitate code design for this channel.

Some progress has been made in improving the approximation (9). It is noticed in [10] that the photon efficiency achievable on the Poisson channel (2) with λ = 0 may be of the form
\[ \log\frac{1}{E} - \log\log\frac{1}{E} + O(1), \tag{11} \]
which means that restriction to coherent-state inputs and direct detection may induce a loss in the second-order term in the photon efficiency on the pure-loss bosonic channel. For practical values of E, this second-order term can be significant. For example, for $E = 10^{-5}$, the difference between the first-order term in (9) and the first- and second-order terms in (11) is about 20%.

The analysis in [10] (whose main focus is not on the Poisson channel itself) is based on certain assumptions on the input distribution.^4 It is therefore unclear from [10] if (11) is the maximum photon efficiency achievable on the Poisson channel (2) with λ = 0 subject to constraint (3) alone, i.e., if (11) is the correct expression for $C_{\mathrm{PE}}(E,0)$. In the present paper, we prove that this is indeed the case.

^4 According to conversation with the authors.

Recent works such as [10], [11] often ignore the dark current in the Poisson channel. It has been unknown to us whether the dark current, i.e., the constant c, influences the second term in (11) and, if not, whether it influences the next term in photon efficiency. We show that the constant c does not influence the second-order term in $C_{\mathrm{PE}}(E,c)$, but does influence the third-order, constant term.

It has long been known that infinite photon efficiency on the Poisson channel with zero dark current can be achieved using pulse-position modulation (PPM) combined with an outer code [3], [4]. Further, [11] observes that PPM can achieve (11) on such a channel. PPM greatly simplifies the coding task for this channel, since one can easily apply existing codes, such as a Reed-Solomon code, to the PPM “super symbols”; while the on-off keying scheme that achieves (11) has a highly skewed input distribution and is hence difficult to code. The question then arises: how useful is PPM when there is a positive dark current? This question has two parts. First, is PPM still near optimal in terms of capacity (or photon efficiency) when there is a positive dark current? Second, does PPM still simplify coding when there is a dark current? We answer the first part of the question in the affirmative in our main results. We cannot fully answer the second question in this paper, but we shall discuss it in the concluding remarks in Section VII.

The rest of this paper is arranged as follows. We introduce some notation and formally define PPM in Section II. We state and discuss our main results in Section III. We then prove the achievability parts of these results in Sections IV and V, and prove the converse parts in Section VI. We conclude the paper with numerical comparison of the bounds and some remarks in Section VII.

II. NOTATION AND PPM

We usually use a lower-case letter like x to denote a constant, and an upper-case letter like X to denote a random variable.
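As a numerical aside on the comparison made in Section I, the following minimal Python sketch evaluates the leading terms of (9), (10), and (11) at $E = 10^{-5}$. The function names are illustrative assumptions rather than notation from the paper, and the o(·) and O(1) remainders are simply dropped, so the numbers indicate orders of magnitude only.

```python
import math


def pe_first_order(E):
    """Leading term log(1/E) of (9), in nats per photon."""
    return math.log(1.0 / E)


def pe_bosonic(E):
    """Leading terms log(1/E) + 1 of (10); the o(1) remainder is dropped."""
    return math.log(1.0 / E) + 1.0


def pe_direct_detection(E):
    """First two terms log(1/E) - log log(1/E) of (11); the O(1) term is dropped."""
    log_inv_E = math.log(1.0 / E)
    return log_inv_E - math.log(log_inv_E)


E = 1e-5
first = pe_first_order(E)
two_terms = pe_direct_detection(E)
# Relative size of the second-order term log log(1/E) compared with log(1/E).
relative_gap = (first - two_terms) / first
print(f"first-order term: {first:.3f} nats/photon")
print(f"first two terms : {two_terms:.3f} nats/photon")
print(f"relative gap    : {relative_gap:.1%}")
```

Under these assumptions the relative gap is roughly one fifth, which is in line with the “about 20%” figure quoted for $E = 10^{-5}$.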
We use natural logarithms, and measure information in nats.

We use the usual o(·) and O(·) notation to describe behaviors of functions of E in the limit where E approaches zero with other parameters, if any, held fixed. Specifically, given a reference function f(·) (which might be the constant 1), a function described as o(f(E)) satisfies
\[ \lim_{E \downarrow 0} \frac{o(f(E))}{f(E)} = 0, \tag{12} \]
and a function described as O(f(E)) satisfies
\[ \limsup_{E \downarrow 0} \left| \frac{O(f(E))}{f(E)} \right| < \infty. \tag{13} \]
We emphasize that, in particular, we do not use o(·) and O(·) to describe how functions behave with respect to c.

We adopt the convention
\[ 0 \log 0 = 0. \tag{14} \]

We next formally describe what we mean by PPM. On the transmitter side:
• The channel uses are divided into frames of equal lengths;
• In each frame, there is only one channel input that is positive, which we call the “pulse”, while all the other inputs are zeros;
• The pulses in all frames have the same amplitude.
On the receiver side, we distinguish between two cases, which we call simple PPM and soft-decision PPM, respectively. In simple PPM, the receiver records at most one pulse in each frame; if more than one pulse is detected in a frame, then that frame is recorded as an “erasure”, as if no pulse is detected at all. In soft-decision PPM, the receiver records up to two pulses in each frame; frames containing no or more than two detected pulses are treated as erasures.

III. MAIN RESULTS AND DISCUSSIONS

The following theorem provides an approximation for the photon efficiency $C_{\mathrm{PE}}(E,c)$ in the high-photon-efficiency regime.

Theorem 1. The maximum photon efficiency $C_{\mathrm{PE}}(E,c)$ achievable on the Poisson channel (2) subject to constraint (3) and with dark current (4), which we define in (7), satisfies
\[ C_{\mathrm{PE}}(E,c) = \log\frac{1}{E} - \log\log\frac{1}{E} + O(1), \qquad c \in [0,\infty). \tag{15} \]
Furthermore, denote
\[ K(E,c) \triangleq C_{\mathrm{PE}}(E,c) - \log\frac{1}{E} + \log\log\frac{1}{E}, \tag{16} \]
then
\[ \lim_{c \to \infty} \liminf_{E \downarrow 0} \frac{K(E,c)}{\log c} = \lim_{c \to \infty} \limsup_{E \downarrow 0} \frac{K(E,c)}{\log c} = -1. \tag{17} \]

Our second theorem shows that PPM is nearly optimal in the regime of interest.

Theorem 2. The asymptotic expression on the right-hand side of (15) is achievable with simple PPM. The limits in (17) are achievable with soft-decision PPM.

The achievability part of Theorem 1 follows from Theorem 2. To prove the latter, we need to show two things: first, that the largest photon efficiency that is achievable with simple PPM, which we henceforth denote by $C_{\mathrm{PE\text{-}PPM}}(E,c)$, satisfies
\[ C_{\mathrm{PE\text{-}PPM}}(E,c) \ge \log\frac{1}{E} - \log\log\frac{1}{E} + O(1), \qquad c \in [0,\infty); \tag{18} \]
and second, that the maximum photon efficiency that is achievable with soft-decision PPM, which we henceforth denote by $C_{\mathrm{PE\text{-}PPM(SD)}}(E,c)$, satisfies
\[ \lim_{c \to \infty} \liminf_{E \downarrow 0} \frac{C_{\mathrm{PE\text{-}PPM(SD)}}(E,c) - \log\frac{1}{E} + \log\log\frac{1}{E}}{\log c} \ge -1. \tag{19} \]

To prove the converse part of Theorem 1, we note that the capacity of the channel is monotonically decreasing in c [2], and hence so is the photon efficiency. It thus suffices to show two things: first, that, in the absence of dark current, the largest photon efficiency achievable with any scheme satisfies
\[ C_{\mathrm{PE}}(E,0) \le \log\frac{1}{E} - \log\log\frac{1}{E} + O(1); \tag{20} \]
and second, that
\[ \lim_{c \to \infty} \limsup_{E \downarrow 0} \frac{C_{\mathrm{PE}}(E,c) - \log\frac{1}{E} + \log\log\frac{1}{E}}{\log c} \le -1. \tag{21} \]
We prove (18) in Section IV, (19) in Section V, and (20) and (21) in Section VI.

To better understand the capacity results in Theorem 1, we make the following remarks.
• Choosing c = 0 in (15) confirms that (11) is the correct asymptotic expression up to the second-order term for $C_{\mathrm{PE}}(E,0)$. Compared to (10) this means that, on the pure-loss bosonic channel, for small E, restricting the receiver to using direct detection induces a loss in photon efficiency of about $\log\log\frac{1}{E}$ nats per photon. Note that the capacity of the pure-loss bosonic channel can be achieved using coherent input states only [7], so this loss is indeed due to direct detection, and not due to coherent input states.
Attempts to overcome this loss by employing other feasible detection techniques have so far been unsuccessful [10], [12].
• The expression on the right-hand side of (15) does not depend on c, so the value of c affects neither the first-order term nor the second-order term in $C_{\mathrm{PE}}(E,c)$. In particular, these two terms do not depend on whether c is zero or positive.
• The first term in $C_{\mathrm{PE}}(E,c)$ that c does affect is the third, constant term. Indeed, though we have not given an exact expression for the constant term, the asymptotic property (17) shows that c affects the constant term in such a way that, for large c, the constant term is approximately −log c.
• Theorem 1 suggests the approximation
\[ C_{\mathrm{PE}}(E,c) \approx \log\frac{1}{E} - \log\log\frac{1}{E} - \log c. \tag{22} \]
The terms constituting the approximation error, roughly speaking, are either vanishing for small E, or small compared to log c for large c. Note that if we fix the dark current, i.e., if we fix the product cE, then the first and third terms on the right-hand side of (22) cancel. This is intuitively in agreement with [2, Proposition 2], which asserts that, for fixed dark current, photon efficiency scales like some constant times log log(1/E) and hence not like log(1/E). We note, however, that (17) cannot be derived directly using (15) together with [2, Proposition 2], because we cannot change the order of the limits. In (17) we do not let E tend to zero and c tend to infinity simultaneously. Instead, we first let E tend to zero to close down onto the constant term in $C_{\mathrm{PE}}(E,c)$ with respect to E, and then let c tend to infinity to study the asymptotic behavior of this constant term with respect to c.
• The approximation (22) is good for large c, but diverges as c tends to zero. We hence need a better approximation for the constant term, which behaves like −log c for large c, but which does not diverge for small c. As both the nonasymptotic bounds and the numerical simulations we show later will suggest, −log(1 + c) is a good approximation:
\[ C_{\mathrm{PE}}(E,c) \approx \log\frac{1}{E} - \log\log\frac{1}{E} - \log(1+c). \tag{23} \]

For modulation and coding considerations, we make some remarks following Theorem 2.
• Theorem 2 shows that PPM is optimal up to the second-order term in photon efficiency when c = 0. Hence, compared to the restriction to the receiver using direct detection, the further restriction to PPM induces only a small additional loss in photon efficiency.
• Furthermore, for the third-order, constant term, (soft-decision) PPM is also not far from optimal, in the sense that it achieves the optimal asymptotic behavior of this term for large c.
• In Section IV we show that
\[ C_{\mathrm{PE\text{-}PPM}}(E,c) \ge \log\frac{1}{E} - \log\log\frac{1}{E} - c - \log(1+c) - \frac{3}{2} + o(1). \tag{24} \]
A careful analysis will confirm that the bound (24) is tight in the regime of interest, in the sense that simple PPM cannot achieve a constant term that is better than linear in c (while being second-order optimal). This is in contrast to soft-decision PPM, which can achieve a constant term that is logarithmic in c. In particular, simple PPM is clearly not third-order optimal.^5
• In the PPM schemes that achieve (18) and (19), which we describe in Sections IV and V, the pulse has amplitude $1/\log(1/E)$, which depends on E, and which tends to zero as E tends to zero. An on-off keying scheme having the same amplitude for its “on” signals can also achieve these lower bounds, because the dependence between the channel inputs introduced by PPM can only reduce the total mutual information between channel inputs and outputs. These (PPM and on-off keying schemes) are different from the on-off keying scheme used in [2] where the “on” signal has a fixed amplitude that does not depend on E. The latter on-off keying scheme, as well as any PPM scheme with a fixed pulse amplitude (with respect to E), is not second-order optimal on the Poisson channel.
• Because in the PPM schemes to achieve (18) and (19) the pulse tends to zero as E tends to zero, we know that, as claimed in Section I, both our theorems hold if a constant (i.e., not approaching zero together with E) peak-power constraint as in (6) is imposed on X in addition to (3).

^5 A scheme between simple and soft-decision PPM is the following. When detecting multiple pulses in a frame, the receiver randomly selects and records one of the positions (possibly together with a “quality” flag). This scheme outperforms simple PPM in photon efficiency, but its third-order term is still linear in c: it scales like −c/2 instead of −c.

We next proceed to prove our main results, and leave further discussions to Section VII.

IV. PROOF OF THE SIMPLE-PPM LOWER BOUND (18)

Assume that E is small enough so that
\begin{align*}
E \log\frac{1}{E} &< 1 \tag{25a}\\
E &< e^{-c}. \tag{25b}
\end{align*}
Consider the following simple PPM scheme:

Scheme 1.
• The channel uses are divided into frames of length b, so each frame contains b input symbols $x_1, \ldots, x_b$ and b corresponding output symbols $y_1, \ldots, y_b$. We set
\[ b = \left\lfloor \frac{1}{E \log\frac{1}{E}} \right\rfloor. \tag{26} \]
• Within each length-b frame, there is always one input that equals η, and all the other (b − 1) inputs are zeros. Each frame is then fully specified by the position of its unique nonzero symbol, i.e., its pulse position. We consider each frame as a “super input symbol” $\tilde{x}$ that takes value in {1, …, b}. Here $\tilde{x} = i$, $i \in \{1, \ldots, b\}$, means
\begin{align*}
x_i &= \eta \tag{27a}\\
x_j &= 0, \qquad j \in \{1, \ldots, b\},\ j \ne i. \tag{27b}
\end{align*}
To meet the average-power constraint (3) with equality, we require
\[ \eta = bE. \tag{28} \]
• The b output symbols $y_1, \ldots, y_b$ are mapped to one “super output symbol” $\tilde{y}$ that takes value in {1, …, b, ?} in the following way: $\tilde{y} = i$, $i \in \{1, \ldots, b\}$, if $y_i$ is the unique nonzero term in $\{y_1, \ldots, y_b\}$; and $\tilde{y} = ?$ if there is more than one or no nonzero term in $\{y_1, \ldots, y_b\}$.

We have the following lower bound on the photon efficiency achieved by the above scheme.
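Before turning to the bound, the mechanics of Scheme 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the string "?" used for the erasure symbol are hypothetical choices, positions are 1-based as in (27), and the sketch covers the frame construction and the receiver mapping only, not the outer code.

```python
import math

ERASURE = "?"  # stand-in for the erasure symbol "?" of Scheme 1


def frame_length(E):
    """Frame length b from (26): floor(1 / (E log(1/E)))."""
    return math.floor(1.0 / (E * math.log(1.0 / E)))


def pulse_amplitude(E):
    """Pulse amplitude eta = b E from (28)."""
    return frame_length(E) * E


def encode_frame(position, b, eta):
    """Map the super input symbol (a 1-based pulse position) to b channel inputs, as in (27)."""
    return [eta if k == position else 0.0 for k in range(1, b + 1)]


def simple_ppm_receive(counts):
    """Map b photon counts to the super output symbol of simple PPM.

    Returns the 1-based position of the unique nonzero count, or ERASURE
    when no count or more than one count is nonzero.
    """
    nonzero = [k for k, y in enumerate(counts, start=1) if y > 0]
    return nonzero[0] if len(nonzero) == 1 else ERASURE


E = 1e-4
b = frame_length(E)
eta = pulse_amplitude(E)
print(b, eta, eta / b)  # eta / b is the average input power per channel use
print(simple_ppm_receive([0, 0, 2, 0, 0]), simple_ppm_receive([0, 1, 2, 0, 0]))
```

Since exactly one of the b inputs in a frame equals η, the average input per channel use is η/b = E, which is how (28) meets the constraint (3) with equality.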
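Proposition 1. Under the conditions (25), Scheme 1 achieves photon efficiency
\begin{align*}
C_{\mathrm{PE\text{-}PPM}}(E,c) \ge{} & \left(1 - \frac{\eta}{2}\right) \log b - c\eta \log b - \left(1 + \frac{cE}{\eta}\right) \log(1+c) \\
& + \left(1 + \frac{cE}{\eta}\right) \left\{ \log(1 - c\eta) + \log\left(1 - \frac{\eta}{2}\right) \right\} - 1 - \frac{cE}{\eta} \tag{29}
\end{align*}
where b and η are given in (26) and (28), respectively.

Proof of (18) using Proposition 1: We plug (26) and (28) into (29), note that
\[ \left\lfloor \frac{1}{E \log\frac{1}{E}} \right\rfloor = \frac{1}{E \log\frac{1}{E}} + O(1) \tag{30} \]
and simplify (29) to obtain
\begin{align*}
C_{\mathrm{PE\text{-}PPM}}(E,c) &\ge \log\frac{1}{E} - \log\log\frac{1}{E} - c - \log(1+c) - \frac{3}{2} + o(1) \tag{31}\\
&= \log\frac{1}{E} - \log\log\frac{1}{E} + O(1). \tag{32}
\end{align*}

When proving Proposition 1, as well as Propositions 2 and 3, we frequently use the inequalities summarized in the following lemma.

Lemma 1. For all a ≥ 0,
\begin{align*}
\log(1+a) &\le a, \tag{33}\\
1 - e^{-a} &\le a, \tag{34}\\
1 - e^{-a} &\ge a - \frac{1}{2} a^2. \tag{35}
\end{align*}

Proof of Proposition 1: We compute the transition matrix of the PPM “super channel” as follows:
\begin{align*}
\tilde{W}(i|i) &= \Pr[\tilde{Y} = i \,|\, \tilde{X} = i] \tag{36}\\
&= \Pr[Y_i \ge 1 \,|\, X_i = \eta] \prod_{\substack{k=1 \\ k \ne i}}^{b} \Pr[Y_k = 0 \,|\, X_k = 0] \tag{37}\\
&= \bigl(1 - W(0|\eta)\bigr) \bigl(W(0|0)\bigr)^{b-1} \tag{38}\\
&= \bigl(1 - e^{-\eta - cE}\bigr) e^{-(b-1)cE} \tag{39}\\
&= e^{-(b-1)cE} - e^{-\eta - bcE} \tag{40}\\
&\triangleq p_0, \qquad i \in \{1, \ldots, b\}; \tag{41}\\
\tilde{W}(j|i) &= \Pr[\tilde{Y} = j \,|\, \tilde{X} = i] \tag{42}\\
&= \Pr[Y_i = 0 \,|\, X_i = \eta] \Pr[Y_j \ge 1 \,|\, X_j = 0] \cdot \prod_{\substack{k=1 \\ k \notin \{i,j\}}}^{b} \Pr[Y_k = 0 \,|\, X_k = 0] \tag{43}\\
&= W(0|\eta) \bigl(1 - W(0|0)\bigr) \bigl(W(0|0)\bigr)^{b-2} \tag{44}\\
&= e^{-\eta - cE} \bigl(1 - e^{-cE}\bigr) e^{-(b-2)cE} \tag{45}\\
&= e^{-\eta - (b-1)cE} - e^{-\eta - bcE} \tag{46}\\
&\triangleq p_1, \qquad i,j \in \{1, \ldots, b\},\ i \ne j; \tag{47}\\
\tilde{W}(?|i) &= 1 - p_0 - (b-1)p_1, \qquad i \in \{1, \ldots, b\}. \tag{48}
\end{align*}

We now have, irrespective of the distribution of $\tilde{X}$,
\begin{align*}
H(\tilde{Y}|\tilde{X}) ={} & p_0 \log\frac{1}{p_0} + (b-1) p_1 \log\frac{1}{p_1} \\
& + \bigl(1 - p_0 - (b-1)p_1\bigr) \log\frac{1}{1 - p_0 - (b-1)p_1}. \tag{49}
\end{align*}
Denote the capacity of this super channel by $\tilde{C}(E,c,b,\eta)$, then
\[ \tilde{C}(E,c,b,\eta) = \max_{P_{\tilde{X}}} I(\tilde{X};\tilde{Y}). \tag{50} \]
Note that the total input power (i.e., expected number of detected signal photons) in each frame equals η. Therefore we have the following lower bound on $C_{\mathrm{PE\text{-}PPM}}(E,c)$:
\[ C_{\mathrm{PE\text{-}PPM}}(E,c) \ge \frac{\tilde{C}(E,c,b,\eta)}{\eta}. \tag{51} \]

It can be easily verified that the optimal input distribution for (50) is the uniform distribution
\[ P_{\tilde{X}}(i) = \frac{1}{b}, \qquad i \in \{1, \ldots, b\}, \tag{52} \]
which induces the following marginal distribution on $\tilde{Y}$:
\begin{align*}
P_{\tilde{Y}}(i) &= \frac{p_0 + (b-1)p_1}{b}, \qquad i \in \{1, \ldots, b\}, \tag{53a}\\
P_{\tilde{Y}}(?) &= 1 - p_0 - (b-1)p_1. \tag{53b}
\end{align*}
We can now use the above joint distribution on $(\tilde{X},\tilde{Y})$ to lower-bound $\tilde{C}(E,c,b,\eta)$ as follows:
\begin{align*}
\tilde{C}(E,c,b,\eta) &= I(\tilde{X};\tilde{Y}) \tag{54}\\
&= H(\tilde{Y}) - H(\tilde{Y}|\tilde{X}) \tag{55}\\
&= \bigl(1 - p_0 - (b-1)p_1\bigr) \log\frac{1}{1 - p_0 - (b-1)p_1} + \bigl(p_0 + (b-1)p_1\bigr) \log\frac{b}{p_0 + (b-1)p_1} \\
&\quad - \bigl(1 - p_0 - (b-1)p_1\bigr) \log\frac{1}{1 - p_0 - (b-1)p_1} - p_0 \log\frac{1}{p_0} - (b-1) p_1 \log\frac{1}{p_1} \tag{56}\\
&= p_0 \log\frac{b p_0}{p_0 + (b-1)p_1} + (b-1) p_1 \log\frac{b p_1}{p_0 + (b-1)p_1} \tag{57}\\
&= p_0 \log b - p_0 \underbrace{\log\frac{p_0 + (b-1)p_1}{p_0}}_{\le \log\left(1 + \frac{b p_1}{p_0}\right)} - (b-1) p_1 \underbrace{\log\frac{p_0 + (b-1)p_1}{b p_1}}_{= \log\left(1 + \frac{p_0 - p_1}{b p_1}\right) \le \frac{p_0 - p_1}{b p_1}} \tag{58}\\
&\ge p_0 \log b - p_0 \log\left(1 + \frac{b p_1}{p_0}\right) - \underbrace{\frac{b-1}{b}}_{\le 1} \underbrace{(p_0 - p_1)}_{\le p_0} \tag{59}\\
&\ge p_0 \log b - p_0 \log\left(1 + \frac{b p_1}{p_0}\right) - p_0. \tag{60}
\end{align*}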
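As a numerical sanity check of the chain (54)–(60), the following Python sketch evaluates $p_0$ and $p_1$ from (40) and (46), the mutual information under the uniform input in the form (57), and the lower bound (60). The function names and the sample values E = 10^{-4}, c = 1 are illustrative assumptions, chosen so that the conditions (25) hold.

```python
import math


def super_channel_probs(E, c, b, eta):
    """Return (p0, p1) as given in (40) and (46)."""
    p0 = math.exp(-(b - 1) * c * E) - math.exp(-eta - b * c * E)
    p1 = math.exp(-eta - (b - 1) * c * E) - math.exp(-eta - b * c * E)
    return p0, p1


def mutual_information_uniform(p0, p1, b):
    """I(X~;Y~) in nats for the uniform input, using the form (57)."""
    s = p0 + (b - 1) * p1
    info = p0 * math.log(b * p0 / s)
    if p1 > 0.0:  # with c = 0 we have p1 = 0, and 0 log 0 = 0 by (14)
        info += (b - 1) * p1 * math.log(b * p1 / s)
    return info


def lower_bound_60(p0, p1, b):
    """Right-hand side of (60)."""
    return p0 * math.log(b) - p0 * math.log(1.0 + b * p1 / p0) - p0


E, c = 1e-4, 1.0                                   # sample values (assumption)
b = math.floor(1.0 / (E * math.log(1.0 / E)))      # frame length, (26)
eta = b * E                                        # pulse amplitude, (28)
p0, p1 = super_channel_probs(E, c, b, eta)
info = mutual_information_uniform(p0, p1, b)
bound = lower_bound_60(p0, p1, b)
print(f"b = {b}, eta = {eta:.4f}")
print(f"I(X~;Y~) = {info:.4f} nats per frame, bound (60) = {bound:.4f}")
print(f"photon efficiency I/eta = {info / eta:.3f} nats per photon")
```

Dividing the mutual information by η, as in (51), converts the per-frame quantity into nats per photon.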
Next note that $p_0$ can be upper-bounded as
\begin{align*}
p_0 &= e^{-(b-1)cE} - e^{-\eta - bcE} \tag{61}\\
&= e^{-(b-1)cE} \bigl(1 - e^{-\eta - cE}\bigr) \tag{62}\\
&\le 1 - e^{-\eta - cE} \tag{63}\\
&\le \eta + cE, \tag{64}
\end{align*}
where the last step follows from (34). It can also be lower-bounded as
\begin{align*}
p_0 &= e^{-(b-1)cE} - e^{-\eta - bcE} \tag{65}\\
&\ge e^{-bcE} - e^{-\eta - bcE} \tag{66}\\
&= \underbrace{e^{-bcE}}_{\ge 1 - bcE}\, \underbrace{(1 - e^{-\eta})}_{\ge \eta - \frac{1}{2}\eta^2} \tag{67}\\
&\ge (1 - bcE) \left(\eta - \frac{1}{2}\eta^2\right), \tag{68}
\end{align*}
where the last inequality follows from (34) and (35), and from the fact that the first multiplicand on its right-hand side is (in fact, both multiplicands are) nonnegative for E satisfying (25) and other parameters chosen accordingly. Also note that $p_1$ can be upper-bounded as
\begin{align*}
p_1 &= e^{-\eta - (b-1)cE} - e^{-\eta - bcE} \tag{69}\\
&= \underbrace{e^{-\eta - (b-1)cE}}_{\le 1}\, \underbrace{\bigl(1 - e^{-cE}\bigr)}_{\le cE} \tag{70}\\
&\le cE. \tag{71}
\end{align*}

Using (64), (68) and (71) we can continue the chain of inequalities (60) to further lower-bound $\tilde{C}(E,c,b,\eta)$ as
\begin{align*}
\tilde{C}(E,c,b,\eta) &\ge (1 - bcE) \left(\eta - \frac{1}{2}\eta^2\right) \log b - (\eta + cE) \log\left(1 + \frac{bcE}{(1 - bcE)\left(\eta - \frac{1}{2}\eta^2\right)}\right) - \eta - cE \tag{72}\\
&\ge (1 - bcE) \left(\eta - \frac{1}{2}\eta^2\right) \log b - (\eta + cE) \log\left(1 + \frac{bcE}{\eta}\right) \\
&\quad + (\eta + cE) \left\{ \log(1 - bcE) + \log\left(1 - \frac{\eta}{2}\right) \right\} - \eta - cE \tag{73}\\
&\ge \left(\eta - \frac{1}{2}\eta^2\right) \log b - bcE\eta \log b - (\eta + cE) \log\left(1 + \frac{bcE}{\eta}\right) \\
&\quad + (\eta + cE) \left\{ \log(1 - bcE) + \log\left(1 - \frac{\eta}{2}\right) \right\} - \eta - cE. \tag{74}
\end{align*}
Here, (73) follows from the fact that, for all α ≥ 0 and β ≥ 1,
\[ \log(1 + \alpha\beta) \le \log(\beta + \alpha\beta) = \log\beta + \log(1 + \alpha), \tag{75} \]
with the choices
\begin{align*}
\alpha &= \frac{bcE}{\eta}, \tag{76}\\
\beta &= \frac{1}{(1 - bcE)\left(1 - \frac{1}{2}\eta\right)}, \tag{77}
\end{align*}
which indeed satisfy α ≥ 0 and β ≥ 1; and (74) follows by dropping the term $\frac{1}{2} bcE\eta^2 \log b$, which is positive.

Combining (28), (51), and (74) yields (29).

V. PROOF OF THE SOFT-DECISION-PPM LOWER BOUND (19)

Consider the following soft-decision PPM scheme:

Scheme 2. The transmitter performs the same PPM as in Scheme 1. The receiver maps the b output symbols to a “super symbol” $\hat{y}$ that takes value in $\{1, \ldots, b, ?\} \cup \bigl\{\{i,j\} \subset \{1, \ldots, b\}\bigr\}$. The mapping rule is as follows. Take $\hat{y} = i$ if $y_i$ is the unique nonzero term in $\{y_1, \ldots, y_b\}$; take $\hat{y} = \{i,j\}$ if $y_i$ and $y_j$ are the only two nonzero terms in $\{y_1, \ldots, y_b\}$; if there are more than two or no nonzero terms in $\{y_1, \ldots, y_b\}$, take $\hat{y} = ?$.

Proposition 2. Under the conditions (25), Scheme 2 achieves photon efficiency
\begin{align*}
C_{\mathrm{PE\text{-}PPM(SD)}}(E,c) \ge{} & \left(1 - \frac{\eta}{2}\right) \log b - c\eta \log b - \left(1 + \frac{cE}{\eta}\right) \log(1+c) \\
& + \left(1 + \frac{cE}{\eta}\right) \left\{ \log(1 - c\eta) + \log\left(1 - \frac{\eta}{2}\right) \right\} - 1 - \frac{cE}{\eta} \\
& + \left(1 - \frac{\eta}{2}\right) (b-1) \left(cE - \frac{c^2E^2}{2}\right) (1 - c\eta) \log\frac{b}{2} \\
& - \frac{c^2}{2}\eta - c(\eta + cE) \tag{78}
\end{align*}
where b and η are given in (26) and (28), respectively.

Proof of (19) using Proposition 2: Simplifying (78) we obtain
\[ \liminf_{E \downarrow 0} \left\{ C_{\mathrm{PE\text{-}PPM(SD)}}(E,c) - \log\frac{1}{E} + \log\log\frac{1}{E} \right\} \ge -\log(1+c) - \frac{3}{2}. \tag{79} \]
This immediately yields (19).

Proof of Proposition 2: We compute the transition matrix of the super channel that results from Scheme 2. We first note
\begin{align*}
\hat{W}(i|i) &= p_0, \tag{80}\\
\hat{W}(j|i) &= p_1, \qquad i,j \in \{1, \ldots, b\},\ i \ne j, \tag{81}
\end{align*}
where $p_0$ and $p_1$ are given in (41) and (47), respectively. For the remaining elements of the transition matrix we have
\begin{align*}
\hat{W}\bigl(\{i,j\} \big| i\bigr) &= \Pr[Y_i \ge 1 \,|\, X_i = \eta] \Pr[Y_j \ge 1 \,|\, X_j = 0] \cdot \prod_{\substack{\ell=1 \\ \ell \notin \{i,j\}}}^{b} \Pr[Y_\ell = 0 \,|\, X_\ell = 0] \tag{82}\\
&= \bigl(1 - e^{-\eta - cE}\bigr) \bigl(1 - e^{-cE}\bigr) e^{-(b-2)cE} \tag{83}\\
&\triangleq p_2, \qquad \{i,j\} \subset \{1, \ldots, b\}; \tag{84}
\end{align*}
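\begin{align*}
\hat{W}\bigl(\{j,k\} \big| i\bigr) &= \Pr[Y_i = 0 \,|\, X_i = \eta] \Pr[Y_j \ge 1 \,|\, X_j = 0] \cdot \Pr[Y_k \ge 1 \,|\, X_k = 0] \cdot \prod_{\substack{\ell=1 \\ \ell \notin \{i,j,k\}}}^{b} \Pr[Y_\ell = 0 \,|\, X_\ell = 0] \tag{85}\\
&= e^{-\eta - cE} \bigl(1 - e^{-cE}\bigr)^2 e^{-(b-3)cE} \tag{86}\\
&\triangleq p_3, \qquad \{i,j,k\} \subset \{1, \ldots, b\}; \tag{87}\\
\hat{W}(?|i) &= 1 - p_0 - (b-1)p_1 - (b-1)p_2 - \frac{(b-1)(b-2)}{2} p_3 \tag{88}\\
&\triangleq p_4, \qquad i \in \{1, \ldots, b\}. \tag{89}
\end{align*}

We now have, irrespective of the distribution of $\tilde{X}$,
\begin{align*}
H(\hat{Y}|\tilde{X}) ={} & p_0 \log\frac{1}{p_0} + (b-1) p_1 \log\frac{1}{p_1} + (b-1) p_2 \log\frac{1}{p_2} \\
& + \frac{(b-1)(b-2)}{2} p_3 \log\frac{1}{p_3} + p_4 \log\frac{1}{p_4}. \tag{90}
\end{align*}
Choosing a uniform $\tilde{X}$ (which is optimal as in Section IV), we have the following distribution on $\hat{Y}$:
\begin{align*}
P_{\hat{Y}}(i) &= \frac{p_0 + (b-1)p_1}{b}, \qquad i \in \{1, \ldots, b\}; \tag{91}\\
P_{\hat{Y}}\bigl(\{i,j\}\bigr) &= \frac{2p_2 + (b-2)p_3}{b}, \qquad \{i,j\} \subset \{1, \ldots, b\}; \tag{92}\\
P_{\hat{Y}}(?) &= p_4. \tag{93}
\end{align*}
Therefore
\begin{align*}
H(\hat{Y}) ={} & \bigl(p_0 + (b-1)p_1\bigr) \log\frac{b}{p_0 + (b-1)p_1} \\
& + \left((b-1)p_2 + \frac{(b-1)(b-2)}{2} p_3\right) \log\frac{b}{2p_2 + (b-2)p_3} + p_4 \log\frac{1}{p_4}. \tag{94}
\end{align*}
Using (90) and (94) we have
\begin{align*}
I(\tilde{X};\hat{Y}) &= H(\hat{Y}) - H(\hat{Y}|\tilde{X}) \tag{95}\\
&= p_0 \log\frac{b p_0}{p_0 + (b-1)p_1} + (b-1) p_1 \log\frac{b p_1}{p_0 + (b-1)p_1} \\
&\quad + (b-1) p_2 \log\frac{b p_2}{2p_2 + (b-2)p_3} + \frac{(b-1)(b-2)}{2} p_3 \log\frac{b p_3}{2p_2 + (b-2)p_3}. \tag{96}
\end{align*}

At this point, note that the first two summands on the right-hand side of (96) constitute $I(\tilde{X};\tilde{Y})$, which we analyzed in Section IV. It remains to find lower bounds on the last two summands. We lower-bound the third term on the right-hand side of (96) as
\begin{align*}
(b-1) p_2 \log\frac{b p_2}{2p_2 + (b-2)p_3} &= (b-1) p_2 \log\frac{b}{2} - (b-1) p_2 \underbrace{\log\frac{p_2 + \frac{b-2}{2} p_3}{p_2}}_{= \log\left(1 + \frac{(b-2)p_3}{2p_2}\right) \le \frac{(b-2)p_3}{2p_2}} \tag{97}\\
&\ge (b-1) p_2 \log\frac{b}{2} - \frac{(b-1)(b-2)}{2} p_3 \tag{98}\\
&\ge (b-1) p_2 \log\frac{b}{2} - \frac{b^2}{2} p_3. \tag{99}
\end{align*}
We lower-bound the fourth term on the right-hand side of (96) as
\begin{align*}
\frac{(b-1)(b-2)}{2} p_3 \log\frac{b p_3}{2p_2 + (b-2)p_3} &= -\frac{(b-1)(b-2)}{2} p_3 \underbrace{\log\frac{2p_2 + (b-2)p_3}{b p_3}}_{= \log\left(1 + \frac{2(p_2 - p_3)}{b p_3}\right) \le \frac{2(p_2 - p_3)}{b p_3} \le \frac{2p_2}{b p_3}} \tag{100}\\
&\ge -\frac{(b-1)(b-2)}{2} \cdot \frac{2p_2}{b} \tag{101}\\
&\ge -b p_2. \tag{102}
\end{align*}

Using (34) and (35) we have the upper bound on $p_2$
\begin{align*}
p_2 &= \underbrace{\bigl(1 - e^{-\eta - cE}\bigr)}_{\le \eta + cE}\, \underbrace{\bigl(1 - e^{-cE}\bigr)}_{\le cE}\, \underbrace{e^{-(b-2)cE}}_{\le 1} \tag{103}\\
&\le (\eta + cE)\, cE, \tag{104}
\end{align*}
the lower bound on $p_2$
\begin{align*}
p_2 &= \underbrace{\bigl(1 - e^{-\eta - cE}\bigr)}_{\ge 1 - e^{-\eta} \ge \eta - \frac{\eta^2}{2}}\, \underbrace{\bigl(1 - e^{-cE}\bigr)}_{\ge cE - \frac{c^2E^2}{2}}\, \underbrace{e^{-(b-2)cE}}_{\ge 1 - bcE} \tag{105}\\
&\ge \left(\eta - \frac{\eta^2}{2}\right) \left(cE - \frac{c^2E^2}{2}\right) (1 - bcE), \tag{106}
\end{align*}
and the upper bound on $p_3$
\begin{align*}
p_3 &= \underbrace{e^{-\eta - cE}}_{\le 1}\, \bigl(\underbrace{1 - e^{-cE}}_{\le cE}\bigr)^{2}\, \underbrace{e^{-(b-3)cE}}_{\le 1} \tag{107}\\
&\le c^2 E^2. \tag{108}
\end{align*}

From (96), (99), (102), (104), (106), and (108) we obtain a lower bound on the additional mutual information that is gained by considering output frames with two detection positions:
\begin{align*}
I(\tilde{X};\hat{Y}) - I(\tilde{X};\tilde{Y}) &\ge (b-1) p_2 \log\frac{b}{2} - \frac{b^2}{2} p_3 - b p_2 \tag{109}\\
&\ge (b-1) \left(\eta - \frac{\eta^2}{2}\right) \left(cE - \frac{c^2E^2}{2}\right) (1 - bcE) \log\frac{b}{2} - \frac{b^2}{2} c^2 E^2 - b(\eta + cE)\, cE. \tag{110}
\end{align*}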
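The bounds (109) and (110) can be checked numerically against the exact gain, i.e., the sum of the last two summands of (96). The Python sketch below does this for one sample point; the function names and the values E = 10^{-4}, c = 1 are illustrative assumptions, and the sketch requires c > 0 so that $p_2$ and $p_3$ are positive.

```python
import math


def soft_decision_probs(E, c, b, eta):
    """Return (p2, p3) as given in (83) and (86); requires c > 0."""
    one_minus_dark = -math.expm1(-c * E)                     # 1 - e^{-cE}
    p2 = -math.expm1(-eta - c * E) * one_minus_dark * math.exp(-(b - 2) * c * E)
    p3 = math.exp(-eta - c * E) * one_minus_dark ** 2 * math.exp(-(b - 3) * c * E)
    return p2, p3


def exact_gain(p2, p3, b):
    """Sum of the third and fourth summands on the right-hand side of (96)."""
    s = 2.0 * p2 + (b - 2) * p3
    third = (b - 1) * p2 * math.log(b * p2 / s)
    fourth = 0.5 * (b - 1) * (b - 2) * p3 * math.log(b * p3 / s)
    return third + fourth


def bound_109(p2, p3, b):
    """Right-hand side of (109)."""
    return (b - 1) * p2 * math.log(b / 2.0) - 0.5 * b * b * p3 - b * p2


def bound_110(E, c, b, eta):
    """Right-hand side of (110)."""
    cE = c * E
    main = (b - 1) * (eta - eta ** 2 / 2.0) * (cE - cE ** 2 / 2.0) * (1.0 - b * cE)
    return main * math.log(b / 2.0) - 0.5 * b * b * cE ** 2 - b * (eta + cE) * cE


E, c = 1e-4, 1.0                                   # sample values (assumption)
b = math.floor(1.0 / (E * math.log(1.0 / E)))      # frame length, (26)
eta = b * E                                        # pulse amplitude, (28)
p2, p3 = soft_decision_probs(E, c, b, eta)
gain = exact_gain(p2, p3, b)
print(f"exact gain = {gain:.5f} nats per frame")
print(f"bound (109) = {bound_109(p2, p3, b):.5f}, bound (110) = {bound_110(E, c, b, eta):.5f}")
```

At this sample point the three quantities are ordered as the derivation predicts: the exact gain is at least the bound (109), which in turn is at least the bound (110).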
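Dividing (110) by η and plugging in (28) we obtain a lower bound on the additional photon efficiency:
\begin{align*}
\frac{I(\tilde{X};\hat{Y}) - I(\tilde{X};\tilde{Y})}{\eta} \ge{} & \left(1 - \frac{\eta}{2}\right) (b-1) \left(cE - \frac{c^2E^2}{2}\right) (1 - c\eta) \log\frac{b}{2} \\
& - \frac{c^2}{2}\eta - c(\eta + cE). \tag{111}
\end{align*}
Adding the right-hand side of (111) to the right-hand side of (29) yields (78).

VI. PROOF OF THE UPPER BOUNDS (20) AND (21)

Proposition 3. Assume that E is small enough so that
\begin{align*}
E &< e^{-1} \tag{112a}\\
E \log\frac{1}{E} &< \frac{e^{-\frac{1}{1 - 2e^{-1}}}}{1+c} \tag{112b}\\
E \left(\log\frac{1}{E}\right)^{4} &< \frac{144}{(1+c)^2}, \tag{112c}
\end{align*}
then
\begin{align*}
C_{\mathrm{PE}}(E,c) \le{} & \log\frac{1}{E} - \log\log\frac{1}{E} - \log(1+c) + 2 + \log 13 \\
& + \frac{1}{E} \left( \log\frac{1}{1 - (1+c)E} - (1+c)E \right) \\
& + \frac{c^2}{2} E \log\log\frac{1}{E} + (1+c) \log\frac{1}{1 - \frac{1}{\log\frac{1}{E}}}. \tag{113}
\end{align*}

Proof of (20) using Proposition 3: The last three summands on the right-hand side of (113) are all of the form o(1). Setting c = 0 in (113) we have
\begin{align*}
C_{\mathrm{PE}}(E,0) &\le \log\frac{1}{E} - \log\log\frac{1}{E} + 2 + \log 13 + o(1) \tag{114}\\
&= \log\frac{1}{E} - \log\log\frac{1}{E} + O(1). \tag{115}
\end{align*}

Proof of (21) using Proposition 3: From (113) we have
\[ \limsup_{E \downarrow 0} \left\{ C_{\mathrm{PE}}(E,c) - \log\frac{1}{E} + \log\log\frac{1}{E} \right\} \le -\log(1+c) + 2 + \log 13. \tag{116} \]
Hence
\begin{align*}
\lim_{c \to \infty} \limsup_{E \downarrow 0} \frac{C_{\mathrm{PE}}(E,c) - \log\frac{1}{E} + \log\log\frac{1}{E}}{\log c} &\le \lim_{c \to \infty} \frac{-\log(1+c) + 2 + \log 13}{\log c} \tag{117}\\
&= -1. \tag{118}
\end{align*}

Proof of Proposition 3: As in [2], we use the duality bound [13] which states that, for any distribution R(·) on the output, the channel capacity satisfies
\[ C \le \sup \mathsf{E}\bigl[ D\bigl( W(\cdot|X) \,\big\|\, R(\cdot) \bigr) \bigr], \tag{119} \]
where the supremum is taken over all allowed input distributions. We choose R(·) to be the following distribution:
\[ R(y) = \begin{cases} 1 - (1+c)E, & y = 0 \\ (1+c)E \left(1 - \dfrac{1}{\log\frac{1}{E}}\right) \left(\dfrac{1}{\log\frac{1}{E}}\right)^{y-1}, & y \ge 1. \end{cases} \tag{120} \]
For any x ≥ 0 we have
\begin{align*}
D\bigl( W(\cdot|x) \,\big\|\, R(\cdot) \bigr) &= \sum_{y=0}^{\infty} \mathrm{Pois}_{x+cE}(y) \cdot \log\frac{1}{R(y)} - H(Y|X=x) \tag{121}\\
&= e^{-(x+cE)} \log\frac{1}{1 - (1+c)E} \\
&\quad - \sum_{y=1}^{\infty} \mathrm{Pois}_{x+cE}(y) \cdot \log\left\{ (1+c)E \left(1 - \frac{1}{\log\frac{1}{E}}\right) \left(\frac{1}{\log\frac{1}{E}}\right)^{y-1} \right\} - H(Y|X=x) \tag{122}\\
&= e^{-(x+cE)} \log\frac{1}{1 - (1+c)E} + \underbrace{\sum_{y=1}^{\infty} \mathrm{Pois}_{x+cE}(y)\, y}_{= \mathsf{E}[Y|X=x] = x + cE} \log\log\frac{1}{E} \\
&\quad + \underbrace{\sum_{y=1}^{\infty} \mathrm{Pois}_{x+cE}(y)}_{= 1 - e^{-(x+cE)}} \cdot \left( \log\frac{1}{(1+c)E \log\frac{1}{E}} + \log\frac{1}{1 - \frac{1}{\log\frac{1}{E}}} \right) - H(Y|X=x) \tag{123}\\
&= e^{-(x+cE)} \log\frac{1}{1 - (1+c)E} + (x + cE) \log\log\frac{1}{E} \\
&\quad + \bigl(1 - e^{-(x+cE)}\bigr) \cdot \left( \log\frac{1}{(1+c)E \log\frac{1}{E}} + \log\frac{1}{1 - \frac{1}{\log\frac{1}{E}}} \right) - H(Y|X=x). \tag{124}
\end{align*}
We can lower-bound H(Y|X = x) as
\begin{align*}
H(Y|X=x) &\ge H_{\mathrm{b}}\bigl(e^{-(x+cE)}\bigr) \tag{125}\\
&= e^{-(x+cE)} (x + cE) + \bigl(1 - e^{-(x+cE)}\bigr) \log\frac{1}{1 - e^{-(x+cE)}}. \tag{126}
\end{align*}
Using this and (124), we upper-bound $D\bigl( W(\cdot|x) \,\big\|\, R(\cdot) \bigr)$ as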
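\begin{align*}
D\bigl( W(\cdot|x) \,\big\|\, R(\cdot) \bigr) &\le e^{-(x+cE)} \log\frac{1}{1 - (1+c)E} + (x + cE) \log\log\frac{1}{E} \\
&\quad + \bigl(1 - e^{-(x+cE)}\bigr) \cdot \left( \log\frac{1}{(1+c)E \log\frac{1}{E}} + \log\frac{1}{1 - \frac{1}{\log\frac{1}{E}}} \right) \\
&\quad - e^{-(x+cE)} (x + cE) - \bigl(1 - e^{-(x+cE)}\bigr) \log\frac{1}{1 - e^{-(x+cE)}} \tag{127}\\
&= e^{-(x+cE)} \log\frac{1}{1 - (1+c)E} + (x + cE) \log\log\frac{1}{E} \\
&\quad + \underbrace{\bigl(1 - e^{-(x+cE)}\bigr)}_{\le x + cE}\, \underbrace{\log\frac{1}{1 - \frac{1}{\log\frac{1}{E}}}}_{\ge 0} - e^{-(x+cE)} \underbrace{(x + cE)}_{\ge cE} \\
&\quad + \bigl(1 - e^{-(x+cE)}\bigr) \log\frac{1 - e^{-(x+cE)}}{(1+c)E \log\frac{1}{E}} \tag{128}\\
&\le \underbrace{e^{-(x+cE)}}_{\le 1}\, \underbrace{\left( \log\frac{1}{1 - (1+c)E} - cE \right)}_{\ge (1+c)E - cE \ge 0} \\
&\quad + (x + cE) \log\log\frac{1}{E} + (x + cE) \log\frac{1}{1 - \frac{1}{\log\frac{1}{E}}} \\
&\quad + \underbrace{\bigl(1 - e^{-(x+cE)}\bigr)}_{= (1 - e^{-cE}) + (e^{-cE} - e^{-(x+cE)})} \log\frac{1 - e^{-(x+cE)}}{(1+c)E \log\frac{1}{E}} \tag{129}\\
&\le \left( \log\frac{1}{1 - (1+c)E} - cE \right) + (x + cE) \log\log\frac{1}{E} + (x + cE) \log\frac{1}{1 - \frac{1}{\log\frac{1}{E}}} \\
&\quad + \bigl(e^{-cE} - e^{-(x+cE)}\bigr) \log\frac{1 - e^{-(x+cE)}}{(1+c)E \log\frac{1}{E}} + \bigl(1 - e^{-cE}\bigr) \log\frac{1 - e^{-(x+cE)}}{(1+c)E \log\frac{1}{E}} \tag{130}\\
&= E + \left( \log\frac{1}{1 - (1+c)E} - (1+c)E \right) + (x + cE) \log\log\frac{1}{E} + (x + cE) \log\frac{1}{1 - \frac{1}{\log\frac{1}{E}}} \\
&\quad + e^{-cE} \bigl(1 - e^{-x}\bigr) \log\frac{1 - e^{-(x+cE)}}{(1+c)E \log\frac{1}{E}} \\
&\quad + \underbrace{\bigl(1 - e^{-cE}\bigr)}_{\ge cE - \frac{c^2E^2}{2}}\, \underbrace{\log\frac{c}{(1+c) \log\frac{1}{E}}}_{\le -\log\log\frac{1}{E}} + \underbrace{\bigl(1 - e^{-cE}\bigr)}_{\le cE}\, \underbrace{\log\frac{1 - e^{-(x+cE)}}{cE}}_{\le \log\frac{x + cE}{cE} \le \frac{x}{cE}} \tag{131}\\
&\le E + \left( \log\frac{1}{1 - (1+c)E} - (1+c)E \right) + (x + cE) \log\log\frac{1}{E} + (x + cE) \log\frac{1}{1 - \frac{1}{\log\frac{1}{E}}} \\
&\quad + e^{-cE} \bigl(1 - e^{-x}\bigr) \log\frac{1 - e^{-(x+cE)}}{(1+c)E \log\frac{1}{E}} - \left( cE - \frac{c^2}{2}E^2 \right) \log\log\frac{1}{E} + x. \tag{132}
\end{align*}

Using (119) and (132) together with constraint (3) we have
\begin{align*}
C(E,c) &\le \sup \mathsf{E}\bigl[ D\bigl( W(\cdot|X) \,\big\|\, R(\cdot) \bigr) \bigr] \tag{133}\\
&\le E + \left( \log\frac{1}{1 - (1+c)E} - (1+c)E \right) + (1+c)E \log\log\frac{1}{E} + (1+c)E \log\frac{1}{1 - \frac{1}{\log\frac{1}{E}}} \\
&\quad - \left( cE - \frac{c^2E^2}{2} \right) \log\log\frac{1}{E} + E + e^{-cE} \sup \mathsf{E}\left[ \bigl(1 - e^{-X}\bigr) \log\frac{1 - e^{-(X+cE)}}{(1+c)E \log\frac{1}{E}} \right] \tag{134}\\
&= E \log\log\frac{1}{E} + 2E + \left( \log\frac{1}{1 - (1+c)E} - (1+c)E \right) + \frac{c^2}{2} E^2 \log\log\frac{1}{E} \\
&\quad + (1+c)E \log\frac{1}{1 - \frac{1}{\log\frac{1}{E}}} + e^{-cE} \sup \mathsf{E}\left[ \bigl(1 - e^{-X}\bigr) \log\frac{1 - e^{-(X+cE)}}{(1+c)E \log\frac{1}{E}} \right]. \tag{135}
\end{align*}
Hence the maximum achievable photon efficiency satisfies
\begin{align*}
C_{\mathrm{PE}}(E,c) &= \frac{C(E,c)}{E} \tag{136}\\
&\le \log\log\frac{1}{E} + 2 + \frac{1}{E} \left( \log\frac{1}{1 - (1+c)E} - (1+c)E \right) + \frac{c^2}{2} E \log\log\frac{1}{E} \\
&\quad + (1+c) \log\frac{1}{1 - \frac{1}{\log\frac{1}{E}}} + \frac{e^{-cE}}{E} \sup \mathsf{E}\left[ \bigl(1 - e^{-X}\bigr) \log\frac{1 - e^{-(X+cE)}}{(1+c)E \log\frac{1}{E}} \right]. \tag{137}
\end{align*}

We next upper-bound the expectation on the right-hand side of (137) as follows
\begin{align*}
\mathsf{E}\left[ \bigl(1 - e^{-X}\bigr) \log\frac{1 - e^{-(X+cE)}}{(1+c)E \log\frac{1}{E}} \right] &= \mathsf{E}\left[ X \cdot \frac{1 - e^{-X}}{X} \log\frac{1 - e^{-(X+cE)}}{(1+c)E \log\frac{1}{E}} \right] \tag{138}\\
&\le \mathsf{E}\left[ X \cdot \frac{1 - e^{-X}}{X} \log\frac{X + cE}{(1+c)E \log\frac{1}{E}} \right] \tag{139}\\
&\le E \cdot \sup \phi(x) \tag{140}
\end{align*}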
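where
\[ \phi(x) \triangleq \frac{1 - e^{-x}}{x} \log\frac{x + cE}{(1+c)E \log\frac{1}{E}}, \qquad x \ge 0. \tag{141} \]
We claim that, if E is small enough to satisfy all of the conditions in (112), then
\[ \sup \phi(x) = \phi(x^*) \quad \text{for some } x^* \le \frac{12}{\log\frac{1}{E}}. \tag{142} \]
To show this, we compute the derivative of φ(x) (with respect to x) as
\[ \frac{\mathrm{d}\phi(x)}{\mathrm{d}x} = \frac{(1+x)e^{-x} - 1}{x^2} \log\frac{x + cE}{(1+c)E \log\frac{1}{E}} + \frac{1 - e^{-x}}{x(x + cE)} \tag{143} \]
which exists and is continuous for all x > 0. Further note that it is negative for large enough x, and is positive for small enough x. Hence $x^*$ must be achieved at a point where
\[ \frac{\mathrm{d}\phi(x)}{\mathrm{d}x} = 0. \tag{144} \]
To prove (142), it thus suffices to show that, under the assumptions (112),
\[ \frac{\mathrm{d}\phi(x)}{\mathrm{d}x} < 0, \qquad x > \frac{12}{\log\frac{1}{E}}. \tag{145} \]
We do this separately for two cases.

Case 1. Consider
\[ \frac{12}{\log\frac{1}{E}} < x < 1. \tag{146} \]
Note that this case need not be considered if $E \ge e^{-12}$.

In this case the logarithmic term in the derivative is positive:
\begin{align*}
\log\frac{x + cE}{(1+c)E \log\frac{1}{E}} &\ge \log\frac{x}{(1+c)E \log\frac{1}{E}} \tag{147}\\
&> \log\frac{12}{(1+c)E \left(\log\frac{1}{E}\right)^2} \tag{148}\\
&> 0, \tag{149}
\end{align*}
where the last step follows because (112a) and (112c) together imply
\[ E \left(\log\frac{1}{E}\right)^{2} < \frac{12 e^{-\frac{1}{2}}}{1+c}. \tag{150} \]
Next, using Taylor’s theorem with Cauchy remainder, we have
\begin{align*}
\frac{(1+x)e^{-x} - 1}{x^2} &= -\frac{1}{2} + \frac{1}{3}x - \underbrace{\frac{(3 - x')e^{-x'}}{24}}_{\ge 0}\, x^2 \tag{151}\\
&\le -\frac{1}{2} + \frac{1}{3} \underbrace{x}_{\le 1} \tag{152}\\
&\le -\frac{1}{6}, \tag{153}
\end{align*}
where $x' \in [0,x]$. The underbrace in (151) follows because $x' \le x < 1$. We also have
\[ \frac{1 - e^{-x}}{x(x + cE)} \le \frac{x}{x(x + cE)} \le \frac{1}{x}. \tag{154} \]
Plugging (149), (153) and (154) into (143) we have
\begin{align*}
\frac{\mathrm{d}\phi(x)}{\mathrm{d}x} &\le -\frac{1}{6} \log\frac{x + cE}{(1+c)E \log\frac{1}{E}} + \frac{1}{x} \tag{155}\\
&\le -\frac{1}{6} \underbrace{\log x}_{\ge \log 12 - \log\log\frac{1}{E}} + \frac{1}{6} \log(1+c) - \frac{1}{6} \log\frac{1}{E} + \frac{1}{6} \log\log\frac{1}{E} + \underbrace{\frac{1}{x}}_{\le \frac{1}{12} \log\frac{1}{E}} \tag{156}\\
&\le \frac{1}{6} \log\frac{1+c}{12} + \frac{1}{3} \log\log\frac{1}{E} - \frac{1}{12} \log\frac{1}{E} \tag{157}\\
&= \frac{1}{12} \log\left( \frac{(1+c)^2}{144}\, E \left(\log\frac{1}{E}\right)^{4} \right) \tag{158}\\
&< 0, \tag{159}
\end{align*}
where the last step follows from (112c). We have now shown
\[ \frac{\mathrm{d}\phi(x)}{\mathrm{d}x} < 0, \qquad \frac{12}{\log\frac{1}{E}} < x < 1. \tag{160} \]

Case 2. Consider
\[ x \ge 1. \tag{161} \]
In this case the logarithmic term in the derivative is still positive:
\[ \log\frac{x + cE}{(1+c)E \log\frac{1}{E}} \ge \log\frac{1}{(1+c)E \log\frac{1}{E}} > 0, \tag{162} \]
where the last step follows from (112b). We bound the other terms as
\[ \frac{(1+x)e^{-x} - 1}{x^2} \le -\frac{1 - 2e^{-1}}{x^2} \tag{163} \]
which is negative, and
\[ \frac{1 - e^{-x}}{x(x + cE)} \le \frac{1}{x^2}. \tag{164} \]
Using (162), (163) and (164) together with (143) we have
\begin{align*}
\frac{\mathrm{d}\phi(x)}{\mathrm{d}x} &\le -\frac{1 - 2e^{-1}}{x^2} \Biggl( \underbrace{\log(x + cE)}_{\ge 0} + \log\frac{1}{(1+c)E \log\frac{1}{E}} \Biggr) + \frac{1}{x^2} \tag{165}\\
&\le -\frac{1 - 2e^{-1}}{x^2} \log\frac{1}{(1+c)E \log\frac{1}{E}} + \frac{1 - 2e^{-1}}{x^2} \cdot \frac{1}{1 - 2e^{-1}} \tag{166}\\
&= -\frac{1 - 2e^{-1}}{x^2} \log\frac{e^{-\frac{1}{1 - 2e^{-1}}}}{(1+c)E \log\frac{1}{E}} \tag{167}\\
&< 0, \tag{168}
\end{align*}
where the last step follows from (112b). Hence we have shown
\[ \frac{\mathrm{d}\phi(x)}{\mathrm{d}x} < 0, \qquad x \ge 1. \tag{169} \]
Combining (160) and (169) proves (142) for all E satisfying (112).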