Table Of ContentPolarization as a novel architecture to boost the
classical mismatched capacity of B-DMCs
Mine Alsan and Emre Telatar
EPFL – I&C – LTHI
Lausanne, Switzerland
Email: {mine.alsan,emre.telatar}@epfl.ch
Abstract—We show that the mismatched capacity of binary is the class of binary input discrete memoryless channels
discrete memoryless channels can be improved by channel com- (B-DMC). Balakirsky [2] gave a computable expression for
bining and splitting via Arıkan’s polar transformations. We also
C (W) when W is a B-DMC. See [1], for a more detailed
d
show that the improvement is possible even if the transformed
survey on the topic and a more complete list of references.
channels are decoded with a mismatched polar decoder.
Index Terms—Mismatched capacity, mismatched polar de- OneoftherecentbreakthroughsincodingtheoryisArıkan’s
coder, polar transforms invention of polar codes [3]. A polar code of blocklength N
4
is specified by selecting an information set A over which
1
0 I. INTRODUCTION informationistobetransmittedandbyfreezingtheremaining
2 In various communication scenarios, we encounter sub- inputs to known values. More specifically, one starts by
n optimal decoders due to practical implementation constraints polarizingthechannelbytherecursiveapplicationofthebasic
a (complexity, feasibility requirements) or partial/missing chan- polarization transformations defined by Arıkan. The repeated
J nel information. In general, these decoders might perform application yields at stage n, a set of N = 2n synthetic
3 worse than optimal decoders which minimize the average channels (cid:8)W(i) : i = 1,...,N(cid:9). Then, the information set
2 N
decodingerrorprobabilityandpossiblyresultincapacityloss. is constructed by selecting the channel indices which are
Modeling such sub-optimal scenarios via ‘reliable commu- good for uncoded data transmission. Once the data uN is
] 1
T nication with a given decision rule’ and establishing coding transmitted, the SCD of polar codes will decode the outputs
I theorems for them allows one to assess the extent of any loss. yN using the following estimators [3]
. 1
cs To allow their study within a unified framework, decoders (cid:26) u , if i∈Ac
can be categorized based on generic definitions of their deci- uˆ = i , (1)
[ i f(i)(yN,uˆi−1), if i∈A
sion functions. Csisza´r and Narayan [1] studied and surveyed 1 1
1
the performance of those using additive decision rules: Given where Arıkan uses for f(i)(yN,uˆi−1) the ML decoding rule
v 1 1
7 a codeword set {x(1),...,x(M)} ∈ Xn, an (additive) d- over the i-th synthetic channel W(i).
N
9 decoder assigns a received sequence y, the message i = In fact, SCDs can be considered as another large family
0 1,...,M such that d(x(i),y) < d(x(j),y) for ∀j (cid:54)= i. based on successive cancellation decoding procedures. Offer-
6 (If there is no such i the decoder declares an erasure). The ingaquitedifferentdecodingparadigmthanadditivedecoders,
.
1 decision function is computed using the additive extension of a given SCD will decode the received output y in N stages
0 a single letter metric d(x,y). Note that the optimal maximum using a chain of estimators from i = 1,...,N each possibly
4 likelihood (ML) decoding rule is included in this class. In depending on the previous ones. The estimators uˆ can base
1 i
: addition, a family of decoders of practical interest within the their decisions on arbitrary single letter metrics of the form
v class correspond to mismatch decoders. In this case, the usual d (u ,yNuˆi−1). The SCD of polar codes, however, owes its
Xi MLruleiskept,butinsteadofthetruechannellawadifferent faimeino1t on1ly for yielding polar coding theorems proving the
r one is employed in the decision procedure. ‘symmetric capacity achievingness’ of polar codes for a large
a The transmission capacity of the channel W when decoded class of channels, but also for inheriting the low complexity
with an additive metric d is denoted by Cd(W). When structure of the recursive code construction process.
the metric d corresponds to the ML decoder with respect In [4] and [5], the performance of polar codes over an un-
a channel V, we denote the corresponding capacity by known B-DMC with mismatched successive cancellation de-
C(W,V). No closed form single letter expression is known codingisinvestigated.Themismatchedpolardecoderoperates
for C(W,V) or Cd(W). Single-letter lower bounds have according to decision functions f(i)(yN,uˆi−1) corresponding
1 1
been derived, but no converse for any of the lower bounds to the ML rule of mismatched channels V(i) synthesized by
N
exists, except for some special cases. Binary input binary
polarizing a B-DMC V different than the true communication
output channels are such a case where Cd(W) = C(W) or channelW.AlowerbounddenotedbyI(W,V)1isobtainedin
0 depending on whether or not the mismatch metric is in
‘harmony’ with the channel behavior [1]. Another exception 1SeeEq.(18)forthemathematicaldefinition
[5]ontheachievableratesbymismatchedpolarcodesoverany follows:
B-DMC; I(W,V) turns out to be, as the symmetric capacity (cid:88)
H(PW)=− PW(y)log PW(y), (2)
of the channel [3], a conserved quantity under the polar 2
y∈Y
transforms. While preserving the low complexity structure
(cid:88)
of the original polar decoder, the mismatched polar decoder with PW(y)= P(x)W(y|x).
extends the theory of channel polarization and polar codes to
x∈X
mismatched processes.
(cid:88)
H(W|P)=− P(x)W(y|x)log W(y|x). (3)
Let CP(W,V) denote the mismatched capacity of a polar 2
code transmitted over the channel W when decoded with y∈Y
a mismatched polar SCD designed with respect to another I(P,W)=H(PW)−H(W|P). (4)
channel V. The lower bound I(W,V) derived in [5] for
The following result due to Balakirsky gives a closed form
C (W,V) of B-DMCs is also a lower bound to C(W,V),
P
expression for C (W) when W is a B-DMC.
see Fischer [6]. On the other hand, no general order between d
Theorem 1: [2] For any B-DMC W and any additive
the polar mismatched capacity C (W,V) and the classical
P
decoding metric d(x,y)
mismatched capacity C(W,V) can be formulated: there are
examplesforwhichCP(W,V)>C(W,V)andalsoexamples Cd(W)=maxId(P,W), (5)
where the reverse inequality holds. P
The main goal of this paper is to show that: where
(g1) There are pairs of B-DMCs W and V for which I (P,W)(cid:44) min I(P,W(cid:48)), (6)
d
W(cid:48): PW(cid:48)(y)=PW(y)
C(W+,V+)+C(W−,V−)≥2C(W,V) d(P,W(cid:48))≤d(P,W)
(cid:88)
holds. with d(P,W)= P(x)W(y|x)d(x,y).
(g2) Furthermore, there exist cases for which x,y
Using this closed from expression, Balakirsky studies the
C (W,V)>C(W,V) computation of C (W) for symmetric B-DMCs when the d-
P d
decoder preserves the symmetry structure of the communica-
holds.
tion channel. In the following two examples, we revisit his
For that purpose, we shall study the evolution of the examples.
mismatched capacity of B-DMCs under the one-step Example 1: [2, Examples, Statement 1] For a binary input
polarizationtransformationswhenthecommunicationchannel binary output channel W and any given d-decoding rule
and the mismatched channel used in the decision procedure I (P,W)=I(P,W)1 holds with
d {A}
are both symmetrized by the same permutation, i.e., when
for some permutation π on the output alphabet satisfying A={sign(1−W(0|0)+W(1|1))
π =π−1, we simultaneously have W(y|0)=W(π(y)|1) and =sign(d(0,0)+d(1,1)−d(0,1)−d(1,0))}. (7)
V(y|0)=V(π(y)|1) for all output letters y.
So, C (W)=C(W) or 0.
d
Example 2: [2, Examples, Statement 2] Let W : X → Y
The rest of this paper is organized as follows. Section II
be a B-DMC with X = {0,1} and Y = {0,1,...,L−1}.
will briefly introduce the necessary definitions and existing
Suppose the transition probability matrix of the channel W
results. Then, some analytical computations and numerical
and the corresponding metrics for the additive d-decoder are
experiments will be presented in the Section III. Finally, the
given by
paper will close with some discussions.
(cid:34) (cid:35)
w w ... w
0 1 L−1
W = , (8)
w w ... w
II. PRELIMINARIES L−1 L−2 0
We start by presenting the results of [2] on the classical and (cid:34) (cid:35)
d d ... d
mismatched capacity of B-DMCs. 0 1 L−1
d= . (9)
Definition 1: [1] The d-capacity of a discrete memoryless dL−1 dL−2 ... d0
channel W, denoted by C (W), is defined as the supremum
d Then, the mismatched capacity is achieved for P uniform on
of rates R for which, for every (cid:15) > 0 and sufficiently large
{0,1} and is given by
block length n, there exist codes with d-decoding such that
logM C (W)=H(PW)−H(W(cid:48)|P), (10)
the code rate > R, and the maximum probability of d
n
error is less than (cid:15). where W(cid:48) is given by
Let W :X →Y be a B-DMC with X ={0,1}. We fix an (cid:34) w(cid:48) w(cid:48) ... w(cid:48) (cid:35)
input distribution P(x) on X and denote the transition proba- W(cid:48) = 0 1 L−1 , (11)
bilities of the channel by W(y|x). Some standard definitions wL(cid:48)−1 wL(cid:48)−2 ... w0(cid:48)
with Then, we define the process I(W ,V ) associated to the
n n
e−α.dy polarization process of both channels. Note that I(Wn) (cid:44)
wy(cid:48) =(wy+wL−1−y)e−α.dy +e−α.dL−1−y, (12) I(Wn,Wn) corresponds to the symmetric capacity process of
the channel. The following result shows that C (W,V) ≥
for y ∈ Y, and the parameter α ≥ 0 is chosen from the P
I(W,V).
condition:
(cid:88) (cid:88) Theorem 2: [5] Let W and V be two B-DMCs such that
w(cid:48)d = w d . (13)
y y y y I(W,V)>−∞. Then, I (W,V) is {0,1} valued with
∞
y∈Y y∈Y
For the rest of this paper, we will restrict the additive P[I (W,V)=1]≥I(W,V). (19)
∞
decoders to mismatched decoders. Let V : X → Y be a B- √
DMC symmetrized by the same permutation as the channel Moreover, the speed of convergence is O(2− N).
W defined in Example 2. Suppose the transition probability
matrix of V is given by
III. RESULTS
(cid:34) (cid:35)
v v ... v
0 1 L−1
V = . (14) Tobeginwith,wediscussatrivialimprovementofthelower
v v ... v
L−1 L−2 0 bound stated in Theorem 2. Letting x∗ = max{x,0}, the
Then, the corresponding additive decoder can be defined as bound in (19) can be improved initially as
in (9) by letting d = −logv , for y = 0,...,L−1. In this
y y P[I (W,V)=1]≥I(W,V)∗. (20)
case, the mismatched capacity equals ∞
C(W,V)=I(P,W(cid:48)) (15) Going a further step, we improve the bound to
1 1
where the transition probabilities of W(cid:48) can be computed by P[I (W,V)=1]≥ I(W−,V−)∗+ I(W+,V+)∗, (21)
∞ 2 2
replacing the relations in (12) and (13) by
α.v and more generally to
w(cid:48) =(w +w ) j , (16)
j j L−1−j α.vj +α.vL−1−j 1 (cid:88)
P[I (W,V)=1]≥ I(Ws,Vs)∗, (22)
and ∞ 2n
(cid:88)w(cid:48) logv = (cid:88)w logv . (17) s∈{+,−}n
y j y j
y∈Y y∈Y for any n=0,1,....
Now, we study the following example.
Next, we give the definition of the channel polarization
Example 3: Let W be a BSC of crossover probability (cid:15) ∈
process. From two independent copies of a given binary
[0,0.5] and V be the BSC of crossover probability 1−(cid:15). In
input channel W : X → Y, the polar transformations
this example, we will answer the following questions:
synthesize two new binary input channels W− : X → Y2
and W+ : X → Y2 × F with the transition probabilities (q1) Suppose we transmit over the channel W, but do mis-
2
match decoding with respect to V as shown in Fig. 1.
given by [3]
WhatisthemismatchedcapacityC(W,V)?Whatisthe
(cid:88)
W−(y1y2|u1)= 12W(y1|u1⊕u2)W(y2|u2) value of I(W,V)∗?
u2∈F2 (q2) Suppose we first apply the polar transforms to synthe-
W+(y y u |u )= 1W(y |u ⊕u )W(y |u ). size the channels W+, W− and V+, V−, and then
1 2 1 2 2 1 1 2 2 2
we communicate using the architecture given in Fig.
In analyzing the properties of this channel it is useful to
2. What are the mismatched capacities C(W+,V+)
introduce [7] an auxiliary stochastic process, W ,W ,...,
0 1 and C(W−,V−) in this case? What are the values of
defined by W :=W, and for n≥0
0 I(W+,V+)∗ and I(W−,V−)∗?
(cid:40)
W− with probability 1/2 (q3) Suppose we communicate over the channel W using
W := n
n+1 W+ with probability 1/2 polar coding, and we do mismatched polar decoding
n
with respect to the channel V. The communication
with the successive choices taken independently. In this way, architecture is shown in Fig. 3. What is the mismatched
Wn isuniformlydistributedoverthesetof2n channelsabove. capacity of polar coding CP(W,V)?
Finally, we introduce the results of [5] on the achievable
Oncetheanswersarederived,wewilldiscusshowthisexam-
rates by mismatched polar codes. Given two B-DMCs W and
ple helps us achieve the two goals we set in the introduction.
V, I(W,V) is defined as
(a1) In this case, the crossover probabilities of the BSCs are
(cid:88) (cid:88) 1 V(y|x)
I(W,V)= W(y|x)log . not in harmony. By the result given in Example 1, we
y x∈{0,1}2 2 12V(y|0)+ 12V(y|1) conlcude C(W,V) = 0. As C(W,V) ≥ I(W,V), we
(18) have I(W,V)∗ =0.
∆(W,V)
ML
m Enc W mˆ
for V
0.5
Fig.1. Classicalmismatcheddecoding.
0.1
m− Enc− U¯1 W Y¯1 0.1 1 C(W,V)
U¯
m+ Enc+ 2 W Y¯2
Y¯1 ML
mˆ− Fig.4. C(W,V)versus∆(W,V)for|Y|={2,3}.
Y¯ for V−
2
∆(W,V)
U¯ (mˆ−)
1
0.5
Y¯1 ML
mˆ+
Y¯ for V+
2
0.1
Fig.2. One-steppolarizationarchitectureformismatcheddecoding.
C(W,V)
0.1 1
Pol. Dec
m Pol. Enc W mˆ
for V
Fig.3. Polarmismatcheddecoding. Fig.5. C(W,V)versus∆(W,V)for|Y|=4.
(a2) Itisknownthatafterapplyingtheminuspolartransform numericalexperimentswecarriedforrandompairsofchannels
toaBSCofcrossoverprobabilityα∈[0,1],thesynthe- with various output alphabet sizes. Let
sized channel is also a BSC, and with crossover proba-
bility2α(1−α).So,bothW−andV−arethesameBSC ∆(W,V)(cid:44)C(W+,V+)+C(W−,V−)−2C(W,V). (23)
withcrossoverprobabilityof(cid:15)− =2(cid:15)(1−(cid:15)).Therefore,
The numerical experiments show that:
themismatchedcapacityoftheminuschannelequalsits
matched capacity, i.e., C(W−,V−) = C(W−) = 1− (i) When the output alphabet is binary or ternary, one-
h((cid:15)−).Similarly,I(W−,V−)∗ =C(W−)=1−h((cid:15)−). step improvement, i.e. ∆(W,V) > 0, happens only
(a3) We have already seen that V− = W−. It is easy to when C(W,V) = 0 and can be as large as 1/2. When
see that while V+ (cid:54)= W+, one has V+− = W+−, C(W,V) > 0, we observe that ∆(W,V) = 0, and one
and indeed, V++− = W++−, .... Consequently, for has neither improvement nor loss. See Fig. 4.
any sequence sn = s ,...,s of polar transforms (ii) Whentheoutputalphabetcontainsfourormoresymbols,
1 n
Vsn = Wsn except when sn = +···+. Thus, improvement may happen not only when C(W,V) = 0
I(Wsn,Vsn)=I(Wsn) for all sn except +···+, and but also when C(W,V) > 0; however, there are cases
we see that C (W,V) = C(W), and we arrive at our when C(W+,V+)+C(W−,V−) < 2C(W,V), so one
P
second goal (g2). may encounter a loss, i.e. ∆(W,V)<0, after a one-step
transformation. See Fig. 5.
After such a motivating example, one is curious about Finally, the numerical experiments also suggest that
whethertheimprovementweillustratedforthespecificpairof
(cid:88) 1
BSCs extend to other pairs of mismatched B-DMCs as well. I(Ws,Vs)∗ ≤C(W,V) (24)
2n
To satisfy one’s curiosity, we now present the results of the
s∈{+,−}n
holds whenever C(W,V) > 0, (we did experiment only
for n = 1,2,3). Thus, C (W,V) ≤ C(W,V) is a likely
P
conjecture for the case where C(W,V)>0.
IV. DISCUSSIONS
The study by Balakirsky [2] gave the initial impulse for
this work: We adapted his example [2, Examples, Statement
2] which computes C (W) of a symmetric B-DMC W when
d
the additive decoder shares the symmetry structure of W to
mismatched decoders, and we carried numerical experiments
based on this computation to compare C(W,V) with various
other quantities such as the sum C(W+,V+)+C(W−,V−).
The experiments reveal that, as opposed to I(W,V),
C(W,V) is not necessarily a conserved quantity under the
polar transformations. Nevertheless, communication rates
higher than C(W,V) can be achieved in some cases by
integrating the polarization architecture of Arıkan into the
classical mismatched communication scenarios. Furthermore,
by studying a specific pair of BSCs W and V for which
C(W,V) = 0, but C (W,V) > 0, we showed that there
P
exists channels for which the polar transformations strictly
improve the mismatched capacity of B-DMCs.
ACKNOWLEDGMENT
This work was supported by Swiss National Science Foun-
dation under grant number 200021-125347/1.
REFERENCES
[1] I.CsiszarandP.Narayan,“Channelcapacityforagivendecodingmetric,”
in Information Theory, 1994. Proceedings., 1994 IEEE International
Symposiumon,1994,pp.378–.
[2] V. Balakirsky, “Coding theorem for discrete memoryless channels with
givendecisionrule,”inAlgebraicCoding,ser.LectureNotesinComputer
Science,1992,vol.573,pp.142–150.
[3] E. Arıkan, “Channel polarization: A method for constructing capacity-
achievingcodesforsymmetricbinary-inputmemorylesschannels,”IEEE
Trans.Inf.Theory,vol.55,no.7,pp.3051–3073,2009.
[4] M. Alsan, “A lower bound on achievable rates over B-DMCs with
mismatched polar codes,” in Proc. IEEE Inf. Theory Workshop, 2013,
pp.6–10.
[5] ——,“Universalpolardecodingwithchannelknowledgeattheencoder,”
Available:arXiv:1311.7590,2013.
[6] T.R.M.Fischer,“SomeremarksontheroleofinaccuracyinShannon’s
theoryofinformationtransmission,”inTrans.oftheEighthPragueCon-
ference,ser.CzechoslovakAcademyofSciences. SpringerNetherlands,
1978,vol.8A,pp.211–226.
[7] E.ArikanandI.Telatar,“Ontherateofchannelpolarization,”inProc.
IEEEInt.Symp.Inf.Theory,2009,pp.1493–1495.