Table Of ContentOnly accessible information is useful: insights from gradient-mediated patterning
Mikhail Tikhonov,1 Shawn C. Little,2,3 and Thomas Gregor4,5
1Center of Mathematical Sciences and Applications,
Harvard University, Cambridge, MA 02138, USA
2Howard Hughes Medical Institute
3Department of Molecular Biology
4Joseph Henry Laboratories of Physics
5Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
Information theory is gaining popularity as a tool to characterize performance of biological sys-
tems. However, information is commonly quantified without reference to whether or how a system
couldextractanduseit;asaresult,information-theoreticquantitiesareeasilymisinterpreted. Here
5 we take the example of pattern-forming developmental systems which are commonly structured as
1 cascades of sequential gene expression steps. Such a multi-tiered structure appears to constitute
0 sub-optimal use of the positional information provided by the input morphogen because noise is
2 addedateachtier. However,theconventionaltheoryfailstodistinguishbetweenthetotalinforma-
tioninamorphogenandinformationthatcanbeusefullyextractedandinterpretedbydownstream
y
a elements. We demonstrate that quantifying the information that is accessible to the system nat-
M urally explains the prevalence of multi-tiered network architectures as a consequence of the noise
inherent to the control of gene expression. We support our argument with empirical observations
from patterning along the major body axis of the fruit fly embryo. Our results exhibit the limi-
7
tations of the standard information-theoretic characterization of biological signaling and illustrate
] how they can be resolved.
N
Keywords: informationtheory/geneticregulation/developmentalbiology/Drosophila
M
.
o As an inspiring example of productive collaboration mediated patterning circuits. For a complex multicel-
bi between computer science, physics and biology, informa- lular organism, the reliability of its developmental pro-
- tiontheoryisgainingpopularityasatooltocharacterize gram directly determines the probability of reaching re-
q
performance of biological systems. Although is may not productive age; therefore, low error rate and/or high er-
[
have become the “general calculus for biology”, as pre- ror tolerance are likely to be key determinants of the
2 dicted by Johnson in his 1970 review [1], the scope of its structures of developmental circuits [9, 10]. Why, then,
v
applications has been steadily expanding: from the ear- are so many patterning circuits structured as a cascade
2
liest work measuring the information content in DNA, of several signaling steps, each of which is susceptible
4
3 RNA and proteins to topics like neuroscience, collective to loss of information due to noise inherent in biological
7 behavior, ecology, developmental biology, genetic regula- control? We will see that treating information content
0 tion and signaling [2–5]. of patterning cues as a one-size-fits-all method to char-
.
1 acterize system performance erroneously predicts that a
Specifically in the context of biochemical signaling,
0 single-step readout strategy should be dominant in de-
several recent reviews make compelling arguments that
5 velopment. To understand the advantages of the multi-
1 the mutual information between input and output of a
tiered architectures observed in real systems, it is es-
: signaling pathway is not just a useful quantity, but is
v sential to distinguish between the total information in
in fact the “only natural framework” for characterizing
Xi the performance of such systems. However, implicit in a morphogen and information that can be usefully ex-
tracted and interpreted. We support our reasoning with
r these arguments is the assumption that the “output” in
a experiments on the well-studied segmentation gene net-
question is the final target of signaling, the functionally
work responsible for anterior-posterior patterning in the
relevant phenotypic trait. Unfortunately, in biological
Drosophila embryo.
applicationsofinformationtheoryinformationcontentis
usually assessed for signals that constitute intermediate
steps,mostcommonlytranscriptionfactors,forexample,
Multi-tier architecture in gradient-mediated pat-
NF-κB[6,7]orDrosophilapatterningcues[8]. Suchsig-
terning. In many developing embryonic systems, cellu-
nals,however,stillneedtobeinterpretedbydownstream
lar identities are conferred by graded input signals that
processes. Therefore,theinformationtheycarryisuseful
induce dose-dependent gene expression programs as out-
only to the extent that it can be extracted and used by
puts [11, 12]. Such graded inputs, termed morphogens,
the system. As we will demonstrate, failure to recognize
often function as diffusible molecules produced by a lo-
this can easily cause information-theoretic quantities to
calized expression source [13, 14]. Localized expression
be misinterpreted.
generates concentration gradients in a field of otherwise
To show this, we take the example of gradient- naive and identical cells (presented in simplified form as
2
aone-dimensionalarrayinFig.1). Cellsactivatespecific diffusible signaling molecules. These signals subdivide
expression programs in response to the local morphogen the prospective brain into relatively large fore-, mid-,
concentration c(x). When c correlates closely with dis- and hindbrain territories, which are then segmented into
tance x from the source, such gradients carry a large smaller subunits by additional signaling activity [21–23].
amount of “positional information” [15] quantified via Similar patterns of broad subdivision followed by short-
the mutual information I[c(x),x] [8, 16]. In principle, a rangerefinementarefoundduringthespecificationofthe
morphogengradientcarryingsufficientinformationcould vertebrate neural crest by reiterated rounds of extracel-
induce in each cell the gene expression program appro- lularsignaling[24];intheformationofsegmentedmuscle
priate for its position, thus generating the required spa- precursors(somites)byFGFandNotchfollowedbyshort
tial arrangement of cell fates [17] (Fig. 1A). In the most rangeEphrinactivity[25,26];thedorsal-ventralpattern-
straightforward model, assuming the input morphogen ingoftheDrosophila bodyaxis,firstbyagradientofNF-
is sufficiently reproducible [18], local morphogen concen- κB activity (also called Dorsal) and then by members of
tration is directly interpreted by each cell, i.e., the lo- the BMP family of secreted signaling molecules [27, 28];
cal input activates all genes required at a given posi- and also in the fruit fly, the patterning of the anterior-
tion, with no additional cycles of gene expression mod- posterior (AP) axis by gradients of diffusible transcrip-
ulation. A central tenet of information theory, the in- tion factors within the shared cytoplasm of the nuclear
formation processing inequality, states that each trans- syncytium [17, 29, 30].
mission or processing step can only reduce the total in- Theseexamplesandothersillustrateacommontheme
formation contained in a signal. Direct decoding might where long range signaling gradients subdivide a large
therefore be expected to dominate in early development fieldintosmallerdomains,withinwhichthepatternedex-
as the optimal strategy for transmitting positional in- pression of secondary factors establishes elaborated pat-
formation. This expectation seems all the more valid terns (Fig. 1B). Since each cycle of transcription and
given the widespread observation that the processes of translation introduces more noise, the widespread use of
transcription and translation exhibit considerable intrin- the multi-tieredarchitectureappearsto conflict withthe
sic variability, or noise [19, 20]. Thus information loss in expectation that development should favor circuits ex-
generegulatoryprocessesshouldbeparticularlynotable. hibiting efficient information utilization.
Therefore, from the perspective of information theory, This apparent conflict arises because Shannon’s infor-
itissurprisingthatmanygradient-basedsystemsexhibit mation content of a signal [16] has two important lim-
a multi-tiered architecture in which reiterated cycles of itations. First, the information content of a patterning
transcription and translation are required to attain pat- cue or other biological signal is defined locally in space
terning goals (illustrated in Fig. 1B). For example, in and time, whereas its interpretation is non-local, and in-
the vertebrate central nervous system, the unpatterned stead occurs over time and frequently involves diffusive
neuroectodermexhibitsagradeddistributionofmultiple signals. Forthisreason,thenaiveapplicationofinforma-
tion processing inequality in these systems is incorrect,
andthelocal,instantaneousinformationcontentinasig-
A B
nal does not in fact provide an upper bound for the per-
Direct readout Multi-tier readout
formance of downstream processes interpreting this sig-
c(x) c(x)
nal [7, 31, 32]. Second, the same amount of information
? ? ? ? ? ? ? ? ? ? ? ? can be encoded in formats that are more or less easy for
(noisy) the system to access, since the interpreting circuit is it-
selfsubjecttonoise. Thus,thelocalinformationcontent
(noisy)
of a signal is neither an upper bound nor a fair estimate
(noisy) of the amount of information this signal can “transmit”
A B C D E F A B C D E F tothedownstreamcircuit. Thisiswellillustratedbythe
recent experimental work on ERK, calcium and NF-κB
FIG. 1. Direct versus multi-tiered decoding strategies for pathways [7]. If the output of any of these pathways is
gradient-mediated patterning. (A) Direct decoding: to re- reducedtoasinglescalar,itisfoundtotransmitverylit-
duce noise introduced by intrinsically variable gene expres- tle information about the input. If the output is treated
sion, patterning proceeds through a single cycle of transcrip-
asadynamicalvariable,itsapparentinformationcontent
tion and translation. Differences in morphogen input c(x)
increases considerably [32]. Neither of these quantities,
directly specify gene expression programs A-F along axis x.
however, can be interpreted before it is established what
(B) Multi-tiered decoding: morphogen first elicits expression
of short range diffusible factors in domains spanning several fractionofthatinformationcanactuallybeextractedand
cells. ThesegeneproductstheninduceprogramsA-Fthrough used by the system. Here we use a simplified model to
a second cycle of transcription/translation. The added step illustratetheselimitationsofwhatwecall“raw”informa-
introducesadditionalgeneexpressionnoise,reducingpattern- tioncontent,contrastingitwith“accessibleinformation”
ing information compared to direct decoding (A).
that we introduce.
3
Results
A Direct
0
access
An abstract gradient response problem. A one-
dimensional array of cells i located at positions (0 < access
x <L) is exposed to a noisy linear gradient of an input
i B Two-tier
morphogen c(x) spanning the range [0,cmax]. To build Amplify (λ) + Average (N )
eff
intuition, we will assume the noise of input c(x) to be
c(λ)
Gaussian, of constant magnitude σ , and uncorrelated
0
between cells1: c(x ) ≡ c = (x /L)c +σ , where σ
i i i max i i
are i.i.d., drawn from a Gaussian of width σ (Fig. 2A).
0
Cellsrespondtomorphogenc(x)bymodulatinggeneex- access
0
pression through intrinsically noise-prone signal trans-
duction and transcription/translation processes. We will
modelthisresponseasacompositionofthreesteps,three
elementary operations that constitute the “toolkit” with FIG. 2. The two patterning strategies. A: In the direct
strategy,targetgenesarecontrolleddirectlybyc. B:Thetwo-
whichcellscanaccessandprocessinformationcontained
tier strategy involves a second patterning factor c(λ); target
in patterning cues: access, amplify, and average.
genes are separated from the input by two tiers of “access”
Let gout be a gene product whose expression is con-
operations. Left, raw information content. Right, accessible
trolledbyc(x). Thesimplestreadoutisachievedbyplac- information content.
ing gene gout under the control of a promoter that is re-
sponsivetocandbyaccumulatingtheoutputproteinfor
sometimeτ. Inourmodel,weexpresstheamountofgout mRNA and protein (minutes), cells can perform tem-
produced during this time by a cell i as gout = F(cest), poral averaging by allowing stable gene products to ac-
i
wherecest isanoisyestimateofthetrueconcentrationc cumulate [35]: if T is the time available for pattern-
i i
thatthesystemcouldobtainintimeτ (“access”),andF ing, the system can effectively perform T/τ access op-
is some deterministic input-output function (“amplify”); erations. In addition, the production of soluble factors
for simplicity, we first consider F to be pure linear am- that can be shared between cells gives rise to spatial av-
plification with coefficient λ, denoted F . The “access” eraging[35,36]. Bothtypesofaveragingofferthesystem
λ
operation is the key element of our framework. Specifi- some capacity to perform multiple measurements of the
cally, we write input, which we capture formally by an averaging oper-
ator G . Here N indicates the effective number of
ceist =ci+ηi, indepenNdefefnt measureeffments, so that application of GNeff
toamorphogen, bydefinition, reducesexpressionfluctu-
where η reflects the intrinsic stochasticity of transcrip-
i ations by a factor 1/N .
eff
tionand,inprinciple,manyothernoisesources. Herewe
We distinguish between two patterning strategies. In
willmodelη simplyasbeingdrawnfromaGaussiandis-
i the first (“direct strategy”; Fig. 2A), cell-fate-specific
tribution of width η . In other words, we postulate that
0 target genes are controlled directly by c and no other
each “access” operation takes time τ and comes at the
patterning factors are involved. Any available averaging
price of corrupting the signal with extra noise of magni-
mechanisms are applied to c itself. In the second (“two-
tude η .
0 tier”) strategy, cells perform an amplifying readout of c
The final toolkit operation is averaging. Because pat-
with input-output function F to establish a spatial pro-
λ
terning systems typically act over durations that are
fileofasecondfactorc(λ)(Fig.2B).ThepatteringtimeT
long(hours)comparedtothetimerequiredtosynthesize
is spent on accumulating and averaging c(λ). Mathemat-
ically, in the two scenarios, the cell-fate-specific target
genes are controlled by:
1 Theassumptionofuncorrelatednoiseisintentionallystrong. In
a real system, correlated noise can be introduced, for example, c(0) =GNeff[c] (direct strategy) (1)
byvariationsinthetotalamountofmorphogendepositedmater-
c(λ) =G [F (c+η)] (two-tier strategy) (2)
nally. Thesefluctuations,whichcannotbereducedbyaveraging, Neff λ
leadtoimperfectreproducibilityofmorphogenactivityatagiven
locationacrossmultipleembryos. Muchworkhasfocusedonin- We now ask: when, if ever, does the noisy amplification
vestigating the limitations imposed on patterning by this type step of the two-tier strategy provide a benefit to the sys-
of fluctuations [8, 33, 34]. In contrast, our model is applica-
tem?
ble for understanding the effects of imperfect precision of gene
expression (at a given location within the same embryo). The
distinctionbetween“raw”and“accessible”informationdoesnot
relyontheassumptionofuncorrelatednoise.
4
Standard information-theoretic considerations do A 𝑐 𝑐(𝜆) B
𝐹𝑧
not explain the benefits of amplification. The po- 𝜆
𝐼raw
sitional information carried by a linear morphogen c(x) 𝑧(𝜆)
withdynamicrangec andnoiseσ , whichwecallthe
max 0 𝑥 𝐼acc 𝑥
“rawinformationcontent”ofageneexpressionprofile,is
before noisy amplification after 0 𝑐 𝑐max
given by
(cid:18) (cid:19) FIG. 3. A: Noisy amplification can increase accessible infor-
c
I [c(x),x]=ln √max mation even if raw information is reduced. Inner error bars
raw
σ0 2πe arethesignalvariabilityandincreasewhenamplificationadds
new noise, reducing I . Outer error bars represent the sig-
raw
(see Supplementary Information). It depends only on nal observed by the noisy cell machinery (corrupted by noise
theratioφ=cmax/σo; forconvenience, wedefineI(φ)≡ η0). After amplification, the relative importance of η0 is re-
(cid:16) (cid:17)
ln √φ , which is an increasing function of φ. duced, increasing Iacc. B: The “segmentation” input-output
2πe function Fz for integer λ (here λ=3) preserves the dynamic
λ
Let us compare the two patterning strategies from range of morphogen concentration. Locations such as those
the point of view of the raw information content car- indicated by dots now have identical expression levels of z(λ)
ried by the controlling signal. In the direct strat- (the y axis), but can be distinguished using the input mor-
egy (1), the application of G reduces the input noise phogen c (the x axis on this plot).
√ Neff
to σ / N and so the controlling signal c(0) carries
o eff
(cid:16) (cid:17)
Ir(a0w) =I cm√ax bits of raw information. In the two- The benefits of the multi-tiered strategy lie in
σ0/ Neff
tierstrategy(2),theamplifiedprofilec(λ)ischaracterized making the “raw” information more accessible.
bynoiseξλ =λ(cid:113)σN02+efηf02,anditsrawinformationcontent Tmhueltib-teineerfisttsraotfeagmypbleificocmateiocnleaanrdwthheenawdveaonbtasegrevseotfhtahte,
is therefore
due to the intrinsic noise in the regulatory readout, the
(cid:18) (cid:19) (cid:32) (cid:115) (cid:33) raw information content is an inadequate measure of a
λc N
I(λ) =I max =I c eff <I(0). (3) morphogen’s usefulness to the system. The purpose of a
raw ξ max σ2+η2 raw
λ 0 0 morphogenistoactivatedownstreamprocesses;therele-
vant quantity is therefore not the amount of information
Averaging mitigates the loss of positional information
a morphogen carries, but the amount of information it
when using a noisy readout [36]. If N is sufficiently
eff can transmit to its downstream targets. Since biologi-
large, the amplified and averaged profile carries even
cal control is intrinsically noisy, the two quantities are
more information than the original input. (Note that
distinct.
the information processing inequality is not violated, as
Our model was designed to make this particularly
itstatesonlythattheoutputcannotcarrymoreinforma-
clear: since the system can never access the true concen-
tion than N independent copies of the input.) Never-
eff trationc,butonlyanoisyestimatecest,I [c]isbeyond
raw
theless, applying averaging directly to the input (the di-
thesystem’sreach. Wedefineaccessible informationina
rect strategy) always yields more raw information; thus,
morphogenI astheamountofinformationthesystem
acc
the multi-step scenario appears inferior to a direct read-
can access in time τ:
out.
In real systems, the three operations we treat as in- I [c]≡I [cest]=I [c+η], (4)
acc raw raw
dependent may be mechanistically linked. For example,
where η, again, is a Gaussian noise of magnitude η
if c(x) is an intracellular factor while spatial averaging 0
within our model.
requires a small diffusible molecule, then performing an
The amount of accessible information provided by the
extrareadoutcanprovideaccesstoanotherwiseunavail-
direct strategy (Fig. 2B) is given by
able averaging mechanism. By assuming that the two
strategies (1) and (2) can benefit from equal amounts of
anvoeisreagainndg,iswohbivchioiunsloyubremneofidceilals,imwpelcyanrefdouccuesssepxepcrifiescsailolny Ia(0cc) =I(cid:113) σc02ma+x η2 (5)
on the effect of signal amplification. Multi-tier pattern- Neff 0
ing proceeds through rounds of amplification: small dif- whereas for the amplified profile c(λ) it is
ferencesininputresultinlargedifferencesingeneexpres-
sion so as to establish increasingly sharp boundaries de-
λc c
limitingexpressiondomains[37],yetinourexpression(3) Ia(λcc) =I(cid:113) max =I(cid:113) max .
for the information content of the amplified profile c(λ), λ2σ02+η02 +η2 σ02+η02 + η02
Neff 0 Neff λ2
the amplification factor λ cancels out. Thus, consider- (6)
ations based on raw information content fail to explain The amplification factor λ no longer cancels out in (6);
the prevalence of signal amplification. amplifying dynamic range is beneficial, since it reduces
5
the relative importance of the intrinsic readout noise Replacing information content of a single profile by this
(Fig.3A).Comparing (5)and(6), wefindthattheextra joint information, our argument demonstrating that am-
tier of noisy amplification is beneficial if and only if plification increases accessible information can now be
repeated verbatim [40], and we again find that the ex-
(cid:18) (cid:19)
η2 1− 1 − 1 >0 (7) tra readout is beneficial as long as (7) is satisfied. Note,
0 N λ2 however,thatonitsown,z(λ) maycarryless information
eff
than the original morphogen c. The easiest way to see
Note that the condition (7) is never satisfied if Neff = 1 this is to compare their noise levels:
(no averaging) or λ = 1 (no amplification). Intuitively,
our argument demonstrates that the patterning system (cid:18)ξ (cid:19)2 λ2 (cid:18) η2(cid:19)
is a mechanism that invests some effort into making a λ = 1+ 0
σ N σ2
careful measurement (N > 1) and encodes this infor- 0 eff 0
eff
mation in a more accessible format where steeper con- Iftheeffectofamplificationisstrongerthanthatofaver-
centration changes (λ > 1) can be interpreted with a aging, we find ξ /σ >1. In this scenario, the amplified
λ 0
faster, and therefore noisier readout. This mechanism profile z(λ) has the same dynamic range but lower pre-
is useful precisely because regulatory readout is intrin- cision than the original morphogen c, and therefore, on
sically noisy, otherwise direct readout would have been its own, carries less information (whether raw or acces-
the better strategy. In other words, to understand the sible). This shows that evaluating the usefulness of a
purpose of the patterning system, it is essential to dis- particularcuefrominformation-theoreticstandpointcan
tinguish between the total information in a morphogen lead to misleading results, unless all other relevant cues
and information that can be usefully extracted and in- (which are often hard to establish) are taken into ac-
terpreted. count simultaneously. Here, we demonstrated that sys-
tems can benefit from multi-tiered interpretation even
Multiple tiers improve gradient interpretation in cases where intermediate steps occur at a net loss of
even when raw information decreases. So far we information, increasing noise.
consideredtheinformationcontent(raworaccessible)in
each tier separately. However, in principle, downstream The multi-tier structure of Drosophila segment
processescouldaccessallpatterningcuesandnotsimply patterning increases information accessibility. In
thefinaltier[38,39]. Asaresult,extrareadouttierscan this system, segmentation of the AP axis proceeds
bebeneficialevenwhentheycarryverylittleinformation through four tiers of gene activity, termed maternal gra-
on their own. dients, gap genes, pair-rule genes, and segment polarity
To see this, consider the input-output function Fz de- genes [30]. The sequential activity of each tier subdi-
λ
picted in Fig. 3B. In some respects, it is more realistic vides the naive blastoderm into smaller domains of gene
than the purely amplifying linear readout Fλ considered expressionwithincreasinglysharpboundaries,culminat-
above,sincerealpatterningsystemsmustoperatewithin ing in the designation of each row of cells with its own
a limited global dynamic range of morphogen concentra- unique set of expressed genes (Fig. 4A). This process is
tions. Let z(λ) be the morphogen profile established by subjecttotranscriptionalnoisewithalargeintrinsiccom-
the new Fz-shaped readout of c; it has noise magnitude ponent [35], as well as several other noise sources with
λ
ξλ (same as the noise in c(λ)), but is folded onto itself different signatures [41–44]. No single value of η0 ade-
λ times, reminiscent of the spatially reiterated expres- quately characterizes such readout noise. Nevertheless,
sion of genes involved in Drosophila axis segmentation. we can gain important insight by computing Iη0 [c] as a
acc
Repeatedlyusingthesameoutputvaluesatmultiplepo- functionofη ,treatingitasavariableparameter: thede-
0
sitionsnaturallyreducesmutualinformationbetweenthe cayofIaηc0c[c]withη0 characterizesthetolerancetoadded
output concentration and position: noise of the information encoded in the morphogen (or
set of morphogens) c. Applied to gene expression data
Iraw[z(λ)]=Iraw[c(λ)]−lnλ from the early Drosophila segmentation gene network,
I [z(λ)]=I [c(λ)]−lnλ. this analysis will show how our simple model explains
acc acc
theuseofmulti-tiergradientinterpretationinarealsys-
However, the λ locations with identical concentrations tem (Fig. 4).
of z(λ) are made distinguishable by the original mor- Wefocusonaparticularnodeinthisnetworkwhereby,
phogenc(Fig.3B).Therefore,thejoint informationthat in early embryos, two gap genes, hb and Kr, regulate a
the original and the amplified profiles together provide pair-rule gene eve. For 0.37 < xAP < 0.47, where Kr
about a cell’s location is the same for Fz as it was for and hb expression form opposing boundaries, they are
λ
F : jointly responsible for creating the trough between eve
λ
stripes 2 and 3; other inputs to eve are negligible in this
I(cid:2){c,z(λ)},x(cid:3)=I(cid:2){c,c(λ)},x(cid:3) region at this time [45, 46]. Protein levels are measured
6
A B C D
4
Eve
HKrb Expression noise 0.1 Hb Kr Normalized profile 1 ccessible info (bits) 123 actual
A
Kr Eve Hb noiseless
Eve 0 0 0
x −0.1 0 0.1 0.4 0.45 0 0.1
AP x − x x Readout noise η
AP 0 AP 0
FIG. 4. A: Immunostaining of three antero-posterior (AP) axis patterning genes in the same embryo. Rather than specifying
cell fate directly, the “gap genes” such as hunchback (Hb; top) and Kru¨ppel (Kr; middle) control “pair-rule” genes such as
even-skipped (Eve,bottom). Bothtiersregulateothergenesfurtherdownstream. Boxesindicatetheselectedregionofinterest
(ROI), where at this time, Hb and Kr are the only relevant inputs to Eve, as shown on the cartoon. B: Within the ROI
(shaded), Eve exhibits higher expression noise than either Hb or Kr. Expression noise computed as RMS difference between
expression level of a nucleus and its immediate dorsal or ventral neighbor (see Methods), plotted against AP distance from
the Hb/Kr boundary (denoted x ). Error bars are standard deviation over N =8 embryos. C: Idealized morphogen profiles,
0
restricted to the ROI. Profile shape obtained as smooth spline-fit to expression values and noise magnitudes calculated for
the profiles of panel A after projection onto the AP axis. D: For all but the lowest readout noise magnitude, joint accessible
information content in the triplet (Hb,Kr,Eve) exceeds the accessible information provided by Hb and Kr alone, even in an
extreme hypothetical case when they are rendered entirely noiseless.
simultaneously in each nucleus by a triple immunostain- out noise magnitude η > 0 imposes an upper bound
0
ing experiment (Fig. 4A) in N = 8 single embryos. We that Iaηc0c[cHb,cKr] must satisfy. This corresponds to the
determine the expression noise of each gene by compar- information in a hypothetical pair of noiseless Hb and
ing levels in a given nucleus with those of its immediate Kr and cannot be achieved in practice; it is a theoretical
dorsal and ventral neighbors (see Methods). best-case scenario for any strategy lacking Eve.
In the defined region of interest, eve expression noise When the readout noise η0 is zero, Iaηc0c coincides with
ishigherthantherespectivenoiseinhb orKr expression the raw information content, which for perfectly noise-
(Fig.4B).Theinformationcontentofeve musttherefore less Hb and Kr would be infinite. However, as read-
belowerthanthatcarriedbyeitherofitstwoinputs. Due out noise increases, the performance bound becomes fi-
to the curvature of the embryo (Fig. 4A), the positional nite and drops quickly (black curve). This behavior con-
information of a real morphogen is only approximately trasts with the joint accessible information of the triplet
related to that derived from projection onto the imag- (Hb,Kr,Eve) (magenta) as calculated using the actual
inary AP axis. Therefore, to estimate the information measured noise of each of the three profiles. The ac-
content for each of the three genes, we consider “ideal- cessible information content in the triplet is, of course,
ized” Gaussian-noise profiles (panel C) with mean and alwaysfinite,butitisalsomoretoleranttoreadoutnoise:
noise obtained by smoothing the measured values in real due to the steeper slopes of the Eve profile, as η in-
0
embryos. The idealized profiles are normalized to the creases, the accessible information content of the triplet
same maximum and are, by construction, functions of (Hb,Kr,Eve) decreases slowly; importantly, more slowly
x carrying positional information I(c(x ),x ). Re- than the black curve. Therefore, a crossing point is ob-
AP AP AP
strictedtotheregionofinterest,theinformationcontent served, whose presence does not qualitatively depend on
of Hb and Kr is respectively 2.6 and 2.7 bits, whereas the specifics of the readout noise model (e.g. absolute
the larger noise of Eve reduces its information content noisemagnitudecanbereplacedbyfractional). Remark-
to only 2.0 bits. Why, then, does the system use Eve to ably,althoughEveismeasurablynoisierthaneitherofits
regulate downstream processes, rather than utilizing Kr inputs,itspresenceenablesthesystemtoaccessmorein-
and Hb directly? formation than could have been extracted from Hb and
The answer becomes clear when we consider the ac- Kr alone, even if these inputs could be rendered per-
cessibility of information encoded in these morphogens, fectly noiseless. In practice, the enhancers of the pair-
namely Iaηc0c as a function of η0 (panel D). A patterning rule genes also contain binding sites for maternal tran-
strategy lacking Eve can access only Hb and Kr. Even scription factors [38, 39], which may lead to a further in-
if some hypothetical filtering mechanism could reduce crease in the precision of gene expression. However, our
their expression noise to arbitrarily low level, the read- framework demonstrates that even if Eve were regulated
7
by Hb and Kr only, and so were fully redundant in the “economy of complexity” constraint is conveniently im-
standard information-theoretic sense, the additional tier posed by construction. We must realize, however, that
would still confer an advantage, because transcription is maximizinginformationtransmissiontothetargetgenes
intrinsically noisy. (downstream of the patterning core) imposes a differ-
ent requirement onto this core circuit than merely effi-
cient information transfer within the core itself. Instead,
Discussion the core circuit must function as a format converter, re-
encoding information at its input into a format that can
be accessed with a simpler and faster readout, that of a
TheDrosophila patterningnetworkhasbeendescribed patterning cue by a functional gene.
as performing a “transition from analog to digital spec- Curiously, it has been shown that in small networks
ification” of cell identity [37]. The “digital” metaphor with a realistic model of noise, maximizing raw informa-
has its limitations: even for Eve, the graded distribution tion transmission leads to network structures exhibiting
withingeneexpressiondomainscontainsinformation[8]; featuressuchastilingofpatternedrangewithamplifying
nevertheless, it expresses the correct intuition that the input/output readouts [49–51], i.e. features that tend to
final pattern is more tolerant to noise. Importantly, the also make information more accessible, even though the
standard information-theoretic formalism does not cap- optimization scheme employed in these studies did not
ture this intuition: for instance, the profile depicted in specifically consider the encoding format. This remark-
Fig.3Bhasthesame informationcontentforallλ. Noise able coincidence, however, should not obscure the fact
tolerance — a critically important feature in biological thatultimatelythetwotasks—maximizinginformation
systems—becomesmanifestonlywhenthereadoutpro- transmission and re-encoding it in a more accessible for-
cessisconsideredexplicitly,forexample,aswehavedone mat — could be conflicting.
in our definition of accessible information. This point is Information theory is a powerful tool; its formalism
implicitinthetheoreticalworkinvestigatingtheso-called does not, however, aim to replace considerations of what
“input noise” [41], but has not been emphasized. This constitutes useful information or how it might be used
is because in a theoretical discussion of an abstract bio- by the system. As it is gaining popularity in biologi-
chemical circuit, the quantities for which information is cal applications, it is important to remember that for
computed are easily postulated to be the complete input a channel X (cid:55)→ Y, the relation between mutual infor-
and the final output; in this manner, valid theoretical mation I(X,Y) and the ability to use Y to determine
results can be derived without a concern for informa- X is only asymptotic: Shannon [16] proved that it is
tionaccessibility(forsomerecentexamples,see[47,48]). the maximum rate of error-free communication via this
However, when information-theoretic arguments are ap- channel, in the limit of infinite uses of the channel. Im-
plied to experimental data where the measured quantity portantly, in development and biological signaling, the
is only an intermediate step, e.g. a transcription factor number of channel uses (e.g. integration time of the
regulating downstream events, the question of informa- signal) is fundamentally finite [3]. Further, Shannon’s
tion accessibility can no longer be neglected. results assumed an encoder/decoder of infinite computa-
For example, it has been suggested that certain sig- tional power [16]. This asymptotic rate is never in fact
naling circuits may have evolved towards optimal infor- achieved in practice [52], but in biological context, per-
mation transmission [4, 5]. Although the argument is formanceisconstrainedevenfurther,sincethe“encoding
plausible, applying it in practice requires caution. Con- scheme” is usually limited to measuring the same signal
sider,onceagain,theexampleofadevelopmentalcircuit. multiple times. In communication theory, this bears the
If the entire set of functional (cell-fate specific) genes name of “repetition code” and is formally classified as a
were to be included into consideration, then information “bad code”, i.e. a code that does not attain Shannon’s
transmission from the input to this entire layer of func- bound even asymptotically. This means that extracting
tional genes would be a plausible objective function for allthe“raw”informationfromasignalisimpossibleeven
this whole network to maximize, under some “bounded in principle. For example, a signaling pathway with ca-
complexity” constraint penalizing solutions where hun- pacityof1bitisneversufficienttomakeareliablebinary
dredsofcell-fatespecificgenesareallcontrolledbyhighly decision [3], and therefore should not be conceptualized
complex enhancers with combinatorial, cooperative reg- as a binary switch.
ulation. However, the usual, more economical approach As illustrated here, making the distinction between
does not consider the full set of hundreds of cell-fate de- “raw” and “accessible” information will be crucial for
termining genes. Instead, it recognizes that the bulk of understanding the architecture and function of pattern-
the patterning task is accomplished by a small subset of ing and signaling circuits. More work is required: our
dedicated genes that engage in complex cross-regulation definition of accessible information relied on a simplis-
to establish the pattern that all other genes can then in- tic noise model; in general, quantifying the usefulness
terpret simply. If we focus only on this core subset, the of information-bearing signals in contexts where channel
8
uses are limited will require reinstating considerations of pattern of cellular differentiation. J Theor Biol 25:1–47.
rate/fidelity tradeoff, which Shannon could eliminate by [16] Shannon, CE (1948) A mathematical theory of commu-
taking the limit of infinite-time communication. Nev- nication. Bell Systems Technical J 27:379–423, 623–656.
[17] Gergen JP, Coulter D, Wieschaus EF (1986) Segmental
ertheless, information theory remains a most adequate
patternandblastodermcellidentities.Gametogenesisand
frameworktoaddresstheseissues,provideditisextended
the Early Embryo, ed J. Gall (Alan R. Liss, New York)
to quantify both the amount and accessibility of infor-
pp. 195–220.
mation. Our work provides a step in this direction and [18] Gregor T, Tank DW, Wieschaus EF, Bialek W (2007)
demonstrates how the extended framework naturally ex- Probingthelimitstopositionalinformation.Cell130:153–
plains a global architectural property shared by diverse 164.
patterning circuits. [19] Munsky B, Neuert G, van Oudenaarden A (2012) Us-
ing gene expression noise to understand gene regulation.
We thank Ariel Amir, William Bialek, Michael Bren-
Science 336:183–187.
ner, Chase Broedersz, Ted Cox, Paul Francois, Anders [20] Sanchez A, Golding I (2013) Genetic determinants and
Hansen, Ben Machta, Gasper Tkacik, Eric Wieschaus cellular constraints in noisy gene expression. Science
and Ned Wingreen for helpful discussions and comments 342:1188–1193.
on the manuscript. This work was supported by NIH [21] Pera EM, Acosta H, Gouignard N, Climent M, Arregi
I (2014) Active signals, gradient formation and regional
grants P50 GM071508 and R01 GM097275, NSF grants
specificity in neural induction.Exp Cell Res 321:25–31.
PHY-0957573, PHY-1305525, and Harvard Center of
[22] Lumsden A, Krumlauf R (1996) Patterning the Verte-
Mathematical Sciences and Applications.
brate Neuraxis. Science 274:1109–1115.
[23] Raible F, Brand M (2004) Divide et Impera – the
midbrain–hindbrain boundary and its organizer. Trends
Neurosci 27:727–734.
[24] Patthey C, Gunhaga L (2014) Signaling pathways regu-
[1] JohnsonHA(1970)Informationtheoryinbiologyafter18 latingectodermalcellfatechoices.ExpCellRes321:11–16.
years. Science, 168:1545–1550. [25] Saga Y (2012) The mechanism of somite formation in
[2] Waltermann C, Klipp E (2011) Information theory based mice. Curr Opin Genet Dev 22:331–338.
approaches to cellular signaling. Biochim Biophys Acta, [26] Watanabe T, Takahashi Y (2010) Tissue morphogenesis
1810(10):924–32. coupled with cell shape changes. Curr Opin Genet Dev
[3] Bowsher CG, Swain PS (2014) Environmental sensing, 20:443–447.
information transfer, and cellular decision-making. Curr [27] Little SC, Mullins MC (2006) Extracellular modulation
Opin Biotechnol, 28:149–55. ofBMPactivityinpatterningthedorsoventralaxis.Birth
[4] LevchenkoA,NemenmanI(2014)Cellularnoiseandinfor- Defects Res C Embryo Today 78:224–242.
mation transmission. Curr Opin Biotechnol, 28:156–164. [28] Rushlow CA, Shvartsman SY (2012) Temporal dynam-
[5] Tkacik G, Bialek W (2014) Information processing in liv- ics,spatialrange,andtranscriptionalinterpretationofthe
ing systems. arXiv:1412.8752. Dorsalmorphogengradient.CurrOpinGenetDev22:542–
[6] Cheong R, Rhee A, Wang CJ, Nemenman I, Levchenko 546.
A (2011) Information transduction capacity of noisy bio- [29] Driever W, Nusslein–Volhard C (1988) A gradient of bi-
chemical signaling networks. Science 334(6054):354–8. coid protein in Drosophila embryos. Cell 54:83–93.
[7] Selimkhanov J, Taylor B, Yao J, Pilko A, Albeck J, Hoff- [30] Kornberg TB, Tabata T (1993) Segmentation of the
mannA,TsimringL,WollmanR(2014)Systemsbiology. Drosophila embryo. Curr Opin Genet Dev 3:585–593.
Accurate information transmission through dynamic bio- [31] Sokolowski TR, Tkacik G (2015) Optimizing informa-
chemical signaling networks. Science 346(6215):1370–3. tion flow in small genetic networks. IV. Spatial coupling.
[8] DubuisJO,TkacikG,WieschausEF,GregorT,BialekW arXiv:1501.04015.
(2013)Positionalinformation,inbits.Proc Natl Acad Sci [32] Tostevin F, ten Wolde PR (2009) Mutual information
U S A 110:16301–8. betweeninputandoutputtrajectoriesofbiochemicalnet-
[9] HironakaK,MorishitaY(2012)Encodinganddecodingof works. PRL 102:218101.
positional information in morphogen-dependent pattern- [33] TkacikG,DubuisJO,PetkovaMD,GregorT(2015)Po-
ing. Curr Opin Genet Dev 22:553–561. sitional information, positional error, and read-out preci-
[10] LanderA(2013)Howcellsknowwheretheyare.Science sioninmorphogenesis: amathematicalframework.Genet-
339:923–927. ics 199(1): 39–59.
[11] Rogers KW, Schier AF (2011) Morphogen gradients: [34] PetkovaMD,LittleSC,LiuF,GregorT(2014)Maternal
fromgenerationtointerpretation.AnnuRevCellDevBiol origins of developmental reproducibility. Current Biology
27:377–407. 24:12831288.
[12] Nahmad M, Lander AD (2011) Spatiotemporal mecha- [35] Little SC, Tikhonov M, Gregor T (2013) Precise devel-
nisms of morphogen gradient interpretation. Curr Opin opmental gene expression arises from globally stochastic
Genet Dev 21:726–731. transcriptional activity. Cell 154:789–800.
[13] WartlickO,KichevaA,Gonzalez-GaitanM(2009)Mor- [36] Erdmann T, Howard M, ten Wolde PR (2009) Role of
phogen gradient formation. Cold Spring Harb Persp Biol spatial averaging in the precision of gene expression pat-
1:a001255. terns. Phys Rev Lett 103:258101.
[14] MullerP,RogersKW,YuSR,BrandM,SchierAF(2013) [37] Gilbert SF (2013) Developmental Biology. (Sinauer As-
Morphogen transport. Development 140:1621–1638. sociates, Inc.), 10th edition.
[15] Wolpert L (1969) Positional information and the spatial [38] LiXYetal.(2008)Transcriptionfactorsbindthousands
9
of active and inactive regions in the Drosophila blasto-
derm. PLoS Biol. 6(2):e27.
[39] MacArthur S et al. (2009) Developmental roles of 21
Drosophilatranscriptionfactorsaredeterminedbyquanti-
tativedifferencesinbindingtoanoverlappingsetofthou-
sands of genomic regions. Genome Biol. 10(7):R80.
[40] For multiple profiles {c(1),c(2),...}, we define accessible
information as the joint information content in the set of
morphogenprofiles,independentlycorruptedwithnoiseof
magnitude η (compare with Eq. (4)):
0
I (cid:0){c(1),c(2),...}(cid:1)≡I (cid:0){c(1)+η(1),c(2)+η(2),...}(cid:1).
acc raw
[41] Tkacik G, Gregor T, Bialek W (2008) The role of in-
put noise in transcriptional regulation. PLoS ONE 3(7):
e2774.
[42] Krivega I, Dean A (2012) Enhancer and promoter inter-
actions–longdistancecalls.CurrOpinGenetDev22:79–
85.
[43] Kwak H, Lis JT (2013) Control of transcriptional elon-
gation. Annu Rev Genet 47:483–508.
[44] Maheshri N, O’Shea EK (2007) Living with noisy genes:
howcellsfunctionreliablywithinherentvariabilityingene
expression. Annu Rev Biophys Biomol Struct 36:413–434.
[45] Kraut R, Levine M (1991) Spatial regulation of the gap
gene giant during Drosophila development. Development
111:601–609.
[46] SmallS,BlairA,LevineM(1996)Regulationoftwopair-
rulestripesbyasingleenhancerintheDrosophilaembryo.
Dev Biol 175:314-324.
[47] BowsherCG,VoliotisM,SwainPS(2013)Thefidelityof
dynamic signaling by noisy biomolecular networks. PLoS
Comput Biol, 9(3):e1002965.
[48] de Ronde W, ten Wolde PR (2014) Multiplexing oscilla-
tory biochemical signals. Phys Biol, 11(2):026004.
[49] Tkacik G, Walczak AM, Bialek W (2009) Optimizing
information flow in small genetic networks. Phys Rev E
80:031920.
[50] WalczakAM,TkacikG,BialekW(2010)Optimizingin-
formation flow in small genetic networks II. Feed-forward
interactions. Phys Rev E 81:041905.
[51] Tkacik G, Walczak AM, Bialek W (2012) Optimizing
information flow in small genetic networks III. A self-
interacting gene. Phys Rev E 85:041903.
[52] MacKay DJC (2012) Information theory, inference
and learning algorithms (Cambridge University Press),
Fig. 47.17, p. 568.
10
Supplementary Information
Information carried by a linear morphogen Estimating expression magnitude (image
gradient processing)
For a linear morphogen c(x) spanning the range The immunostaining procedure described above yields
[0,c ], with constant Gaussian noise σ , the informa- confocal stacks of images where pixel intensity corre-
max 0
tion content is given by sponds to the recorded fluorescence level. Stacks were
converted into projected Hb, Kr and Eve images (such
(cid:18) (cid:19)
c
I [c]≡I[c(x),x]=ln √max . as displayed on Fig. 4A) as the maximum projection of
raw
σ0 2πe Gaussian-smoothed frames. The width of the averaging
kernel (8 pixels, corresponding to approximately 1 µm)
To show this, we apply the definition of the mutual
was smaller than the radius of the nuclei, therefore for
information:
pixels close to the nucleus center the averaging volume
was wholly within the nucleus. Smoothing frames prior
I[c(x),x]=H[P ]−H[P ]
c c|x tomaximumprojectionensuredrobustnessagainstimag-
ing noise.
HereP istheprobabilitydistributionofc(whichisuni-
c
form between 0 and c ); P is the conditional distri- In each of N = 8 embryos, the location of nuclei was
max c|x
bution of the concentration of c given x (which is Gaus- identified manually. For each of the projected images
sian of width σ ), and H[P] is the differential entropy of (Hb,KrandEve),werecordedthehighestintensityvalue
0
a probability distribution P: within 5 pixels of nuclei center locations as the fluores-
cence intensity in that nucleus. Allowing for a 5-pixel
(cid:90)
“wiggleroom”ensuredrobustnessagainstregistrationer-
H[P]≡− P(z)lnP(z)dz =−(cid:104)lnP(cid:105) .
P rorsacrosscolorchannels,aswellasagainsterrorsinthe
manualselectionofnucleicenterlocations. Therecorded
Clearly, H[Pc] = lncmax. The second term is the en- intensity values were corrected for background autofluo-
tropy of a Gaussian distribution Pσ0 of width σ0: rescence by subtracting the mean intensity recorded in
nuclei located in non-expressing regions of the embryo.
1 (cid:18) z2 (cid:19)
P (z)= exp − Thebackground-correctedfluorescencevaluesreflectpro-
σ0 (cid:112)2πσ02 2σ02 tein concentration, up to a proportionality factor (inten-
sity of a fluorophore). The fractional measurement noise
and therefore:
inestimatingrelativeconcentrationscanbeestimatedas
the standard deviation of pixel intensity values within
(cid:113) (cid:28) z2 (cid:29)
H[P ]=−(cid:104)lnP (z)(cid:105) =ln 2πσ2+ a nucleus on the projected map. In their respective re-
c|x σ0 z 0 2σ2
0 z gionsofexpression,thisstandarddeviationofHb,Krand
(cid:113) 1 (cid:16) √ (cid:17) Eve pixel intensity constituted ≈ 1% of the expression
=ln 2πσ2+ =ln σ 2πe . (8)
0 2 0
Putting this together, we find:
(cid:18) (cid:19)
c
I[c(x),x]=H[P ]−H[P ]=ln √max .
c c|x
σ 2πe
0
Experimental procedures
Antibodystainingwasperformedusingproceduresand
antisera described in [1] and [2]. Confocal microscopy
was performed at 12 bit resolution on a Leica SP5 with
FIG. S1. Example of projected image (Eve). Black polygon
a 20x HC PL APO NA 0.7 immersion objective at 1.4x
indicates the analysis region, manually selected to exclude
magnified zoom using pixels of size 135 x 135 nm cover-
distortedareasclosetotheembryoedge. Rectangleindicates
ing an area of 554x554 mm. For each embryo, 17 images
nuclei with the same projected coordinate onto the AP axis.
slices were obtained at a z interval of 4 microns, span- Even in this perfectly ventral view of the embryo that mini-
ning approximately 50% of embryo thickness. All data mizes the effects of stripe curvature (compare with Fig. 4A
werecollectedinasingleacquisitioncycleusingidentical in the main text), the expression stripes are not exactly per-
scanning parameters. pendicular to this axis.