Between Pure and Approximate Differential Privacy

Thomas Steinke*   Jonathan Ullman†

January 27, 2015
Abstract

We show a new lower bound on the sample complexity of (ε,δ)-differentially private algorithms that accurately answer statistical queries on high-dimensional databases. The novelty of our bound is that it depends optimally on the parameter δ, which loosely corresponds to the probability that the algorithm fails to be private, and is the first to smoothly interpolate between approximate differential privacy (δ > 0) and pure differential privacy (δ = 0).

Specifically, we consider a database D ∈ {±1}^{n×d} and its one-way marginals, which are the d queries of the form "What fraction of individual records have the i-th bit set to +1?" We show that in order to answer all of these queries to within error ±α (on average) while satisfying (ε,δ)-differential privacy, it is necessary that

    n ≥ Ω(√(d log(1/δ)) / εα),

which is optimal up to constant factors. To prove our lower bound, we build on the connection between fingerprinting codes and lower bounds in differential privacy (Bun, Ullman, and Vadhan, STOC '14).

In addition to our lower bound, we give new purely and approximately differentially private algorithms for answering arbitrary statistical queries that improve on the sample complexity of the standard Laplace and Gaussian mechanisms for achieving worst-case accuracy guarantees by a logarithmic factor.
* Harvard University School of Engineering and Applied Sciences. Supported by NSF grant CCF-1116616. Email: [email protected].
† Columbia University Department of Computer Science. Supported by a Junior Fellowship from the Simons Society of Fellows. Email: [email protected].
Contents

1 Introduction
  1.1 Average-Case Versus Worst-Case Error
  1.2 Techniques
2 Preliminaries
3 Lower Bounds for Approximate Differential Privacy
4 New Mechanisms for L∞ Error
  4.1 Pure Differential Privacy
  4.2 Approximate Differential Privacy
References
A Alternative Lower Bound for Pure Differential Privacy
1 Introduction
The goal of privacy-preserving data analysis is to enable rich statistical analysis of a database while protecting the privacy of individuals whose data is in the database. A formal privacy guarantee is given by (ε,δ)-differential privacy [DMNS06, DKM+06], which ensures that no individual's data has a significant influence on the information released about the database. The two parameters ε and δ control the level of privacy. Very roughly, ε is an upper bound on the amount of influence an individual's record has on the information released, and δ is the probability that this bound fails to hold,¹ so the definition becomes more stringent as ε,δ → 0.
A natural way to measure the tradeoff between privacy and utility is sample complexity: the minimum number of records n that is sufficient in order to publicly release a given set of statistics about the database, while achieving both differential privacy and statistical accuracy. Intuitively, it is easier to achieve these two goals when n is large, as each individual's data will have only a small influence on the aggregate statistics of interest. Conversely, the sample complexity n should increase as ε and δ decrease (which strengthens the privacy guarantee).
The strongest version of differential privacy, in which δ = 0, is known as pure differential privacy. The sample complexity of achieving pure differential privacy is well known for many settings (e.g. [HT10]). The more general case where δ > 0 is known as approximate differential privacy, and is less well understood. Recently, Bun, Ullman, and Vadhan [BUV14] showed how to prove strong lower bounds for approximate differential privacy that are essentially optimal for δ ≈ 1/n, roughly the weakest privacy guarantee that is still meaningful.²
Since δ bounds the probability of a complete privacy breach, we would like δ to be very small. Thus we would like to quantify the cost (in terms of sample complexity) as δ → 0. In this work we give lower bounds for approximately differentially private algorithms that are nearly optimal for every choice of δ, and smoothly interpolate between pure and approximate differential privacy.
Specifically,weconsideralgorithmsthatcomputetheone-waymarginalsofthedatabase—an
extremelysimpleandfundamentalfamilyofqueries. ForadatabaseD ∈{±1}n×d,thedone-way
marginalsaresimplythemeanofthebitsineachofthed columns. Formally,wedefine
1(cid:88)n
D := D ∈[±1]d
i
n
i=1
where D ∈{±1}d is the i-th row of D. A mechanism M is said to be accurate if, on input D, its
i (cid:12)(cid:12) (cid:12)(cid:12)
outputis“closeto”D. Accuracymaybemeasuredinaworst-casesense—i.e. (cid:12)(cid:12)(cid:12)(cid:12)M(D)−D(cid:12)(cid:12)(cid:12)(cid:12) ≤α,
∞
meaning every one-way marginal is answered with accuracy α—or in an average-case sense—
(cid:12)(cid:12) (cid:12)(cid:12)
i.e. (cid:12)(cid:12)(cid:12)(cid:12)M(D)−D(cid:12)(cid:12)(cid:12)(cid:12) ≤αd,meaningthemarginalsareansweredwithaverageaccuracyα.
1
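To make the two error notions concrete, here is a minimal sketch (our illustration, not part of the paper; it assumes numpy and a ±1 matrix D) computing the one-way marginals and both error measures.

```python
import numpy as np

def one_way_marginals(D):
    """The d column means of a +/-1 database D of shape (n, d); lies in [-1, 1]^d."""
    return D.mean(axis=0)

def worst_case_error(answers, D):
    """L-infinity error: the largest error over the d marginals."""
    return np.max(np.abs(answers - one_way_marginals(D)))

def average_case_error(answers, D):
    """L1 error divided by d: the guarantee ||M(D) - Dbar||_1 <= alpha*d
    is the same as this average being at most alpha."""
    return np.mean(np.abs(answers - one_way_marginals(D)))
```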
Some of the earliest results in differential privacy [DN03, DN04, BDMN05, DMNS06] give a simple (ε,δ)-differentially private algorithm, the Laplace mechanism, that computes the one-way marginals of D ∈ {±1}^{n×d} with average error α as long as

    n ≥ O(min{√(d log(1/δ)) / εα, d / εα}).    (1)

¹ This intuition is actually somewhat imprecise, although it is suitable for this informal discussion. See [KS08] for a more precise semantic interpretation of (ε,δ)-differential privacy.
² When δ ≥ 1/n there are algorithms that are intuitively not private, yet satisfy (0,δ)-differential privacy.
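For intuition, here is a minimal sketch of the pure-DP Laplace mechanism for one-way marginals (our illustration; the function name and use of numpy are ours). Changing one row moves the vector of marginals by at most 2d/n in L₁ norm, so per-coordinate Laplace noise of scale 2d/(εn) yields (ε,0)-differential privacy, and its average error of roughly 2d/(εn) is at most α once n ≥ 2d/εα, matching the d/εα term of (1).

```python
import numpy as np

def laplace_marginals(D, eps, rng=np.random.default_rng()):
    """(eps, 0)-DP release of all d one-way marginals of a +/-1 matrix D.

    The L1 sensitivity of the vector of marginals is 2d/n, so Laplace
    noise of scale 2d/(eps*n) per coordinate suffices for pure DP."""
    n, d = D.shape
    noise = rng.laplace(scale=2.0 * d / (eps * n), size=d)
    # Truncate to the legal answer range [-1, 1]^d.
    return np.clip(D.mean(axis=0) + noise, -1.0, 1.0)
```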
The previous best lower bounds are n ≥ Ω(d/εα) [HT10] for pure differential privacy and n ≥ Ω̃(√d/εα) for approximate differential privacy with δ = o(1/n) [BUV14]. Our main result is an optimal lower bound that combines the previous lower bounds.
Theorem 1.1 (Main Theorem). For every ε ≤ O(1), every 2^{−Ω(n)} ≤ δ ≤ 1/n^{1+Ω(1)}, and every α ≤ 1/10, if M : {±1}^{n×d} → [±1]^d is (ε,δ)-differentially private and E_M[||M(D) − D̄||₁] ≤ αd for every D ∈ {±1}^{n×d}, then

    n ≥ Ω(√(d log(1/δ)) / εα).
More generally, this is the first result showing that the sample complexity must grow by a multiplicative factor of √(log(1/δ)) for answering any family of queries, as opposed to an additive dependence on δ. We also remark that the assumption on the range of δ is necessary: the Laplace mechanism gives accuracy α and satisfies (ε,0)-differential privacy when n ≥ O(d/εα).
1.1 Average-Case Versus Worst-Case Error
Our lower bound holds for mechanisms with an average-case (L₁) error guarantee. Thus, it also holds for algorithms that achieve worst-case (L∞) error guarantees. The Laplace mechanism gives a matching upper bound for average-case error. In many cases worst-case error guarantees are preferable. For worst-case error, the sample complexity of the Laplace mechanism degrades by an additional log d factor compared to (1).
Surprisingly, this degradation is not necessary. We present algorithms that answer every one-way marginal with accuracy α and improve on the sample complexity of the Laplace mechanism by roughly a log d factor. These algorithms demonstrate that the widely used technique of adding independent noise to each query is suboptimal when the goal is to achieve worst-case error guarantees.

Our algorithm for pure differential privacy satisfies the following.
Theorem 1.2. For every ε,α > 0, d ≥ 1, and n ≥ 4d/εα, there exists an efficient mechanism M : {±1}^{n×d} → [±1]^d that is (ε,0)-differentially private and

    ∀D ∈ {±1}^{n×d}   P_M[||M(D) − D̄||∞ ≥ α] ≤ (2e)^{−d}.
And our algorithm for approximate differential privacy is as follows.
Theorem 1.3. For every ε,δ,α > 0, d ≥ 1, and

    n ≥ O(√(d log(1/δ)) · loglog d / εα),

there exists an efficient mechanism M : {±1}^{n×d} → [±1]^d that is (ε,δ)-differentially private and

    ∀D ∈ {±1}^{n×d}   P_M[||M(D) − D̄||∞ ≥ α] ≤ 1/d^{ω(1)}.
These algorithms improve over the sample complexity of the best known mechanisms for each privacy and accuracy guarantee by a factor of (log d)^{Ω(1)}. Namely, the Laplace mechanism requires n ≥ O(d·log d/εα) samples for pure differential privacy, and the Gaussian mechanism requires n ≥ O(√(d log(1/δ))·log d/εα) samples for approximate differential privacy.
Privacy | Accuracy | Type  | Previous bound                                 | This work
--------+----------+-------+------------------------------------------------+--------------------------------------
(ε,δ)   | L₁ or L∞ | Lower | n = Ω̃(√d / εα)  [BUV14]                        | n = Ω(√(d log(1/δ)) / εα)
(ε,δ)   | L₁       | Upper | n = O(√(d log(1/δ)) / εα)  (Laplace)           |
(ε,δ)   | L∞       | Upper | n = O(√(d log(1/δ)) · log d / εα)  (Gaussian)  | n = O(√(d log(1/δ)) · loglog d / εα)
(ε,0)   | L₁ or L∞ | Lower | n = Ω(d / εα)  [HT10]                          |
(ε,0)   | L₁       | Upper | n = O(d / εα)  (Laplace)                       |
(ε,0)   | L∞       | Upper | n = O(d · log d / εα)  (Laplace)               | n = O(d / εα)

Figure 1: Summary of sample complexity upper and lower bounds for privately answering d one-way marginals with accuracy α.
1.2 Techniques
Lower Bounds: Our lower bound relies on a combinatorial object called a fingerprinting code [BS98]. Fingerprinting codes were originally used in cryptography for watermarking digital content, but several recent works have shown they are intimately connected to lower bounds for differential privacy and related learning problems [Ull13, BUV14, HU14, SU14]. In particular, Bun et al. [BUV14] showed that fingerprinting codes can be used to construct an attack demonstrating that any mechanism that accurately answers one-way marginals is not differentially private. Specifically, a fingerprinting code gives a distribution on individuals' data and a corresponding "tracer" algorithm such that, if a database is constructed from the data of a fixed subset of the individuals, then the tracer algorithm can identify at least one of the individuals in that subset given only approximate answers to the one-way marginals of the database. Quantitatively, their attack shows that a mechanism that satisfies (1,o(1/n))-differential privacy requires n ≥ Ω̃(√d) samples to accurately compute one-way marginals.
Our proof uses a new, more general reduction from breaking fingerprinting codes to differentially private data release. Specifically, our reduction uses group differential privacy. This property states that if an algorithm is (ε,δ)-differentially private with respect to the change of one individual's data, then for any k, it is roughly (kε, e^{kε}δ)-differentially private with respect to the change of k individuals' data. Thus an (ε,δ)-differentially private algorithm provides a meaningful privacy guarantee for groups of size k ≈ log(1/δ)/ε.

To use this in our reduction, we start with a mechanism M that takes a database of n rows and is (ε,δ)-differentially private. We design a mechanism M_k that takes a database of n/k rows, copies each of its rows k times, and uses the result as input to M. The resulting mechanism M_k is roughly (kε, e^{kε}δ)-differentially private. For our choice of k, these parameters will be small enough to apply the attack of [BUV14] to obtain a lower bound on the number of samples used by M_k, which is n/k. Thus, for larger values of k (equivalently, smaller values of δ), we obtain a stronger lower bound. The remainder of the proof is to quantify the parameters precisely.
Upper Bounds: Our algorithm for pure differential privacy and worst-case error is an instantiation of the exponential mechanism [MT07] using the L∞ norm. That is, the mechanism samples y ∈ R^d with probability proportional to exp(−η||y||∞) and outputs M(D) = D̄ + y. In contrast, adding independent Laplace noise corresponds to using the exponential mechanism with the L₁ norm, and adding independent Gaussian noise corresponds to using the exponential mechanism with the L₂ norm squared. Using this distribution turns out to give better tail bounds than adding independent noise.
For approximate differential privacy, we use a completely different algorithm. We start by adding independent Gaussian noise to each marginal. However, rather than using a union bound to show that each Gaussian error is small with high probability, we use a Chernoff bound to show that most errors are small. Namely, with the sample complexity that we allow M, we can ensure that all but a 1/polylog(d) fraction of the errors are small. Now we "fix" the d/polylog(d) marginals that are bad. The trick is that we use the sparse vector algorithm, which allows us to identify and fix these d/polylog(d) marginals with sample complexity corresponding to only d/polylog(d) queries, rather than d queries.
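As a rough, deliberately non-private illustration of the counting step only (the private identification and fixing via the sparse vector algorithm is exactly what this sketch omits; the names and threshold are ours):

```python
import numpy as np

def noisy_marginals_and_bad_set(D, sigma, threshold, rng=np.random.default_rng()):
    """Add independent Gaussian noise to each marginal and report which
    coordinates err by more than the threshold. By a Chernoff bound the
    bad fraction concentrates near the per-coordinate tail probability,
    so a suitable sigma leaves only about d/polylog(d) bad marginals.
    NOTE: the bad set is computed from the exact answers here purely to
    illustrate the counting argument; the real mechanism must locate and
    fix these coordinates privately via sparse vector."""
    exact = D.mean(axis=0)
    noisy = exact + rng.normal(scale=sigma, size=exact.shape)
    bad = np.flatnonzero(np.abs(noisy - exact) > threshold)
    return noisy, bad
```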
2 Preliminaries
We define a database D ∈ {±1}^{n×d} to be a matrix of n rows, where each row corresponds to an individual, and each row has dimension d (consists of d binary attributes). We say that two databases D, D′ ∈ {±1}^{n×d} are adjacent if they differ only by a single row, and we denote this by D ∼ D′. In particular, we can replace the i-th row of a database D with some fixed element of {±1}^d to obtain another database D_{−i} ∼ D.
Definition 2.1 (Differential Privacy [DMNS06]). Let M : {±1}^{n×d} → R be a randomized mechanism. We say that M is (ε,δ)-differentially private if for every two adjacent databases D ∼ D′ and every subset S ⊆ R,

    P[M(D) ∈ S] ≤ e^ε · P[M(D′) ∈ S] + δ.
A well known fact about differential privacy is that it generalizes smoothly to databases that differ on more than a single row. We say that two databases D, D′ ∈ {±1}^{n×d} are k-adjacent if they differ by at most k rows, and we denote this by D ∼_k D′.

Fact 2.2 (Group Differential Privacy). For every k ≥ 1, if M : {±1}^{n×d} → R is (ε,δ)-differentially private, then for every two k-adjacent databases D ∼_k D′ and every subset S ⊆ R,

    P[M(D) ∈ S] ≤ e^{kε} · P[M(D′) ∈ S] + ((e^{kε} − 1)/(e^ε − 1)) · δ.
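As a quick numeric illustration of Fact 2.2 (our sketch, not from the paper): for ε = 1 and δ = 2⁻³⁰, a group of k = 20 individuals still receives a nontrivial guarantee.

```python
import math

def group_privacy(eps, delta, k):
    """The (k*eps, ((e^(k*eps) - 1)/(e^eps - 1))*delta) parameters
    guaranteed by Fact 2.2 for k-adjacent databases."""
    eps_k = k * eps
    delta_k = (math.exp(k * eps) - 1.0) / (math.exp(eps) - 1.0) * delta
    return eps_k, delta_k

# For (eps, delta) = (1, 2**-30) and k = 20: eps_k = 20 and delta_k is
# about 0.26, degraded but still below 1, consistent with meaningful
# group privacy up to k on the order of log(1/delta)/eps.
print(group_privacy(1.0, 2**-30, 20))
```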
All of the upper and lower bounds for one-way marginals have a multiplicative 1/αε dependence on the accuracy α and the privacy loss ε. This is no coincidence; there is a generic reduction:
Fact 2.3 (α and ε dependence). Let p ∈ [1,∞] and α, ε, δ ∈ [0, 1/10]. Suppose there exists an (ε,δ)-differentially private mechanism M : {±1}^{n×d} → [±1]^d such that for every database D ∈ {±1}^{n×d},

    E_M[||M(D) − D̄||_p] ≤ αd^{1/p}.

Then there exists a (1, δ/ε)-differentially private mechanism M′ : {±1}^{n′×d} → [±1]^d for n′ = Θ(αεn) such that for every database D′ ∈ {±1}^{n′×d},

    E_{M′}[||M′(D′) − D̄′||_p] ≤ d^{1/p}/10.
This fact allows us to suppress the accuracy parameter α and the privacy loss ε when proving our lower bounds. Namely, if we prove a lower bound of n′ ≥ n* for all (1,δ)-differentially private mechanisms M′ : {±1}^{n′×d} → [±1]^d with E_{M′}[||M′(D′) − D̄′||_p] ≤ d^{1/p}/10, then we obtain a lower bound of n ≥ Ω(n*/αε) for all (ε,εδ)-differentially private mechanisms M : {±1}^{n×d} → [±1]^d with E_M[||M(D) − D̄||_p] ≤ αd^{1/p}. So we will simply fix the parameters α = 1/10 and ε = 1 in our lower bounds.
3 Lower Bounds for Approximate Differential Privacy
Our main theorem can be stated as follows.

Theorem 3.1 (Main Theorem). Let M : {±1}^{n×d} → [±1]^d be a (1,δ)-differentially private mechanism that answers one-way marginals such that

    ∀D ∈ {±1}^{n×d}   E_M[||M(D) − D̄||₁] ≤ d/10,

where D̄ is the true answer vector. If 2^{−Ω(n)} ≤ δ ≤ 1/n^{1+Ω(1)} and n is sufficiently large, then

    d ≤ O(n² / log(1/δ)).
Theorem 1.1 in the introduction follows by rearranging terms and applying Fact 2.3. The statement above is more convenient technically, but the statement in the introduction is more consistent with the literature.
First we must introduce fingerprinting codes. The following definition is tailored to the application to privacy. Fingerprinting codes were originally defined by Boneh and Shaw [BS98] with a worst-case accuracy guarantee. Subsequent works [BUV14, SU14] have altered the accuracy guarantee to an average-case one, which we use here.
Definition 3.2 (L₁ Fingerprinting Code). An ε-complete δ-sound α-robust L₁ fingerprinting code for n users with length d is a pair of random variables D ∈ {±1}^{n×d} and Trace : [±1]^d → 2^{[n]} such that the following hold.

Completeness: For any fixed M : {±1}^{n×d} → [±1]^d,

    P[(||M(D) − D̄||₁ ≤ αd) ∧ (Trace(M(D)) = ∅)] ≤ ε.

Soundness: For any i ∈ [n] and fixed M : {±1}^{n×d} → [±1]^d,

    P[i ∈ Trace(M(D_{−i}))] ≤ δ,

where D_{−i} denotes D with the i-th row replaced by some fixed element of {±1}^d.
Fingerprinting codes with optimal length were first constructed by Tardos [Tar08] (for worst-case error), and subsequent works [BUV14, SU14] have adapted Tardos' construction to work for average-case error guarantees, which yields the following theorem.
Theorem 3.3. For every n ≥ 1, δ > 0, and d ≥ d_{n,δ} = O(n² log(1/δ)), there exists a 1/100-complete δ-sound 1/8-robust L₁ fingerprinting code for n users with length d.

We now show how the existence of fingerprinting codes implies our lower bound.
Proof of Theorem 3.1 from Theorem 3.3. Let M : {±1}^{n×d} → [±1]^d be a (1,δ)-differentially private mechanism such that

    ∀D ∈ {±1}^{n×d}   E_M[||M(D) − D̄||₁] ≤ d/10.

Then, by Markov's inequality,

    ∀D ∈ {±1}^{n×d}   P_M[||M(D) − D̄||₁ > d/9] ≤ 9/10.    (2)
Let k be a parameter to be chosen later. Let n_k = ⌊n/k⌋. Let M_k : {±1}^{n_k×d} → [±1]^d be the following mechanism. On input D* ∈ {±1}^{n_k×d}, M_k creates D ∈ {±1}^{n×d} by taking k copies of D* and filling the remaining entries with 1s. Then M_k runs M on D and outputs M(D).
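This database expansion is straightforward; a minimal sketch (our code, with illustrative naming):

```python
import numpy as np

def expand_database(D_star, n, k):
    """Build the n-row input to M from the floor(n/k)-row database D_star:
    k copies of each row of D_star, padded with all-ones rows so that the
    result has exactly n rows."""
    copies = np.repeat(D_star, k, axis=0)  # k * floor(n/k) rows
    pad = np.ones((n - copies.shape[0], D_star.shape[1]), dtype=D_star.dtype)
    return np.vstack([copies, pad])
```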
By group privacy (Fact 2.2), M_k is an (ε_k = k, δ_k = ((e^k − 1)/(e − 1))·δ)-differentially private mechanism. By the triangle inequality,

    ||M_k(D*) − D̄*||₁ ≤ ||M(D) − D̄||₁ + ||D̄ − D̄*||₁.    (3)
Now

    D̄_j = (k·n_k/n)·D̄*_j + ((n − k·n_k)/n)·1.

Thus

    |D̄_j − D̄*_j| = |((k·n_k/n) − 1)·D̄*_j + (n − k·n_k)/n| = ((n − k·n_k)/n)·|1 − D̄*_j| ≤ 2(n − k·n_k)/n.

We have

    (n − k·n_k)/n = (n − k⌊n/k⌋)/n ≤ (n − k(n/k − 1))/n = k/n.
Thus ||D̄ − D̄*||₁ ≤ 2kd/n. Assume k ≤ n/200, so that ||D̄ − D̄*||₁ ≤ d/100 and, by (2) and (3),

    P_{M_k}[||M_k(D*) − D̄*||₁ > d/8] ≤ P_M[||M(D) − D̄||₁ > d/9] ≤ 9/10.    (4)
Assume d ≥ d_{n_k,δ}, where d_{n_k,δ} = O(n_k² log(1/δ)) is as in Theorem 3.3. We will show that this leads to a contradiction, which establishes d ≤ O(n_k² log(1/δ)). Let D* ∈ {±1}^{n_k×d} and Trace : [±1]^d → 2^{[n_k]} be a 1/100-complete δ-sound 1/8-robust L₁ fingerprinting code for n_k users of length d.
By the completeness of the fingerprinting code,

    P[(||M_k(D*) − D̄*||₁ ≤ d/8) ∧ (Trace(M_k(D*)) = ∅)] ≤ 1/100.    (5)
Combining (4) and (5) gives

    P[Trace(M_k(D*)) ≠ ∅] ≥ 1/10 − 1/100 = 9/100 > 1/12.
In particular, there exists i* ∈ [n_k] such that

    P[i* ∈ Trace(M_k(D*))] > 1/(12 n_k).    (6)
We have that Trace(M_k(D*)) is an (ε_k, δ_k)-differentially private function of D*, as it is only postprocessing M_k(D*). Thus

    P[i* ∈ Trace(M_k(D*))] ≤ e^{ε_k} · P[i* ∈ Trace(M_k(D*_{−i*}))] + δ_k ≤ e^{ε_k}δ + δ_k,    (7)

where the second inequality follows from the soundness of the fingerprinting code.
Combining (6) and (7) gives

    1/(12 n_k) ≤ e^{ε_k}δ + δ_k = e^k δ + ((e^k − 1)/(e − 1))·δ = ((e^{k+1} − 1)/(e − 1))·δ < e^{k+1}δ.    (8)
If k ≤ log(1/(12 n_k δ)) − 1, then (8) gives a contradiction. Let k = ⌊log(1/(12nδ)) − 1⌋. Assuming δ ≥ e^{−n/200} ensures k ≤ n/200, as required. Assuming δ ≤ 1/n^{1+γ} implies k ≥ log(1/δ)/(1 + 1/γ) − 5 ≥ Ω(log(1/δ)). This setting of k gives a contradiction, which implies that

    d < d_{n_k,δ} = O(n_k² log(1/δ)) = O((n²/k²)·log(1/δ)) = O(n² / log(1/δ)),

as required.
4 New Mechanisms for L∞ Error

Adding independent noise seems very natural for one-way marginals, but it is suboptimal if one is interested in worst-case (i.e. L∞) error bounds, rather than average-case (i.e. L₁) error bounds.

4.1 Pure Differential Privacy
Theorem 1.2 follows from Theorem 4.1. In particular, the mechanism M : {±1}^{n×d} → [±1]^d in Theorem 1.2 is given by M(D) = D̄ + Y, where Y ∼ 𝒟 and 𝒟 is the distribution from Theorem 4.1 with ∆ = 2/n.³
Theorem 4.1. For all ε > 0, d ≥ 1, and ∆ > 0, there exists a continuous distribution 𝒟 on R^d with the following properties.

• Privacy: If x, x′ ∈ R^d with ||x − x′||∞ ≤ ∆, then

    P_{Y∼𝒟}[x + Y ∈ S] ≤ e^ε · P_{Y∼𝒟}[x′ + Y ∈ S]

for all measurable S ⊆ R^d.

• Accuracy: For all α > 0,

    P_{Y∼𝒟}[||Y||∞ ≥ α] ≤ (∆d/εα)^d · e^{d − εα/∆}.

In particular, if d ≤ εα/2∆, then P_{Y∼𝒟}[||Y||∞ ≥ α] ≤ (2e)^{−d}.

• Efficiency: 𝒟 can be efficiently sampled.

³ Note that we must truncate the output of M to ensure that M(D) is always in [±1]^d.
Proof. The distribution 𝒟 is simply an instantiation of the exponential mechanism [MT07]. In particular, the probability density function is given by

    pdf_𝒟(y) ∝ exp(−(ε/∆)·||y||∞).

Formally, for every measurable S ⊆ R^d,

    P_{Y∼𝒟}[Y ∈ S] = (∫_S exp(−(ε/∆)·||y||∞) dy) / (∫_{R^d} exp(−(ε/∆)·||y||∞) dy).

Firstly, this is clearly a well-defined distribution as long as ε/∆ > 0.
Privacy is easy to verify: it suffices to bound the ratio of the probability densities for the shifted distributions. For x, x′ ∈ R^d with ||x′ − x||∞ ≤ ∆, by the triangle inequality,

    pdf_𝒟(x + y)/pdf_𝒟(x′ + y) = exp(−(ε/∆)·||x + y||∞)/exp(−(ε/∆)·||x′ + y||∞) = exp((ε/∆)·(||x′ + y||∞ − ||x + y||∞)) ≤ exp((ε/∆)·||x′ − x||∞) ≤ e^ε.
Define a distribution 𝒟* on [0,∞) by letting Z ∼ 𝒟* denote Z = ||Y||∞ for Y ∼ 𝒟. To prove accuracy, we must give a tail bound on 𝒟*. The probability density function of 𝒟* is given by

    pdf_{𝒟*}(z) ∝ z^{d−1} · exp(−εz/∆),

which is obtained by integrating the probability density function of 𝒟 over the infinity-ball of radius z, which has surface area d·2^d·z^{d−1} ∝ z^{d−1}. Thus 𝒟* is precisely the gamma distribution with shape d and mean d∆/ε. The moment generating function is therefore

    E_{Z∼𝒟*}[e^{tZ}] = (1 − (∆/ε)t)^{−d}
for all t < ε/∆. By Markov's inequality,

    P_{Z∼𝒟*}[Z ≥ α] ≤ E_{Z∼𝒟*}[e^{tZ}]/e^{tα} = (1 − (∆/ε)t)^{−d} · e^{−tα}.

Setting t = ε/∆ − d/α gives the required bound.
It is easy to verify that Y ∼ 𝒟 can be sampled by first sampling a radius R from a gamma distribution with shape d + 1 and mean (d + 1)∆/ε and then sampling Y ∈ [±R]^d uniformly at random. To sample R we can set R = (∆/ε)·Σ_{i=0}^{d} log(1/U_i), where each U_i ∈ (0,1] is uniform and independent. This gives an algorithm (in the form of an explicit circuit) to sample 𝒟 that uses only O(d) real arithmetic operations, d + 1 logarithms, and 2d + 1 independent uniform samples from [0,1].
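A minimal sketch of this sampling procedure (our code, assuming numpy; the distribution and parameters are those of Theorem 4.1):

```python
import numpy as np

def sample_linf_noise(d, eps, Delta, rng=np.random.default_rng()):
    """Sample Y from the distribution of Theorem 4.1, with density
    proportional to exp(-(eps/Delta)*||y||_inf): draw the radius R from
    a gamma distribution with shape d+1 and mean (d+1)*Delta/eps (a sum
    of d+1 exponentials), then draw Y uniformly from the cube [-R, R]^d."""
    u = 1.0 - rng.uniform(size=d + 1)  # uniform in (0, 1]
    R = (Delta / eps) * np.sum(np.log(1.0 / u))
    return rng.uniform(low=-R, high=R, size=d)

# The mechanism of Theorem 1.2 would release, with Delta = 2/n and the
# truncation of footnote 3:
#   np.clip(D.mean(axis=0) + sample_linf_noise(d, eps, 2.0 / n), -1.0, 1.0)
```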