Table Of ContentQuantum Annealing amid Local Ruggedness and Global Frustration
James King,∗ Sheir Yarkoni, Jack Raymond, Isil Ozfidan, Andrew D. King,
Mayssam Mohammadi Nevisi, Jeremy P. Hilton, and Catherine C. McGeoch
D-Wave Systems
(Dated: March 2, 2017)
ArecentGooglestudy[Phys.Rev.X,6:031015(2016)]comparedaD-Wave2Xquantumprocess-
ing unit (QPU) to two classical Monte Carlo algorithms: simulated annealing (SA) and quantum
Monte Carlo (QMC). The study showed the D-Wave 2X to be up to 100 million times faster than
the classical algorithms. The Google inputs are designed to demonstrate the value of collective
multiqubittunneling,aresourcethatisavailabletoD-WaveQPUsbutnottosimulatedannealing.
7 But the computational hardness in these inputs is highly localized in gadgets, with only a small
1 amount of complexity coming from global interactions, meaning that the relevance to real-world
0 problems is limited.
2 InthisstudyweprovideanewsyntheticproblemclassthataddressesthelimitationsoftheGoogle
inputswhileretainingtheirstrengths. Weusesimpleclustersinsteadofmorecomplexgadgetsand
r
a more emphasis is placed on creatingcomputational hardness through frustrated global interactions
M like those seen in interesting real-world inputs. The logical spin-glass backbones used to generate
these inputs are planar Ising models without fields and the problems can therefore be solved in
1 polynomial time [J. Phys. A, 15:10 (1982)]. However, for general heuristic algorithms that are
unaware of the planted problem class, the frustration creates meaningful difficulty in a controlled
] environment ideal for study.
h
We use these inputs to evaluate the new 2000-qubit D-Wave QPU. We include the HFS
p
algorithm—the best performer in a broader analysis of Google inputs—and we include state-of-
-
t the-artGPUimplementationsofSAandQMC.TheD-WaveQPUsolidlyoutperformsthesoftware
n solvers: whenweconsiderpureannealingtime(computationtime),theD-WaveQPUreachesground
a
states up to 2600 times faster than the competition. In the task of zero-temperature Boltzmann
u
samplingfromchallengingmultimodalinputs,theD-WaveQPUholdsasimilaradvantageanddoes
q
not see significant performance degradation due to quantum sampling bias.
[
2
v CONTENTS ruggedness and classical hardness 6
9
7
IV. Software solvers 7
5 I. Introduction 2
4
A. Proposing a new problem class 2
0 V. Optimization 7
1. B. Evaluation of the 2000-qubit D-Wave
A. Varying ruggedness via logical
0 QPU 2
complexity 7
7
1 B. Varying ruggedness by scaling 8
II. D-Wave quantum processing units 3
:
v
A. Ising minimization 3
i
X VI. Sampling 9
B. Chimera topology 3
r A. Sampling from all valleys 9
a C. Quantum annealing 3
B. Mining for interesting valley structure 9
D. Modeling performance 4
C. Sampling results 10
III. Frustrated Cluster Loop problems 5
VII. Conclusions 12
A. Ruggedness and clusters 5
B. FCL problem generation 5 References 13
C. Problem class parameters 6
A. Calculation of decorrelation 14
D. Confirming correlation between
B. Details of software solvers 15
∗ [email protected] 1. Classical hardware 15
2
2. Included software solvers 15 recently introduced an input class designed to ben-
efit immensely from quantum tunneling. We refer
3. Excluded software solvers 15
to these inputs as Google problems. Their study
4. Parameter tuning 16 showed a massive speed increase (up to 100 million
times faster) of a D-Wave 2X system over simulated
annealing (SA) and quantum Monte Carlo (QMC),
also known as simulated quantum annealing. This
I. INTRODUCTION provided strong evidence for the ability of quan-
tum annealing to leverage quantum tunneling in a
Quantum annealers are designed to take advantage computationally relevant way. However, the spin-
ofquantumtunnelingtofindgoodsolutionstohard glass backbones of the Google problems are easy
optimization problems. When constructing a family to solve, meaning that a) the problems have lim-
ofsyntheticinputstotestthepotentialofaquantum itedrelevancetoreal-worldproblems,andb)certain
annealingplatform,oneshouldthereforeensurethat cluster-detectingalgorithmscansolvethemwithrel-
the inputs a) are such that solvers can benefit from ative ease [4].
quantum tunneling, and b) are hard optimization In this study we provide a problem class that aims
problems with global frustration. to retain the advantages of Google problems while
For a solver to benefit from quantum tunneling, the being more reflective of real-world problems. They
energy landscape associated with the input must are more reflective of real-world problems because,
have tall, thin energy barriers. For an input to rather than relying too heavily on finely-tuned gad-
be computationally hard, the input must have con- gets, they derive much of their computational hard-
straints that interact with each other in nontrivial ness from larger spin-glass backbones with planted
ways. frustration.
Quantum processing units (QPUs) developed by Our problems are synthetic and are easy to solve
D-Wave Systems that use the quantum annealing using knowledge of the problem class.1 More specif-
algorithm have been commercially available since ically,sincethelogicalproblemsareIsingmodelson
2011. These QPUs solve Ising model inputs defined a planar lattice without fields, they are solvable in
on the underlying working graph of the chip. There polynomialtime[5]. However,frustrationinthelog-
have been various efforts to evaluate the perfor- icalproblemscreatesmeaningfuldifficultyforheuris-
manceoftheD-Wavesystemsusingsyntheticinputs tic methods that are unaware of the planted prob-
generated randomly from different distributions, or lem class. Further, the inputs have properties such
input classes. astunableruggednessthatmakethemusefulforthe
evaluation of quantum annealing and classical ap-
This study has two main contributions: to propose
proximations thereof. In this way, they are similar
a new problem class ideal for evaluating D-Wave
toKauffman’sNKmodelthathasprovedveryuseful
QPUs, and to use this problem class to evaluate the
in the analysis of evolutionary algorithms [6–8].
2000-qubit D-Wave QPU.
B. Evaluation of the 2000-qubit D-Wave QPU
A. Proposing a new problem class
Weusethisnewproblemclasstoevaluatethelatest-
Previous evaluations of D-Wave QPUs have used generation D-Wave QPU. We measure its perfor-
problem classes that benefit either too little or too manceinabsolutetermsandweanalyzeitsresponse
much from quantum tunneling to be ideal for evalu- to the ruggedness parameters of the problem class.
ating quantum annealers.
The software competition we consider is much
Ononesideofthisspectrumwehaveproblemssuch stronger than that considered by Denchev et al. [3],
as random unstructured ±1 problems on the Chi- and includes GPU implementations of SA, QMC,
meratopologynativetoD-WaveQPUs. Thesewere
usedbyRønnowetal.[1]intheirevaluationoftheD-
WaveTwoQPUin2014,buttheyarenowknown[2]
to lack a finite-temperature phase transition, mean- 1 Forexample,thesuper-spinheuristic[4]thatreliesonhard-
ingthatquantumtunnelingisunlikelytoplayasig- coded knowledge of clusters would be far faster than the
nificant role when solving them. software solvers we consider. However, such heuristics do
not generalize to other problem classes and it would not
Ontheothersideofthespectrum,Denchevetal.[3] makesensetoincludethemascompetitionsolvers.
3
and SVMC, and also includes Selby’s implementa- is antiferromagnetic, the optimal solution is called
tion [9, 10] of the Hamze-de Freitas-Selby (HFS) al- a ground state, and nonoptimal solutions are ex-
gorithm [9, 11]. In the study of Mandrà et al. [4] cited states. IM instances can be trivially trans-
thatusedawidearrayofalgorithmstosolveGoogle formed to Quadratic Unconstrained Boolean Op-
problems, Selby’s implementation of HFS was the timization (QUBO) instances defined on integers
fastest software solver in terms of both scaling and s={0,1}, ortoMaximumWeighted2-Satisfiability
absolute speed. (MAX W2SAT) instances defined on Booleans s =
{true,false}, all of which are NP-hard.
WefindthattheD-WaveQPUisabletofindground
statesupto2600timesfasterthanthesoftwarecom-
petition. We also consider the problem of sampling
from ground states and find that the D-Wave QPU
maintainsasimilaradvantageanddoesnotstruggle B. Chimera topology
to find a diverse set of optimal solutions.
The remainder of the paper is organized as fol- The native connectivity topology for the D-Wave
lows. In Section II we provide a description of the QPU is based on a C Chimera graph containing
16
2000-qubitD-WavesystemandahistoryofD-Wave 2048 vertices (qubits) and 6016 edges (couplers).
QPUs. In Section III we present the problem class A Chimera graph of size C is an s × s grid of
analyzed in this paper and discuss the concept of Chimera cells (also called unsit tiles or unit cells),
ruggedness and its relevance to optimization prob- eachcontainingacompletebipartitegraphon8ver-
lems. In Section IV we discuss the software solvers tices (a K ). Each vertex is connected to its four
used in our evaluations, as well as notable solvers neighbors4i,n4side the cell as well as two neighbors
that were not suitable. In Section V we present (north/south or east/west) outside the cell: there-
our experimental results on optimization. In Sec- fore every vertex has degree 6 excluding boundary
tion VI we present our experimental results on sam- vertices.
pling from ground states. In Section VII we provide
In this study, as in others, we vary the problem size
further discussion and conclude the paper.
using square subgraphs of the full graph, from size
C (128 vertices) up to C (2048 vertices). Note
4 16
thatthenumberofproblemvariablesn=8s2 grows
II. D-WAVE QUANTUM PROCESSING quadratically with Chimera size. The reason we
UNITS
measure algorithm performance as a function of the
Chimera size and not the number of qubits is that
WestartwithanoverviewofD-Wavedesignfeatures problem difficulty tends to scale exponentially with
andintroducenotationthatwillbeusedthroughout. the Chimera size, i.e., with the square root of the
FordetailsaboutunderlyingtechnologiesseeBunyk number of qubits, since the treewidth of a Chimera
etal.[12],Dicksonetal.[13],Harrisetal.[14],John- graph C is linear in s [17, 18].
s
son et al. [15] or Lanting et al. [16].
Because the chip fabrication and trapped magnetic
flux leave some small number of qubits unusable,
eachprocessorhasaspecifichardwareworkinggraph
A. Ising minimization H ⊂ C16. The qubit yield—the fraction of qubits
thatareoperational—istypicallyaround98%forthe
2000-qubitD-Wavesystemwhereas95%wastypical
D-Waveannealing-basedquantumprocessorsarede-
for the D-Wave 2X. The working graph used in this
signed to find minimum-cost solutions to the Ising
study has 2035 working qubits out of 2048.
minimization (IM) problem, defined on a graph
G = (V,E) as follows. Given a collection of fields
h = {h : i ∈ V} and couplings J = {J : (i,j) ∈
i ij
E}, assign values from {−1,+1} to n spin variables
C. Quantum annealing
s={s } so as to minimize the energy function
i
E(s)=(cid:88)h s + (cid:88) J s s . (1) D-WaveprocessorssolveIsingproblemsbyquantum
i i ij i j
annealing (QA) in the form proposed by Kadowaki
i∈V (i,j)∈E
and Nishimori [19]. The QA algorithm is imple-
The spin variables s can be interpreted as mag- mented in hardware using a framework of analog
neticpolesinaphysicalparticlesystem; inthiscon- control devices to manipulate a collection of qubit
text, negative J is ferromagnetic and positive J states according to a time-dependent Hamiltonian
ij ij
4
shown below. a. Program. Load (h,J) onto the chip; denote
the elapsed programming/initialization time
t .
H(t)=A(t)·H +B(t)·H . (2) i
init prob
QA carries out a gradual transition in time t : b. Anneal. CarryouttheQAalgorithm. Anneal
0 → t , from an initial ground state in H , to a timeta canbesetbyusertosomevalue5µs≤
statedaescribedbytheproblemHamiltonianinHit = ta ≤1000µs.
prob
(cid:80) h σz +(cid:80) J σzσz. The problem Hamiltonian
i i i ij ij i j
matches the energy function (1), so that a ground
c. Read. Record qubit states to obtain a solu-
state for H is a minimum-cost solution to E(s).
prob tion; denote the elapsed readout time t .
r
QA is closely related to adiabatic quantum comput-
ing (AQC). The AQC model of computation was
proposed by Farhi et al. [20] who showed that if the d. Repeat. Repeat steps b and c k times to ob-
transitioniscarriedoutslowlyenoughthealgorithm tain a sample of k solutions.
will find a ground state (i.e., an optimal solution)
with high probability.
Wedefinesampletimet andtotaltimeT asfollows:
Theoretical guarantees about solution times for s
quantumalgorithms(foundin[20])assumethatthe
computation takes place in an ideal closed system, t =(t +t ) (3)
s a r
perfectlyisolatedfromenergyinterferencefromam-
T =t +kt .
bient surroundings. The 2000-qubit D-Wave chip is i s
housed in a highly shielded chamber and cooled to
near absolute zero; nevertheless, as is the case with
For the D-Wave system studied in this paper, the
any real-world quantum device, it must suffer some
amount of interference, which has the general effect median programming time ti is 9.5ms and the me-
of reducing the probability of landing in a ground dian readout time tr is 123µs.
state. Thus, theoretical guarantees on performance In this study, both for software solvers and for the
may not apply to these systems. We consider any D-Wave QPU, we typically report annealing time
D-WaveQPUtobeaheuristicsolver,whichrequires rather than total time. Annealing time is the mea-
empirical approaches to performance analysis. sure of the algorithm proper, and measuring total
time often obscures trends in data. Scaling plots
The D-Wave QPU studied here contains 2035 ac-
areparticularlysusceptibletothisbecausetheover-
tive qubits (quantum bits) and 5912 active cou-
head of programming time makes scaling—typically
plers made of microscopic loops of niobium con-
presented on a semilog plot—look totally flat ex-
nectedtoalargeandcomplexanalogcontrolsystem
cept for an uptick at the very largest problem sizes.
via an arrangement of Josephson Junctions. Ther-
Further, we are most interested in the future po-
mometry on the refrigerator of the D-Wave QPU
tential of D-Wave QPUs, and we expect that pro-
and fits of single qubit measurements to a thermo-
gramming time and readout time will be reduced to
dynamic model indicate that T (cid:46) 15mK. When
smallfractionsoftheircurrentvalues; minimuman-
cooled to temperatures below 9.3K, niobium be-
nealing times will similarly be reduced, allowing us
comesasuperconductorandiscapableofdisplaying
better control over the algorithm parameters. For
quantum properties including superposition, entan-
reference,sincemanypeoplewillbeinterestedinto-
glement, and quantum tunneling. Because of these
tal wall clock time, rather than annealing time, a
properties, the qubits on the chip behave as a quan-
1000 times speedup over software solvers in anneal-
tum mechanical particle process that carries out a
ing time, typical for the D-Wave QPU in this study,
transition from initial state described by H to a
init translatesroughlytoa30timesspeedupintotalwall
problem state described by H [13, 16, 21].
prob clock time including programming and readout.
System characteristics of D-Wave QPUs such as
yield can vary within a generation. If we compare
this specific 2000-qubit D-Wave system to the spe-
D. Modeling performance
cificD-Wave2XQPUstudiedin2015[22],program-
ming time has decreased by 20%, readout is three
Given input instance (h,J), a D-Wave computation times faster, and yield has improved from 95% to
involves the following steps. 99%.
5
III. FRUSTRATED CLUSTER LOOP fields; annealers must then go over or through an
PROBLEMS energy barrier to reach the gadget’s ground state.
Instead of using two-cluster gadgets, we simply use
single-cell ferromagnetic clusters to induce rugged-
A. Ruggedness and clusters
ness, leaving us with a simpler problem class.
Ruggedness is a feature of certain optimiza-
tion problems—more specifically their energy
landscapes—characterized by tall energy barriers B. FCL problem generation
and many local optima [23, 24]. Typically, rugged
problems are harder to solve, particularly with
We create local ruggedness by treating unit cells
Markov chain Monte Carlo (MCMC) methods [25,
of the Chimera graph as ferromagnetically-coupled
26]. Inthelate1980s,whenruggednesswasfirstbe-
clusters. We create global frustration by joining
ing explored in the context of evolutionary biology
these clusters together using a problem generated
and bio-inspired computing, Kauffman’s NK model
on the logical graph of clusters. This creates an
wasputforwardasamodelwithtunableruggedness
energylandscape thatismacroscopicallyinteresting
inspired by genetic fitness functions under varying
and in which the clusters induce wells separated by
degrees of epistasis, or how many other genetic loci
tall, thin energy barriers.
affect the fitness contribution of a given locus [6–8].
ThetunableruggednessoftheNKmodelhasproved Thelogicalgraphofclustersisasquarelattice,with
veryvaluableinthestudyofoptimizationheuristics, alogical16×16latticeofclustersspanningthework-
particularly evolutionary algorithms [27]. inggraphofa2000-qubitD-WaveQPU.2 Theprob-
Closely related to ruggedness is the analysis of spin lems we generate on the logical graph are frustrated
overlap, in which landscape features are inferred loop problems, constraint satisfaction problems first
used in the evaluation of D-Wave QPUs by Hen et
fromthedistributionofoverlapoftworandomstates
al.[35]andmodifiedtoallowprecisionlimitsbyKing
sampled from the Boltzmann distribution [28, 29].
et al. [36].
Tall, thin peaks in the spin overlap distribution
tend to correspond to tall, thin energy barriers; the Werefertothefinalinputsasfrustrated cluster loop
presence of these features correlates not only with (FCL) problems. For a given Chimera graph G
C
ruggedness and classical hardness, but also with thatmayormaynothavemissingqubitsorcouplers,
applicability of quantum annealing, since quantum anFCLproblemisgeneratedfromthreeparameters,
tunneling is likely to be a useful computational re- α (the clauses-to-variables ratio), ρ (the range, or
source in the presence of these tall, thin barriers. precision), and R≥ρ (the ruggedness) as follows:
Zhuetal.[30]haveusedspinoverlapfeaturestopre-
dictwhetheraproblemcanbesolvedbyQAmoreef-
ficientlythanbySA,showingpromisingpreliminary a. Defineeachunitcellasalogicalspin ifithasno
results for optimization problems such as weighted missing qubits or couplers. Use c(v) to denote
the logical spin index corresponding to qubit
partial MAX-2SAT, minimum vertex cover, satis-
fiability, graph partitioning, circuit fault diagnosis, v.
and certain spin-glass instances [30, 31]. This work
b. Whereverallfourcouplersconnectingtwolog-
pointstothepotentialofquantumannealerstohave
ical spins are present, define these couplers as
aplaceinportfoliosolvers[32]andhybridalgorithms
runningonheterogeneouscomputingsystemsalong- a logical coupler.
side CPUs, GPUs, and other coprocessors [33, 34].
c. DefinethelogicalgraphG asthegraphcom-
To induce ruggedness using tall, thin energy barri- L
prising the logical spins and logical couplers.
ers,Denchevetal.[3]usedferromagneticallycoupled
unit tiles as clusters. Flipping such a cluster in the
absenceoffieldsorexternalcouplingsinvolvesjump-
ingoverortunnelingthroughanenergybarrierthat
is 16 Ising units high and has a width of 8 in Ham- 2 Notethata16×16logicallatticeissignificantlylargerthan
mingspace. Denchevetal.[3]actuallygobeyondus- thelargestlogicallattice,4×4,oftheGoogleproblemscon-
sideredbyDenchevetal.[3]—theirtwo-tilegadgetstakeup
ingsingle-tileclustersandusetwo-tilegadgetsstud-
more space than our one-tile clusters and the D-Wave 2X
ied previously by Boixo et al. [21]. This gadget is has a smaller working graph than the 2000-qubit D-Wave
made up of two clusters that form a deceptive trap system. Theselargerlogicalgraphsintheproblemswecon-
to draw annealers into a local minimum using local sidermeanthatthespin-glassbackbonesoftheseproblems
aresignificantlymorecomputationallychallenging.
6
d. Generate a range-bounded frustrated loop 102
problemHamiltonian(h ,J )onG usingpa- n
L L L o
rameters α and ρ as per King et al. [36] (note ti
u
that h is the zero vector). ol Range(ρ)
L s 3
o
e. Define the native Chimera Hamiltonian t 4
s
(hC,JC) with hC as the zero vector and JC ple 5
as: m 6
a
(cid:40) −1, if c(u)=c(v) S 1010.5 0.6 0.7 0.8 0.9 1.0 1.1 ∞
J (u,v)=
C 1 ·J (c(u),c(v)), otherwise. α
R L
It is worth repeating that these Hamiltonians have FIG. 1. Logical problem difficulty as measured by ex-
no fields (i.e., h and h are both zero vectors). pected samples to solution for simulated annealing. Er-
L C
Note also that in-tile couplings in J are all −1 and ror bars show the 95% confidence intervals for the me-
C
inter-tile couplings take values in dians, grouped over α and ρ. Difficulty is maximized at
α=0.65forprecision3,α=0.75forprecision4,α=0.8
{j/R | j ∈{−R,−R+1,...,R}}. for precision 5, and α=0.85 for precision 6.
Since we ensure that ρ ≤ R, and ρ ≥ |j| for any
logicalcouplingj,allcouplingsinJ areintherange value of α, problems with a lower ρ value have their
C
[−1,1] loops spread out more evenly over the logical spins.
Finally, for the native problem, coupler values are
Thelogicalfrustratedloopproblemsmaybediscon-
scaled down by a factor of R ≥ ρ so that inter-tile
nectedandhavemultiplecomponents;werejectsuch
couplings are in the range [−1,1] (in-tile couplings
disconnected inputs at generation time.
arealways−1). ThushighervaluesofρconstrainR
While these problems are large enough to span the tobehigher,andmakeproblemsmorelocallyrugged
entire working graph of the latest D-Wave QPUs, relativetotheglobalHamiltonian. InSectionV,we
the repetition code inherent in logical couplers and attempt to deconvolve the impacts of ρ and R.
spins makes them relatively robust to analog errors
The ability to tune the ruggedness of the inputs by
[37].
varyingR,eitherbyspecifyingR=ρandvaryingρ,
or by varying R independently, gives FCL problems
anadditionaldegreeofutility, particularlywhenas-
C. Problem class parameters sessing the value of quantum tunneling and the po-
tential of quantum annealing. Varying ρ and spec-
ifying R = ρ makes problem generation simpler by
The FCL problem class has three parameters: the
reducingthenumberoffreeparameterswhereasfix-
clauses-to-variables ratio α, the range ρ, and the
ing ρ and varying R allows us to isolate the impact
ruggedness R. We would like to restrict our experi-
ofruggednesswithoutalteringthecomplexityofthe
ments to the most interesting region of the parame-
logical problem.
ter space.
First we aim to determine the value of α that max-
imizes the difficulty of the logical problem. If α is
too low, a problem is underconstrained and is easy D. Confirming correlation between ruggedness
to solve. If α is too high, the planted solution is and classical hardness
expressed too strongly and the problem’s features
approach those of a ferromagnet, making it easy to
We expect to see a positive correlation between
solve. The difficulty of the logical problem depends
ruggedness and classical hardness. Here we charac-
only on α and ρ, not on R. For various values of ρ,
terizeclassicalhardnessusingthedecorrelationtime
weperformasweepofαtodeterminethevaluethat
of an MCMC procedure. To validate this assump-
maximizes the hardness of the logical problem (see
tionwemeasurethedecorrelationtimeforaparallel
Figure 1).
tempering (PT) procedure that uses the Metropolis
The impacts of ρ are more nuanced. First, the algorithm in combination with the standard replica
value of ρ provides an upper bound on the limit of exchange rule [38]; we use the autocorrelation of
α because packing in more loops eventually raises temperature as our measure of decorrelation [39].
the maximum coupler range. Second, for a fixed For more details of this method, see Appendix A.
7
competitive in an absolute sense. Indeed, of the
105
three classes of solvers that they study, only se-
quential algorithms, which they find to have the
e
ur worst performance, have the massive parallelizabil-
t
a ityandlowmemoryrequirementsthatmakeefficient
r
pe GPU implementations possible. It is also possible
m
to implement SA in a field-programmable gate ar-
e
t ray (FPGA), but the additional speedup over GPU
f
o
n 104 implementation is limited and generally not worth
o the increased cost of hardware.
ti
a
el InAppendixBwegivefurtherdetailsofthesoftware
r
r solvers and parameterizations we used. We also dis-
o
c
o cuss algorithms we omitted because of prohibitive
t
u runtimes.
A
103
3.0 3.5 4.0 4.5 5.0
Ruggedness (R) V. OPTIMIZATION
FIG. 2. Box plots showing ruggedness versus classical We measure the expected time to solution (TTS) of
hardness. We hold ρ and α fixed at 3 and 0.65, respec- different solvers on the inputs, calculated as
tively,andvarytheruggednessRofthenativeChimera
problem. AteachvalueofRwegenerated100instances; time per anneal
points indicate outliers. Classical hardness is measured TTS= ground state probability.
using the autocorrelation of temperature. There is a
clearpositivecorrelationbetweenruggednessandclassi- We consider only annealing times and exclude pro-
cal hardness.
gramming and readout times from our analysis as
these are not part of the algorithms proper.
Figure2illustratestherelationshipbetweenrugged- For a given value of ρ, we choose α to maximize
ness and classical hardness. Confirming our intu- the difficulty of the logical problem (see Figure 1).
ition, FCL problems with greater ruggedness are For each selection of ρ and R, we generate 100 FCL
characterized by greater classical hardness. problems at each problem size and solve each prob-
lem with each solver.
IV. SOFTWARE SOLVERS
A. Varying ruggedness via logical complexity
The four software solvers we consider are GPU im-
In our first experiment, we vary ρ and set R=ρ. In
plementationsofSA,QMC,andSVMC,andSelby’s
thiscasetheonlyfreeparameterρcontrolsboththe
CPU implementation of HFS [9, 10].
ruggedness and the logical complexity of the inputs.
Recent studies of D-Wave QPUs have not included Time-to-solution plots are shown in Figure 3. At
GPU-basedsoftwaresolversdespitethefactthatSA the largest problem size, the D-Wave QPU is three
is very amenable to GPU implementation [40]. The orders of magnitude faster than the fastest software
addition of GPU solvers is a significant raising of solver for each value of ρ. D-Wave’s speedup over
thebarintermsofsoftwarecompetition,andmeans software peaks at 2600 times for ρ=4.
thatsolversthatcanbeimplementedonGPUshave
As ρ increases, the impact of local ruggedness in-
taken a leap forward relative to solvers that cannot.
creasesasthelogicalHamiltonianiscompressedrel-
Runonmodernhardware,ourGPU-basedalgorithm
ative to the local wells induced by the clusters. The
implementations are roughly 1000 times faster than
performance of SA drops off sharply while the per-
thecorrespondingsingle-coreCPUimplementations.
formance of DW and QMC declines gracefully. The
Mandrà et al. [4] analyzed the performance of a performance of HFS decreases only very slightly.
diverse set of solvers on the inputs of Denchev et HFS is not affected by the local ruggedness because
al. [3]. However the lack of GPU implementations it is tailored to the Chimera topology and uses up-
of these solvers means that most are unlikely to be dates that contain entire clusters; the performance
8
ρ = 3 ρ = 4 ρ = 5 ρ = 6
) 108
s
µ 107
(
n 106
tio 105 DW
olu 104 SA
os 103 SVMC
t 102 QMC
me 101 HFS
Ti 100
4 8 12 16 4 8 12 16 4 8 12 16 4 8 12 16
Logical lattice size
FIG. 3. Time to solution for D-Wave and software solvers with range values ρ ∈ {3,4,5,6}. For each value of ρ, α
ischosentomaximizelogicalhardness. Shownaremedianvalues(over100inputsateachsize)with95%confidence
intervals.
degradation is due to the slight increase in logical
problem hardness.
102 DW
All solvers except HFS have strictly convex scaling SA
curves because the anneal lengths are optimized for SVMC
the largest problem size and are too long for the k QMC
r
smaller problems. HFS does not use fixed-length o
w
annealsandendsupusingshorterannealsonsmaller
ve 101
inputs. ti
a
Thoughtruescalingismaskedbytheinabilitytoop- el
R
timizeparametersforsmallerinputs[1,41], wenote
that the performance of the D-Wave QPU scales at
least as well as the software solvers between the two 100
largest problem sizes.
2.5 3.0 3.5 4.0 4.5 5.0 5.5
Ruggedness (R)
B. Varying ruggedness by scaling FIG.4. Ruggedness(increasingfromlefttoright)versus
relative work for various solvers. Relative work for each
solver is calculated as TTS divided by median TTS at
In our second experiment, we fix ρ = 3 and vary R = 3.0. The solvers have notably different responses
theruggednessR. Thiskeepsthelogicalcomplexity to increasing ruggedness, with SA struggling the most,
constant,allowingustoisolatetheimpactofrugged- followed by SVMC, then QMC, then the D-Wave QPU.
ness on the various solvers. Here we consider only HFS deals with these energy barriers using exponential
the largest problem size having a 16×16 logical lat- brute force; therefore the parameter R does not affect
itsperformance. Markersindicatemedians(over100in-
tice.
puts) and error bars indicate 95% confidence intervals
Consistent with our findings when varying ρ, tun- for the median.
ing the ruggedness directly by varying R increases
difficulty dramatically for simulated annealing and
less so for other solvers (see Figure 4). Excluding
over SVMC indicates that crucial information is be-
HFS,whosebehaviorisconstantinthisexample,the
ing lost in the mean-field approximation. The im-
workrequiredbyasolveressentiallyscalesaccording
proved scaling of QA (i.e., the D-Wave QPU) over
to its quantumness. The D-Wave QPU is most ca-
QMC may indicate that QMC is failing to faithfully
pable of dealing with ruggedness. QMC—the most
simulate the dynamics of the QA processor, or it
faithful classical simulation of quantum annealing—
may simply be an artifact of our inability to use
comes next, followed by SVMC, which is a mean-
fasterD-Waveanneals. Thisbearsfurtherinvestiga-
field approximation to QMC. Bringing up the rear
tion using future D-Wave QPUs with faster anneal-
is SA, a simulation of a fully classical process.
ing times, again utilizing the tunable ruggedness of
The improved scaling (versus ruggedness) of QMC FCL problems.
9
VI. SAMPLING statescomeinantipodalpairs,3 andbyextensionso
do valleys. We treat each pair of antipodal valleys
as a single valley since it is trivial to move from one
The ability of an Ising solver to sample diverse op-
to the other.
tima has both practical and theoretical importance.
Ground state sampling in combinatorial problems Withvalleysdefinedinthisway,wedefinethetime-
is the basis for construction of space-efficient SAT- to-all-valleys(TTAV)metricastheexpectedamount
based membership filters [42, 43]. The associated of annealing time required to draw at least one
complexityclass,#P—thecountinganalogofNP— sample from each valley. This metric captures the
hasbeenthesubjectofextensiveresearchintheoret- hardest part of sampling from these distributions—
icalcomputersciencesincethe1970s[44]. Sampling finding ground states in every mode—and ensures
from the Boltzmann distribution, in which states a diverse set of solutions. With at least one sam-
with equal energy are sampled with equal proba- ple from each valley, it is possible to find all ground
bility, is of particular interest in machine learning. states using only a modest amount of postprocess-
BoltzmannsamplesareusedtotrainBoltzmannma- ing. The TTAV metric is most meaningfully inter-
chines,ataskknowntobebothhardanduseful[45]. preted relative to TTS since hitting valleys directly
depends on hitting ground states.
While machine learning applications typically de-
pendonfinite-temperatureBoltzmannsampling,us-
ing near-optimal states as well as optimal states,
we focus on zero temperature sampling to sim- B. Mining for interesting valley structure
plify our investigation. This saves us from hav-
ing an additional input parameter β—the inverse Samplingfromallvalleysisnotalwaysmuchharder
temperature—that we would have to either set ar-
than finding a single ground state—an input may
bitrarily or determine empirically. Empirical esti-
have only a single valley or may have valleys that
mation of β can be challenging [46] and basing the are all very close in Hamming space. We wish to
target β on the output of a solver would arguably generate inputs with multiple valleys that are well-
give that solver an unfair advantage.
separated. Sampling from distributions with mul-
tiple, well-separated valleys is particularly hard [48]
andhasimportantapplicationssuchasclassification
using deep Boltzmann machines [49].
Becausewedefinevalleysasclustersofgroundstates
in the logical space, analyzing the valley structure
A. Sampling from all valleys of an input is tractable. The largest logical graphs
are 16×16 lattices having treewidth 16, so solving
a logical problem using dynamic programming and
The expected time required for a solver to find all
returning some fixed number of ground states typ-
ground states of a problem is known, both in the
ically takes less than a second. This allows us to
equiprobable case and the biased case [47]. In the
mine for inputs having interesting valley structure.
caseofanIsingspinproblem,groundstatesoftenlie
in connected valleys in Hamming space, and given We quantify interesting valley structure using the
one ground state in the cluster it is easy to find the distribution of spin overlap P(q) [28, 29] at zero
rest. We therefore adopt a more practical metric temperature,i.e.,fortwogroundstatessampleduni-
based on the time required to sample all valleys of formly with replacement, what fraction of spins do
ground states. they have in common? The random variable P(q)
In ground states of FCL problems, all clusters have takes values in the range [−1,1]. For inputs with-
out fields the distribution is symmetric about zero;
their spins in agreement; therefore the distance be-
we can therefore consider the distribution of the ab-
tweenanytwogroundstatesinthenativeHamming
space is a multiple of 8. However, ground states can solute value P(|q|). We define the mean overlap as
be adjacent (i.e., differ by a single spin) in the log- theexpectationofP(|q|). Inputswithmeanoverlap
near 1 tend to resemble ferromagnets—if there are
ical space. We define a valley as a set of ground
multiple valleys they will be close together. Inputs
states that are connected in the logical Hamming
space. While it is nontrivial to move from one state
to another in the same valley because of the single
tall, thin energy barrier, it can still be done with 3 ForHamiltonianswithnofields,flippingallspinsofastate
a modest amount of postprocessing. We also note doesnotchangetheenergy. Thereforetheantipode(nega-
that, sinceFCLproblemsdonothavefields, ground tion)ofanygroundstateisalsoagroundstate.
10
with lower mean overlap tend to have valleys that 2. KL-divergence of valley distributions
are well-separated.
Inputs that are hard to sample from have multiple TheTTAVresultsshowninFigure5failtoaddressa
valleys that are well-separated. We mine for such specificfear—thatinasignificantminorityofinputs
inputs as follows. First we reject any input with there are valleys that the D-Wave QPU would be
more than 1000 ground states, as these slow down simplyunabletofindduetoquantumsamplingbias.
our analysis and may be too easy. Second, we reject To address this, we calculate for each (solver, prob-
any input that does not have at least 4 valleys since lem) pair the KL-divergence between empirical val-
we want valley collection to be nontrivial. Finally, ley distributions and exact valley distributions (i.e.,
we reject any input with a mean overlap of 0.7 or relative valley sizes). KL-divergence is an asymmet-
higher since we want valleys to be well-separated. ric measure of the distance between two probability
distributions; we calculate it such that it is infinite
if the solver fails to see all valleys, i.e.,
(cid:88) P(v)
C. Sampling results KLD= P(v)log ,
P(cid:98)(v)
valleysv
We generated problems at the 16×16 lattice size where P(v) is the true Boltzmann probability of
with α = 0.85 and ρ = R = 6 and mined them valley v and P(cid:98)(v) is the sample estimate of P(v),
for interesting valley structure as described above.
conditioned on samples being ground states. This
We generated 50,000 inputs and rejected all but 74.
KL-divergence measure includes two types of error.
This gave us an acceptance rate of roughly 0.15% First,thereisadistributionalerror,sinceeachsolver
of inputs. We sought to answer the question, after
samples from a distribution that differs from the
x seconds of annealing, what fraction of the valleys Boltzmann distribution. Second, there is a sample
has each solver seen? For each problem we drew a
size error, since our sample estimate has finite size
numberofsamplesaccordingtothesolverasfollows:
and therefore differs from the solver’s true distribu-
tion. Inthiscontextitisappropriatetoincludeboth
types of error.
D-Wave QPU: 100,000 samples at 5µs (in batches
Figure 6 shows histograms of KL-divergence for the
of 100 per spin-reversal transform)
different solvers. For these FCL problems, fears of
SA: 100,000 samples valleys suppressed by quantum sampling bias are
QMC: 5000 samples unfounded. The D-Wave QPU has a superior KL-
divergence distribution than any of the software
HFS: 5000 samples
solversevenwhenannealingforthreeordersofmag-
nitude less time. On the single input for which the
D-Wave KLD was infinite because at least one val-
ley was never seen, it was also infinite for all other
solvers.
1. Time to all valleys
Results for the TTAV metric are shown in Figure 3. Raw error on model marginals
5. The 2000-qubit D-Wave QPU is the fastest of
the solvers, hitting all valleys in a median time of
The TTAV metric and valley distributions can be
roughly 30ms. The fastest software competition
thought of as representing what sample quality
was HFS, which hit all valleys in a median time of
would look like with postprocessing. We would
roughly 30s.
also like a more raw metric that does not have
In certain situations quantum annealing in the this implicit postprocessing. For this we consider
transverse-field Ising model is subject to inherent marginals of the zero-temperature Boltzmann dis-
sampling bias [50–53], although that does not prove tribution. Specifically, we consider the spin-spin ex-
to be a significant problem here. While the slope of pectations, i.e., for each coupler, what is the ex-
theD-WavecurveisslightlylesssteepthantheHFS pectedproductofthetwoincidentspinsinthezero-
curve, indicating that its samples might be less di- temperature Boltzmann distribution? The L error
1
verse,theD-WaveQPUstillmanagestooutperform on these marginals (i.e., the empirical estimates of
thecompetitionbyaboutthreeordersofmagnitude. spin-spin expectations minus true expectations) is