Optimal Anytime Constrained Simulated Annealing for Constrained Global Optimization*

Benjamin W. Wah and Yi Xin Chen

Department of Electrical and Computer Engineering and the Coordinated Science Laboratory
University of Illinois, Urbana-Champaign
1308 West Main Street, Urbana, IL 61801, USA
fwah, [email protected]
URL: http://www.manip.crhc.uiuc.edu

Abstract. In this paper we propose an optimal anytime version of constrained simulated annealing (CSA) for solving constrained nonlinear programming problems (NLPs). One of the goals of the algorithm is to generate feasible solutions of certain prescribed quality using an average time of the same order of magnitude as that spent by the original CSA with an optimal cooling schedule in generating a solution of similar quality. Here, an optimal cooling schedule is one that leads to the shortest average total number of probes when the original CSA with the optimal schedule is run multiple times until it finds a solution. Our second goal is to design an anytime version of CSA that generates gradually improving feasible solutions as more time is spent, eventually finding a constrained global minimum (CGM). In our study, we have observed a monotonically non-decreasing function relating the success probability of obtaining a solution and the average completion time of CSA, and an exponential function relating the objective target that CSA is looking for and the average completion time. Based on these observations, we have designed CSA$_{\text{AT-ID}}$, the anytime CSA with iterative deepening that schedules multiple runs of CSA using a set of increasing cooling schedules and a set of improving objective targets. We then prove the optimality of our schedules and demonstrate experimentally the results on four continuous constrained NLPs. CSA$_{\text{AT-ID}}$ can be generalized to solving discrete, continuous, and mixed-integer NLPs, since CSA is applicable to solve problems in these three classes.
Our approach can also be generalized to other stochastic search algorithms, such as genetic algorithms, and be used to determine the optimal time for each run of such algorithms.

* Proc. Sixth International Conference on Principles and Practice of Constraint Programming, Springer-Verlag, Sept. 2000

1 Introduction

A large variety of engineering applications can be formulated as constrained nonlinear programming problems (NLPs). Examples include production planning, computer integrated manufacturing, chemical control processing, and structure optimization. Some applications that are inherently constrained or have multiple objectives may be formulated as unconstrained mathematical programs due to a lack of good solution methods. Examples include applications in neural-network learning, computer-aided design for VLSI, and digital signal processing. High-quality solutions to these applications are important because they may lead to lower implementation and maintenance costs.

By first transforming multi-objective NLPs into single-objective NLPs, all constrained NLPs can be considered as single-objective NLPs. Without loss of generality, we consider only minimization problems in this paper. A general discrete constrained NLP is formulated as follows:

\[
\begin{aligned}
&\text{minimize} && f(x) \\
&\text{subject to} && g(x) \le 0, \\
& && h(x) = 0,
\end{aligned}
\qquad x = (x_1, x_2, \dots, x_n) \text{ is a vector of discrete variables;} \tag{1}
\]

where $f(x)$ is a lower-bounded objective function, $h(x) = [h_1(x), \cdots, h_m(x)]^T$ is a set of $m$ equality constraints, and all the discrete variables in $x$ are finite. Both $f(x)$ and $h(x)$ can be either linear or nonlinear, continuous or discrete (i.e. discontinuous), and analytic in closed forms or procedural. In particular, we are interested in application problems whose $f(x)$, $g(x)$, and $h(x)$ are non-differentiable. Our general formulation includes both equality and inequality constraints, although it is shown later that inequality constraints can be transformed into equality constraints.

The search space (sometimes called solution space) $X$ is the finite set of all possible combinations of discrete variables in $x$ that may or may not satisfy the constraints. Such a space is usually limited by some bounds on the range of variables.

To characterize the solutions sought in discrete space, we define for discrete problems, $N(x)$, the neighborhood [1] of point $x$ in discrete space $X$, as a finite user-defined set of points $\{x' \in X\}$ such that $x'$ is reachable from $x$ in one step, that $x' \in N(x) \iff x \in N(x')$, and that it is possible to reach every other $x''$ starting from any $x$ in one or more steps through neighboring points. Note that neighboring points may be feasible or infeasible.

Point $x \in X$ is called a discrete constrained local minimum (CLM) if it satisfies two conditions: a) $x$ is a feasible point, implying that $x$ satisfies all the constraints $g(x) \le 0$ and $h(x) = 0$, and b) $f(x) \le f(x')$, for all $x' \in N(x)$ where $x'$ is feasible. A special case in which $x$ is a CLM is when $x$ is feasible and all its neighboring points are infeasible.

Point $x \in X$ is called a constrained global minimum (CGM) iff a) $x$ is a feasible point, and b) for every feasible point $x' \in X$, $f(x') \ge f(x)$. According to our definitions, a CGM must also be a CLM.

In the next section we formulate the problem that we study in this paper. This is followed by a summary of the constrained simulated annealing algorithm (CSA) in Section 3 and a statistical model on the CSA procedure in Section 4. Finally, we present our proposed anytime CSA with iterative deepening in Section 5 and our experimental results in Section 6.

2 Formulation of the Problem

Constrained simulated annealing (CSA) [14] (see Section 3) has been proposed as a powerful global minimization algorithm that can guarantee asymptotic convergence to a CGM with probability one when applied to solve (1).

One of the difficulties in using CSA, like conventional simulated annealing (SA) [8], is to determine an annealing schedule, or the way that temperatures are decreased in order to allow a solution of prescribed quality to be found quickly. In general, the asymptotic convergence of CSA to a CGM with probability one was proved with respect to a cooling schedule in which temperatures are decreased in a logarithmic fashion [14], based on the original necessary and sufficient condition of Hajek developed for SA [6]. It requires an infinitely long cooling schedule in order to approach a CGM with probability one.

In practice, asymptotic convergence can never be exploited since any algorithm must terminate in finite time. There are two ways to complete CSA in finite time. The first approach uses an infinitely long logarithmically decreasing cooling schedule but terminates CSA in finite time. This is not desirable because CSA will most likely not have converged to any feasible solution when terminated at high temperatures.

The second approach is to design a cooling schedule that can complete in prescribed finite time. In this paper we use the following geometric cooling schedule with cooling rate $\alpha$:

\[
T_{j+1} = \alpha \times T_j, \qquad j = 0, \cdots, N_\alpha - 1, \tag{2}
\]

where $\alpha < 1$, $j$ measures the number of probes in CSA (assuming one probe is made at each temperature and all probes are independent), and $N_\alpha$ is the total number of probes in the schedule. A probe here is a neighboring point examined by CSA, independent of whether CSA accepts it or not. We use the number of probes expended to measure overhead because it is closely related to execution time.

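As an illustration of schedule (2), the following Python sketch enumerates the temperatures of a geometric cooling schedule with one probe per temperature, and compares the resulting count with the logarithm of the temperature ratio taken to base $\alpha$. The function name `geometric_schedule`, the stopping temperature `t_final`, and the sample values are assumptions chosen for this example; they are not taken from the experiments reported here.

```python
import math


def geometric_schedule(t0, alpha, t_final):
    """Return the temperatures T_0, T_1, ... generated by T_{j+1} = alpha * T_j.

    One probe is assumed per temperature, so the list length is the number of
    probes made before the temperature reaches t_final (a hypothetical
    stopping temperature used only in this sketch).
    """
    if not (0.0 < alpha < 1.0):
        raise ValueError("cooling rate alpha must lie in (0, 1)")
    if not (t0 > t_final > 0.0):
        raise ValueError("need t0 > t_final > 0")
    temps = []
    t = t0
    while t > t_final:
        temps.append(t)
        t *= alpha
    return temps


temps = geometric_schedule(t0=1024.0, alpha=0.5, t_final=1.0)
n_probes = len(temps)
# Closed-form length of the same schedule: log base alpha of (t_final / t0).
n_closed_form = math.log(1.0 / 1024.0) / math.log(0.5)
print(n_probes, round(n_closed_form))
```

With these sample values the temperature halves ten times before it reaches the stopping temperature, so both numbers printed are 10.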
Given $T_0 > T_{N_\alpha} > 0$ and $\alpha$, we can determine $N_\alpha$, the length of a cooling schedule, as:

\[
N_\alpha = \log_\alpha \frac{T_{N_\alpha}}{T_0}. \tag{3}
\]

Note that the actual number of probes in a successful run may be less than $N_\alpha$, as a run is terminated as soon as a desirable solution is found. However, it should be very close to $N_\alpha$, as solutions are generally found when temperatures are low.

The effect of using a finite $\alpha$ is that CSA will converge to a CGM with probability less than one. When CSA uses a finite cooling schedule $N_\alpha$, we are interested in its reachability probability $P_R(N_\alpha)$, or the probability that it will find a CGM in any of its previous probes when it stops. Let $p_j$ be the probability that CSA finds a CGM in its $j$-th probe, then $P_R(N_\alpha)$ when it stops is:

\[
P_R(N_\alpha) = 1 - \prod_{j=1}^{N_\alpha} (1 - p_j). \tag{4}
\]

Table 1. An example illustrating trade-offs between the expected total number of probes in multiple runs of CSA to find a CGM, the cooling rate used in each run, and the probability of success in each run. The optimal cooling rate at $\alpha = 0.574$ leads to the minimum average total number of probes to find a CGM. Note that the probability of success is not the highest in one run using the optimal cooling rate. (The problem solved is defined in (6). Each cooling schedule is run 200 times using $f' = 200$.)

$\alpha$                    cooling rate in one run      0.139   0.281   0.429   0.574   0.701   0.862   0.961   0.990
$N_\alpha$                  avg. cooling schedule        99.8    148.0   207.5   296.0   434.5   798.0   2414.0  6963.5
$T_\alpha$                  avg. CPU time per run        0.026   0.036   0.050   0.074   0.11    0.18    0.54    1.58
$P_R(N_\alpha)$             succ. prob. of one run       1%      10%     25%     40%     55%     70%     85%     95%
$1/P_R(N_\alpha)$           avg. runs to find sol'n      100     10      4       2.5     1.82    1.43    1.18    1.05
$N_\alpha/P_R(N_\alpha)$    avg. probes to find sol'n    9980    1480    830     740     790     1140    2840    7330
$T_\alpha/P_R(N_\alpha)$    avg. time to find sol'n      2.6     0.36    0.20    0.19    0.20    0.25    0.64    1.7

Reachability can be maintained by keeping the best solution found at any time and by reporting the best solution when CSA stops.

Although the exact value of $P_R(N_\alpha)$ is hard to estimate and control, we can always improve the chance of hitting a CGM by running CSA multiple times, each using a finite cooling schedule. Given $P_R(N_\alpha)$ for each run of CSA and that all runs are independent, the expected number of runs to find a solution is $1/P_R(N_\alpha)$ and the expected total number of probes is:

\[
\text{Expected total number of probes to find a CGM}
= \sum_{j=1}^{\infty} P_R(N_\alpha) \bigl(1 - P_R(N_\alpha)\bigr)^{j-1} N_\alpha \, j
= \frac{N_\alpha}{P_R(N_\alpha)}. \tag{5}
\]

Table 1 illustrates trade-offs between $N_\alpha$ and $P_R(N_\alpha)$ in solving a constrained NLP with a 10-dimensional Rastrigin function as its objective:

\[
\begin{aligned}
&\text{minimize} && f(x) = F\Bigl(10n + \sum_{i=1}^{n} \bigl(x_i^2 - 10\cos(2\pi x_i)\bigr),\; 200\Bigr) \\
&\text{subject to} && |(x_i - 4.2)(x_i + 3.2)| \le 0.1 \quad \text{for } n = 10,
\end{aligned} \tag{6}
\]

where $F$ is the transformation function defined later in (11). A run of CSA is successful if it finds a feasible point with objective value less than or equal to 200 in this run, and the probability to hit a CGM is calculated by the percentage of successful runs over 200 independent runs.

Table 1 shows that $P_R(N_\alpha)$ increases towards one when $\alpha$ is increased. A long cooling schedule is generally undesirable because the expected number of probes in (5) is large, even though the success probability in one run of CSA approaches one. On the other hand, if the schedule is too short, then the success probability in one run of CSA is low, leading to a large expected number of probes in (5). An optimal schedule is one in which CSA is run multiple times and the expected total number of probes in (5) is the smallest.

Definition 1.
An optimal cooling schedule is one that leads to the smallest average total number of probes of multiple runs of CSA in order to find a solution of prescribed quality.

Table 1 shows that $N_\alpha/P_R(N_\alpha)$ is a convex function with a minimum at $\alpha = 0.574$. That is, the average total number of probes of multiple runs of CSA to find a CGM first decreases and then increases, leading to an optimal cooling rate of 0.574 and an average of 2.5 runs of CSA to find a CGM.

This paper aims at determining an optimal cooling schedule that allows a solution of prescribed quality to be found in the shortest average amount of time. In order to find the optimal cooling schedule, users generally have to experiment by trial and error until a suitable schedule is found. Such tuning is obviously not practical in solving large complex problems. In that case, one is interested in running a single version of the algorithm that can adjust its cooling schedule dynamically in order to find a schedule close to the optimal one. Moreover, one is interested in obtaining improved solutions as more time is spent on the algorithm. Such an algorithm is an anytime algorithm because it always reports the best solution found if the search were stopped at any time.

The goals of this paper are twofold. First, we would like to design cooling schedules for CSA in such a way that the average time spent in generating a solution of certain quality is of the same order of magnitude as that of multiple runs of the original CSA with an optimal cooling schedule. In other words, the new CSA is optimal in terms of average completion time up to an order of magnitude with respect to that of the original CSA with the best cooling schedule. Second, we would like to design a set of objective targets that allow an anytime-CSA to generate improved solutions as more time is spent, eventually finding a CGM.

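Returning to Table 1, the trade-off behind Definition 1 can be checked directly from the tabulated rows. The Python sketch below recomputes $N_\alpha/P_R(N_\alpha)$ from the schedule lengths and success probabilities in the table, picks the cooling rate with the smallest value, and compares a truncated version of the series in (5) with its closed form. The function and variable names are illustrative only, and the truncation length is an arbitrary choice.

```python
# Rows of Table 1: cooling rate, average schedule length, success probability.
alphas = [0.139, 0.281, 0.429, 0.574, 0.701, 0.862, 0.961, 0.990]
n_alpha = [99.8, 148.0, 207.5, 296.0, 434.5, 798.0, 2414.0, 6963.5]
p_reach = [0.01, 0.10, 0.25, 0.40, 0.55, 0.70, 0.85, 0.95]


def expected_probes(n, p):
    """Closed form of (5): expected total probes over independent runs."""
    return n / p


def expected_probes_series(n, p, terms=2000):
    """Truncated series of (5): sum over j of p * (1 - p)**(j - 1) * n * j."""
    return sum(p * (1.0 - p) ** (j - 1) * n * j for j in range(1, terms + 1))


totals = [expected_probes(n, p) for n, p in zip(n_alpha, p_reach)]
best = min(range(len(alphas)), key=lambda i: totals[i])
best_alpha = alphas[best]
best_total = totals[best]
series_check = expected_probes_series(n_alpha[best], p_reach[best])
print(best_alpha, round(best_total), round(series_check))
```

The recomputed totals reproduce the sixth row of Table 1, with the smallest value (about 740 probes) at $\alpha = 0.574$, where a single run succeeds only 40% of the time.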
The approach we take in this paper is to first study statistically the performance of CSA. Based on the statistics collected, we propose an exponential model relating the value of objective targets sought by CSA and the average execution time, and a monotonically non-decreasing model relating the success probability of obtaining a solution and the average execution time. These models lead to the design of CSA$_{\text{AT-ID}}$, the anytime CSA with iterative deepening, that schedules multiple runs of CSA using a set of increasing cooling schedules that exploit the convexity of (5) and a set of improving objective targets.

Let $T_{\text{opt}}(f_i)$ be the average time taken by the original CSA with an optimal cooling schedule to find a CLM of value $f_i$ or better, and $T_{\text{AT-ID}}(f_i)$ be the average time taken by CSA$_{\text{AT-ID}}$ to find a CLM of similar quality. Based on the principle of iterative deepening [9], we prove the optimality of CSA$_{\text{AT-ID}}$ by showing:

\[
T_{\text{AT-ID}}(f_i) = O\bigl(T_{\text{opt}}(f_i)\bigr) \quad \text{where } i = 0, 1, 2, \cdots \tag{7}
\]

Further, CSA$_{\text{AT-ID}}$ returns solutions of values $f_0 > \cdots > f^*$ that are gradually improving with time.

There were many past studies on annealing schedules in SA. Schedules studied include logarithmic annealing schedules [6] that are necessary and sufficient for asymptotic convergence, schedules inversely proportional to annealing steps in FSA [13] that are slow when the annealing step is large, simulated quenching scheduling in ASA [7] that is not efficient when the number of variables is large, proportional (or geometric) cooling schedules [8] using a cooling rate between 0.8 and 0.99 or a rate computed from the initial and final temperatures [11], constant annealing [3], arithmetic annealing [12], polynomial-time cooling [2], adaptive temperature scheduling based on the acceptance ratio of bad moves [16], and non-equilibrium SA (NESA) [4] that operates at a non-equilibrium condition and that reduces temperatures as soon as improved solutions are found.

All the past studies aimed at designing annealing schedules that allow one run of SA to succeed in getting a desirable solution. There were no prior studies that examine trade-offs between multiple runs of SA using different schedules and the improved probability of getting a solution. Our approach in this paper is based on multiple runs of CSA, whose execution times increase in a geometric fashion and whose last run finds a solution to the application problem. Based on iterative deepening [9], the total time of all the runs will be dominated by the last run and will only be a constant factor of the time taken in the last run.

3 Constrained Simulated Annealing

In this section, we summarize our Lagrange-multiplier theory for solving discrete constrained NLPs and the adaptation of SA to look for discrete saddle points.

Consider a discrete equality-constrained NLP:

\[
\begin{aligned}
&\text{minimize}_x && f(x) \\
&\text{subject to} && h(x) = 0,
\end{aligned} \tag{8}
\]

where $x = (x_1, \dots, x_n)$ is a vector of discrete variables, and $f(x)$ and $h(x)$ are analytic in closed forms (but not necessarily differentiable) or procedural. An inequality constraint like $g_j(x) \le 0$ can be transformed into an equivalent equality constraint $\max(g_j(x), 0) = 0$. Hence, without loss of generality, our theory only considers application problems with equality constraints.

A generalized discrete Lagrangian function of (8) is defined as follows:

\[
L_d(x, \lambda) = f(x) + \lambda^T H(h(x)), \tag{9}
\]

where $H$ is a continuous transformation function satisfying $H(y) = 0$ iff $y = 0$.

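To make (9) concrete, here is a small Python sketch that evaluates a generalized discrete Lagrangian, including the conversion of an inequality constraint $g_j(x) \le 0$ into the equality $\max(g_j(x), 0) = 0$. Choosing $H(y) = |y|$ is an assumption made for this example (it is one function satisfying $H(y) = 0$ iff $y = 0$), and the toy objective, constraints, and multiplier values are hypothetical.

```python
def inequality_as_equality(g):
    """Wrap g(x) <= 0 as the equality constraint max(g(x), 0) = 0."""
    return lambda x: max(g(x), 0.0)


def lagrangian(f, constraints, x, lam, transform=abs):
    """L_d(x, lam) = f(x) + sum_i lam_i * H(h_i(x)), with H = transform.

    H = abs is an assumed choice; any H with H(y) = 0 iff y = 0 would do.
    """
    if len(lam) != len(constraints):
        raise ValueError("need one multiplier per constraint")
    return f(x) + sum(l * transform(h(x)) for l, h in zip(lam, constraints))


# Hypothetical toy problem on integer pairs x = (x1, x2).
f = lambda x: x[0] ** 2 + x[1] ** 2          # objective
h1 = lambda x: x[0] + x[1] - 3               # equality: x1 + x2 = 3
g1 = lambda x: 1 - x[0]                      # inequality: x1 >= 1
constraints = [h1, inequality_as_equality(g1)]

feasible_value = lagrangian(f, constraints, (1, 2), lam=[5.0, 5.0])
infeasible_value = lagrangian(f, constraints, (0, 0), lam=[5.0, 5.0])
print(feasible_value, infeasible_value)
```

At the feasible point the penalty terms vanish and the Lagrangian equals the objective; at the infeasible point each violated constraint adds a term scaled by its multiplier.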
We define a discrete saddle point $(x^*, \lambda^*)$ with the following property:

\[
L_d(x^*, \lambda) \le L_d(x^*, \lambda^*) \le L_d(x, \lambda^*) \tag{10}
\]

for all $x \in N(x^*)$ and all $\lambda \in R$. Essentially, a saddle point is one in which $L_d(x^*, \lambda)$ is at a local maximum in the $\lambda$ subspace and at a local minimum in the $x$ subspace. The concept of saddle points is very important in discrete problems because, starting from them, we can derive the first-order necessary and sufficient condition for CLM that leads to global minimization procedures. This is stated formally in the following theorem [15]:

Theorem 1. First-order necessary and sufficient condition for CLM. A point in the variable space of (8) is a CLM if and only if it satisfies the saddle-point condition (10).

procedure CSA
    set initial x = (x, λ) by randomly generating x and by setting λ ← 0;
    initialize temperature T_0 to be large enough and cooling rate 0 < α < 1;
    set N_T (number of probes per temperature);
    while stopping condition is not satisfied do
        for n ← 1 to N_T do
            generate x' from N(x) using G(x, x');
            accept x' with probability A_T(x, x')
        end for
        reduce temperature by T ← α × T;
    end while
end procedure

Fig. 1. CSA: Constrained simulated annealing [15].

Figure 1 describes CSA [14] that looks for saddle points with the minimum objective value. By carrying out probabilistic ascents in the $\lambda$ subspace with a probability of acceptance governed by a temperature, it looks for local maxima in that subspace. Likewise, by carrying out probabilistic descents in the $x$ subspace, it looks for local minima in that subspace. It can be shown that the point where the algorithm stops is a saddle point in the Lagrangian space.

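The loop structure of Figure 1 can be sketched in Python for a toy problem with a single equality constraint. This is a minimal sketch under several assumptions that are not stated in the text: $H(y) = |y|$, an even random choice between probing the $x$ subspace and the $\lambda$ subspace, a Metropolis rule applied to the change in the Lagrangian, $\lambda$ kept non-negative, and a minimum temperature `t_min` as the stopping condition. All function and parameter names are hypothetical.

```python
import math
import random


def csa(f, h, x0, neighbors, t0=10.0, alpha=0.9, probes_per_temp=20,
        t_min=1e-3, lam_step=1.0, rng=None):
    """Minimal CSA-style search for one equality constraint h(x) = 0.

    Assumed details: H(y) = |y|, lambda clamped at zero, Metropolis
    acceptance (descent in x, ascent in lambda), stop when t <= t_min.
    """
    rng = rng or random.Random()
    lagr = lambda x, lam: f(x) + lam * abs(h(x))
    x, lam, t = x0, 0.0, t0
    best = x if h(x) == 0 else None            # best feasible point seen so far
    while t > t_min:                           # stopping condition
        for _ in range(probes_per_temp):
            if rng.random() < 0.5:             # probe in the x subspace
                x_new = rng.choice(neighbors(x))
                delta = lagr(x_new, lam) - lagr(x, lam)       # want a decrease
                if delta <= 0 or rng.random() < math.exp(-delta / t):
                    x = x_new
            else:                              # probe in the lambda subspace
                lam_new = max(0.0, lam + rng.choice((-lam_step, lam_step)))
                delta = lagr(x, lam) - lagr(x, lam_new)       # want an increase
                if delta <= 0 or rng.random() < math.exp(-delta / t):
                    lam = lam_new
            if h(x) == 0 and (best is None or f(x) < f(best)):
                best = x
        t *= alpha                             # geometric cooling
    return best


def neighbors(x, lo=-10, hi=10):
    """Symmetric +/-1 neighborhood on the integers in [lo, hi]."""
    return [y for y in (x - 1, x + 1) if lo <= y <= hi]


# Toy problem: minimize x^2 subject to (x - 3)(x + 4) = 0 on integers in [-10, 10].
best = csa(f=lambda x: x * x, h=lambda x: (x - 3) * (x + 4), x0=8,
           neighbors=neighbors, rng=random.Random(0))
print(best)
```

The function reports the best feasible point seen, which mirrors the idea of keeping the best solution found at any time. In the toy problem the feasible points are 3 and -4, so a run returns one of these, or `None` if no feasible point was visited.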
CSA differs from traditional SA that only has probabilistic descents in the $x$ space, and the point where SA stops is a local minimum of the objective function of an unconstrained optimization. By extending the search to saddle points in a Lagrangian space, CSA allows constrained optimization problems to be solved in a similar way as SA in solving unconstrained optimization problems.

Using distribution $G(x, x')$ to generate trial point $x'$ in neighborhood $N(x)$, a Metropolis acceptance probability $A_T(x, x')$, and a logarithmic cooling schedule, CSA has been proven to have asymptotic convergence with probability one to a CGM. This is stated in the following theorem without proof [14].

Theorem 2. Asymptotic convergence of CSA. The Markov chain modeling CSA converges to a CGM with probability one.

Although Theorems 1 and 2 were derived for discrete constrained NLPs, they are applicable to continuous and mixed-integer constrained NLPs if all continuous variables are first discretized. Discretization is acceptable in practice because numerical evaluations of continuous variables using digital computers can be considered as discrete approximations of the original variables up to a computer's precision. Intuitively, if discretization is fine enough, the solutions found are fairly good approximations to the original solutions. Due to space limitations, we do not discuss the accuracy of solutions found in discretized problems [17]. In the rest of this paper, we apply CSA to solve constrained NLPs, assuming that continuous variables in continuous and mixed-integer NLPs are first discretized.

4 Performance Modeling of CSA

The performance of a CSA procedure to solve a given application problem from a random starting point can be measured by the probability that it will find a solution of a prescribed quality when it stops and the average time it takes to find the solution. There are many parameters that will affect how CSA performs, such as neighborhood size, generation probability, probability of accepting a point generated, initial temperature, cooling schedule, and relaxation of objective function. In this section, we focus on the relationship among objective targets, cooling schedules, and probabilities of finding a desirable solution.

4.1 Relaxation of objective target

One way to improve the chance of finding a solution by CSA is to look for CLM instead of CGM. An approach to achieve this is to stop CSA whenever it finds a CLM of a prescribed quality. This approach is not desirable in general because CSA may only find a CLM when its temperatures are low, leading to little difference in times between finding CLM and CGM. Further, it is necessary to prove the asymptotic convergence of the relaxed CSA procedure.

A second approach that we adopt in this paper is to modify the constrained NLP in such a way that a CLM of value smaller than $f'$ in the original NLP is considered a CGM in the relaxed NLP. Since the CSA procedure is unchanged, its asymptotic convergence behavior remains the same. The relaxed NLP is obtained by transforming the objective target of the original NLP:

\[
F(f(x), f') =
\begin{cases}
f' & \text{if } f(x) \le f' \\
f(x) & \text{if } f(x) > f'.
\end{cases} \tag{11}
\]

Assuming that $f^*$ is the value of the CGM in the original NLP, it follows that the value of the CGM of the relaxed NLP is $f^*$ if $f' \le f^*$ and is $f'$ if $f' > f^*$. Moreover, since the relaxed problem is a valid NLP solvable by CSA, CSA will converge asymptotically to a CGM of the relaxed NLP with probability one.

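The transformation in (11) amounts to clipping the objective from below at the target $f'$. The Python sketch below applies it to the Rastrigin objective of (6) with $f' = 200$ at two sample points whose coordinates all equal $-3.2$ or $4.2$, the two roots of the constraint expression in (6). The function names and the choice of sample points are assumptions made for illustration.

```python
import math


def relax(value, target):
    """Transformation F of (11): objective values at or below the target map to the target."""
    return target if value <= target else value


def rastrigin(x):
    """Rastrigin function 10 n + sum(x_i^2 - 10 cos(2 pi x_i))."""
    n = len(x)
    return 10.0 * n + sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) for xi in x)


def relaxed_objective(x, target=200.0):
    """Objective of (6): the Rastrigin value passed through F with f' = target."""
    return relax(rastrigin(x), target)


low_point = [-3.2] * 10     # Rastrigin value below 200, so it is raised to the target
high_point = [4.2] * 10     # Rastrigin value above 200, so it is left unchanged
print(rastrigin(low_point), relaxed_objective(low_point))
print(rastrigin(high_point), relaxed_objective(high_point))
```

Every point whose original objective is at most $f'$ receives the same relaxed value, so all such feasible points become equally good solutions of the relaxed problem.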
As a relaxed objective function leads to a possibly larger pool of solution points, we expect CSA to have a higher chance of hitting one of these points during its search. This property will be exploited in CSA$_{\text{AT-ID}}$ in Section 5.2.

4.2 Exponential model relating $f'$ and $N_\alpha$ for fixed $P_R(N_\alpha)$

In order to develop CSA$_{\text{AT-ID}}$ that dynamically controls its objective targets, we need to know the relationship between $f'$, the degree of objective relaxation, and $N_\alpha$, the number of probes in one run of CSA, for a fixed $P_R(N_\alpha)$. In this section we find this relationship by studying the statistical behavior in evaluating four continuous NLPs by CSA.

[Figure 2: 3-D plot with axes $f'$, $P_R(N_\alpha)$, and $\log_2(N_\alpha)$; dotted line labeled "Trace of one run of anytime-CSA".]

Fig. 2. A 3-D graph showing an exponentially decreasing relationship between $f'$ and $N_\alpha$ and a monotonically non-decreasing relationship between $P_R(N_\alpha)$ and $N_\alpha$ when CSA is applied to solve (6). The dotted line shows the trace taken in a run of CSA$_{\text{AT-ID}}$.

Table 2. The averages and standard deviations of coefficient of determination $R^2$ on linear fits of $f'$ and $\log_2(N_\alpha)$ for fixed $P_R(N_\alpha)$.

Benchmark          Mean($R^2$)   Std.Dev.($R^2$)
G1 [10]            0.9389        0.0384
G2 [10]            0.9532        0.0091
Rastrigin          0.9474        0.0397
Problem 5.2 [5]    0.9461        0.0342

Figure 2 shows a 3-D graph relating the parameters in solving (6), in which $P_R(N_\alpha)$ was obtained by running CSA 200 times for each combination of $N_\alpha$ and $f'$. It shows an exponentially decreasing relationship between $f'$ and $N_\alpha$ at fixed $P_R(N_\alpha)$ and a monotonically non-decreasing relationship between $P_R(N_\alpha)$ and $N_\alpha$ at fixed $f'$.
These observations lead to the following exponential model:

\[
N_\alpha = k e^{-a f'} \quad \text{for fixed } P_R(N_\alpha) \text{ and positive real constants } a \text{ and } k. \tag{12}
\]

To verify statistically our proposed model, we performed experiments on several benchmarks of different complexities: G1, G2 [10], Rastrigin (6), and Floudas and Pardalos' Problem 5.2 [5]. For each problem, we collected statistics on $f'$ and $N_\alpha$ at various $P_R(N_\alpha)$, regressed a linear function on $f'$ and $\log_2(N_\alpha)$ to find a best fit, and calculated the coefficient of determination $R^2$ of the fit. Table 2 summarizes the average and standard deviation of $R^2$ of the linear fit for each test problem, where $R^2$ very close to 1 shows a good fit. Since $R^2$ has averages very close to one and has small standard deviations, $N_\alpha$ is verified to be exponential with respect to $f'$ at fixed $P_R(N_\alpha)$, as stated in (12).

4.3 Sufficient conditions for the existence of $N_{\alpha_{\text{opt}}}$

In order for $N_{\alpha_{\text{opt}}}$ to exist at fixed $f'$, $N_\alpha/P_R(N_\alpha)$ in (5) must have an absolute minimum in $(0, \infty)$. Such a minimum exists if $P_R(N_\alpha)$ satisfies the following sufficient conditions: a) $P_R(0) = 0$ and $\lim_{N_\alpha \to \infty} P_R(N_\alpha) = 1$, and b) $P_R''(0) > 0$. We do not show the proof of these conditions due to space limitation.

[Figure 3: two panels plotted against $N_\alpha$. a) $P_R(N_\alpha)$ satisfies the two sufficient conditions. b) Absolute minimum in $N_\alpha/P_R(N_\alpha)$.]

Fig. 3. An example showing the existence of an absolute minimum in $N_\alpha/P_R(N_\alpha)$ when CSA was applied to solve (6) with $f' = 180$. ($N_{\alpha_{\text{opt}}} \approx 2000$.)

We collected statistics on $P_R(N_\alpha)$ and $N_\alpha$ at various $f'$ for each of the four test problems studied in Section 4.2.
The results indicate that $P_R(N_\alpha)$ satisfies the two sufficient conditions, implying that $N_\alpha/P_R(N_\alpha)$ has an absolute minimum in $(0, \infty)$. In other words, each of these problems has an optimal cooling schedule $N_{\alpha_{\text{opt}}}$ that minimizes $N_\alpha/P_R(N_\alpha)$ at fixed $f'$. Figure 3 illustrates the existence of such an optimal schedule in applying CSA to solve (6) with $f' = 180$. The experimental results also show that $P_R(N_\alpha)$ is monotonically nondecreasing.

Note that there is an exponential relationship between $P_R(N_\alpha)$ and $N_\alpha$ in part of the range of $P_R(N_\alpha)$ (say between 0.2 and 0.8) in the problems tested. We do not exploit this relationship because it is not required by the iterative deepening strategy studied in the next section. Further, the relationship is not satisfied when $P_R(N_\alpha)$ approaches 0 or 1.

It is interesting to point out that the second sufficient condition is not satisfied when searching with random probing. In this case, $P_R(N_\alpha) = 1 - (1 - \frac{1}{S})^{N_\alpha}$, and $P_R''(0) = -\ln^2(1 - \frac{1}{S}) < 0$, where $S$ is the number of states in the search space. Hence, $N_\alpha/P_R(N_\alpha)$ at fixed $f'$ does not have an absolute minimum of $N_\alpha$ in $(0, \infty)$.

5 Anytime CSA with Iterative Deepening

We propose in this section CSA$_{\text{AT-ID}}$ with two components. In the first component discussed in Section 5.1, we design a set of cooling schedules for multiple runs of the original CSA so that (7) is satisfied; that is, the average total number of probes to find a CLM of value $f'$ or better is of the same order of magnitude as $T_{\text{opt}}(f')$. In the second component presented in Section 5.2, we design a schedule to decrease objective target $f'$ in CSA$_{\text{AT-ID}}$ that allows it to find $f^*$ using an average total number of probes of the same order of magnitude as $T_{\text{opt}}(f^*)$.

CSA$_{\text{AT-ID}}$ in Figure 4 first finds low-quality feasible solutions in relatively small amounts of time.
It then tightens its requirement gradually, tries to find a solution at each quality level, and outputs the best solution when it stops.
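The iterative-deepening argument used here relies on run lengths that grow geometrically, so that the last run dominates the total. It can be illustrated numerically. The Python sketch below is not the CSA$_{\text{AT-ID}}$ procedure of Figure 4; it only sums a hypothetical sequence of schedule lengths that grow by an assumed factor `growth` and compares the total with the length of the final run. The starting length, growth factor, and number of runs are arbitrary sample values.

```python
def run_lengths(first, growth, runs):
    """Schedule lengths first, first * growth, ... for the given number of runs."""
    return [first * growth ** i for i in range(runs)]


def overhead_ratio(lengths):
    """Total probes over all runs divided by the probes in the last run."""
    return sum(lengths) / lengths[-1]


lengths = run_lengths(first=100, growth=2, runs=6)
ratio = overhead_ratio(lengths)
# For a growth factor rho > 1 the ratio stays below rho / (rho - 1).
bound = 2 / (2 - 1)
print(lengths, ratio, bound)
```

With a doubling factor, the six runs together cost less than twice the last run, which is the sense in which the total time is only a constant factor of the time taken in the last run.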