Fitting Spectral Decay with the k-Support Norm

Andrew M. McDonald (1), Massimiliano Pontil (1,2), Dimitris Stamos (1)

(1) Department of Computer Science
University College London
Gower Street, London WC1E 6BT, UK
email: {a.mcdonald,d.stamos.12}@ucl.ac.uk

(2) Istituto Italiano di Tecnologia
Via Morego 30, 16163 Genova, Italy

January 5, 2016

Abstract
The spectral k-support norm enjoys good estimation properties in low rank matrix learning problems, empirically outperforming the trace norm. Its unit ball is the convex hull of rank k matrices with unit Frobenius norm. In this paper we generalize the norm to the spectral (k,p)-support norm, whose additional parameter p can be used to tailor the norm to the decay of the spectrum of the underlying model. We characterize the unit ball and we explicitly compute the norm. We further provide a conditional gradient method to solve regularization problems with the norm, and we derive an efficient algorithm to compute the Euclidean projection on the unit ball in the case $p = \infty$. In numerical experiments, we show that allowing p to vary significantly improves performance over the spectral k-support norm on various matrix completion benchmarks, and better captures the spectral decay of the underlying model.

Keywords. k-support norm, orthogonally invariant norms, matrix completion, multitask learning, proximal point algorithms.
1 Introduction
The problem of learning a sparse vector or a low rank matrix has generated much interest in recent years. A popular approach is to use convex regularizers which encourage sparsity, and a number of these have been studied with applications including image denoising, collaborative filtering and multitask learning, see for example [Buehlmann and van der Geer 2011, Wainwright 2014] and references therein.
Recently, the k-support norm was proposed by [Argyriou et al. 2012], motivated as a tight relaxation of the set of k-sparse vectors of unit Euclidean norm. The authors argue that as a regularizer for sparse vector estimation, the norm empirically outperforms the Lasso [Tibshirani 1996] and Elastic Net [Zou and Hastie 2005] penalties. Statistical bounds on the Gaussian width of the k-support norm have been provided by [Chatterjee et al. 2014]. The k-support norm has also been extended to the matrix setting. By applying the norm to the vector of singular values of a matrix, [McDonald et al. 2014] obtain the orthogonally invariant spectral k-support norm, reporting state of the art performance on matrix completion benchmarks.
Motivated by the performance of the k-support norm in sparse vector and matrix learning problems, in this paper we study a natural generalization obtained by considering the $\ell_p$-norms (for $p \in [1,\infty]$) in place of the Euclidean norm. These allow a further degree of freedom when fitting a model to the underlying data. We denote the ensuing norm the (k,p)-support norm. As we demonstrate in numerical experiments, using $p = 2$ is not necessarily the best choice in all instances. By tuning the value of p the model can incorporate prior information regarding the singular values. When prior knowledge is lacking, the parameter can be chosen by validation, hence the model can adapt to a variety of decay patterns of the singular values. An interesting property of the norm is that it interpolates between the $\ell_1$-norm (for $k = 1$) and the $\ell_p$-norm (for $k = d$). It follows that by varying both k and p, the norm allows one to learn sparse vectors which exhibit different patterns of decay in the non-zero elements. In particular, when $p = \infty$ the norm prefers vectors which are constant.
A main goal of the paper is to study the proposed norm in matrix learning problems. The (k,p)-support norm is a symmetric gauge function, hence it induces the orthogonally invariant spectral (k,p)-support norm. This interpolates between the trace norm (for $k = 1$) and the Schatten p-norms (for $k = d$), and its unit ball has a simple geometric interpretation as the convex hull of matrices of rank no greater than k and Schatten p-norm no greater than one. This suggests that the new norm favors low rank structure, and the effect of varying p allows different patterns of decay in the spectrum. In the special case of $p = \infty$, the (k,p)-support norm is the dual of the Ky-Fan k-norm [Bhatia 1997] and it encourages a flat spectrum when used as a regularizer.
The main contributions of the paper are: i) we propose the (k,p)-support norm as an extension of the k-support norm and we characterize in particular the unit ball of the induced orthogonally invariant matrix norm (Section 3); ii) we show that the norm can be computed efficiently and we discuss the role of the parameter p (Section 4); iii) we outline a conditional gradient method to solve the associated regularization problem for both vector and matrix problems (Section 5), and in the special case $p = \infty$ we provide an $O(d \log d)$ computation of the projection operator (Section 5.1); finally, iv) we present numerical experiments on matrix completion benchmarks which demonstrate that the proposed norm offers significant improvement over previous methods, and we discuss the effect of the parameter p (Section 6). The appendix contains derivations of results which are sketched in, or omitted from, the main body of the paper.
Notation. We use $\mathbb{N}_n$ for the set of integers from 1 up to and including $n$. We let $\mathbb{R}^d$ be the $d$-dimensional real vector space, whose elements are denoted by lower case letters. For any vector $w \in \mathbb{R}^d$, its support is defined as $\mathrm{supp}(w) = \{i \in \mathbb{N}_d : w_i \neq 0\}$, and its cardinality is defined as $\mathrm{card}(w) = |\mathrm{supp}(w)|$. We let $\mathbb{R}^{d \times m}$ be the space of $d \times m$ real matrices. We denote the rank of a matrix $W$ as $\mathrm{rank}(W)$. We let $\sigma(W) \in \mathbb{R}^r$ be the vector formed by the singular values of $W$, where $r = \min(d,m)$, and where we assume that the singular values are ordered nonincreasingly, that is, $\sigma_1(W) \geq \cdots \geq \sigma_r(W) \geq 0$. For $p \in [1,\infty)$ the $\ell_p$-norm of a vector $w \in \mathbb{R}^d$ is defined as $\|w\|_p = (\sum_{i=1}^d |w_i|^p)^{1/p}$, and $\|w\|_\infty = \max_{i=1}^d |w_i|$. Given a norm $\|\cdot\|$ on $\mathbb{R}^d$ or $\mathbb{R}^{d \times m}$, $\|\cdot\|_*$ denotes the corresponding dual norm, defined by $\|u\|_* = \sup\{\langle u, w \rangle : \|w\| \leq 1\}$. The convex hull of a subset $S$ of a vector space is denoted $\mathrm{co}(S)$.
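To make the notation concrete, the following is a minimal NumPy sketch of these primitives; the function names are ours, chosen for illustration only.

```python
import numpy as np

def supp(w):
    # Support of w: indices of the nonzero components.
    return np.flatnonzero(w)

def card(w):
    # Cardinality: number of nonzero components of w.
    return supp(w).size

def sigma(W):
    # Singular values of W in nonincreasing order
    # (np.linalg.svd already returns them sorted this way).
    return np.linalg.svd(W, compute_uv=False)

w = np.array([0.0, 3.0, -1.0, 0.0])
print(supp(w), card(w))        # [1 2] 2
print(sigma(2.0 * np.eye(3)))  # [2. 2. 2.]
```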
2 Background and Previous Work
For every $k \in \mathbb{N}_d$, the k-support norm $\|\cdot\|_{(k)}$ is defined as the norm whose unit ball is given by

$$\mathrm{co}\big\{w \in \mathbb{R}^d : \mathrm{card}(w) \leq k,\ \|w\|_2 \leq 1\big\}, \qquad (2.1)$$

that is, the convex hull of the set of vectors of cardinality at most k and $\ell_2$-norm no greater than one [Argyriou et al. 2012]. We readily see that for $k = 1$ and $k = d$ we recover the unit ball of the $\ell_1$ and $\ell_2$-norms, respectively.
The k-support norm of a vector $w \in \mathbb{R}^d$ can be expressed as an infimal convolution [Rockafellar 1970, p. 34],

$$\|w\|_{(k)} = \inf_{(v_g)} \Big\{ \sum_{g \in \mathcal{G}_k} \|v_g\|_2 \;:\; \sum_{g \in \mathcal{G}_k} v_g = w \Big\}, \qquad (2.2)$$

where $\mathcal{G}_k$ is the collection of all subsets of $\mathbb{N}_d$ containing at most k elements and the infimum is over all vectors $v_g \in \mathbb{R}^d$ such that $\mathrm{supp}(v_g) \subseteq g$, for $g \in \mathcal{G}_k$. Equation (2.2) highlights that the k-support norm is a special case of the group lasso with overlap [Jacob et al. 2009], where the cardinality of the support sets is at most k. This expression suggests that when used as a regularizer, the norm encourages vectors w to be a sum of a limited number of vectors with small support. Due to the variational form of (2.2), computing the norm is not straightforward; however, [Argyriou et al. 2012] note that the dual norm has a simple form, namely it is the $\ell_2$-norm of the k largest components,
$$\|u\|_{(k),*} = \sqrt{\sum_{i=1}^{k} \big(|u|^\downarrow_i\big)^2}, \qquad u \in \mathbb{R}^d, \qquad (2.3)$$

where $|u|^\downarrow$ is the vector obtained from u by reordering its components so that they are nonincreasing in absolute value. Note also from equation (2.3) that for $k = 1$ and $k = d$, the dual norm is equal to the $\ell_\infty$-norm and $\ell_2$-norm, respectively, which agrees with our earlier observation regarding the primal norm.
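As an illustration of (2.3), the dual norm only requires sorting, as in the following NumPy sketch (the function name is ours):

```python
import numpy as np

def k_support_dual(u, k):
    # Dual k-support norm (2.3): l2-norm of the k largest
    # components of u in absolute value.
    top_k = np.sort(np.abs(u))[::-1][:k]
    return np.sqrt(np.sum(top_k ** 2))

u = np.array([3.0, -4.0, 1.0, 0.5])
print(k_support_dual(u, 1))  # 4.0, the l-infinity norm (k = 1)
print(k_support_dual(u, 4))  # ~5.12, the l2-norm (k = d)
```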
A related problem which has been studied in recent years is learning a matrix from a set of linear measurements, in which the underlying matrix is assumed to have sparse spectrum (low rank). The trace norm, the $\ell_1$-norm of the singular values of a matrix, has been shown to perform well in this setting, see e.g. [Argyriou et al. 2008, Jaggi and Sulovsky 2010]. Recall that a norm $\|\cdot\|$ on $\mathbb{R}^{d \times m}$ is called orthogonally invariant if $\|W\| = \|UWV\|$, for any orthogonal matrices $U \in \mathbb{R}^{d \times d}$ and $V \in \mathbb{R}^{m \times m}$. A classical result by von Neumann establishes that a norm is orthogonally invariant if and only if it is of the form $\|W\| = g(\sigma(W))$, where $\sigma(W)$ is the vector formed by the singular values of W in nonincreasing order, and g is a symmetric gauge function [Von Neumann 1937]. In other words, g is a norm which is invariant under permutations and sign changes of the vector components, that is $g(w) = g(Pw) = g(Jw)$, where P is any permutation matrix and J is diagonal with entries equal to $\pm 1$ [Horn and Johnson 1991, p. 438].
Examples of symmetric gauge functions are the $\ell_p$-norms for $p \in [1,\infty]$, and the corresponding orthogonally invariant norms are called the Schatten p-norms [Horn and Johnson 1991, p. 441]. In particular, those include the trace norm and Frobenius norm for $p = 1$ and $p = 2$, respectively. Regularization with Schatten p-norms has been previously studied by [Argyriou et al. 2007] and a statistical analysis has been performed by [Rohde and Tsybakov 2011]. As the set $\mathcal{G}_k$ includes all subsets of size k, expression (2.2) for the k-support norm reveals that it is a symmetric gauge function. [McDonald et al. 2014] use this fact to introduce the spectral k-support norm for matrices, by defining $\|W\|_{(k)} = \|\sigma(W)\|_{(k)}$, for $W \in \mathbb{R}^{d \times m}$, and report state of the art performance on matrix completion benchmarks.
3 The (k,p)-Support Norm
In this section we introduce the (k,p)-support norm as a natural extension of the k-support norm. This follows by applying the $\ell_p$-norm, rather than the Euclidean norm, in the infimal convolution definition of the norm.
Definition 1. Let $k \in \mathbb{N}_d$ and $p \in [1,\infty]$. The (k,p)-support norm of a vector $w \in \mathbb{R}^d$ is defined as

$$\|w\|_{(k,p)} = \inf_{(v_g)} \Big\{ \sum_{g \in \mathcal{G}_k} \|v_g\|_p \;:\; \sum_{g \in \mathcal{G}_k} v_g = w \Big\}, \qquad (3.1)$$

where the infimum is over all vectors $v_g \in \mathbb{R}^d$ such that $\mathrm{supp}(v_g) \subseteq g$, for $g \in \mathcal{G}_k$.
Let us note that the norm is well defined. Indeed, positivity, homogeneity and non-degeneracy are immediate. To prove the triangle inequality, let $w, w' \in \mathbb{R}^d$. For any $\epsilon > 0$ there exist $\{v_g\}$ and $\{v'_g\}$ such that $w = \sum_g v_g$, $w' = \sum_g v'_g$, $\sum_g \|v_g\|_p \leq \|w\|_{(k,p)} + \epsilon/2$, and $\sum_g \|v'_g\|_p \leq \|w'\|_{(k,p)} + \epsilon/2$. As $\sum_g v_g + \sum_g v'_g = w + w'$, we have

$$\|w + w'\|_{(k,p)} \leq \sum_g \|v_g\|_p + \sum_g \|v'_g\|_p \leq \|w\|_{(k,p)} + \|w'\|_{(k,p)} + \epsilon,$$

and the result follows by letting $\epsilon$ tend to zero.
Note that, since a convex set is equivalent to the convex hull of its extreme points, Definition 1 implies that the unit ball of the (k,p)-support norm, denoted by $C^p_k$, is given by the convex hull of the set of vectors with cardinality no greater than k and $\ell_p$-norm no greater than 1, that is,

$$C^p_k = \mathrm{co}\big\{w \in \mathbb{R}^d : \mathrm{card}(w) \leq k,\ \|w\|_p \leq 1\big\}. \qquad (3.2)$$
Definition 1 gives the norm as the solution of a variational problem. Its explicit computation is not straightforward in the general case; however, for $p = 1$ the unit ball (3.2) does not depend on k and is always equal to the $\ell_1$ unit ball. Thus, the (k,1)-support norm is always equal to the $\ell_1$-norm, and we do not consider this case further in this section. Similarly, for $k = 1$ we recover the $\ell_1$-norm for all values of p. For $p = \infty$, from the definition of the dual norm it is not difficult to show that $\|\cdot\|_{(k,\infty)} = \max\{\|\cdot\|_\infty,\ \|\cdot\|_1/k\}$. We return to this in Section 4 when we describe how to compute the norm for all values of p.
Note further that in Equation (3.1), as p tends to $\infty$, the $\ell_p$-norm of each $v_g$ is increasingly dominated by the largest component of $v_g$. As the variational formulation tries to identify vectors $v_g$ with small aggregate $\ell_p$-norm, this suggests that higher values of p encourage each $v_g$ to tend to a vector whose k entries are equal. In this manner, varying p allows us to adjust the degree to which the components of vector w can be clustered into (possibly overlapping) groups of size k.
As in the case of the k-support norm, the dual (k,p)-support norm has a simple expression. Recall that the dual norm of a vector $u \in \mathbb{R}^d$ is defined by the optimization problem

$$\|u\|_{(k,p),*} = \max\big\{\langle u, w \rangle : \|w\|_{(k,p)} = 1\big\}. \qquad (3.3)$$
Proposition 2. If $p \in (1,\infty]$ then the dual (k,p)-support norm is given by

$$\|u\|_{(k,p),*} = \Big(\sum_{i \in I_k} |u_i|^q\Big)^{\frac{1}{q}}, \qquad u \in \mathbb{R}^d,$$

where $q = p/(p-1)$ and $I_k \subset \mathbb{N}_d$ is the set of indices of the k largest components of u in absolute value. Furthermore, if $p \in (1,\infty)$ and $u \in \mathbb{R}^d \setminus \{0\}$ then the maximum in (3.3) is attained for

$$w_i = \begin{cases} \mathrm{sign}(u_i)\left(\dfrac{|u_i|}{\|u\|_{(k,p),*}}\right)^{\frac{1}{p-1}} & \text{if } i \in I_k, \\ 0 & \text{otherwise.} \end{cases} \qquad (3.4)$$

If $p = \infty$ the maximum is attained for

$$w_i = \begin{cases} \mathrm{sign}(u_i) & \text{if } i \in I_k,\ u_i \neq 0, \\ \lambda_i \in [-1,1] & \text{if } i \in I_k,\ u_i = 0, \\ 0 & \text{otherwise.} \end{cases}$$
Note that for $p = 2$ we recover the dual of the k-support norm in (2.3).
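The following NumPy sketch evaluates the dual norm of Proposition 2 and, for finite p, the maximizer (3.4); the function names are ours, and the final line simply checks numerically that the attained value $\langle u, w \rangle$ matches the dual norm.

```python
import numpy as np

def kp_support_dual(u, k, p):
    # Dual (k,p)-support norm: l_q-norm of the k largest
    # components of u in absolute value, with q = p / (p - 1).
    a = np.sort(np.abs(u))[::-1][:k]
    if np.isinf(p):
        return a.sum()  # q = 1: sum of the k largest |u_i|
    q = p / (p - 1.0)
    return (a ** q).sum() ** (1.0 / q)

def dual_maximizer(u, k, p):
    # Maximizer (3.4) of <u, w> over the unit ball of the
    # (k,p)-support norm, for p in (1, infinity) and u != 0.
    w = np.zeros_like(u)
    I_k = np.argsort(np.abs(u))[::-1][:k]  # indices of the k largest |u_i|
    scale = kp_support_dual(u, k, p)
    w[I_k] = np.sign(u[I_k]) * (np.abs(u[I_k]) / scale) ** (1.0 / (p - 1.0))
    return w

u = np.array([3.0, -4.0, 1.0, 0.5])
w = dual_maximizer(u, k=2, p=3.0)
print(np.dot(u, w), kp_support_dual(u, 2, 3.0))  # both ~5.58
```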
3.1 The Spectral (k,p)-Support Norm
From Definition 1 it is clear that the (k,p)-support norm is a symmetric gauge function. This follows since $\mathcal{G}_k$ contains all groups of cardinality k and the $\ell_p$-norms only involve absolute values of the components. Hence we can define the spectral (k,p)-support norm as

$$\|W\|_{(k,p)} = \|\sigma(W)\|_{(k,p)}, \qquad W \in \mathbb{R}^{d \times m}.$$

Since the dual of any orthogonally invariant norm is given by $\|\cdot\|_* = \|\sigma(\cdot)\|_*$, see e.g. [Lewis 1995], we conclude that the dual spectral (k,p)-support norm is given by

$$\|Z\|_{(k,p),*} = \|\sigma(Z)\|_{(k,p),*}, \qquad Z \in \mathbb{R}^{d \times m}.$$
The next result characterizes the unit ball of the spectral (k,p)-support norm. Due to the relationship between an orthogonally invariant norm and its corresponding symmetric gauge function, we see that the cardinality constraint for vectors generalizes in a natural manner to the rank operator for matrices.

Proposition 3. The unit ball of the spectral (k,p)-support norm is the convex hull of the set of matrices of rank at most k and Schatten p-norm no greater than one.
In particular, if $p = \infty$, the dual vector norm is given, for $u \in \mathbb{R}^d$, by $\|u\|_{(k,\infty),*} = \sum_{i=1}^k |u|^\downarrow_i$. Hence, for any $Z \in \mathbb{R}^{d \times m}$, the dual spectral norm is given by $\|Z\|_{(k,\infty),*} = \sum_{i=1}^k \sigma_i(Z)$, that is, the sum of the k largest singular values, which is also known as the Ky-Fan k-norm, see e.g. [Bhatia 1997].
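For instance, the dual spectral norm at $p = \infty$ reduces to a partial sum of singular values, as in this short sketch (naming is ours):

```python
import numpy as np

def ky_fan_k(Z, k):
    # Dual spectral (k,infinity)-support norm: sum of the
    # k largest singular values of Z (the Ky-Fan k-norm).
    s = np.linalg.svd(Z, compute_uv=False)  # nonincreasing order
    return s[:k].sum()

Z = np.diag([3.0, 2.0, 1.0])
print(ky_fan_k(Z, 1))  # 3.0, the spectral (operator) norm
print(ky_fan_k(Z, 3))  # 6.0, the trace norm of Z
```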
4 Computing the Norm
In this section we compute the norm, illustrating how it interpolates between the $\ell_1$ and $\ell_p$-norms.
Theorem 4. Let $p \in (1,\infty)$. For every $w \in \mathbb{R}^d$ and $k \leq d$, it holds that

$$\|w\|_{(k,p)} = \left[\sum_{i=1}^{\ell} \big(|w|^\downarrow_i\big)^p + \left(\frac{\sum_{i=\ell+1}^{d} |w|^\downarrow_i}{\sqrt[q]{k-\ell}}\right)^p\right]^{\frac{1}{p}}, \qquad (4.1)$$

where $\frac{1}{p} + \frac{1}{q} = 1$; for $k = d$ we set $\ell = d$, and otherwise $\ell$ is the largest integer in $\{0,\ldots,k-1\}$ satisfying

$$(k-\ell)\,|w|^\downarrow_\ell \geq \sum_{i=\ell+1}^{d} |w|^\downarrow_i. \qquad (4.2)$$

Furthermore, the norm can be computed in $O(d \log d)$ time.
Proof. Note first that in (4.1), when $\ell = 0$ we understand the first term in the right hand side to be zero, and when $\ell = d$ we understand the second term to be zero.

We need to compute

$$\|w\|_{(k,p)} = \max\Big\{\sum_{i=1}^{d} u_i w_i : \|u\|_{(k,p),*} \leq 1\Big\},$$

where the dual norm $\|\cdot\|_{(k,p),*}$ is described in Proposition 2. Let $z_i = |w|^\downarrow_i$. The problem is then equivalent to

$$\max\Big\{\sum_{i=1}^{d} z_i u_i : \sum_{i=1}^{k} u_i^q \leq 1,\ u_1 \geq \cdots \geq u_d\Big\}. \qquad (4.3)$$

This further simplifies to the k-dimensional problem

$$\max\Big\{\sum_{i=1}^{k-1} u_i z_i + u_k \sum_{i=k}^{d} z_i : \sum_{i=1}^{k} u_i^q \leq 1,\ u_1 \geq \cdots \geq u_k\Big\}.$$
Note that when $k = d$, the solution is given by the dual of the $\ell_q$-norm, that is, the $\ell_p$-norm. For the remainder of the proof we assume that $k < d$. We can now attempt to use Hölder's inequality, which states that for all vectors x such that $\|x\|_q = 1$, $\langle x, y \rangle \leq \|y\|_p$, and the inequality is tight if and only if

$$x_i = \left(\frac{|y_i|}{\|y\|_p}\right)^{p-1} \mathrm{sign}(y_i).$$

We use it for the vector $y = \big(z_1, \ldots, z_{k-1}, \sum_{i=k}^{d} z_i\big)$. The components of the maximizer u satisfy $u_i = \left(\frac{z_i}{M_{k-1}}\right)^{p-1}$ if $i \leq k-1$, and

$$u_k = \left(\frac{\sum_{i=k}^{d} z_i}{M_{k-1}}\right)^{p-1},$$
where for every $\ell \in \{0,\ldots,k-1\}$, $M_\ell$ denotes the right hand side in equation (4.1). We then need to verify that the ordering constraints are satisfied. This requires that

$$(z_{k-1})^{p-1} \geq \Big(\sum_{i=k}^{d} z_i\Big)^{p-1},$$

which is equivalent to inequality (4.2) for $\ell = k-1$. If this inequality is true we are done; otherwise we set $u_k = u_{k-1}$ and solve the smaller problem

$$\max\Big\{\sum_{i=1}^{k-2} u_i z_i + u_{k-1}\sum_{i=k-1}^{d} z_i \;:\; \sum_{i=1}^{k-2} u_i^q + 2u_{k-1}^q \leq 1,\ u_1 \geq \cdots \geq u_{k-1}\Big\}.$$
We use again Hölder's inequality and keep the result if the ordering constraints are fulfilled. Continuing in this way, the generic problem we need to solve is

$$\max\Big\{\sum_{i=1}^{\ell} u_i z_i + u_{\ell+1}\sum_{i=\ell+1}^{d} z_i \;:\; \sum_{i=1}^{\ell} u_i^q + (k-\ell)u_{\ell+1}^q \leq 1,\ u_1 \geq \cdots \geq u_{\ell+1}\Big\},$$

where $\ell \in \{0,\ldots,k-1\}$. Without the ordering constraints the maximum, $M_\ell$, is obtained by the change of variable $u_{\ell+1} \mapsto (k-\ell)^{\frac{1}{q}} u_{\ell+1}$ followed by applying Hölder's inequality. A direct computation provides that the maximizer is $u_i = \left(\frac{z_i}{M_\ell}\right)^{p-1}$ if $i \leq \ell$, and

$$(k-\ell)^{\frac{1}{q}}\, u_{\ell+1} = \left(\frac{\sum_{i=\ell+1}^{d} z_i}{(k-\ell)^{\frac{1}{q}} M_\ell}\right)^{p-1}.$$
Using the relationship $\frac{1}{p} + \frac{1}{q} = 1$, we can rewrite this as

$$u_{\ell+1} = \left(\frac{\sum_{i=\ell+1}^{d} z_i}{(k-\ell)\, M_\ell}\right)^{p-1}.$$

Hence, the ordering constraints are satisfied if

$$z_\ell^{p-1} \geq \left(\frac{\sum_{i=\ell+1}^{d} z_i}{k-\ell}\right)^{p-1},$$

which is equivalent to (4.2). Finally, note that $M_\ell$ is a nondecreasing function of $\ell$. This is because the problem with a smaller value of $\ell$ is more constrained: namely, it solves (4.3) with the additional constraints $u_{\ell+1} = \cdots = u_d$. Moreover, if the constraint (4.2) holds for some value $\ell \in \{0,\ldots,k-1\}$ then it also holds for any smaller value of $\ell$; hence we maximize the objective by choosing the largest such $\ell$.

The computational complexity stems from the monotonicity of $M_\ell$ with respect to $\ell$, which allows us to identify the critical value of $\ell$ using binary search.
Note that for $k = d$ we recover the $\ell_p$-norm, and for $p = 2$ we recover the result in [Argyriou et al. 2012, McDonald et al. 2014]; however, our proof technique is different from theirs.
Remark 5 (Computation of the norm for $p \in \{1,\infty\}$). Since the norm $\|\cdot\|_{(k,p)}$ computed above for $p \in (1,\infty)$ is continuous in p, the special cases $p = 1$ and $p = \infty$ can be derived by a limiting argument. We readily see that for $p = 1$ the norm does not depend on k and is always equal to the $\ell_1$-norm, in agreement with our observation in the previous section. For $p = \infty$ we obtain that $\|w\|_{(k,\infty)} = \max(\|w\|_\infty,\ \|w\|_1/k)$.
5 Optimization
In this section, we describe how to solve regularization problems using the vector and matrix (k,p)-support norms. We consider the constrained optimization problem

$$\min\big\{f(w) : \|w\|_{(k,p)} \leq \alpha\big\}, \qquad (5.1)$$
Algorithm 1 Frank-Wolfe.
  Choose $w^{(0)}$ such that $\|w^{(0)}\|_{(k,p)} \leq \alpha$
  for $t = 0, \ldots, T$ do
    Compute $g := \nabla f(w^{(t)})$
    Compute $s := \mathrm{argmin}\big\{\langle s, g \rangle : \|s\|_{(k,p)} \leq \alpha\big\}$
    Update $w^{(t+1)} := (1-\gamma)w^{(t)} + \gamma s$, for $\gamma := \frac{2}{t+2}$
  end for
where w is in $\mathbb{R}^d$ or $\mathbb{R}^{d \times m}$, $\alpha > 0$ is a regularization parameter and the error function f is assumed to be convex and continuously differentiable. For example, in linear regression a valid choice is the square error, $f(w) = \|Xw - y\|_2^2$, where X is a matrix of observations and y a vector of response variables. Constrained problems of the form (5.1) are also referred to as Ivanov regularization in the inverse problems literature [Ivanov et al. 1978].
A convenient tool to solve problem (5.1) is provided by the Frank-Wolfe method [Frank and Wolfe 1956], see also [Jaggi 2013] for a recent account. The method is outlined in Algorithm 1, and it has worst case convergence rate $O(1/T)$. The key step of the algorithm is to solve the subproblem

$$\mathrm{argmin}\big\{\langle s, g \rangle : \|s\|_{(k,p)} \leq \alpha\big\}, \qquad (5.2)$$

where $g = \nabla f(w^{(t)})$, that is, the gradient of the objective function at the t-th iteration. This problem involves computing a subgradient of the dual norm at g. It can be solved exactly and efficiently as a consequence of Proposition 2. We discuss here the vector case and postpone the discussion of the matrix case to Section 5.2. By symmetry of the $\ell_p$-norm, problem (5.2) can be solved in the same manner as the maximum in Proposition 2, and the solution is given by $s_i = -\alpha w_i$, where $w_i$ is given by (3.4). Specifically, letting $I_k \subset \mathbb{N}_d$ be the set of indices of the k largest components of g in absolute value, for $p \in (1,\infty)$ we have

$$s_i = \begin{cases} -\alpha\,\mathrm{sign}(g_i)\left(\dfrac{|g_i|}{\|g\|_{(k,p),*}}\right)^{\frac{1}{p-1}} & \text{if } i \in I_k, \\ 0 & \text{if } i \notin I_k, \end{cases} \qquad (5.3)$$

and, for $p = \infty$, we choose the subgradient

$$s_i = \begin{cases} -\alpha\,\mathrm{sign}(g_i) & \text{if } i \in I_k,\ g_i \neq 0, \\ 0 & \text{otherwise.} \end{cases} \qquad (5.4)$$
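Putting the pieces together, the following is a minimal NumPy sketch of Algorithm 1 for the square error $f(w) = \|Xw - y\|_2^2$ with the vector (k,p)-support constraint and $p \in (1,\infty)$, using (5.3) for the linear subproblem. The problem data and all names are illustrative, not from the paper.

```python
import numpy as np

def fw_linear_minimizer(g, k, p, alpha):
    # Solve argmin { <s, g> : ||s||_{(k,p)} <= alpha } via (5.3);
    # assumes g != 0.
    q = p / (p - 1.0)
    idx = np.argsort(np.abs(g))[::-1][:k]            # I_k: top-k |g_i|
    dual = (np.abs(g[idx]) ** q).sum() ** (1.0 / q)  # ||g||_{(k,p),*}
    s = np.zeros_like(g)
    s[idx] = -alpha * np.sign(g[idx]) * (np.abs(g[idx]) / dual) ** (1.0 / (p - 1.0))
    return s

def frank_wolfe(X, y, k, p, alpha, T=500):
    # Algorithm 1 for f(w) = ||Xw - y||_2^2 s.t. ||w||_{(k,p)} <= alpha.
    w = np.zeros(X.shape[1])               # feasible starting point
    for t in range(T):
        g = 2.0 * X.T @ (X @ w - y)        # gradient of the square error
        s = fw_linear_minimizer(g, k, p, alpha)
        gamma = 2.0 / (t + 2.0)
        w = (1.0 - gamma) * w + gamma * s
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
w_true = np.zeros(20); w_true[:3] = 1.0    # sparse ground truth
y = X @ w_true + 0.01 * rng.standard_normal(50)
w_hat = frank_wolfe(X, y, k=3, p=2.0, alpha=np.sqrt(3.0))
print(np.round(w_hat[:5], 2))              # first three entries near 1
```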
5.1 Projection Operator
An alternative method to solve (5.1) in the vector case is to consider the equivalent problem

$$\min\Big\{ f(w) + \delta_{\{\|\cdot\|_{(k,p)} \leq \alpha\}}(w) \;:\; w \in \mathbb{R}^d \Big\}, \qquad (5.5)$$