Robust PCA and subspace tracking from incomplete observations using $\ell_0$-surrogates∗

Clemens Hage and Martin Kleinsteuber
Department of Electrical Engineering and Information Technology
Technische Universität München
Arcisstr. 21, 80333 Munich, Germany
{hage,kleinsteuber}@tum.de
http://www.gol.ei.tum.de

October 3, 2012

∗This work has been supported by the DFG excellence initiative research cluster CoTeSys.
Abstract

Many applications in data analysis rely on the decomposition of a data matrix into a low-rank and a sparse component. Existing methods that tackle this task use the nuclear norm and $\ell_1$-cost functions as convex relaxations of the rank constraint and the sparsity measure, respectively, or employ thresholding techniques. We propose a method that allows for reconstructing and tracking a subspace of upper-bounded dimension from incomplete and corrupted observations. It does not require any a priori information about the number of outliers. The core of our algorithm is an intrinsic Conjugate Gradient method on the set of orthogonal projection matrices, the so-called Grassmannian. Non-convex sparsity measures are used for outlier detection, which leads to improved performance in terms of robustly recovering and tracking the low-rank matrix. In particular, our approach can cope with more outliers and with an underlying matrix of higher rank than other state-of-the-art methods.
1 Introduction

The detection of subspaces that best fit high-dimensional data is a challenging and important task in data analysis with countless applications. The classic Principal Component Analysis (PCA) is still the standard tool for searching the best rank-$k$ approximation of a high-dimensional data set $X \in \mathbb{R}^{m \times n}$, where the approximation quality is measured in terms of the Frobenius norm. On the one hand, this minimization problem can be solved easily via a Singular Value Decomposition (SVD), while on the other hand the choice of the Frobenius norm makes this approach highly vulnerable to heavy outliers. In the past years a lot of effort has been spent on methods that allow finding the best-fitting subspace despite the presence of heavy outliers in the measurements or missing data. Certain approaches in this area, commonly known as Robust PCA, stick closely to the classic PCA concept and robustify it by measuring the distance from the data points to the subspace with $\ell_1$-cost functions, cf. Ding et al (2006), Kwak (2008).
Others, including the approach presented here, relate Robust PCA to the field of robust matrix completion. That is, given some incomplete observations $\hat{X}$ of a corrupted and possibly noisy data matrix $X$, which is the superposition of a low-rank matrix $L$ and a sparse matrix $S$, the task is to recover $L$ and $S$.
1.1 Related Work
One of the most prominent approaches to Robust PCA in the past years has been proposed by Candes et al (2009) and Wright et al (2009). The authors formulate a convex optimization problem by relaxing the hard rank constraint to a nuclear norm minimization and analyze how well the $\ell_1$-relaxation approximates the $\ell_0$-norm in the low-rank and sparse decomposition task. Methods to solve the problem include Singular Value Thresholding (SVT, Cai et al (2010)) and the exact (EALM) or inexact (IALM) augmented Lagrangian multiplier method (Lin et al, 2010), which are fast and comparably reliable if the underlying assumptions, i.e. the low-rank property and the sparsity of the outliers, are valid. The problem of reconstructing the low-rank matrix $L$ in the case where entire columns of the measurements are corrupted is considered by Chen et al (2011). A method called SpaRCS is proposed by Waters et al (2011), which recovers the low-rank and the sparse part of a matrix from compressive measurements.
In many applications it is reasonable to assume that no additive Gaussian noise is present in the model, as noise is negligible compared to the signal power, e.g. when dealing with high-quality sensors in the field of Computer Vision. However, there also exist approaches such as GoDec (Zhou and Tao, 2011) that explicitly model additional Gaussian noise. The method uses thresholding techniques and extends the low-rank and sparse decomposition to the noisy case. SpaRCS and GoDec aim at recovering both the low-rank matrix $L$ and the outliers $S$ from noisy measurements and therefore require an upper bound for the cardinality of the sparse component.
In contrast to the often-performed rank relaxation, there also exist methods which fix the dimension of the subspace approximation, such as the greedy algorithm GECO (Shalev-Shwartz et al, 2011). This method reconstructs a data set iteratively based on the SVD while increasing the rank of the approximation in each step. A general framework for optimizing over matrices of a fixed rank has also been proposed by Shalit et al (2010). One drawback here is that the manifold requires a fixed rank $k$, and optimal points that may have a rank strictly lower than $k$ are not in the feasible set.
An alternative and elegant way to control the rank in terms of an upper bound is optimization on the set of fixed-dimensional subspaces, the so-called Grassmannian. As pointed out by Meyer et al (2011), optimizing on the Grassmannian offers many advantages, such as limited memory usage and a lower number of parameters to optimize. Although the optimization problem becomes non-convex, reliable performance can be achieved in practice. In the work of Keshavan and Montanari (2010), spectral techniques are combined with a learning algorithm on the Grassmannian. Boumal and Absil (2011) propose a Riemannian trust-region method on the Grassmannian for low-rank matrix completion, which, however, does not consider heavy outliers. The GROUSE algorithm (Balzano et al, 2010) furthermore demonstrates that optimization on the Grassmannian allows the underlying subspace to be estimated incrementally. Instead of batch-processing a data set, the samples can be processed one at a time, which makes it possible to track a subspace that varies over time, even if it is incompletely observed. A small portion of the data is used to obtain an initial subspace estimate, and with every new incoming data sample this estimate is modified to follow changes in the dominant subspace over time. While the method of Balzano et al (2010) operates with $\ell_2$-cost functions, its recent adaptation GRASTA (He et al, 2012) performs a gradient descent on the Grassmannian and aims at optimizing an $\ell_1$-cost function to mitigate the effects of heavy outliers in the subspace tracking stage. The authors overcome the non-differentiability of the $\ell_1$-norm by formulating an augmented Lagrangian optimization problem at the cost of doubling the number of unknown parameters.
1.2 Our Contribution
Although the $\ell_1$-norm leads to favorably conditioned optimization problems, it is well known that penalizing with non-convex $\ell_0$-surrogates allows reconstruction even in cases where $\ell_1$-based methods fail, see e.g. Chartrand and Staneva (2008). Therefore, we propose a framework that combines the advantages of Grassmannian optimization with non-convex sparsity measures. Our approach focuses primarily on reconstructing and tracking the underlying subspace and can operate on both fully and incompletely observed data sets. The algorithm performs a low-rank and sparse decomposition. However, it does not require any information about the cardinality or the support set of the sparse outliers, which distinguishes it from SpaRCS and GoDec. In contrast to GRASTA (He et al, 2012), the method presented in this paper directly optimizes the cost function and thus operates with less than half the number of unknowns. Like all optimization methods on the Grassmannian, our algorithm allows the dimension of the underlying subspace to be upper-bounded and easily extends to the problem of robustly tracking this subspace.
Experimental results confirm that the proposed method can cope with more outliers and with an underlying matrix of higher rank than other state-of-the-art methods, thus extending possible areas of application from the strict low-rank and sparse decomposition to the more general area of robust dimensionality reduction. In the following section we present an alternating minimization scheme and relate our approach to dimensionality reduction via PCA. Subsequently, we carefully derive and explain a Conjugate Gradient (CG) type algorithm on the Grassmannian for solving the individual minimization tasks. We then extend the static method by a dynamic subspace tracking algorithm and finally, we evaluate the performance of the proposed method with various $\ell_0$-surrogates and compare our approach to other state-of-the-art methods.
2 Problem statement
Let $X \in \mathbb{R}^{m \times n}$ be the data matrix of which we select the partial entries $\hat{X} = \mathcal{A}(X)$ using a linear operator. We consider the data model $X = L + S$, where $L$ is of rank $\operatorname{rk}(L) \leq k$ and $S$ is sparse. Our aim is to recover $L$ from the observations $\hat{X}$. A direct approach leads to the numerically infeasible optimization problem

$$\min_{\operatorname{rk}(L) \leq k} \|\hat{X} - \mathcal{A}(L)\|_0. \qquad (1)$$
Since $\operatorname{rk}(L) \leq k$, we can write the matrix $L$ as the product $L = UY$, where $Y \in \mathbb{R}^{k \times n}$ and $U$ is an element of the so-called Stiefel manifold $\mathrm{St}_{k,m} = \{U \in \mathbb{R}^{m \times k} \mid U^\top U = I_k\}$, with $I_k$ denoting the $(k \times k)$-identity matrix. This factorization of $L$ has the following practical interpretation: $U$ is an orthonormal basis of the robustly estimated subspace of the first $k$ principal components, hereafter referred to as the (robust) dominant subspace. Note that neither $U$ nor $Y$ is uniquely determined in this factorization. Let $O(k) = \{\theta \in \mathbb{R}^{k \times k} \mid \theta^\top \theta = I_k\}$ define the set of orthogonal matrices; then $U$ can always be adjusted by an orthogonal matrix $\theta \in O(k)$ such that $UY = U\theta\theta^\top Y$ with $\theta^\top Y$ having uncorrelated rows, leading to the $k$ (robustly estimated) principal component scores of $X$.
Our concession to the numerical infeasibility of (1) is to employ an alternative, possibly non-convex sparsity measure $h$. Some concrete examples can be found in Section 6.1. As a result, problem (1) relaxes to the minimization problem

$$\min_{U \in \mathrm{St}_{k,m},\, Y \in \mathbb{R}^{k \times n}} h(\hat{X} - \mathcal{A}(UY)). \qquad (2)$$
Problem (2) can be addressed in two different ways. Either the full data set $\hat{X}$ is processed at once, resulting in one estimate for $U$ and $Y$, or the data vectors $\hat{x}$ are processed one at a time. In the latter case, while for each new sample $\hat{x}$ one obtains a corresponding optimal coordinate set $y$, the subspace estimate $U$ should vary smoothly over time and fit both recent and current observations. We will refer to the former problem as subspace reconstruction or Robust PCA and to the latter as subspace tracking.
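For concreteness, a minimal NumPy sketch of one common realization of the sampling operator $\mathcal{A}$, namely entrywise sampling via a boolean observation mask, is given below. The mask-based realization and all variable names are illustrative assumptions rather than part of the method description; with this choice $\mathcal{A}$ is self-adjoint, which is convenient when the adjoint $\mathcal{A}^*$ appears in the gradients later on.

```python
import numpy as np

def sample_operator(X, mask):
    """A(X): keep the observed entries of X (mask == True), set the rest to zero."""
    return np.where(mask, X, 0.0)

def sample_adjoint(R, mask):
    """A*(R): for entrywise sampling the operator is self-adjoint."""
    return np.where(mask, R, 0.0)

# sanity check of the adjoint property <A(X), R> = <X, A*(R)>
rng = np.random.default_rng(0)
m, n = 8, 6
X = rng.standard_normal((m, n))
R = rng.standard_normal((m, n))
mask = rng.random((m, n)) < 0.7        # roughly 70% of the entries are observed
X_hat = sample_operator(X, mask)       # incomplete observations, X^ = A(X)
assert np.isclose(np.sum(sample_operator(X, mask) * R),
                  np.sum(X * sample_adjoint(R, mask)))
```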
3 An alternating minimization framework using $\ell_0$-surrogates
Directly tackling problem (2) has two severe drawbacks. The first is the above-mentioned ambiguity in the factorization $UY$, and the second is the fact that reasonable sparsity measures are not smooth and thus forbid the use of fast smooth optimization methods. We overcome these two problems by proposing an alternating minimization framework that gets rid of the ambiguity by iterating on a more appropriate geometric setting and allows us to use smooth approximations of $h$, which combine the advantage of having fast smooth optimization tools at hand together with the strong capability of non-convex sparsity measures in reconstruction tasks.

So let $h_\mu \colon \mathbb{R}^{m \times n} \to \mathbb{R}$ with $\mu \in \mathbb{R}^+$ be a smooth approximation of $h$ such that $h_\mu$ converges pointwise to $h$ as $\mu$ tends to zero. Monotonically shrinking the smoothing parameter $\mu$ between alternating minimization steps allows us to combine the advantages of both smooth optimization and $\ell_0$-like sparsity measures. Schematically, we tackle problem (2) by iterating the following two steps until convergence.
Step #1 Let $L^{(i)} = U^{(i)} Y^{(i)}$ be the $i$-th iterate of the minimization process. In the first step, the estimate of the dominant subspace is improved by solving

$$U^{(i+1)} = \operatorname*{arg\,min}_{U \in \mathrm{St}_{k,m}} h_\mu\big(\hat{X} - \mathcal{A}(U U^\top L^{(i)})\big). \qquad (3)$$

We master the ambiguity problem by reformulating (3) as a minimization task on the set of symmetric rank-$k$ projectors, which possesses a manifold structure and is known as the Grassmannian

$$\mathrm{Gr}_{k,m} := \{P \in \mathbb{R}^{m \times m} \mid P = U U^\top,\ U \in \mathrm{St}_{k,m}\}. \qquad (4)$$

This results in the optimization problem

$$P^{(i+1)} = \operatorname*{arg\,min}_{P \in \mathrm{Gr}_{k,m}} h_\mu\big(\hat{X} - \mathcal{A}(P L^{(i)})\big), \qquad (5)$$

with $P^{(i+1)} = U^{(i+1)} (U^{(i+1)})^\top$.
Step #2 In the second step, the coordinates of the projected data points with respect to $U^{(i+1)}$ are adjusted by solving

$$Y^{(i+1)} = \operatorname*{arg\,min}_{Y \in \mathbb{R}^{k \times n}} h_\mu\big(\hat{X} - \mathcal{A}(U^{(i+1)} Y)\big). \qquad (6)$$

Then, $\mu$ is decreased prior to the next iteration. Since ordinary PCA of the data allows a reasonably good initialization of $U$ and $Y$, we initialize our algorithm with a truncated SVD of some matrix $X_0$ that is in accordance with the measurements, i.e. $\mathcal{A}(X_0) = \hat{X}$. The alternating scheme is summarized in Algorithm 1.
Algorithm 1 Alternating scheme for Robust PCA
Initialize:
  Choose $X_0$ s.t. $\mathcal{A}(X_0) = \hat{X}$.
  Obtain $U^{(0)}$ from the $k$ leading left singular vectors of $X_0$.
  $Y^{(0)} = U^{(0)\top} X_0$
  $L^{(0)} = U^{(0)} Y^{(0)}$
  $P^{(0)} = U^{(0)} U^{(0)\top}$
  Choose $\mu^{(0)}$ and $\mu^{(I)}$, compute $c_\mu = \big(\mu^{(I)} / \mu^{(0)}\big)^{1/(I-1)}$
for $i = 1 : I$ do
  $P^{(i+1)} = \operatorname*{arg\,min}_{P \in \mathrm{Gr}_{k,m}} h_{\mu^{(i)}}\big(\hat{X} - \mathcal{A}(P L^{(i)})\big)$   (Step #1)
  find $U^{(i+1)}$ s.t. $U^{(i+1)} U^{(i+1)\top} = P^{(i+1)}$
  $Y^{(i+1)} = \operatorname*{arg\,min}_{Y \in \mathbb{R}^{k \times n}} h_{\mu^{(i)}}\big(\hat{X} - \mathcal{A}(U^{(i+1)} Y)\big)$   (Step #2)
  $L^{(i+1)} = U^{(i+1)} Y^{(i+1)}$
  $\mu^{(i+1)} = c_\mu\, \mu^{(i)}$
end for
$\hat{L} = U^{(I)} Y^{(I)}$,  $\hat{S} = X - \hat{L}$
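For illustration, the initialization and the smoothing schedule of Algorithm 1 can be realized as in the following NumPy sketch. It assumes that $X_0$ is obtained by filling the unobserved entries with zeros (one simple choice satisfying $\mathcal{A}(X_0) = \hat{X}$ for an entrywise sampling operator) and leaves the two inner minimizations of Steps #1 and #2 to the CG solvers derived below; the function name and the default values are illustrative.

```python
import numpy as np

def init_and_schedule(X_hat, mask, k, mu_0=1.0, mu_I=1e-4, I=10):
    """Initialization of Algorithm 1 and the geometric schedule for mu.

    X_hat : observed data with zeros at the unobserved entries, X^ = A(X)
    mask  : boolean observation mask defining A
    k     : upper bound on the rank of L
    """
    # one simple choice of X0 with A(X0) = X^
    X0 = np.where(mask, X_hat, 0.0)

    # U(0) from the k leading left singular vectors of X0 (truncated SVD)
    U0 = np.linalg.svd(X0, full_matrices=False)[0][:, :k]
    Y0 = U0.T @ X0            # Y(0) = U(0)^T X0
    L0 = U0 @ Y0              # L(0) = U(0) Y(0)

    # geometric smoothing schedule: mu(i+1) = c_mu * mu(i), reaching mu(I) after I-1 steps
    c_mu = (mu_I / mu_0) ** (1.0 / (I - 1))
    mus = mu_0 * c_mu ** np.arange(I)
    return U0, Y0, L0, mus
```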
4 Optimizing the smooth sparsity measure
The proposed alternating minimization scheme involves the two non-convex but smooth optimization problems (5) and (6). We minimize both using a Conjugate Gradient type method due to its scalability and superlinear rate of convergence. The optimization methods are conceptually very similar but differ in the domains they operate on, as becomes clear from the cost functions and their respective gradients:

$$f_1 \colon \mathrm{Gr}_{k,m} \to \mathbb{R}, \quad P \mapsto h_\mu\big(\hat{X} - \mathcal{A}(PL)\big), \qquad \nabla f_1 = -\mathcal{A}^*\big(\nabla h_\mu(\hat{X} - \mathcal{A}(PL))\big)\, L^\top \qquad (7)$$

$$f_2 \colon \mathbb{R}^{k \times n} \to \mathbb{R}, \quad Y \mapsto h_\mu\big(\hat{X} - \mathcal{A}(UY)\big), \qquad \nabla f_2 = -U^\top \mathcal{A}^*\big(\nabla h_\mu(\hat{X} - \mathcal{A}(UY))\big) \qquad (8)$$
Note that $\mathcal{A}^*$ denotes the adjoint of the operator $\mathcal{A}$. Conjugate Gradient methods for minimization problems in the Euclidean case, such as (8), are standard and well established. In contrast to this, the core of our algorithm, i.e. minimizing (7), is a geometric optimization problem on the real Grassmannian and thus requires additional concepts such as vector transport and retraction.
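To make (7) and (8) concrete, the sketch below combines the entrywise mask realization of $\mathcal{A}$ (so that $\mathcal{A}^* = \mathcal{A}$) with one popular smooth $\ell_0$-surrogate, $h_\mu(Z) = \sum_{i,j} z_{ij}^2/(z_{ij}^2 + \mu)$; this particular surrogate is chosen here only for illustration, the measures actually evaluated in this paper are discussed in Section 6.1.

```python
import numpy as np

def h_mu(Z, mu):
    """Illustrative smooth l0-surrogate: sum_ij z_ij^2 / (z_ij^2 + mu)."""
    return np.sum(Z**2 / (Z**2 + mu))

def grad_h_mu(Z, mu):
    """Entrywise derivative of the surrogate: 2 mu z / (z^2 + mu)^2."""
    return 2.0 * mu * Z / (Z**2 + mu)**2

def grad_f1(P, L, X_hat, mask, mu):
    """Euclidean gradient of f1(P) = h_mu(X^ - A(P L)), cf. (7), with A* = A (masking)."""
    residual = X_hat - np.where(mask, P @ L, 0.0)
    return -np.where(mask, grad_h_mu(residual, mu), 0.0) @ L.T

def grad_f2(Y, U, X_hat, mask, mu):
    """Euclidean gradient of f2(Y) = h_mu(X^ - A(U Y)), cf. (8)."""
    residual = X_hat - np.where(mask, U @ Y, 0.0)
    return -U.T @ np.where(mask, grad_h_mu(residual, mu), 0.0)
```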
In the following, we will recall some general concepts and further collect the ingredients for our algorithm. In particular, we derive a new retraction that is crucial for the algorithm's computational performance. For a deeper and more general insight into the topic of Geometric Optimization we refer to the work of Absil et al (2008).
4.1 Geometry of the Grassmannian
Consider a projector $P \in \mathrm{Gr}_{k,m}$ as a point on the Grassmannian and let $\mathfrak{u}(m) := \{\Omega \in \mathbb{R}^{m \times m} \mid \Omega^\top = -\Omega\}$ be the set of skew-symmetric matrices. Using the Lie bracket operator $[Z_1, Z_2] = Z_1 Z_2 - Z_2 Z_1$, the set of elements in the tangent space of $\mathrm{Gr}_{k,m}$ at $P$ is given by $T_P \mathrm{Gr}_{k,m} = \{[P, \Omega] \mid \Omega \in \mathfrak{u}(m)\}$. In the following, we will endow $\mathbb{R}^{m \times m}$ with the Frobenius inner product $\langle Z_1, Z_2 \rangle := \operatorname{tr}(Z_1^\top Z_2)$, where $\operatorname{tr}(\cdot)$ denotes the trace operator, and consider the Riemannian metric on $\mathrm{Gr}_{k,m}$ accordingly as the restriction of this product to the respective tangent space. The orthogonal projection of an arbitrary point $Z \in \mathbb{R}^{m \times m}$ onto the tangent space at $P$ is

$$\pi_P \colon \mathbb{R}^{m \times m} \to T_P \mathrm{Gr}_{k,m}, \quad Z \mapsto [P, [P, Z_s]] \qquad (9)$$

with $Z_s = \tfrac{1}{2}(Z + Z^\top)$ being the symmetric part of $Z$.
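A direct full-matrix sketch of the projection (9), together with a numerical check that projecting a tangent vector again leaves it unchanged, reads as follows; it serves only as a reference implementation of the formula, the storage-efficient variant is derived in Section 4.3.

```python
import numpy as np

def project_tangent(P, Z):
    """pi_P(Z) = [P, [P, Z_s]] with Z_s the symmetric part of Z, cf. (9)."""
    Zs = 0.5 * (Z + Z.T)
    C = P @ Zs - Zs @ P        # [P, Z_s]
    return P @ C - C @ P       # [P, [P, Z_s]]

# check: pi_P is idempotent, so tangent vectors are left unchanged
rng = np.random.default_rng(1)
m, k = 10, 3
U = np.linalg.qr(rng.standard_normal((m, k)))[0]
P = U @ U.T                    # a point on the Grassmannian
xi = project_tangent(P, rng.standard_normal((m, m)))
assert np.allclose(project_tangent(P, xi), xi)
```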
It is crucial for optimization procedures on manifolds to establish a relation between elements of the tangent space and corresponding points on the manifold, which motivates the usage of retractions. Conceptually, a retraction at some point $P$ is a mapping from the tangent space at $P$ to $\mathrm{Gr}_{k,m}$ with a local rigidity condition that preserves gradients at $P$.
The generic way of locally parameterizing a smooth manifold is via Riemannian exponential mappings. As they are costly to compute in general, we perform an approximation based on the QR-decomposition. Let $\mathrm{Gl}(m)$ be the set of invertible $(m \times m)$-matrices and let $R(m) \subset \mathrm{Gl}(m)$ be the set of upper triangular matrices with positive entries on the diagonal. It follows from the Gram-Schmidt orthogonalization procedure that the QR-decomposition $O(m) \times R(m) \to \mathrm{Gl}(m)$, $(Q, R) \mapsto QR$ is a diffeomorphism. Accordingly, every $A \in \mathrm{Gl}(m)$ decomposes uniquely into $A =: A_Q A_R$ with $A_Q \in O(m)$ and $A_R \in R(m)$. Moreover, it follows that the map

$$q_\Omega \colon \mathbb{R} \to O(m), \quad q_\Omega(t) := \big(I_m + t\Omega\big)_Q \qquad (10)$$

is smooth for all $\Omega \in \mathfrak{u}(m)$, with derivative at $0$ being $\dot{q}_\Omega(0) = \Omega$, cf. Kleinsteuber and Hüper (2007). Using

$$\alpha'_P \colon T_P \mathrm{Gr}_{k,m} \to \mathrm{Gr}_{k,m}, \quad \alpha'_P(\xi) := q_{[\xi,P]}(1)\, P\, \big(q_{[\xi,P]}(1)\big)^\top \qquad (11)$$

and exploiting the properties of the exponential mapping, one can develop the following result:
Lemma 1 Consider arbitrary orthogonal matrices $\theta \in O(m)$. The mapping

$$\alpha_{P,\theta} \colon T_P \mathrm{Gr}_{k,m} \to \mathrm{Gr}_{k,m}, \quad \alpha_{P,\theta}(\xi) := \theta \big(q_{\theta^\top[\xi,P]\theta}(1)\big)\, \theta^\top P\, \theta \big(q_{\theta^\top[\xi,P]\theta}(1)\big)^\top \theta^\top \qquad (12)$$

defines a set of retractions on $\mathrm{Gr}_{k,m}$.
Proof 1 The first condition, namely $\alpha_{P,\theta}(0) = P$, follows straightforwardly. It remains to show that $\frac{d}{dt}\big|_{t=0}\, \alpha_{P,\theta}(t\xi) = \xi$.

Firstly, note that $q_{[t\xi,P]}(1) = q_{[\xi,P]}(t)$. Since it has been verified that $\dot{q}_\Omega(0) = \Omega$ (cf. Helmke et al (2007)) and $[\xi, P]$ is skew-symmetric, it follows that

$$\begin{aligned}
\frac{d}{dt}\Big|_{t=0}\, \alpha_{P,\theta}(t\xi) &= \theta\, \dot{q}_{\theta^\top[\xi,P]\theta}(0)\, \theta^\top P\, \theta \big(q_{\theta^\top[\xi,P]\theta}(0)\big)^\top \theta^\top + \theta\, q_{\theta^\top[\xi,P]\theta}(0)\, \theta^\top P\, \theta \big(\dot{q}_{\theta^\top[\xi,P]\theta}(0)\big)^\top \theta^\top \\
&= \theta\theta^\top[\xi,P]\theta\theta^\top P - P\,\theta\theta^\top[\xi,P]\theta\theta^\top \\
&= [P,[P,\xi]] \\
&= \xi.
\end{aligned}$$

To understand the last step of the equation, note that $\xi \in T_P \mathrm{Gr}_{k,m}$ and therefore $\xi$ is invariant under the projection $\pi_P(\cdot)$. □
As an associated vector transport, i.e. a mapping that for a given $\xi \in T_P \mathrm{Gr}_{k,m}$ transports the tangent element $\eta \in T_P \mathrm{Gr}_{k,m}$ along the retraction $\alpha_{P,\theta}(\xi)$ to the tangent space $T_{\alpha_{P,\theta}(\xi)} \mathrm{Gr}_{k,m}$, we choose

$$\tau_{\xi,P,\theta}(\eta) := \theta \big(q_{\theta^\top[\xi,P]\theta}(1)\big)\, \theta^\top \eta\, \theta \big(q_{\theta^\top[\xi,P]\theta}(1)\big)^\top \theta^\top. \qquad (13)$$

Note that in our algorithm the context of $\tau$ is always clear. Thus, we will drop the subscripts and simply write $\tau(\eta)$ for enhanced legibility.
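Equations (10)-(13) translate directly into the following full-matrix NumPy sketch, which also checks the two retraction properties numerically ($\alpha_{P,\theta}(0) = P$ and the first-order condition via a finite difference) and verifies that a transported vector indeed lies in the tangent space at the new point. The Q-factor is normalized so that the R-factor has a positive diagonal, matching the definition of $q_\Omega$; this is a readable reference only, the storage-efficient version follows in Section 4.3.

```python
import numpy as np

def qf(M):
    """Q-factor of M, normalized so that the R-factor has a positive diagonal."""
    Q, R = np.linalg.qr(M)
    d = np.sign(np.diag(R))
    d[d == 0] = 1.0
    return Q * d

def retraction(P, xi, theta=None):
    """alpha_{P,theta}(xi) as in (12); theta = I recovers (11)."""
    m = P.shape[0]
    theta = np.eye(m) if theta is None else theta
    q = qf(np.eye(m) + theta.T @ (xi @ P - P @ xi) @ theta)   # q_{theta^T [xi,P] theta}(1)
    return theta @ q @ theta.T @ P @ theta @ q.T @ theta.T

def transport(P, xi, eta, theta=None):
    """tau_{xi,P,theta}(eta) as in (13)."""
    m = P.shape[0]
    theta = np.eye(m) if theta is None else theta
    q = qf(np.eye(m) + theta.T @ (xi @ P - P @ xi) @ theta)
    return theta @ q @ theta.T @ eta @ theta @ q.T @ theta.T

def project_tangent(P, Z):
    """pi_P(Z), cf. (9); used here to generate tangent vectors and to test tangency."""
    Zs = 0.5 * (Z + Z.T)
    C = P @ Zs - Zs @ P
    return P @ C - C @ P

rng = np.random.default_rng(2)
m, k = 9, 3
U = qf(rng.standard_normal((m, k)))
P = U @ U.T
xi = project_tangent(P, rng.standard_normal((m, m)))
eta = project_tangent(P, rng.standard_normal((m, m)))

assert np.allclose(retraction(P, 0.0 * xi), P)                      # alpha(0) = P
t = 1e-6
assert np.allclose((retraction(P, t * xi) - P) / t, xi, atol=1e-3)  # derivative at 0 is xi

P_new = retraction(P, xi)
tau_eta = transport(P, xi, eta)
assert np.allclose(project_tangent(P_new, tau_eta), tau_eta)        # tangent at the new point
```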
4.2 CG on the Grassmannian
In the following, we sketch how the well-known nonlinear CG method extends to the Grassmannian for minimizing a smooth function $f \colon \mathrm{Gr}_{k,m} \to \mathbb{R}$. Recall that if $f$ is the restriction of a smooth function $\hat{f} \colon \mathbb{R}^{m \times m} \to \mathbb{R}$, the Riemannian gradient in the tangent space is given by

$$\operatorname{grad} f(P) = \pi_P(\nabla \hat{f}), \qquad (14)$$

where $\nabla \hat{f}$ is the common gradient of $\hat{f}$ in $\mathbb{R}^{m \times m}$. The CG method on the Grassmannian can be outlined as follows. Starting at an initial point $P^{(0)} \in \mathrm{Gr}_{k,m}$, the Riemannian gradient $\Gamma^{(0)}$ can be computed and $H^{(0)} = -\Gamma^{(0)}$ is selected as the initial search direction. In each iteration a suitable step size $t^{(i)}$ is determined using a backtracking line-search algorithm on the Grassmannian, cf. Algorithm 2. The new iterate is then obtained via $P^{(i+1)} = \alpha_{P^{(i)}}(t^{(i)} H^{(i)})$. Finally, the search direction

$$H^{(i+1)} = -\Gamma^{(i+1)} + \beta^{(i)} \tau(H^{(i)}) \qquad (15)$$

is updated, where we consider two update rules for $\beta^{(i)}$, namely

$$\beta^{(i)}_{\mathrm{FR}} = \frac{\langle \Gamma^{(i+1)}, \Gamma^{(i+1)} \rangle}{\langle \Gamma^{(i)}, \Gamma^{(i)} \rangle}, \qquad \beta^{(i)}_{\mathrm{HS}} = \frac{\langle \Gamma^{(i+1)},\, \Gamma^{(i+1)} - \tau(\Gamma^{(i)}) \rangle}{\langle \tau(H^{(i)}),\, \Gamma^{(i+1)} - \tau(\Gamma^{(i)}) \rangle}. \qquad (16)$$
The former is a Riemannian adaptation of the well-known Fletcher-Reeves update formula and guarantees convergence of our algorithm (see the subsequent remark). The latter is an adaptation of the Hestenes-Stiefel formula and typically leads to better convergence behavior in practice, which is why we use it for all our algorithms.
Algorithm 2 Backtracking line search on the Grassmannian
Choose $t_{\mathrm{init}} > 0$; $c, \rho \in (0,1)$ and set $t \leftarrow t_{\mathrm{init}}$
repeat
  $t \leftarrow \rho t$
until $f\big(\alpha_P(t H^{(i)})\big) \leq f(P) + c\, t\, \operatorname{tr}\big(\Gamma^{(i)\top} H^{(i)}\big)$
Choose step size $t^{(i)} := t$
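As a concrete end-to-end illustration, the following sketch runs one full-matrix CG step, i.e. Riemannian gradient (14), backtracking line search (Algorithm 2), retraction, vector transport and the Hestenes-Stiefel update (15)-(16), for a simple stand-in cost $f(P) = \tfrac{1}{2}\|X - PX\|_F^2$. The stand-in cost, the parameter values and the helper names are assumptions made only for this example; the actual cost is the smoothed sparsity measure from (7), and Algorithm 3 below replaces the full matrices by their compressed representations.

```python
import numpy as np

def qf(M):
    """Q-factor of M with a positive-diagonal R-factor."""
    Q, R = np.linalg.qr(M)
    d = np.sign(np.diag(R)); d[d == 0] = 1.0
    return Q * d

def retract(P, xi):
    """alpha_P(xi) from (11): q_{[xi,P]}(1) P q_{[xi,P]}(1)^T."""
    q = qf(np.eye(P.shape[0]) + (xi @ P - P @ xi))
    return q @ P @ q.T

def transport(P, xi, eta):
    """tau(eta) from (13) with theta = I."""
    q = qf(np.eye(P.shape[0]) + (xi @ P - P @ xi))
    return q @ eta @ q.T

def riem_grad(P, egrad):
    """grad f(P) = pi_P(Euclidean gradient), cf. (14) and (9)."""
    Zs = 0.5 * (egrad + egrad.T)
    C = P @ Zs - Zs @ P
    return P @ C - C @ P

def backtracking(f, P, Gamma, H, t_init=1.0, c=1e-4, rho=0.5):
    """Algorithm 2: shrink t until the Armijo condition is satisfied."""
    t, fP, slope = t_init, f(P), np.trace(Gamma.T @ H)
    while f(retract(P, t * H)) > fP + c * t * slope:
        t *= rho
    return t

# one CG step for the stand-in cost f(P) = 0.5 * ||X - P X||_F^2
rng = np.random.default_rng(3)
m, k = 12, 3
X = rng.standard_normal((m, 20))
U = qf(rng.standard_normal((m, k)))
P = U @ U.T
f = lambda P_: 0.5 * np.linalg.norm(X - P_ @ X) ** 2
egrad = lambda P_: -(X - P_ @ X) @ X.T          # Euclidean gradient of the stand-in cost

Gamma = riem_grad(P, egrad(P))
H = -Gamma
t = backtracking(f, P, Gamma, H)
P_new = retract(P, t * H)
assert f(P_new) < f(P)                           # Armijo guarantees descent

# Hestenes-Stiefel update (16) for the next search direction (15)
Gamma_new = riem_grad(P_new, egrad(P_new))
d_Gamma = Gamma_new - transport(P, t * H, Gamma)
beta = np.sum(Gamma_new * d_Gamma) / np.sum(transport(P, t * H, H) * d_Gamma)
H_new = -Gamma_new + beta * transport(P, t * H, H)
```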
Remark Convergence of the geometric CG method, which uses the retraction and vector transport as described above together with the Fletcher-Reeves update $\beta_{\mathrm{FR}}$ and the strong Wolfe-Powell condition, is guaranteed by a result of Ring and Wirth (2012) in the sense that $\liminf_{i \to \infty} \|\Gamma^{(i)}\| = 0$. Note that the Wolfe-Powell condition would require a slight modification of Algorithm 2. However, we do not stress this issue since it has no impact on the practical convergence behavior.
4.3 Implementation of CG on the Grassmannian
So far, the algorithm outlined above requires full $(m \times m)$-matrices for the iterates $P^{(i)}$, $\Gamma^{(i)}$ and $H^{(i)}$. This is a drastic limitation on the performance of a practical implementation. In this section we derive a new retraction and show how it can be used to avoid full matrix multiplication and to reduce the storage requirements tremendously. The key idea is to decompose the projection matrices $P \in \mathrm{Gr}_{k,m}$ into $P = UU^\top$ and to iterate on Stiefel matrices $U \in \mathrm{St}_{k,m}$ instead. Moreover, one can exploit the structure of the tangent space $T_I \mathrm{Gr}_{k,m}$ at the standard projector $I$, that is

$$I = \begin{bmatrix} I_k & 0 \\ 0 & 0 \end{bmatrix}, \qquad T_I \mathrm{Gr}_{k,m} = \left\{ \begin{bmatrix} 0 & A^\top \\ A & 0 \end{bmatrix} \;\middle|\; A \in \mathbb{R}^{(m-k) \times k} \right\}. \qquad (17)$$
Given $U$, a large QR-decomposition of $U$ yields a fast way of constructing

$$V := \begin{bmatrix} U & U^\perp \end{bmatrix} \in O(m), \qquad (18)$$

where $U^\perp \in \mathrm{St}_{(m-k),m}$ denotes a basis of the orthogonal complement of the subspace spanned by $U$. Then, if $P = UU^\top$ and $\xi \in T_P \mathrm{Gr}_{k,m}$, the identities

$$V^\top P V = I \qquad \text{and} \qquad V^\top \xi V = \begin{bmatrix} 0 & A^\top \\ A & 0 \end{bmatrix} \qquad (19)$$

hold for some $A \in \mathbb{R}^{(m-k) \times k}$. Therefore, instead of storing the full $P$ and $\xi$ it is sufficient to store $U$ and $A$. Formally, this defines the bijection

$$\mathrm{con}_V \colon T_P \mathrm{Gr}_{k,m} \to \mathbb{R}^{(m-k) \times k}, \qquad \mathrm{con}_V(\xi) = A. \qquad (20)$$
The projection onto $T_P \mathrm{Gr}_{k,m}$ follows from a straightforward calculation.

Lemma 2 Let $V = \begin{bmatrix} U & U^\perp \end{bmatrix}$, $P = UU^\top$ and $Z_s = \tfrac{1}{2}(Z + Z^\top)$ be defined as above and let the orthogonal projection onto the tangent space at $P$ be denoted by $\pi_P$. Then the identity

$$\mathrm{con}_V(\pi_P(Z)) = (U^\perp)^\top Z_s U \qquad (21)$$

holds.

Proof 2 Using the definition of $\pi_P(Z)$ and the fact that $V^\top P V = I$,

$$V^\top \pi_P(Z)\, V = \big[V^\top P V,\, V^\top [P, Z_s] V\big] = \big[I, [I, V^\top Z_s V]\big]$$

and the same structure as in (19) is obtained. □
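The identity (21) is easily verified numerically; the following sketch (with arbitrary test sizes) compares the lower-left block of $V^\top \pi_P(Z) V$ with the directly computed $(U^\perp)^\top Z_s U$, which avoids all $(m \times m)$-products on its side of the comparison.

```python
import numpy as np

rng = np.random.default_rng(4)
m, k = 10, 3

# V = [U, U_perp] from a large (complete) QR-decomposition, cf. (18)
V = np.linalg.qr(rng.standard_normal((m, k)), mode='complete')[0]
U, U_perp = V[:, :k], V[:, k:]
P = U @ U.T

Z = rng.standard_normal((m, m))
Zs = 0.5 * (Z + Z.T)

# full-matrix projection pi_P(Z) = [P, [P, Z_s]], cf. (9)
C = P @ Zs - Zs @ P
pi_PZ = P @ C - C @ P

# con_V reads off the lower-left block of V^T pi_P(Z) V, cf. (19)-(20)
A_block = (V.T @ pi_PZ @ V)[k:, :k]

# Lemma 2: the same matrix is obtained directly from U, U_perp and Z_s
assert np.allclose(A_block, U_perp.T @ Zs @ U)
```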
Lemma 1 allows us to choose a particular retraction that is easy to compute. Consider $A = \mathrm{con}_V(\xi)$ and let

$$A = \theta_A \begin{bmatrix} R \\ 0 \end{bmatrix} \qquad (22)$$

be the large QR-decomposition of $A$, with $\theta_A \in O(m-k)$ and $R$ an upper triangular (not necessarily invertible) $(k \times k)$-matrix. Furthermore, define

$$M(R) := \begin{bmatrix} I_k & -R^\top \\ R & I_k \end{bmatrix} \qquad (23)$$

and its Q-factor $\theta_M \in O(2k)$.
M
(cid:34) (cid:35)
I 0
Lemma3 Letα (ξ)bearetractionasin(12),whereθischosenasθ:=V k .
P,θ 0 θ
A
ThentheStiefelmatrix
(cid:34) (cid:35)(cid:34) (cid:35)
θ 0 I
U(cid:101):=θ M k ∈St (24)
0 I 0 k,m
m−2k
satisfiesα (ξ)=U(cid:101)U(cid:101)(cid:62).
P,θ
Proof 3 One can deduce $q_{\theta^\top[\xi,P]\theta}(1) = \big(I_m + \theta^\top[\xi,P]\theta\big)_Q = \begin{bmatrix} \theta_M & 0 \\ 0 & I_{m-2k} \end{bmatrix}$ from

$$\begin{aligned}
\theta^\top[\xi,P]\theta &= \begin{bmatrix} I_k & 0 \\ 0 & \theta_A^\top \end{bmatrix} \big[V^\top \xi V,\, V^\top P V\big] \begin{bmatrix} I_k & 0 \\ 0 & \theta_A \end{bmatrix} \\
&= \begin{bmatrix} I_k & 0 \\ 0 & \theta_A^\top \end{bmatrix} \begin{bmatrix} 0 & -A^\top \\ A & 0 \end{bmatrix} \begin{bmatrix} I_k & 0 \\ 0 & \theta_A \end{bmatrix} \\
&= \begin{bmatrix} 0 & -A^\top \theta_A \\ \theta_A^\top A & 0 \end{bmatrix} \\
&= \begin{bmatrix} 0 & \begin{bmatrix} -R^\top & 0 \end{bmatrix} \\ \begin{bmatrix} R \\ 0 \end{bmatrix} & 0 \end{bmatrix}.
\end{aligned}$$

Therefore, the retraction is

$$\begin{aligned}
\alpha_{P,\theta}(\xi) &= \theta \begin{bmatrix} \theta_M & 0 \\ 0 & I_{m-2k} \end{bmatrix} \theta^\top P\, \theta \begin{bmatrix} \theta_M^\top & 0 \\ 0 & I_{m-2k} \end{bmatrix} \theta^\top \\
&= V \begin{bmatrix} I_k & 0 \\ 0 & \theta_A \end{bmatrix} \begin{bmatrix} \theta_M & 0 \\ 0 & I_{m-2k} \end{bmatrix} \underbrace{\begin{bmatrix} I_k & 0 \\ 0 & \theta_A^\top \end{bmatrix} V^\top P V \begin{bmatrix} I_k & 0 \\ 0 & \theta_A \end{bmatrix}}_{I} \begin{bmatrix} \theta_M^\top & 0 \\ 0 & I_{m-2k} \end{bmatrix} \begin{bmatrix} I_k & 0 \\ 0 & \theta_A^\top \end{bmatrix} V^\top \\
&= V \begin{bmatrix} I_k & 0 \\ 0 & \theta_A \end{bmatrix} \begin{bmatrix} \theta_M & 0 \\ 0 & I_{m-2k} \end{bmatrix} \begin{bmatrix} I_k \\ 0 \end{bmatrix} \begin{bmatrix} I_k & 0 \end{bmatrix} \begin{bmatrix} \theta_M^\top & 0 \\ 0 & I_{m-2k} \end{bmatrix} \begin{bmatrix} I_k & 0 \\ 0 & \theta_A^\top \end{bmatrix} V^\top,
\end{aligned}$$

which completes the proof. □
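In code, the update (24) needs only the small factorizations (22) and (23) plus the complement basis $U^\perp$ from (18); the following NumPy sketch spells this out and checks numerically that the result is a Stiefel matrix whose projector agrees with $P + \xi$ to first order, as a retraction must. Variable names and test sizes are illustrative.

```python
import numpy as np

def qf(M):
    """QR factor pair with the R-factor normalized to a positive diagonal."""
    Q, R = np.linalg.qr(M)
    d = np.sign(np.diag(R)); d[d == 0] = 1.0
    return Q * d, R * d[:, None]

def retract_efficient(U, U_perp, A):
    """Stiefel update (24): returns U_new with alpha_{P,theta}(xi) = U_new U_new^T,
    where A = con_V(xi) is the compressed representation of the tangent vector."""
    k = U.shape[1]
    theta_A, R = qf(A)                          # reduced QR of A, cf. (22)
    M = np.block([[np.eye(k), -R.T],            # M(R), cf. (23)
                  [R, np.eye(k)]])
    theta_M, _ = qf(M)
    # first k columns of theta * diag(theta_M, I) in the basis V = [U, U_perp]
    return U @ theta_M[:k, :k] + U_perp @ (theta_A @ theta_M[k:, :k])

# first-order check: for a small tangent vector, U_new U_new^T ~ P + xi
rng = np.random.default_rng(5)
m, k = 12, 3
V = np.linalg.qr(rng.standard_normal((m, m)))[0]
U, U_perp = V[:, :k], V[:, k:]
P = U @ U.T
A = 1e-6 * rng.standard_normal((m - k, k))       # small compressed tangent coordinates
xi = U_perp @ A @ U.T + U @ A.T @ U_perp.T       # the tangent vector with con_V(xi) = A
U_new = retract_efficient(U, U_perp, A)
assert np.allclose(U_new.T @ U_new, np.eye(k))   # U_new is a Stiefel matrix
assert np.allclose(U_new @ U_new.T, P + xi, atol=1e-9)
```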
Similarly to the retraction, the vector transport from (13) is simplified, as described in the following.
Lemma 4 Let $\theta$, $\xi$, $P$, $\widetilde{U}$ be as above and $\widetilde{V} := \begin{bmatrix} \widetilde{U} & \widetilde{U}^\perp \end{bmatrix}$. Then for $\eta \in T_P \mathrm{Gr}_{k,m}$ and $B := \mathrm{con}_V(\eta)$, the identity $\mathrm{con}_{\widetilde{V}}(\tau_{\xi,P,\theta}(\eta)) = \theta_A^\top B$ holds.

Proof 4 From (19) we have $V^\top \eta V = \begin{bmatrix} 0 & B^\top \\ B & 0 \end{bmatrix}$. The vector transport can thus be written as

$$\tau_{\xi,P,\theta}(\eta) = \theta \begin{bmatrix} \theta_M & 0 \\ 0 & I_{m-2k} \end{bmatrix} \begin{bmatrix} I_k & 0 \\ 0 & \theta_A^\top \end{bmatrix} \begin{bmatrix} 0 & B^\top \\ B & 0 \end{bmatrix} \begin{bmatrix} I_k & 0 \\ 0 & \theta_A \end{bmatrix} \begin{bmatrix} \theta_M^\top & 0 \\ 0 & I_{m-2k} \end{bmatrix} \theta^\top.$$

Using the fact that

$$\theta^\top \begin{bmatrix} \widetilde{U} & \widetilde{U}^\perp \end{bmatrix} = \theta^\top \begin{bmatrix} \theta \begin{bmatrix} \theta_M & 0 \\ 0 & I_{m-2k} \end{bmatrix} \begin{bmatrix} I_k \\ 0 \end{bmatrix} & \theta \begin{bmatrix} \theta_M & 0 \\ 0 & I_{m-2k} \end{bmatrix} \begin{bmatrix} 0 \\ I_{m-k} \end{bmatrix} \end{bmatrix} = \begin{bmatrix} \theta_M & 0 \\ 0 & I_{m-2k} \end{bmatrix}$$

we can show that

$$\begin{aligned}
\widetilde{V}^\top \tau_{\xi,P,\theta}(\eta)\, \widetilde{V} &= \begin{bmatrix} \widetilde{U} & \widetilde{U}^\perp \end{bmatrix}^\top \theta \begin{bmatrix} \theta_M & 0 \\ 0 & I_{m-2k} \end{bmatrix} \begin{bmatrix} I_k & 0 \\ 0 & \theta_A^\top \end{bmatrix} \begin{bmatrix} 0 & B^\top \\ B & 0 \end{bmatrix} \begin{bmatrix} I_k & 0 \\ 0 & \theta_A \end{bmatrix} \begin{bmatrix} \theta_M^\top & 0 \\ 0 & I_{m-2k} \end{bmatrix} \theta^\top \begin{bmatrix} \widetilde{U} & \widetilde{U}^\perp \end{bmatrix} \\
&= \begin{bmatrix} I_k & 0 \\ 0 & \theta_A^\top \end{bmatrix} \begin{bmatrix} 0 & B^\top \\ B & 0 \end{bmatrix} \begin{bmatrix} I_k & 0 \\ 0 & \theta_A \end{bmatrix} \\
&= \begin{bmatrix} 0 & B^\top \theta_A \\ \theta_A^\top B & 0 \end{bmatrix}
\end{aligned}$$

and $\mathrm{con}_{\widetilde{V}}(\tau_{\xi,P,\theta}(\eta))$ follows again from the matrix structure. □
Algorithm 3 illustrates how the minimization of (2) on the Grassmannian is efficiently implemented in practice. Note that therein, $H$ and $G$ denote the images of the respective tangent elements under the bijection $\mathrm{con}_V$ and thus are of dimension $(m-k) \times k$. We do not further discuss the CG method in the Euclidean space as it is a standard method. In all minimization procedures, the convergence is evaluated by observing the progress in decreasing the cost function.
Algorithm 3 Implementation of CG on $\mathrm{Gr}_{k,m}$
input $U = U^{(i)}$
Obtain $V$ and $U^\perp$ from Eq. (18)
Compute $G = (U^\perp)^\top \nabla \tilde{f}(UU^\top)\, U$
$H = -G$
repeat
  Obtain $\theta_H$, $R$ from $H$ as in Eq. (22)
  Determine step size $t$ acc. to Algorithm 2
  Obtain $\theta_M$ from the QR-decomposition of $M(tR)$ (23)
  Update $U$ according to (24)
  Update $V$, $U^\perp$, $G$ as above
  Compute $\tau(G) = \theta_H^\top G$ and $\tau(H) = \theta_H^\top H$
  Update $H$ following (15) and (16)
until converged
output $U^{(i+1)}$