Learning Stable Group Invariant Representations
with Convolutional Networks

Joan Bruna, Arthur Szlam and Yann LeCun
Courant Institute
New York University
New York, NY, 10013
{bruna,lecun}@cims.nyu.edu

arXiv:1301.3537v1 [cs.AI] 16 Jan 2013

1 Introduction
Many signal categories in vision and auditory problems are invariant to the action of transformation groups, such as translations, rotations or frequency transpositions. This property motivates the study of signal representations which are also invariant to the action of these transformation groups. For instance, translation invariance can be achieved with a registration or with auto-correlation measures.

Transformation groups are in fact low-dimensional manifolds, and therefore mere group invariance is in general not enough to efficiently describe signal classes. Indeed, signals may be perturbed with additive noise and also with geometrical deformations, so one can then ask for invariant representations which are stable to these perturbations. Scattering convolutional networks [1] construct locally translation invariant signal representations, with additive and geometrical stability, by cascading complex wavelet modulus operators with a lowpass smoothing kernel. By defining wavelet decompositions on any locally compact Lie group, scattering operators can be generalized and cascaded to provide local invariance with respect to more general transformation groups [2,3]. Although such transformation groups are present across many recognition problems, they require prior information which sometimes cannot be assumed.
Convolutional networks [4] cascade filter banks with point-wise nonlinearities and local pooling operators. By remapping the output of each layer with the input of the following one, the trainable filters implement convolution operators. We show that the invariance properties built by deep convolutional networks can be cast as a form of stable group invariance. The network wiring architecture determines the invariance group, while the trainable filter coefficients characterize the group action.
Deep convolutional architectures cascade several layers of convolutions, non-linearities and pooling. These architectures have the capacity to generate local invariance to the action of more general groups. Under appropriate conditions, these groups can be factorized as products of smaller groups. Each of these factors can then be associated with a subset of consecutive layers of the convolutional network. In these conditions, the invariance properties of the final representation can be studied from the group structure generated by each layer.
2 Problem statement
2.1 Stable Group Invariance
A transformation group G acts on the input space X (assumed to be a Hilbert space) with a linear group action (g,x) ↦ g.x ∈ X, which is compatible with the group operation.

A signal representation Φ : X → Z is invariant to the action of G if ∀g ∈ G, x ∈ X, Φ(g.x) = Φ(x). However, mere group invariance is in general too weak, due to the presence of a much larger, high-dimensional variability which does not belong to the low-dimensional group. It is then necessary to incorporate the notion of outer "deformations" with another group action ϕ : H × X → X, where H is a larger group containing G. The geometric stability can be stated with a Lipschitz continuity property

    ‖Φ(ϕ(h,x)) − Φ(x)‖ ≤ C ‖x‖ k(h,G),   (1)

where k(h,G) measures the "distance" from h to the invariance group G. For instance, when G is the translation group of R^d and H ⊃ G is the group of C^2 diffeomorphisms of R^d, then ϕ(h,x) = x∘h and one can select as distance the elastic deformation metric k(h,G) := ‖∇τ‖_∞ + ‖Hτ‖_∞, where τ(u) = h(u) − u [2].
Even though the group invariance formalism describes global invariance properties of the representation, it also provides a valid and useful framework to study local invariance properties. Indeed, if one replaces (1) by

    ‖Φ(ϕ(h,x)) − Φ(x)‖ ≤ C ‖x‖ (‖h_G‖_G + k(h,G)),   (2)

where h_G is a projection of h onto G and ‖g‖_G is a metric on G measuring the amount of transformation being applied, then the local invariance is expressed by adjusting the proportionality between the two metrics.
2.2 Convolutional Networks
A generic convolutional network defined on a space X = L^2(Ω_0) of square-integrable signals starts with a filter bank {ψ_λ}_{λ∈Λ_1}, ψ_λ ∈ L^1(Ω_0) ∀λ, which for each input x(u) ∈ X produces the collection

    z^(1)(u,λ) = x ⋆ ψ_λ(u) = ∫ x(u−v) ψ_λ(v) dv,   u ∈ Ω_0, λ ∈ Λ_1.

If the filter bank defines a stable, invertible frame, then there exist two constants a, A > 0 such that

    ∀x,  a‖x‖ ≤ ‖z^(1)‖ ≤ A‖x‖,

where ‖z^(1)‖^2 = Σ_{λ∈Λ_1} ‖z^(1)(·,λ)‖^2. By defining Ω_1 = Ω_0 × Λ_1, the first layer of the network can be written as the linear mapping

    F_1 : L^2(Ω_0) → L^2(Ω_1)
          x(u) ↦ z^(1)(u,λ).
z^(1) is then transformed with a point-wise nonlinear operator M : L^2(Ω) → L^2(Ω) which is usually non-expansive, meaning that ‖Mz‖ ≤ ‖z‖. Finally, a local pooling operator P can be defined as any linear or nonlinear operator

    P : L^2(Ω) → L^2(Ω̃)

which reduces the resolution of the signal along one or more coordinates and which avoids "aliasing". If Ω = Ω_0 × Λ_1 × … × Λ_k and (2^{J_0}, …, 2^{J_k}) denote the loss of resolution along each coordinate, it results that Ω̃ = Ω̃_0 × Λ̃_1 × … × Λ̃_k, with |Ω̃_0| = 2^{−αJ_0}|Ω_0|, |Λ̃_i| = 2^{−αJ_i}|Λ_i|, where α is an oversampling factor. Linear pooling operators are implemented as lowpass filters φ_J(u, λ_1, …, λ_m) followed by a downsampling.
Then, a k-layer convolutional network is a cascade

    L^2(Ω_0) −M∘F_1→ L^2(Ω_1) −P_1→ L^2(Ω̃_1) −M∘F_2→ L^2(Ω_2) ··· −P_k→ L^2(Ω̃_k),   (3)

which produces successively z^(1), z^(2), …, z^(k).

The filter banks (F_i)_{i≤k}, together with the pooling operators (P_i)_{i≤k}, progressively transform the signal domain; filter bank steps lift the domain of definition by adding new coordinates, whereas pooling steps reduce the resolution along certain coordinates.
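The cascade (3) can be sketched numerically. The following minimal NumPy sketch uses a hypothetical two-filter bank, a modulus nonlinearity and average pooling, all chosen purely for illustration; it shows the lifting of the domain by the filter bank and the loss of resolution under pooling:

```python
import numpy as np

def filter_bank_layer(z, filters):
    """Lift F: convolve with each filter, adding a new coordinate lambda."""
    # z: (n,) signal; filters: list of kernels -> output of shape (n, len(filters))
    return np.stack([np.convolve(z, h, mode="same") for h in filters], axis=-1)

def modulus(z):
    """Point-wise non-expansive nonlinearity M (here, absolute value)."""
    return np.abs(z)

def pool(z, J):
    """Local average pooling P along the spatial axis: resolution drops by 2**J."""
    n = z.shape[0] // 2**J * 2**J
    return z[:n].reshape(-1, 2**J, *z.shape[1:]).mean(axis=1)

# A toy one-layer cascade P1 ∘ M ∘ F1, with an illustrative 2-filter bank.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)
filters = [np.array([1.0, 1.0]) / 2, np.array([1.0, -1.0]) / 2]  # lowpass, highpass
z1 = pool(modulus(filter_bank_layer(x, filters)), J=2)
print(z1.shape)  # (16, 2): resolution reduced from 64 to 16, plus the new lambda axis
```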
3 Invariance Properties of Convolutional Networks
3.1 The case of one-parameter transformation groups
Let us start by assuming the simplest form of variability produced by a transformation group. A one-parameter transformation group is a family {U_t}_{t∈R} of unitary linear operators of L^2(Ω) such that (i) t ↦ U_t is strongly continuous: lim_{t→t_0} U_t z = U_{t_0} z for every z ∈ L^2(Ω), and (ii) U_{t+s} = U_t U_s. One-parameter transformation groups are thus homeomorphic to R (with the addition as group operation), and define an action which is continuous in the group variable. Uni-dimensional translations U_t x(u) = x(u − t v_0), frequency transpositions U_t x = F^{−1}(Fx(ω − t ω_0)) (where F, F^{−1} are respectively the forward and inverse Fourier transform) or unitary dilations U_t x(u) = 2^{−t/2} x(2^{−t}u) are examples of one-parameter transformation groups.
One-parameter transformation groups are particularly simple to study thanks to Stone's theorem [5], which states that unitary one-parameter transformation groups are uniquely generated by a complex exponential of a self-adjoint operator:

    U_t = e^{itA},   t ∈ R.

Here, the complex exponential of a self-adjoint operator should be interpreted in terms of its spectrum. In the finite-dimensional case (when Ω is discrete), this means that there exists an orthogonal transform O such that if ẑ(ω) = Oz, then

    ∀z,  U_t z = O^{−1} diag(e^{itω}) ẑ(ω).   (4)

In other words, the group action can be expressed as a linear phase change in the basis which diagonalizes the unique self-adjoint operator A given by Stone's theorem. In the particular case of translations, the change of basis O is given by the Fourier transform. As a result, one can obtain a representation which is invariant to the action of {U_t}_t with a single layer of a neural network: a linear decomposition which expresses the data in the basis given by O followed by a point-wise complex modulus. In the case of the translation group, this corresponds to taking the modulus of the Fourier transform.
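For the translation group, this one-layer construction can be checked directly. The following NumPy sketch (signal length and shift amount are arbitrary choices) verifies that the Fourier modulus is unchanged by a circular translation:

```python
import numpy as np

# For translations, O is the Fourier transform: a circular shift becomes a
# linear phase change, so the point-wise modulus removes the group action.
rng = np.random.default_rng(0)
x = rng.standard_normal(128)
x_shifted = np.roll(x, 17)  # group action U_t for an arbitrary shift t = 17

phi = np.abs(np.fft.fft(x))
phi_shifted = np.abs(np.fft.fft(x_shifted))
print(np.allclose(phi, phi_shifted))  # True: the representation is invariant
```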
3.2 Presence of deformations
Stone’stheoremprovidesarecipeforglobalgroupinvarianceforstronglycontinuousgroupactions.
Without noise nor deformations, an invariant representation can be obtained by taking complex
moduliin a basiswhichdiagonalizesthegroupaction, which canbe implementedina shallow 1-
layerarchitecture. However,theunderlyinglow-dimensionalassumptionisrarelysatisfied, dueto
thepresenceofmorecomplexformsofvariability.
Thiscomplexvariabilitycanbemodeledasfollows.IfOisthebasiswhichdiagonalizesagivenone-
parametergroup,thenthe groupactionis expressedin the basisF−1O asthe translationoperator
T z(u)=z(u−s).Whereasthegroupactionconsistsinrigidtranslationsonthisbasis,byanalogy
s
adeformationisdefinedasanon-rigidwarpinginthisdomain: L z(u)=z(u−τ(u)),whereτ is
τ
adisplacementfieldalongtheindexesofthedecomposition.
The amount of deformation can be measured with the regularity of τ(u), which controls how distant the warping is from being a rigid translation and hence an element of the group. This suggests that, in order to obtain stability to deformations, rather than looking for eigenvectors of the infinitesimal group action, one should look for linear measurements which are well localized in the domain where deformations occur, and which nearly diagonalize the group action. In particular, these measurements can be implemented with convolutions using compactly supported filters, such as in convolutional networks.
Let z^(n)(u, λ_1, …, λ_n) be an intermediate representation in a convolutional network whose first layer is fully connected. Suppose that G is a group acting on z via

    g.z^(n)(u, λ_1, …, λ_n) = z^(n)(u, λ_1 + η(g), λ_2, …, λ_n),   (5)

where η : G → Λ_1. This corresponds to the idealized case where the transformation only modifies one component of the representation. A local pooling operator along the variable λ_1, at a certain scale 2^J, attenuates the transformation by g as soon as |η(g)| ≪ 2^J. It thus produces local invariance with respect to the action of G.
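This attenuation can be illustrated with a small NumPy sketch; the smooth signal, the displacement η(g) = 2 and the pooling scale 2^J = 64 are illustrative choices satisfying |η(g)| ≪ 2^J:

```python
import numpy as np

def local_pool(z, J):
    """Average pooling at scale 2**J along the lambda_1 axis."""
    return z.reshape(-1, 2**J).mean(axis=1)

# A smooth signal indexed by lambda_1, and the idealized action (5)
# shifting it by eta(g) = 2 coefficients.
lam = np.arange(256)
z = np.sin(2 * np.pi * lam / 256)
eta, J = 2, 6                      # |eta(g)| = 2 << 2**J = 64

rel = (np.linalg.norm(local_pool(np.roll(z, eta), J) - local_pool(z, J))
       / np.linalg.norm(local_pool(z, J)))
print(rel < 0.1)  # True: the pooled representation barely moves under g
```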
3.3 Group Factorization with Deep Networks
Deep convolutional networks have the capacity to learn complex relationships of the data and to build invariance with respect to a large family of transformations. These properties can be partly explained in terms of a factorization of the invariance groups performed successively.
Whereas pooling operators efficiently produce stable local invariance, convolution operators preserve the invariance generated by previous layers. Indeed, suppose z^(n)(u, λ_1) is an intermediate representation in a convolutional network, and that G acts on z^(n) via g.z^(n)(u, λ_1) = z^(n)(f(g,u), λ_1 + η(g)). It follows that if the next layer is constructed as

    z^(n+1)(u, λ_1, λ_2) := z^(n)(u, ·) ⋆ ψ_{λ_2}(λ_1),

then G acts on z^(n+1) via g.z^(n+1)(u, λ_1, λ_2) = z^(n+1)(f(g,u), λ_1 + η(g), λ_2), since convolutions commute with the group action, which by construction is expressed as a translation in the coefficients λ_1. The new coordinates λ_2 are thus unaffected by the action of G.
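The commutation argument can be checked numerically for circular convolutions, where the group action is an exact translation of the λ_1 coefficients (sizes and filters below are arbitrary):

```python
import numpy as np

def circ_conv(z, psi):
    """Circular convolution along lambda_1, computed via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(z) * np.fft.fft(psi, len(z))))

rng = np.random.default_rng(0)
z = rng.standard_normal(64)        # z^(n)(., lambda_1)
psi = rng.standard_normal(5)       # a filter psi_{lambda_2}
eta = 7                            # eta(g): the action translates lambda_1

lhs = circ_conv(np.roll(z, eta), psi)   # act first, then convolve
rhs = np.roll(circ_conv(z, psi), eta)   # convolve first, then act
print(np.allclose(lhs, rhs))  # True: convolution commutes with the action
```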
As a consequence, this property enables a systematic procedure to generate invariance to groups of the form G = G_1 ⋊ G_2 ⋊ … ⋊ G_s, where H_1 ⋊ H_2 denotes the semidirect product of groups. In this decomposition, each factor G_i is associated with a range of convolutional layers, along the coordinates where the action of G_i is perceived.
4 Perspectives
The connections between group invariance and deep convolutional networks offer an interpretation of their efficiency on several recognition tasks. In particular, they might explain why the weight sharing induced by convolutions is a valid regularization method in presence of group variability.

More concretely, we shall also concentrate on the following aspects:
• Group Discovery. One might ask for the group of transformations which best explains the variability observed in a given dataset {x_i}. In the case where no geometric deformations are present, one can start by learning the (complex) eigenvectors of the group action:

      U* = argmin_{UᵀU = 1} var(|U x_i|).

  When the data corresponds to a uniform measure on the group, then this decomposition can be obtained from the diagonalization of the covariance operator Σ = E(x xᵀ). In that case, the real eigenvectors of Σ are grouped into pairs of vectors with identical eigenvalue, which then define the complex decomposition diagonalizing the group action.
  In presence of deformations, the global invariance is replaced by a measure of local invariance. This problem is closely related to the sparse coding with slowness from [6].
• Structured Convolutional Networks. Groups offer a powerful framework to incorporate structure into the families of filters, similarly as in [7]. On the one hand, one can enforce global properties of the group by defining the convolutions accordingly. For instance, by wrapping the domain of the convolution, one is enforcing a periodic group to emerge. On the other hand, one could further regularize the learning by enforcing a group structure within a filter bank. For instance, one could ask a certain filter bank F = {h_1, …, h_n} to have the form F = {R_θ h_0}_θ, where R_θ is a rotation with an angle θ.
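The group-discovery recipe of the first point can be sketched for the translation group. In the following NumPy sketch (template size chosen arbitrarily), the data consists of all circular shifts of one template, i.e. a uniform measure on the group; the resulting covariance is circulant, the Fourier basis diagonalizes it, and its eigenvalues come in identical pairs at frequencies ω and −ω:

```python
import numpy as np

# Dataset {x_i}: all circular shifts of a single template.
rng = np.random.default_rng(0)
d = 32
x0 = rng.standard_normal(d)
X = np.stack([np.roll(x0, t) for t in range(d)])

Sigma = X.T @ X / d                     # covariance operator E(x x^T), circulant
eig_cov = np.sort(np.linalg.eigvalsh(Sigma))

# The Fourier basis diagonalizes Sigma: its eigenvalues are |x0_hat(w)|^2 / d,
# which pair up at frequencies w and -w, matching pairs of real eigenvectors.
spec = np.abs(np.fft.fft(x0)) ** 2 / d
print(np.allclose(eig_cov, np.sort(spec)))   # True: Fourier diagonalizes Sigma
print(np.allclose(spec[1:], spec[1:][::-1])) # True: eigenvalues come in pairs
```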
References
[1] J. Bruna, S. Mallat, "Invariant Scattering Convolutional Networks", IEEE TPAMI, 2012.
[2] S. Mallat, "Group Invariant Scattering", CPAM, 2012.
[3] L. Sifre, S. Mallat, "Combined Scattering for Rotation Invariant Texture Analysis", ESANN, 2012.
[4] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, "Gradient-Based Learning Applied to Document Recognition", Proc. IEEE, 1998.
[5] M. H. Stone, "On One-Parameter Unitary Groups in Hilbert Space", Ann. of Mathematics, 1932.
[6] C. Cadieu, B. Olshausen, "Learning Transformational Invariants from Natural Movies", NIPS, 2009.
[7] K. Gregor, A. Szlam, Y. LeCun, "Structured Sparse Coding via Lateral Inhibition", NIPS, 2011.