A Unified RGB-T Saliency Detection Benchmark:
Dataset, Baselines, Analysis and A Novel Approach
Chenglong Li, Guizhao Wang, Yunpeng Ma, Aihua Zheng, Bin Luo, and Jin Tang
Abstract—Despite significant progress, image saliency detection still remains a challenging task in complex scenes and environments. Integrating multiple different but complementary cues, like RGB and Thermal (RGB-T), may be an effective way for boosting saliency detection performance. The current research in this direction, however, is limited by the lack of a comprehensive benchmark. This work contributes such an RGB-T image dataset, which includes 821 spatially aligned RGB-T image pairs and their ground truth annotations for saliency detection purposes. The image pairs are of high diversity, recorded under different scenes and environmental conditions, and we annotate 11 challenges on these image pairs for performing the challenge-sensitive analysis of different saliency detection algorithms. We also implement 3 kinds of baseline methods with different modality inputs to provide a comprehensive comparison platform. With this benchmark, we propose a novel approach, multi-task manifold ranking with cross-modality consistency, for RGB-T saliency detection. In particular, we introduce a weight for each modality to describe the reliability, and integrate them into the graph-based manifold ranking algorithm to achieve adaptive fusion of different source data. Moreover, we incorporate the cross-modality consistent constraints to integrate different modalities collaboratively. For the optimization, we design an efficient algorithm to iteratively solve several subproblems with closed-form solutions. Extensive experiments against other baseline methods on the newly created benchmark demonstrate the effectiveness of the proposed approach, and we also provide basic insights and potential future research directions for RGB-T saliency detection.

Index Terms—RGB-T benchmark, Saliency detection, Cross-modality consistency, Manifold ranking, Fast optimization.

In this paper, we contribute a comprehensive image benchmark for RGB-T saliency detection, and the following two main aspects are considered in creating this benchmark.

• A good dataset should be of reasonable size, high diversity and low bias [4]. Therefore, we use our recording system to collect 821 RGB-T image pairs in different scenes and environmental conditions, and each image pair is aligned and annotated with ground truth. In addition, the category, size, number and spatial information of salient objects are also taken into account to enhance the diversity and challenge, and we present some statistics of the created dataset to analyze the diversity and bias. To analyze the challenge-sensitive performance of different algorithms, we annotate 11 different challenges according to the above-mentioned factors.

• To the best of our knowledge, RGB-T saliency detection remains not well investigated. Therefore, we implement some baseline methods to provide a comparison platform. On one hand, we regard RGB or thermal images as inputs of some popular methods to achieve single-modality saliency detection. These baselines can be utilized to identify the importance and complementarity of RGB and thermal information by comparing with RGB-T saliency detection methods. On the other hand, we concatenate the features extracted from the RGB and thermal modalities as the RGB-T feature representations, and employ some popular methods to achieve RGB-T saliency detection.
I. INTRODUCTION

IMAGE saliency detection is a fundamental and active problem in computer vision. It aims at highlighting salient foreground objects automatically from background, and has received increasing attention due to its wide range of applications in computer vision and graphics, such as object recognition, content-aware retargeting, video compression, and image classification. Despite significant progress, image saliency detection still remains a challenging task in complex scenes and environments.

Recently, integrating RGB data and thermal data (RGB-T data) has been proved to be effective in several computer vision problems, such as moving object detection [1, 2] and tracking [3]. Given the potentials of RGB-T data, however, the research of RGB-T saliency detection is limited by the lack of a comprehensive image benchmark.

Salient object detection has been extensively studied in past decades, and numerous models and algorithms have been proposed based on different mathematical principles or priors [5–15]. Most methods measured saliency by local center-surround contrast and rarity of features over the entire image [5, 12, 16, 17]. In contrast, Gopalakrishnan et al. [18] formulated the object detection problem as a binary segmentation or labelling task on a graph. The most salient seeds and several background seeds were identified by the behavior of random walks on a complete graph and a k-regular graph. Then, a semi-supervised learning technique was used to infer the binary labels of the unlabelled nodes. Different from it, Yang et al. [19] employed the manifold ranking technique for salient object detection, which requires only seeds from one class, initialized with either the boundary priors or foreground cues. They then extended their work with several improvements [20], including multi-scale graph construction and a cascade scheme on a multi-layer representation. Based on the manifold ranking algorithms, Li et al. [21] generated pixel-wise saliency maps via regularized random walks ranking, and Wang et al. [22] proposed a new graph model which captured local/global contrast and effectively utilized the boundary prior.

With the created benchmark, we propose a novel approach, multi-task manifold ranking with cross-modality consistency, for RGB-T saliency detection. For each modality, we employ the idea of graph-based manifold ranking [19] for its good saliency detection performance in terms of accuracy and speed. Then, we assign each modality a weight to describe the reliability, which is capable of dealing with occasional perturbation or malfunction of individual sources, to achieve adaptive fusion of multiple modalities. To better exploit the relations among modalities, we impose the cross-modality consistent constraints on the ranking functions of different modalities to integrate them collaboratively. Considering the manifold ranking in each modality as an individual task, our method is essentially formulated as a multi-task learning problem. For the optimization, we jointly optimize the modality weights and the ranking functions of multiple modalities by iteratively solving two subproblems with closed-form solutions.

This paper makes the following three major contributions for RGB-T image saliency detection and related applications.

• It creates a comprehensive benchmark for facilitating the evaluation of different RGB-T saliency detection algorithms. The benchmark dataset includes 821 aligned RGB-T image pairs with the annotated ground truths, and we also present the fine-grained annotations with 11 challenges to allow us to analyze the challenge-sensitive performance of different algorithms. Moreover, we implement 3 kinds of baseline methods with different inputs (RGB, thermal and RGB-T) for evaluations. This benchmark will be available online for free academic usage¹.

• It proposes a novel approach, multi-task manifold ranking with cross-modality consistency, for RGB-T saliency detection. In particular, we introduce a weight for each modality to represent the reliability, and incorporate the cross-modality consistent constraints to achieve adaptive and collaborative fusion of different source data. The modality weights and ranking functions are jointly optimized by iteratively solving several subproblems with closed-form solutions.

• It presents extensive experiments against other state-of-the-art image saliency methods with 3 kinds of inputs. The evaluation results demonstrate the effectiveness of the proposed approach. Through analyzing quantitative results, we further provide basic insights and identify the potentials of thermal information in RGB-T saliency detection.

The rest of this paper is organized as follows. Sect. II introduces details of the RGB-T saliency detection benchmark. The proposed model and the associated optimization algorithm are presented in Sect. III, and the RGB-T saliency detection approach is introduced in Sect. IV. The experimental results and analysis are shown in Sect. V. Sect. VI concludes the paper.

The authors are with Anhui University, Hefei 230601, China. Email: [email protected].
¹RGB-T saliency detection benchmark's webpage: http://chenglongli.cn/people/lcl/journals.html.

II. RGB-T IMAGE SALIENCY BENCHMARK

In this section, we introduce our newly created RGB-T saliency benchmark, which includes the dataset with statistic analysis, baseline methods with different inputs, and evaluation metrics.

A. Dataset

We collect 821 RGB-T image pairs by our recording system, which consists of an online thermal imager (FLIR A310) and a CCD camera (SONY TD-2073). For alignment, we uniformly select a number of point correspondences in each image pair, and compute the homography matrix by the least-square algorithm. It is worth noting that this registration method can accurately align image pairs due to the following two reasons. First, we carefully choose the planar and non-planar scenes to make the homography assumption effective. Second, since the two camera views are almost coincident in our setup, the transformation between the two views is simple. As each image pair is aligned, we annotate the pixel-level ground truth using the more reliable modality. Fig. 1 shows some sample image pairs and their ground truths.

The image pairs in our dataset are recorded in approximately 60 scenes with different environmental conditions, and the category, size, number and spatial information of salient objects are also taken into account for enhancing the diversity and challenge. Specifically, the following main aspects are considered in creating the RGB-T image dataset.

• Illumination condition. The image pairs are captured under different light conditions, such as sunny, snowy, and nighttime. The low illumination and illumination variation caused by different light conditions usually bring big challenges in RGB images.

• Background factor. Two background factors are taken into account for our dataset. First, background similar to the salient objects in appearance or temperature will introduce ambiguous information. Second, it is difficult to separate objects accurately from cluttered background.

• Salient object attribute. We take different attributes of salient objects, including category (more than 60 categories), size (see the size distribution in Fig. 2 (b)) and number, into account in constructing our dataset for high diversity.

• Object location. Most methods employ the spatial information (center and boundary of an image) of the salient objects as priors, which is verified to be effective. However, some salient objects are not at the center or cross image boundaries, and these situations invalidate the spatial priors. We incorporate these factors into our dataset construction to increase its challenge, and Fig. 2 presents the spatial distribution of salient objects on CB and CIB.

Considering the above-mentioned factors, we annotate 11 challenges for our dataset to facilitate the challenge-sensitive performance analysis of different algorithms. They are: big salient object (BSO), small salient object (SSO), multiple salient objects (MSO), low illumination (LI), bad weather (BW), center bias
Fig. 1. Sample image pairs with annotated ground truths and challenges from our RGB-T dataset.
TABLE II
LIST OF THE BASELINE METHODS WITH THE USED FEATURES, THE MAIN TECHNIQUES AND THE PUBLISHED INFORMATION.

Algorithm   | Feature                                  | Technique                                     | Book Title        | Year
MST [15]    | Lab & Intensity                          | Minimum spanning tree                         | IEEE CVPR         | 2016
RRWR [21]   | Lab                                      | Regularized random walks ranking              | IEEE CVPR         | 2015
CA [23]     | Lab                                      | Cellular Automata                             | IEEE CVPR         | 2015
GMR [19]    | Lab                                      | Graph-based manifold ranking                  | IEEE CVPR         | 2013
STM [8]     | LUV & Spatial information                | Scale-based tree model                        | IEEE CVPR         | 2013
GR [24]     | Lab                                      | Graph regularization                          | IEEE SPL          | 2013
NFI [25]    | Lab & Orientations & Spatial information | Nonlinear feature integration                 | Journal of Vision | 2013
MCI [26]    | Lab & Spatial information                | Multiscale context integration                | IEEE TPAMI        | 2012
SS-KDE [27] | Lab                                      | Sparse sampling and kernel density estimation | SCIA              | 2011
BR [28]     | Lab & Intensity & Motion                 | Bayesian reasoning                            | ECCV              | 2010
SR [29]     | Lab                                      | Self-resemblance                              | Journal of Vision | 2009
SRM [16]    | Spectrum                                 | Spectral residual model                       | IEEE CVPR         | 2007
(CB), cross image boundary (CIB), similar appearance (SA), thermal crossover (TC), image clutter (IC), and out of focus (OF). Tab. I shows the details, and Fig. 2 (a) presents the challenge distribution. We will analyze the performance of different algorithms on each specific challenge using the fine-grained annotations in the experimental section.

B. Baseline Methods

To provide a comparison platform, we implement 3 kinds of baseline methods with different modality inputs. On one hand, we regard RGB or thermal images as inputs of 12 popular methods to achieve single-modality saliency detection, including MST [15], RRWR [21], CA [23], GMR [19], STM [8], GR [24], NFI [25], MCI [26], SS-KDE [27], BR [28], SR [29] and SRM [16]. Tab. II presents the details. These baselines can be utilized to identify the importance and complementarity of RGB and thermal information by comparing with RGB-T saliency detection methods. On the other hand, we concatenate the features extracted from the RGB and thermal modalities as the RGB-T feature representations, and employ the above-mentioned methods to achieve RGB-T saliency
detection.

TABLE I
LIST OF THE ANNOTATED CHALLENGES OF OUR RGB-T DATASET.

Challenge | Description
BSO | Big Salient Object - the ratio of ground truth salient objects over image is more than 0.26.
SSO | Small Salient Object - the ratio of ground truth salient objects over image is less than 0.05.
LI  | Low Illumination - the environmental illumination is low.
BW  | Bad Weather - the image pairs are recorded in bad weathers, such as snowy, rainy, hazy and cloudy.
MSO | Multiple Salient Objects - the number of the salient objects in the image is more than 1.
CB  | Center Bias - the centers of salient objects are far away from the image center.
CIB | Cross Image Boundary - the salient objects cross the image boundaries.
SA  | Similar Appearance - the salient objects have similar color or shape to the background.
TC  | Thermal Crossover - the salient objects have similar temperature to the background.
IC  | Image Clutter - the image is cluttered.
OF  | Out of Focus - the image is out-of-focus.

Fig. 2. Dataset statistics. (a) Challenge distribution. (b) Size distribution. (c) Spatial distribution of CB. (d) Spatial distribution of CIB.

C. Evaluation Metrics

There exist several metrics to evaluate the agreement between subjective annotations and experimental predictions. In this work, we use (P)recision-(R)ecall curves (PR curves), the F0.3 metric and Mean Absolute Error (MAE) to evaluate all the algorithms. Given the saliency map binarized with a threshold value from 0 to 255, precision means the ratio of the correctly assigned salient pixel number in relation to all the detected salient pixel number, and recall means the ratio of the correct salient pixel number in relation to the ground truth number. Different from PR curves using a fixed threshold for every image, the F0.3 metric exploits an adaptive threshold of each image to perform the evaluation. The adaptive threshold is defined as:

T = \frac{2}{w \times h} \sum_{i=1}^{w} \sum_{j=1}^{h} S(i,j),   (1)

where w and h denote the width and height of an image, respectively, and S is the computed saliency map. The F-measure (F) is defined as follows with the precision (P) and recall (R) of the above adaptive threshold:

F_{\beta^2} = \frac{(1+\beta^2) \times P \times R}{\beta^2 \times P + R},   (2)

where we set \beta^2 = 0.3 to emphasize the precision as suggested in [17]. PR curves and the F0.3 metric are aimed at quantitative comparison, while MAE better estimates the overall dissimilarity between a saliency map S and the ground truth G, and is defined as:

MAE = \frac{1}{w \times h} \sum_{i=1}^{w} \sum_{j=1}^{h} |S(i,j) - G(i,j)|.   (3)

Fig. 3. Illustration of graph construction.

III. GRAPH-BASED MULTI-TASK MANIFOLD RANKING

The graph-based ranking problem is described as follows: given a graph and a node in this graph as query, the remaining nodes are ranked based on their affinities to the given query. The goal is to learn a ranking function that defines the relevance between unlabelled nodes and queries.

This section will introduce the graph-based multi-task manifold ranking model and the associated optimization algorithm. The optimized modality weights and ranking scores will be utilized for RGB-T saliency detection in the next section.

A. Graph Construction

Given a pair of RGB-T images, we regard the thermal image as one of the image channels, and then employ the SLIC algorithm [30] to generate n non-overlapping superpixels. We take these superpixels as nodes to construct a graph G = (V, E), where V is a node set and E is a set of undirected edges. In this work, any two nodes in V are connected if one of the following conditions holds: 1) they are neighboring; 2) they share common boundaries with the same neighboring node; 3) they are both on the four sides of the image, i.e., boundary nodes. Fig. 3 shows the details. The first and second conditions are employed to capture local smoothness cues as neighboring
superpixels tend to share similar appearance and saliency values. The third condition attempts to reduce the geodesic distance of similar superpixels. It is worth noting that we can explore more cues in RGB and thermal data to construct an adaptive graph that makes the best use of the intrinsic relationship among superpixels. We will study this issue in future work, as this paper puts an emphasis on the multi-task manifold ranking algorithm.

If nodes V_i and V_j are connected, we assign the edge a weight as:

W^k_{ij} = e^{-\gamma_k ||c^k_i - c^k_j||}, k = 1, 2, ..., K,   (4)

where c^k_i denotes the mean of the i-th superpixel in the k-th modality, and \gamma_k is a scaling parameter.

Fig. 4. Illustration of the effectiveness of introducing the modality weights and the cross-modality consistency. (a) Input RGB and thermal images. (b) Results of our method without modality weights and without cross-modality consistency, shown in the first and second rows, respectively. (c) Our results and the corresponding ground truth.
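As a concrete illustration of Eq. (4), the sketch below builds an affinity matrix W^k from superpixel mean features under a given connection rule. This is a minimal NumPy sketch, not the released implementation; the function name, the toy feature vectors and the adjacency list are our own illustrative assumptions.

```python
import numpy as np

def edge_weights(mean_features, adjacency, gamma):
    """Affinity of Eq. (4): W_ij = exp(-gamma * ||c_i - c_j||) for
    connected superpixel pairs (i, j), and 0 otherwise."""
    n = mean_features.shape[0]
    W = np.zeros((n, n))
    for i, j in adjacency:
        w = np.exp(-gamma * np.linalg.norm(mean_features[i] - mean_features[j]))
        W[i, j] = W[j, i] = w  # undirected graph
    return W

# Toy example: 4 superpixels on a chain plus one boundary-to-boundary link.
features = np.array([[0.1], [0.2], [0.8], [0.9]])
W = edge_weights(features, [(0, 1), (1, 2), (2, 3), (0, 3)], gamma=24.0)
D = np.diag(W.sum(axis=1))  # degree matrix used in Eqs. (5)-(6)
```

The degree matrix D computed at the end is the one that normalizes the smoothness term in the ranking objectives below.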
B. Multi-Task Manifold Ranking with Cross-Modality Consistency

We first review the algorithm of graph-based manifold ranking that exploits the intrinsic manifold structure of data for graph labeling [31]. Given a superpixel feature set X = {x_1, ..., x_n} \in R^{d \times n}, some superpixels are labeled as queries and the rest need to be ranked according to their affinities to the queries, where n denotes the number of superpixels. Let s : X \to R^n denote a ranking function that assigns a ranking value s_i to each superpixel x_i, and s can be viewed as a vector s = [s_1, ..., s_n]^T. In this work, we regard the query labels as initial superpixel saliency values, and s is thus an initial superpixel saliency vector. Let y = [y_1, ..., y_n]^T denote an indication vector, in which y_i = 1 if x_i is a query, and y_i = 0 otherwise. Given G, the optimal ranking of queries is computed by solving the following optimization problem:

\min_s \frac{1}{2} \left( \sum_{i,j=1}^{n} W_{ij} \left\| \frac{s_i}{\sqrt{D_{ii}}} - \frac{s_j}{\sqrt{D_{jj}}} \right\|^2 + \mu \| s - y \|^2 \right),   (5)

where D = diag{D_{11}, ..., D_{nn}} is the degree matrix with D_{ii} = \sum_j W_{ij}, and diag indicates the diagonal operation. \mu is a parameter to balance the smoothness term and the fitting term.

Then, we apply manifold ranking on multiple modalities, and have

\min_{s^k} \frac{1}{2} \left( \sum_{i,j=1}^{n} W^k_{ij} \left\| \frac{s^k_i}{\sqrt{D^k_{ii}}} - \frac{s^k_j}{\sqrt{D^k_{jj}}} \right\|^2 + \mu \| s^k - y \|^2 \right), k = 1, 2, ..., K.   (6)

From Eq. (6), we can see that it inherently assumes that the available modalities are independent and contribute equally. This may significantly limit the performance in dealing with occasional perturbation or malfunction of individual sources. Therefore, we propose a novel collaborative model for robustly performing salient object detection that i) adaptively integrates different modalities based on their respective reliabilities, and ii) collaboratively computes the ranking functions of multiple modalities by incorporating the cross-modality consistent constraints. The formulation of the multi-task manifold ranking algorithm is proposed as follows:

\min_{s^k, r^k} \frac{1}{2} \sum_{k=1}^{K} \left( (r^k)^2 \sum_{i,j=1}^{n} W^k_{ij} \left\| \frac{s^k_i}{\sqrt{D^k_{ii}}} - \frac{s^k_j}{\sqrt{D^k_{jj}}} \right\|^2 + \mu \| s^k - y \|^2 \right) + \| \Gamma \circ (1 - r) \|^2 + \lambda \sum_{k=2}^{K} \| s^k - s^{k-1} \|^2,   (7)

where \Gamma = [\Gamma^1, ..., \Gamma^K]^T is an adaptive parameter vector, which is initialized after the first iteration (see Alg. 1), and r = [r^1, ..., r^K]^T is the modality weight vector. \circ denotes the element-wise product, and \lambda is a balance parameter. The third term is to avoid overfitting of r, and the last term encodes the cross-modality consistent constraints. The effectiveness of introducing these two terms is presented in Fig. 4. With some simple algebra, Eq. (7) can be rewritten as:

\min_{s^k, r^k} \frac{1}{2} \sum_{k=1}^{K} \left( (r^k)^2 \sum_{i,j=1}^{n} W^k_{ij} \left\| \frac{s^k_i}{\sqrt{D^k_{ii}}} - \frac{s^k_j}{\sqrt{D^k_{jj}}} \right\|^2 + \mu \| s^k - y \|^2 \right) + \| \Gamma \circ (1 - r) \|^2 + \lambda \| CS \|_F^2,   (8)

where S = [s^1; s^2; ...; s^K] \in R^{nK \times 1}, and C \in R^{n(K-1) \times nK} denotes the cross-modality consistent constraint matrix, which is defined as:

C = \begin{bmatrix} I_{2,1} & -I_2 & 0 & \cdots & 0 \\ 0 & I_{3,2} & -I_3 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & I_{K,K-1} & -I_K \end{bmatrix},   (9)

where I_k and I_{k,k-1} are identity matrices with the size of n \times n.

C. Optimization Algorithm

We present an alternating algorithm to optimize Eq. (8) efficiently, and denote

J(S, r) = \frac{1}{2} \sum_{k=1}^{K} \left( (r^k)^2 \sum_{i,j=1}^{n} W^k_{ij} \left\| \frac{s^k_i}{\sqrt{D^k_{ii}}} - \frac{s^k_j}{\sqrt{D^k_{jj}}} \right\|^2 + \mu \| s^k - y \|^2 \right) + \| \Gamma \circ (1 - r) \|^2 + \lambda \| CS \|_F^2.   (10)
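The structure of the constraint matrix C in Eq. (9) can be checked numerically: C stacks the differences between the ranking vectors of adjacent modalities, so the Frobenius penalty ||CS||_F^2 recovers the pairwise consistency term of Eq. (7). A minimal NumPy sketch, our own illustration, assuming all blocks are plain n×n identities:

```python
import numpy as np

def consistency_matrix(n, K):
    """Block matrix C of Eq. (9): block-row k holds I at block k and -I
    at block k+1, so C @ S stacks the differences s^k - s^(k+1)."""
    C = np.zeros((n * (K - 1), n * K))
    for k in range(K - 1):
        C[k*n:(k+1)*n, k*n:(k+1)*n] = np.eye(n)
        C[k*n:(k+1)*n, (k+1)*n:(k+2)*n] = -np.eye(n)
    return C

# Check: ||C S||_F^2 equals the pairwise consistency penalty of Eq. (7).
n, K = 5, 3
rng = np.random.default_rng(0)
s = [rng.standard_normal(n) for _ in range(K)]
S = np.concatenate(s)  # S = [s^1; ...; s^K]
lhs = np.sum((consistency_matrix(n, K) @ S) ** 2)
rhs = sum(np.sum((s[k] - s[k - 1]) ** 2) for k in range(1, K))
```

The sign of each block does not matter for the squared norm, which is why the penalty is symmetric in adjacent modalities.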
Algorithm 1 Optimization Procedure to Eq. (8)
Input: The matrices A^k = I - (D^k)^{-1/2} W^k (D^k)^{-1/2}, the indication vector Y, and the parameters \mu and \lambda; set r^k = 1/K, \epsilon = 10^{-4}, maxIter = 50.
Output: S, r.
1: for t = 1 : maxIter do
2:   Update S by Eq. (13);
3:   if t == 1 then
4:     for k = 1 : K do
5:       \Gamma^k = \sqrt{(s^k)^T A^k s^k};
6:     end for
7:   end if
8:   for k = 1 : K do
9:     Update r^k by Eq. (15);
10:  end for
11:  if |J_t - J_{t-1}| < \epsilon then
12:    Terminate the loop.
13:  end if
14: end for

Algorithm 2 Proposed Approach
Input: One RGB-T image pair, the parameters \gamma_k, \mu_1, \mu_2 and \lambda.
Output: Saliency map \bar{s}.
1: // Graph construction
2: Construct the graph with superpixels as nodes, and calculate the affinity matrix W^k and the degree matrix D^k with the parameter \gamma_k;
3: // First-stage ranking
4: Utilize the boundary priors to generate the background queries;
5: Run Alg. 1 with the parameters \mu_1 and \lambda to obtain the initial saliency map s_fs (Eq. (16));
6: // Second-stage ranking
7: Compute the foreground queries using the adaptive thresholds;
8: Run Alg. 1 with the parameters \mu_2 and \lambda to compute the final saliency map \bar{s} (Eq. (17)).
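The alternating procedure of Alg. 1 can be sketched as follows. This is a simplified NumPy illustration of the updates of Eqs. (13) and (15), not the authors' released code; the helper names and the toy ring graph in the usage example are our own assumptions, and the product RR^T \circ A is realized by an element-wise multiply.

```python
import numpy as np

def optimize(A_list, y, mu=0.02, lam=0.03, max_iter=50, eps=1e-4):
    """Alternating optimization in the spirit of Alg. 1.

    A_list: per-modality matrices A^k = I - D^{-1/2} W^k D^{-1/2} (n x n).
    y:      indication vector of the queries, shared across modalities.
    Returns the stacked ranking vector S (nK,) and modality weights r (K,).
    """
    K, n = len(A_list), A_list[0].shape[0]
    # Block-diagonal A, stacked queries Y, consistency matrix C (Eq. (9)).
    A = np.zeros((n * K, n * K))
    for k, Ak in enumerate(A_list):
        A[k*n:(k+1)*n, k*n:(k+1)*n] = Ak
    Y = np.tile(y, K)
    C = np.zeros((n * (K - 1), n * K))
    for k in range(K - 1):
        C[k*n:(k+1)*n, k*n:(k+1)*n] = np.eye(n)
        C[k*n:(k+1)*n, (k+1)*n:(k+2)*n] = -np.eye(n)
    r = np.full(K, 1.0 / K)  # initialization r^k = 1/K
    gamma2, J_prev = None, np.inf
    for t in range(max_iter):
        R = np.repeat(r, n)  # R = [r^1,...,r^1, ..., r^K,...,r^K]
        # S-update, Eq. (13): S = ((RR^T o A + lam C^T C)/mu + I)^{-1} Y
        M = (np.outer(R, R) * A + lam * C.T @ C) / mu + np.eye(n * K)
        S = np.linalg.solve(M, Y)
        s = S.reshape(K, n)
        E = np.array([s[k] @ A_list[k] @ s[k] for k in range(K)])
        if t == 0:  # (Gamma^k)^2 = (s^k)^T A^k s^k after the first iteration
            gamma2 = np.maximum(E, 1e-12)
        r = 1.0 / (1.0 + E / gamma2)  # r-update, Eq. (15)
        # Objective value (up to constants) for the stopping test
        J = 0.5 * (np.sum(r**2 * E) + mu * np.sum((s - y)**2)) \
            + np.sum(gamma2 * (1 - r)**2) + lam * np.sum((C @ S)**2)
        if abs(J - J_prev) < eps:
            break
        J_prev = J
    return S, r

# Toy usage on a 6-node ring graph with two modalities (hypothetical data).
def ring_Ak(n, w):
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i + 1) % n] = W[(i + 1) % n, i] = w
    Dinv = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
    return np.eye(n) - Dinv @ W @ Dinv  # normalized-Laplacian form of A^k

y = np.array([1.0, 0, 0, 0, 0, 0])
S, r = optimize([ring_Ak(6, 1.0), ring_Ak(6, 0.5)], y)
```

Each A^k is positive semi-definite (a normalized Laplacian), so the linear system of the S-update is always well conditioned and the weights r^k stay in (0, 1].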
Given r, Eq. (10) can be written as:

J(S) = \frac{1}{2} \sum_{k=1}^{K} \left( (r^k)^2 \sum_{i,j=1}^{n} W^k_{ij} \left\| \frac{s^k_i}{\sqrt{D^k_{ii}}} - \frac{s^k_j}{\sqrt{D^k_{jj}}} \right\|^2 + \mu \| s^k - y \|^2 \right) + \lambda \| CS \|_F^2,   (11)

and we reformulate it as follows:

J(S) = (R \circ S)^T A (R \circ S) + \mu \| S - Y \|^2 + \lambda \| CS \|_F^2,   (12)

where Y = [y^1; y^2; ...; y^K] \in R^{nK \times 1}, and A is a block-diagonal matrix defined as A = diag{A^1, A^2, ..., A^K} \in R^{nK \times nK} with A^k = I - (D^k)^{-1/2} W^k (D^k)^{-1/2}, and R = [r^1; ...; r^1; r^2; ...; r^2; ...; r^K; ...; r^K] \in R^{nK \times 1}. Taking the derivative of J(S) with respect to S, we have

S = \left( \frac{RR^T \circ A + \lambda C^T C}{\mu} + I \right)^{-1} Y,   (13)

where I is an identity matrix with the size of nK \times nK.

Given S, Eq. (10) can be written as:

J(r^k) = (r^k)^2 (s^k)^T A^k s^k + (\Gamma^k)^2 (1 - r^k)^2,   (14)

and we take the derivative of J(r^k) with respect to r^k, and obtain

r^k = \frac{1}{1 + (s^k)^T A^k s^k / (\Gamma^k)^2}, k = 1, 2, ..., K.   (15)

A sub-optimal solution can be achieved by alternating between the updating of S and r, and the whole algorithm is summarized in Alg. 1. Although the global convergence of the proposed algorithm is not proved, we empirically validate its fast convergence in our experiments. The optimized ranking functions {s^k} and modality weights {r^k} will be utilized for RGB-T saliency detection in the next section.

IV. TWO-STAGE RGB-T SALIENCY DETECTION

In this section, we present the two-stage ranking scheme for unsupervised bottom-up RGB-T saliency detection using the proposed algorithm with boundary priors and foreground queries.

A. Saliency Measure

Given an input RGB-T image pair represented as a graph and some salient query nodes, the saliency of each node is defined as its ranking score computed by Eq. (8). In conventional ranking problems, the queries are manually labelled with the ground truth. In this work, we first employ the boundary prior widely used in other works [11, 19] to highlight the salient superpixels, and select highly confident superpixels (low ranking scores in all modalities) belonging to the foreground objects as the foreground queries. Then, we perform the proposed algorithm to obtain the final ranking results, and combine them with their modality weights to compute the final saliency map.

B. Ranking with Boundary Priors

Based on the attention theories for visual saliency [32], we regard the boundary nodes as background seeds (the labelled data) to rank the relevances of all other superpixel nodes in the first stage.

Taking the bottom image boundary as an example, we utilize the nodes on this side as labelled queries and the rest as the unlabelled data, and initialize the indicator y in Eq. (8). Given y, the ranking values {s^k_b} are computed by employing the proposed ranking algorithm, and we normalize s^k_b as \hat{s}^k_b with the range between 0 and 1. Similarly, given the top, left and right image boundaries, we can obtain the respective ranking values {\hat{s}^k_t}, {\hat{s}^k_l}, {\hat{s}^k_r}. We integrate them to compute the initial saliency map for each modality in the first stage:

s^k_{fs} = (1 - \hat{s}^k_b) \circ (1 - \hat{s}^k_t) \circ (1 - \hat{s}^k_l) \circ (1 - \hat{s}^k_r), k = 1, 2, ..., K.   (16)
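The combination of Eq. (16) can be sketched as below: normalize each boundary ranking map to [0, 1], take its complement, and multiply the four sides element-wise. This is a minimal NumPy illustration with hypothetical inputs; the min-max normalization is an assumption on our part, since the text only states that the ranking values are normalized to the range between 0 and 1.

```python
import numpy as np

def first_stage_saliency(s_b, s_t, s_l, s_r):
    """Eq. (16): complement the four normalized boundary ranking maps
    and multiply them element-wise to get the first-stage saliency."""
    def norm01(v):  # min-max normalization to [0, 1] (assumed scheme)
        v = np.asarray(v, dtype=float)
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)
    out = np.ones(np.asarray(s_b, dtype=float).shape)
    for side in (s_b, s_t, s_l, s_r):
        out = out * (1.0 - norm01(side))
    return out
```

A superpixel that ranks high against every boundary (background) query ends up with low first-stage saliency, and vice versa.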
Fig. 5. PR curves of the proposed approach against other baseline methods with RGB-T input on the entire dataset and several subsets, where the F0.3 values are shown in the legend. (a) BSO subset. (b) BW subset. (c) CB subset. (d) CIB subset. (e) IC subset. (f) LI subset. (g) MSO subset. (h) OF subset. (i) SA subset. (j) SSO subset. (k) TC subset. (l) Entire dataset.
C. Ranking with Foreground Queries

Given s^k_{fs} of the k-th modality, we set an adaptive threshold T_k = max(s^k_{fs}) - \epsilon to generate the foreground queries, where max(\cdot) indicates the maximum operation, and \epsilon is a constant, which is fixed to 0.25 in this work. Specifically, we select the i-th superpixel as a foreground query of the k-th modality if s^k_{fs,i} > T_k. We then compute the ranking values s^k_{ss} and the modality weights r in the second stage by employing our ranking algorithm. Similar to the first stage, we normalize the ranking value s^k_{ss} of the k-th modality as \hat{s}^k_{ss} with the range between 0 and 1. Finally, the final saliency map can be obtained by combining the ranking values with the modality weights:

\bar{s} = \sum_{k=1}^{K} (r^k \hat{s}^k_{ss}).   (17)

V. EXPERIMENTS

In this section, we apply the proposed approach on our RGB-T benchmark and compare it with other baseline methods. The source codes and result figures will be provided with the benchmark for public usage in the community.

A. Experimental Settings

For fair comparisons, we fix all parameters and other settings of our approach in the experiments, and use the default parameters released in the public codes of the other baseline methods.

In graph construction, we empirically generate n = 300 superpixels and set the affinity parameters \gamma_1 = 24 and \gamma_2 = 12, which control the edge strength between two superpixel nodes. In Alg. 1 and Alg. 2, we empirically set \lambda = 0.03, \mu_1 = 0.02 (the first stage) and \mu_2 = 0.06 (the second stage). Herein, we use a bigger balance weight in the second stage than in the first stage, as the refined foreground queries are more confident than the background queries generated from the boundary prior.

B. Comparison Results

To justify the importance of thermal information, its complementary benefits to image saliency detection and the effectiveness of the proposed approach, we evaluate the compared
TABLE III
AVERAGE PRECISION, RECALL, AND F-MEASURE OF OUR METHOD AGAINST DIFFERENT KINDS OF BASELINE METHODS ON THE NEWLY CREATED DATASET. THE CODE TYPE AND RUNTIME (SECOND) ARE ALSO PRESENTED. THE BOLD FONTS OF RESULTS INDICATE THE BEST PERFORMANCE, AND "M" IS THE ABBREVIATION OF MATLAB.

Algorithm   | RGB (P / R / F)       | Thermal (P / R / F)   | RGB-T (P / R / F)     | Code Type | Runtime
BR [28]     | 0.724 / 0.260 / 0.411 | 0.648 / 0.413 / 0.488 | 0.804 / 0.366 / 0.520 | M&C++     | 8.23
SR [29]     | 0.425 / 0.523 / 0.377 | 0.361 / 0.587 / 0.362 | 0.484 / 0.584 / 0.432 | M         | 1.60
SRM [16]    | 0.411 / 0.529 / 0.384 | 0.392 / 0.520 / 0.380 | 0.428 / 0.575 / 0.411 | M         | 0.76
CA [23]     | 0.592 / 0.667 / 0.568 | 0.623 / 0.607 / 0.573 | 0.648 / 0.697 / 0.618 | M         | 1.14
MCI [26]    | 0.526 / 0.604 / 0.485 | 0.445 / 0.585 / 0.435 | 0.547 / 0.652 / 0.515 | M&C++     | 21.89
NFI [25]    | 0.557 / 0.639 / 0.532 | 0.581 / 0.599 / 0.541 | 0.564 / 0.665 / 0.544 | M         | 12.43
SS-KDE [27] | 0.581 / 0.554 / 0.532 | 0.510 / 0.635 / 0.497 | 0.528 / 0.656 / 0.515 | M&C++     | 0.94
GMR [19]    | 0.644 / 0.603 / 0.587 | 0.700 / 0.574 / 0.603 | 0.694 / 0.624 / 0.615 | M         | 1.11
GR [24]     | 0.621 / 0.582 / 0.534 | 0.639 / 0.544 / 0.545 | 0.705 / 0.593 / 0.600 | M&C++     | 2.43
STM [8]     | 0.658 / 0.569 / 0.581 | 0.647 / 0.603 / 0.579 | - / - / -             | C++       | 1.54
MST [15]    | 0.627 / 0.739 / 0.610 | 0.665 / 0.655 / 0.598 | - / - / -             | C++       | 0.53
RRWR [21]   | 0.642 / 0.610 / 0.589 | 0.689 / 0.580 / 0.596 | - / - / -             | C++       | 2.99
Ours        | - / - / -             | - / - / -             | 0.716 / 0.713 / 0.680 | M&C++     | 1.39

TABLE IV
AVERAGE MAE SCORE OF OUR METHOD AGAINST DIFFERENT KINDS OF BASELINE METHODS ON THE NEWLY CREATED DATASET. THE BOLD FONTS OF RESULTS INDICATE THE BEST PERFORMANCE.

MAE   | CA    | NFI   | SS-KDE | GMR   | GR    | BR    | SR    | SRM   | MCI   | STM   | MST   | RRWR  | Ours
RGB   | 0.163 | 0.126 | 0.122  | 0.172 | 0.197 | 0.269 | 0.300 | 0.199 | 0.211 | 0.194 | 0.127 | 0.171 | 0.109
T     | 0.225 | 0.124 | 0.132  | 0.232 | 0.199 | 0.323 | 0.218 | 0.155 | 0.176 | 0.208 | 0.129 | 0.234 | 0.141
RGB-T | 0.195 | 0.125 | 0.127  | 0.202 | 0.199 | 0.297 | 0.260 | 0.178 | 0.195 | -     | -     | -     | 0.107
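The scores reported in Tabs. III and IV follow the metrics of Eqs. (1)-(3) in Sect. II-C. A minimal NumPy sketch of those three definitions, our own illustration; the per-image PR-curve sweep over the fixed thresholds 0-255 is omitted:

```python
import numpy as np

def adaptive_threshold(S):
    """Eq. (1): twice the mean saliency of map S (values in [0, 1])."""
    w, h = S.shape
    return 2.0 / (w * h) * S.sum()

def f_measure(S, G, beta2=0.3):
    """Eqs. (1)-(2): binarize S at the adaptive threshold, then compute
    F = (1 + beta^2) P R / (beta^2 P + R) against binary ground truth G."""
    B = S > adaptive_threshold(S)
    tp = np.logical_and(B, G).sum()
    P = tp / max(B.sum(), 1)  # precision over detected salient pixels
    R = tp / max(G.sum(), 1)  # recall over ground-truth salient pixels
    if P + R == 0:
        return 0.0
    return (1 + beta2) * P * R / (beta2 * P + R)

def mae(S, G):
    """Eq. (3): mean absolute error between S and the ground truth G."""
    return np.abs(S - G.astype(float)).mean()
```

With beta2 = 0.3, the F-measure weights precision more heavily than recall, matching the setting suggested in [17].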
methods with different modality inputs on the newly created benchmark, which has been introduced in Sect. II.

Overall performance. We first report the precision (P), recall (R) and F-measure (F) of the 3 kinds of methods on the entire dataset in Tab. III. Herein, as the public source codes of STM, MST and RRWR are encrypted, we only run these methods on the single modalities. From the evaluation results, we can observe that the proposed approach substantially outperforms all baseline methods. This comparison clearly demonstrates the effectiveness of our approach in adaptively incorporating thermal information. In addition, the thermal data are effective in boosting image saliency detection and complementary to RGB data, as shown by the superior performance of the RGB-T baselines over both the RGB and thermal methods. We also report the MAE of the 3 kinds of methods on the entire dataset in Tab. IV, the results of which are almost consistent with Tab. III. The evaluation results further validate the effectiveness of the proposed approach, the importance of thermal information and the complementary benefits of RGB-T data.

Fig. 6 shows some sample results of our approach against other baseline methods with different inputs. The evaluation results show that the proposed approach can detect salient objects more accurately than other methods by adaptively and collaboratively leveraging RGB-T data. It is worth noting that some results using a single modality are better than those using RGB-T data. This is because the redundant information introduced by a noisy or malfunctioning modality sometimes affects the fusion results in a bad way.

Challenge-sensitive performance. To evaluate the RGB-T methods on subsets with different attributes (big salient object (BSO), small salient object (SSO), multiple salient objects (MSO), low illumination (LI), bad weather (BW), center bias (CB), cross image boundary (CIB), similar appearance (SA), thermal crossover (TC), image clutter (IC), and out of focus (OF); see Sect. II for details) and to facilitate the analysis of performance under different challenging factors, we present the PR curves in Fig. 5. From the results we can see that our approach outperforms the other RGB-T methods with a clear margin on most challenges, except for BSO and BW. This validates the effectiveness of our method. In particular, under occasional perturbation or malfunction of one modality (e.g., LI, SA and TC), our method can effectively incorporate the other modality's information to detect salient objects robustly, justifying the complementary benefits of multiple source data.

For BSO, CA [23] achieves superior performance over ours, which may be attributed to the fact that CA takes the global color distinction and the global spatial distance into account to better capture global and local information. We can alleviate this problem by improving the graph construction to explore more relations among superpixels. For BW, most methods perform badly, but MCI obtains a big performance gain over ours, the second best one. This suggests that considering multiple cues, like low-level considerations, global considerations, visual organizational rules and high-level factors, can handle extreme challenges in RGB-T saliency detection, and we will integrate them into our framework to improve the robustness.

C. Analysis of Our Approach

We discuss the details of our approach by analyzing its main components, efficiency and limitations.

Components. To justify the significance of the main components of the proposed approach, we implement two special versions for comparative analysis, including: 1) Ours-I, that
Fig. 6. Sample results of the proposed approach and other baseline methods with different modality inputs. (a) Input RGB and thermal image pair and their ground truth. (b-i) The results of the baseline methods (BR, SR, SRM, CA, MCI, NFI, SS-KDE, GMR) with RGB, thermal and RGB-T inputs. (j-m) The results of the baseline methods (GR, STM, RRWR, MST) with RGB and thermal inputs. (n) The results of our approach.
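The PR curves used for the overall and challenge-sensitive comparisons (Fig. 5) are obtained by sweeping the binarization threshold over each saliency map and averaging per-threshold precision and recall. A minimal sketch of this standard protocol, assuming saliency maps scaled to [0, 1] (again an illustration, not the authors' evaluation code):

```python
import numpy as np

def pr_curve(saliency, gt, num_thresh=256):
    """Precision-recall pairs from sweeping a binarization threshold.

    saliency : float array in [0, 1]; gt : binary array (1 = salient).
    Returns arrays of precisions and recalls, one pair per threshold.
    """
    precisions, recalls = [], []
    for t in np.linspace(0.0, 1.0, num_thresh):
        pred = saliency >= t                       # binarize at threshold t
        tp = np.logical_and(pred, gt == 1).sum()   # true-positive pixels
        precisions.append(tp / max(pred.sum(), 1))
        recalls.append(tp / max((gt == 1).sum(), 1))
    return np.array(precisions), np.array(recalls)
```

Plotting precision against recall over all thresholds, averaged across a dataset subset (e.g., the LI or TC images), gives one per-challenge curve.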
Fig. 7. PR curves, the representative precision, recall and F-measure, and MAE of the proposed approach and its variants on the entire dataset.

Fig. 8. Average convergence curve of the proposed approach on the entire dataset.
removes the modality weights in the proposed ranking model, i.e., fixes r^k = 1/K in Eq. (8), and 2) Ours-II, that removes the cross-modality consistent constraints in the proposed ranking model, i.e., sets λ = 0 in Eq. (8).

The PR curves, the representative precision, recall and F-measure, and the MAE are presented in Fig. 7, and we can draw the following conclusions. 1) Our method substantially outperforms Ours-I. This demonstrates the significance of the introduced weight variables in achieving adaptive fusion of different source data. 2) The complete algorithm achieves superior performance over Ours-II, validating the effectiveness of the cross-modality consistent constraints.

Efficiency. The runtime of our approach against the other methods is presented in Tab. III. It is worth mentioning that our approach is comparable with GMR [19], mainly due to the fast optimization of the proposed ranking model. The experiments are carried out on a desktop with an Intel i7 3.4 GHz CPU and 16 GB RAM, and implemented on a mixed platform of C++ and MATLAB without any optimization. Fig. 8 shows the convergence curve of the proposed approach. Although it involves 5 rounds of optimization of the ranking model, our method costs only about 1.39 seconds per image pair due to the efficiency of our optimization algorithm, which converges within approximately 5 iterations.

We also report the runtime of the other main procedures of the proposed approach at the typical resolution of 640×480 pixels. 1) The over-segmentation by the SLIC algorithm takes about 0.52 second. 2) The feature extraction takes approximately 0.24 second. 3) The first stage, including 4 ranking processes, costs about 0.42 second. 4) The second stage, including 1 ranking process, spends approximately 0.14 second. The over-segmentation and the feature extraction are the most time-consuming procedures (about 55% of the total). Hence, by introducing more efficient over-segmentation algorithms and feature extraction implementations, we can achieve much better computation times with our approach.

Limitations. We also present two failure cases generated by our method in Fig. 9. The reliable weights are sometimes wrongly estimated due to the effect of cluttered backgrounds, as shown in Fig. 9 (a), where the modality weights of the RGB and thermal data are 0.97 and 0.95, respectively. In such circumstances, our method will generate bad detection results. This problem could be tackled by incorporating a measure of background clutter when optimizing the reliable weights, and will be addressed in our future work. In addition to the reliable weight computation, our approach has another major limitation. The first-stage ranking relies on the boundary prior, and thus salient objects fail to be detected when they cross image boundaries. The insufficient foreground queries obtained after the first stage usually result in bad saliency detection performance in the second stage, as shown in Fig. 9 (b). We will handle this issue by selecting more accurate boundary queries according to other prior cues in future work.

Fig. 9. Two failure cases of our method. The input RGB and thermal images, the ground truth and the results generated by our method are shown in (a) and (b), respectively.

D. Discussions on RGB-T Saliency Detection

We observe from the evaluations that integrating RGB data and thermal data boosts saliency detection performance (see Tab. III). The improvements are even bigger when encountering certain challenges, i.e., low illumination, similar appearance, image clutter and thermal crossover (see Fig. 5), demonstrating the importance of thermal information in image saliency detection and the complementary benefits of RGB-T data.

In addition, directly integrating RGB and thermal information sometimes leads to worse results than using a single modality (see Fig. 6), as redundant information is introduced by a noisy or malfunctioning modality. We can address this issue by adaptively fusing the different modal information (our method), which can automatically determine the contribution weights of the different modalities to alleviate the effects of redundant information.

From the evaluation results, we also observe the following research potentials for RGB-T saliency detection. 1) The ensemble of multiple models or algorithms (CA [23]) can achieve robust performance. 2) Some principles are crucial for effective saliency detection, e.g., global considerations (CA [23]), boundary priors (Ours, GMR [19], RRWR [21], MST [15]) and multiscale context (STM [8]). 3) The exploitation of more relations among pixels or superpixels is important for highlighting the salient objects (STM [8], MST [15]).

VI. CONCLUSION

In this paper, we have presented a comprehensive image benchmark for RGB-T saliency detection, which includes a dataset, three kinds of baselines and four evaluation metrics. With the benchmark, we have proposed a graph-based multi-task manifold ranking algorithm to achieve adaptive and collaborative fusion of RGB and thermal data in RGB-T saliency detection. Through analyzing the quantitative and qualitative results, we have demonstrated the effectiveness of the proposed approach, and also provided some basic insights and potential research directions for RGB-T saliency detection. Our future work will focus on the following aspects: 1) We will expand the benchmark to a larger one, including a larger dataset with more challenging factors and more popular baseline methods. 2) We will improve the robustness of our approach by studying other prior models [33] and graph construction.

REFERENCES

[1] B. Zhao, Z. Li, M. Liu, W. Cao, and H. Liu, "Infrared and visible imagery fusion based on region saliency detection for 24-hour-surveillance systems," in Proceedings of the IEEE International Conference on Robotics and Biomimetics, 2013.
[2] C. Li, X. Wang, L. Zhang, and J. Tang, "Weld: Weighted low-rank decomposition for robust grayscale-thermal foreground detection," IEEE Transactions on Circuits and Systems for Video Technology, 2016.
[3] C. Li, H. Cheng, S. Hu, X. Liu, J. Tang, and L. Lin, "Learning collaborative sparse representation for grayscale-thermal tracking," IEEE Transactions on Image Processing, vol. 25, no. 12, pp. 5743–5756, 2016.
[4] A. Torralba and A. Efros, "Unbiased look at dataset bias," in IEEE Conference on Computer Vision and Pattern Recognition, 2011.
[5] J. Harel, C. Koch, and P. Perona, "Graph-based visual saliency," in Proceedings of the Advances in Neural Information Processing Systems, 2006.
[6] T. Liu, Z. Yuan, J. Sun, J. Wang, N. Zheng, X. Tang, and H.-Y. Shum, "Learning to detect a salient object," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 2, pp. 353–367, 2011.
[7] H. Jiang, J. Wang, Z. Yuan, Y. Wu, N. Zheng, and S. Li, "Salient object detection: A discriminative regional feature integration approach," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013.