Fuzzy Nonlinear Proximal Support Vector Machine for Land Extraction Based on Remote Sensing Image

Xiaomei Zhong1, Jianping Li2,3*, Huacheng Dou2,3, Shijun Deng2,3, Guofei Wang2,3, Yu Jiang2,3, Yongjie Wang2,3, Zebing Zhou2,3, Li Wang2,3, Fei Yan4

1 Tianjin Chengjian University, Tianjin, China, 2 Tianjin Institute of Geotechnical Investigation and Surveying, Tianjin, China, 3 Tianjin StarGIS Information Engineering Company Limited, Tianjin, China, 4 Beijing Forestry University, Beijing, China
Abstract

Remote sensing technologies are currently widely employed in the dynamic monitoring of land. This paper presents an algorithm named the fuzzy nonlinear proximal support vector machine (FNPSVM), based on ETM+ remote sensing imagery. The algorithm is applied to extract various types of land in the city of Da'an in northern China. Two multi-category strategies for this algorithm, namely "one-against-one" and "one-against-rest", are described in detail and then compared. A fuzzy membership function is presented to reduce the effects of noises or outliers on the data samples. The approaches of feature extraction, feature selection, and several key parameter settings are also given. Numerous experiments were carried out to evaluate its performance, including various accuracies (overall accuracy and kappa coefficient), stability, training speed, and classification speed. The FNPSVM classifier was compared to three other classifiers, the maximum likelihood classifier (MLC), the back propagation neural network (BPN), and the proximal support vector machine (PSVM), under different training conditions. The impacts of the selection of training samples, testing samples, and features on the four classifiers were also evaluated in these experiments.
Citation: Zhong X, Li J, Dou H, Deng S, Wang G, et al. (2013) Fuzzy Nonlinear Proximal Support Vector Machine for Land Extraction Based on Remote Sensing Image. PLoS ONE 8(7): e69434. doi:10.1371/journal.pone.0069434

Editor: Guy J.-P. Schumann, NASA Jet Propulsion Laboratory, United States of America

Received December 9, 2012; Accepted June 7, 2013; Published July 24, 2013

Copyright: © 2013 Zhong et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: Financial support for this study was provided by Xiaomei Zhong, Jianping Li, and Huacheng Dou, who had a key role in the algorithm study, data collection and analysis, and accuracy assessment.

Competing Interests: The authors declare that they have no conflict of interest; they have no financial or personal relationships with other people or organizations that can inappropriately influence their work; and there is no professional or other personal interest of any nature or kind in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled "Fuzzy Nonlinear Proximal Support Vector Machine for Land Extraction Based on Remote Sensing Image". Of the authors, Li Jianping, Dou Huacheng, Deng Shijun, Wang Guofei, Jiang Yu, Wang Yongjie, Zhou Zebing, and Wang Li are currently employed by Tianjin StarGIS Information Engineering Co., Ltd. The authors hereby declare that this affiliation does not cause any competing interests, and that it does not alter their adherence to all the PLOS ONE policies on sharing data and materials.

* E-mail: [email protected]
Introduction

Remote sensing (RS) plays a key role in the dynamic monitoring of land [1-3]. Approaches to land extraction based on remote sensing images basically include manual visual interpretation and computerized auto-classification. Because of the large number of drawbacks of manual visual interpretation, numerous classification algorithms for computerized auto-classification have been developed; among the most popular are the maximum likelihood classifier, neural network classifiers, and decision tree classifiers [4]. The maximum likelihood classifier is a popular classifier built on the assumption that the classes in the input data follow a Gaussian distribution. However, there will be errors in the results if the sample data size is not sufficient, if the input data set does not follow the Gaussian distribution, and/or if the classes overlap greatly in their distributions, resulting in poor separability. The back propagation neural network model is widely applied because of its simplicity and its power to extract useful information from samples [5,6]. It is a hierarchical design consisting of fully interconnected layers or rows of processing units (with each unit comprising several individual processing elements). Back propagation belongs to the class of mapping neural network architectures, and the information processing function that it carries out is the approximation of a bounded mapping [7]. Furthermore, the approach can effectively avoid some of the problems associated with the MLC by simulating the processing patterns of the human brain, although it also has some disadvantages, including a slow learning convergence velocity and easy convergence to local minima [8]. Lastly, the basic idea of the decision tree classifier is to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret.

Support vector machine (SVM) is based on statistical learning theory, and aims to determine the location of decision boundaries that produce the optimal separation of classes [9]. This approach, a new classification technique in the field of remote sensing compared to the above three methods, has quickly gained ground in the past ten years. The SVM classifier can achieve higher accuracies than both the ML (Maximum Likelihood) and ANN (Artificial Neural Network) classifiers [10], and thus it has recently been applied to classify remote sensing images [11]. Although good performance and high classification accuracy can be achieved with the SVM approach, there are still some shortcomings. One such shortcoming is that the SVM mainly aims at the classification of a small number of training samples,
and the cost of calculation increases rapidly with larger data sizes, especially so for remote sensing data. In order to resolve this issue of high calculation cost, Fung and Mangasarian [12] proposed the proximal support vector machine (PSVM), which can also be interpreted as regularized least squares and considered in the much more general context of regularized networks, wherein points are classified by assigning them to the closer of two parallel planes that are pushed apart as far as possible. In addition, the method is much more efficient than the traditional SVM in terms of running speed because it merely requires the solution of a single system of linear equations. Accuracy and speed of classification are deemed significant in classification based on remote sensing images, and a variety of factors affect them: training data size, selection of features, and algorithm parameter settings, to name a few. Often, real data sets contain noises, and the noisy samples might not be representative of a class, as if there were an uncertainty with regard to the class to which they belong. The noises tend to corrupt the data samples, and the optimal hyperplane obtained by the PSVM may be sensitive to noises or outliers in the training sets. As a result, a classifier might not be able to correctly classify some of the data samples having noisy data, so fuzzy support vector machines [13,14] and fuzzy linear proximal support vector machines [15,16] were proposed to address the problem.

Normally, however, a real data set is not linearly separable. In this paper, we propose the fuzzy nonlinear proximal support vector machine (FNPSVM) to extract different types of lands; this technique is a fuzzy nonlinear extension of the existing PSVM methods. In addition, we define a fuzzy membership function that assigns a fuzzy membership to each data point, such that different data points can have different effects in the learning of the separating hyperplane. Additionally, for the purpose of improving algorithm performance, we present approaches for setting several key parameters of this algorithm, as well as approaches for feature extraction and feature selection. Lastly, we compare our algorithm with three other classifiers (MLC, BPN, and PSVM).

The paper is organized as follows. Section 2 discusses in detail the architectures of PSVM and FNPSVM. The training algorithm of FNPSVM is shown in Section 3. Experimental results of the algorithm and discussion are presented in Section 4. Section 5 contains the concluding remarks.

Architectures of PSVM and FNPSVM

Architecture of PSVM
To deduce our FNPSVM algorithm, we briefly introduce the binary-category proximal support vector machine first. Let the dataset consisting of m points in the n-dimensional real space R^n be represented by the m x n matrix A, and let each point be represented by an n-dimensional row eigenvector A_i (i = 1, 2, ..., m). In the case of binary classification, each data point A_i in the class A+ or A- is specified by a given m x m diagonal matrix D with +1 or -1 elements along its diagonal. The target is to separate the m data points into A+ and A-, as depicted in Figure 1. For this problem, the proximal support vector machine with a linear kernel [12] is given by the following quadratic program with parameter c > 0 (which controls the tradeoff between the margin and the error) and a linear equality constraint:

$$\min_{(v,c,y)\in R^{n+1+m}} \frac{c}{2}\|y\|^2 + \frac{1}{2}(v'v + c^2) \quad \text{s.t.} \quad D(Av - ec) + y = e, \qquad (1)$$

where e is an m-dimensional vector of ones and y is an error vector. When the two classes are strictly linearly separable, y = 0 in (1) (which is not the case shown in Figure 1). As depicted in Figure 1, the variables (v, c) determine the orientation and location of the proximal planes:

$$x'v - c = +1, \qquad x'v - c = -1, \qquad (2)$$

around which the points of each class are clustered and which are pushed apart as far as possible by the term (v'v + c^2) in the objective function. Consequently, the plane

$$x'v - c = 0, \qquad (3)$$

midway between and parallel to the proximal planes (2), is a separating plane that approximately separates A+ from A-, as depicted in Figure 1. The distance $2/\|(v;c)\|$ is called the "margin" (see Figure 1), and maximizing the margin enhances the generalization capability of a support vector machine [9,17].

Figure 1. The Proximal Support Vector Machine Classifier: the planes x'v - c = +1 and x'v - c = -1, around which the points of the sets A+ and A- cluster and which are pushed apart by the optimization problem (1). doi:10.1371/journal.pone.0069434.g001
The approximate separating plane (3), shown in Figure 1, acts as a decision function as follows:

$$x'v - c \;\begin{cases} > 0, & \text{then } x \in A+, \\ < 0, & \text{then } x \in A-, \\ = 0, & \text{then } x \in A+ \text{ or } x \in A-. \end{cases} \qquad (4)$$

Architecture of FNPSVM
In this paper, we will employ the following norms of a vector x in R^n [17]:

$$L_1 \text{ norm of } x := \|x\|_1 = \sum_{i=1}^{n} |x_i| \qquad (5)$$

$$L_2 \text{ norm of } x := \|x\|_2 = \left(\sum_{i=1}^{n} x_i^2\right)^{1/2} \qquad (6)$$

$$L_\infty \text{ norm of } x := \|x\|_\infty = \max_{1\le i\le n} |x_i| \qquad (7)$$

The fuzzy nonlinear binary-category proximal support vector machine. Generally, real data sets are corrupted with noises. As a result, it is not always the case that a classifier obtained by training with noisy data will correctly classify all of the data samples. Since the optimal hyperplane only depends on a small part of the data points, it may become sensitive to noises or outliers in the training set [18,19]. We can associate each data point with a fuzzy membership that reflects its relative degree as meaningful data and accounts for the uncertainty in the class to which it belongs. The noises or outliers are treated as less important and have lower fuzzy memberships. This equips the classifier with the ability to train on data that has noises or outliers, which is done by setting lower fuzzy memberships on the data points that are considered to be noises or outliers with higher probability. A classifier that is able to use information regarding this fuzzy degree can improve its performance and reduce the effects of noises or outliers. Thus we propose the following optimization problem in determining the classifier:

$$\min_{(v,c,y)\in R^{n+1+m}} \frac{c}{2}\|Sy\|^2 + \frac{1}{2}(v'v + c^2) \quad \text{s.t.} \quad D(Av - ec) + y = e, \qquad (8)$$

where S denotes a diagonal matrix, i.e. S = diag(s_1, s_2, ..., s_m), whose diagonal elements correspond to the membership values of the data samples belonging to A+ or A-, with 0 < s_i <= 1 (i = 1, 2, ..., m); and e is the vector of plus ones.

According to the objective function of (8), y can be replaced by v and c, so we then arrive at the following unconstrained minimization problem:

$$\min_{(v,c)\in R^{n+1}} \frac{c}{2}\|S(D(Av - ec) - e)\|^2 + \frac{1}{2}(v'v + c^2) \qquad (9)$$

To obtain the fuzzy nonlinear proximal classifier, we modify formula (9) as in [12,20], first by substituting the variable v with its dual equivalent v = A'Du, and then by modifying the last term of the objective function to be the norm of the new dual variables u and c. We now obtain the following problem:

$$\min_{(u,c)\in R^{m+1}} \frac{c}{2}\|S(D(AA'Du - ec) - e)\|^2 + \frac{1}{2}\left\|\begin{pmatrix} u \\ c \end{pmatrix}\right\|^2 \qquad (10)$$

If we now replace the linear kernel AA' by a nonlinear kernel K(A, A'), we obtain:

$$\min_{(u,c)\in R^{m+1}} \frac{c}{2}\|S(D(K(A,A')Du - ec) - e)\|^2 + \frac{1}{2}\left\|\begin{pmatrix} u \\ c \end{pmatrix}\right\|^2 \qquad (11)$$

Let $F(u,c) = \frac{c}{2}\|S(D(K(A,A')Du - ec) - e)\|^2 + \frac{1}{2}\left\|\begin{pmatrix} u \\ c \end{pmatrix}\right\|^2$. Setting the first-order derivatives of F(u,c) with respect to u and c to zero, i.e. $\partial F(u,c)/\partial u = 0$ and $\partial F(u,c)/\partial c = 0$, we arrive at the following formulas:

$$\begin{cases} c(SDK(A,A')D)'S(D(K(A,A')Du - ec) - e) + u = 0, \\ c(SDe)'S(D(-K(A,A')Du + ec) + e) + c = 0, \end{cases} \qquad (12)$$

where both D and S are diagonal matrices, so that D = D', S = S', and D^2 = I. Further, rearranging formula (12), we obtain the equations with respect to u and c:

$$\begin{cases} (c(SDK(A,A')D)'SDK(A,A')D + I)u - c(SDK(A,A')D)'SDec - c(SDK(A,A')D)'Se = 0, \\ -c(SDe)'SDK(A,A')Du + (c(SDe)'SDe + 1)c + c(SDe)'Se = 0. \end{cases} \qquad (13)$$

Now let

M_1 = c(SDK(A,A')D)'SDK(A,A')D + I,
L_1 = -c(SDK(A,A')D)'SDe,
C_1 = -c(SDK(A,A')D)'Se,
M_2 = -c(SDe)'SDK(A,A')D,
L_2 = c(SDe)'SDe + 1,
C_2 = c(SDe)'Se.

Thus formula (13) can be expressed by the following formula:

$$\begin{pmatrix} M_1 & L_1 \\ M_2 & L_2 \end{pmatrix}\begin{pmatrix} u \\ c \end{pmatrix} = \begin{pmatrix} -C_1 \\ -C_2 \end{pmatrix} \qquad (14)$$

We can work out u and c by solving formula (14), and hence the binary-category nonlinear classifier can be written as follows:

$$K(x,A)Du - c \;\begin{cases} > 0, & \text{then } x \in A+, \\ < 0, & \text{then } x \in A-, \\ = 0, & \text{then } x \in A+ \text{ or } x \in A-. \end{cases} \qquad (15)$$
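To make the derivation concrete, the following is a minimal sketch of the binary FNPSVM training step in Python/NumPy (our own illustration; the paper's experiments were implemented in Matlab). It assembles M_1, L_1, C_1, M_2, L_2, C_2 of formula (14) with a Gaussian kernel, solves the resulting (m+1) x (m+1) linear system, and evaluates the decision function (15). The plane offset (called c in the paper) is named gamma here to keep it distinct from the penalty constant c; all function names are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, s):
    """Gaussian kernel K_ij = exp(-s * ||A_i - B_j||^2) for row matrices A, B."""
    d2 = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-s * d2)

def train_fnpsvm_binary(A, d, memb, c, s):
    """Solve formula (14) for the dual variable u and the offset gamma.

    A    : (m, n) training points       d    : (m,) labels in {+1, -1} (diag of D)
    memb : (m,) fuzzy memberships in (0, 1] (diag of S)
    c    : penalty constant c > 0       s    : Gaussian kernel width
    """
    m = A.shape[0]
    K = rbf_kernel(A, A, s)                        # K(A, A')
    G = (memb * d)[:, None] * K * d[None, :]       # S D K(A,A') D  (S, D diagonal)
    q = memb * d                                   # S D e
    Se = memb                                      # S e
    M1 = c * G.T @ G + np.eye(m)                   # c (SDKD)'(SDKD) + I
    L1 = -c * G.T @ q                              # -c (SDKD)' SDe
    C1 = -c * G.T @ Se                             # -c (SDKD)' Se
    M2 = -c * q @ G                                # -c (SDe)' SDKD
    L2 = c * q @ q + 1.0                           # c (SDe)' SDe + 1
    C2 = c * q @ Se                                # c (SDe)' Se
    lhs = np.zeros((m + 1, m + 1))                 # block system of formula (14)
    lhs[:m, :m], lhs[:m, m] = M1, L1
    lhs[m, :m], lhs[m, m] = M2, L2
    rhs = np.concatenate([-C1, [-C2]])
    sol = np.linalg.solve(lhs, rhs)
    return sol[:m], sol[m]                         # u, gamma

def decide(x, A, d, u, gamma, s):
    """Decision value K(x, A')Du - gamma of formula (15); its sign gives the class."""
    return rbf_kernel(np.atleast_2d(x), A, s) @ (d * u) - gamma
```

As the derivation promises, training reduces to one dense linear solve, which is why PSVM-style classifiers avoid the quadratic programming cost of the standard SVM.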
The fuzzy nonlinear proximal support vector machine. There are roughly four types of support vector machines that handle multi-class problems [21]. Two strategies have been proposed to adapt the SVM to N-class problems [22], namely the "one-against-one" strategy and the "one-against-rest" strategy. The "one-against-one" strategy is to construct a machine for each pair of classes, resulting in N(N-1)/2 machines. When applied to a test pixel, each machine gives one vote to the winning class, and the pixel is labeled with the class having the most votes. The "one-against-rest" strategy is to break the N-class case into N two-class cases, in each of which a machine is trained to classify one class against all others [4]. In this paper, we employed both of the above-mentioned strategies.

"One-against-one" strategy:

$$A = [A^1 \cdots A^k], \quad A^+ = A^r, \quad A^- = A^j, \quad r \in \{1,\cdots,k-1\}, \; j \in \{2,\cdots,k\}, \; r < j.$$

Here, k is the class number, while $A^r \in R^{m_r \times n}$ and $A^j \in R^{m_j \times n}$ represent the m_r and m_j points in class r and class j, respectively. Let m = m_r + m_j; thus D is an m x m diagonal matrix as follows:

$$D_{ii} = 1 \text{ for } A_i \in A^r, \quad D_{ii} = -1 \text{ for } A_i \in A^j.$$

From formula (14), the k(k-1)/2 unique u and c can be obtained, and thus k(k-1)/2 proximal surfaces are generated:

$$K(x,A)Du^s - c^s = 0, \quad s = 1,\cdots,k(k-1)/2.$$

A new given point x in R^n is assigned to the ith class in terms of the k(k-1)/2 proximal surfaces, and finally x is assigned to the class T_i (i = 1,...,k) by the following formula:

$$K(x,A)Du^i - c^i = \max T_i, \quad i = 1,\cdots,k.$$

In this method, SVM classifiers for all possible pairs of classes are created; therefore, for M classes, there will be M(M-1)/2 binary classifiers. The output from each classifier in the form of a class label is obtained, and the class label that occurs most is assigned to that point in the data vector. In case of a tie, a tie-breaking strategy may be adopted; a common tie-breaking strategy is to randomly select one of the class labels that are tied [23].
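As an illustration of the "one-against-one" voting rule just described, here is a short sketch under our own naming, reusing the hypothetical decide function from the sketch above; the random tie-breaking follows the convention cited as [23].

```python
import itertools
import numpy as np

def one_against_one_predict(x, class_labels, models, s):
    """Majority vote over all pairwise FNPSVM machines.

    models : dict (r, j) -> (A_pair, d_pair, u, gamma) for each trained pair r < j
    """
    votes = {r: 0 for r in class_labels}
    for r, j in itertools.combinations(sorted(class_labels), 2):
        A_pair, d_pair, u, gamma = models[(r, j)]
        val = decide(x, A_pair, d_pair, u, gamma, s)  # from the earlier sketch
        votes[r if val > 0 else j] += 1               # winner of this pair gets a vote
    best = max(votes.values())
    tied = [r for r, v in votes.items() if v == best]
    return np.random.choice(tied)                     # random tie-breaking
```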
"One-against-rest" strategy:

$$A = [A^1 \cdots A^k], \quad A^+ = A^r, \quad A^- = [A^1 \cdots A^{r-1}\, A^{r+1} \cdots A^k], \quad r \in \{1,\cdots,k\},$$

where k is the class number and $A^r \in R^{m_r \times n}$ represents the m_r points in class r. Letting m = m_1 + m_2 + ... + m_k, D is an m x m diagonal matrix as follows:

$$D_{ii} = 1 \text{ for } A_i \in A^r, \quad D_{ii} = -1 \text{ for } A_i \notin A^r, \quad r \in \{1,\cdots,k\}.$$

From formula (14), the k unique u and c can be obtained, and thus k proximal surfaces are generated:

$$K(x,A)Du^r - c^r = 0, \quad r = 1,\cdots,k.$$

A new given point x in R^n is assigned class t depending on which of the k nonlinear halfspaces generated by the k surfaces it lies deepest in, namely:

$$K(x,A)Du^t - c^t = \max_{r = 1,\cdots,k} K(x,A)Du^r - c^r.$$

Supposing the dataset is to be classified into M classes, M binary SVM classifiers are thus created, where each classifier is trained to distinguish one class from the remaining M-1 classes. For example, the class-one binary classifier is designed to discriminate between class-one data vectors and the data vectors of the remaining classes; the other SVM classifiers are constructed in the same manner. During the testing or application phase, data vectors are classified by finding their margin from the separating hyperplane; the final output is the class that corresponds to the SVM with the largest margin [23].

Training algorithm of FNPSVM

Fuzzy membership model
In order to improve classification performance and to reduce the corruption of the data samples by noises, we defined a fuzzy membership function for a given class, where a membership is assigned to each data point. It is written as:

$$f(x) = \begin{cases} 1, & 0 \le x \le t_1 \\ e^{(t_1 - x)/(t_2 - t_1)}, & t_1 \le x \le t_2 \\ 0, & t_2 \le x \le 1 \end{cases}$$

where x denotes the distance between the data sample and the center of the class that it belongs to. In addition, t1 and t2, which tune the fuzzy membership of each data point in the training, are two user-defined constants; they determine the range in which the data sample absolutely does or does not belong to a given class, and they also control the shape of the curve (see Figure 2).

A decreasing value of x indicates that the distance between the data sample point and the center of the given class is smaller, and that the probability of this sample belonging to this class is higher. When x is between 0 and t1, the data sample point belongs to the given class with absolute certainty; when x is between t2 and 1, the data sample point does not belong to the given class. When the value of x is known, the values of t1 and t2 influence the values of the fuzzy memberships, and thus also influence the ultimate classification result.

The distance x is the key to each training sample's fuzzy membership, and it can be obtained as follows:

$$M_t = \frac{1}{n}\sum_{i=1}^{n} VF_{ti}, \quad t = 1,\cdots,p,$$

$$DM_t = \max_i |VF_{ti} - M_t|, \quad i = 1,\cdots,n, \; t = 1,\cdots,p,$$

$$x_i = \frac{1}{p}\sum_{t=1}^{p} |VF_{ti} - M_t| / DM_t, \quad i = 1,\cdots,n,$$

where n is the number of training samples of a given class and p is the number of features selected, with VF_ti representing the tth feature value of the ith sample. M_t is the mean value of the tth feature over the n samples of a given class; DM_t is the maximum of the distances between all sample points and the center (M_t) of the tth feature for a given class; and x_i denotes the average distance between the ith sample and the centers of all features.

Figure 2. Figure of the Fuzzy Membership Function: t1 and t2, which tune the fuzzy membership of each data point in the training, are two user-defined constants, and they determine the range in which the data sample absolutely does or does not belong to a given class. doi:10.1371/journal.pone.0069434.g002
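A small sketch of the membership model follows, assuming the exponential decay between t1 and t2 read off the curve in Figure 2 (the exact analytic form of the middle branch is our reconstruction from the garbled source), with the per-class distance x_i computed from the three formulas above.

```python
import numpy as np

def fuzzy_membership(x, t1, t2):
    """Membership f(x) of a sample at normalized distance x from its class center."""
    if x <= t1:
        return 1.0                                   # certainly in the class
    if x <= t2:
        return np.exp((t1 - x) / (t2 - t1))          # decays between t1 and t2
    return 0.0                                       # certainly not in the class

def class_distances(VF):
    """Average normalized distance x_i of each sample of one class.

    VF : (p, n) array with VF[t, i] = value of feature t for sample i.
    """
    M = VF.mean(axis=1, keepdims=True)               # M_t: per-feature class center
    DM = np.abs(VF - M).max(axis=1, keepdims=True)   # DM_t: max distance to center
    return (np.abs(VF - M) / DM).mean(axis=0)        # x_i, averaged over p features
```

The resulting memberships would fill the diagonal of S in formula (8), so that points far from their class center contribute less to the fitted hyperplane.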
Sample Selection
The choice of sample size and sampling design affects the performance and reliability of a classifier, and sufficient samples are necessary. A previous study indicated that this factor alone could be more important than the selection of the classification algorithm in obtaining accurate classifications [24].

Sample selection includes two parts, namely sample data size and selection method. Increases in sample data size generally lead to improved performance, though at the same time they result in a higher calculation cost. The sample size must be sufficient to provide a representative, meaningful basis for the training of a classifier and for accuracy assessment. The basic sampling designs, such as simple random sampling, can be appropriate if the sample size is large enough [25]. The adoption of a simple sampling design is also valuable in helping to meet the requirements of a broad range of users [26]. In this paper, we apply a simple random sampling design to collect training samples and testing samples.

Kernel Function Strategy
The concept of the kernel is introduced to extend the SVM's ability to deal with nonlinear classification. It can transform non-linear boundaries in a low-dimensional space into linear ones in a high-dimensional space by mapping the feature vector into the high-dimensional space; the training data can thus be classified in the high-dimensional space without knowing the specific form of the mapping function. A kernel function is a generalization of the distance metric that measures the distance between two data points as the data points are mapped into a high-dimensional space in which the data are more clearly separable [27,28].

Three kernel functions for nonlinear SVM are widely used: the radial basis function (RBF), the polynomial, and the sigmoid. In this paper, we have adopted the Gaussian RBF kernel as the default kernel function model due to the facts that: (1) the RBF kernel can handle the case where the relation between class labels and attributes is nonlinear [29]; (2) the polynomial function spends a longer time in the training stage of SVM, and some previous studies [30-32] have reported that the RBF function provides better performance than the polynomial function; in addition, the polynomial kernel has more hyperparameters than the RBF kernel does, and may approach infinity or zero when the degree is large [29]; (3) the sigmoid kernel behaves like the RBF under certain parameters; however, it is not valid under some parameters [9]; (4) when the size of the sample data is quite large, the convergent ability of the RBF kernel is stronger than that of the other kernels above.

The Gaussian kernel function is expressed as:

$$K(A,B)_{ij} = e^{-s\|A_i' - B_{\cdot j}\|^2}, \quad i = 1,\cdots,m, \; j = 1,\cdots,k.$$

Here, the matrix $A \in R^{m \times n}$ and $B \in R^{n \times k}$; A_i is the ith row of A, which is a row vector in R^n, while B_{.j} is the jth column of B; the kernel K(A,B) maps $R^{m \times n} \times R^{n \times k}$ into $R^{m \times k}$. In particular, if x and y are column vectors in R^n, then K(x',y) is a real number, K(x',A') is a row vector in R^m, and K(A,A') is an m x m matrix. The parameter s of the RBF kernel is a user-defined positive constant regulating the width of the Gaussian kernel, which has an important impact on kernel performance. There is, however, little guidance in the literature on the criteria for selecting kernel-specific parameters [33], hence we carried out many trials to acquire the optimal parameter s.

Parameter Selection Method
Regardless of whether a simple or a more complex classifier is used, the learning parameters have to be chosen carefully in order to yield a good classification performance. The FNPSVM algorithm proposed in this paper requires four given parameters, specifically c, s, t1, and t2. Vapnik [9] discovered that varying the kernel function only slightly affects the classification results of the SVM, while the parameters of the kernel functions and the penalty constant c have a strong effect on the performance of the SVM.

One such parameter, c > 0, is an important quantity in determining a trade-off between the empirical error (the number of wrongly classified inputs) and the complexity of the found solution. Normally, large values of c lead to fewer training errors (and a narrower margin), at the cost of more training time, whereas small values generate a larger margin, with more errors and more training points situated inside the margin. Since the number of training errors cannot be interpreted as an estimate of the true risk, this knowledge does not really help in choosing a suitable value for the parameter. The parameter s of the Gaussian kernel affects the complexity of the decision boundary. Improper selection of these two parameters can cause over-fitting or under-fitting problems [29,34]. Nevertheless, there is little explicit guidance on choosing parameters for SVM. Recently, Hsu [35] suggested a method for determining the parameters, namely grid-search and cross-validation. For the multi-category case, however, the cross-validation method is not feasible. In this paper, we advanced his method and propose an approach named multi-layer grid search and random-validation.
The basic idea of random-validation is that we randomly divide the sample set into a training set and a test set of different sizes for each category. The test set is then tested using the classifier trained on the training set, and the classification accuracy is derived. This procedure is iteratively executed n times during each cycle, and n accuracies are obtained. Finally, the random-validation accuracy is the mean of the n accuracies.

We recommend the "multi-layer grid search" method on c and s using n-time random-validation, in order to accurately find the optimal parameters while lowering the computational cost. We first acquire the boundaries of the parameters c and s, and a 2-dimensional grid of pairs (c_i, s_j) is roughly constructed. Here, i = 1, 2, ..., m and j = 1, 2, ..., n; thus an m x n grid-plane and m x n pairs (c_i, s_j) are obtained. The FNPSVM algorithm learns with each pair (c_i, s_j) based on n-time random-validation, and obtains the classification accuracy. The pair (c_i, s_j)_high corresponding to the best accuracy is the optimal pair. If the best accuracy does not satisfy the requirements of the classification, a new 2-dimensional grid-plane based on the center pair (c_i, s_j)_high should be constructed, and the learning using the new pairs (c, s) in the new grid-plane is executed to acquire a higher accuracy. The above procedure is performed iteratively to find the optimal parameters c and s.

Although the multi-layer grid search and random-validation seem simple, the approach is actually practical because: (1) for each parameter, a finite number of possible values is prescribed, and then all possible combinations of (c, s) are considered to find the one that yields the best result; (2) the computational time for finding good parameters through this approach is not much more than that of advanced methods, since there are only two parameters (generally, the complexity of grid search grows exponentially with the number of parameters); (3) the grid-search can be easily parallelized because each (c, s) is independent, unlike some other advanced methods that require iterative processes.
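A schematic of the multi-layer grid search with random-validation might look as follows; the split ratio, grid spacing, refinement factor, and the train_and_score callback signature are illustrative assumptions, not values from the paper.

```python
import numpy as np

def random_validation(samples, labels, train_and_score, n_rounds=5):
    """Mean accuracy over n random train/test splits (random-validation)."""
    accs = []
    for _ in range(n_rounds):
        idx = np.random.permutation(len(samples))
        cut = len(samples) // 3                      # assumed test fraction
        test, train = idx[:cut], idx[cut:]
        accs.append(train_and_score(samples[train], labels[train],
                                    samples[test], labels[test]))
    return np.mean(accs)

def multilayer_grid_search(score_pair, c_grid, s_grid, layers=2, refine=2.0):
    """Coarse grid over (c, s), then finer grids centred on the best pair."""
    best = max(((c, s) for c in c_grid for s in s_grid),
               key=lambda p: score_pair(*p))
    for _ in range(layers - 1):
        c0, s0 = best
        c_grid = c0 * refine ** np.arange(-3, 4)     # finer grid around best c
        s_grid = s0 * refine ** np.arange(-3, 4)     # finer grid around best s
        best = max(((c, s) for c in c_grid for s in s_grid),
                   key=lambda p: score_pair(*p))
    return best
```

For example, a first layer matching the experiments below would use c_grid = s_grid = 2.0 ** np.arange(-14, 15, 4), with score_pair wrapping random_validation around the FNPSVM trainer.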
Experiments and Discussion

All experiments were run on an 1800 MHz AMD Sempron(tm) 3000+ processor under Windows XP using the Matlab 7.0 compiler. We adopted the classification criterion of Chen [36]; saline-alkalized lands are classified into heavy saline-alkalized land, moderate saline-alkalized land, and light saline-alkalized land.

Classification Experiments Using ETM+ Image
Experiment summary. We selected Da'an, a city in northern China with a total area of 4,879 km2, as our test area. Multi-spectral (Landsat-7 ETM+) remote sensing data (30 m spatial resolution, UTM projection) acquired on August 30th, 2000 was used to classify the image data into nine land cover types (heavy saline-alkalized land, moderate saline-alkalized land, light saline-alkalized land, water area, cropland, grassland, rural residential area, urban residential area, and sand land).

According to the topographic maps of Da'an city (1:100,000 scale), we implemented precise geometric correction and resampling of the image. Geometric correction of the image was accomplished through a two-order polynomial, while resampling was achieved through cubic convolution, with a matching error of less than one pixel. We selected 270 samples (90 for training and 180 for testing) for each class using a random sampling procedure from the image, in total 810 training samples and 1,620 test samples for the nine classes. For each sample set, the test set was independent of the training set.

To demonstrate the effectiveness of the proposed method, both the "one-against-one" and "one-against-rest" strategies based on the Gaussian RBF kernel were used in dealing with the n-class case, and the results (various accuracies, training speed, and classification speed) obtained using the FNPSVM algorithm were compared with those derived from four conventional classification methods, including the maximum likelihood classifier (MLC), back propagation neural network (BPN), support vector machine (SVM), and proximal support vector machine (PSVM), under different training conditions (shown in Table 1).

Feature extraction and feature selection. (1) Feature extraction. Feature extraction has a strong impact on classification accuracy. In this paper, we extracted 14 features, including the six bands of the ETM+ image, the first principal components of the K-L transform and the K-T transform, the soil index, NDVI (normalized difference vegetation index), the composition index, as well as the H (hue), S (saturation), and I (intensity) color components of the HSI color space. Some of the features can be obtained as follows:

Soil index: $$SI = (B_5 - (255 - B_4))/(B_5 + (255 - B_4)) \; [37]$$

NDVI: $$VI = (B_4 - B_3)/(B_4 + B_3)$$

Composition index: $$CI = (B_5 - B_1)/(B_5 + B_1) \; [37]$$

Here B1, B3, B4, and B5 represent the first band, third band, fourth band, and fifth band of the ETM+ image, respectively.

In the field of digital image processing, a number of color models have been proposed, such as RGB, HSI, CIE, etc., but selecting the most suitable color space is still a problem in color image segmentation [20]. The RGB color model is suitable for color display, but less so for color analysis because of the high correlation among the R, G, and B color components [38]. In color image processing and analysis, we know that: (1) the H and S components are closely correlated to the color sense of the eyes; (2) hue information and intensity information are distinctly differentiated in the HSI model; (3) with the HSI model, a computer program can easily process color information after the color sense of the eye has been transformed into specific values. We therefore extracted the H, S, and I color components of the HSI color space as three features for classification. A false color composite of bands 5, 4, and 2 was performed, after which the image was exported as an RGB image. Finally, the RGB model was transformed into the HSI model according to the following formulas [39]:

$$H = \arccos\left\{ \frac{[(R-G) + (R-B)]/2}{[(R-G)^2 + (R-B)(G-B)]^{1/2}} \right\}$$

$$S = 1 - \frac{3}{R+G+B}[\min(R,G,B)], \qquad I = \frac{1}{3}(R+G+B).$$
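The index and color-space formulas above translate directly into code. Below is a minimal NumPy sketch (our own, with hypothetical function names); band arrays are assumed to be float arrays on the 8-bit 0-255 scale, and the arccos expression is undefined where its denominator is zero (pure gray pixels), which a real implementation would need to guard against.

```python
import numpy as np

def soil_index(b5, b4):
    return (b5 - (255 - b4)) / (b5 + (255 - b4))     # SI [37]

def ndvi(b4, b3):
    return (b4 - b3) / (b4 + b3)                     # VI

def composition_index(b5, b1):
    return (b5 - b1) / (b5 + b1)                     # CI [37]

def rgb_to_hsi(R, G, B):
    """H, S, I components from the formulas in [39]; inputs are equal-shape arrays."""
    num = ((R - G) + (R - B)) / 2.0
    den = np.sqrt((R - G) ** 2 + (R - B) * (G - B))
    H = np.arccos(num / den)                         # undefined where den == 0
    S = 1.0 - 3.0 / (R + G + B) * np.minimum(np.minimum(R, G), B)
    I = (R + G + B) / 3.0
    return H, S, I
```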
Table 1. Training data conditions under which the classification algorithms were tested.

Training sample number | Testing sample number | Number of features | Training case no.
60  | 210 | 4  | A
60  | 210 | 7  | B
60  | 210 | 10 | C
60  | 210 | 14 | D
90  | 180 | 4  | E
90  | 180 | 7  | F
90  | 180 | 10 | G
90  | 180 | 14 | H
120 | 150 | 4  | I
120 | 150 | 7  | J
120 | 150 | 10 | K
120 | 150 | 14 | L

doi:10.1371/journal.pone.0069434.t001
(2) Feature selection. Normally, the size of a real data set is so large that learning might not work, and the running time of a learning algorithm might be drastically increased, before unwanted features are removed. Thus we must select features that are neither irrelevant nor redundant to the target concept.

Feature selection for classification is a well-researched problem, striving to improve the classifier's generalization ability and to reduce the dimensionality and the computational complexity. It directly reduces the number of original features by selecting a subset of them that still retains sufficient information for classification [40]. Feature selection attempts to select the minimally sized subset of features according to the following criteria [41]:

1) the classification accuracy does not significantly decrease; and
2) the resulting class distribution, given only the values of the selected features, is as close as possible to the original class distribution given all features.

For this paper, in terms of the above criteria, the data types, and the characteristics of the remote sensing image, we adopted the traditional DB Index rules, which use between-class scatter and within-class scatter to select classification features. The DB Index rules are as follows [42]:

1) $$S_i = \frac{1}{N_i}\sum_{x \in N_i} \|x - X_i\|,$$

where N_i denotes the number of samples of the ith class and X_i represents the center of the ith class.

2) $$d_{ij} = \|X_i - X_j\|,$$

where d_ij is the distance between the centers of the two classes.

3) DB Index: $$DB_k = \frac{1}{k}\sum_{i=1}^{k} R_i, \quad R_i = \max_{j = 1,\cdots,k,\, j \ne i} \frac{S_i + S_j}{d_{ij}},$$

where k is the number of classes.

The smaller the value of DB_k is, the better the performance of the classification. Based on the above rules and the 270 sample points of each category, we obtained the DB indices of the fourteen features and their ranks (see Table 2).
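The three DB Index rules above can be computed compactly; to rank features as in Table 2, one would call such a function once per candidate feature, passing each class's samples restricted to that feature. A hedged NumPy illustration, not the authors' code:

```python
import numpy as np

def db_index(classes):
    """Davies-Bouldin index; classes is a list of (N_i, p) arrays, one per class."""
    centers = [X.mean(axis=0) for X in classes]                  # X_i
    scatter = [np.linalg.norm(X - c, axis=1).mean()              # S_i
               for X, c in zip(classes, centers)]
    k = len(classes)
    R = [max((scatter[i] + scatter[j])
             / np.linalg.norm(centers[i] - centers[j])           # d_ij
             for j in range(k) if j != i)
         for i in range(k)]
    return sum(R) / k                                            # DB_k
```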
Table 2. DB indices of the fourteen features and their ranks.

Rank | Feature                                        | DB index
1    | the 6th band of ETM image                      | 2.0408
2    | the 5th band of ETM image                      | 4.2657
3    | the 4th band of ETM image                      | 5.0092
4    | CI (Composition Index)                         | 6.3319
5    | the 1st component of K-L transform             | 7.0428
6    | H component of HSI color space                 | 7.6135
7    | SI (Soil Index)                                | 8.5819
8    | NDVI (Normalized Difference Vegetation Index)  | 9.8511
9    | the 1st component of K-T transform             | 10.8020
10   | the 1st band of ETM image                      | 14.8599
11   | the 3rd band of ETM image                      | 25.2807
12   | the 2nd band of ETM image                      | 26.7408
13   | I component of HSI color space                 | 29.8844
14   | S component of HSI color space                 | 153.2745

doi:10.1371/journal.pone.0069434.t002

Parameter setting. Due to the differing nature of the impacts that algorithm parameters have on different algorithms, it is impossible to account for such differences in evaluating the comparative performances of the algorithms [4]. To avoid this problem, the parameters corresponding to the best performance of each algorithm were chosen for the purpose of comparison.

(1) Parameter setting of PSVM and FNPSVM. The performance of the classification algorithms is affected by their parameter settings. As described in Section 3.4, we searched for the optimal parameters t1, t2, c, and s for the FNPSVM classifier. In this procedure, we used two steps to find the best parameters. In the first step, we set the parameters t1 = 0.1 and t2 = 0.8, and searched for the kernel parameter s and penalty constant c as described in Section 3.5. In the second step, we set the parameters s and c as found in the first step, and searched for the parameters t1 and t2 of the fuzzy membership mapping function. In the first step, we constructed the two-dimensional grid for the first layer. The values of c and s were prescribed from 2^-14 to 2^14, multiplied by 2^4. The grid-search using 5-time random-validation was executed, and we found that the optimal parameter pair (c, s) was (2^10, 2^-10), having the highest overall classification accuracy (93.31%) and kappa value (0.9248).
Table 3 summarizes the results of the first-layer grid-search. Subsequently, we constructed the second-layer grid based on the center (2^10, 2^-10); the values of c and s were chosen from 2^7 to 2^13 and from 2^-7 to 2^-13, multiplied by 2 respectively, and the grid-search using 5-time random-validation was implemented. As shown in Table 4, c = 2^13 and s = 2^-13 gave the best overall classification accuracy (93.56%) and kappa coefficient (0.9275). As these accuracies could fundamentally satisfy our classification demand, we began the next step, where we set the parameters c = 2^13 and s = 2^-13 and searched for the parameters t1 and t2. Unfortunately, we could not find changes of the parameters t1 (within 0.05-0.2) and t2 (within 0.7-0.9) that significantly improved the performance of the FNPSVM, hence we set t1 = 0.1 and t2 = 0.8.

(2) Parameter setting of BP neural network. There are many parameters associated with the BP neural network, including neuron number, transfer function, learning rate, iteration time, and so on. It is not easy to know beforehand which values of these parameters are best for a problem. Consequently, in this paper, in order to yield the optimal classification performances, the settings of some key parameters of the BP neural network were achieved by repeated trials and some experience from previous studies.

A BP neural network with a hidden layer can approximate with arbitrary precision an arbitrary non-linear function defined on a compact set of R^n [43,44]. We employed a three-layer BP neural network including an input layer, a hidden layer, and an output layer. The number of neurons in the hidden layer is one of the primary parameters of the BPN algorithm; currently, however, there is no authoritative rule to determine it. A larger number of hidden units leads to poor generalization and increases training time, but too few neurons would cause the networks to underfit the training set and prevent the correct mapping of inputs and outputs. In this paper, the number of neurons in the hidden layer was determined by the empirical formula [44] to be 20; thus the network structure became n-20-9 (n denotes the number of features).

We chose the log-sigmoid function as the transfer function from the input layer, while setting the limit on the neural network's iteration number to 1,000 times for each desired output. The Levenberg-Marquardt optimization algorithm (the trainlm function in Matlab software) was utilized as the training function because it can greatly increase the training speed of the network, by utilizing a large amount of memory. The gradient descent with momentum weight and bias learning function was employed to calculate a given neuron's weight change from the neuron's input and error, the weight, the learning rate, and the momentum constant, according to gradient descent with momentum. The other parameters of the network were chosen as follows: learning rate g = 0.5, momentum factor a = 0.8, minimum gradient d = 10^-20, and minimum mean square error e = 10^-6. Figure 3 shows the classification maps using the MLC, BPN, PSVM, and FNPSVM, all based on the settings of the above parameters of the various classifiers.

Performance assessments. Normally, the settings of the various parameters of different algorithms affect the classification results, so it is difficult to evaluate the comparative performances of the algorithms under changing parameters. To address this problem, the best performance of each algorithm on each training case is listed in the following tables. The criteria for evaluating the performances of classification algorithms include accuracy, speed, stability, and comprehensibility, among others [4]. In this paper, we chose one group of criteria, consisting of classification accuracy, speed, and stability, to assess the performances of the different algorithms. Table 5 gives the overall accuracies and kappa coefficients using the various multi-class strategies and classifiers with the ETM+ data on the different cases. Table 6 gives the training speed and classification speed on the entire data set using the different classifiers under different training conditions. Means and standard deviations of the overall classification accuracies based on different training samples, testing samples, and features are shown in Table 7. Figure 4 shows the boxplots of the overall classification accuracies, developed by randomly selecting training samples and testing samples from the 270 samples of each class six times.

(1) Classification accuracy. In this paper, classification accuracy, one of the most important criteria in evaluating the performance of a classifier, was measured using overall accuracies and kappa coefficients computed from the confusion or error matrix. The most widely used way to represent the classification accuracy of remote sensing data is in the form of an error matrix, applicable for a variety of site-specific accuracy assessments. Numerous researchers have recommended using the error matrix in representing accuracy in the past, and it has now become one of the standard conventions. The effectiveness of the error matrix in representing accuracy can be seen from the fact that the accuracies of each category are fundamentally described along with both the errors of inclusion and the errors of exclusion present in the classification [25,45]. In order to accommodate the effects of chance agreement, some researchers suggest using the kappa coefficient and adopting it as a standard measure of classification accuracy [46]. Foody [47] also pointed out that since many remote sensing data sets are dominated by mixed pixels, standard accuracy assessment measures such as the kappa coefficient are often not suitable for accuracy assessment in remote sensing.
Table 3. The overall accuracies (%) and kappa coefficients of the first-layer grid-search using 5-time random-validation based on the ETM+ image.

c \ s  | 2^-14        | 2^-10        | 2^-6         | 2^-2         | 2^2          | 2^6          | 2^10         | 2^14
2^-14  | 40.87/0.3348 | 69.15/0.6530 | 17.92/0.0766 | 11.11/0      | 11.11/0      | 11.08/0      | 11.11/0      | 11.11/0
2^-10  | 59.65/0.5461 | 74.25/0.7103 | 59.18/0.5408 | 12.45/0.0151 | 11.09/0      | 11.11/0      | 11.11/0      | 11.11/0
2^-6   | 64.00/0.5950 | 81.48/0.7916 | 75.83/0.7281 | 25.00/0.1563 | 11.43/0.0036 | 11.52/0.0046 | 11.34/0.0026 | 11.34/0.0026
2^-2   | 76.93/0.7405 | 88.02/0.8653 | 85.30/0.8346 | 42.13/0.3490 | 12.53/0.0160 | 11.57/0.0051 | 11.62/0.0058 | 11.49/0.0043
2^2    | 85.60/0.8380 | 90.71/0.8955 | 90.32/0.8911 | 47.95/0.4145 | 13.15/0.0230 | 11.55/0.0050 | 11.60/0.0055 | 11.54/0.0048
2^6    | 89.33/0.8800 | 92.78/0.9188 | 89.91/0.8865 | 46.20/0.3948 | 13.80/0.0303 | 11.42/0.0035 | 11.43/0.0036 | 11.61/0.0056
2^10   | 91.86/0.9085 | 93.31/0.9248 | 89.36/0.8803 | 48.60/0.4218 | 13.49/0.0268 | 11.57/0.0051 | 11.55/0.0050 | 11.70/0.0066
2^14   | 92.44/0.9150 | 93.08/0.9221 | 87.70/0.8616 | 45.14/0.3828 | 13.79/0.0301 | 11.71/0.0068 | 11.61/0.0056 | 11.67/0.0063

doi:10.1371/journal.pone.0069434.t003
Table 4. The overall accuracies (%) and kappa coefficients of the second-layer grid-search using 5-time random-validation based on the ETM+ image.

c \ s | 2^-7         | 2^-8         | 2^-9         | 2^-10        | 2^-11        | 2^-12        | 2^-13
2^7   | 91.20/0.9010 | 92.82/0.9193 | 92.34/0.9138 | 92.91/0.9203 | 92.71/0.9180 | 92.11/0.9113 | 91.00/0.8988
2^8   | 91.14/0.9003 | 92.45/0.9151 | 92.74/0.9183 | 92.71/0.9180 | 92.42/0.9148 | 92.07/0.9108 | 91.71/0.9068
2^9   | 91.34/0.9026 | 91.95/0.9095 | 92.94/0.9206 | 92.68/0.9176 | 92.75/0.9185 | 92.57/0.9165 | 92.10/0.9111
2^10  | 89.70/0.8841 | 92.45/0.9151 | 93.36/0.9253 | 92.48/0.9155 | 93.17/0.9231 | 92.91/0.9203 | 92.08/0.9110
2^11  | 90.99/0.8986 | 91.37/0.9030 | 92.37/0.9141 | 92.75/0.9185 | 92.99/0.9211 | 92.29/0.9133 | 92.57/0.9165
2^12  | 90.19/0.8896 | 91.49/0.9043 | 92.23/0.9126 | 93.08/0.9221 | 92.63/0.9171 | 92.96/0.9208 | 93.06/0.9220
2^13  | 90.16/0.8893 | 91.57/0.9051 | 92.42/0.9148 | 92.82/0.9193 | 93.05/0.9218 | 92.80/0.9190 | 93.56/0.9275

doi:10.1371/journal.pone.0069434.t004
Although its sensitivity to the density or frequency of dynamic change in the real world has had some researchers arguing about its effect, the fact remains that the kappa coefficient has many intriguing features as an index of classification accuracy. More specifically, it offers some compensation for chance agreement, and a variance term can be calculated, enabling statistical testing of the significance of the difference between two coefficients [25,48].

We also need to emphasize that the various measures of accuracy evaluate different components of accuracy and make different assumptions about the data [49]. The fact is that the measurement and meaning of classification accuracy depend substantially on individual perspective and demands [49,50]. An accuracy assessment can be conducted for a variety of reasons, and many researchers have recommended that measures such as the kappa coefficient of agreement be adopted as a standard [25,46].

$$\text{Overall accuracy} = \sum_{k=1}^{q} n_{kk} / n \times 100\%$$

$$\text{Kappa coefficient} = \frac{n\sum_{k=1}^{q} n_{kk} - \sum_{k=1}^{q} n_{k+}\, n_{+k}}{n^2 - \sum_{k=1}^{q} n_{k+}\, n_{+k}}$$

Here q is the number of classes, n is the total number of test samples, n_kk are the diagonal (correctly classified) counts of the error matrix, and n_k+ and n_+k are its row and column sums.
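Both measures follow directly from the error matrix; as an illustration (our own short NumPy sketch, not the authors' code):

```python
import numpy as np

def overall_accuracy(cm):
    """cm: (q, q) error matrix with correctly classified counts on the diagonal."""
    return np.trace(cm) / cm.sum() * 100.0

def kappa_coefficient(cm):
    n = cm.sum()
    chance = (cm.sum(axis=1) * cm.sum(axis=0)).sum()   # sum of n_k+ * n_+k
    return (n * np.trace(cm) - chance) / (n**2 - chance)
```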
In terms of the above parameters selected for the different algorithms, and based on the 270 samples of each category obtained through the simple random sampling design, we obtained the overall classification accuracies and kappa coefficients using the various multi-class strategies and classifiers on the 12 training cases with the ETM+ data set consisting of 4,037,099 points (see Table 5). Unfortunately, confronted with such a large dataset, SVM failed on this problem because it requires the more costly solution of a linear or quadratic program. Several patterns can be observed from Table 5 and Table 7, explained as follows:

1) As far as the multi-class classification strategies of PSVM and FNPSVM were concerned, the accuracies of the "one-against-one" strategy in all training cases were about 1-2% higher than those of the "one-against-rest" strategy. Also, through experiments, we found that the classification speed of the "one-against-one" strategy was at least two times higher than that of the "one-against-rest" strategy, for both PSVM and FNPSVM (not listed in the following tables). So in this paper, we employed the "one-against-one" multi-class classification strategy of PSVM and FNPSVM for comparison with the other two classifiers.

2) The level of classification accuracy achieved by PSVM and FNPSVM was significantly higher than that produced by either the MLC or BPN classifier; they yielded significantly better results than the MLC or BPN classifier in all 12 training cases (Table 5). The accuracy differences between the PSVM and FNPSVM were rather small, much the same as those between the MLC and BPN (Table 5). The mean overall accuracies of the PSVM and FNPSVM were remarkably higher than those of MLC and BPN; however, the differences between MLC and BPN, or between PSVM and FNPSVM, were only slight (Table 7). This is expected because the PSVM and FNPSVM are designed to locate an optimal separating hyperplane, while the other two algorithms may not be able to locate this separating hyperplane. Statistically, the optimal separating hyperplanes located by the PSVM and FNPSVM should generalize to unseen samples with the fewest errors among all separating hyperplanes. Generally, as the number of available features increases, the overall accuracies and kappa coefficients of PSVM and FNPSVM grow gradually. Unexpectedly, however, the increase in the number of available features did not always lead to an improvement in the accuracies of MLC and BPN. On the contrary, MLC and BPN showed better comparative performances on training cases with ten features than they did on training cases with fourteen features, which might be explained by the presence of a large number of irrelevant features that hurt the classification performances. This again demonstrates the importance of feature selection. In terms of Table 5, it can be seen that the accuracies and kappa coefficients of the four algorithms improved with the increase in training data size, though not significantly.

3) The overall accuracy differences between MLC and BPN on the data set used in this study were generally small, and those between PSVM and FNPSVM were also not obvious. However, many of them were statistically significant.

(2) Training speed and classification speed. Training speed and classification speed are two important criteria in evaluating the performances of classification algorithms. As shown in Table 6, the training speeds and classification speeds of the four classifiers were substantially different. Generally, the training time and classification time rise with an increase in available features. The training speed of BPN was significantly lower than those of the other three classifiers because of its complex network structure. As far as classification speed was concerned, in all training cases, those of the PSVM and FNPSVM were remarkably lower than those of the MLC and BPN. The classification of the MLC and BPN in all training cases took from less than an hour to only a few minutes, while the PSVM and FNPSVM took more than several hours and ten hours, respectively.
Figure 3. Classification maps for the test area in northern China using the various classifiers under the same training case (90 training samples for each class, 10 features). (a) MLC algorithm. (b) BPN algorithm, g = 0.5, a = 0.8, d = 10^-20, e = 10^-6. (c) PSVM algorithm, c = 2^13, s = 2^-13. (d) FNPSVM, t1 = 0.1, t2 = 0.8, c = 2^13 and s = 2^-13. doi:10.1371/journal.pone.0069434.g003

This was due to the fact that PSVM and FNPSVM involve large matrix calculations and inverse matrix operations during the process of classification. In addition, it should be noted that we spent much time searching for the optimal key parameters, including the kernel parameter s and the constant c, in the training process, therefore yielding a better performance. Compared with PSVM, the training speed and classification speed of FNPSVM were more than twice its counterparts. The reason was that in terms of the comparison