Socio-Affective Computing 2
Basant Agarwal
Namita Mittal Editors
Prominent Feature
Extraction for
Sentiment Analysis
Socio-Affective Computing
Volume 2
Series Editor
Amir Hussain, University of Stirling, Stirling, UK
Co-Editor
Erik Cambria, Nanyang Technological University, Singapore
This exciting Book Series aims to publish state-of-the-art research on socially
intelligent, affective and multimodal human-machine interaction and systems. It will
emphasize the role of affect in social interactions and the humanistic side of affective
computing by promoting publications at the cross-roads between engineering and
human sciences (including biological, social and cultural aspects of human life).
Three broad domains of social and affective computing will be covered by the
book series: (1) social computing, (2) affective computing, and (3) interplay of
the first two domains (for example, augmenting social interaction through affective
computing). Examples of the first domain will include but not limited to: all types of
social interactions that contribute to the meaning, interest and richness of our daily
life, for example, information produced by a group of people used to provide or
enhance the functioning of a system. Examples of the second domain will include,
but not limited to: computational and psychological models of emotions, bodily
manifestations of affect (facial expressions, posture, behavior, physiology), and
affective interfaces and applications (dialogue systems, games, learning etc.). This
series will publish works of the highest quality that advance the understanding
and practical application of social and affective computing techniques. Research
monographs, introductory and advanced level textbooks, volume editions and
proceedings will be considered.
More information about this series at http://www.springer.com/series/13199
Basant Agarwal • Namita Mittal
Prominent Feature Extraction
for Sentiment Analysis
Basant Agarwal
Malaviya National Institute of Technology
Jaipur, India

Namita Mittal
Malaviya National Institute of Technology
Jaipur, India
Socio-Affective Computing
ISBN 978-3-319-25341-1    ISBN 978-3-319-25343-5 (eBook)
DOI 10.1007/978-3-319-25343-5
Library of Congress Control Number: 2015954185
Springer Cham Heidelberg New York Dordrecht London
© Springer International Publishing Switzerland 2016
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, express or implied, with respect to the material contained herein or for any
errors or omissions that may have been made.
Printed on acid-free paper
Springer International Publishing AG Switzerland is part of Springer Science+Business Media
(www.springer.com)
The book is dedicated to my family.
Preface
The field of sentiment analysis is an exciting and relatively new research
direction, driven by a large number of real-world applications. Discovering
people’s opinions is very important for better decision-making. Sentiment
analysis is the study of people’s opinions and sentiments towards entities, such
as products and services, as expressed in text. It has always been important to
know what other people think. People use online review sites, blogs, forums,
social networking sites, etc., to express their opinions, which increases the
amount of user-generated data on the web. Therefore, a need to analyse and
understand these online data and reviews has arisen. Users can learn the merits
and demerits of a product from the experiences shared by people on the web,
which can be useful for them in decision-making. E-commerce companies can
improve their products or services on the basis of people’s opinions and current
trends. The automatic analysis of online content to extract opinions requires a
deep understanding of natural text by the machine, but the capabilities of most
existing models are known to be unsatisfactory.
Two types of approaches have been used in the literature for sentiment analysis:
(i) the machine learning approach and (ii) the semantic orientation approach.
Machine learning approaches face two main challenges. (i) They produce a
high-dimensional feature vector consisting of noisy, irrelevant and redundant
features. Most of the existing feature selection techniques used for sentiment
analysis do not consider the redundancy among the features; they select the
important features based only on a goodness criterion with respect to the class
attribute. (ii) The generated feature vector generally has to deal with the
problem of data sparsity.
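
To make the dimensionality and sparsity problem concrete, the following minimal sketch builds Boolean unigram feature vectors for a toy corpus and measures how many entries are zero. The reviews and the helper names (`build_vocabulary`, `to_boolean_vector`, `sparsity`) are invented for illustration and are not taken from the book.

```python
# Toy corpus; the reviews are made up for illustration only.
reviews = [
    "the plot was gripping and the acting superb",
    "dull plot and wooden acting",
    "battery life is superb",
    "the screen is dull and the battery drains fast",
]

def build_vocabulary(docs):
    """Map every distinct token to a column index."""
    vocab = {}
    for doc in docs:
        for token in doc.split():
            vocab.setdefault(token, len(vocab))
    return vocab

def to_boolean_vector(doc, vocab):
    """Boolean (presence/absence) unigram feature vector for one document."""
    vec = [0] * len(vocab)
    for token in doc.split():
        vec[vocab[token]] = 1
    return vec

def sparsity(matrix):
    """Fraction of zero entries in a document-by-feature matrix."""
    total = sum(len(row) for row in matrix)
    zeros = sum(row.count(0) for row in matrix)
    return zeros / total

vocab = build_vocabulary(reviews)
matrix = [to_boolean_vector(doc, vocab) for doc in reviews]
print(f"features: {len(vocab)}, sparsity: {sparsity(matrix):.2f}")
```

Even on four short reviews, most entries of the matrix are zero, and every new document tends to add new columns. On a real review corpus the vocabulary is far larger, which is the situation feature selection and dimensionality reduction are meant to address.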
Semantic orientation approaches are categorized into corpus-based and
lexicon-based (knowledge-based) approaches. Corpus-based approaches mainly
depend on the method used to determine the polarity of words. These approaches
do not perform well because the polarity of words changes with the domain and
context, and no corpus is available that provides the polarity of words
depending on the domain and context. Knowledge-based approaches depend on
already developed knowledge bases such as SentiWordNet, WordNet, etc. The
problem with these approaches is the coverage of the knowledge bases, as most of
the available knowledge bases contain general knowledge (not affective
knowledge), which is insufficient to determine the polarity of a document.
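
The two weaknesses described above, fixed word polarity regardless of domain and limited coverage, can be seen in a minimal sketch of a lexicon-based scorer. The lexicon below is hypothetical: its entries and scores are invented for illustration and are not SentiWordNet values.

```python
# Hypothetical mini-lexicon: word -> polarity score in [-1, 1].
# The entries are invented for illustration and are not SentiWordNet values.
LEXICON = {
    "good": 0.7,
    "excellent": 0.9,
    "bad": -0.7,
    "boring": -0.8,
    "unpredictable": -0.5,  # one fixed prior polarity, whatever the domain
}

def document_polarity(text, lexicon=LEXICON):
    """Sum the prior polarities of known words; unknown words contribute 0."""
    score = sum(lexicon.get(token, 0.0) for token in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(document_polarity("the steering is unpredictable"))  # car review
print(document_polarity("the plot is unpredictable"))      # movie review
print(document_polarity("a gripping thriller"))            # no lexicon coverage
```

Because "unpredictable" carries a single prior score, the scorer labels both the car review and the movie review as negative, although an unpredictable plot is usually praise. The third review contains no lexicon words at all, so it falls back to neutral.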
The objective of this book is to improve the performance of the sentiment
analysis model by incorporating semantic, syntactic and common-sense knowledge.
This book presents a semantic concept extraction method that uses dependency
relations between words to extract features from the text. The proposed approach
combines semantic and common-sense knowledge for a better understanding of the
text. In addition, the book presents novel methods to extract prominent features
from unstructured text by eliminating noisy, irrelevant and redundant features.
This book also proposes a method for efficient dimensionality reduction to
alleviate the data sparseness problem faced by machine learning models. The main
findings of this book are as follows.
1. The performance of sentiment analysis can be improved by reducing the
   redundancy among the features. In this book, experimental results show that
   the minimum Redundancy-Maximum Relevance (mRMR) feature selection technique
   improves the performance of sentiment analysis by eliminating redundant
   features.
2. The Boolean Multinomial Naive Bayes (BMNB) machine learning algorithm with
   the mRMR feature selection technique performs better than the Support Vector
   Machine (SVM) classifier for sentiment analysis.
3. The problem of data sparseness is alleviated by semantic clustering of
   features, which in turn improves the performance of sentiment analysis.
4. Semantic relations among the words in the text provide useful cues for
   sentiment analysis. Common-sense knowledge in the form of the ConceptNet
   ontology provides a better understanding of the text, which improves the
   performance of sentiment analysis.
5. Considering the importance of a feature with respect to the domain improves
   the performance of sentiment analysis.
6. Splitting multi-word features improves the performance of sentiment analysis
   for domains having only a limited labelled dataset.
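
The idea behind the first finding can be sketched as a greedy mRMR selection over Boolean features. This is a minimal illustration that assumes the common "difference" form of the criterion (relevance to the class minus mean redundancy with already selected features), with mutual information estimated from raw counts. The exact variant, estimator and data used in the book may differ, and the function names and toy data here are invented.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

def mrmr(features, labels, k):
    """Greedy mRMR (difference form): repeatedly pick the feature that
    maximises relevance to the labels minus mean redundancy with the
    features already selected."""
    relevance = {name: mutual_information(col, labels)
                 for name, col in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s])
                for s in selected
            ) / len(selected)
            return relevance[name] - redundancy
        best = max((f for f in features if f not in selected), key=score)
        selected.append(best)
    return selected

# Toy Boolean features over six documents (1 = word present); invented data.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "great":  [1, 1, 0, 0, 0, 0],
    "superb": [1, 1, 0, 0, 0, 0],  # exact duplicate of "great"
    "awful":  [0, 0, 0, 1, 1, 0],
    "the":    [1, 1, 1, 1, 1, 1],  # carries no class information
}
print(mrmr(features, labels, k=2))
```

On this toy data, "great" and "superb" are equally relevant to the class, so a relevance-only ranking would happily keep both. The redundancy term penalises the duplicate, and the two selected features are "great" and "awful".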
All the experiments are performed on four standard datasets, viz. the movie
review dataset provided by Cornell University and three product review datasets
(book, DVD and electronics) consisting of Amazon reviews. Experimental results
show the effectiveness of all the proposed methods over state-of-the-art
methods.
Jaipur, India                                                     Basant Agarwal
December, 2015
Acknowledgements
This book would not have been possible without the contribution of many
individuals, to whom I express my appreciation and gratitude. Most of the
contents of this book are taken from the research work done for my PhD. First of
all, I am deeply indebted to my supervisor Dr. Namita Mittal for the knowledge
and experience she shared with me. She was always available whenever I needed
her guidance. Her invaluable suggestions and advice helped me a lot in guiding
my research work in the right direction. She is very kind and supportive. It is
impossible for me to imagine having a better supervisor for my PhD. It was
really an honour to work with her.
I would especially like to thank Dr. Erik Cambria for permitting me to be an
intern at Temasek Laboratories at the National University of Singapore and for
all the knowledge he and his group shared with me during my internship period. I
would also like to thank Soujanya Poria, whom I met there; discussions with them
on various common-sense and SenticNet-based sentiment analyses were very useful.
I am grateful to Dr. M. C. Govil, Head of Department, Computer Engineering,
Malaviya National Institute of Technology, Jaipur, for his kind cooperation. I am
also grateful to all the members of my doctoral guidance committee: Dr. Neeta
Nain, Dr. Girdhari Singh, Dr. Dinesh Goplani and Dr. D. Boolchandani; their
constructive suggestions helped me a lot to improve my PhD work. I would like
to thank all the learned faculty members, especially Dr. Vijay Laxmi and Prof.
M. S. Gaur, for their support and help during this period. I am also thankful to
the non-teaching staff of the Department of Computer Engineering for all the help.
My special gratitude also goes to all the persons from whom I have received
critical reviews during this period: Prof. Alexandra Gelbukh (Research Professor,
Center for Computing Research, CIC, National Polytechnic Institute, IPN, Mexico),
Prof. Pushpak Bhattacharya (Professor, Indian Institute of Technology, Bombay,
India), Prof. R. K. Agarwal (Professor, Jawaharlal Nehru University, Delhi, India)
and Prof. Niladri Chatterjee (Professor, Indian Institute of Technology, Delhi,
India) who visited MNIT during this period. Their valuable suggestions and
feedback were useful for my PhD work.