Table Of ContentA Warm Welcome Matters! The Link Between Social
Feedback and Weight Loss in /r/loseit∗
Tiago O. Cunha,1,2 Ingmar Weber,1 Gisele L. Pappa2
1QatarComputingResearchInstitute,HBKU,Qatar
2FederalUniversityofMinasGerais,Brazil
[email protected],[email protected],[email protected]
ABSTRACT arguably,manyothereffectsfromone’ssocialenvironmentshould
dominate the benefits of receiving encouragement from potential
7 Socialfeedbackhaslongbeenrecognizedasanimportantelement
strangersonline.
1 ofsuccessfulhealth-relatedbehaviorchange.However,mostofthe
Tostudytheimportanceofonlinesupport,onewould,ideally,set
0 existingstudieslookattheeffectthatofflinesocialfeedbackhas.
up a proper experiment with a randomized control and treatment
2 Thispaperfillsgapsintheliteraturebyproposingaframeworkto
group. The treatment group would then receive encouragement
studythecausaleffectthatreceivingsocialsupportintheformof
n while the control group remains ignored. However, such studies
commentsinanonlineweightlosscommunityhason(i)theprob-
a bothrequireaccesstoanappropriateplatformandtheyalsocome
J abilityoftheusertoreturntotheforum,and,moreimportantly,on
withcertainethicalconcerns[34].Pushedtothelimititwould,for
(ii)theweightlossreportedbytheuser.Usingamatchingapproach
8 example,beunethicaltowithholdonlinesocialsupportfromasui-
forcausalinferenceweobserveadifferenceof9lbslostbetween
1 cidalpersonwantingtochat.Evennotofferingsupporttoaperson
userswhodoordonotreceivecomments. Surprisingly,thiseffect
] ismediatedbyneitheranincreaseinlifetimeinthecommunitynor tryingtoloseweightcouldbequestionable.
Inthiswork,weproposeaframeworkforconductingcausation
I byanincreasedactivityleveloftheuser. Ourresultsshowtheim-
S studies on weight loss from social media data. We make use of
portance that a “warm welcome” has when using online support
. agrowingcollectionofmethodsforcausalinferencefromobser-
s forumstoachievehealthoutcomes.
c vationaldata. Thoughnotwithouttheirlimitations,suchmethods
[ allowtogobeyondarguingaboutcorrelations, attemptingtorule
1. INTRODUCTION outasmanyconfoundingfactorsaspossible.Inthisstudywelook
1
Overthelastdecades,obesityrateshaveincreasedinmanycoun- at the effect of receiving social support in a popular weight loss
v
triesaroundtheworld,makingtheconditionamajorpublichealth community, the /r/loseit subreddit, on users’ self-reported weight
5
2 problem. Obesityisassociatedwithsignificantlyincreasedriskof loss. Specifically,welookatwhetheruserswhoreceiveacertain
2 morethan20chronicdiseasesandhealthconditions[39],anddi- numberofcommentsontheirfirstpostinthecommunityaremore
5 rectlyaffectsqualityoflife. IntheU.S.68.8%ofadultsareover- likelythanthosewhodonotto(i)returnforanotheractivityinthe
0 weightorobese1. Financially,healthissuesrelatedtoobesityand community, and to (ii) later report a higher weight loss, as mea-
. lifestylediseasesimposeanever-increasingburdenwithmedical suredtothecommunity’sbadgesystem.Tocorrectforcontentdif-
1 costslinkedtoobesityestimatedatUSD147billionin20082. ferencesinusers’posts,whicharelinkedtoreceivingmoreorless
0 Though the percentage of U.S. adults self-reporting to be on a support,weapplyamatchingapproach:apostreceivinganumber
7 dietinanygivenweekhasfallenfrom31%in1991,20%arestill ofcommentsbiggerthanacutoffispairedwithapostverysimi-
1
tryingtoloseweightthroughdietingatanypointintime3. Previ- larincontentthatreceivedanumberofcommentssmallerthanthe
:
v ousstudieshavetiedsuccessfulweightlossaswellasotherpositive cutoff. Heresimilarityisdefinedintermsofpostsexhibitingsim-
i healthoutcomestothepresencesofsocialsupport.Inparticularfor ilarfeaturesderivedviaastatisticalmodelusingLDAtopics[5],
X long-termbehavioralchangesrequiredforachievingandmaintain- LinguisticInquiryandWordCount(LIWC)4 features, sentiment
r ingweightloss,userengagementisessential[38].Previousstudies analysis[19],question-centricwordsandpostslength.
a
showedthatuserswhostaylongerintheseprogramshavegreater Similartopreviouswork[10,2],weobserveanincreaseinreturn
successinachievingtheirgoals[30]. probability to the community for those users receiving feedback.
Throughtheinternetandsocialmediaithasbecomeeasierfor Wethenextendpreviousworkbyshowingthat, amongreturning
userstofindvirtualsupportgroupsforanythingfromweightloss users, thosewhohadpreviouslyreceivedcommentsontheirpost
to drug addiction or depression. Though there are a number of reporthigherweightlossthanthematchedcontrolgroupwhodid
studieslookingattheeffectofreceivingsocialsupportonsustained not(46lbvs.37lb).Thesefindingsarestatisticallysignificant.
engagementwithanonlinecommunity[7,10,2],theeffectofsuch Toseeif,forthoseusersreturningtothecommunity,thediffer-
support on health outcomes has not been thoroughly studied. In enceinreportedweightlossismediatedbyadifferencein(i)future
particular,itisnotclearhowlargeonewouldexpecttheeffectof lifetimeinthecommunity,or(ii)anincreasedengagementwiththe
receivingonlinecommentsonobservedhealthoutcomestobeas, communityweappliedaso-calledSobel. Somewhatsurprisingly,
onlyabout5%ofthedifferenceinreportedweightlossappearsto
∗ThisisapreprintofanarticleappearingatWWW2017 be mediated by an increased lifespan in the community, and this
1http://www.niddk.nih.gov/health-information/health-statistics/ is not statistically significant even at p = .1. Instead, the rate of
Pages/overweight-obesity-statistics.aspx weightlossisthemaindifferencebetweenthetwogroups.
2http://www.cdc.gov/chronicdisease/overview/#sec3
3http://goo.gl/0naf5U 4http://www.liwc.net
Ourmaincontributionsareasfollows. of weight loss [41], or to show that a network of engaged users
is linked to persistent sharing of fitness related information [29].
• We propose a framework to study the effect of receiving The latter is of great importance as self-monitoring is one of the
commentsonauser’sfirstpostinaweightlosscommunity factorsalreadyshowntobeassociatedwithincreasedweightloss
onlaterreportedweightloss. [18].However,noneofthesestudieswereabletoisolatetheeffects
• Confirmingpriorwork,weobservedanincreasedreturnprob-
of online social support on weight loss, as there are many other
abilityforthoseuserswhoreceivecommentsvs.thosewho
underlyingfactorsandreal-worldvariablesdifficulttoaccountfor.
donot.
Themaingoalofourworkistomovefromdetectingcorrelation
• Amongthoseuserswhoreturnandreportweightlossthrough
and towards demonstrating causation. For that, we need to ade-
theirbadges, thereisa9lbsdifferencebetweenthosewho
quatelycontrolforcovariatesthataffecttheprobabilityofreceiving
hadpreviouslyreceivedcommentsandthosewhohadnot.
socialsupport. Forexample,differenttypesofpeoplemightdiffer
• We show that the difference in reported weight loss is not
intheirabilitytoelicitsocialsupportbecauseofdifferencesintheir
mediatedby(i)alongerlifetimeinthecommunityor(ii)an
personality,theirmood,ortheirwritingstyle.Similarly,mediation
increasedactivitylevelinthecommunity.
• Weprovideadetaileddiscussionoflimitations, designim- mechanisms that may underlie the observed relationship between
plicationsandpotentialextensions. socialsupportandweightloss,andwhicharenottypicallyinvesti-
gated,needtobeconsidered.Weproposedtoimprovetheabilityto
Wehopethattheinsightsderivedfromourstudyleadtomech- explaintheeffectofthesupportreceivedby(i)applyingamatch-
anismsfurtherstrengtheningthesocialsupportonlineforumspro- ingapproachtoreducethebiasinestimatingtheeffectofreceiving
vide,especiallytonewusers. socialsupport,and(ii)applyingSobelTeststotestfortheexistence
ofmediatingeffects.
2. RELATEDWORKS StudiesofhealthcommunitiesinReddit.Reddithasbeenused
tostudydifferenthealthconditionsunderdifferentperspectives,in-
Ontheimportanceofsocialsupportforpositivehealthout-
cluding social support. For instance, Cunha et al. [10] study the
comes. Priorresearchhasextensivelyexaminedtheroleofsocial
r/loseitsubredditandobservethatsocialsupportseemstobelinked
support in enhancing mental and physical health. It has been ar-
toanincreaseinreturnprobability,afindingouranalysisconfirms.
gued that receiving social support may reduce the rate at which
Nevertheless,thegreatmajorityofReddithealthcommunitystud-
individualsengageinriskybehaviors,preventnegativeappraisals,
iesperformmoreexploratoryanalysisofusersbehaviorordeter-
and increase treatment adherence [15]. Research has shown that
minecorrelationsbetweenlanguageandhealthoutcomes,withthe
conditionssuchassmoking[26],depression[16],andcoronarydis-
fewstudieslookingatcausalitymentionedinthenextsubsection.
ease[25]maybecontrolledwithsocialsupport.Also,face-to-face
Inthefirstcategory,Eschleretal.[14]performacontentanalysis
support groups are positively correlated with desirable outcomes,
in the posts of patients in different cancer stages in the subreddit
suchaslowerbloodpressure,andlowerbloodsugarlevels,result-
r/cancer,showingthatpatientsandsurvivorparticipantsshowdif-
ingindirectlyfromadaptivecopingskillsandresponses[36].
ferenttypesofemotionalneedsaccordingtotheirillnessphase.In
In the context of obesity, improvements in healthy eating and
thesecondcategory,Tamersoyetal.[37]performedananalysisto
physicalactivity, aswellmoresuccessfuloutcomesinweightre-
identifykeylinguisticandinteractioncharacteristicsofshort-term
ductionprograms, havebeendemonstratedinstudiesconsidering
andlong-termabstainers,focusingontobaccooralcohol.
offlinesupportgroups[43].
Causalinferencefromuser-generatedobservationaldata. A
Onlinesupportforums. Themainimplicationofthesestudies
commonmethodologyforcausalinferencefromobservationaldata
isthatdevelopingsocialsupportnetworksmayhelppeopleman-
wasborrowedfromthedomainofmedicineandappliespropensity
age their health conditions. Online health communities can be
scorematching[21].ThismethodologyisusedbyChoudhuryetal.
used to develop large social support networks, to understand and
[11]toidentifysuicidalideationinRedditmentalhealthcommu-
to promote health behavior. People have always tried to answer
nities, Tsapeli and Musolesi [40]to investigate the causal impact
health related questions by themselves, now the Internet has be-
ofseveralfactorsonstresslevelusingsmartphonedata,DosReis
comeanimportantresource. Previousstudiessuggestthat30%of
and Cullota [13] to study the effect of exercise on mental health,
U.S. Internet users have participated in medical or health-related
andChengetal. [8]tounderstandwhethercommunityfeedback
groups.Advantagesofonlinecommunitiesincludeaccesstomany
regulates the quality and quantity of a user’s future contributions
peerswiththesamehealthconcerns,andconvenientcommunica-
inawaythatbenefitsthecommunity. Olteanuetal. [28]present
tionspanninggeographicdistances.Thesecommunitiespresentan
amoregeneralframeworkforapplyingthemethodologytosocial
interestingcontrasttosimilarofflinegroups,astheyprovideanen-
mediatimelines.
vironment where people aremore likely to discuss problems that
However, we did not find any studies looking at causality be-
theydonotfeelcomfortablediscussingface-to-face. Inaddition
tweenonlinevariablesandweightlossinonlinecommunities.Cunha
suchonlinehealthcommunitiesareknowntofosterwell-being,a
etal.[10]applyasimilarmatchingframeworktotheoneproposed
senseofcontrol,self-confidenceandsocialinteractions[22].
heretostudytheeffectofsocialsupportonachangeinreturnprob-
Still, little is known about how the support provided in these
ability in the subreddit r/loseit. They do not, however, study the
communities can help enhance positive health outcomes, such as
ultimatelymoreimportantissueofweightloss,anddonotconsider
weightloss.Theliteratureofferslittleinformationabouthowmem-
mediatingeffects. Theyalsofailtoreportonformalmeasuresof
bersoflargeonlinehealthcommunitiesexperiencesocialsupport
thequalityofthematching,suchaswhetherthecovariatesarein-
for weight loss. Most works concentrate on showing that social
deed balanced in the control and the treatment group. Here, we
supportexistsinonlineweight-managementcommunities,andqual-
apply this well-known methodology to study causality in weight
ify the types of support present online [41, 3]. For example, the
loss,accountingforbalancecheckingandmediationanalysis. Our
presenceofsupportwasshowninpopularweight-losscommuni-
matchingoccursdirectlyonthevariablesandnotonthepropensity
ties,includingSparkPeople[20]andFatSecret[4].
scoreswhich,asshownin[24],ispreferable.
Based on the fact that support exists, a few studies have tried
tocorrelateonlineengagementandsupportwiththeeffectiveness
3. DATA (link post). This gave us a set of 25,647 users who had no pub-
Reddit5 is a social news website and forum. Its content is or- licactivityinthecommunitypriortotheirpost. Weusethisuser
ganizedincommunitiesbyareasofinterestcalledsubreddits. In set to study the effect of receiving comments on this post on the
2015ithad8.7millionusersfrom186countrieswriting73.2mil- probabilitytoreturnlater.
lionpostsand725.9millioncommentsin88,700activesubreddits6. Group 2 (G2). For users in Group 1, we extracted the list of
Forourstudywelookatthepopularweightlosssubredditloseit7. unique users that both (i) returned again to the community later
Thedatausedinouranalysiscoversfiveyears(August2010to tocommentorpostand(ii)alsohadbadgeinformationindicating
October2014)andwascrawledfromRedditusingPRAW(Python weightloss. Thisleftuswithasetof6,143users. Figure2shows
RedditAPIWrapper),aPythonpackagethatallowssimpleaccess the weight loss distribution displayed in the badges. We use this
to Reddit’s official API in November 2014. In Reddit users can setofreturninguserstostudytheeffectofreceivingcommentson
submitcontent,suchastextualpostsordirectlinkstoothersites, theirinitialpostontheweightlosstheyachieve.
bothcollectivelyreferredtoasposts.Thecommunitycanthenvote
postedsubmissionsup(upvotes)ordown(downvotes)toorganize
the posts and determine their position on the site’s pages. Infor-
mationondownvotesandupvotesis,however,notexposedviathe
API,insteadtheyexposetheaggregatednumberofvotes,referred
toasscore(numberofupvotesminusnumberofdownvotes).Users
canalsoreplytopostswithcomments.
Thedatawecollectedincludeposts,commentsandothermeta-
data (i.e., timestamp, user name, score). In total, we obtained
70,949 posts and 922,245 comments. These data were generated
by107,886uniqueusers, ofwhich38,981(36.1%)wroteatleast
onepostand101,003(93.6%)atleastonecomment.Table1shows
themean, medianandstandarddeviation(SD)forbasicstatistics
ofthedataset,includingthelengthofpostsandcommentsandthe
numberofdailymessages.
Mean Median SD
Postsperday 45.5 45 22.7
Commentsperday 586.6 599 264.3 Figure2:Users’weightlossdistributiondisplayedinthebadges.
Scoreperpost 35.7 6 126.7
Scorepercomments 3.1 2 11.4
Wordsperposts 89.3 64 95.8 4. METHODS
Wordspercomments 25.5 14 35.3
Inthissection,wediscussourmatchingmethodologytoinvesti-
gateapotentialcausaleffectofreceivingsocialsupportintheRed-
Table1:Basicstatisticsofloseitdataset.
dit loseit community. First we present the steps of our matching
approach(seeFigure3),laterweexplainthemediationtest.
Aparticipatingusercanadda“badge”(theiconwhichappears
nexttousernames,seeFigure1)totheirprofilethatindicatesself-
reportedinformationabouttheirweightlossprogressinpoundsand
kilograms.Thebadgescanbeupdatedbytheusersatanytime.
Figure1:Examplesoftheusers’weightlossbadgesonloseit.The
weightlossvalueisdisplayedinpoundsandkilograms.
Figure3:Matchingapproachdiagram.
Toanswerourresearchquestionswecreatedtwosetsofusers.
Group1(G1). Weextractedthelistofuniqueuserswhosefirst
Step1-Treatmentandcontroldefinition-Thefirststepofour
recordedactivityinthecommunitywasatextualpost(selfpost),
matchingapproachistochoosetheappropriatedefinitionoftreat-
ratherthanacommentorapostconsistingexclusivelyofaURL
ment. Given the fact that 96% of the first posts received at least
5https://www.reddit.com onecomment,weexperimentedwithdifferentdefinitionsoftreat-
6“Active”isdeterminedbyhaving5ormorepostsandcomments menttoavoidthatthenumberofremainingmatchedobservations
duringatleastoneweekin2015. becomestoosmalltodrawanystatisticallysignificantconclusions.
7loseit-LosetheFat,https://www.reddit.com/r/loseit/. Afterrunningtheexperiments,wedefinedthetreatmentgroupas
thoseuserswhoreceivedatleast4commentsontheirfirstpostin sentimentand14LDAtopics. Additionallyweusethecoefficient
loseit.Thecontrolgroupconsistsofalluserswhoreceived3orless valuesascovariates“weights”inthesimilaritycomputationinStep
commentsontheirfirstpost.Withthisdefinition,wecanguarantee 3. Thischoicewasmotivatedbythefactthattheregressioncoef-
(i)statisticalsignificanceofourfindings,and(ii)thebalance(see ficientshavetwodesirableproperties. Thefirstoneisascalenor-
Step4)betweenthetwogroupsafterperformingmatching. malizationproperty,wheresomethingmeasuredin,say,kilometers
Step2-Statisticalmethodforcovariatesselection-Choosing would have a larger coefficient than the same property measured
appropriate confounding variables is an important step in match- inmeters. Thisnormalizationiscrucialforcomputingmeaningful
ingmethods. Ideally,conditionalontheobservedcovariates,there similaritiesinametricspace. Second, theyreflectanimportance
shouldbenoobserveddifferencesbetweenthetreatmentandcon- ofthepredictivevariableinrelationtotheresponsevariable. This
trol groups. To satisfy the assumption of ignorable treatment as- meansthatvariableswithmoreeffectonreceivingfeedbackwillbe
signment, itisimportanttoincludeinthematchingprocedureall givenmoreimportanceonthepostsimilarity.
variablesknowntoberelatedtothetreatmentassignment. Gener- Step 3 - Matching approach - Matching is a nonparametric
allypoorperformanceisfoundbymethodsusingarelativelysmall method of controlling for the confounding influence of pretreat-
set of “predictors of convenience", such as gender only. Oppo- mentcontrolvariables(alsoknownasconfoundingorcovariates)
sitely, including variables that are actually unassociated with the inobservationaldata. Thekeygoalofmatchingistopruneobser-
outcomecanyieldslightincreasesinvariance.Commonlythecon- vationsfromthedatasothattheremainingdatahavebetterbalance
foundings’choiceisbasedonpreviousresearchandscientificun- betweenthetreatedandcontrolgroups,meaningthattheempirical
derstanding,whichcanyieldresearcherdiscretionandbias[35]. distributions of the covariates in the groups are more similar and
Hereinstead,weproposetouseastatisticalmodeltoselectthe modeldependencyisreduced[23].
most important covariates. We first examine whether attributes Withoutmatchingwemayhaveimbalance,forexample,agen-
ofthecontentofposts, arepredictiveofreceivingtreatment. We erallyoptimisticusermightwriteafirstpostwithamorepositive
model a prediction task with the data being split into two cate- tonethanamorepessimisticcounterpart. Letusimaginethat, in
gories,theonesthatreceivedtreatmentandtheonesthatdidnot. responsetotheirposts,theformeruserreceiveslotsofsupportand
Thenweusethevariablesthatremainedinthefinalmodelasthe the latter receives none. Now let us further imagine that the for-
confoundingsinthematchingapproach(seeStep3). meruserreturnsformoreactivityonthesubredditlater, whereas
Inourcase,thedefinitionofthepredictivevariableswasmoti- thelatteruserisnevertobeseenagain. Thequestionthenarises
vatedbythehypothesisthatpostswithsimilarcontenthaveasim- whetherthesupportreceived“caused”theformerusertoreturnor,
ilar probability of receiving feedback. Since user attributes like rather,whetherthatuserwasatahigherdispositiontoreturnany-
demographics or profile images are not available in Reddit, and way and the social support received was a mere correlate. Here
hence the sole focus on the post’s content is natural. We used a thetoneoftheposts,animportantcovariate,isimbalancedandis
topicalrepresentation ofthefirstpost’s content(title+body) ex- generally more positive in the treated group (= those with social
tracted by Latent Dirichlet Allocation (LDA) [5], the various se- support)thaninthecontrolgroup(=thosewithoutsocialsupport).
manticcategoriesofwordsextractedfromLIWC,sentimentanal- Matchingapproachesareappliedinsuchscenariostoremovethe
ysiscomputedwithValenceAwareDictionaryforsEntimentRea- relationshipbetweenthecovariatesandthesupposedcausalvari-
soning(VADER)[19],countsofquestion-centricwords(what,where, ablebyreducingtheimbalance.
when, which, who, whose, why, how) and the length of a post Inthesimplestcase,matchingisappliedtosettingsofadepen-
(number of whitespace delimited words), a total of 78 variables. dentoutcomevariableY,atreatmentvariableT(1 = treated,0 =
i i
TherequiredparametersforLDA–numberoftopics, numberof control) and a set of pretreatment covariates X [32]. We want
i
iterations,αandβ–wereempiricallydefinedas20,2,000,0.4and toobservethetreatmenteffectforthetreatedobservationi(TE),
i
0.1. Therationaletogetquestionwordsistounderstandtowhat whichisdefineasthevalueofiwhenireceivesthetreatmentminus
extent posts on weight loss seek explicit feedback or suggestions thevalueofiwhenitdoesnotreceivethetreatment.
fromtheRedditcommunity.
We used a logistic regression with LASSO as our prediction
TE =Y(1)−Y(0)=observed−unobserved (1)
method. Logistic regression is well-suited to handle binary de- i i i
pendent variables, while LASSO is a method that performs both Obviouslyifiistreated,wecannotalsoobserveiwhenitdoes
variable selection and regularization in order to enhance the pre- notreceivethetreatment. Hence,matchingestimatesY(0)witha
i
diction accuracy and interpretability of the statistical model. To Y (0), where j is similar to i. In the best case, each i is matched
j
assess the quality of the model produced, we (i) computed the to a j with the exact same values for all the control variables. In
mean AUC (Area Under the Curve) over the 10-fold cross vali- practice,“similarenough”observationsarebeingmatched.
dation setting, and (ii) performed a qualitative analysis using the Matching can be viewed as trying to find hidden randomized
featuresthatremainedinthemodeltocomputethesimilaritybe- experimentsinsideobservationaldata. Themostcommonlyused
tweenposts.Asthegoalofthematchingistopairpoststhat“look matchingmethodisPropensityScoreMatching(PSM)[24],which
similartoahumanreader”,thisqualitativeanalysisisimportantto aims to approximate a complete random experiment. PSM first
understandwhetherfeatureshavesucceededtoadequatelyidentify builds a model to predict the probability of a particular user to
similartexts. receive the treatment. Users are then matched according to their
The highest cross-validated AUC of 0.62 was obtained for 37 probabilityofreceivingthetreatment.However,recentlyKingand
variables. However, inordertofurtherreducethedimensionality Nielsen[24]showedthatthismethodissuboptimalandthatPSM
of the space used for matching, we chose features from a model can,undercertaincircumstances,evenincreasethebiasinthedata.
withslightlylowerAUC(0.61)butwhichusedonly20variables. Here,weapplyamatchingdistanceapproach(MDA)[31],which
Intermsofqualitativeanalysis,bothmodelsfaredsimilarlywithout aimstoapproximateafullyblockedexperiment[21]. Forthiswe
anynoticeabledifference. measurethecosinedistanceamongtheobservationsbasedontheir
The 20 variables that “survived” the shrinkage were: 5 LIWC covariates. Treatedunitsarematchedtotheirnearestcontrol, as-
categories (negative emotions, anger, sexual, reward and work), suming they pass a predefined similarity threshold, a.k.a. caliper.
Ideally this similarity threshold should be as close as possible to processingstep:thematchingreducesimbalancebetweenthetreated
1,barringconstraintsrelatedtodatasparsity. Tofindanappropri- and control groups in terms of the covariates used for matching.
atevalue,wegraduallyincreasethevalue,startingfrom0.9,until Hence,theremainingunprunedobservationsaresimilarexceptfor
weareabletoobservethreeconditions: (i)thematchedpostsare thetreatmentcondition,andthetreatmentconditioncanbeusedas
similarenough(basedonaqualitativeanalysis),(ii)treatmentand anindependentvariableintheSobeltest.
controlgroupsarebalanced(seeStep4),and(iii)resultsarestatis-
ticallysignificant. Weallowone-to-manymatches,i.e.,wematch
5. RESULTS
withreplacement.
Pruningtheunmatchedobservationsmakesthecontrolvariables Inthissectionwepresenttheresultsofthecausalinferenceanal-
matter less. In other words, it breaks the link between the con- ysisweconductedtomeasuretheeffectofreceivingsocialsupport
foundingandthetreatmentvariable,consequentlyreducingtheim- forthefirstpostausershareswiththecommunity. InFigure5the
balance,modeldependence,researchdiscretionandbias. starsindicatethesignificancelevelsforapermutationtest,withthe
Step4-Balancecheck-Onenecessaryconditionforasuccess- numberofasteriskscorrespondingtothep-values,***for0.1%,**
fulapplicationofamatchingmethodologyisabalanceoftheco- for1%,and*for5%.Inapermutationtest,thelabels(C)ontroland
variates. If,say,oneLDAtopicwasmorestronglypronouncedin (T)reatmentarerepeatedlyrandomlyshuffledandforeach(fake)
thetreatmentgroupthaninthecontrolgroupthenthisimbalance, control-treatmentassignmenttheeffectsizeismeasured. Thesig-
ratherthananycausaleffect, couldleadtoanapparenttreatment nificancelevelindicatesthefractionofpermutationsthatleadtoan
effect.Toassessifthetreatmentandcontrolgroupsaresufficiently effectsizebiggerthantheoneactuallyobserved.[17].
balancedafterthematching,wecheckthestandardizedmeandif-
Userengagement
ference[1]foreachconfoundingvariablec. Foracontinuousco-
variate,thestandardizedmeandifferenceisdefinedas: Westartbyanalyzingtheeffectofreceivingsocialsupportonthe
probability of a user to return to the community to comment or
dc= (x¯(cid:113)treast2tmreeantmte−nt+x¯s2ccoonntrtorlol) (2) pGorsotuapga1in(.seFeoSretchtiisonan3a)l.yAsimswonegutsheodsethuese2r5s,,61487,0u0s0errsecperievseedntthine
2 treatment(atleast4comments)and7,647didnot(control).
where x¯ and x¯ denote the mean of the covariate in Weappliedourone-to-manymatchingapproachwithasimilar-
treatment control
thetreatmentandcontrolgroups,respectively. s2 and s2 itythresholdof0.965toensurethatforGroup1ourmethodwas
treatment control
denotethecorrespondingsamplevariances. matchingsimilarenoughpostsandbalancingthegroups(seeFig-
The standardized difference compares the difference in means ure4).Thematchingproduced14,570similarpairs(14,570unique
in units of the pooled standard deviation. It is not influenced by treatmentand5,279uniquecontrolusers).Ourresultsindicatethat
samplesizeandallowsforthecomparisonoftherelativebalance. receivingsocialsupportincreasestherelativeprobabilityofauser
Theremainingbiasfromaconfoundingvariablecisconsideredto returningtothecommunitybyroughly66%(seeFigure5,redbar).
beinsignificantifd issmallerthan0.1[27]. Thisanalysisisstatisticallysignificantat0.1%. Theseresultspro-
c
Step 5 - Effect size estimation - After showing that any con- videevidenceforhowimportantsocialsupportistoincreaseuser
foundingbiashasbeensufficientlyeliminated,wecanestimatethe engagement,whichisassociatedwithbetterchancesofobtainsuc-
effectoftreatmentonthematchedtreatedandcontrolunits. Here cessinweightlossprograms[30].Notethatthiseffectsizesisbig-
foragivenmatchingoftreatedandcontrolunits,wecomputethe gerthantheonereportedinasimilarstudy[10].Theretheauthors
estimatedaveragetreatmenteffect(EATE). usedadifferentdefinitionofcontrolandtreatmentgroupbasedon
rankingthepostsbythenumberofcommentsreceivedthangetting
(cid:80)N (Yi(1)−Yj(0))∗100 “top40%”vs.“bottom40%”comments, ratherthanour“atleast
EATE= i=1,j=1 Yj(0) (3) 4”vs.“atmost3”.
N
Mediation test - Though matching methods can shed light on
whetherachangeinthetreatmentconditionTlikelycausesachange
inthedependentvariableY,matchingmethodsdonotprovidein-
sightsintowhether(i)thiscausalrelationshipis“direct”,orwhether
(ii)itisbeingmediatedbyanothervariableM.Inourcase,receiv-
ingsocialsupportmightleadtoanincreasedengagementwiththe
communitywhich,inturn,isresponsibleforanincreaseinweight
losssuccess. Thustheweightlosssuccesswouldbemediatedby
anincreasedengagementwiththecommunity.
Mediationanalysisistheprocessofdeterminingwhetherornot
variablesactingasanin-betweenstep,calledmediators,arepresent
whenlookingattherelationshipbetweenanindependentvariableT
(herethetreatmentconfidtion)andadependentoutcomevariableY.
Asaresult,whenthemediatorisincludedinananalysismodelwith
theindependentvariable,theeffectoftheindependentvariableis
reducedandtheeffectofthemediatorremainssignificant[6].
Toverifyifanyvariableplaystheroleofamediatoranditssig-
Figure4: Standardizeddifferenceofmeansforeachconfounding
nificanceintherelationshipofsocialsupportandweightloss,we
variable(Group1). Notethatafterthematchingallthevaluesare
apply the Sobel test [33]. The Sobel test assesses the statistical
below0.1,thusthegroupsarebalanced.
significanceoftheindirecteffect.
Notethatsuchanalysisworksnaturallywithmatchingasapre-
Figure6: Standardizeddifferenceofmeansforeachconfounding
Figure5:Theeffectsizeforthefactorsanalyzed.
variable(Group2). Notethatafterthematchingallthevaluesare
below0.1,thusthegroupsarebalanced.
Toshowthatthematchingapproachindeedmatchessimilarposts,
wepresentinTable2partsofapairofpostsmatchedaccordingto
behaviors were similar. The groups presented a mean number of
ourapproach,thispairhadacosinesimilarityof0.97.
updates of 1.75±3.93 (treatment) and 1.60±2.70 (control), but
this difference was not statistically significant, (p=0.16). As the
Treatment: Hi guys, new here, I’m on a low carb and dairy
twogroups’badgeupdatingbehaviorsaresimilaranditdoesnot
(pale)diet,butrecentlyijustwentonavacation,Iate..-Wrap
seemtoeffectouranalysis.
with grilled chicken, lettuce, and a small amount of buffalo
Wealsoexperimentedwithdifferentdefinitionsforthetreatment
sauce,onwholegrainwrap,Banana,Apple,Orange,andplain
cutofftoseeifthereisaneffectof“diminishingreturns”: receiv-
oatmeal - Tossed salad with grilled chicken. Is this healthy
ing at least one comment (vs. none) could have a bigger impact
eating on the pale diet? I also did not exercise, but we did
thanreceiving10(vs.9orless). Figure7presentstheeffectsize
somewalkingaround...
fordifferenttreatmentdefinitions, althoughwecannotguarantee
Control: Hey everyone, I’m new to loseit. I’m starting my
thebalancepropertyforallthecutoffs, theseresultsshowthatas
firstworkout/dietroutineever.MydoctorsaysIneedtogetmy
expectedwhenweincreasethecutofftheeffectsizedrops.
cholesterolundercontrol.Sofar,I’vebeendoing30minutesof
cardioandtakingcareofmydiet. I’vecutoutsoda,beer,and
red meat. I’ve switched to skim milk, olive oil, whole grain,
andbrownrice...
Table2:Partsofsimilarpostsmatched.
Weightloss
Nextweinvestigatedtheeffectofreceivingsocialsupportonweight
loss. Forthisanalysisweusedthesetof6,143usersinGroup2
(seeSection3),amongthoseusers4,657receivedatleast4com-
ments(treatmentgroup)and1,486didnot(controlgroup). Here,
toensuresimilarenoughpostsandbalancedgroupsinthematch-
ingforGroup2,weusedasimilaritythresholdof0.955. Figure5
(greenbar)showsthatreceivingsocialsupportinthefirstpostleads
toarelativeincreaseintheachievedweightlossof26%,oranab-
solutemeandifferenceof9lbs.Theobservedeffectsizeisstatisti-
callysignificantandafterapplyingthebalancecheck(seeFigure6) Figure7:Effectsizefordifferenttreatmentdefinitions.
weconfirmedthatallcovariateswerebalancedbetweenthecontrol
andtreatmentgroups.
Mediationtest
However,notethatthefrequencywithwhichpeopleupdatetheir
badgesmayinterfereinthisanalysis. Maybeuserswhodonotget After estimating the causal effect of receiving social support on
commentsdonotupdatetheirbadgesasoften,eveniftheyloseas weightloss,wefocusoncheckingifcertainvariablesthatwerenot
muchweightasothers. Inotherworse,receivingsocialfeedback includedinthesetofcovariatescouldactasamediator,explaining
might simply lead to more active “profile management” than to partoftheobservedeffectofsocialsupportonweightloss. Con-
moreweightloss.Totestthisalternativeexplanation,wecomputed ceptuallyandbasedonpriorwork,receivingsocialfeedbackcould
thenumberofbadgeupdates(everychangeinthebadgeinforma- cause the effected user to (i) show a higher activity level in the
tion)forusersinthetreatmentandcontrolgroups.Afterwards,we community,and(ii)remainlongerinthecommunity.Thereforewe
ranapermutationtesttocheckifthetwogroups’badgeupdating firstcheckif,indeed,receivingacommentonauser’sfirstposthas
aneffectonthesevariablesand, ifyes, ifthiseffectmediatesthe a mechanism where posts that do not are brought to the modera-
observedeffectonthereportedweightloss. tors’attentionsothattheycanprovideanadequatereply. Itcould
Figure 5 shows that receiving social support also has an effect alsobeworthwhiletoconstructa“positivitybot”whichprovides
over the users’ lifespan (i.e., the difference in days between the non-genericpositivefeedbackonpostsoverlookedbythecommu-
dateoftheirlastandfirstactivityobservedinthecommunity)and nity[42]. Otherresearchersareexploringthecreationofaframe-
theofnumberactivities(i.e.,thesumofthenumberofcomments work to allow formal testing of theories of different moderation
andposts). TheeffectsizeforlifeSpanis6%(seeFigure5; blue styles.8 Ourresearchcontributesbyprovidingatheorytotest.
bar),butthiseffectwasnotstatisticallysignificant,forthenumber Ethical considerations. For our study, we used only publicly
ofactivitiestheeffectsizewas43%(seeFigure5;yellowbar). availabledatathatuserschosetopostonline. Allanalysisisdone
Sincetheobservedeffectonthelifespanissmall,weestimated inaggregateandwedonotpostresultsforanyindividual. How-
if the social support has an effect on the users’ weight loss rate ever,asisoftenthecasewithsuchdatacollection,usersmightnot
(i.e., the weight loss in lb divided by the lifespan). As the rate beawareofthefactthattheyarebeingstudiedbyresearchers.Toat
is an unstable estimate for users who are only active for one or leastpartlyalleviatesuchconcerns,wereachedoutthemoderators
twodaysinthecommunity,wechosetolookatthemedianrather toinformthemaboutourstudy. Theirreactionwasverypositive
than the average effect. Concretely, we computed the median of (“Wow, I’m really looking forward to it”) and they also pointed
theindividualpairedratiosof(weightlossratetreatedindividual ustothecommunitysurvey[12]thatwehadpreviouslybeenun-
/weightlossratecontrolindividual). Wethenusethemedianof awareof.Oncefinalized,wewillshareourfindingswiththeloseit
thesemediansasanestimateoftheeffectsize.Asexpectedthereis communitytoencourageapositiveatmosphereand,inparticular,
aneffectontheweightlossrate,whereusersthatreceivedatleast ensureawarmwelcomeofnewmembers.
4 comments (treatment) lose weight roughly 35% faster than the Whattypeofsocialsupportmatters? Onecouldalsotryand
onesthatdidnotreceived(0.48lb/dayvs.0.35lb/day). extractthetopicsortonefromthecommentsonagivenposttosee
Finally,weappliedaSobeltesttoverifyiflifespanandnumber ifparticulartypesofcommentshavealargereffectonthereported
ofactivitiesactasmediatorintherelationshipofthesocialsupport weightloss. This,however,comeswithendogeneityproblemsas
andweightloss,i.e.,iftheyexplainasignificantpartofthecausal thetypeofcommentsreceivedislikelycorrelatewiththesubject
effect of social support on weight loss. The results of the Sobel matterofthepost. Givenlargeenoughdatasetsonecouldhopeto
testshowedthattheproportionoftheeffectofsocialsupportover correctforthisusingourmatchingframeworkwherethetreatment
weight loss due to lifespan and the number of activity is small – is no longer binary – receiving a comment or not – but is multi-
5.6%and3.4%respectively. However,theseresultswerenotsta- variate.Wechosenottoexplorethisrouteduetosparsityconcerns.
tisticallysignificantevenatp=0.1.Surprisingly,amongtheusers Wedid, however, experimentwithusinganothertypeofsocial
thatreturntopostagain,thedifferenceinachievedweightlossdoes feedbackbasedonvotes: Reddithasavotingsystemwithup-and
notseemtobelinkedtoeitherlifespanorengagementinthecom- down-votes and an aggregate “score” combining these two types
munity. Rather, the rate of weight loss seems to be effected for of votes, positive - negative, can be obtained via the API. In our
thoseusersreturningtothecommunity. dataset, thisscorewasnevernegative. Usingthesamematching
setup (see Section 4), this gave an effect size of 16% to users to
6. DISCUSSION returntothecommunityand45%toweightloss(p < 0.01). The
balance condition also held for this experiment. Though most of
Doesweightlossequalsuccess?Ouranalysiscruciallyassumes
the limitations discussed below also apply to this setup, the fact
thatmembersoftheloseitcommunitywanttoloseweight. Ifthat
thata“similar”effectisobservedforadifferentdefinitionofsocial
wasnotthecasethentalkingabout“weightlosssuccess”wouldbe
feedbackindicatesthatourresultsmightholdmoregeneral.
meaningless. However, results from a recent survey of the loseit
Whobenefitsmostfromsocialsupport? Allofourresultsare
community [12] indicate that 91% of the respondents were cur-
aggregatesindicatingthat,onaverage,usersseemtobenefitfrom
rently trying to lose weight, with another 7% trying to maintain
receivingsocialsupportintheformofcomments. Forfuturework
their weight. Therefore, it seems adequate to consider a higher
it would be interesting to look into what type of users are most
levelofweightlossasadesirableoutcome.
orleastlikelytobenefitfromsuchsupport,forexamplelookingfor
Qualitativeevidence.Thoughouranalysisisdeliberatelyusing
gender-specificeffects.Thoughgenderisnotanattributeofauser’s
quantitative methods, there is also qualitative evidence to further
profile,itcansometimesbeinferredfromtheirposts(“I’mamother
supporttheclaimthatsocialsupportreceivedinthecommunityef-
oftwo...”)orfromsharedprogresspictures.Insomecasesauser’s
fectsweightloss. Theaforementionedcommunitysurvey[12]in-
chosenscreennamesuchas“john123”or“mary456”alsoprovide
cludesthequestion“Whatdoyoulikeabout/r/loseit?”.Thetopics
hints. Similarly,onecouldlookforcasesofusersindicatingtheir
mostemphasizedbythesurveyparticipantswererelatedtoterms
startingweight, ratherthanjusttheweightloss, tostudywhether
suchas“community”,“people”,“supportive”and“support”. Sim-
theeffectofsocialsupportistiedtoauser’sinitialweight.
ilarly, onecaneasilyfindpostsexplicitlyacknowledgingtheper-
Impactofpruningoneffectsize. Westartedouranalysiswith
ceived importance of the social support such as: “I have visited
theassumptionthat, aswehadpreviouslyobservedfortheeffect
this page almost daily over the past 15 months, and it was really
onreturn-to-postprobability[10],pruningviamatchingwouldlead
helpfulinkeepingmymotivation. Ihopethismayprovidesimilar
toalowerestimateoftheeffectofsocialfeedbackonweightloss
motivationtothosejuststarting!”.
whencomparedtotheunmatchedanalysis. Intuitively,matching,
Designsimplications. Ourfindingssuggestthatwhetherornot
and hence pruning, should reduce the effect of confounding user
auserreceivesfeedbackontheirinitialpostaffects(i)theirproba-
variablessuchas“positiveoutlookonlife”whichmightaffectwrit-
bilitytoreturntothecommunity,and(ii)giventhattheyreturn,the
amountofweightlosstheyreport. Thereforemechanismsthatin-
8See https://civic.mit.edu/blog/natematias/
creasethelikelihoodtoreceivea“warmwelcome”areexpectedto
reddit-moderators-lets-test-theories-of-moderation-together
leadtomoreengagementwiththecommunityandtobetterhealth for thoughts by Nathan Matias’ and https://np.reddit.com/r/
outcomes. Fortunately,thevastmajorityofinitialposts(96%)al- TheoryOfReddit/comments/456503/want_to_test_your_theories_
readyreceiveatleastonecomment. Itcouldbeworthconsidering of_moderation_lets/forasubredditonthetopic.
ingstyleandhaveapositiveeffectonbothreturn-to-postbehavior asymptoticallyconvergetothesameresultasthematchedpairsbe-
aswellasonweightlosssuccess.However,weobservedtheoppo- comemoreandmoresimilar,beingidenticalonallthecovariates
site:theraweffectsizefortheunmatcheddatawas22%(treatment inthelimitingcase.Wethereforedidnotincludeexperimentswith
cutoffof4comments)whereasitwas26%forthematchedanalysis highersimilaritythresholdsbecauseofissuesofdatasparsity.
(treatmentcutoffof4commentsandsimilaritythresholdof0.955). Concerningthecovariatesused,webelieveourstatisticalmethod
Inparticular,userswhoweretreatedandwho,eventually,lostless isareasonablechoice. Introducingtoomanyadditionalcovariates
than the median weight loss were pruned more often (27%) than can lead to problems of high dimensionality when attempting to
theirtreatedcounter-partswholostalotofweight(23%). Atthe match similar posts along, say, hundreds of dimensions. In such
sametimefortheuntreatedusersthedifferencesinpruningrates settingsitbecomesdifficulttobalanceallthecovariatesconsidered.
for less-than-median (28%) and more-than-median (28%) weight Furthermore, many other potential covariates should be balanced
lossweresmall.Thoughthisissurprising,itactuallyhelpstomake atleast“onaverage”, iftheyarecorrelatedwiththecovariatesin
theoverallclaim,i.e.,thatsocialfeedbacksupportsweightlossin the final model. Still, the exclusion of unknown but potentially
anonlinecommunity,morerobust. crucial covariates is always a concern when applying a matching
methodology. We hope that other researchers will validate – or
7. LIMITATIONS invalidate–ouranalysisusingtheirownsetofcovariates.
Limitations of observability of returning users. Our main
Limitationsofusingbadgestotrackweightloss. Toinfera
analysis relies on users returning to the loseit community to post
user’s weight loss success or failure we are currently relying on
orcommentagainaftertheirinitialpost. Assumingtheuseralso
thebadgesusedintheloseitcommunity. Thesebadgesonlycap-
hasbadgeinformation,thenthisseconddatapointprovidesanesti-
tureself-reportedweightlossprogress. Thefirstissueimposesan
mateofboththeabsoluteweightloss(inthebadge)andtheweight
importantlimitationas, onecouldimagine, receivingsocialfeed-
loss rate (in the difference of time stamps). This means that we
back leads to a heightened sense of self-awareness and a feeling
cannotmakeanystatementsconcerningweightlossforuserswho
of“beingwatched”inthecommunity. Thoughthiscouldleadto
doreturntothecommunityforasecond,publicactivity. Though
positivepeerpressure,itcouldalsoincreasetheprobabilityofover-
ourmainresultsareconditionalonusersreturning,wealsoobserve
reportingweightlossprogress.Badgesarealsoalwaysvisiblydis-
socialfeedbackleadingtoanincreaseinreturnprobability(seeFig-
played next to a user’s screen name, increasing the likelihood of
ure5).However,theissueofpredictingwhowillorwillnotreturn
socialsignalingeffectingtheiruse.Onesolutiontothisissuecould
toanonlinecommunityhasbeenstudiedbefore[10,9]andsowe
be to perform a similar analysis using auto-generated weight up-
didnotanalyzethisfurther.
dates from smart scales as used by Wang et al. [44]. Such data
sourcesarelesslikelytobepronetomisreportingerrors.
8. CONCLUSIONS
Toavoidatleastsomeofthelimitationsintroducedbyusingthe
badgestoinferauser’sweightloss,wealsoperformedanalysisona Wepresentedananalysisoftheeffectofreceivingsocialfeed-
different,smallersetofuserswhoexplicitlyreporttheirweightloss back in the form of comments on the reported weight loss in the
progressintheirposts. Thiscanbeeitherbyusingthecommunity /r/loseit community on Reddit. Correcting for confounding fac-
conventionsof“SW/CW/GW”forstarting/current/goalweight,or torsthroughamatchingmethodology,userswhoreceiveatleast4
through posting things such as “I’ve lost 10lbs”. In this analysis commentsontheirfirstpostinthecommunitywere(i)66%more
thetreatmentgrouphad1,421usersandthecontrolgroup36users. likelytoreturnforafutureactivityinthecommunity,and(ii)con-
Duetodatasparsity,weonlyappliedthethresholdof0.7. Though ditionalontheuserreturning, thosewhohadpreviouslyreceived
pronetootherlimitations,thisalternativewayofinferringweight feedbackendupreportingonaverage9lbmoreinweightloss.For
lossledtoqualitativelysimilareffectsizesof9.5%. thesereturningusers, thiseffectisnotmediatedbyneither(i)an
Limitationstodeterminingthestartdateofweightlossjour- increased level of activity in the community, nor by (ii) a longer
ney. Ouranalysis,especiallythatrelatedtotherateofweightloss lifespaninthecommunity. Thoughobservationalstudieshavein-
(Section5),assumesthatauser’sweightlossjourneystartstheday herentlimitationsoncausalinference,ourworkhelpstoillustrate
theyfirstannouncethemselvestothecommunityintheformofa theimportanceofreceivingfeedbackinonlinesupportforums,in
post. However,inpractice,usersmightwellfirstobservethecom- particularforusersnewtothecommunity.
munity before deciding to post. This means that the actual rate
ofweightlossislikelytobelower,asthetimeperiodoverwhich 9. REFERENCES
theweightlossisachievedislonger. Amoresubtleissuerelated
[1] P.C.Austin.Anintroductiontopropensityscoremethodsfor
to this passive use of the support community is that it could af-
reducingtheeffectsofconfoundinginobservationalstudies.
fectthewritingstyle. Putsimply,userswhohavebeenfollowing
Multivariatebehavioralresearch,46(3):399–424,2011.
thecommunityforawhilemight(i)havea“headstart”asfaras
weightlossisconcerned,and(ii)theymightwriteinastylemorein [2] L.Backstrom,R.Kumar,C.Marlow,J.Novak,and
tunewiththecommunitywhich,inturn,couldleadtomoresocial A.Tomkins.Preferentialbehaviorinonlinegroups.In
Proceedingsofthe2008InternationalConferenceonWeb
feedback. If these stylistic differences are not represented in the
extractedcovariates,thiscouldleadtoanoverestimateoftheeffect SearchandDataMining,pages117–128.ACM,2008.
size.Thoughwecannotcompletelyruleoutthispossibility,webe- [3] P.BallantineandR.Stephenson.Helpme,I’mfat!Social
lievethattheeffectsizesarelargeenoughtomakeitunlikelythat supportinonlineweightlossnetworks.J.ofConsumer
theyarefullyexplainedbythishiddenadaptiontothecommunity. Behaviour,2011.
Limitationsofourmatchingapproach.Whenapplyingamatch- [4] L.Black,J.Bute,andL.Russell.CasesonOnline
ingapproach,thereareanumberofchoiceoneneedstomakesuch DiscussionandInteraction:ExperiencesandOutcomes,
as(i)theselectionofcovariates,(ii)howtonormalizeandpoten- chapter"TheSecretIsOut!":SupportingWeightLoss
tiallyweightdifferentcovariates,(iii)whichdistancemetrictouse throughOnlineInteraction.2010.
and whether to use “blocking”, and (iv) which similarity thresh- [5] D.M.Blei,A.Y.Ng,andM.I.Jordan.Latentdirichlet
oldtochoose. Ofthese, anychoicesfor(ii), (iii)and(iv)should allocation.J.Mach.Learn.Res.,3,2003.
[6] B.Boone.Mediationanalysis:Expandingfromonemediator [27] D.Mimno,H.M.Wallach,E.Talley,M.Leenders,and
tomultiplemediators.2012. A.McCallum.Optimizingsemanticcoherenceintopic
[7] M.Burke,C.Marlow,andT.Lento.Feedme:Motivating models.InProceedingsoftheConferenceonEmpirical
newcomercontributioninsocialnetworksites.InCHI,2009. MethodsinNaturalLanguageProcessing,2011.
[8] J.Cheng,C.Danescu-Niculescu-Mizil,andJ.Leskovec. [28] A.Olteanu,O.Varol,andE.Kıcıman.Towardsan
Howcommunityfeedbackshapesuserbehavior.arXiv open-domainframeworkfordistillingtheoutcomesof
preprintarXiv:1405.1429,2014. personalexperiencesfromsocialmediatimelines.In
[9] M.D.ChoudhuryandS.De.Mentalhealthdiscourseon ICWSM,2016.
reddit:Self-disclosure,socialsupport,andanonymity.In [29] K.Park,I.Weber,andetal.Persistentsharingoffitnessapp
ICWSM,2014. statusontwitter.InCSCW,pages184–194,2016.
[10] T.Cunha,I.Weber,andetal.Theeffectofsocialfeedbackin [30] K.Patrick,K.Calfas,andetal.Outcomesofa12-month
aredditweightlosscommunity.InDigitalHealth,2016. web-basedinterventionforoverweightandobesemen.Ann
[11] M.DeChoudhury,E.Kiciman,andetal.Discoveringshifts BehavMed,2011.
tosuicidalideationfrommentalhealthcontentinsocial [31] D.RubinandE.Stuart.Affinelyinvariantmatchingmethods
media.InCHI,2016. withdiscriminantmixturesofproportionalellipsoidally
[12] denovosibi.[results]winter/spring2016usersurvey,2016. symmetricdistributions.Statistics,2006.
https://www.reddit.com/r/loseit/comments/4acxxn/results_ [32] D.B.Rubin.Multivariatematchingmethodsthatareequal
winterspring_2016_user_survey/. percentbiasreducing,i:Someexamples.Biometrics,1976.
[13] V.L.DosReisandA.Culotta.Usingmatchedsamplesto [33] M.E.Sobel.Somenewresultsonindirecteffectsandtheir
estimatetheeffectsofexerciseonmentalhealthfromtwitter. standarderrorsincovariancestructuremodels.Sociological
InProceedingsoftheTwenty-NinthAAAIConferenceon methodology,16:159–186,1986.
ArtificialIntelligence,pages182–188,2015. [34] J.SongandK.Chung.Observationalstudies:cohortand
[14] J.Eschler,Z.Dehlawi,andW.Pratt.Self-characterized case-controlstudies.Plasticandreconstructivesurgery,
illnessphaseandinformationneedsofparticipantsinan 2010.
onlinecancerforum.InICWSM,2015. [35] E.A.Stuart.Matchingmethodsforcausalinference:A
[15] A.F.Fontana,R.D.Kerns,andetal.Support,stress,and reviewandalookforward.Statisticalscience:areview
recoveryfromcoronaryheartdisease:alongitudinalcausal journaloftheInstituteofMathematicalStatistics,2010.
model.HealthPsychol,8(2):175–193,1989. [36] C.F.Sullivan.Genderedcybersupport:Athematicanalysis
[16] S.Grav,O.Hellzèn,andetal.Associationbetweensocial oftwoonlinecancersupportgroups.Journalofhealth
supportanddepressioninthegeneralpopulation:thehunt psychology,8(1):83–104,2003.
study,across-sectionalsurvey.J.ofclinicalnursing,2012. [37] A.Tamersoy,M.DeChoudhury,andD.H.Chau.
[17] T.Hesterberg,D.S.Moore,S.Monaghan,A.Clipson,and Characterizingsmokinganddrinkingabstinencefromsocial
R.Epstein.Bootstrapmethodsandpermutationtests. media.InHypertext,2015.
IntroductiontothePracticeofStatistics,5:1–70,2005. [38] P.J.Teixeira,E.V.Carraça,andetal.Successfulbehavior
[18] J.M.Hutchesson,Y.C.Tan,andetal.Enhancementof changeinobesityinterventionsinadults:asystematicreview
self-monitoringinaweb-basedweightlossprogrambyextra ofself-regulationmediators.BMCmedicine,13(1):1,2015.
individualizedfeedbackandreminders:Randomizedtrial.J [39] M.Thiese,G.Moffitt,R.Hanowski,andetal.Commercial
MedInternetRes,18(4):e82,Apr2016. DriverMedicalExaminations:PrevalenceofObesity,
[19] C.J.HuttoandE.Gilbert.VADER:Aparsimonious Comorbidities,andCertificationOutcomes.J.Occup.
rule-basedmodelforsentimentanalysisofsocialmediatext. Environ.Med.,2015.
InICWSM,2014. [40] F.TsapeliandM.Musolesi.Investigatingcausalityinhuman
[20] K.O.Hwang,A.J.Ottenbacher,andetal.Socialsupportin behaviorfromsmartphonesensordata:aquasi-experimental
aninternetweightlosscommunity.Internationaljournalof approach.EPJDataScience,4(1):1–15,2015.
medicalinformatics,79(1):5–13,2010. [41] G.M.Turner-McGrievyandD.F.Tate.Weightlosssocial
[21] K.Imai,G.King,andE.A.Stuart.Misunderstandings supportin140charactersorless:useofanonlinesocial
betweenexperimentalistsandobservationalistsaboutcausal networkinaremotelydeliveredweightlossintervention.
inference.Journaloftheroyalstatisticalsociety,2008. TranslationalBehavioralMedicine,3(3):287–294,2013.
[22] G.J.JohnsonandP.J.Ambrose.Neo-tribes:Thepowerand [42] J.vanderZwaan,V.Dignum,andetal.Aconversation
potentialofonlinecommunitiesinhealthcare. modelenablingintelligentagentstogiveemotionalsupport.
CommunicationsoftheACM,49(1):107–113,2006. InModernAdvancesinIntelligentSystemsandTools.2012.
[23] G.King,C.Lucas,andR.Nielsen.Thebalance-samplesize [43] M.Wang,L.Pbert,andS.Lemon.Influenceoffamily,friend
frontierinmatchingmethodsforcausalinference.PS: andcoworkersocialsupportandsocialunderminingon
PoliticalScienceandPolitics,42:S11–S22,2014. weightgainpreventionamongadults.Obesity,2014.
[24] G.KingandR.Nielsen.Whypropensityscoresshouldnot [44] Y.Wang,I.Weber,andP.Mitra.Quantifiedselfmeetssocial
beusedformatching.http://j.mp/1sexgVw,2015. media:Sharingofweightupdatesontwitter.InDH,2016.
[25] H.Lett,J.Blumenthal,andetal.Socialsupportandcoronary
heartdisease:epidemiologicevidenceandimplicationsfor
treatment.Psychosomaticmedicine,2005.
[26] R.Mermelstein,S.Cohen,andetal.Socialsupportand
smokingcessationandmaintenance.Journalofconsulting
andclinicalpsychology,54(4):447,1986.