Intelligence 48 (2015) 85–95
The magical numbers 7 and 4 are resistant to the Flynn effect: No evidence for increases in forward or backward recall across 85 years of data

Gilles E. Gignac

School of Psychology, University of Western Australia, 35 Stirling Highway, Crawley, Western Australia, 6009, Australia
Article info

Article history:
Received 5 July 2014
Received in revised form 23 September 2014
Accepted 3 November 2014
Available online xxxx

Keywords:
Flynn effect
Short-term memory
Working memory
Miller’s law

Abstract

A substantial amount of empirical research suggests that cognitive ability test scores are increasing by approximately three IQ points per decade. The effect, referred to as the Flynn effect, has been found to be more substantial on measures of fluid intelligence, a construct known to be substantially correlated with memory span. Miller (1956) suggested that the typical short-term memory capacity (STMC) of an adult is seven, plus or minus two objects. Cowan (2005) suggested that the typical working memory capacity (WMC) of an adult is four, plus or minus one object. However, the possibility that both STMC and WMC test scores may be increasing across time, in line with the Flynn effect, does not appear to have been tested comprehensively yet. Based on Digit Span Forward (DSF) and Digit Span Backward (DSB) adult test scores across 85 years of data (respective Ns of 7,077 and 6,841), the mean adult verbal STMC was estimated at 6.56 (±2.39), and the mean adult verbal WMC was estimated at 4.88 (±2.58). No increasing trend in the STMC or WMC test scores was observed from 1923 to 2008, suggesting that these two cognitive processes are unaffected by the Flynn effect. Consequently, if the Flynn effect is occurring, it would appear to be a phenomenon that is completely independent of STMC and WMC, which may be surprising, given the close correspondence between WMC and fluid intelligence.

© 2014 Elsevier Inc. All rights reserved.
1. Introduction

One of the most sensational scientific observations in the area of contemporary intelligence research is that intelligence test scores have increased since about 1930 (Flynn, 2012; Lynn, 1982). The reported effect is not small, as it corresponds to approximately three IQ points per decade (Flynn, 2007; Neisser, 1998). Furthermore, the consequences are not negligible, as Flynn (1987) contended that the “…gains suggest that IQ tests do not measure intelligence but rather a weak causal link to intelligence” (p. 190). The precise nature and causes of the “Flynn effect” remain enigmatic (Williams, 2013). Furthermore, a number of limitations associated with studies supportive of the Flynn effect have been articulated, including invalid test score comparisons due to changes in test items and administration across editions (Kaufman, 2010), changes in the rate of human cognitive development in both the young and the elderly (Parker, 1986), changes in standard deviations (Rodgers, 1998), as well as the absence of factorial invariance associated with intelligence battery test scores (Must, te Nijenhuis, Must, & van Vianen, 2009; Wicherts et al., 2004). Consequently, the purpose of this investigation was to examine the Flynn effect on several normative samples at the observed score level on possibly the only subtest of intellectual functioning that has essentially not changed for over a century: Digit Span. As Digit Span incorporates both forward and backward recall items, an additional purpose of this investigation was to estimate precisely the typical verbal short-term memory capacity (STMC) and working memory capacity (WMC) of adults, so as to verify the proposed values reported by Miller (1956; 7 ± 2) and Cowan (2005; 4 ± 1).

E-mail address: [email protected].
http://dx.doi.org/10.1016/j.intell.2014.11.001
0160-2896/© 2014 Elsevier Inc. All rights reserved.
1.1. Overview of the Flynn Effect

The accumulated research suggests that the Flynn effect is more pronounced on fluid intelligence tests, in comparison to tests likely to be affected by education, such as vocabulary and knowledge of worldly facts (Flynn, 2007; Rönnlund, Carlstedt, Blomstedt, Nilsson, & Weinehall, 2013). In a relatively recent investigation, Flynn (2009a) reported ongoing IQ gains (1943 to 2008) in British children (5.5 to 11 years old) as measured by the Raven’s Progressive Matrices (Raven, Court, & Raven, 1986; Raven, Rust, & Squire, 2008). Additionally, Flynn (2009b) reported continued (1995–2006) IQ increases equal to three IQ points per decade in adults based on the Wechsler scales. Based on an examination of the Seattle Longitudinal Study (SLS) database, Schaie, Willis, and Pennak (2005) reported a Flynn effect equal to approximately ½ of a standard deviation in cognitive ability test scores between birth cohort 1931 and birth cohort 1952. As the results were most pronounced for inductive reasoning, Schaie et al. (2005) recommended that it would be insightful to evaluate possible test score changes across time in fluid type capacities more basic than inductive reasoning.

Arguably, one such relatively elementary cognitive ability construct is memory span. Individual differences in memory span (WMC in particular) are known to be correlated substantially with fluid intelligence. Based on a meta-analysis, Kane, Hambrick, and Conway (2005) estimated that approximately 50% of the true score variance between WMC and fluid intelligence is shared. Based on the WAIS-IV normative sample, Gignac (2014) suggested that the shared variance may be closer to 60%. The substantial empirical association between WMC and fluid intelligence is considered an important phenomenon, as it has been theorised that WMC is a critical determinant, or rate limiting factor, in the performance of fluid intelligence tasks (Carpenter, Just, & Shell, 1990; Fry & Hale, 1996). Oberauer, Süß, Wilhelm, and Sander (2007) proposed that the association between WMC and fluid intelligence is embedded by the central nervous system in such a way that only a limited number of bindings can be created to facilitate the development of novel relational representations. Consequently, given the close correspondence between WMC and fluid intelligence on both empirical and theoretical grounds, the reported increases in fluid intelligence test scores (Flynn, 2007) would arguably be expected to be associated with concomitant increases in memory span, particularly WMC.

1.2. The case for Digit Span

One of the most commonly used tests of memory span is Digit Span (Blankenship, 1938; Dempster, 1981). According to Bronner, Healy, Lowe, and Shimberg (1927), Digit Span was in use as early as 1887. Digit Span’s popularity was established by virtue of the fact that it was included in both of the intelligence batteries that emerged as the most popular in the early 20th century: the Stanford-Binet (Terman, 1917); and the Wechsler-Bellevue scale (W-B; Wechsler, 1939). Although there are several slight variations of the Digit Span subtest, typically, the test consists of administering several series of single digits to be recalled in a particular order. In most cases, the number of digits within a series ranges from 3 to 9. There are two common forms of the Digit Span test: Digit Span Forward (DSF), where the digits need to be recalled in the order with which they were presented, and Digit Span Backward (DSB), where the digits need to be recalled in the reverse order with which they were presented.

Although Digit Span was initially considered a relatively poor measure of intellectual functioning (Matarazzo, 1972; Wechsler, 1939), such a position appears to be based more on presumption and clinical experience, rather than rigorous statistical evidence (Bachelder & Denny, 1977; Verive & McDaniel, 1996). For example, Wechsler (1939) presumed that there was not a sufficient amount of variability in Digit Span scores to be a high quality discriminator of intelligence, as approximately 90% of the adult population appeared to recall somewhere between five and eight digits. Additionally, Wechsler (1939) claimed that both DSF and DSB correlated poorly with other intelligence subtests and contained little of g. However, Wechsler’s (1939) own reported results do not support such a position. First, based on the Wechsler-Bellevue (Wechsler, 1939) normative sample (ages: 20–34, N = 355), Digit Span was associated with a mean inter-subtest correlation of .38, which is comparable to the mean inter-subtest correlation of .44 for the whole battery. Additionally, based on the same portion of the normative sample, Wechsler (1939) reported the corrected subtest-FSIQ correlation (a reasonable proxy of a g component loading) associated with Digit Span at .51, which, arguably, was not substantially smaller than the average corrected subtest-FSIQ correlation of .61. More recently, based on the Wechsler Adult Intelligence Scale–IV (WAIS-IV; Wechsler, 2008) normative sample (N = 2,200) and a bifactor model, Gignac (2014) found that DSF and DSB were associated with g loadings of .46 and .58, respectively, which would suggest that both subtests are moderate indicators of g. Disattenuated for imperfect reliability in subtest scores, the corrected g loadings corresponded to .51 and .64, respectively. Jensen and Figueroa (1975) also found that DSB correlated more significantly with g than DSF. Thus, although Digit Span is certainly not an excellent indicator of g, it is arguably a fair to good indicator of intellectual functioning, particularly DSB.

Digit Span has also been observed to share variance with a number of socially important variables. For example, Frank (1983) reviewed four studies (seven independent samples) which examined the association between the Wechsler subscales and grade point average. Digit Span was associated with a mean validity coefficient of .35, which was very comparable to the mean validity coefficient of .37 across all 11 subtests. Digit Span has also been found to correlate with years of education completed (r = .44, Paul et al., 2005; r = .43, Birren & Morrison, 1961), reading comprehension (r = .30; Daneman & Merikle, 1996; Norman, Kemper, & Kynette, 1992), and brain volume (r = .41; Wickett, Vernon, & Lee, 2000). Additionally, amongst a battery of cognitive ability tests, Digit Span was found to be the best predictor of academic achievement amongst learning-problem children (Serwer, Shapiro, & Shapiro, 1972). Digit Span has also been found to be a respectable predictor of job performance (medium cognitive demands: r = .51; Verive & McDaniel, 1996). Finally, Miller and Vernon (1992) found that the association between reaction time and g was mediated by individual differences in short-term memory span. Thus, in light of the above, it is likely tenable to suggest that Digit Span is somewhere between a moderate to good indicator of
intellectual functioning on both empirical and theoretical grounds (Bachelder & Denny, 1977).

Although there is some evidence to suggest otherwise (e.g., Colom, Flores-Mendoza, Quiroga, & Privado, 2005), it is relatively widely recognised that DSB is a better measure of working memory capacity (WMC) than DSF (Hedden & Gabrieli, 2004; Oberauer, Süß, Schulze, Wilhelm, & Wittmann, 2000). Working memory (WM) is considered different from short-term memory (STM) in that WM requires the maintenance and the manipulation (or transformation) of information temporarily during cognitive activity (Baddeley & Hitch, 1974; Baddeley, 2002; Oberauer et al., 2000). STM, by contrast, is considered to require only the maintenance of objects in memory. Theoretically, DSB is considered a measure of WMC, as the requirement to mentally re-order the digits is considered a form of cognitive manipulation (Oberauer et al., 2000). Empirically, DSB has also been found to correlate more substantially with other measures of WMC, in comparison to DSF (Redick & Lindsey, 2013). Consequently, it was hypothesized that STMC and WMC, as measured by DSF and DSB, respectively, would evidence substantial increases across time, in line with the Flynn effect observed for fluid intelligence measures such as Raven’s (Flynn, 2007). Furthermore, as DSB is a better indicator of WMC, it was hypothesised that the Flynn effect would be more substantial for DSB than DSF.

1.3. What is the Average Memory Span?

In a classic paper, Miller (1956) contended that the mean maximal number of serially processed objects a healthy adult can store in STM is seven, plus or minus two. Arguably, Miller’s (1956) assertion was based on a relatively small amount of empirical research and a liberal amount of speculation, rather than a comprehensive quantitative review of the STMC research. Despite this, Miller’s (1956) magical number seven (plus or minus two) continues to be widely recognised (Goldstein, 2010). Of course, there are some critics of Miller’s law, with authors arguing that the values seven, plus or minus two, are too high or too low (Dehn, 2008). Although a substantial amount of STMC empirical research has accumulated since Miller (1956), much of this research has been based on relatively small (N < 25), non-representative samples (i.e., university students), and somewhat different tasks and scoring protocols. Consequently, a meta-analysis does not seem feasible in order to estimate, precisely, the mean STMC of healthy adults. However, as described above, one important exception is the Digit Span test, which has been used in the area of intellectual assessment for over a century (Bolton, 1892; Wechsler, 2008).

In contrast to STMC, WMC tests are widely considered to be more difficult, as they require the maintenance of information in STM, as well as the simultaneous manipulation (or transformation) of that information (Oberauer et al., 2000). Along the lines of Miller’s (1956) magical number seven, Cowan (2005, 2010) proposed that the mean maximal WMC for a healthy adult is four objects, plus or minus one. Based on a series of experiments with novices and experts at chess, Gobet and Clarkson (2004) argued that Cowan’s (2010) magical number four was an overestimate by one object, as the natural WMC of healthy adults appeared to be closer to three objects. Although insightful, Gobet and Clarkson’s (2004) study was based on a sample of twelve individuals, which would arguably not be considered sufficiently large to publish firm statements about the mean level of WMC in the broader adult population. In fact, much of the WMC research suffers from the same limitation as that identified for STMC: small, unrepresentative samples. Fortunately, however, Digit Span is often administered with the inclusion of both forward (i.e., DSF) and backward (i.e., DSB) span items. Consequently, as Digit Span has been administered within a relatively large number of high quality normative samples (e.g., Wechsler scales and others), there was the opportunity to estimate very precisely the typical (i.e., mean) verbal STMC and verbal WMC of healthy adults, which was a secondary purpose of this investigation.

In addition to the typical STM and WM capacities of adults, it was considered useful to estimate the amount of variability in STMC and WMC within the adult population. Miller’s law (i.e., 7 ± 2) suggests that approximately 95% of individuals’ STMC lie somewhere between five and nine objects, which implies that STMC is associated with a standard deviation of approximately 1 (i.e., 1 * ±1.96).¹ Cowan’s law (4 ± 1) suggests that approximately 95% of individuals’ WMC lie somewhere between three and five objects, which implies that WMC is associated with a standard deviation of approximately .50 (.50 * ±1.96). From a coefficient of variation perspective (SD/M), Miller’s law and Cowan’s law imply that STMC and WMC are associated with standardized variability estimates of .14 (1/7) and .13 (.50/4), respectively. Arguably, based on these values, the amount of STMC and WMC variability may be considered rather low, in comparison to other cognitive capacities. For example, based on the WAIS-IV normative sample means and standard deviations reported in Beaujean and Sheng (2014), the mean coefficient of variation associated with nine of the WAIS-IV subtests (45–54 year olds; N = 200) was calculated by me to be .27 (range: .21 to .36). Additionally, based on the normative sample means and standard deviations (18–30 year olds) associated with the BIRT Memory and Information Processing Battery (BIMPB; Oddy, Coughlan, & Crawford, 2007) reported in Baxendale (2010), list recall and design recall were calculated by me to be associated with coefficients of variation of .22 and .24, respectively. Consequently, Miller’s law and Cowan’s law imply substantially less variability in STMC and WMC than other cognitive capacities.

¹ The value of 1.96 corresponds to 95% of the standard normal distribution. Multiplying the standard deviation by the 95% standard normal deviate (i.e., 1.96) is presumed to correspond to the ±2 value associated with Miller’s law. In this case, a standard deviation of one corresponds to a ± value of very nearly 2.

It will be noted, however, that memory span has long been suggested to be associated with relatively little variability in human capacity (Sattler, 1982; Wechsler, 1939). Furthermore, the lack of variability has been articulated to be a reason to consider measures such as Digit Span to be relatively weak indicators of intellectual functioning (Matarazzo, 1972). However, the contention that STMC and WMC are associated with relatively low levels of variability does not appear to yet have been tested specifically across a number of normative samples and a standardized representation of variability (i.e., coefficient of variation). Thus, a secondary purpose of this investigation was to estimate precisely the verbal STMC and verbal WMC means, standard deviations, and coefficients of variation in the
healthy adult population, so as to verify the values proposed by Miller (1956; 7 ± 2) and Cowan (2005; 4 ± 1).

1.4. Flynn Effect and Memory Span: previous research

Investigations which have examined the possibility of cognitive ability test score increases across time tend to have done so at the aggregate level. For example, Parker (1986) examined FSIQ differences across the W-B, the WAIS, and the WAIS-R, but did not report results at the subscale level. Similarly, Flynn’s (1987) comprehensive investigation was based principally upon total scale scores (FSIQ, VIQ, PIQ). In order to help understand more fully the nature of the Flynn effect, recent research has focussed upon the examination of test score changes at the subscale level (Flynn, 2007). For example, Beaujean and Sheng (2014) examined mean level test score differences across the WAIS, WAIS-R, WAIS-III, and WAIS-IV at the subscale level. As the raw data were not available, Beaujean et al. identified the subscale raw score means that corresponded to a scaled score of 10 for each subtest to determine whether scores increased across editions/time. Beaujean et al. reported substantial Digit Span subtest mean increases across time. On the surface, the procedure used by Beaujean et al. may seem valid. However, such a methodology would only be valid if the number of items, as well as scoring procedure, remained constant across editions. In fact, there are a large number of changes in the number of items within subtests and scoring procedures across Wechsler editions, many of which compromise the interpretation of a substantial amount of published Flynn effect research (Kaufman, 2010; but see also Flynn, 2010).

In comparison to other Wechsler subtests, there have been relatively few changes to the Digit Span subtest over the years. There are two significant changes that are important to consider, however. First, the WAIS-R (and later editions) awarded up to a maximum of two points for recalling correctly both trials associated with a Digit Span item. By contrast, within the W-B and the WAIS, the DSF and DSB scores simply reflect the largest digit series recalled correctly. Thus, it would naturally be expected that the WAIS and the W-B Digit Span raw scores would be lower than those observed in later Wechsler editions. In fact, the Digit Span Total (DST) raw score mean that corresponds to a scaled score of 10 within the WAIS is 11 and increased to 15–16 in the WAIS-R (ages 20–24 years). The difference of four to five points may simply reflect the change in scoring, not necessarily a change in memory span ability. Secondly, a new Digit Span subtest was added to the WAIS-IV (Digit Span Sequencing, DSS), and the scores associated with DST were based on the sum of DSF, DSB, and DSS. Thus, DST within the WAIS-IV is based on the sum of three subtests, rather than two subtests. Naturally, the DST raw score mean that corresponds to a scaled score of 10 within the WAIS-III to the WAIS-IV increased from 17–18 to 28–29 (ages 20–24 years). Again, such an increase would not necessarily reflect an increase in ability across time, but, instead, the change in the scoring. Fortunately, the WAIS-R, WAIS-III, and WAIS-IV reported additional tables in their technical manuals that include the ‘longest digit span forward’ and ‘longest digit span backward’ means and standard deviations across all age groups. These values facilitate valid comparisons across Wechsler editions, as well as other publications which used a comparable Digit Span scale, as will be described further below.

In addition to Beaujean and Sheng (2014), Daley, Whaley, Sigman, Espinosa, and Neumann (2003) reported a Digit Span Forward test score increase equal to a Cohen’s d = −.19, based on two samples of Kenyan children tested between 1984 (N = 118) and 1998 (N = 537). Much larger differences were observed for Raven’s and a vocabulary test. It will be noted that Daley et al. also reported substantial reductions in test score standard deviations across time (25%–30% smaller), which suggests that the changes in test scores may have been due principally to improvements at the lower end of the distribution. Unfortunately, although the 1984 sample was somewhat normative in nature, it was rather small in size. Furthermore, the second sample was essentially a convenience sample, which makes valid interpretations of the comparisons difficult. Finally, as the samples were based on children, the test score changes could have arisen due to changes in the rate of maturational development in children across time.

In addition to the small number of Flynn effect investigations relevant specifically to Digit Span, a small number of Flynn effect studies have included other measures of memory span. For example, based on the normative samples associated with the Adult Memory and Information Processing Battery (AMIPB; Coughlan & Hollowes, 1985; Oddy et al., 2007), Baxendale (2010) found virtually no mean differences across groups on the list recall task. However, a mean increase across the two normative samples of approximately half of one standard deviation was observed for the design recall task. Baxendale (2010) offered little in the way of explanation for why the effect was observed for spatial but not verbal recall, except to suggest that the two processes are not perfectly correlated. It is probably important to note that a large number of the items within the AMIPB changed from the 1985 to 2007 editions (Oddy et al., 2007).

In another investigation, Rönnlund and Nilsson (2008) found that an episodic memory latent variable mean increased by .60 of a z-score from 1988–1990 to 2003–2005, which would be suggestive of a Flynn effect. However, an arguably distinct limitation associated with Rönnlund et al. is that the individuals selected to participate in the investigation were drawn exclusively from a single regional town in Sweden (Umeå, population: 110,000). Also, the mean age of the samples included in Rönnlund et al. was relatively old, as no participants under the age of 35 years were included. Thus, the results observed in Rönnlund et al. may be due to the overall health improvements reported in the elderly over the years (Jeune & Brønnum-Hansen, 2008).

Consequently, in light of the above, the purpose of this investigation was twofold: (1) to estimate across a combination of normative samples the typical verbal STM and WM capacities of healthy adults; and (2) to test the hypothesis that the verbal STM and the verbal WM capacities of adults have increased across time in line with the Flynn effect.

2. Method

2.1. Samples and measure

In order to address the two principal questions posed in this investigation, the results associated with several publications
Table 1
Mean and standard deviations associated with Longest Digit Span Forward (LDSF) and Longest Digit Span Backward (LDSB) across time.

Source             Year  N      Ages    LDSF M (SD)  LDSB M (SD)  DST
Wells and Martin   1923  50     Adults  6.3 (NA)     5.1 (NA)     11.40
Wechsler           1933  236    Adults  6.60 (1.13)  NA           NA
Weisenburg et al.  1936  70     18–59   6.69 (1.02)  4.87 (1.16)  11.56
W-B                1939  1,081  17–70   NA           NA           12.00
WMS                1945  96     20–49   6.53 (1.17)  4.80 (1.12)  11.23
WAIS               1955  1,785  16–75   NA           NA           11.00
WAIS-R             1981  1,880  16–74   6.45 (1.33)  4.87 (1.43)  11.32
MAS                1991  845    18–90   6.63 (1.22)  4.83 (1.30)  11.46
WAIS-III           1997  2,000  16–74   6.59 (1.35)  4.85 (1.49)  11.44
WAIS-IV            2008  1,900  16–74   6.72 (1.31)  4.84 (1.39)  11.56
N-weighted M                            6.56 (1.22)  4.88 (1.32)  11.44

Note. Wells and Martin (1923) created a normative sample group for the purposes of studying psychopathology; Wechsler (1933) published normative Digit Span Forward data to compare the variability associated with a large number of human characteristics; Weisenburg et al. (1936) created a normative sample group for the purposes of studying aphasia; DST = Digit Span Total; W-B = Wechsler-Bellevue; WMS = Wechsler Memory Scale; WAIS = Wechsler Adult Intelligence Scale; MAS = Memory Assessment Scales; NA = not available.
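The time-trend check reported in the Results (Pearson correlations between year and each span score, with p values from permutation tests) can be illustrated with the rows of Table 1. The following is a minimal sketch, not the analysis code used in this investigation: the function names are hypothetical, rows with NA are simply omitted, and the permutation scheme (exact enumeration of all orderings, two-sided on |r|) is an assumption, since the procedure is not specified beyond "permutation tests". With the values as transcribed here, the coefficients come out close to the reported .45 (DSF) and −.57 (DSB); the p values depend on the assumed scheme and are not expected to match exactly.

```python
from itertools import permutations
from math import sqrt

# (year, mean span) pairs transcribed from Table 1; rows reporting NA are omitted.
LDSF = [(1923, 6.3), (1933, 6.60), (1936, 6.69), (1945, 6.53),
        (1981, 6.45), (1991, 6.63), (1997, 6.59), (2008, 6.72)]
LDSB = [(1923, 5.1), (1936, 4.87), (1945, 4.80), (1981, 4.87),
        (1991, 4.83), (1997, 4.85), (2008, 4.84)]


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def permutation_p(xs, ys):
    """Exact two-sided permutation p value (assumed scheme).

    Proportion of all orderings of ys whose |r| with xs is at least as
    large as the observed |r|. Feasible here because n is 7 or 8.
    """
    observed = abs(pearson_r(xs, ys))
    hits = total = 0
    for shuffled in permutations(ys):
        total += 1
        # Small tolerance so floating-point ties count as "at least as large".
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    return hits / total


if __name__ == "__main__":
    for label, rows in (("LDSF", LDSF), ("LDSB", LDSB)):
        years, means = zip(*rows)
        r = pearson_r(years, means)
        p = permutation_p(years, means)
        print(f"{label}: r = {r:.2f}, permutation p = {p:.3f}")
```

Exact enumeration is used rather than random resampling because only seven or eight sources contribute to each correlation, so every ordering can be visited and the result is deterministic.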
(journal articles, books, and technical manuals) were compiled. Across all selected publications, a largely identical Digit Span test was administered. Specifically, the nature of the Digit Span test considered for inclusion in this investigation consisted of a series of digits read to the participant orally at a rate of one digit every second. The participant had to repeat the digits orally. In the case of DSF, the digits had to be repeated in the order with which they were read. In the case of DSB, the digits had to be repeated in the reverse order with which they were read. DSF and DSB are typically recognised as measures of verbal STMC and verbal WMC, respectively (Oberauer et al., 2000). In all cases, the means included in this investigation corresponded to the largest series of digits recalled correctly. In almost all cases, the number of digits within a series ranged from three to nine for DSF and two to eight for DSB. In most cases, the means and standard deviations associated with DSF, DSB, and DST were available. However, in some cases, only the results for DSF or DST were available.

The sources/samples included in this investigation are listed in Table 1. It can be observed that the Digit Span normative sample results associated with the Wechsler-Bellevue (W-B; Wechsler, 1939)², the Wechsler Adult Intelligence Scale (WAIS; Wechsler, 1955), the Wechsler Adult Intelligence Scale – Revised (WAIS-R; Wechsler, 1981), the Wechsler Adult Intelligence Scale – III (WAIS-III; Wechsler, 1997) and the WAIS-IV (Wechsler, 2008) were included in the analysis. With respect to the W-B and the WAIS, the means and standard deviations associated with DSF and DSB were not reported. However, the raw score DST values (DSF + DSB) which corresponded to a scaled score of 10 (i.e., the scaled mean) were published and included in this investigation. In the cases of the W-B and the WAIS, the raw score that corresponded to a scaled score of 10 was considered appropriate for inclusion in this investigation, as the DSF score and the DSB score corresponded to the number of digits in the longest series recalled accurately (Wechsler, 1939, 1955). Furthermore, the DSB and the DSF scores were added together to form the DST score. With respect to the WAIS-R, WAIS-III, and the WAIS-IV, the ‘longest digit span forward’ (LDSF) and the ‘longest digit span backward’ (LDSB) means and standard deviations were reported in supplemental tables within their respective technical manuals.³ Thus, LDSF was considered a comparable estimate of DSF and LDSB was considered a comparable estimate of DSB. Furthermore, LDSF added to LDSB was considered an estimate of DST. In order to increase the comparability of the WAIS-R, WAIS-III, and WAIS-IV normative sample scores with the other sources included in this investigation (all of which did not include very old participants), I calculated the N-weighted means based on the LDSF and LDSB values associated with the 16 to 74 year old age groups, instead of simply using the total sample (16 to 90 years) normative sample LDSF and LDSB means and standard deviations.

The DSF and DSB results associated with the Wechsler Memory Scale (WMS; Wechsler, 1945) normative sample were included in this investigation⁴, as the Digit Span subtest was essentially identical to that included in the WAIS (Wechsler, 1955). However, the normative sample results associated with the WMS-R (Wechsler, 1987) were excluded, because the WMS-R normative sample was not widely age representative. Specifically, the norms associated with the WMS-R used interpolated values for several age groups from 18 to 45 years of age (Elwood, 1991). The WMS-III (Wechsler, 1997) Digit Span norms were also excluded, as they were identical to those associated with the WAIS-III (Wechsler, 1997). Finally, Digit Span was not included in the WMS-IV (Wechsler, 2008). In light of the above, with respect to the Wechsler Memory Scales, only the results associated with the WMS (Wechsler, 1945) were included in this investigation.

² The Wechsler-Bellevue (Wechsler, 1939) was normed on a total sample of 1750 subjects, however, 670 of those were children as young as 7 years. The adult portion of the normative sample amounted to 1071 participants (Wechsler, 1958, p. 87).
³ The LDSF and LDSB means and standard deviations associated with the WAIS-R normative sample were reported in the WAIS-R NI technical manual (Kaplan, Fein, Morris, & Delis, 1991).
⁴ The Wechsler Memory Scale (Wechsler, 1945) was normed on a sample of 200 healthy adults (ages 25 to 50), however, the DSF and DSB raw score means and standard deviations were reported for only 96 of the adults.

With respect to non-Wechsler scales, the Digit Span normative sample results associated with the Memory Assessment Scales (MAS; Williams, 1991) were included, as the MAS Digit Span test is essentially identical to the Digit Span test included in the WAIS-R (Wechsler, 1981). Although the MAS
technical manual does not include the raw score means and standard deviations for DSF and DSB, they were supplied to me via email (M. Williams, personal communication, June 10, 2014). Less well-known are the three oldest sources included in this investigation. Weisenburg, Roe, and McBride (1936) created a normative sample group for the purposes of studying and diagnosing aphasia. To this effect, a control group of 70 adults were selected from three hospitals in Pennsylvania. Although the participants included in the normative sample were admitted to hospital, individuals who were suffering from any psychological condition were excluded. A battery of intelligence tests was administered to the participants, including a Digit Span test. The items ranged from five to eight digits for DSF and three to seven digits for DSB. Weisenburg et al. reported the means and standard deviations separately for DSF and DSB. Next, Wechsler (1933) published a normative sample mean and standard deviation associated with DSF. Although it is impossible to be certain, it would seem reasonable to presume that the version of Digit Span used by Wechsler (1933) was the same as that which made its way into the well-known Wechsler scales. Finally, Wells and Martin (1923) created a normative sample group for the purposes of studying psychopathology. Several tests were administered to the normative sample group, including Digit Span. The DSF portion of the test consisted of series of digits ranging up to nine digits, and the DSB portion of the test consisted of series of digits ranging up to eight digits.

A number of ostensibly useful sources of data were excluded, as they were judged not to have administered a sufficiently similar Digit Span test, or the results were not reported in a comparable manner. Also, some sources were not based on a sample sufficiently representative to be considered reasonably normative. Notable exclusions were Russell’s (1975, 1988) revision of the WMS, the Stanford-Binet (S-B) intelligence batteries (Terman, 1917; Terman & Childs, 1912), as well as Starr (1924), Brener (1940) and Elwood (2001). Thus, based on the information included in Table 1, it can be seen that there were 10 normative sample sources included in this investigation across 85 years (1923 to 2008). The DSF, DSB, and DST sample sizes corresponded to 7077, 6841, and 9770, respectively.

3. Results

As can be seen in Table 1, the N-weighted DSF, DSB, and DST means (and SDs) corresponded to 6.56 (1.22), 4.88 (1.32), and 11.44 (NA), respectively. In order to estimate the 95% lower and upper bounds associated with the DSF and DSB distributions, the DSF and DSB standard deviations were multiplied by the standard normal deviate (i.e., 1.96). In the case of DSF, the deviation term corresponded to 2.39 (i.e., 1.22 * 1.96). Thus, it may be suggested that 95% of the adult population has a verbal STMC equal to somewhere between 4.17 and 8.95 objects. In the case of DSB, the deviation term corresponded to 2.58 (i.e., 1.32 * 1.96). Thus, it may be suggested that 95% of the adult population has a verbal WMC equal to somewhere between 2.30 and 7.46 objects.

… increased across time, three Pearson correlations were estimated between year and the three memory span scores (p values estimated via permutation tests). None of the correlations were statistically significant: DSF r = .45, p = .270; DSB r = −.57, p = .124; DST r = −.06, p = .880. Thus, as the estimated correlations were non-significant and differentially directed, the hypothesis that memory span scores would evidence mean level increases across time was unsupported (see Fig. 1).

Finally, the possibility of ceiling effects associated with the Digit Span subscale scores was also examined. As the DSF mean of 6.56 was approximately two standard deviations less than the maximum possible score of 9, and the DSB mean of 4.88 was approximately 2.5 standard deviations less than the maximum possible score of 8, it was considered unlikely that the DSF and DSB subtest scores suffered from substantial ceiling effects. In fact, with respect to the highest performing normative group across the WAIS-R, WAIS-III, and WAIS-IV (20–24 years of age), the percentage of participants who scored the maximum DSF score (i.e., 9 digits) was equal to 9.5%, 7.0%, and 11.0%, respectively. With respect to DSB, 8.5%, 7.0%, and 3.5% of the participants within the WAIS-R, WAIS-III, and WAIS-IV normative samples achieved the highest score (8 digits), respectively. Thus, although there was a small ceiling effect in the data, it was neither substantial, nor was there an increasing trend across time, supporting further the absence of a Flynn effect.

4. Discussion

This investigation had two purposes: (1) to estimate precisely the typical verbal STMC and verbal WMC of adults, and (2) to determine whether these capacities have increased across time, in line with the Flynn effect. The results of this investigation suggest that the typical adult has a verbal STMC of 6.56 objects (plus or minus 2.39), and a verbal WMC of 4.88 objects (plus or minus 2.58). Secondly, in contrast to fluid intelligence test scores, STMC and WMC test scores do not appear to have increased across time.

Based on the results of this investigation, Miller’s (1956) proposal that the typical STMC of an adult is approximately seven objects was largely supported in this investigation, as the mean DSF score was estimated at 6.56: thus, somewhere between six and seven objects. However, if Miller’s law (7 ± 2) implies that approximately 95% of individuals’ STMC fall somewhere between five and nine, it would imply that STMC was associated with a standard deviation of approximately 1 (i.e., 1 * 1.96). The results of this investigation suggest that the standard deviation is only somewhat larger at 1.22, which implies that 95% of the population’s STMC lies somewhere between 2.39 (i.e., 1.22 * 1.96) above and below the mean of 6.56 objects. Thus, in rounded terms, Miller’s law may be considered largely accurate, at least in the context of verbal STMC.

From a coefficient of variation perspective (SD/M), STMC was associated with a value of .19, which, although on the lower side, is roughly comparable to other cognitive capacities. For example, based on the WAIS-IV normative sample means and
ItcanalsobeobservedinTable1thattherewasverylittle standarddeviationsreportedinBeaujeanandSheng(2014),I
variability in the means across time. The DSF, DSB, and DST calculatedthemeancoefficientofvariationassociatedwithnine
rangescorrespondedto6.30–6.72,4.80–5.10,and11.00–12.00, oftheWAIS-IVsubtests(45–54yearolds;N=200)tobe.27
respectively.Totestthehypothesisthatmemoryspanscores (range:.21to.36).Additionally,basedonthenormativesample
G.E. Gignac / Intelligence 48 (2015) 85–95
Fig. 1. Scatterplot of Digit Span Total, Digit Span Forward, and Digit Span Backward means across time (1923–2008).
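As a rough check on the interval arithmetic reported in the Results and Discussion, the 95% bounds can be recomputed from the published N-weighted means and SDs. The sketch below assumes approximately normal score distributions (an assumption the paper itself flags as a limitation), and the function names are illustrative rather than taken from the source. Note that recomputing from the rounded DSB SD of 1.32 yields a deviation term of 2.59 rather than the reported 2.58; presumably the published value was derived from an unrounded SD, but that explanation is an assumption.

```python
Z_95 = 1.96  # standard normal deviate for a two-sided 95% interval


def deviation_term(sd, z=Z_95):
    """Half-width of the interval: the SD multiplied by the normal deviate."""
    return sd * z


def bounds_95(mean, sd, z=Z_95):
    """Lower and upper 95% bounds, assuming normally distributed scores."""
    half = deviation_term(sd, z)
    return mean - half, mean + half


def implied_sd(plus_minus, z=Z_95):
    """SD implied by a 'mean plus or minus k' rule when k is read as a 95% range."""
    return plus_minus / z


if __name__ == "__main__":
    # N-weighted means and SDs as reported in the text (rounded to two decimals).
    for label, mean, sd in [("DSF", 6.56, 1.22), ("DSB", 4.88, 1.32)]:
        low, high = bounds_95(mean, sd)
        print(f"{label}: +/-{deviation_term(sd):.2f} -> {low:.2f} to {high:.2f}")
    # Miller's 7 +/- 2, read as a 95% range, implies an SD of roughly 1.
    print(f"SD implied by 7 +/- 2: {implied_sd(2):.2f}")
```

The same `implied_sd` helper applies to any "plus or minus k" rule, so it can be reused for other capacity estimates stated in that form.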
means and standard deviations (18–30 year olds) associated with the BMIPB (Oddy et al., 2007) reported in Baxendale (2010), list recall and design recall were calculated by me to be associated with coefficients of variation of .22 and .24, respectively. Strictly speaking, Miller's law implies a coefficient of variation of .14 (1/7), which may be suggested to be rather low, in comparison to the results reported in this investigation and other cognitive capacities. Thus, the somewhat larger estimate of variability in STMC reported in this investigation, in comparison to that implied by Miller (1956), helps bring STMC closer in line with other cognitive capacities.

Cowan's (2005) proposal of a typical WMC of four objects appears to be an underestimate by approximately one object, as this investigation estimated a DSB mean of 4.88, or five rounded. As per STMC, it would be useful to replicate the estimate of five objects on large, representative samples and a diversity of measures (spatial, non-numeric, etc.). Even more so than Miller (1956), Cowan (2005) appears to have underestimated the amount of variability in WMC in the adult population, as DSB was associated with a standard deviation of 1.32 (95% normal deviation term = 2.58, or three rounded), rather than the standard deviation of .50 implied by Cowan's (2005) proposal of plus or minus one object (i.e., .50 * 1.96). Thus, in light of the results of this investigation, Cowan's law of WMC may be more accurately restated as 5 ± 3. Arguably, this is a relatively substantial re-statement; again, one which should be verified on a diversity of WMC measures.

From a coefficient of variation perspective (i.e., SD/M), it would appear that WMC is a cognitive process associated with substantially more variability than STMC. In fact, DSB was associated with a 42% larger coefficient of variation than DSF (.19 vs. .27). Superficially, it may be suggested that greater variability may be expected for DSB, as DSB is a more difficult test than DSF. Based on the WAIS-IV results reported in Table C.4 of the technical manual (Wechsler, 2008), the mean item difficulties associated with DSF and DSB were calculated by me to be p = .70 and p = .53, respectively. Thus, DSB does appear to be somewhat more difficult from a pure psychometric perspective. Theoretically, WMC involves the application of two principal cognitive processes, encoding and transformation, rather than simply encoding (Oberauer, Lewandowsky, Farrell, Jarrold, & Greaves, 2012). Consequently, it may be suggested that the greater amount of variability associated with DSB implies that individual differences in the capacity to perform both processes (encoding and transformation) are not correlated perfectly. Further support for such a position is reflected in the fact that DSF and DSB are only moderately correlated at r = .55, based on the WAIS-IV normative sample (Wechsler, 2008). Even after disattenuation for imperfect reliability (DSF α = .81; DSB α = .82), the disattenuated correlation (r = .67) is far from unity. Thus, arguably, the key distinction between DSB and DSF is not simply that DSB is more difficult; instead, there appears to be a qualitative distinction, as well (Hurlstone, Hitch, & Baddeley, 2013).

It was hypothesized that memory span would be affected by the Flynn effect, as memory span (WMC in particular) is very closely related to fluid intelligence (Gignac, 2014; Kane et al., 2005). The results of this investigation failed to support the hypothesis that memory span would be affected by the Flynn effect. Overall, there were no meaningful changes in memory span from 1923 to 2008, as measured by DSF, DSB, and DST test scores. In contrast to memory span, substantial increases have been reported for fluid intelligence, particularly as measured by Raven's Progressive Matrices (Flynn, 2012). The lack of Flynn effect associated with STMC and WMC may be considered surprising, considering memory span is so intimately related
with fluid intelligence (Chuderski, 2013; Colom, Abad, Quiroga, Shih, & Flores-Mendoza, 2008; Colom, Abad, Rebollo, & Shih, 2005; Kane et al., 2005). Thus, if the Flynn effect is not operating predominantly on g (te Nijenhuis & van der Flier, 2013), and it is not operating on STMC or WMC, the contention that fluid intelligence test scores are increasing substantially across time is arguably difficult to reconcile. Based on the WAIS-IV normative sample, Gignac and Watkins (2013) estimated that the amount of unique internal consistency reliability associated with the Perceptual Reasoning index scores (similar to fluid intelligence) was approximately .18. Thus, once g, WMC, and STMC variance is removed from fluid intelligence-like test scores, there is only a small amount of reliable variance upon which to effect substantial, systematic changes of any sort.

Noteworthy, however, is the item-level research which suggests that the Flynn effect associated with Raven's scores may be due principally to cohort differences in the capacity for abstraction (Fox & Mitchum, 2013). Based on the results of this investigation, it would appear that any possible increases in abstraction capacity across time have occurred completely devoid of any increases in WMC. Of course, statistically, it is possible that two subtests may be observed to be associated with a substantial inter-correlation across two cohorts, but only one subtest evidences increases across time (Flynn, 2007). However, given that the large association between WMC and fluid intelligence is theorised to be, at least partly, causal in nature (e.g., Halford, Cowan, & Andrews, 2007), the observation of a Flynn effect for only fluid intelligence may be suggested to be improbable. Nonetheless, it should be acknowledged that the substantial, but currently non-experimentally established, association between WMC and fluid intelligence does not necessitate a FE across both constructs (Flynn, 2007).

In fact, it is possible that the results of this investigation may be considered in line with the contention that the Flynn effect is operating primarily at the level of abstraction ability (Flynn, 2012; Fox & Mitchum, 2013), rather than on a test such as Digit Span, as Digit Span is based on stimuli to which individuals 85 years ago and today would have about an equal amount of exposure, i.e., digits from one to nine. Such a contention may be considered ostensibly plausible; however, when examined thoroughly, one would draw the conclusion that humans have been exposed to digits at a substantially increasing rate across time. First, consider that, by 1930, only 12% of residents of New York had access to a telephone at home. By 1960, the percentage had increased to 76% (U.S. Bureau of Labor Statistics, 2006). In 2011, 89% of US households had a cellular phone and 71% a landline (U.S. Census Bureau, 2013a). Furthermore, in 1997, 18% of US residents had access to the internet at home and, by 2007, the number increased to 62% (U.S. Census Bureau, 2013). Thus, with phone numbers, login numbers, personal identification numbers, digital clocks, digital odometers, cable networks with 100s of channels, online stock broking accounts, etc., the typical person today is very likely using digits at a rate astonishingly greater than that of the typical person in the 1920s, the oldest data point used in this investigation.

Finally, it will be noted that there were essentially no changes in adult abstraction ability based on the Similarities subtest (a measure of verbal abstraction; Weiss, Saklofske, Coalson, & Raiford, 2010) from the W-B (Wechsler, 1939) to the WAIS (Wechsler, 1955), perhaps the only two editions that allow for valid Similarities subtest comparisons in adults.5 In addition to this investigation, there are others that have either failed to observe a Flynn effect or have observed a reversal of the Flynn effect (e.g., Shayer & Ginsburg, 2009; Sundet, Barlaug, & Torjussen, 2004; Teasdale & Owen, 2008). Ultimately, how the results associated with this investigation should be integrated within the FE literature may be debatable, a debate which will not be resolved definitively here. Further analysis and synthesis is of course encouraged.

From a methodological perspective, it will be noted that many of the published studies supportive of the Flynn effect used indirect and possibly unsubstantiated quantitative methods. For example, Parker (1986) made use of the slope associated with time (in years) and IQ for the Stanford-Binet (Terman & Merrill, 1973) and applied it to the estimation of difference scores from individuals who completed different Wechsler scales (i.e., WAIS and WAIS-R). In another case, Beaujean and Sheng (2014) did not have access to the raw data; consequently, they estimated the standard deviations associated with the Wechsler subscales by identifying the raw score equivalents associated with scaled scores of 7 and 13. Finally, in the context of evaluating Raven's IQ score gains in the Dutch from 1952 to 1982, Flynn (1987) reported the percentage of men who answered 24 items or more correctly and applied a method with several assumptions to estimate the changes in terms of IQ scores. Arguably, these methods are not ideal and/or particularly straightforward. By contrast, a strength of this investigation is that the means and standard deviations were obtained directly, and the methods used to analyse the data and report the results were simple and straightforward.

There are, naturally, limitations associated with this investigation. In particular, the standard normal deviate terms estimated for STMC (±2.39) and WMC (±2.58) assume that the DSF and DSB scores were normally distributed. It is highly unlikely that they were perfectly so, as most cognitive ability test scores are skewed to some degree (Micceri, 1989). Consequently, the estimates reported in this investigation are accurate to the extent that the distributions were not very substantially skewed. Access to raw data would allow for even more precise estimates than those reported in this investigation.

There were also slight ceiling effects in the data. Specifically, approximately 5–10% of the participants recalled correctly the largest series of digits associated with the DSF (i.e., 9) and the DSB (i.e., 8) subtests. Thus, the mean STMC and WMC values reported in this investigation are likely underestimates to a small degree. For the same reason, the variability in STMC and WMC scores reported in this investigation may also be expected to be underestimates to a small degree. Given the

5 The Similarities subtest within the W-B (Wechsler, 1939) and the WAIS (Wechsler, 1955) consisted of 12 and 13 items, respectively. Two of the items within the WAIS were completely revised and one additional item was included. According to Wechsler (1945, p. 188), a raw Similarities score of 12 corresponded to a scaled score of 10 in the W-B (ages 17 to 70). Based on the age-grouped (ages 16 to 69) raw score and scaled score equivalents published in Wechsler (1955), I calculated that a scaled score of 10 corresponded to an N-weighted mean raw score of 12.99. Thus, the mean Similarities raw score appears to have increased by one point from the W-B to the WAIS. However, given the extra item added to the WAIS, it would be plausible to suggest that there was no meaningful change in verbal abstraction ability in adults from 1939 to 1955.
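Three of the quantities used in the Discussion can be recomputed from the reported values: the coefficients of variation for DSF and DSB, the disattenuated DSF and DSB correlation, and a standard deviation recovered from scaled-score equivalents. The sketch below is illustrative only. The function names are hypothetical. The disattenuation formula is the classical correction for attenuation, which the paper does not spell out. The halving rule in `sd_from_scaled_scores` is an assumed reading of the approach attributed to Beaujean and Sheng (2014), based on Wechsler scaled scores having a mean of 10 and an SD of 3, and the raw scores passed to it are made up.

```python
import math


def coefficient_of_variation(mean, sd):
    """Coefficient of variation: the SD divided by the mean."""
    return sd / mean


def disattenuate(r_xy, rel_x, rel_y):
    """Classical correction for attenuation: r / sqrt(rel_x * rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)


def sd_from_scaled_scores(raw_at_7, raw_at_13):
    """Assumed rule: scaled scores 7 and 13 sit one SD below and above the
    mean of 10, so the raw-score SD is half the distance between them."""
    return (raw_at_13 - raw_at_7) / 2


if __name__ == "__main__":
    cv_dsf = round(coefficient_of_variation(6.56, 1.22), 2)
    cv_dsb = round(coefficient_of_variation(4.88, 1.32), 2)
    # The 42% figure follows from the rounded CVs; the unrounded CVs give
    # a ratio of roughly 1.45.
    pct_larger = (cv_dsb / cv_dsf - 1) * 100
    print(f"CV DSF = {cv_dsf:.2f}, CV DSB = {cv_dsb:.2f}, DSB larger by {pct_larger:.0f}%")

    r_true = disattenuate(0.55, 0.81, 0.82)
    print(f"Disattenuated DSF-DSB correlation: {r_true:.2f}")

    # Hypothetical raw-score equivalents, for illustration only.
    print(f"SD from scaled scores: {sd_from_scaled_scores(20, 30):.1f}")
```

Rounding the coefficients of variation before taking their ratio is a deliberate choice here, since it is what reproduces the percentage given in the text.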
relatively small amount of time it takes to administer Digit Span, it would be arguably beneficial for the Wechsler scales to include a 10 digit series and a nine digit series within the DSF and DSB subtests, respectively.

Much of the validity of the results reported in this investigation rests upon the contention that Digit Span is at least a decent indicator of intellectual functioning. Some would question such a contention (e.g., Matarazzo, 1972). Although certainly not the best indicator of intellectual functioning, I believe the empirical evidence reviewed in the introduction above suggests that Digit Span, and Digit Span Backward in particular, is a good indicator of g and a strong correlate of fluid intelligence (Gignac, 2014). The Digit Span subtest was chosen because memory span has been relatively neglected in the Flynn effect literature, as well as because it afforded the best opportunity to evaluate test score changes across time from a subtest that has changed little over the years.

It should also be acknowledged that the WMC and fluid intelligence research has been conducted primarily at the latent variable level; however, this investigation was conducted at the observed score level, which is compromised, to some degree, by measurement error. Ideally, the hypothesis of the Flynn effect would be examined within the context of latent variable modelling, as measurement error would be held constant across all comparisons (i.e., 0). However, the use of latent variable modelling in this context rests upon the assumption of factorial invariance. The published research to date suggests that this is an implausible assumption (Must et al., 2009; Wicherts et al., 2004). It remains a possibility that an invariant latent variable could be created from several memory span tasks, rather than a whole intelligence battery. As the WAIS-IV includes three memory span tasks (DSF, DSB, and DSS), once the WAIS-V is published, it may be a possibility to test the hypothesis tested in this investigation at the latent variable level. However, it would be done so within a relatively short span of years.

Finally, the samples included in this investigation were drawn exclusively from the USA, as it proved to be the country with the largest number of good quality samples available for the purposes of examining the questions raised in this investigation. It is possible that the Flynn effect may be observed for memory span scores in other nationalities. Researchers are encouraged to explore this possibility, providing sufficiently good quality sources of data can be identified. Similarly, an extension of this investigation on samples of data from children may prove enlightening. However, it would appear that there would be fewer good quality samples available for inclusion in such an investigation. For example, the 'longest digit span forward' and 'longest digit span backward' means and standard deviations associated with the WISC-R (Wechsler, 1974) were not published, to my knowledge. Based on the WISC-III (Wechsler, 1991) and WISC-IV (Wechsler, 2003) normative samples, there were virtually no changes in mean LDSF and LDSB values.

In conclusion, it is commonly stated that the accumulated empirical results suggest that intelligence test scores have increased by approximately three IQ points per decade (Neisser et al., 1996; Nisbett et al., 2012). Such evidence is occasionally used in the academic press (e.g., Flynn, 2007; Stanovich, 2011) and in the popular press (e.g., Gladwell, 2007; Holloway, 1999; Murdoch, 2007) to support the position that conventional intelligence test scores are of questionable validity as indicators of intelligence. However, given that verbal STMC and verbal WMC test scores do not appear to have increased in the last 85 years, crystallised intelligence test scores only minimally or inconsistently (Flynn, 2007; Lynn, 1990, 2009), and that changes in subtest items/scoring/administration across editions may explain a large percentage of several subtest score mean changes across time (Kaufman, 2010), it may be prudent to acknowledge that the magnitude, pervasiveness, and true nature of the Flynn effect remains a substantially open question.

Acknowledgements

Special thanks to Mike Williams for supplying to me via personal communication the DSF and DSB mean and standard deviation values associated with the Memory Assessment Scales (Williams, 1991). Thanks also to Mark Hurlstone for some helpful conversations during the production of this manuscript.

References

Bachelder, B.L., & Denny, M.R. (1977). A theory of intelligence: I. Span and the complexity of stimulus control. Intelligence, 1(2), 127–150.
Baddeley, A.D. (2002). Is working memory still working? European Psychologist, 7(2), 85–97.
Baddeley, A.D., & Hitch, G. (1974). Working memory. Psychology of Learning and Motivation, 8, 47–89.
Baxendale, S. (2010). The Flynn effect and memory function. Journal of Clinical and Experimental Neuropsychology, 32(7), 699–703.
Beaujean, A., & Sheng, Y. (2014). Assessing the Flynn effect in the Wechsler scales. Journal of Individual Differences, 35(2), 63–78.
Birren, J.E., & Morrison, D.F. (1961). Analysis of WAIS subtests in relation to age and education. Journal of Gerontology, 16, 363–369.
Blankenship, A.B. (1938). Memory span: a review of the literature. Psychological Bulletin, 35(1), 1–25.
Bolton, T.L. (1892). The growth of memory in school children. American Journal of Psychology, 4, 362–380.
Brener, R. (1940). An experimental investigation of memory span. Journal of Experimental Psychology, 26(5), 467–482.
Bronner, A.F., Healy, W., Lowe, G.M., & Shimberg, M.E. (1927). A manual of individual mental tests and testing. Boston, MA: Little, Brown & Co.
Carpenter, P.A., Just, M.A., & Shell, P. (1990). What one intelligence test measures: a theoretical account of the processing in the Raven Progressive Matrices Test. Psychological Review, 97(3), 404–431.
Census Bureau, U.S. (2013). Computer and internet use in the United States. (2013, May) Retrieved September 23, 2013, from http://www.census.gov/prod/2013pubs/p20-569.pdf
Chuderski, A. (2013). When are fluid intelligence and working memory isomorphic and when are they not? Intelligence, 41(4), 244–262.
Colom, R., Abad, F.J., Quiroga, M., Shih, P.C., & Flores-Mendoza, C. (2008). Working memory and intelligence are highly related constructs, but why? Intelligence, 36(6), 584–606.
Colom, R., Abad, F.J., Rebollo, I., & Shih, P. (2005). Memory span and general intelligence: A latent-variable approach. Intelligence, 33(6), 623–642.
Colom, R., Flores-Mendoza, C., Quiroga, M.Á., & Privado, J. (2005). Working memory and general intelligence: The role of short-term storage. Personality and Individual Differences, 39(5), 1005–1014.
Coughlan, A., & Hollowes, C. (1985). Adult memory & information processing battery. Leeds, UK: Leeds University Hospital.
Cowan, N. (2005). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioural and Brain Sciences, 24, 87–185.
Cowan, N. (2010). The magical mystery four: how is working memory capacity limited, and why? Current Directions in Psychological Science, 19(1), 51–57.
Daley, T.C., Whaley, S.E., Sigman, M.D., Espinosa, M.P., & Neumann, C. (2003). IQ on the rise: the Flynn effect in rural Kenyan children. Psychological Science, 14(3), 215–219.
Daneman, M., & Merikle, P.M. (1996). Working memory and language comprehension: A meta-analysis. Psychonomic Bulletin & Review, 3(4), 422–433.
Dehn, M.J. (2008). Working memory and academic learning: Assessment and intervention. Hoboken, NJ: John Wiley & Sons.
Dempster, F.N. (1981). Memory span: Sources of individual and developmental differences. Psychological Bulletin, 89(1), 63–100.
Elwood, R.W. (1991). The Wechsler Memory Scale-Revised: Psychometric characteristics and clinical application. Neuropsychology Review, 2(2), 179–201.
Elwood, R.W. (2001). MicroCog: assessment of cognitive functioning. Neuropsychology Review, 11(2), 89–100.
Flynn, J.R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171–191.
Flynn, J.R. (2007). What is intelligence? New York, NY: Cambridge University Press.
Flynn, J.R. (2009a). Requiem for nutrition as the cause of IQ gains: Raven's gains in Britain 1938–2008. Economics and Human Biology, 7, 18–27.
Flynn, J.R. (2009b). The WAIS-III and WAIS-IV: Daubert motions favor the certainly false over the approximately true. Applied Neuropsychology, 16, 1–7.
Flynn, J.R. (2010). Problems with IQ gains: The huge vocabulary gap. Journal of Psychoeducational Assessment, 28(5), 412–433.
Flynn, J.R. (2012). Are we getting smarter?: rising IQ in the twenty-first century. Cambridge University Press.
Fox, M.C., & Mitchum, A.L. (2013). A knowledge-based theory of rising scores on "culture-free" tests. Journal of Experimental Psychology: General, 142(3), 979–1000.
Frank, G. (1983). The Wechsler enterprise: An assessment of the development, structure, and use of the Wechsler tests of intelligence. Oxford: Pergamon Press.
Fry, A.F., & Hale, S. (1996). Processing speed, working memory, and fluid intelligence: Evidence for a developmental cascade. Psychological Science, 7(4), 237–241.
Gignac, G.E. (2014). Fluid intelligence shares closer to 60% of its variance with working memory capacity and is a better indicator of general intelligence. Intelligence, 47, 122–133.
Gignac, G.E., & Watkins, M.W. (2013). Bifactor modeling and the estimation of model-based reliability in the WAIS-IV. Multivariate Behavioral Research, 48(5), 639–662.
Gladwell, M. (2007). None of the above. New Yorker, 83(40), 92–96 (2007, December).
Gobet, F., & Clarkson, G. (2004). Chunks in expert memory: Evidence for the magical number four…or is it two? Memory, 12(6), 732–747.
Goldstein, E. (2010). Cognitive psychology: Connecting mind, research and everyday experience. Belmont, CA: Cengage Learning.
Halford, G.S., Cowan, N., & Andrews, G. (2007). Separating cognitive capacity from knowledge: A new hypothesis. Trends in Cognitive Sciences, 11(6), 236–242.
Hedden, T., & Gabrieli, J.D. (2004). Insights into the ageing mind: a view from cognitive neuroscience. Nature Reviews Neuroscience, 5(2), 87–96.
Holloway, M. (1999). Flynn's effect. Scientific American, 280(1), 37–38.
Hurlstone, M.J., Hitch, G.J., & Baddeley, A.D. (2013). Memory for serial order across domains: An overview of the literature and directions for future research. Psychological Bulletin, 140(2), 339–373.
Jensen, A.R., & Figueroa, R.A. (1975). Forward and backward digit span interaction with race and IQ: Predictions from Jensen's theory. Journal of Educational Psychology, 67(6), 882–893.
Jeune, B., & Brønnum-Hansen, H. (2008). Trends in health expectancy at age 65 for various health indicators, 1987–2005, Denmark. European Journal of Ageing, 5(4), 279–285.
Kane, M.J., Hambrick, D.Z., & Conway, A.R. (2005). Working memory capacity and fluid intelligence are strongly related constructs: comment on Ackerman, Beier, and Boyle (2005). Psychological Bulletin, 131(1), 65–71.
Kaplan, E., Fein, D., Morris, R., & Delis, D.C. (1991). Wechsler Adult Scale–Revised–Neuropsychological Instrument-Manual. San Antonio, TX: Psychological Corporation.
Kaufman, A.S. (2010). "In What Way Are Apples and Oranges Alike?" A Critique of Flynn's Interpretation of the Flynn Effect. Journal of Psychoeducational Assessment, 28(5), 382–398.
Lynn, R. (1982). IQ in Japan and the United States shows a growing disparity. Nature, 297, 222–223.
Lynn, R. (1990). Differential rates of secular increase of five major primary abilities. Biodemography and Social Biology, 37(1–2), 137–141.
Lynn, R. (2009). Fluid intelligence but not vocabulary has increased in Britain, 1979–2008. Intelligence, 37, 249–255.
Matarazzo, J.D. (1972). Wechsler's measurement and appraisal of adult intelligence (5th ed.). New York: Oxford University Press.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105(1), 156–166.
Miller, G.A. (1956). The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychological Review, 63(2), 81–97.
Miller, L.T., & Vernon, P.A. (1992). The general factor in short-term memory, intelligence, and reaction time. Intelligence, 16(1), 5–29.
Murdoch, S. (2007). IQ: A smart history of a failed idea. Hoboken, NJ: John Wiley & Sons.
Must, O., te Nijenhuis, J., Must, A., & van Vianen, A.E.M. (2009). Comparability of IQ scores over time. Intelligence, 37, 25–33.
Neisser, U. (Ed.). (1998). The rising curve: Long-term gains in IQ and related measures. Washington, DC: American Psychological Association.
Neisser, U., Boodoo, G., Bouchard, T.J., Jr., Boykin, A.W., Brody, N., Ceci, S.J., & Urbina, S. (1996). Intelligence: knowns and unknowns. American Psychologist, 51(2), 77–101.
Nisbett, R.E., Aronson, J., Blair, C., Dickens, W., Flynn, J., Halpern, D.F., & Turkheimer, E. (2012). Intelligence: new findings and theoretical developments. American Psychologist, 67(2), 130–159.
Norman, S., Kemper, S., & Kynette, D. (1992). Adults' reading comprehension: Effects of syntactic complexity and working memory. Journal of Gerontology, 47(4), 258–265.
Oberauer, K., Lewandowsky, S., Farrell, S., Jarrold, C., & Greaves, M. (2012). Modeling working memory: an interference model of complex span. Psychonomic Bulletin & Review, 19(5), 779–819.
Oberauer, K., Su, H.-M., Wilhelm, O., & Sander, N. (2007). Individual differences in working memory capacity and reasoning ability. In A.R.A. Conway, C. Jarrold, M.J. Kane, A. Miyake, & J.N. Towse (Eds.), Variation in working memory (pp. 21–48). New York: Oxford University Press.
Oberauer, K., Süß, H.M., Schulze, R., Wilhelm, O., & Wittmann, W.W. (2000). Working memory capacity—facets of a cognitive ability construct. Personality and Individual Differences, 29(6), 1017–1045.
Oddy, M., Coughlan, A., & Crawford, H. (2007). BIRT memory and information processing battery. Horsham, UK: Brain Injury Research Trust.
Parker, K.C.H. (1986). Changes with age, year-of-birth cohort, age by year-of-birth cohort interaction, and standardization of the Wechsler adult intelligence tests. Human Development, 29(4), 209–222.
Paul, R.H., Lawrence, J., Williams, L.M., Richard, C.C., Cooper, N., & Gordon, E. (2005). Preliminary validity of "integneuro™": A new computerized battery of neurocognitive tests. International Journal of Neuroscience, 115(11), 1549–1567.
Raven, J.C., Court, J.H., & Raven, J. (1986). Manual for Raven's Progressive Matrices and Vocabulary Scales: Section 2—Coloured Progressive Matrices. London: H.K. Lewis.
Raven, J., Rust, J., & Squire, A. (2008). Manual: Coloured Progressive Matrices and Crichton Vocabulary Scale. London: NCS Pearson.
Redick, T.S., & Lindsey, D.R. (2013). Complex span and n-back measures of working memory: A meta-analysis. Psychonomic Bulletin & Review, 20(6), 1102–1113.
Rodgers, J.L. (1998). A critique of the Flynn Effect: Massive IQ gains, methodological artifacts, or both? Intelligence, 26(4), 337–356.
Rönnlund, M., Carlstedt, B., Blomstedt, Y., Nilsson, L.G., & Weinehall, L. (2013). Secular trends in cognitive test performance: Swedish conscript data 1970–1993. Intelligence, 41(1), 19–24.
Rönnlund, M., & Nilsson, L.G. (2008). The magnitude, generality, and determinants of Flynn effects on forms of declarative memory and visuospatial ability: Time-sequential analyses of data from a Swedish cohort study. Intelligence, 36(3), 192–209.
Russell, E.W. (1975). A multiple scoring method for the assessment of complex memory functions. Journal of Consulting and Clinical Psychology, 43(6), 800–809.
Russell, E.W. (1988). Renorming Russell's version of the Wechsler memory scale. Journal of Clinical and Experimental Neuropsychology, 10(2), 235–249.
Sattler, J.M. (1982). Assessment of children's intelligence and special abilities. Boston: Allyn & Bacon.
Schaie, K.W., Willis, S.L., & Pennak, S. (2005). An historical framework for cohort differences in intelligence. Research in Human Development, 2(1–2), 43–67.
Serwer, B.J., Shapiro, B.J., & Shapiro, P.P. (1972). Achievement prediction of 'high-risk' children. Perceptual and Motor Skills, 35(2), 347–354.
Shayer, M., & Ginsburg, D. (2009). Thirty years on – a large anti‐Flynn effect/(II): 13‐ and 14‐year‐olds. Piagetian tests of formal operations norms 1976–2006/7. British Journal of Educational Psychology, 79(3), 409–418.
Stanovich, K. (2011). Rationality and the reflective mind. Oxford University Press.
Starr, A.S. (1924). The Diagnostic Value of the Audito-Vocal Digit Memory Span. Psychological Clinic, 15, 61–84.
Sundet, J.M., Barlaug, D.G., & Torjussen, T.M. (2004). The end of the Flynn effect?: A study of secular trends in mean intelligence test scores of Norwegian conscripts during half a century. Intelligence, 32(4), 349–362.
te Nijenhuis, J., & van der Flier, H. (2013). Is the Flynn effect on g?: A meta-analysis. Intelligence, 41(6), 802–807.
Teasdale, T.W., & Owen, D.R. (2008). Secular declines in cognitive test scores: A reversal of the Flynn Effect. Intelligence, 36(2), 121–126.
Terman, L.M. (1917). The Stanford revision and extension of the Binet-Simon scale for measuring intelligence. Vol. 18, Baltimore, MD: Warwick & York.
Terman, L.M., & Childs, H.G. (1912). A tentative revision and extension of the Binet-Simon measuring scale of Intelligence. Journal of Educational Psychology, 3(2), 61–74.