LANGUAGE, COGNITION AND NEUROSCIENCE, 2018
https://doi.org/10.1080/23273798.2018.1431679

REGULAR ARTICLE

When does abstraction occur in semantic memory: insights from distributional models

Michael N. Jones
Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN, USA

ABSTRACT
Abstraction is a core principle of Distributional Semantic Models (DSMs) that learn semantic representations for words by applying dimensional reduction to statistical redundancies in language. Although the posited learning mechanisms vary widely, virtually all DSMs are prototype models in that they create a single abstract representation of a word's meaning. This stands in stark contrast to accounts of categorisation that have very much converged on the superiority of exemplar models. However, there is a small but growing group of accounts in psychology, linguistics, and information retrieval that are exemplar-based semantic models. These models borrow many of the ideas that have led to the prominence of exemplar models in fields such as categorisation. Exemplar-based DSMs posit only an episodic store, not a semantic one. Rather than applying abstraction mechanisms at learning, these DSMs posit that semantic abstraction is an emergent artifact of retrieval from episodic memory.

ARTICLE HISTORY
Received 2 December 2016; Accepted 17 January 2018

KEYWORDS
Semantic memory; latent semantic analysis; categorisation; abstraction
Abstraction is an essential mechanism to learn and represent meaning in memory. Theoretical notions of abstraction vary across research domains, but tend to emphasise aggregation across exemplars to a central "average" representation (Reed, 1972), transforming sensorimotor input to a deeper knowledge representation (Barsalou, 1999; Damasio, 1989), or reducing idiosyncratic dimensions to focus on those attributes most common to members of a category (Rosch & Mervis, 1975). In modern computational models of semantic memory, notions of abstraction are formally specified and applied to real-world linguistic data to evaluate the structure of semantic memory that the mechanisms would produce.

Modern distributional semantic models (DSMs; e.g. Landauer & Dumais, 1997) have become immensely popular in the cognitive literature due to their success at fitting human experimental data, their utility in real-world applications, and their insights as models of cognition. In general, DSMs learn distributed representations for word meanings from statistical redundancies across linguistic experience. Because they are often applied to text corpora as learning data, DSMs are also referred to as "corpus-based" models, although, in principle, their learning mechanisms can be applied to covariational structure in any dataset (e.g. perception, speech, etc.).

Despite the wide range of DSMs in the literature, they virtually all share the characteristic that they are prototype models: They attempt to collapse the entire set of a word's linguistic exemplars into a single economical representation of word meaning. However, this practice is in contrast to the literature on categorisation that has largely disposed of prototype representations in favour of exemplar-based models. In this paper, I highlight the contradiction between literatures, and attempt to build a case for exemplar-based models of distributional semantics.

Abstraction is a core mechanistic principle of DSMs. Most DSMs apply some form of dimensional reduction to words' experienced linguistic contexts, essentially abstracting over the dimensions that are idiosyncratic to each context, and converging on the stable higher-order dimensions that optimally explain the covariational pattern of words across contexts. Aggregation is also a core principle of virtually all DSMs – multiple linguistic contexts are averaged across, either explicitly or implicitly, resulting in a single central representation for the word that is stored. A word's vector pattern across these reduced dimensions is thought to represent its generic meaning. Hence, each DSM formally specifies an abstraction mechanism by which episodic memory is transformed into semantic memory; in this sense, DSMs embody the idea of abstraction and allow us to quantitatively evaluate various process explanations of aggregation and dimensional reduction.
CONTACT Michael N. Jones [email protected]
In Press: Language, Cognition, and Neuroscience, Special Issue on "Concepts: What, When, and Where".
The particular mechanisms posited for abstraction in DSMs differ in several theoretically important ways, and include reinforcement learning, probabilistic inference, latent induction, and Hebbian learning. Enumerating the differences between the mechanisms used by each model is beyond the scope of this article (see Jones, Willits, & Dennis, 2015 for a review); but all DSMs essentially specify an abstraction mechanism to formalise the classic notion in linguistics that "you shall know a word by the company it keeps" (Firth, 1957). The theoretical points that follow apply broadly to all DSMs that posit abstraction at learning, regardless of specific learning mechanism. As two examples of DSMs with very different architectures and learning mechanisms,1 briefly consider classic Latent Semantic Analysis (LSA; Landauer & Dumais, 1997) and the newest DSM – Google's word2vec (Mikolov, Sutskever, Chen, Corrado, & Dean, 2013).

LSA begins with a word-by-document frequency matrix of a text corpus. This initial "episodic" matrix represents first-order relationships: words are similar if they have frequently co-occurred in contexts. LSA then applies singular value decomposition to this episodic matrix (cf. factor analysis), retaining only the 300 or so dimensions that account for the largest amount of variance in the original matrix. Singular value decomposition serves as an abstraction mechanism, reducing the dimensionality and emphasising second-order relationships that were not obvious in the episodic matrix. In the reduced space, words will be similar if they occur in similar contexts, even if they never directly co-occur (e.g. category exemplars and synonyms). But much of the information idiosyncratic to specific contexts that would be required to reconstruct the full original episodic matrix has now been lost. Hence, LSA achieves an abstracted semantic representation by applying truncated SVD to the history of episodes.
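To make this abstraction step concrete, the following sketch applies truncated SVD to a toy word-by-document count matrix. The miniature corpus, the five words, and the choice of two retained dimensions are illustrative assumptions rather than anything from Landauer and Dumais (1997); the point is only that words with similar document histories end up nearby in the reduced space even when they never directly co-occur.

```python
import numpy as np

# Toy word-by-document count matrix (rows = words, columns = documents).
# The corpus and counts are invented purely for illustration.
words = ["drink", "milk", "juice", "engine", "fuel"]
X = np.array([
    [1, 1, 0, 0],   # drink
    [1, 0, 0, 0],   # milk
    [0, 1, 0, 0],   # juice
    [0, 0, 1, 1],   # engine
    [0, 0, 1, 0],   # fuel
], dtype=float)

# Singular value decomposition of the "episodic" matrix.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep only the k largest singular values/vectors (LSA typically keeps ~300
# dimensions; k = 2 here only because the toy matrix is tiny).
k = 2
word_vectors = U[:, :k] * s[:k]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# milk and juice never co-occur in any document, but both occur with drink,
# so they end up close together in the reduced space.
print(cosine(word_vectors[words.index("milk")], word_vectors[words.index("juice")]))
print(cosine(word_vectors[words.index("milk")], word_vectors[words.index("engine")]))
```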
Mikolov et al.'s (2013) word2vec achieves a similar outcome, albeit in a rather different way. Word2vec is a "neural embedding" model that has been extremely successful in computational linguistics. To a cognitive scientist, the model is essentially a feedforward connectionist network (cf. Rumelhart networks explored in Rogers & McClelland, 2004) with some optimisation tricks that allow it to be scaled up to large amounts of text data. Word2vec has localist input and output layers, each with one node for each word in the corpus. The input and output layers are fully connected via a hidden layer of ∼300 nodes, which allows the model to learn nonlinear patterns in the text corpus. When a word is experienced, the other words that it occurs with serve as its context. With the nodes for the context words activated, activation feeds forward to the output layer, with the desired output being the activation of the correct target word and other words being inhibited. The error signal (the difference between the true and observed output pattern) is then backpropagated through the network to increase the likelihood that the correct word will be activated at the output layer given the input words in the future. Hence, the context is used to predict the word.2 After training on a large text corpus, a word's pattern across the hidden layer begins to show higher-order relationships that go beyond the first-order relationships it was being trained to predict. Very much like LSA's reduced representation, the reduced representation across word2vec's hidden layer has now learned similarity between words that are predicted by similar contexts. While LSA used SVD for data reduction, word2vec used backpropagation; but both models essentially abstract semantics from episodes.
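The sketch below illustrates this kind of context-predicts-word training loop. It is a bare-bones illustration rather than Mikolov et al.'s implementation: the toy corpus, hidden-layer size, learning rate, and the use of a full softmax output (instead of word2vec's negative sampling and other optimisation tricks) are all simplifying assumptions.

```python
import numpy as np

# Toy corpus: each "sentence" is a list of word tokens (invented for illustration).
corpus = [
    ["i", "drink", "milk"], ["i", "drink", "juice"], ["i", "drink", "wine"],
    ["we", "start", "the", "engine"], ["we", "fuel", "the", "engine"],
]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
V, H = len(vocab), 10                    # vocabulary size; hidden layer (~300 in practice)

rng = np.random.default_rng(0)
W_in = rng.normal(0, 0.1, (V, H))        # input-to-hidden weights (the word vectors)
W_out = rng.normal(0, 0.1, (H, V))       # hidden-to-output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr = 0.1
for epoch in range(500):
    for sent in corpus:
        for t, target in enumerate(sent):
            context = [idx[w] for i, w in enumerate(sent) if i != t]
            h = W_in[context].mean(axis=0)     # hidden activation for this context
            p = softmax(h @ W_out)             # predicted distribution over target words
            err = p.copy()
            err[idx[target]] -= 1.0            # cross-entropy error at the output layer
            grad_h = W_out @ err               # error signal reaching the hidden layer
            W_out -= lr * np.outer(h, err)     # backpropagate to both weight layers
            W_in[context] -= lr * grad_h / len(context)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words that are predicted by similar contexts (milk, juice, wine) tend to develop
# more similar vectors than words from unrelated contexts (milk vs. engine).
print(cosine(W_in[idx["milk"]], W_in[idx["juice"]]))
print(cosine(W_in[idx["milk"]], W_in[idx["engine"]]))
```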
These similarities can be seen across all of the DSMs – all achieve the desired outcome of a reduced abstraction of word meaning from episodic co-occurrences. The jury is still out on which (if any) mechanism is the most plausible model of how humans construct semantic representations. But one property that is clear to all of these "abstraction at learning" DSMs is that they may be classified as prototype models. The models attempt to create a single abstracted representation of meaning for each word, and this single semantic representation is what is stored and used in downstream fitting of psycholinguistic data.

There are many similarities in the literatures on semantic memory and categorisation, enough that it is likely that the cognitive mechanisms that subserve semantic learning and category learning may be heavily related to each other. But one key contradiction stands out: While the literature on categorisation has very much converged on the superiority of exemplar-based models, DSMs are all essentially prototype models.

Lessons from categorization models

Categorisation and semantic abstraction have many similarities, and it is commonly believed that the process of categorisation may be used to produce semantic structure (see Rogers & McClelland, 2011 for a review). The categorisation literature has been dominated for many years by a debate between prototype and exemplar-based theories. Prototype theories are based largely on principles of cognitive economy championed by Rosch and Mervis (1975). Prototype theories (e.g. Reed, 1972) posit that as category exemplars are experienced, humans gradually abstract generalities across them and construct a single prototypical
representation of the category that is the central tendency of its exemplars; categorisation of a new exemplar depends on its similarity to category prototypes. In contrast, exemplar theories (e.g. Medin & Schaffer, 1978; Nosofsky, 1988) posit that humans store every experienced exemplar in memory, and categorisation of a new exemplar depends on its weighted similarity to all stored exemplars.
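A minimal sketch of the exemplar account, in the spirit of the Generalized Context Model (Nosofsky, 1986), is given below; the two-dimensional stimuli, the category assignments, and the sensitivity parameter are invented purely for illustration.

```python
import numpy as np

# Stored exemplars: each row is a remembered stimulus; labels give its category.
# Stimuli and categories here are invented purely for illustration.
exemplars = np.array([[0.1, 0.2], [0.2, 0.1], [0.8, 0.9], [0.9, 0.8]])
labels = np.array(["A", "A", "B", "B"])

def categorise(probe, c=5.0):
    """Weight every stored exemplar by its similarity to the probe (an
    exponential decay of distance) and sum the evidence for each category."""
    distances = np.linalg.norm(exemplars - probe, axis=1)
    similarities = np.exp(-c * distances)
    evidence = {cat: similarities[labels == cat].sum() for cat in np.unique(labels)}
    total = sum(evidence.values())
    return {cat: v / total for cat, v in evidence.items()}

print(categorise(np.array([0.15, 0.15])))   # near the stored "A" exemplars
print(categorise(np.array([0.85, 0.85])))   # near the stored "B" exemplars
```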
Perhaps more than any other sub-field of cognition, the categorisation literature has very much converged on the conclusion that exemplars have beaten out prototypes as models of human categorisation (but see Murphy, 2016, for a careful discussion of the limitations of both). In addition to exemplar models providing a better quantitative account of human categorisation data, there are many theoretically differentiating effects that are easily explainable by exemplar models but that are simply impossible under prototype accounts. For example, category structures with nonlinearly separable structure (e.g. XOR) are easily learned by humans, but impossible to account for by single prototype models (Ashby & Maddox, 1993; Nosofsky, 1988). Even when using linear category structures that should be conducive to prototype models, exemplar models still give a superior quantitative fit to human data (Stanton, Nosofsky, & Zaki, 2002). Hence, it is certainly odd that the field of distributional semantics is dominated by prototype models, while the field of categorisation has largely dismissed them in favour of exemplar accounts.

In the typical categorisation experiment, subjects are presented with stimulus patterns – exemplars – accompanied by a category label. At test, the experimenter can present old or new exemplars, and the subject responds with the most appropriate category label for each stimulus. We can think of distributional learning of semantics in an analogous way: The context is the exemplar pattern, and the word is the label of the category to which this particular exemplar belongs.

In word2vec, for example, the other words that occur with a target word are used as the context, or exemplar pattern, and the correct label is the target word. So in the sentence "I am drinking a glass of milk," drinking + glass are used as the context to predict milk. The exemplar pattern for milk in this context is a localist vector with drinking and glass set to one and all other words set to zero. Across many language exemplars that are all of the category milk, word2vec homes in on a pattern of activation across its hidden layer that optimally predicts milk as the label given any language exemplar context that contains milk. In addition, the hidden layer pattern for milk will be very similar to that of other words that are predicted by similar contexts, such as juice and wine. So in all DSMs, an exemplar can be thought of as the context pattern of other words that a target word (the category label) occurs with. This reframing of semantic learning is very similar to current state-of-the-art exemplar-based models of categorisation in which "…a stimulus is stored in memory as a complete exemplar that includes the full combination of other features. Thus the 'context' for a feature is the other features with which it co-occurs" (Kruschke, 2008, p. 273). However, DSMs aggregate over the multiple exemplars to create an economical prototype.
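To spell out that framing, the sketch below builds the localist exemplar vector for the milk example; the six-word vocabulary is an invented stand-in for the full corpus vocabulary that the real input layer would cover.

```python
import numpy as np

# A tiny illustrative vocabulary (the real input layer has one node per corpus word).
vocab = ["drinking", "glass", "milk", "juice", "engine", "fuel"]
idx = {w: i for i, w in enumerate(vocab)}

def exemplar_vector(context_words):
    """Localist exemplar pattern: 1 for each context word, 0 everywhere else."""
    v = np.zeros(len(vocab))
    for w in context_words:
        v[idx[w]] = 1.0
    return v

# "I am drinking a glass of milk": drinking + glass form the context exemplar,
# and the target word milk plays the role of the category label.
exemplar = exemplar_vector(["drinking", "glass"])
label = "milk"
print(exemplar, "->", label)
```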
Hence, most DSMs collapse all instances of a word's context into a single representation, or point in high-dimensional space, very much consistent with representational economy (Rosch, 1973). This process produces huge issues in semantic representation that are known to the field – for example, a homograph like bank has both senses of its meaning collapsed into a single representation, despite the fact that they are very different context patterns. As a result, the representation becomes a weighted average (biased to the more frequent sense) of the multiple senses of bank. A homograph like bank, with multiple unrelated senses, has a similar characteristic structure to experimental stimuli with XOR structure. But the prototype collapsing is a problem for all words with graded amounts of polysemy that would be captured by an exemplar-based model but are abstracted over by a prototype-based DSM. Multiple distinct statistical structures that map onto the same label are collapsed in most DSMs, leading to a range of both theoretical and practical issues for the models. But far from being rare, multiple senses and contextual modulation patterns are really the norm in linguistic information (Jones, Dye, & Johns, 2016; Kintsch, 2001).

Lessons from multiple-trace models

Posner and Keele's (1968) schema abstraction experiment is a classic in semantic memory research, and was a key laboratory phenomenon that led Tulving (1972) to divide declarative memory into separate semantic and episodic stores in his modular taxonomy.3 In their task, Posner and Keele presented subjects with random dot patterns as exemplars of multiple categories. Unbeknownst to subjects, the exemplars they experienced were created from parent prototype patterns for each category. A category exemplar was a random perturbation (low or high distortion) of the prototype pattern, but prototypes were never shown to subjects during learning. There are many interesting effects from the schema abstraction task, but a key finding is that although subjects at test are best at classifying the exemplars they were trained on, they are better at classifying the prototype than new exemplars. In addition, with a delay
between training and test, performance on the prototype (which was never experienced) is better than performance on the old exemplars that subjects were actually trained on. Furthermore, exemplars and prototypes follow different trajectories of decay as a function of retention time.

This pattern of results suggests that subjects are storing experienced exemplars in episodic memory at the same time as they are creating an abstracted prototype for the category. The differential decay patterns also suggest that these information sources are stored by distinct memory systems, and that semantic memory is more resilient to decay than is episodic memory. The pattern would seem to argue in favour of DSMs that use abstraction at encoding to create a prototypical pattern, with episodic memory then explained by a distinct model.

However, Hintzman (1984, 1986) provided a classic demonstration using his MINERVA 2 memory model that questioned whether the effects seen in Posner and Keele's (1968) schema abstraction task suggest the existence of a prototype in memory at all. Briefly, MINERVA 2 is an instance-based memory model: it stores a pattern for each exemplar in episodic memory, but has no semantic memory. Multiple presentations of an exemplar simply lay down multiple memory traces. The model explains a range of episodic memory effects such as recognition, judgments of frequency, etc. But it can also perform the classification task used in the schema abstraction experiments. When presented with a probe pattern (an old or new exemplar), MINERVA 2 simultaneously computes the probe's similarity to all stored exemplar traces in memory, and the retrieved category label for the probe is weighted by the scaled similarity of the probe to all exemplars (cf. Nosofsky's, 1986, exemplar-based model of categorisation). The retrieved pattern is referred to as an "echo" from memory, and is based loosely on the principle of harmonic resonance. Although it has no semantic memory per se, MINERVA 2 reproduces the key phenomena in schema abstraction that had previously been seen as evidence for dual episodic and semantic stores. The model performs better on old exemplars at immediate test (but better on the prototype than new exemplars), and performance on the prototype is better than on the training exemplars after forgetting. Superior performance on the prototype is due simply to the fact that it is the central tendency of the exemplar patterns; hence the prototype's pattern is distributed across the exemplars. The performance trajectories of exemplars and the prototype as a function of delay have distinct slopes.
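As a rough illustration of how an echo can favour a prototype that was never stored, here is a sketch in the spirit of MINERVA 2; the pattern size, number of traces, distortion rate, and simplified similarity rule are illustrative assumptions rather than Hintzman's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy schema-abstraction memory: two category prototypes are random +1/-1
# patterns, and each stored trace is a distorted copy of one prototype.
n_features, n_exemplars, distortion = 20, 10, 0.2
prototype_A = rng.choice([-1, 1], n_features)
prototype_B = rng.choice([-1, 1], n_features)

def distort(prototype):
    flip = rng.random(n_features) < distortion
    return np.where(flip, -prototype, prototype)

memory = np.array([distort(prototype_A) for _ in range(n_exemplars)] +
                  [distort(prototype_B) for _ in range(n_exemplars)])

def echo(probe, memory):
    """Activate every trace by the cube of its similarity to the probe, then
    return the activation-weighted sum of all traces (the retrieved echo)."""
    similarity = memory @ probe / n_features   # normalised dot product
    activation = similarity ** 3               # nonlinear activation preserves sign
    return activation @ memory

# Probing with the never-stored prototype retrieves an echo that matches it
# closely: the "prototype" emerges at retrieval, not from anything stored.
e = echo(prototype_A, memory)
print(np.corrcoef(e, prototype_A)[0, 1])
print(np.corrcoef(e, prototype_B)[0, 1])
```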
Hintzman's (1984, 1986) demonstration is well covered in most contemporary memory textbooks – it is an elegant existence proof that phenomena used to argue for the existence of semantic memory may actually be due to the process of retrieval from episodic memory. In the interest of parsimony, there may be no need to posit an additional semantic store when a model that has only an episodic store can produce all the phenomena that a model with two distinct stores could. This claim bears considerable similarity to other instance-based models of memory and exemplar-based models of categorisation. So why mention historical cases like MINERVA 2 and schema abstraction here? Because one of the first successful exemplar-based DSMs in cognitive science extends MINERVA 2's architecture exactly to a text corpus, and makes the same theoretical claims.

Exemplar-based semantic models

While it is true that most DSMs are prototype models, there is a small family of exemplar-based semantic models that diverge from the usual quest for cognitive economy. Exemplar-based semantic models are also referred to as "retrieval-based" models in the cognitive literature or simply as "memory models" in computational linguistics. Rather than positing abstraction as a dimensional reduction mechanism at learning, they store all of a word's episodic contexts, and abstraction is a consequence of retrieval from episodic memory. Hence, there is no semantic memory per se in these models, only episodic memory. In exemplar-based models, phenomena that have typically been attributed to semantic memory are an emergent artifact of retrieval from episodic memory. The locus of semantics is not at encoding, but at retrieval. These models have grown from exemplar-based models in categorisation, and instance-based models in memory. Intuitively, many people find the idea that we store everything we ever experience, rather than creating and storing an economical abstraction, to be far-fetched. But given the success of exemplar-based semantic models at accounting for an impressive array of semantic behaviours without any semantic memory, and the current resurgence of usage-based theories in linguistics (Goldberg, 2006; Johns & Jones, 2015; Tomasello, 2003), exemplar-based semantic models deserve a closer look.

Kwantes (2005) extended Hintzman's (1986) MINERVA 2 to explain semantic phenomena with words by training it on a text corpus. In his Constructed Semantics Model (CSM),4 each word's representation in memory is a binary vector that reflects whether or not it occurred in a document – its episodic history. Note that memory in CSM is the same word-by-document matrix that LSA and other DSMs learn from. But where LSA applies abstraction to this episodic matrix and stores a higher-order
representation, CSM stores the episodic matrix itself. When a word is presented to CSM, its episodic vector is used as a probe, as in MINERVA 2. Each word in memory is activated relative to its contextual overlap with the probe word, and the echo pattern is then the similarity-weighted sum of all traces in memory, exactly as in Hintzman (1986). Words that have similar contextual histories to the probe word will contribute more of their pattern to the echo than will words with rather independent histories, and the echo for the word is then an ad hoc, probe-specific prototype created by this process of retrieval from episodic memory. To compute the semantic similarity between two words, one simply computes the cosine between their two echo patterns.

It is fairly obvious how CSM can determine similarity for words that frequently co-occur with each other (their echo cosines would be a noisy amplification of the terms' likelihood of co-occurrence relative to chance). But higher-order semantic similarities also emerge from this process of retrieval, even between two words that have zero contextual overlap. For example, two synonyms would not activate each other at all because they have never co-occurred in the text corpus, but they would activate many of the same other words due to their similar contextual usage; as a result, their retrieved echo patterns are extremely similar. Models such as LSA and word2vec accomplish this second-order statistical inference while learning a corpus, whereas CSM does it while retrieving information from episodic memory.
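The sketch below illustrates this retrieval-based route to second-order similarity with a toy word-by-document matrix. The miniature corpus is invented, and the cubic activation rule is borrowed from MINERVA 2 as a simplifying assumption; it is not Kwantes' exact implementation.

```python
import numpy as np

# Toy word-by-document occurrence matrix: rows are words, columns are documents
# (1 = the word occurred in that document). The corpus is invented for illustration.
words = ["drink", "milk", "juice", "engine", "fuel"]
M = np.array([
    [1, 1, 0, 0, 0],   # drink
    [1, 0, 0, 0, 0],   # milk
    [0, 1, 0, 0, 0],   # juice
    [0, 0, 1, 1, 0],   # engine
    [0, 0, 1, 0, 1],   # fuel
], dtype=float)

def echo(probe_word):
    """Retrieval-based abstraction: activate every stored trace by a power of its
    contextual overlap with the probe, then sum the activation-weighted traces."""
    probe = M[words.index(probe_word)]
    overlap = M @ probe / M.shape[1]     # contextual overlap with each stored trace
    activation = overlap ** 3            # nonlinear activation, as in MINERVA 2
    return activation @ M

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# milk and juice never occur in the same document, but both occur with drink,
# so their retrieved echoes are similar: the abstraction happens at retrieval.
print(cosine(echo("milk"), echo("juice")))
print(cosine(echo("milk"), echo("engine")))
```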
Hence, semantic abstraction in CSM is a parallel to schema abstraction in MINERVA 2: the prototype is an emergent property of retrieval from episodic memory. As Kwantes (2005) puts it, CSM "…takes what it knows about a word's contexts and uses retrieval to estimate what other context might also contain the word" (p. 706). The model bears obvious similarity in outcome to prototype-based DSMs, but it differs considerably in the psychological mechanism to which it attributes abstraction; in CSM, it is the well-established process of retrieval that uncovers deeper semantic structure.

As with Hintzman's (1986) demonstration, CSM is a more parsimonious model of semantics – it does not require two separate stores or processes to explain semantic and episodic memory, and serves as an existence proof that semantic phenomena may be explained by a model that only posits an episodic memory store. In addition, the success of the model is reinforced by converging evidence supporting exemplar-based models in the fields of categorisation and recognition. Furthermore, there are real benefits to CSM that allow it to handle phenomena not possible by abstractionist DSMs. For example, it can handle polysemous words because the multiple senses of the words are still represented and are dissociable with nonlinear activation of exemplars (cf. Nosofsky, 1986). Memory traces whose context fits one or the other sense of a word can be differentially activated in CSM. Abstractionist DSMs, on the other hand, collapse multiple senses of a word to a single point in high-dimensional space, losing the distinction in favour of an averaged representation.5

Kwantes' (2005) work suggests that the same basic memory system could underlie both episodic and semantic knowledge, and his work has given rise to a handful of other models that have explored semantic abstraction as a memory retrieval operation rather than a learning mechanism. For example, Dennis (2005; see also Thiessen, 2017) presented a memory-based model of verbal processing, including semantics and syntactic information, as retrieval from long-term memory and constraint satisfaction in working memory. The model mechanisms are based on a Bayesian interpretation of string edit theory from linguistics. Dennis' model posits that processing a word or sentence is at its core a memory-retrieval process.

Johns and Jones (2014, 2015; see also Thiessen & Pavlik, 2013) extended this previous work into an exemplar-based model, based on a hybrid of Hintzman's (1986) MINERVA 2 and Jones and Mewhort's (2007) BEAGLE architectures, that encodes sentences from a natural language text corpus into individual memory exemplars. The retrieval mechanism is used to generate expectancies about the future structure of sentences, much in the same way as Kwantes (2005) constructs a word's meaning as a prediction of the future contexts in which it might occur. Johns and Jones found that such an exemplar-based model successfully accounted for a wide range of sentence processing tasks that had commonly been seen as evidence for rule-based abstraction of linguistic constraints. Johns, Jamieson, Crump, Jones, and Mewhort (2016) extended this model to demonstrate that rule-based grammatical behaviour is a natural emergent property of retrieval from a model that stores exemplars of linguistic experience. Hence, both semantics and syntax may very well be constructed properties of retrieval from episodic memory rather than abstracted structures or rules, per se.6

Exemplar-based models in natural language processing

It is tempting to think of exemplar-based models as a psychology-centric theory with little, if any, practical significance. After all, why would a computing scientist
want to store all data instances? Data compression and abstraction are core goals of information retrieval applications. However, exemplar models are now seeing considerable use in natural language processing (NLP) as well, for the very same reasons that they are preferred in categorisation: the affordance of nonlinear activation of memory exemplars given a probe.

A classic example in NLP was presented by Daelemans, Van Den Bosch, and Zavrel (1999), showing that abstractionist models lose exceptions to common patterns in a variety of language processing tasks. Prototype models offer the best single representation, but the distribution of meanings and usage rules is heavily skewed in natural languages – prototype models discard the tail (cf. Johns & Jones, 2010 in lexical semantics). Daelemans et al. found that retaining exceptional training instances in memory was actually beneficial for generalisation accuracy across a wide range of common NLP tagging tasks, and they argue that the field needs to take exemplar-based memory models much more seriously.

More recently, exemplar-based models have seen a resurgence in NLP, offering better accuracy on applied problems that the field had been deadlocked on with abstractionist models. For example, Erk and Padó (2010) used an exemplar-based memory model in which separate exemplars were encoded for words and sentences (cf. Dennis, 2005; Johns & Jones, 2015). They found superior performance on a practical paraphrase task using this architecture due to the nonlinear activation of related exemplars – this behaviour allowed the model to "ignore" the exemplars that were other senses of a target word, which would have been a collapsed noise source in an abstractionist model. In fact, their exemplar model outperformed all then state-of-the-art paraphrasing models, and has considerable similarity to exemplar-based memory models in cognitive science (e.g. Thiessen & Pavlik, 2013).

However, an issue with the application of exemplar-based models to applied NLP tasks will always be processing time. Exemplar-based models are fast to train, but require substantial and even computationally impractical memory resources, and are slow to retrieve the correct answer. In contrast, prototype models embody data compression, putting all the time into training the single best representation, but then the search time for a similar instance in memory is much more efficient. In applied problems, such as information retrieval, access time is everything. However, there have been many successful hybrid models emerging in NLP that balance accuracy with generalisation and speed. Multiple prototype models (e.g. Reisinger & Mooney, 2010) have become popular to represent the distinct senses of a word without needing to store all exemplars, and are quite similar to multiple prototype theories of categorisation (Minda & Smith, 2001). Similarly, there has been considerable success in NLP with models that represent words as regions, rather than points, in distributional space (Erk, 2009; Vilnis & McCallum, 2015). These models preserve nonlinear activation of exemplars while embedding them in a more reasonable search space with attractor basins. The practice has a similar outcome to setting a threshold on the similarity function in exemplar-based psychological models to reduce the activation of irrelevant items (which is precisely what Kwantes', 2005, model does). This also suggests considerable potential for the application of hybrid rule-and-exception models from human category learning (e.g. Nosofsky, Palmeri, & McKinley, 1994).

Also of interest in practical NLP applications is the recent rise of so-called memory networks (Weston, Chopra, & Bordes, 2015), which have proven very successful at open question answering with complex real-world text materials. Memory networks use a long-term exemplar memory network as a dynamic knowledge base, and have produced state-of-the-art results with difficult tasks such as question answering, summarisation, and text-based inference (Bordes, Usunier, Chopra, & Weston, 2015).

Discussion

Meaning is a fundamental human attribute that permeates all cognition, from low-level perceptual processing to high-level problem solving, and everything in between. Semantic abstraction is what makes us a powerful species – informavores. The idea that humans construct and store abstracted semantic representations for concepts is almost sacred in cognitive science. But it is also at odds with conclusions from other areas of cognition, such as categorisation and recognition, which presumably tap aspects of the same cognitive mechanism as semantic learning. And exemplar-based DSMs suggest that, like Hintzman's (1986) demonstration, we might be able to explain all the same semantic phenomena without a semantic memory. According to exemplar-based DSMs, semantic memory is a process, not a structure.

It is tempting to see exemplar-based DSMs as "cheating": If the model simply stores all data, then it can compute an accurate semantic representation whenever one is needed. But the theoretical claim is profound – it is a frightening proposal that we may not actually have semantic memory. Your interpretation of the words you are reading right now may be constructed on the fly as an artifact of retrieving the visual patterns from episodic memory. Our phenomenology of meaning may be
continuously constructed as the interaction between stimuli, episodic memory, and the memory retrieval mechanism that mediates them (Kintsch & Mangalath, 2011). But exemplar-based DSMs should also put us at ease – they provide converging evidence that performance across multiple cognitive domains (e.g. categorisation, recognition, semantics) may be explained by the same unified cognitive principle. Exploring exemplar-based DSMs also has practical implications for education, where exemplar-based models of categorisation have been successfully applied (Norman, Young, & Brooks, 2007; Nosofsky, 2017). And since humans are both the producers and consumers of linguistic information in all practical NLP tasks, it is also reassuring that recent findings in NLP may suggest that the best models to serve humans in these tasks bear considerable similarity to the models we believe human cognition has evolved to use.

How might exemplar-based semantic models be implemented in neural hardware? A common criticism of exemplar theory is that it is stranded at the computational level of Marr's hierarchy, and that the transition to an implementational account is untenable. The claim that we simply store everything that we experience seems unintuitive and goes against the core principle of cognitive economy. However, Hintzman (1990) has shown how an exemplar-based memory model such as MINERVA 2 can be easily implemented within a neural network framework. In addition, there is a small body of work attempting to understand and formalise biologically plausible exemplar theories of recognition and categorisation, which has typically pointed to a role for the hippocampus and surrounding medial temporal lobe structures (e.g. Pickering, 1997). Furthermore, Becker's (2005) models cleanly demonstrate that hippocampal coding would give rise to distinct memory representations for highly similar items.

More recent work in categorisation is now focusing on the basal ganglia and striatum as giving rise to the operations needed for exemplar models (see Ashby & Rosedahl, 2017, for a review). Ashby and Rosedahl recently introduced a neural implementation of exemplar theory, in which a key role for the formation of category exemplars is assigned to synaptic plasticity at cortical-striatal synapses. Rather than storing strict exemplars, per se, their model adds nodes and manipulates connectivity between striatal and sensory neurons, achieving the same effect as classic exemplar models. Ashby and Rosedahl show that their neural implementation of exemplar theory is mathematically equivalent to classic exemplar theories such as the General Context Model (Nosofsky, 1986), and makes identical predictions. The work of Ashby and Rosedahl establishes an important equivalence between classic exemplar-based models and neural exemplar theories. Not only do exemplar theories provide superior quantitative fits, but increasing biological plausibility also extends predictions to findings from cognitive neuroscience (e.g. Ashby & Valentin, 2016; Hélie, Paul, & Ashby, 2012; Valentin, Maddox, & Ashby, 2014).

The predominance of prototype-based DSMs in the literature may be partially due to Chomskian presumptions in linguistics that the job of the cognitive mechanism is to abstract the rules of a grammar from instances. This abstractionist presumption may have implicitly guided architectural decisions in early DSMs. In addition, the notion of cognitive economy (Rosch & Mervis, 1975) was a guiding principle for models of semantic abstraction. However, both of these theoretical presumptions are currently being revisited given the strength of usage-based theories in linguistics (Tomasello, 2003). But a more likely reason for the preference of prototype- over exemplar-based DSMs in practice is that exemplar models are much more computationally expensive than prototype models. The front-end data compression at the core of prototype DSMs means that they require far less memory to store, and are far more efficient to use, than exemplar models.

Development of DSMs in general has benefited from cross-disciplinary interactions with applied fields, such as information retrieval. Put simply, these models are both theoretically informative to cognitive science and useful for practical NLP tasks. However, the utility of the model should not constrain its theoretical informativeness. Prototype DSMs are needed because they provide an efficient and economical estimate of a word's aggregate meaning. Exemplar-based DSMs contain more information, but the retrieval problem becomes intractable and untenable for practical tasks. Nobody wants to type a search query into Google and have it determine what you mean by activating and weighting exemplars in real time; the time-intensive computation should have been completed and stored long before you type in the query words.

But the constraint of utility in NLP may have had the unwanted effect of guiding our theoretical models of the mind away from exemplar models. There are many differences between the brain and computational databases in how they represent and retrieve information. The search and abstraction processes used in cognition need not be identical to those best for database search. Models of cognition have long assumed that memory exemplars can be activated in parallel, although the code we use to implement this in a model will usually use a loop routine. This is a distinct difference between the two disciplines: Looping through all exemplars is
not an efficient method of, for example, word similarity matching, but it may well be the correct model of how humans do it. The constraints of our current computational hardware should not be used as reasons to discard otherwise superior-fitting models of human cognition, such as exemplar models.

Notes

1. LSA and word2vec are formally equivalent: Levy and Goldberg (2014) demonstrated analytically how the SGNS architecture of word2vec is implicitly factorizing a word-by-context matrix whose cell values are shifted PMI values.
2. The model's direction can also be inverted, using the word to predict the context (SGNS) rather than using the context to predict the word (CBOW).
3. The bulk of the evidence used by Tulving to argue for distinct semantic and episodic memory systems was from neuropsychological patients.
4. The model is simply referred to as the "semantics model" in Kwantes' (2005) original paper, but "Constructed Semantics Model" has become its popular name among semantic modelers because semantic representations are constructed on the fly from episodic memory in the model.
5. An exception here is the topic model, which uses conditional probabilities, so it is not subject to metric restrictions of spatial models (e.g., Griffiths, Steyvers, & Tenenbaum, 2007).
6. And essentially the same architecture has been used by Goldinger (1998) to explain "abstract" qualities of spoken word representation from episodic memory retrieval.

Acknowledgements

This work was supported by NSF BCS-1056744 and IES R305A150546. I would like to thank Randy Jamieson, Melody Dye, and Brendan Johns for helpful input and discussions.

Disclosure statement

No potential conflict of interest was reported by the authors.

Funding

This work was supported by National Science Foundation [grant number BCS-1056744] and Institute of Education Sciences [grant number R305A150546].

References

Ashby, F. G., & Maddox, W. T. (1993). Relations between prototype, exemplar, and decision bound models of categorization. Journal of Mathematical Psychology, 37(3), 372–400.
Ashby, F. G., & Rosedahl, L. (2017). A neural implementation of exemplar theory. Psychological Review, 124(4), 472–482.
Ashby, F. G., & Valentin, V. V. (2016). Multiple systems of perceptual category learning: Theory and cognitive tests. In H. Cohen & C. Lefebvre (Eds.), Handbook of categorization in cognitive science, second edition (in press). New York: Elsevier.
Barsalou, L. W. (1999). Perceptions of perceptual symbols. Behavioral and Brain Sciences, 22(4), 637–660.
Becker, S. (2005). A computational principle for hippocampal learning and neurogenesis. Hippocampus, 15(6), 722–738.
Bordes, A., Usunier, N., Chopra, S., & Weston, J. (2015). Large-scale simple question answering with memory networks (arXiv preprint arXiv:1506.02075).
Daelemans, W., Van Den Bosch, A., & Zavrel, J. (1999). Forgetting exceptions is harmful in language learning. Machine Learning, 34(1–3), 11–41.
Damasio, A. R. (1989). Time-locked multiregional retroactivation: A systems-level proposal for the neural substrates of recall and recognition. Cognition, 33(1), 25–62.
Dennis, S. (2005). A memory-based theory of verbal cognition. Cognitive Science, 29, 145–193.
Erk, K. (2009). Representing words as regions in vector space. Proceedings of the thirteenth conference on computational natural language learning (pp. 57–65). Association for Computational Linguistics.
Erk, K., & Padó, S. (2010). Exemplar-based models for word meaning in context. Proceedings of the ACL 2010 conference short papers (pp. 92–97). Association for Computational Linguistics.
Firth, J. R. (1957). A synopsis of linguistic theory (pp. 1930–1955). Oxford: Blackwell.
Goldberg, A. E. (2006). Constructions at work: The nature of generalization in language. New York: Oxford University Press.
Goldinger, S. D. (1998). Echoes of echoes? An episodic theory of lexical access. Psychological Review, 105(2), 251–279.
Griffiths, T. L., Steyvers, M., & Tenenbaum, J. B. (2007). Topics in semantic representation. Psychological Review, 114(2), 211.
Hélie, S., Paul, E. J., & Ashby, F. G. (2012). A neurocomputational account of cognitive deficits in Parkinson's disease. Neuropsychologia, 50(9), 2290–2302.
Hintzman, D. L. (1984). MINERVA 2: A simulation model of human memory. Behavior Research Methods, Instruments, & Computers, 16(2), 96–101.
Hintzman, D. L. (1986). "Schema abstraction" in a multiple-trace memory model. Psychological Review, 93, 411–428.
Hintzman, D. L. (1990). Human learning and memory: Connections and dissociations. Annual Review of Psychology, 41(1), 109–139.
Johns, B. T., Jamieson, R. K., Crump, M. J. C., Jones, M. N., & Mewhort, D. J. K. (2016). The combinatorial power of experience. Proceedings of the 37th meeting of the cognitive science society.
Johns, B. T., & Jones, M. N. (2010). Evaluating the random representation assumption of lexical semantics in cognitive models. Psychonomic Bulletin and Review, 17, 662–672.
Johns, B. T., & Jones, M. N. (2014). Generating structure from experience: The role of memory in language. Proceedings of the 35th annual conference of the cognitive science society. Austin, TX: Cognitive Science Society.
Johns, B. T., & Jones, M. N. (2015). Generating structure from experience: A retrieval-based model of language processing. Canadian Journal of Experimental Psychology, 69, 233–251.
Jones, M. N., Dye, M., & Johns, B. T. (2016). Context as an organizing principle of the lexicon. In B. Ross (Ed.), The psychology of learning and motivation (Vol. 67, pp. 239–283). https://doi.org/10.1016/bs.plm.2017.03.008
Jones, M. N., & Mewhort, D. J. K. (2007). Representing word meaning and order information in a composite holographic lexicon. Psychological Review, 114(1), 1–37.
Jones, M. N., Willits, J. A., & Dennis, S. (2015). Models of semantic memory. In J. R. Busemeyer & J. T. Townsend (Eds.), Oxford handbook of mathematical and computational psychology (pp. 232–254). New York: Oxford University Press.
Kintsch, W. (2001). Predication. Cognitive Science, 25(2), 173–202.
Kintsch, W., & Mangalath, P. (2011). The construction of meaning. Topics in Cognitive Science, 3(2), 346–370.
Kruschke, J. K. (2008). Models of categorization. In R. Sun (Ed.), The Cambridge handbook of computational psychology (pp. 267–301). New York: Cambridge University Press.
Kwantes, P. J. (2005). Using context to build semantics. Psychonomic Bulletin & Review, 12, 703–710.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211–240.
Levy, O., & Goldberg, Y. (2014). Neural word embedding as implicit matrix factorization. In Z. Ghahramani, M. Welling, & C. Cortes (Eds.), Advances in neural information processing systems (pp. 2177–2185). Cambridge, MA: MIT Press.
Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85(3), 207–238.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111–3119).
Minda, J. P., & Smith, J. D. (2001). Prototypes in category learning: The effects of category size, category structure, and stimulus complexity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27(3), 775.
Murphy, G. L. (2016). Is there an exemplar theory of concepts? Psychonomic Bulletin & Review, 23(4), 1035–1042.
Norman, G., Young, M., & Brooks, L. (2007). Non-analytical models of clinical reasoning: The role of experience. Medical Education, 41, 1140–1145.
Nosofsky, R. M. (1986). Attention, similarity, and the identification–categorization relationship. Journal of Experimental Psychology: General, 115(1), 39–57.
Nosofsky, R. M. (1988). Exemplar-based accounts of relations between classification, recognition, and typicality. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14(4), 700–708.
Nosofsky, R. M., Palmeri, T. J., & McKinley, S. C. (1994). Rule-plus-exception model of classification learning. Psychological Review, 101(1), 53–79.
Nosofsky, R. M., Sanders, C. A., & McDaniel, M. A. (2017). Tests of an exemplar-memory model of classification learning in a high-dimensional natural-science category domain. Journal of Experimental Psychology: General. http://psycnet.apa.org/doi/10.1037/xge0000369
Pickering, A. D. (1997). New approaches to the study of amnesic patients: What can a neurofunctional philosophy and neural network methods offer? Memory, 5(1–2), 255–300.
Posner, M. I., & Keele, S. W. (1968). On the genesis of abstract ideas. Journal of Experimental Psychology, 77(3p1), 353–363.
Reed, S. K. (1972). Pattern recognition and categorization. Cognitive Psychology, 3(3), 382–407.
Reisinger, J., & Mooney, R. J. (2010). Multi-prototype vector-space models of word meaning. In Human language technologies: The 2010 annual conference of the North American chapter of the Association for Computational Linguistics, Los Angeles, June 2010.
Rogers, T. T., & McClelland, J. L. (2004). Semantic cognition: A parallel distributed processing approach. Cambridge, MA: MIT Press.
Rogers, T. T., & McClelland, J. L. (2011). Semantics without categorization. In E. M. Pothos & A. J. Wills (Eds.), Formal approaches in categorization (pp. 88–119). New York: Cambridge University Press.
Rosch, E. H. (1973). Natural categories. Cognitive Psychology, 4(3), 328–350.
Rosch, E., & Mervis, C. B. (1975). Family resemblances: Studies in the internal structure of categories. Cognitive Psychology, 7(4), 573–605.
Stanton, R. D., Nosofsky, R. M., & Zaki, S. R. (2002). Comparisons between exemplar similarity and mixed prototype models using a linearly separable category structure. Memory & Cognition, 30(6), 934–944.
Thiessen, E. D. (2017). What's statistical about learning? Insights from modelling statistical learning as a set of memory processes. Philosophical Transactions of the Royal Society B: Biological Sciences, 372(1711), 20160056.
Thiessen, E. D., & Pavlik, P. I. (2013). iMinerva: A mathematical model of distributional statistical learning. Cognitive Science, 37(2), 310–343.
Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press.
Tulving, E. (1972). Episodic and semantic memory. In E. Tulving & W. Donaldson (Eds.), Organization of memory (pp. 381–403). New York, NY: Academic Press.
Valentin, V. V., Maddox, W. T., & Ashby, F. G. (2014). A computational model of the temporal dynamics of plasticity in procedural learning: Sensitivity to feedback timing. Frontiers in Psychology, 5, 1–9.
Vilnis, L., & McCallum, A. (2015). Word representations via Gaussian embedding. arXiv preprint arXiv:1412.6623.
Weston, J., Chopra, S., & Bordes, A. (2015). Memory networks. arXiv preprint arXiv:1410.3916.