Pretrained Transformers
for Text Ranking
BERT and Beyond
Synthesis Lectures on Human Language Technologies
Editor
Graeme Hirst, University of Toronto
Synthesis Lectures on Human Language Technologies is edited by Graeme Hirst of the University of Toronto. The series consists of 50- to 150-page monographs on topics relating to natural language processing, computational linguistics, information retrieval, and spoken language understanding. Emphasis is on important new techniques, on new applications, and on topics that combine two or more HLT subfields.
Pretrained Transformers for Text Ranking: BERT and Beyond
Jimmy Lin, Rodrigo Nogueira, and Andrew Yates
2021
Automated Essay Scoring
Beata Beigman Klebanov and Nitin Madnani
2021
Explainable Natural Language Processing
Anders Søgaard
2021
Finite-State Text Processing
Kyle Gorman and Richard Sproat
2021
Semantic Relations Between Nominals, Second Edition
Vivi Nastase, Stan Szpakowicz, Preslav Nakov, and Diarmuid Ó Séaghdha
2021
Embeddings in Natural Language Processing: Theory and Advances in Vector Representations of Meaning
Mohammad Taher Pilehvar and Jose Camacho-Collados
2020
Conversational AI: Dialogue Systems, Conversational Agents, and Chatbots
Michael McTear
2020
Natural Language Processing for Social Media, Third Edition
Anna Atefeh Farzindar and Diana Inkpen
2020
Statistical Significance Testing for Natural Language Processing
Rotem Dror, Lotem Peled, Segev Shlomov, and Roi Reichart
2020
Deep Learning Approaches to Text Production
Shashi Narayan and Claire Gardent
2020
Linguistic Fundamentals for Natural Language Processing II: 100 Essentials from Semantics and Pragmatics
Emily M. Bender and Alex Lascarides
2019
Cross-Lingual Word Embeddings
Anders Søgaard, Ivan Vulić, Sebastian Ruder, and Manaal Faruqui
2019
Bayesian Analysis in Natural Language Processing, Second Edition
Shay Cohen
2019
Argumentation Mining
Manfred Stede and Jodi Schneider
2018
Quality Estimation for Machine Translation
Lucia Specia, Carolina Scarton, and Gustavo Henrique Paetzold
2018
Natural Language Processing for Social Media, Second Edition
Atefeh Farzindar and Diana Inkpen
2017
Automatic Text Simplification
Horacio Saggion
2017
Neural Network Methods for Natural Language Processing
Yoav Goldberg
2017
Syntax-based Statistical Machine Translation
Philip Williams, Rico Sennrich, Matt Post, and Philipp Koehn
2016
Domain-Sensitive Temporal Tagging
Jannik Strötgen and Michael Gertz
2016
Linked Lexical Knowledge Bases: Foundations and Applications
Iryna Gurevych, Judith Eckle-Kohler, and Michael Matuschek
2016
Bayesian Analysis in Natural Language Processing
Shay Cohen
2016
Metaphor: A Computational Perspective
Tony Veale, Ekaterina Shutova, and Beata Beigman Klebanov
2016
Grammatical Inference for Computational Linguistics
Jeffrey Heinz, Colin de la Higuera, and Menno van Zaanen
2015
Automatic Detection of Verbal Deception
Eileen Fitzpatrick, Joan Bachenko, and Tommaso Fornaciari
2015
Natural Language Processing for Social Media
Atefeh Farzindar and Diana Inkpen
2015
Semantic Similarity from Natural Language and Ontology Analysis
Sébastien Harispe, Sylvie Ranwez, Stefan Janaqi, and Jacky Montmain
2015
Learning to Rank for Information Retrieval and Natural Language Processing, Second Edition
Hang Li
2014
Ontology-Based Interpretation of Natural Language
Philipp Cimiano, Christina Unger, and John McCrae
2014
Automated Grammatical Error Detection for Language Learners, Second Edition
Claudia Leacock, Martin Chodorow, Michael Gamon, and Joel Tetreault
2014
Web Corpus Construction
Roland Schäfer and Felix Bildhauer
2013
Recognizing Textual Entailment: Models and Applications
Ido Dagan, Dan Roth, Mark Sammons, and Fabio Massimo Zanzotto
2013
Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax
Emily M. Bender
2013
Semi-Supervised Learning and Domain Adaptation in Natural Language Processing
Anders Søgaard
2013
Semantic Relations Between Nominals
Vivi Nastase, Preslav Nakov, Diarmuid Ó Séaghdha, and Stan Szpakowicz
2013
Computational Modeling of Narrative
Inderjeet Mani
2012
Natural Language Processing for Historical Texts
Michael Piotrowski
2012
Sentiment Analysis and Opinion Mining
Bing Liu
2012
Discourse Processing
Manfred Stede
2011
Bitext Alignment
Jörg Tiedemann
2011
Linguistic Structure Prediction
Noah A. Smith
2011
Learning to Rank for Information Retrieval and Natural Language Processing
Hang Li
2011
Computational Modeling of Human Language Acquisition
Afra Alishahi
2010
Introduction to Arabic Natural Language Processing
Nizar Y. Habash
2010
Cross-Language Information Retrieval
Jian-Yun Nie
2010
Automated Grammatical Error Detection for Language Learners
Claudia Leacock, Martin Chodorow, Michael Gamon, and Joel Tetreault
2010
Data-Intensive Text Processing with MapReduce
Jimmy Lin and Chris Dyer
2010
Semantic Role Labeling
Martha Palmer, Daniel Gildea, and Nianwen Xue
2010
Spoken Dialogue Systems
Kristiina Jokinen and Michael McTear
2009
Introduction to Chinese Natural Language Processing
Kam-Fai Wong, Wenjie Li, Ruifeng Xu, and Zheng-sheng Zhang
2009
Introduction to Linguistic Annotation and Text Analytics
Graham Wilcock
2009
Dependency Parsing
Sandra Kübler, Ryan McDonald, and Joakim Nivre
2009
Statistical Language Models for Information Retrieval
ChengXiang Zhai
2008
Copyright © 2022 by Morgan & Claypool
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopy, recording, or any other) except for brief quotations in printed reviews, without the prior permission of the publisher.
Pretrained Transformers for Text Ranking: BERT and Beyond
Jimmy Lin, Rodrigo Nogueira, and Andrew Yates
www.morganclaypool.com
ISBN: 9781636392288 paperback
ISBN: 9781636392295 PDF
ISBN: 9781636392301 hardcover
DOI 10.2200/S01123ED1V01Y202108HLT053
A Publication in the Morgan & Claypool Publishers series
SYNTHESIS LECTURES ON HUMAN LANGUAGE TECHNOLOGIES
Lecture #53
Series Editor: Graeme Hirst, University of Toronto
Series ISSN
Print 1947-4040   Electronic 1947-4059
Pretrained Transformers
for Text Ranking
BERT and Beyond
Jimmy Lin
University of Waterloo
Rodrigo Nogueira
University of Waterloo
Andrew Yates
University of Amsterdam and Max Planck Institute for Informatics
SYNTHESIS LECTURES ON HUMAN LANGUAGE TECHNOLOGIES #53
Morgan & Claypool Publishers
ABSTRACT
The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing (NLP) applications. This book provides an overview of text ranking with neural network architectures known as transformers, of which BERT (Bidirectional Encoder Representations from Transformers) is the best-known example. The combination of transformers and self-supervised pretraining has been responsible for a paradigm shift in NLP, information retrieval (IR), and beyond.
This book provides a synthesis of existing work as a single point of entry for practitioners who wish to gain a better understanding of how to apply transformers to text ranking problems and for researchers who wish to pursue work in this area. It covers a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage architectures and dense retrieval techniques that perform ranking directly. Two themes pervade the book: techniques for handling long documents, beyond typical sentence-by-sentence processing in NLP, and techniques for addressing the tradeoff between effectiveness (i.e., result quality) and efficiency (e.g., query latency, model and index size). Although transformer architectures and pretraining techniques are recent innovations, many aspects of how they are applied to text ranking are relatively well understood and represent mature techniques. However, many open research questions remain, and thus, in addition to laying out the foundations of pretrained transformers for text ranking, this book also attempts to prognosticate where the field is heading.
KEYWORDS
pretrained transformers, BERT, neural information retrieval, effectiveness–efficiency tradeoffs, reranking, multi-stage ranking, dense retrieval, representational learning, document expansion, query expansion