Table Of ContentMultiple paternity: determining the minimum number of sires of a large
brood
A.Erikssona,b,B.Mehliga,M.Panovac,C.Andre´c,andK.Johannessonc
aDepartmentofPhysics,Go¨teborgUniversity,Go¨teborg,Sweden
bDepartmentofMarineEcology,UniversityofGothenburg,SE-43005Go¨teborg,Sweden
cDepartmentofMarineEcology-Tja¨rno¨,UniversityofGothenburg,SE-45296Stro¨mstad,Sweden.
Wedescribe anefficient algorithm for determining exactly the minimum numberofsires consistent with the
multi-locusgenotypesofamotherandherprogeny.Weconsidercaseswhereasimpleexhaustivesearchthrough
allpossiblesetsofsiresisimpossibleinpractice(becauseitwouldtaketoolongtocomplete). Ouralgorithm
forsolvingthiscombinatorialoptimisationproblemavoidsvisitinglargepartsofsearchspacewhichwouldnot
9 improvethesolutionfoundsofar(i.e., resultinasolution withfewernumberofsires). Thisisofparticular
0 importancewhenthenumberofallelictypesintheprogenyarrayislargeandwhentheminimumnumberofsires
0 isexpectedtobelarge.Preciselyinsuchcasesitisimportanttoknowtheminimumnumberofsires:thisnumber
2 givesanexactboundonthemostlikelynumberofsiresestimatedbyarandomsearchalgorithminaparameter
regionwhereitmaybedifficulttodeterminewhetherithasconverged. Weapplyouralgorithmtodatafromthe
n
marinesnail,Littorinasaxatilis.
a
J Keywords:Multiplepaternity,Combinatorialoptimisation,MicrosatelliteDNA
1
2
I. INTRODUCTION
]
E
A number of species from different taxa are known to mate numerous times during a mating season. Females of such
P
species are likely to give birth to offspring fathered by more than one sire, and in some cases, the offspring may be fa-
.
o thered by a large number of sires. For example, queens of the honeybee are known to leave the nest followed by hundreds
i
b ofmales(Wattanachaiyingcharoenetal.,2003). Femalesofthesaltwaterflymatemanytimesadayduringthematingseason
- (BlythandGilbourn,2006)andinspeciesofperiwinkles(marinegastropods)femalesmaterepeatedlyduringthematingseason
q
whichisseveralmonthslong(Reid,1996;Saur,1990).
[
Thedegreeofmultiplepaternitycanbe inferredfromempiricaldata, obtainedforexamplebygenotypingfemalesandoff-
1 springusinghigh-resolutiongeneticmarkers,suchasmicrosatellitesorsinglenucleotidepolymorphisms.Toestimatethenum-
v berofsirescorrespondingtoagivenmulti-locusdatasetisusuallystraightforwardwhenthenumberofoffspring,thenumber
3 oftheirallelictypes,andthenumberofsirestobedeterminedaresmall.
3
Intheexamplesmentionedabove,however,thelevelsofmultiplepaternityareoftensohighthatthemathematicalanalysis
2
oftheempiricaldatabecomesverycomplexandtime-consuming.Inthispaperwedescribeanewefficientalgorithmtoanalyse
3
. multiplepaternityinempiricaldatasetswhenthepaternalgenotypesareunknown.Weapplythealgorithmtodatasetsfromthe
1
periwinklespeciesLittorinasaxatilisexhibitinghighlevelsofmultiplepaternity[ithasbeenshownpreviouslythatatleast7-10
0
malesarefatherstotheoffspringofonebrood(Ma¨kinenetal.,2007)].
9
Consider an example. Tab. I shows the multi-locusgenotypesof 42 progenyfrom a brood of a female periwinkle. How
0
: doesonedeterminethenumberofsiresfromthedatashowninTab.I? Moreprecisely,thisquestionmaybeposedinatleast
v
threedifferentways: whatistheactualnumberofsires,asopposedtothemostlikelynumber,ortheminimumnumberofsires
i
X consistentwiththemotherandherprogenyarray?
Unlessoneisabletodirectlyobservethematingsitisingeneralnotpossibletodeterminetheactualnumberofsiresforan
r
a arraysuchastheoneshowninTab.I. Analternativeistoinsteaddeterminethemostlikelynumberofsiresconsistentwiththe
multi-locusgenotypesofmotherandprogeny(Wang,2004). Inthisapproachitiscommonlyassumedthatthepopulationisin
Hardy-Weinbergequilibrium,thatalllociaresubjecttoneutralevolution,andthatalllociareinpairwiselinkageequilibrium.
Themostlikelysetofsiresisdeterminedbyarandomsearchalgorithm(Pressetal.,1986),locallymaximisingthelikelihood
constrainedbyrequiringconsistencywithmotherandprogeny.Arandomsearchhoweverdoesnotguaranteeconvergence.The
onlywaytomakesurethatthealgorithmhasconvergedistocomparewithanexhaustivesearch.
Exhaustivesearch algorithmshave beenpublishedin the literature. An exampleavailable for downloadis GERUD (Jones,
2001,2005). Theexhaustivesearchiscommonlyconductedasfollows: alistofpaternalallelesatalllociisdeterminedfrom
theallelesintheprogenyarray,subtractingthoseofthemother. Ifforagivenchildatagivenlocusthepaternalallelescannot
be uniquelyextracted, both are kept in the list. From this list, a set of potentialsires is obtained by constructingall possible
multi-locuspaternalhaplotypes.Thislistisprunedbyremovingindividualswhicharenotconsistentwithanyoftheprogenyin
thedata.Theminimumnumberofsiresisdeterminedbyexhaustivelysearchingthisprunedset:firstthealgorithmtestswhether
asinglefatherfromthissetisconsistentwiththeprogenyarray. Ifthisisnotthecase,allpairsofsiresaretested,ifnecessary
alltripletsofsires, andsoforth. Thisalgorithmensuresthattheminimumnumberofsiresconsistentwiththedataarefound.
Thisalgorithmhasbeensuccessfullyusedinmanycircumstances.Itworkswellwhenthelistofpaternalallelesisnottoolong,
andwhentheminimumnumberoffatherstobedeterminedisnottoolarge.
2
TheminimumnumberofsiresforthedatasetsshowninTab.Iisfoundtobetwelve(usingthealgorithmdescribedbelow).
With the search algorithm summarised above, one would have to go througha prohibitivelylarge number of sets of possible
sires. Sincethenumberofsetstosearchtypicallyincreasescombinatoriallywiththenumberofsires,theexhaustivesearchinthe
algorithmproposedby(Jones,2005)islimitedtomaximallysixsiresforagivensample.Inpractice,whenthenumberofsiresis
fiveorsix,thealgorithmmaytakeverylongtoconverge(Sefcetal.,2008). Inmanypracticalapplications(Amavetetal.,2008;
Portnoyetal.,2007;RispoliandWilson,2008;Simmonsetal.,2008;Songetal.,2007;Takagietal.,2008;VanDorniketal.,
2008)thenumberofsiresdeterminedwithGERUDdoesnotexceedfour. (ItshouldbenotedthatGERUD2.0allowstheuser
to truncate the pruned set of sires in an ad-hoc fashion in order to check for up to eight sires. But due to the truncation, the
minimumnumberofsiresispotentiallyoverestimated.Byhowmuchisunclear.)
AmaximumlikelihoodapproachindicatesthatthenumberofsiresinsamplesofL.saxatilis(Ma¨kinenetal.,2007)istypically
largerthansix. We havethereforedevelopedanewalgorithmfordeterminingtheminimumnumberofsiresconsistentwitha
givenprogenyarray,suchasthatshowninTab.I. ThealgorithmisdescribedinSec. II. Itmakesitpossibletoexactlydetermine
the minimumnumberof sires for the empiricaldata sets listed in Tab. II, in seven of the nine data sets the minimumnumber
of sires is found to be larger than six. Further, the new algorithm enables us to estimate the number of sires for a data set
whenconvergenceofmaximum-likelihoodalgorithmsisdifficulttoascertain. Fig.3forexampleshowsarunofthemaximum-
likelihood algorithm COLONY (Wang, 2004) (which estimates the most likely number of fathers given the populationallele
frequencies)forthedatashowninTab.I,comparedtotheminimumnumberofsires(whichistwelveforthisdataset). Fig.2
raisesthequestionhowlikelyitisthattheminimumnumberofsiresisequaltotheactualnumberofsiresforagivensample.
Usingouralgorithmwehaveinvestigatedthisquestionemployingdatasetsgeneratedbyacoalescentalgorithmwithinastep-
wisemutationmodel: theanswerdependsuponthepropertiesofthepopulationinquestion. Incaseswheretheprobabilitythat
thetwonumbersarethesameishigh,wecaninferthattheactualnmberofsiresinthissituationcanbereliablyestimatedfrom
empiricalsamples(itshouldbeemphasizedthattheresultofouralgorithmalwaysisanexactlowerbound). Lastbutnotleast,
weemployouralgorithmtoanswerthefollowingquestion:givenanempiricaldataset,howcouldonemostefficientlyincrease
theaccuracyoftheestimateofthenumberofsires: bygenotypingmorelociforagivensetofprogeny,orbygenotypingmore
progenyforagivennumberofloci? Again,theanswerdependsuponthepropertiesofthepopulationinquestion.
The remainder of this article is organised as follows. In Sec. II we describe the new algorithm, called ‘MinFathers’. We
alsobrieflydescribehowweproducedartificialsamplesusingthecoalescentinordertotestthenewalgorithm. Ourresultsare
summarisedinSec. III. ConclusionsaredrawninSec.IV.
II. METHODS
A. Anefficientsearchalgorithm
Inthissectionwedescribeournewalgorithmwhichdeterminestheminimumnumberofsiresconsistentwithagivenprogeny
array,suchasthatshowninTab.I.
Tofindtheminimalsetofsiresisequivalenttofindingapartitionoftheprogeny,suchthatallprogenyinagivenmemberof
thepartitioncanbeinheritedfromasinglefather. Afatherisrepresentedbyalistofthetwoallelesateachlocus.Eachpaternal
allelemayeitherhaveadefinitevalue,ornovaluemayyethavebeenassignedtothisallele. Eachprogenytooisrepresented
byalistofalleles,oneforeachlocuswhenthealleleofthemotherhasbeensubtracted. Usually,theallelictypesareuniquely
determined. Thereare,however,twoexceptions: whenagenotypeerrorhasoccurred,andwhenthemotherandtheoffspring
areidenticalandheterozygous. Inthiscase,atthelocusinquestion,therearetwopossibleallelesfortheprogeny. We ignore
thesecomplicationsforthemoment,andreturntothemlater,afterhavingdescribedthealgorithminitssimplestform.
In ouralgorithm,the mostgeneralcommonfatherfor a set of progenyis foundthrougha sequenceof mergingoperations.
Thisoperationmapsthetwomostgeneralfathersoftwosetsofprogenytothemostgeneralfatherofthecombinedset. Since
wearealwayssearchingforminimumnumberoffathers,thefathersarealwaystakentobeheterozygousateachlocus(orsome
lociremainundetermined).
Themergingoftwofathersf andf′proceedsindependentlyateachlocus(sincefreerecombinationisassumed).Assumethat
acommonfatherf forasetofj progenyhasbeenfound. Nowaddanotherprogenytothisset. Assumethatthenewprogeny
hasallelic typea atagivenlocus. Itsmostgeneralfatherhasthegeneticconfiguration{a,0}atthislocus. Themostgeneral
commonfatherf′ofthejointsetofj+1progenyisobtainedbymergingf and{a,0}asdescribedinTab.III. Atagivenlocus,
thefatherf′mayhaveseveralpossibleconfigurations,dependingontheconfigurationsoff andthefatherofp. Inthetable,the
asterisk denotesthatthe correspondingallelic typehasnotyetbeendetermined,or is unknown. The mostgeneralfather ofa
singleprogenyis{a,∗}.
Now consider a partition F of j progeny. We introduce the following terminology. A partition is called valid if for each
elementof the partitionthere is a commonfather for all progenyin the element. In other words, a valid partition of progeny
correspondsto a set of fathers for the progenyin question. Our algorithm can now be formulated as follows: for each valid
partitionF ofprogeny1,...,j,generateallvalidpartitionsF′ofprogeny1,...,j+1byaddinganewprogenyj+1toeach
3
elementofF providedthenewfathermergeswiththecommonfatheroftheelementofF intoavalidcommonfather. Starting
fromanemptysetoffathers,F =∅,andasetofprogenyS,wecanfindallvalidsetsoffathersofS bythisalgorithm.
It is possible to find the minimum number of fathers from this method by taking the minimum modulus of all partitions
found.Thisisusuallymuchbetterthanfirstgeneratingthefullsetofpartitionsoftheprogeny,andsubsequentlycheckingwhich
partitionsare valid; how muchmore efficient this is dependson the data. Our algorithm for findingthe minimumnumberof
fathersissummarisedinFig.1. ItrecursivelybuildsallvalidpartitionsofasetofprogenyS,exceptsomepartitionsthatcanbe
showntonotcorrespondtotheminimumnumberoffathers. InFig.2weshowasearchtreefora simplesetoffourprogeny.
Eachprogenyhastwoloci,andateachlocustheallelecorrespondingtothemotherhasbeensubtracted,sothateachprogenyis
describedbyalistoftwoalleles(oneforeachlocus): p = (a,b),p = (c,d),p = (c,e),andp = (a,d). Itisassumedthat
1 2 3 3
a,b,c,anddarefourdifferentallelictypes.
We conclude this section by emphasizing four important points. First, note that if we have a given number of fathers for
progeny1,...,j,thenumberoffathersforprogeny1,...,j+1mustbeatleastashigh. Hence,wheneverthesetoffathersis
atleastaslargeastheminimumnumberfoundsofar,wecanstopsearchingforpartitionsofS basedonthecurrentpartition.
When we have a complete partition of S which is smaller than the minimum found so far, we can update the minimum. As
a startingpoint, we can use any validupperbound; in the presentimplementationwe use the trivialboundthatthe minimum
numberoffathersisn∗ ≤⌈|S|/2⌉.Usingouralgorithm,wehave
n∗ =MinFathers(∅,S,⌈|S|/2⌉). (1)
Second, the algorithm as described above is valid only if the genetic material inherited from the father at each locus is
uniquelydetermined. In general, this is notthe case, as pointed out above. Instead, there may be one or more loci with two
possiblechoices.Inthiscase,whengeneratingthevalidpartitionscontainingagivenprogenyweloopoverallpossiblevariants
ofallelesintheseloci. Inpracticaldata,itisraretohavemorethanonesuchlocus,butifmanylociareconsideredthiscould
causeproblems: thenumberofvariantsoftheallelethatmayneedtobeconsideredis2m,wheremisthenumberoflociwith
multiplechoicesintheprogeny. Fordatasetswherethisisaproblem,itispossibletoextendthealgorithmtoamorecomplex
mergingoperation,whereforeachlocusofafather,wekeeptrackofallpossiblepairsofallelesthatcansimultaneouslymatch
allprogenyderivingfromthefather.
Third,ifwefindthatsomeoftheprogenynotyetincludedinthecurrentsetoffatherscanbedirectlyinheritedfromanyof
thefathers(i.e. thefathercontainsthenecessarygeneticmaterialatallloci),wecansafelyremovethemfromfromthesetof
progeny. In other words, if a new progenyp can be directly inherited froma father f, a merge f′ between f and p will lead
tof′ = f, i.e. nochangeinf. Hence,itisclearthatgiventhepresentsetoffathers,itisnotpossibleto findanotherwayof
mergingthese progenywhichwilllead tofewerfathersforthe wholeset. In ouralgorithm,wheneverweadda new fatherto
the partition, we removethe progenythat can be directly inherited from the new father, beforerecursively searchingfor new
partitions.
Fourth, in general the key to an efficient solution of a combinatorial problem such as the present one lies in cutting away
as large parts of the search space as early as possible. In the present context, this means that if we can consider the most
constrainingprogenyfirst, we can discard partitions that will not be valid for the whole set of progeny,or that will be larger
than the minimum, at an early stage. We sort the progenyfirst with respectto the numberof undeterminedsymbols(if any),
andwherethennumberofno-caresymbolsisequal,withrespecttothenumberofmultiplechoicelociintheprogeny,sothat
individualswithmultiple-choiceorundeterminedlociwillbeconsideredfirst. Beforethesorting,wecheckthemultiplechoice
loci. Consider a multiple choice locus in a given progeny. If the two alleles only occur togetherat that locus in all progeny,
orifonlyoneofthelocioccuraloneortogetherwithsomethirdallele,wecansafelyreplacethemultiplechoicebythemost
frequentoftheallelictypes. Pickingtheleastfrequentallelictypecanonlyexcludesomepossiblemergesthatwouldhavebeen
possiblewiththemorecommonallele.However,ifbothallelesoccuralsoaloneorwithsomethirdallelictype,wecannotsafely
concludewhichchoicewillleadtotheminimumnumberoffathers. Inthiscaseitisnecessarytokeepbothoptions.
B. Generatingsamplesusingthecoalescent
Inthissectionwedescribehowwehaveusedthecoalescenttogenerateartificialsamplesinordertotestthesearchalgorithm
describeintheprevioussection.
Inordertounderstandunderwhichcircumstancestheminimumnumberoffathersisequaltothetruenumberoffathers,we
generaten progenywithaknownnumberoffathers,n ,asfollows. First,thegenegenealogiesoftheLlociinamotherand
p f
n fathersare generatedaccordingto the standardneutralcoalescenttheory for an unstructuredpopulationwith constantsize
f
N. Because weassume thatthe lociareunlinked,thegenegenealogiesofdifferentlociare statistically independent. Ineach
branchof the genealogies, mutationsoccur with probabilityµ per generation, so thatthe numberof mutationsin a branchof
T generations is Poisson distributed with mean µT. In the coalescent, time is measured in units of 2N generations and the
mutationrateisgivenbythescaledparameterθ =4Nµ.
4
Wemodelmicrosatellitedatausingthestepwisemutationmodel,whereeachmutationleadstoeitherthegainorthelossof
a single repeat unit(KimuraandOhta, 1975, 1978). Thus, given the genealogy, for each locuswe start from the most recent
commonancestorofthewholesample,andassignitallele0(inthestepwisemutationnumber,onlydifferencesinthenumber
ofrepeatunitsarerelevant). Wethenrecursivelygeneratetheallelesofeachnodeinthegenealogybygeneratingthestepwise
mutationsalongeachbranchasdescribedabove,untilwehaveassignedtheallelesforallindividualsinthesample.
Giventheallelictypesofthemotherandthefathers,weproducetheoffspringasfollows:foreachprogenywepickarandomly
chosenfather,suchthateachfatherisequallylikelytobepicked.Ifintheendnotallfathershavebeenpickedforatleastsome
offspring,werepeatthewholeprocessuntilthisisthecase. Thisguaranteesthatthetruenumberoffathersisexactlyn . For
f
each locus in the progeny we then form the progeny according to Mendelian inheritance from the mother and the father, by
picking one allele from the mother and one from the father. This procedure guarantees that the marginal distribution of the
numberofoffspringperfatherisapproximatelybinomial,whichisconsistentwiththeneutraltheoryandwithempiricaldatafor
thesnails(Ma¨kinenetal.,2007).
Theperformanceof‘MinFathers’dependsontheamountofgeneticvariation(determinedbythemutationrateθ)andonthe
number of progeny per father, n = n /n . When θ is small, the minimum number of fathers is small and the algorithm
p/f p f
terminatesquickly.Alsowhenθislargethealgorithmisefficientbecauseitcanusuallyeliminateimpossiblemergesatanearly
stage. When θ is intermediate and n is large, however, the algorithm may have to investigate a significant fraction of the
p/f
possiblecombinations,andinthesecasesitmaynotbepracticaltousethealgorithm.Despitethiscaveat,Tab.IIandtheresults
presentinthenextsectionshowthatthealgorithmcanbeusedonempiricaldatawithalargenumberofprogenyandacrossa
largerangeofparametersforθandn .
p/f
III. RESULTS
InthissectionwedescribetheresultsobtainedwiththenewalgorithmproposedinsectionII.
A. Applicationtoempiricaldata
We have applied our algorithm to the L. saxatilis data by (Ma¨kinenetal., 2007), and to a new L. saxatilis data set
(Bostro¨metal. , 2009, in preparation),given in Tab. I. We begin by describingour results for the new data set (Tab. I). The
original data contains more progeny than listed in Tab. I. Using our algorithm we find that the minimum number of sires is
twelve,asgiveninthefirstrowofTab. II. ThealgorithmGERUD2.0couldnotberunbecausethisalgorithmdeterminesthe
exactsolutiononlyforupto sixsires. We havealso runCOLONY (Wang,2004) (whichestimatesthe mostlikelynumberof
sires), andthe correspondingresultsareshownin Fig. 3. Itisnotentirelyclearwhetherthealgorithmhasconverged;the log
likelihood may have reached a plateau but it is not clear whether further explorationof the state-space may yield still higher
likelihoodvalues.Itisthereforevaluabletohavetheexactlowerbound(alsoshowninFig.3)fromournewalgorithm.
Usingthenewalgorithm,MinFathers,wehavere-analysedthedataof(Ma¨kinenetal.,2007);thecorrespondingresultsare
alsogiveninTab. II. ThelasteightdatasetsinTab.IIwereanalysedusingGERUDby(Ma¨kinenetal.,2007)andarebroadly
consistentwiththecorrespondingresultsofCOLONY.Itmustbenotedhoweverthatin(Ma¨kinenetal.,2007),thesearchwas
performedonthreelocionly,andusinganadhoctruncationofthepossiblesetoffathers. Itisthereforeofinteresttodetermine
whattheminimumnumberofsiresactuallyis. ThecorrespondingresultsareshowninTab.II,andprovideanexactlowerbound
forthemostlikelynumberoffathersdeterminedby(Ma¨kinenetal.,2007). Exceptintwocases,GERUD2.0couldnotberun
becausethetrueminimumnumberoffathersexceededthemaximumvalueofsix.
TheresultssummarisedinTab. IIraisethequestionofhowmuchlargerthantheexactminimumoneexpectsthemostlikely
numberof fathersto be. The answer dependsuponthe populationmodel, and upon the parametersdescribing it, such as the
mutation rate θ, the number L of loci, and the numbern of progenyin the data. This question is addressed in the following
section.
B. Thedifferencebetweentheminimumandthemostlikelynumberofsires
InthissectionweconsiderapopulationinHardy-Weinbergequilibrium,weassumethatalllociaresubjecttoneutralevolu-
tion,andthatalllociareinpairwiselinkageequilibrium. Weposethequestion: howmuchlargerthantheminimumnumberis
themostlikelynumberofsiresinabroodofagivenmother?Tothisendwegeneratedsamplesusingthecoalescentasdescribed
inSec. II.B.
Figs. 4and5summariseourresults. First, Fig.4showstheprobabilitythatthenumberoffathersis equalto theminimum
numberoffathers,p ,asafunctionofthenumbern ofprogenyperfather,forthreevaluesofthemutationrateθ;θ = 1,
same p/f
θ = 10,andθ = 100. Whenθ islarge,weobserveasharptransitionwherep increasesfromalmostzerotoalmostunity.
same
Whenθissmall,however,samplingmanyoffspringdoesnotresultinasignficiantincreaseofp becausethefathersaretoo
same
5
geneticallysimilar. Forintermediatevaluesofθ, p doesincreasewiththenumberofprogenyperfather,butneverreaches
same
unitybecausethefathersarestillsignificantlygeneticallycorrelated.
Second,Fig.5showshowp changesasafunctionofθforfivedifferentvaluesofthenumberoflociL. WhenL=1,the
same
minimumnumberoffathersisalwayssmallerthanthetruenumberoffathers(unlessthereisonlyonefather),thereforep
same
equalszero. ForL≥2wefindthatp increasesasafunctionofbothθandL. Theextenttowhichtheminimumnumberof
same
fathersagreeswiththetruenumberoffathersdependsontheprobabilitythatthefathersareallgeneticallydistinct. WhenLis
increasing,thishappensforsmallervaluesofθ,asindicatedbytheincreasinglysharptransitionsfromp =0top =1in
same same
thefigure.
Thus,whenthemutationrate,numberofloci,andnumberofoffspringperfatheraresufficientlyhighsothatp ≈ 1,the
same
minimumnumberoffathersalmostalwaysequalsthetruenumberoffathers. Asaconsequence,theresultingnumberofsires
contributingtoagivenfamilydoesnotdependonwhetherthepopulationisstructuredorpanmitic. Inotherwords,theresultis
insensitivetoassumptionsabouttheunderlyingpopulationstructure.
IV. DISCUSSION
Weconcludewithadiscussionof,first,howtheminimumnumberoffathersisinfluencedbygenotypingerrors. Second,we
addressthefollowingquestion. Iftheaimistoincreasetheprobabilityofdeducingthetruenumberoffathersfromempirical
datasuchasTab.I,isitbetteroftosamplemoreoffspringormoreloci?
Microsatellites can be prone to genotyping errors (see reviews by DeWoodyetal., 2006; HoffmanandAmos, 2005;
SelkoeandToonen, 2006). Sourcesof error when genotypingmicrosatellites include stuttering (appearanceof PCR products
oneormorerepeatsshorterthananactualallele),alleledropout(non-amplificationofoneofthetwoallelesinaheterozygote,
usuallyalongerone),non-specificPCR-productsduetoannealingofprimerstomultiplesites, nullalleles(alleleswithamu-
tationintheprimerregion,whichpreventtheiramplificationinPCR)and,finally,mistypingandothermistakesduringmanual
scoringoftheresults.
Changestothenumberofrepeats(PCRstutteringevents)maycausetheminimumnumberoffatherstoappearlargerthanit
actuallyis. PCRstutteringeventsinamicrosatellitelocusareusuallyincremental(singleadditionsordeletionsofarepeatunit).
Hence,whenthesamplesizeislarge,itislikelythattheresultingallelesarepresentinotherfathers(theeffectofPCRstuttering
issimilartomutationsoccurringduringmeiosis). Therefore,whenthefrequencyofstutteringeventsissmall,theeffectonthe
minimumnumberoffathersisexpectedtobesmall.
Whenonlyasinglealleleisamplified(e.g. becauseofalleledropoutornullalleles),theminimumnumberoffathersofthe
samplemayin-ordecrease. Ifthefrequencyofsucherrorsissmall,however,theeffectisexpectedtobesmall: assumingthat
theincidenceoftheseerrorsareindependentacrossdifferentloci,thelikelihoodthattheerrorschangetheminimumnumberof
fathersdecreasesrapidlywithincreasingnumberofloci.
Wenowturntothequestionofhowtobestincreasetheaccuracyofestimatingthetruenumberoffathers. Whenthenumber
ofoffspringislarge,asinthemarinesnails,therearetwopossibilitiesforincreasingtheprobabilitythatwecandeducethetrue
numberof fathers. One mayeither samplethe same lociin moreoffspring,orone maysample morelociin theoffspringwe
alreadyhave. Whichisthebetteroption? Ourresultsshowthatincreasingthenumberoflocigenerallyprovidesthequickest
wayofincreasingtheprobabilityoffindingthetruenumberoffathersfromtheminimumnumberoffathers,butwhetherthisis
feasibleornotdependsontheavailabilityandcostofadditionalhigh-qualitymarkers(seeFigs.4and5). Itmaybelesscostly
to sample moreindividualswith fewerloci, if possible. In this case, however,the accuracymay be limited by the numberof
offspringavailablebutalsothegeneticvariationintheloci.Iffewlociaresampled,andthemutationrateislow,ourresultsshow
thatitmaynotbepossibletoincreasetheaccuracybysamplingmoreindividualsbeyondacertainlimit,whichisdeterminedby
theprobabilitythatthefatherssharethesamealleles.
InordertorelatethetheoreticaldiscussionofSec.3.2totheempiricaldata(Tabs.IandII),wehaveestimatedtheparameterθ
fromthedatainMa¨kinenetal.(2007)usingtwostandardestimators,θˆ =h(x −y )2i(Wehrhahn,1975)andθˆ =(F−2−1)/2
v i j F
(OhtaandKimura, 1973). Here x and y are alleles of progeny i and j from mothers x and y, respectively, and F is the
i j
homozygosity(theprobabilitythatx =y ).Itisknownthatθˆ isunbiased[buthasalargevariance(ZhivotovskyandFeldman,
i j v
1995)],whereasθˆ isbiasedforlargevaluesofθ, buthasasmallervariance. ForthedatainTab.IIweobtainθˆ = 124and
F v
θˆ = 49whenaveragedoverallfiveloci. While theseestimatesareuncertainbecauseofthesmallsamplesize, wesee from
F
Figs.4and5thatthenumberofprogenysampledin(Ma¨kinenetal.,2007)(n = 21)isprobablytoolowtoreliablyestimate
p
the true numberof fathers (the estimated n is the range 2...4 for these data). Increasingthe number of loci is not likely
p/f
tohelpverymuch. Theprogenysampledwerechosenfromlargefamilies(of70to100progeny). Ourresultsindicatethatin
thiscase,thebeststrategyforincreasingtheaccuracyofthenumberoffathersistosamplestillmoreprogeny.Indeed,analysis
ofthefullsetofprogenythisdatasetindicatesthatthetruenumberoffathersinthesefamiliesissignificantlyhigherthanthe
minimumnumberoffathersreportedinTab.II(Bostro¨metal. ,2009,inpreparation).
Acknowledgments. We thank J. Bostro¨m, T. Hofving, T. Areskoug and T. Ma¨kinen for providing data, and the Swedish
ResearchCouncil,theLinneusinitiativeAdaptationtoChangingMarineEnvironments(ACME)andtheCentreforTheoretical
6
BiologyatGothenburgUniversityforsupport.
References
AmavetP.,RossoE.,MarkarianiR.,etal.,MicrosatelliteDNAMarkersAppliedtoDetectionofMultiplePaternityinCaimanlatirostrisin
SantaFe,Argentina.J.Exp.Zool.309A:637-642,2008.
Bostro¨mJ.,HofvingT.,Andre´C.,AreskougT.,ErikssonA.,JohannessonK.,MehligB.,Ma¨kinenT.,andPanovaM.,unpublished,2009.
BlythJ.E.andGilburnA.S.,Extremepromiscuityinamatingsystemdominatedbysexualconflict.J.Ins.Behav.,19:447-455,2006.
DeWoodyJ.,NasonJ.D.andHipkins.V.D.,Mitigatingscoringerrorsinmicrosatellitedatafromwildpopulations.MolecularEcologyNotes,
6(4):951–957,2006.
HoffmanJ.I.andAmosW.,Microsatellitegenotypingerrors:detectionapproaches,commonsourcesandconsequencesforpaternalexclusion.
MolecularEcology,14(2):599–612,2005.
JonesA.G.,GERUD1.0: acomputerprogramforthereconstructionofparentalgenotypesfromprogenyarraysusingmultilocusDNAdata.
MolecularEcologyNotes,1:215-218,2001.
Jones A. G., GERUD 2.0: a computer program for the reconstruction of parental genotypes from half-sib progeny arrays with known or
unknownparentage.MolecularEcologyNotes,5:207-711,2005.
KimuraM.andOhtaT.,Distributionofallelicfrequenciesinafinitepopulationunderstepwiseproductionofneutralalleles.PNAS,72:2761–
2764,1975.
KimuraM.andOhtaT.,Stepwisemutationmodelanddistributionofallelicfrequenciesinafinitepopulation.PNAS,75:2868–2872,1978.
Ma¨kinenT.,PanovaM.,Andre´C.,HighLevelsofMultiplePaternityinLittorinasaxatilis:HedgingtheBets?J.Hered.,98:705-711,2007.
OhtaT.andKimuraM.,Amodelofmutationappropriatetoestimatethenumberofelectrophoreticallydetectableallelesinafinitepopulation.
GenetRes,22(2):201–204,1973.
PressW.H.,TeukolskyS.A.,VetterlingW.T.,andFlanneryB.P.,NumericalRecipes,CambridgeUniversityPress1986.
PortnoyD.S.,PiercyA.N.,MusickJ.A.,etal.,Geneticpolyandryandsexualconflictinthesandbarshark,Carcharhinusplumbeus,inthe
westernNorthAtlanticandGulfofMexico.Mol.Evol.,16:187-197,2007.
ReidD.G.,SystematicsandevolutionofLittorina.RaySociety,London,1996.
RispoliV.F.andWilsonA.B.,Sexualsizedimorphismpredictsthefrequencyofmultiplematinginthesex-rolereversedpipefishSyngnathus
typhle.J.Evol.Biol.,21:30-38,2008.
SaurM.,MatediscriminationinLittorinalittorea(L.)andLittorinasaxatilis(Olivi)(Mollusca:Prosobranchia).Hydrobiologia193:261-270,
1990.
SefcK.M.,MattirsdorferK.,SturmbauerCh.,andKoblmu¨llerS.,Highfrequencyofmultiplepaternityinbroodsofasocialymonogamous
cichlidfishwithbiparentalnestdefence.Mol.Ecol.,17:2531,2008.
SelkoeK.A.andToonenR.J.,Microsatellitesforecologists:apracticalguidetousingandevaluatingmicrosatellitemarkers.EcologyLetters,
9(5):615–629,2006.
SimmonsL.W.,BeveridgeM.,andEvansJ.P.,MolecularEvidenceforMultiplePaternityinaFeralPopulationofGreenSwordtails.J.Hered.,
99:610-615,2008.
SongS.D.,DrewR.A.I.,andHughesJ.M.,Multiplepaternityinanaturalpopulationofawildtobaccofly,Bactroceracacuminata(Diptera
:Tephritidae),assessedbymicrosatelliteDNA.Mol.Ecol.,16:2353-2361,2007.
TakagiM.,SakaiK.,andTaniguchiN.,DirectevidenceofmultiplepaternitiesinnaturalpopulationofviviparousJapanesesurfperchbyallelic
markersofmicrosatelliteDNAloci.Fish.Science,74:976-982,2008.
VanDoornikDM,ParkerSJ,MillardSR,etal.,MultiplepaternityisprevalentinPacificoceanperch(Sebastesalutus)offtheOregoncoast,
andiscorrelatedwithfemalesizeandage.Env.Biol.Fish.,83:269-275,2008.
WangJ.,Sibshipreconstructionfromgeneticdatawithtypingerrors.Genetics,166:1963-1979,2004.
Wattanachaiyingcharoen W.,OldroydB.P.,WongsiriS.,PalmerK.,PaarJ.,AscientificnoteonthematingfrequencyofApisdorsata.Api-
dologie,34:85-86,2003.
WehrhahnC.F.,Theevolutionofselectivelysimilarelectrophoreticallydetectableallelesinfinitenaturalpopulations.Genetics,80(2):375–
394,1975.
ZhivotovskyL.A.andFeldmanM.W.,Microsatellitevariabilityandgeneticdistances.PNAS,92(25):11549–11552,1995.
7
Figures
8
functionMinFathers(F,S,nˆ)returnintegeris
begin
ifS isemptyand|F|<nˆthen
nˆ :=|F|
PrintfathersinF
elseif|F|<nˆ then
pickaprogenyp∈S
foreachvariantqofploop
foreachvalidfatherf′ ∈{Merge(f,q):f ∈F}loop
S′ :={s∈S\{p}: scannotbeinheritedfromf}
nˆ :=min(nˆ,MinFathers(S′,(F\{f})∪{f′},nˆ))
endloop
S′ :={s∈S\{p}: scannotbeinheritedfromq}
nˆ :=min(nˆ,MinFathers(S′,(F ∪{q},nˆ))
endloop
endif
returnnˆ
endMinFathers
FIG.1 Algorithmfor finding the minimum number of fathers. S istheset of progeny, F isthe set of fathers (F = ∅initially), nˆ isthe
minimumnumber offathersforthewholesetofprogeny foundsofar(⌈|S|/2⌉initially). Seethetextforanexplanationofthealgorithm.
Underwhichcircumstancesaloopovervariantsqofpisnecessaryisexplainedinthetext.
9
F={}
S={1,2,3,4}
F={{1}}
S={2,3,4}
F={{1,2}} F={{1},{2}}
S={3,4} S={3,4}
F={{1,2,3}} F={{1,2},{3} F={{1,3},{2}} F={{1},{2,3}} F={{1},{2},{3}}
S={4} S={4} S={4} S={4} S={4}
F={{1,2,3,4}} F={{1,2,4},{3}} F={{1,3,4},{2}} F={{1,4},{2,3}} F={{1,4},{2},{3}}
S={} S={} S={} S={} S={}
F={{1,2,3},{4}} F={{1,2},{3,4}} F={{1,3},{2,4}} F={{1},{2,3,4}} F={{1},{2,4},{3}}
S={} S={} S={} S={} S={}
F={{1,2},{3},{4}} F={{1,3},{2},{4}} F={{1},{2,3},{4}} F={{1},{2},{3,4}}
S={} S={} S={} S={}
F={{1},{2},{3},{4}}
S={}
FIG.2 ShowsasearchtreeforfourprogenyconstructedbythealgorithmdescribedinSec. II. Eachprogenyhastwoloci,andateachlocus
the allelecorresponding to themother has been subtracted, sothat each progeny isdescribed by a listof twoalleles(one for each locus):
p1 = (a,b),p2 = (c,d),p3 = (c,e),andp3 = (a,d). Itisassumedthata,b,c,anddarefourdifferentallelictypes. Thefigureillustrates
howlargepartsofthesearchtreecanbecutaway(yellow)becausetheyneednotbevisited. Thismayconsiderablyspeedupthealgorithm.
Inthefigure,eachboxrepresentsthestateofthealgorithmateachiteration;F isthesetoffathers(eachfatherisrepresentedbythelistof
offspringassignedtoit),andSisthesetofoffspringthealgorithmhasyettobeassignedtoafather.Thestatesarevisitedfromtoptobottom,
and fromleft toright, followingthe linesthat emanate from thebottom of each node, except theterminal nodes (shown assquares). The
algorithmstopswhenthesetSisempty.Asetofindividualsthatcannotinheritfromasinglefatherisshowninbolditalicfont(i.e.thefather
isinvalid). Thestateswhicharecolouredyellowarenevervisited,eitherbecausetheydescendfromastatewithaninvalidfather,orbecause
itcanbeseenthatthestatedoesnotleadtoasolutionwithfewerfathersthanthebeststatefoundsofarinthesearch.
10
−500
d
o
o −505
h
keli −510
li
g −515
o
L
−520
2 4 6 8 10
No. iterations[×105]
20
18
s
r
e
h
at16
f
o.
N
14
12
0 2 4 6 8 10
No. iterations[×105]
FIG.3 ShowsarunofCOLONY(Wang,2004)forthedatashowninTab.I.Top:Thetimeevolutionoftheloglikelihoodofthedata.Bottom:
Themostlikelynumberofsiresasafunctionofthenumberofiterations(dots). Alsoshownistheexactlowerboundprovidedbythenew
algorithm,MinFathers(dashedline).