Table Of Contentresearch papers
J Dummy-atom modelling of stacked and helical
IUCr
nanostructures from solution scattering data
ISSN2052-2525
BIOLOGYjMEDICINE
Max Burian and Heinz Amenitsch*
InstituteofInorganicChemistry,GrazUniversityofTechnology,Stremayrgasse9/V,Graz8010,Austria.*Correspondence
e-mail:[email protected]
Received6February2018
Accepted9April2018 Theavailabilityofdummy-atommodellingprogramstodeterminetheshapeof
monodisperse globular particles from small-angle solution scattering data has
led to outstanding scientific advances. However, there is no equivalent
EditedbyJ.Trewhella,UniversityofSydney, procedure that allows modelling of stacked, seemingly endless structures, such
Australia as helical systems. This work presents a bead-modelling algorithm that
reconstructsthestructuralmotif ofhelical androd-likesystems.Thealgorithm
Keywords:SAXS;stackedstructures;helical
isbasedona‘projectionscheme’:byexploitingtherecurrentnatureofstacked
structures;shaperetrieval;SasHel;structure
systems, such as helices, the full structure is reduced to a single building-block
determination;solutionscattering;
computationalmodelling;structuralbiology; motif.Thisbuildingblockisfittedbyallowingrandomdummy-atommovements
nanoscience. without an underlying grid.The proposed method isverified using avariety of
analytical models, and examples are presented of successful shape reconstruc-
Supportinginformation:thisarticlehas tion from experimental data sets. To make the algorithm available to the
supportinginformationatwww.iucrj.org
scientific community, it is implemented in a graphical computer program that
encourages user interaction during the fitting process and also includes an
option for shape reconstruction ofglobular particles.
1. Introduction
Small-angle X-ray scattering (SAXS) is an established tech-
nique to study the structural aspects, such as the size and
shape,ofmolecularsystemsinsolution(Lietal.,2016).Asthis
structural information is not directly apparent from the
recorded scattering intensity, one requires a fitting process
that generally relies on an underlying mathematical model
(Pedersen,1997).Whileavarietyofsuchanalyticalmodelsare
available in the literature (Pedersen, 1997), each of them is
boundtoagivenshape.Thefittingprocessofanexperimental
datasetthereforerequirespreviousknowledgeofthesample
suchthattheappropriatemodelcanbechosen.Dummy-atom
(DA) modelling, which describes the particle shape as a
variable bead assembly, bypasses this issue as the fitting
processisnolongerconstrainedtoasinglegeometry(Chaco´n
etal.,1998;Waltheretal.,2000;Svergunetal.,2001;Franke&
Svergun, 2009; Koutsioubas et al., 2016). Consequently, no a
priori knowledge regarding the solute’s shape is necessary.
This advancement has helped to make SAXS an attractive
technique to characterize stable molecules in solution, far
beyond the community of scatteringexperts (Yang,2014).
However,singlemoleculesinsolutionarenotalwaysstable.
In fact, it is in their nature to interact and aggregate, often
resulting in highly organized hierarchical systems stretching
over several orders of magnitude (Palmer & Stupp, 2008;
Wasielewski, 2009; Busseron et al., 2013; Praetorius & Dietz,
2017). Helical and chiral superstructures are a common yet
spectacularclassofsuchsystems,e.g.thelengthofthehuman
genome DNA double helix exceeds several centimetres
(Venter et al., 2001) while the forces that give rise to the
390
https://doi.org/10.1107/S2052252518005493 IUCrJ (2018).5,390–401
research papers
Figure1
Theprojectionscheme,visualizedusingtheexampleofasingle-strandedhelix.Forsuchseeminglyendlessgeometries,thereisabuildingblock(upper
left-handcorner)thatisrecurrentalongthezaxis(seefullhelixontheright).ForafullbodyconsistingofMstackedbuildingblocks,onefindsdistinct
structuralmotifs,suchasthebuildingblockitself,neighbouringbuildingblocks,single-spacedbuildingblocksandsoonupto(M(cid:3)1)-spacedbuilding
blocks.Thescatteringintensityofthefullgeometrycanhencebecalculatedbysummingthecontributionsofthesestructuralmotifs,scaledbytheir
recurrence.ThisbypassesthedemandforanactualDAmodelrepresentingthefullstructure,aswesolelyevaluateprojectionsofthebuildingblock
insteadofactualstackedduplicates,hencethetermprojectionscheme.
characteristic helical motif act on the molecular (nanometre) seemingly infinite nature of e.g. helical systems onto a single
level (Dobbs, 2007). Amyloid fibrillation, i.e. the aggregation building-blockunit,representedbydummyatoms(DA).This
ofproteinsintoinsoluble,oftenhelical,fibrils,hasbeenlinked building block is altered by random DA movements while
to critical diseases such as type 2 diabetes or Alzheimer’s simultaneously fitting the corresponding scattering curve
(Dobson,2003;vonBergenetal.,2000)aswellastounwanted againsttheexperimentalone.Theproposedmethodisverified
drug degradation (Morozova-Roche & Malisauskas, 2007; using simulated data sets of various one-dimensional struc-
Vestergaard et al., 2007). Consequently, a comprehensive turesandwesubsequentlyapplyittoexperimentallyobtained
understanding of these systems requires structural character- SAXSdata.Thefullalgorithm isimplementedinacomputer
ization at the (supra-)molecular scale. Small-angle X-ray program(SasHel),whichalsoincludesanoption forglobular
diffraction of aligned fibres [e.g. the famous studies of the geometries. The reduction of the system’s complexity by
structureofDNA(Wilkinsetal.,1953;Watson&Crick,1953)] symmetricprojectionandthefastcodeimplementationresult
is a powerful technique for this purpose. However, the in a toolkit that allows a full shape retrieval from scattering
evaluationofsolutionscatteringdatafromrandomlyoriented dataintheorderof3–90minonastandardworkstation(see
helicalstructuresfacestwochallenges:(i)onlyafewanalytical Appendix A),depending on the level ofdetail.
models are available from which structural information, such
aspitchandtwist,canberetrieved(Schmidt,1970;Pringle&
Schmidt,1971;Hamley,2008;Teixeiraetal.,2010);and(ii)the
2. Projection scheme
endless nature of helices does not allow DA modelling using
current programs (Volkov & Svergun, 2003; Gingras et al., Dummy-atom modelling is based on the idea that a given
2008). particleshapeisrepresentedbyanassemblyofNsmallbeads
In this work, we present a bead-modelling algorithm to ofequalscatteringlengthdensity,calleddummyatoms(DA).
determine the structural motif of monodisperse systems, AccordingtotheDebyeformula(Debye,1915),thescattering
highly elongated in one dimension, in solution from SAXS intensityofthisassemblycanthenbecalculatedfromthebead
data. Weuse symmetrical boundary conditions to project the assembly as
391
IUCrJ (2018).5,390–401 BurianandAmenitsch (cid:2) Dummy-atommodelling
research papers
(cid:2) (cid:3)
IðqÞ¼f ðqÞ2XN XN sin qrij ; ð1Þ Fig.1(NBB=400,M=15),thisincreasesthecalculationspeed
DA qr approximately seven fold.
i¼1 j¼1 ij Thenotionofreducingagivenmodeltoitsstructuralmotifs
where the norm of the scattering vector is denoted as q = raisesacriticalpointregardingtherequirednumberofstacks
4(cid:2)sin((cid:3))/(cid:4) (2(cid:3) is the scattering angle and (cid:4) is the radiation M necessary to represent a seemingly endless structure. This
wavelength)andf (q)representstheDAformfactor(given number defines the length of the entire projected model L,
DA
inthisworkbyaGaussiansphereapproximation;Koutsioubas simplyviaL=MH .Accordingtoscatteringtheory,rod-like
BB
& Pe´rez, 2013; Svergun et al., 1995; Grudinin et al., 2017; structurespresentacharacteristicq(cid:3)1power-lawbehaviourin
Fraseretal.,1978).ADAdiameterof0.2nmwasusedforall the intermediate Porod regime (Glatter & Kratky, 1982),
reconstructions. As this formalism considers the distances whereas the transition from this Porod regime (q(cid:3)1) to the
betweenallpossibleDApairsr =|r (cid:3)r|insidetheassembly, adjacentlower-angleGuinierregime(q(cid:3)0)occursataspecific
ij j i
its mathematical complexity is OðN2Þ. Hence, the computa- scatteringvectorq dependingonthelengthoftherod,which
1
tional effort increases drastically for large N. Adequate canbeestimatedusingq =181/2/Linthesimplifiedframework
1
modelling of seemingly infinite rod-like geometries requires of the Hammouda model (Hammouda, 2010). Thus, if the
very large numbers of DAs (N > 10000), making the Debye experimental data present a q(cid:3)1power law at even the smal-
formula appearinadequate in its standardnotation. lestaccessiblescatteringangleq ,thelengthofthestructure
min
Rod-likestructures,such ashelices,often possess acertain must be larger than L > 181/2/q . Similarly, we use this
min
structural motif, a building block, which recurs along the relationtodeterminetheminimumnumberofstacksrequired,
elongation direction. Hence, only this building block is of according to M > 2 + 181/2/(H q ) (with the additional
BB min
interest for shape reconstruction. We define a building block term of+ 2to avoid truncationeffects).
ofN dummyatomswithitselongationdirectionparallelto
BB
the z axis (see Fig. 1). The full rod-like structure is then
3. Determination of the stacking distance
reproduced by duplicating the building block M times along
thezdirection,withastacking distancecorresponding tothe The projection scheme is a faster alternative for calculating
buildingblock’sheightH (seeFig.1).Theevaluationofthis the scattering intensity of a stacked structure compared with
BB
stackedmodelbymeansoftheDebyeformula[equation(1)] the standard Debye formula. As is apparent from equation
now includes redundant terms, as distinct motifs inside the (2),theformalismrequiresanadditionalinputparameter,the
structurearenowrecurrent.Forinstance,thesinglebuilding- stacking distance between the building blocks H . In the
BB
block motif would be evaluated at each repetition along z, following, we present a pathway to determine this using the
such that the same computation is performed a total of M pairdistance distribution function (PDDF).
times. It is therefore sufficient to calculate the scattering ThePDDFcorrespondstothesurface-weightedprobability
intensity of the building block only once and scale it by the to find two points separated by a distance r inside a particle
numberofstacks.Thesameprincipleappliestointer-building- (Glatter&Kratky,1982).Itisthusahistogramofalldistances
blockcontributions.Forexample,asthemotifofneighbouring that appear inside a given particle, weighted by the corre-
building blocks can be found (M (cid:3) 1) times in the full struc- spondingelectrondensities.InthecaseofDAassemblies,the
ture,wecanevaluatethismotifonceand,again,multiplyitby PDDF can be also interpreted as the volume-weighted pair
the corresponding scalar. correlation function p(r).
Inmathematicalterms,theDebyeformulacanbeadjusted An analytically available example of this definition yields
toneglectthese redundancies,resulting in thecase ofstacked spheresusingtheformalismbasedonthe
workofGlatter(1980),fromwhichaseriesofconclusionscan
IðqÞ¼fDAðqÞ2(MXNBB XNBB sinq(cid:2)(cid:4)(cid:4)qr(cid:4)(cid:4)r(cid:3)j(cid:3)rr(cid:4)(cid:4)i(cid:4)(cid:4)(cid:3) bsyestdermawonf(tseene Fstiagc.kSe1dinsphtheeressu).ppEovritdinegntliyn,foarsmathtieonsyfsotrema
i¼1 j¼1 j i
þ2MX(cid:3)1ðM(cid:3)kÞXNBB XNBB sin(cid:5)(cid:4)q(cid:2)(cid:4)(cid:4)(cid:2)rjþk(cid:4)HBBe(cid:3)z(cid:3)(cid:3)r(cid:4)i(cid:4)(cid:4)(cid:6)); cpoenaktasi.nTshteefinrsstppheeareks(bthlueectorarrceesipnoFnidgi.nSg1)PrDelDatFesptroetsheentmseteann
q(cid:4) r þk(cid:4)H e (cid:3)r(cid:4) shape of all the spheres involved – it is hence an intra-
k¼1 i¼1 j¼1 j BB z i
building-blockPDDF.Thefollowingninepeaks(redtracesin
ð2Þ
Fig. S1) are caused by the repetitive nature of the system, as
wheree denotestheunitvectoralongthezaxis(elongation/ for each possible sphere-to-sphere distance a new peak is
z
stacking direction). This formalismreducesthecalculationof found – they are hence inter-building-block PDDFs. These
the scattering intensity to the sum of structural motifs found inter-building-block PDDFs, in particular the distances
inside the stacked model (see scheme in Fig. 1). It further between them, therefore hold information on the stacking
bypasses the demand for a DA model to represent the full distance between the building blocks.
structure,asweevaluateonlyprojectionsofthebuildingblock In order to determine the stacking distance of a structure
instead of all stacked duplicates, hence the term ‘projection fromagivenPDDF,itwouldbeintuitivetomeasurethepeak
scheme’. Its benefit, compared with the standard Debye distances. However, the exact peak shapes and therefore
formula, is a reduction in the mathematical complexity from positionsaredistorteddueto(i)thelinearhigh-rdecayinthe
O½ðMN Þ2(cid:5)toOðMN2 Þ.Inthecaseoftheexampleshownin PDDF(seedashedlinesinFigs.S1andS2)and(ii)theoverlap
BB BB
392
BurianandAmenitsch (cid:2) Dummy-atommodelling IUCrJ (2018).5,390–401
research papers
of neighbouring peak contributions (Glatter, 1980; Feigin & needs to be minimized, where (cid:6) (q ) denotes the experi-
exp m
Svergun,1987).Adirectmeasurementofthepeakpositionsin mental error and N the number ofdata points.
exp
the PDDF might therefore result in misdetermination of the In DA modelling, the principle idea behind this minimiza-
stackingdistance.Astableapproachtocircumventthisissueis tionprocedureisstraightforward:agiveninitialconfiguration
to calculate the derivative of the PDDF (dPDDF). This is gradually altered until the best agreement between the
numerically fast and easy operation suppresses the above- corresponding scattering curves is found. Here, we randomly
mentioned decay distortion (see dPDDF in Fig. S1). The fillaninitialbuilding-blockvolumewithDAsthataresetfree
resulting dPDDF can then be fitted by e.g. a damped sinu- at the start of the fitting procedure (500–1500 DAs per
soidal function in which the period is directly related to the building block are suggested, depending on the experimental
mean stacking distance (see Fig. S1 and Section S1 in the resolution – see Section S2). In order to optimize the target
supporting information). function (cid:5)2, various metaheuristic methods exist such as
Amorecomplexexampleillustratingthenameddistortion SimulatedAnnealing(Kirkpatricketal.,1983)ortheGenetic
effects is the case of a torus (a ring with a circular cross Algorithm (Mitchell, 1996). In our case, we employ a meta-
section; Kawaguchi, 2001). Such a torus presents two char- heuristic fitting procedure (Boussaı¨d et al., 2013; Gogna &
acteristic dimensions: the ring–centre diameter and the ring Tayal, 2013) with an antifragile implementation (Taleb &
thickness.Asaresult,thePDDFofsuchasingletorusexhibits Douady, 2013) and the algorithm improves (cid:5)2 by randomly
twodistinctpeakscorrelatingtothesestructuralfeatures(see moving single DAs. In contrast with current DA modelling
Fig.S2;themodelscatteringdata,includinganartificialerror programs,theDAsdonotmoveonanunderlyinggrid.Asthe
band,werecalculatedaccordingtoAppendixA).Considering magnitudeofthemovementisscaledbyatemperaturefactor,
now the case of multiple stacked rings (Kornmueller et al., we force the system to freeze eventually in a given condition
2015),foreachringthatisaddedontopoftheother(s)wefind andthe algorithm toconverge.
anewpeakinthecorrespondingPDDF(seeFigs.S2andS3). DA modelling faces the general problem of uniqueness: as
However, in this case, the radial size of the tori (diameter modelsconsistof>103DAs,theinformationcontentgivenby
20nm) was selected to be larger than the stacking distance the scattering data is highly overdetermined by a given
between them (7nm), resulting in an increased distortion of configuration.Fittingofthescatteringdatawithoutrestraints
the inter-building-block peak positions [see distortion canthereforeleadtophysicallyunfeasibleresults,inparticular
phenomenon(ii)above,aswellasFig.S2].AsshowninFig.S2, with regard to the model homogeneity and compactness.
the determination of the stacking distance by fitting of the Currentalgorithmscontrolandoptimizethesetwoproperties
dPDDF yields astable result. during the fitting process by means of a ‘looseness penalty’
Weemploythisapproachtodeterminethestackingdistance regularization term as a quantitative measure of the DAs’
from all repetitive structures presented in this work. The local vicinity: by counting and maximizing the number of
advantage of this choice is that we avoid the use of a contacting neighbours of each DA, a compact and homo-
complementary characterization technique, as the underlying geneous configuration is achieved (Svergun, 1999; Franke &
PDDFcanbedeterminedfromagivendatasetusingavariety Svergun,2009;Koutsioubasetal.,2016).Weadaptthisproven
ofavailablesoftwarepackages.Amoredetaileddiscussionof technique to our grid-free algorithm in two ways. First, we
thelimitationsandpossiblecausesofmisdeterminationcanbe introducetheparameter d ,denotingthedistancebetween
N12
found in Sections S1.2and S1.3 and Figs.S1–S6. agivenDAandits12(close-packedlimit)nearestneighbours.
Similar to the looseness penalty, d quantifies the local
N12
vicinityofeachDA,actingasahomogeneityclassifierforthe
algorithmtodecideifagivenDApositionisacceptedornot.
4. Fitting algorithm
Second, we apply the idea of a regularization term and
According to the presented projection scheme, we reduce a
introduce a ‘radial compactness’ parameter RC(X) into the
seeminglyinfinitestructuretoabuilding-blockmotiftogether
minimizationprocedure,keepingtheDAclosetothe(radial)
with two boundary conditions,the stacking distance H and
BB centre ofmass of the configuration.
the number of necessary stacks M. As these two values are
As the choice of a grid-free DA algorithm is a departure
determined directly from either the PDDF or the scattering
from current implementations, we face challenges for which
data, the modelling occurs within the single building-block
no readily available solutions exist. In particular, we must
volume. We thus represent the building-block motif by a
introduce new concepts, (i) to obtain a hard-contact limit for
dummy-atom configuration X such that the scattering inten-
neighbouring DAs, (ii) to allow model scalability over
sity I (q ) (where q denotes the q value of the measured
calc m m different length scales and (iii) to generate a random move-
data points) can be calculated using equation (2). To find a
mentdependingonthecurrentannealingtemperature.Inthe
configuration that best fits the experimental scattering inten-
following subsections we discuss these topics in more detail,
sity I (q ), the relation
exp m startingwiththeintroductionofthescalingparameterD and
X
the random-movement generator, and then moving to the
1 XNexp"I ðq Þ(cid:3)I ðq Þ#2 mathematical definitions of d and RC(X) regarding
(cid:5)2 ¼ exp m calc m ; ð3Þ N12
N (cid:6) ðq Þ homogeneity and compactness. A general overview of these
exp m exp m
conceptsandtheirparametersisgiveninTable1.Attheend
393
IUCrJ (2018).5,390–401 BurianandAmenitsch (cid:2) Dummy-atommodelling
research papers
Table1
Main concepts of the proposed grid-free fitting algorithm, the corresponding mathematical parameters, their function and their implementation
comparedwithcurrentsolutionsinfixed-gridDAmodellingprograms.
Fordetailsofcurrentfixed-gridmethods,seeSvergun(1999),Franke&Svergun(2009)andKoutsioubas&Pe´rez(2013).
Concept Fixed-gridmethods Parameter Function Implementation
Scalability Gridsize D Scalingparameter d ,RC(X),randommovement
X N12
generator
Homogeneity Fixedgrid/loosenesspenalty d AvoidDAclustering/disconnected Movementclassifier/forcedrecon-
N12
DAs figuration
Compactness Loosenesspenalty/limitedsearch RC(X)/D KeepDAscloseto(radial)centre Regularizationterm/forcedrecon-
X,crit
volume ofmass/avoidmodelexplosion figuration
of this section, we further address the topic of model Alongthezaxisweintroduceacontinuitycondition(seethe
uniqueness and repeatability and give a conclusive overview limitedbuilding-blockheightinFig.1)suchthatDAsleaving
ofthe algorithm’simplementationin the programSasHel. the building block in the vertical direction are re-projected
back inside the building-blockvolume.
4.1. Scalability – scaling parameter D
X
Allowing DAs to move randomly without an underlying 4.2. Random-movement generator
grid brings an advantage in terms of the model volume and
As we pursue the concept of grid-free random DA move-
therefore the radial size. In a fixed grid system, the final DA
mentstofindanoptimalconfiguration,werequirearandom-
densityispredefinedbythegridresolution,whichisrelatedto
movement generator that depends on a series of parameters,
theresolutionofthescatteringpatterninreciprocalspace.In
including (i) the dimensions of the current DA configuration
the case of reconstruction of a highly elongated structure,
(H and D ), (ii) the current temperature T of the fitting
BB X
radial expansion of the DA configuration during the fitting
process(1>T>0)and(iii)thehelicalchiralityofthesample
process is only possible by the addition of new DAs, hence
(helicalfieldbias(cid:7)).Inevery numericaliterationthroughout
increasing the numerical overhead (N ’ D2 for radial
DA X thefittingprocess,achosenDAismovedalongtheunitaxes
growth).Radialcompression,ontheotherhand,islimitedby
e , e and e by the random vector r (T), where
x y z rand
the grid resolution, such that at a certain point the grid type
(e.g. hexagonal packing of the DAs) may impose artificial 0 h (cid:7) (cid:8)i 1
D T Rand þ(cid:7)Tcos 2(cid:2)zi e
shtotirgubhceltyumeroalorlenfgecaoattaeurdrseesstt.rhuTachntuistrheiess,ewpxaphreetrriciemutlehanerltygarlitldrimureeitsfoaolsruattihoreenasicsoanlsiekabeollyef rrandðTÞ¼BB@DXXThRandxyþ(cid:7)Tsin(cid:7)H2H(cid:2)BBzBBi(cid:8)ieyxCCA: ð5Þ
(cid:8)H TRand e
BB z z
computationaloverhead(N <10000)mustbemaintained.
DA
Our grid-free approach does not present such limitations: as
Here, a random number generator Rand returns any value
thenumberofDAsperbuildingblock(mainlydeterminedby
between(cid:3)1<Rand <1everytimeitiscalled.Inorderto
x,y,z
theexperimentalresolution)iskeptconstant,thealgorithmon
adjust the random movement by the size of the current
its own adapts to a change in DA density while at the same
configuration, the radial and longitudal contributions are
time maintaining the relative model resolution and computa-
scaledbythecurrentdiameterD orthebuilding-blockheight
X
tional resources (see Section S2.2 for a discussion of model
H ,respectively(movementalongthezdirectionisscaledby
BB
resolution).Thisinparticularbenefitsuserswhohavechosen
(cid:8) to ensure axial homogeneity, with a standard value of (cid:8) =
the wrong radial size for the starting configuration, as the
0.1). Further, each movement along the x, y or z direction is
algorithm adapts according to the scattering curve without
scaled by the current temperature T. In the case of radial
computational drawbacks. We exploit this adaptive potential
movement along thexorydirection,weadded ahelical bias
of our algorithm using an estimated diameter D of a given
X term (depending on the DA’s current e position z; the
z i
configurationX,whichactsasascalingparameterthroughout
building-blockmotifextendsfromz=0toz=H ).Thisbias
BB
the algorithm.
term is scaled by the helical field bias (cid:7) (user defined with
WequantitativelytrackthegrowingorshrinkingoftheDA
valuesoftheorderof0<(cid:7)<1,defaultvalue0.3)aswellasby
configurationXthroughoutthefittingprocessviatheradiusof
the current temperature T, such that it favours counter-
gyration R . For elongated cases, we assume a cylindrical
G,X clockwise solutions motivatedby thefirst structural model of
referencegeometry,suchthatweestimateD aftereveryDA
X DNA (Watson & Crick, 1953). It is important to note that,
movementaccording to
mathematically, the helical bias term correlates quadratically
withthe current temperature T.Therefore,itscontribution is
1 XNBB !1=2 only significant in the early stages of the fitting process (see
D ¼2ð21=2ÞR ¼2ð21=2Þ x2þy2 : ð4Þ
X G;X N i i Fig. S11 for its influence on the reconstruction) such that,
BB i¼1
using its default value of 0.3, it does not influence the
394
BurianandAmenitsch (cid:2) Dummy-atommodelling IUCrJ (2018).5,390–401
research papers
modellingofnon-helical systems (Fig.S15) but facilitates the 4.4.Compactness–theradialcompactnessparameterRC(X)
reconstruction ofhigh-aspect helical motifs.
In order to prevent unphysical disassembly of the DA
In the case of fitting globular structures (M = 1), the
configurationduring thefittingprocedure,weretaintheDAs
random-movement vector r (T) is significantly simplified
rand close to the radial centre of mass (RCOM) of a given
accordingto
configuration.Weachievethisbyintroducingapotentialwell
such that the DAs are weighted as a function of their radial
0 1
D TRand e
X x x distance from the centre of mass rCOM. DAs inside the
rrandðTÞ¼@DXTRandyeyA; ð6Þ potential well should be unaffected, soithat they move freely
D TRand e
X z z regardless of their position. Once DAs drift towards the well
border,a force should push them back towards the centre of
wherethecurrentdiameterD isnowevaluatedoverallthree
X mass, hence avoiding radial disassembly. To find an adequate
directions, such that equation (4) now also includes the
mathematicaldescriptionofthispotentialfield’wespecified
contribution ofthe DAs’ current zpositions z2.
i the following criteria: (i) radial continuity to avoid non-
linearities; (ii) asymptotic behaviour, such that
’ðrCOM !0Þ!0 and ’ðrCOM !1Þ!1; (iii) scalability in
4.3. Homogeneity – the d parameter i i
N12
magnitude (0 < ’ < 1); and (iv) scalable potential-well size
The random nature of DA movements causes a side effect
andsteepness.Basedon theserequirements (and inspired by
withregardtotheuniquenessofthefittedmodel:asthereare
the potential field for a Gaussian distribution of electrostatic
infinite possibilities for describing a given structure by
charges;Schlick,2010),weusethemathematicalpropertiesof
randomly filling it with point scatterers, the terminology of a
theerrorfunctionerfðxÞtodefinearadial-symmetricpotential
unique model is not reasonable in this context. However,
well acting on each DA. As a result, we obtain the radial
certain criteria related to the homogeneity of the DA config-
compactness parameter RC(X) according to
uration make a fitted DA reconstruction, in a physical sense,
veryunlikely.Thesescenariosare(i)high-densityDAclusters 1 XNBB 1(cid:11) (cid:9)(cid:2)rCOM (cid:2)þ1(cid:10)(cid:12)
and(ii)singledisconnectedDAsfarawayfromtheremaining RCðXÞ¼ 1þerf i (cid:3) ; ð7Þ
N 2 D 2
configuration. BB i¼1 X
We optimize the homogeneity throughout the fitting
process by analogy with the established looseness parameter whererCOMrepresentstheradialdistancebetweentheRCOM
i
infixed-gridsystems(Svergun,1999;Franke&Svergun,2009; of configuration X and the ith DA (in the case of globular
Koutsioubas & Pe´rez, 2013), namely by evaluating the local geometries, rCOM represents the distance between the centre
i
vicinityaroundeachDA.Inthesefixed-gridimplementations, ofmassofconfigurationXandtheithDA).Withregardtothe
the distance between neighbouring DAs is known such that specifications defined above, this formalism fulfils the above-
onlythenumberofcontactsneedstobecounted.Inourgrid- mentioned points (i)–(iv) by means of the scaling parameter
freecase,weusethesameprincipleinaninvertedmanner:we D (which is evaluated after every DA movement). In parti-
X
assume an ideal close-packed condition with 12 neighbours cular,forpoint(iv),scalablepotential-wellsizeandsteepness,
andcalculatetheirmeandistancetothecentralDA,resulting the potential-well boundary (half-height position) will be
in the parameter d . In the extreme case of a DAwithin a foundatðrCOMÞ=D =((cid:2)+1)/2(cid:2)=0.66,whereitsderivativeis
N12 i X
high-density cluster [scenario (i) above], d will be very 1/(cid:2).Inabsolute terms,foraverycompactstructure suchasa
N12
small.Ontheotherhand,forasinglefree-floatingDAthatis cylinderwithasmoothsurfaceoraninfinitelythincylindrical
far away from the core DA assembly [scenario (ii)], d will shell, RC(X) will be approximately 0.06 or 0.24, respectively.
N12
be very large. The magnitude of d is hence inversely Each single atom moving further away from D will cause
N12 X
proportional to the DA density around a given DA. Equally, RC(X) toincrease towards 1.
theaveraged overthefullDAconfigurationd denotes We account for this radial compactness throughout the
N12 N12,X
an inverse measure of the mean DA density of the config- fitting process by using RC(X) scaled by the compactness
uration. weight (cid:9) as a regularization term. Thus, the algorithm in fact
Weusethed parameterforatwofoldpurpose.Inorder minimizes the function
N12
toavoidscenario(i),wedonotallowDAstocomecloserthan
0.1hd i. This acts as a hard-contact limit so that we fðXÞ¼(cid:5)2þj(cid:9)jRCðXÞ; ð8Þ
N12,X
circumvent DA clustering and therefore unfeasible singula-
ritiesthroughoutthefittingprocess.Inordertoavoidscenario instead ofonly (cid:5)2alone.
(ii), we repeatedly force DAs with d > 2hd i that are As an ultimate radial boundary, similar to the search-
N12 N12,X
outside the mean radial distance h|r|i to move towards the volume diameter in DAMMIN (Svergun, 1999), we apply a
X
centreofmassoftheclosestfractionofDAs.Thisavoidsfree criticaldiameterD =2D suchthatsingleDAsmovingfar
X,crit X
floatingofasingleDAandthereforeforcestheDAstoremain away from the RCOM are repeatedly forced into a more
inacompactconfiguration.Adetaileddescriptionofhowthe compact configuration (see Section S3 and Fig. S27 for a
d parameterisimplementedinthefittingalgorithmcanbe detailed description of how the radial compactness is imple-
N12
found in Section S3 and Fig.S27. mented in the algorithm).
395
IUCrJ (2018).5,390–401 BurianandAmenitsch (cid:2) Dummy-atommodelling
research papers
4.5. Model resolution and uniqueness starting configuration according to a user-defined diameter,
the building-block stacking distance and the number of DAs
The most obvious visual side effect of using random
per building block. Once started, the fitting algorithm under-
movementsisrelatedtotheoutersurfaceofthefittedmodel.
goesN iterations:startingfromatemperatureT ,thesystem
For example, if DAs can only move on an artificial grid, the k 0
coolsdownasdefinedbythequenchingcoefficientq (0<q
fittedconfigurationofaglobularparticlewillpresentasurface T T
<1)suchthatthecurrenttemperatureatagiveniterationkis
smoothness according to the lattice planes of the grid. As
T =T qk(werecommendthedefaultvaluesN =100,T =1
already discussed above, allowing random DA movements k 0 T k 0
andq =0.99asaninitialsetofparametersforconvergence).
resultsininfinitepossibilitiesforrepresentingagivenparticle T
Ineachkiteration,alltheDAsarerandomlymovedusingthe
volumeandthusalsoitssurface.Consequently,thesurfaceof
random-movement generator (see Section 4.2) under the
a random-movement fitted configuration will be significantly
restraints explained in Section 4.3, such that a movement is
rougherthanacorrespondingfixed-gridmodel.However,this
only accepted if (i) the new DA position complies with the
increased surface roughness is just a visual symptom of the
hard-contact limit 0.1hd i (if not, up to 100 new move-
information content provided by the scattering curve, as we N12,X
mentsareconsidered)and(ii)animprovementinthefunction
discuss in the following.
f(X) is found [see equation (8)]. After each k iteration the
In absolute terms, we can expect to end up with a fitted
sequence of DAs is randomly mixed to avoid sequential
configuration that presents structural features, both on the
biasing ofthe algorithm.
surface and within the volume, that are below the resolution
Throughoutthefittingprocedure,wepursuetheconceptof
limitoftheexperimentaldata(d =(cid:2)/q ,whereq isthe
min max max
antifragility (Taleb & Douady, 2013), a concept applicable to
upper angular range of the experimental data). This implies
metaheuristic optimization (Boussaı¨d et al., 2013; Gogna &
that e.g. thin helical tapes require a large accessible angular
Tayal,2013):theconvergingsystemisrepeatedlyforcedoutof
range in order to be resolved properly.If this is not the case,
itslocalminimumsuchthataglobalminimumismorelikelyto
the retrieved model is at high risk of being over-interpreted.
On a less obvious note, a low angular resolution can further
lead to artefacts within the configuration that might not be
seen by a common surface representation (we address this
topic in moredetailin Section S2).
A common technique to avoid misinterpretation of such
artefacts is to test the reproducibility of the reconstruction
(Volkov & Svergun, 2003). This approach brings a series of
advantages. First, the overall stability and reliability of the
reconstruction from a given data set are assessed. Second, a
consecutive averaging process of all reconstructions projects
the DAs onto an artificial occupancy-weighted grid, which
provides a straightforward procedure to determine DA
validityandvolumeinhomogeneity.Third,thisoccupancymap
helps to identify structural artefacts in single reconstructions
caused by insufficient information content of the scattering
data, thus avoiding over interpretation. Fourth, the recon-
struction of an arbitrary shape from scattering data is a
(highly) underdetermined problem so it is not guaranteed to
obtain a unique result from a given fitting algorithm.
Performing a reproducibility analysis when reconstructing a
structural motif from experimental scattering data, in parti-
cular when the information content and validity of the data
are questionable, is hence highly recommended. In quantita-
tive terms, the reproducibility of a reconstruction may be
judgedbythemeannormalizedspatialdiscrepancyhNSDiof
parallel reconstructions, which represents an measure of
dissimilarity (hNSDi = 0 for identical models) for the inde-
pendentruns (Volkov &Svergun, 2003).
4.6. Implementation
Figure 2
We implemented the fitting algorithm in the computer Scattering curves computed from the seemingly endless model geome-
triesA–GshowninFig.3(redcircles;seeAppendixAformodeldetails)
program SasHel, a Qt graphical user interface (GUI) written
andfitsfromthereconstruction(blacklines).Forbettervisualization,the
in C++ that allows user interaction. When starting a fitting
scatteringpatternsareshiftedverticallyanderrorbarshavebeenomitted
procedure, the program generates a random cylindrical (seeFig.S14forhigh-qmagnificationsofthesecurves).
396
BurianandAmenitsch (cid:2) Dummy-atommodelling IUCrJ (2018).5,390–401
research papers
Figure3
Theoreticalandreconstructedthree-dimensionalmodelsoftheseeminglyendlessgeometriesusedhere.Thenumbersbelowthelabelsdenotethemean
normalized spatial discrepancy hNSDi, obtained from eight independent reconstructions (random artificial cylinder: hNSDi = 1.03 (cid:7) 0.01, random
artificialsingle-strandhelix:hNSDi=1.06(cid:7)0.01).ThePDDFsanddPDDFsusedtodeterminethestackingdistancebetweenthebuildingblockscanbe
foundinFigs.S12andS13.SeeFig.2forthecorrespondingscatteringintensitiesandFig.S15forpointrepresentations,asdenotedbythelettersA–G.
be found. These non-optimal moves are forced onto the oscillations in the higher-q regime 1 < q < 2nm(cid:3)1 (see
configuration without consideration of the damage caused to Fig. S14), which is a resonance effect caused by the stacking
themodel,makingthealgorithmlesspronetodistortionofthe nature ofidentical building blocks (see Section S2.3).
solutionspacebyregularizationterms.Asummaryofthefull Fig.3presentsthecorrespondingthree-dimensionalmodels
algorithm, including an example implementation in pseudo- and reconstructions (see Fig. S15 for point representations).
code,can be found in Section S3 andFig.S27. Models A and B show a circular cross section in agreement
TheprogramSasHelfurtherincludesa‘parallelmode’that with the model, while in the case of the cylindrical shell the
runs a unique reconstruction on each available CPU core, so emptycoreispresent.Further,therectangularcrosssectionof
the model’s validity and reproducibility can easily be tested model C is visible in the reconstructed model. However, the
using e.g.DAMAVER (Volkov & Svergun,2003). sharp corners are not fully resolved, which is most likely the
effectofinsufficientresolutionfromthescatteringdata(d ’
res
1.6nmcomparedwiththerectangularcrosssectiona(cid:6)b=4
5. Model examples
(cid:6) 20nm). In cases D–G, the helical fingerprints (single- or
Totesttheimplementedalgorithm,wesimulatedanumberof double-strandednature)arewellresolvedinthereconstructed
scattering intensities from seemingly endless bodies (see models. Cases D and E, and F and G, present noticeable
Appendix A for detailed dimensions). All reconstructions differences in their cross sections, helping to distinguish
wereperformedusingthedefaultfittingparametersasfollows: between a helical filament (empty core) and a helical tape
starting temperature T = 1, quenching coefficient q = 0.99, (filledcore).Further,variationsinthethicknessofthehelical
0 T
numberofiterationsN =200,helicalbiasparameter(cid:7) =0.3, strands can be observed. The overall agreement between
k
compactness weight (cid:9) = 1, DA form factor diameter D = modelandreconstructionforthemainfeaturesdemonstrates
DA
0.2nm and number of stacked building blocks M = 10. The thefunctionalityofthealgorithmforsimilarseeminglyendless
dimensions of the initial random configuration, including the bodies.
stackingdistanceofthebuildingblocks,weredeterminedfrom Inordertoevaluatethestabilityandreproducibilityofthe
the relative dPDDFs (see Figs. S12 and S13). For all recon- reconstruction algorithm, eight independent runs for each
structions,weused500–800DAsperbuildingblock,resulting model geometry were analysed using DAMAVER according
incomputationtimesforeachrunbetween20and60minona to the literature (Volkov & Svergun, 2003). Accordingly, we
standardworkstation, respectively. obtainedthemeannormalizedspatialdiscrepancyhNSDiasa
The scattering intensities of the theoretical models are measure of the similarity of independent reconstructions of
shown in Fig. 2, along with the fits from the reconstructions. each model (hNSDi = 0 for identical configurations). As
For all geometries used, we find agreement between the showninFig.3,thehNSDiofmodelsA–Ggraduallyincreases
theoreticalandfittedscatteringintensities(the(cid:5)2valueofall from 1.01 to 1.37 with the complexity of the model. On an
fitsisbelowthethresholdof0.1).Yet,casesA–Dshowslight absolutescale,thesehNSDivaluesarehigherthanthoseinthe
397
IUCrJ (2018).5,390–401 BurianandAmenitsch (cid:2) Dummy-atommodelling
research papers
literature (Volkov & Svergun, 2003), where stable recon-
structionswerelinkedtoanhNSDiof0.4–0.7.Thereasonfor
thisdifferenceisfoundinthegrid-freenatureofourapproach:
NSD values for grid-free programs such as GASBOR are
generally higher (Svergun et al., 2001). To validate our find-
ings, we constructed eight randomly filled artificial cylinders
and single-strand helices (N = 700), yielding hNSDi values
DA
of 1.03 (cid:7) 0.01 and 1.06 (cid:7) 0.01, respectively. These values, in
contextwiththehNSDioftheabovereconstructions,provide
a reference for grid-free models where an hNSDi between 1
and 1.4 isobserved.
ItisalsopossibletorunthefittingalgorithmusingonlyM=1
stacks, which corresponds to the case of the building block
alone.Thus,thesameprogramcanalsobeusedtofitglobular
particles.Wetestedthisoptioninasimilarwaytothemethod
outlined above by simulating a number of scattering inten-
sities from globular bodies (see Appendix A for detailed
dimensions).Forallreconstructions,weusedthesamedefault
fittingparametersasmentionedpreviously.Thedimensionsof
the initial random configurations were determined from the
maximum dimension found in the relative PDDFs (see
Fig.S12).However,weincreasedthenumberofDAsto800–
1200, now resulting in computation times between 5 and
10min per run on a standardworkstation, respectively.
Thescatteringintensitiesofthetheoreticalmodelsandthe
fits from the reconstructions are shown in Fig. 4(a). For all
geometries used, we find agreement between the theoretical
andfittedscatteringintensities(the(cid:5)2valueofallfitsisbelow
the threshold of 0.1). Evidently, the oscillations previously
foundfortherepeatingmotifinthehigher-angleregime (see
Fig. 2) are not present. The three-dimensional models and
reconstructions corresponding to the scattering patterns are
shown in Fig. 4(b) (see Fig. S16 for point representations).
Alsointhiscase,thefittedmorphologiesclearlyrepresentthe
corresponding models,includingthehollowgeometries Iand
L.ThehNSDivaluesforeightindependentreconstructionsof
each geometry are within the range 0.96–1.28, as denoted in
Fig. 4(b) [in relative terms, the hNSDi values for eight
Figure 4
randomlyfilledartificialspheresandcubes(NBB=1500)were (a) Scattering curves computed from globular model geometries (red
1.04 (cid:7) 0.01 and 1.06 (cid:7) 0.01, respectively]. These findings circles;seeAppendixAformodeldetails)andfitsfromthereconstruc-
validate the use of the algorithm for globular bodies as well. tion (black lines). For better visualization, the scattering patterns are
shifted vertically and error bars have been omitted. The PDDFs of all
modelscanbefoundinFig.S12.(b)Theoreticalandreconstructedthree-
dimensionalmodelsoftheglobulargeometriesusedhere(seeFig.S16for
6. Experimental examples
point representations of the reconstructions). The numbers below the
As final example, we applied the described reconstruction labelsdenotethemeannormalizedspatialdiscrepancyhNSDi,obtained
fromeightindependentreconstructions(randomartificialsphere:hNSDi
procedure to experimental data for a self-assembled peptide
=1.04(cid:7)0.01,randomartificialcube:hNSDi=1.06(cid:7)0.01).
double-strand helix (Kornmueller et al., 2015). Prior to the
fitting process, we determined a building-block stack spacing
of53nmfromthecorrespondingPDDF(seeinsetinFig.5and independenttapeswithinthebuildingblock(seeFig.S17afor
Fig.S13). The scatteringdata were then fittedusing 800 DAs pointrepresentations).However,thetwotapesdonotappear
over the angular range 0.08 < q < 2.14nm(cid:3)1, resulting in a to be symmetric along the z direction, suggesting a displace-
real-spaceresolutionofapproximately(cid:2)/q =1.5nm,where mentangleof’6¼180(cid:8)betweenthem.Thecrosssectionofthe
max
the default fitting parameters according to Section 4 were helical tapes presents a rather rough surface that does not
used. allow more detailed interpretation, as is typical for such
AsshowninFig.5(a),weagainfindagreementbetweenthe random-movementDAmodels.Nevertheless,acomparisonof
experimental and fitted scattering curves ((cid:5)2 = 0.08). The thereconstructionwiththemodelaccordingtothepreviously
correspondingreal-spacereconstruction(Fig.5b)presentstwo published dimensions (Kornmueller et al., 2015) gives good
398
BurianandAmenitsch (cid:2) Dummy-atommodelling IUCrJ (2018).5,390–401
research papers
Figure5
Verificationofthereconstructionalgorithmusingexperimentalscatteringdata.(a)Experimentalscatteringdata(errorbarsomittedforclarity)and
fittedcurvefromthereconstructionoftheself-assembledpeptidedoublehelix.ThecorrespondingPDDFisshownintheinset,whereasthedPDDFused
todeterminethestackingdistancecanbefoundinFig.S13.(b)Orthogonalviewsofthemodelreconstructedfromthescatteringpatterninpanel(a)
(green)comparedwiththepreviouslypublishedstructure(blue).(c)Experimentaldata,fittedcurveandPDDFforalcoholdehydrogenase1(ADH).(d)
Orthogonalviewsofthemodelreconstructedfromthescatteringpatterninpanel(c)(green)comparedwiththecrystalstructuremodel(blue).SeeFig.
S17forpointrepresentationsofthereconstructionsshowninpanels(b)and(d),andFig.S18foranumericalstabilityanalysisofbothreconstructions.
agreement (see the red model in Fig. 5c). A movie of the showninFig.3,thehNSDiwasfoundtobe1.27(cid:7)0.02,hence
rotatinghelicescanbefoundinthesupportinginformation.To confirming the reproducibility of the reconstruction.
obtain a measure of the uniqueness of the final model, we Analogous to the above, we further applied the recon-
repeatedthefittingproceduretoendupwith16independent struction procedure to experimental data for the globular
reconstructions. The average of all 16 models (Volkov & protein alcohol dehydrogenase 1 (ADH) in phosphate-
Svergun,2003),asshowninFig.S18(a),isconsistentwiththe buffered saline at pH 7.5. The data set was taken from the
singlereconstruction.Inagreementwiththereferencevalues SASBDB database (Valentini et al., 2015; date of download
399
IUCrJ (2018).5,390–401 BurianandAmenitsch (cid:2) Dummy-atommodelling
Description:Tayal, 2013) with an antifragile implementation (Taleb &. Douady, 2013) and the .. smoothness according to the lattice planes of the grid. As.