Open-architecture Implementation of Fragment Molecular Orbital Method for Peta-scale Computing

Toshiya Takami∗, Jun Maki∗, Jun-ichi Ooba∗, Yuichi Inadomi†, Hiroaki Honda†, Taizo Kobayashi∗, Rie Nogita∗, and Mutsumi Aoyagi∗†

∗Computing and Communications Center, Kyushu University, 6–10–1 Hakozaki, Higashi-ku, Fukuoka 812-8581, Japan
†PSI Project Laboratory, Kyushu University, 3–8–33–710 Momochihama, Sawara-ku, Fukuoka 814-0001, Japan
Abstract—We present our perspective and goals on high-performance computing for nanoscience in accordance with the global trend toward “peta-scale computing.” After reviewing the results obtained with the grid-enabled version of the fragment molecular orbital method (FMO) on the grid testbed of the Japanese Grid Project, National Research Grid Initiative (NAREGI), we show that FMO is one of the best candidates for peta-scale applications by predicting its effective performance on a peta-scale computer. Finally, we introduce our new project to construct a peta-scale application as an open-architecture implementation of FMO, in order to realize both high performance on peta-scale computers and extendibility to multi-physics simulations.
I. INTRODUCTION

On account of the recent development of grid computing environments, we can use computing resources of more than a thousand CPUs. However, those large distributed resources are accessible only after we pass through a gateway to the grid system and a tedious procedure for grid-enabling of application programs. Thus, for scientists in nanoscience to use the grid system as one of their daily tools, it is important to make their applications grid-aware in advance.

On the other hand, the development of high-performance computing (HPC) environments shows no sign of slowing down, and, within several years, we might reach the scale of peta (∼10^15) in computer resources for scientific computing. It is expected that a “peta-scale computer” exhibits more than a petaflops performance in floating-point calculations, that its available memory is more than a petabyte in total, or that its external storage amounts to more than a petabyte. Thus, the global trend in HPC is now to study how to realize such peta-scale computing [1].

The purpose of this paper is twofold: one is to present our computational results in nanoscience supported by the Japanese Grid project, National Research Grid Initiative (NAREGI) [2], and the other is to show a perspective on HPC applications for peta-scale computing. This paper is organized as follows. Computational results using the fragment molecular orbital method (FMO) on grid computing environments are shown in section II. The performance prediction of the FMO calculation on a peta-scale computer is shown in section III, and the actual implementation of FMO intended to realize peta-scale computing is introduced in section IV. Finally, in section V, we discuss what is necessary in actual peta-scale computing for scientific applications.

Fig. 1. In FMO, the target molecule is divided into fragments for which an ab initio calculation is performed. Usually, carbon–carbon single bonds are chosen as the boundary between fragments.

II. GRID-ENABLED FMO

In this section, we review the results on a grid-enabled version of FMO and evaluate its effectiveness on the grid testbed of NAREGI [3]. Our main contribution to the project is to develop large-scale grid-enabled applications in nanoscience. One of these applications is Grid-FMO, which is based on the well-known ab initio molecular orbital (MO) package program GAMESS [6] and is constructed by dividing the package into several independent modules so that they can be executed on a grid environment. The algorithm of FMO and the implementation of Grid-FMO are briefly reviewed in the following subsections.

A. Algorithm of FMO
The FMO method developed by Dr. Kitaura and co-workers [4] can execute all-electron calculations on large molecules with more than ten thousand atoms. A brief overview of the flow of the fragment MO method up to the dimer correction is the following: (1) the molecule to be calculated is divided into fragments; (2) an ab initio MO calculation [5] is executed for each fragment (monomer) under the electro-static potential made by all the other fragments, and this calculation is repeated until the potential is self-consistently converged; (3) ab initio MO calculations are executed for pairs of fragments (dimers); and (4) the total energy is obtained by summing up all the results. This algorithm of FMO is implemented in several well-known MO packages such as GAMESS [6], ABINIT-MP [7], etc.
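The four steps above can be summarized in a short control-flow sketch. This is only an illustration of the algorithm, not code from GAMESS or ABINIT-MP; every function it calls (make_fragments, initial_guess, esp_from, scf_energy, max_difference, merge) is a hypothetical placeholder for the corresponding quantum-chemistry kernel.

def fmo2_total_energy(molecule, scc_threshold=1.0e-7, max_scc=30):
    # (1) divide the target molecule into fragments (e.g. at C-C single bonds)
    fragments = make_fragments(molecule)
    charges = [initial_guess(f) for f in fragments]

    # (2) monomer loop: ab initio SCF for every fragment in the electro-static
    #     potential (ESP) of all the other fragments, repeated until the
    #     fragment charge distributions are self-consistent
    e_mono = [0.0] * len(fragments)
    for _ in range(max_scc):
        new_charges = []
        for i, frag in enumerate(fragments):
            esp = esp_from(charges, exclude=[i])
            e_mono[i], rho = scf_energy(frag, esp)   # ab initio MO calculation [5]
            new_charges.append(rho)
        converged = max_difference(charges, new_charges) < scc_threshold
        charges = new_charges
        if converged:
            break

    # (3) SCF for each pair of fragments (dimer) in the converged potential;
    # (4) total energy from monomer energies plus pairwise dimer corrections
    #     (distant pairs may instead use the ES approximation of section III)
    total = sum(e_mono)
    for i in range(len(fragments)):
        for j in range(i):
            esp = esp_from(charges, exclude=[i, j])
            e_ij, _ = scf_energy(merge(fragments[i], fragments[j]), esp)
            total += e_ij - e_mono[i] - e_mono[j]
    return total

The point of the sketch is the loop structure: the monomer and dimer calculations inside the loops are mutually independent, which is what the parallel and grid-enabled implementations discussed below exploit.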
Although the most time-consuming part of this calculation is the ab initio MO calculations for each fragment and each pair of fragments, these calculations can be executed independently. Thus, FMO is easily executed on parallel computers with high efficiency. The FMO routines included in those packages are already parallelized for cluster machines and are tuned so that they exhibit efficient performance. However, if the program is used in distributed computing environments, we should consider robustness and controllability in addition to efficiency. Thus, grid-enabling is necessary. Among the many ways of grid-enabling, we chose a strategy of reconfiguring the program into a loosely-coupled form in order to satisfy these properties.

B. Loosely-coupled FMO

We have developed a grid-enabled version of FMO, called the Loosely-coupled (LC) FMO program [8], as part of the NAREGI project. First, we show the procedure used to change the GAMESS-FMO program into a “loosely-coupled form,” where the original FMO in the GAMESS package is divided into several “single task” modules which are connected to each other through input/output file transfers. These modules consist of run.ini for the initialization, run.mon for the fragment calculation (monomer), run.dim for the fragment-pair calculation (dimer), and run.tot for the total energy calculation; they are invoked from a script program which also manages the file transfer over distributed computers. The total flow of the LC-FMO is shown in fig. 2.

Fig. 2. Flow of calculation of the grid-enabled FMO.

Fig. 3. Flow of LC-FMO represented in the NAREGI Workflow Tool.
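Because the top-level driver only chains these modules and moves files, it fits in a few dozen lines of a script language. The sketch below is a minimal illustration of that idea, not the actual LC-FMO driver: the executable names run.ini, run.mon, run.dim, and run.tot are those named above, while the command-line conventions, file names, dimer_pairs(), and stage_files() are assumptions made for the example.

import subprocess

def run_module(executable, input_file, output_file):
    # each "single task" module communicates only through plain files,
    # which is what makes the flow easy to distribute and to restart
    subprocess.run([executable, input_file, output_file], check=True)

def stage_files(files, host):
    # placeholder for the file transfer between distributed computers;
    # on a grid this role is played by the middleware, not by the script
    pass

def dimer_pairs(n_fragments):
    return [(i, j) for i in range(n_fragments) for j in range(i)]

def lc_fmo(molecule_input, n_fragments):
    run_module("./run.ini", molecule_input, "fragments.dat")        # initialization
    for i in range(n_fragments):                                    # monomer jobs
        run_module("./run.mon", f"mon_{i:05d}.inp", f"mon_{i:05d}.out")
    for i, j in dimer_pairs(n_fragments):                           # fragment-pair jobs
        run_module("./run.dim", f"dim_{i:05d}_{j:05d}.inp", f"dim_{i:05d}_{j:05d}.out")
    run_module("./run.tot", "fragments.dat", "total_energy.out")    # total energy

In an actual grid run, each run_module call is not executed locally but dispatched to a remote resource (through the NAREGI Workflow Tool and Super Scheduler described below), and the monomer stage is iterated until the electro-static potential converges; both are omitted here for brevity.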
Another benefit of the loosely-coupled form is the extendibility of its functionality. In fact, after the first release of the LC-FMO, we developed two additional modules which realize a linkage to an initial-density database and a visualization feature for the total molecular orbitals. Since the top-level program that invokes the modules is lightweight and can be written in a script language, further modification of the total flow is easily performed.

C. Execution on NAREGI Grid

In order to execute LC-FMO on the NAREGI grid, we put the flow of the program into the NAREGI Workflow Tool [3]. The graphical representation of the LC-FMO program is shown in fig. 3, where modules and input/output files are represented as icons connected to each other. This procedure is simple and straightforward because we had already reconstructed the program into a form suitable for distributed computing. Thus, the important point is whether the basic structure of the program is grid-aware or not.

Once a program is grid-enabled, we can execute it using large-scale computing resources of over one thousand CPUs. From the programmer's point of view, the most preferable aspect is that we are relieved of arranging the resources needed to execute the program with high efficiency. In the NAREGI testbed system [3], the NAREGI Super Scheduler can manage the resource arrangement with the help of the NAREGI Information Service.

D. Efficiency of LC-FMO

Robustness and efficiency conflict with each other in general. Since our reconstruction of FMO to increase controllability might hurt the efficient execution of the program, it is necessary to evaluate the efficiency of LC-FMO by comparing it with the original FMO program in GAMESS.

In table I, we show the elapsed times for chicken egg white cystatin (1CEW, 1701 atoms, 106 fragments, shown in fig. 4) obtained by the LC-FMO and the original GAMESS-FMO codes. These were measured on the NAREGI testbed system, which consists of 16 CPUs of Xeon 3 GHz. Generally speaking, an increase of the total elapsed time is inevitable when we reconstruct an application into a loosely-coupled form, which is considered as a cost of grid-enabling. As shown in table I, however, the increase of the time is relatively small (3 h 28 m against 3 h 10 m, about 10%). Thus, we conclude that the LC-FMO is executed effectively on grids.

TABLE I
THE ELAPSED TIME FOR CHICKEN EGG WHITE CYSTATIN.

                  LC-FMO     GAMESS-FMO
  Initial Guess   37 s       4 s
  Monomer         1 h 11 m   59 m
  Dimer           2 h 16 m   2 h 11 m
  Energy          4 s        <1 s
  Total           3 h 28 m   3 h 10 m

Fig. 4. The molecular structure of chicken egg white cystatin (1CEW, 1701 atoms, 106 fragments).

III. PERFORMANCE PREDICTION

In this section, we predict what performance would be expected if we executed our FMO program on a “peta-scale computer.” Since the actual hardware is not available at present, the prediction is based on a hypothetical specification of the computer.

A. Hypothetical Specification of the Peta-scale Computer

It is said that, in a few years, computer systems with peta-scale specifications will be designed [9]. Here, we concentrate on the CPU resources of the peta-scale computer and estimate how many CPU resources are necessary to attain peta-scale performance.

The current peak performance of a single-core CPU is of the order of 10 gigaflops. If a parallel computer with 10 petaflops peak performance is configured with such CPUs to achieve 1 petaflops effective performance, 1,000,000 CPU cores are necessary in total. Since a cluster system constructed from 1,000,000 machines connected to each other is unimaginable, it is necessary to arrange those resources into multiple layers. We are not concerned here with how to realize computational nodes with multiple CPU cores. If we can configure the lowest layer as a node with the performance of 100 ∼ 1,000 CPUs, the total computer will be a parallel system composed of 1,000 ∼ 10,000 nodes.
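As a quick consistency check of these numbers:

  10 Pflops (peak) / 10 Gflops per core = 10^6 cores,
  10^6 cores / (100 ∼ 1,000 cores per node) = 1,000 ∼ 10,000 nodes.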
B. Performance Prediction of FMO

The procedure to predict the performance of FMO on the peta-scale computer is a somewhat phenomenological method based on actual measurements on current computer systems. First, we assume an execution model of FMO which gives a theoretical timing function of the number of fragments N_f. After fixing some parameters by several measurements of the program, we predict the elapsed time on a virtual computer with peta-scale specifications.

1) Execution Model of GAMESS-FMO: Even if one could analyze all the program codes of GAMESS, it is almost impossible to determine the number of floating-point calculations in an FMO routine precisely, since there are many conditional branches which depend on the molecular structure given as an input. Our strategy here is a practical one: to obtain phenomenological values of the execution time through experimental executions of FMO on current machines.

The total execution time of FMO is divided into three parts: a monomer part, an SCF-dimer part, and an ES-dimer part. The monomer and SCF-dimer parts perform SCF calculations for each fragment and each pair of fragments, respectively, while the ES-dimer part obtains dimer correction terms under an electro-static (ES) approximation between fragments. Usually, the ES approximation is applied to fragment pairs whose fragments are separated by more than a certain threshold. We start from an expression of the total amount of computation as a function of the number of fragments N_f,

  F_total(N_f) = F_m(N_f) + F_d(N_f) + F_es(N_f),            (1)

where F_m(N_f), F_d(N_f), and F_es(N_f) represent the number of floating-point calculations for the monomer, SCF-dimer, and ES-dimer parts, respectively. These are assumed as

  F_m(N_f)  = [ f_m^(0) + f_m^(1) N_f ] N_f I_m,             (2)
  F_d(N_f)  = [ f_d^(0) + f_d^(1) N_f ] N_d(N_f),            (3)
  F_es(N_f) = f_es^(0) N_es(N_f),                            (4)

according to the algorithm of the FMO theory [4], where I_m is the number of monomer loops, and N_d(N_f) and N_es(N_f) represent the numbers of SCF-dimers and ES-dimers. The SCF calculations for monomers and dimers in FMO depend on the environmental potential made by all the fragments other than the target monomer or dimer, while it can be ignored for ES-dimers. Thus, we represent the amount of each SCF calculation for a monomer and an SCF-dimer up to a term linear in N_f, and the amount of the non-SCF calculation for an ES-dimer as a constant. The parameters f_m^(0), f_m^(1), f_d^(0), f_d^(1), and f_es^(0) should be determined from the actual timing data. If we define additional parameters, the number of parallel nodes K and the effective floating-point performance E (flops) of each node, we can compare the actual timing data with the amount of computation divided by KE.

Tables II and III show the timing data of each part in test executions of several inputs on two machines, one node of IBM p5 1.9 GHz and a 16-CPU cluster of Intel Xeon 3 GHz. Since the averaged sizes of fragments in these inputs are considered to be almost the same, the difference of the elapsed time mainly depends on the total number of fragments N_f. Least-square fitting, with additional parameters representing the effective performances of the two computers, E_ibm and E_xeon, determines all the parameter values shown in table IV after the normalization E_ibm = 1. In figures 5 (a), (b), and (c), we plot the timing data and the fitted functions (a) (f_m^(0) + f_m^(1) N_f)/KE, (b) (f_d^(0) + f_d^(1) N_f)/KE, and (c) f_es^(0)/KE.

TABLE II
TIMING DATA OBTAINED ON A SINGLE NODE OF IBM P5 1.9 GHZ.

  Input                1cew      1ao6half   1ao6      1ao6 dim
  No. Atoms            1701      9121       18242     36484
  N_f                  106       561        1122      2244
  N_d(N_f)             690       4192       8416      16832
  N_es(N_f)            4875      152888     620465    2499814
  I_m                  17        17         17        17
  Time (sec)
    monomer            1356      13364      40005     140810
      (Average)        (0.752)   (1.40)     (2.10)    (3.69)
    SCF-dimer          2037      20689      70465     186901
      (Average)        (2.95)    (4.94)     (8.37)    (11.1)
    ES-dimer           398       13772      55955     208627
      (Average)        (0.0816)  (0.0901)   (0.0902)  (0.0835)
  Elapsed Time (sec)   3799      47886      166601    536898

TABLE III
TIMING DATA OBTAINED ON 16 CPUS OF XEON 3 GHZ.

  Input                1cew      1ao6half   1ao6
  No. Atoms            1701      9121       18242
  N_f                  106       561        1122
  N_d(N_f)             690       4192       8416
  N_es(N_f)            4875      152888     620465
  I_m                  17        17         17
  Time (sec)
    monomer            1030.4    10808.9    33989.2
      (Average)        (9.72)    (19.3)     (30.3)
    SCF-dimer          1677.4    17517.6    50819.2
      (Average)        (41.3)    (71.0)     (102.7)
    ES-dimer           293.1     9594.8     39133.4
      (Average)        (1.02)    (1.07)     (1.07)
  Elapsed Time (sec)   3003.5    38065.9    126330.9

TABLE IV
PARAMETER VALUES REPRESENTING THE EXECUTION MODEL OF FMO.

  Parameter   Value
  f_m^(0)     0.59
  f_m^(1)     0.0014
  f_d^(0)     2.83
  f_d^(1)     0.0039
  f_es^(0)    0.082
  E_ibm       1.0
  E_xeon      0.071

Fig. 5. The fragment-number dependency of the amount of computation for (a) a monomer, (b) an SCF-dimer, and (c) an ES-dimer (time in seconds versus the number of fragments), plotted with the effective performance ratios (E = 1 for IBM p5, E = 0.071 for Xeon). Lines are the functions (a) (f_m^(0) + f_m^(1) N_f)/KE, (b) (f_d^(0) + f_d^(1) N_f)/KE, and (c) f_es^(0)/KE obtained by the least-square method.
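As a check that the fitted model is internally consistent, the sketch below (not part of the original codes) evaluates Eqs. (2)-(4) with the table IV values as reconstructed above for the smallest input of table II (1cew: N_f = 106, N_d = 690, N_es = 4875, I_m = 17) and compares the result with the measured per-part times on the reference IBM p5 node (K = 1, E = 1); the agreement is within roughly ten percent.

# Numerical check of the execution model (2)-(4) against the 1cew row of
# table II. Parameter values are those of table IV as reconstructed above.

fm0, fm1 = 0.59, 0.0014        # monomer model (seconds on the reference node)
fd0, fd1 = 2.83, 0.0039        # SCF-dimer model
fes0 = 0.082                   # ES-dimer model (constant per ES-dimer)

def model_times(Nf, Nd, Nes, Im, K=1, E=1.0):
    Fm = (fm0 + fm1 * Nf) * Nf * Im        # Eq. (2)
    Fd = (fd0 + fd1 * Nf) * Nd             # Eq. (3)
    Fes = fes0 * Nes                       # Eq. (4)
    return tuple(F / (K * E) for F in (Fm, Fd, Fes))

mono, scf_dimer, es_dimer = model_times(Nf=106, Nd=690, Nes=4875, Im=17)
print(f"monomer   {mono:7.0f} s  (measured 1356 s)")
print(f"SCF-dimer {scf_dimer:7.0f} s  (measured 2037 s)")
print(f"ES-dimer  {es_dimer:7.0f} s  (measured  398 s)")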
2) Performance in Peta-scale Computer: In order to predict the total performance of FMO, it is convenient if the functions N_d(N_f) and N_es(N_f) are represented as simple functions of N_f. By a least-square fit of the data in tables II and III, we obtain a function for the number of SCF-dimers,

  N_d(N_f) = 7.50 N_f.                                       (5)

Fig. 6. The number of SCF-dimers in the test executions and the function N_d(N_f) obtained by the least-square method.

If we consider the constraint

  N_d(N_f) + N_es(N_f) = N_f (N_f - 1) / 2,                  (6)

the number of ES-dimers is represented in the form

  N_es(N_f) = N_f (N_f - 1) / 2 - 7.50 N_f.                  (7)

If we substitute these functions and parameters into Eqs. (2), (3), and (4), the total computational amount of FMO is obtained as a function of N_f. As already seen in the previous section, the FMO algorithm is appropriate for parallel execution when the number of fragments is large enough compared to the number of available nodes. Suppose that we have a peta-scale computer with K = 10000 and E = 5, i.e., the number of available nodes is 10,000 and each node has an effective speed 5 times faster than a node of IBM p5 1.9 GHz, which is used in this paper as the reference machine (E_ibm = 1). Then, the predicted elapsed time F_total(N_f)/KE is calculated as a quadratic function of N_f, shown in fig. 7. From the viewpoint of the elapsed time, we can perform quantum calculations for molecules with more than 100,000 fragments if the peta-scale computer is realized.

Fig. 7. Predicted elapsed time of the FMO calculation by GAMESS as a function of the number of fragments N_f (curves for the monomer, SCF-dimer, and ES-dimer contributions; time in hours). This is obtained on the assumption that 10,000 nodes are available and each node is 5 times faster than a current machine.
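The fig. 7 prediction can be sketched in a few lines, again assuming the table IV values as reconstructed above, I_m = 17 as in the test runs, and the hypothetical peta-scale figures K = 10,000 and E = 5; the absolute numbers should be read only as indicative.

# Sketch of the elapsed-time prediction, combining Eqs. (2)-(7) with the
# table IV parameters as reconstructed above (assumptions, not GAMESS code).

def predicted_elapsed_hours(Nf, K=10_000, E=5.0, Im=17,
                            fm0=0.59, fm1=0.0014, fd0=2.83, fd1=0.0039, fes0=0.082):
    Nd  = 7.50 * Nf                              # Eq. (5)
    Nes = Nf * (Nf - 1) / 2 - Nd                 # Eqs. (6) and (7)
    Fm  = (fm0 + fm1 * Nf) * Nf * Im             # Eq. (2)
    Fd  = (fd0 + fd1 * Nf) * Nd                  # Eq. (3)
    Fes = fes0 * Nes                             # Eq. (4)
    return (Fm + Fd + Fes) / (K * E) / 3600.0    # Eq. (1) divided by KE, in hours

# e.g. the elapsed time predicted for a 100,000-fragment molecule:
print(f"{predicted_elapsed_hours(100_000):.1f} hours")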
According to effective performance measurements on PCs, the FMO calculation runs at 0.5 ∼ 1.0 gigaflops per CPU of Xeon or Pentium 4, which means that our reference machine, a node of IBM p5 1.9 GHz, exhibits almost 10 gigaflops for this program, since it is E_ibm/E_xeon ≈ 14 times faster than a Xeon. Then, the total performance of FMO calculations on a peta-scale computer with K = 10000 and E = 5 is considered to be 0.5 petaflops. Thus, FMO is considered a predominant candidate which can achieve peta-scale performance.
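Making the arithmetic of this estimate explicit:

  K × E × (effective speed of the reference node) = 10,000 × 5 × (≈10 Gflops) ≈ 0.5 Pflops.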
IV. OPENFMO PROJECT

In spite of the fact that FMO is a suitable algorithm for achieving peta-scale performance through large-scale parallel execution, the actual GAMESS-FMO code cannot be executed for molecules with more than 100,000 fragments. One of the reasons why the current program fails to run is the memory consumption on each node: the current FMO code in GAMESS tries to allocate two-dimensional arrays indexed by fragments, e.g., the distances between pairs of fragments. This exceeds 40 gigabytes even if we consider the symmetry. The limit on the number of fragments is considered to be of the order of 10,000 in the current implementation. Thus, a reconstruction of the FMO code is necessary in order to cope with peta-scale computing.
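The 40-gigabyte figure is easy to verify; assuming 8-byte (double-precision) elements, a symmetric fragment-pair array for N_f = 100,000 already requires

  N_f (N_f + 1)/2 × 8 bytes ≈ 5.0 × 10^9 × 8 bytes ≈ 4 × 10^10 bytes ≈ 40 gigabytes,

so a single such array exhausts the memory of a typical node.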
In this section, we introduce our new project to reconstruct an FMO program from scratch. The name of the project, OpenFMO [10], stands for the following openness: (1) the names and argument lists of the APIs constituting the FMO program are opened publicly, (2) the program structure is also open to combination with other theories of physics and chemistry, i.e., multi-scale/multi-physics simulations, and (3) the skeleton program is developed under an open-source license, and its development process will also be opened to the public. In fig. 8, we show the stack structure of this program. Although the main target is still quantum chemical calculations of molecules, it can be combined with other theories to construct multi-physics/multi-scale simulations (fig. 9) for complex phenomena [11].

Fig. 8. Multi-physics/multi-scale simulator stack including OpenFMO.

Fig. 9. Can multi-physics/multi-scale simulations reveal the true aspect of the complex world of matter?

A. Open Architecture Implementation of FMO

As already shown in section II-A, the fragment MO method consists of standard ab initio MO calculations, including the generation of Fock matrices with one- and two-electron integral calculations. Using this property, the usual FMO program is divided into two layers: the skeleton program, which controls the whole flow of the FMO algorithm, and the molecular orbital (MO) APIs, which provide the charge distribution of each fragment through ab initio MO calculations. Since the specifications of the interfaces between the skeleton and the APIs are fixed and opened publicly, either of them can easily be substituted by other programs.
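The concrete API names and argument lists are defined and published by the OpenFMO project [10]; the sketch below is only a schematic picture of the two-layer split described above, and every identifier in it (FragmentProvider, AbInitioMOProvider, solve, esp_from) is a hypothetical illustration, not the published interface.

from abc import ABC, abstractmethod

class FragmentProvider(ABC):
    """Anything that can return the energy and the static charge
    distribution of one fragment placed in an external potential."""

    @abstractmethod
    def solve(self, fragment, external_esp):
        """Return (energy, charge_distribution) of the fragment."""

class AbInitioMOProvider(FragmentProvider):
    # MO-API layer: ab initio SCF for the fragment (Fock matrix built
    # from one-/two-electron integrals); delegated to an MO engine
    def solve(self, fragment, external_esp):
        raise NotImplementedError("implemented by the MO engine")

def monomer_scc_loop(fragments, provider, esp_from, n_iter=30):
    # skeleton layer: controls only the flow of the FMO algorithm and
    # never looks inside the provider, so the provider can be replaced
    # without touching the skeleton
    charges = [provider.solve(f, external_esp=None)[1] for f in fragments]
    for _ in range(n_iter):
        charges = [provider.solve(f, esp_from(charges, exclude=[i]))[1]
                   for i, f in enumerate(fragments)]
    return charges

The design point is simply that the skeleton depends only on the fixed interface, which is what allows either layer to be replaced independently.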
B. Open Interface to Multi-physics Simulations

Multi-physics simulation is one of the predominant strategies for constructing peta-performance applications for nano-scale materials. Our new implementation of FMO can also be opened up to such multi-physics simulations. Since the FMO method is based on electro-static interactions between fragments, we can generalize each fragment to any object that can provide a static charge distribution. For example, we often use a molecular mechanics representation for atomic clusters that would otherwise require a quantum mechanical description, or the environmental charge distribution surrounding a molecule, which models a solvent, can be included as a fragment. Thus, the MO-APIs for a certain fragment can be substituted by other programs based on different approximation levels.
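Under the same hypothetical interface sketched in section IV-A, a classical fragment only has to hand back a fixed (or force-field derived) charge distribution, so a molecular-mechanics cluster or a solvent model can stand in for selected fragments without any change to the skeleton. The class below is again illustrative only, not part of the published OpenFMO API.

class PointChargeFragment:
    """A classical stand-in for a fragment, e.g. a molecular-mechanics
    cluster or a solvent model given as fixed point charges; it provides
    the same (hypothetical) solve() interface as the ab initio provider."""

    def __init__(self, point_charges, fixed_energy=0.0):
        self.point_charges = point_charges      # [(charge, position), ...]
        self.fixed_energy = fixed_energy

    def solve(self, fragment, external_esp):
        # no SCF is performed: the charge distribution is returned as-is,
        # so such a fragment costs almost nothing inside the monomer loop
        return self.fixed_energy, self.point_charges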
In addition to the description at the molecular-science level, this can be extended to larger-scale simulations through the “multi-physics/multi-scale simulator” layer. This type of extension is widely used for large-scale computations representing realistic models of molecules in cells or other living organisms.

C. Open Source Development of the Programs

The source code of the skeleton program of OpenFMO is publicly opened under an open-source license. At present, the OpenFMO project is managed by several core members, including the authors of this paper. Once we have shown the effectiveness of our approach to the open implementation, and the direction of the project is settled properly, we are willing to change the management of the project into so-called open-source-software development.

V. SUMMARY AND DISCUSSION

In this paper, we showed our results in nanoscience executed on the NAREGI grid system and expressed our perspective toward peta-scale computing. In section III, we showed that FMO is one of the candidate peta-scale applications. However, it has also become clear that the actual implementations of the FMO method, at present, do not correspond to execution in peta-scale environments, i.e., they cannot solve molecules with more than 10 thousand fragments even if peta-scale computers are available. In order to improve this situation, we started an open-source project for multi-physics simulations including new FMO codes written from scratch.

Generally speaking, it is very difficult for scientific applications to be executed with peta-scale performance, since a simple parallel scheme fails when the total number of CPUs is of the order of 1,000,000. Thus, it will be necessary for the application itself to become multi-layered in order to use the resources efficiently, corresponding to the hardware architecture of peta-scale computers. The FMO calculation studied in this paper has a two-stage structure in its original algorithm: ab initio calculations are performed for each fragment and fragment pair, while the interactions between these fragments are described by classical electro-static potentials. This theoretical reconstruction of the original ab initio MO method into a layered form is considered the key to the successful performance of the program.

Scientific calculations with layered structures have been carried out as multi-physics/multi-scale simulations in various fields. This type of calculation is important not only as a technique for realistic simulations but also as an actual example of peta-scale computing. However, we claim that a deep understanding of the physical and chemical theories is necessary for a proper construction of effective simulations which describe complex aspects of nature. Therefore, theoretical and practical ways of constructing such simulations should be established before the time when peta-scale computers become available. We expect that our implementation of the simulator will be one of the solutions for high-performance scientific simulations in the next generation.

ACKNOWLEDGMENT

This work is partly supported by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) through the Science-grid NAREGI Program under the Development and Application of Advanced High-performance Supercomputer Project.

REFERENCES

[1] The recent activities for peta-scale computing in Japan are reviewed in “The 11th ‘Science in Japan’ Forum” (Washington DC, Jun 16, 2006), http://www.jspsusa.org/FORUM2006/Agenda06.html
[2] NAREGI Project, http://www.naregi.org/
[3] S. Matsuoka, S. Shimojo, M. Aoyagi, S. Sekiguchi, H. Usami, and K. Miura, “Japanese Computational Grid Research Project: NAREGI,” Proc. IEEE, 2005, vol. 93, pp. 522–533.
[4] K. Kitaura, E. Ikeo, T. Asada, T. Nakano, and M. Uebayasi, “Fragment Molecular Orbital Method: an Approximate Computational Method for Large Molecules,” Chem. Phys. Lett., 1999, vol. 313, pp. 701–706.
[5] C. C. J. Roothaan, “New Developments in Molecular Orbital Theory,” Rev. Mod. Phys., 1951, vol. 23, pp. 69–89.
[6] GAMESS, http://www.msg.ameslab.gov/GAMESS/GAMESS.html
[7] T. Nakano, T. Kaminuma, T. Sato, K. Fukuzawa, Y. Akiyama, M. Uebayasi, and K. Kitaura, “Fragment molecular orbital method: use of approximate electrostatic potential,” Chem. Phys. Lett., 2002, vol. 351, pp. 475.
[8] M. Aoyagi, “Loosely Coupled Approach on Nano-Science Simulations,” an oral presentation given at “The 11th ‘Science in Japan’ Forum,” http://www.jspsusa.org/FORUM2006/aoyagi-slide.pdf
[9] Riken Supercomputer Project, http://www.nsc.riken.jp/
[10] OpenFMO Project, http://www.openfmo.org/
[11] T. Takami, J. Maki, J. Ooba, T. Kobayashi, R. Nogita, and M. Aoyagi, “Interaction and Localization of One-Electron Orbitals in an Organic Molecule: Fictitious Parameter Analysis for Multiphysics Simulations,” J. Phys. Soc. Jpn. 76, 013001 (2007) [4 pages]; e-print nlin.CD/0611042.