Table Of ContentA spectral-Galerkin turbulent channel flow solver for large-scale
simulations
Mikael Mortensena
aDepartment of Mathematics, Division of Mechanics, University of Oslo
7
1
0
2 Abstract
n
Afully(pseudo-)spectralsolverfordirectnumericalsimulationsoflarge-scaleturbulentchannelflows
a
J is described. The solver utilizes the Chebyshev base functions suggested by J. Shen [SIAM J. Sci.
3 Comput., 16, 1, 1995], that lead to stable and robust numerical schemes, even at very large scale.
1 New and fast algorithms for the direct solution of the linear systems are devised, and algorithms
andmatrices for all requiredscalarproducts andtransformsare provided. We validate the solverfor
]
very high Reynolds numbers. Specifically, the solver is shown to reproduce the first order statistics
A
of Hoyas and Jiménez [Phys. Fluids, 18(1), 2006], for a channel flow at Re = 2000. The solver is
N τ
available through the open source project spectralDNS [https://github.com/spectralDNS].
.
h
Keywords: DNS, Fourier, Chebyshev, Biharmonic, Helmholtz, turbulence
t
a
m
[ 1. Introduction
1
v Direct NumericalSimulations (DNS) ofturbulent flowsis a veryimportantresearchtool, utilized
7 across a range of scientific communities [1]. DNS is used extensively to validate statistical models,
8 andto further our understanding ofcomplex mechanismstaking place inside turbulent flows. One of
7
the manyadvantagesofDNSis thatit providesallinformationabouta flow,andquantitiesthat can
3
be very hard to study experimentally, like velocity-pressureinteractions, are trivially extracted from
0
. a DNS. In this regard, DNS both complements and extends the knowledge we are able to extract
1
from experiments.
0
ThemostcommonlyknownDNSusesimplegeometries,becauseturbulencephysicsmaythenmost
7
1 easilybeisolatedandstudied. Isotropicandhomogeneousturbulenceareusuallystudiednumerically
: in triply periodic domains, which allows for a spectral Fourier decomposition in all three spatial
v
directions. Spectral methods are often favoredin DNS due to their superior accuracy and resolution
i
X
properties. One example is given in the DNS review of Moin and Mahesh [1], who report that, for
r similar accuracy in first derivatives, a second-order finite difference scheme requires approximately
a
5.5 times more points than Fourier, in each spatial direction, whereas for a 6’th order Padé scheme
the factor is about 1.6.
In this paper we will consider the pressure driven turbulent channel flow, where there are two
periodic directions that can be handled with Fourier expansions, and a non-periodic wall-normal
direction that requires a different type of discretization. There are many challenges associated with
this inhomogeneity not faced by the pure Fourier solvers, but the first problem at hand is the dis-
cretization. Early DNS channel solvers, see, e.g., [2, 3, 4], typically used a Chebyshev expansion
for the wall-normal direction and, as such, were still able to obtain spectral accuracy in all three
Email address: [email protected] (MikaelMortensen)
Preprint submitted toElsevier January 17, 2017
spatialdirections. AChebyshev-tautechnique,thatutilize the recurrencerelationsofthe Chebyshev
polynomials, was used to approximate derivatives, and the coefficient matrices that appeared (tridi-
agonal)could then be inverteddirectly and efficiently [4]. A downside to the Chebyshev-taumethod
is usually quoted [5] as numerical instability and roundoff errors, caused by the recurrence relation,
and severe condition numbers of the coefficient matrices. For Chebyshev-tau methods the condition
numbers have been reported to grow with size as (N8), for a discretization using N points in the
O
wall-normaldirection. Discouragedbythese numbers,allmajorrecentchannelflowsimulationshave
found other, non-spectral, ways of discretizing the non-periodic direction.
ThelargestknownchannelsimulationstodatehavebeenperformedbyLeeandMoser[6],where,
forRe 5200,theyusedacomputationalboxofresolution[10240 1530 7680]forthestreamwise,
τ
≈ × ×
wall-normal and spanwise directions, respectively. Lee and Moser used seventh-order B-splines for
thewall-normaldirection. OthersimulationsofsimilarmagnitudehavebeenperformedbyHoyasand
Jiménez[7,8], Lozano-DuranandJiménez[9]andBernardini,PirozzoliandOrlandi[10]. Bernardini
et al. used second-order finite differences throughout. The Jiménez group used dealiased Fourier in
thetwoperiodicdirectionsandseventh-ordercompactfinitedifferences,withfourth-orderconsistency
andextendedspectral-likeresolution[11],forthewall-normaldirection. ThesolverbyJiménez’group
isreportedtoswitchfromChebyshevtofinitedifferencesiftheresolutionisaboveacertainthreshold
[8](reachedaroundRe =1000). Inotherwords,theyattempttouseafullyspectraldiscretizationfor
τ
aslargeRe aspossible. Aspreviouslymentioned, spectralmethodsareattractivefortheir accuracy
τ
andresolutionproperties,thataresuperiortothoseofanyfinitedifferenceorsplinemethod. Assuch,
itisdesirabletodevelopfullyspectralsolversthatcanbeusedforlarge-scaleturbulencesimulations.
In his seminal papers [12, 13], Jie Shen describe how to construct Legendre and Chebyshev
basis functions that lead to sparse matrices, susceptible to very fast direct solvers. To the author’s
knowledge,the bases have not been used for large-scalechannel flow simulations, and algorithms for
the required direct solvers have not, until now, been devised. Shen’s bases have been used for the
NavierStokes equationsbefore,though. Bouchonet.al. [14]describea spectral-Galerkinformulation
verysimilar to the one usedin this paper. However,they choosethe Legendrebasis overChebyshev,
which has some consequences when aiming for large-scale, because fast transforms are required in
moving from spectral to physical space, and back again. For Fourier and Chebyshev bases, the
Fast Fourier Transforms (FFTs of (NlogN)) apply directly. However, until recently, the discrete
O
Legendre transforms required (N2) operations. This scaling has now been improved by several
O
authors, as recently reviewed by Hale and Townsend [15, 16], but the methods are still not quite on
parwiththeFFTs. Forexample,HaleandTownsenddescribean (N(logN)2/loglogN)algorithm,
O
using intermediate fast transforms from Legendre to Chebyshev coefficients.
In this paper we will describe and assessa spectral-Galerkinchannel flow solverbased onFourier
and Shen’s Chebyshev basis [13]. The solver will consist of parts that scale at worst as NlogN, for
a 1D problem of size N, and as such as N3logN for a 3D box of size N3. We will give a proper
descriptionofthetheoreticalbasisinSec2,thediscretizationofNavier-StokesequationsinSec3,and
we will describe necessary algorithms, including a new fast direct solver for the biharmonic problem
that arise, in Sec 4. We will finally show, in Sec 5, that roundoff is not a major issue, and that the
Shen-Fourier spectral-Galerkin method is in deed applicable to large-scale simulations. The solver
hasbeenimplementedintheopensourcecodespectralDNS[17],wherethebulkofthecodeiswritten
in high-level Python [18], with critical parts migrated to Cython [19] for efficiency.
2
2. Basis functions and fast transforms
The Navier-Stokes equations, used to describe turbulent flow in a doubly periodic channel, can
be written in rotational form as
∂u
=H+ν 2u p˜,
∂t ∇ −∇
u=0, (1)
∇·
where u(x,t) = (u,v,w) is the velocity vector, x = (x,y,z) and t are position and time, and the
nonlinearity H(x,t) = ( , , ) = u ω, where ω = u. The constant dynamic viscosity
x y z
is denoted as ν and p˜(xH,t) iHs a Hpressure m×odified to accou∇nt×for both the driving force, β(t), and
the kinetic energy, i.e., p˜= p+u u/2+βy, where p is the instantaneous pressure normalized by
·
a constant density. The computational domain is Ω = [ 1,1] [0,L ] [0,L ], with channel walls
y z
located at x = 1, such that no-slip applies at u( 1,y,−z,t)=×0. The w×alls are spanning the y z
± ± −
plane and the equations are periodic in the y and z directions with periodic lengths L and L ,
y z
respectively.
The domain Ω is discretized using N =(N ,N ,N ) intervals, where the two periodic directions
x y z
use uniform intervals. The computational mesh is given as
jL kL
X = x R3 (x ,y ,z )= h(i), y, z ,where
N i j k
∈ | N N
n (cid:18) y z (cid:19)
(i,j,k) [0,1,...,N ] [0,1,...,N 1] [0,1,...,N 1] , (2)
x y z
∈ × − × −
o
where x =h(i) represents
i
cos iπ i=0,1,...,N for Chebyshev-Gauss-Lobattopoints,
h(i)= Nx ∀ x (3)
cos(cid:16)(2i+(cid:17)1)π i=0,1,...,N for Chebyshev-Gauss points.
2Nx+2 ∀ x
(cid:16) (cid:17)
The spectral-Galerkin method makes use of a three-dimensional scalar product in the weighted
L2(Ω) space, that is defined as
σ
u,υ = u(x)υ∗(x)σ(x)dxdydz, (4)
h iσ
ZΩ
where υ∗ is the complex conjugate of the test function υ and the weights σ are unity for periodic
directions and σ(x) = 1/√1 x2 for the inhomogeneous direction. In this work we will make use
−
of the discrete weighted l2(Ω) space, where quadrature is employed for the integration. As such, we
σ
redefine the scalar product as
Nx Ny−1Nz−1
∗
u,υ = u(x ,y ,z )υ (x ,y ,z )σ(x ), (5)
h iσ i j k i j k i
i=0 j=0 k=0
X X X
that is more amendable to the fast transforms that will be defined later in this section.
The Navier Stokes equations are discretized using Fourier basis functions for the periodic direc-
tions,andacombinationofChebyshevpolynomialsinthe wall-normaldirection. Threedifferentsets
of basis functions and function spaces are relevant for the wall-normaldirection
φ (x)=T (x), W =span φ Nx , (6)
k k Nx { k}k=0
φ¯ (x)=T (x) T (x), W¯ =span φ¯ Nx−2, (7)
k k − k+2 Nx { k}k=0
2(k+2) k+1
φˇ (x)=T (x) T (x)+ T (x), Wˇ =span φˇ Nx−4, (8)
k k − k+3 k+2 k+3 k+4 Nx { k}k=0
3
where T (x) is the k’th degree Chebyshev polynomial of the first kind. The basis functions and
k
function spaces in (7) and (8) were suggested by Shen [13], and satisfy, respectively, the boundary
conditions φ¯ ( 1)=0, φˇ ( 1)=0 and φˇ′( 1)=0.
k ± k ± k ±
Three-dimensional basis functions and function spaces, that are periodic in y and z directions,
can now be defined as
ψk(x)=φl(x)eı(my+nz), VN =span ψk(x): k KN , (9)
{ ∈ }
ψ¯k(x)=φ¯l(x)eı(my+nz), V¯N =span ψ¯k(x): k K¯N , (10)
{ ∈ }
ψˇk(x)=φˇl(x)eı(my+nz), VˇN =span ψˇk(x): k KˇN , (11)
{ ∈ }
where ı = √ 1. The wavenumbermesh, K , for space V , is defined by the Cartesian product of
N N
−
wavenumbers from the two periodic and the inhomogeneous wall-normal direction: K (l,m,n) =
N
Kx(l) Kp(m,n), where
×
2πm 2πn
Kp = (m,n)= , ,where
L L
n (cid:18) y z (cid:19)
N N N N N N
y y y z z z
(m,n) [ , +1,..., 1] [ , ,+1,..., 1] (12)
∈ − 2 − 2 2 − × − 2 − 2 2 −
o
and Kx(l) = l Zl = 0,1,...,N . The two remaining wavenumber meshes, K¯ and Kˇ , differ
x N N
fromK only{in∈the|rangeofthefirst}indexsets,K¯xandKˇx,endinginN 2andN 4,respectively
N x x
− −
(see Eqs. (7) and (8)).
In the spectral-Galerkin method we look for solutions of the velocity components of the form
1
u(x,t)= uˆk(t)ψˇk(x), (13)
N N
y z
kX∈KˇN
1
v(x,t)= vˆk(t)ψ¯k(x), (14)
N N
y z
kX∈K¯N
1
w(x,t)= wˆk(t)ψ¯k(x), (15)
N N
y z
kX∈K¯N
where uˆk(t) = uˆ(l,m,n,t) are the expansion coefficients for the velocity component in x-direction
(and similar for the other two components) and the scaling by N and N is merely for convenience
y z
and compliance with the definition used later for the inverse discrete Fourier transform. Note that
from now on we will simply use the notation uˆ for uˆk(t), when it is possible to simplify without loss
of clarity. Likewise we will simply use u for u(x,t).
For an efficient method it is crucial to be able to compute u quickly from uˆ, or, vice versa, to
compute uˆ quickly from u. This may be achieved using the fast transform methods to be defined
next. Consider first how to compute u from the known expansion coefficients uˆ. Writing out the
summation terms, the expression (13) may be evaluated on the entire mesh (2) as
1 1
u(x ,y ,z ,t)= uˆ(l,m,n,t)φˇ(x )eımyjeınzk x X ,
i j k l i N
N N ∀ ∈
z y
n m l
X XX
Sˇ−1
x
| F−{1z }
y
| F−{1z }
z
u=|ˇ−1(uˆ)= −1( −1(ˇ{−z1(uˆ))), } (16)
T Fz Fy Sx
4
where ˇ−1 is used as short notation for the complete inverse transform in space Vˇ , and −1 and
−1reTpresentinversediscreteFouriertransformsalongdirectionsyandzrespectivelNy. ForsiFmyplicity,
wFez have introduce here a special notation called the inverse Shen transform, −1, that is used to
Sx
transform coefficients from spectral to physical space in a series expansion that is using either one
of the bases in (6, 7, 8). The inverse Shen transform, −1, is performed along the wall-normal x
Sx
direction, and it may be computed using fast Chebyshev transforms for all the three bases in (6, 7,
8), see Alg 6 in the Appendix. The transforms areslightly different for the three basesand −1, ¯−1
and ˇ−1 areusedtodistinguishbetweenthemwithobviousnotation. Similarly, ¯−1 and S−x1 deSfixne
theiSnvxersetransformsforspacesV¯ andV . NotethatatransforminanyonediTrectionisTperformed
N N
overallindices intheothertwodirections. Forexample,forthetransforminthex-directionwehave
u(x ,m,n,t)= ˇ−1(uˆ(l,m,n,t)) i [0,1,...,N ] and m,n Kp, (17)
i Sx ∀ ∈ x ∈
and similar for the other two directions.
Fast transforms may also be used in the scalar product defined in Eq. (5), using basis ψˇk for υ
u,ψˇk =h u(xi,yj,zk,t)e−ınzke−ımyjφˇl(xi)σ(xi) k KˇN,
σ ∀ ∈
D E Xi Xj Xk
F
n
| {zFm }
| {z Sˇ }
l
=h|ˇ(u)=hˇ( ( (u))){.z } (18)
l m n
S S F F
Here hˇ() denotes the complete three-dimensional scalar product and h = L L N−1N−1 is a con-
S · y z y z
stant. and represent discrete Fourier transforms in z- and y-directions, respectively, and
n m
ˇ()=F(,φˇ) isFused to represent the forwardShen scalar product in the x-direction. The weights,
l l σ
S · ·
σ(x ), required for the Shen transforms, are defined as
i
π i=0,1,...,N for Chebyshev-Gauss-Lobatto points,
σ(xi)=(cNixNπ+x1 ∀∀i=0,1,...,Nxx for Chebyshev-Gauss points, (19)
where c =c =2 and c =1 for 0<i<N , see Sec. 1.11 of [20].
0 Nx i x
The scalarproducthˇin(18)does notrepresentacomplete transform. To find a transformation
S
from physical u to spectral uˆ, we make use of Eq. (13) directly on the left hand side of (18) and use
extensively the discrete orthogonality of the Fourier basis functions
1
u,ψˇk = uˆ(q,r,s,t)ψˇ(q,r,s),ψˇ(l,m,n) k KˇN,
σ NyNz * + ∀ ∈
D E (q,r,Xs)∈KˇN σ
Nx−4 Nx
=h φˇ (x )φˇ(x )σ(x )uˆ(q,m,n,t),
q i l i i
q=0 i=0
X X
Nx−4
=h Bˇ uˆ(q,m,n,t). (20)
lq
q=0
X
Here Bˇ = (φˇ ,φˇ) = Nx φˇ (x )φˇ(x )σ(x ) are the components of a banded mass matrix with
lq q l σ i=0 q i l i i
only 5 nonzero diagonals, see Table 6. The complete transformation is now obtained by setting Eq.
P
5
(20) equal to (18) and solving for uˆ. Moving to matrix notation we get
Nx−4
h Bˇ uˆ(q,m,n,t)=hˇ(u)(l,m,n,t) k Kˇ ,
lq N
S ∀ ∈
q=0
X
uˆ= ˇ(u)=Bˇ−1ˇ(u), (21)
T S
where ˇ(u) denotes the complete transformation, such that u = ˇ−1(ˇ(u)). Note that, since Bˇ
assembTles to a pentadiagonal matrix for its decoupled odd and evenTcoeffiTcients, the solution (Bˇ−1)
canbe obtainedveryfast andthe complete transformationthus requiresa fast Chebyshevtransform
( (N logN )) and a fast linear algebraic solve ( (N )) for the wall-normal direction, given m and
x x x
nO. Similar transforms and ¯ are defined for thOe two other bases (6) and (7), using mass matrices
withcomponentsB =T(T ,TT) (diagonal)andB¯ =(φ¯,φ¯ ) (threenon-zerodiagonals)andscalar
lq q l σ lq l q σ
products ()=(,T ) and ¯()=(,φ¯) . See Alg. 5 and 6 in the Appendix for algorithms for all
l l σ l l σ
S · · S · ·
required transforms.
3. Discretization of Navier Stokes equations
TheNavierStokesequations(1)aresolvedusingaschemeproposedbyKim,MoinandMoser[4].
This scheme is developed by taking the Laplacian of the wall-normal momentum equation and the
curl of the momentum equation. Following elimination of the pressure, the equations to solve are
∂
2u=h +ν 4u, (22)
u
∂t∇ ∇
∂g
=h +ν 2g, (23)
g
∂t ∇
∂u
f + =0, (24)
∂x
where
∂v ∂w
f = + , (25)
∂y ∂z
∂w ∂v
g = , (26)
∂y − ∂z
∂ ∂ ∂ ∂2 ∂2
y z
h = H + H + + , (27)
u −∂x ∂y ∂z ∂y2 ∂z2 Hx
(cid:18) (cid:19) (cid:18) (cid:19)
∂ ∂
z y
h = H H . (28)
g
∂y − ∂z
Thetworemainingvelocitycomponentsarecomputedfromthedefinitionsoff andg. Thebiharmonic
equation(22)issolvedwithboundaryconditionsu( 1)=u′( 1)=0,wheretheNeumannconditions
± ±
follow from the continuity equation. The boundary conditions for Eq. (23) are g( 1)=0.
±
We consider the spectral-Galerkin discretization of (22), (23) and (24), using a central difference
intime, with Crank-Nicolsonfor the linear terms andan Adams-Bashforthmethod for the nonlinear
terms. To this end, the discretization in time is performed with a constant time step t > 0, such
△
that time is represented discretely as t =κ t, κ [0,1,...], and variables with a time superscript,
κ
like uκ =u(t ). We get the variationalformu△lation∈: with u0(x),u1(x),g0(x) and g1(x) given,for all
κ
6
κ>1 find uκ+1 Vˇ , gκ+1 V¯ and fκ+1 V¯ such that
N N N
∈ ∈ ∈
2(uκ+1 uκ)
∇ − ,ψˇk = huκ+21,ψˇk +ν 4uκ+12,ψˇk ψk VˇN, (29)
(cid:28) △t (cid:29)σ D Eσ D∇ Eσ ∀ ∈
gκ+1 gκ
− ,ψ¯k = hgκ+21,ψ¯k +ν 2gκ+12,ψ¯k ψk V¯N, (30)
(cid:28) △t (cid:29)σ D Eσ D∇ Eσ ∀ ∈
∂uκ+1
fκ+1,ψ¯k σ = ∂x ,ψ¯k ∀ψk ∈V¯N. (31)
(cid:28) (cid:29)σ
(cid:10) (cid:11)
Heresuperscriptκ+1isusedtorepresentCrank-Nicolsonforlinearterms(e.g.,uκ+21 =0.5(uκ+1+uκ))
2
and Adams-Bashforth for nonlinear (e.g., hκu+12 =1.5hκu−0.5hκu−1).
With fκ+1 and gκ+1 known, the two remaining velocity components are then computed by pro-
jection to the Dirichlet space V¯ : Find vκ+1 V¯ and wκ+1 V¯ such that
N N N
∈ ∈
∂vκ+1 ∂wκ+1
fκ+1,ψ¯k σ = ∂y + ∂z ,ψ¯k ∀ψ¯k ∈V¯N, (32)
(cid:28) (cid:29)σ
(cid:10) (cid:11) ∂wκ+1 ∂vκ+1
gκ+1,ψ¯k σ = ∂y − ∂z ,ψ¯k ∀ψ¯k ∈V¯N, (33)
(cid:28) (cid:29)σ
(cid:10) (cid:11)
whichsimplifiesconsiderablybecauseallthederivativesareinperiodicdirections. Writteninspectral
space Eqs. (32) and (33) become simply the algebraic expressions
fˆκ+1 =ımvˆκ+1+ınwˆκ+1 k K¯0, (34)
k k k ∀ ∈ N
gˆκ+1 =ımwˆκ+1 ınvˆκ+1 k K¯0, (35)
k k − k ∀ ∈ N
whereK¯0 isusedtodenotethattheseequationscanbesolvedforallwavenumbersexceptm=n=0.
N
For m=n=0 we solve instead the momentum equations in y and z directions (see Eq. (1)):
vκ+1 vκ
−t ,ψk = Hyκ+21,ψk σ +ν ∇2vκ+12,ψk σ− βκ+1,ψk σ ∀k∈K¯x×Kp(0,0), (36)
(cid:28)wκ+1△ wκ (cid:29)σ D E D E (cid:10) (cid:11)
− ,ψk = zκ+12,ψk +ν 2wκ+12,ψk k K¯x Kp(0,0). (37)
(cid:28) △t (cid:29)σ DH Eσ D∇ Eσ ∀ ∈ ×
Note that the regular pressure is eliminated since m = n =0, and that (36) is the only place where
the driving force, or the mean pressuregradient,β, enters the equations. Also, since β is constantin
space, the term <βκ+1,ψ¯k >σ will only be non-zero for l=m=n=0.
The final step of the method is to rewrite all equations on matrix form, using one-dimensional
scalar products. Inserting the expansion (13) for u in Vˇ , and similar for g,h and h in V¯ , the
N u g N
inner products required to solve Eqs. (29-31) are
∇4u,ψˇk σ =h φˇ′q′′′,φˇl σ−2(m2+n2) φˇ′q′,φˇl σ+(m2+n2)2 φˇq,φˇl σ uˆq, (38)
D∇2u,ψˇkEσ =hh(cid:16)φˇ′q′,φˇl (cid:17)σ−(m2+n2) φˇ(cid:16)q,φˇl σ(cid:17)uˆq, (cid:16) (cid:17) i (39)
D∇2g,ψ¯kEσ =hh(cid:16)φ¯′q′,φ¯l(cid:17)σ−(m2+n2)(cid:16)φ¯q,φ¯l(cid:17)σ igˆq, (40)
(cid:10)∂∂ux,ψ¯k (cid:11) =hh(cid:16)φˇ′q,φ¯l (cid:17)σuˆq, (cid:0) (cid:1) i (41)
(cid:28) (cid:29)σ (cid:16) (cid:17)
hu,ψˇk σ =h −(m2+n2) φ¯q,φˇl σHˆx,q−ım φ¯′q,φˇl σHˆy,q−ın φ¯′q,φˇl σHˆz,q , (42)
D E h (cid:16) (cid:17) (cid:16) (cid:17) (cid:16) (cid:17) i
hg,ψ¯k σ =h ım φ¯q,φ¯l σHˆz,q−ın φ¯q,φ¯l σHˆy,q , (43)
(cid:10) (cid:11) h (cid:0) (cid:1) (cid:0) (cid:1) i
7
where for brevity in notation (as before) it is understood that the scalar products act along the
first dimension of the transformed variables, i.e., (φˇ′,φ¯) uˆ is short for the matrix vector product
q l σ q
Nx−4(φˇ′,φ¯) uˆ(q,m,n,t),forallmandn. Thescalarproductsareusedtosetuplinearsystemsof
q=0 q l σ
equationsforthe inhomogeneouswall-normaldirection. Allscalarproducts (, ) canbe represented
σ
Pby sparse matrices. The required matrices with components Bˇ = (φˇ ,φˇ) ·,B·¯ = (φ¯ ,φ¯) ,Mˇ =
lq q l σ lq q l σ lq
(φ¯ ,φˇ) ,C¯ =(φˇ′,φ¯) andCˇ =(φ¯′,φˇ) aregiveninTable6,Aˇ = (φˇ′′,φˇ) ,A¯ = (φ¯′′,φ¯)
q l σ lq q l σ lq q l σ lq − q l σ lq − q l σ
aregivenin[13],whereasQˇ =(φˇ′′′′,φˇ) isgivenbelowinEqs(57)-(60). Notethatthematricesare
lq q l σ
computedusingquadrature,whichhassomeimplicationsforChebyshev-Gauss-Lobatto(GL)points,
where the rows of the highest modes differ from those presented in Lemma 2.1 and 3.1 of [13]. This
disagreement,thatfollowsfrominexactquadratureatthehighestmode,explainstheinclusionofthe
c termformatrixBˇ andthec termformatrixB¯. ThematrixwithcomponentsB =(T ,T )
k+4 k+2 lq q l σ
is also different in the highest mode from Eq. 2.7 of [13], but agrees with Eqs. 1.135 and 1.136 of
[20].
Assemblingallscalarproducts,thefinalmatrixformofEqs. (29),(30)and(31),thatcanbeused
to solve for uˆκ+1, gˆκ+1 and fˆκ+1, given wavenumbers m and n, are now found as
Hˇuˆκ+1 = 2Aˇ 2z2Bˇ Hˇ uˆκ+ tˇ(hκ+1/2) k Kˇ , (44)
− − △ S v ∀ ∈ N
H¯gˆκ+1 =(cid:16)2B¯ H¯ gˆκ+ (cid:17)t¯(hκ+1/2) k K¯ , (45)
− △ S g ∀ ∈ N
B¯fˆκ+1 =(cid:0)C¯uˆ (cid:1) k K¯N. (46)
∀ ∈
The coefficient matrices are given as
ν t 2z2+ν tz4
Hˇ(m,n)= △ Qˇ+ 1+ν tz2 Aˇ △ Bˇ m,n Kp, (47)
− 2 △ − 2 ∀ ∈
ν t (cid:0) ν tz2(cid:1)
H¯(m,n)= △ A¯+(1+ △ )B¯ m,n Kp, (48)
− 2 2 ∀ ∈
where z2 =m2+n2. Equations (36) and (37) are also written on matrix form as
H¯vˆκ+1 = 2B¯ H¯ vˆκ+ tB¯ ˆyκ+21 t¯(βκ+1) l K¯x,m=n=0, (49)
− △ H −△ S ∀ ∈
H¯wˆκ+1 =(cid:0)2B¯ H¯(cid:1)wˆκ+ tB¯ ˆzκ+12 l K¯x,m=n=0. (50)
− △ H ∀ ∈
Finally, the nonl(cid:0)inear term(cid:1)s are computed with the recipes given in Eqs. (42, 43). To this end
the required vector Hˆ is found by projecting to the Dirichlet vector space V¯3, which corresponds to
N
transforming the nonlinear cross product, evaluated in real space, over the entire mesh
ˆ = ¯((u ω) ) i=x,y,z. (51)
i i
H T × ∀
The curl, ω = (g,∂ u ∂ w,∂ u ∂ v), is computed by projecting each individual term to its
z x y x
− −
appropriate spectral space, before transforming back to physical space. To this end, the two terms
∂ v and ∂ w are projected to V (requires C =(φ¯′,T ) , see Tab. 6), whereas the remaining ∂ u
x x N lq j k σ y
and ∂ u are projected to Vˇ . With compact notation using the transforms, we obtain
z N
∂ v(x)= −1(B−1Cvˆ) x X , (52)
x N
T ∀ ∈
∂ w(x)= −1(B−1Cwˆ) x X , (53)
x N
T ∀ ∈
∂ u(x)= ˇ−1(ımuˆ) x X , (54)
y N
T ∀ ∈
∂ u(x)= ˇ−1(ınuˆ) x X . (55)
z N
T ∀ ∈
8
Here, with a slight abuse of notation, the left hand side is simply representing the respective ex-
pressions evaluated on the quadrature points in the real mesh X . Note that the nonlinear term is
N
also generally in need of dealiasing, at least for the two periodic directions, but this is not discussed
further in this paper.
4. Implementation
An outline of the solution procedure, used for the numerical method described in Sec. 3, is given
in Algorithm 1.1 The major computational cost in Alg. 1 comes from computing the nonlinear
convection and solving for Eqs. (44, 45). We consider in this section the linear solvers.
Algorithm 1: Solution procedure for Navier Stokes equations.
Initialize u0(x) and u1(x)
Compute uˆ0,uˆ1,gˆ0 and gˆ1
k k k k
Set parameters (e.g., mesh and viscosity) and end time T
Compute LU decompositions of Hˇ(m,n),H¯(m,n) m,n Kp
∀ ∈
t t
←△
κ 1
←
Compute nonlinear convection Hˆ0(uˆ0,gˆ0)
k k k
while t<T do
Compute nonlinear convection Hˆκk
Compute right hand sides of Eqs. (44), (45)
Solve Eq.(44) for uˆκ+1 k Kˇ
k N
∀ ∈
Solve Eqs.(45) for gˆκ+1 k K¯
k N
∀ ∈
Solve Eqs.(46) for fˆκ+1 k K¯
k N
∀ ∈
Solve Eqs.(34, 35) for vˆk,wˆk k K¯0
∀ ∈ N
Solve Eqs.(49, 50) for vˆκ+1,wˆκ+1 l K¯x,m=n=0
∀ ∈
κ κ+1
←
t t+ t
← △
Update to new time step
Shen [13] writes that it is possible to solve Eq. (45) directly with essentially the same number of
operations as a pentadiagonal solver. To this end, note that the matrix H¯ decouples into odd and
evencomponents,leadingtotwomatricesoftypeupperHessenberg. AdirectLUdecomposition(see,
e.g., [21]), leads to a lower triangular matrix L¯ with only one subdiagonal and an upper triangular
matrixU¯ thatisdense. However,eachrowinU¯ containsatmostthreedistinctvaluesatU¯ ,U¯
k,k k,k+2
and U¯ , and then U¯ =U¯ j =k+6,k+8,...N 2. Consequently only three diagonals
k,k+4 k,j k,k+4 x
in U¯ need storage and the back sol∀ve can be traversed very−efficiently in (N ). Note also that
x
O
the decoupling into odd and even coefficients leads to two subsystem that may be trivially solved
simultaneously in two threads. For optimal performance, the odd and even coefficient would then
1Note that throughout this paper we are using the pseudocode conventions of Kopriva [20], with some minor
differences. Avectorizationstatementlike{wk}Nk=0←{uk}Nk=0 indicatesthatcomponentsuk,fork=0,1,...,N,are
copied from uk to wk. Similar conventions apply to matrices, e.g., {U1,k}Nk=0 = {V2,k}Nk=0 can be used for copying
row 2 of V to row 1 of U. Broadcasting is also implied, here meaning that for {w}N ← c, where c is a scalar, all
k=0
elementsofwk fromk=0toN getsthescalarvaluec.
9
needto be storedcontiguouslyincomputer memory,andnotalternately,whichwouldbe the normal
way of storing the coefficients.
The biharmonic problem in Eq. (44) is more challenging to solve for efficiently, but it is still
possible, as suggested yet not devised by Shen [13], to find an algorithm that is (N ) for a system
x
of N unknowns. Note that Hˇ is the sum of three matrices Qˇ,Aˇ and Bˇ, and canObe written as
x
Hˇ =ξ Qˇ+ξ Aˇ+ξ Bˇ, (56)
0 1 2
whereξ ,ξ andξ areconstants(dependingonmandn). Thematrix Aˇ Nx−4 containsonlythree
0 1 2 { kj}k,j=0
nonzero diagonals at j = k 2,k,k+2, whereas matrix Bˇ Nx−4 contains five nonzero diagonals
− { kj}k,j=0
at j = k 4,k 2,k,k+2,k+4. The nonzero elements of the upper triangular Qˇ Nx−4 matrix
− − { kj}k,j=0
are given as [13]
Qˇ = 4(k+1)(k+2)2, (57)
kk
−
Qˇ =p q +r s , j =k+2,k+4,...,N 4, (58)
kj k j k j x
−
where
1
p =8k(k+1)(k+2)(k+4)π, q = , (59)
k j
j+3
(j+2)2
r =24(k+1)(k+2)π, s = . (60)
k j
(j+3)
A straight forward direct LU decomposition (without pivoting) of Hˇ can be performed as shown in
Alg. 2, which leads to a lower triangular matrix Lˇ Nx−4 with two nonzero diagonals at j =k 2
{ kj}k,j=0 −
and k 4 plus a unity main diagonal. The upper triangular matrix Uˇ is dense and as such a show
stoppe−rforanefficientbacksolve. However,wenotethat Uˇ Nx−4 containsthreedistinctdiagonals
{ kj}k,j=0
at j =k,k+2 and k+4, whereas the remaining part can be expressed as
Uˇ =ξ (a q +b s ), j =k+6,k+8,...,N 4 and k N 10, (61)
kj 0 k j k j x x
− ≤ −
where a Nx−10 and b Nx−10 are two new vectors that can be computed recursively from Lˇ, as
{ k}k=0 { k}k=0
shown in Alg. 3. If a and b are pre-computed, the total storage requirement for the complete LU
decomposition is less than 7N , since there are two diagonals for Lˇ, three for Uˇ plus a and b. The
x
solution of the complete system can be obtained very quickly with one simple forward elimination
and a back solve, where the back solve is very fast ( (N )) because of (61), leading to a formula for
x
the backwards substitution of Uˇ uˆ =y like (validOfor k N 10)
kj j k x
≤ −
Nx−4 Nx−4
uˆk =yk Uˇk,k+2uˆk+2 Uˇk,k+4uˆk+4 ξ0ak qjuˆj ξ0bk sjuˆj/Uˇkk, (62)
− − − −
j=k+6 j=k+6
k−Xj even k−Xj even
that only requires one new addition to the row-sums for each back-traversedrow, see Alg. (4).
5. Verification and validation
We have in previous sections described a Navier-Stokes solver for channel flows applicable to
large-scale simulations. We have devised algorithms for fast direct sparse solvers, scalar products,
and for the necessary fast transforms between physical and spectral space. The complete solver
10