A New Parallel Domain-Decomposed Chebyshev Collocation Method for Atmospheric and Oceanic Modeling

Spectral methods seek the solution to a differential equation in terms of series of known smooth function. The Chebyshev series possesses the exponential-convergence property regardless of the imposed boundary condition, and therefore is suited for the regional modeling. We propose a new domain-decomposed Chebyshev collocation method which facilitates an efficient parallel implementation. The boundary conditions for the individual sub-domains are exchanged through one grid interval overlapping. This approach is validated using the one dimensional advection equation and the inviscid Burgers’ equation. We further tested the vortex formation and propagation problems using two-dimensional nonlinear shallow water equations. The domain decomposition approach in general gave more accurate solutions compared to that of the single domain calculation. Moreover, our approach retains the exponential error convergence and conservation of mass and the quadratic quantities such as kinetic energy and enstrophy. The efficiency of our method is greater than one and increases with the number of processors, with the optimal speed up of 29 and efficiency 3.7 in 8 processors. Efficiency greater than one was obtained due to the reduction the degrees of freedom in each sub-domain that reduces the spectral operational count and also due to a larger time step allowed in the sub-domain method. The communication overhead begins to dominate when the number of processors further increases, but the method still results in an efficiency of 0.9 in 16 processors. As a result, the parallel domain-decomposition Chebyshev method may serve as an efficient alternative for atmospheric and oceanic modeling.


Introduction
The advance of fast Fourier transform (FFT) and spectral transform method [16] brought the successful global atmospheric modeling using spectral methods [1,15].Comparing with finite difference and finite element (volume) methods, the global spectral models can eliminate pole problems and preserve high accuracy and efficiency due to the "exponential convergence" property.The spectral methods also offer the discrete conservations of kinetic energy and enstrophy, which are very important for the two-dimensional turbulence.In addition to the popularity of spectral method for global models, finite difference and finite element (volume) methods are usually used to deal with limited-area models in atmospheric sciences.The main barrier of spectral methods extended to limited area models is the implementation time-dependent boundary conditions.The application is still limited to small-scale phenomena, see the discussions in [19], [4] and [5].[4] and [5] proposed a Chebyshev spectral method for the limited area atmospheric and oceanic modeling.The method successfully handles the time dependent boundary conditions while retaining the advantage of global spectral methods.The idea of maintaining the exponential convergence property with domain decomposition for spectral methods was initiated by [14].[8,9] proposed an idea which makes hyperbolic equations on complicated geometries to be on squared sub-domains thus easier to execute spectral methods.The advantages of domain decomposition for Chebyshev spectral method is given in [8].[9] introduced both "patched" and "overset" methods for computing hyperbolic equations on complicated domain geometry.A decade later, [12] developed a conservative staggered-grid Chebyshev multi-domain method for compressible flows with patched grids.[10] found that within each subdomain, the polynomial order can be set independently of its neighbors.Chebyshev pseudospectral (collocation) method with domain decomposition is also applied to deal with viscous flow calculation in order to reducing the Gibbs phenomena on entire domain [22].These earlier approaches with Chebyshev domain decomposition solved problems such as hyperbolic equations [8,9], viscous flow [22], and compressible flows [10,12].
Unfortunately, none has directly applied to the atmospheric modeling.
The main objective of this study is to develop and implement the first parallel Chebyshev collocation method with domain decomposition in the atmospheric modeling.This is unique in atmospheric modeling while no periodic boundary constraint is enforced.The multi-dimensional domain decomposition provides the most flexible combination and effective performance for the large-scale atmospheric modeling.We solve the shallow water equations in a limited-area as shown in [5].Since our shallow water model is on the advective form, we choose the collocation method rather than tau method for accuracy based on [5].Domain decomposition is the most efficient way in parallel computing to improving the computational efficiency.Message Passing Interface (MPI) programming is employed in this study.The multi-domain calculations are performed using several computing processes, which reduces a significant amount of computational time and relieves the memory requirement for a large-scale system from the advantage of parallel computing [18].Section 2 briefly introduces the Chebyshev collocation method and the new parallel domain decomposition approach.Section 3 investigates the accuracy of the parallel domain-decomposed Chebyshev method using advection and inviscid Burgers' equations.The characteristics are discussed in terms of the exponential convergence of error.Section 4 presents the simulation results using 2-D shallow water equations.Finally, conclusions are given in section 5.

Methods
The throughout description of the Chebyshev spectral method could be found in [4,5].If a function is expanded by Chebyshev series (1) we get the spectral coefficients by the relation The development of parallel domain decomposition approach [18] for Chebyshev collocation method [2,11] is unique and novel in the atmospheric modeling.The computational saving is significant not only due to the distributed computing for each domain, but also due to the relaxation for time constraint (allowing larger due to larger ), as mentioned in [8].
Note that the spectral method with domain decomposition can be regarded as a variation of "spectral element method" [20].One of the major advantages of this technique is that much of the computation is local to an element.The Chebyshev spectral method uses the non-uniform grid arrangement with finer resolution near the boundary.The domain decomposition enhances the homogeneity of the fine resolution as the number of sub-domains increases, which shall provide more flexible alternative for further adaptive grid refinement.As the climate modeling community relies heavily on parallel supercomputers with many nodes and many processors within a node, similar approach such as the spectral element method or discontinuous Galerkin type method may become more viable.
Minimal communication between sub-domains is required for an efficient atmospheric model.A typical one-grid width overlapped boundaries is used here to exchange information between sub-domains.Fig. 1 shows a schematic diagram for the information exchange along the overlapped boundaries.

Test problems
In order to show the accuracy and convergence of the new domain-decomposed Chebyshev approach, we consider the 1-D linear advection equation (3) The initial and boundary conditions are given by the analytical solution (4) where , and are chosen as those in [3].Here, we also compare our results with the fourth-order finite difference scheme (FD4).Fig. 2 compares the analytical solution, the approximation by Chebyshev collocation method, and FD4 at time t=1.0.The domain is divided into two sub-domains with two overlapped grids, e.g., overlapped with one grid width as in Fig. 1.The degree of freedom of each sub-domain is 12. Fig. 2a shows that the approximation of Chebyshev collocation method with domain decomposition is almost identical to the analytic solution in 1-D advection problem without numerical dispersion.Fig. 2b further shows the error of the numerical results of Chebyshev collocation method with single domain, double domain, and FD4 method with analytical solution at t=1.0.As expected, the absolute accuracy is slightly higher in the single domain than the double domains since the advected Gaussian function is centered at x=0.5, where the double domains have coarser resolution.However, the error convergence in the spectral solutions are uniformly decreasing on the order of as N approaches 64, while the error in FD4 decreases on the order of N. The exponential convergence of the spectral method remains regardless of the number of domains.
Further consider the inviscid Burgers' equation in the limited domain (5) with the initial condition .The boundary conditions are given by the general solution of (5).Thus, the analytical solution gives The time of scale-collapse of u is 1 with the position of scale-collapse at .Given the appropriate and , the analytical solution of (5) at certain x and t can be found numerically for desired accuracy.Fig. 3a shows the numerical results compared with the analytic solution for the case and .Numerical solutions obtained the general pattern of the analytical solution with only 32 total degree of freedom.Fig. 3c shows the error distribution in the physical space.Note that the error in double domains is larger than that in single domain since the scale collapse (sharp front) locates at the coarse collocation points.On the other hand, for another case and , the scale collapse happens near the center of the domain (Fig. 3b), better results are achieved for double domains since the single domain has coarse collocations points there (Fig. 3d).

2-D non-linear shallow water model
In order to further verify the new domain-decomposed Chebyshev collocation method in a more realistic atmospheric model, we examine the vortex formation and propagation using the 2D nonlinear shallow water equations: (6) where u and v are the velocity components in x and y directions, respectively, f is the Coriolis parameter, Q(x, y) is the outer force, is the basic state of geopotential, and h is the total height deviation from .
In this simulation, the vortex formation by the Q-forcing is located at (1000 km, -1000 km).All the model settings are the same as the 2-D test case in [5].We will observe the vortex drift from the easterly background flow to the westerly and recurve due to the effects and the surrounding flow vorticity gradient.The vortex track is schematically shown in Fig. 1.Note the vortex crosses the sub-domain boundaries in our domain decomposition simulations.Once the derivatives have been calculated, RK4 method is used to do the time integration.Fig. 5 shows the results by computing (9) with Chebyshev collocation method on day 3, 6, and 9.The contour lines represent the geopotential field .The results match quite well with [5], and it is encouraging that there is almost no difference between single domain, DD , and DD simulations.The vortex moves smoothly without obvious oscillation when passing the sub-domain boundaries.Fig. 6a and fig.6b show the analysis of the axisymmetric velocity, vorticity on day 3 and 9.All results are almost identical within the round-off error.
To analyze the convergent rate, we use the single domain calculation with the degree of freedom N=192 as the reference state.Fig. 7 compares L2 error between the case using a single domain and the case using double sub-domains, respectively (degree of freedom is N=192 for both cases).Only the results of day 2 (48 hours) and day 4 (96 hours) are shown for generality.These results confirm that the exponential convergence holds regardless of the number of sub-domains.However, the accuracy of the domain decomposed case is slightly less than the single domain results due to the difference in the non-uniform grid spacing.
The parallel performance for the 2-D shallow water equations model are summarized in Table 1 with the speed-up defined as SU=T1/Tp, and the efficiency SU/n; T1 and Tp are the cpu time for single domain and p sub-domains with domain-decomposition, respectively, and n is the number of MPI tasks.The benchmark test is performed on a Linux cluster, in which each computing node has two quad-cores processors.The communication is based on the infiniband connection.Note that the efficiency is great than 1 and increases with the increasing number of MPI tasks for the number of MPI tasks less and equal 8.This leads to a major advantage of the domain-decomposed Chebyshev method.The communication overhead begins to dominate when the number of MPI tasks further increases (e.g., n=16).This mainly results from more data exchange across the computing nodes.
Table 2 shows the conservation property of mass, kinetic energy, and enstrophy in the whole domain after day 3, 6 and 9, respectively.The results suggest the conservation of the first and second moments is reasonably retained using the new domain decomposition approach.

Conclusions
We have introduced Chebyshev collocation method with domain decomposition in the atmospheric modeling.The sub-domain boundary information exchange is by overlapping the sub-domains in one grid width.By the property of Chebyshev grids setting and consider the relation between and with CFL condition, we can enlarge the with Chebyshev domain decomposition compared with single domain.Our domain decomposition Chebyshev collocation method indicates the exponential convergence property in 1-D linear advection model.In the test of inviscid Burgers' equation, we integrate the model up to the shock formation time.We show that the domain decomposition spectral method in general yields smaller errors when compared to the single domain calculations.In a more realistic atmospheric modeling with a 2-D shallow water model, the domain decomposition Chebyshev method gives results identical to the single domain spectral method with a error on the order of 第 18 屆全國計算流體力學研討會 宜蘭，中華民國一百年八月 18th CFD Taiwan Yilan, August, 2011 when degrees of freedom is considered.The domain decomposition spectral method is capable of a stable integration of 9 days in our test.The time-splitting method [19] is also tested in our 2D shallow water model.Since 4 of 6 derivative terms in (9) need to be calculated in the small gravity wave time step, the time-splitting method is not efficient in this case, and the additional complication causes the model loses stability.The work of assessing the additional advantage of relaxing the CFL criteria of is ongoing.It is interesting to test Chebyshev domain decomposition method with the immerse boundary condition method in the lateral continental shelf in oceanic modeling.1 The comparison of computational time, speed-up, and efficiency for different numbers of MPI task in 2-D nonlinear shallow water simulation (The degree of freedom is N=192).
Table 2 The conservation property of mass, kinetic energy, and enstrophy after day 3, 6 and 9.
Fig. 1 One-dimensional schematic diagram of the information exchange at the overlapped boundaries.

S D
Fig. 7 The L2 error between the case using a single domain and the case using double sub-domains, respectively (degree of freedom is N=192 for both cases).Label S represents the case using single domain, and label D represents the case using double domains.
and (2) can be calculated efficiently by the Fast Chebyshev Transform.Similar to the Fast Fourier Transform, the Fast Chebyshev Transform reduces the N degree of freedom transformation to operations instead of .This advantage favors * Corresponding author address: H.-C. Kuo, Dept. of Atmospheric Sciences, National Taiwan University, Taipei 106, Taiwan.E-mail: kuo@as.ntu.edu.tw第 18 屆全國計算流體力學研討會 宜蘭，中華民國一百年八月 18th CFD Taiwan Yilan, August, 2011 its potential application in the large-scale atmospheric modeling.

Fig. 2 (
Fig. 2 (a) Comparison between the analytical solution (Exact) and numerical results for the linear advection equation.Label "D" represents the Chebyshev collocation method with double domains.The result with the fourth-order finite difference scheme (FD4) is also superimposed.(b) The L2 error for FD4 and Chebyshev collocation method using single domain (S) and double domain (D), respectively.

Fig. 3 Fig. 4
Fig. 3 Analytical solution and numerical results in single domain and double domains for inviscid Burgers' equation with (a) and and (b) and .The error distribution in the physical space for (c) and and (d) and .

Fig. 5 Fig. 6
Fig. 5 The geopotential field for shallow water model with single domain, DD , and DD on 72, 144, and 216 hours respectively.