Massively Parallel Computing of Shortest Raypath and Traveltime in 2-D and 3-D Models a essential for a of geophysical

Based on Huygens' principle, the authors present. an accurate and computationally efficient method to compute the shortest ra)rpath and traveltime in a tw'o-and three-dimensional (2-D and 3-D) space of a dis­ crete block model. The efficiency of the method is achieved through ap­ proximation, \\1hile the accuracy of the calculated traveltime solely depends on machine precession. The accuracy of the raypath is realized by the small increment in the orientation of the ray incidence. Whet.her the computa­ tional efficiency and accuracy can be justified depends on the model's com­ plexity and requirements in its own application. In addition, the feasibility of implementing the algorithm on the Cray T3D Massively Parallel Proces­ sors (MPP) is proposed. The velocity distribution in a 2-D space is discretized into homogeneous polygonal cells. The search for the shortest traveltime and path between two given points can be reduced to a discrete graph searching. In the gen­ eral 3D case, the velocity model is characterized by discrete con,rex blocks bounded by polyhedral surfaces. Although the 3-D algorithm is a straight­ forward extension of the 2-D case, the computing operations in 3-D are much more CPU intensive. The method is demonstrated with examples showing raypaths and wavefronts in 2 0 and 3D block models. On the basis of these examples, the proposed algorithm is capable of solving the optimal raypaths from di ff er­ ent source points in parallel on the MPP S)1stem. (Key

increment in the orientation of the ray incidence. Whet. her the computa tional efficiency and accuracy can be justified depends on the model's com plexity and requirements in its own application. In addition, the feasibility of implementing the algorithm on the Cray T3D Massively Parallel Proces sors (MPP) is proposed.
The velocity distribution in a 2-D space is discretized into homogeneous polygonal cells. The search for the shortest traveltime and path between two given points can be reduced to a discrete graph searching. In the gen eral 3D case, the velocity model is characterized by discrete con,rex blocks
Furthermore, such methods in principle can C)nly be used to c£1lculate one raypath betw·een two specif�ic points in a sequential manner. In a prac.tical 3-D prestack migration p1�ocess, the objecti\1e is to fi ll every grid point in a 3-D volun1e \Vith the fi rst arri\'Lll traveltime frc)m an arbitrary source location. Computing tra\1eltime at all grid points be.comes an extremely CPU intensive task. The con\1entional scheme, thus, becclmes impractically slo\\1 for actL1al data • processing.
A sche1ne based on the expanding wavefr(1nt theor)' �'as recent])' developed and pre sented as an alternative to conventional raypath CCJ1nputing. Reshef and Koslot'f ( 1986) pro p()Sed the earlier versi<) n of this idea based on the eikonal eqt1ation soluti()n. Vidale ( 1988,199()) presented a t' i nite-dif' ference scheme to calculate the trave.ltime at a11 corners ot · cubic cells. Van Trier clnd Symes ( 1991) proposed the imp1·0\ied version ()f, so]\1ing the eikonal equa tion via an upw·ind fi nite-dit' ference scheme. Dellinger ( 199 l) later applied it to anis()tropic media. Qin et c1l, ( 1992) t· urthe.r improv·ed the algorithm b-yr calculating the traveltime along expanding wa\1efronts instead ot' along the expanding squares. Ray·paths fr om the source p()int to all g . rid points can be obtained by t1·acing the secondary SC)urce points in succession through grid points.
The above procedures explicitly c£:1lculate 1n i11imum tra veltimes on the network nodes along an expanding \vavefront. The sh()ftest raypath approach pose . ct many advantages O\ier pre\1 ious schemes. For instance, this method considers the incidence C)f' wa\les in all possible . take. off angles from the source. It is applicable to any type of source-receiver geomet1·y and complex structure model with strc)ngly v'aryi ng vel ocities. More<.)\ier .. it al\\1ays finds the glo bal minimum on the expanding wa\1efront� additionall�r, head waves and dit't'ra ction \�/aves a1·e calculated where they '1;re the first a1Tiving energies, etc.
The scheme's computational efficiency is an()ther mc1jor concern about the shortest r�1ypath 301 approach. Raypaths from a source to all grid points in the model can be constructed in one pass. In fact, the cost of generating a family of raypaths is no more than that of calculating one path between any two specified points. This algorithm is particularly appropriate for imple mentation on a distributed memory MPP system, as an even higher level of parallelism can be re alized by parallelly computing several families of shortest paths from different sources. Although there are no restrictions with the classical ray theory, some drawbacks are in herent just from the sheer nature of the sparse graphs approach. The accuracy of the method in approximating the raypath strongly depends on the co mpactness of a network's disc . rete nodes. If the nodes are close together, the raypaths from the source can be changed with a very small increment in a takeoff angle. An excellent approximation of ray tracing can be produced. Howe. ver, including too many nodes in the network would greatly add to an undesirable bur den on computer resources which would subsequently reduce. the computional efficiency.
It is intuitively clear that a net\\1ork with a smaller number of nodes and larger block sizes theoretically generates a better raypath approximation of the actual ray connecting two speci fied points than the one with more cells. Closely examining Figure 1 indicates that the system with smaller rectangular cells actually generates a poorer approximation. Therefore, as long as the desired geological features are incorporated into the model, selecting a grid as coarse as possible is preferable so as to achieve computational efficiency and accuracy. With regard to a uniform velocity 2-D mode], this figure shows the prob lem of inaccuracy of computing the shortest raypath between two points A and B due to the sparseness of node. s on the networks. With a given node interval distance, network (b) with a smaller cell size clearly generates a poorer raypath approximation to the true ra)' than the raypath in network (a) composed of larger cells and a smaller number of nodes.

B
Motivated b)' those. results, in this study, a more flexible network is designed by allowing the earth model to be composed of homogeneous blocks of irregular size. In the 2-D case, each block is a convex polygon ch. aracterized by polygon vertices and associated ve. locities. A net work of nodes on the boundaries of the block is generated by the algorithm. For general 3-D 302 TA O, V(Jl. 7, N(J. 3, SeJJtember 1996 1nedia, the model consists of blocks ot' con\1ex polyhedron bounded by polygonal f'acets as the bc)undaries. The ne\\/ net�'C)rks normall)' C()ntain t'eV\.1er nodes on the network and, thus� rele . ase the burden C)ll comput<:ttional time and memc)ry. MorecJvei-, tl1is allows the ce ll bot1ndaries to t'i t better on a complex structure. , such as irregularl y shaped fault planes, dipping beds and salt domes, without re.so1·ting to stair-step approximations inherent w·ith square cells. Figure 2 cc)mpares the net\\lork systems for 2-D models .
This paper fi rst i1 lt1strates the algorithm of the construction of the node and edge net\\i()fk in a 2-D block ffi()del and the computation cJt· the traveltime and ray·path trom a given source point to all nodes. A series of' examples of' increasing complexit)' are employe. d to ill ustrate the salient characteristics at· tra\1el-time and raypath computations. A straightf'or�t ard extending ot· the procedure into the 3-D problem is then addressed. To <:lchie\1e maximum perf'()f mance, the key concepts and t' eatures i·egarding machine dependent architec.ture and the c. omputing env ·ironment are necessary. In the following section, the. concept of the massively parallel processing (MPP) computation for a highly scalable heteroge11.ec)US supercomputing S)1stem is t· irst introduced.

IVIACROARCHITECTURE OF CRAY T3D MPP
Cray T3D MPP is a scalable heterogeneC)US computing syste1n with multiple-instruction multiple-data (Iv1IMD) character in architecture . It is scalable fr om I 6 to 1024 nodes with t\\tO processing elements (PEs) in each ( Figure 3a). The DEC Alpha microprocesso1· , with the capability ot· 150 �1FLOPS peak perfo1·mance and a di1·ect-mapped d'1ta cache of 256 lines (32 b)1tes per line), is the heart ot· every PE. The memory-of' the T3D is physically distributed; howev·er, it is globally addressable by each PE. Each PE on a node o\\1ns a single bank of local Dy ·namic Random-Access Memory· (DRAM) external to the Alpha processor. The memory inte.rface between the PE and local mem()ry is the Cray custom interface . This circuitry ex tends the local virtual address to point into a global adct1�ess space. Each PE can directl)1 read from and w·rite to other PE me.mories. Although each PE can directly address the memo1·y of' every' other PE., the PE accesses its O\\'n DRAM much fa ster.
Improving the procedure's ef' t' iciency e.ntails minimizing communication overhead and maximizing sustained computational rates by considering all macr()architecture characteris tics of the T3D MPP system in the design of the algorithm. The tlexible synchronization mecha nism is impo1· tant f' or parallel computing on MPP machines.

SEARCHING THE SHORTEST RAYPATH
The 2-D earth model, regardless of the velocity complexity, is considered to be C()mposed t)f' a fi nite number ot' uniform-velocity polygonal ce lls. The alg()rithm request that all cells be · .
; ., The PEs are con nected by a 3-D torus system interconnected network. This design allows for an extremely fast remote memory access fo r efficient MPP system us age.
convex to ensure that all segn1ents of' the ra)'paths connecting any· two nodes on the boundaries of the cell must lie entirely within it.
In accordance 1W'ith Moser's paper, the present authors do not place nodes at the c. ell ver texes in such a way· that all net\vork nodes are shared by two blocks, except those on the outer boundaries. The input parameters req11ired to construct the network are vertex coordinates, the t'i xed distance. bet\\leen two neighboring nodes and the \ie]ocities associated with each discrete block. Coordinates ()f all network nodes are automatically calculated and distributed over the local memories of all a\1ailable PEs. Meanwhile., f(11A each node, information about the block numbers, which are shared with this node, is also saved. The node coordinates and bloc . k number information are related to the network S)'Stem1s geometry and should not be changed at any time throughout the computation. Therefore, no data communication on these a1�rays is required. The compt1tation efficiency is thus signifi cantly improved. An array ot· the same length needs to be al located to store the trav·eltime values which are updated throughout the processing. Fetching data from a remote PE is con siderably slower than data movring within the local memory (Numrich, 1994� Koeninger, 1992. Accordingly, the stl)rage of nodal coordinates is arranged so that all nodes on the boundaries of one cell are saved into the local memo1�y of a specific PE to minimize data communication.
Dijkstra's algorithm is applied concurre.ntly at al l PEs to pick out the node with the mini mum traveltime \\1 hich can t'u nction as the ne\v secondary sot1rce t .. or the next iteration of radiation. Calculating the traveltime between any two nodes on the cell boundaries is rela tively easy and rapid. The major cost of the . method a1�ises fro m the checking of the \\i'avet'ro nt position and fi nding the secondary· source on the cell boundaries. Howe\1er, the sorting and searching process cannot be vectorized. This is the most time-consuming part of the scheme. i\s the traveltime information is distributed over all PEs, the process actual ly executes in parallel to pick out local minimums. A global minimum is then selected fro m the sequence of local minimums so that the first arriv·al traveltime at this node can be calculate. d. Although interprocessor co mmunications cannot be comp1etel)' avoided, the algorithm is designed to red uce the data movements among PEs as much as possible.

ALGORITHl\1 AND IMPLE1\1ENT A TION
The present authors basically fo llow Mose1·'s procedure in applying Dijkstra's algorithm to find a new source along the expanding wavefro nt. Howe\'er, due to the distinctness in the graphic network put fo rth and the characteristics of the hardware archite.cture, the algorithm used here is not the same as that addressed in Moser's work.
The procedure initial ly determines the coc)rdinates ot' aJI network nodes fro m the given model geometries and the n()de interval. The calculated node positions are evenly distributed over the local memor)' C)f the available PEs. A one-dimensional array with the same dimension of .. node number must be allocated to store the trave1time at each node. During iterative computaional procedure ., the tra\reltime information at each node is also distributed over the PEs . During the computation, all nodes are div·ided into two disjo int sets., open and closed, denc)ted by P and C. Closed set C consists of nodes ha\,. ing had wavefronts t� rom the source propagated through th�m, \\thereas all nodes in the open set P hav·e not been hit by the \v avefro nt.
Initially·, all nodes reside in set P, and the traveltimes at all nodes are set to be infinite. In Ja5·011 C. Kao & Hott;-Wei Chen the beginn.ing of the process, straight ray·paths radiate t' rom the source point into all nodes on the boundaries of the cell containing the source point. The tra\leltimes of all nodes on the cell boundaries are then updated. In set P, a node with minimum traveltime is picked as the second ary· sou1�ce point for the next iteration of wavefront radiating. It is assumed that the. secondar)' source point is considered to be hit by the �,avefront and must be moved into set C from set P. Because each node is shared by· two cells .. the secondary sourc. e becomes a new starting point for emitting rays into all boundary nodes of those cells. The traveltimes on each node of ray· incidence are then updated. A node in set P with the minimum arrival time is again selected as the new secondary source. The same process is repeated until the wavefront goes through all the network nodes thereb)', moving al l nodes from set P into set C.
The. entire procedures described above c.an be presented in terms of a pseudo code and fu rther translated into either Rational Fortran (Ratfor for short) or standard Fortran language. The. algorithm can be formulated as follows: 2. Use p as the secondary source and update the travel time of all nodes in set P which are also on the boundaries of cells containing node p.
3. Move node p into set C. } It should be note. ct that since node coordinates are determined by the. geometry of the block model, the nodal coordinate system is independent of the source position an d should not be changed during the processing. This procedure allows for the computation of raypaths from several separate sources in parallel. Designing a program for massi\'ely parallel computing is based on the Cray MPP Fortran (CRAFT) (Pase et al., 1992), and inter-processor communica tions are facilitated via shared memory access libraIJ' routines (Barriuso, 1994 ).

SYNTHETIC EXAMPLES
In the t'ollo\\ting synthetic applications� twoand th14ee-dimensional models are used to demonstrate the behavior of the algorithn1. A series of examples of increasing complexit)' is employed to illustrate the salient characteristics of travel time computations. A relatively simple and grabe n-like model \V'ith a moderate velocity· contrast is presented. A circular body with the characteristics of high velocity contrast and steep di . P is also used to mimic salt and magma  Figure 4 shows a higher velocity block embedded in a lower velocit)' bac. kground. To create a system to fit the algorithm, the medium is divided into f' ive discrete con\lex polygons in which 287 nodes are generated onto the boundaries. Figure 5 describes the procedures of computing raypath and traveltime f� or a S< . )urce located inside a block. The raypath t� rom a source point to all boundary nodes of the cell are first calculated based on Dijkstra's algorithm. Traveltimes at any position inside the polygonal cells can be estimated from those on the cell boundaries. The ·minimum arrival ti n1e is picked along the cell boundaries as the secondary source. The sec ondar)' source, denoted as C, is used ag ain for the next iteration t() radiate rays to all nodes that share th is source .. The calculated ra)1path and corresponding trav'eltime from C are then up dated again for the next iteration. The procedure is executed in parallel to determine the local minimums. Finally, a global minimum of travel time is se lected t� rom the sequence of local minimums. The traveltime within each block can be obtained by first detenning the traveltime along each ra)'path . A bi-directional linear interpol ation is then performed to al locate the ac tual travel time desired f' or display. The. computation can be pe1·formed over local machines, such as a workstation or PC. Figure 6 shc)w·s the final shortest ra)lpaths and t� irst arrival traveltime isochron contours. Some blank areas in Figure 6 that do not contain raypaths are the result of searching for the t�i nal global minimum of travel time. Figure 7 depicts a somewhat re alistic 2-D graben-like structure model with the geological features of steep normal t· ault planes and tilted fau lt blocks. The velocity change across any adjacent blocks is moderate enough to represent most of the geological t�eatures generally encountered. Figure 8 displays the calcu lated ra)'paths and isochronous contours. ;' . , .  . .
i . --   dit'f'1·acted \vave. f' r onts in the shado\\1 zone. Inside, the ve]()City contrast is such that it prc)v·okes a closure of wa,1et'rc.)nts (focusing) due tc) the i·ef' racted ray·s. The focusing effect, along with refracted, diffracted and head wa\1es are pa1·ticularly distinguishable in Figure 9. Alth()ugh the rnodel is geologically irrelev·ant, the prop()Sed method does not raise an y C()mputational prob lems e\1en for extreme!)' severe first order ve.locity C(Jnt1·asts. The inodel is particularly important to test the algorithm's tlexibility, stability an d accu-1·ac1·. Because the image c_)f the subsu1·face structure is highly· influenced by the accuracy ot' the image condition obtained t'rom dift' erent tra\1eltime generators . A disto1·ted or even vanished image can be obtained w·hen a low velocity body exists bet\V·een the source and the major target structure to be imaged. This can occur in the presence of an irregular})' shaped low velocity body· that ot'ten creates a shado\v zone. The. main ad\1antage of' using this approach is that traveltime information can be <)btained 11ot only along the structu1·e boundary and within a low velocity circular body but also within the shadovv zone. Also, the large velocity con.trast bet\veen the salt and the. surrounding sediments frequent . 1)1 pr()duces anl)malously large (post critical) retlections a11d C()1·r espondingl)' reduced amplitudes t' or refl ections f� rom deeper struc tures. A . highly dist()rted lo\\1 -wavelength image of very· large amplitude that contaminates the actual image obtained can be observed in the n1igrated se. ction when post-critically refracted wa\1es are included.

Three-dimensional Model
The concept ot· computing in a 2-D medium can naturally· be extended into the 3-D case. The 3-D model is still cc)nside1�ect here as a piecewise smooth medium consisting of discrete blocks. Each block is a constant velocity con\;ex polyhedron with pol)1gonal fa cets as bound aries. All net�·ork nodes are evenly spread o',r er the boundary t'acets. It should be noted that all nodes are placed inside the boundary fac et such that each one is shared by two and only two blocks. Defini11g the input parameters f' or a complex 3-D model is rather cumbersome. Ho\\1ever, once. the network system is created, the computational scheme in the 3-D case is the same as that in the 2-D model, except that the rnemol)' size is great])' increased and the tloating p<)int operations are much more intensi\1e.
For simplicity, the procedure was experimented over a 3-0 body consisting of a higher \,elocity· block embedded into a unit'orm velocity background as sho\\1n in Figure I O(a). The 3-D body is divided into seven discrete subregions, each of' \\'hich is a convex pol)1hedron. The nodes at each boundary facet are generated f' r<)m the scheme. Figure 1 O(b) depicts the polyhe dral regions and nc . )de.s on one facet. Given a sc)urce. point at the center of the top surt'ace, Figure 11 (a) shows the sho1�test rays originating fr om the source to specified nodes on the ()Ute1· pla11es. Figure 1 1 (b) depicts the wa\1ef'1·()nts c.lt seve1�a1 time steps. Figure 12 shows the successive expanding (1t· the fi rst arrival surf,tces in the 3-D medium.

DISCUSSION AND CONCLUSIONS
In this study, the authors have dev'eloped and tested a paralle.l algorithm of the sparse graphic searching method which employs Dijkstras' algorithms and Hugens' principle of find ing the shortest raypath and their corresponding t1·av1el-time t'o r 2-D and 3-D models. Ray'paths   from a source to all gr.id points in the mc)del can be constructed in ()ne pass. An even higher level of parallelism c'1n be achie\'ed by the parallel computing of sev·eral t'amilies of shortest paths from different sources. The approach in this paper is relatively new with respect tc) the parallel nature of the algorithm. ,L\dditionally ·, the impro\1ement of �1oser's method p1-ov·ides an intuitive approach to calculate trave lti1ne a11d raypath. The computational efficiency and accuracy of the sche111e are the 1najor concerns of· the current sho1-test raypath approach. The scheme is more flexible than the conventi()nal approach for it allows t'o r the earth model to be composed of' a homogeneoLts C()nvex polygon ()t� irregular size. The proposed method also allows for coarse gtid along the edges t' or traveltime and raypath calculation. The implementa ti()n of the parallel versic)n of· the alg()rithm on a distributed memory MPP system ensures the entire scheme is rapid and efficient.
Since the approach here. is developed f'or a homoge11ous cell, the accurac)' of the com puted traveltime is only related to machine run-off error (i.e., machine epsilon). Comparing the ac curacy betv.1een the · calculated and analytical solutions is unnecessary, which is in con- trast to the fi .nite-diffe.rence approach of travel time computing that in . herits an anisotropy property of traveltime values at each grid point. The computational efficienC)' and accuracy can be justified on the basis of the complexity of the model and the requirements in thei14 own appli cation. In fact, for a 2-D case, such as the model in Figure 4, the execution time on an MPP machine takes much le.ss than a second. The scalar version of the program which runs on a sun Sparc/20, including input and output to a disk, takes only 0.3 second of \\1all clock time.
The shortest ra)1path method is a rapid yet re liable approach for computing first arrival traveltime at e\,iery grid of the 2-D and 3-. D block model. The originality of the algorithm is the searching of the. shortest ra)1paths from a source point to all nodes of the model network.
The travel time at any location of the model can be calculated from values on the cell bound ar ies. Due to the parallel nature of the scheme., the algorithm is particularly appropriate for implementation on a MPP syste.m .
The algorithm is, in principle, applicable to the model of an isotropic medium of any kind. So fa14, the authors have onl)' considered the raypaths in cells of a constant velocity. For more practical velocity distributions, a heterogeneous media \\i' ith continuously varying velocity blocks should be considere . d. In such a velocity model, rather than a straight segment, raypaths may bend smoothly within the cells. As an alternative approach to accommodate vari able \r elocity in a heterogeneous n1edium, the mode. I can be constructed by using small blocks (the constant-velocity blocks should be sufficiently small) compared to the minimum resolution required. Howev·er, the primary purpose in the current algorithm here aims to maintain the advantages ot' using larger block size t'or bette1· approximation. The.refore., this approach has been ruled out. The problem can be resolved b)1 computing the bending ra)1path within the cell. That will be the f'uture research of' the authors.
The current approach is developed with the computation of first arrival traveltime over e'1ery grid of any· model in mind. Head waves, refracted and diffracted rays are automatically computed. The approximation of reflected ra)'S can be obtained by simultaneously computing all the rays fr om all source locations along the re.flector boundary.
Potential applications of the algorithm include real-time earthquake hypocenter dete.rmi nation for earthquake se.ismology studies and the computation of excitation time imaging con dition for pre.stack depth migration. Furthermore, real-time parallel computing of trave.ltime from any· earthquake to any specified location may become one of the ve1 .. y important factors in designing an early warning system and the earthquake mitigation program which are cur re.ntly being jnvestigated in Taiwan.
Nevertheless, the experiments of the method on the 2-D and 3-D block models show the et�fi ciency' and accuracy of the algorithm even in the presence of extremely severe, arbitrarily shaped velocity contrasts. The scheme's parallel nature makes the proposed algorithm particu larly appropriate for implementation on a Massively Parallel Processor type system that per mits the displaying of i4esults interactively on a graphic terminal.