Pattern Name: GeometricDecomposition
AlgorithmStructure Design Space
Supporting Document: Examples |
Generating a parallel version using OpenMP is fairly straightforward. Since OpenMP is a shared-memory environment, there is no need to explicitly partition and distribute the two key arrays (uk and ukp1); computation of new values is implicitly distributed among UEs by the two parallel loop directives, which also implicitly provide the required synchronization (all threads must finish updating ukp1 before any thread starts updating uk, and vice versa). The SCHEDULE(STATIC) clauses distribute the iterations of the parallel loops among the available threads.
      real uk(1:NX), ukp1(1:NX)
      dx = 1.0/NX
      dt = 0.5*dx*dx
C-------initialization of uk, ukp1 omitted
C-------main timestep loop
      do k=1,NSTEPS
C-----------update ukp1 from uk (one time step)
C$OMP PARALLEL DO SCHEDULE(STATIC)
        do i=2,NX-1
          ukp1(i)=uk(i)+(dt/(dx*dx))*(uk(i+1)-2*uk(i)+uk(i-1))
        enddo
C$OMP END PARALLEL DO
C-----------copy ukp1 back to uk for the next iteration
C$OMP PARALLEL DO SCHEDULE(STATIC)
        do i=2,NX-1
          uk(i)=ukp1(i)
        enddo
C$OMP END PARALLEL DO
        call print_step(k, uk)
      enddo
      end
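One possible refinement of the body of the timestep loop above (a sketch only, not part of the original example) is to enclose both loops in a single parallel region, so that the team of threads is created once rather than twice per iteration of the k loop; the implicit barrier at the end of each OMP DO construct still provides the required synchronization between the two loops.

C-------body of the timestep loop using one parallel region per step
C$OMP PARALLEL PRIVATE(i)
C$OMP DO SCHEDULE(STATIC)
        do i=2,NX-1
          ukp1(i)=uk(i)+(dt/(dx*dx))*(uk(i+1)-2*uk(i)+uk(i-1))
        enddo
C$OMP END DO
C$OMP DO SCHEDULE(STATIC)
        do i=2,NX-1
          uk(i)=ukp1(i)
        enddo
C$OMP END DO
C$OMP END PARALLEL

Whether this makes a measurable difference depends on how much overhead the particular OpenMP implementation incurs in creating and joining a parallel region.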
An MPI-based program for this example uses the data distribution described in the Motivation section of the main document. The program is an SPMD computation in which each process's code closely resembles the sequential code, except that the loop that computes new values for ukp1 is preceded by message-passing operations: each process sends the values of uk in the leftmost and rightmost cells of its subarray to the processes owning the neighboring subarrays, and receives the corresponding values from those neighbors into the extra cells at the ends of its local arrays. The following shows the code to be executed by each process; some details are omitted for the sake of simplicity. NP denotes the number of processes.
      include 'mpif.h'
      real uk(0:(NX/NP)+1), ukp1(0:(NX/NP)+1)
C-------process IDs for this process and its neighbors
      integer my_id, left_nbr, right_nbr, istat(MPI_STATUS_SIZE)
      dx = 1.0/NX
      dt = 0.5*dx*dx
C-------initialization of uk, ukp1 omitted
C-------initialization of my_id, left_nbr, right_nbr omitted
C-------compute loop bounds (must skip ends of original array)
      istart=1
      if (my_id .eq. 0) istart=2
      iend=NX/NP
      if (my_id .eq. NP-1) iend=(NX/NP)-1
C-------main timestep loop
      do k=1,NSTEPS
C-----------exchange boundary information
        if (my_id .ne. 0) then
          call MPI_SEND(uk(1), 1, MPI_REAL, left_nbr, 0,
     $         MPI_COMM_WORLD, ierr)
        endif
        if (my_id .ne. NP-1) then
          call MPI_SEND(uk(NX/NP), 1, MPI_REAL, right_nbr, 0,
     $         MPI_COMM_WORLD, ierr)
        endif
        if (my_id .ne. 0) then
          call MPI_RECV(uk(0), 1, MPI_REAL, left_nbr, 0,
     $         MPI_COMM_WORLD, istat, ierr)
        endif
        if (my_id .ne. NP-1) then
          call MPI_RECV(uk((NX/NP)+1), 1, MPI_REAL, right_nbr, 0,
     $         MPI_COMM_WORLD, istat, ierr)
        endif
C-----------compute new values for this chunk of ukp1
        do i=istart, iend
          ukp1(i)=uk(i)+(dt/(dx*dx))*(uk(i+1)-2*uk(i)+uk(i-1))
        enddo
C-----------copy to uk for next iteration and print
        do i=istart, iend
          uk(i)=ukp1(i)
        enddo
        call print_step_distrib(k, uk, my_id)
      enddo
      end
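A common variation on the boundary exchange (a sketch only, not part of the code above) replaces the paired MPI_SEND/MPI_RECV calls with MPI_SENDRECV, which does not rely on the MPI implementation buffering the blocking sends. The sketch assumes that during initialization left_nbr is set to MPI_PROC_NULL on process 0 and right_nbr to MPI_PROC_NULL on process NP-1; transfers to or from MPI_PROC_NULL are no-ops, so the boundary tests can then be dropped.

C-----------exchange boundary information using MPI_SENDRECV
C           (assumes left_nbr/right_nbr are MPI_PROC_NULL at the ends)
        call MPI_SENDRECV(uk(1), 1, MPI_REAL, left_nbr, 0,
     $       uk((NX/NP)+1), 1, MPI_REAL, right_nbr, 0,
     $       MPI_COMM_WORLD, istat, ierr)
        call MPI_SENDRECV(uk(NX/NP), 1, MPI_REAL, right_nbr, 0,
     $       uk(0), 1, MPI_REAL, left_nbr, 0,
     $       MPI_COMM_WORLD, istat, ierr)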