Next: Archetype implementation
Up: Introduction: mesh computations
Previous: Structuring the parallel program
Given the parallelization strategy described in the previous
section,
a parallel program to accomplish a particular mesh computation
closely resembles its sequential counterpart, except that the
work has been partitioned between a host process and a number of
essentially identical grid processes:
- Computing new values for grid-based variables.
  - Sequential program: loops over the whole grid. Points on the
    boundary may be treated differently from interior points.
  - Host process: does nothing.
  - Grid processes: first ensure that the ghost boundaries to be used
    as input contain current values (via a boundary-exchange
    operation), then each loop over a local section. Because of the
    ghost boundaries, no special handling is required for points on
    "internal" boundaries (points that are on the boundary of a local
    section but that do not correspond to points on the boundary of
    the whole array). If points on the boundary of the whole array
    require different treatment, this is handled by the grid
    processes that contain part of that boundary.
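The grid-process step just described can be sketched as a
single-process Python simulation (not from any archetype library; the
names split, exchange, and update, the 1-D mesh, and the three-point
averaging stencil are all illustrative assumptions):

```python
def split(u, nproc):
    """Block-partition u into nproc local sections, each with one
    ghost cell on each side.  A ghost slot of None marks the physical
    boundary of the whole array (no neighbor on that side)."""
    n = len(u)
    bounds = [n * p // nproc for p in range(nproc + 1)]
    secs = []
    for p in range(nproc):
        lo, hi = bounds[p], bounds[p + 1]
        left = u[lo - 1] if lo > 0 else None
        right = u[hi] if hi < n else None
        secs.append([left] + u[lo:hi] + [right])
    return secs

def exchange(secs):
    """Boundary exchange: refresh each section's ghost cells from the
    neighboring sections' edge points."""
    for p in range(len(secs) - 1):
        secs[p][-1] = secs[p + 1][1]   # my right ghost <- neighbor's first point
        secs[p + 1][0] = secs[p][-2]   # neighbor's left ghost <- my last point

def update(sec):
    """One relaxation step over the owned points of a section:
    u[j] <- (u[j-1] + u[j] + u[j+1]) / 3.  Points on the boundary of
    the whole array (those next to a None ghost) are left unchanged,
    handled only by the sections that contain them."""
    old = sec[:]
    for j in range(1, len(sec) - 1):
        if old[j - 1] is None or old[j + 1] is None:
            continue  # whole-array boundary point: different treatment
        sec[j] = (old[j - 1] + old[j] + old[j + 1]) / 3
```

After exchange(), every section can loop over its owned points with no
special cases for internal boundaries, exactly as in the text; one
exchange precedes each update step.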
- Reading values into a grid-based variable.
  - Sequential program: reads into a whole array, e.g. from a file.
  The parallel program may take several approaches. The most
  straightforward makes use of the host process:
  - Host process: reads into its array and then participates in a
    redistribution operation that distributes the array values over
    the process grid.
  - Grid processes: participate in the redistribution operation.
  An alternative approach reads data directly into the grid
  processes:
  - Host process: does nothing.
  - Grid processes: each read from a separate sequential file, each
    file containing the data for one local section.
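The host-reads-then-redistributes approach can be sketched as follows
(a minimal single-process simulation; the names block_bounds and
scatter and the 1-D block distribution are illustrative assumptions,
standing in for whatever redistribution routine a real implementation
provides):

```python
def block_bounds(n, nproc):
    """Index range (lo, hi) of each process's local section in a
    1-D block distribution of n points over nproc grid processes."""
    return [(n * p // nproc, n * (p + 1) // nproc) for p in range(nproc)]

def scatter(global_array, nproc):
    """Host side of the redistribution: after the host has read the
    whole array, hand each grid process the local section it owns."""
    return [global_array[lo:hi]
            for lo, hi in block_bounds(len(global_array), nproc)]
```

Concatenating the sections in process order recovers the original
array, which is the invariant the redistribution must preserve.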
- Writing values from a grid-based variable.
  - Sequential program: writes a whole array, e.g. to a file.
  The parallel program may take several approaches. The most
  straightforward makes use of the host process:
  - Host process: participates in a redistribution operation that
    collects the array values from the process grid and then writes
    from its array.
  - Grid processes: participate in the redistribution operation.
  An alternative approach writes data directly from the grid
  processes:
  - Host process: does nothing.
  - Grid processes: each write to a separate sequential file, each
    file containing the data for one local section.
- Reading values into a duplicated (non-grid) variable.
  - Sequential program: reads data (global constants, e.g.) from a
    file.
  - Host process: reads the data in the same way the sequential
    program would and then participates in a broadcast operation to
    copy the data to the grid processes.
  - Grid processes: participate in the broadcast operation to obtain
    the data from the host process.
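A minimal sketch of this read-and-broadcast step (single-process
simulation; host_read and broadcast are illustrative names, and the
stream stands in for whatever file the host would read):

```python
import io

def host_read(stream):
    """Host process: read a duplicated value (a global constant,
    e.g.) exactly as the sequential program would."""
    return float(stream.read())

def broadcast(value, nproc):
    """Broadcast: every grid process receives its own copy of the
    host's value, so the variable ends up duplicated everywhere."""
    return [value] * nproc
```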
- Writing values from a duplicated (non-grid) variable.
  - Sequential program: writes data (results of a reduction
    operation, e.g.) to a file.
  - Host process: writes the data exactly as the sequential program
    does. (Usually the variable whose value is to be written has the
    same value in all processes, either because it is a global
    constant or because it is the result of a reduction operation, as
    described below.)
  - Grid processes: do nothing.
- Performing a reduction operation.
  - Sequential program: performs the reduction, often by looping over
    the whole array.
  - Host process: participates in the reduction operation (without,
    however, supplying data) and receives the result.
  - Grid processes: participate in the reduction operation, supplying
    data and receiving the result. (For example, to compute a global
    maximum, each grid process computes a local maximum, and then all
    processes, host and grid, participate in a reduction operation,
    after which all processes have the resulting global maximum.)
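The global-maximum example can be sketched directly (single-process
simulation; local_max and reduce_max are illustrative names for the
two stages of the reduction):

```python
def local_max(section):
    """Each grid process first reduces over its own local section."""
    return max(section)

def reduce_max(local_sections):
    """Global step of the reduction: combine the local maxima.  All
    participants, host and grid alike, receive the result; the host
    simply contributes no data of its own."""
    local_results = [local_max(sec) for sec in local_sections]
    return max(local_results)
```

The result equals the maximum a sequential loop over the whole array
would compute, which is the correctness condition for the reduction.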
If the computation does not perform whole-grid reads or writes using
the host process, then it can be parallelized without a host process;
in that case, the actions performed by the host process in the above
descriptions are instead performed by one of the grid processes,
which is singled out as the "designated I/O process".
Berna L Massingill
Mon Jun 8 19:35:58 PDT 1998