Next: Archetype implementation
Up: Introduction: mesh computations
Previous: Structuring the parallel program
Given the parallelization strategy described in the previous
section,
a parallel program to accomplish a particular mesh computation
closely resembles its sequential counterpart, except that the
work has been partitioned between a host process and a number of
essentially identical grid processes:
- Computing new values for grid-based variables.
  - Sequential program: loops over the whole grid. Points on the
    boundary may be treated differently from interior points.
  - Host process: does nothing.
  - Grid processes: first ensure that the ghost boundaries to be used
    as input contain current values (via a boundary-exchange
    operation), then each loop over a local section. Because of the
    ghost boundaries, no special handling is required for points on
    "internal" boundaries (points that are on the boundary of a local
    section but that do not correspond to points on the boundary of
    the whole array). If points on the boundary of the whole array
    require different treatment, this is handled by the grid
    processes that contain part of that boundary.
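The grid-process step just described can be sketched as a
single-process Python simulation (not from any archetype library; the
names split, exchange, and update, the 1-D mesh, and the three-point
averaging stencil are all illustrative assumptions):

```python
def split(u, nproc):
    """Block-partition u into nproc local sections, each with one
    ghost cell on each side.  A ghost slot of None marks the physical
    boundary of the whole array (no neighbor on that side)."""
    n = len(u)
    bounds = [n * p // nproc for p in range(nproc + 1)]
    secs = []
    for p in range(nproc):
        lo, hi = bounds[p], bounds[p + 1]
        left = u[lo - 1] if lo > 0 else None
        right = u[hi] if hi < n else None
        secs.append([left] + u[lo:hi] + [right])
    return secs

def exchange(secs):
    """Boundary exchange: refresh each section's ghost cells from the
    neighboring sections' edge points."""
    for p in range(len(secs) - 1):
        secs[p][-1] = secs[p + 1][1]   # my right ghost <- neighbor's first point
        secs[p + 1][0] = secs[p][-2]   # neighbor's left ghost <- my last point

def update(sec):
    """One relaxation step over the owned points of a section:
    u[j] <- (u[j-1] + u[j] + u[j+1]) / 3.  Points on the boundary of
    the whole array (those next to a None ghost) are left unchanged,
    handled only by the sections that contain them."""
    old = sec[:]
    for j in range(1, len(sec) - 1):
        if old[j - 1] is None or old[j + 1] is None:
            continue  # whole-array boundary point: different treatment
        sec[j] = (old[j - 1] + old[j] + old[j + 1]) / 3
```

After exchange(), every section can loop over its owned points with no
special cases for internal boundaries, exactly as in the text; one
exchange precedes each update step.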
- Reading values into a grid-based variable.
  - Sequential program: reads into a whole array, e.g. from a file.
  The parallel program may take several approaches. The most
  straightforward makes use of the host process:
  - Host process: reads into its array and then participates in a
    redistribution operation that distributes the array values over
    the process grid.
  - Grid processes: participate in the redistribution operation.
  An alternative approach reads data directly into the grid
  processes:
  - Host process: does nothing.
  - Grid processes: each read from a separate sequential file, each
    file containing the data for one local section.
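The host-reads-then-redistributes approach can be sketched as follows
(a minimal single-process simulation; the names block_bounds and
scatter and the 1-D block distribution are illustrative assumptions,
standing in for whatever redistribution routine a real implementation
provides):

```python
def block_bounds(n, nproc):
    """Index range (lo, hi) of each process's local section in a
    1-D block distribution of n points over nproc grid processes."""
    return [(n * p // nproc, n * (p + 1) // nproc) for p in range(nproc)]

def scatter(global_array, nproc):
    """Host side of the redistribution: after the host has read the
    whole array, hand each grid process the local section it owns."""
    return [global_array[lo:hi]
            for lo, hi in block_bounds(len(global_array), nproc)]
```

Concatenating the sections in process order recovers the original
array, which is the invariant the redistribution must preserve.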
- Writing values from a grid-based variable.
  - Sequential program: writes a whole array, e.g. to a file.
  The parallel program may take several approaches. The most
  straightforward makes use of the host process:
  - Host process: participates in a redistribution operation that
    collects the array values from the process grid and then writes
    from its array.
  - Grid processes: participate in the redistribution operation.
  An alternative approach writes data directly from the grid
  processes:
  - Host process: does nothing.
  - Grid processes: each write to a separate sequential file, each
    file containing the data for one local section.
- Reading values into a duplicated (non-grid) variable.
  - Sequential program: reads data (global constants, e.g.) from a
    file.
  - Host process: reads the data in the same way the sequential
    program would and then participates in a broadcast operation to
    copy the data to the grid processes.
  - Grid processes: participate in the broadcast operation to obtain
    the data from the host process.
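A minimal sketch of this read-and-broadcast step (single-process
simulation; host_read and broadcast are illustrative names, and the
stream stands in for whatever file the host would read):

```python
import io

def host_read(stream):
    """Host process: read a duplicated value (a global constant,
    e.g.) exactly as the sequential program would."""
    return float(stream.read())

def broadcast(value, nproc):
    """Broadcast: every grid process receives its own copy of the
    host's value, so the variable ends up duplicated everywhere."""
    return [value] * nproc
```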
- Writing values from a duplicated (non-grid) variable.
  - Sequential program: writes data (results of a reduction
    operation, e.g.) to a file.
  - Host process: writes the data exactly as the sequential program
    does. (Usually the variable whose value is to be written has the
    same value in all processes, either because it is a global
    constant or because it is the result of a reduction operation, as
    described below.)
  - Grid processes: do nothing.
- Performing a reduction operation.
  - Sequential program: performs the reduction, often by looping over
    the whole array.
  - Host process: participates in the reduction operation (without,
    however, supplying data) and receives the result.
  - Grid processes: participate in the reduction operation, supplying
    data and receiving the result. (For example, to compute a global
    maximum, each grid process computes a local maximum, and then all
    processes, host and grid, participate in a reduction operation,
    after which all processes have the resulting global maximum.)
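The global-maximum example can be sketched directly (single-process
simulation; local_max and reduce_max are illustrative names for the
two stages of the reduction):

```python
def local_max(section):
    """Each grid process first reduces over its own local section."""
    return max(section)

def reduce_max(local_sections):
    """Global step of the reduction: combine the local maxima.  All
    participants, host and grid alike, receive the result; the host
    simply contributes no data of its own."""
    local_results = [local_max(sec) for sec in local_sections]
    return max(local_results)
```

The result equals the maximum a sequential loop over the whole array
would compute, which is the correctness condition for the reduction.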
If the computation does not perform whole-grid reads or writes using
the host process, then it can be parallelized without a host process;
in that case, the actions performed by the host process in the above
descriptions are instead performed by one of the grid processes,
which is singled out as the "designated I/O process".
Berna L Massingill
Mon Jun 8 19:35:58 PDT 1998