Parallel Programming Environments
Introduction
To implement a parallel algorithm you need to construct a
parallel program. The environment within which parallel programs
are constructed is called the parallel programming
environment. Programming environments correspond roughly to languages and libraries, as the examples below illustrate: HPF is a set of extensions to Fortran 90 (a "parallel language," so to speak), while MPI is a library of function calls.
There are hundreds of parallel programming environments. To
understand them and organize them in a meaningful way, we need to
sort them with regard to a classification scheme. In this note, we
organize programming environments in terms of their core
programming models. This classification is not clean, since a single programming environment
can be classified under more than one programming model (for
example, the Linda coordination language can be thought of in terms
of a distributed-data-structure model or a coordination model).
In this note, the classifications are given, and the programming
environments in each class are described in general terms. We then
give a very small sampling of the most important programming
environments for each category.
Formal models
Some programming environments are defined in terms of detailed,
formal theories. These programming environments provide guarantees
that less formal approaches can't make. For example, it is possible
to build programming environments that are deterministic and that
strictly preserve a program's sequential semantics. Unfortunately,
formal programming environments use language constructs that are
unfamiliar to traditional programmers, which creates a barrier to their adoption.
Logic programming models
Programming environments based on logic programming use
declarative as opposed to imperative semantics. The most common
approaches are based on Prolog's first-order predicate calculus.
Concurrency is included in one of three ways: and-parallelism (evaluating multiple predicates concurrently), or-parallelism (evaluating multiple guards concurrently), or explicit mapping of predicates linked together through single-assignment variables.
Functional programming models
Functional programming languages use declarative semantics and
some form of lambda calculus to express the operation of a program.
LISP is perhaps the oldest and best known of the functional
languages. With pure functional languages, there are no side
effects from a function. Hence, executing functions as soon as the
required data is available provides a natural way to achieve
concurrent execution.
Compositional models
These models are based on explicitly distinguishing among ways
in which programs can be composed (put together to form larger
programs). In our context, the most pertinent forms of composition
are sequential (in which programs execute in sequence) and parallel
(in which programs execute "concurrently", where concurrent
execution is frequently modeled as interleaved execution of the
elements of the composition).
Informal models
These models are pragmatic and are motivated by the experiences
of parallel programmers as opposed to the theories of computer
scientists. Consequently, these models don't divide into categories as neatly as the formal models do, and there is some overlap between the categories. For example, a coordination library (MPI) can be used to write SPMD data-parallel algorithms.
Data-parallel models
Data-parallel programming models view a parallel computation in
terms of a single instruction stream operating on multiple sets of
data. The multiple data sets provide the source of the
concurrency.
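To make the model concrete, here is a minimal C sketch (the array and function names are illustrative). It is the sequential rendering of a data-parallel vector add: each element operation is independent, so a data-parallel environment may apply the single logical operation across all elements at once (in HPF the whole loop would be written as the one statement A = B + C).

    #define N 1024

    /* Illustrative only: one instruction stream, many data elements.
       Every iteration is independent, so all N element operations
       may execute concurrently; the multiple data sets are the
       source of the concurrency. */
    void vector_add(double a[N], const double b[N], const double c[N])
    {
        for (int i = 0; i < N; i++)
            a[i] = b[i] + c[i];
    }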
Coordination models
A coordination model [Mattson96] defines the mechanism used to
coordinate the execution of distinct processes within a parallel
program. We use the term coordination to refer to the three
fundamental issues in concurrency: communication, synchronization,
and process management. Coordination is explicit and occurs at
distinct points within a program. Coordination models are usually
combined with well-known sequential languages, so they have been
very successful at attracting programmers. The most common way to
use coordination models is to write
SPMD programs with data-driven parallelism. Programming
environments that use coordination models usually fall into one of
two classes:
Communication-based coordination models
Communication between concurrent tasks takes place through the exchange of messages or some other discrete communication event. The semantics of the communication usually provide synchronization as well as communication.
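As a concrete illustration, here is a minimal C sketch using MPI (described in the list below). It is a sketch of the style, not a complete application; note how the blocking receive synchronizes the two processes as well as moving the data.

    #include <mpi.h>
    #include <stdio.h>

    /* Sketch: process 0 sends one integer to process 1. The blocking
       MPI_Recv provides synchronization as well as communication:
       process 1 cannot proceed until the message has arrived. */
    int main(int argc, char **argv)
    {
        int rank, value;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("process 1 received %d\n", value);
        }
        MPI_Finalize();
        return 0;
    }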
Shared-memory coordination models
Coordination is provided by the semantics of operations on
shared data structures.
Object-based models
Objects are traditionally used to provide data abstraction.
However, the same basic concept can be used to provide concurrency
abstractions. The object-oriented system provides a framework
within which a parallel program can be constructed.
- HPC++
- Mentat
- CHARM++
- POOMA
Shared-memory models
These models replace communication with implicit data sharing
through a single address space. Shared-memory programming models
are the native model on multiprocessor systems using a symmetric
operating system (the so-called SMP systems). Implicit
communication between tasks, however, is attractive enough that
researchers have extended the model to non-SMP systems as well. We
can define two classes of shared-address-space programming
environments:
Explicit-task shared-memory models
These systems have at their core the idea of very lightweight processes that share an address space. These processes are called threads. Since the address space is shared, these systems generally ignore communication and focus on ways to control the relations between threads and to protect ordering constraints on data access. Creation of and interaction between threads are explicit in these systems; a minimal sketch follows the list below.
- ACE
- Java
- Pthreads
- Threads.h++
- Win32
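Here is the minimal Pthreads sketch promised above. It is illustrative only (the counter and the thread count are arbitrary), but it shows the two hallmarks of this class: explicit thread creation and interaction, and explicit protection of ordering constraints on shared data.

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    /* All threads share one address space; nothing is communicated,
       the data is simply shared. The mutex enforces the ordering
       constraint on the shared counter. */
    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);    /* protect the shared datum */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];
        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, worker, NULL);  /* explicit creation */
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);                   /* explicit interaction */
        printf("counter = %ld\n", counter);
        return 0;
    }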
Implicit-task shared-memory models
Some shared-memory systems use higher-level constructs as
opposed to explicitly controlled threads of execution. For example,
a system might express concurrency in terms of distributed loop
iterations or parallel sections.
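One widely used example of this style is OpenMP (listed below): the programmer marks a loop or a set of sections, and the system creates and manages the threads implicitly. A minimal C sketch:

    #include <stdio.h>

    int main(void)
    {
        double a[1000];

        /* Distributed loop iterations: the iterations are divided among
           threads that the system creates and manages implicitly. */
        #pragma omp parallel for
        for (int i = 0; i < 1000; i++)
            a[i] = 0.5 * i;

        /* Parallel sections: each section may run on a different thread. */
        #pragma omp parallel sections
        {
            #pragma omp section
            printf("first section (a[0] = %g)\n", a[0]);
            #pragma omp section
            printf("second section (a[999] = %g)\n", a[999]);
        }
        return 0;
    }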
A list of key programming environments
The following is a rough draft of a list of the most important
parallel programming environments.
- ACE
(Adaptive Communication Environment): This C++ threads environment
is portable between UNIX and Win32 platforms. It integrates into
the threads a range of IPC mechanisms such as RPC, sockets, and
System V IPC. The overall driving force behind ACE is its use of
many core design patterns for concurrent communication software.
ACE provides a rich set of reusable C++ wrappers and framework
components that perform common communication software tasks across
a range of operating system platforms.
- BSP (Bulk Synchronous Parallel model): Currently runs on a number of systems, including PCs under Linux. The developers have plans to support NT, but I don't know when such a system will be available.
- Calypso: An implementation of the BSP model (though the authors don't describe it this way). A Calypso program is an SPMD program that consists of master and worker processes distributed across a network of UNIX or NT workstations (homogeneous OS only). The master process takes care of sequential operations and serves as a task and memory server for the worker processes. Processes dynamically participate in computing parallel sections. The system provides dynamic load balancing and a degree of fault tolerance.
-
CC++ (Compositional C++, Caltech): Parallel programming
language based on C++. Doesn't support NT, but it does support
Solaris (and hence it is not unreasonable to get it to work on a
PC).
- Charm (Charm/Charm++, University of Illinois at Urbana-Champaign): A message-driven parallel programming environment built on the Converse runtime tools.
- Chameleon: A message-passing library
from Bill Gropp (gropp@mcs.anl.gov) at Argonne
National Laboratory. Chameleon is a low-level stable interface to
p4, PICL, PVM, and vendor-specific message-passing environments.
Chameleon can be used in a development mode where it provides a
wide variety of debugging information or in production mode where
it emphasizes parallel efficiency.
- Cilk: Efficient
execution of multithreaded computations. Cilk is an algorithmic
multithreaded language. The philosophy behind Cilk is that
programmers should concentrate on structuring their programs to
expose parallelism and exploit locality, leaving the runtime system
with the responsibility of scheduling the computation to run
efficiently on a given platform. Thus, the Cilk runtime system
takes care of details like load balancing, paging, and
communication protocols. Unlike other multithreaded languages,
however, Cilk is algorithmic in that the runtime system guarantees
efficient and predictable performance. Cilk runs on many different
SMP systems including PCs running Linux.
- CM-Fortran: The Fortran environment
on the Connection Machine. Implements a strict data parallel
programming model.
- Code: Visual
parallel programming system.
- Concurrent C: Concurrent C
[Gehani85] extends the CSP model used in Occam to provide a more
general (and complex) concurrent programming model. Concurrent C
uses a synchronous message-passing scheme to facilitate
parallelism, with processes proceeding independently except when
communication takes place. Concurrent C introduces a fairly complex
set of new operators and program structures to support asynchronous
parallel programming. Through the use of transaction pointers (a
transaction is a structured two-way communication mechanism) and
process variables, Concurrent C processes are able to communicate
directly with other processes, regardless of the physical position
of the processes in the system. Concurrent C maps well onto
distributed memory machines. (Actually, Concurrent C can be
implemented on a shared memory system, but constructs for managing
shared memory were deliberately left out of the language to ensure
portability.)
- Crystal: Marina Chen's functional
programming language [Szymanski]. This language is based on
familiar mathematical notation and lambda calculus.
- Data Parallel C: Quinn and Hatcher's
MIMD version of TMC's C* [Hatcher91].
- DOLIB: Distributed Object Library
[D'Azevedo94a] provides a distributed array much like GA. DOLIB is
more general in that its fundamental data type is a one-dimensional
array. Paging is used to try to automatically provide higher
performance. It is also integrated with an I/O library called DONIO
[D'Azevedo94b]. The system was used to create a scalable molecular
dynamics program [D'Azevedo94c]. GA and DOLIB are compared and
contrasted in [Mattson95a].
- Fortran
D: The experimental compiler system from Ken Kennedy's group at
Rice University. The syntax and programming model of Fortran D are
essentially the same as for HPF.
- Fortran M (Argonne National Laboratory): A small set of extensions to Fortran 77 that supports development of modular, deterministic message-passing programs [Foster93b]. Works with/on: Heterogeneous computers. Languages: Fortran. Contact: fortran-m@mcs.anl.gov.
-
GA: GA or Global Arrays [Nieplocha94] from Pacific Northwest
National Laboratory. Shared-memory programming interface for
distributed-memory computers. GA provides a library interface to a
distributed two-dimensional array data type. GA is compared to
DOLIB and NX in [Mattson95a].
- GLU: Granular
Lucid. Programming system for constructing parallel and distributed
applications. Runs on a number of systems including PCs running
Linux.
- Haskell: A functional programming language [Hudak92]. Haskell is a higher-order functional language with a rich polymorphic type system and non-strict semantics (i.e., "lazy evaluation"). Its parallel variant, "para-Haskell," is described as a para-functional language to convey that it is an extension of a pure functional language that includes constructs to represent explicit parallelism. It uses a meta-language to keep the functional semantics (what is computed) distinct from the operational semantics (how the computation is carried out). para-Haskell supports two kinds of parallel annotations: scheduling constraints and mapping expressions.
- HPC++ (High Performance C++): A standard model for parallel programming using C++. A programming environment from Dennis Gannon's group at Indiana University. It combines his past system pC++ and the Caltech system CC++. It supports a basic data-parallel approach, thereby providing compatible interaction with HPF distribution directives.
- HPF: High Performance Fortran, from the HPF Forum. HPF is a data-parallel dialect of Fortran 90. Extensions have recently emerged to support task-level parallelism, but the core of the language and its historical roots are in data-parallel programming. Most of HPF consists of directives and language constructs to partition and distribute arrays among the nodes of a parallel computer. HPF has not been very successful to date, since algorithms that are not strictly data-parallel are hard to implement in HPF.
- JADA: JADA is a Linda-like system that mixes Linda with Java. Multiple tuple spaces are supported. These can be local (for coordinating between threads) or remote (for coordinating between distinct applets potentially distributed over the WWW). JADA was created as part of PageSpace, an ESPRIT-funded project.
- Java Threads.
- Legion:
Legion is a metasystems project at the University of Virginia. It
provides the illusion of a single virtual machine to users: a
virtual machine that provides secure shared object and shared name
spaces, application-adjustable fault tolerance, improved response
time, and greater throughput. The physical systems can be
supercomputers, workstations, PCs, or even nontraditional computing
devices.
- Linda: The best-known coordination language is Linda [Carriero91]. In Linda, coordination takes place through a small set of operations that manipulate objects within a distinct shared memory. The shared memory supports algorithms that use high-level constructs such as distributed data structures and anonymous communication (i.e., the sender and receiver don't know the identity of one another). The commercial providers of Linda (Scientific Computing Associates) provide Fortran and C versions of the system. Also see JADA, WWWinda, ISETL-Linda, ParLin, Eilean, P4-Linda, Glenda, POSYBL, and Objective-Linda.
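To give the flavor of the model, here is a schematic C-Linda fragment (it requires the Linda preprocessor, so it is not plain C; the tuple names "task" and "result" are illustrative). Tasks and results live in tuple space, and the processes never learn one another's identities:

    /* Schematic C-Linda; real_main is the C-Linda entry-point convention. */
    int worker();

    real_main()
    {
        int i, id, answer;
        for (i = 0; i < 10; i++)
            eval("worker", worker());   /* live tuples: spawn ten workers */
        for (i = 0; i < 10; i++)
            out("task", i);             /* drop task tuples into tuple space */
        for (i = 0; i < 10; i++)
            in("result", ?id, ?answer); /* block until any result tuple exists */
        return 0;
    }

    int worker()
    {
        int t;
        in("task", ?t);                 /* anonymous: no sender identity used */
        out("result", t, t * t);
        return 0;
    }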
- Lucid: A parallel functional language based on intensional logic. Lucid is an implicitly parallel language. Lucid permits data structures such as arrays, lists, and trees to be implemented in a manner that is easily distributable. Lucid is simple and elegant. Lucid is not committed to any particular model of computation, so the writer of a Lucid compiler has considerable freedom to implement language features in a manner that the user program cannot interfere with. [Szymanski].
- Mentat
(University of Virginia): First, the good folks at UVA would
probably want to add that Mentat follows a large-grain dataflow
approach and not a data-parallel one. This is an outstanding and
very flexible model. Mentat is an object-oriented parallel
processing system designed to directly address the difficulty of
developing architecture-independent parallel programs. The system
consists of a runtime system, a programming language, and a
monitor. Works on a cluster of Linux PCs. Contact:
mentat@uvacs.cs.virginia.edu. Also take a look at Legion.
- MPI (Message-Passing Interface, Argonne National Laboratory (CRPC)): Works with/on: Chameleon. Languages: C and Fortran. Implementations include WinMPI (MPI for MS Windows 3.1) and MPICH (a portable implementation of MPI). Contact: gropp@mcs.anl.gov (William Gropp).
- Multiblock Parti: A programming
environment from Joel Saltz's group at the University of
Maryland.
- NESL (CMU): NESL is a nested data-parallel language. NESL programs are built around a data type called a "sequence". Each element of a sequence can be any of the atomic types in the language or another sequence. Parallelism enters the picture through an "apply-to-each" form over the elements of a sequence and through parallel operations on sequences. A NESL program is compiled into a stack-based intermediate vector code (VCODE). The VCODE program is run on the target hardware through the VCODE interpreter. [Hardwick96].
- NOW (Network Of
Workstations): Using a network of workstations to act as a
distributed supercomputer.
-
Occam (Oxford Univ.): Occam is one of the first languages
created explicitly for parallel computing. An Occam program is a
collection of processes that are composed either sequentially or in
parallel. The processes interact through explicit communication
channels. See also
KROC (Occam for all) (Univ. of Kent) and [Pountain86] and my
notes on channels in parallel programming environments.
- OpenMP.
- p4 (Portable Programs for Parallel Processors, Argonne National Laboratory): Works with/on: Heterogeneous computers. Languages: C and Fortran. Contact: lusk@mcs.anl.gov. Similar to PVM; unlike PVM, however, monitors can be used on shared-memory systems.
- PAMS: A commercially supported programming environment from Myrias. PAMS is a compiler-driven system. The programmer identifies parallel loops and uses directives to make them execute in parallel.
- Papers: A coordination library designed to emphasize low-latency communication. It includes synchronized aggregate communication, reduction operations, a scan operation, and some support for parallel I/O. It supports a variation of the BSP model in that communication occurs aggregately at a barrier. It breaks with BSP in that only a subset of the processors need participate in the barrier. See "A parallel processing support library based on synchronized aggregate communication" in the book Languages and Compilers for Parallel Computing, edited by Huang, et al.
- Parmacs: The parmacs macros package
[Lusk87] from Argonne National Laboratory is a coordination library
specialized to shared memory systems. This is the environment used
within the SPLASH project.
- pC: A shared memory abstraction of
message-passing from Ridg Scott's group at the University of
Houston. See the comments about pFortran.
- PCN (Argonne National Laboratory/California Institute of Technology): Works with/on: Homogeneous computers. Languages: C and Fortran can be incorporated. Contact: tuecke@mcs.anl.gov.
- PETSc
(Portable, Extensible, Toolkit for Scientific Computation): A
programming environment to support writing large-scale scientific
applications. It supports C primarily but it can also be mixed with
C++ and F77. PETSc comes from William Gropp's group at Argonne National Laboratory. It incorporates a variety of parallel data
structures including index sets, vectors, matrices (not merely
arrays, but parallel data structures), and distributed arrays. It
also includes libraries of solvers, preconditioners, ODE solvers,
and simple X-window graphics systems. At the simplest level, a
programmer can use PETSc by creating distributed data structures
and getting parallelism from the parallel libraries. I need to look
into it further and see how they make the data distribution visible
to the user. Typically, this is made opaque to the user (which is
smart), but if one needs to write one's own parallel routines that
must work with the PETSc libraries, this distribution must be
visible. This is an impressive package that deserves careful study.
For more information see [Curfman96]. It supports an impressive
range of systems including Windows NT/95.
- pFortran: pFortran is a member of
the "P" family of languages (pC, pC++, and pFortran) [Bagheri91].
All of these languages provide a high-level, shared-memory
abstraction for message-passing systems. pFortran programs use an
SPMD model. Any node can access data on another node using an "@"
notation. For example, a node can access data on node "J" as
D@J.
- PICL (Oak Ridge
National Laboratory): Works on a variety of multiprocessors (and
workstations?). Languages: C (and Fortran-to-C interface routines).
Contact:
worley@msr.epm.ornl.gov (Patrick H. Worley). A subroutine
library that implements a generic message-passing interface for a
variety of multiprocessors. It also provides time-stamped trace
data, if requested.
- POET: An object-oriented framework from Sandia National Laboratories in California.
[Armstrong96]. The POET framework views the data in terms of Cells:
the smallest unit of data that POET concerns itself with. The size
of the Cell must be large enough to amortize the overhead added by
the framework itself. However, it needs to be small enough to
support good load balancing. These Cells are distributed among the
nodes of the parallel system and that distribution is documented in
a partitionMAP object (which is replicated on each node and kept up
to date). The partitionMAP contains information about where each
cell is mapped and how cells (or parts of cells) are communicated
between processors. In terms of execution, the POET framework
provides an Exec component. This is a pure virtual class that has
one important method: "exec(void)". This method means "do
something, it's your turn". All components in the application will
inherit from Exec and will overload the exec(void) method. Using this approach, a programmer creates a parallel application by defining the cells and how they communicate, and then creating a nested collection of Exec objects. The framework then executes by running the topmost exec and then the other execs in the nest. Parallel algorithms are thus encapsulated as components in a Smalltalk-like C++ framework; POET views a scientific parallel program as a collection of such components linked and orchestrated by the framework. Supports PCs under Linux. This approach is very interesting and deserves further study.
- POOMA: The Parallel Object Oriented
Methods and Applications Framework from Los Alamos National
Laboratory (John Reynders's group). See [Atlas96]. This is a
narrowly defined framework for developing simulations of physical
systems. It includes physically motivated parallel objects such as
Particles and Fields as well as canonical mathematical methods
which can be applied to these parallel objects (e.g. gather/scatter
of Particles onto a Grid, Fourier transforms). POOMA is a layered
system of objects. Each object in the Framework is composed of or
utilizes objects from lower layers. Upper layers contain global
data objects that are abstractions of scientific problem domains.
Objects lower in the framework capture the abstractions relevant to
parallelism and efficient node-level computation (e.g.,
communication, domain decomposition, load balancing, etc.). An
important abstraction in POOMA is the virtual node (or vnode). When
a distributed object is created, it distributes itself among a
collection of vnodes. A map of the vnodes and which processors they
are mapped to is maintained by the VnodeManager.
- POSYBL (Programming System for Distributed Applications, University of Crete): A simple implementation of a Linda-like system. The system consists of a daemon that runs on every workstation in a cluster and a C library of Linda-like operations. Works with/on: Heterogeneous computers. Languages: C. Contact: sxoinas@csd.uch.gr (Sxoinas Ioannis).
-
pSather: See Sather.
- Pthreads: This is the standard way
to do threads in the UNIX world. This group of threads libraries
includes DCE threads and Solaris threads (which are also known as
UI or Unix International threads).
- PVM (Parallel Virtual Machine, Oak Ridge National Laboratory): A software system that enables a collection of heterogeneous computers to be used in parallel. It includes libraries of user-callable functions and a daemon program that coordinates inter-machine activity. Works with/on: Heterogeneous computers. Languages: C and Fortran. Contact: pvm@msr.epm.ornl.gov. Bob Daniel at rcd@dash.co.uk mentioned a very fast PVM that runs on top of NT (aside: Bob works for a company called Dash that sells optimization software for PCs, including SMP boxes).
- Sather: An
object-oriented language with parameterized classes,
object-oriented dispatch, statically-checked strong typing,
separate implementation and type inheritance, multiple inheritance,
garbage collection, iteration abstraction, higher-order routines
and iters, exception handling, assertions, preconditions,
postconditions, and class invariants. pSather is a parallel
extension of Sather. It extends Sather by adding threads,
synchronization, and data distribution. Unlike actor languages,
multiple threads can execute in one object. It offers several
synchronization mechanisms like futures, gates, mutex,
reader/writer locks, barrier synchronization, rendezvous, and a
disjunctive lock statement.
-
SDDA (Scalable Distributed Dynamic Array, University of Texas at Austin): See [Edwards96] for more information. SDDA is a software infrastructure for developing complex dynamic data structures on distributed-memory multiprocessor systems. SDDA provides all functions required to manage distributed dynamic data structures. The central idea is an index space. The index space defines a uniform global address space for an application's distributed objects. Each object is associated with a unique index
into the SDDA. The object's index uniquely defines the location of
the object within the distributed memory environment. An
application creates, accesses, updates, and deletes objects in the
SDDA via the associated SDDA index. Location of the object is
transparent, and the access API is the same for both remote and
local objects. SDDA uses a hashing technique over the indices to
preserve locality and optimize object access. It sits on top of
MPI.
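The index-to-owner mapping at the heart of the index-space idea can be sketched in a few lines of C. This is emphatically not the real SDDA API (the function names are invented, and SDDA's actual hashing is designed to preserve locality), but it shows why an object's location can be computed rather than looked up on a central server:

    #include <stdint.h>

    /* Hypothetical sketch of the index-space idea; NOT the SDDA interface.
       A global index is hashed to an owning process, so any process can
       locate an object without consulting a central server. */
    static int owner_of(uint64_t index, int nprocs)
    {
        /* SDDA's real hashing is chosen to preserve locality; a plain
           modulo is used here only to keep the sketch short. */
        return (int)(index % (uint64_t)nprocs);
    }

    static int is_local(uint64_t index, int myrank, int nprocs)
    {
        return owner_of(index, nprocs) == myrank;
    }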
- Sisal: Sisal [Feo90]
is a functional programming language. It has been heavily used in
shared-memory environments. There has been work to move it to
distributed-memory environments, but this work hasn't led to a
robust distributed-memory implementation. [Cann92]. Sisal extracts
parallelism from a program using a data-dependence analysis. The
language has no explicit parallel constructs. Sisal guarantees
repeatable results in a multiprocessor environment.
- Split-C: A parallel extension of the C programming language offering a global address space on distributed-memory multiprocessors. It assumes a single program multiple data (SPMD) model in which each of the CPUs has a single thread of control, and the memory model is a two-dimensional array, where one dimension is the set of CPUs and the other dimension is each processor's local address space. Accesses to memory locations on a remote node are compiled into code fetching data from or putting data to that remote processor. Split-C allows one to overlap communication and computation by using split-phase operations (called gets, puts, and stores). Split-C is available on a variety of supercomputers, including the TMC CM-5, the Meiko CS-2, the Intel Paragon, and IBM SP-2 machines, as well as for networks of workstations. Most implementations, including the SCI implementation, are based on Active Messages.
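The split-phase style can be sketched as follows (schematic Split-C, not plain C; the helper functions are placeholders, and details may differ across Split-C releases). The := get starts the transfer, independent local work proceeds, and sync() waits for completion:

    /* Schematic Split-C; do_local_work() and use() are placeholders. */
    void fetch_and_work(int *global remote)
    {
        int incoming;
        incoming := *remote;   /* split-phase get: the transfer begins */
        do_local_work();       /* communication and computation overlap here */
        sync();                /* wait for all outstanding split-phase ops */
        use(incoming);         /* the value is now guaranteed to have arrived */
    }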
- SR
(Synchronizing Resources): Concurrent programming
language. The SR language is a public-domain language that runs on
Unix multiprocessor machines and on workstations connected over a
LAN. It appears to be a whole new language as opposed to a language
extension. See [Olsson92] and
[Andrews93].
- Sthreads: A
threads-based programming environment from Caltech. The system
consists of a pragma and a collection of synchronization
primitives. If used according to some narrowly defined rules, an
Sthreads program is guaranteed to execute and to produce the same
result in sequential and multi-threaded modes. The underlying
library used to implement the pragmas is also provided with
Sthreads and made visible to the user.
-
Strand: Strand is a parallel language based on concurrent logic
programming [Foster90]. It is very similar to flat concurrent
Parlog. A discussion of its use in scientific programming can be
found in [Mattson90]. The language was developed for commercial
distribution, but it is currently freely available.
- TCGMSG
(Theoretical Chemistry Group Message Passing System, Argonne
National Laboratory): Works with/on: Heterogeneous computers. Languages: C and Fortran. Contact: rj_harrison@pnl.gov (Robert J. Harrison). Similar to PVM and p4. PARMACS inspired the independent
implementation of TCGMSG, a much simpler but much more robust
package ... at that time the authors of PARMACS were not interested
in doing more work in that area. p4 was subsequently and
independently written largely by two of the original authors of
PARMACS. The current version (4.05) of the TCGMSG message-passing library is part of the Global Arrays toolkit (providing a shared-memory programming model for most major parallel architectures) distribution. It is located on an anonymous ftp server: ftp.pnl.gov (192.35.193.200), file: /pub/global/global1.2.tar.Z. TCGMSG will soon be upgraded to version 5, which will include asynchronous communication for networks of workstations and ports to the SGI Power Challenge and Cray T3D.
- Threads.h++: A
commercially supported product from Rogue Wave Software. It
provides a moderately high-level interface for writing portable
multithreaded programs. It includes basic synchronization
primitives (monitors, mutexes, etc.), futures (which they call
IOUs), thread creation, and a slick way to take existing procedures and turn them into threads. It is a large, rather complete, and impressive package. While higher level than NT or Java threads, it still may be too high level for our needs.
-
TreadMarks: A user-level, software-based distributed shared memory system [Amza96]. Provides a global name space on top of physically distributed memory. Synchronization is managed with
barriers and mutex locks. Shared data resides in Fortran Common
blocks. TreadMarks uses a relaxed consistency model for the shared
memory. TreadMarks is commercially supported [Amza96]. It supports
networks of PCs (currently only Unix environments, but soon on NT
as well). Interfaces exist for C, C++, Fortran, and Java.
- Vienna Fortran (VFCS): A
data-parallel Fortran dialect that had a major impact on the
formation of HPF. It is from the Zima group in Vienna.
- Win32 threads.
- WinPar: A message-passing (MPI and PVM) environment to support parallel computing on Intel Architecture workstations. NT is the platform of choice: WinPar is an integrated software development environment for parallel computing targeting personal computers interconnected by local area networks running Windows NT. Its technical objectives are to provide a message-passing layer including MPI and PVM, a framework of basic functionality needed for parallel computing, and a set of tools for code development, simulation, performance prediction, graphical high-level debugging, monitoring, and visualization of parallel applications. Its commercial objectives are to offer an affordable parallel development environment for training and education at universities, research organizations, and industry, to comply with existing standards in the HPCN market, and thus to extend this currently UNIX- and MPP-based market to networked Windows NT computers. The WinPar environment will be developed from existing state-of-the-art tools with easy-to-use graphical user interfaces that are already available for UNIX. These tools, including AUGUR, MOD ARCH, ParadiseC++, TRAPPER, WPVM, and WMPI, will be enhanced, integrated, and ported to Windows NT. Some of these tools contain large modules dealing with user interaction and visualization; as experience with previous ports from UNIX to Windows NT has shown, it is often quicker to completely re-engineer the graphical user interfaces. In this process, overlap between different tools will be eliminated by introducing common data structures and modules. Modern integration techniques such as OLE automation will be used to achieve tight integration of the tools while providing open interfaces for future extensions of the environment. The use of a commercially available multi-platform C++ object library for graphical user interfaces will ensure that WinPar is available for both UNIX and Windows NT and shares a common look and feel.
- ZPL: ZPL is a data-parallel language based on the phase abstractions programming model and the CTA (candidate type architecture) [Lin]. ZPL is a subset of ORCA-C specialized to data-parallel computations. ZPL provides a global view of the computation, so the programmer sees a single address space, and all parallelism is implicitly specified. At the core of ZPL's parallelism is the concept of a region: a set of indices. Once a region is defined, array operations can be specified by referencing the involved arrays and the index region. Offsets into regions can be specified to allow relative referencing of array elements. This is combined with special operations to handle array boundaries, plus array reduction and scan operations. The ZPL compiler is targeted to the CTA machine model.