Supercomputing Institute Research Bulletin

Fall 1997

International Workshop on Languages and Compilers for Parallel Computing Explores Architecture of New Computing Platforms

workshop details at:
http://www.msi.umn.edu/Symposia/Compiler/LCPC.html

The International Workshop on Languages and Compilers for Parallel Computing celebrated its tenth anniversary this August. The workshop was held at the University of Minnesota, with 74 scientists and researchers in attendance. It was sponsored by the Supercomputing Institute, the University of Minnesota Department of Computer Science, and Cray Research Inc.

Many workshop speakers explored recent trends in computer system architecture that incorporate multi-level parallelism and multi-level memory hierarchy. The talks fell into four broad categories: data locality enhancement, parallel programming models and languages, automatic program parallelization, and efficient synchronization and communication mechanisms. Highlights of the workshop are presented below.

Data Locality Enhancement

Six research groups presented work on this topic. A group from the University of California at San Diego, led by Jeanne Ferrante and Larry Carter, discussed a model for quantifying the multi-level nature of tiling, a program restructuring technique applied to loop structures. Current practice, they noted, is to perform level-by-level loop optimization using a distinct objective at each loop level. However, this strategy does not conform to the architectural trend toward a hierarchical memory system and multi-level program parallelism. Ferrante and Carter presented analyses and simulation results and proposed multi-level cost functions that can guide the choice of tile sizes and shapes for better performance.
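For readers unfamiliar with tiling, the following Python sketch shows the basic single-level form of the transformation on a matrix transpose. It is only an illustration of what a tile size controls, not the UCSD group's cost model; the function name `tiled_transpose` and the parameter `tile` are assumptions made for this example.

```python
def tiled_transpose(a, tile=2):
    """Transpose a square list-of-lists matrix one tile at a time.

    Illustrative only: `tile` is a hypothetical tuning parameter.
    """
    n = len(a)
    out = [[0] * n for _ in range(n)]
    # The two outer loops walk over tile origins.
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            # The two inner loops stay inside one tile, so the working set
            # is roughly tile * tile elements of `a` and of `out`.
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, n)):
                    out[j][i] = a[i][j]
    return out
```

The result is the same for any tile size; only the order in which elements are visited changes. A multi-level scheme would, in principle, nest this pattern again with one tile size per level of the memory hierarchy.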

Jingling Xue of the University of New England in Armidale, Australia, and Chua-Huang Huang of Ohio State University discussed their efforts to apply unimodular transformation and tiling techniques on scientific programs to improve cache performance. Their work focused on strategies used to determine an appropriate subspace in a loop iteration space for tiling. They found that in the special case of fully permutable loop nests, their method can obtain an optimal iteration subspace.
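As background, a unimodular transformation of a loop nest is an integer matrix with determinant +1 or -1 applied to the iteration vectors; loop interchange and loop skewing are two familiar cases. The Python sketch below checks that property for two-deep loop nests. The helper names and the two example matrices are assumptions chosen for illustration and are not taken from the talk.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def is_unimodular(m):
    """True if every entry is an integer and the determinant is +1 or -1."""
    all_ints = all(isinstance(x, int) for row in m for x in row)
    return all_ints and abs(det2(m)) == 1


def transform(m, point):
    """Map an iteration vector (i, j) to its new coordinates."""
    i, j = point
    return (m[0][0] * i + m[0][1] * j, m[1][0] * i + m[1][1] * j)


INTERCHANGE = [[0, 1], [1, 0]]  # swaps the two loops
SKEW = [[1, 0], [1, 1]]         # skews the inner loop by the outer index
```

Because the determinant is +1 or -1, each integer iteration point maps to exactly one integer point, so no iterations are lost or duplicated by the transformation.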

Aaron Sawdey and Matthew O’Keefe of the University of Minnesota presented a program analysis tool, called TOPAZ, that determines the overlap areas in self-similar parallel programs. Program transformations can then be applied to the overlapped areas. They showed some early results using the Miami Isopycnic-Coordinate Ocean Model program as the benchmark on the Cray T3D computer. They reported a speedup of 24.12 on 64 processors over the result obtained from running the program on two processors.
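To put the reported figure in context, going from two to 64 processors allows an ideal speedup of 32, so a speedup of 24.12 corresponds to a relative efficiency of roughly 75 percent. The short Python sketch below spells out that arithmetic; the function name is an assumption made for this example.

```python
def relative_efficiency(speedup, procs, baseline_procs):
    """Fraction of the ideal speedup achieved relative to a baseline run."""
    ideal = procs / baseline_procs  # 64 / 2 = 32 for the figures above
    return speedup / ideal


efficiency = relative_efficiency(24.12, 64, 2)  # about 0.754
```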

Parallel Programming Models and Parallel Languages

Talks on this subject focused on High Performance FORTRAN (HPF), FORTRAN 90, Java, hybrid languages, and thread parallelism. Jan Borowiec of GMD FIRST Research Institute, Berlin, and Arthur Veen of Parallel Computing, Amsterdam, described an algorithm that reduces the complexity of the procedure interface in HPF compilers. The algorithm has been adopted in the commercial HPF compiler produced by the PREPARE project.

Bryan Carpenter of the Northeast Parallel Architectures Center (NPAC) at Syracuse University described an effort supported by the Advanced Research Projects Agency (ARPA) Parallel Compiler Runtime Consortium (PCRC). In particular, he discussed the design and implementation of an HPF compilation system based on the PCRC runtime library. He compared the NPAC compiler with PGI HPF, a commercial HPF compiler, and reported that for the Laplace benchmark, which performs Jacobi relaxation on a 1024 by 1024 array, the code generated by the NPAC compiler ran faster by up to nearly a factor of two on an eight-node IBM SP2 machine.
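For reference, Jacobi relaxation replaces each interior grid point with the average of its four neighbors, always reading from the previous sweep. The serial Python sketch below shows the kernel on a small grid. It is not the benchmark code used in the comparison; the function names and the fixed-boundary treatment are assumptions made for illustration.

```python
def jacobi_step(grid):
    """Return a new grid after one Jacobi sweep; boundary values stay fixed."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # copy, so reads always see the old sweep
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


def jacobi(grid, sweeps):
    """Apply a fixed number of Jacobi sweeps and return the final grid."""
    for _ in range(sweeps):
        grid = jacobi_step(grid)
    return grid
```

Every point in a sweep depends only on values from the previous sweep, which is why the kernel is a common test case for data-parallel compilers such as those for HPF.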

Several other groups discussed extensions to existing parallel languages. Robert W. Numrich of SGI/Cray described F--, an extension to FORTRAN 90 that adopts a single-program-multiple-data programming model and uses explicit data transfers between remote and local memory images for data communication. Numrich concluded that F-- satisfies a need for computational scientists to have user-friendly, portable extensions that enable parallelism in a broad context. Guillermo Trabado and Emilio Zapata of the University of Málaga Complejo Tecnológico in Málaga, Spain, presented an extension to HPF for exploiting locality in irregular problems. They also addressed the lack of support in HPF for solving irregular problems.

Xavier Martorell, representing a research group at Polytechnic University of Catalunya, Spain, discussed the topic of control parallelism. He presented a programming model oriented toward the hierarchical exploitation of unstructured parallelism in multiprogrammed multiprocessor systems. The model offers a set of directives targeted toward FORTRAN programmers. Those directives allow the programmers to express parallelism in application programs. The compiler is responsible for the generation of code that efficiently exploits and manages parallelism at run time. The code runs on top of a user-level thread library that allows dynamic adaptation of parallelism at run time.

Dwip Banerjee and J. C. Browne of the University of Texas at Austin presented a case study of programming in a parallel programming system that targets array-oriented computations. The programming system is based on an integrated graphical and declarative representation of control parallelism and data partition. The example used in the case study was an even-odd reduction of block triangular matrices. The program formulated in the integrated representation revealed parallelism not shown in the original algorithm. The program resulting from using their system showed near-linear speedup across all three phases of the computation for the number of processors ranging from two to 32.

Explicitly parallel programs present new problems to compilers. Vivek Sarkar of the Massachusetts Institute of Technology discussed analysis and optimization of explicitly parallel programs using the Parallel Program Graph representation. Jaejin Lee and David Padua of the University of Illinois at Urbana-Champaign and Samuel P. Midkiff of IBM’s T. J. Watson Research Center discussed their work on the Concurrent Static Single Assignment (CSSA) form and presented a transformation algorithm for explicitly parallel programs with interleaving semantics using post-wait synchronization.
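As a rough illustration of post-wait synchronization, the Python sketch below has one thread write a shared value and then post, while another thread waits before reading it. Modeling post and wait with `threading.Event.set` and `threading.Event.wait` is an assumption made for this example; it does not represent the CSSA form or the transformation algorithm discussed in the talk.

```python
import threading


def run_post_wait():
    """Producer writes then posts; consumer waits then reads."""
    shared = {}
    ready = threading.Event()  # stands in for the synchronization variable
    result = []

    def producer():
        shared["x"] = 42  # the write happens before the post
        ready.set()       # post

    def consumer():
        ready.wait()      # wait: blocks until the matching post
        result.append(shared["x"])

    consumer_thread = threading.Thread(target=consumer)
    producer_thread = threading.Thread(target=producer)
    consumer_thread.start()
    producer_thread.start()
    consumer_thread.join()
    producer_thread.join()
    return result[0]
```

The ordering imposed by the post-wait pair is what a compiler must respect when it reorders or optimizes code in either thread.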

John Mellor-Crummey and Vikram Adve of Rice University presented compiler algorithms used in the Rice dHPF compiler, which can simplify the control flow in its generated parallel code. They showed that those algorithms are effective in reducing the number of conditional statements, code size, and overall execution time in the generated parallel code. They experimented with three benchmark programs: TOMCATV, ERLEBACHER and JACOBI, and found they can reduce code size by up to 33 percent, reduce the number of conditional statements by up to 66 percent, and reduce execution time by up to 15 percent, using the proposed algorithms.

Automatic Parallelization and Instruction Level Parallelism

New parallel architecture features present new opportunities for automatic program parallelization. Jenn-Yuan Tsai of the University of Illinois presented a paper, co-authored by Zhenzhen Jiang and Pen-Chung Yew of the University of Minnesota, that addressed program transformation techniques for concurrent multithreaded architectures, in particular a new “superthreaded” architecture. The new architecture adopts a thread pipelining execution model that allows threads with data dependencies and control dependencies to be executed in parallel. The group evaluated the effectiveness of their program transformation techniques by manually compiling several benchmark programs using their compiler algorithms and running the transformed programs through a trace-driven, cycle-by-cycle processor simulator. They showed that a superthreaded processor can achieve promising speedups for most of the SPEC benchmark programs.

In order to enhance the usability of parallelizing compilers, a group from Purdue University led by Rudolf Eigenmann worked on an environment that allows better interaction between parallelizing compilers and users. They aimed at making the parallelization results and their performance data more accessible and understandable.

Synchronization and Communication

Chau-Wen Tseng of the University of Maryland discussed the issue of reducing synchronization overhead for compiler-parallelized codes on software distributed-shared-memory (DSM) systems. His group believes that DSM systems provide a good target for parallelizing compilers because of their flexibility. Synchronization and load imbalance, however, are significant sources of overhead in such systems. They investigated the impact of their compilation techniques on eliminating barrier synchronization overhead in software DSMs. Experiments on an IBM SP2 indicate that these techniques can improve parallel performance by 20 percent, on average, and by up to 60 percent for some applications.
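One simplified way to see when a barrier between two parallel phases can be dropped: if no processor reads data that a different processor wrote in the preceding phase, there is no cross-processor flow of data for the barrier to protect. The Python sketch below checks only that condition. It is a deliberately reduced model written for this summary, not the Maryland group's compilation technique, and it ignores other dependence types; the function name and the set-based inputs are assumptions.

```python
def barrier_needed(writes_before, reads_after):
    """Decide whether a barrier between two phases must be kept.

    writes_before[p] and reads_after[p] are sets of array indices touched
    by processor p (hypothetical inputs for illustration).
    """
    for p, reads in enumerate(reads_after):
        for q, writes in enumerate(writes_before):
            if p != q and reads & writes:
                return True  # p consumes data produced by q: keep the barrier
    return False
```

For example, with writes `[{0, 1}, {2, 3}]`, reads of `[{0, 1}, {2, 3}]` touch only local data and need no barrier, whereas reads of `[{0, 1, 2}, {2, 3}]` cross a processor boundary and do.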

Xin Yuan, Rajiv Gupta, and Rami Melhem of the University of Pittsburgh presented a global communication optimizer based on an array data flow analysis. This optimizer reduces analysis time by partitioning data flow problems into subproblems and then solving the subproblems one at a time, in a demand-driven manner. Their experiments suggest that using array data flow analysis for communication optimization can be efficient and effective.

Keynote Speech and Special Session

The workshop’s keynote speaker was Ken Kennedy, Noah Harding Professor of Computer Science at Rice University. Kennedy co-chairs President Clinton’s Advisory Committee on High-Performance Computing and Communications, Information Technology, and the Next Generation Internet. About ten years ago, Kennedy led the effort in making High Performance FORTRAN (HPF) an industry standard, and he was invited to provide an overview of this effort as part of the celebration of the workshop’s tenth anniversary. Kennedy presented a retrospective of HPF, describing the language design and compiler issues and the progress made over the past ten years. He discussed the impact of HPF on parallel computing and described recent progress in compiling HPF programs.

A special session, titled “SUIF Compiler Infrastructure,” was organized by Monica Lam of Stanford University and Martin Rinard of MIT. Lam and Rinard outlined the goals and technical details of an ARPA-funded national parallelizing compiler infrastructure project. Several universities will participate in the project, which will develop a multi-language parallelizing compiler infrastructure for academic and industrial research. The initial languages supported by this common compiler infrastructure include FORTRAN, C, C++, and Java. Lam and Rinard elaborated on the design of the compiler’s internal data structures, the extensibility of such data structures to other programming languages, the compilation techniques to be provided, and the project’s schedule and current status. The special session generated great interest and enthusiasm among audience members.

