
These researchers investigated architectural tradeoffs for future processor designs. For this work, the group used a detailed execution-driven simulator with the SPEC CPU2000 benchmark programs (see www.spec.org). Because the simulator is computationally intensive and the current SPEC benchmark programs use large inputs, this research used the computing and memory resources of the Supercomputing Institute Origin and SP supercomputers.
In building new generations of microprocessors with ever-higher performance, CPU designers always face two major challenges. The first is to tackle the scalability issue when increasing the number of instructions issued per machine cycle while simultaneously shortening the clock cycle time. The second challenge is to bridge the worsening “speed gap” between the processor and its memory. Previously, the group proposed the “superthreaded” processor architecture in order to address these challenges by exploiting more sources of parallelism in application programs through both instruction-level and thread-level speculation with compiler-assisted cross-thread runtime data dependence checking.
One of the main goals of this project was to develop new compiler technology needed to support the superthreaded architecture. Preliminary results showed excellent potential for reducing application execution time with this architecture, but they also highlighted the need to deal more aggressively with the memory latency problem. In this group’s more recent work, the researchers developed a processor simulator to help validate the superthreaded architecture.
To address one aspect of this goal, the researchers focused on the use of high-performance computing in speculative multithreading. Very large-scale integration (VLSI) transistor densities now allow entire systems to be built on a single chip. However, diminishing improvements in processor clock speeds force processor designers to exploit more exotic techniques to improve performance. The superthreaded processor architecture was developed with the expectation that it would improve the performance of application programs that are difficult to parallelize using traditional approaches. This new architecture is a hybrid of a high-performance microprocessor, such as the Intel Pentium, and a multiprocessor, such as the SGI Origin. Unlike these traditional systems, however, this approach relies heavily on speculation, in which the processor “guesses” which instructions the program will execute next. Run-time verification checks whether the speculation was correct, and a special operation is executed to restore the processor’s state if the speculated outcome was wrong.
The researchers developed a detailed, cycle-accurate simulator of the superthreaded processor to evaluate its performance potential. This simulator actually executes any application program that has been compiled using an integrated parallelizing compiler. While executing the application program, the simulator estimates the number of cycles that an actual superthreaded processor would require to execute the program. The simulator allows the designer to vary the number of thread processing units, the capabilities of each unit, and other important parameters.
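The speculate, verify, and recover cycle described above can be illustrated with a small sketch. It is not the group's simulator and not the superthreaded hardware mechanism: the names `SpecThread` and `run_speculative`, the value-based check at commit time, and the one-cycle-per-step cost model are all assumptions made for illustration. Each loop iteration runs as a speculative thread that buffers its stores and records the values it loaded; threads commit in program order, and a thread whose loaded values have since changed is squashed and re-executed.

```python
from typing import Callable, Dict, Tuple

Memory = Dict[int, int]


class SpecThread:
    """One speculative loop iteration (hypothetical model, for illustration)."""

    def __init__(self, memory: Memory) -> None:
        self.memory = memory                 # committed state, read-only here
        self.read_set: Dict[int, int] = {}   # address -> value seen when loaded
        self.write_buf: Dict[int, int] = {}  # stores buffered until commit

    def load(self, addr: int) -> int:
        if addr in self.write_buf:           # forward from own buffered store
            return self.write_buf[addr]
        value = self.memory.get(addr, 0)
        self.read_set.setdefault(addr, value)
        return value

    def store(self, addr: int, value: int) -> None:
        self.write_buf[addr] = value


Body = Callable[[SpecThread, int], None]


def run_speculative(body: Body, n_iters: int, memory: Memory,
                    units: int = 4) -> Tuple[int, int]:
    """Run iterations in groups of `units` threads; return (squashes, cycles).

    Assumed toy cost model: one cycle per parallel group, plus one cycle
    for every squashed thread that has to be re-executed serially.
    """
    squashes = 0
    cycles = 0
    for start in range(0, n_iters, units):
        threads = []
        for i in range(start, min(start + units, n_iters)):
            thread = SpecThread(memory)
            body(thread, i)                  # speculative execution
            threads.append((i, thread))
        cycles += 1
        for i, thread in threads:            # commit in program order
            stale = any(memory.get(addr, 0) != seen
                        for addr, seen in thread.read_set.items())
            if stale:                        # mis-speculation: squash and redo
                squashes += 1
                cycles += 1
                thread = SpecThread(memory)
                body(thread, i)
            memory.update(thread.write_buf)
    return squashes, cycles


def chain(thread: SpecThread, i: int) -> None:
    """Loop-carried dependence: a[i + 1] = a[i] + 1."""
    thread.store(i + 1, thread.load(i) + 1)


def independent(thread: SpecThread, i: int) -> None:
    """No cross-iteration dependence: b[i] = 2 * a[i]."""
    thread.store(100 + i, 2 * thread.load(i))
```

Under this toy model, `independent` commits every thread without a squash, while `chain` squashes every thread except the first in each group and gains nothing over serial execution. Varying `units` mirrors, very loosely, the way the simulator lets the designer vary the number of thread processing units.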
Another area of interest was compiler-assisted sub-block reuse. The fact that instructions in programs often produce repetitive results has motivated researchers to explore various ways to exploit this value locality, such as value prediction and value reuse. Value prediction improves the available instruction-level parallelism by allowing dependent instructions to be executed speculatively after predicting the values of their operands. Value reuse, on the other hand, attempts to remove redundant computations by buffering the previously produced results of instructions and skipping the execution of instructions with repeating inputs. Previous value reuse mechanisms used a single instruction or a naturally formed instruction group, such as a basic block, a trace, or a function, as the reuse unit. These naturally formed instruction groups are readily identifiable by the hardware at run-time without compiler assistance. However, the performance potential of a value reuse mechanism depends on its reuse detection time, the number of reuse opportunities, and the amount of work saved by skipping each reuse unit. Larger instruction groups typically have fewer reuse opportunities than smaller groups, but they also provide greater benefit for each reuse-detection process, so it is important to find the balance point that provides the largest overall performance gain.
To this end, these researchers proposed a new mechanism called sub-block reuse, which uses compiler assistance to group instructions intelligently into reuse units. The goal was to balance the reuse granularity and the number of reuse opportunities. The sub-blocks were produced by slicing the basic blocks at compile-time using appropriate dataflow considerations. Simulations with the SPECint95 benchmarks showed that sub-block reuse with compiler assistance has a substantial and consistent potential to improve the performance of superscalar processors, with speedups ranging from 10% to 22%.
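A minimal software sketch may help make the value reuse idea and its granularity tradeoff concrete. It is an illustration only: the `ReuseBuffer` class, its least-recently-used replacement policy, and the linear cost model in `net_cycles_saved` are assumptions, not the hardware mechanism or the evaluation model used by the researchers. The buffer stores the outputs of a reuse unit keyed by its input values and skips execution when the same inputs repeat.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Tuple


class ReuseBuffer:
    """Table of previously computed results, keyed by (unit, input values)."""

    def __init__(self, capacity: int = 64) -> None:
        self.capacity = capacity
        self.table: "OrderedDict[Tuple[Hashable, tuple], object]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def execute(self, unit_id: Hashable, inputs: tuple,
                compute: Callable[..., object]) -> object:
        key = (unit_id, inputs)
        if key in self.table:                # reuse detected: skip the work
            self.hits += 1
            self.table.move_to_end(key)
            return self.table[key]
        self.misses += 1
        result = compute(*inputs)            # execute the reuse unit normally
        self.table[key] = result
        if len(self.table) > self.capacity:  # evict the least recently used entry
            self.table.popitem(last=False)
        return result


def net_cycles_saved(attempts: int, hit_rate: float,
                     work_cycles: float, detect_cycles: float) -> float:
    """Assumed linear model: every attempt pays the detection cost,
    and only the hits save the unit's execution time."""
    return attempts * (hit_rate * work_cycles - detect_cycles)
```

With these assumed numbers, `net_cycles_saved(100, 0.5, 10, 2)` is positive, while lowering the hit rate to 0.1 makes the result negative. That is the balance the paragraph describes: a larger reuse unit raises `work_cycles` but usually lowers `hit_rate`, and a smaller one does the opposite.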
The researchers also developed a complexity-effective verification technique and applied it to the Cray SV2 cache coherence protocol. Modern large-scale multiprocessors, capable of scaling to hundreds or thousands of processors, have proven very difficult to design and verify in a timely manner. In particular, the verification process (i.e., proving that the design is functionally correct) is often the most time-consuming aspect of developing the system. The group proposed a method of dealing with the verification complexity of a directory-based coherence protocol, providing the framework for a methodology that is built on a formal model of the coherence protocol, a language, and the register transfer language (RTL) implementation. This approach was used to verify the SV2 directory-based coherence protocol.
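To give a sense of what checking a formal model of a coherence protocol can look like, the sketch below exhaustively explores a toy protocol for a single cache line. It is not the SV2 protocol and not the group's methodology: the three-state (M, S, I) model, the atomic transitions with the directory's bookkeeping folded into the global state, the invariant, and the function names are all assumptions chosen to keep the example small. A deliberately broken variant shows how such a search exposes a protocol bug.

```python
from collections import deque
from typing import Iterator, Optional, Tuple

State = Tuple[str, ...]  # per-cache state of one line: 'M', 'S', or 'I'


def successors(state: State, buggy: bool = False) -> Iterator[State]:
    """Yield every state reachable in one atomic step (assumed toy model)."""
    n = len(state)
    for i in range(n):
        if state[i] == 'I':                  # read miss: any owner is downgraded
            nxt = ['S' if s == 'M' else s for s in state]
            nxt[i] = 'S'
            yield tuple(nxt)
        if state[i] != 'M':                  # write: requester becomes the owner
            nxt = list(state)
            if not buggy:                    # correct protocol invalidates the others
                nxt = ['I'] * n
            nxt[i] = 'M'
            yield tuple(nxt)
        if state[i] != 'I':                  # eviction
            nxt = list(state)
            nxt[i] = 'I'
            yield tuple(nxt)


def coherent(state: State) -> bool:
    """Invariant: at most one modified copy, and no sharers alongside it."""
    owners = state.count('M')
    return owners == 0 or (owners == 1 and state.count('S') == 0)


def find_violation(n_caches: int, buggy: bool = False) -> Optional[State]:
    """Breadth-first search of all reachable states; return a bad one or None."""
    start: State = ('I',) * n_caches
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if not coherent(state):
            return state
        for nxt in successors(state, buggy):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None
```

Here `find_violation(3)` returns `None`, meaning no reachable state breaks the invariant, whereas `find_violation(3, buggy=True)` returns a state in which a modified copy coexists with another valid copy. Real protocols have far larger state spaces, which is one reason verification complexity is a concern.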
Lakeri Bhende, Graduate Student Researcher
Ying Chen, Graduate Student Researcher
Peng-Fei Chuang, Graduate Student Researcher
Jedediah A. Deitrick, Supercomputing Institute Undergraduate Intern
Bud Fox, Research Associate
Robert Glamm, Graduate Student Researcher
Chris J. Hescott, Undergraduate Student Researcher
Peter D. Holm, Supercomputing Institute Undergraduate Intern
Jian Huang, Graduate Student Researcher
Kettly Joseph, Undergraduate Student Researcher
Baris Kazar, Graduate Student Researcher
Iffat Kazi, Graduate Student Researcher
Jeremy Kizer, Visiting Researcher
A. J. Klein Osowski, Graduate Student Researcher
Syreeta Knight, Undergraduate Student Researcher
Michael Morse, Undergraduate Student Researcher
Mark Nguyen, Undergraduate Student Researcher
Keith Osowski, Graduate Student Researcher
Resit Sendag, Graduate Student Researcher
Arun Venkatesan, Graduate Student Researcher
Keqiang Wu, Graduate Student Researcher
Joshua J. Yi, Graduate Student Researcher
Qing Zhao, Graduate Student Researcher
© 2002 by the Regents of the University of Minnesota