UPC: Distributed Shared Memory Programming
Tarek El-Ghazawi , William Carlson , Thomas Sterling , Katherine Yelick
· 2005
This is the first book to explain the language Unified Parallel C and its use. Authors El-Ghazawi, Carlson, and Sterling are among the developers of UPC, with close links to the industrial members of the UPC consortium. Their text covers background material on parallel architectures and algorithms, and includes UPC programming case studies. This book represents an invaluable resource for the growing number of UPC users and applications developers. More information about UPC can be found at: http://upc.gwu.edu/ An Instructor Support FTP site is available from the Wiley editorial department.
Dense and Sparse Matrix Operations on the Cell Processor
Leonid Oliker , Katherine Yelick , Parry Husbands , John Shalf , Samuel W. Williams
· 2005
The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. Therefore, the high performance computing community is examining alternative architectures that address the limitations of modern superscalar designs. In this work, we examine STI's forthcoming Cell processor: a novel, low-power architecture that combines a PowerPC core with eight independent SIMD processing units coupled with a software-controlled memory to offer high FLOP/s/Watt. Since neither Cell hardware nor cycle-accurate simulators are currently publicly available, we develop an analytic framework to predict Cell performance on dense and sparse matrix operations, using a variety of algorithmic approaches. Results demonstrate Cell's potential to deliver more than an order of magnitude better GFLOP/s per watt performance, when compared with the Intel Itanium2 and Cray X1 processors.
WTEC Panel Report on High-end Computing Research and Development in Japan
Alvin W. Trivelpiece , Rupak Biswas , Jack Dongarra , Peter Paul , Katherine Yelick , United States. National Aeronautics and Space Administration , United States. Department of Energy. Office of Science , United States. National Coordination Office for Information Technology Research and Development , World Technology Evaluation Center , National Science Foundation (U.S.)
· 2004
This study complements three others underway at about the same time, all inspired by the challenge presented by the achievements of the Japanese Earth Simulator in taking the lead as the world's fastest supercomputer in March 2002.
The Potential of the Cell Processor for Scientific Computing
Leonid Oliker , Katherine Yelick , Parry Husbands , John Shalf , Samuel Williams , Shoaib Kamil
· 2005
The slowing pace of commodity microprocessor performance improvements combined with ever-increasing chip power demands has become of utmost concern to computational scientists. As a result, the high performance computing community is examining alternative architectures that address the limitations of modern cache-based designs. In this work, we examine the potential of using the forthcoming STI Cell processor as a building block for future high-end computing systems. Our work contains several novel contributions. We are the first to present quantitative Cell performance data on scientific kernels and show direct comparisons against leading superscalar (AMD Opteron), VLIW (Intel Itanium2), and vector (Cray X1) architectures. Since neither Cell hardware nor cycle-accurate simulators are currently publicly available, we develop both analytical models and simulators to predict kernel performance. Our work also explores the complexity of mapping several important scientific algorithms onto the Cell's unique architecture. Additionally, we propose modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations. Overall results demonstrate the tremendous potential of the Cell architecture for scientific computations in terms of both raw performance and power efficiency.
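Orphelinat de Tracadie, N.-B.: sous la direction des religieuses hospitalières de Saint-Joseph (Tracadie Orphanage, New Brunswick: under the direction of the Religious Hospitallers of Saint Joseph)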
No author available
· 19??
Communication Optimizations for Fine-Grained UPC Applications
Katherine Yelick , Costin Iancu , Wei-Yu Chen
· 2005
Global address space languages like UPC exhibit high performance and portability on a broad class of shared and distributed memory parallel architectures. The most scalable applications use bulk memory copies rather than individual reads and writes to the shared space, but finer-grained sharing can be useful for scenarios such as dynamic load balancing, event signaling, and distributed hash tables. In this paper we present three optimization techniques for global address space programs with fine-grained communication: redundancy elimination, use of split-phase communication, and communication coalescing. Parallel UPC programs are analyzed using static single assignment form and a data flow graph, which are extended to handle the various shared and private pointer types that are available in UPC. The optimizations also take advantage of UPC's relaxed memory consistency model, which reduces the need for cross-thread analysis. We demonstrate the effectiveness of the analysis and optimizations using several benchmarks, which were chosen to reflect the kinds of fine-grained, communication-intensive phases that exist in some larger applications. The optimizations show speedups of up to 70 percent on three parallel systems, which represent three different types of cluster network technologies.
Data Sharing Analysis for Titanium
Benjamin Liblit , Alexander Aiken , Katherine Yelick
· 2001
Generating Permutation Instructions from a High-level Description
Manikandan Narayanan , Katherine Yelick
· 2003
Polynomial-time Algorithms for Enforcing Sequential Consistency in SPMD Programs with Arrays
Wei-Yu Chen , Arvind Krishnamurthy , Katherine Yelick
· 2003
Proceedings of the 12th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Katherine Yelick
· 2007