ACM Transactions on Parallel Computing - Special Issue on PPOPP 2012, Volume 1 Issue 2, January 2015



Section: Special Issue on PPOPP'12

Introduction to the Special Issue on PPoPP'12
Keshav Pingali, J. Ramanujam, P. Sadayappan
Article No.: 9
DOI: 10.1145/2716343

Algorithm-Based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures and Accuracy
Aurelien Bouteiller, Thomas Herault, George Bosilca, Peng Du, Jack Dongarra
Article No.: 10
DOI: 10.1145/2686892

Dense matrix factorizations, such as LU, Cholesky and QR, are widely used for scientific applications that require solving systems of linear equations, eigenvalues and linear least squares problems. Such computations are normally carried out on...

Avoiding Communication in Successive Band Reduction
Grey Ballard, James Demmel, Nicholas Knight
Article No.: 11
DOI: 10.1145/2686877

The running time of an algorithm depends on both arithmetic and communication (i.e., data movement) costs, and the relative costs of communication are growing over time. In this work, we present sequential and distributed-memory parallel...

Collective Algorithms for Multiported Torus Networks
Paul Sack, William Gropp
Article No.: 12
DOI: 10.1145/2686882

Modern supercomputers with torus networks allow each node to simultaneously pass messages on all of its links. However, most collective algorithms are designed to only use one link at a time. In this work, we present novel multiported algorithms...

Lock Cohorting: A General Technique for Designing NUMA Locks
David Dice, Virendra J. Marathe, Nir Shavit
Article No.: 13
DOI: 10.1145/2686884

Multicore machines are quickly shifting to NUMA and CC-NUMA architectures, making scalable NUMA-aware locking algorithms, ones that take into account the machine's nonuniform memory and caching hierarchy, ever more important. This article presents...

High-Performance and Scalable GPU Graph Traversal
Duane Merrill, Michael Garland, Andrew Grimshaw
Article No.: 14
DOI: 10.1145/2717511

Breadth-First Search (BFS) is a core primitive for graph traversal and a basis for many higher-level graph analysis algorithms. It is also representative of a class of parallel computations whose memory accesses and work distribution are both...

SciPAL: Expression Templates and Composition Closure Objects for High Performance Computational Physics with CUDA and OpenMP
Stephan C. Kramer, Johannes Hagemann
Article No.: 15
DOI: 10.1145/2686886

We present SciPAL (scientific parallel algorithms library), a C++-based, hardware-independent open-source library. Its core is a domain-specific embedded language for numerical linear algebra. The main fields of application are...

Power Management of Extreme-Scale Networks with On/Off Links in Runtime Systems
Ehsan Totoni, Nikhil Jain, Laxmikant V. Kale
Article No.: 16
DOI: 10.1145/2687001

Networks are among major power consumers in large-scale parallel systems. During execution of common parallel applications, a sizeable fraction of the links in the high-radix interconnects are either never used or are underutilized. We propose a...