Parallel Computing (TOPC)

ACM Transactions on Parallel Computing: An Introduction
Phillip B. Gibbons
Article No.: 1
DOI: 10.1145/2661651

Techniques

...

Section: ACM Transactions on Parallel Computing

Introduction
David J. Lilja
Article No.: 2
DOI: 10.1145/2609798

Enhancing Performance Optimization of Multicore/Multichip Nodes with Data Structure Metrics
Ashay Rane, James Browne
Article No.: 3
DOI: 10.1145/2588788

Program performance optimization is usually based solely on measurements of execution behavior of code segments using hardware performance counters. However, memory access patterns are critical performance limiting factors for today's multicore...

Adaptive Prefetching on POWER7: Improving Performance and Power Consumption
Víctor Jiménez, Francisco J. Cazorla, Roberto Gioiosa, Alper Buyuktosunoglu, Pradip Bose, Francis P. O'Connell, Bruce G. Mealey
Article No.: 4
DOI: 10.1145/2588889

Hardware data prefetch engines are integral parts of many general purpose server-class microprocessors in the field today. Some prefetch engines allow users to change some of their parameters. But the prefetcher is usually enabled in a default...

Architecture and Performance of the Hardware Accelerators in IBM’s PowerEN Processor
Timothy Heil, Anil Krishna, Nicholas Lindberg, Farnaz Toussi, Steven Vanderwiel
Article No.: 5
DOI: 10.1145/2588888

Computation at the edge of a datacenter has unique characteristics. It deals with streaming data from multiple sources, going to multiple destinations, often requiring repeated application of one or more of several standard algorithmic kernels....

Section: ACM Transactions on Parallel Computing

A methodology for automatic generation of executable communication specifications from parallel MPI applications
Xing Wu, Frank Mueller, Scott Pakin
Article No.: 6
DOI: 10.1145/2660249

Portable parallel benchmarks are widely used for performance evaluation of HPC systems. However, because these are manually produced, they generally represent a greatly simplified view of application behavior, missing the subtle but...

Automatic parallelization of a class of irregular loops for distributed memory systems
Mahesh Ravishankar, John Eisenlohr, Louis-Noël Pouchet, J. Ramanujam, Atanas Rountev, P. Sadayappan
Article No.: 7
DOI: 10.1145/2660251

Many scientific applications spend significant time within loops that are parallel, except for dependences from associative reduction operations. However, these loops often contain data-dependent control-flow and array-access patterns. Traditional...

A simple parallel Cartesian tree algorithm and its application to parallel suffix tree construction
Julian Shun, Guy E. Blelloch
Article No.: 8
DOI: 10.1145/2661653

We present a simple linear work and space, and polylogarithmic time parallel algorithm for generating multiway Cartesian trees. We show that bottom-up traversals of the multiway Cartesian tree on the interleaved suffix array and longest common...