ACM transactions on parallel computing: An introduction
Phillip B. Gibbons
Article No.: 1
Enhancing Performance Optimization of Multicore/Multichip Nodes with Data Structure Metrics
Ashay Rane, James Browne
Article No.: 3
Program performance optimization is usually based solely on measurements of execution behavior of code segments using hardware performance counters. However, memory access patterns are critical performance limiting factors for today's multicore...
Adaptive Prefetching on POWER7: Improving Performance and Power Consumption
Víctor Jiménez, Francisco J. Cazorla, Roberto Gioiosa, Alper Buyuktosunoglu, Pradip Bose, Francis P. O'Connell, Bruce G. Mealey
Article No.: 4
Hardware data prefetch engines are integral parts of many general purpose server-class microprocessors in the field today. Some prefetch engines allow users to change some of their parameters. But the prefetcher is usually enabled in a default...
Architecture and Performance of the Hardware Accelerators in IBM’s PowerEN Processor
Timothy Heil, Anil Krishna, Nicholas Lindberg, Farnaz Toussi, Steven Vanderwiel
Article No.: 5
Computation at the edge of a datacenter has unique characteristics. It deals with streaming data from multiple sources, going to multiple destinations, often requiring repeated application of one or more of several standard algorithmic kernels....
Section: ACM transactions on parallel computing
A methodology for automatic generation of executable communication specifications from parallel MPI applications
Xing Wu, Frank Mueller, Scott Pakin
Article No.: 6
Portable parallel benchmarks are widely used for performance evaluation of HPC systems. However, because these are manually produced, they generally represent a greatly simplified view of application behavior, missing the subtle but...
Automatic parallelization of a class of irregular loops for distributed memory systems
Mahesh Ravishankar, John Eisenlohr, Louis-Noël Pouchet, J. Ramanujam, Atanas Rountev, P. Sadayappan
Article No.: 7
Many scientific applications spend significant time within loops that are parallel, except for dependences from associative reduction operations. However, these loops often contain data-dependent control-flow and array-access patterns. Traditional...
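A small illustration of the loop class described above may help. In the sketch below, each iteration both branches on data and writes to a data-dependent index, yet the only cross-iteration dependence is an associative `+=`. Because of this, iterations can be split into blocks, each block can accumulate into a private copy of the output, and the copies can be summed at the end. This is a sketch of the general privatize-then-reduce idea only, not the article's compiler transformation. The blocks are simulated sequentially rather than run on separate nodes, and all function and variable names are hypothetical.

```python
from typing import List


def sequential_loop(x: List[int], idx: List[int], m: int) -> List[int]:
    """Reference loop: data-dependent branch and write target, combined by +=."""
    y = [0] * m
    for i in range(len(x)):
        if x[i] > 0:              # data-dependent control flow
            y[idx[i]] += x[i]     # data-dependent array access; + is associative
    return y


def partitioned_loop(x: List[int], idx: List[int], m: int, nparts: int) -> List[int]:
    """Split iterations into nparts (>= 1) blocks, each with a private copy of y,
    then combine the private copies with the same associative operator."""
    n = len(x)
    partials = []
    for p in range(nparts):
        lo, hi = p * n // nparts, (p + 1) * n // nparts
        local = [0] * m           # private copy: no write conflicts between blocks
        for i in range(lo, hi):
            if x[i] > 0:
                local[idx[i]] += x[i]
        partials.append(local)
    # Final reduction step: element-wise sum of the private copies.
    return [sum(col) for col in zip(*partials)] if partials else [0] * m


if __name__ == "__main__":
    x = [3, -1, 4, 1, -5, 9, 2, 6]
    idx = [0, 2, 1, 0, 2, 1, 2, 0]
    print(sequential_loop(x, idx, 3))       # [10, 13, 2]
    print(partitioned_loop(x, idx, 3, 3))   # [10, 13, 2]
```

Integer data is used so that the partitioned result matches the sequential one exactly. With floating-point data, reordering the additions can change the low-order bits.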
A simple parallel Cartesian tree algorithm and its application to parallel suffix tree construction
Julian Shun, Guy E. Blelloch
Article No.: 8
We present a simple linear work and space, and polylogarithmic time parallel algorithm for generating multiway Cartesian trees. We show that bottom-up traversals of the multiway Cartesian tree on the interleaved suffix array and longest common...
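Some background on the data structure may be useful. A (binary) Cartesian tree over an array is heap-ordered by value, and its in-order traversal reproduces the array. The sketch below uses the standard sequential stack-based construction, which runs in O(n) time, to show what such a tree looks like. It is not the parallel multiway algorithm of the article. The function name `cartesian_tree_parents` and the parent-array output format are hypothetical choices made for illustration.

```python
from typing import List


def cartesian_tree_parents(values: List[int]) -> List[int]:
    """Return the parent index of each position in a min-rooted binary
    Cartesian tree (-1 marks the root). Equal values are not popped, so a
    later equal value becomes a descendant of the earlier one."""
    parent = [-1] * len(values)
    stack: List[int] = []  # indices on the current rightmost path, values increasing
    for i, v in enumerate(values):
        last = -1
        # Pop nodes larger than v; the last one popped becomes i's left child.
        while stack and values[stack[-1]] > v:
            last = stack.pop()
        if last != -1:
            parent[last] = i
        # i hangs as the right child of the remaining top of the stack, if any.
        if stack:
            parent[i] = stack[-1]
        stack.append(i)
    return parent


if __name__ == "__main__":
    print(cartesian_tree_parents([3, 2, 5, 1, 4]))  # [1, 3, 1, -1, 3]
```

In the example, index 3 (value 1) is the root. Index 1 (value 2) roots the left part `[3, 2, 5]`, and index 4 is the root's right child.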