ACM Transactions on Parallel Computing (TOPC)

Latest Articles

Guest Editor Introduction PPoPP 2016, Special Issue 2 of 2


We present efficient locking mechanisms for hierarchical data structures. Several applications work on an abstract hierarchy of objects, and a parallel execution on this hierarchy necessitates synchronization across workers operating on different parts of the hierarchy. Existing synchronization mechanisms are too coarse, too inefficient, or too ad...


High memory contention is generally agreed to be a worst-case scenario for concurrent data structures. There has been a significant amount of research effort spent investigating designs that minimize contention, and several programming techniques have been proposed to mitigate its effects. However, there are currently few architectural mechanisms...

Hybridizing and Relaxing Dependence Tracking for Efficient Parallel Runtime Support

It is notoriously challenging to develop parallel software systems that are both scalable and correct. Runtime support for parallelism, such as...


This article presents estima, an easy-to-use tool for extrapolating the scalability of in-memory applications. estima is designed to perform a simple yet important task: Given the performance of an application on a small machine with a handful of cores, estima extrapolates its scalability to a larger machine with more cores, while requiring minimum...

Efficient Data Streaming Multiway Aggregation through Concurrent Algorithmic Designs and New Abstract Data Types

Data streaming relies on continuous queries to process unbounded streams of data in a real-time...


About TOPC

ACM Transactions on Parallel Computing (TOPC) is a forum for novel and innovative work on all aspects of parallel computing, including foundational and theoretical aspects, systems, languages, architectures, tools, and applications. It will address all classes of parallel-processing platforms including concurrent, multithreaded, multicore, accelerated, multiprocessor, clusters, and supercomputers. 

Forthcoming Articles
SciPAL: Expression Templates and Composition Closure Objects for High Performance Computational Physics with CUDA and OpenMP

We present SciPAL (scientific parallel algorithms library), a C++-based, hardware-independent open-source library. Its core is a domain-specific embedded language for numerical linear algebra. The main fields of application are finite element simulations, coherent optics, and the solution of inverse problems. Using SciPAL, algorithms can be stated in a mathematically intuitive way in terms of matrix and vector operations. Existing algorithms can easily be adapted to GPU-based computing by proper template specialization. Our library is compatible with the finite element library deal.II and provides a port of deal.II's most frequently used linear algebra classes to CUDA (NVidia's extension of the programming languages C and C++ for programming their GPUs). SciPAL's operator-based API for BLAS operations particularly aims at simplifying the usage of NVidia's CUBLAS. For non-BLAS array arithmetic, SciPAL's expression templates are able to generate CUDA kernels at compile time. We demonstrate the benefits of SciPAL using iterative principal component analysis as an example, which is the core algorithm for the spike-sorting problem in neuroscience.

Algorithm-based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures and Accuracy

Dense matrix factorizations, such as LU, Cholesky and QR, are widely used for scientific applications that require solving systems of linear equations, eigenvalue problems, and linear least squares problems. Such computations are normally carried out on supercomputers, whose ever-growing scale induces a fast decline of the Mean Time To Failure (MTTF). This paper proposes a new hybrid approach, based on Algorithm-Based Fault Tolerance (ABFT), to help matrix factorization algorithms survive fail-stop failures. We consider extreme conditions, such as the absence of any reliable component and the possibility of losing both data and checksum from a single failure. We present a generic solution for protecting the right factor, where the updates are applied, of all above-mentioned factorizations. For the left factor, where the panel has been applied, we propose a scalable checkpointing algorithm. This algorithm features a high degree of checkpointing parallelism and cooperatively utilizes the checksum storage left over from the right factor protection. The fault-tolerant algorithms derived from this hybrid solution are applicable to a wide range of dense matrix factorizations, with minor modifications. Theoretical analysis shows that the fault tolerance overhead sharply decreases with the scaling in the number of computing units and the problem size. Experimental results of LU and QR factorization on the Kraken (Cray XT5) supercomputer validate the theoretical evaluation and confirm negligible overhead, with and without errors. The applicability to multiple failures and the accuracy after multiple recoveries are also considered.

Collective algorithms for multi-ported torus networks

Modern supercomputers with torus networks allow each node to simultaneously pass messages on all of its links. However, most collective algorithms are designed to only use one link at a time. In this work, we present novel multi-ported algorithms for the scatter, gather, allgather, and reduce-scatter operations. Our algorithms can be combined to create multi-ported reduce, all-reduce, and broadcast algorithms. Several of these algorithms involve a new technique where we relax the MPI message-ordering constraints to achieve high performance and restore the correct ordering using an additional stage of redundant communication.

According to our models, on an n-dimensional torus, our algorithms should allow for nearly a 2n-fold improvement in communication performance compared to known, single-ported torus algorithms. In practice, we have achieved nearly 6x better performance on a 32k-node 3-dimensional torus.
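The 2n figure in the abstract above is consistent with a simple link count: each node of an n-dimensional torus has two neighbors per dimension. The following is a minimal sketch of that arithmetic only, not of the algorithms themselves. The function names are illustrative, and it assumes every torus dimension has at least three nodes (so the two neighbors are distinct) and that a single-ported algorithm drives one link at a time.

```python
def torus_links_per_node(n_dims: int) -> int:
    """Links per node in an n-dimensional torus: two neighbors per dimension.

    Assumes every dimension has at least three nodes, so the two
    neighbors along a dimension are distinct.
    """
    if n_dims < 1:
        raise ValueError("a torus needs at least one dimension")
    return 2 * n_dims


def ideal_multiport_speedup(n_dims: int) -> float:
    """Upper bound on the gain from driving all links instead of one."""
    single_ported_links = 1  # assumption: one link in use at a time
    return torus_links_per_node(n_dims) / single_ported_links
```

For a 3-dimensional torus this gives 6 links per node, which lines up with the "nearly 6x" result reported for the 32k-node machine.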

Automatic Parallelization of a Class of Irregular Loops for Distributed Memory Systems

Many scientific applications spend significant time within loops that are parallel, except for dependencies from associative reduction operations. However, these loops often contain data-dependent control-flow and array-access patterns. Traditional optimizations that rely on purely static analysis fail to generate parallel code.

This paper proposes an approach for automatic parallelization for distributed memory environments, using both static and run-time analysis. We formalize the computations that are targeted by this approach and develop algorithms to detect such computations. We describe in detail algorithms to generate a parallel inspector that performs the run-time analysis, and a parallel executor. The effectiveness of the approach is demonstrated on several benchmarks and a real-world application. We measure the inspector overhead and also evaluate the benefit of optimizations applied during the transformation.

Near-Optimal Scheduling Mechanisms for Deadline-Sensitive Jobs in Large Computing Clusters

We consider a market-based resource allocation model for batch jobs in cloud computing clusters. In our model, we incorporate the importance of the due date of a job by which it needs to be completed rather than the number of servers allocated to it at any given time. Each batch job is characterized by the work volume of total computing units (e.g., CPU hours) along with a bound on the maximum degree of parallelism. Users specify, along with these job characteristics, their desired due date and a value for finishing the job by its deadline. Given this specification, the primary goal is to determine the scheduling of cloud computing instances under capacity constraints in order to maximize the social welfare (i.e., the sum of values gained by allocated users). Our main result is a new $\frac{C}{C-k}\cdot\frac{s}{s-1}$-approximation algorithm for this objective, where $C$ denotes cloud capacity, $k$ is the maximal bound on parallelized execution (in practical settings, $k \ll C$), and $s$ is the slackness on the job completion time, i.e., the minimal ratio between a specified deadline and the earliest finish time of a job. Our algorithm is based on utilizing dual fitting arguments over a strengthened linear program to the problem.

Based on the new approximation algorithm, we construct truthful allocation and pricing mechanisms, in which reporting the true value and other properties of the job (deadline, work volume and the parallelism bound) is a dominant strategy for all users. To that end, we extend known results for single-value settings to provide a general framework for transforming allocation algorithms into truthful mechanisms in domains of single-value and multi-properties. We then show that the basic mechanism can be extended under proper Bayesian assumptions to the objective of maximizing revenues, which is important for public clouds. We empirically evaluate the benefits of our approach through simulations on datacenter job traces, and show that the revenues obtained under our mechanism are comparable with an ideal fixed-price mechanism, which sets an on-demand price using oracle knowledge of users' valuations. Finally, we discuss how our model can be extended to accommodate uncertainties in job work volumes, which is a practical challenge in cloud settings.
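To get a feel for the approximation factor C/(C-k) · s/(s-1) stated in the abstract above, the sketch below evaluates it exactly with rational arithmetic. The function name and the sample values of C, k, and s are illustrative assumptions, not figures from the paper.

```python
from fractions import Fraction


def approximation_factor(C: int, k: int, s) -> Fraction:
    """Evaluate C/(C-k) * s/(s-1) exactly.

    C: cloud capacity, k: maximal parallelism bound (k < C),
    s: slackness (s > 1). All sample inputs are hypothetical.
    """
    if not 0 <= k < C:
        raise ValueError("need 0 <= k < C")
    s = Fraction(s)
    if s <= 1:
        raise ValueError("need slackness s > 1")
    return Fraction(C, C - k) * (s / (s - 1))


# Hypothetical setting with k much smaller than C and slackness 2.
factor = approximation_factor(1000, 10, 2)  # Fraction(200, 99), about 2.02
```

With k much smaller than C the first term is close to 1, so the factor is dominated by s/(s-1) and approaches 1 as the slackness grows.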

Power Management of Extreme-scale Networks with On/Off Links in Runtime Systems

Networks are among the major power consumers in large-scale parallel systems. During execution of common parallel applications, a sizeable fraction of the links in the high-radix interconnects are either never used or are underutilized. We propose a runtime-system-based adaptive approach to turn off unused links, which has various advantages over the previously proposed hardware- and compiler-based approaches. We discuss why the runtime system is the best system component to accomplish this task, and test the effectiveness of our approach using real applications (including NAMD and MILC) and application benchmarks (including NAS Parallel Benchmarks and Stencil). These codes are simulated on representative topologies such as a 6-D torus and a multilevel directly-connected network (similar to IBM PERCS in Power 775 and Dragonfly in Cray Aries). For common applications with a near-neighbor communication pattern, our approach can save up to 20% of the machine's total power and energy without any performance penalty.

Lock Cohorting: A General Technique for Designing NUMA Locks

Multicore machines are quickly shifting to NUMA and CC-NUMA architectures, making scalable NUMA-aware locking algorithms, ones that take into account the machines' non-uniform memory and caching hierarchy, ever more important. This paper presents lock cohorting, a general new technique for designing NUMA-aware locks that is as simple as it is powerful.
Lock cohorting allows one to transform any spin-lock algorithm, with minimal non-intrusive changes, into a scalable NUMA-aware spin-lock. Our new cohorting technique allows us to easily create NUMA-aware versions of the TATAS-Backoff, CLH, MCS, and ticket locks, to name a few. Moreover, it allows us to derive a CLH-based cohort abortable lock, the first NUMA-aware queue lock to support abortability.
We empirically compared the performance of cohort locks with prior NUMA-aware and classic NUMA-oblivious locks on a synthetic micro-benchmark, a real-world key-value store application, memcached, as well as the libc memory allocator. Our results demonstrate that cohort locks perform as well as or better than known locks when the load is low and significantly outperform them as the load increases.
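The general shape of a cohort lock can be sketched as one global lock plus one local lock per NUMA node, where a releasing thread prefers to pass ownership to a waiter on its own node. The code below is an illustrative sketch under stated assumptions, not the paper's implementation: Python threads have no NUMA placement, so the node id is passed in by the caller, blocking `threading.Lock` objects stand in for spin locks, and the class name, `max_handoffs` bound, and waiter counter are hypothetical. It relies on two properties: `threading.Lock` may be released by a thread other than the one that acquired it, and a releasing thread can tell whether another thread of its cohort is waiting.

```python
import threading


class CohortLock:
    """Illustrative cohort lock: one global lock, one local lock per node."""

    def __init__(self, num_nodes: int, max_handoffs: int = 16):
        self._global = threading.Lock()
        self._local = [threading.Lock() for _ in range(num_nodes)]
        self._guard = [threading.Lock() for _ in range(num_nodes)]
        self._waiters = [0] * num_nodes          # threads queued per node
        self._owns_global = [False] * num_nodes  # protected by the local lock
        self._handoffs = [0] * num_nodes         # protected by the local lock
        self._max_handoffs = max_handoffs        # bounds cross-node starvation

    def acquire(self, node: int) -> None:
        with self._guard[node]:
            self._waiters[node] += 1
        self._local[node].acquire()
        with self._guard[node]:
            self._waiters[node] -= 1
        # If a cohort member handed over the local lock while the node still
        # held the global lock, the global lock is inherited, not reacquired.
        if not self._owns_global[node]:
            self._global.acquire()
            self._owns_global[node] = True
            self._handoffs[node] = 0

    def release(self, node: int) -> None:
        with self._guard[node]:
            cohort_waiting = self._waiters[node] > 0
        if cohort_waiting and self._handoffs[node] < self._max_handoffs:
            # Pass ownership within the node; the global lock stays held.
            self._handoffs[node] += 1
            self._local[node].release()
        else:
            # Nobody local is waiting (or the handoff budget is spent):
            # give other nodes a chance by releasing the global lock first.
            self._owns_global[node] = False
            self._global.release()
            self._local[node].release()


def run_demo(num_nodes: int = 2, threads_per_node: int = 4,
             iterations: int = 500) -> int:
    """Increment a shared counter under the lock and return its final value."""
    lock = CohortLock(num_nodes)
    counter = [0]

    def worker(node: int) -> None:
        for _ in range(iterations):
            lock.acquire(node)
            try:
                counter[0] += 1
            finally:
                lock.release(node)

    threads = [threading.Thread(target=worker, args=(n,))
               for n in range(num_nodes) for _ in range(threads_per_node)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

Calling `run_demo()` with the defaults should return 4000 (2 nodes, 4 threads each, 500 increments per thread) if mutual exclusion holds. The handoff bound trades locality for fairness: a larger value keeps the lock on one node longer, a smaller one lets other nodes in sooner.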


Publication Years 2014-2017
Publication Count 79
Citation Count 55
Available for Download 79
Downloads (6 weeks) 622
Downloads (12 Months) 4800
Downloads (cumulative) 13014
Average downloads per article 165
Average citations per article 1
Name: Award
Grey Ballard: ACM Doctoral Dissertation Award Honorable Mention (2013)
Guy Blelloch: ACM Fellows (2011)
James C Browne: ACM Fellows (1998)
James Demmel: ACM Paris Kanellakis Theory and Practice Award (2014); ACM Fellows (1999)
Jack Dongarra: ACM-IEEE CS Ken Kennedy Award (2013); ACM Fellows (2001)
Phillip B Gibbons: ACM Fellows (2006)
William D Gropp: ACM-IEEE CS Ken Kennedy Award (2016); SIAM/ACM Prize in Computational Science and Engineering (2014); ACM Fellows (2006)
David Paul Grove: ACM Fellows (2012); ACM Distinguished Member (2010); ACM Senior Member (2006)
Rachid Guerraoui: ACM Fellows (2012)
Maurice Herlihy: ACM Fellows (2005)
Charles E Leiserson: ACM-IEEE CS Ken Kennedy Award (2014); ACM Paris Kanellakis Theory and Practice Award (2013); ACM Fellows (2006); ACM Doctoral Dissertation Award (1982)
Michael Mitzenmacher: ACM Fellows (2014)
Mooly Sagiv: ACM Fellows (2015)
Vijay Saraswat: ACM Doctoral Dissertation Award (1989)
Michael Scott: ACM Fellows (2006)
Nir N Shavit: ACM Fellows (2013)
Julian Shun: ACM Doctoral Dissertation Award (2015)
Aravind Srinivasan: ACM Fellows (2014)

First Name Last Name Paper Counts
Grey Ballard 3
Nicholas Knight 3
Benjamin Moseley 2
Joseph Naor 2
James Demmel 2
Charles Leiserson 2
Andrew Davidson 2
Tao Schardl 2
Guy Blelloch 2
John Owens 2
Peter Kling 2
Chinmoy Dutta 1
Gopal Pandurangan 1
Andrea Vattani 1
Christian Scheideler 1
Thomas Groß 1
Sungjin Im 1
Davide Bilò 1
Luciano Gualà 1
Hafiz Sheikh 1
Ishfaq Ahmad 1
Yves Robert 1
Rachid Guerraoui 1
Rupesh Nasre 1
Tim Harris 1
Gokcen Kestor 1
Walther Maldonado 1
Maurice Herlihy 1
Xavier Martorell 1
Dave Dice 1
Olivier Tardieu 1
Paul Thomson 1
Aurélien Bouteiller 1
Thomas Hérault 1
Torsten Hoefler 1
William Gropp 1
Andrew Grimshaw 1
William Gropp 1
Ishai Menache 1
Jianjia Chen 1
Stephen Siegel 1
Bo Zhao 1
Ponnuswamy Sadayappan 1
Mahesh Ravishankar 1
Xing Wu 1
Matthieu Dorier 1
Gabriel Antoniu 1
Yu Wang 1
Sergei Vassilvitskii 1
David Harris 1
Kirk Pruhs 1
Ioana Bercea 1
Eric Torng 1
Tim Kaler 1
Minjia Zhang 1
Shenchen Xu 1
Saurabh Kalikar 1
Yuechao Pan 1
Yuduo Wu 1
Carl Yang 1
George Teodoro 1
Wickus Nienaber 1
Darko Petrović 1
Adam Betts 1
Paolo Romano 1
Oliver Sinnen 1
James Dinan 1
Johannes Hagemann 1
Youtao Zhang 1
Jagannathan Ramanujam 1
Francis O'Connell 1
Bruce Mealey 1
Robert Sisneros 1
Rajeev Barua 1
Raoul Steffen 1
Scott Roche 1
Vijaya Ramachandran 1
Mooly Sagiv 1
Phillip Gibbons 1
Aapo Kyrola 1
Erin Carson 1
Jeffrey Blanchard 1
Erik Opavsky 1
Lukas Arnold 1
Aurélien Cavelan 1
Aritra Sengupta 1
Weitang Liu 1
Pascal Felber 1
Zhiyu Liu 1
Vijay Saraswat 1
Mandana Vaziri 1
Étienne Rivière 1
Santosh Mahapatra 1
Frédéric Vivien 1
Lionel Eyraud-Dubois 1
Paul Sack 1
Santiago Pagani 1
Moran Feldman 1
Liane Lewin-Eytan 1
Jun Yang 1
Yi Xu 1
Louis Pouchet 1
Scott Pakin 1
Pradip Bose 1
Chaodong Zheng 1
I Lee 1
Jim Sukha 1
Joseph Izraelevitz 1
Zoltan Majo 1
Aravind Srinivasan 1
Navin Goyal 1
William Hasenplaugh 1
Guido Proietti 1
Anne Benoit 1
Ioannis Koutis 1
Ulrich Meyer 1
Brandon Lucia 1
Leyuan Wang 1
Muhammad Osama 1
Chenshan Yuan 1
Osman Ünsal 1
Eduard Ayguadé 1
Alba De Melo 1
Avraham Shinnar 1
Mikio Takeuchi 1
Virendra Marathe 1
Nir Shavit 1
Nuno Diegues 1
Rajeev Thakur 1
Michael Garland 1
Laxmikant Kale 1
Stephan Kramer 1
Jörg Henkel 1
Andrew Siegel 1
Atanas Rountev 1
Frank Mueller 1
Roberto Gioiosa 1
Marc Snir 1
Timothy Heil 1
Anil Krishna 1
Shadi Ibrahim 1
Leigh Orf 1
Ronghua Liang 1
Rajmohan Rajaraman 1
Michael Scott 1
Jeremy Fineman 1
Uday Bondhugula 1
Felix Wolf 1
Man Cao 1
Swarnendu Biswas 1
Yangzihao Wang 1
Julia Lawall 1
Guillermo Miranda 1
Thomas Ropars 1
Duane Merrill 1
Ehsan Totoni 1
Nikhil Jain 1
Adam Hammouda 1
John Eisenlohr 1
Francisco Cazorla 1
Farnaz Toussi 1
Franck Cappello 1
Ashay Rane 1
Jun Wang 1
Seth Gilbert 1
Peter Sanders 1
Jochen Speck 1
Ravi Kumar 1
Guy Golan-Gueta 1
Ganesan Ramalingam 1
Harsha Simhadri 1
Jiayang Jiang 1
Michael Mitzenmacher 1
Felix Voigtlaender 1
Aleksandar Dragojević 1
Mary Hall 1
Patrick Marlier 1
George Bosilca 1
Peng Du 1
Jack Dongarra 1
Loris Marchal 1
Jonathan Yaniv 1
Sebastian Kobbe 1
Bastian Degener 1
Friedhelm Heide 1
Ciaran McCreesh 1
Julian Shun 1
Steven Vanderwiel 1
Alex Druinsky 1
Peter Pietrzyk 1
Richard Cole 1
Eran Yahav 1
Justin Thaler 1
Stefano Leucci 1
Emircan Uysaler 1
David Böhme 1
Markus Geimer 1
Michael Bond 1
Georgios Chatzopoulos 1
Andy Riffel 1
Edans De O. Sandes 1
David Grove 1
Prabhanjan Kambadur 1
Saeed Maleki 1
Madanlal Musuvathi 1
Todd Mytkowicz 1
Xin Yuan 1
Benjamin Herta 1
Alastair Donaldson 1
Pavan Balaji 1
Keith Underwood 1
Navendu Jain 1
Janmartin Jahn 1
Patrick Prosser 1
Orcun Yildiz 1
Tom Peterka 1
James Browne 1
Timothy Creech 1
Zhunping Zhang 1
Martina Eikel 1
Edgar Solomonik 1
Roshan Dathathri 1
Ravi Mullapudi 1
Hongyang Sun 1
Saman Ashkiani 1
Adrián Cristal 1
Serdar Taşiran 1
David Cunningham 1
Gilles Muller 1
André Schiper 1
Wei Zhang 1
Brian Barrett 1
Barbara Kempkes 1
Víctor Jiménez 1
Alper Buyuktosunoglu 1
Nicholas Lindberg 1
Oded Schwartz 1
Jiaquan Gao 1

Affiliation Paper Counts
Tel Aviv University 1
University of Auckland 1
University of Houston 1
Los Alamos National Laboratory 1
Koc University 1
Spanish National Research Council 1
Louisiana State University 1
Goethe University Frankfurt 1
Hebrew University of Jerusalem 1
University of California, Merced 1
University of Sassari 1
Technical University of Darmstadt 1
RWTH Aachen University 1
Nanjing Normal University 1
University of Virginia 1
Massachusetts Institute of Technology 1
University of Delaware 1
Georgetown University 1
University of Utah 1
Lawrence Livermore National Laboratory 1
University of Roma Tor Vergata 1
University of California, Los Angeles 1
University of California, San Diego 1
Michigan State University 1
University of Wisconsin Madison 1
Wake Forest University 1
University of Puerto Rico 1
Yahoo Research Labs 1
Huawei Technologies Co., Ltd., USA 1
Microsoft Research Cambridge 1
Universite de Bordeaux 1
IBM, Japan 1
University of Glasgow 2
Universite de Lyon 2
University of Gottingen 2
Northeastern University 2
University of Rochester 2
Zhejiang University of Technology 2
Indian Institute of Technology, Madras 2
Pacific Northwest National Laboratory 2
New York University 2
University of L'Aquila 2
National University of Singapore 2
Brown University 2
Washington University in St. Louis 2
Sandia National Laboratories, New Mexico 2
Lawrence Berkeley National Laboratory 2
Google Inc. 2
Instituto Superior Tecnico 2
North Carolina State University 2
University of Texas at Arlington 2
Imperial College London 3
Florida State University 3
University of Brasilia 3
Universitat Politecnica de Catalunya 3
Harvard University 3
Swiss Federal Institute of Technology, Zurich 3
Grinnell College 3
INRIA Institut National de Recherche en Informatique et en Automatique 3
Indian Institute of Science 3
Barcelona Supercomputing Center 3
Ecole Normale Superieure de Lyon 4
University of Neuchatel 4
Intel Corporation 4
University of Texas at Austin 4
Technion - Israel Institute of Technology 5
University of Tennessee, Knoxville 5
Swiss Federal Institute of Technology, Lausanne 5
University of Maryland 5
University of Pittsburgh 5
Carnegie Mellon University 6
University of California, Berkeley 7
Microsoft Research 7
Ohio State University 7
Karlsruhe Institute of Technology 8
University of Paderborn 8
Argonne National Laboratory 8
MIT Computer Science and Artificial Intelligence Laboratory 8
University of Illinois at Urbana-Champaign 8
IBM Thomas J. Watson Research Center 11
University of California, Davis 14

ACM Transactions on Parallel Computing (TOPC) - Special Issue: Invited papers from PPoPP 2016, Part 2

Volume 4 Issue 2, October 2017 Special Issue: Invited papers from PPoPP 2016, Part 2
Volume 4 Issue 1, October 2017 Special Issue: Invited papers from PPoPP 2016, Part 1
Volume 3 Issue 4, March 2017 Special Issue on PPoPP 2015 and Regular Papers

Volume 3 Issue 3, December 2016
Volume 3 Issue 2, August 2016
Volume 3 Issue 1, June 2016 Special Issue for SPAA 2014
Volume 2 Issue 4, March 2016 Special Issue on PPoPP 2014

Volume 2 Issue 3, October 2015 Special Issue for SPAA 2013
Volume 2 Issue 2, July 2015
Volume 2 Issue 1, May 2015 Special Issue on SPAA 2012
Volume 1 Issue 2, January 2015 Special Issue on PPoPP 2012

Volume 1 Issue 1, September 2014 Inaugural Issue and Special Section on Top Papers from PACT-21, and Regular Papers