ACM Transactions on

Parallel Computing (TOPC)

Latest Articles

Guest Editor Introduction PPoPP 2016, Special Issue 2 of 2

DomLock: A New Multi-Granularity Locking Technique for Hierarchies

We present efficient locking mechanisms for hierarchical data structures. Several applications work on an abstract hierarchy of objects, and a parallel execution on this hierarchy necessitates synchronization across workers operating on different parts of the hierarchy. Existing synchronization mechanisms are too coarse, too inefficient, or too ad... (more)

Lease/Release: Architectural Support for Scaling Contended Data Structures

High memory contention is generally agreed to be a worst-case scenario for concurrent data structures. There has been a significant amount of research effort spent investigating designs that minimize contention, and several programming techniques have been proposed to mitigate its effects. However, there are currently few architectural mechanisms... (more)

Hybridizing and Relaxing Dependence Tracking for Efficient Parallel Runtime Support

It is notoriously challenging to develop parallel software systems that are both scalable and correct. Runtime support for parallelism—such as... (more)

ESTIMA: Extrapolating ScalabiliTy of In-Memory Applications

This article presents estima, an easy-to-use tool for extrapolating the scalability of in-memory applications. estima is designed to perform a simple yet important task: Given the performance of an application on a small machine with a handful of cores, estima extrapolates its scalability to a larger machine with more cores, while requiring minimum... (more)

Efficient Data Streaming Multiway Aggregation through Concurrent Algorithmic Designs and New Abstract Data Types

Data streaming relies on continuous queries to process unbounded streams of data in a real-time... (more)


About TOPC

ACM Transactions on Parallel Computing (TOPC) is a forum for novel and innovative work on all aspects of parallel computing, including foundational and theoretical aspects, systems, languages, architectures, tools, and applications. It will address all classes of parallel-processing platforms including concurrent, multithreaded, multicore, accelerated, multiprocessor, clusters, and supercomputers. 

read more
Forthcoming Articles

Multi-dimensional intra-tile parallelization for memory-starved stencil computations

SciPAL: Expression Templates and Composition Closure Objects for High Performance Computational Physics with CUDA and OpenMP

We present SciPAL (scientific parallel algorithms library), a C++-based, hardware-independent open-source library.
Its core is a domain-specific embedded language for numerical linear algebra.
The main fields of application are finite element simulations, coherent optics and the solution of inverse problems.
Using SciPAL, algorithms can
be stated in a mathematically intuitive way in terms of matrix and vector operations.
Existing algorithms can easily be adapted to GPU-based computing by proper template specialization.
Our library is compatible with the finite element library deal.II and provides a port of deal.II's most frequently used linear algebra classes to CUDA (NVidia's extension of the programming languages C and C++ for programming their GPUs).
SciPAL's operator-based API for BLAS operations particularly aims at simplifying the usage of NVidia's CUBLAS.
For non-BLAS array arithmetic SciPAL's expression templates are able to generate CUDA kernels at compile-time.
We demonstrate the benefits of SciPAL using the iterative principal component analysis as example which is the core algorithm for the spike-sorting problem
in neuroscience.

Algorithm-based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures and Accuracy

Dense matrix factorizations, such as LU, Cholesky and QR, are widely
used for scientific applications that require solving systems of
linear equations, eigenvalues and linear least squares problems.
Such computations are normally carried out on supercomputers, whose
ever-growing scale induces a fast decline of the Mean Time To
Failure (MTTF). This paper proposes a new hybrid approach, based on
Algorithm-Based Fault Tolerance (ABFT), to help matrix
factorizations algorithms survive fail-stop failures. We consider
extreme conditions, such as the absence of any reliable component
and the possibility of losing both data and checksum from a single
failure. We will present a generic solution for protecting the
right factor, where the updates are applied, of all above mentioned
For the left factor, where the panel has been applied, we propose a
scalable checkpointing algorithm. This algorithm features high
degree of checkpointing parallelism and cooperatively utilizes the
checksum storage leftover from the right factor protection. The
fault-tolerant algorithms derived from this hybrid solution is
applicable to a wide range of dense matrix factorizations, with
minor modifications. Theoretical analysis shows that the fault
tolerance overhead sharply decreases with the scaling in the number
of computing units and the problem size. Experimental results of LU
and QR factorization on the Kraken (Cray XT5) supercomputer validate
the theoretical evaluation and confirm negligible overhead, with-
and without-errors. Applicability to tolerate multiple failures
and accuracy after multiple recovery is also considered.

Collective algorithms for multi-ported torus networks

Modern supercomputers with torus networks allow each node to simultaneously pass messages on all of its links. However, most collective algorithms are designed to only use one link at a time. In this work, we present novel multi-ported algorithms for the scatter, gather, allgather, and reduce-scatter operations. Our algorithms can be combined to create multi-ported reduce, all-reduce, and broadcast algorithms. Several of these algorithms involve a new technique where we relax the MPI message-ordering constraints to achieve high performance and restore the correct ordering using an additional stage of redundant communication.

According to our models, on an n-dimensional torus, our algorithms should allow for nearly a 2n-fold improvement in communication performance compared to known, single-ported torus algorithms. In practice, we have achieved nearly 6x better performance on a 32k-node 3-dimensional torus.

Automatic Parallelization of a Class of Irregular Loops for Distributed Memory Systems

Many scientific applications spend significant time within loops that are parallel, except for dependencies from associative reduction operations. However these loops often contain data-dependent control-flow and array-access patterns. Traditional optimizations that rely on purely static analysis fail to generate parallel code.

This paper proposes an approach for automatic parallelization for distributed memory environments, using both static and run-time analysis. We formalize the computations that are targeted by this approach and develop algorithms to detect such computation. We describe in detail, algorithms to generate a parallel inspector that
performs the run-time analysis, and a parallel executor. The
effectiveness of the approach is demonstrated on several benchmarks and a real-world applications. We measure the inspector overhead and also evaluate the benefit of optimizations applied during the transformation.

Access to Data and Number of Iterations: Dual Primal Algorithms for Maximum Matching under Resource Constraints

In this paper we consider graph algorithms in models of computation where the space usage (random accessible storage, in addition to the read only input) is sublinear in the number of edges $m$ and the access to input is constrained. These questions arises in many natural settings, and in particular in the analysis of MapReduce or similar algorithms that model constrained parallelism with sublinear central processing.

We focus on weighted nonbipartite maximum matching in this paper. For any constant $p>1$, we provide an iterative sampling based algorithm for computing a $(1-\epsilon)$-approximation of the weighted nonbipartite maximum matching that uses $O(p/\epsilon)$ rounds of sampling, and $O(n^{1+1/p})$ space. The results extends to $b$-Matching with small changes. This paper combines
adaptive sketching literature and fast primal-dual algorithms based on relaxed Dantzig-Wolfe decision procedures. Each round of sampling is implemented through linear sketches and can be executed in a single round of
MapReduce. The paper also proves that nonstandard linear relaxations of a problem, in particular penalty based formulations, are helpful in reducing the adaptive dependence of the iterations.

Near-Optimal Scheduling Mechanisms for Deadline-Sensitive Jobs in Large Computing Clusters

We consider a market-based resource allocation model for batch jobs in cloud computing clusters. In our model, we incorporate the importance of the due date of a job by which it needs to be completed rather than the number of servers allocated to it at any given time. Each batch job is characterized by the work volume of total computing units (e.g., CPU hours) along with a bound on maximum degree of parallelism. Users specify, along with these job characteristics, their desired due date and a value for finishing the job by its deadline. Given this specification, the primary goal is to determine the scheduling of cloud computing instances under capacity constraints in order to maximize the social welfare (i.e., sum of values gained by allocated users). Our main result is a new $\frac{C}{C-k}\frac{s}{s-1}-approximation algorithm for this objective, where $C$ denotes cloud capacity, $k$ is the maximal bound on parallelized execution (in practical settings, $k << C$) and $s$ is the slackness on the job completion time i.e., the minimal ratio between a specified deadline and the earliest finish time of a job. Our algorithm is based on utilizing dual fitting arguments over a strengthened linear program to the problem.

Based on the new approximation algorithm, we construct truthful allocation and pricing mechanisms, in which reporting the true value and other properties of the job (deadline, work volume and the parallelism bound) is a dominant strategy for all users. To that end, we extend known results for single-value settings to provide a general framework for transforming allocation algorithms into truthful mechanisms in domains of single-value and multi-properties. We then show that the basic mechanism can be extended under proper Bayesian assumptions to the objective of maximizing revenues, which is important for public clouds. We empirically evaluate the benefits of our approach through simulations on datacenter job traces, and show that the revenues obtained under our mechanism are comparable with an ideal fixed-price mechanism, which sets an on-demand price using oracle knowledge of users' valuations. Finally, we discuss how our model can be extended to accommodate uncertainties in job work volumes, which is a practical challenge in cloud settings.

Power Management of Extreme-scale Networks with On/Off Links in Runtime Systems

Networks are among major power consumers in large-scale parallel systems. During execution of common
parallel applications, a sizeable fraction of the links in the high-radix interconnects are either never used
or are underutilized. We propose a runtime system based adaptive approach to turn off unused links, which
has various advantages over the previously proposed hardware and compiler based approaches. We discuss
why the runtime system is the best system component to accomplish this task, and test the effectiveness
of our approach using real applications (including NAMD, MILC), and application benchmarks (including
NAS Parallel Benchmarks, Stencil). These codes are simulated on representative topologies such as 6-D
Torus and multilevel directly-connected network (similar to IBM PERCS in Power 775 and Dragonfly in
Cray Aries). For common applications with near-neighbor communication pattern, our approach can save
up to 20% of total machine's power and energy, without any performance penalty.

Lock Cohorting: A General Technique for Designing NUMA Locks

Multicore machines are quickly shifting to NUMA and CC-NUMA architectures, making scalable NUMA- aware locking algorithms, ones that take into account the machines' non-uniform memory and caching hierarchy, ever more important. This paper presents lock cohorting, a general new technique for designing NUMA-aware locks that is as simple as it is powerful.
Lock cohorting allows one to transform any spin-lock algorithm, with minimal non-intrusive changes, into scalable NUMA-aware spin-locks. Our new cohorting technique allows us to easily create NUMA-aware versions of the TATAS-Backoff, CLH, MCS, and ticket locks, to name a few. Moreover, it allows us to derive a CLH-based cohort abortable lock, the first NUMA-aware queue lock to support abortability.
We empirically compared the performance of cohort locks with prior NUMA-aware and classic NUMA- oblivious locks on a synthetic micro-benchmark, a real world key-value store application memcached, as well as the libc memory allocator. Our results demonstrate that cohort locks perform as well or better than known locks when the load is low and significantly out-perform them as the load increases.

Partitioning Models for Scaling Parallel Sparse Matrix-Matrix Multiplication


Publication Years 2014-2017
Publication Count 79
Citation Count 62
Available for Download 79
Downloads (6 weeks) 559
Downloads (12 Months) 4796
Downloads (cumulative) 16494
Average downloads per article 209
Average citations per article 1
First Name Last Name Award
Grey Ballard ACM Doctoral Dissertation Award
Honorable Mention (2013) ACM Doctoral Dissertation Award
Honorable Mention (2013)
Guy Blelloch ACM Fellows (2011)
James C Browne ACM Fellows (1998)
James Demmel ACM Paris Kanellakis Theory and Practice Award (2014)
ACM Fellows (1999)
Jack Dongarra ACM-IEEE CS Ken Kennedy Award (2013)
ACM Fellows (2001)
Phillip B Gibbons ACM Fellows (2006)
William D Gropp ACM-IEEE CS Ken Kennedy Award (2016)
SIAM/ACM Prize in Computational Science and Engineering (2014)
ACM Fellows (2006)
David Paul Grove ACM Fellows (2012)
ACM Distinguished Member (2010)
ACM Senior Member (2006)
Rachid Guerraoui ACM Fellows (2012)
Maurice Herlihy ACM Fellows (2005)
Charles E Leiserson ACM-IEEE CS Ken Kennedy Award (2014)
ACM Paris Kanellakis Theory and Practice Award (2013)
ACM Fellows (2006)
ACM Doctoral Dissertation Award (1982)
Michael Mitzenmacher ACM Fellows (2014)
John Douglas Owens ACM Distinguished Member (2017)
Mooly Sagiv ACM Fellows (2015)
Vijay Saraswat ACM Doctoral Dissertation Award (1989)
Michael Scott ACM Fellows (2006)
Nir N Shavit ACM Fellows (2013)
Julian Shun ACM Doctoral Dissertation Award (2015)
Aravind Srinivasan ACM Fellows (2014)
Guy L Steele ACM Fellows (1994)
ACM Grace Murray Hopper Award (1988)
Guy L Steele ACM Fellows (1994)
ACM Grace Murray Hopper Award (1988)

First Name Last Name Paper Counts
Charles Leiserson 3
Grey Ballard 3
Nicholas Knight 3
Joseph Naor 2
Tao Schardl 2
Peter Kling 2
Andrew Davidson 2
James Demmel 2
Benjamin Moseley 2
John Owens 2
William Hasenplaugh 2
Guy Blelloch 2
Chinmoy Dutta 1
Gopal Pandurangan 1
Andrea Vattani 1
Christian Scheideler 1
Thomas Groß 1
Sungjin Im 1
Davide Bilò 1
Luciano Gualà 1
Hafiz Sheikh 1
Ishfaq Ahmad 1
Yves Robert 1
Rachid Guerraoui 1
Rupesh Nasre 1
Charles Bachmeier 1
Tim Harris 1
Gokcen Kestor 1
Serdar Tasiran 1
Walther Maldonado 1
Torsten Hoefler 1
William Gropp 1
Maurice Herlihy 1
Xavier Martorell 1
Olivier Tardieu 1
Paul Thomson 1
Dave Dice 1
Aurélien Bouteiller 1
Thomas Hérault 1
William Gropp 1
Andrew Grimshaw 1
Jianjia Chen 1
Stephen Siegel 1
Ishai Menache 1
Bo Zhao 1
Mahesh Ravishankar 1
Ponnuswamy Sadayappan 1
Xing Wu 1
Matthieu Dorier 1
Gabriel Antoniu 1
Yu Wang 1
Sergei Vassilvitskii 1
Ioana Bercea 1
David Harris 1
Kirk Pruhs 1
Eric Torng 1
Tim Kaler 1
Shenchen Xu 1
Minjia Zhang 1
Saurabh Kalikar 1
Bradley Kuszmaul 1
Yuan Tang 1
Guy Steele 1
Yuechao Pan 1
Yuduo Wu 1
Carl Yang 1
Paolo Romano 1
Oliver Sinnen 1
James Dinan 1
Wickus Nienaber 1
Darko Petrović 1
George Teodoro 1
Adam Betts 1
Johannes Hagemann 1
Youtao Zhang 1
Jagannathan Ramanujam 1
Francis O'Connell 1
Bruce Mealey 1
Robert Sisneros 1
Rajeev Barua 1
Raoul Steffen 1
Scott Roche 1
Vijaya Ramachandran 1
Mooly Sagiv 1
Phillip Gibbons 1
Aapo Kyrola 1
Erin Carson 1
Jeffrey Blanchard 1
Erik Opavsky 1
Lukas Arnold 1
Aurélien Cavelan 1
Marina Papatriantafilou 1
Aritra Sengupta 1
Rezaul Chowdhury 1
Weitang Liu 1
Lionel Eyraud-Dubois 1
Frédéric Vivien 1
Pascal Felber 1
Étienne Rivière 1
Zhiyu Liu 1
Santosh Mahapatra 1
Vijay Saraswat 1
Mandana Vaziri 1
Paul Sack 1
Santiago Pagani 1
Moran Feldman 1
Liane Lewin-Eytan 1
Yi Xu 1
Jun Yang 1
Louis Pouchet 1
Scott Pakin 1
Pradip Bose 1
Chaodong Zheng 1
I Lee 1
Jim Sukha 1
Joseph Izraelevitz 1
Zoltan Majo 1
Navin Goyal 1
Aravind Srinivasan 1
Guido Proietti 1
Anne Benoit 1
Hongyang Sun 1
Ioannis Koutis 1
Ulrich Meyer 1
Armando Solar-Lezama 1
Brandon Lucia 1
Jean Tristan 1
Leyuan Wang 1
Muhammad Osama 1
Chenshan Yuan 1
Nuno Diegues 1
Osman Ünsal 1
Rajeev Thakur 1
Eduard Ayguadé 1
Alba De Melo 1
Avraham Shinnar 1
Mikio Takeuchi 1
Virendra Marathe 1
Nir Shavit 1
Michael Garland 1
Stephan Kramer 1
Laxmikant Kale 1
Jörg Henkel 1
Andrew Siegel 1
Atanas Rountev 1
Frank Mueller 1
Timothy Heil 1
Anil Krishna 1
Roberto Gioiosa 1
Marc Snir 1
Shadi Ibrahim 1
Leigh Orf 1
Ronghua Liang 1
Rajmohan Rajaraman 1
Michael Scott 1
Jeremy Fineman 1
Uday Bondhugula 1
Felix Wolf 1
Yiannis Nikolakopoulos 1
Man Cao 1
Swarnendu Biswas 1
Syed Haider 1
Yangzihao Wang 1
Julia Lawall 1
Thomas Ropars 1
Guillermo Miranda 1
Duane Merrill 1
Ehsan Totoni 1
Nikhil Jain 1
Adam Hammouda 1
John Eisenlohr 1
Ashay Rane 1
Farnaz Toussi 1
Francisco Cazorla 1
Franck Cappello 1
Jun Wang 1
Seth Gilbert 1
Peter Sanders 1
Jochen Speck 1
Ravi Kumar 1
Guy Golan-Gueta 1
Ganesan Ramalingam 1
Harsha Simhadri 1
Jiayang Jiang 1
Michael Mitzenmacher 1
Felix Voigtlaender 1
Vincenzo Gulisano 1
Daniel Cederman 1
Philippas Tsigas 1
Aleksandar Dragojević 1
Mary Hall 1
Loris Marchal 1
Patrick Marlier 1
George Bosilca 1
Jack Dongarra 1
Sebastian Kobbe 1
Jonathan Yaniv 1
Bastian Degener 1
Friedhelm Heide 1
Ciaran McCreesh 1
Julian Shun 1
Steven Vanderwiel 1
Alex Druinsky 1
Peter Pietrzyk 1
Richard Cole 1
Eran Yahav 1
Justin Thaler 1
Stefano Leucci 1
Emircan Uysaler 1
David Böhme 1
Markus Geimer 1
Michael Bond 1
Georgios Chatzopoulos 1
Jesmin Tithi 1
Stephen Tschudi 1
Andy Riffel 1
Pavan Balaji 1
Keith Underwood 1
Xin Yuan 1
Edans De O. Sandes 1
Benjamin Herta 1
David Grove 1
Prabhanjan Kambadur 1
Alastair Donaldson 1
Saeed Maleki 1
Madanlal Musuvathi 1
Todd Mytkowicz 1
Peng Du 1
Janmartin Jahn 1
Navendu Jain 1
Patrick Prosser 1
James Browne 1
Orcun Yildiz 1
Tom Peterka 1
Timothy Creech 1
Zhunping Zhang 1
Martina Eikel 1
Edgar Solomonik 1
Roshan Dathathri 1
Ravi Mullapudi 1
Dan Alistarh 1
Saman Ashkiani 1
Pramod Ganapathi 1
Adrián Cristal 1
Gilles Muller 1
Brian Barrett 1
André Schiper 1
Wei Zhang 1
David Cunningham 1
Barbara Kempkes 1
Nicholas Lindberg 1
Víctor Jiménez 1
Alper Buyuktosunoglu 1
Oded Schwartz 1
Jiaquan Gao 1

Affiliation Paper Counts
Tel Aviv University 1
University of Auckland 1
University of Houston 1
Los Alamos National Laboratory 1
Koc University 1
Spanish National Research Council 1
Louisiana State University 1
Goethe University Frankfurt 1
Hebrew University of Jerusalem 1
University of California , Merced 1
Fudan University 1
University of Sassari 1
Technical University of Darmstadt 1
RWTH Aachen University 1
Nanjing Normal University 1
University of Virginia 1
University of Connecticut 1
University of Delaware 1
Georgetown University 1
University of Utah 1
Lawrence Livermore National Laboratory 1
University of Roma Tor Vergata 1
University of California, Los Angeles 1
University of California, San Diego 1
Michigan State University 1
University of Wisconsin Madison 1
Wake Forest University 1
University of Puerto Rico 1
Yahoo Research Labs 1
Huawei Technologies Co., Ltd., USA 1
Microsoft Research Cambridge 1
Universite de Bordeaux 1
IBM, Japan 1
Twitter, Inc. 1
University of Glasgow 2
University of Texas at Arlington 2
North Carolina State University 2
Instituto Superior Tecnico 2
Google Inc. 2
Lawrence Berkeley National Laboratory 2
Sandia National Laboratories, New Mexico 2
Washington University in St. Louis 2
Brown University 2
National University of Singapore 2
University of L'Aquila 2
New York University 2
Pacific Northwest National Laboratory 2
Indian Institute of Technology, Madras 2
Zhejiang University of Technology 2
University of Rochester 2
Northeastern University 2
University of Gottingen 2
Universite de Lyon 2
Florida State University 3
Universitat Politecnica de Catalunya 3
Harvard University 3
INRIA Institut National de Rechereche en Informatique et en Automatique 3
Indian Institute of Science, Bangalore 3
Imperial College London 3
Massachusetts Institute of Technology 3
University of Brasilia 3
Stony Brook University 3
Grinnell College 3
Barcelona Supercomputing Center 3
University of Neuchatel 4
Ecole Normale Superieure de Lyon 4
University of Texas at Austin 4
Swiss Federal Institute of Technology, Zurich 4
Chalmers University of Technology 5
University of Pittsburgh 5
University of Tennessee, Knoxville 5
University of Maryland 5
Technion - Israel Institute of Technology 5
Swiss Federal Institute of Technology, Lausanne 5
Intel Corporation 5
Carnegie Mellon University 6
Microsoft Research 7
Ohio State University 7
University of California, Berkeley 7
Karlsruhe Institute of Technology 8
MIT Computer Science and Artificial Intelligence Laboratory 8
University of Illinois at Urbana-Champaign 8
University of Paderborn 8
Argonne National Laboratory 8
IBM Thomas J. Watson Research Center 11
University of California, Davis 14

ACM Transactions on Parallel Computing (TOPC) - Special Issue: Invited papers from PPoPP 2016, Part 2

Volume 4 Issue 2, October 2017 Special Issue: Invited papers from PPoPP 2016, Part 2
Volume 4 Issue 1, October 2017 Special Issue: Invited papers from PPoPP 2016, Part 1
Volume 3 Issue 4, March 2017 Special Issue on PPoPP 2015 and Regular Papers

Volume 3 Issue 3, December 2016
Volume 3 Issue 2, August 2016
Volume 3 Issue 1, June 2016 Special Issue for SPAA 2014
Volume 2 Issue 4, March 2016 Special Issue on PPOPP 2014

Volume 2 Issue 3, October 2015 Special Issue for SPAA 2013
Volume 2 Issue 2, July 2015
Volume 2 Issue 1, May 2015 Special Issue on SPAA 2012
Volume 1 Issue 2, January 2015 Special Issue on PPOPP 2012

Volume 1 Issue 1, September 2014 Inaugural Issue and Special Section on Top Papers from PACT-21, and Regular Papers
All ACM Journals | See Full Journal Index

Search TOPC
enter search term and/or author name