ACM DL

ACM Transactions on

Parallel Computing (TOPC)

Menu
Latest Articles

Introduction to the Special Issue for SPAA’17

Distributed Partial Clustering

Recent years have witnessed an increasing popularity of algorithm design for distributed data, largely due to the fact that massive datasets are often collected and stored in different locations. In the distributed setting, communication typically dominates the query processing time. Thus, it becomes crucial to design communication-efficient... (more)

Distributed Detection of Cycles

Distributed property testing in networks has been introduced by Brakerski and Patt-Shamir [6], with the objective of detecting the presence of large dense sub-networks in a distributed manner. Recently, Censor-Hillel et al. [7] have revisited this notion and formalized it in a broader context. In particular, they have shown how to detect 3-cycles... (more)

On Energy Conservation in Data Centers

We formulate and study an optimization problem that arises in the energy management of data centers and, more generally, multiprocessor environments. Data centers host a large number of heterogeneous servers. Each server has an active state and several standby/sleep states with individual power consumption rates. The demand for computing capacity... (more)

The Mobile Server Problem

We introduce the Mobile Server problem, inspired by current trends to move computational tasks from cloud structures to multiple devices close to the end user. An example of this is embedded systems in autonomous cars that communicate to coordinate their actions. Our model is a variant of the classical Page Migration problem. More formally, we... (more)

Tight Bounds for Clairvoyant Dynamic Bin Packing

In this article, we focus on the Clairvoyant Dynamic Bin Packing (DBP) problem, which extends the Classical Online Bin Packing problem in that items arrive and depart over time and the departure time of an item is known upon its arrival. The problem naturally arises when handling cloud-based networks. We focus specifically on the MinUsageTime... (more)

New Cover Time Bounds for the Coalescing-Branching Random Walk on Graphs

We present new bounds on the cover time of the coalescing-branching random walk process COBRA. The COBRA process, introduced in Dutta et al. [9], can... (more)

Distributed Graph Clustering and Sparsification

Graph clustering is a fundamental computational problem with a number of applications in algorithm design, machine learning, data mining, and analysis of social networks. Over the past decades, researchers have proposed a number of algorithmic design methods for graph clustering. Most of these methods, however, are based on complicated spectral... (more)

Near Optimal Parallel Algorithms for Dynamic DFS in Undirected Graphs

Depth first search (DFS) tree is a fundamental data structure for solving various graph problems. The classical algorithm [54] for building a... (more)

NEWS

ACM Transactions on Parallel Computing Names David Bader as Editor-in-Chief

ACM Transactions on Parallel Computing (TOPC) welcomes David Bader as new Editor-in-Chief, for the term November 1, 2018 to October 31, 2021. David is a Professor and Chair in the School of Computational Science and Engineering and College of Computing at Georgia Institute of Technology.

 

About TOPC

ACM Transactions on Parallel Computing (TOPC) is a forum for novel and innovative work on all aspects of parallel computing, including foundational and theoretical aspects, systems, languages, architectures, tools, and applications. It will address all classes of parallel-processing platforms including concurrent, multithreaded, multicore, accelerated, multiprocessor, clusters, and supercomputers. READ MORE

Forthcoming Articles

Tapir: Embedding Recursive Fork-Join Parallelism into LLVM's Intermediate Representation

Processor-Oblivious Record and Replay

Extracting SIMD Parallelism from Recursive Task-Parallel Programs

The pursuit of computational efficiency has led to the proliferation of throughput-oriented hardware, from GPUs to increasingly wide vector units on commodity processors and accelerators. This hardware is designed to efficiently execute data-parallel computations in a vectorized manner. However, many algorithms are more naturally expressed as divide-and-conquer, recursive, task-parallel computations. In the absence of data parallelism, it seems that such algorithms are not well suited to throughput-oriented architectures. This paper presents a set of novel code transformations that expose the data parallelism latent in recursive, task- parallel programs. These transformations facilitate straightforward vectorization of task-parallel programs on commodity hardware. We also present scheduling policies that maintain high utilization of vector resources while limiting space usage. Across several task-parallel benchmarks, we demonstrate both efficient vector resource utilization and substantial speedup on chips using Intel's SSE4.2 vector units, as well as accelerators using Intel's AVX512 units. We then show through rigorous sampling that, in practice, our vectorization techniques are effective for a much larger class of programs.

All ACM Journals | See Full Journal Index

Search TOPC
enter search term and/or author name