Optimizing I/O Performance of HPC Applications with Autotuning
Communication between tasks has been identified as a major challenge for the performance and energy efficiency of parallel applications.
A common way to improve communication is to increase its locality, that is, to reduce the distance of data transfers by prioritizing faster, more efficient local interconnects over remote ones.
An important problem to be solved in this context is how to determine an optimized mapping of tasks to cluster nodes and cores that increases the overall locality.
In this paper, we propose EagerMap, an algorithm for determining task mappings that uses a greedy heuristic to match application communication patterns to hardware hierarchies.
Compared to previous algorithms, EagerMap is faster, scales better, and supports more types of computer systems, while maintaining the same quality of the determined task mapping.
EagerMap is therefore a practical choice for task mapping on a wide range of modern parallel architectures.
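
To make the idea of greedily matching a communication pattern to a hardware hierarchy concrete, the following is a minimal sketch. It is not the EagerMap algorithm itself, only a simplified single-level illustration. It assumes a symmetric communication matrix `comm` (where `comm[i][j]` is the volume exchanged between tasks `i` and `j`) and a flat machine of nodes with `cores_per_node` cores each. The function names `greedy_groups`, `map_tasks`, and `remote_volume` are hypothetical and chosen for this example.

```python
def greedy_groups(comm, group_size):
    """Greedily partition tasks into groups of at most `group_size` tasks.

    Illustrative assumption: each group is seeded with the lowest-numbered
    unassigned task, then grown by repeatedly adding the unassigned task
    that communicates most with the tasks already in the group.
    """
    unassigned = set(range(len(comm)))
    groups = []
    while unassigned:
        seed = min(unassigned)
        group = [seed]
        unassigned.remove(seed)
        while len(group) < group_size and unassigned:
            # Iterate in sorted order so ties go to the lowest task id.
            best = max(sorted(unassigned),
                       key=lambda t: sum(comm[t][g] for g in group))
            group.append(best)
            unassigned.remove(best)
        groups.append(group)
    return groups


def map_tasks(comm, cores_per_node):
    """Return a dict mapping task id -> (node, core), one group per node."""
    mapping = {}
    for node, group in enumerate(greedy_groups(comm, cores_per_node)):
        for core, task in enumerate(group):
            mapping[task] = (node, core)
    return mapping


def remote_volume(comm, mapping):
    """Total communication volume between tasks placed on different nodes."""
    n = len(comm)
    return sum(comm[i][j]
               for i in range(n) for j in range(i + 1, n)
               if mapping[i][0] != mapping[j][0])


# Hypothetical pattern: tasks 0 and 2, and tasks 1 and 3, communicate heavily.
comm = [
    [0, 1, 10, 1],
    [1, 0, 1, 10],
    [10, 1, 0, 1],
    [1, 10, 1, 0],
]
in_order = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}  # tasks placed by id
print(remote_volume(comm, in_order))            # 22
print(remote_volume(comm, map_tasks(comm, 2)))  # 4
```

In this toy case, placing tasks in id order leaves both heavy pairs on different nodes, while the greedy grouping co-locates them. A hierarchy with more levels (for example cores, sockets, and nodes) could be handled by applying the same grouping step again to the groups produced at the level below, which is the general shape of a hierarchical mapping; the details of how EagerMap does this are not given in the text above.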