Neural Network-based Performance Prediction for Task Migration on S-NUCA Many-Cores

Abstract

The performance of a task running on a many-core with a distributed shared last-level cache (LLC) strongly depends on two parameters: the power budget needed to guarantee thermally-safe operation and the LLC latency. The task's thread-to-core mapping determines both parameters and must trade one off against the other because both cannot be optimal simultaneously. The arrival and departure of tasks on a many-core deployed in an open system can significantly change its state in terms of available cores and power budgets. Task migration can therefore be used as a tool to keep the many-core operating at peak performance. Furthermore, the relative impact of power budget and LLC latency on a task's performance may change across its execution phases, which mandates migrating it on-the-fly. We propose PCMig, the first run-time algorithm that increases the performance of a many-core with distributed shared LLC by migrating tasks based on their phases and the many-core's state. PCMig relies on a model that predicts the performance impact of migrations. We propose a performance prediction model based on a lightweight neural network (NN). To serve as a reference, we also propose an analytical model of the many-core that operates on CPI stacks. We demonstrate that the NN-based model achieves a higher prediction accuracy at a lower overhead than the analytical model. PCMig uses the NN prediction model and increases performance under a thermal constraint by up to 7.3 percent for mixed workloads compared to the architecture-aware state-of-the-art (up to 20 percent for individual applications). This is achieved with a run-time overhead of less than 0.5 percent.
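The idea of prediction-driven migration can be illustrated with a minimal sketch. It is not the PCMig algorithm or the paper's NN model: the feature layout, the `toy_predictor` formula, the `min_gain` threshold, and the mapping names are all hypothetical and chosen only to show how a predictor of the performance impact could be used to pick a migration target as a task's phase changes.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical features of a (task, mapping) pair:
# (power_budget_watts, llc_latency_cycles, memory_boundness in [0, 1]).
Features = Tuple[float, float, float]
Predictor = Callable[[Features], float]  # returns a relative performance score


def toy_predictor(features: Features) -> float:
    """Placeholder for a trained model; this formula is invented for illustration."""
    power, latency, mem_bound = features
    compute_term = (1.0 - mem_bound) * power     # compute-bound phases favor power budget
    memory_term = mem_bound * (100.0 / latency)  # memory-bound phases favor low LLC latency
    return compute_term + memory_term


def best_migration(
    current: Features,
    candidates: Dict[str, Features],
    predict: Predictor,
    min_gain: float = 0.05,  # assumed threshold to avoid migrations with marginal benefit
) -> Optional[str]:
    """Return the candidate mapping with the largest predicted gain, or None."""
    baseline = predict(current)
    best_name, best_gain = None, min_gain
    for name, feats in candidates.items():
        gain = predict(feats) / baseline - 1.0  # predicted relative speedup
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name


def with_phase(feats: Features, mem_bound: float) -> Features:
    """Same mapping, different execution phase (memory-boundness)."""
    return (feats[0], feats[1], mem_bound)


# Two hypothetical mappings: one with low latency but a small power budget,
# one with a large power budget but high latency.
CURRENT = (4.0, 40.0, 0.0)
MAPPINGS = {"low_latency": (3.0, 20.0, 0.0), "high_power": (5.0, 60.0, 0.0)}


def choose(mem_bound: float) -> Optional[str]:
    cands = {n: with_phase(f, mem_bound) for n, f in MAPPINGS.items()}
    return best_migration(with_phase(CURRENT, mem_bound), cands, toy_predictor)
```

Under these made-up numbers, `choose(0.9)` selects the low-latency mapping for a memory-bound phase, while `choose(0.1)` selects the high-power mapping for a compute-bound phase, which mirrors the trade-off between power budget and LLC latency described in the abstract.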

Publication
IEEE Transactions on Computers
Anuj Pathania
Assistant Professor

Anuj Pathania is an Assistant Professor in the Parallel Computing Systems (PCS) group at the University of Amsterdam (UvA). His research focuses on the design of sustainable systems deployed in power-, thermal-, energy- and reliability-constrained environments.