Convolutional Neural Network (CNN)-based inference is a quintessential component of mobile machine learning applications. Privacy concerns and real-time response requirements compel applications to perform inference on the mobile (edge) devices themselves. Heterogeneous Multi-Processor System-on-Chips (HMPSoCs) within edge devices enable high-throughput, low-latency edge inference. An HMPSoC contains several types of processing cores, each capable of performing CNN inference independently. However, to meet stringent performance requirements, an application must engage all core types in inference simultaneously. A software-based CNN inference pipeline design allows all the cores in an HMPSoC to work synergistically towards high-throughput, low-latency CNN inference. In this chapter, we present two CNN inference pipeline designs. The first design creates a pipeline between two different types of CPU cores. The second design extends the pipeline from the CPU to the GPU. We also provide a future perspective and research directions on the subject.
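To make the pipelining idea concrete, the following minimal C++ sketch shows a two-stage software pipeline in which the front and back halves of a CNN process different frames concurrently; it is an illustration only, with hypothetical helpers (run_front_layers, run_back_layers) standing in for calls to a real CNN framework, and without the per-cluster thread pinning an actual HMPSoC implementation would perform.

```cpp
// Minimal sketch of a two-stage CNN inference pipeline (hypothetical helper
// names; a real design would invoke a CNN framework and pin each stage's
// thread to a specific CPU cluster or to the GPU).
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

using Tensor = std::vector<float>;  // stand-in for a real feature-map tensor

// Hypothetical layer partitions: stage 1 runs the front layers of the CNN,
// stage 2 runs the remaining layers.
Tensor run_front_layers(const Tensor& input) { return input; }
Tensor run_back_layers(const Tensor& features) { return features; }

// Thread-safe queue that hands intermediate feature maps from stage 1 to stage 2.
class FeatureQueue {
 public:
  void push(Tensor t) {
    { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(t)); }
    cv_.notify_one();
  }
  Tensor pop() {
    std::unique_lock<std::mutex> lk(m_);
    cv_.wait(lk, [this] { return !q_.empty(); });
    Tensor t = std::move(q_.front());
    q_.pop();
    return t;
  }
 private:
  std::mutex m_;
  std::condition_variable cv_;
  std::queue<Tensor> q_;
};

int main() {
  constexpr int kFrames = 8;  // number of input frames to stream
  FeatureQueue handoff;

  // Stage 1: would be mapped to one core type (e.g., the LITTLE CPU cluster).
  std::thread stage1([&] {
    for (int i = 0; i < kFrames; ++i) {
      Tensor frame(224 * 224 * 3, float(i));  // placeholder camera frame
      handoff.push(run_front_layers(frame));  // forward features to next stage
    }
  });

  // Stage 2: would be mapped to another core type (e.g., the big CPU cluster
  // or the GPU), so both work on different frames at the same time.
  std::thread stage2([&] {
    for (int i = 0; i < kFrames; ++i) {
      Tensor result = run_back_layers(handoff.pop());
      std::cout << "frame " << i << " done, outputs: " << result.size() << '\n';
    }
  });

  stage1.join();
  stage2.join();
  return 0;
}
```

Because each stage operates on a different frame, the pipeline raises throughput over single-device execution while keeping per-frame latency bounded by the slower stage; the chapter's two designs differ mainly in which core types host the stages.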