Pipelined CNN Inference on Heterogeneous Multi-Processor System-on-Chip

Abstract

Convolutional Neural Network (CNN) inference is a quintessential component of mobile machine learning applications. Privacy concerns and real-time response requirements compel applications to perform inference on the mobile (edge) devices themselves. Heterogeneous Multi-Processor Systems-on-Chip (HMPSoCs) within these edge devices enable high-throughput, low-latency edge inference. An HMPSoC contains several types of processing cores, each capable of performing CNN inference independently. However, to meet stringent performance requirements, an application must engage all core types in inference simultaneously. A software-based CNN inference pipeline synergistically engages all the cores of an HMPSoC for high-throughput, low-latency CNN inference. In this chapter, we present two such pipeline designs. The first design creates a pipeline between two different types of CPU cores. The second design extends the pipeline from the CPU to the GPU. We conclude with a future perspective and research directions on the subject.
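
To make the pipeline idea concrete, below is a minimal Python sketch, not the chapter's implementation, of a two-stage CPU-CPU pipeline. It assumes a Linux-based big.LITTLE HMPSoC where CPUs 4-7 form the big cluster and CPUs 0-3 the LITTLE cluster; the stage functions front_layers and back_layers are hypothetical placeholders for the two halves of a CNN.

```python
# Two-stage inference pipeline sketch (illustrative assumptions:
# core IDs, stage split, and stage functions are placeholders).
import os
import queue
import threading

def front_layers(frame):
    # Placeholder for the first half of the CNN (e.g., conv layers).
    return f"features({frame})"

def back_layers(features):
    # Placeholder for the second half (e.g., classifier layers).
    return f"label({features})"

def stage(core_ids, work, in_q, out_q):
    os.sched_setaffinity(0, core_ids)  # pin this thread to one cluster (Linux-only)
    while True:
        item = in_q.get()
        if item is None:               # sentinel: propagate shutdown downstream
            out_q.put(None)
            return
        out_q.put(work(item))

frames_q, feats_q, labels_q = queue.Queue(), queue.Queue(), queue.Queue()
# Stage 1 on the big cluster, stage 2 on the LITTLE cluster: while the
# big cores process frame i, the LITTLE cores finish frame i-1.
threading.Thread(target=stage, args=({4, 5, 6, 7}, front_layers, frames_q, feats_q)).start()
threading.Thread(target=stage, args=({0, 1, 2, 3}, back_layers, feats_q, labels_q)).start()

for frame in range(4):                 # feed a short stream of frames
    frames_q.put(frame)
frames_q.put(None)

while (result := labels_q.get()) is not None:
    print(result)
```

Because each stage is pinned to its own cluster, both core types work on different frames concurrently; throughput is then bounded by the slower stage, so the split point between the two halves of the network would be chosen to balance the clusters.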

Publication
Springer Embedded Machine Learning for Cyber Physical, IoT, and Edge Computing
Anuj Pathania
Assistant Professor

Anuj Pathania is an Assistant Professor in the Parallel Computing Systems (PCS) group at the University of Amsterdam (UvA). His research focuses on the design of sustainable systems deployed in power-, thermal-, energy-, and reliability-constrained environments.