The demo/tool 'ARM-CO-UP: ARM Co-Operative Utilization of Processors' was accepted for the Embedded Systems Software Competition at the IEEE/ACM Embedded Systems Week 2023.
Abstract: Heterogeneous Multi-Processor Systems on Chip (HMPSoCs) combine different processors on a single chip. They enable powerful embedded devices, which increasingly perform Machine Learning (ML) inference at the edge. State-of-the-art HMPSoCs can perform on-chip embedded inference using their CPU, GPU, and integrated accelerators. The on-chip GPU in embedded devices is comparable in performance to the CPU clusters, so efficient inference requires the cooperative utilization of these processors. Integrated accelerators, although they operate at lower bit precision, significantly improve power efficiency at the expense of model accuracy.
However, existing inference frameworks for edge devices typically utilize only a single processor type and cannot use different processor types collaboratively. To this end, we design the ARM-CO-UP framework based on the ARM-CL framework. ARM-CO-UP supports both parallel and serial utilization of different processor types. In parallel mode, it optimizes throughput (FPS) and energy efficiency by pipelining the execution of network partitions across consecutive inputs. In serial mode, it improves inference latency and energy efficiency for each input through layer-switched inference and layer-wise DVFS. It automates model graph partitioning and mapping, pipeline synchronization, processor type switching, layer-wise DVFS, and the integration of new accelerators, even those with closed-source libraries.
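The difference between the two modes can be illustrated with a small cost model. The sketch below is not taken from the framework: the function names, the per-layer timings, and the fixed switching cost are all hypothetical and serve only to show the reasoning. In a pipeline, steady-state throughput is bounded by the slowest partition; in serial mode, the latency of one input depends on which processor runs each layer and on how often execution switches processors.

```python
from typing import Dict, List


def pipeline_throughput_fps(stage_ms: List[float]) -> float:
    """Steady-state frames per second of a pipeline.

    Each partition runs on its own processor, so once the pipeline is
    full a new frame completes every max(stage_ms) milliseconds.
    """
    return 1000.0 / max(stage_ms)


def min_serial_latency_ms(layer_ms: List[Dict[str, float]],
                          switch_ms: float) -> float:
    """Lowest single-input latency when each layer may run on any processor.

    layer_ms[i][p] is the (assumed) time of layer i on processor p, and
    switch_ms is an assumed fixed cost paid whenever consecutive layers
    run on different processors. Solved by dynamic programming over layers.
    """
    # best[p] = minimum latency so far with the latest layer executed on p
    best = dict(layer_ms[0])
    for times in layer_ms[1:]:
        best = {
            p: t + min(best[q] + (0.0 if q == p else switch_ms) for q in best)
            for p, t in times.items()
        }
    return min(best.values())


if __name__ == "__main__":
    # Hypothetical timings in milliseconds, for illustration only.
    print(pipeline_throughput_fps([20.0, 25.0, 10.0]))
    layers = [
        {"cpu": 4.0, "gpu": 6.0},
        {"cpu": 8.0, "gpu": 3.0},
        {"cpu": 5.0, "gpu": 2.0},
    ]
    print(min_serial_latency_ms(layers, switch_ms=1.0))
```

With these made-up numbers, switching from the CPU to the GPU after the first layer gives a lower latency than running every layer on either processor alone, which is the intuition behind layer-switched inference. Energy and DVFS settings are left out of this sketch.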