<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Paper | Anuj Pathania</title>
    <link>https://staff.fnwi.uva.nl/a.pathania/tag/paper/</link>
      <atom:link href="https://staff.fnwi.uva.nl/a.pathania/tag/paper/index.xml" rel="self" type="application/rss+xml" />
    <description>Paper</description>
    <generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Thu, 10 Aug 2023 00:00:00 +0000</lastBuildDate>
    <image>
      <url>https://staff.fnwi.uva.nl/a.pathania/media/icon_hu0b7a4cb9992c9ac0e91bd28ffd38dd00_9727_512x512_fill_lanczos_center_3.png</url>
      <title>Paper</title>
      <link>https://staff.fnwi.uva.nl/a.pathania/tag/paper/</link>
    </image>
    
    <item>
      <title>The paper &#39;Lifetime Estimation for Core-Failure Resilient Multi-Core Processors&#39; was accepted for IEEE MCSoC &#39;23</title>
      <link>https://staff.fnwi.uva.nl/a.pathania/post/mcsoc-23/</link>
      <pubDate>Thu, 10 Aug 2023 00:00:00 +0000</pubDate>
      <guid>https://staff.fnwi.uva.nl/a.pathania/post/mcsoc-23/</guid>
      <description>&lt;p&gt;&lt;em&gt;Abstract&lt;/em&gt;: Multi-core processors integrate several cores on a single die. They often operate continuously under high thermal stress, which leads to severe wear-out. Server-class multi-cores already include a mechanism for surviving a core failure, called Core Failure Resilience (CFR), and embedded multi-cores with CFR are on the horizon. Under CFR, the surviving cores must take on the workload of their failed fellow core(s). They must also operate at higher frequencies to keep meeting the target performance. However, this additional workload further accelerates the wear-out of the surviving cores, because higher-frequency operation generates additional heat. Existing lifetime estimation frameworks rely on detailed simulations and therefore have long simulation times. This makes them unsuitable for the early stages of the design process, where many design points must be evaluated quickly. Moreover, existing frameworks cannot estimate the Mean Time to Failure (MTTF) of multi-cores with CFR capabilities. We introduce SLICER, the first framework for estimating the MTTF of CFR multi-cores. SLICER integrates with the state-of-the-art tools HotSniper and MatEx for fast and accurate MTTF estimation.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>The demo/tool &#39;ARM-CO-UP: ARM Co-Operative Utilization of Processors&#39; was accepted for the Embedded Systems Software Competition at the IEEE/ACM Embedded Systems Week 2023.</title>
      <link>https://staff.fnwi.uva.nl/a.pathania/post/competition-esweek-2023/</link>
      <pubDate>Tue, 01 Aug 2023 00:00:00 +0000</pubDate>
      <guid>https://staff.fnwi.uva.nl/a.pathania/post/competition-esweek-2023/</guid>
      <description>&lt;p&gt;&lt;em&gt;Abstract&lt;/em&gt;: Heterogeneous Multi-Processor Systems-on-Chip (HMPSoCs) combine different processors on a single chip. They power embedded devices that increasingly perform Machine Learning (ML) inference at the edge. State-of-the-art HMPSoCs can perform on-chip embedded inference using their CPU, GPU, and integrated accelerators. The on-chip GPU in embedded devices performs comparably to the CPU clusters, so efficient inference requires the cooperative utilization of these processors. Integrated accelerators operate at lower bit precision, which significantly improves power efficiency at the expense of model accuracy.&lt;/p&gt;
&lt;p&gt;However, existing inference frameworks for edge devices typically use only a single processor type and cannot use different processor types collaboratively. To this end, we design the ARM-CO-UP framework on top of the ARM Compute Library (ARM-CL). ARM-CO-UP supports both parallel and serial utilization of different processor types. In parallel mode, it optimizes throughput (FPS) and energy efficiency through pipelined execution of network partitions over consecutive inputs. In serial mode, it improves inference latency and energy efficiency through layer-switched inference for each input and layer-wise DVFS. It automates model graph partitioning and mapping, pipeline synchronization, processor type switching, layer-wise DVFS, and the integration of new accelerators, even those with closed-source libraries.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>The paper &#39;Thermal Management for 3D-Stacked Systems via Unified Core-Memory Power Regulation&#39; was accepted for publication at IEEE/ACM CODES&#43;ISSS &#39;23.</title>
      <link>https://staff.fnwi.uva.nl/a.pathania/post/codes&#43;isss-23/</link>
      <pubDate>Sat, 01 Jul 2023 00:00:00 +0000</pubDate>
      <guid>https://staff.fnwi.uva.nl/a.pathania/post/codes&#43;isss-23/</guid>
      <description></description>
    </item>
    
    <item>
      <title>The paper &#39;PELSI: Power-Efficient Layer-Switched Inference&#39; was accepted for publication at IEEE RTCSA &#39;23.</title>
      <link>https://staff.fnwi.uva.nl/a.pathania/post/rtcsa-23b/</link>
      <pubDate>Tue, 30 May 2023 00:00:00 +0000</pubDate>
      <guid>https://staff.fnwi.uva.nl/a.pathania/post/rtcsa-23b/</guid>
      <description>&lt;p&gt;&lt;em&gt;Abstract&lt;/em&gt;: Convolutional Neural Networks (CNNs) are now quintessential kernels within embedded computer vision applications deployed on edge devices. Heterogeneous Multi-Processor Systems-on-Chip (HMPSoCs) with Dynamic Voltage and Frequency Scaling (DVFS) capable components (CPUs and GPUs) allow for low-latency, low-power CNN inference on resource-constrained edge devices when employed efficiently.&lt;/p&gt;
&lt;p&gt;CNNs comprise several heterogeneous layer types that execute with different degrees of power efficiency on different HMPSoC components at different frequencies. We propose PELSI, the first framework that exploits this layer-wise power-efficiency heterogeneity for power-efficient CPU-GPU layer-switched CNN inference on HMPSoCs. PELSI executes each layer of a CNN on an HMPSoC component (CPU or GPU) clocked at just the right frequency for that layer, such that the CNN meets its inference latency target with minimal power consumption, while still accounting for the power-performance overhead of switching between CPU and GPU multiple times mid-inference. PELSI incorporates a Genetic Algorithm (GA) that searches the exponentially large design space for a near-optimal CPU-GPU layer-switched inference configuration that meets the given latency requirement with the lowest power consumption.&lt;/p&gt;
&lt;p&gt;We evaluate PELSI on the Rock-Pi embedded platform, which contains an RK3399Pro HMPSoC with DVFS-capable CPU clusters and GPU. Empirical evaluations with five different CNNs show that PELSI improves the power efficiency of CNN inference by 44.48% over the state-of-the-art.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>The paper &#39;3D-TTP: Efficient Transient Temperature-Aware Power Budgeting for 3D-Stacked Processor-Memory Systems&#39; was accepted for publication at IEEE ISVLSI &#39;23.</title>
      <link>https://staff.fnwi.uva.nl/a.pathania/post/islvlsi-23/</link>
      <pubDate>Thu, 04 May 2023 00:00:00 +0000</pubDate>
      <guid>https://staff.fnwi.uva.nl/a.pathania/post/islvlsi-23/</guid>
      <description></description>
    </item>
    
    <item>
      <title>The paper &#39;HiMap: Fast and Scalable High-Quality Mapping on CGRA via Hierarchical Abstraction&#39; was accepted for DATE 2021.</title>
      <link>https://staff.fnwi.uva.nl/a.pathania/post/date-21/</link>
      <pubDate>Tue, 15 Nov 2022 00:00:00 +0000</pubDate>
      <guid>https://staff.fnwi.uva.nl/a.pathania/post/date-21/</guid>
      <description></description>
    </item>
    
    <item>
      <title>The paper &#39;Thermal Management for S-NUCA Many-Cores via Synchronous Thread Rotations&#39; was accepted at DATE 2023.</title>
      <link>https://staff.fnwi.uva.nl/a.pathania/post/date-22/</link>
      <pubDate>Tue, 15 Nov 2022 00:00:00 +0000</pubDate>
      <guid>https://staff.fnwi.uva.nl/a.pathania/post/date-22/</guid>
      <description></description>
    </item>
    
    <item>
      <title>The chapter &#39;Pipelined CNN Inference on Heterogeneous Multi-Processor System-on-Chip&#39; was accepted for the Springer book &#39;Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing&#39;.</title>
      <link>https://staff.fnwi.uva.nl/a.pathania/post/springer-22/</link>
      <pubDate>Mon, 08 Aug 2022 00:00:00 +0000</pubDate>
      <guid>https://staff.fnwi.uva.nl/a.pathania/post/springer-22/</guid>
      <description></description>
    </item>
    
    <item>
      <title>The paper &#39;CPU-GPU Layer-Switched Low Latency CNN Inference&#39; was accepted for Euromicro DSD 2022.</title>
      <link>https://staff.fnwi.uva.nl/a.pathania/post/dsd-22/</link>
      <pubDate>Fri, 17 Jun 2022 00:00:00 +0000</pubDate>
      <guid>https://staff.fnwi.uva.nl/a.pathania/post/dsd-22/</guid>
      <description></description>
    </item>
    
    <item>
      <title>The paper &#39;CoMeT: An Integrated Interval Thermal Simulation Toolchain for 2D, 2.5D, and 3D Processor-Memory Systems&#39; was accepted for ACM Transactions on Architecture and Code Optimization (TACO).</title>
      <link>https://staff.fnwi.uva.nl/a.pathania/post/taco-22/</link>
      <pubDate>Thu, 14 Apr 2022 00:00:00 +0000</pubDate>
      <guid>https://staff.fnwi.uva.nl/a.pathania/post/taco-22/</guid>
      <description></description>
    </item>
    
    <item>
      <title>The paper &#39;HiMap: Fast and Scalable High-Quality Mapping on CGRA via Hierarchical Abstraction&#39; was accepted for IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.</title>
      <link>https://staff.fnwi.uva.nl/a.pathania/post/tcad-21a/</link>
      <pubDate>Mon, 08 Nov 2021 00:00:00 +0000</pubDate>
      <guid>https://staff.fnwi.uva.nl/a.pathania/post/tcad-21a/</guid>
      <description></description>
    </item>
    
    <item>
      <title>The paper &#39;T-TSP: Transient-Temperature Based Safe Power Budgeting in Multi-/Many-Core Processors&#39; was accepted for IEEE ICCD 2021.</title>
      <link>https://staff.fnwi.uva.nl/a.pathania/post/iccd-21/</link>
      <pubDate>Fri, 17 Sep 2021 00:00:00 +0000</pubDate>
      <guid>https://staff.fnwi.uva.nl/a.pathania/post/iccd-21/</guid>
      <description></description>
    </item>
    
    <item>
      <title>The paper &#39;ChordMap: Automated Mapping of Streaming Applications onto CGRA&#39; was accepted for IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.</title>
      <link>https://staff.fnwi.uva.nl/a.pathania/post/tcad-21b/</link>
      <pubDate>Sun, 17 Jan 2021 00:00:00 +0000</pubDate>
      <guid>https://staff.fnwi.uva.nl/a.pathania/post/tcad-21b/</guid>
      <description></description>
    </item>
    
  </channel>
</rss>
