The paper 'Lifetime Estimation for Core-Failure Resilient Multi-Core Processors' was accepted at IEEE MCSoC '23.

Abstract: Multi-core processors integrate several cores on a single die. They often work incessantly under high thermal stress, leading to severe wear-out. Server-class multi-cores already come with a mechanism to survive a core failure, called Core-Failure Resilience (CFR), and embedded multi-cores with CFR are on the horizon. Under CFR, the surviving cores must take on the additional workload of their failed fellow core(s) and must operate at higher frequencies to continue meeting the target performance. However, the additional heat from higher-frequency operation further accelerates the wear-out of the surviving cores. Existing lifetime estimation frameworks rely on detailed simulations, which leads to long simulation times; they are therefore unsuitable for the early stages of the design process, where many design points must be evaluated quickly. They also cannot estimate the Mean Time to Failure (MTTF) of multi-cores with CFR capabilities. We introduce SLICER, the first framework for estimating the MTTF of CFR multi-cores. SLICER integrates with the state-of-the-art tools HotSniper and MatEx for fast and accurate MTTF estimation.

Anuj Pathania
Assistant Professor

Anuj Pathania is an Assistant Professor in the Parallel Computing Systems (PCS) group at the University of Amsterdam (UvA). His research focuses on the design of sustainable systems deployed in power-, thermal-, energy- and reliability-constrained environments.