TY - JOUR
T1 - Thermally Constrained Codesign of Heterogeneous 3-D Integration of Compute-in-Memory, Digital ML Accelerator, and RISC-V Cores for Mixed ML and Non-ML Workloads
AU - Luo, Yuan Chun
AU - Lu, Anni
AU - Sharda, Janak
AU - Scherer, Moritz
AU - Gomez, Jorge Tomas
AU - Sarwar, Syed Shakib
AU - Li, Ziyun
AU - Pinkham, Reid Frederick
AU - De Salvo, Barbara
AU - Yu, Shimeng
N1 - Publisher Copyright: IEEE
PY - 2024
Y1 - 2024
N2 - Heterogeneous 3-D (H3D) integration not only reduces the chip form factor and fabrication cost but also allows the merging of diverse compute paradigms that suit different applications. This is especially attractive when modern algorithms, such as the augmented reality/virtual reality (AR/VR) workloads, consist of mixed machine learning (ML) and non-ML workloads. To date, codesign that considers the thermal, latency, and power constraints of H3D hardware is largely unexplored. In this work, a thermally aware framework for H3D hardware design is developed to evaluate the thermal, latency, and power trade-offs for a heterogeneous system with compute-in-memory (CIM), digital ML cores, and RISC-V cores. The framework solves for runtime tunable operating points described as the optimal speedup factor, the number of activated RISC-V cores, the cooling coefficient, and the activity rate based on user-defined criteria, achieving up to 135 TOPS and 215 TOPS/W under 74 °C for the AR/VR workloads.
AB - Heterogeneous 3-D (H3D) integration not only reduces the chip form factor and fabrication cost but also allows the merging of diverse compute paradigms that suit different applications. This is especially attractive when modern algorithms, such as the augmented reality/virtual reality (AR/VR) workloads, consist of mixed machine learning (ML) and non-ML workloads. To date, codesign that considers the thermal, latency, and power constraints of H3D hardware is largely unexplored. In this work, a thermally aware framework for H3D hardware design is developed to evaluate the thermal, latency, and power trade-offs for a heterogeneous system with compute-in-memory (CIM), digital ML cores, and RISC-V cores. The framework solves for runtime tunable operating points described as the optimal speedup factor, the number of activated RISC-V cores, the cooling coefficient, and the activity rate based on user-defined criteria, achieving up to 135 TOPS and 215 TOPS/W under 74 °C for the AR/VR workloads.
KW - Augmented reality/virtual reality
KW - compute-in-memory (CIM)
KW - heterogeneous 3-D integration
KW - machine learning (ML)
KW - RISC-V
KW - thermally constrained framework
UR - http://www.scopus.com/inward/record.url?scp=85197021233&partnerID=8YFLogxK
U2 - 10.1109/TVLSI.2024.3415481
DO - 10.1109/TVLSI.2024.3415481
M3 - Article
AN - SCOPUS:85197021233
SN - 1063-8210
SP - 1
EP - 8
JO - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
JF - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
ER -