
Workshop / Seminar

Hardware Accelerators for Machine Learning: From 3D Manycore to Processing-in-Memory Architectures by Aqeeb Iqbal Arka – Dissertation Defense


About the event

Student: Aqeeb Iqbal Arka

Advisor: Dr. Partha Pande

Degree: Electrical and Computer Engineering

Dissertation Title: Hardware Accelerators for Machine Learning: From 3D Manycore to Processing-in-Memory Architectures

Abstract: Big-data applications such as deep learning and graph analytics require hardware platforms that are energy-efficient yet computationally powerful. 3D manycore architectures are key to efficiently executing such compute- and data-intensive applications. Monolithic 3D (M3D) integration opens up the possibility of designing cores and their associated network routers across multiple layers by using monolithic inter-tier vias (MIVs), thereby reducing the effective wire length. Compared to TSV-based 3D ICs, M3D offers the “true” benefits of the vertical dimension for system integration. The first part of this work focuses on the design of high-performance and energy-efficient architectures for big-data applications, enabled by M3D vertical integration. As an example, we consider heterogeneous manycore architectures with CPUs, GPUs, and caches as the hardware platform. The disparate nature of these processing elements introduces conflicting design requirements that must be satisfied simultaneously. Moreover, the on-chip traffic patterns exhibited by different big-data applications need to be incorporated into the design process to achieve an optimal power-performance trade-off. We design an M3D-enabled heterogeneous manycore architecture to accelerate machine learning workloads.

In the second part, we focus on Processing-in-Memory (PIM) architectures to accelerate deep learning applications. We choose Graph Neural Networks (GNNs) as an example because they are both compute- and data-intensive. The large amount of data movement required by GNN computation poses a challenge to conventional von Neumann architectures, which have limited memory bandwidth. Hence, we propose the use of PIM architectures based on non-volatile memory such as Resistive Random Access Memory (ReRAM). We leverage the efficient matrix operations enabled by ReRAM and design manycore architectures that can support the unique computation and communication needs of large-scale GNN training. Overall, in this work we propose novel architectures that use M3D integration or ReRAM-based PIM to accelerate machine learning applications.


Tiffani Stubblefield
(509) 335-2958