EECS Colloquium: Holistic Performance Analysis and Optimization of Unified Memory by Tyler Allen, Clemson Univ.

Online

About the event

Abstract
High-performance computing systems have seen tremendous growth in theoretical performance with the inclusion of Graphics Processing Units (GPUs) and other accelerators. The difficulty and complexity of programming these systems has grown alongside the performance, as programmers must manage separate programming models and device memories for accelerators alongside traditional CPU resources. A large step toward reducing this complexity was the introduction of heterogeneous shared memory technologies, such as NVIDIA UVM, which manage the physical location of memory on behalf of the programmer. These technologies greatly reduce the manual resource-management burden, but at a significant performance cost. Improving the performance of these systems can enable much higher efficiency on large-scale GPU clusters without modification of user application code that uses unified memory, and can greatly ease the development of future applications by expanding the applicability of unified memory. While prior work has taken steps to understand the application-level performance of these systems, the underlying system-level performance issues are not well studied and are therefore challenging to resolve. In this work, we evaluate unified memory performance using low-level profiling and instrumentation and identify key hardware and software bottlenecks. We also introduce a method of decoupling hardware components during page-fault servicing to improve the overall performance of the system. Finally, we discuss the general applicability of these findings to future unified memory systems and future optimization targets.
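To make the programming-model difference concrete, the following is a minimal CUDA sketch (not taken from the talk; kernel and sizes are illustrative) of the unified memory model the abstract describes. With `cudaMallocManaged`, a single allocation is visible to both CPU and GPU, and the driver migrates pages on demand via GPU page faults, replacing the explicit `cudaMalloc`/`cudaMemcpy` staging a programmer would otherwise write. It requires an NVIDIA GPU with UVM support to run.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Trivial illustrative kernel: scale each element in place.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float *data = nullptr;

    // One managed allocation instead of separate host and device buffers.
    // The UVM driver decides the physical location of each page and
    // migrates pages between CPU and GPU memory on demand.
    cudaMallocManaged(&data, n * sizeof(float));

    for (int i = 0; i < n; ++i) data[i] = 1.0f;      // CPU touches the pages

    scale<<<(n + 255) / 256, 256>>>(data, 2.0f, n);  // GPU page faults pull
                                                     // the pages to the device
    cudaDeviceSynchronize();                         // must finish before the
                                                     // CPU reads the result

    printf("data[0] = %f\n", data[0]);
    cudaFree(data);
    return 0;
}
```

The on-demand page-fault servicing in this model is exactly the system-level path whose hardware and software bottlenecks the talk analyzes.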

Bio
Tyler Allen is a Computer Science Ph.D. candidate in the School of Computing at Clemson University. He received his master's degree in Computer Science from Clemson University and his undergraduate degrees in Computer Science and Applied Mathematics from Western Carolina University. His prior work includes GPU application performance optimization, virtual memory systems performance analysis, and energy-efficient and power-bounded computing. His research interests are broadly in HPC, accelerated computing, performance optimization, and other systems areas. In 2021, his work on high-performance GPU implementations of the Word2Vec NLP algorithm won the Best Paper Award at the International Conference on Supercomputing (ICS21). He has previously received departmental awards for teaching, research, and service. He is also the Vice-Chair of the Lead Student Volunteer program for the 2022 Supercomputing Conference.
