Preliminary Exam – Efficient Design, Implementation, and Application of Graph Isomorphism Algorithms
About the event
Efficient Design, Implementation, and Application of Graph Isomorphism Algorithms
Ph.D. Preliminary Examination for Coby Soss
Chair: Assefaw Gebremedhin
Abstract
Graph isomorphism is a useful model in many areas of science such as social network analysis, computer security, computer vision, protein structure identification, compilers, and cheminformatics. Since the graph isomorphism test has so many use cases, there has been a strong incentive to find efficient implementations for the graph isomorphism problem. This research proposal centers around the design, implementation, and application of such a solution. ScaWL, a high-performance implementation of the k-WL algorithm for effective approximation of the graph isomorphism problem, constitutes the core contribution of this research.
The k-dimensional Weisfeiler-Leman (k-WL) algorithm—developed as an efficient heuristic for testing if two graphs are isomorphic—is a fundamental kernel for node embedding in the emerging field of graph neural networks. Unfortunately, the k-WL algorithm has exponential storage requirements, limiting the size of graphs that can be handled. This work presents a novel k-WL scheme with a storage requirement orders of magnitude lower while maintaining the same accuracy as the original algorithm. Due to the reduced storage requirement, our scheme allows for processing much bigger graphs than previously possible on a single compute node. For even bigger graphs, we provide the first distributed-memory implementation. Our k-WL scheme also has significantly reduced communication volume and offers high scalability. Our experimental results demonstrate that our approach is significantly faster and has superior scalability compared to five other implementations employing state-of-the-art techniques.
ScaWL can be applied wherever embeddings, which are structural information derived from the isomorphism test, are useful. These embeddings encode information about the neighborhood structure surrounding a node, informing downstream algorithms for tasks requiring structural graph information. ScaWL’s scalability has the potential to enable these embeddings to support algorithms for network alignment and GraphRAG data matching on large-scale graphs, which are two applications this proposal seeks to explore.
In network alignment, the goal is to align nodes across two graphs such that the structural surrounding of those nodes match as much as possible. The use of embeddings are a natural fit for this problem because they encode structural information about the graphs. This proposal will explore possible applications of ScaWL utilizing the embeddings.
Data matching in a GraphRAG framework is another rich application for ScaWL. The embeddings from ScaWL can be used, possibly in combination with other embedding techniques, to map the semantic graphs generated by GraphRAG. Examples include matching documents at the PDF or paragraph level, or finding matches between the graphs generated from new text and content already stored in a knowledge graph.