Accelerated Community Detection Algorithm using Graphs on GPUs

Abstract:  

Community detection methods are applied to gene expression data to efficiently identify natural divisions or functional modules within biological networks, but their computational demands increase, particularly given the substantial growth in dataset sizes. The Leiden algorithm, an emerging method in this field, offers inherent parallelism that remains underutilized due to the limited parallel processing capabilities offered by today’s modern multi-core CPUs, which have fewer than 100 cores (typically 32-64CPUs). However, Leiden can achieve significant performance gains when implemented on GPUs. GPUs offer high memory bandwidth and an extensive array of parallel processing units that map well to the parallelism in Leiden. As far as we know, cugraph [15] is the sole implementation that has mapped the Leiden algorithm to GPUs, using a blend of Python and C languages. However, it only supports undirected graphs. In addition to this Python implementation to GPUs is comparatively slower than C/C++, which reduces significant performance gains provided by the GPU’s speedup. Conversely, C/C++ optimizes performance more effectively, ensuring an accurate baseline comparison when performing GPU acceleration. This work presents a lightweight C++ based GPU implementation of the Leiden algorithm and, to the best of our knowledge, the very first GPU implementation that supports directed graphs, which generally demands nearly twice the computational time and memory resources compared to undirected graphs. We also evaluate the effectiveness of our approach on larger gene expression datasets and datasets other than gene expression data to show the scalability of application in various data-intensive applications beyond gene expression data. Findings show that our directed GPU/C++ implementation is 4.5x faster than our directed CPU/ C++ implementation, and our directed GPU/C++ version performs 1.9x faster than our undirected CPU/C++ implementation and 9.3x faster than original CPU/Java implementation of the Leiden. We also compare our undirected and directed CPU/C++ versions with undirected GPU implementation cugraph, results show that cugraph achieves 1.8x speedup over our directed GPU/C++ implementation while not providing support for directed graphs and compromising on quality of clusters.
 

Committee:

  • Adwait Jog, Committee Chair (CS/SEAS/UVA)
  • Kevin Skadron, Advisor (CS/SEAS/UVA)
  • Farzad Farnoud (ECE, CS/SEAS/UVA)
  • Miaomiao Zhang (ECE,CS/SEAS/UVA)