List of My Software
Big data
| ParaBWT | a leading parallel and space-efficient algorithm for Burrows-Wheeler transform construction on big genome data. | |
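For readers unfamiliar with the Burrows-Wheeler transform mentioned above, a minimal sketch follows. It uses naive suffix sorting on a short string and is only an illustration of what the transform computes; it is not ParaBWT's algorithm, which targets parallel, space-efficient construction on large genomes. The function name `bwt` and the `$` sentinel are assumptions made for this example.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Return the Burrows-Wheeler transform of text (naive, for illustration)."""
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    # Sort suffix start positions by the suffix they begin.
    # This is O(n^2 log n) in the worst case, fine only for small inputs.
    order = sorted(range(len(s)), key=lambda i: s[i:])
    # The BWT is the character preceding each sorted suffix
    # (s[-1] wraps around to the sentinel for the suffix starting at 0).
    return "".join(s[i - 1] for i in order)


if __name__ == "__main__":
    print(bwt("banana"))  # annb$aa
```

The transform groups characters that share the same following context, which is why it underlies many compressed genome indexes.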
Machine Learning
| PyAGC | a production-ready, modular library and comprehensive benchmark for Attributed Graph Clustering (AGC), built on PyTorch and PyTorch Geometric. | |
| GDGB | the first generative dynamic text-attributed graph (DyTAG) benchmark, including eight high-quality datasets tailored for DyTAG generation. | |
| TGRB | a comprehensive framework for evaluating graph learning methods on text-attributed graphs from four domains against both textual and structural attacks, under transductive poisoning and inductive evasion settings. | |
| M3GQA | the first Graph RAG benchmark focusing on multi-entity queries, a highly practical yet challenging aspect in Graph RAG systems. | |
| GraphCLIP | a framework for enhancing the transferability of Graph Foundation Models in low-resource scenarios such as zero-shot settings. | |
| Graph RAG Survey | the first comprehensive overview of Graph RAG methodologies, which formalizes the Graph RAG problem and workflow. | |
| LightPCC | the first parallel and distributed pairwise correlation computation on Intel Xeon Phi clusters for data science (e.g. co-expression network construction and feature selection). | |
| G-Sparse | a new compiler framework that extends the popular Halide compiler to enable effective acceleration for generalized sparse computations for GNNs through compiler-driven optimizations and auto-tuning. | |
| SGD Optimizers | a set of stochastic gradient optimizers for deep learning, including AGD and WSAM. | |
Scientific Computing
| LightSpMV | a faster compressed sparse row (CSR)-based sparse matrix-vector multiplication algorithm on CUDA-enabled GPUs. | |
| LightScan | a faster parallel scan primitive for CUDA-enabled GPUs by investigating a hybrid model combining intra-block computation and inter-block communication. | |
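As background for the LightSpMV entry above, the following is a minimal sequential sketch of sparse matrix-vector multiplication over the compressed sparse row (CSR) format. It only illustrates the data layout and the per-row dot product; it is not LightSpMV's GPU implementation. The names `csr_spmv`, `row_ptr`, `col_idx`, and `values` are assumptions chosen for this example.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x where A is stored in CSR form.

    row_ptr[r] .. row_ptr[r + 1] delimits the nonzeros of row r;
    col_idx[k] and values[k] give the column and value of nonzero k.
    """
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        acc = 0.0
        # Each row is an independent dot product, which is the unit of
        # work that parallel implementations distribute across threads.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


if __name__ == "__main__":
    # A = [[10, 0, 0], [0, 20, 30], [0, 0, 40]]
    row_ptr = [0, 1, 3, 4]
    col_idx = [0, 1, 2, 2]
    values = [10.0, 20.0, 30.0, 40.0]
    print(csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0]))  # [10.0, 130.0, 120.0]
```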
Motif Finding
| CUDA-MEME | a fast parallel motif finding algorithm based on the MEME (version 3.5.4) algorithm for a single GPU device using CUDA. | |
| mCUDA-MEME | a further extension of CUDA-MEME based on the MEME (version 4.4.0) algorithm for multiple GPUs using a hybrid combination of CUDA, MPI and OpenMP. | |
| CompleteMOTIFs | an integrated web tool developed by Harvard Medical School to facilitate systematic discovery of over-represented transcription factor binding motifs from high-throughput chromatin immunoprecipitation experiments. I contributed CUDA-MEME to accelerate motif discovery. | |
Next Generation Sequencing (NGS)
| Short-read alignment | CUSHAW | the first distribution of the CUSHAW software package for NGS read alignment. It is a CUDA-compatible short-read alignment algorithm for multiple GPUs sharing a single host. This aligner supports only ungapped alignment and has been incorporated into the NVIDIA Tesla Bio Workbench. |
| CUSHAW2 | the second distribution of the CUSHAW software package for NGS read alignment. It is a fast, parallel gapped read aligner for large genomes, such as the human genome. This aligner has been further accelerated using GPU computing, implemented as CUSHAW2-GPU. | |
| CUSHAW3 | the third distribution of the CUSHAW software package for NGS read alignment. It is a parallel, sensitive and accurate short-read aligner for both base-space and color-space single-end/paired-end reads. This aligner has been further enhanced using cluster computing, implemented as CUSHAW3-UPC. | |
| Short-read error correction | DecGPU | the first parallel and distributed pre-assembly short read error correction algorithm using CUDA and MPI. |
| Musket | a parallel and scalable multistage k-mer spectrum based error corrector for Illumina sequence data. | |
| Hector | a parallel multistage homopolymer spectrum based error corrector to handle homopolymer insertions or deletions in 454 sequencing data. | |
| Short-read assembly | PASHA | a parallelized short read assembler for large genomes, such as the human genome, using de Bruijn graphs. |
| SNV calling | SNVSniffer | an integrated caller for germline and somatic single nucleotide variants (SNVs) in diploid genomes. |
| Metagenomics | All-Food-Seq | a software pipeline for quantitative measurement of species composition in foodstuff material. |
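The k-mer spectrum idea behind spectrum-based error correctors such as Musket can be sketched as follows. The code counts every k-mer across a set of reads and flags those seen fewer than a threshold number of times as likely to contain sequencing errors. This is a simplified illustration under assumed names (`kmer_spectrum`, `weak_kmers`, `min_count`); it does not reproduce Musket's multistage correction procedure.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Count occurrences of every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmers(counts, min_count):
    """Return k-mers seen fewer than min_count times (likely erroneous)."""
    return {kmer for kmer, c in counts.items() if c < min_count}


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC", "ACGTTC"]  # the third read differs at one base
    spectrum = kmer_spectrum(reads, k=4)
    print(sorted(weak_kmers(spectrum, min_count=2)))  # ['CGTT', 'GTTC']
```

A corrector would then try to edit bases in a read so that its weak k-mers become well-supported ones.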
Sequence Alignment
| Pairwise sequence alignment | CUDASW++ | the fastest parallel Smith-Waterman protein database search algorithm for GPGPUs using CUDA. | |
| SWAPHI | the first parallel algorithm to accelerate the Smith-Waterman protein database search on Xeon Phi coprocessors. | ||
| SWAPHI-LS | the first parallel Smith-Waterman algorithm exploiting Xeon Phi clusters to accelerate the alignment of long DNA sequences. | ||
| XBitPar | a bit-parallel approximate pattern matching algorithm that is based on the Wu-Manber algorithm and further accelerated by Xeon Phi coprocessors. | ||
| Multiple sequence alignment | MSAProbs | a well-established, state-of-the-art multiple sequence alignment algorithm for protein sequences, which achieves higher alignment accuracy than other leading aligners. | |
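Several entries above accelerate the Smith-Waterman algorithm. As a reference point, here is a minimal sequential sketch of the local alignment score with a linear gap penalty. The scoring values (`match`, `mismatch`, `gap`) and the function name are assumptions for illustration; the listed tools use substitution matrices, affine gap penalties, and GPU or Xeon Phi parallelism, none of which is shown here.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero, which lets an alignment
            # restart anywhere and makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("XXACGTYY", "ACGT"))  # 8
```

Keeping only two rows gives linear memory for the score; recovering the alignment itself would require storing the full table or a traceback structure.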