List of My Software

Big Data

a leading parallel and space-efficient algorithm for Burrows-Wheeler transform construction on big genome data.
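As a point of reference for what Burrows-Wheeler transform construction computes, here is a minimal sketch using the textbook sorted-rotations definition. It is not the parallel, space-efficient algorithm described above: it materializes every rotation and is only practical for short strings. The function name `bwt` and the `$` sentinel are illustrative assumptions.

```python
def bwt(text, sentinel="$"):
    """Naive Burrows-Wheeler transform via sorted rotations.

    Illustration only: this builds all rotations in memory, so it is
    unsuitable for genome-scale input. The sentinel is assumed to sort
    before every character of the input.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    # Sort all cyclic rotations of the sentinel-terminated string.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


if __name__ == "__main__":
    print(bwt("banana"))
```

Production-scale construction normally avoids the rotation matrix entirely, for example by working from a suffix array or by building the transform in blocks.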

Machine Learning

LightPCC	the first parallel and distributed pairwise correlation computation on Intel Xeon Phi clusters for data science (e.g. co-expression network construction and feature selection).
G-Sparse	a compiler framework that extends the Halide compiler to accelerate generalized sparse computations for GNNs through compiler-driven optimizations and auto-tuning.
SGD Optimizers	a set of stochastic gradient optimizers for deep learning, including AGD and WSAM.
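To make the pairwise correlation workload behind LightPCC concrete, the sketch below computes a correlation coefficient for every unordered pair of rows in a small matrix. It assumes the Pearson coefficient as the correlation measure and runs sequentially in plain Python. The names `pearson` and `pairwise_correlations` are hypothetical and do not reflect the LightPCC interface.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Assumes neither sequence is constant (non-zero variance).
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(rows):
    """Correlation for every unordered pair (i, j) with i < j."""
    n = len(rows)
    return {
        (i, j): pearson(rows[i], rows[j])
        for i in range(n)
        for j in range(i + 1, n)
    }


if __name__ == "__main__":
    data = [[1, 2, 3], [2, 4, 6], [3, 2, 1]]
    for pair, r in pairwise_correlations(data).items():
        print(pair, round(r, 3))
```

The number of pairs grows quadratically with the number of rows, which is why this kind of computation is a natural candidate for parallel and distributed execution.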

Scientific Computing

LightSpMV	a faster compressed sparse row (CSR)-based sparse matrix-vector multiplication algorithm on CUDA-enabled GPUs.
LightScan	a faster parallel scan primitive for CUDA-enabled GPUs, based on a hybrid model that combines intra-block computation with inter-block communication.
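For readers unfamiliar with the compressed sparse row (CSR) layout that LightSpMV operates on, the following is a sequential reference sketch of CSR sparse matrix-vector multiplication. It shows only the data layout and the arithmetic, not the GPU algorithm. The array names `row_ptr`, `col_idx` and `values` follow common convention and are assumptions here.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i + 1] delimits the non-zeros of row i;
    col_idx[k] and values[k] give the column and value of non-zero k.
    """
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


if __name__ == "__main__":
    # A = [[10, 0, 2],
    #      [ 0, 3, 0],
    #      [ 1, 0, 4]]
    row_ptr = [0, 2, 3, 5]
    col_idx = [0, 2, 1, 0, 2]
    values = [10.0, 2.0, 3.0, 1.0, 4.0]
    print(csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0]))
```

Each row's dot product is independent of the others, so the outer loop is the part a GPU implementation would distribute across threads.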

Motif Finding

CUDA-MEME	a fast parallel motif finding algorithm based on the MEME (version 3.5.4) algorithm for a single GPU device using CUDA.
mCUDA-MEME	a further extension of CUDA-MEME based on the MEME (version 4.4.0) algorithm for multiple GPUs using a hybrid combination of CUDA, MPI and OpenMP.
CompleteMOTIFs	an integrated web tool developed by Harvard Medical School to facilitate systematic discovery of over-represented transcription factor binding motifs from high-throughput chromatin immunoprecipitation experiments. I contributed CUDA-MEME to accelerate motif discovery.

Next Generation Sequencing (NGS)

Short-read alignment	CUSHAW	the first distribution of the CUSHAW software package for NGS read alignment. It is a CUDA-compatible short-read alignment algorithm for multiple GPUs sharing a single host. This aligner supports only ungapped alignment and has been incorporated into the NVIDIA Tesla Bio Workbench.
	CUSHAW2	the second distribution of the CUSHAW software package for NGS read alignment. It is a fast, parallel gapped read aligner for large genomes, such as the human genome. This aligner has been further accelerated using GPU computing and is implemented in CUSHAW2-GPU.
	CUSHAW3	the third distribution of the CUSHAW software package for NGS read alignment. It is a parallel, sensitive and accurate short-read aligner for both base-space and color-space single-end/paired-end reads. This aligner has been further enhanced using cluster computing and is implemented in CUSHAW3-UPC.
Short-read error correction	DecGPU	the first parallel and distributed pre-assembly short read error correction algorithm using CUDA and MPI.
	Musket	a parallel and scalable multistage k-mer spectrum based error corrector for Illumina sequence data.
	Hector	a parallel multistage homopolymer spectrum based error corrector to handle homopolymer insertions or deletions in 454 sequencing data.
Short-read assembly	PASHA	a parallelized short read assembler for large genomes, such as the human genome, using de Bruijn graphs.
SNV calling	SNVSniffer	an integrated caller for germline and somatic single nucleotide variants (SNVs) in diploid genomes.
Metagenomics	All-Food-Seq	a software pipeline for quantitative measurement of species composition in foodstuff material.
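The k-mer spectrum idea that error correctors such as Musket build on can be sketched as follows: count every k-mer across the reads, then treat k-mers seen fewer than some threshold number of times as likely to contain sequencing errors. This is only the counting step under assumed names (`kmer_spectrum`, `weak_kmers`, `min_count`). It does not reproduce Musket's multistage correction.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Count occurrences of every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmers(counts, min_count):
    """k-mers below the (hypothetical) trust threshold min_count."""
    return {kmer for kmer, count in counts.items() if count < min_count}


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC", "ACGAAC"]  # third read has one substitution
    spectrum = kmer_spectrum(reads, k=3)
    print(sorted(weak_kmers(spectrum, min_count=2)))
```

In this toy input, the k-mers that overlap the substituted base occur only once, so they fall below the threshold while the error-free k-mers do not.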

Sequence Alignment

Pairwise sequence alignment	CUDASW++	the fastest parallel Smith-Waterman protein database search algorithm for GPGPUs using CUDA.
	SWAPHI	the first parallel algorithm to accelerate the Smith-Waterman protein database search on Xeon Phi coprocessors.
	SWAPHI-LS	the first parallel Smith-Waterman algorithm exploiting Xeon Phi clusters to accelerate the alignment of long DNA sequences.
	XBitPar	a bit-parallel approximate pattern matching algorithm that is based on the Wu-Manber algorithm and further accelerated by Xeon Phi coprocessors.
Multiple sequence alignment	MSAProbs	a well-established, state-of-the-art multiple sequence alignment algorithm for protein sequences, which achieves higher alignment accuracy than the existing leading aligners.
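Several of the pairwise aligners above accelerate the Smith-Waterman algorithm. As a reference for the underlying recurrence, here is a minimal sequential sketch that returns the optimal local alignment score. It assumes a simple match/mismatch scheme and a linear gap penalty. The scoring parameters and the function name are illustrative, and protein database search typically uses a substitution matrix and affine gap penalties instead.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Optimal local alignment score of strings a and b.

    Uses the classic dynamic-programming recurrence with a linear gap
    penalty, keeping only two rows of the score matrix in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment
                prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GGACGTGG", "TTACGTTT"))
```

Cells on the same anti-diagonal of the score matrix do not depend on each other, and in database search each query-subject pair is independent, which are the two usual sources of parallelism for this recurrence.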