LESSA: a Library of Efficient, Scalable and Service-oriented Algorithms

1. Large-Scale Biological Sequence Analysis System

As part of my LESSA library, this project forms the core of my research on parallel and distributed algorithm design for bioinformatics. It employs a variety of tightly-coupled and loosely-coupled computing architectures, including heterogeneous computers with SIMD extensions and accelerators (e.g. Intel SSE/AVX, Intel Xeon Phi, NVIDIA GPUs and AMD GPUs), cluster computing, and cloud computing. My ultimate objective is to establish an analysis system for large-scale biological sequences that addresses critical bottleneck problems in bioinformatics and computational biology, such as genome sequencing based on high-throughput sequencing technologies, metagenomics, motif discovery, sequence alignment, and phylogenetic inference. Fig. 1 illustrates the envisioned biological sequence data analysis system.

Fig. 1 System diagram for large-scale biological sequence analysis

Example work:

(The full lists of my software and publications are available here)

2. Parallel Full-Text Search and Pattern Search

As part of my LESSA library, this project aims to design parallel and memory-efficient algorithms for full-text indexing and pattern search. In particular, I will investigate the parallel and space-efficient construction of four popular full-text indexing data structures: the Burrows-Wheeler transform, the FM-index, the suffix array, and the enhanced suffix array, on heterogeneous computing architectures comprising multi-core CPUs and accelerators (specifically, NVIDIA and AMD GPUs and Intel Xeon Phi coprocessors). Based on these indexing data structures, I will further explore full-text pattern search, e.g. searching for maximal exact matches, super-maximal exact matches, and approximate pattern matches, which is fundamental and critical to a myriad of applications in fields such as bioinformatics (e.g. next-generation sequencing read alignment and genome assembly) and text/data mining.
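To make the relationship between these data structures concrete, the following is a minimal sequential sketch (not the parallel GPU construction described above) of two of them: the Burrows-Wheeler transform built from a suffix array, and FM-index backward search for counting pattern occurrences. All function names are illustrative, not part of LESSA.

```python
def suffix_array(s):
    """Naive O(n^2 log n) suffix array; fine for a small sketch."""
    return sorted(range(len(s)), key=lambda i: s[i:])

def bwt(s):
    """Burrows-Wheeler transform of s (s must end with the sentinel '$')."""
    sa = suffix_array(s)
    return "".join(s[i - 1] for i in sa), sa

def fm_search(pattern, bwt_str):
    """FM-index backward search: returns the number of occurrences."""
    # C[c] = number of characters in the text strictly smaller than c
    alphabet = sorted(set(bwt_str))
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += bwt_str.count(c)

    def occ(c, i):
        # Occurrences of c in bwt_str[:i]; a real index replaces this
        # linear scan with sampled checkpoints or a wavelet tree.
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo

b, sa = bwt("ACAACG$")
print(b)                   # GC$AAAC
print(fm_search("AC", b))  # 2 (positions 0 and 3 in the text)
```

The `occ` scan is the part that parallel, space-efficient construction and querying must replace: production FM-indices precompute rank structures so each backward-search step costs O(1) instead of O(n).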

Both the full-text indexing and the full-text pattern search engines will ultimately be offered as a library. The library will provide a unified high-level programming interface across the different types of underlying processing units, which will enhance developer productivity and enable performance portability between multi-core CPUs and accelerators. Furthermore, on a heterogeneous computer with accelerators, the library is intended to autonomously select the "most efficient" parallelization at runtime, which may run on any single type of processing unit or even on hybrid combinations of them. In this fashion, developers deploying the parallel algorithms from the library need not concern themselves with the details of the underlying hardware configuration, and can instead focus on developing the other parts of their programs. Fig. 2 shows the high-level system diagram for my parallel full-text indexing and pattern search project.
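One simple way to realize the runtime-selection idea sketched above is a registry of backend implementations plus a one-shot micro-benchmark that picks the fastest available one. The names below (`Dispatcher`, `register`, `best`) are purely illustrative assumptions, not the actual LESSA interface.

```python
import math
import time

class Dispatcher:
    """Toy runtime dispatcher: registers interchangeable backends for one
    primitive and picks the fastest by micro-benchmarking a sample input."""

    def __init__(self):
        self.backends = {}   # name -> callable implementing the primitive

    def register(self, name, fn):
        self.backends[name] = fn

    def best(self, sample_input):
        # A production library would also weigh input size, host-device
        # transfer costs, and device availability, not just raw timing.
        timings = {}
        for name, fn in self.backends.items():
            t0 = time.perf_counter()
            fn(sample_input)
            timings[name] = time.perf_counter() - t0
        return min(timings, key=timings.get)

# Two interchangeable "backends" for the same primitive (a reduction);
# real backends would be CPU, GPU, and hybrid kernels.
disp = Dispatcher()
disp.register("cpu_loop", lambda xs: sum(xs))
disp.register("cpu_fsum", lambda xs: math.fsum(xs))

chosen = disp.best(list(range(1000)))
result = disp.backends[chosen](list(range(1000)))
print(chosen, result)  # result is 499500 whichever backend wins
```

The key design point is that callers invoke the primitive through the dispatcher, so swapping or adding a backend (e.g. a CUDA kernel) requires no change to application code.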

Fig. 2 System diagram for parallel full-text indexing and pattern search

Example work:

Full-text Indexing

Full-text Pattern Search

3. Compute Unified Parallel Building Blocks (CUPBB)

This project aims to build a Compute Unified Parallel Building Blocks (CUPBB) template library using heterogeneous computing. The library will contain popular fundamental building blocks for scientific computing and data science, including, but not limited to, sparse linear algebra (e.g. SpMV), scan, reduction, sort, and k-nearest neighbors. For example, sparse linear algebra is recognized by UC Berkeley as one of the "seven dwarfs" of parallel computing research, and is believed to be of high importance to science and engineering in the coming years. As an essential primitive in sparse linear algebra, sparse matrix-vector multiplication (SpMV) has garnered intense scientific attention and is used in a variety of applications spanning bioinformatics, data mining, graph analytics, and iterative methods in scientific computing. I have done some work on parallel sparse linear algebra on CUDA-enabled GPUs and on parallel pairwise correlation computation on Intel Xeon Phi clusters for data science (e.g. feature selection in machine learning).

Fig. 3 (a) Speedups of LightSpMV over CUSP on a Tesla K40c GPU and (b) speedups over cuSPARSE on a Tesla K40c GPU
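For reference, the following is a minimal sequential sketch of SpMV over the standard CSR (compressed sparse row) storage format. LightSpMV's GPU kernels parallelize the row loop with dynamic row scheduling across warps; this plain-Python version shows only the arithmetic, not that parallelization.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """y = A @ x, where A is stored in CSR form:
    row_ptr[i]..row_ptr[i+1] delimit the nonzeros of row i,
    col_idx[k] is the column of the k-th nonzero, vals[k] its value."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):                        # one output element per row
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

# A = [[10, 0, 0],
#      [ 0, 0, 2],
#      [ 3, 4, 0]]  stored as CSR:
row_ptr = [0, 1, 2, 4]
col_idx = [0, 2, 0, 1]
vals    = [10.0, 2.0, 3.0, 4.0]
print(spmv_csr(row_ptr, col_idx, vals, [1.0, 2.0, 3.0]))  # [10.0, 6.0, 11.0]
```

CSR's irregularity (rows with very different nonzero counts) is exactly what makes load balancing the central difficulty in GPU SpMV kernels.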

Example work:

4. Compact Computing for Big Data

As data volumes increase exponentially and data compression becomes pervasive in data centers, I expect that operating directly on compressed data will become commonplace in the future. However, parallel processing of compressed data is a challenging proposition for both shared-memory and distributed-memory systems. The challenges include constrained random access to the uncompressed content, independent decompression of data blocks, balanced distribution of data, memory, and computation, and the adaptation of existing algorithms and applications to the requirements of compressed-data processing. Based on these concerns, I am conceiving a new model of computing, tentatively named Compact Computing. In general, Compact Computing targets robust, flexible, and reproducible parallel processing of big data and, in principle, consists of three core components: (1) tightly-coupled architectures, (2) compressive and elastic data representation, and (3) efficient, scalable, and service-oriented algorithms and applications. In this context, component 1 can comprise conventional CPUs, a diversity of accelerators (e.g. FPGAs, GPUs, and MIC processors), and fast interconnect communication facilities; component 2 concentrates on data structures and formats that enable efficient on-the-fly streaming compression and decompression; and component 3 targets the development of algorithms and applications that enable robust and efficient parallel processing of big data streams. By centering on data, Compact Computing enables full-stack computation by tightly coupling algorithms with systems.
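The independent-block-decompression challenge above can be sketched as follows: a stream is split into independently compressed blocks (no shared compression dictionary), so workers can decompress and process blocks in parallel without touching each other's state. The block size and the byte-count "analysis" are placeholder assumptions for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

BLOCK = 1 << 16  # 64 KiB blocks; real systems tune this per workload

def compress_blocks(data):
    """Compress fixed-size blocks independently, trading some compression
    ratio for random access and parallelism at block granularity."""
    return [zlib.compress(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]

def process_block(blob):
    """Decompress one block and run the analysis on it directly;
    here the 'analysis' is just counting 'A' bytes."""
    return zlib.decompress(blob).count(b"A")

def parallel_count(blocks):
    # zlib releases the GIL while (de)compressing, so threads overlap here.
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(process_block, blocks))

data = b"GATTACA" * 100_000            # 700,000 bytes -> 11 blocks
blocks = compress_blocks(data)
print(len(blocks), parallel_count(blocks))  # 11 300000
```

Because each block decompresses on its own, blocks can also be redistributed across cluster nodes for balanced load, which is the distributed-memory half of the same design.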

Example work: