COMBAI: Computational Biology and Artificial Intelligence
COMBAI is a research group dedicated to computational biology, big data analytics, and artificial intelligence. We develop novel computational algorithms to uncover the fundamental principles and big-picture insights hidden within large-scale biological data. Our key findings and achievements are summarized below.
Cancer research is currently facing a major crisis. Despite decades of effort and numerous discoveries, including several Nobel Prize-winning studies, these advances have not led to a significant reduction in cancer mortality. Cancer remains a leading cause of death worldwide, and a unifying theory that explains all cancer types remains elusive.
This stagnation stems from deep-rooted limitations in both experimental biology and computational approaches. Traditional biological experiments often suffer from small sample sizes and limited scale, leading to biased and incomplete observations. Likewise, conventional computational approaches such as systems biology rely heavily on predefined knowledge bases (for example, GENCODE gene annotations), which introduce biases of their own.
To address these issues, COMBAI has developed algorithms capable of analyzing the world’s largest biological datasets without prior assumptions. Using this unbiased approach, we discovered a universal cancer principle: noncoding RNAs, rather than proteins as traditionally believed, serve as the primary drivers across all types of cancer. In contrast, proteins primarily serve this role under normal physiological conditions.
This paradigm-shifting discovery lays a foundational framework for a new era of cancer research and therapeutic development, with the potential to ultimately cure all forms of cancer.
References:
Anyou Wang. 2022. Noncoding RNAs endogenously rule the cancerous regulatory realm while proteins govern the normal. Computational and Structural Biotechnology Journal.
Anyou Wang. 2024. Conceptual breakthroughs of the long noncoding RNA functional system and its endogenous regulatory role in the cancerous regime. Explor Target Antitumor Ther.
Anyou Wang and Rong Hai. 2021. Noncoding RNAs serve as the deadliest universal regulators of all cancers. Cancer Genomics-Proteomics 18(1), 43-52.
Anyou Wang, Rong Hai, Paul J. Rider, and Qianchuan He. 2022. Noncoding RNAs and deep learning neural network discriminate multi-cancer types. Cancers 14(2), 352.
Anyou Wang and Rong Hai. 2020. FINET: Fast Inferring NETwork. BMC Res Notes 13, 521.
Media:
Anyou Wang. Keynote speaker. Title: Theory and perspective: How big data and AI accelerate the discovery of a universal cancer principle from the noncoding RNA regime. Cancer Research 2025.
Anyou Wang. Invited speaker. Title: Noncoding RNAs and deep learning neural network discriminate multi-cancer types. ICC 2023.
Traditionally, the functional mechanisms governing proteins have been presumed to apply to noncoding RNAs (ncRNAs). For instance, it has long been assumed that ncRNAs are transcribed by the canonical RNA polymerases (Pol I, Pol II, and Pol III), including Pol II, which transcribes protein-coding genes.
However, by performing an unbiased analysis of the world’s largest biological dataset, the Sequence Read Archive (SRA), we discovered that ncRNAs operate through a distinctive functional system. Notably, our results show that ncRNA transcription initiation does not depend on any known RNA polymerase. Instead, it involves a unique initiation mechanism mediated by an uncharacterized factor. In our publications, we have referred to this as an “unknown factor.” In fact, we have computationally identified this factor, although experimental validation is still pending.
This discovery lays a new foundation for understanding the noncoding genome. Because approximately 98% of the human genome is noncoding, and ncRNAs are central players in abnormal physiological conditions such as disease states and immune responses to environmental fluctuations, this distinctive ncRNA system may carry broader and more critical biological functions than those attributed to protein-coding genes.
References:
Anyou Wang. 2022. Distinctive functional regime of endogenous lncRNAs in dark regions of human genome. Computational and Structural Biotechnology Journal.
Anyou Wang. 2024. Conceptual breakthroughs of the long noncoding RNA functional system and its endogenous regulatory role in the cancerous regime. Explor Target Antitumor Ther.
Anyou Wang and Rong Hai. 2020. FINET: Fast Inferring NETwork. BMC Res Notes 13, 521.
Traditional alignment-based methods struggle to handle millions of genome sequences. As a result, the evolutionary origin of SARS-CoV-2 remained unresolved, a major knowledge gap during the COVID-19 pandemic.
To overcome this, we redefined the genome in computational terms. Rather than treating it simply as genetic material, as the traditional definition does, we treat it as a precisely ordered sequence of four nucleotides (A, T, C, and G) that can be decomposed into specific genomic features, such as AAA motifs. Based on this framework, we developed a novel alignment-free, AI-integrated computational approach that reveals the quantitative evolutionary path of SARS-CoV-2, which alignment-based phylogenetic methods had failed to achieve.
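The idea of decomposing a genome into motif features without alignment can be illustrated with a short Python sketch. This is an assumption-laden illustration, not the published pipeline: it counts every 3-nucleotide motif (AAA, AAC, ..., TTT) in a fixed lexicographic order, then compares two motif-count profiles with the discrete Fréchet distance, treating each profile as a 1-D curve. The referenced J Med Virol paper integrates Fréchet distance with AI, but its exact feature construction and AI model are not described here, so the helper names `kmer_profile` and `discrete_frechet`, the choice of k = 3, and the curve construction are all hypothetical.

```python
from itertools import product


def kmer_profile(seq, k=3):
    """Count each k-nucleotide motif (AAA, AAC, ..., TTT for k=3) in a fixed order.

    Hypothetical helper: turns an ordered nucleotide sequence into a feature
    vector of 4**k motif counts, without any alignment step.
    """
    motifs = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(motifs, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # windows containing N or other ambiguity codes are skipped
            counts[word] += 1
    return [counts[m] for m in motifs]


def discrete_frechet(p, q):
    """Discrete Frechet distance between two curves given as lists of scalars.

    Standard dynamic-programming formulation (Eiter and Mannila). Treating a
    motif-count profile as a 1-D curve is an assumption of this sketch.
    """
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(p[i] - q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]


if __name__ == "__main__":
    # Toy sequences for illustration only; these are not real coronavirus genomes.
    genome_a = "ATGAAACCCGGGTTTAAAGCT"
    genome_b = "ATGAAACCCGGATTTAAAGCA"
    print(discrete_frechet(kmer_profile(genome_a), kmer_profile(genome_b)))
```

Because both profiles share the same motif order, each genome becomes a fixed-length vector (64 entries for k = 3) regardless of sequence length, which is what makes the comparison alignment-free. In a real analysis, distances like these would feed a downstream model; that AI component is not reproduced in this sketch.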
Our findings suggest that SARS-CoV-2 most likely originated from mink coronavirus variants. The evolutionary path can be traced through multiple species in the following order: mink, cat, tiger, mouse, hamster, dog, lion, gorilla, leopard, bat, and pangolin.
This study provides a clear and quantitative evolutionary trajectory for SARS-CoV-2, offering new insights into its origin and resolving a critical unknown during the COVID-19 crisis.
References:
Anyou Wang. 2024. Integrating Fréchet distance and AI reveals the evolutionary trajectory and origin of SARS-CoV-2. J Med Virol 96, e29557.
Anyou Wang and Rong Hai. 2020. FINET: Fast Inferring NETwork. BMC Res Notes 13, 521.
Media:
Anyou Wang. Keynote speaker. Title: Evolutionary trajectory and origin of SARS-CoV-2. Virology-2022.
The biological basis of longevity across species has long been a mystery. By analyzing large-scale genomic data spanning the animal kingdom, we discovered that noncoding RNAs (ncRNAs), rather than proteins as traditionally believed, play a central role in determining lifespan.
Our research also identified specific "longevity motifs" in the human reproductive system and cerebral cortex. Notably, these motifs are more active in the female reproductive system than in the male reproductive system, offering a molecular explanation for why women generally live longer than men.
This discovery establishes a foundational framework for understanding the molecular mechanisms underlying aging and opens new avenues for targeted interventions in longevity and age-related diseases.
References:
Anyou Wang. Noncoding RNAs evolutionarily extend animal lifespan.
Anyou Wang and Rong Hai. 2020. FINET: Fast Inferring NETwork. BMC Res Notes 13, 521.