COMBAI computational biology and artificial intelligence

Evolutionary trajectory and origin of SARS-CoV-2

(beta version)

This website accompanies the project "Integrating Fréchet distance and AI reveals the evolutionary trajectory and origin of SARS-CoV-2"
by Anyou Wang

Alignment-based phylogenetics faces challenges in uncovering the evolutionary trajectory and origin of SARS-CoV-2. This study develops a novel alignment-free system that integrates Fréchet distance (Fr) with an artificial recurrent neural network (RNN) to quantitatively reveal the evolutionary trajectory and origin of SARS-CoV-2 from more than two million genome sequences. Fr measures the evolutionary similarity between a variant and the reference in terms of 84 genome features: 4 single nucleotides, 16 dinucleotides and 64 trinucleotides. The RNN recognizes the evolutionary trajectory from the Fr data. Globally, SARS-CoV-2 evolves by deleting parts of its genome, which significantly enhances its infection capacity (tau = -0.64, p-value = 1.39e-101); it deletes 66 features while gaining only 18. Yet only mutations in signature features such as TTA, GCT and CG increase its infection potential. At the organism level, variants that mutate a single biomarker have low infectious potential, whereas those that mutate multiple markers show a dramatically increased infection capacity. Mink coronavirus is the most likely origin of SARS-CoV-2, and the origin trajectory follows the order: mink, cat, tiger, mouse, hamster, dog, lion, gorilla, leopard, bat, and pangolin. Together, the mink-origin SARS-CoV-2 evolves primarily via deletion and mutates multiple loci, causing the COVID-19 pandemic.
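
The feature set and distance described above can be illustrated with a minimal sketch. It counts the 84 features (all k-mers for k = 1, 2, 3) in a sequence and compares two profiles with the standard discrete Fréchet distance. How the study actually converts feature profiles into curves is not stated on this page, so the helper profile_curve, which treats a profile as a polyline of (feature index, frequency) points, is an assumption made for illustration only, as are all function names and the toy sequences.

```python
from itertools import product
from math import dist  # Python 3.8+

ALPHABET = "ACGT"
# 4 single nucleotides + 16 dinucleotides + 64 trinucleotides = 84 features
FEATURES = ["".join(p) for k in (1, 2, 3) for p in product(ALPHABET, repeat=k)]


def kmer_frequencies(seq):
    """Return the relative frequency of each of the 84 k-mers (k = 1, 2, 3)."""
    seq = seq.upper()
    counts = dict.fromkeys(FEATURES, 0)
    totals = {1: 0, 2: 0, 3: 0}
    for k in (1, 2, 3):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in counts:  # skip windows with N or other ambiguity codes
                counts[kmer] += 1
                totals[k] += 1
    return [counts[f] / totals[len(f)] if totals[len(f)] else 0.0 for f in FEATURES]


def profile_curve(freqs):
    """ASSUMPTION: represent a profile as a polyline of (index, frequency) points."""
    return [(float(i), f) for i, f in enumerate(freqs)]


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two polylines (lists of points)."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]  # ca[i][j]: coupling distance up to p[i], q[j]
    for i in range(n):
        for j in range(m):
            d = dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[-1][-1]


# Toy sequences, not real genomes
reference = "ATGGCTTTACGCGTTA"
variant = "ATGGCTACGCGTTA"
fr = discrete_frechet(profile_curve(kmer_frequencies(reference)),
                      profile_curve(kmer_frequencies(variant)))
print(f"Fr(variant, reference) = {fr:.4f}")
```

Because the sketch works on k-mer frequencies rather than aligned positions, it needs no sequence alignment, which is the property the alignment-free approach relies on.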

The typical model

    from tensorflow import keras
    from tensorflow.keras.layers import LSTM, Bidirectional, Dense, Dropout

    def myModel(train_x):
        # Stacked bidirectional LSTM regressor; each sample is a 20-step window
        mymodel = keras.Sequential()
        mymodel.add(Bidirectional(LSTM(20, return_sequences=True), input_shape=(20, train_x.shape[-1])))
        mymodel.add(Dropout(rate=0.2))
        mymodel.add(Bidirectional(LSTM(40, return_sequences=True)))
        mymodel.add(Dropout(rate=0.2))
        mymodel.add(Bidirectional(LSTM(20, return_sequences=False)))
        mymodel.add(Dense(units=1))  # single regression output
        mymodel.compile(loss='mean_squared_error', optimizer='adam')
        return mymodel
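
The model above expects input of shape (samples, 20, n_features). One way to build such input from a time-ordered series is sketched below. This page does not describe how the study prepares its training windows, so the helper make_windows, its overlapping-window scheme, and the choice of the next-step value of the first feature as the target are all assumptions for illustration.

```python
import numpy as np

WINDOW = 20  # matches input_shape=(20, n_features) in myModel


def make_windows(series, window=WINDOW):
    """Slice a (time, n_features) array into overlapping windows and targets.

    Returns x with shape (samples, window, n_features) and y with shape
    (samples,), where y is the first feature one step after each window
    (ASSUMPTION: next-step prediction).
    """
    series = np.asarray(series, dtype=float)
    if series.ndim == 1:
        series = series[:, None]  # treat a 1-D series as a single feature
    n = len(series) - window
    if n <= 0:
        raise ValueError("series must be longer than the window")
    x = np.stack([series[i:i + window] for i in range(n)])
    y = series[window:, 0]
    return x, y


# Toy series standing in for time-ordered Fr values
toy_series = np.arange(25.0)
train_x, train_y = make_windows(toy_series)
print(train_x.shape, train_y.shape)
```

With arrays of this shape, myModel(train_x) builds a network whose input layer matches the data, and the result can be trained with the standard Keras fit(train_x, train_y) call.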
          

References

Anyou Wang. Integrating Fréchet distance and AI reveals the evolutionary trajectory and origin of SARS-CoV-2.

Anyou Wang, Rong Hai, Paul J Rider and Qianchuan He. Noncoding RNAs and deep learning neural network discriminate multi-cancer types. Cancers 2022, 14(2), 352.

Wang, A. & Hai, R. FINET: Fast Inferring NETwork. BMC Res Notes 13, 521 (2020).