COMBAI computational biology and artificial intelligence

SARS-CoV-2 Genome Evolution

(beta version)

This website accompanies "Evolutionary trajectory of SARS-CoV-2 genome"
by Anyou Wang

Traditional alignment-based phylogenetics struggles to uncover the evolutionary trajectory of SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2). This study develops a novel alignment-free system and reveals the evolutionary trajectory of SARS-CoV-2 from more than one million genome sequences. The system combines the Fréchet distance (Fr) with an artificial recurrent neural network. Fr measures the dissimilarity between a variant genome and the reference genome, and is decomposed into 84 features (4 single nucleotides, 16 dinucleotides and 64 codons). The recurrent neural network predicts and forecasts the time-series Fr trajectory, from which the evolutionary trajectory of SARS-CoV-2 is inferred. Overall, the SARS-CoV-2 genome has mutated rapidly via deletion during the COVID-19 pandemic. Among single nucleotides, C mutates fast while T changes slowly. C-prefix dinucleotides (e.g. CG and CT) are also lost dramatically during evolution. Similarly, the virus genome deletes several codons prefixed by C (e.g. CCT) but gains several T-prefix and A-prefix codons (e.g. TTA and ATT). Interestingly, codon CCT and dinucleotide CT centrally control the entire SARS-CoV-2 genome, and their evolutionary trajectories track the spikes in COVID-19 cases. C-prefix feature trajectories therefore mark SARS-CoV-2 evolution.
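To make the 84-feature decomposition and the distance concrete, here is a minimal sketch, not the study's implementation. It enumerates the 4 + 16 + 64 k-mers, counts them in a sequence, and computes a discrete Fréchet distance between two one-dimensional curves using the standard dynamic-programming recurrence. The abstract does not say how a genome is turned into a curve, so the function names (`kmer_counts`, `cumulative_curve`, `discrete_frechet`), the overlapping k-mer counting, and the running-count encoding are all assumptions made for illustration. The toy sequences are placeholders, not real data.

```python
from itertools import product

ALPHABET = "ACGT"

# The 84 feature names: 4 single nucleotides, 16 dinucleotides, 64 codons.
FEATURES = ["".join(p) for k in (1, 2, 3) for p in product(ALPHABET, repeat=k)]


def kmer_counts(seq):
    """Count overlapping occurrences of each of the 84 k-mers (k = 1, 2, 3)."""
    counts = dict.fromkeys(FEATURES, 0)
    seq = seq.upper()
    for k in (1, 2, 3):
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in counts:  # skips ambiguous bases such as N
                counts[kmer] += 1
    return counts


def cumulative_curve(seq, kmer):
    """Running count of `kmer` along the sequence.

    Assumption: this curve encoding is hypothetical, chosen only so that
    two genomes can be compared as curves.
    """
    seq = seq.upper()
    k = len(kmer)
    curve, total = [], 0
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] == kmer:
            total += 1
        curve.append(total)
    return curve


def discrete_frechet(p, q):
    """Discrete Frechet distance between two non-empty 1-D curves."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]  # ca[i][j]: best coupling cost up to (i, j)
    for i in range(n):
        for j in range(m):
            d = abs(p[i] - q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i][j - 1], ca[i - 1][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[-1][-1]


# Toy sequences (placeholders): the variant has lost one CCT codon.
reference = "ACCTGCCT"
variant = "ACTTGCCT"
print(len(FEATURES))
print(kmer_counts(reference)["CCT"], kmer_counts(variant)["CCT"])
print(discrete_frechet(cumulative_curve(reference, "CCT"),
                       cumulative_curve(variant, "CCT")))
```

Repeating the last call for every name in `FEATURES` would give one distance per feature, which is one plausible reading of "decomposed into 84 features".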

The typical model

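    # Assumed imports (not shown in the original listing)
    from tensorflow import keras
    from tensorflow.keras.layers import LSTM, Bidirectional

    def myModel(train_x):
        # Three stacked bidirectional LSTM layers over 20-step input windows
        mymodel = keras.Sequential()
        mymodel.add(Bidirectional(LSTM(20, return_sequences=True), input_shape=(20, train_x.shape[-1])))
        mymodel.add(Bidirectional(LSTM(40, return_sequences=True)))
        # The last layer returns only its final time step
        mymodel.add(Bidirectional(LSTM(20, return_sequences=False)))
        mymodel.compile(loss='mean_squared_error', optimizer='adam')
        return mymodel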
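
The model above expects input windows of 20 time steps, shaped `(samples, 20, n_features)`. As a rough sketch of how a time-series trajectory might be cut into such windows, the helper below slices a `(time, features)` array into overlapping windows and takes the value right after each window as its target. The name `make_windows`, the next-step target, and the toy array are assumptions for illustration. The listing itself does not show how the training data were prepared.

```python
import numpy as np

WINDOW = 20  # matches input_shape=(20, n_features) in the model listing


def make_windows(series, window=WINDOW):
    """Slice a (time, features) array into overlapping windows and next-step targets."""
    series = np.asarray(series, dtype="float32")
    if series.ndim == 1:
        series = series[:, None]  # treat a single trajectory as one feature
    n = len(series) - window
    if n <= 0:
        raise ValueError("series must be longer than the window")
    # x[i] covers time steps i .. i+window-1; y[i] is the step right after it.
    x = np.stack([series[i:i + window] for i in range(n)])
    y = series[window:]
    return x, y


# Toy trajectory: 30 time points, 3 features (placeholder values, not real data).
toy = np.arange(90, dtype="float32").reshape(30, 3)
train_x, train_y = make_windows(toy)
print(train_x.shape, train_y.shape)  # (10, 20, 3) (10, 3)
```

As listed, the final bidirectional layer emits 40 values per window (20 units per direction, concatenated by Keras's default merge mode). Fitting against targets shaped like `train_y` would therefore presumably need an additional output layer, which the listing does not show.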


    Anyou Wang. Evolutionary trajectory of SARS-CoV-2 genome.
    Anyou Wang. Evolutionary trajectory and origin of SARS-CoV-2 variant.
    Anyou Wang, Rong Hai, Paul J Rider and Qianchuan He. Noncoding RNAs and deep learning neural network discriminate multi-cancer types. Cancers 2022, 14(2), 352.
    Wang, A. & Hai, R. FINET: Fast Inferring NETwork. BMC Res Notes 13, 521 (2020).