Visual and Quantitative Analyses of Virus Genomic Sequences using a Metric-based Algorithm
Journal article
Published version
Permanent link
https://hdl.handle.net/11250/3054444
Date issued
2022
Original version
WSEAS Transactions on Circuits and Systems. 2022, 21, 323-348. 10.37394/23201.2022.21.35
Abstract
This work studies virus RNAs using a novel accelerated algorithm that searches sequences for repetitive genomic fragments of any length, using the Hamming distance between the binary-expressed characters of an RNA and a query pattern. Primary attention is paid to building and analyzing 1-D distributions (walks) of atg-patterns, the codon-starting triplets in genomes. These triplets compose a distributed set called the word scheme of an RNA. A complete genome map is built by plotting these atg-walks, the trajectories of the individual nucleotides (a, c, g, and t symbols), and the lines designating the genomic words. The map can additionally be annotated with gene designations, making this tool suitable for multi-scale genomic analyses. The visual examination of atg-walks is followed by the calculation of statistical parameters of the genomic sequences, including estimates of the walk-geometry deviation of RNAs and the fractal properties of word-length distributions. The approach is applied to the SARS-CoV-2, MERS-CoV, Dengue, and Ebola viruses, whose complete genomic sequences are taken from GenBank and GISAID. The walks were found to be relatively stable for the SARS-CoV-2 and MERS-CoV viruses, whereas the Dengue and Ebola distributions showed increased deviation in their geometrical and fractal characteristics. The developed approach can be useful in further studies of virus mutations and in building their phylogenetic trees.
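The core idea of the abstract, matching a query pattern against a binary-expressed sequence by Hamming distance, can be illustrated with a minimal Python sketch. It is not the authors' accelerated algorithm: the 2-bit nucleotide encoding, the function names (`to_bits`, `hamming`, `find_pattern`, `word_lengths`), the toy sequence, and the reading of a "word length" as the distance between successive atg positions are all assumptions made for illustration.

```python
# Illustrative sketch only; the encoding below is an assumption,
# not the binary representation used in the paper.
ENCODING = {"a": (0, 0), "c": (0, 1), "g": (1, 0), "t": (1, 1)}


def to_bits(seq):
    """Express a nucleotide string as a flat list of bits (2 bits per symbol)."""
    bits = []
    for ch in seq.lower().replace("u", "t"):  # treat RNA 'u' as 't'
        bits.extend(ENCODING[ch])
    return bits


def hamming(x, y):
    """Number of positions at which two equal-length bit lists differ."""
    if len(x) != len(y):
        raise ValueError("inputs must have equal length")
    return sum(a != b for a, b in zip(x, y))


def find_pattern(seq, pattern="atg", max_distance=0):
    """Return start positions where the pattern matches within max_distance bits.

    This is a naive sliding-window scan, not an accelerated search.
    """
    seq_bits = to_bits(seq)
    pat_bits = to_bits(pattern)
    width = len(pat_bits)
    hits = []
    for i in range(len(seq) - len(pattern) + 1):
        window = seq_bits[2 * i : 2 * i + width]
        if hamming(window, pat_bits) <= max_distance:
            hits.append(i)
    return hits


def word_lengths(positions):
    """Distances between successive pattern positions (assumed 'word lengths')."""
    return [b - a for a, b in zip(positions, positions[1:])]


genome = "ccatggtatgaaatgc"  # toy sequence, not real viral data
positions = find_pattern(genome)
print(positions)                # [2, 7, 12]
print(word_lengths(positions))  # [5, 5]
```

Plotting the returned positions against their occurrence index gives a simple 1-D walk of atg-triplets under these assumptions; raising `max_distance` above zero also admits near matches.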