A Taxonomy for RNA Motif Discovery

Document Type : Bioinformatics-Naghibzadeh


1 Department of Bioinformatics, University of Zabol, Zabol, Iran

2 Department of Computer Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran

3 Department of Bioinformatics, University of Zabol, Zabol, Iran; Department of Animal Science, Faculty of Agriculture, Tarbiat Modares University, Tehran, Iran


Motifs have a critical impact on the behavioral and structural characteristics of RNA sequences. Understanding and predicting the functions and interactions of an RNA sequence requires discovering and identifying its motifs. Because motif discovery is so important in bioinformatics, a significant corpus of techniques and algorithms has been proposed, each with its own advantages and limitations and hence each suited to specific applications. To understand these techniques and algorithms, compare them, and choose the most suitable one for a particular application scenario, it is crucial to have a clear understanding of the key aspects that characterize them. The lack of a framework for studying these aspects is a serious gap in the literature. In this paper, we propose a taxonomy and a framework to address this issue. We define the concept of a motif discovery process and three aspects that characterize such a process: motif type, discovery technique, and application. We then survey the literature and classify the existing approaches along these aspects. This gives the reader a broader view and a more precise understanding of what these techniques and algorithms do, how they do it, and which application each of them suits best. Finally, we present the gaps and challenges that we foresee as future directions for the area.

