Alignment-free (AF) sequence comparison is attracting persistent interest, driven by data-intensive applications. Hence, many AF procedures have been proposed in recent years, but the lack of a clearly defined benchmarking consensus hampers the assessment of their performance. Here, we present a community resource ( ) to establish standards for comparing alignment-free approaches across different areas of sequence-based research. We characterize 74 AF methods available in 24 software tools for five research applications, namely protein sequence classification, gene tree inference, regulatory element detection, genome-based phylogenetic inference, and reconstruction of species trees under horizontal gene transfer and recombination events. The interactive web service allows researchers to explore the performance of alignment-free tools relevant to their data types and analytical goals. It also allows method developers to assess their own algorithms and compare them with current state-of-the-art tools, accelerating the development of new, more accurate AF solutions.
Comparative analysis of DNA and amino acid sequences is of fundamental importance in biological research, particularly in molecular biology and genomics. It is the first and key step in molecular evolutionary analysis, gene function and regulatory region prediction, sequence assembly, homology searching, molecular structure prediction, gene discovery, and protein structure-function relationship analysis. Traditionally, sequence comparison was based on pairwise or multiple sequence alignment (MSA). Software tools for sequence alignment, such as BLAST and CLUSTAL, are the most widely used bioinformatics methods. Although alignment-based approaches generally remain the reference for sequence comparison, MSA-based methods do not scale to the very large data sets that are available today. Additionally, alignment-based techniques have been shown to be inaccurate in scenarios of low sequence identity (e.g., gene regulatory sequences and distantly related protein homologs). Moreover, alignment algorithms assume that the linear order of homology is preserved within the compared sequences, so these algorithms cannot be directly applied in the presence of sequence rearrangements (e.g., recombination and protein domain swapping) or horizontal transfer in cases where large-scale sequence data sets are processed, e.g., for whole-genome phylogenetics. In addition, aligning two DNA sequences that are millions of nucleotides long is infeasible in practice.
Therefore, as an alternative to sequence alignment, many so-called alignment-free (AF) approaches to sequence analysis have been developed, with the earliest works dating back to the mid-1970s, although the concept of alignment-independent sequence comparison gained increased attention only at the beginning of the 2000s. A wide array of AF approaches to sequence comparison now exists. Most of these methods are based on word statistics or word comparison, and their scalability allows them to be applied to much larger data sets than conventional MSA-based methods.
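Many AF methods rely on word (k-mer) statistics rather than on an alignment. As a rough illustration of that idea, and not of any particular tool in the benchmark, the sketch below represents each sequence as a profile of overlapping k-mer frequencies and compares two profiles with the Euclidean distance, one of the simplest word-statistics measures. The function names (`kmer_counts`, `kmer_profile`, `kmer_distance`) and the default word length `k=3` are hypothetical choices made for this example.

```python
import math
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping words of length k (k-mers) in a sequence."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_profile(seq: str, k: int) -> dict:
    """Turn raw k-mer counts into relative frequencies (sum to 1)."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:  # sequence shorter than k: no words at all
        return {}
    return {word: c / total for word, c in counts.items()}


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Euclidean distance between the k-mer frequency profiles of a and b.

    No alignment is computed: each sequence is reduced to its word
    statistics, so the cost is linear in sequence length. Works for any
    alphabet (DNA or protein) because only observed words are compared.
    """
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    words = pa.keys() | pb.keys()  # words missing from one profile count as 0
    return math.sqrt(sum((pa.get(w, 0.0) - pb.get(w, 0.0)) ** 2 for w in words))


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    print(kmer_distance(ref, "ACGTACGTAA"))  # similar sequence: small distance
    print(kmer_distance(ref, "TTTTTTTTTT"))  # unrelated sequence: large distance
```

Identical sequences get a distance of 0.0, and the distance grows as the word content diverges. Because the profiles ignore word order, rearranged sequences can still score as similar, which hints at why AF measures tolerate rearrangements that break alignment. Real AF tools typically use more refined statistics than this plain Euclidean measure.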