Exon Targeted Retrieval and Classification Toolbox (ExTRaCT): a gene search pipeline to find APOBEC3 Z-domains in novel bat genomes
Delamonica, B.; Bat1K 21-Families Group; Larijani, M.; MacCarthy, T.; Davalos, L. M.
Motivation: Several computational gene search tools exist to identify and annotate genes in the ever-growing body of newly sequenced genomes from different species. Many annotation tools, however, fall short when the target species diverges from well-studied model organisms, and when searching for short genes with multiple copies.
Results: We have developed the Exon Targeted Retrieval and Classification Toolbox, ExTRaCT, an automated pipeline to identify any gene exon with conserved structure in genome assemblies of novel species. In the use cases presented here, we applied our search tool to 102 bat genomes to find APOBEC3 gene family members. We show that our homolog search algorithm is efficient (average run time of 5 hours for over 100 genomes), works well with reference sequences distantly related to the target (1 misclassification out of 498, 0 false positives and 2 false negatives), and is easy to use. As genomic sequencing becomes faster and more accessible, ExTRaCT has downstream applications in phylogenetic, biochemical and genomic studies. It is a simple computational tool that provides a solution to target gene identification, requiring neither whole-genome-assembly annotations nor prior knowledge of closely related species.
Availability: https://doi.org/10.5281/zenodo.15769018
Contact: Brenda.delamonica@stonybrook.edu
Supplementary information: Supplementary data are available at Bioinformatics online.