
Systematic functional annotation workflow for insects

Bono, H.; Sakamoto, T.; Kasukawa, T.; Tabunoki, H.

2022-06-11 bioinformatics
10.1101/2022.05.12.490705 bioRxiv

Next-generation sequencing has revolutionized entomological research, making it possible to analyze the genomes and transcriptomes of non-model insects. However, use of this technology is often limited to obtaining nucleotide sequences of target or related genes, and many of the acquired sequences remain unused because other available sequences are not sufficiently annotated. To address this issue, we have developed a functional annotation workflow for transcriptome-sequenced insects that determines transcript descriptions and represents a significant improvement over the previous method (functional annotation pipeline for insects). The workflow attempts to annotate not only the protein sequences obtained from transcriptome analysis but also the ncRNA sequences obtained at the same time. In addition, it integrates expression-level information from transcriptome sequencing as functional annotation information. Using the workflow, we performed functional annotation on sequences obtained from transcriptome sequencing of the stick insect (Entoria okinawaensis) and the silkworm (Bombyx mori), yielding richer functional annotation information than that obtained in our previous study. The improved workflow allows more comprehensive exploitation of transcriptome data and is applicable to other insects because it has been openly developed on GitHub.

Simple Summary

For genome editing, the function of every gene encoded in the genome should be studied. Genome editing technology can speed up insect research on the functional analysis of genes. Although the genome sequence of an organism can now be completed, our knowledge of gene function remains incomplete. Functional information has so far been annotated based solely on the results of previous biological research. However, this information will be important in determining the target genes for genome editing. In particular, it is very important that this information be in machine-readable form, because it is mainly parsed by computer programs to understand biological systems. In this paper, we describe a workflow-based method for annotating gene functions in insects that makes use of transcribed sequence information as well as reference genome and protein sequence databases. Using the developed workflow, we annotated functional information for the Japanese stick insect and the silkworm, including gene expression as well as sequence analysis. The functional annotation information obtained by the workflow will greatly expand the possibilities of entomological research using genome editing.

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. Gigabyte; 60 papers in training set; Top 0.1%; 15.0%
2. PLOS ONE; 4510 papers in training set; Top 17%; 10.6%
3. BMC Bioinformatics; 383 papers in training set; Top 2%; 6.4%
4. PeerJ; 261 papers in training set; Top 1.0%; 4.9%
5. Genes; 126 papers in training set; Top 0.2%; 4.0%
6. BMC Genomics; 328 papers in training set; Top 0.6%; 4.0%
7. Journal of Bioinformatics and Systems Biology; 14 papers in training set; Top 0.1%; 4.0%
8. Scientific Reports; 3102 papers in training set; Top 35%; 3.7%

50% of probability mass above

9. Genomics, Proteomics & Bioinformatics; 171 papers in training set; Top 2%; 3.7%
10. Insects; 36 papers in training set; Top 0.4%; 3.3%
11. Frontiers in Genetics; 197 papers in training set; Top 2%; 3.1%
12. PLOS Computational Biology; 1633 papers in training set; Top 14%; 1.9%
13. Bioinformatics; 1061 papers in training set; Top 7%; 1.7%
14. GigaScience; 172 papers in training set; Top 2%; 1.5%
15. Database; 51 papers in training set; Top 0.6%; 1.2%
16. G3 Genes|Genomes|Genetics; 351 papers in training set; Top 2%; 1.0%
17. Briefings in Bioinformatics; 326 papers in training set; Top 6%; 0.9%
18. Journal of Computational Biology; 37 papers in training set; Top 0.4%; 0.9%
19. Journal of Genetics and Genomics; 36 papers in training set; Top 2%; 0.8%
20. Plant Physiology; 217 papers in training set; Top 3%; 0.8%
21. IEEE Journal of Biomedical and Health Informatics; 34 papers in training set; Top 2%; 0.7%
22. Computational and Structural Biotechnology Journal; 216 papers in training set; Top 10%; 0.7%
23. Frontiers in Bioinformatics; 45 papers in training set; Top 1%; 0.7%
24. The FEBS Journal; 78 papers in training set; Top 1%; 0.7%
25. Pest Management Science; 32 papers in training set; Top 1%; 0.7%
26. Plant Direct; 81 papers in training set; Top 2%; 0.7%
27. PLOS Neglected Tropical Diseases; 378 papers in training set; Top 6%; 0.5%
28. BioData Mining; 15 papers in training set; Top 1%; 0.5%
29. Bioengineering; 24 papers in training set; Top 2%; 0.5%
30. Frontiers in Bioengineering and Biotechnology; 88 papers in training set; Top 4%; 0.5%