AIChatBio: An Artificial Intelligence Chatbot Model for Biological Knowledge Retrieval and Biomacromolecule Design

Liu, E.; Liu, C.-Y.

2025-09-17 bioinformatics
10.1101/2025.09.11.675485 bioRxiv
Conversational agents for bioinformatics data analysis and interpretation remain largely inaccessible to the broader biological research community. This gap is especially pronounced in the current Generative AI era, which demands a paradigm shift in how researchers interact with computational tools. There is a pressing need to bridge well-established biological infrastructures and databases with the capabilities of Generative AI to democratize access to bioinformatics insights. In this study, we present an integrated framework that connects the bioinformatics resources of the National Center for Biotechnology Information (NCBI) with Generative AI through AIChatBio, a novel Artificial Intelligence Chatbot Model for Biological Knowledge Retrieval and Biomacromolecule Design. This operational model positions Generative AI as an intelligent information hub. User interactions with the chatbot are enriched by real-time data retrieval from the web portals of the biological databases hosted at NCBI. User requests are then translated into structured queries directed at NCBI web applications and bioinformatics analysis tools, which perform tasks such as sequence alignment and primer design. The outputs generated by these tools are interpreted by the chatbot, allowing users to gain meaningful insights without requiring deep technical expertise in bioinformatics. To demonstrate the feasibility of this approach, we developed a prototype implementation that integrates PCR primer design using Primer-BLAST [1], literature interpretation via PubMed for general topics, and LitVar2 for SNP-associated topics [23]. The system was built using TypeScript and the ChatGPT API in combination with the bioinformatics web applications from NCBI. Its source code is publicly available on GitHub, and the Chrome extension is available from the Chrome Web Store.
Our work highlights the potential of Generative AI to transform biological data analysis workflows, making them more intuitive, accessible, and scalable for researchers across disciplines.
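
The step the abstract describes, turning a user request into a structured query against an NCBI resource, can be illustrated with a minimal TypeScript sketch. This is not the authors' implementation: the abstract says the prototype talks to NCBI web applications, whereas this sketch assumes the public NCBI E-utilities ESearch endpoint for PubMed. The helper names `buildPubMedSearchUrl` and `extractPmids` and the default of five results are hypothetical, chosen only for illustration.

```typescript
// Illustrative sketch only; assumes the NCBI E-utilities ESearch endpoint,
// which may differ from the web applications the AIChatBio prototype uses.
const ESEARCH_BASE =
  "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi";

// Hypothetical helper: build a PubMed search URL from a free-text term.
function buildPubMedSearchUrl(term: string, maxResults: number = 5): string {
  const params: Array<[string, string]> = [
    ["db", "pubmed"],
    ["term", term.trim()],
    ["retmax", String(maxResults)],
    ["retmode", "json"],
  ];
  const query = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `${ESEARCH_BASE}?${query}`;
}

// Shape of the part of the ESearch JSON response used here.
interface ESearchResponse {
  esearchresult?: { idlist?: string[] };
}

// Hypothetical helper: pull the PubMed IDs out of a parsed ESearch response,
// returning an empty list when the expected fields are missing.
function extractPmids(response: ESearchResponse): string[] {
  const ids = response.esearchresult ? response.esearchresult.idlist : undefined;
  return ids ? ids.slice() : [];
}
```

In a chatbot pipeline of the kind described, the URL would be fetched, the returned IDs resolved to article records, and those records passed to the language model for interpretation. Those steps are left out here because the abstract does not specify how they are performed.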

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1 | Bioinformatics Advances | 184 papers in training set | Top 0.1% | 22.6%
2 | Bioinformatics | 1061 papers in training set | Top 1% | 19.5%
3 | BMC Bioinformatics | 383 papers in training set | Top 0.8% | 10.5%
50% of probability mass above
4 | PLOS ONE | 4510 papers in training set | Top 28% | 6.3%
5 | NAR Genomics and Bioinformatics | 214 papers in training set | Top 0.3% | 4.9%
6 | PLOS Computational Biology | 1633 papers in training set | Top 8% | 4.2%
7 | GigaScience | 172 papers in training set | Top 0.5% | 3.6%
8 | Computational and Structural Biotechnology Journal | 216 papers in training set | Top 2% | 2.9%
9 | Nucleic Acids Research | 1128 papers in training set | Top 8% | 2.4%
10 | Journal of Molecular Biology | 217 papers in training set | Top 1.0% | 2.4%
11 | Briefings in Bioinformatics | 326 papers in training set | Top 5% | 1.3%
12 | IEEE/ACM Transactions on Computational Biology and Bioinformatics | 32 papers in training set | Top 0.3% | 1.2%
13 | iScience | 1063 papers in training set | Top 23% | 1.1%
14 | Scientific Reports | 3102 papers in training set | Top 69% | 1.0%
15 | Frontiers in Genetics | 197 papers in training set | Top 8% | 1.0%
16 | Heliyon | 146 papers in training set | Top 5% | 0.9%
17 | Frontiers in Bioinformatics | 45 papers in training set | Top 0.9% | 0.7%
18 | Advanced Science | 249 papers in training set | Top 21% | 0.6%
19 | Genome Research | 409 papers in training set | Top 5% | 0.6%
20 | Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 7% | 0.6%
21 | Frontiers in Molecular Biosciences | 100 papers in training set | Top 6% | 0.6%