Large Language Models for Accessible Reporting of Bioinformatics Analyses in Interdisciplinary Contexts

Yu, L.; Kim, D.; Cao, Y.; Shu, M. W. S.; Shen, M.; Liang, X.; Gu, J.; Jayakumar, R.; Ding, W.; Yang, F.; Zhang, X.; Kim, J.; Yang, P.; Yang, J. Y. H.

2025-11-11 bioinformatics

10.1101/2025.11.09.687479 bioRxiv

Health and life scientists routinely collaborate with quantitative scientists for data analysis and interpretation, yet miscommunication often obscures the interpretation of complex results. Large Language Models (LLMs) offer a promising way to bridge this gap, but their cross-discipline interpretative skill remains limited on real-world bioinformatics analyses. We therefore benchmarked four state-of-the-art LLMs (GPT-4o, o1, Claude 3.7 Sonnet, and Gemini 2.0 Flash) using both automated and human evaluation frameworks to ensure holistic evaluation. Automated assessment employed multiple-choice questions designed using Bloom's taxonomy to assess multiple levels of understanding, while human evaluation asked scientists to score summaries for factual consistency, lack of harmfulness, comprehensiveness, and coherence. All four models generally produced readable and largely safe summaries, confirming their value for first-pass translation of technical analyses. However, they frequently misinterpreted visualisations, produced verbose summaries, and rarely offered novel insights beyond what was already contained in the analytics. Our findings suggest that LLMs are best suited for easing interdisciplinary communication rather than replacing domain expertise, and that human oversight remains essential to guarantee accuracy, interpretative depth, and the generation of genuinely novel scientific insights.