VaxLLM: Leveraging Fine-tuned Large Language Model for automated annotation of Brucella Vaccines

Li, X.; Zheng, Y.; Hu, J.; Zheng, J.; Wang, Z.; He, Y.

2024-11-26 bioinformatics
10.1101/2024.11.25.625209 bioRxiv
Background: Vaccines play a vital role in enhancing immune defense and protecting hosts against a wide range of diseases. However, vaccine annotation remains a labor-intensive task due to the ever-increasing volume of scientific literature. This study explores the application of Large Language Models (LLMs) to automate the classification and annotation of scientific literature on vaccines, exemplified by Brucella vaccines.

Results: We developed a pipeline that automatically classifies and annotates Brucella vaccine-related articles using their titles and abstracts. The pipeline includes VaxLLM (Vaccine Large Language Model), a fine-tuned Llama 3 model. VaxLLM classifies articles by identifying the presence of vaccine formulations and extracts key information about each vaccine, including the vaccine antigen, vaccine formulation, vaccine platform, host species used as animal models, and the experiments used to investigate the vaccine. The model demonstrated high performance in classification (Precision: 0.90, Recall: 1.0, F1-score: 0.95) and annotation accuracy (97.9%), significantly outperforming the corresponding non-fine-tuned Llama 3 model. The outputs from VaxLLM are presented in a structured format to facilitate integration into databases such as the VIOLIN vaccine knowledgebase. To further enhance the accuracy and depth of the Brucella vaccine annotations, the pipeline also incorporates PubTator, enabling cross-comparison with VaxLLM annotations and supporting downstream analyses such as gene enrichment.

Conclusion: VaxLLM rapidly and accurately extracts detailed, itemized vaccine information from publications, significantly outperforming traditional annotation methods in both speed and precision. VaxLLM also shows great potential for automating knowledge extraction in vaccine research.

Availability: All data are available at https://github.com/xingxianli/VaxLLM, and the model has also been uploaded to Hugging Face (https://huggingface.co/Xingxian123/VaxLLM).
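The reported F1-score can be sanity-checked from the reported precision and recall, since F1 is their harmonic mean. A minimal sketch (the `f1_score` helper below is illustrative, not from the paper):

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values reported in the abstract.
f1 = f1_score(precision=0.90, recall=1.0)
print(round(f1, 2))  # 0.95, matching the reported F1-score
```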

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | GigaScience | 172 | Top 0.1% | 22.5%
2 | Database | 51 | Top 0.1% | 10.1%
3 | BMC Bioinformatics | 383 | Top 2% | 6.4%
4 | Bioinformatics | 1061 | Top 4% | 6.3%
5 | Briefings in Bioinformatics | 326 | Top 1% | 4.3%
6 | PLOS ONE | 4510 | Top 34% | 4.3%
(50% of probability mass above)
7 | Nucleic Acids Research | 1128 | Top 5% | 4.0%
8 | Scientific Data | 174 | Top 0.5% | 3.6%
9 | Computers in Biology and Medicine | 120 | Top 1% | 2.6%
10 | Journal of the American Medical Informatics Association | 61 | Top 1% | 2.4%
11 | Scientific Reports | 3102 | Top 50% | 2.1%
12 | Computational and Structural Biotechnology Journal | 216 | Top 3% | 2.1%
13 | Gigabyte | 60 | Top 0.7% | 1.7%
14 | Journal of Medical Internet Research | 85 | Top 3% | 1.7%
15 | PeerJ | 261 | Top 7% | 1.7%
16 | Research Synthesis Methods | 20 | Top 0.1% | 1.7%
17 | BioData Mining | 15 | Top 0.5% | 1.2%
18 | Bioinformatics Advances | 184 | Top 4% | 0.9%
19 | F1000Research | 79 | Top 5% | 0.7%
20 | Genomics, Proteomics & Bioinformatics | 171 | Top 6% | 0.7%
21 | Proceedings of the National Academy of Sciences | 2130 | Top 44% | 0.7%
22 | JAMIA Open | 37 | Top 2% | 0.7%
23 | JMIRx Med | 31 | Top 2% | 0.7%
24 | PLOS Digital Health | 91 | Top 3% | 0.7%
25 | Bioengineering | 24 | Top 2% | 0.7%
26 | Journal of Translational Medicine | 46 | Top 3% | 0.6%
27 | Journal of Public Health | 23 | Top 1% | 0.6%
28 | Vaccines | 196 | Top 3% | 0.6%
29 | BMC Biology | 248 | Top 6% | 0.6%
30 | PLOS Computational Biology | 1633 | Top 27% | 0.6%
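The "top 6 journals account for 50% of the predicted probability mass" claim can be reproduced by walking the ranked probabilities until their running sum reaches 50%. A minimal sketch, using the probabilities listed above:

```python
# Predicted probabilities (in %), in rank order, copied from the table.
probs = [22.5, 10.1, 6.4, 6.3, 4.3, 4.3, 4.0, 3.6, 2.6, 2.4,
         2.1, 2.1, 1.7, 1.7, 1.7, 1.7, 1.2, 0.9, 0.7, 0.7,
         0.7, 0.7, 0.7, 0.7, 0.7, 0.6, 0.6, 0.6, 0.6, 0.6]

# Find the smallest rank whose cumulative probability reaches 50%.
cumulative = 0.0
for rank, p in enumerate(probs, start=1):
    cumulative += p
    if cumulative >= 50.0:
        break

print(rank, round(cumulative, 1))  # 6 journals cover 53.9% of the mass
```

Note that the list is truncated at rank 30, so the probabilities shown sum to well under 100%; the remaining mass is spread over journals below the cutoff.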