GeneReL: A Large Language Model-Powered Platform for Gene Regulatory Relationship Extraction with Community Curation

Park, J.-S.; Ha, S.; Lee, Y.; Kang, Y. J.

2026-02-11 · bioinformatics
bioRxiv preprint, DOI: 10.64898/2026.02.10.705020

Motivation: Gene regulatory networks provide fundamental insights into plant biology, yet extracting structured interaction data from scientific literature remains a significant bottleneck. Traditional manual curation cannot scale to meet the demands of modern research, while automated text mining approaches struggle with the complexity of gene nomenclature and relationship classification. Large language models offer promising capabilities for information extraction, but integrated platforms combining LLM extraction with community validation for plant regulatory databases remain scarce.

Results: We developed GeneReL, an integrated platform combining LLM-based extraction with community-driven curation for gene regulatory networks in Arabidopsis thaliana. The system employs a tiered pipeline using Claude Haiku 4.5 for screening, Claude Sonnet 4 for extraction, and Claude Opus 4 for verification, along with a novel five-step gene normalization pipeline incorporating paper-text search and LLM-based disambiguation with UniProt annotations. The database contains 13,710 curated interactions across 51 relationship types, with 90.2% classified as high confidence based on linguistic certainty markers in source text. Comparison with IntAct reveals 86.8% of interactions are unique to our literature-derived database, demonstrating complementary coverage to existing resources. The web platform provides card-based browsing with voting capabilities, interactive network visualization using Cytoscape.js with locus-ID-based node consolidation, and administrative interfaces for curator review of ambiguous gene mappings.

Availability and Implementation: GeneReL is freely accessible at https://generel.newgenes.me.

Contact: kangyangjae@gnu.ac.kr
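The tiered screen/extract/verify design described in the abstract can be sketched as follows. This is a minimal structural illustration, not the authors' implementation: the stub functions stand in for the Claude Haiku, Sonnet, and Opus calls, and the example interaction (PIF4 activating YUC8) is purely illustrative, not drawn from the GeneReL database.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Interaction:
    regulator: str
    target: str
    relation: str
    confidence: str  # e.g. "high" when source text uses definite language


def tiered_extract(text: str,
                   screen: Callable[[str], bool],
                   extract: Callable[[str], List[Interaction]],
                   verify: Callable[[List[Interaction]], List[Interaction]],
                   ) -> List[Interaction]:
    """Three-tier pipeline: cheap relevance screen, then extraction,
    then verification of the extracted candidates."""
    if not screen(text):            # tier 1 (cheap model): any regulation at all?
        return []
    candidates = extract(text)      # tier 2 (mid-tier model): pull candidates
    return verify(candidates)       # tier 3 (strong model): keep verified hits


# Stub "models" for illustration only; the real system would call an LLM here.
screen = lambda t: "activates" in t or "represses" in t
extract = lambda t: ([Interaction("PIF4", "YUC8", "activates", "high")]
                     if "activates" in t else [])
verify = lambda xs: [x for x in xs if x.confidence == "high"]

hits = tiered_extract("PIF4 directly activates YUC8 expression.",
                      screen, extract, verify)
```

The point of the tiering is cost control: most sentences fail the cheap screen, so the expensive verification model only ever sees a small, pre-filtered candidate set.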

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Specificity | Probability
1 | BMC Bioinformatics | 383 | Top 0.3% | 18.3%
2 | Bioinformatics | 1061 | Top 2% | 14.1%
3 | Plant Direct | 81 | Top 0.3% | 6.3%
4 | PLOS ONE | 4510 | Top 29% | 6.3%
5 | Plant Communications | 35 | Top 0.2% | 4.8%
6 | NAR Genomics and Bioinformatics | 214 | Top 0.6% | 3.9%
(50% of probability mass above this line)
7 | Plant Physiology | 217 | Top 1% | 3.5%
8 | Bioinformatics Advances | 184 | Top 1% | 3.5%
9 | Computational and Structural Biotechnology Journal | 216 | Top 2% | 3.5%
10 | Database | 51 | Top 0.2% | 3.2%
11 | Frontiers in Plant Science | 240 | Top 3% | 2.7%
12 | Nucleic Acids Research | 1128 | Top 8% | 2.6%
13 | The Plant Journal | 197 | Top 2% | 1.9%
14 | Frontiers in Genetics | 197 | Top 4% | 1.9%
15 | Molecular Plant | 36 | Top 0.9% | 1.5%
16 | Briefings in Bioinformatics | 326 | Top 5% | 1.3%
17 | Horticulture Research | 43 | Top 1% | 1.2%
18 | PLOS Computational Biology | 1633 | Top 20% | 1.2%
19 | Plant Biotechnology Journal | 56 | Top 1% | 0.9%
20 | Scientific Reports | 3102 | Top 71% | 0.9%
21 | Genomics, Proteomics & Bioinformatics | 171 | Top 6% | 0.8%
22 | GigaScience | 172 | Top 3% | 0.8%
23 | Applications in Plant Sciences | 21 | Top 0.3% | 0.7%
24 | The Plant Cell | 141 | Top 2% | 0.7%
25 | Heliyon | 146 | Top 7% | 0.7%
26 | GENETICS | 189 | Top 2% | 0.6%
27 | Genes | 126 | Top 4% | 0.6%
28 | Journal of Biomedical Informatics | 45 | Top 2% | 0.6%
29 | BMC Genomics | 328 | Top 7% | 0.6%
30 | in silico Plants | 24 | Top 0.4% | 0.6%