
IEKB: a comprehensive knowledge base for inner ear genetics integrating curated associations, cochlear interactions, Bayesian candidate prioritisation, explainable dark-gene support relations, and a scientific entity network

Wang, H.; Chen, W.; Ning, H.; Cai, Y.; Xu, Y.; Hou, X.; Pang, L.; Luo, Z.; Tian, C.

2026-04-09 bioinformatics
10.64898/2026.04.06.716823 bioRxiv

Inner-ear genetics has expanded rapidly, yet the supporting evidence remains dispersed across a vast literature and across resources that typically emphasise loci, variants, or expression data rather than integrated biological interpretation. Here we present the Inner Ear Knowledge Base (IEKB; https://earkb.org), an open database that unifies curated associations, cochlear interaction evidence, candidate prioritisation, explainable support relations, and network exploration for inner-ear research. IEKB was built with an automated agent-assisted curation workflow that combines schema-constrained literature extraction, continuous human monitoring, and final expert review by inner-ear genetics researchers. By systematically analysing 250,696 PubMed-indexed records retrieved across 16,563 screened genes, IEKB curates 6,051 gene-phenotype-disease associations from 2,494 genes across 43 phenotype categories and 4,102 cochlear gene-gene interactions with pathway, cell-type, and experimental context. IEKB further includes a Bayesian "dark matter" module that prioritises 243,071 candidate gene-phenotype associations for 13,229 genes across all 43 phenotypes (global AUC-ROC = 0.8603; global AUC-PR = 0.1674), together with a supervised dark-relation layer that ranks phenotype-specific known-gene support for each candidate and a multi-entity scientific network containing nearly 4,000 entities, 28,616 deterministic edges, and 83,712 literature-derived relational links. The web resource supports interactive search, multi-parameter filtering, gene-detail pages, bibliometric exploration, domain-specific enrichment against IEKB phenotype and disease gene sets, network visualisation, bulk download in CSV, JSON, SQLite, and XLSX formats, and natural-language evidence-grounded question answering through a companion conversational interface (IEKB QA). 
To our knowledge, IEKB is the first openly accessible inner-ear resource that integrates curated associations, cochlear interactions, probabilistic candidate prioritisation, auditable known-gene support relations for novel candidates, and a multi-entity scientific network within a single database. All data are released without registration under the CC BY 4.0 license.

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Standing | Predicted probability
1 | Nucleic Acids Research | 1128 | Top 0.9% | 14.4%
2 | Genome Medicine | 154 | Top 0.5% | 9.2%
3 | The American Journal of Human Genetics | 206 | Top 0.6% | 7.2%
4 | Bioinformatics | 1061 | Top 4% | 6.4%
5 | Nature Communications | 4913 | Top 32% | 4.9%
6 | Bioinformatics Advances | 184 | Top 0.9% | 4.3%
7 | European Journal of Human Genetics | 49 | Top 0.3% | 3.6%
8 | Nature Genetics | 240 | Top 2% | 3.6%
50% of probability mass above
9 | PLOS ONE | 4510 | Top 39% | 3.6%
10 | Genome Research | 409 | Top 1% | 2.6%
11 | Nature Methods | 336 | Top 4% | 1.9%
12 | Scientific Reports | 3102 | Top 53% | 1.9%
13 | Cell Genomics | 162 | Top 3% | 1.7%
14 | Database | 51 | Top 0.5% | 1.5%
15 | Human Genetics | 25 | Top 0.2% | 1.5%
16 | Nature Medicine | 117 | Top 3% | 1.5%
17 | Genetics in Medicine | 69 | Top 0.7% | 1.3%
18 | PLOS Genetics | 756 | Top 10% | 1.3%
19 | BMC Medical Genomics | 36 | Top 0.7% | 1.2%
20 | Genome Biology | 555 | Top 5% | 1.2%
21 | Disease Models & Mechanisms | 119 | Top 2% | 1.0%
22 | npj Genomic Medicine | 33 | Top 0.8% | 0.8%
23 | PLOS Computational Biology | 1633 | Top 23% | 0.8%
24 | Proceedings of the National Academy of Sciences | 2130 | Top 44% | 0.7%
25 | Frontiers in Genetics | 197 | Top 10% | 0.7%
26 | Cell Systems | 167 | Top 12% | 0.7%
27 | Nature | 575 | Top 15% | 0.7%
28 | Genetics | 225 | Top 4% | 0.7%
29 | BMC Bioinformatics | 383 | Top 7% | 0.7%
30 | BioData Mining | 15 | Top 1% | 0.6%
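
The 50% cutoff can be checked against the listed probabilities. The following is a minimal sketch, not the page's own method: it assumes the displayed percentages are rounded to one decimal place and that the cutoff is the first rank at which the running total strictly exceeds 50%. The page states neither rule, and the function name `journals_to_exceed` is hypothetical. On the rounded values the first seven journals sum to exactly 50.0%, so the count of eight depends on the strict comparison (or on unrounded probabilities that are not shown).

```python
from itertools import accumulate

# Displayed probabilities of the ten top-ranked journals, in tenths of a
# percent (14.4% -> 144). Integers keep the running sum free of
# floating-point rounding error.
DISPLAYED_TENTHS = [144, 92, 72, 64, 49, 43, 36, 36, 36, 26]


def journals_to_exceed(tenths, threshold_tenths=500):
    """Return the smallest k whose top-k cumulative mass strictly exceeds
    the threshold, or None if the threshold is never exceeded.

    Assumption: a strict comparison; the page does not state its rule.
    """
    for k, total in enumerate(accumulate(tenths), start=1):
        if total > threshold_tenths:
            return k
    return None


cumulative = list(accumulate(DISPLAYED_TENTHS))
k = journals_to_exceed(DISPLAYED_TENTHS)

print("cumulative mass (%):", [c / 10 for c in cumulative])
print("journals needed to exceed 50%:", k)
```

Because the inputs are rounded, this only confirms that the stated cutoff is consistent with the list; it does not recover the underlying probabilities.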