Automated Knowledge Graph Construction for CAR T Cell Receptor Design via Hybrid Text Mining

Luo, H.; Tang, D.; Zivanov, A.; Miskov-Zivanov, N.

2026-04-07 synthetic biology
10.64898/2026.04.06.716719 bioRxiv

Designing next-generation Chimeric Antigen Receptors (CARs) requires a systematic understanding of intracellular signaling domains and their downstream biological effects, yet no comprehensive knowledge resource currently exists for this purpose. Here, we present an automated workflow that integrates multiple natural language processing and large language model tools to extract biomolecular interactions from PubMed literature and assemble them into a CAR T cell signaling knowledge graph. Our pipeline combines REACH, INDRA, and Llama 3 across 15 targeted search queries, yielding a directed multi-relational graph of ~7,500 unique interactions among ~1,800 entities, including proteins, biological processes, and chemicals. We further demonstrate that queries incorporating biological process ontology terms retrieve more interaction-rich papers than protein-name-only searches, offering practical guidance for future literature mining efforts. The resulting knowledge base provides a structured foundation for predicting T cell phenotypes and prioritizing intracellular domain candidates for CAR design, with broader applicability to knowledge-driven inference in immunotherapy research.

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Nucleic Acids Research | 1128 | Top 0.3% | 22.6%
2 | Nature Communications | 4913 | Top 25% | 7.2%
3 | Advanced Science | 249 | Top 3% | 4.9%
4 | PLOS Computational Biology | 1633 | Top 8% | 4.3%
5 | ACS Synthetic Biology | 256 | Top 0.9% | 4.0%
6 | Cell Systems | 167 | Top 4% | 3.6%
7 | Science Advances | 1098 | Top 5% | 3.6%
(50% of probability mass above)
8 | iScience | 1063 | Top 5% | 3.6%
9 | Proceedings of the National Academy of Sciences | 2130 | Top 24% | 2.7%
10 | Nature Machine Intelligence | 61 | Top 1% | 2.6%
11 | Computational and Structural Biotechnology Journal | 216 | Top 3% | 2.5%
12 | Frontiers in Immunology | 586 | Top 3% | 2.4%
13 | Bioinformatics | 1061 | Top 6% | 2.4%
14 | Nature Methods | 336 | Top 4% | 2.1%
15 | Cancer Cell | 38 | Top 0.7% | 2.1%
16 | eLife | 5422 | Top 40% | 1.8%
17 | Cell Reports Medicine | 140 | Top 4% | 1.7%
18 | Communications Biology | 886 | Top 11% | 1.5%
19 | Briefings in Bioinformatics | 326 | Top 5% | 1.3%
20 | Scientific Reports | 3102 | Top 64% | 1.3%
21 | Patterns | 70 | Top 2% | 1.0%
22 | Science Translational Medicine | 111 | Top 5% | 0.9%
23 | Cell Discovery | 54 | Top 4% | 0.9%
24 | Cell Reports Methods | 141 | Top 4% | 0.9%
25 | npj Systems Biology and Applications | 99 | Top 2% | 0.8%
26 | Nature Biomedical Engineering | 42 | Top 2% | 0.8%
27 | Nature Biotechnology | 147 | Top 7% | 0.7%
28 | Journal of Chemical Information and Modeling | 207 | Top 3% | 0.7%
29 | npj Digital Medicine | 97 | Top 4% | 0.7%
30 | Science | 429 | Top 20% | 0.7%