
Autoresearch Discovery of Interpretable Filter Rules for Antibody Binder Classification

Landajuela, M.

2026-05-11 bioinformatics
10.64898/2026.05.05.723069 bioRxiv

Antibody design campaigns increasingly generate many candidates before only a small subset can be tested experimentally, making candidate filtering a central bottleneck. We study whether an autoresearch loop can discover better training-free filters for antibody binder classification by iteratively proposing rule variants, evaluating them under a fixed Leave-One-System-Out protocol, recording each experiment in version control, and using the results to guide the next iteration. Across 75 unique logged filter variants on seven antibody-antigen systems, the loop improves average ROC-AUC from 0.6371 for the initial baseline to 0.8060 for a compact final rule that we call the RMSD-Tuned Triad rule, an absolute gain of 0.1689 and a relative improvement of 26.5%. The discovered filter is competitive with supervised machine learning baselines and prompted LLM baselines evaluated on the same systems: it exceeds logistic regression (0.7144), feature-selected balanced logistic regression (0.7536), and GPT-4o tabular few-shot prompting (0.7640), and it comes within 0.0044 ROC-AUC of the strongest GPT-5 tabular few-shot result (0.8104). Unlike the LLM baseline, the final rule requires no prompted examples and no LLM inference once the numeric structure-derived features are available. These results show that systematic autoresearch can turn simple structural-confidence signals into compact, interpretable filters that are useful when target-specific training data are scarce.
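The evaluation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code. The feature names (`confidence`, `rmsd`), the scoring function `hypothetical_rule`, and the toy systems are all invented for the example. The sketch also assumes that, because the filter is training-free, each Leave-One-System-Out fold reduces to scoring the held-out system's candidates with the fixed rule and computing that system's ROC-AUC, and that the reported figure is the unweighted mean over systems. The last lines reproduce the headline arithmetic from the reported baseline (0.6371) and final (0.8060) scores.

```python
from statistics import mean


def roc_auc(scores, labels):
    """ROC-AUC as the probability that a random binder outscores a random
    non-binder (ties count as 0.5), i.e. the Mann-Whitney formulation."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("each system needs at least one binder and one non-binder")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hypothetical_rule(features):
    # HYPOTHETICAL training-free rule: higher structural confidence and
    # lower RMSD give a higher binder score. Not the paper's RMSD-Tuned Triad rule.
    return features["confidence"] - 0.1 * features["rmsd"]


def loso_mean_auc(systems, rule):
    """Evaluate a fixed rule on each held-out system, then average per-system AUCs.
    Assumes the rule has no parameters fitted on the held-out system."""
    per_system = {}
    for name, candidates in systems.items():
        scores = [rule(c["features"]) for c in candidates]
        labels = [c["binder"] for c in candidates]
        per_system[name] = roc_auc(scores, labels)
    return mean(per_system.values()), per_system


# Invented toy data: two antibody-antigen "systems" with labeled candidates.
SYSTEMS = {
    "system_A": [
        {"features": {"confidence": 0.9, "rmsd": 1.0}, "binder": 1},
        {"features": {"confidence": 0.6, "rmsd": 2.0}, "binder": 1},
        {"features": {"confidence": 0.5, "rmsd": 4.0}, "binder": 0},
        {"features": {"confidence": 0.7, "rmsd": 1.0}, "binder": 0},
    ],
    "system_B": [
        {"features": {"confidence": 0.8, "rmsd": 1.5}, "binder": 1},
        {"features": {"confidence": 0.4, "rmsd": 3.0}, "binder": 0},
        {"features": {"confidence": 0.3, "rmsd": 5.0}, "binder": 0},
    ],
}

mean_auc, per_system = loso_mean_auc(SYSTEMS, hypothetical_rule)
print(per_system, mean_auc)

# Headline arithmetic from the abstract's reported averages.
baseline, final = 0.6371, 0.8060
gain = final - baseline      # absolute gain in mean ROC-AUC
relative = gain / baseline   # relative improvement over the baseline
print(f"gain={gain:.4f}, relative={100 * relative:.1f}%")
```

Averaging per-system AUCs, rather than pooling all candidates into one AUC, keeps a single large system from dominating the summary in this sketch, which fits the goal of judging how a filter transfers to an unseen target.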

Matching journals

The top 4 journals account for more than 50% of the predicted probability mass.

1. Cell Systems: 167 papers in training set, Top 0.1%, 28.4%
2. Nature Communications: 4913 papers in training set, Top 13%, 12.8%
3. Nature Machine Intelligence: 61 papers in training set, Top 0.3%, 7.4%
4. Nature Methods: 336 papers in training set, Top 2%, 5.0%
(50% of probability mass above)
5. Science: 429 papers in training set, Top 7%, 4.5%
6. Nature Biotechnology: 147 papers in training set, Top 2%, 3.8%
7. Bioinformatics: 1061 papers in training set, Top 5%, 3.7%
8. PLOS Computational Biology: 1633 papers in training set, Top 13%, 2.1%
9. Nucleic Acids Research: 1128 papers in training set, Top 8%, 2.1%
10. Cell Genomics: 162 papers in training set, Top 3%, 1.9%
11. Patterns: 70 papers in training set, Top 0.7%, 1.8%
12. eLife: 5422 papers in training set, Top 40%, 1.7%
13. Briefings in Bioinformatics: 326 papers in training set, Top 4%, 1.7%
14. mAbs: 28 papers in training set, Top 0.2%, 1.3%
15. Communications Biology: 886 papers in training set, Top 15%, 1.1%
16. Structure: 175 papers in training set, Top 2%, 1.1%
17. Journal of Chemical Information and Modeling: 207 papers in training set, Top 2%, 1.0%
18. Protein Science: 221 papers in training set, Top 1%, 1.0%
19. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 41%, 0.9%
20. Frontiers in Immunology: 586 papers in training set, Top 6%, 0.9%
21. Cell Reports Methods: 141 papers in training set, Top 4%, 0.8%
22. Nature Computational Science: 50 papers in training set, Top 1%, 0.8%
23. iScience: 1063 papers in training set, Top 30%, 0.8%
24. Scientific Reports: 3102 papers in training set, Top 73%, 0.8%
25. Bioinformatics Advances: 184 papers in training set, Top 5%, 0.7%
26. Nature: 575 papers in training set, Top 17%, 0.7%
27. PLOS ONE: 4510 papers in training set, Top 72%, 0.5%
28. Genome Research: 409 papers in training set, Top 5%, 0.5%
29. Genome Biology: 555 papers in training set, Top 9%, 0.5%
30. Journal of Molecular Biology: 217 papers in training set, Top 5%, 0.5%