Evaluating AI-Assisted Customer Verification for Synthetic Nucleic Acid Screening

Acelas, A.; Palya, H.; Flyangolts, K.; Fady, P.-E.; Nelson, C.

2026-03-01 synthetic biology
10.64898/2026.02.27.708645 bioRxiv

Legitimacy screening, the process of verifying the identity and purpose of customers ordering synthetic nucleic acids, is a primary safeguard against the misuse of synthetic biology. However, the associated costs discourage the adoption of screening practices. To evaluate whether AI tools can facilitate this process, we tested five large language models on five verification tasks using customer profiles of life sciences researchers from around the world. We compared AI performance against an expert human baseline on flag accuracy, source quality, source fidelity, and cost. The best-performing model, Gemini 2.5 Pro aided by four bibliographic and sanctions APIs, achieved flag accuracy comparable to the human baseline (90% and 89%, respectively). It also outperformed the human baseline on source quality and fidelity, at roughly one-tenth of the cost ($1.18 vs. $14.04 per customer). For information-gathering tasks, which excluded the human review step, costs averaged $0.23 per customer, around 50 times cheaper than human screening. These results support piloting AI-assisted legitimacy screening at providers of synthetic nucleic acids and other dual-use biotechnology products, with AI systems handling information gathering and human reviewers retaining authority over order fulfillment decisions.

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. ACS Synthetic Biology | 256 papers in training set | Top 0.1% | 39.0%
2. Synthetic Biology | 21 papers in training set | Top 0.1% | 8.7%
3. Nucleic Acids Research | 1128 papers in training set | Top 4% | 5.0%
(50% of probability mass above)
4. Nature Communications | 4913 papers in training set | Top 31% | 5.0%
5. PLOS ONE | 4510 papers in training set | Top 44% | 2.7%
6. Bioinformatics | 1061 papers in training set | Top 6% | 2.1%
7. BMC Bioinformatics | 383 papers in training set | Top 4% | 1.8%
8. Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 31% | 1.7%
9. Scientific Reports | 3102 papers in training set | Top 56% | 1.7%
10. Metabolic Engineering | 68 papers in training set | Top 0.4% | 1.7%
11. Molecular Systems Biology | 142 papers in training set | Top 0.6% | 1.7%
12. GigaScience | 172 papers in training set | Top 2% | 1.5%
13. Nature Methods | 336 papers in training set | Top 4% | 1.5%
14. Scientific Data | 174 papers in training set | Top 1% | 1.4%
15. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 6% | 1.3%
16. Communications Biology | 886 papers in training set | Top 13% | 1.3%
17. Philosophical Transactions of the Royal Society B | 51 papers in training set | Top 4% | 1.0%
18. Cell Systems | 167 papers in training set | Top 10% | 1.0%
19. BMC Biology | 248 papers in training set | Top 3% | 0.9%
20. eLife | 5422 papers in training set | Top 54% | 0.8%
21. Journal of the American Medical Informatics Association | 61 papers in training set | Top 2% | 0.8%
22. PLOS Computational Biology | 1633 papers in training set | Top 24% | 0.8%
23. iScience | 1063 papers in training set | Top 30% | 0.8%
24. Briefings in Bioinformatics | 326 papers in training set | Top 6% | 0.8%
25. Advanced Science | 249 papers in training set | Top 19% | 0.7%
26. Nature | 575 papers in training set | Top 15% | 0.7%
27. Cell Reports Medicine | 140 papers in training set | Top 8% | 0.7%
28. The Lancet Digital Health | 25 papers in training set | Top 1% | 0.7%
29. ACS Omega | 90 papers in training set | Top 5% | 0.7%
30. Database | 51 papers in training set | Top 1% | 0.7%
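
The "top 3 journals account for 50%" statement can be checked from the listed probabilities. The sketch below is illustrative only: it assumes the cutoff is the smallest number of top-ranked journals whose cumulative predicted probability reaches 50%, which is an interpretation and not something the page states. The function name `journals_to_reach` is hypothetical, and the values are copied from the ten top-ranked entries above.

```python
from itertools import accumulate

# Predicted probabilities (percent) for the ten top-ranked journals,
# copied from the list above, in rank order.
probs = [39.0, 8.7, 5.0, 5.0, 2.7, 2.1, 1.8, 1.7, 1.7, 1.7]


def journals_to_reach(probabilities, threshold=50.0):
    """Return the smallest k whose top-k probabilities sum to at least
    `threshold`, or None if the threshold is never reached.

    Hypothetical helper; the cutoff rule is an assumption.
    """
    for k, total in enumerate(accumulate(probabilities), start=1):
        if total >= threshold:
            return k
    return None


print(journals_to_reach(probs))  # 3 (cumulative: 39.0, 47.7, 52.7)
```

Under this assumption the first three journals together carry about 52.7% of the predicted probability, which matches the stated cutoff.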