ParSeek: Accurate cryo-EM particle picking with a deep learning model trained on synthetic data

Qian, J.; Gong, Y.; Liu, F.; Huang, Y.; Guo, G.; Zhu, Y.; Huang, Q.

2026-05-11 molecular biology
10.64898/2026.05.07.720949 bioRxiv

Accurate particle picking from noisy cryo-EM micrographs is essential for high-resolution reconstruction. Current deep learning methods rely on manually annotated data, which is labor-intensive, subjective, and limits particle recall under low signal-to-noise ratio (SNR). Here we introduce ParSeek, an automated picker trained entirely on synthetic data without human annotation. Synthetic micrographs are generated by projecting known 3D structures into realistic background patches that reproduce experimental noise. On seven public cryo-EM datasets, ParSeek outperformed Topaz and CryoSegNet on four datasets, achieving the highest F1-score (up to 0.82) and reaching 0.63 on a challenging membrane protein dataset. Density maps from ParSeek-picked particles showed cross-correlation coefficients up to 0.995 with the reference and a minimal resolution difference of 0.1 Å. ParSeek also overcame severe orientation bias on an influenza dataset, yielding a reasonable reconstruction. Applied to three experimental datasets (an antibody-antigen complex and two GPCRs), ParSeek enabled reconstructions at 5.0 Å, 4.0 Å, and 2.8 Å, respectively. The 2.8 Å map resolved side-chain densities and ligand flexibility. This study establishes a fully synthetic-data-driven strategy that eliminates manual annotation for training cryo-EM deep-learning models, paving the way for automated, unbiased particle picking.

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1. Nature Methods: 336 papers in training set, Top 0.1%, 28.2%
2. Nature Communications: 4913 papers in training set, Top 2%, 23.0%
50% of probability mass above
3. Structure: 175 papers in training set, Top 0.2%, 9.3%
4. Science: 429 papers in training set, Top 5%, 6.5%
5. Nature Biotechnology: 147 papers in training set, Top 2%, 3.7%
6. Communications Biology: 886 papers in training set, Top 2%, 3.7%
7. Journal of Structural Biology: 58 papers in training set, Top 0.3%, 3.7%
8. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 27%, 2.1%
9. Nature Machine Intelligence: 61 papers in training set, Top 3%, 1.2%
10. Nature Structural & Molecular Biology: 218 papers in training set, Top 4%, 1.0%
11. IUCrJ: 29 papers in training set, Top 0.3%, 0.9%
12. eLife: 5422 papers in training set, Top 55%, 0.8%
13. The Lancet Infectious Diseases: 71 papers in training set, Top 3%, 0.8%
14. Nature: 575 papers in training set, Top 15%, 0.8%
15. Advanced Science: 249 papers in training set, Top 18%, 0.8%
16. Acta Crystallographica Section D Structural Biology: 54 papers in training set, Top 0.3%, 0.8%
17. Cell: 370 papers in training set, Top 17%, 0.8%
18. Science Advances: 1098 papers in training set, Top 29%, 0.8%
19. Journal of Structural Biology: X: 15 papers in training set, Top 0.2%, 0.7%
20. Cell Discovery: 54 papers in training set, Top 5%, 0.7%
21. Cell Systems: 167 papers in training set, Top 13%, 0.7%
22. Scientific Reports: 3102 papers in training set, Top 77%, 0.7%
23. Cell Reports Methods: 141 papers in training set, Top 7%, 0.5%
24. Nucleic Acids Research: 1128 papers in training set, Top 21%, 0.5%
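
The statement that the top 2 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The sketch below is only an illustration: the function name `journals_to_reach` and the 50% threshold argument are hypothetical, and how the page itself computes the cutoff is an assumption. Only the five highest percentages from the list are used.

```python
from itertools import accumulate

# Predicted probabilities (percent) of the five top-ranked journals,
# copied from the list above.
probabilities = [28.2, 23.0, 9.3, 6.5, 3.7]


def journals_to_reach(probs, threshold=50.0):
    """Return how many top-ranked entries are needed for the
    cumulative probability to reach the threshold (hypothetical helper)."""
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count
    return None  # the listed entries never reach the threshold


print(journals_to_reach(probabilities))
```

Under these assumptions, the first two entries sum to 28.2% + 23.0% = 51.2%, which is where the cumulative total first crosses the 50% mark.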