Heterogeneous reconstruction algorithms for cryoEM achieve limited particle classification accuracy on real benchmark datasets

Kinman, L. F.; Grassetti, A. V.; Carreira, M. V.; Davis, J. H.

2026-05-11 biochemistry
10.64898/2026.05.08.722747 bioRxiv
The emergence of single-particle cryoEM as a powerful method for structure determination has in large part been fueled by its ability to resolve both single static structures and complex conformational landscapes. Indeed, modern approaches to the heterogeneous reconstruction task can resolve hundreds to thousands of different maps from a single cryoEM dataset. How accurate these algorithms are, however, has proven difficult to rigorously assess, due to a lack of suitable benchmark datasets containing both realistic noise features and ground-truth labels. To address this obstacle, we recently developed a series of benchmark datasets that leverage the targeting power of Cas9 and the programmable heterogeneity of DNA to provide ground-truth per-particle structural labels in real data. Here, we challenged two popular heterogeneous reconstruction algorithms with mixed particle stacks resampled in silico from these datasets, finding that existing approaches resolve the encoded heterogeneity with limited accuracy. In particular, in realistic particle stacks with complex, multi-scale, and multi-axis heterogeneity, we observed that reconstruction of encoded heterogeneity depended strongly on the application of prior information about where heterogeneity was expected, and that individual particle assignments were made with significant error even when the correct structural states were reconstructed. Both molecular breathing motions and data collection features, such as defocus and projection angle, contributed to the observed particle assignment error. These results highlight important shortcomings of existing heterogeneous reconstruction methods and suggest new avenues for method development in both data collection strategies and heterogeneous classification and reconstruction algorithms.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Nature Communications, 4913 papers in training set, Top 6%, 18.6%
2. Structure, 175 papers in training set, Top 0.1%, 18.5%
3. eLife, 5422 papers in training set, Top 13%, 6.3%
4. Journal of Structural Biology: X, 15 papers in training set, Top 0.1%, 6.3%
5. Nature Methods, 336 papers in training set, Top 2%, 6.3%

50% of probability mass above

6. Communications Biology, 886 papers in training set, Top 0.7%, 4.8%
7. Journal of the American Chemical Society, 199 papers in training set, Top 1%, 4.8%
8. Journal of Structural Biology, 58 papers in training set, Top 0.3%, 3.9%
9. Proceedings of the National Academy of Sciences, 2130 papers in training set, Top 20%, 3.6%
10. IUCrJ, 29 papers in training set, Top 0.1%, 2.7%
11. PLOS Computational Biology, 1633 papers in training set, Top 18%, 1.5%
12. Cell Systems, 167 papers in training set, Top 8%, 1.5%
13. Cell Reports Methods, 141 papers in training set, Top 3%, 1.2%
14. Nature Structural & Molecular Biology, 218 papers in training set, Top 4%, 1.2%
15. Acta Crystallographica Section D Structural Biology, 54 papers in training set, Top 0.3%, 0.9%
16. Science Advances, 1098 papers in training set, Top 27%, 0.9%
17. Communications Chemistry, 39 papers in training set, Top 1.0%, 0.8%
18. Scientific Reports, 3102 papers in training set, Top 73%, 0.8%
19. Nature Biotechnology, 147 papers in training set, Top 7%, 0.8%
20. ACS Nano, 99 papers in training set, Top 4%, 0.8%
21. Nucleic Acids Research, 1128 papers in training set, Top 17%, 0.8%
22. Microscopy and Microanalysis, 12 papers in training set, Top 0.1%, 0.7%
23. Science, 429 papers in training set, Top 22%, 0.6%
24. Advanced Science, 249 papers in training set, Top 22%, 0.6%
25. PLOS Biology, 408 papers in training set, Top 23%, 0.6%