
CLEAR-HPV: Interpretable Concept Discovery for HPV-Associated Morphology in Whole-Slide Histology

Liu-Swetz, Y.; Tan, S.; Qin, W.; Wang, H.

2026-02-06 bioinformatics
10.64898/2026.02.04.703870 bioRxiv

Human papillomavirus (HPV) status is a critical determinant of prognosis and treatment response in head and neck and cervical cancers. Although attention-based multiple instance learning (MIL) achieves strong slide-level prediction for HPV-related whole-slide histopathology, it provides limited morphologic interpretability. To address this limitation, we introduce Concept-Level Explainable Attention-guided Representation for HPV (CLEAR-HPV), a framework that restructures the MIL latent space using attention to enable concept discovery without requiring concept labels during training. Operating in an attention-weighted latent space, CLEAR-HPV automatically discovers keratinizing, basaloid, and stromal morphologic concepts, generates spatial concept maps, and represents each slide using a compact concept-fraction vector. CLEAR-HPV's concept-fraction vectors preserve the predictive information of the original MIL embeddings while reducing the high-dimensional feature space (e.g., 1536 dimensions) to only 10 interpretable concepts. CLEAR-HPV generalizes consistently across TCGA-HNSCC, TCGA-CESC, and CPTAC-HNSCC, providing compact, concept-level interpretability through a general, backbone-agnostic framework for attention-based MIL models of whole-slide histopathology.
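
The idea of turning per-patch embeddings into a compact concept-fraction vector can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the clustering algorithm, so attention-weighted k-means is used here as a stand-in. The function names `attention_weighted_kmeans` and `concept_fractions`, the synthetic patch embeddings, and the choice to weight concept fractions by attention mass are all assumptions made for illustration.

```python
import numpy as np


def attention_weighted_kmeans(feats, attn, k=10, iters=20, seed=0):
    """Cluster patch embeddings; high-attention patches pull centroids harder.

    Assumption: weighted k-means stands in for the paper's (unspecified)
    concept-discovery step.
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct patches.
    centroids = feats[rng.choice(len(feats), size=k, replace=False)].copy()

    def assign(c):
        # Squared Euclidean distance from every patch to every centroid.
        d2 = (feats ** 2).sum(1)[:, None] - 2.0 * feats @ c.T + (c ** 2).sum(1)[None, :]
        return d2.argmin(axis=1)

    for _ in range(iters):
        labels = assign(centroids)
        for j in range(k):
            mask = labels == j
            w = attn[mask]
            if w.sum() > 0:  # leave empty clusters where they are
                centroids[j] = (w[:, None] * feats[mask]).sum(axis=0) / w.sum()
    return centroids, assign(centroids)


def concept_fractions(labels, attn, k=10):
    """Slide-level vector: share of attention mass assigned to each concept.

    Assumption: fractions are attention-weighted; plain patch counts would be
    an equally plausible reading of the abstract.
    """
    mass = np.bincount(labels, weights=attn, minlength=k)
    return mass / mass.sum()


# Synthetic stand-in for one slide: 500 patches, 32-dim embeddings
# (the abstract mentions e.g. 1536 dimensions; 32 keeps the demo light).
rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 32))
attn = rng.random(500)
attn /= attn.sum()

centroids, labels = attention_weighted_kmeans(feats, attn, k=10)
slide_vector = concept_fractions(labels, attn, k=10)
print(slide_vector.shape)  # one entry per concept
```

Each entry of `slide_vector` is the share of the slide attributed to one discovered concept, so a slide is summarised by 10 numbers instead of a high-dimensional embedding. Mapping the per-patch `labels` back to patch coordinates would give the kind of spatial concept map the abstract describes.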

Matching journals

The top 10 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | Advanced Science | 249 | Top 2% | 8.5% |
| 2 | Nature Communications | 4913 | Top 22% | 8.5% |
| 3 | Bioinformatics | 1061 | Top 4% | 6.9% |
| 4 | Nature Machine Intelligence | 61 | Top 0.4% | 6.5% |
| 5 | Cell Systems | 167 | Top 3% | 3.7% |
| 6 | Genome Medicine | 154 | Top 2% | 3.6% |
| 7 | npj Digital Medicine | 97 | Top 1% | 3.6% |
| 8 | Scientific Reports | 3102 | Top 41% | 3.1% |
| 9 | Nature Medicine | 117 | Top 1% | 2.9% |
| 10 | Briefings in Bioinformatics | 326 | Top 2% | 2.8% |

50% of probability mass above

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 11 | Cell Reports Medicine | 140 | Top 2% | 2.1% |
| 12 | Nucleic Acids Research | 1128 | Top 8% | 2.1% |
| 13 | PLOS Computational Biology | 1633 | Top 14% | 1.9% |
| 14 | Cancer Research | 116 | Top 2% | 1.8% |
| 15 | Nature Methods | 336 | Top 4% | 1.8% |
| 16 | iScience | 1063 | Top 14% | 1.7% |
| 17 | IEEE Transactions on Computational Biology and Bioinformatics | 17 | Top 0.2% | 1.7% |
| 18 | PLOS ONE | 4510 | Top 53% | 1.7% |
| 19 | npj Precision Oncology | 48 | Top 0.6% | 1.7% |
| 20 | Medical Image Analysis | 33 | Top 0.6% | 1.7% |
| 21 | Science Advances | 1098 | Top 17% | 1.7% |
| 22 | Genome Research | 409 | Top 3% | 1.3% |
| 23 | Proceedings of the National Academy of Sciences | 2130 | Top 36% | 1.3% |
| 24 | Nature Biomedical Engineering | 42 | Top 1% | 1.3% |
| 25 | Communications Biology | 886 | Top 14% | 1.2% |
| 26 | Genome Biology | 555 | Top 6% | 0.9% |
| 27 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 2% | 0.8% |
| 28 | Nature Biotechnology | 147 | Top 8% | 0.7% |
| 29 | eBioMedicine | 130 | Top 5% | 0.7% |
| 30 | BMC Bioinformatics | 383 | Top 8% | 0.7% |