
SpaceExpander: Automated Drafting and Evaluation of Markush Claims for Chemical Space Expansion

Wu, R.; Mao, L.; Diao, Y.; Li, H.

2026-04-14 bioinformatics
10.64898/2026.04.09.716825 bioRxiv

Drafting Markush claims for chemical patents remains difficult because manual claim writing is slow, error-prone, and often fails to capture related chemical space systematically. We developed SpaceExpander, a computational method that converts disclosed compounds into generalized Markush claims by extracting core scaffolds, defining variable positions, decomposing complex substituents, and expanding substituent space through fragment matching. We evaluated the method on 24 publicly available chemical patents and compared its performance with IntelliPatent. SpaceExpander achieved a mean atom-level scaffold accuracy of 0.92 and exactly recovered the reference scaffold in 19 of 24 patents. By contrast, IntelliPatent could process only 2 patents from the same set, indicating more limited applicability to structurally diverse cases. We further examined practical claim coverage in a case study based on the Osimertinib patent. Using representative disclosed compounds as input, SpaceExpander drafted a Markush claim that covered 5 of 7 additional approved third-generation EGFR inhibitors beyond Osimertinib. These results show that SpaceExpander is a validated method for automated Markush claim drafting and chemical space expansion.
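To make the idea of "variable positions" and "substituent space" concrete, the toy sketch below shows what a Markush claim expresses: a fixed core with named R-group positions, each allowed to take any substituent from a set, so the claim covers every combination. Everything here is a hypothetical illustration, not SpaceExpander's implementation: the core template, the substituent lists, and the helper names `enumerate_claim` and `coverage` are invented, and structures are plain SMILES-like strings compared by exact text match rather than by chemical identity (a real tool would use a cheminformatics toolkit and canonicalization).

```python
from itertools import product

# Hypothetical toy Markush claim: a core template with named variable
# positions (R1, R2) and, for each position, the substituents it may take.
# Structures are SMILES-like strings; no chemistry toolkit is used, so
# "covered" below means exact string match, not chemical identity.
CORE = "c1cc({R1})ccc1{R2}"
SUBSTITUENTS = {
    "R1": ["F", "Cl", "OC"],
    "R2": ["N", "C(=O)N"],
}


def enumerate_claim(core, substituents):
    """Expand the template into every concrete structure the claim covers."""
    names = sorted(substituents)
    covered = set()
    for choice in product(*(substituents[n] for n in names)):
        covered.add(core.format(**dict(zip(names, choice))))
    return covered


def coverage(claim, queries):
    """Count how many query structures fall inside the enumerated claim."""
    return sum(1 for q in queries if q in claim)


claim = enumerate_claim(CORE, SUBSTITUENTS)
queries = ["c1cc(F)ccc1N", "c1cc(Br)ccc1N", "c1cc(OC)ccc1C(=O)N"]
print(len(claim), coverage(claim, queries))  # 6 2
```

Three choices at R1 times two at R2 give six covered structures; the bromo query falls outside the claim because Br is not in the R1 set. Broadening a substituent set (for example by fragment matching against related compounds) is what enlarges the covered chemical space.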

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Journal of Chemical Information and Modeling: 207 papers in training set; Top 0.1%; 26.1%
2. Journal of Cheminformatics: 25 papers in training set; Top 0.1%; 10.2%
3. Briefings in Bioinformatics: 326 papers in training set; Top 0.7%; 6.9%
4. Bioinformatics: 1061 papers in training set; Top 4%; 6.4%
5. Nature Communications: 4913 papers in training set; Top 29%; 6.4%
(50% of probability mass above)
6. PLOS ONE: 4510 papers in training set; Top 33%; 4.3%
7. BMC Bioinformatics: 383 papers in training set; Top 3%; 2.6%
8. Advanced Science: 249 papers in training set; Top 8%; 2.1%
9. Scientific Reports: 3102 papers in training set; Top 50%; 2.1%
10. Genomics, Proteomics & Bioinformatics: 171 papers in training set; Top 3%; 1.7%
11. PLOS Computational Biology: 1633 papers in training set; Top 18%; 1.5%
12. Communications Chemistry: 39 papers in training set; Top 0.4%; 1.3%
13. Nucleic Acids Research: 1128 papers in training set; Top 13%; 1.3%
14. Cell Systems: 167 papers in training set; Top 9%; 1.2%
15. Nature Methods: 336 papers in training set; Top 5%; 1.2%
16. Proceedings of the National Academy of Sciences: 2130 papers in training set; Top 38%; 1.2%
17. Computational and Structural Biotechnology Journal: 216 papers in training set; Top 6%; 1.2%
18. Artificial Intelligence in the Life Sciences: 11 papers in training set; Top 0.1%; 1.0%
19. Chemical Science: 71 papers in training set; Top 2%; 0.9%
20. Bioinformatics Advances: 184 papers in training set; Top 4%; 0.9%
21. Clinical Pharmacology & Therapeutics: 25 papers in training set; Top 0.6%; 0.9%
22. Scientific Data: 174 papers in training set; Top 2%; 0.9%
23. Genome Medicine: 154 papers in training set; Top 7%; 0.8%
24. Cancer Research: 116 papers in training set; Top 4%; 0.7%
25. Metabolites: 50 papers in training set; Top 1%; 0.7%
26. Nature Machine Intelligence: 61 papers in training set; Top 4%; 0.7%
27. Communications Biology: 886 papers in training set; Top 26%; 0.7%
28. Molecules: 37 papers in training set; Top 3%; 0.5%
29. Acta Pharmaceutica Sinica B: 11 papers in training set; Top 1%; 0.5%