
Linking Trials to Publications: Enhancing Recall by Identifying Trial Registry Mentions in Full-Text

Holt, A. W.; Smalheiser, N. R.

2025-06-10 health informatics
10.1101/2025.06.09.25329285 medRxiv
Abstract

We have developed a free, public web-based tool, Trials to Publications (https://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/TrialPubLinking/trial_pub_link_start.cgi), which employs a machine learning model to predict which publications are likely to present clinical outcome results from a given trial registered in ClinicalTrials.gov. The tool has reasonably high precision. However, in a recent study we found that when registry mentions are not explicitly listed in metadata, textual clues (in the title, abstract, or other metadata) could identify only roughly one-third to one-half of the linked publications with high confidence. This finding led us to expand the tool to search for explicit mentions of registry numbers within the full text of publications. We have now retrieved ClinicalTrials.gov registry number mentions (NCT numbers) from the full text of three online biomedical article collections (open access PubMed Central, EuroPMC, and OpenAlex). We have also retrieved biomedical citations that are listed within the ClinicalTrials.gov registry itself. These methods greatly increase the recall of linked publications and should help both those carrying out evidence syntheses and those studying the meta-science of clinical trials.

Highlights

- Those conducting systematic reviews, other evidence syntheses, and meta-science analyses often need to examine published evidence arising from clinical trials. Finding publications linked to a given trial is a difficult manual process, but several automated tools have been developed. The Trials to Publications tool is the only free, public, currently maintained web-based tool that predicts publications linked to a given trial in ClinicalTrials.gov.
- A recent analysis indicated that the Trials to Publications tool has good precision but limited recall. In the present paper, we greatly enhanced recall by identifying registry mentions in the full text of articles indexed in open access PubMed Central, EuroPMC, and OpenAlex.
- The tool now has reasonably comprehensive coverage of registry mentions. This applies both to identifying articles that present trial outcome results and to other types of articles that are linked to, or that discuss, the trials. This should greatly reduce the effort needed when searching the literature on the web.
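The core idea above is to find explicit ClinicalTrials.gov registry numbers in article full text and combine those links with the citations listed in the registry itself. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes that NCT numbers appear in their canonical form ("NCT" followed by eight digits), so spaced or hyphenated variants are not handled. It also assumes that article texts and registry citations are already loaded into in-memory dictionaries. The function names `find_nct_ids` and `build_trial_index` and the input layout are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: canonical ClinicalTrials.gov identifiers, i.e. "NCT" + exactly 8 digits.
# Word boundaries stop the pattern from matching inside longer tokens (e.g. 9 digits).
NCT_PATTERN = re.compile(r"\bNCT\d{8}\b", re.IGNORECASE)


def find_nct_ids(text: str) -> set[str]:
    """Return the distinct NCT numbers mentioned in a text, normalized to upper case."""
    return {match.upper() for match in NCT_PATTERN.findall(text)}


def build_trial_index(
    full_texts: dict[str, str],
    registry_citations: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Map each NCT number to the set of article IDs linked to it.

    full_texts: hypothetical mapping of article ID -> full-text string.
    registry_citations: hypothetical mapping of NCT number -> article IDs
        cited in that trial's registry record.
    """
    index: dict[str, set[str]] = defaultdict(set)

    # Source 1: explicit registry-number mentions found in article full text.
    for article_id, text in full_texts.items():
        for nct in find_nct_ids(text):
            index[nct].add(article_id)

    # Source 2: citations listed in the registry itself. Taking the union means
    # a link found by either source is kept, which is what raises recall.
    for nct, article_ids in registry_citations.items():
        index[nct.upper()].update(article_ids)

    return dict(index)


if __name__ == "__main__":
    texts = {
        "PMC1": "This trial was registered at ClinicalTrials.gov (NCT01234567).",
        "PMC2": "Secondary analysis of nct01234567 and NCT07654321.",
        "PMC3": "No registry number here.",
    }
    citations = {"NCT07654321": {"PMID9"}}
    for nct, articles in sorted(build_trial_index(texts, citations).items()):
        print(nct, sorted(articles))
```

In this toy input, NCT01234567 is linked to PMC1 and PMC2, and NCT07654321 to PMC2 and PMID9. The second trial illustrates how registry citations add links that full-text scanning alone would miss. A set per trial keeps repeated mentions of the same article from being counted twice.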

Matching journals

The top-ranked journal alone accounts for more than 50% (54.6%) of the predicted probability mass.

1. Journal of the American Medical Informatics Association: 54.6% (61 papers in training set; Top 0.1%)
(50% of probability mass above)
2. JAMIA Open: 6.7% (37 papers in training set; Top 0.2%)
3. Scientific Data: 4.2% (174 papers in training set; Top 0.4%)
4. PLOS ONE: 3.8% (4510 papers in training set; Top 37%)
5. Journal of Biomedical Informatics: 3.4% (45 papers in training set; Top 0.5%)
6. Journal of Medical Internet Research: 2.2% (85 papers in training set; Top 2%)
7. Research Synthesis Methods: 2.0% (20 papers in training set; Top 0.1%)
8. Bioinformatics: 1.6% (1061 papers in training set; Top 7%)
9. Frontiers in Neurology: 1.4% (91 papers in training set; Top 3%)
10. npj Digital Medicine: 1.4% (97 papers in training set; Top 2%)
11. Scientific Reports: 1.3% (3102 papers in training set; Top 65%)
12. Clinical and Translational Science: 1.0% (21 papers in training set; Top 0.7%)
13. Journal of Clinical Epidemiology: 0.8% (28 papers in training set; Top 0.5%)
14. PeerJ: 0.8% (261 papers in training set; Top 14%)
15. PLOS Digital Health: 0.8% (91 papers in training set; Top 3%)
16. Artificial Intelligence in Medicine: 0.8% (15 papers in training set; Top 0.7%)
17. BMC Biology: 0.7% (248 papers in training set; Top 5%)
18. Journal of Clinical and Translational Science: 0.7% (11 papers in training set; Top 0.5%)
19. F1000Research: 0.7% (79 papers in training set; Top 5%)
20. FACETS: 0.7% (11 papers in training set; Top 0.4%)
21. PLOS Biology: 0.7% (408 papers in training set; Top 22%)
22. Trials: 0.5% (25 papers in training set; Top 2%)
23. BMJ Health & Care Informatics: 0.5% (13 papers in training set; Top 1%)
24. BMC Medical Research Methodology: 0.5% (43 papers in training set; Top 2%)
25. BMC Medical Informatics and Decision Making: 0.5% (39 papers in training set; Top 3%)
26. International Journal of Medical Informatics: 0.5% (25 papers in training set; Top 2%)