
Accounting for Uncertainty in the Null Benchmark in Two-Stage Phase II Trials

Irlmeier, R.; Jin, Z.; Ye, F.

2026-05-18 epidemiology
10.64898/2026.05.14.26353210 medRxiv

Background Simon two-stage designs for binary endpoints and their time-to-event analogues, including the Kwak and Jung method, rely on a fixed null benchmark. Their Type I error control is valid only when that benchmark is correctly specified. In practice, historical benchmarks are often inconsistent because of small samples, population heterogeneity, changing eligibility criteria, and evolving standards of care. Even modest misspecification can substantially inflate the Type I error rate, leading to costly advancement of ineffective treatments.
Methods We propose the Interval-Null Robust (INR) two-stage design framework, which accounts for uncertainty in the historical null benchmark. We define the null hypothesis as a plausible range of clinically uninteresting values: $p \in [p_{0L}, p_{0U}]$ for binary endpoints and $\lambda \in [\lambda_{0L}, \lambda_{0U}]$ (or equivalent survival probabilities) for time-to-event endpoints. Type I error is controlled uniformly over the full null interval: $\sup_{\theta \in \Theta_0} \Pr_\theta(\mathrm{Go}) \le \alpha$. Under monotonicity of the Go probability, the supremum occurs at the least favorable null configuration ($p_{0U}$ for binary endpoints, $\lambda_{0L}$ for time-to-event endpoints), but the design is not reduced to a point-null formulation. The interval defines the uncertainty set for error control and is used in selecting among feasible designs through robust criteria such as worst-case regret or minimal average expected sample size.
Results Across representative planning scenarios for both endpoint types, classic designs calibrated to a single benchmark exhibit substantial Type I error inflation when the true null parameter exceeds the assumed planning value. INR designs maintain the nominal Type I error rate across the full null interval, directly addressing this vulnerability to benchmark misspecification. The robustness-efficiency trade-off can be managed through design constraints and robust optimization criteria while preserving uniform Type I error control.
Conclusions INR two-stage designs offer a transparent framework for addressing historical control uncertainty in single-arm Phase II trials. By replacing reliance on a fixed benchmark with a more realistic interval of clinically plausible null values, INR designs reduce the risk of false-positive Go decisions caused by benchmark misspecification. The INR framework applies to both binary and time-to-event endpoints and is implemented in the open-source INRDesign R package and an accompanying interactive Shiny app.
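
The uniform error-control condition for the binary endpoint can be illustrated with a small calculation. The sketch below is not the INRDesign package and does not reproduce the authors' design search; it only evaluates the Go probability of one fixed two-stage rule as a function of the true response rate. The design (stop for futility if at most 1 response in 10 patients, declare Go if more than 5 responses in 29), the null interval [0.10, 0.15], and the function names `go_probability` and `sup_go_probability` are all assumptions chosen for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_sf(k, n, p):
    """P(X > k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 1.0
    return sum(binom_pmf(j, n, p) for j in range(k + 1, n + 1))


def go_probability(p, r1, n1, r, n):
    """Probability of a Go decision under true response rate p.

    Two-stage rule (illustrative): stop for futility if stage-1
    responses <= r1 out of n1; otherwise enroll n - n1 more patients
    and declare Go if total responses > r out of n.
    """
    n2 = n - n1
    total = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        # Continue to stage 2 with x1 responses; need more than r - x1 there.
        total += binom_pmf(x1, n1, p) * binom_sf(r - x1, n2, p)
    return total


def sup_go_probability(p_lo, p_hi, r1, n1, r, n, k=200):
    """Grid approximation of the supremum of the Go probability over [p_lo, p_hi]."""
    grid = [p_lo + (p_hi - p_lo) * i / k for i in range(k)] + [p_hi]
    return max(go_probability(p, r1, n1, r, n) for p in grid)


if __name__ == "__main__":
    design = dict(r1=1, n1=10, r=5, n=29)  # hypothetical design
    p0L, p0U = 0.10, 0.15                  # hypothetical null interval
    print("Go probability at p0L:", go_probability(p0L, **design))
    print("Go probability at p0U:", go_probability(p0U, **design))
    print("Supremum over [p0L, p0U]:", sup_go_probability(p0L, p0U, **design))
```

Because the Go probability is increasing in p, the grid maximum coincides with the value at the upper endpoint p0U. A rule calibrated only at p0L = 0.10 can therefore exceed a nominal 5% level elsewhere in the interval, which is the inflation the abstract describes.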

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

Rank  Probability  Percentile  Papers in training set  Journal
1     33.0%        Top 0.1%    25                      Trials
2     9.1%         Top 0.1%    26                      Epidemiology
3     8.4%         Top 0.1%    20                      Research Synthesis Methods
(50% of probability mass above)
4     7.2%         Top 23%     4510                    PLOS ONE
5     6.3%         Top 0.1%    43                      BMC Medical Research Methodology
6     3.6%         Top 0.1%    13                      Pharmacoepidemiology and Drug Safety
7     3.6%         Top 1%      163                     BMC Medicine
8     3.6%         Top 0.2%    21                      Clinical and Translational Science
9     3.6%         Top 0.1%    28                      Journal of Clinical Epidemiology
10    1.9%         Top 53%     3102                    Scientific Reports
11    1.7%         Top 17%     1633                    PLOS Computational Biology
12    1.3%         Top 55%     4913                    Nature Communications
13    1.2%         Top 1%      58                      Clinical Cancer Research
14    0.9%         Top 52%     5422                    eLife
15    0.9%         Top 4%      127                     JAMA Network Open
16    0.7%         Top 2%      61                      Journal of the American Medical Informatics Association
17    0.7%         Top 0.6%    11                      Contemporary Clinical Trials Communications
18    0.7%         Top 0.2%    22                      Biometrics
19    0.7%         Top 0.3%    34                      Statistics in Medicine
20    0.7%         Top 3%      74                      International Journal of Epidemiology
21    0.7%         Top 1%      39                      Journal of Alzheimer’s Disease
22    0.7%         Top 0.8%    25                      Clinical Pharmacology & Therapeutics
23    0.7%         Top 5%      98                      PLOS Medicine