Agreement of an AI tool for joint space width measurement in radiographic knee osteoarthritis: data from the LOSEIT trial

Mayar, S.; Henriksen, M.; Christensen, R.; Hansen, P.; Bliddal, H.; Nybing, J. U.; Nielsen, C. T.; Gudbergsen, H.; Boesen, M. P.; Brejnbol, M. W.

2026-06-12 radiology and imaging
10.64898/2026.06.11.26355242 medRxiv

Background and rationale: Knee osteoarthritis (KOA) is a leading cause of lower limb disability worldwide, characterized by functional limitations, stiffness and pain. The incidence of KOA is strongly tied to age and obesity. It is a disabling disease that often makes patients less physically active, thus increasing the risk of other diseases and mortality [1]. The clinical diagnosis of KOA is based on the symptoms and functional limitations of the joint and is usually supported by a weight-bearing radiograph (X-ray) of the knee. Radiographic features such as Kellgren-Lawrence (KL) grade are used as eligibility criteria for clinical studies, while other features, such as joint space width (JSW), are used as endpoints for structural KOA progression [2,3]. While the use of these radiographic features is standard in academia, the use of JSW as a structural biomarker has received criticism. Critics point out that JSW is an indirect and projection-dependent measure of cartilage deterioration that is sensitive to technical factors such as the angulation of the X-ray beam and the positioning of the knee. Small differences in these factors can alter the measured joint space, so a measured change may not reflect true disease progression [4,5]. Despite these limitations, minimum joint space width (mJSW) remains one of the most widely used structural biomarkers in KOA trials and is currently one of the only structural imaging biomarkers accepted in regulatory guidance as evidence of disease modification in osteoarthritis (OA) drug development [3]. For JSW to be reliable and consistent in tracking the progression of KOA, the use of fixed-flexion devices is crucial to reduce the risk of spurious narrowing or widening of the radiographic joint space [6,7]. The LOSEIT trial, on which the present study is based, acknowledges the angulation problem and uses a standard clinical fixed-flexion device in weight-bearing posteroanterior (PA) views to obtain reliable JSW results [8].
Historically, a radiologist would annotate and grade radiographs of the knee joint to extract these features. However, manual reading and annotation are time-consuming and subject to notable interobserver variance [9]. With increasing computational power and the use of deep neural networks, off-the-shelf artificial intelligence (AI) tools have become available for automatic extraction of radiographic features. Automation would free up radiologists' time and provide more consistent measurements due to the reproducible nature of the models [10]. These tools have received regulatory approval for commercial use; however, regulatory approval does not guarantee uniform or bias-free performance on real-world data [11]. Furthermore, in a large multi-hospital chest X-ray study, Zech et al. showed that convolutional neural networks achieved worse results on data from other hospitals than on data from the original hospitals at which they were tested [12]. This highlights the risk of overestimating the accuracy of AI tools that have only been internally validated. External validation is therefore required when testing these AI models.
Objectives: The aim of this analysis is to evaluate the agreement of a commercially available AI tool for measuring JSW with best-practice radiologist annotation in the tibiofemoral joint of the knee, using radiographs stabilized with a fixed-flexion device and acquired as part of a clinical trial.
Methods: This study is a secondary analysis of data from the LOSEIT trial, a randomized, double-blind, placebo-controlled, single-center trial in which patients were randomized to either liraglutide or an identically appearing placebo after an initial weight-loss period to investigate the effects on KOA. Radiographs of the tibiofemoral joint were acquired at enrollment (week -8) and at end-of-trial (week 52), for a total acquisition-to-acquisition time of 60 weeks [13].
The primary analysis will assess agreement between the AI-derived and reference-derived change in JSW from enrollment to follow-up. Change will be calculated as follow-up minus enrollment, separately for the AI tool and the reference measurement. The main measure of interest will be the change in medial minimum JSW (mmJSW), with change in lateral minimum JSW (lmJSW), medial fixed JSW (mfJSW) and lateral fixed JSW (lfJSW) as secondary measures. The study will follow an equivalence framework using the two one-sided tests (TOST) approach, with a Bland-Altman analysis as the main outcome. The equivalence margin will be set at δ = 0.5 mm. Agreement consistent with equivalence will be considered established if both the upper limit of the 95% confidence interval (95% CI) for the upper limit of agreement (LoA) and the lower limit of the 95% CI for the lower LoA lie within the equivalence margins (±δ). The reference JSW will be the average of the measurements made by two independent resident radiologists. If the two radiologists' measurements differ by more than 0.40 mm, both will re-annotate the case independently. If the difference remains greater than 0.40 mm, a musculoskeletal radiology consultant will review the radiograph and establish the reference JSW. The index test will be the measurements output by the AI tool.
Populations: Patients aged 18 to 74 with symptomatic knee osteoarthritis, radiographically confirmed KL grade 1-3 and a BMI ≥ 27 who were motivated for weight loss, in accordance with the LOSEIT trial inclusion criteria.
Further statistical details
Sample size: Not applicable, as this is a secondary analysis.
Framework: This is an agreement study assessing the equivalence of a commercially available AI tool for radiographic evaluation of knee osteoarthritis with best-practice radiologist measurements.
Confidence intervals and P values: All 95% confidence intervals and P values will be two-sided.
Statistical software: SAS Studio and/or R version 4.2.2 (or newer).
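
The reference-standard rule described in Methods can be sketched as a small function. This is an illustrative sketch only, written in Python rather than the SAS or R named above; the function name `reference_jsw` and its arguments are hypothetical. It also assumes that, when the re-annotated measurements agree within 0.40 mm, their average becomes the reference, which the abstract implies but does not state.

```python
from statistics import mean

DISCREPANCY_MM = 0.40  # reader-mismatch threshold stated in the abstract


def _disagree(pair):
    """True if two readings differ by more than the threshold."""
    # Rounding guards against floating-point noise at exactly 0.40 mm.
    return round(abs(pair[0] - pair[1]), 6) > DISCREPANCY_MM


def reference_jsw(initial, reannotated=None, consultant=None):
    """Return the reference JSW (mm) for one radiograph.

    initial, reannotated: (reader_a, reader_b) measurements in mm.
    consultant: value set by the musculoskeletal radiology consultant.
    """
    if not _disagree(initial):
        return mean(initial)
    if reannotated is None:
        raise ValueError("readers differ by more than 0.40 mm: re-annotation needed")
    if not _disagree(reannotated):
        # Assumption: the average of the re-annotated pair is used.
        return mean(reannotated)
    if consultant is None:
        raise ValueError("difference persists: consultant review needed")
    return consultant
```

Raising an error when a required step is missing keeps unresolved cases from silently entering the reference data set.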
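
The equivalence criterion on the Bland-Altman limits of agreement can be illustrated in the same way. The abstract does not say how the confidence intervals for the LoA will be computed, so this sketch makes two assumptions: it uses the approximate standard error from Bland and Altman, s·sqrt(1/n + z²/(2(n − 1))), and a large-sample normal quantile in place of a t quantile, which would be more appropriate for small samples. The names `change`, `bland_altman_equivalence` and `Agreement` are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist, mean, stdev

DELTA_MM = 0.5  # equivalence margin stated in the abstract


@dataclass
class Agreement:
    bias: float               # mean of (AI change - reference change), mm
    lower_loa: float          # lower limit of agreement, mm
    upper_loa: float          # upper limit of agreement, mm
    lower_loa_ci_low: float   # lower CI bound of the lower LoA, mm
    upper_loa_ci_high: float  # upper CI bound of the upper LoA, mm
    equivalent: bool          # True if both outer bounds lie inside +/- delta


def change(enrollment, follow_up):
    """Change in JSW per radiograph pair: follow-up minus enrollment."""
    return [f - e for e, f in zip(enrollment, follow_up)]


def bland_altman_equivalence(ai_change, ref_change, delta=DELTA_MM, level=0.95):
    """Check whether the outer CI bounds of the LoA fall within +/- delta."""
    if len(ai_change) != len(ref_change) or len(ai_change) < 3:
        raise ValueError("need at least 3 paired changes of equal length")
    diffs = [a - r for a, r in zip(ai_change, ref_change)]
    n = len(diffs)
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1 denominator)
    # Assumption: normal quantile (about 1.96 for level=0.95), not a t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower_loa, upper_loa = bias - z * sd, bias + z * sd
    # Assumption: approximate standard error of a limit of agreement.
    se_loa = sd * sqrt(1 / n + z ** 2 / (2 * (n - 1)))
    lower_ci_low = lower_loa - z * se_loa
    upper_ci_high = upper_loa + z * se_loa
    equivalent = -delta < lower_ci_low and upper_ci_high < delta
    return Agreement(bias, lower_loa, upper_loa,
                     lower_ci_low, upper_ci_high, equivalent)
```

A call such as `bland_altman_equivalence(change(ai_enrollment, ai_follow_up), change(ref_enrollment, ref_follow_up))` would return the bias, both limits of agreement, their outer confidence bounds and the equivalence verdict for one JSW measure.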

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Arthritis Research & Therapy | 15 papers in training set | Top 0.1% | 14.7%
2. Bone | 22 papers in training set | Top 0.1% | 13.0%
3. Osteoarthritis and Cartilage | 30 papers in training set | Top 0.1% | 10.3%
4. JBMR Plus | 16 papers in training set | Top 0.1% | 8.6%
5. European Radiology | 14 papers in training set | Top 0.1% | 7.0%
50% of probability mass above
6. Scientific Reports | 3102 papers in training set | Top 16% | 6.5%
7. PLOS ONE | 4510 papers in training set | Top 41% | 3.1%
8. Nature Communications | 4913 papers in training set | Top 43% | 3.1%
9. BMJ Open | 554 papers in training set | Top 7% | 2.7%
10. Diagnostics | 48 papers in training set | Top 0.8% | 1.9%
11. RMD Open | 13 papers in training set | Top 0.2% | 1.5%
12. PLOS Digital Health | 91 papers in training set | Top 2% | 1.5%
13. Frontiers in Endocrinology | 53 papers in training set | Top 1% | 1.4%
14. Computer Methods and Programs in Biomedicine | 27 papers in training set | Top 0.5% | 1.3%
15. eLife | 5422 papers in training set | Top 49% | 1.3%
16. The Lancet Digital Health | 25 papers in training set | Top 0.8% | 1.0%
17. eBioMedicine | 130 papers in training set | Top 3% | 1.0%
18. Frontiers in Medicine | 113 papers in training set | Top 5% | 1.0%
19. Arthritis & Rheumatology | 33 papers in training set | Top 0.4% | 0.8%
20. Journal of Clinical Medicine | 91 papers in training set | Top 7% | 0.7%
21. Rheumatology | 21 papers in training set | Top 0.4% | 0.7%
22. Journal of Clinical Epidemiology | 28 papers in training set | Top 0.6% | 0.7%
23. Trials | 25 papers in training set | Top 2% | 0.7%
24. npj Digital Medicine | 97 papers in training set | Top 4% | 0.7%
25. Sensors | 39 papers in training set | Top 2% | 0.5%
26. Thorax | 32 papers in training set | Top 1.0% | 0.5%
27. Scientific Data | 174 papers in training set | Top 3% | 0.5%