Diagnostics

MDPI AG

Preprints posted in the last 90 days, ranked by how well they match Diagnostics's content profile, based on 48 papers previously published here. The average preprint has a 0.08% match score for this journal, so anything above that is already an above-average fit.

1
Diagnostic Test Accuracy of Commercially Available Tests for The Recurrence of Bladder Cancer: A Systematic Review and Meta-Analysis

Ntzani, E.; Tsarapatsani, K.-E.; Asimakopoulos, G.-A.; Jalal, H.; Kang, S. K.; Trikalinos, T. A.; CISNET Bladder Cancer Modeling Investigators

2026-02-03 oncology 10.64898/2026.02.02.26344871 medRxiv
Top 0.1%
19.3%

Objective: Bladder cancer (BC) is the most common malignancy of the urinary system and among the most frequently diagnosed cancers worldwide. This systematic review and meta-analysis aimed to evaluate the diagnostic accuracy of commercially available urinary biomarker tests (UBTs) for detecting BC recurrence, focusing on pooled sensitivity and specificity estimates across different tests. Methods: A systematic search was performed on PubMed and EMBASE up to May 2025 to identify studies assessing recurrence of BC in previously diagnosed patients using non-FDA-approved UBTs, including Xpert Bladder Cancer, Bladder Epicheck, ADXbladder and Uromonitor. Eligible studies were synthesized using the bivariate generalized linear mixed model (GLMM). Results: Out of 307 initially screened citations, 33 studies met the eligibility criteria, encompassing a total of 10,478 patients. Xpert Bladder Cancer was evaluated in 13 studies and Bladder Epicheck in 10 studies. ADXbladder and Uromonitor were assessed in four and six studies, respectively. Meta-analyses included 13 studies for Xpert Bladder Cancer and 10 studies for Bladder Epicheck, yielding pooled sensitivity (95% CI) and specificity (95% CI) estimates of 0.71 (0.61-0.79) and 0.78 (0.74-0.82) for Xpert Bladder Cancer, and 0.75 (0.61-0.86) and 0.90 (0.84-0.94) for Bladder Epicheck. For ADXbladder and Uromonitor, meta-analyses incorporated four and six studies, respectively, resulting in pooled sensitivity and specificity values of 0.55 (0.40-0.69) and 0.60 (0.44-0.75) for ADXbladder, and 0.77 (0.61-0.88) and 0.96 (0.91-0.98) for Uromonitor. Conclusions: This meta-analysis reveals that commercially available UBTs for BC recurrence have varying diagnostic accuracy. Among the evaluated tests, Uromonitor demonstrated the highest pooled sensitivity and specificity, while Xpert Bladder Cancer and Bladder Epicheck showed reliable diagnostic performance. Further research is needed, particularly for less extensively studied assays, to establish their diagnostic performance.

2
Impact of Image Bit Depth Reduction on Deep Learning Performance in Chest Radiograph Analysis: A Multi-institutional Study

Takita, H.; Mitsuyama, Y.; Walston, S. L.; Saito, K.; Sugibayashi, T.; Okamoto, M.; Suh, C. H.; Ueda, D.

2026-03-09 radiology and imaging 10.64898/2026.03.07.26347853 medRxiv
Top 0.1%
14.1%

Purpose: Medical imaging typically generates 12- to 16-bit formats, yet conversion to 8-bit is often required. While deep learning has been widely explored in medical imaging, the influence of image bit depth on model performance is not fully understood. This study evaluates the impact of conversion from 16-bit to 8-bit for sex, age, and obesity classification using deep learning. Materials and methods: In this retrospective, multi-institutional study, we analyzed 100,002 chest radiographs from 48,047 participants across three institutions. Three convolutional neural network architectures (ResNet52, EfficientNetB2, and ConvNeXtSmall) were trained on both 16-bit and 8-bit versions of the images. Model performance was evaluated using internal test datasets, randomly split multiple times, and an external test dataset. Statistical analysis included paired comparisons of area under the receiver operating characteristic curve (AUC-ROC) values, with Bonferroni correction for multiple comparisons. Results: Across all architectures and classification tasks, differences between 16-bit and 8-bit model performance were minimal (mean differences ranging from -0.218% to 0.184%). Statistical analyses revealed no significant differences in AUC-ROC values between bit depths for any model-task combination (all p-values > 0.05 after Bonferroni correction). Effect sizes were small to moderate (Cohen's d ranging from -0.415 to 0.391). Conclusion: Reducing image bit depth from 16-bit to 8-bit does not significantly impact the performance of deep learning models in chest radiograph analysis. These findings suggest that 8-bit images can be used for deep learning applications in medical imaging without compromising model performance, potentially allowing for more efficient data storage and processing.
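
The abstract does not say how the 16-bit images were mapped to 8 bits. As a minimal sketch, assuming a full-range linear rescale (the function name `to_8bit` is hypothetical and not taken from the study, which may have used windowing or another scheme), the conversion could look like this:

```python
def to_8bit(pixels_16bit):
    """Linearly rescale 16-bit pixel values (0..65535) to 8-bit (0..255).

    Assumption: a full-range linear mapping; the study may have used
    windowing or a different conversion.
    """
    out = []
    for value in pixels_16bit:
        if not 0 <= value <= 65535:
            raise ValueError(f"pixel value out of 16-bit range: {value}")
        # Integer arithmetic with round-to-nearest; 65535 maps to 255.
        out.append((value * 255 + 32767) // 65535)
    return out


row = [0, 257, 32768, 65535]  # illustrative values only
print(to_8bit(row))
```

Under this mapping many adjacent 16-bit levels collapse onto a single 8-bit level, which is the loss of intensity resolution whose effect on model performance the study measures.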

3
Feasibility Study on Training Dogs to Detect Lung Cancer: Findings of a Retrospective Evaluation

Grah, C.; Oei, S. L.; Ngandeu Schepanski, S.; Wuestefeld, H. F.; Blazejczyk, K.; Kalinka-Grafe, J.; Seifert, G.

2026-02-06 oncology 10.64898/2026.02.04.26345351 medRxiv
Top 0.1%
11.2%

Early detection is critical for lung cancer patients. One lung cancer detection method under study is the use of sniffer dogs. This study aimed to evaluate, retrospectively, the sensitivity and specificity of the Cancer Detection Dog Collective (CDDC®) method under training conditions. A team of five trained sniffer dogs analyzed breath samples from lung cancer patients and cancer-free volunteers; a sample was classified as positive if at least three dogs indicated it. Dog handlers and experimental observers were blinded to sample identity, and detection accuracy was assessed. The primary endpoint was sensitivity; selectivity and confounding factors were also assessed. Samples were collected in 2024 from 824 volunteers, including 111 with a confirmed diagnosis of lung cancer (mean age 60, range 34-80, 18% early-stage cancer, 46% not yet oncologically treated). A total of 11,900 breath samples were tested with 125 test runs per dog. Individually, the five dogs demonstrated detection performance with sensitivities between 82% and 89% and specificities of over 95%. The CDDC® dog team's collective decision yielded a sensitivity of over 95%, and the rate of false positives was 0%. Analysis of potential confounding factors revealed that weather conditions and supervisor skills were associated with the dogs' performance. The CDDC® method showed high consistency in training scenarios. Further studies should evaluate this method in a controlled clinical study alongside lung cancer screening.
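
The team decision rule described in this abstract (a sample counts as positive when at least three of the five dogs indicate it) can be sketched as follows. This is an illustration only: the function names and the toy data are hypothetical and are not taken from the study.

```python
def team_call(indications, threshold=3):
    """Return True if at least `threshold` dogs indicated the sample."""
    return sum(indications) >= threshold


def sensitivity_specificity(samples, threshold=3):
    """Compute (sensitivity, specificity) of the team decision.

    `samples` is a list of (has_cancer, indications) pairs, where
    `indications` holds one boolean per dog. The list must contain at
    least one cancer sample and one cancer-free sample.
    """
    tp = fn = tn = fp = 0
    for has_cancer, indications in samples:
        positive = team_call(indications, threshold)
        if has_cancer:
            tp += positive
            fn += not positive
        else:
            fp += positive
            tn += not positive
    return tp / (tp + fn), tn / (tn + fp)


# Made-up illustrative data, not results from the study.
toy_samples = [
    (True, [True, True, True, False, False]),      # 3 of 5: positive
    (True, [True, True, False, False, False]),     # 2 of 5: negative (missed)
    (False, [False, True, False, False, False]),   # 1 of 5: negative
    (False, [False, False, False, False, False]),  # 0 of 5: negative
]
print(sensitivity_specificity(toy_samples))
```

Requiring a majority trades a little individual sensitivity for fewer false positives: a single dog's false indication cannot make the team call positive.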

4
Real-Time Detection of Breast Cancer-Related Lymphedema with Shear-Wave Elastography: The Holder-Optimized Elastography Method

Hoe, Z. Y.; Ding, R.-S.; Chou, C.-P.; Hu, C.; Lee, C.-H.; Tzeng, Y.-D.; Pan, C.-T.; Lee, M.-C.; Lee, E. K.-L.

2026-03-02 radiology and imaging 10.64898/2026.02.25.26344759 medRxiv
Top 0.1%
10.7%

Background: Breast cancer-related lymphedema (BCRL) is a common complication following breast cancer treatment. While lymphoscintigraphy is considered the diagnostic gold standard, it is unsuitable for routine periodic monitoring or assessment of treatment efficacy. Shear wave elastography (SWE) offers a possible alternative, but traditional modes of operation limit its potential. Proposed Solutions: The Holder-Optimized Elastography (HOE) method is introduced to eliminate pressure issues introduced by manual operation of ultrasound probes by stabilizing them above the cutis. Methods: The HOE method was used to acquire ARFI images of high-velocity areas (HVAs, with shear wave velocity greater than 7 m/s) in limbs with and without BCRL (as confirmed and characterized by lymphoscintigraphy) in two cohorts of 15 and 125 patients. Results: The HOE method enabled ARFI elastography to directly and consistently visualize the effects caused by both obstructed lymphatic vessels and intraluminal lymphatic fluid as HVAs, whereas traditional hand-held methods did not. Inter-limb differences in HVA burden showed moderate diagnostic performance for detecting BCRL and grading obstruction with modest sensitivity. However, there was systematic underestimation of both early and confluent advanced lesions. Conclusion: HOE-based HVA imaging has potential for rapid and non-invasive monitoring of lymphedema course and treatment response and may serve as a useful adjunct to existing diagnostic tools for BCRL. However, further technical refinements and quantitative analytic methods will be required to fully exploit the richer SWV information provided by HOE and to enhance the diagnostic utility of HVAs. Summary Statement: The Holder-Optimized Elastography method ("HOE" method) increases the diagnostic capability of ARFI elastography for breast cancer-related lymphedema, allowing for the non-invasive detection of some lymphatic obstructions but not all.
Key Results: The Holder-Optimized Elastography (HOE) method revealed the effects caused by fluid-filled lymphatic vessels as "High-Velocity Areas" (HVAs), which are difficult to detect by conventional methods. HVA counts for detecting lymphedema (any obstruction vs. no obstruction) showed high specificity (0.86-1.00) but low sensitivity (0.57-0.67). Conversely, HVA counts for staging lymphedema (i.e., total vs. partial obstruction) showed high sensitivity (up to 1.00) but low specificity (0.48-0.66). The inter-limb difference of HVAs counted in whole-limb scans between affected and unaffected limbs (i.e., the "Global Mean Difference") provided the most balanced diagnostic performance (sensitivity 0.67-0.79, specificity 0.88-0.89).

5
Development and validation of a deep learning model for the automated detection of vertebral artery calcification on non-contrast head-and-neck computed tomography

Ueda, Y.; Okazaki, T.; Isome, H.; Patel, A.; Ichimasa, T.; Asaumi, R.; Kawai, T.; Suyama, K.; Hayashi, S.

2026-03-17 radiology and imaging 10.64898/2026.03.15.26348421 medRxiv
Top 0.1%
10.5%

Background: Vertebral artery calcification (VAC), a critical indicator of cerebrovascular disease, is often overlooked in head-and-neck imaging. Manual detection is time-consuming and prone to inter-observer variability. This study aimed to develop and validate a deep learning model for automated detection and quantitative risk assessment of VAC in non-contrast head-and-neck computed tomography (CT) images, bridging the diagnostic gap between dentistry and vascular medicine. Methods: We developed a deep learning model based on the ResNet-18 architecture, designated as Grayscale ResNet, optimized for single-channel CT images. The development followed a two-phase strategy: initial training on 539 axial images from head-and-neck CT images, followed by iterative refinement (fine-tuning) using a targeted dataset of clinically significant cases to ensure generalizability. The model's performance was evaluated using patient-level receiver operating characteristic (ROC) analysis and saliency map visualization for clinical interpretability. Results: The optimized model demonstrated robust performance in distinguishing between cases with and without VAC. In the independent cohort, the model achieved an area under the curve (AUC) of 0.846. At a specific threshold value (98.6%), the system yielded a sensitivity of 80.0% and a specificity of 90.6%. A saliency map analysis confirmed that the model consistently focused on anatomically relevant vascular regions. Conclusions: The proposed automated system provides an accurate and reliable method for VAC screening using routine head-and-neck CT scans. By transforming incidental imaging findings into a quantifiable risk index, this tool can serve as a vital decision-support system for dentists and radiologists, facilitating early patient referrals and contributing to global stroke prevention.

6
Usages and perceptions of artificial intelligence among French radiologists

Jean, A.; Benillouche, P.; Jacques, T.

2026-03-26 radiology and imaging 10.64898/2026.03.23.26348621 medRxiv
Top 0.1%
10.1%

This study analyzes the adoption, barriers, and expectations of French radiologists regarding the use of Artificial Intelligence (AI) solutions in their daily practice. Despite a recognition of AI's potential to make radiology more precise, predictive, and personalized, its adoption remains limited. The main obstacles identified are the high cost of those solutions and the insufficient equipment of French imaging centers with AI technologies. Nevertheless, the survey reveals a strong willingness to adopt, with over 70% of radiologists expressing their desire to use AI and 0% declaring a refusal to use it. Furthermore, the radiologists' fears of being replaced by AI are very low (0 to 8.8%).

7
Artificial Intelligence and Circulating microRNA Signatures for Early Breast Cancer Detection: A Systematic Review and Meta-Analysis

Solanki, S.; Solanki, N.; Prasad, J.; Prasad, R.; Harsulkar, A.

2026-03-30 oncology 10.64898/2026.03.29.26349657 medRxiv
Top 0.1%
10.1%

Background: Early breast cancer detection remains central to improving clinical outcomes, yet conventional screening pathways, particularly mammography, have recognized limitations in sensitivity, specificity, and performance in dense breast tissue. Circulating microRNAs (miRNAs) have emerged as promising minimally invasive biomarkers, while artificial intelligence and machine learning (AI/ML) offer powerful tools for identifying diagnostically relevant multi-marker patterns within complex biomarker datasets. This systematic review and meta-analysis evaluated the diagnostic performance of AI/ML-based circulating miRNA signatures for early breast cancer detection. Methods: A systematic search of PubMed/MEDLINE, Scopus, and Web of Science Core Collection was conducted from database inception to 31 December 2025. Studies were eligible if they were original human investigations evaluating circulating miRNAs using an AI/ML-based diagnostic model for breast cancer detection and reporting extractable diagnostic performance metrics. Study selection followed PRISMA 2020 and PRISMA-DTA guidance. Methodological quality was assessed using QUADAS-2. Pooled sensitivity and specificity were synthesized using a bivariate random-effects model, and overall diagnostic performance was summarized using a hierarchical summary receiver operating characteristic framework. Results: Seven studies met the inclusion criteria for qualitative synthesis, with eligible studies contributing to the quantitative analysis depending on data availability. Across the pooled analysis, AI/ML-based circulating miRNA models demonstrated good overall diagnostic performance, with a pooled AUC of 0.905 (95% CI: 0.890 to 0.921), pooled sensitivity of 81.3% (95% CI: 76.8% to 85.2%), and pooled specificity of 87.0% (95% CI: 82.4% to 90.7%). Heterogeneity was moderate for AUC (I² = 42.3%) and sensitivity (I² = 38.7%) and low for specificity (I² = 28.4%). Risk-of-bias assessment showed overall low-to-moderate methodological concern, with patient selection representing the most variable domain. Deeks' funnel plot asymmetry test showed no significant evidence of publication bias (p = 0.34). Conclusions: AI/ML-based circulating miRNA signatures show promising diagnostic accuracy for early breast cancer detection and may have value as non-invasive adjunctive tools within imaging-supported diagnostic pathways. However, the evidence base remains limited by methodological heterogeneity, variable validation rigor, and the predominance of retrospective case-control designs. Prospective, standardized, and externally validated studies are needed before routine clinical implementation can be justified.

8
Predicting 5-Year Breast Cancer Risk from Longitudinal Digital Breast Tomosynthesis: A Single-center Retrospective Study

Xu, Y.; Heacock, L.; Park, J.; Pasadyn, F. L.; Lei, Q.; Lewin, A.; Geras, K. J.; Moy, L.; Schnabel, F.; Shen, Y.

2026-03-24 radiology and imaging 10.64898/2026.03.22.26349001 medRxiv
Top 0.1%
9.2%

Background: Imaging-based breast cancer risk prediction models primarily use full-field digital mammography (FFDM). As digital breast tomosynthesis (DBT) has become a predominant screening modality in the United States, its potential for long-term breast cancer risk prediction remains under-explored. Objective: To develop and evaluate a deep learning model that uses longitudinal DBT exams to predict long-term breast cancer risk. Methods: This retrospective study included 313,531 DBT exams from 161,165 women (mean age, 58.5, std 11.7 years) between January 2016 and August 2020 at Institute A. A risk prediction (DRP) model was developed to estimate 2-5 year breast cancer risk using longitudinal DBT exams, patient age and breast density. Model performance was compared with a single-time point DBT model, the Mirai model using same-day FFDM, and the Tyrer-Cuzick model using the area under the receiver operating characteristic curve (AUC), time-dependent concordance index, and integrated Brier score. Results: In an independent test set (n = 34,580), the longitudinal DRP model achieved a 5-year AUC of 0.720 (95% CI, 0.703-0.738), improving on the single time point DRP model (AUC, 0.706; 95% CI, 0.687-0.724; p < 0.001) and the Mirai model (AUC, 0.687; 95% CI, 0.668-0.705; p < 0.001). In a matched case-control cohort (n=432), the DRP model achieved a 5-year AUC of 0.676 (95% CI, 0.626-0.727), compared with 0.567 (95% CI, 0.514-0.621; p < 0.001) for the Tyrer-Cuzick model. The model reclassified 37.6% (705/1,877) of women with extremely dense breasts as average risk, with a 5-year cancer incidence of 0.7% (5/705), and identified 15.5% (404/2,605) of women with fatty breasts as high risk, with a 5-year cancer incidence of 2.5% (10/404). Conclusion: A deep learning model using longitudinal DBT examinations improved long-term breast cancer risk prediction compared with FFDM-based and clinical risk models. 
Clinical Impacts: Longitudinal DBT-based risk prediction may enable dynamic risk assessment using screening images, supporting personalized screening strategies and more targeted use of supplemental imaging.

9
A clinical pilot study for personalized risk-based breast cancer screening utilizing the polygenic risk score

Hovda, T.; Sober, S.; Padrik, P.; Kruuv-Kao, K.; Grindedal, E. M.; Vamre, T. B. A.; Eikeland, E.; Hofvind, S.; Sahlberg, K. K.

2026-03-16 radiology and imaging 10.64898/2026.03.07.26347839 medRxiv
Top 0.1%
8.7%

Background: Population-based mammographic screening is primarily age-based. However, breast cancer risk is multifactorial, and women may benefit from personalized risk-based screening. This pilot study aimed to explore the use of polygenic risk score (PRS) as a tool for risk stratification in personalized screening. Methods: We included 80 women aged 40-49 years referred for clinical mammography. Exclusion criteria were prior breast cancer or premalignant breast disease, and previous genetic testing. After DNA collection, PRS was calculated from 2,805 single nucleotide polymorphisms (SNPs). Screening recommendations were based on each participant's relative 10-year breast cancer risk, estimated from PRS and compared with the 10-year risk of an average woman of the same age. Women with a self-reported family history of cancer meeting standard criteria were referred for gene panel testing for pathogenic variants in high-risk genes. A follow-up questionnaire regarding participants' experiences was distributed 6-9 months after PRS testing. Results: Mean age was 45.2 years (SD 2.8). Mean relative 10-year breast cancer risk was 1.18 (SD 0.57). Based on PRS, 40 participants were recommended standard biennial screening at 50-69 years, while 40 were advised to begin biennial screening before age 50. Among these, 7 were recommended annual mammography from when their 10-year risk reached twice that of an average 50-year-old. Twenty-one women underwent gene panel testing; no pathogenic variants in breast cancer genes were identified. Five women were advised annual mammography from 40-60 years due to family history of breast cancer, regardless of PRS. Most respondents viewed breast cancer risk assessment positively and did not report increased anxiety after testing. Conclusions: Polygenic risk score testing may influence current screening recommendations and contribute to more personalized risk-based breast cancer screening strategies.

10
CT-based Automated Volumetry as a Biomarker of Global and Split Renal Function in Living Kidney Donors

Fink, A.; Burzer, F.; Sacalean, V.; Rau, S.; Kaestingschaefer, K. F.; Rau, A.; Koettgen, A.; Bamberg, F.; Jaenigen, B.; Russe, M. F.

2026-02-26 radiology and imaging 10.64898/2026.02.24.26346974 medRxiv
Top 0.1%
8.7%

Background: Kidney volumetry derived from CT has been proposed as a surrogate of renal function in living kidney donor evaluation. However, clinical integration has been limited by reader-dependent workflows and semiautomatic methods susceptible to image quality. Purpose: To evaluate whether fully automated CT-based segmentation of renal cortex, medulla and total parenchymal volume provides reproducible volumetric biomarkers associated with global and split renal function in living kidney donor candidates. Materials and Methods: In this retrospective single-center study, 461 living kidney donor candidates (2003-2021) underwent contrast-enhanced abdominal CT. A convolutional neural network was trained to automatically segment cortical, medullary, and total parenchymal volumes on arterial-phase images. Segmentation performance was evaluated against manual reference annotations. Volumes were indexed to body surface area. Associations with eGFR, 24-hour creatinine clearance, cystatin C, and tubular clearance were assessed using the Spearman correlation coefficient (ρ), and side-specific volume fractions were compared with scintigraphy-derived split function. Results: Automated segmentation achieved excellent agreement with expert reference segmentations (Dice 0.95 for cortex; 0.90 for medulla). eGFR correlated moderately with cortical (ρ = 0.46) and total parenchymal volume (ρ = 0.45), and modestly with medullary volume (ρ = 0.30). Similar associations were observed for other global measures, with the strongest correlation for cortical volume and tubular clearance (ρ = 0.53). Side-specific volume fractions correlated with scintigraphy-derived split renal function (ρ = 0.49-0.56; all p < 0.001). Conclusion: Automated CT-based renal subcompartment segmentation provides reproducible volumetric biomarkers within routine donor evaluation. Cortical volume performs comparably to total parenchymal volume and tracks split renal function at the cohort level, suggesting potential utility in donor assessment.
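
For readers unfamiliar with the Dice score reported in this abstract, the sketch below computes the standard Dice coefficient, 2|A∩B| / (|A| + |B|), on two flattened binary masks. The function name, the toy masks, and the convention for two empty masks are assumptions for illustration; the abstract does not describe the authors' evaluation code.

```python
def dice(mask_a, mask_b):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two equal-length binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Convention assumed here: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2 * intersection / total


# Toy flattened masks (1 = voxel labeled as cortex), not study data.
automated = [0, 1, 1, 1, 0, 0]
manual = [0, 1, 1, 0, 1, 0]
print(dice(automated, manual))
```

A value of 1.0 means the automated and manual masks overlap exactly, and 0.0 means they share no voxels.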

11
Accessible and Reproducible Renal Cell Carcinoma Research Through Open-Sourcing Data and Annotations

de Boer, S.; Häntze, H.; Ziegelmayer, S.; van Ginneken, B.; Prokop, M.; Bressem, K. K.; Hering, A.

2026-04-23 radiology and imaging 10.64898/2026.04.22.26351451 medRxiv
Top 0.1%
8.7%

Background: Medical imaging, especially computed tomography and magnetic resonance imaging, is essential in clinical care of patients with renal cell carcinoma (RCC). Artificial intelligence (AI) research into computer-aided diagnosis, staging and treatment planning needs curated and annotated datasets. Across the literature, The Cancer Genome Atlas (TCGA) datasets are widely used for model training and validation. However, re-annotation is often necessary due to limited access to public annotations, raising entry barriers and hindering comparison with prior work. Methods: We screened 1915 CT scans from three TCGA-RCC databases and employed a segmentation model to annotate kidney lesions. After a metadata-based exclusion step, we hosted a reader study with all papillary (n=56), chromophobe (n=27) and 200 randomly selected clear cell RCC cases. Two students quality-checked and corrected the data and annotated tumors and cysts. Uncertain cases were checked by a board-certified radiologist. Results: After data exclusion and quality control, a total of 142 annotated CT scans from 101 patients (26 female, 75 male, mean age 56 years) remained. This includes 95 CTs with clear cell RCC, 29 with papillary RCC and 18 with chromophobe RCC. Images and voxel-level annotations of kidneys and lesions are open sourced at https://zenodo.org/records/19630298. Conclusion: By making the annotations open-source, we encourage accessible and reproducible AI research for renal cell carcinoma. We invite other researchers who have previously annotated any of these cohorts to share their annotations.

12
Protocol for rapid allelic discrimination qPCR genotyping of the Winnie mouse model

Mansoori, B.; Liang, C.

2026-02-18 molecular biology 10.64898/2026.02.17.704640 medRxiv
Top 0.1%
8.6%

Winnie mice are a widely used in vivo model of inflammatory bowel disease carrying a missense mutation in the Muc2 gene. Here, we present a protocol for genotyping Winnie mice using TaqMan allelic discrimination quantitative PCR. We describe tissue collection, rapid crude DNA extraction, probe-based amplification with dual-labeled fluorophores, and fluorescence-based genotype calling in a single reaction. This protocol enables qualitative SNP genotyping without post-amplification processing and can be readily adapted to other defined point mutations. [Graphical abstract: Figure 1] Highlights: Allelic discrimination qPCR protocol for genotyping the Muc2 p.Cys52Tyr mutation using dual-labeled hydrolysis probes. Enables rapid discrimination of wild-type, heterozygous, and mutant alleles in a single reaction. Compatible with standard real-time PCR instruments and requires no post-PCR processing. Supports high-throughput genotyping from crude DNA with minimal hands-on time.

13
A Retrospective Multi-Source Clinical Validation of Lenek Intelligent Radiology Assistant: An Artificial Intelligence-Based Chest Radiograph Screening and Triage System for High-Burden Pulmonary and Cardiac Conditions in India

Singh, V.; Jhamb, A.; Sil, S.; Kumar, S.; Agrawal, C.; Pareek, A.; Gautam, A.; Parale, G.; Singh, S.; Padmanabhan, D.

2026-03-16 radiology and imaging 10.64898/2026.03.14.26348373 medRxiv
Top 0.1%
8.6%

Background: A critical radiologist shortage exists in India, leading to delayed chest radiograph (CXR) interpretation. This leads to disease progression, higher morbidity, and mortality. Artificial intelligence-based CXR interpretation by Lenek Intelligent Radiology Assistant (LIRA) is a promising solution. This study aims to establish the screening and triaging capabilities of LIRA by assessing its accuracy in detecting abnormalities and pathologies in CXRs from geographically diverse institutions. Methods: We conducted a retrospective multi-source validation of the diagnostic accuracy of LIRA for the detection of general abnormalities, tuberculosis, consolidation, pleural effusion, pneumothorax, and cardiomegaly. De-identified chest radiographs were input into LIRA models. The obtained interpretations were compared to the established ground truth reporting for the calculation of sensitivity, specificity, and AUROC with 95% CI for individual pathologies across varying probability thresholds. Results: LIRA demonstrated high sensitivity for general abnormality detection (AUROC 0.93-0.986, 84.4-97.1% sensitivity, 88.9-92.4% specificity) and tuberculosis triaging (Shenzhen & Montgomery: 88.5-89.7% sensitivity, 89.9-90.5% specificity; Jaypee: 98.7% sensitivity, 63.6% specificity). For consolidation (AUROC 0.884-0.895, 96.4-96.9% sensitivity, 70.8-77.1% specificity), pleural effusion (AUROC 0.942-0.967, 79.7-99.1% sensitivity, 81.2-87.7% specificity), pneumothorax (AUROC 0.87, 90.6-94.8% sensitivity, 79.5-82.7% specificity) and cardiomegaly (AUROC 0.883, 95.1% sensitivity, 81.6% specificity), the model exhibited commendable accuracy as well. Conclusions: The diagnostic performance of LIRA was consistent across various pathologies and chest radiographs from diverse geographic locations, with particular strengths in abnormality detection and tuberculosis screening. The risk-stratified triaging and high sensitivity of LIRA make it a reliable adjunct solution to address radiologist shortages, reduce turnaround times, and support India's tuberculosis elimination goals.

14
SCOPE: AI-Assisted Early Detection of Potentially Curable Pancreatic Neoplasms on CT from Local and Global Information

Oviedo, F.; Lopez Ramirez, F.; Blanco, A.; Facciola, J.; Kwak, S.; Zhao, J. M.; Syailendra, E. A.; Tixier, F.; Dodhia, R.; Hruban, R. H.; Weeks, W. B.; Lavista Ferres, J. M.; Chu, L. C.; Fishman, E. K.

2026-02-05 radiology and imaging 10.64898/2026.02.04.26345495 medRxiv
Top 0.1%
8.5%

Purpose: To develop SCOPE (Small-lesion COntextual Pancreatic Evaluator), a deep learning model designed to improve CT detection of small pancreatic lesions, namely pancreatic ductal adenocarcinoma (PDAC), pancreatic neuroendocrine tumors (PanNETs), and cystic lesions, by integrating voxel-level features with global context. Materials and Methods: This retrospective study used three independent datasets. A development cohort of 4,065 contrast-enhanced CT scans was used to train a deep neural network that performs pancreas, ductal, and lesion segmentation with an integrated classification head. A metamodel combined segmentation-derived and global contextual signals for case-level prediction. Performance was assessed on (1) an internal holdout test set (n = 605), (2) an external multi-institutional PDAC dataset from the PANORAMA challenge (n = 2,238), and (3) an expert-curated small-lesion reader study (n = 200). Areas under the receiver operating characteristic curve (AUCs) were compared using the DeLong test; sensitivities and specificities were compared using McNemar's test. Results: On the internal test set, SCOPE improved lesion-versus-normal AUC compared with the best segmentation baseline (0.974 [95% CI: 0.964, 0.984] vs 0.956; P = .006) and increased small-lesion sensitivity at 95% specificity (0.727 [95% CI: 0.653, 0.801] vs 0.600; P = .012). Performance gains were observed across lesion classes, with significant improvements for PDAC and PanNET detection. On the external dataset, SCOPE improved PDAC-versus-non-PDAC AUC (0.978 vs 0.861, P < .001) and achieved higher sensitivity at 90% and 95% specificity without retraining. For the small-lesion reader study, SCOPE achieved lesion-versus-normal AUC of 0.922 and performed within the range of subspecialty abdominal radiologists; SCOPE provided the correct diagnosis in 14.5% (29/200) of cases in which two or more readers were incorrect. Conclusion: SCOPE improves early detection of small, potentially curable pancreatic lesions on CT by combining local segmentation and global pancreatic context. Its consistent performance across internal, external, and reader datasets supports potential use as a concurrent reader for earlier and more accurate pancreatic lesion detection.

15
UCSF RMaC: University of California San Francisco 3D Multi-Phase Renal Mass CT Dataset with Tumor Segmentations

Sahin, S.; Diaz, E.; Rajagopal, A.; Abtahi, M.; Jones, S.; Dai, Q.; Kramer, S.; Wang, Z.; Larson, P. E. Z.

2026-02-12 radiology and imaging 10.64898/2026.02.11.26346096 medRxiv
Top 0.1%
8.4%

Current standard of care imaging practices cannot reliably differentiate among certain renal tumors such as benign oncocytoma and clear cell renal cell carcinoma (RCC), and between low and high grade RCCs. Previous work has explored using deep learning, radiomics, and texture analysis to predict renal tumor subtypes and differentiate between low and high grade RCCs with mixed success. To further this work, large diverse datasets are needed to improve model performance and provide strong evaluation sets. In this work, a dataset of 831 multi-phase 3D CT exams was curated. Each exam contains up to three contrast-enhanced CT phases. Tumor outlines or bounding boxes were annotated and registered to the image volumes. The pathology results for each tumor and relevant patient metadata are also included.

16
Clinical validation of automated and multiple manual callosal angle measurement methods in idiopathic normal pressure hydrocephalus

Seo, W.; Jabur Agerberg, S.; Rashid, A.; Holmstrand, N.; Nyholm, D.; Virhammar, J.; Fallmar, D.

2026-02-14 radiology and imaging 10.64898/2026.02.12.26346185 medRxiv
Top 0.1%
8.2%

Introduction: Idiopathic normal pressure hydrocephalus (iNPH) is a partially reversible neurological disorder in which imaging biomarkers support diagnosis and surgical decision-making. The callosal angle (CA) is one of the most robust radiological markers of iNPH and has also been associated with postoperative shunt outcome. However, several manual measurement variants exist, and artificial intelligence (AI)-based tools now enable automatic CA measurement. Materials and Methods: In total, 71 patients (40 with confirmed iNPH and 31 controls) were included. Six predefined manual methods for measuring CA were applied to preoperative 3D T1-weighted MRI and evaluated for diagnostic performance and interobserver agreement. An AI-derived automatic CA (cMRI from Combinostics) was included as a seventh method and compared with the traditional manual method (perpendicular to the bicommissural plane and through the posterior commissure). Automatic measurements were additionally assessed in pre- and postoperative scans to evaluate robustness against shunt-related artifacts. Results: All seven CA variants significantly differentiated iNPH patients from controls (p < 0.05). The traditional method showed the highest discriminative performance (AUC = 0.986, SE = 0.012), while alternative planes demonstrated slightly lower accuracy (AUC range = 0.957-0.978). Interobserver agreement for manual measurements was good to excellent (ICC = 0.687-0.977). Automatic CA measurements showed excellent correlation with the traditional method (preoperative ICC = 0.92; postoperative ICC = 0.96). Conclusion: Although several CA positions perform comparably, the traditional method remains marginally superior and is best supported by the literature. Automated CA measurements closely match expert manual assessment in pre- and postoperative imaging, supporting clinical implementation.

17
External validation of self-supervised transfer learning for noninvasive molecular subtyping of pediatric low-grade glioma using T2-weighted MRI

Yoo, J. J.; Tak, D.; Namdar, K.; Wagner, M. W.; Liu, A.; Tabori, U.; Hawkins, C.; Ertl-Wagner, B. B.; Kann, B. H.; Khalvati, F.

2026-01-30 radiology and imaging 10.64898/2026.01.27.26344883 medRxiv
Top 0.1%
7.5%

Purpose: To externally evaluate three binary classification models designed to differentiate the molecular subtype of pediatric low-grade glioma (pLGG) between BRAF Fusion, BRAF Mutation, and Wild Type on T2-weighted magnetic resonance imaging using self-supervised transfer learning, which enables effective performance in a low-data setting. Materials and Methods: This retrospective study evaluates pLGG molecular subtyping models, pre-trained using data collected at Dana-Farber Cancer Institute/Boston Children's Hospital, on two datasets from the Hospital for Sick Children: one consisting of patients identified from the electronic health record between January 2000 and December 2018 (n=336) and another consisting of patients identified from the electronic health record between January 2019 and April 2023 (n=87). These datasets consist of T2-weighted MRI with pLGG and corresponding genetic marker identifications, labelled as BRAF Fusion, BRAF Mutation, or Wild Type. The datasets included manually annotated ground-truth segmentations that were used in the classification pipeline during evaluation. The models were evaluated using the area under the receiver operating characteristic curve (AUC). To obtain per-class probabilities across all three considered molecular subtypes, we used the output probabilities from each binary model as logits input to a Softmax function. These probabilities were used to determine the AUC of the models on each evaluated dataset. Results: The models achieved a macro-average AUC of 0.7671 on the newer dataset from the Hospital for Sick Children but a lower macro-average AUC of 0.6463 on the older dataset from the same hospital. Conclusions: The evaluated pLGG molecular subtyping models have the potential for effective generalization but may require further fine-tuning for consistent performance across varying datasets.
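
The abstract above describes feeding the output probabilities of three binary models into a Softmax function to obtain per-class probabilities, then scoring with a macro-average AUC. The following is a minimal sketch of that idea, not the authors' pipeline. The one-vs-rest definition of the macro-average, the function names, and every number in the example are assumptions made for illustration.

```python
import math

CLASSES = ["BRAF Fusion", "BRAF Mutation", "Wild Type"]

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def binary_auc(scores, labels):
    """AUC as the probability that a positive case outranks a negative case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_auc(prob_rows, true_classes):
    """Unweighted mean of one-vs-rest AUCs (assumed definition of macro-average)."""
    aucs = []
    for k in range(len(CLASSES)):
        scores = [row[k] for row in prob_rows]
        labels = [c == k for c in true_classes]
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / len(aucs)

# Hypothetical outputs of the three binary models for four patients,
# ordered as in CLASSES. Each row is treated as logits, as the abstract describes.
binary_outputs = [
    [0.80, 0.30, 0.10],
    [0.20, 0.70, 0.25],
    [0.15, 0.20, 0.90],
    [0.60, 0.55, 0.05],
]
true_classes = [0, 1, 2, 1]  # indices into CLASSES (hypothetical labels)

prob_rows = [softmax(row) for row in binary_outputs]
score = macro_auc(prob_rows, true_classes)
```

Softmax preserves the ranking of the three outputs within a patient but rescales them so they sum to one, which makes the scores comparable across patients when each class is evaluated one-vs-rest.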

18
Artificial Intelligence in Mammography Screening in Norway (AIMS Norway): Protocol for a randomized controlled trial

Holen, A. S.; Larsen, M.; Hofvind, S.

2026-03-15 radiology and imaging 10.64898/2026.03.13.26348320 medRxiv
Top 0.1%
7.4%

Background and Objective: Increasing screening volumes, combined with a global shortage of radiologists and a high proportion of normal mammograms, challenge the efficiency and sustainability of breast cancer screening. Artificial intelligence (AI) has the potential to improve resource allocation, workflow efficiency, and diagnostic performance by supporting and partially replacing radiologists in the interpretation process. This randomized, controlled, parallel-group, non-inferiority, single-blinded trial evaluates whether an AI-supported reading strategy, involving one or two radiologists depending on AI risk stratification, is non-inferior to standard independent double reading. The primary outcome is the number of screen-detected breast cancer cases in each group. Methods: Women invited to BreastScreen Norway in the Western, Central, and Northern Norway Regional Health Authorities are eligible for inclusion. Following written informed consent, participants are randomized 1:1 to the control group (standard independent double reading by two radiologists) or the intervention group. In the intervention group, mammograms are analyzed using Transpara. Examinations with AI scores of 1-7 are interpreted by a single radiologist, whereas examinations with scores of 8-10 undergo independent double reading. Radiologists are blinded to AI scores and AI image markings during the initial interpretation; this information is disclosed during consensus meetings. Non-inferiority will be assessed by estimating the confidence interval for the difference in screen-detected cancer rates between groups. Non-inferiority will be concluded if the upper bound of the confidence interval does not exceed the predefined non-inferiority margin. Conclusions: The trial addresses a critical challenge in breast cancer screening: maintaining diagnostic performance while improving efficiency in the context of workforce constraints and a high prevalence of normal examinations. By evaluating a risk-stratified AI-supported reading strategy within a population-based screening program, the study will provide important evidence on whether AI can be safely integrated to optimize workload distribution while preserving cancer detection rates. Trial registration: The ClinicalTrials.gov registry (NCT06032390).
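
The non-inferiority rule stated in the protocol (conclude non-inferiority if the upper confidence bound of the rate difference does not exceed the margin) can be sketched as below. This is an illustration under assumptions, not the trial's statistical analysis plan: the difference is taken as control minus intervention, the interval is a simple Wald interval, and the counts, margin, and confidence level are all hypothetical.

```python
import math
from statistics import NormalDist

def noninferiority_check(x_ctrl, n_ctrl, x_int, n_int, margin, conf_level=0.95):
    """Wald confidence interval for (control rate - intervention rate).

    Returns (difference, (lower, upper), non_inferior), where non_inferior is
    True when the upper bound does not exceed the margin.
    """
    p_ctrl = x_ctrl / n_ctrl
    p_int = x_int / n_int
    diff = p_ctrl - p_int
    se = math.sqrt(p_ctrl * (1 - p_ctrl) / n_ctrl + p_int * (1 - p_int) / n_int)
    z = NormalDist().inv_cdf(0.5 + conf_level / 2)  # two-sided critical value
    lower, upper = diff - z * se, diff + z * se
    return diff, (lower, upper), upper <= margin

# Hypothetical numbers: 600 vs 610 screen-detected cancers per 100,000 women,
# with an assumed margin of 1 per 1,000 screened.
diff, (lower, upper), non_inferior = noninferiority_check(
    x_ctrl=600, n_ctrl=100_000, x_int=610, n_int=100_000, margin=0.001
)
```

Defining the difference as control minus intervention means a positive value is a loss in detection under AI-supported reading, so the upper bound is the worst plausible loss and is the quantity compared with the margin.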

19
Fourier Analysis of Bilateral Breast Asymmetry for Short-term Breast Cancer Risk Prediction

Heine, J.; Fowler, E.; Egan, K.; Weinfurtner, R. J.; Balagurunathan, Y.; Schabath, M. B.

2026-03-30 radiology and imaging 10.64898/2026.03.27.26349508 medRxiv
Top 0.1%
7.3%

A substantial body of evidence demonstrates that measures from mammograms are predictive of breast cancer risk. In this matched case-control study, mammograms acquired near the time of diagnosis were analyzed to investigate bilateral breast asymmetry as a measure for short-term risk prediction. Specifically, contralateral breast images were compared with measures derived in the Fourier domain (FD); this technique summarizes power in concentric radial bands that cover the Fourier plane. Equivalently, this approach can be described as a multiscale characterization of the image. The summarized power difference between respective contralateral bands produces an asymmetry measure. Full-field digital mammography (FFDM) and synthetic two-dimensional images from digital breast tomosynthesis (DBT) were investigated for women who had both types of mammograms acquired at the same time. Odds ratios (ORs) and the areas under the receiver operating characteristic curves (Azs) were generated from conditional logistic regression modeling with 95% confidence intervals. Raw unprocessed FFDM images produced significant findings: OR = 1.90 (1.58, 2.29) and Az = 0.72 (0.67, 0.76) per one standard deviation unit. Associations were significant but attenuated for both clinical FFDM and DBT images: OR = 1.31 (1.11, 1.54) and Az = 0.63 (0.58, 0.67); and OR = 1.48 (1.25, 1.76) and Az = 0.65 (0.60, 0.70), respectively. Results suggest that clinical FFDM and DBT images are inferior to raw FFDM images in capturing breast asymmetry, with a loss of information relevant to breast cancer risk prediction. Moreover, these DBT images have lower spatial resolution but produced stronger associations than the clinical FFDM images.
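
The Fourier-domain idea in the abstract above (sum spectral power in concentric radial bands, then compare matching bands between the two breasts) can be sketched with NumPy as follows. This is an illustrative approximation rather than the authors' method: the number of bands, the equal-width band edges, the use of a summed absolute difference as the asymmetry score, and the random test images are all assumptions.

```python
import numpy as np

def radial_band_power(image, n_bands=8):
    """Sum the 2D power spectrum inside n_bands concentric rings around zero frequency."""
    img = np.asarray(image, dtype=float)
    # Shift the zero-frequency term to the centre of the Fourier plane.
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    rows, cols = img.shape
    y, x = np.indices((rows, cols))
    radius = np.hypot(y - rows // 2, x - cols // 2)
    # Equal-width rings from the centre out to the farthest corner (assumed layout).
    edges = np.linspace(0.0, radius.max(), n_bands + 1)
    band = np.clip(np.digitize(radius, edges) - 1, 0, n_bands - 1)
    return np.bincount(band.ravel(), weights=power.ravel(), minlength=n_bands)

def asymmetry_score(left_image, right_image, n_bands=8):
    """Hypothetical summary: total absolute difference between matching band powers."""
    left = radial_band_power(left_image, n_bands)
    right = radial_band_power(right_image, n_bands)
    return float(np.abs(left - right).sum())

# Random arrays standing in for a pair of contralateral images (not real mammograms).
rng = np.random.default_rng(0)
left_image = rng.random((32, 32))
right_image = rng.random((32, 32))

bands = radial_band_power(left_image)
score = asymmetry_score(left_image, right_image)
```

Because each ring covers a range of spatial frequencies, the band powers act as a coarse multiscale description of the image: inner rings capture large structures and outer rings capture fine texture.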

20
Explainable, Lightweight Deep Learning for Colorectal Cancer Microsatellite Instability Screening in Low-Resource Settings

Adegbosin, O. T.; Patel, H.

2026-04-20 oncology 10.64898/2026.04.18.26350809 medRxiv
Top 0.2%
6.7%

Background: Microsatellite stability status determination is important for prognostication and therapeutic decision making in colorectal cancer management, but the conventional methods for this assessment are not readily available, especially in low- and middle-income countries. Deep learning (DL) models have been proposed for addressing this problem; however, potential computational cost due to model complexity and inadequate explainability may limit their adoption in low-resource settings. This study explored the potential of explainable lightweight models for detection of microsatellite instability in colorectal cancer. Methods: DL models were trained using a public dataset of colorectal cancer histology images and then used to classify a set of test images into one of two classes: microsatellite instability or microsatellite stability. The models were compared for efficiency. Gradient-weighted class activation mapping (Grad-CAM) was used to interpret the models' decision making. Results: The simpler convolutional neural network (CNN) trained from scratch had modest performance (accuracy=0.757, area under receiver-operating characteristic curve [AUROC]=0.840). With an attention mechanism added, these values increased, but specificity and sensitivity decreased. Pretrained models performed better than the ones trained from scratch, and EfficientNet_B0 had the best balance of high performance and low computational requirements (accuracy=0.936, AUROC=0.990, negative predictive value=0.923, specificity=0.953, 4,010,000 trainable parameters, 0.38 gigaFLOPs). However, a simple CNN model with an attention mechanism had the best interpretability based on Grad-CAM. Conclusion: This study demonstrated that DL models that are lightweight compared with previously proposed ones can be useful for colorectal cancer microsatellite instability screening in resource-limited settings while balancing performance and computational efficiency.