Diagnostics
○ MDPI AG
Preprints posted in the last 30 days, ranked by how well they match Diagnostics's content profile, based on 48 papers previously published here. The average preprint has a 0.08% match score for this journal, so anything above that is already an above-average fit.
Solanki, S.; Solanki, N.; Prasad, J.; Prasad, R.; Harsulkar, A.
Background: Early breast cancer detection remains central to improving clinical outcomes, yet conventional screening pathways, particularly mammography, have recognized limitations in sensitivity, specificity, and performance in dense breast tissue. Circulating microRNAs (miRNAs) have emerged as promising minimally invasive biomarkers, while artificial intelligence and machine learning (AI/ML) offer powerful tools for identifying diagnostically relevant multi-marker patterns within complex biomarker datasets. This systematic review and meta-analysis evaluated the diagnostic performance of AI/ML-based circulating miRNA signatures for early breast cancer detection. Methods: A systematic search of PubMed/MEDLINE, Scopus, and Web of Science Core Collection was conducted from database inception to 31 December 2025. Studies were eligible if they were original human investigations evaluating circulating miRNAs using an AI/ML-based diagnostic model for breast cancer detection and reporting extractable diagnostic performance metrics. Study selection followed PRISMA 2020 and PRISMA-DTA guidance. Methodological quality was assessed using QUADAS-2. Pooled sensitivity and specificity were synthesized using a bivariate random-effects model, and overall diagnostic performance was summarized using a hierarchical summary receiver operating characteristic framework. Results: Seven studies met the inclusion criteria for qualitative synthesis, with eligible studies contributing to the quantitative analysis depending on data availability. Across the pooled analysis, AI/ML-based circulating miRNA models demonstrated good overall diagnostic performance, with a pooled AUC of 0.905 (95% CI: 0.890 to 0.921), pooled sensitivity of 81.3% (95% CI: 76.8% to 85.2%), and pooled specificity of 87.0% (95% CI: 82.4% to 90.7%). Heterogeneity was moderate for AUC (I² = 42.3%) and sensitivity (I² = 38.7%) and low for specificity (I² = 28.4%).
Risk-of-bias assessment showed overall low-to-moderate methodological concern, with patient selection representing the most variable domain. Deeks' funnel plot asymmetry test showed no significant evidence of publication bias (p = 0.34). Conclusions: AI/ML-based circulating miRNA signatures show promising diagnostic accuracy for early breast cancer detection and may have value as non-invasive adjunctive tools within imaging-supported diagnostic pathways. However, the evidence base remains limited by methodological heterogeneity, variable validation rigor, and the predominance of retrospective case-control designs. Prospective, standardized, and externally validated studies are needed before routine clinical implementation can be justified.
de Boer, S.; Häntze, H.; Ziegelmayer, S.; van Ginneken, B.; Prokop, M.; Bressem, K. K.; Hering, A.
Background: Medical imaging, especially computed tomography and magnetic resonance imaging, is essential in clinical care of patients with renal cell carcinoma (RCC). Artificial intelligence (AI) research into computer-aided diagnosis, staging and treatment planning needs curated and annotated datasets. Across the literature, The Cancer Genome Atlas (TCGA) datasets are widely used for model training and validation. However, re-annotation is often necessary due to limited access to public annotations, raising entry barriers and hindering comparison with prior work. Methods: We screened 1915 CT scans from three TCGA-RCC databases and employed a segmentation model to annotate kidney lesions. After a metadata-based exclusion step, we hosted a reader study with all papillary (n=56), chromophobe (n=27) and 200 randomly selected clear cell RCC cases. Two students quality-checked and corrected the data and annotated tumors and cysts. Uncertain cases were checked by a board-certified radiologist. Results: After data exclusion and quality control, a total of 142 annotated CT scans from 101 patients (26 female, 75 male, mean age 56 years) remained. This includes 95 CTs with clear cell RCC, 29 with papillary RCC and 18 with chromophobe RCC. Images and voxel-level annotations of kidneys and lesions are open sourced at https://zenodo.org/records/19630298. Conclusion: By making the annotations open-source, we encourage accessible and reproducible AI research for renal cell carcinoma. We invite other researchers who have previously annotated any of these cohorts to share their annotations.
Heine, J.; Fowler, E.; Egan, K.; Weinfurtner, R. J.; Balagurunathan, Y.; Schabath, M. B.
A substantial body of evidence demonstrates that measures from mammograms are predictive of breast cancer risk. In this matched case-control study, mammograms acquired near the time of diagnosis were analyzed to investigate bilateral breast asymmetry as a measure for short-term risk prediction. Specifically, contralateral breast images were compared with measures derived in the Fourier domain (FD); this technique summarizes power in concentric radial bands that cover the Fourier plane. Equivalently, this approach can be described as a multiscale characterization of the image. The summarized power difference between respective contralateral bands produces an asymmetry measure. Full field digital mammography (FFDM) and synthetic two-dimensional images from digital breast tomosynthesis (DBT) were investigated for women who had both types of mammograms acquired at the same time. Odds ratios (ORs) and the areas under the receiver operating characteristic curves (Azs) were generated from conditional logistic regression modeling with 95% confidence intervals. Raw unprocessed FFDM images produced significant findings: OR = 1.90 (1.58, 2.29) and Az = 0.72 (0.67, 0.76) per one standard deviation unit. Associations were significant but attenuated for both clinical FFDM and DBT images: OR = 1.31 (1.11, 1.54) and Az = 0.63 (0.58, 0.67); and OR = 1.48 (1.25, 1.76) and Az = 0.65 (0.60, 0.70), respectively. Results suggest that clinical FFDM and DBT images capture breast asymmetry less well than raw FFDM images, implying information loss for breast cancer risk prediction. Notably, although these DBT images have lower spatial resolution, they produced stronger associations than the clinical FFDM images.
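To make the Fourier-domain idea concrete, the following is a minimal sketch of summing spectral power in concentric radial bands and differencing the bands of two images. It is not the authors' implementation: the naive DFT, the number of bands, the cut-off at the Nyquist radius, the use of an absolute difference, and all function names are assumptions chosen for illustration, and real mammograms would need a fast FFT and preprocessing.

```python
import cmath
import math


def power_spectrum(img):
    """Naive 2D DFT power spectrum |F(u, v)|^2 for a small square image."""
    n = len(img)
    spectrum = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            acc = 0j
            for y in range(n):
                for x in range(n):
                    angle = -2.0 * math.pi * (u * y + v * x) / n
                    acc += img[y][x] * cmath.exp(1j * angle)
            spectrum[u][v] = abs(acc) ** 2
    return spectrum


def radial_band_power(img, n_bands=4):
    """Sum spectral power inside concentric radial bands of the Fourier plane."""
    n = len(img)
    spectrum = power_spectrum(img)
    max_radius = n / 2.0  # assumption: bands run from DC out to the Nyquist radius
    bands = [0.0] * n_bands
    for u in range(n):
        for v in range(n):
            # Convert DFT indices to signed frequencies so radius is measured from DC.
            fu = u if u <= n // 2 else u - n
            fv = v if v <= n // 2 else v - n
            radius = math.hypot(fu, fv)
            if radius <= max_radius:  # corner frequencies beyond this radius are ignored
                idx = min(int(radius / max_radius * n_bands), n_bands - 1)
                bands[idx] += spectrum[u][v]
    return bands


def band_asymmetry(left, right, n_bands=4):
    """Per-band absolute power difference between two images (hypothetical measure)."""
    left_bands = radial_band_power(left, n_bands)
    right_bands = radial_band_power(right, n_bands)
    return [abs(a - b) for a, b in zip(left_bands, right_bands)]


# Toy 8x8 inputs: a flat image versus vertical stripes at the highest horizontal frequency.
flat = [[1.0] * 8 for _ in range(8)]
stripes = [[float(x % 2) for x in range(8)] for _ in range(8)]
print(band_asymmetry(flat, stripes))
```

In this toy case the two images differ only in the lowest band (mean intensity) and the highest band (the stripe frequency), which is the sense in which the band summary acts as a multiscale description.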
Adegbosin, O. T.; Patel, H.
Background: Microsatellite stability status determination is important for prognostication and therapeutic decision making in colorectal cancer management, but the conventional methods for this assessment are not readily available, especially in low- and middle-income countries. Deep learning (DL) models have been proposed for addressing this problem; however, potential computational cost due to model complexity and inadequate explainability may limit their adoption in low-resource settings. This study explored the potential of explainable lightweight models for detection of microsatellite instability in colorectal cancer. Methods: DL models were trained using a public dataset of colorectal cancer histology images and then used to classify a set of test images into one of two classes: microsatellite instability or microsatellite stability. The models were compared for efficiency. Gradient-weighted class activation mapping (Grad-CAM) was used to interpret the models' decision making. Results: The simpler convolutional neural network (CNN) trained from scratch had modest performance (accuracy=0.757, area under receiver-operating characteristic curve [AUROC]=0.840). With an attention mechanism added, these values increased, but specificity and sensitivity decreased. Pretrained models performed better than the ones trained from scratch, and EfficientNet_B0 had the best balance of high performance and low computational requirements (accuracy=0.936, AUROC=0.990, negative predictive value=0.923, specificity=0.953, 4,010,000 trainable parameters, 0.38 gigaFLOPs). However, a simple CNN model with attention mechanism had the best interpretability based on Grad-CAM. Conclusion: This study demonstrated that DL models that are lightweight compared to previously proposed ones can be useful for colorectal cancer microsatellite instability screening in resource-limited settings while balancing performance and computational efficiency.
Rani, A.; Mishra, S.
Accurate histopathological differentiation between High-Grade Serous Carcinoma (HGSC) and Low-Grade Serous Carcinoma (LGSC) remains a critical yet challenging aspect of ovarian cancer diagnosis due to their similar morphology and different clinical outcomes. This study presents a deep learning framework that uses custom attention mechanisms, including the Convolutional Block Attention Module (CBAM), Squeeze-and-Excitation (SE) blocks, and a Differential Attention module within five CNN architectures for automated binary classification of ovarian cancer subtypes from H&E WSI patches. Although individual models achieved higher accuracy, the ensemble stacking framework with a shallow MLP meta-learner delivered the best overall performance, with a ROC-AUC of 0.9211, an accuracy of 0.85, and F1-scores of 0.84 and 0.85 across both subtypes. These findings demonstrate that attention-guided feature recalibration combined with ensemble stacking provides robust and clinically interpretable discrimination of ovarian carcinoma subtypes.
Gunta, S. P.; Mohananey, D.; Garster, N.; Bennett, C.; Kalidindi, S.; Geiger, J.; Ocran, S.; Narra, R.; Bergmann, L. L.; Lewandowski, D.
Background: Cardiac MRI (CMR) is often utilized for patients with suspected cardiac amyloidosis (CA). However, data are lacking for use in patients with advanced renal dysfunction (ARD) (GFR < 30 mL/min/1.73 m², dialysis dependent, or renal transplant). This study evaluates the utility of CMR for diagnosis of CA in this population. Methods: Patients with ARD who underwent CMR in a 3T field for suspicion of CA between 2010 and 2024 at our institution were included. A diagnosis of CA was made if any of the following were present: a) PYP scintigraphy grade ≥ 2, b) positive endomyocardial biopsy, or c) positive extracardiac biopsy with clinical features of CA. Two CMR-trained physicians independently assessed T1 relaxation time, ECV, Ti scout, LGE, and overall likelihood of CA. Results: Of the 65 patients included, 14 (22%) had a diagnosis of CA. Although T1 time [1352 (1276-1428) ms] and ECV (40.3% +/- 9.1%) were elevated across the cohort, they were significantly higher in patients with CA (p<0.001 for both). Both ECV and T1 time reliably predicted CA (AUC of 0.87 and 0.88, respectively). An ECV of ≥ 45% had 75% sensitivity and 80% specificity for CA. A T1 time ≥ 1390 ms had 75% sensitivity and 85% specificity for CA. LGE was prevalent and was seen in 86% and 84% of patients with and without CA, respectively. Of the 31 patients deemed to be unlikely CA by a CMR reader, 6% had CA. However, of the 34 patients read as possible/likely CA, only 35% had confirmed CA. Conclusions: In this understudied population of ARD, CMR parametric mapping exhibits high negative predictive value (NPV) for CA and improved positive predictive value (PPV) when higher cutoffs are used for T1 time and ECV. CMR reader overall impression exhibits high NPV but low PPV for CA.
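The reader-impression figures above can be turned into a 2x2 table to see where the high NPV and low PPV come from. The sketch below is illustrative only: the abstract reports rounded percentages, so the integer counts (2 of 31 and 12 of 34) are back-calculated assumptions that happen to sum to the 14 confirmed cases, and the function name is hypothetical.

```python
def predictive_values(tp, fp, fn, tn):
    """Standard 2x2 diagnostic summary from true/false positive/negative counts."""
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Assumed counts, back-calculated from the rounded percentages in the abstract:
tp = 12        # read as possible/likely CA and confirmed CA (about 35% of 34)
fp = 34 - tp   # read as possible/likely CA but no CA
fn = 2         # read as unlikely CA but confirmed CA (about 6% of 31)
tn = 31 - fn   # read as unlikely CA and no CA

summary = predictive_values(tp, fp, fn, tn)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts the NPV is about 0.94 and the PPV about 0.35, in line with the abstract's statement that the reader's overall impression has high NPV but low PPV.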
Wood, C. S.; Abele, S. M.; Alsbach, J.; Gervalla, A.; Meinel, D. M.; Cuny, A. P.
The development of chemiluminescent immunoassays (CLIAs) is a complex and iterative process that relies on costly laboratory infrastructure, limiting its accessibility and application across healthcare settings and disease areas. Here, we detail the CLIA Mobile Development Kit (CLIAMDK), a modular, mobile, and inexpensive platform to assess image sensors, smartphones and data processing workflows for CLIA development. To demonstrate it, we developed two CLIAs targeting renin and aldosterone, key biomarkers for diagnosing primary aldosteronism. The results from our performance study, including 50 patient samples, demonstrate the potential of our platform in a real-world scenario. We found that the performance of our mobile reader platform is comparable to that of a state-of-the-art plate reader, with a Lower Limit-of-Detection (LLoD) approaching 41 femtomolar. We envision that our platform will help accelerate CLIA development, make it more accessible, and lay the foundations for novel, distributed, yet highly sensitive diagnostic tests.
Agumba, J.; Erick, S.; Pembere, A.; Nyongesa, J.
Objectives: To develop and evaluate a deployable deep learning system with Gradient-weighted Class Activation Mapping (Grad-CAM) for tuberculosis screening from chest radiographs and to assess its classification performance and explainability across desktop and mobile deployment platforms. Materials and methods: This study used publicly available chest X-ray datasets containing Normal and Tuberculosis images. A DenseNet121-based transfer learning model was trained using stratified training, validation, and test splits with data augmentation and class weighting. Model performance was evaluated using accuracy, precision, recall, F1 score, receiver operating characteristic (ROC) curve, and area under the ROC curve (AUC). Grad-CAM was used to visualize regions influencing model predictions. The trained model was converted to TensorFlow Lite and deployed in both a Windows desktop application and a Flutter-based mobile application for offline inference and visualization. Results: The model demonstrated strong classification performance on the independent test dataset, with high accuracy and AUC values indicating effective discrimination between Normal and Tuberculosis cases. Grad-CAM visualizations showed that the model focused primarily on anatomically relevant lung regions, particularly the upper and mid-lung fields in Tuberculosis cases. Deployment testing confirmed consistent prediction outputs and Grad-CAM visualizations across both Windows and mobile platforms. Conclusion: The proposed deployable deep learning system with Grad-CAM provides accurate and interpretable tuberculosis screening from chest radiographs and demonstrates feasibility for offline mobile and desktop deployment. This approach has potential as an artificial intelligence-assisted screening and decision support tool in radiology, particularly in resource-limited and remote healthcare settings.
Dell'Orco, A.; De Vita, E.; D'Arco, F.; Lange, A.; Rüber, T.; Kaindl, A. M.; Wattjes, M. P.; Thomale, U. W.; Becker, L.-L.; Tietze, A.
Focal cortical dysplasias (FCDs) are one of the most common structural causes of drug-resistant epilepsy in children but are frequently subtle and difficult to detect on conventional MRI. Many automated lesion detection methods have therefore been proposed to support neuroradiological assessment. In this study, we externally validated two recently developed deep-learning approaches for FCD detection, MELD Graph and 3D-nnUNet, in a pediatric cohort. In this retrospective single-center study, brain MRI scans of 71 children evaluated for epilepsy were analyzed, including 35 MRI-positive patients with suspected FCD and 36 MRI-negative cases based on the primary radiology reports. Both models were applied to standard 3D T1-weighted and 3D FLAIR images. Detected lesions were reviewed by an experienced pediatric neuroradiologist and classified as true positive, false positive, or false negative. Clinical semiology and EEG findings were additionally evaluated for cases with false-positive detections. At the lesion level, MELD Graph achieved a precision of 0.85 and recall of 0.52, while 3D-nnUNet achieved a precision of 0.91 and recall of 0.48. In the MRI-negative patients, MELD Graph produced more false-positive detections than 3D-nnUNet (0.53 vs. 0.14 false-positive lesions per patient). At the patient level, MELD Graph showed slightly higher sensitivity than 3D-nnUNet (0.63 vs. 0.54), whereas 3D-nnUNet demonstrated markedly higher specificity (0.86 vs. 0.56). Improved FLAIR image quality was associated with trends toward improved model performance. Both models demonstrated high precision but moderate sensitivity, indicating that they are valuable decision-support tools but cannot replace expert neuroradiological evaluation. Optimized MRI acquisition protocols are needed to further improve automated lesion detection in pediatric epilepsy.
Guerrero Quiles, C.; Lodhi, T.; Sellers, R.; Sahoo, S.; Weightman, J.; Breitwieser, W.; Sanchez Martinez, D.; Bartak, M.; Shamim, A.; Lyons, S.; Reeves, K.; Reed, R.; Hoskin, P.; West, C.; Forker, L.; Smith, T.; Bristow, R.; Wedge, D. C.; Choudhury, A.; Biolatti, L. V.
Whole-genome sequencing (WGS) enables comprehensive analysis of tumour genomes, but its use in formalin-fixed paraffin-embedded (FFPE) samples is limited by DNA fragmentation and low yields. Whole-genome amplification (WGA) methods such as multiple displacement amplification (MDA) can boost DNA availability but distort copy-number alteration (CNA) profiles. DNA ligation-mediated MDA (DLMDA) mitigates this bias by reconstituting fragmented templates, yet its performance in FFPE-derived DNA remains uncertain. We compared paired DLMDA pre-amplified (2h, 8h) and non-pre-amplified FFPE prostate tumour samples from 22 archival blocks (5, 15 and 20 years old). DLMDA increased DNA yield by 42- to 86-fold, with global CNA patterns largely preserved. However, DLMDA significantly reduced the number of detected CNA deletions and amplifications. These effects were independent of both block age and reaction time. CNA dropouts were randomly distributed across the genome, indicating that DLMDA does not introduce regional bias. Our results show that DLMDA enables robust DNA yield recovery and avoids false-positive CNA artefacts, but at the cost of reduced CNA sensitivity. While suitable for CNA screening pipelines through WGS, further improvements are required to minimise the false-negative risk and improve the technique's sensitivity for FFPE-based genomics.
Garcia Rairan, L. A.; Corpus Gutierrez, V.; Del Castillo, M. A.; Riveros Castillo, W.; Saavedra Gerena, J.; Turizo Smith, A. D.; Arias Guatibonza, J.
Introduction: Glioblastoma multiforme (GBM) remains the most lethal primary brain tumor with median survival of 14-15 months. Current prognostic markers inadequately stratify patient outcomes. PINK1 (PTEN-induced putative kinase 1), a mitochondrial kinase regulating mitophagy and cellular stress responses, has emerged as a promising prognostic candidate. Our preliminary analysis of 20 GBM cases demonstrated significant PINK1 expression with correlation to aggressive phenotypes (Turizo Smith et al., 2025). This multicenter study aims to prospectively validate PINK1 as a prognostic biomarker for survival and functional outcomes in a Latin American cohort. Methods and analysis: PINK1-GBM Colombia is a multicenter, observational cohort study across four tertiary hospitals in Bogota, Colombia (Hospital de Kennedy, Hospital El Tunal, Hospital Santa Clara and Hospital Universitario de la Samaritana). We will enroll at least 26-50 adults (18+ years) with newly diagnosed IDH-wild type GBM undergoing surgical resection. PINK1 expression will be quantified by immunohistochemistry (IHC) on formalin-fixed paraffin embedded (FFPE) tissue using standardized protocols. Primary outcomes: overall survival (OS) and progression-free survival (PFS). Secondary outcomes: functional status trajectories (KPS/ECOG). Follow-up extends 24 months with clinical, imaging (RANO 2.0), and telephone assessments. Survival analyses will employ Kaplan-Meier methods, log-rank tests, and Cox proportional hazards models adjusted for established prognostic factors. Ethics and dissemination: Approved by Universidad Nacional de Colombia Ethics Committee (Acta 001, February 5, 2026; Ref: 2.FM.1.002-CE-002-26), Subred Sur Occidente (P-AP-19-2025, July 11, 2025), and Subred Centro Oriente (CEI 067/2025, October 24, 2025). Conducted per Declaration of Helsinki and Colombian Resolution 8430/1993. Results will be disseminated via peer-reviewed publication, international conferences, and thesis submission.
Altinok, O.; Ho, W. L. J.; Robinson, L.; Goldgof, D.; Hall, L. O.; Guvenis, A.; Schabath, M. B.
Objectives: Among surgically resected non-small cell lung cancer (NSCLC) patients with similar stage and histopathological characteristics, patient outcomes vary, which highlights the urgency of identifying biomarkers to predict recurrence. The goal of this study was to systematically develop a pre-surgical CT-based habitat radiomics classifier to predict risk of recurrence in NSCLC. Methods: This study included 293 NSCLC patients with surgically resected stage IA-IIIA disease who were randomly divided into training (n = 195) and test (n = 98) cohorts. From pre-surgical CT images, tumor habitats were generated using two-level unsupervised clustering, and radiomic features were then calculated from the intratumoral region and habitat-defined subregions. Using ridge-regularized logistic regression, separate classifiers were developed to predict 3-year recurrence using intratumoral radiomics, habitat-based radiomics, and a combined model (intratumoral and habitat), which was generated using a stacked learning framework. For each classifier, the probability of recurrence was calculated for each patient, and several statistical and machine learning approaches were then used to stratify patients for recurrence-free survival. Results: The combined radiomics classifier yielded a superior AUC (0.82) compared to the intratumoral (AUC = 0.75) and habitat radiomics (AUC = 0.81) models. When the classifiers were used to stratify high- versus low-risk patients utilizing a cut-point identified by decision tree analysis, the combined classifier yielded the largest risk estimate for high-risk patients (HR = 8.43; 95% CI 2.47 - 28.81) compared to the habitat (HR = 5.41; 95% CI 2.08 - 14.09) and intratumoral radiomics (HR = 3.54; 95% CI 1.45 - 8.66) models. SHAP analyses indicated that habitat-derived information contributed most strongly to recurrence prediction.
Conclusions: This study revealed that habitat-based radiomics provided better statistical performance than intratumoral radiomics for predicting recurrence in NSCLC.
Tan, J.; Tang, P. H.
Background: Paediatric pneumonia is a leading cause of childhood morbidity and mortality worldwide. Chest X-rays (CXR) are an important tool in the diagnosis of pneumonia, but shortages in specialist radiology services lead to clinically significant delays in CXR reporting. The ability to communicate findings to both clinicians and laypersons allows multimodal large language models (MLLMs) to be deployed throughout clinical workflows, from image analysis to patient communication. However, MLLMs currently underperform state-of-the-art deep learning classifiers. Objective: To evaluate the diagnostic accuracy of ensemble strategies with MLLMs compared to the baseline average agent for paediatric radiological pneumonia detection. Methods: We conducted a retrospective cohort study using paediatric CXRs from two independent hospital datasets totalling 2300 CXRs. Fifteen MedGemma-4B-it agents independently classified each CXR into five pneumonia likelihood categories. Majority voting, soft voting, and GPTOSS-20B aggregation were compared against the average agent performance. The primary metric evaluated was OvR AUROC. Secondary metrics included accuracy, sensitivity, specificity, F1-score, Cohen's kappa, and OvO AUROC. Results: Soft voting achieved improvements in OvR AUROC (p_balanced = 0.0002, p_real-world = 0.0003), accuracy (p_balanced = 0.0008, p_real-world < 0.0001), Cohen's kappa (p_balanced = 0.0006, p_real-world = 0.0054) and OvO AUROC (p_balanced < 0.0001, p_real-world = 0.0011) across both datasets, and a superior F1-score (p_balanced = 0.0028) for the balanced dataset. Conclusion: Soft voting enhances MedGemma's diagnostic discriminatory performance for paediatric radiological pneumonia detection. Our system enables privacy-preserving, near real-time clinical decision support with explainable outputs, having potential for integration into emergency departments. Our system's high specificity supports triage by flagging high-risk radiological pneumonia cases.
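The difference between majority voting and soft voting can be illustrated with a small sketch. This is a generic illustration, not the study's pipeline: it assumes each agent returns a probability vector over the five likelihood categories (the abstract does not say how agent outputs were scored), and the three toy agents and function names are hypothetical.

```python
from collections import Counter


def majority_vote(agent_probs):
    """Each agent votes for its top category; the most common vote wins."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in agent_probs]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(agent_probs):
    """Average the agents' probability vectors, then pick the top category."""
    n_classes = len(agent_probs[0])
    mean = [sum(p[c] for p in agent_probs) / len(agent_probs) for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__), mean


# Toy example: two agents weakly prefer category 1, one agent strongly prefers category 3.
agents = [
    [0.05, 0.40, 0.15, 0.35, 0.05],
    [0.05, 0.40, 0.15, 0.35, 0.05],
    [0.00, 0.05, 0.05, 0.90, 0.00],
]

print("majority vote:", majority_vote(agents))
print("soft vote:", soft_vote(agents)[0])
```

In this toy case majority voting selects category 1 while soft voting selects category 3, because averaging keeps the confidence information that hard votes discard. The averaged vector is also a graded score, which is the kind of output an AUROC is computed from.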
Ma, C.; Wei, M.; Wang, Z.; Li, X.; Feng, Y.; Luo, Y.; Lu, X.; Wang, W.; Zhou, S.; Li, X.; Wang, F.; Liu, W.
Background: Urinary catheterization is a routine procedure after ureteroscopy lithotripsy (URSL), but it often causes catheter-related bladder discomfort (CRBD) and urethral pain, which aggravates patients' postoperative discomfort. This study investigated the effect of topical anesthesia on CRBD and urethral pain in patients undergoing ureteroscopy lithotripsy and urinary catheterization. Methods: In this study, 330 patients undergoing ureteroscopy lithotripsy were enrolled, with 160 cases in the control group and 170 cases in the experimental group. The experimental group was divided into two subgroups based on the local anesthetic used: a Tetracaine Hydrochloride Gel subgroup and an Oxybuprocaine Gel subgroup. Postoperative assessments were conducted using CRBD scores and urethral pain numerical rating scale (NRS) scores, measured at T0, T1, T2, T3, T4, T5, and T6. Results: Compared to the control group, the use of local anesthetics significantly reduced both CRBD scores and urethral pain NRS scores in the experimental group (P < 0.01). Male patients who received local anesthetics had markedly lower CRBD scores and urethral pain NRS scores than those who did not (P < 0.01), whereas no significant difference was observed in female patients. No statistically significant differences were found between rigid ureteroscopy lithotripsy (R-URSL) and flexible ureteroscopy lithotripsy (F-URSL), regardless of the use of local anesthetics. Within the experimental group, the two local anesthetics had comparable effects on CRBD scores and urethral pain NRS scores, with no statistically significant differences. These findings suggest that local anesthetics are effective in reducing postoperative CRBD scores and urethral pain NRS scores, especially in male patients.
Conclusion: Topical anesthesia following ureteroscopy lithotripsy reduces CRBD scores and urethral pain NRS scores in patients undergoing urinary catheterization, especially in male patients.
Sivakumar, E.; Anand, A.
Computer vision and deep learning techniques, including convolutional neural networks (CNNs) and transformers, have increased the performance of medical image classification systems. However, training deep learning models using medical images is a challenging task that necessitates a substantial amount of annotated data. In this paper, we implement data augmentation strategies to tackle dataset imbalance in the VinDr-SpineXR dataset, which has a lower number of spine abnormality X-ray images compared to normal spine X-ray images. Geometric transformations and synthetic image generation using Generative Adversarial Networks are explored and applied to the abnormal classes of the dataset, and classifier performance is validated using VGG-16 and InceptionNet to identify the most effective augmentation technique. Additionally, we introduce a hybrid augmentation technique that addresses class imbalance, reduces computational overhead relative to a GAN-only approach, and achieves ~99% validation accuracy with both classifiers across all three case studies. Keywords: Data augmentation, Generative Adversarial Network, VGG-16, InceptionNet, Class imbalance, Computer vision, Spine X-ray, Radiology.
Sparnon, E.; Stevens, K.; Song, E.; Harris, R. J.; Strong, B. W.; Bruno, M. A.; Baird, G. L.
The present study evaluates the real-world clinical predictive performance of FDA-authorized artificial intelligence (AI) devices used in radiology, focusing on the false positive paradox (FPP) and its implications for clinical practice. To do this, we analyzed publicly available FDA data on AI radiology devices from 2024 and 2025 from 510(k) summaries, demonstrating how diagnostic accuracy metrics like sensitivity and specificity do not necessarily translate into high positive predictive value (PPV) due to the influence of target disease prevalence. We show the importance of disclosing the false discovery (FDR) and false omission rates (FOR) and argue that this transparency enables clinicians to select AI systems that balance false positive and false negative costs in a clinically, ethically, and financially appropriate manner. Finally, we provide recommendations for what data should be provided to best serve practices and radiologists.
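The dependence of PPV on prevalence follows directly from Bayes' rule, and a short sketch makes the false positive paradox visible. The sensitivity, specificity, and prevalence values below are hypothetical round numbers chosen for illustration, not figures from the FDA summaries, and the function names are assumptions.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def false_discovery_rate(sensitivity, specificity, prevalence):
    """FDR: share of positive calls that are wrong (1 - PPV)."""
    return 1.0 - ppv(sensitivity, specificity, prevalence)


def false_omission_rate(sensitivity, specificity, prevalence):
    """FOR: share of negative calls that are wrong (1 - NPV)."""
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    return false_neg / (false_neg + true_neg)


# Hypothetical device with 90% sensitivity and 90% specificity at three prevalences.
for prevalence in (0.01, 0.10, 0.50):
    print(
        f"prevalence={prevalence:.2f}  "
        f"PPV={ppv(0.90, 0.90, prevalence):.3f}  "
        f"FDR={false_discovery_rate(0.90, 0.90, prevalence):.3f}  "
        f"FOR={false_omission_rate(0.90, 0.90, prevalence):.4f}"
    )
```

With these assumed values, the same device has a PPV of 0.9 at 50% prevalence but only about 0.083 at 1% prevalence, so roughly eleven of every twelve positive flags would be false in the low-prevalence setting.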
Adeluwoye, A. O.; Gbadegesin, M. O.; James, F. M.; Otegbade, P. S.; Alabetutu, A.
Digital pathology, coupled with advanced image recognition algorithms, represents a transformative frontier in histopathological diagnosis. This sub-Saharan African laboratory's exploratory study investigates the application of a Convolutional Neural Network (CNN) model, specifically leveraging the VGG16 architecture with transfer learning, for automated analysis and classification of selected gastrointestinal (GIT) and liver tissue samples, incorporating both routine and specialized staining protocols. The study utilized a dataset comprising 114 samples (18 liver, 96 GIT images) derived from archival formalin-fixed paraffin-embedded tissue blocks at University College Hospital, Ibadan, Nigeria. Specialized staining techniques included Alcian Yellow for GIT mucin visualization and Masson's Trichrome for liver fibrosis assessment, alongside conventional H&E staining. Model performance was evaluated using statistical methodologies including Wilson Score confidence intervals (CI), Bayesian probability assessment, and effect size analysis. Results reveal a striking dichotomy in model performance. The GIT tissue model achieved perfect classification accuracy (100% test accuracy) with exceptional statistical significance (Z=10.0, p<0.0001), Wilson CI [96.29%, 99.99%], Cohen's h=1.571, and Bayesian probability >99.99%. Conversely, the liver tissue model demonstrated diagnostic failure (42.86% test accuracy), with Z=-1.428, p=0.9236, Wilson CI [33.59%, 52.65%], Cohen's h=-0.144, and Bayesian probability of 7.64%. This performance divergence correlates with training data availability, as the liver dataset fell far below empirically established thresholds (>100-200 samples) for reliable classification. The liver model's failure reveals limitations in transfer learning with insufficient data.
These findings have important implications for AI-enhanced digital pathology: the GIT model appears promising for potential deployment, and the results support tissue-specific model development.
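For readers unfamiliar with the statistics named above, here is a minimal sketch of the standard Wilson score interval and Cohen's h formulas. It is not the authors' analysis code: the example counts (45 correct out of 50) and the 50% reference proportion are assumptions for illustration, since the abstract does not state the test-set sizes or the comparison proportion.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half_width, center + half_width


def cohens_h(p1, p2):
    """Cohen's h effect size: difference of arcsine-transformed proportions."""
    return 2.0 * math.asin(math.sqrt(p1)) - 2.0 * math.asin(math.sqrt(p2))


# Hypothetical example: 45 of 50 test images classified correctly.
low, high = wilson_interval(45, 50)
print(f"Wilson 95% CI: [{low:.3f}, {high:.3f}]")

# Effect size of 100% accuracy against an assumed 50% reference proportion.
print(f"Cohen's h: {cohens_h(1.0, 0.5):.3f}")
```

Against a 50% reference, a proportion of 1.0 gives h = π/2 ≈ 1.571, which is consistent with the value reported for the GIT model, although the abstract does not confirm that this was the reference used.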
Aquaro, G. D.; Licordari, R.; De Gori, C.; Todiere, G.; Ianni, U.; Barison, A.; De Luca, A.; Folgheraiter, A.; Grigoratos, C.; Alberti, M.; Lombardo, M.; De Caterina, R.; Sinagra, G.; Emdin, M.; Di Bella, G.; Fulceri, L.
Background: Late gadolinium enhancement (LGE) quantification by cardiovascular magnetic resonance is central to risk stratification in hypertrophic cardiomyopathy (HCM), yet conventional techniques require contour tracing and region-of-interest (ROI) placement, which may reduce reproducibility and increase analysis time. We developed a novel visual standardized approach, the Visual Standardized Quantification of LGE (VISTAQ), that does not require myocardial contouring, arbitrary ROI positioning, or dedicated post-processing software. Methods: In this multicenter, multivendor retrospective study, LGE images from 400 patients (100 prior myocardial infarction, 250 HCM, 50 other non-ischemic heart diseases) were analyzed. VISTAQ subdivides each myocardial segment into transmural mini-segments and classifies LGE visually using predefined criteria, expressing global LGE burden as the percentage of positive mini-segments. Reproducibility was assessed in 250 patients across different observer expertise levels using intraclass correlation coefficients (ICC) and Bland-Altman analysis. In 100 HCM patients, VISTAQ was compared with conventional methods (mean+2SD, +5SD, +6SD, FWHM, visual thresholding). Prognostic performance was evaluated in 250 HCM patients over a median 5-year follow-up. Results: VISTAQ demonstrated excellent intra- and inter-observer reproducibility (ICC up to 0.98 and 0.97, respectively), consistent across disease subtypes. Compared with conventional techniques, VISTAQ showed similar ICC to FWHM but significantly lower net and absolute inter-observer differences (median absolute difference 1.3%). Mean+2SD markedly overestimated LGE, whereas mean+6SD slightly underestimated LGE compared with VISTAQ, mean+5SD, FWHM, and visual thresholding. Analysis time was substantially shorter with VISTAQ (median 105 vs. 375 seconds, p<0.0001). During follow-up, 21 hard cardiac events occurred in the HCM population.
An LGE threshold >10% predicted events with higher accuracy using VISTAQ (AUC 0.90; sensitivity 85%; specificity 94%) compared with mean+6SD (AUC 0.75; sensitivity 57%; specificity 93%). Conclusions: VISTAQ provides highly reproducible, time-efficient LGE quantification without dedicated software and demonstrates non-inferior prognostic discrimination in HCM compared with conventional threshold-based techniques.
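As a rough illustration of the scoring described in the abstract above, global LGE burden can be computed as the share of mini-segments read as positive and then compared against the >10% threshold. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the number of mini-segments per myocardial segment is not stated in the abstract, so the input is simply a list of per-mini-segment visual reads.

```python
def lge_burden_percent(mini_segments):
    """Return the percentage of mini-segments scored LGE-positive.

    `mini_segments` is a list of booleans, one per mini-segment
    (True = LGE present on visual read). The list length is an
    assumption of the caller; the abstract does not specify it.
    """
    if not mini_segments:
        raise ValueError("at least one mini-segment is required")
    positive = sum(1 for is_positive in mini_segments if is_positive)
    return 100.0 * positive / len(mini_segments)


def exceeds_threshold(burden_percent, threshold=10.0):
    """Apply the strict '>10%' cut-off quoted in the abstract."""
    return burden_percent > threshold
```

For example, 3 positive reads out of 30 hypothetical mini-segments give a burden of exactly 10%, which does not exceed the strict threshold.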
Powell, S.; Bui, T.; Gullipalli, D.; LaCava, M.; Jones, S. M.; Hansen, T.; Kuhr, F.; Swat, W.; Simandi, Z.
Current clinical management of multiple myeloma (MM) relies on bone marrow (BM) biopsies for minimal residual disease (MRD) assessment. While BM biopsies are the gold standard, their invasive nature and potential to miss extramedullary or patchy disease necessitate sensitive, non-invasive liquid biopsy platforms. In this study, we evaluated the analytical performance of the CellSearch CMMC assay to determine its utility for deep-MRD monitoring. Using a standard 4 mL whole blood input, the assay achieves a WBC-normalized sensitivity of 2.45 × 10⁻⁷, supported by a limit of quantitation of 5 cells per run. Given this high analytical sensitivity, the assay provides a robust negative predictive value, rendering false-negative findings highly unlikely in populations with detectable peripheral disease. These findings characterize the CellSearch CMMC assay as a highly sensitive, analytically validated platform for non-invasive, longitudinal surveillance at deep-MRD levels. When integrated into a clinical workflow that accounts for its specificity profile, the platform offers a patient-friendly complement to serial BM biopsies, with the potential to reduce their frequency in appropriate clinical contexts.
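The sensitivity figure above can be sanity-checked with simple arithmetic. The sketch below assumes, as an interpretation not stated in the abstract, that WBC-normalized sensitivity is the limit of quantitation divided by the number of white blood cells analyzed per run; the function names are hypothetical.

```python
def wbc_normalized_sensitivity(loq_cells, total_wbc):
    """Sensitivity as target cells per white blood cell analyzed.

    Assumes sensitivity = limit of quantitation / WBC count per run,
    which is an interpretation, not a definition from the abstract.
    """
    return loq_cells / total_wbc


def implied_wbc_count(loq_cells, sensitivity):
    """Back-calculate the WBC count implied by a reported sensitivity."""
    return loq_cells / sensitivity


# Figures quoted in the abstract: 5 cells per run, 2.45e-7, 4 mL input.
implied_total = implied_wbc_count(5, 2.45e-7)
implied_per_ml = implied_total / 4
```

Under that assumption, the reported values would correspond to roughly 2 × 10⁷ white blood cells per run, or about 5 × 10⁶ per mL of blood.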
Walser, A.; Flammer, A. J.; Hundertmark, M. J.; Shiri, I.; Ciocca, N.; Ryffel, C.; de Marchi, S.; Schwotzer, R.; Ruschitzka, F.; Tanner, F. C.; Graeni, C.; Benz, D. C.
Background: Transthyretin cardiomyopathy (ATTR-CM) is a progressive, potentially fatal disease requiring accurate risk stratification. Echocardiography is the first-line imaging modality, with AI-based tools increasingly applied for automated analysis, yet their prognostic value remains unknown. Objectives: To examine the prognostic value of AI-derived echocardiographic measurements and their incremental value beyond biomarker staging in ATTR-CM. Methods: This retrospective study included patients from two ATTR-CM registries. Baseline echocardiograms were analyzed using the fully automated AI-based software Us2.ai. Prognostic performance was assessed by Kaplan-Meier analysis, Cox regression, and ROC curves. A two-parameter echocardiographic staging system combining left ventricular (LV) global longitudinal strain (GLS) and right ventricular (RV) fractional area change (FAC) stratified patients into low (both normal), intermediate (one abnormal), and high risk (both abnormal). Results: Among 347 patients (91% male, median age 78 years), 141 experienced all-cause death or heart failure hospitalization over a median follow-up of 2.4 years. In multivariable analysis, AI-derived LV-GLS (HR 1.13 [1.03-1.25], p=0.011) and RV FAC (HR 0.96 [0.93-0.99], p=0.014) were independent outcome predictors. Echo staging stratified risk into groups with 3-fold (95% CI 1.70-5.91) and 6-fold (95% CI 3.22-10.30) increased hazard compared to low risk (p<0.001), with incremental prognostic value beyond National Amyloidosis Centre (NAC) staging and age (chi-square from 53 to 80; p<0.001). AI and human measurements showed comparable 1-year predictive performance (all p>0.05). Conclusion: AI-derived echocardiographic measurements demonstrate independent and incremental prognostic value beyond biomarker-based NAC staging in ATTR-CM, comparable to human measurements, supporting their integration into clinical risk stratification.
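The two-parameter staging rule described in the abstract above reduces to counting abnormal parameters. The sketch below is illustrative only: the function name is hypothetical, and because the abstract gives no cut-off values for LV-GLS or RV FAC, the function takes already-classified flags rather than raw measurements.

```python
def echo_stage(lv_gls_abnormal: bool, rv_fac_abnormal: bool) -> str:
    """Map two abnormality flags to a risk stage.

    low          = both parameters normal
    intermediate = exactly one parameter abnormal
    high         = both parameters abnormal

    The thresholds that define 'abnormal' are not given in the
    abstract, so they are left to the caller.
    """
    abnormal_count = int(lv_gls_abnormal) + int(rv_fac_abnormal)
    return ("low", "intermediate", "high")[abnormal_count]
```

Taking flags as input keeps the rule separate from any choice of cut-offs, which would need to come from the full paper.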