
Artificial intelligence-generated smart impression from 9.8-million radiology reports as training datasets from multiple sites and imaging modalities

Kaviani, P.; Kalra, M. K.; Digumarthy, S. R.; Rodriguez, K.; Agarwal, S.; Brooks, R.; En, S.; Alkasab, T.; Bizzo, B. C.; Dreyer, K. J.

2024-03-09 radiology and imaging

10.1101/2024.03.07.24303787 medRxiv


Importance: Automatic generation of the impression section of radiology reports can help make radiologists more efficient and avoid reporting errors. Objective: To evaluate the relationship, content, and accuracy of a Powerscribe Smart Impression (PSI) against the radiologist-reported findings and impression (RDF). Design, Setting, and Participants: This institutional review board-approved retrospective study developed and trained a PSI algorithm (Nuance Communications, Inc.) with 9.8 million radiology reports from multiple sites to generate PSI based on information including the protocol name and the radiologist-dictated findings section of radiology reports. Three radiologists assessed 3879 radiology reports of multiple imaging modalities from 8 US imaging sites. For each report, we assessed whether PSI can accurately reproduce the RDF in terms of the number of clinically significant findings and the radiologists' style of reporting while avoiding potential mismatch (with the findings section in terms of size, location, or laterality). Separately, we recorded the word count for PSI and RDF. Data were analyzed with Pearson correlation and paired t-tests. Main Outcomes and Measures: The data were ground-truthed by three radiologists. Each radiologist recorded the frequency of the incidental/significant findings, any inconsistency between the RDF and PSI, and the stylistic and overall evaluation of PSI. Area under the curve (AUC), correlation coefficient, and percentages were calculated. Results: PSI reports were deemed either perfect (91.9%) or acceptable (7.68%) for stylistic concurrence with RDF. Both PSI (mismatched Haller index) and RDF (mismatched nodule size) had one mismatch each. There was no difference between the word counts of PSI (mean 33 ± 23 words/impression) and RDF (mean 35 ± 24 words/impression) (p > 0.1). Overall, there was an excellent correlation (r = 0.85) between PSI and RDF for the evolution of findings (negative vs. stable vs. new or increasing vs. resolved or decreasing findings). The PSI outputs (2%) requiring major changes pertained to reports with multiple impression items. Conclusion and Relevance: In clinical settings of radiology exam interpretation, the Powerscribe Smart Impression assessed in our study can save interpretation time; a comprehensive findings section results in the best PSI output.
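
The abstract names Pearson correlation and paired t-tests as the analysis methods. As a minimal sketch of those two statistics, and not the authors' code, the Python below computes them from first principles. The function names and the word-count values are hypothetical illustrations, not study data.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def paired_t_statistic(x, y):
    """t statistic of a paired t-test: mean difference over its standard error."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.fmean(diffs) / standard_error


# Hypothetical per-report word counts (illustrative only, not from the study).
psi_words = [20, 35, 50, 12, 48]
rdf_words = [22, 33, 55, 15, 45]

r = pearson_r(psi_words, rdf_words)
t = paired_t_statistic(psi_words, rdf_words)
print(f"Pearson r = {r:.3f}, paired t = {t:.3f} (df = {len(psi_words) - 1})")
```

Converting the t statistic to a p-value requires the Student t distribution with n - 1 degrees of freedom, which a statistics package would normally supply.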

