Performance of an Artificial Intelligence System for Breast Cancer Detection on Screening Mammograms from BreastScreen Norway
Larsen, Marthe; Olstad, Camilla Flåt; Lee, Christoph I.; Hovda, Tone; Hoff, Solveig Kristin Roth; Martiniussen, Marit Almenning; Mikalsen, Karl Øyvind; Lund-Hanssen, Håkon; Solli, Helene; Silberhorn, Marko; Sulheim, Åse Ø.; Auensen, Steinar Gøytil; Nygård, Jan Franz; Hofvind, Solveig Sand-Hanssen
Journal article, Peer reviewed
Published version
Permanent link: https://hdl.handle.net/11250/3151120
Publication date: 2024
Abstract
A commercially available artificial intelligence system showed high performance in detecting breast cancers within 2 years of screening mammography and may help triage low-risk mammograms to reduce radiologist workload.
Purpose: To explore the stand-alone breast cancer detection performance of a commercially available artificial intelligence (AI) system at different risk score thresholds.
Materials and Methods: This retrospective study included information from 661 695 digital mammographic examinations performed among 242 629 female individuals screened as part of BreastScreen Norway, 2004–2018. The study sample included 3807 screen-detected cancers and 1110 interval breast cancers. A continuous examination-level risk score from the AI system was used to measure performance, both as the area under the receiver operating characteristic curve (AUC) with 95% CIs and as cancer detection at different AI risk score thresholds.
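The abstract reports AUC from a continuous examination-level score but does not say how it was computed. A standard, equivalent formulation is the Mann-Whitney statistic: the probability that a randomly chosen cancer examination receives a higher score than a randomly chosen non-cancer examination, with ties counted as one half. The Python sketch below assumes binary labels (1 = examination followed by cancer, 0 = otherwise); the function name `auc_rank` and the toy inputs are hypothetical, not taken from the study.

```python
from typing import List, Sequence


def auc_rank(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via the Mann-Whitney U statistic (illustrative sketch).

    labels: 1 for an examination followed by cancer, 0 otherwise (assumed coding).
    Tied scores receive their average rank, so each tied positive/negative pair
    contributes one half, matching the usual ROC convention.
    """
    n = len(scores)
    if n != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative examination")

    # Sort indices by score (ascending) and assign 1-based ranks, averaging ties.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks: List[float] = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U counts positive-over-negative "wins"; dividing by all pairs gives the AUC.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy example (not study data): 3 of 4 positive/negative pairs are ordered correctly.
    print(auc_rank([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Sorting dominates the cost (O(n log n)), so the rank formulation stays practical at the scale of several hundred thousand examinations, whereas comparing every cancer/non-cancer pair directly would not. Confidence intervals are not covered; the abstract does not state how its 95% CIs were derived.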
Results: The AUC of the AI system was 0.93 (95% CI: 0.92, 0.93) for screen-detected cancers and interval breast cancers combined and 0.97 (95% CI: 0.97, 0.97) for screen-detected cancers. In a setting where the 10% of examinations with the highest AI risk scores were defined as positive and the 90% with the lowest scores as negative, 92.0% (3502 of 3807) of the screen-detected cancers and 44.6% (495 of 1110) of the interval breast cancers were identified as positive by AI. In this scenario, 68.5% (10 987 of 16 040) of false-positive screening results (negative recall assessment) were considered negative by AI. When the cutoff was set at 50% (the half of examinations with the highest AI risk scores defined as positive), 99.3% (3781 of 3807) of the screen-detected cancers and 85.2% (946 of 1110) of the interval breast cancers were identified as positive by AI, whereas 17.0% (2725 of 16 040) of the false-positive results were considered negative.
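The threshold scenarios can be read as a percentile rule: rank examinations by AI score, call the top fraction positive, then report the share of each outcome group falling on each side of the cutoff. The Python sketch below assumes that rule. How the study rounded the cutoff count or broke ties is not stated, so `round(fraction * n)` and stable-sort tie breaking are illustrative choices; `top_fraction_positive`, `proportion`, and the toy scores are hypothetical. The sketch also recomputes the percentages from the counts reported in the abstract.

```python
from typing import Dict, List, Sequence, Tuple


def top_fraction_positive(scores: Sequence[float], fraction: float) -> List[bool]:
    """Flag the `fraction` of examinations with the highest AI scores as positive.

    Assumptions (not from the study): the number flagged is round(fraction * n),
    and ties at the cutoff are broken by original order (Python's sort is stable).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    n_flag = round(fraction * len(scores))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flags = [False] * len(scores)
    for i in order[:n_flag]:
        flags[i] = True
    return flags


def proportion(flags: Sequence[bool], mask: Sequence[bool], want: bool) -> float:
    """Share of examinations selected by `mask` whose AI flag equals `want`.

    want=True  -> e.g. share of cancers identified as positive by AI
    want=False -> e.g. share of false-positive recalls considered negative by AI
    """
    chosen = [f for f, m in zip(flags, mask) if m]
    if not chosen:
        raise ValueError("mask selects no examinations")
    return sum(1 for f in chosen if f == want) / len(chosen)


# Counts reported in the abstract: (numerator, denominator).
REPORTED: Dict[str, Tuple[int, int]] = {
    "10% cutoff, screen-detected flagged": (3502, 3807),
    "10% cutoff, interval flagged": (495, 1110),
    "10% cutoff, false positives called negative": (10987, 16040),
    "50% cutoff, screen-detected flagged": (3781, 3807),
    "50% cutoff, interval flagged": (946, 1110),
    "50% cutoff, false positives called negative": (2725, 16040),
}

if __name__ == "__main__":
    # Toy data: 10 examinations, 2 cancers (hypothetical, not study data).
    scores = [0.02, 0.91, 0.10, 0.05, 0.40, 0.03, 0.07, 0.55, 0.01, 0.08]
    cancer = [False, True, False, False, True, False, False, False, False, False]
    flags = top_fraction_positive(scores, 0.10)  # only the single top score is positive
    print(proportion(flags, cancer, True))       # 0.5: one of two cancers flagged
    for label, (k, n) in REPORTED.items():
        print(f"{label}: {100 * k / n:.1f}%")
```

Raising the cutoff from 10% to 50% trades fewer missed cancers for fewer false-positive recalls being filtered out, which is the pattern the reported counts show.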
Conclusion: The AI system showed high performance in detecting breast cancers within 2 years of screening mammography and showed potential for triaging low-risk mammograms to reduce radiologist workload.