How should we validate AI models in radiology?
In this interview, we speak with Louis Lind Plesner (Herlev og Gentofte Hospital, Denmark) about the clinical impact of AI radiograph reporting in healthcare and the steps involved in AI imaging model validation. Louis offers his insights into the challenges of this validation and into whether AI-automated radiograph reporting has any negative impacts.
What are the current applications of AI models in radiology?
There is explosive development of new AI tools for radiology, but adoption by clinics and hospitals is generally very slow compared to the pace of innovation; this is a well-known phenomenon in healthcare. So, the available applications are a very different story from what is actually being used, which also depends very much on the country and specific setting. The current applications of AI models in radiology also depend on the type of AI referred to, and here I assume we are talking about computer vision models for radiological diagnosis. In Denmark, AI is currently used mostly as an assistant reader for chest CT and mammography screening, but we are also seeing increased adoption of other models, such as those for bone fractures and chest radiographs, as well as models for the detection of pulmonary embolism and intracranial hemorrhage/stroke. For several years, AI has also been the standard of care for estimating pediatric bone age.
What clinical impact does AI radiograph reporting have in healthcare?
At the moment the impact is quite limited in my experience, and existing tools are more of a nice-to-have than a need-to-have. There are plenty of publications reporting that radiologists can be more accurate when using these tools and can also produce reports faster, but most have methodological flaws, and in papers with a more real-life setup these effects tend to be much smaller. The current generation of tools is severely limited compared to a radiologist, because such tools cannot take into account the clinical history and previous medical imaging of a patient. Hence, they cannot put the findings into clinical context, which is a big part of a radiologist's job, and a bigger part than many people realize. Therefore, we believe the only way for these models to gain true impact in the field of radiology is to make them autonomous, so that they do not require human review. Though this is obviously a big challenge, it is one that we are trying to solve.
Are there any negative impacts that AI-automated radiograph reporting may pose?
Yes, for sure. The most important is patient safety, and that is why we started with this question. However, our research has been reassuring in this area: the AI scored very highly on patient safety for the autonomous reporting of completely normal chest radiographs (chest radiographs without any kind of disease findings). Still, this should be investigated further and also estimated individually at any facility wanting to implement this technology, as part of a quality control/algorithm audit. Another important concern is that automating the reporting of normal chest radiographs may harm the education of junior radiologists, who will no longer see the full spectrum of cases once AI has autonomously 'removed' some of them from the workload. A further concern is that interpreting chest radiographs will 'feel' more burdensome for radiologists when all the 'normals' are removed by AI: these are the easy radiographs to interpret, so the radiographs left for interpretation are the more difficult cases, which may lead to a paradoxical sense of burden even though the workload is theoretically reduced. This is something we are very aware of.
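As a rough illustration of what such a local quality-control audit could look like, the sketch below estimates how often abnormal cases slip through among radiographs the AI has labelled as normal, with a Wilson score confidence interval around that miss rate. The audit design, sample size, and counts here are hypothetical placeholders, not figures from the study.

```python
import math

def wilson_interval(k, n, z=1.96):
    """95% Wilson score interval for a proportion k/n."""
    if n == 0:
        return float("nan"), float("nan")
    p = k / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical audit: radiologists re-read 500 radiographs the AI
# autonomously reported as normal and find 2 with a disease finding.
n_audited, n_missed = 500, 2
low, high = wilson_interval(n_missed, n_audited)
print(f"miss rate: {n_missed / n_audited:.2%} (95% CI {low:.2%}-{high:.2%})")
```

The width of the interval makes the trade-off explicit: a small audit sample cannot rule out a clinically meaningful miss rate, which is an argument for sizing such audits deliberately rather than ad hoc.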
What steps are involved in AI imaging model validation?
First of all: finding the clinical need. AI should not be implemented into radiology just for the sake of technological advancement. Sometimes a new technology will fail to provide help or assistance in the area it was created for, yet turn out to be better suited elsewhere, assisting another area of an organization or institution, so we should also be open to alternative purposes for these AI tools. For example, a chest radiograph tool developed for radiologists may be of greater value outside of the radiology department, where physicians are less experienced in interpreting these images. To go back to the question of AI model validation: once a clinical need has been established, some questions should be posed, for example, how well does the AI detect a specific chest X-ray finding? Then a database can be built, where previous radiologist reports can serve as the reference standard or, even better and as we did in our study, an independent reference standard can be created for the AI to be compared against. The data should also be representative of the population you want to use the tool for. So essentially, the steps in AI imaging model validation are: asking clinical questions, getting access to data, annotating the data, analyzing the data with the AI tool, and then assessing the performance of the AI tool (a minimal sketch of that last step follows below).
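To make the performance-assessment step concrete, here is a minimal sketch of evaluating an AI tool on one binary question (finding present or absent) against a reference standard. The labels and data are toy placeholders, not the setup from the study.

```python
def confusion_counts(ai_labels, reference_labels):
    """Count TP/FP/FN/TN for binary labels (1 = finding present)."""
    tp = fp = fn = tn = 0
    for ai, ref in zip(ai_labels, reference_labels):
        if ai and ref:
            tp += 1
        elif ai and not ref:
            fp += 1
        elif not ai and ref:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def sensitivity_specificity(ai_labels, reference_labels):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp, fp, fn, tn = confusion_counts(ai_labels, reference_labels)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec

# Toy example: per-image AI outputs vs an independent reference standard.
ai = [1, 0, 1, 1, 0, 0, 1, 0]
reference = [1, 0, 0, 1, 0, 1, 1, 0]
sens, spec = sensitivity_specificity(ai, reference)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

The same structure generalizes: one binary question per finding of interest, each scored against the reference standard built in the annotation step.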
Are there any challenges in this validation?
All the steps outlined above pose challenges. First of all, having the time and funding to do this properly is a big challenge. Another is getting access to the sensitive health information that radiological examinations contain and maintaining the integrity of those data; anonymizing the data correctly, for example, needs careful consideration (see the sketch below). Annotating the data is also very time consuming and requires the expertise of radiologists, which is a scarce resource.
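As one illustration of the anonymization step, here is a minimal sketch using the pydicom library; the interview does not name any specific tooling, so this choice is an assumption. It blanks only a handful of common identifying tags, whereas real de-identification should implement the full DICOM confidentiality profiles and comply with local data protection rules.

```python
# Minimal de-identification sketch using pydicom (pip install pydicom).
# The tag list is illustrative only; a production pipeline should follow
# the DICOM PS3.15 attribute confidentiality profiles.
import pydicom

IDENTIFYING_TAGS = [
    "PatientName", "PatientID", "PatientBirthDate",
    "ReferringPhysicianName", "InstitutionName", "AccessionNumber",
]

def anonymize(in_path, out_path):
    ds = pydicom.dcmread(in_path)
    for keyword in IDENTIFYING_TAGS:
        if keyword in ds:             # Dataset supports keyword lookup
            setattr(ds, keyword, "")  # blank the element's value
    ds.remove_private_tags()          # drop vendor-specific private tags
    ds.save_as(out_path)

# Hypothetical usage (file names are placeholders):
# anonymize("chest_xray_raw.dcm", "chest_xray_anon.dcm")
```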
How can we overcome such challenges?
This is the million-dollar question, and one that has not been properly answered yet in my opinion. Ideally, local AI validation as outlined above should always precede implementation, to ensure safety and establish the clinical value beforehand. However, in my experience this is not always done. That is partly due to the challenges outlined above, but probably also due to the hype surrounding AI and the fact that these products are already approved for clinical use, so there is no legal requirement for rigorous local validation. I think we need streamlined workflows for AI validation in radiology, and some vendors are already taking steps in this direction, to make the process outlined above more straightforward and to ensure that you do not need particular data science skills to perform the validation; skills that are not present in many clinical departments. Most importantly, we have to think about AI validation as a necessity, not an option for when you have the time.
Interviewee profile:

I am a medical doctor trained at the University of Copenhagen (Denmark), where I graduated in 2017. During medical school, starting around 2013, I became involved in clinical research, specifically in the field of cardiology. My early career as a doctor primarily focused on cardiology and emergency medicine. However, due to my profound interest in various medical imaging techniques across different specialties, I later decided to pursue a career in radiology. Before my current position as a PhD fellow in AI for radiology, I had some experience in statistics and data analysis. Nevertheless, I had not received any specific training in AI before taking on this role. This journey has been fascinating so far. As I approach the completion of my PhD next year, my plan is to return to clinical radiology while continuing my research. My research work has not been limited to the AI field; I have also explored various other areas in radiology.