The challenges of verifying AI for healthcare
Artificial intelligence promises to revolutionise healthcare, but even in areas such as medical imaging, where it is easy to spot AI errors, more research is needed
There is a lot of excitement in healthcare about the use of artificial intelligence (AI) to improve clinical decision-making.
Pioneered by the likes of IBM Watson for Healthcare and DeepMind Health, AI promises to help specialists diagnose patients more accurately. Two years ago, McKinsey co-produced a report with the European Union’s EIT Health to explore the potential for AI in healthcare. The key opportunities the report’s authors identified were in healthcare operations: diagnostics, clinical decision support, triage and diagnosis, care delivery, chronic care management and self-care.
“First, solutions are likely to address the low-hanging fruit of routine, repetitive and largely administrative tasks, which absorb significant time of doctors and nurses, optimising healthcare operations and increasing adoption,” they wrote. “In this first phase, we would also include AI applications based on imaging, which are already in use in specialties such as radiology, pathology and ophthalmology.”
The world of healthcare AI has not stood still and in June, the European Parliament published Artificial intelligence in healthcare, focusing on the applications, risks, ethical and societal impacts. The paper’s authors recommended that risk assessment of AI should be domain-specific, because the clinical and ethical risks vary in different medical fields, such as radiology or paediatrics.
The paper’s authors wrote: “In the future regulatory framework, the validation of medical AI technologies should be harmonised and strengthened to assess and identify multi-faceted risks and limitations by evaluating not only model accuracy and robustness, but also algorithmic fairness, clinical safety, clinical acceptance, transparency and traceability.”
Validation of medical AI technologies is the focus of research at Erasmus University Medical Center (Erasmus MC) in Rotterdam. Earlier this month, Erasmus MC began working with health tech firm Qure.ai to launch its AI Innovation Lab for Medical Imaging.
The initial programme will run for three years and will conduct detailed research into the detection of abnormalities by AI algorithms for infectious and non-infectious disease conditions. The researchers hope to understand the potential use cases for AI in Europe and provide guidance to clinicians on best practices for adoption of the technology specifically for their requirements.
Jacob Visser, radiologist, chief medical information officer (CMIO) and assistant professor for value-based imaging at Erasmus MC, said: “It is important to realise we have big challenges, an ageing population and we have a lot of technology that needs to be used in a responsible way. We are investigating how we can bring value to clinicians and patients using new technology and how we can measure those advancements.”
As CMIO, Visser acts as a bridge between the medical side and technologists. “As a medical professional, the CMIO wants to steer IT in the right direction,” he said. “Clinicians are interested in the possibilities IT can offer. New technical developments trigger medical people to see greater opportunities in areas like precision medicine.”
Erasmus MC will run the laboratory, conducting research projects using Qure’s AI technology. The initial research project will focus on musculoskeletal and chest imaging. Visser said that when evaluating AI models, “it is easy to verify that a fracture has been detected correctly”.
This makes it possible to assess how well the AI copes, giving the researchers a meaningful insight into how often the AI misses a genuine fracture (false negative) or wrongly classifies an X-ray scan as showing a fracture (false positive). They will also gain insight into whether the algorithm fails for specific diseases or in specific areas.
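As a rough illustration of how false negatives and false positives can be counted, the Python sketch below compares an AI's output with a radiologist's verdict on the same scans. It is not the lab's method: the function name, the class and the 10 example scans are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ErrorTally:
    true_positives: int
    false_positives: int
    true_negatives: int
    false_negatives: int

    @property
    def false_negative_rate(self) -> float:
        # Share of genuine fractures that the AI missed.
        fractures = self.true_positives + self.false_negatives
        return self.false_negatives / fractures if fractures else 0.0

    @property
    def false_positive_rate(self) -> float:
        # Share of fracture-free scans that the AI wrongly flagged.
        clear_scans = self.true_negatives + self.false_positives
        return self.false_positives / clear_scans if clear_scans else 0.0


def tally_errors(radiologist_labels, ai_flags) -> ErrorTally:
    """Compare AI output with the radiologist's verdict, scan by scan."""
    tp = fp = tn = fn = 0
    for has_fracture, ai_flagged in zip(radiologist_labels, ai_flags):
        if has_fracture and ai_flagged:
            tp += 1
        elif has_fracture and not ai_flagged:
            fn += 1  # genuine fracture missed
        elif ai_flagged:
            fp += 1  # fracture reported where there is none
        else:
            tn += 1
    return ErrorTally(tp, fp, tn, fn)


# Hypothetical data: 10 scans, 4 of which show a genuine fracture.
radiologist_labels = [True, True, True, True, False, False, False, False, False, False]
ai_flags = [True, True, True, False, False, False, False, False, True, False]

tally = tally_errors(radiologist_labels, ai_flags)
print(f"False negative rate: {tally.false_negative_rate:.2f}")
print(f"False positive rate: {tally.false_positive_rate:.2f}")
```

Running the same tally separately for each body area or disease would be one way to see where an algorithm fails.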
Discussing the level of scrutiny that goes into the use of AI in healthcare, Visser said: “Medical algorithms need to be approved, such as by the Food and Drug Administration [FDA] in the US, and achieve CE certification in Europe. This does, however, not mean that we know the added value of such algorithms in daily clinical practice.”
Looking at the partnership with Qure.ai, he added: “We see the adoption of AI in healthcare at a critical juncture, where clinicians are asking for expert advice on how best to evaluate the adoption of the technology. In Qure’s work to date, it is clear they have gathered detailed insights into the effectiveness of AI in healthcare settings, and together we will be able to assess effective use cases in European clinical environments.”
But there are plenty of challenges in using AI for healthcare diagnostics. Even if an algorithm has been approved by the FDA or is CE certified, this does not necessarily mean it will work in a local clinical practice, said Visser. “We have to ensure the AI algorithm meets our local practice needs,” he added. “What are the clinically relevant parameters that can be affected by the results the AI produces?”
The challenge is that a healthcare AI algorithm is developed using a specific dataset. As a consequence, the resulting model may not be representative of real patient data in the local community. “You usually see a drop in performance when you validate an algorithm externally,” said Visser.
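A minimal Python sketch of external validation is shown below. It measures the same metric, sensitivity, on the data the algorithm was developed with and on local patient data, then reports the gap. Both datasets are invented for illustration, and the helper name is hypothetical.

```python
def sensitivity(labels, flags):
    """Fraction of genuinely positive cases that the algorithm flagged."""
    positives = sum(1 for label in labels if label)
    if positives == 0:
        raise ValueError("no positive cases to evaluate")
    detected = sum(1 for label, flag in zip(labels, flags) if label and flag)
    return detected / positives


# Hypothetical development data: 10 positive cases, 9 of them detected.
development_labels = [True] * 10 + [False] * 10
development_flags = [True] * 9 + [False] * 11

# Hypothetical local hospital data: 10 positive cases, 7 of them detected.
local_labels = [True] * 10 + [False] * 10
local_flags = [True] * 7 + [False] * 13

development_sensitivity = sensitivity(development_labels, development_flags)
local_sensitivity = sensitivity(local_labels, local_flags)
drop = development_sensitivity - local_sensitivity

print(f"Development sensitivity: {development_sensitivity:.2f}")
print(f"Local sensitivity: {local_sensitivity:.2f}")
print(f"Drop on external validation: {drop:.2f}")
```

A real evaluation would use far larger samples and report confidence intervals. The comparison itself stays the same: one metric, measured on two populations.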
This is analogous to pharmaceutical trials, where side-effects can vary between populations. The pharmaceutical sector monitors usage, which feeds into the product development cycle.
Looking at his aspirations for the research coming out of the new lab, Visser said: “I hope, within a year, to prove the algorithms work, the accuracy of their diagnoses, and I hope we will have begun evaluating how these algorithms work in daily clinical practice.”