The allure of AI revolutionizing healthcare is undeniable. Studies frequently tout AI’s diagnostic prowess, promising relief for America’s strained healthcare system. AI could streamline administrative tasks, freeing doctors to see more patients; it could reduce costs and even bridge language barriers through real-time translation. The potential financial gains for tech companies are also substantial. The reality of AI in healthcare today, however, is far less transformative.
Recent investigations, including a report by the Washington Post, reveal significant challenges in early AI healthcare applications. These findings raise serious concerns about the feasibility of replacing or even augmenting doctors with AI in the near future.
AI’s Diagnostic Accuracy Under Scrutiny
Experts are finding that AI-driven diagnoses often fall short of clinical standards. For instance, Stanford Medical clinical professor Christopher Sharp observed GPT-4o recommending a topical steroid cream for lips that itched after eating a tomato. While avoiding tomatoes was sound advice, applying steroid cream to the thin tissue of the lips is generally not recommended. This highlights a critical gap in AI’s grasp of nuanced medical practice.
Similarly, Roxana Daneshjou, a Stanford professor of medicine and data science, found ChatGPT giving incorrect advice for mastitis: it recommended hot packs and continued breastfeeding, whereas established medical guidelines advise cold compresses and avoiding overstimulation.
The Dangers of Inaccurate AI in Healthcare
Unlike software glitches with minor consequences, errors in healthcare can be life-threatening. Daneshjou’s “red team” exercise, involving computer scientists and physicians, revealed ChatGPT offering dangerous medical advice in 20% of test cases. Such a high error rate is unacceptable for real-world healthcare applications.
While some argue that AI can augment rather than replace doctors, the need for constant human oversight erodes the promised time savings. AI transcription carries its own risks: tools like OpenAI’s Whisper can fabricate information outright. Sharp noted one case in which Whisper erroneously inserted a claim that a patient had attributed their cough to their child.
Bias and the Limits of AI “Intelligence”
Bias in training data remains a significant concern. Daneshjou observed an AI transcription tool incorrectly assuming a Chinese patient’s profession based on ethnicity. These biases perpetuate harmful stereotypes and underscore the limitations of current AI technology.
Generative AI is, at its core, a sophisticated word-prediction engine: it strings together statistically likely words without any true understanding of the underlying medical concepts. It generalizes from patterns in its training data rather than reasoning about an individual patient’s circumstances, an inherent limitation that makes it unsuitable for critical healthcare decisions.
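To make the “word prediction” point concrete, here is a drastically simplified sketch in Python. Real systems like GPT-4o use large neural networks rather than bigram counts, but the core operation is analogous: given the words so far, emit a statistically likely next word. The toy corpus and the predict_next helper are invented purely for illustration.

```python
from collections import Counter, defaultdict
import random

# Toy "training data" -- invented for illustration only.
corpus = (
    "apply cold compresses for mastitis . "
    "apply warm compresses for sore muscles . "
    "avoid tomatoes if tomatoes cause itching ."
).split()

# Build a bigram table: for each word, count which words followed it.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word: str) -> str:
    """Sample a next word in proportion to how often it followed `word`."""
    counts = following[word]
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# "apply" was followed by "cold" once and "warm" once, so the model
# emits either with equal probability -- fluent output, but with no
# notion of which continuation is medically correct.
print(predict_next("apply"))
```

Run on this corpus, predict_next("apply") returns “cold” or “warm” with equal probability; the model has no mechanism for deciding which is medically appropriate, only for reproducing the patterns it was fed. Scale makes the output far more fluent, but it does not change this fundamental character.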
The Future of AI in Healthcare
Adam Rodman, an internal medicine doctor and AI researcher, aptly summarizes the current state of AI in healthcare: “promising, but not there yet.” The risk of “hallucinated ‘AI slop’” impacting patient care is a serious concern.
While AI holds real potential, its current implementations in healthcare demand cautious evaluation and rigorous oversight. The next time you visit your doctor, it’s worth asking whether they use AI and how they ensure its accuracy and safety.