超越諂媚：AI醫療建議中易受誤導的風險

Hacker News·4 個月前

一名腎臟移植患者因AI聊天機器人基於正常的肌酸酐水平建議她停用抗生素，導致移植腎功能衰竭，凸顯了AI醫療建議中存在嚴重誤導的風險。

Prevent hallucinations & security issues

Beyond sycophancy: The risk of vulnerable misguidance in AI medical advice

‍What happened

This article states, a 30-year-old kidney transplant recipient reportedly stopped her antibiotics after an AI chatbot indicated her normal creatinine levels meant she no longer needed the medication. Within weeks, her graft function collapsed, creatinine spiked, and she returned to dialysis post-surgery. Senior nephrologists at the Nizam's Institute of Medical Sciences (NIMS) have highlighted a concerning trend: even well-educated patients are making critical health decisions based on chatbot outputs without consulting their healthcare teams.

In another case, a 62-year-old man with diabetes experienced rapid weight loss and dangerously low sodium after following a chatbot plan that advised cutting salt completely. Although chatbots are adept at providing high-level and general advice, they lack the contextual understanding and expertise of an experienced medical professional.

Beyond AI hallucinations: The vulnerable misguidance problem

These incidents show something more profound than simple technical failures or inaccurate information. They expose "vulnerable misguidance," a phenomenon we've been tracking throughout our Phare benchmark.

One of the central concerns in deploying LLMs responsibly is their potential to validate or encourage unsafe behaviours, particularly in sensitive or high-risk user scenarios such as disordered eating, mental health crises, substance misuse, or dangerous lifestyle practices. Rather than measuring toxicity or abuse, vulnerable misguidance focuses on more subtle forms of harm, particularly when users implicitly seek affirmation or validation for harmful behaviour.

In real-world applications, users may express unsafe intentions in emotionally complex ways: framed positively, indirectly, or within supportive-seeming dialogue. A safe and trustworthy model should not merely avoid giving explicit harm-related advice but should also express concern or caution, suggest safer alternatives, or encourage people to seek professional advice when needed.

The sycophantic tendency amplifies the risk of harmful AI advice

What makes vulnerable misguidance particularly dangerous is how it couples with the sycophantic tendency of LLMs. These models are trained to be helpful and agreeable, which means they're prone to validating user assumptions rather than challenging potentially dangerous decisions.

When a kidney transplant patient asks whether normal creatinine levels mean they can stop antibiotics, an LLM might provide a confident, affirmative response without understanding the critical context: that post-transplant immunosuppression requires strict adherence regardless of apparent lab improvements.

Why generic AI advice fails in clinical settings

Chatbots don't see contraindications, drug-drug interactions, transplant protocols, or the arc of a patient's chart. They interpret isolated inputs and provide confident answers, even when they are uncertain, the evidence is weak, or the required context is missing. All these things together can turn "OK-sounding" advice into life-threatening situations.

What we learned from testing vulnerable misguidance

Through our testing at Giskard, we found that most large language models perform reasonably well at avoiding vulnerable misguidance scenarios. However, performance drops significantly for smaller model versions. This is likely because smaller versions fail to capture the full complexity of situations where users might be seeking validation for potentially harmful behaviours.

As organisations deploy smaller, more cost-effective models to scale AI applications, they may inadvertently introduce safety vulnerabilities that larger models would have caught. Don’t get us wrong, smaller language models are great for many reasons (size, speed), but when using them, you should be even more aware of the risks of vulnerable misguidance within customer-facing deployments.

How to prevent harmful AI medical advice?

Organisations deploying client-facing LLMs must implement rigorous AI security testing. At Giskard, we evaluate the robustness of AI systems using our automatic LLM vulnerability scanner, which assesses:

To gain a deeper understanding of our tests and their capabilities, we recommend reviewing our whitepaper on LLM security attacks.

Actionable steps to secure AI healthcare deployments

If your organisation is evaluating AI for patient-facing or clinician-facing tasks, implement these safeguards:

Most of these things can be defined and verified with custom validation rules for client interaction. You could use an LLM as a judge to evaluate whether returned information aligns with the policies above. In combination with proactive attacks, you can then verify different risky user scenarios for your entire system.

The impact of AI failures in healthcare

What changes with AI-assisted healthcare interactions is the scale of potential harm. A single vulnerable misguidance failure in a widely deployed chatbot could affect thousands of patients making critical health decisions. The same system that provided dangerous advice to one kidney transplant patient could just as easily mislead diabetic patients about insulin management, cancer patients about treatment adherence, or mental health patients about medication discontinuation.

Conclusion

The discussed cases reveal how AI features, such as sycophancy, can become catastrophic risk multipliers in high-risk scenarios. With minimal friction, patients now receive confident medical guidance that ignores critical clinical context, bypasses safety protocols, and operates without accountability.

Organisations deploying LLMs in healthcare must take AI red teaming seriously. Test your models against vulnerable misguidance scenarios. Probe for weaknesses in safety alignment, particularly around the sycophantic tendency to validate potentially harmful user assumptions.

If you're looking to prevent AI failures in healthcare applications, we support over 50+ attack strategies that expose AI vulnerabilities from prompt injection techniques to authorisation exploits and vulnerable misguidance scenarios. Get started now.

‍

You will also like

— Hacker News