I'd greatly prefer a blind study comparing doctors to AI, rather than a study of doctors feeding AI scenarios and seeing if it matches their predetermined outcome.
Edit: People seem confused here. The study was feeding the AI structured clinical scenarios and seeing its results. The study was not a live analysis of AI being used in the field to treat patients.
I don't understand this reasoning. Randomizing people to AI vs standard of care is expensive and risky. Checking whether the AI can pass hypothetical scenarios seems like a perfectly reasonable approach to researching the safety of these models before running a clinical trial.
You would pass those hypothetical scenarios to doctors too, and then the analysis of results would be done by doctors who don't know whether each result came from an AI or a doctor.
> Three physicians independently assigned gold-standard triage levels based on cited clinical guidelines and clinical expertise, with high inter-rater agreement
The issue is that those hypothetical scenarios do not have to look like how patients actually interact with the tool.
Real-life use is full of ill-posed questions, open-ended statements, inaccurate assessments of symptoms, and conclusory remarks sprinkled in between. Real use of chatbots for health by non-clinicians looks very different from scenario-based evaluation.
The number of people who die each year in the United States alone from causes attributable to medical errors is believed to be in the hundreds of thousands. A doctor's opinion is not the gold standard yardstick.
It may be interesting to study if there is some kind of signal in general health outcomes in the US since the popularization of ChatGPT for this purpose. It may be a while before we have enough data to know. I could see it going either way.
We have standards of care for a reason. They are the most basic requirements of testing. Ignoring them is not just being a bad doctor, it's unethical treatment. It's the absolute bare minimum of a medical system.
That type of experimental set-up is forbidden due to ethical concerns. It goes against medical ethics to give patients treatment that you think might be worse.
I think the best approach would be an interface where the patient isn't told whether the doctor on the other end is human or AI. Tell them they're going to do multiple remote exams with different care providers for the same illness in exchange for free treatment and payment for the study.
If you're worried about not catching a legitimate emergency, as in something that can't wait a day or two for them to complete the different sessions, you could have a doctor monitor the interactions with the ability to raise a flag, step in, and send them to the ER.
I don't think that would tell us anything useful. The data quality in most patient charts is shockingly bad. I've seen a lot of them while working on clinical systems interoperability. Garbage in / garbage out. When human physicians make a diagnosis they typically rely on a lot of inputs that never appear in the patient chart.
And in most cases the diagnosis is the easy part. I mean we see occasional horror stories about misdiagnosis but those are rare. The harder and more important part is coming up with an effective treatment plan which the patient will actually follow, and then monitoring progress while making adjustments as needed. So a focus on the diagnosis portion of clinical decision support seems fundamentally misguided.
You could absolutely randomize care between a doctor and an AI under an IRB. I’d be stunned if there aren’t a dozen studies doing something like this already.
You have to justify it, but most places have sections in the document where you request review to justify it. It’s not any different from giving one patient heart medicine that you think works and another patient a sugar pill.
Huh? Do you have any actual examples of such studies? I don't think you understand how IRB actually works.
In actual heart medicine studies the control arm is typically treated with the current standard of care, not a placebo. So it seems pretty clear that you don't have any actual knowledge or experience in this area.