I'd greatly prefer a blind study comparing doctors to AI, rather than a study of doctors feeding AI scenarios and seeing if it matches their predetermined outcome.
Edit: People seem confused here. The study was feeding the AI structured clinical scenarios and seeing its results. The study was not a live analysis of AI being used in the field to treat patients.
I don't understand this reasoning. Randomizing people to AI vs standard of care is expensive and risky. Checking whether the AI can pass hypothetical scenarios seems like a perfectly reasonable approach to researching the safety of these models before running a clinical trial.
You would pass those hypothetical scenarios to doctors too, and then the analysis of results would be done by doctors who don't know whether each result came from an AI or a doctor.
> Three physicians independently assigned gold-standard triage levels based on cited clinical guidelines and clinical expertise, with high inter-rater agreement
The issue is that those hypothetical scenarios do not have to look like how patients actually interact with the tool.
Real-life use is full of ill-posed questions, open-ended statements, inaccurate assessments of symptoms, and conclusory remarks sprinkled in between. Real use of chatbots for health by non-clinicians looks very different from scenario-based evaluation.
The number of people who die each year in the United States alone from causes attributable to medical errors is believed to be in the hundreds of thousands. A doctor's opinion is not the gold standard yardstick.
It may be interesting to study if there is some kind of signal in general health outcomes in the US since the popularization of ChatGPT for this purpose. It may be a while before we have enough data to know. I could see it going either way.
We have standards of care for a reason. They are the most basic requirements of testing. Ignoring them is not just being a bad doctor, it's unethical treatment. It's the absolute bare minimum of a medical system.
That type of experimental set-up is forbidden due to ethical concerns. It goes against medical ethics to give patients treatment that you think might be worse.
I think the best approach would be an interface where the patient isn't told whether the doctor on the other end is human or AI. Tell them they're going to do multiple remote exams with different care providers for the same illness in exchange for free treatment and payment for the study.
If you're worried about not catching a legitimate emergency, as in something that can't wait a day or two for them to complete the different sessions, you could have a doctor monitor the interactions with the ability to raise a flag, step in, and send them to the ER.
I don't think that would tell us anything useful. The data quality in most patient charts is shockingly bad. I've seen a lot of them while working on clinical systems interoperability. Garbage in / garbage out. When human physicians make a diagnosis they typically rely on a lot of inputs that never appear in the patient chart.
And in most cases the diagnosis is the easy part. I mean we see occasional horror stories about misdiagnosis but those are rare. The harder and more important part is coming up with an effective treatment plan which the patient will actually follow, and then monitoring progress while making adjustments as needed. So a focus on the diagnosis portion of clinical decision support seems fundamentally misguided.
You could absolutely randomize care between a doctor and an AI under an IRB. I’d be stunned if there aren’t a dozen studies doing something like this already.
You have to justify it, but most places have sections in the document where you request review to justify it. It’s not any different from giving one patient heart medicine that you think works and another patient a sugar pill.
Huh? Do you have any actual examples of such studies? I don't think you understand how IRB actually works.
In actual heart medicine studies the control arm is typically treated with the current standard of care, not a placebo. So it seems pretty clear that you don't have any actual knowledge or experience in this area.