“ChatGPT isn't yet ready to determine the quality of online text, but I think it's going to change in the future,” says Roei Golan.
In this video, Roei Golan highlights the background and findings from the study, “ChatGPT’s Ability to Assess Quality and Readability of Online Medical Information: Evidence from a Cross-Sectional Study,” for which he served as the lead author. Golan is a fourth-year medical student at Florida State University College of Medicine in Tallahassee, Florida.
Video Transcript:
Could you describe the background of this study?
There have been multiple studies on the quality and readability of online information regarding a variety of medical treatments. One of the studies we did prior to this one determined the readability and quality of online medical information regarding shockwave therapy for erectile dysfunction. It's a very controversial topic; there's a lot of data and research out there, but it's still not an FDA-approved treatment and it's not being offered by many clinics, yet patients are still seeking it. It's supposed to be a restorative therapy, and many studies are showing great results. However, there's still a lot more to be studied to determine whether it is beneficial or not.
That being said, what we wanted to do was determine the quality and readability of online information on shockwave therapy, just to see what's out there, just to see what patients are reading. We found that the readability was okay; readability just means how readable the text is, and all text for patients should be written at an 8th-grade level. But the quality was mixed, and a lot of the content was biased. That was one interesting finding. There were also multiple different sources. We just searched Google for shockwave therapy for erectile dysfunction, found many different web pages, and analyzed the articles on those pages. We found that content from private clinics that were selling shockwave therapy was more biased. Urology Times was on there, academic websites were on there, and they had better quality because they showed the benefits of doing or not doing shockwave therapy; they were more objective and stuck to the facts, whereas the private clinics were a little more biased. So, we published that.
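For context on how grade-level readability is commonly scored (the video does not name the specific instrument the study used), formulas such as Flesch-Kincaid convert sentence length and syllable counts into a US school grade. Below is a minimal sketch in Python, assuming the textstat package as an illustrative stand-in for whatever tool the authors actually used:

```python
# Minimal sketch of grade-level readability scoring using the textstat
# package (pip install textstat). The specific readability instrument
# used in the study is not named in this video, so Flesch-Kincaid here
# is illustrative, not necessarily the authors' method.
import textstat

article_text = (
    "Shockwave therapy is a restorative treatment that some clinics "
    "offer for erectile dysfunction. It is not FDA approved."
)

# Flesch-Kincaid maps average sentence length and syllables per word
# to a US school grade; patient education materials are often targeted
# at grade 8 or below.
grade = textstat.flesch_kincaid_grade(article_text)
print(f"Flesch-Kincaid grade level: {grade:.1f}")
print("At or below the 8th-grade target:", grade <= 8.0)
```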
What we did in this study was see whether ChatGPT can determine readability and quality. In our previous study, we had human evaluators answer a questionnaire called DISCERN, which asks things like, "Are the aims clear? Does it achieve its aims? Is it relevant? Is it balanced and unbiased?" We went through that questionnaire and determined the quality of each article. But in this study, we wanted ChatGPT to do so.
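To make the workflow concrete: one way to elicit DISCERN-style ratings from ChatGPT programmatically is through the OpenAI API. The exact prompts, model, and scoring procedure the authors used are not described in this video, so the following is only a hedged sketch of the general approach:

```python
# Illustrative sketch of prompting a chat model with DISCERN-style
# items. The prompts and model below are assumptions for illustration;
# the study's actual setup is not given in this video.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A few DISCERN items mentioned in the video; the full instrument has
# 16 questions, each rated on a 1-5 scale.
discern_items = [
    "Are the aims clear?",
    "Does it achieve its aims?",
    "Is it relevant?",
    "Is it balanced and unbiased?",
]

article_text = "...full text of the web article being evaluated..."

for item in discern_items:
    response = client.chat.completions.create(
        model="gpt-4",  # hypothetical choice; the study's model isn't named here
        messages=[
            {"role": "system",
             "content": "You rate health web pages using the DISCERN "
                        "instrument. Answer with a score from 1 (no) "
                        "to 5 (yes) and one sentence of justification."},
            {"role": "user",
             "content": f"Article:\n{article_text}\n\nDISCERN item: {item}"},
        ],
    )
    print(item, "->", response.choices[0].message.content)
```

In practice, one would also parse the numeric score out of each reply and sum across all 16 items to get a total DISCERN score per article, mirroring what the human evaluators produce.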
What were some of the notable findings?
After we had ChatGPT evaluate quality and readability, we compared ChatGPT's findings to our human evaluators' findings. We were surprised, because given ChatGPT's incredible capabilities, its findings were different from ours. So, what does that mean? It means that ChatGPT did not rate quality at the same level we did. For example, when analyzing an article about shockwave therapy on a private clinic's website, we determined that the quality wasn't good, whereas ChatGPT determined that the quality was good.
So now, the question is who's right: ChatGPT or 3 human evaluators? But that's just 1 of the weaknesses we pointed out about AI and ChatGPT in general. Especially in the era of AI, more patients are starting to turn to ChatGPT, and they're relying on the information they find there. So that's something we should be aware of: ChatGPT isn't yet ready to determine the quality of online text, but I think it's going to change in the future.
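For readers curious what "different findings" looks like quantitatively: a simple way to compare ChatGPT's quality scores with human evaluators' scores on the same articles is a paired nonparametric test. The statistic the authors actually used is not stated in this video, and the scores below are made-up placeholders, so this is only a sketch:

```python
# Sketch of comparing paired quality ratings. The scores below are
# hypothetical placeholders, and the statistical test the authors used
# is not stated in this video.
from scipy.stats import wilcoxon

# Hypothetical total DISCERN scores (range 16-80) for the same set of
# articles: mean of the human evaluators vs. ChatGPT, one pair each.
human_scores = [38, 42, 29, 55, 33, 47, 31, 40]
chatgpt_scores = [51, 49, 44, 58, 48, 52, 45, 50]

# Wilcoxon signed-rank test on the paired ratings: a small p-value
# suggests a systematic difference between the two raters.
stat, p_value = wilcoxon(human_scores, chatgpt_scores)
print(f"Wilcoxon statistic = {stat:.1f}, p = {p_value:.4f}")
```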
This transcription has been edited for clarity.