NIHR Oxford Biomedical Research Centre

Enabling translational research through partnership


Study warns of risks in AI chatbots giving medical advice

10 February 2026 · Listed under Digital Health from Hospital to Home

The largest user study to date of large language models (LLMs) for assisting the general public with medical decisions has found that they pose risks to people seeking medical advice because of their tendency to provide inaccurate and inconsistent information.

Photo: person's hands using a computer keyboard (Kit via Unsplash)

The research reveals a major gap between the promise of LLMs and their usefulness for people seeking medical advice. While these models now excel at standardised tests of medical knowledge, they pose risks to real users seeking help with their own medical symptoms.

The study, published in Nature Medicine, was carried out by the Oxford Internet Institute and the Nuffield Department of Primary Care Health Sciences at the University of Oxford, in partnership with MLCommons. The research team received support from the NIHR Oxford Biomedical Research Centre.

Among the key findings of the research were:

  • No better than traditional methods: Participants used LLMs to identify health conditions and decide on an appropriate course of action, such as seeing a GP or going to hospital, based on information provided in a series of specific medical scenarios developed by doctors. Those using LLMs did not make better decisions than participants who relied on traditional methods like online searches or their own judgment.
  • Communication breakdown: The study revealed a two-way communication breakdown. Participants often didn’t know what information the LLMs needed to offer accurate advice, and the responses they received frequently combined good and poor recommendations, making it difficult to identify the best course of action.
  • Existing tests fall short: Current evaluation methods for LLMs do not reflect the complexity of interacting with human users. Like clinical trials for new medications, LLM systems should be tested in the real world before being deployed.

“These findings highlight the difficulty of building AI systems that can genuinely support people in sensitive, high-stakes areas like health,” said Dr Rebecca Payne, a GP and the lead medical practitioner on the study.

“Despite all the hype, AI just isn’t ready to take on the role of the physician. Patients need to be aware that asking a large language model about their symptoms can be dangerous, giving wrong diagnoses and failing to recognise when urgent help is needed.”

Real users, real challenges

In the study, researchers conducted a randomised trial involving nearly 1,300 online participants, who were asked to identify potential health conditions and a recommended course of action based on personal medical scenarios.

The detailed scenarios, developed by doctors, ranged from a young man developing a severe headache after a night out with friends to a new mother feeling constantly out of breath and exhausted.

One group used an LLM to assist their decision-making, while a control group used other traditional sources of information.

The researchers then evaluated how accurately participants identified the likely medical issues and the most appropriate next step, such as visiting a GP or going to A&E. They also compared these outcomes to the results of standard LLM testing strategies, which do not involve real human users.
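The trial's headline comparison — whether the LLM-assisted arm identified conditions more accurately than the control arm — can be sketched as a two-proportion z-test. This is a minimal illustration, not the study's actual analysis; the participant counts and correct-answer tallies below are invented for the example.

```python
from math import sqrt, erf

def two_proportion_z(correct_a, n_a, correct_b, n_b):
    """Two-sided z-test for a difference in proportions, e.g. the rate
    of correct condition identification in an LLM-assisted arm vs. a
    control arm of a randomised trial."""
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    p_pool = (correct_a + correct_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_a, p_b, z, p_value

# Illustrative (invented) numbers: ~650 participants per arm, with
# similar accuracy in each — consistent with "no better than
# traditional methods".
p_llm, p_ctrl, z, p = two_proportion_z(220, 650, 215, 650)
```

With near-identical accuracy in both arms, the test returns a small z-statistic and a large p-value, i.e. no evidence that LLM assistance improved decisions.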

The contrast was striking: models that performed well on benchmark tests faltered when interacting with people.

They found evidence of three types of challenge:

  • Users often didn’t know what information they should provide to the LLM
  • LLMs provided very different answers based on slight variations in the questions asked
  • LLMs often provided a mix of good and bad information that users struggled to distinguish
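The second challenge — different answers for slight variations of the same question — can be quantified with a simple consistency score over paraphrased prompts. A minimal sketch follows; the recommendation strings are hypothetical stand-ins for model outputs, not data from the study.

```python
from collections import Counter

def consistency(answers):
    """Fraction of paraphrased prompts that yield the most common
    recommendation; 1.0 means the model answered identically for
    every paraphrase of the same scenario."""
    if not answers:
        return 0.0
    counts = Counter(answers)
    return counts.most_common(1)[0][1] / len(answers)

# Hypothetical recommendations returned for five paraphrases of the
# same headache scenario: the model agrees with itself only 3/5 of
# the time.
paraphrase_answers = ["see GP", "go to A&E", "see GP", "self-care", "see GP"]
score = consistency(paraphrase_answers)  # 3/5 = 0.6
```

A score well below 1.0 signals the paraphrase sensitivity the study describes: the same underlying scenario, phrased slightly differently, can produce materially different advice.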

“Designing robust testing for large language models is key to understanding how we can make use of this new technology,” said lead author Andrew Bean, a doctoral researcher at the Oxford Internet Institute.

“In this study, we show that interacting with humans poses a challenge even for top LLMs. We hope this work will contribute to the development of safer and more useful AI systems.”

Copyright © 2026 NIHR Oxford Biomedical Research Centre