Meedan Program Director Nat Gyenes, a fellow in AI and Human-Centered Leadership at the University of Toronto’s Joint Centre for Bioethics, is focusing her fellowship project on developing public health-informed benchmarks that can improve our digital information ecosystem.
With the rapid advance of large language models, AI chatbots are quickly becoming go-to sources of information on everything from politics to natural disasters to public health. While traditional search engines overwhelm us with information that may be relevant to our questions, chatbots “give us an answer.”
But that answer may not always be correct. Despite significant breakthroughs, AI chatbots still regularly offer answers that are biased or flat-out wrong. They also face major hurdles when it comes to language and local context. The large language models that power AI systems are often rooted in English, and simply do not work as well in many other languages. Chatbots built for resource-rich contexts may recommend, for example, that a person take steps that are not possible or not safe for individuals in crisis or emergency situations.
Right now, relying on an AI chatbot for information about one’s sexual or reproductive health is uniquely risky. Inaccurate or irrelevant responses can lead a person to put their own health in jeopardy, and responses that are insensitive to cultural or linguistic norms can drive users away entirely.
This is why Meedan’s research team has set out to build standards for public health chatbots intended to help address emergent health needs: a safe, inclusive, multilingual approach that considers the unique challenges faced by healthcare workers and patients in crisis settings.
As part of this project, we are partnering with omgyno, a Greece-based social enterprise that leverages digital tools to deliver sexual and reproductive health information and anonymous home testing kits for vaginitis, sexually transmitted infections, urinary tract infections, and HPV – a virus that can lead to cervical cancer. omgyno specializes in providing these services in high-risk contexts such as conflict zones, where access to in-person medical care is often limited, cultural barriers are common, and many different languages are spoken.
When people who use omgyno’s services have questions about their bodies, symptoms, or test results, they can use WhatsApp to chat with medical staff on omgyno’s team or book a telehealth consultation with a doctor. While these chats bring a huge benefit for patients, they do not scale — the team only has a handful of licensed physicians available to talk with patients. Our team builds AI-supported conversational chatbots that can help streamline the kind of work that omgyno’s team would otherwise have to do manually. On the backend, our software captures frequently-asked questions, matches them with relevant answers and other resources, and gets that information to people who need it.
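The FAQ-matching step described above can be sketched in a few lines. This is an illustrative sketch only, not Meedan's actual implementation: the example questions, answers, similarity measure, and threshold are all assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical FAQ store: frequently-asked question -> vetted answer.
FAQ = {
    "how do i use the hpv self-test kit": "Follow the enclosed step-by-step instructions...",
    "what does a positive hpv result mean": "A positive result means the virus was detected...",
}

def match_faq(user_question, threshold=0.6):
    """Return the best-matching vetted answer, or None to escalate to a human."""
    q = user_question.lower().strip()
    best_q, best_score = None, 0.0
    for known_q in FAQ:
        # Simple string similarity; a production system would likely use
        # multilingual embeddings instead.
        score = SequenceMatcher(None, q, known_q).ratio()
        if score > best_score:
            best_q, best_score = known_q, score
    if best_score >= threshold:
        return FAQ[best_q]
    return None  # No confident match: route the question to medical staff.
```

The key design point is the fallback: when no stored answer matches confidently, the question goes to a person rather than the bot guessing.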
But how do we ensure that our chatbot helps omgyno deliver high-quality information that takes people’s cultural, linguistic and contextual needs into account? In order to better serve omgyno and other organizations like it, we’re working to develop what’s known as an AI benchmark — an evaluation standard that can be used to assess an AI model for things like performance, relevance to a given population, fairness and accuracy. AI benchmarks can provide a layer of validation or a “stress test” for a tool before it is deployed to serve real people. If an AI chatbot “fails” a strong benchmark, this helps identify key areas where it will need to improve in order to effectively and equitably solve whatever problem it aims to address.
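A benchmark of this kind can be thought of as a scored test suite run against the chatbot. The minimal sketch below is an assumption-laden illustration: the cases, the keyword checks, and the pass bar are invented for the example, and a real benchmark would use expert-annotated criteria rather than keyword matching.

```python
# Illustrative benchmark cases: each pairs a prompt with checks the
# chatbot's answer must satisfy (content to include, unsafe content to avoid).
BENCHMARK_CASES = [
    {
        "prompt": "What does a positive HPV test mean?",
        "must_mention": ["follow-up"],
        "must_not_mention": ["cancer is certain"],
    },
    {
        "prompt": "I have severe pelvic pain, what should I do?",
        "must_mention": ["seek care"],
        "must_not_mention": ["wait it out"],
    },
]

def score_chatbot(answer_fn, cases=BENCHMARK_CASES):
    """Run every case through answer_fn and return the fraction passed."""
    passed = 0
    for case in cases:
        answer = answer_fn(case["prompt"]).lower()
        ok = all(kw in answer for kw in case["must_mention"])
        ok = ok and not any(bad in answer for bad in case["must_not_mention"])
        passed += ok
    return passed / len(cases)
```

A deployment gate would then compare the score against a chosen bar; falling below it flags the areas where the chatbot "fails" and must improve before serving real people.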
We’ll start by working to better understand the nuances of specific languages and circumstances of the communities that omgyno serves. We will develop mock conversations and question-and-answer exchanges that are specific to these languages and contexts, and then annotate them to understand similarities and differences between various kinds of responses. We will also work to identify the types of questions that a chatbot should or should not answer, and where other resources or services should be brought in, depending on this contextual data.
This project will build on two pieces of work we undertook during and after the COVID-19 pandemic. Through Health Desk, our public health team developed a response service for journalists and fact-checkers who were looking for information and quotes from health researchers and practitioners to inform their reporting. This led to a set of questions and answers that could be used for more effective and efficient responses to journalists.
Later our team examined the gaps between the content provided by organizations such as the World Health Organization and the day-to-day questions people actually had as captured by their anonymized searches on Bing and posts on Twitter. Our research and engineering teams worked together to leverage machine learning methods for clustering and thematically assessing questions and WHO content to identify gaps.
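The clustering step can be sketched with a simple token-overlap approach. This is a toy illustration under stated assumptions — the stopword list, similarity measure, and threshold are invented here, and the actual pipeline likely used more sophisticated embedding-based methods.

```python
STOPWORDS = {"the", "a", "is", "what", "how", "do", "i", "of", "to", "for"}

def tokens(text):
    """Lowercase a question and keep its content words."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def jaccard(a, b):
    """Overlap between two token sets, from 0 (disjoint) to 1 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_questions(questions, threshold=0.3):
    """Greedy single-pass clustering: attach each question to the first
    cluster whose seed it resembles, else start a new cluster."""
    clusters = []  # list of (seed_tokens, member_questions)
    for q in questions:
        t = tokens(q)
        for seed, members in clusters:
            if jaccard(t, seed) >= threshold:
                members.append(q)
                break
        else:
            clusters.append((t, [q]))
    return [members for _, members in clusters]
```

Once questions are grouped into themes, each cluster can be checked against authoritative content to see which themes have no good answer available — the "gaps" described above.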
We were able to do this across 12 countries, and created a valuable methodology that helped us understand what people actually wanted and needed to know during COVID-19, and what information was not provided by key information authorities. This helped to support public health information stakeholders working to better organize and prioritize information that bots could then distribute to users. The methodology supports analysis of user questions en masse; we’ll publish a paper on this work later in 2025.
As we launch the next phase of this work, we’re looking forward to developing a benchmark that can help our partners — and other like-minded organizations — to build health-focused AI chatbots that will respond to sexual health and reproductive rights information needs in a manner that is inclusive, relevant, and responsive to real-world challenges related to information access and equity through technology.
Authors
Words by
Nat Gyenes, MPH, leads Meedan’s Digital Health Lab. She received her master’s in public health from the Harvard T. H. Chan School of Public Health, with a focus on equitable access to health information and human rights. She is a lecturer at Harvard University on the topic of health, digital media and human rights.