Meedan Program Director Nat Gyenes, a fellow in AI and Human-Centered Leadership at the University of Toronto’s Joint Centre for Bioethics, is focusing her fellowship project on developing public health-informed benchmarks that can improve our digital information ecosystem.
With the rapid advance of large language models, AI chatbots are quickly becoming go-to sources of information on everything from politics to natural disasters to public health. While traditional search engines overwhelm us with information that may be relevant to our questions, chatbots “give us an answer.”
But that answer may not always be correct. Despite significant breakthroughs, AI chatbots still regularly offer answers that are biased or flat-out wrong. They also face major hurdles when it comes to language and local context. The large language models that power AI systems are often rooted in English, and simply do not work as well in many other languages. Chatbots built for resource-rich contexts may recommend, for example, that a person take steps that are not possible or not safe for individuals in crisis or emergency situations.
Right now, relying on an AI chatbot for information about one’s sexual or reproductive health is uniquely risky. Inaccurate or irrelevant responses can lead a person to put their own health in jeopardy, and responses that are insensitive to cultural or linguistic norms can drive users away entirely.
This is why Meedan’s research team has set out to build standards for public health chatbots intended to help address emergent health needs: a safe, inclusive, multilingual approach that considers the unique challenges faced by healthcare workers and patients in crisis settings.
As part of this project, we are partnering with omgyno, a Greece-based social enterprise that leverages digital tools to deliver sexual and reproductive health information and anonymous home testing kits for vaginitis, sexually transmitted infections, urinary tract infections, and HPV – a virus that can lead to cervical cancer. omgyno specializes in providing these services in high-risk contexts such as conflict zones, where access to in-person medical care is often limited, cultural barriers are common, and many different languages are spoken.
When people who use omgyno’s services have questions about their bodies, symptoms, or test results, they can use WhatsApp to chat with medical staff on omgyno’s team or book a telehealth consultation with a doctor. While these chats bring a huge benefit for patients, they do not scale — the team only has a handful of licensed physicians available to talk with patients. Our team builds AI-supported conversational chatbots that can help streamline the kind of work that omgyno’s team would otherwise have to do manually. On the backend, our software captures frequently-asked questions, matches them with relevant answers and other resources, and gets that information to people who need it.
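The FAQ-matching step described above can be sketched in a few lines. This is an illustrative sketch only, not Meedan's actual implementation: the example questions, answers, similarity measure, and threshold are all assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical FAQ store: frequently-asked question -> vetted answer.
FAQ = {
    "how do i use the hpv self-test kit": "Follow the enclosed step-by-step instructions...",
    "what does a positive hpv result mean": "A positive result means the virus was detected...",
}

def match_faq(user_question, threshold=0.6):
    """Return the best-matching vetted answer, or None to escalate to a human."""
    q = user_question.lower().strip()
    best_q, best_score = None, 0.0
    for known_q in FAQ:
        # Simple string similarity; a production system would likely use
        # multilingual embeddings instead.
        score = SequenceMatcher(None, q, known_q).ratio()
        if score > best_score:
            best_q, best_score = known_q, score
    if best_score >= threshold:
        return FAQ[best_q]
    return None  # No confident match: route the question to medical staff.
```

The key design point is the fallback: when no stored answer matches confidently, the question goes to a person rather than the bot guessing.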
But how do we ensure that our chatbot helps omgyno deliver high-quality information that takes people’s cultural, linguistic and contextual needs into account? In order to better serve omgyno and other organizations like it, we’re working to develop what’s known as an AI benchmark — an evaluation standard that can be used to assess an AI model for things like performance, relevance to a given population, fairness and accuracy. AI benchmarks can provide a layer of validation or a “stress test” for a tool before it is deployed to serve real people. If an AI chatbot “fails” a strong benchmark, this helps identify key areas where it will need to improve in order to effectively and equitably solve whatever problem it aims to address.
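A benchmark of this kind can be thought of as a scored test suite run against the chatbot. The minimal sketch below is an assumption-laden illustration: the cases, the keyword checks, and the pass bar are invented for the example, and a real benchmark would use expert-annotated criteria rather than keyword matching.

```python
# Illustrative benchmark cases: each pairs a prompt with checks the
# chatbot's answer must satisfy (content to include, unsafe content to avoid).
BENCHMARK_CASES = [
    {
        "prompt": "What does a positive HPV test mean?",
        "must_mention": ["follow-up"],
        "must_not_mention": ["cancer is certain"],
    },
    {
        "prompt": "I have severe pelvic pain, what should I do?",
        "must_mention": ["seek care"],
        "must_not_mention": ["wait it out"],
    },
]

def score_chatbot(answer_fn, cases=BENCHMARK_CASES):
    """Run every case through answer_fn and return the fraction passed."""
    passed = 0
    for case in cases:
        answer = answer_fn(case["prompt"]).lower()
        ok = all(kw in answer for kw in case["must_mention"])
        ok = ok and not any(bad in answer for bad in case["must_not_mention"])
        passed += ok
    return passed / len(cases)
```

A deployment gate would then compare the score against a chosen bar; falling below it flags the areas where the chatbot "fails" and must improve before serving real people.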
We’ll start by working to better understand the nuances of specific languages and circumstances of the communities that omgyno serves. We will develop mock conversations and question-and-answer exchanges that are specific to these languages and contexts, and then annotate them to understand similarities and differences between various kinds of responses. We will also work to identify the types of questions that a chatbot should or should not answer, and where other resources or services should be brought in, depending on this contextual data.
This project will build on two pieces of work we undertook during and after the COVID-19 pandemic. Through Health Desk, our public health team developed a response service for journalists and fact-checkers who were looking for information and quotes from health researchers and practitioners to inform their reporting. This led to a set of questions and answers that could be used for more effective and efficient responses to journalists.
Later our team examined the gaps between the content provided by organizations such as the World Health Organization and the day-to-day questions people actually had as captured by their anonymized searches on Bing and posts on Twitter. Our research and engineering teams worked together to leverage machine learning methods for clustering and thematically assessing questions and WHO content to identify gaps.
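The clustering step can be sketched with a simple token-overlap approach. This is a toy illustration under stated assumptions — the stopword list, similarity measure, and threshold are invented here, and the actual pipeline likely used more sophisticated embedding-based methods.

```python
STOPWORDS = {"the", "a", "is", "what", "how", "do", "i", "of", "to", "for"}

def tokens(text):
    """Lowercase a question and keep its content words."""
    return {w for w in text.lower().split() if w not in STOPWORDS}

def jaccard(a, b):
    """Overlap between two token sets, from 0 (disjoint) to 1 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_questions(questions, threshold=0.3):
    """Greedy single-pass clustering: attach each question to the first
    cluster whose seed it resembles, else start a new cluster."""
    clusters = []  # list of (seed_tokens, member_questions)
    for q in questions:
        t = tokens(q)
        for seed, members in clusters:
            if jaccard(t, seed) >= threshold:
                members.append(q)
                break
        else:
            clusters.append((t, [q]))
    return [members for _, members in clusters]
```

Once questions are grouped into themes, each cluster can be checked against authoritative content to see which themes have no good answer available — the "gaps" described above.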
We were able to do this across 12 countries, and created a valuable methodology that helped us understand what people actually wanted and needed to know during COVID-19, and what information was not provided by key information authorities. This helped to support public health information stakeholders working to better organize and prioritize information that bots could then distribute to users. The methodology supports analysis of user questions en masse; we’ll publish a paper on this work later in 2025.
As we launch the next phase of this work, we’re looking forward to developing a benchmark that can help our partners — and other like-minded organizations — to build health-focused AI chatbots that will respond to sexual health and reproductive rights information needs in a manner that is inclusive, relevant, and responsive to real-world challenges related to information access and equity through technology.
Authors
Words by
Nat Gyenes, MPH, leads Meedan’s Digital Health Lab. She received her master’s in public health from the Harvard T. H. Chan School of Public Health, with a focus on equitable access to health information and human rights. She is a lecturer at Harvard University on the topic of health, digital media and human rights.