Content Moderation Toolkit

Executive Summary

In the current digital media ecosystem, content moderation has become central to conversations about managing violations of human rights on and through technology platforms. Solutions for effective and equitable content moderation cannot be led by any one sector but, at the same time, challenges in coordination among stakeholders and disciplines complicate efforts for effective change.

In order to address these challenges, Meedan’s Content Moderation Project has developed a working group as an ongoing collaboration between stakeholders across industry, academia, policy, and civil society groups to develop clear pathways to responsible content moderation for technology platforms. Our goal is to standardize knowledge across sectors by developing vocabularies, taxonomies, and a body of research to enable informed, actionable discussions between different disciplines, stakeholders, and the public.

Through this working group, Meedan’s work focused on three central themes:

  • pertinent content moderation challenges, summarized in a literature review and stakeholder report;
  • content moderation in practice, which culminated in an applied toolkit useful for civil society organizations; and
  • applied research through the 2019 Population and Housing Census in Kenya, where frameworks for how to both standardize and contextualize responses to hate speech and harassment were developed and tested.


To meet these objectives, we have developed four documents that contribute to improved research and practice at the intersection of product development, resource development, content moderation, and hate speech.

1. A report covering our requirements gathering research on the state of content moderation issues identified by different affected stakeholders.

This research paid particular attention to the needs of industry as they intersect with those of other stakeholder groups, with an eye toward facilitating more productive and impactful conversations between industry, external organizations, and advocates raising issues and questions around content moderation practices and policies.

Findings from this research demonstrated an absence of baseline expertise in content moderation (and its externalities) as a domain, inconsistent vocabulary within and between sectors, ill-defined benchmarks for accountability, and instability in the maintenance of working relationships across industry, civil society, regulatory bodies, and academia.

As a result, we emphasize the importance of the Content Moderation Toolkit serving as a bridge among industry, academia, and civil society. The toolkit should connect policy decisions and conversations with practical experience from industry, and build data collection and case study opportunities that inform practical applications of content moderation approaches that are standardized and scalable but allow for contextualization across languages and regions.

This research enabled us to successfully implement an applied research initiative in Nairobi, Kenya, focused on the tools and processes needed to address content moderation challenges in multi-language and unique digital media ecosystems.

2. A review of literature on topic-specific content moderation issues, summarizing important insights from the academic research field for industry stakeholders. Our findings are focused on two key thematic areas:

Hate Speech and Machine Learning

In this section, we focus on the importance of and challenges associated with different definitions of hate speech, identify technical issues in the automated identification of hate speech, and examine the experience of hate speech moderation in practice. In each part, we summarize the challenges faced by platforms, academics, NGOs, and users in addressing hate speech, and propose opportunities to deepen cooperation.

Content Moderation & Vulnerable Populations: LGBTQ Communities

We first examine the tension between the need for privacy and the need to connect with community online for LGBTQ people, then discuss the underlying anxieties and safety concerns that drive user wants and needs, and finally explore proposed design solutions. We illustrate these points through three short example cases: Pride Month & Twitch, Streamers & Community Safety, and Fandom & Community Regulation.

3. A content moderation toolkit for civil society that articulates a common vocabulary for stakeholders in civil society, academia, and policy to have meaningful, practical conversations with industry, through the development of a methodology for classifying moderation methods and an inventory of moderation tools.

The methodology for classifying moderation methods distinguishes components of the process that underlies a ‘content moderation decision,’ and defines critical properties (contexts, operators, and orientations) that characterize moderation product features. This toolkit inventory applies the above methodology by cataloguing the main moderation product features, or tools, used to enforce content moderation decisions on social media platforms.

In order to enable discussions of moderation features across platforms, this inventory applies a proposed set of classes of tools that correspond to content moderation actions as they are understood and named by the public, and articulates variations of how platforms implement tools in these classes.

This toolkit and inventory aim to inform stakeholders outside of industry how to advocate for product or policy changes in actionable language that is informed by industry practices. This includes a more nuanced understanding of the different types of moderation, the actions they perform, and how these actions connect to the change stakeholders want to see. At the same time, we aim to help stakeholders better understand the infrastructure and tooling needed to support the implementation and enforcement of those policies (or their goals and needs).

Ultimately, content moderation requires balancing harms to individuals against harms to society overall, and there are no easy solutions. Our intent is for the vocabulary that emerges from this report to enable more robust discussions around the trade-offs inherent in these topics.

4. A case study from the Kenyan Election and Census along with a draft set of indicators.

In August of 2019, the Meedan team, in collaboration with Article19 and the National Democratic Institute, held a coworking pop-up in Nairobi on the topics of hate speech, content moderation and disinformation, focused on collaboration and skill-sharing, with specific case examples through Kenya’s 2019 Population and Housing Census. The goal of this coworking space pop-up was to create a shared gathering area where CSOs could share ideas and collaborate, gain and share skills, and explore together how tools and resources can best support organizational work in addressing disinformation challenges during key large-scale events, including the Census, and through the implementation of the new digital identification system, Huduma Namba.

During this event, participating civil society organizations (CSOs), political activists, and human rights groups used the space to conduct their work as usual, with opportunities to attend workshop sessions and presentations on hate speech modalities in the Kenyan context, how memes intersected with hate speech during the 2019 Census, and the efforts of various platforms to address and mitigate the impacts of harassment and hate speech online in the Kenyan digital media ecosystem. Kenyan thought leaders and organizations, including fact-checking organizations Pesa Check, Defy Hate Now and AfricaCheck, presented on their own work and conducted training workshops on applying Meedan’s Check tool for building standardized annotation efforts to address hate speech and harassment online.

We generated our initial insights from two pre-workshop surveys, participatory design activities, workshop outputs and informal semi-structured interviews conducted over the course of the week.

Key findings

  • 1. Social media global standards for content moderation do not represent how hate speech, misinformation, and disinformation manifest in the Kenyan context. Workshop participants reviewed social media posts that had been reported but remained visible on social media platforms, and identified many of these posts as false propaganda, overt hate speech, harassment, and incitement of violence. We developed a set of indicators (e.g., Incitement to violence) that mapped to a subset of global standards for social media content moderation policies. We translated these indicators into questions (e.g., Does the post use a call to action?) to annotate social media posts. Participants struggled to annotate content that expressed ethnic hate, violent misogyny, and anti-LGBTQI sentiment through contemporary cultural or political references on social media; some also noted that they found it difficult to articulate what the gaps in the indicators were.

  • 2. Standard definitions and precise cues to identify hate speech and disinformation are a critical step for enabling interoperability between CSO initiatives. Civil society stakeholders in our workshops used different cues to determine whether an article or social media post contained hate speech, harassment, and disinformation. Throughout the workshops, participants collaboratively reconciled different Kenyan perspectives to work towards functional common definitions and regional cues that are able to represent diverse cultures within Kenya. Through the process of developing these definitions and cues, participants identified types of instances where social media global standards for content moderation fail in the Kenyan context, and organized around systematically investigating and documenting these gaps.

  • 3. Identifying interventions for hate speech, misinformation, and disinformation in Kenya requires ongoing collaboration with local stakeholders embedded in the Kenyan context. Assessing the impact of social media posts and articles requires deep contextual knowledge of cultural discussion around current events, historical background, and how online social media impacts people offline. CSO stakeholders had not only crucial knowledge for interpreting coded terms, political incentives, and practices of media production and consumption on social media platforms in Kenya, but also extensive knowledge of the ways marginalized and vulnerable populations were disproportionately affected by hate speech, misinformation, and disinformation online and offline.
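
The first finding describes translating indicators into yes/no questions that annotators answer for each post. As a rough illustration only, that mapping could be represented as below. The indicator name and question are the two examples given in this summary; the class names, record shape, post identifier, and flagging rule are hypothetical assumptions and do not describe Meedan’s Check tool or the annotation scheme actually used in the workshops.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Indicator:
    """A policy indicator and the yes/no questions annotators answer for it."""
    name: str
    questions: Tuple[str, ...]


@dataclass
class Annotation:
    """One annotator's answers for one post (hypothetical record shape)."""
    post_id: str
    answers: Dict[str, bool] = field(default_factory=dict)


# Indicator name and question are the examples quoted in the summary;
# everything else in this sketch is an illustrative assumption.
INCITEMENT = Indicator(
    name="Incitement to violence",
    questions=("Does the post use a call to action?",),
)


def flagged_indicators(annotation: Annotation,
                       indicators: List[Indicator]) -> List[str]:
    """Return names of indicators with at least one question answered 'yes'."""
    return [
        ind.name
        for ind in indicators
        if any(annotation.answers.get(q, False) for q in ind.questions)
    ]


# Hypothetical post identifier and answer, for demonstration only.
annotation = Annotation(post_id="post-001")
annotation.answers["Does the post use a call to action?"] = True
print(flagged_indicators(annotation, [INCITEMENT]))
```

In a structure like this, a post for which no question is answered "yes" returns an empty list. This may be one way the gaps that participants described, where harmful content did not fit the available indicators, would show up in annotation data.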

This report is part of the Content Moderation Toolkit