Content moderation is a process, not a solution in and of itself. With the increasing complexity of content on the internet and norms around governance, new standards are needed to help bridge the gap between industry and those seeking to engage with industry. Developed by a multistakeholder working group, the Content Moderation Toolkit for Civil Society articulates a common vocabulary and methodology for content moderation.
The Content Moderation Project developed this inventory through systematic documentation of moderation tools that are common on popular social media platforms (e.g., Facebook, WhatsApp, Twitter, Reddit, YouTube) to the extent they are available to the public, informed by interviews with industry practitioners. We developed categories of tools through a content analysis of the tools documented and discussions with members of the working group.
We do not see this inventory as completely comprehensive; rather, it is the beginning of a model that can support an ongoing inventory in an effort to develop standards. Indeed, this toolkit serves as a proof of concept for a fuller model that would be developed collaboratively with stakeholders from different industries. In the long term, we believe that articulating this inventory will help civil society actors provide more informed opinions in their conversations with platforms, while serving platforms in cross-industry conversations about these core issues.
Why develop an inventory?
During our needs finding research, we identified a critical gap between how industry stakeholders discussed moderation decisions and how those outside industry—especially those in civil society, academia, and policymaking—understood these decisions. In that light, the Content Moderation Toolkit is designed with the following core goals in mind:
- Develop a common vocabulary for stakeholders in civil society, academia, and policy to have meaningful, practical conversations with industry.
- This toolkit and inventory aim to show stakeholders outside of industry how to advocate for product or policy changes in actionable language informed by industry practice. This includes a more nuanced understanding of the different types of moderation, the kinds of actions they perform, and how those actions connect to the changes stakeholders want to see. At the same time, we aim to help stakeholders better understand the infrastructure and tooling needed to support the implementation and enforcement of those policies and to meet their goals and needs.
- Compile research insights in practical, searchable literature databases and summaries tailored for industry.
- Literature reviews of key academic papers.
- Construct an ongoing database developed by a working group of academic and civil society experts in the field.
- Develop an evidence-based model for making policy recommendations by collecting and auditing moderation-related data.
Defining Content Moderation and its Properties
What is this inventory?
This inventory is a proposed classification method for developing a taxonomy of content moderation enforcement tools. It catalogues the main moderation product features used to enforce content moderation decisions on social media platforms; we classify these features as tools.
Different social media platforms use different terms to describe similar or identical moderation features, and conversely, use the same terms to describe moderation features that are implemented differently across platforms (e.g., unfollow on Twitter vs. unfollow on Facebook). Moreover, platforms continue to change these features or release new features to address current and ever-emerging challenges in content moderation.
Users and organizations external to social media companies thus have limited, confusing, and inconsistent insight into how content moderation features operate within the systems underlying platform governance, from both technical and platform policy perspectives.
Rather than documenting the current design and implementation of each tool on each platform, it is important to develop consistent terminology that describes the functions of these tools, so as to make moderation enforcement comprehensible and to provide a vocabulary for differentiating between the approaches and impacts of the ways social media companies deploy these tools.
In order to enable discussions of moderation features across platforms, this inventory proposes a set of classes of tools that correspond to content moderation actions as they are understood and named by the public, and articulates variations of how platforms implement tools in these classes.
For each tool, the inventory lists a range of common implementations of that class of tool as it has been deployed on popular social media platforms. Each platform’s implementation of a moderation enforcement tool emerges from a different set of resources, priorities, trade-offs, audiences, and structures of interaction on the platform. Therefore, the goal is not to impose a standard for how these tools should be articulated or labeled on platforms; rather, this inventory is designed to help standardize a vocabulary across these systems and to help build a meaningful way for civil society actors and platforms to engage.
We see three properties of content moderation enforcement tools that influence how they are implemented and used:
- Context: How interaction between users is structured on a platform
- Ownership: Who is doing the moderation enforcement through each tool
- Orientation: What temporal dimension each tool is meant to support.
Variance in execution of enforcement between platforms can occur for a variety of reasons including:
- How tools are designed and operate
- What is possible in a technical or operational capacity
- Policy limitations
- Relationship with other stages of the moderation process
- Concerns expressed by users and public stakeholders
According to a recent report developed by human rights and media organization WITNESS, society stands at a precipice of opportunities and challenges around online content:
- The opportunity: In today’s world, digital tools have the potential to increase civic engagement and participation—particularly for marginalized and vulnerable groups—enabling civic witnesses, journalists, and ordinary people to document abuse, speak truth to power, and protect and defend their rights.
- The challenge: Bad actors are utilizing the same tools to spread misinformation, identify and silence dissenting voices, disrupt civil society and democracy, perpetuate hate speech, and put individual rights defenders and journalists at risk. AI-generated media in particular has the potential to amplify, expand, and alter existing problems around trust in information, verification of media, and weaponization of online spaces.
- The report highlights the following core dilemmas:
- Who might be included and excluded from participating?
- The tools being built could be used to surveil people
- Voices could be both chilled and enhanced
- Authenticity infrastructure will both help and hinder access to justice and trust in legal systems
- Technical restraints might stop these tools from working in places they are needed the most.
- News outlets face pressure to authenticate media
- Social media platforms will introduce their own authenticity measures
- Data storage, access and ownership: Who controls what?
- The technology and science are complex, emerging, and, at times, misleading
- How to decode, understand, and appeal the information being given
- Those using older or niche hardware might be left behind
- Jailbroken devices will not be able to capture verifiable audiovisual material
- If people can no longer be trusted, can blockchain be?
While these dilemmas reflect the challenges of authenticating media, many apply equally to the competing rights and responsibilities at stake in content moderation, where free expression must be weighed against user safety. Ultimately, content moderation requires balancing harms to individuals against harms to society overall, with no easy solutions. This report seeks to create a vocabulary that enables more robust discussions around this topic.
What is content moderation? There is no authoritative definition of “content moderation,” but the term broadly refers to the process of monitoring, judging, and acting on user-generated content to enforce policies, rules, or guidelines (often called community standards) determined by a governing body. Public engagement with content moderation most prominently centers the role of social media companies in moderating content on their platforms. Therefore, social media companies are most commonly the assumed governing bodies. In reality, there are many intersecting layers of governance.
- A few key working assumptions should be noted:
- Content moderation is distinct from moderation more broadly. Moderation may include regulation of broader patterns of behavior, networks, or collaboration, in addition to the production of content. Content moderation acts on content that is produced or posted on technology platforms, including media, comments, messages, and others.
- Content moderation is often discussed as though it were solely the purview of platforms. However, the work and responsibility of monitoring, judging, and acting on user-generated content is often offloaded onto platform users and community moderators.
What is content moderation “enforcement”?
This document specifically focuses on enforcement tools, while recognizing the multi-layered nature of content moderation. That said, it’s important to recognize that content moderation is a process with five distinct components:
- Detection: Locating and identifying content that may violate platform policy
- Adjudication: Determining if the content is in violation of platform policy
- Enforcement: Acting on content based on the consequence determined by platform policy
- Appeal: Returning to the adjudication stage if a user contests or appeals a platform judgment
- Policy: The set of principles, rules, or guidelines that determine what content is acceptable on a platform. In practice these guidelines are reviewed and updated based on other components of the content moderation process.
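For readers who want a schematic view of this process, the relationship between these components can be sketched in code. The snippet below is a minimal, purely illustrative Python sketch: the stage names come from the list above, while the transition table is our simplification of a typical flow, not a description of any platform’s actual pipeline.

```python
from enum import Enum

class Stage(Enum):
    """The five components of the content moderation process listed above."""
    DETECTION = "detection"        # locating content that may violate policy
    ADJUDICATION = "adjudication"  # deciding whether the content violates policy
    ENFORCEMENT = "enforcement"    # acting on the content as policy prescribes
    APPEAL = "appeal"              # a user contests the platform's judgment
    POLICY = "policy"              # the rules that inform every other component

# A typical forward flow. Note that an appeal returns to adjudication, and that
# the policy component is consulted by, and updated in response to, the other
# components rather than occupying a fixed position in the sequence.
TYPICAL_NEXT_STAGE = {
    Stage.DETECTION: Stage.ADJUDICATION,
    Stage.ADJUDICATION: Stage.ENFORCEMENT,
    Stage.ENFORCEMENT: Stage.APPEAL,   # only if the affected user contests the decision
    Stage.APPEAL: Stage.ADJUDICATION,  # appeals return to the adjudication stage
}
```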
Properties of a Content Moderation Enforcement Tool
Standards need persistent properties for comparison, measurement, and generalizability. In order to build towards those standards, we operationalize some core properties of content moderation tools.
This document defines a tool for content moderation enforcement on a platform as a feature or option an actor can use to prevent or mitigate harms perpetrated by another actor. The term “tool” distinguishes the intended use of moderation tools and their technical affordances from practices of moderation that platforms might not have explicitly designed for, as well as from the unintended effects of these actions.
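As one way to operationalize these properties (for instance, for the ongoing database described earlier), an entry in this inventory could be encoded roughly as follows. This is a minimal illustrative sketch in Python; the class and field names are ours and are not drawn from any platform’s API or data model.

```python
from dataclasses import dataclass
from enum import Flag, auto

class Context(Flag):
    """How interaction between users is structured on a platform."""
    FORUM = auto()
    FEED = auto()
    DIRECT_MESSAGE = auto()

class Owner(Flag):
    """Who performs the moderation enforcement through the tool."""
    USER = auto()
    COMMUNITY = auto()
    PLATFORM = auto()

class Orientation(Flag):
    """The temporal dimension the tool is meant to support."""
    REACTIVE = auto()
    PROACTIVE = auto()
    PREVENTATIVE = auto()

@dataclass
class ModerationTool:
    """One entry in the inventory of enforcement tools."""
    name: str
    description: str
    owner: Owner
    context: Context
    orientation: Orientation

# Example entry, corresponding to the "Block" tool documented in the inventory below.
block = ModerationTool(
    name="Block",
    description="Restricts an account's access to a user's content and profile, "
                "and makes the blocked account's content no longer visible to the user.",
    owner=Owner.USER,
    context=Context.FEED | Context.FORUM | Context.DIRECT_MESSAGE,
    orientation=Orientation.REACTIVE | Orientation.PROACTIVE,
)
```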
Context: Forum, Feed, Direct Message
Content moderation can operate within or govern a multiplicity of social contexts on social media platforms. Each social context has its own characteristics that emerge from the ways interaction is mediated through the platform. While specifics vary considerably across platforms, we define three useful, general contexts: forums, feeds, and direct messages.
- Forum: Content is organized thematically in forums and users contribute to threads, pages, groups, subreddits, etc. according to the topic of their content. Elements of Wikipedia, Facebook Groups, Reddit, and general Internet forums such as Mumsnet fit this type.
- Feed: Users follow or friend one another and author content from their individual accounts. The content from a user’s contacts on the platform is combined and, often, algorithmically ordered to produce a feed: a single list of content that is personal to that user. Facebook and Twitter are two popular platforms that display content this way.
- Direct Message: Two or more users exchange private messages. These messages have a specific audience and content is shared only with these specific users. Algorithmic re-ranking/ordering is usually not present. Facebook Messenger, Twitter DM, and WhatsApp all fit this context.
As the above examples show, a single platform often employs multiple contexts.
Owner: User, Community, Platform
This property categorizes tools based on the level at which they operate and who performs the actions. That is, some tools will enable users to change only their personal experiences of the platform, others will enable a community member (defined below) to change the experience for a subset of users of the platform, and still others will enable the platform owner to change the experience for anyone (or everyone) on the platform as a whole.
Understanding the implications of current moderation tools and opportunities for intervention necessitates engaging with who is wielding which tools. Clarifying the question of “Who can moderate the content on a social media platform?” enables stakeholders beyond social media companies to investigate who is responsible for moderation, and to propose who should have power over content moderation and to what extent.
- Naming who is wielding the tools of moderation and which tools they’re wielding (or, what kind of moderation they’re empowered to do) enables intervening stakeholders to:
- See what options they have to grant different actors different levels of power to moderate (and the resulting trade-offs)
- Imagine more sophisticated and distributed models of detection and adjudication
- Recognize that all of these actors intersect in key ways and do not operate in silos
- In order to accomplish this, key questions must be asked of these actors.
- What moderation tools are available to them?
- What, or who, are they allowed to moderate?
- At what scale must they moderate?
- What are the limits of their ability to moderate (or moderate effectively)?
- Who is the moderator accountable to?
- Who is influenced by their moderation?
- What set of trade-offs must the moderator balance?
- What are the goals or obligations of the moderator?
- Who does the moderator want to protect?
- What measures is the moderator willing to take?
On a technical level, the attributes of one category of owner are not necessarily mutually exclusive with those of the other categories. Rather, these categories articulate profiles of moderators that have emerged on social media platforms. Each has distinct characteristics that afford different modes of interaction and moderation.
User-centered content moderation typically targets the experience of the individual who is deploying moderation measures, with limited impact on the experience of other users on the platform.
User-centered moderation tools are not typically thought of as “content moderation,” but they figure centrally into how decision-makers at social media companies balance their desire to enable free expression on their platforms with their responsibility to minimize harm towards their users. Social media companies reduce their role as singular arbiters of content by distributing control over how content is experienced: they offer tools that enable users to moderate what content they encounter. Resources on abuse offered by social media companies often encourage platform users to employ user-centered moderation tools before flagging content for platform moderation; however, whether the user or the platform should be taking moderation measures is a point of contention. Distinguishing the tools available to users from those available to platforms, and building a vocabulary around them, enables more nuanced discussions of the trade-offs and impacts of one approach over the other.
User voting to make content less visible may be an edge case of user-centered content moderation enforcement. The moderating impact of user voting is typically opaque to users: different platforms and community spaces interpret and respond to user voting metrics differently.
Community Content Moderation is moderation conducted by a subset of accounts within a forum. Community moderator accounts can perform actions not available to other accounts on posts, accounts, and the forum itself. They may have a formal or informal responsibility to regulate the content in the forum, typically towards the goal of enforcing policies in line with the values of the forum’s community or the organization hosting the forum.
“Community moderation” is not necessarily a formal role, but a category that describes a set of moderation features, roles, or obligations. On different social media sites or forums individuals who occupy this role may be formally designated as “moderators” (often abbreviated as “mods”), “administrators” (often abbreviated as “admins”), “operators” (sometimes abbreviated as “ops”), “community manager,” or other titles.
Messaging services (e.g., WhatsApp, Skype, Facebook Messenger) typically introduce Community Moderator features when a group chat is created, as opposed to a direct message sent between two users. Community moderator features are most often granted by default to the user who created this group chat and often allow community moderators to designate other users as additional community moderators in the group chat. Exceptions include Twitter Direct Message groups and Signal group chats, which have no community moderation features.
The circumstances under which a community moderator can take moderation actions towards another community moderator vary by platform.
Community moderation work may be volunteer or paid, but community moderators are (normally) not employed by the platform. An admin for a Facebook Group, for example, can be a volunteer moderator or a paid social media manager for a company running that group. Paid community moderation work falls within the purview of many professional roles and is therefore subsumed under a variety of terms, including social media management, community management, and community operations.
It’s important to distinguish community moderation from platform moderation because community moderation offers alternative models to top-down platform moderation.
Platform content moderation is the process of monitoring, judging, and acting on user-generated content to enforce a technology platform’s policies, rules, or guidelines (often called community standards) by actors within the company that owns that platform.
Platform content moderation may be conducted by a number of people with different roles within or on behalf of the company that owns a platform, but moderation often appears to users through standardized messaging and actions on their account or their content without a clear actor.
Orientation: Proactive, Reactive, Preventative
Content moderation is widely perceived to be limited to the actions taken to enforce against a post after it has been posted on a platform. However, regulation of user-generated content so that it adheres to the policy, rules, or guidelines of an online space or platform may also occur through actions that intervene with content and users before policy-violating content is on the platform and visible to other users. Pre-emptive moderation measures can prevent harmful content from reaching users entirely and cultivate user-driven norms that disincentivize posting harmful content or defuse the impact of harmful content that does reach users. Such preventative measures, however, also carry a risk of disincentivizing legitimate speech (this parallels the idea of a ‘chilling effect’ in legal contexts).
- Reactive moderation acts on content after content has been posted to the platform. Example: A moderator of a chatroom on Twitch.tv can clear (remove) all of the messages in a chatroom if there is an influx of messages harassing users in the chatroom.
- Proactive moderation acts on content after it is submitted by a user, but before it is posted to the platform. Example: The Automod feature on Twitch.tv enables chat room moderators to review a chat message after it’s submitted by a user but before the message is posted to the chat room.
- Preventative moderation prevents content from being submitted by a user. Example: The owner of a chatroom on Twitch.tv can require that users view and agree to their chatroom rules before the user can send messages to the chatroom, which has been shown to reduce the amount of reactive moderation measures deployed by chatroom moderators.
It is important to note that platforms may combine these three orientations. A platform may, in general, employ reactive moderation but, based on an automated analysis of the content or the user posting it, hold certain posts for proactive moderation.
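As a concrete illustration of how orientations can be combined, the hypothetical sketch below routes a newly submitted post either to the default reactive path (publish now, act later if detected or reported) or to a proactive hold, based on an automated check of the content and the posting account. The function names, signals, and threshold are ours, not any platform’s actual system.

```python
MIN_ACCOUNT_AGE_DAYS = 10         # hypothetical threshold on the posting account
FLAGGED_TERMS = {"example-term"}  # stand-in for an automated content classifier

def content_looks_risky(post_text: str) -> bool:
    """Placeholder for automated analysis of the content itself."""
    return any(term in post_text.lower() for term in FLAGGED_TERMS)

def route_new_post(post_text: str, account_age_days: int) -> str:
    """Decide whether a post is held for proactive review or published immediately."""
    if account_age_days < MIN_ACCOUNT_AGE_DAYS or content_looks_risky(post_text):
        # Proactive: a moderator reviews the post before it becomes visible to others.
        return "hold for proactive review"
    # Reactive (default): the post is published and only acted on if it is later
    # detected or reported as violating policy.
    return "publish"
```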
Inventory of Tools for Content Moderation Enforcement
Terminology:
- User describes the platform user employing the moderation tool, while account describes the intended target of a moderation tool, typically the social media account of an actor who is a possible source of harm to the user.
- Content is the text or media contained within a post.
- The terms “explicit,” “mature,” “sensitive,” “graphic,” “inappropriate,” and others do not have consistent definitions across platforms, and may variably be used to describe content that displays or depicts nudity, sexual activity, sexual abuse, child exploitation, gore, and violent imagery.
Block
- Description: A block is broadly defined as a feature that enables a user to 1) restrict the access that an account has to that user’s content and profile and 2) make content made by the blocked account no longer visible to that user.
- Owner: User
- Context: Feed, Forum, Direct Message
- Orientation: Reactive, Proactive
- Example: Tumblr: Once a user blocks an account, neither the user nor the blocked account sees the other, aside from threaded posts where someone makes a comment and another account reblogs and adds to it. Because of Tumblr’s design, these reblogs with added comments offer some level of visibility into what a user is saying or doing if another person reblogs that user’s content with a comment.
- Example: Twitter: A user can individually block an account. The user no longer sees the account’s tweets and the blocked account can no longer see the user’s. Neither party can send a direct message (DM) to the other. The user can still navigate directly to the blocked account’s profile and opt in to seeing its tweets there.
- Example: Discord: A user can individually block an account. Once an account is blocked, the blocked account can see the user’s contributions to a conversation in a group chat they are both in, but the user cannot see the contributions of the blocked account in a group chat.
- Example: Facebook: A user can block individual accounts. Blocked accounts cannot see the user’s profile, post on their profile, tag them in posts, see links to their profile, send invites to groups, pages, or events, or send a friend request. This block does not include apps, games, or groups that both the user and the blocked account participate in.
Mute
- Description: Mute is a feature that enables a user to remove a specific piece of content, user, or keyword from their feed and/or notifications so it is no longer visible to them by default. This feature may still enable users to seek out muted posts, profiles, or keywords proactively, or will hide them entirely. Different types of mutes can happen on:
- Notifications from comment threads
- Global content
- Direct or private messages
- Owner: User
- Context: Feed, Forum, Direct Message
- Orientation: Reactive, Proactive
- Example: Facebook: A user can choose the “Snooze for 30 days” option for an account so that posts made by that account do not show up in the user’s newsfeed for 30 days.
Hide Post
- Description: An individual post or message is no longer visible to the user, but is still viewable by other users. In direct messages, a user may be able to hide a message by choosing a “delete for me” option but they will not be able to retrieve it once deleted for them.
- Owner: User
- Context: Feed, Forum, Direct Message
- Orientation: Reactive, Proactive
- Example: Reddit: A user can choose to hide a post; it won’t show up in their feed or in the subreddit where it was posted. The user can find the hidden post by navigating to their account profile and selecting the “hidden” posts section.
Remove Post
- Description: The user can remove content that has been posted by an account on the user’s post or profile so that it is no longer viewable by any account in the location where it was posted. Post removals through platform moderation are also commonly called content takedowns. Removing the posted content may result in:
- the content being deleted from the platform,
- the content remaining on the profile of the account that posted it, or
- the content being moved to a less visible location (see: Twitter “Hide reply”)
- Owner: User, Community Moderator, Platform
- Context: Feed, Forum, Direct Message
- Orientation: Reactive, Proactive
- Example: Discord: A Discord moderator can select “Delete” from the overflow menu (which appears as “•••”) of a chat message, which will remove the message from the chat entirely.
Unfollow/Unfriend
- Description: The user removes an account from their social network, reducing the visibility of the account to the user or the ability of the account to act on the user, where the account may:
- no longer appear in the feed of the user
- lose access to non-public content posted by the user
- lose the ability to send the user a direct message
- Owner: User
- Context: Feed, Forum, Direct Message
- Orientation: Reactive, Proactive
- Example: LinkedIn: A user can select “Remove Connection” on another account, which will remove that account from their Connections. That account will no longer show up as a Connection and will only be able to see the profile details and content that the user has made available to the public.
Personal Filter
- Description: Features of this type modify what type of content or behavior is visible to the user, typically in their feeds, notifications, or direct messages. Direct messages that are filtered may be directed to a separate inbox or deleted. A personal filter is distinct from mute, which hides content explicitly based on user-chosen keywords or users.
- Owner: User
- Context: Feed, Forum, Direct Message
- Orientation: Reactive, Proactive
- Example: Twitter: Users can choose to turn on a quality filter, which prevents tweets with “low-quality” content from appearing in their notifications.
Restrict Personal Audience
- Description: Content posted by the user or users within a community is restricted to the audiences designated by the user or community moderator.
- Owner: User, Community Moderator
- Context: Feed, Forum
- Orientation: Preventative
- Example: YouTube: A user can set a video they upload to their channel to “Private,” which removes the video from the “Videos” tab on their channel (i.e., the user’s profile), from YouTube’s search results, and from recommendations. Comments are automatically disabled for the video and the video cannot be shared via a link. The user can invite other users by email to view the video, but the video is otherwise unviewable.
Restrict Participation on Content
- Description: A user, community moderator, or platform moderator can prevent multiple accounts from making posts or interacting with posts in a designated space (e.g., posts in a forum space, the comments section of a post, edits on a wiki article). This can include:
- Turning off comments
- Turning off voting
- Filtering posts by a characteristic (e.g., automatically removes posts containing slurs)
- Filtering posting by user characteristics (e.g., a Reddit moderator may create a filter that removes posts made by all accounts less than 10 days old)
- Turning off or restricting tagging
- Turning off or filtering content edits (e.g., editing wiki pages)
- Owner: User, Community Moderator, Platform
- Context: Feed, Forum
- Orientation: Reactive, Proactive, Preventative
- Example: Wikipedia: Unregistered users can edit most, but not all, articles, and they cannot create new articles on the English edition.
Content Screen
- Description: A screen that partially or entirely blocks a type of content, typically by hiding or blurring it, with a warning about the nature of the content. When encountering screened content, the user is given the ability to choose whether or not to view that content. Content that platforms commonly screen by default includes graphic sexual or violent content.
- Owner: User, Community Moderator, Platform
- Context: Feed, Forum, Direct Message
- Orientation: Reactive, Proactive
- Example: Reddit: Reddit puts a screen in front of content that is labeled NSFW (not safe for work), which blurs the thumbnails and media previews for users who have the “Safe browsing mode” setting enabled. Users can choose to click on the content screen to view that content.
Limit Account
- Description: The range of actions otherwise available to a designated account is restricted. Actions that can be restricted for that account include:
- Posting on the platform or community space
- Commenting on other user posts
- Sending direct messages
- Interacting with a post (e.g., “Like,” “Favorite,” votes)
- Sharing other user posts
- Reducing or removing the ability to monetize content that is produced by the account
- Reducing or removing the ability to post advertisements on the platform
- Owner: Community Moderator, Platform
- Context: Feed, Forum, Direct Message
- Orientation: Reactive
- Example: YouTube: YouTube videos that do not follow guidelines for “ad-friendly” content may be demonetized, removing the video creator’s ability to host paid advertisements on their video and, by extension, their ability to generate advertising revenue from that video.
Account Suspension/Ban
- Description: Removes the account from the platform or forum.
- Forum: An account banned from a forum will not be able to rejoin that forum and may also be restricted from viewing the contents of that forum.
- Platform: The account suspended by the platform can no longer be accessed, and the account profile will no longer be visible to other users.
A suspension or ban may be lifted if the account owner performs requested actions or a set amount of time has passed. An account may be suspended permanently if it has received multiple conditional or temporary suspensions.
- Owner: Community Moderator, Platform
- Context: Feed, Forum, Direct Message
- Orientation: Reactive
- Example: Facebook: A Facebook Group Admin or Moderator can remove a Facebook account from the Group and then choose to “Block” that account from the group. An account blocked from the group won’t be able to find, see, or join that group.
Sanction
- Description: The platform warns a user about content or actions that have violated platform policy. A warning may result in:
- An account strike: The account accrues a strike and faces further moderation actions (e.g., account suspension or ban) once a certain number of strikes have been acquired within an allotted time period.
- Requiring action from the user: The platform may require that the user remove a post, alter their content or profile, or take other measures within a certain period of time to avoid further moderation actions or to reduce or lift existing ones (e.g., to lift an account suspension).
Note that some sanctions reflect cumulative flags against the user that may not be immediately visible but are logged.
- Owner: Community Moderator, Platform
- Context: Feed, Forum, Direct Message
- Orientation: Reactive
- Example: Instagram: An account may receive an account warning when it has made multiple posts that violate Instagram’s community guidelines, notifying the account owner that the account will be deleted if they continue to violate policy. The warning includes a list of the account’s previous policy violations.
Downrank
- Description: Content from a particular account, accounts with particular characteristics, or content with particular characteristics will be ranked lower in user feeds, recommended content, home screens, or front pages of social media platforms, making it less visible to users.
- Owner: Platform
- Context: Feed, Forum
- Orientation: Reactive, Proactive
- Example: Twitter: Twitter announced in July 2018 that tweets associated with bad actors would be ranked lower, with bad actors identified based on signals such as account age, the other accounts in their network (i.e., followers), and the moderation actions that users took against those accounts.
Authors
Words by
Kat is a researcher and a consultant specializing in online harassment and content moderation. She develops solutions for challenges that social media companies face in mitigating and understanding online harassment and other challenges in online moderation. She advises civil rights, mental health, and online safety organizations in matters of online harassment defense, educational resources, and online community health.
Kat is currently Content Moderation Lead for Meedan and a visiting researcher at UC Irvine in the Department of Informatics, studying emerging forms of community formation, norm development and maintenance, and moderation in online platforms. Much of this work is in support of technology-supported collective action for marginalized and underserved communities.