This report was written by Learning Lab, a team of writers and designers that helps Open Technology Fund-supported projects produce content, and was cross-posted at the Open Technology Fund website.

In today’s increasingly online world, social media has become a source for news—often in compressed visual form. Widely circulated images have the power to shape opinions and alter actions. Sometimes they’re funny. Sometimes they’re informative. But what happens when memes are used to intentionally transmit disinformation (information that is false and deliberately created to cause harm)? How are people supposed to know what’s true and what’s false?

If the distinction seems trivial, it’s not. Elections and lives depend on solving this problem efficiently and effectively. That’s why Meedan received funding from the Open Technology Fund (OTF) in 2019 to help develop the Claims and Memes Database (CMDb), a programmer-accessible repository of fact-checked claims and debunked visual misinformation intended to combat this new type of information control. The CMDb project was overseen by Meedan program manager Wafaa Heikal.

SETTING THE STAGE

The rise of the intentional use of misinformation online

Meedan has long worked at the forefront of improving the quality and equity of online information. One of the main ways it does so is through the organization’s flagship software, Check. This free and open-source collaborative workflow software helps users verify digital photos and text, build datasets, and structure open-source investigations. An early iteration of the software played a key role for citizen journalists during the 2011 Arab Spring protests. Five years later, over 1,000 journalists used Check’s collaborative verification structure to help monitor and detect voting issues in the 2016 U.S. presidential election. Today, over 250 workspaces created by more than 1,750 users around the world rely on Check to fact-check and investigate online information. Their efforts, and the stories they have helped produce, have made a tangible difference in the way global citizen media is relied upon and viewed.

But by 2019, with repressive regimes ramping up their own efforts to weaponize visual misinformation as a tool for social network manipulation, voter suppression, and censorship, it became clear that the Check platform required an upgrade. Journalists needed to be able to more efficiently verify or debunk memes. And citizens needed to be able to receive that fact-checked information in a more expeditious manner. So, with the funding from OTF’s Core Infrastructure Fund, Meedan set out to build the CMDb and enhance Check’s infrastructure.

TRUE OR FALSE—EXPLAINED

Inside Meedan’s revolutionary verification software Check

Before diving into the details of the CMDb project, it’s best to understand how Check works.

The software helps journalists, researchers, and civil society organizations gather and verify digital information through a uniquely customizable process. After a Check team creates their project workspace, they set up how they will receive submissions for verification. These submissions can be images, text, a combination of images and text (memes), or videos. For example, in India, the Checkpoint project allowed for submissions by providing WhatsApp users with a phone number to which they could text rumors, claims, or memes for verification. Other submission processes include collaborating with local partners in North Africa and West Asia to crowdsource investigative evidence of human rights violations (to be sent to the UC Berkeley Human Rights Investigations Lab) or working with students and journalists to help monitor elections in Tunisia.

Once these submissions are uploaded to Check, the verification process begins. Throughout it all, Meedan works with partners to shape their tasks and workflows based on the project’s specific needs. During verification, users (such as journalists or human rights researchers) review a submission and work to determine the claim’s origin and accuracy. Annotation options allow reviewers to provide in-depth analysis and tags. After all tasks are completed for a submission, a verdict is determined (these are customizable and often include correct, incorrect, or misleading). In certain situations, this verdict can then be transmitted back to the individual who submitted the claim.
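As a rough illustration of the workflow described above, the following Python sketch models a submission that carries tasks, tags, and a verdict. The class names, field names, and the rule that a verdict requires all tasks to be answered are assumptions made for this example; they are not taken from Check's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    """One verification step a reviewer must complete (hypothetical model)."""
    question: str
    answer: Optional[str] = None

    @property
    def done(self) -> bool:
        return self.answer is not None


@dataclass
class Submission:
    """A claim, image, meme, or video sent in for verification (hypothetical)."""
    content: str
    tasks: list[Task] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)
    verdict: Optional[str] = None

    def set_verdict(self, verdict: str, allowed: set[str]) -> None:
        # Assumption: a verdict is recorded only once every task has an answer,
        # and only if it is one of the workspace's configured labels.
        if not all(task.done for task in self.tasks):
            raise ValueError("all tasks must be completed before a verdict")
        if verdict not in allowed:
            raise ValueError(f"unknown verdict: {verdict!r}")
        self.verdict = verdict


# Verdict labels are customizable per workspace; these three come from the text.
VERDICTS = {"correct", "incorrect", "misleading"}

item = Submission(
    content="Forwarded message claiming polling stations close early",
    tasks=[Task("Where did the claim originate?"), Task("Is it accurate?")],
)
item.tasks[0].answer = "Anonymous forwarded message"
item.tasks[1].answer = "No, official hours are unchanged"
item.tags.append("election")
item.set_verdict("incorrect", VERDICTS)
print(item.verdict)
```

Keeping the allowed verdict labels outside the submission reflects the point that each workspace can define its own set.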

ENHANCED TECHNOLOGY

CMDb developments streamline the verification process for thousands of Check users

Even just a brief overview of the verification process reveals how incredibly time-consuming it can be. With thousands of claims being submitted to election-monitoring projects, how can reviewers possibly verify each one?

Short answer: They can’t. And the reviewers can’t provide the results fast enough to combat voter suppression and protect electoral integrity. But there is a way to streamline the process—that’s why Meedan applied to OTF to develop the CMDb.

The technology built for the CMDb allowed Meedan to develop better workflows for annotating claims and images on Check. These developments will help improve research and machine-learning processes in both the short and long term. OTF’s funding also allowed the Meedan team to build new features to better respond to memetic misinformation, such as Check’s report designer, which enables journalists to quickly develop compelling visual responses to help stem the spread of misinformation online. New tools for processing content, like importing spreadsheets and ingesting content submitted from closed networks, were also developed with this support.

In addition to improving ease of use, the CMDb project significantly improved Meedan’s image and claim similarity infrastructure. The newly developed claim/image matching technology enables similar images and claims to be clustered, making a verifier’s work far easier. By identifying secondary items that are related to a submitted claim, this matching technology allows for similar images and memes to automatically receive the same tagging, annotation, and verdict as the original image or meme. This technological advancement produced immediate efficiencies. Journalists are now able to cluster hundreds of similar images and claims and respond to misinformation in a collective manner (saving massive amounts of time and effort).
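The clustering idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Meedan's matching technology: it uses word-overlap (Jaccard) similarity on short text claims as a stand-in for the real text and image similarity models, and the function names and the 0.6 threshold are hypothetical.

```python
def tokens(text: str) -> set[str]:
    """Lowercase a claim and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two claims, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cluster(items: list[str], threshold: float = 0.6) -> list[list[int]]:
    """Group item indices whose pairwise similarity meets the threshold."""
    parent = list(range(len(items)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if jaccard(items[i], items[j]) >= threshold:
                parent[find(j)] = find(i)

    groups: dict[int, list[int]] = {}
    for i in range(len(items)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def propagate(clusters: list[list[int]], verdicts: dict[int, str]) -> dict[int, str]:
    """Copy a known verdict to every unlabeled item in the same cluster."""
    result = dict(verdicts)
    for members in clusters:
        known = [verdicts[i] for i in members if i in verdicts]
        if known:
            for i in members:
                result.setdefault(i, known[0])
    return result


claims = [
    "drinking hot water cures the virus",
    "drinking hot water cures the virus fast",
    "polling stations close at noon today",
    "hot water drinking cures the virus",
]
clusters = cluster(claims)
labels = propagate(clusters, {0: "incorrect"})
print(clusters)
print(labels)
```

Here a reviewer labels only the first claim, and the two near-identical variants inherit its verdict while the unrelated election claim stays unlabeled. The pairwise comparison is quadratic in the number of items, so a system handling thousands of submissions would need an index rather than this brute-force loop.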

EXPANDED OUTREACH

Immediate application of CMDb technology and OTF funds enable Meedan to expand its global impact

Meedan quickly put the newly developed CMDb technology to use. The streamlined content workflows and image matching technology helped support dozens of newsrooms and fact-checking organizations during the 2019 elections in India, Indonesia, and the Philippines.

India: In response to misinformation circulating in closed networks in India, Meedan partnered with Pop-Up Newsroom, WhatsApp, and Proto (a civic media studio) on the massive and innovative research endeavor, Checkpoint. The effort used Check Message, Meedan’s WhatsApp integration, to conduct research to better understand how misinformation spreads on WhatsApp.

Indonesia: Google News Initiative worked in collaboration with Meedan and the news consultancy Fathm to help journalism schools, newsrooms, and fact-checking organizations in Indonesia collaborate and verify claims during and leading up to the 2019 elections. The initiative, which resembled Meedan’s 2016 Electionland effort in the United States, was headed by the fact-checking platform Cekfakta.

Philippines: Meedan teamed up with local partners in the Philippines to create Tsek.ph—a new collaborative fact-checking journalistic initiative (Tsek means "Check" in Filipino). The project brought together 11 news organizations and three universities in an effort to counter disinformation and provide verified information related to the country’s midterm elections. Throughout the year, Meedan also continued its work with VeraFiles and Rappler, two pioneers of fact-checking and verification in the Philippines.

Support from OTF in 2019 also enabled Meedan to lead various informational workshops around the globe showcasing the CMDb project and providing journalists with a better understanding of the power (for good and for bad) of global memes and claims. Various aspects of the CMDb’s research were presented by Meedan’s Director of Research, Scott Hale, at the International Multimodal Communication Centre, University of Oxford, the Okinawa Institute of Science and Technology, and the CCSS School on Computational Social Science at Kobe University.

Meedan’s Director of Engineering, Caio Almeida, presented a research paper, "Text Similarity Using Word Embeddings to Classify Fake News," at the Workshop on Digital Humanities and Natural Language Processing, co-located with the International Conference on the Computational Processing of Portuguese (PROPOR 2020). The paper describes the new, open-source architecture Meedan developed to use word embeddings at scale with Elasticsearch in order to analyse the similarity of short textual items and identify near duplicates. A second research paper evaluating the text and image matching performance is currently in preparation.
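To make the word-embedding approach concrete, here is a small Python sketch of the general technique: represent each short text as the average of its word vectors and compare texts by cosine similarity. Everything in it is an assumption for illustration. The three-dimensional vectors are invented, the 0.9 threshold is arbitrary, and the comparison runs in memory rather than through Elasticsearch, so it should not be read as the architecture described in the paper.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# Real systems use pretrained embeddings with hundreds of dimensions.
TOY_VECTORS: dict[str, list[float]] = {
    "vaccine": [0.9, 0.1, 0.0],
    "shot": [0.8, 0.2, 0.1],
    "causes": [0.1, 0.9, 0.0],
    "triggers": [0.2, 0.8, 0.1],
    "illness": [0.0, 0.2, 0.9],
    "sickness": [0.1, 0.1, 0.9],
    "polls": [-0.7, 0.1, -0.6],
    "close": [-0.2, -0.8, 0.1],
    "early": [-0.5, -0.3, -0.4],
}


def embed(text: str) -> list[float]:
    """Average the vectors of the known words in a short text."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        raise ValueError(f"no known words in {text!r}")
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def near_duplicates(query: str, candidates: list[str], threshold: float = 0.9) -> list[str]:
    """Return the candidates whose embedding is close to the query's."""
    q = embed(query)
    return [c for c in candidates if cosine(q, embed(c)) >= threshold]


query = "vaccine causes illness"
candidates = ["shot triggers sickness", "polls close early"]
matches = near_duplicates(query, candidates)
print(matches)
```

The paraphrase shares no words with the query yet still matches, which is the advantage embeddings have over exact word overlap when identifying near duplicates.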

While contributing to research efforts, Meedan also provided input to Facebook, Twitter, and Google regarding the best approaches to handling misleading manipulated and synthetic imagery on social media platforms. This feedback was taken into consideration when Twitter drafted new guidelines on manipulated media. Additionally, as part of the W3C Credible Web Community Group, Meedan contributed to the development of a list of indicators for content credibility, including images. And, as part of the Credibility Coalition, members of Meedan helped form the Responding to Memes and Images working group to further address credibility indicators for images and memes.

DATA DRAWBACKS

Complications emerge surrounding data sharing and sensitivity

Despite the significant progress achieved during the CMDb’s first year, the project was not without its difficulties. Obstacles arose pertaining to the data collected by partners using Check’s software. Ideally, the images and claims that comprise the CMDb could be shared publicly. But doing so raises a host of ethical concerns and legal issues regarding terms of service, varying privacy policies, and the very real threat of revealing personal information. Meedan worked with Harvard Law School’s Cyberlaw Clinic on terms that enable it to conduct research on this data while protecting personally identifiable information and user privacy. Options such as data trusts, in which a legally separate trust would own the underlying data, are also being explored. These efforts are critical, as finding a way to make the database accessible to other users (and one day even the general public) would help further curb the spread of misinformation online.

Another obstacle that emerged during the project was psychological. Due to the sensitive nature of some of the investigations conducted using Check, images can be highly graphic and therefore traumatizing for people to review. At this time, image and text matching technology can only reduce, not eliminate, the need for human judgement. That means in cases of human rights violations, an individual must still personally view the submitted image. Such work involves serious mental health challenges and risks like PTSD. Meedan is working to better understand these problems and provide the necessary resources to support the journalists and fact-checkers in these roles. Although new developments, like the option to flag and filter unwanted images, provide possible mitigation options, this issue needs more attention and may not be solvable by technology alone.

LOOKING TO THE FUTURE

What’s next for Meedan, Check, and the CMDb?

Meedan’s work to build a publicly accessible global claims and memes database is far from over. Data sharing issues must be resolved, language barriers must be overcome, and the matching technology must continue to be improved. Nonetheless, the first year of the CMDb project was a resounding success. Technology developed for the project is already in use, saving critical time and effort for journalists and human rights activists in the verification process. Much was also learned about how disinformation spreads across the internet. In response, an infrastructure was established that one day will be able to prevent debunked claims and memes from traveling across countries and time (known as "zombie misinformation," these false rumors originate in one context and continue to permeate the internet despite being "killed off," that is, debunked, in their original country/time).

For now, Meedan has focused its efforts on combating the spread of misinformation related to the COVID-19 pandemic. Like the virus itself, misinformation pertaining to "cures" or "treatments" is spreading around the world and causing real harm. Meedan is committed to making its full toolkit and code available to help improve trust, information quality, and research capabilities. The organization’s development of the CMDb, and its history of fighting to advance accurate information online, make it uniquely positioned to help during this critical time. As a first step, Meedan introduced the Check Bot, a customizable COVID-19 WhatsApp bot for fact-checkers tackling the wave of misinformation surrounding the virus. The COVID-19 Expert Database, a collection of expert-sourced COVID-19 information, has been built to help support verification processes.

Now, as repressive regimes continue to expand their use of misinformation online, and COVID-19 rumors threaten to wreak havoc on public health, Meedan’s work is more important than ever. Memes are no longer a laughing matter. Lives truly hang in the balance of determining what’s true and what’s false.

Authors
Words by
Learning Lab
Published on
June 11, 2020
April 20, 2022