Signaling credibility: what we’re building at the W3C

While misinformation is not a new problem, what’s new is the context: as misinformation flows online, it’s pretty clear we need structures for signaling content credibility. The need for a common framework has long been under discussion, from this post, published just after our experience at Meedan monitoring the 2016 election, to the early days of MisinfoCon. In a blog post we crafted in March 2017, before we founded the Credibility Coalition (CredCo), we argued for the importance of standards:

As concerns grow about the importance of establishing credible content, so do concerns about how this credibility is communicated on content traveling around the web. Simply rating an article as "credible" is not enough; we need to understand what parts of it are credible, how the conclusion about its credibility was reached, and how to communicate that credibility effectively…. Defining a set of standards for content credibility gives us a more effective way to talk about it, and, importantly, to make important decisions about how we share and display that content, regardless of what site the content appears on.

Since then, CredCo has grown as an initiative with Hacks/Hackers, and we helped form the W3C Credible Web Community Group (CredWeb). Founded last year at the W3C’s annual plenary (yes, that means we’re celebrating our birthday today!), CredWeb is pleased to release two documents for public review and commentary. These are both DRAFT documents, designed to be circulated and commented on by folks dedicating time to this issue.

Technological Approaches to Improving Credibility Assessment on the Web

This is a vision document that captures a general technical approach for improving credibility assessment online. It captures everything from terminology to threat models and stakeholders, and it lays out some of our core vision for the potential of these standards. It’s a hefty lift, and we recommend taking some time to look through it and offer commentary.

Here’s how it starts:

Can you tell, when looking at a random web page, whether you should trust it? When scanning a page of reviews or search results, do you know which matches come from legitimate sources and which are scams? When reading a news feed, can you tell which items ought to be believed and which are slanted or manipulative? Can you detect propaganda or outright lies? Perhaps most importantly, what happens when you inevitably guess wrong while making some of these credibility assessments, and you unknowingly share falsehoods with your community, helping to make them viral? What if you are misled into making bad decisions for yourself and the people you care about, with potentially disastrous consequences?

The Credible Web Community Group was formed at W3C, the organization which develops technical standards for the web, to look for technological approaches to this "credibility assessment" problem. It’s not that we think technology can solve every problem, especially ones as deeply human and complex as this one, but it seems likely that some technology is making matters worse and that certain designs could probably serve people better. For some of us, creating better approaches to credibility assessment seems like a good way to help.

Credibility Signals

After reading through the above, it helps to dive into specifics. What do signals look like? In this document, we have an outline of some draft signals. They range from signals around rhetoric — does it show negative or positive valence? — to a typology around clickbait and what it means for a title to accurately represent the body content.

Here’s how we frame the document:

This document is intended to support an ecosystem of interoperable credibility tools. These software tools, which may be components of familiar existing systems, will gather, process, and use relevant data to help people more accurately decide what information they can trust online and protect themselves from being misled. We expect that an open data-sharing architecture will facilitate efficient research and development, as well as an overall system which is more visibly trustworthy.

The document has three primary audiences:

Software developers and computer science researchers wanting to build systems which work with credibility data. For them, the document aims to be a precise technical specification, stating what they need for their software to interoperate with any other software which conforms to this specification.
People who work in journalism and want to review and contribute to this technology sphere, to help make sure it is beneficial and practical.
Non-computer-science researchers, interested in helping develop and improve the science behind this work.

We’re grateful to the many, many participants who gave feedback on these documents in their early stages, and we’re looking forward to advancing them in our second year as a group. If you’re coming to CredCon, be sure to join our workshops and chat with CredWeb co-chair Sandro Hawke. Or learn more about joining the group at credweb.org.
