The 2020 edition of the Association for Computational Linguistics (ACL) conference, originally planned to take place in Seattle, was held virtually from July 5th to 10th. ACL is among the top publication venues for natural language processing (NLP) researchers worldwide, and this year's event set many records: it was the first ACL ever held virtually, and it drew record numbers of volunteers, participants (both overall and across continents) and accepted papers.
ACL 2020 featured many papers addressing challenges in misinformation, fact-checking and computational journalism, with Microsoft Research Asia having three automated fact-checking papers in the main conference alone. Overall, most of these papers introduced new and important challenges to the community, and some proposed new models to solve or improve on existing tasks. Throughout the rest of this blog post, I'll try to highlight some of these efforts.
Main Conference Papers
This year, there were a number of papers addressing misinformation and fact-checking from an NLP perspective. Here are some of the papers that caught my attention:
This paper by the Qatar Computing Research Institute (QCRI) team led by Preslav Nakov addresses the claim-matching problem, which has been of interest to Meedan and the fact-checking community. Technology is needed to keep up with the ever-increasing volume of claims coming in to fact-checkers around the globe, and one important and effective solution is detecting duplicate and previously fact-checked claims. The paper presents two claim-matching datasets and proposes a mix of information retrieval and text-embedding-based models as potential solutions to the claim-matching problem.
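To make the retrieval side of claim matching concrete, here is a minimal sketch that ranks previously fact-checked claims by similarity to an incoming claim. It uses simple bag-of-words cosine similarity as a stand-in for the IR and embedding-based models the paper actually proposes, and the claims in the example database are invented for illustration:

```python
import math
from collections import Counter

def vectorize(text):
    """Lowercased bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def match_claim(query, fact_checked):
    """Return (score, claim) for the best-matching previously fact-checked claim."""
    qv = vectorize(query)
    return max((cosine(qv, vectorize(c)), c) for c in fact_checked)

# Hypothetical store of already fact-checked claims
database = [
    "drinking bleach cures the virus",
    "the moon landing was staged",
    "5g towers spread disease",
]
score, best = match_claim("can drinking bleach cure a virus", database)
```

In practice a dense sentence encoder would replace the bag-of-words vectors, but the retrieve-and-rank shape of the pipeline stays the same.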
Extracting supporting evidence from discussions around a claim is the premise of this paper by Isabelle Augenstein's group at the University of Copenhagen. Using a dataset of fact-checked statements from PolitiFact, the authors derive an explanation for each statement with a BERT-based sentence selection model. They show that the extracted explanations can help automated fact-checking models improve on veracity prediction by jointly learning to do both fact-checking and explanation extraction.
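The paper's selection step is a learned BERT-based model; as a rough, hypothetical stand-in, the sketch below scores each sentence of a fact-check article by word overlap with the claim and keeps the top-k as an extractive explanation. The claim and article sentences are invented:

```python
def jaccard(a, b):
    """Word-overlap (Jaccard) score between two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def extract_explanation(claim, sentences, k=2):
    """Keep the k sentences most similar to the claim, preserving article order."""
    keep = set(sorted(sentences, key=lambda s: jaccard(claim, s), reverse=True)[:k])
    return [s for s in sentences if s in keep]

# Invented fact-check article
claim = "the senator voted against the health bill"
article = [
    "The senator has served three terms.",
    "Records show the senator voted against the health bill in May.",
    "Weather was sunny that day.",
]
explanation = extract_explanation(claim, article, k=1)
```

A trained sentence-selection model replaces the overlap score with a learned relevance score, but the select-top-k structure is the same.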
Work by authors from Columbia University, Facebook AI and George Washington University provides resources and models for studying and understanding challenges in automated fact-checking. They address three types of challenges that confuse current automated fact-verification systems: multiple propositions (the need to synthesize evidence from various sources), temporal reasoning, and ambiguity and lexical variation. Their solutions largely use pointer networks, BERT and reinforcement learning algorithms. The code and resources for this extensive work are publicly available.
This short paper on identifying political claims in text proposes a method for making these models fairer to minority groups. The simple but effective reference-masking method, presented by University of Stuttgart researchers on a German newspaper dataset, shows that claim-detection models can be fairer to minorities without significant performance drops.
This demonstration paper by QCRI won an honorable mention award in the demo category. The paper builds on top of their prior work on propaganda analysis and provides statistics on different propaganda techniques used in media outlets. You can also submit your propaganda of choice and see the ways it is trying to convince you.
For a more detailed picture of computational journalism research at ACL 2020, I also recommend looking at the following papers:
- MSR Asia Papers on Automated Fact-Checking
- Understanding the Language of Political Agreement and Disagreement in Legislative Texts
- NSTM: Real-Time Query-Driven News Overview Composition at Bloomberg
- "Who said it, and Why?" Provenance for Natural Language Claims
- DTCA: Decision Tree-based Co-Attention Networks for Explainable Claim Verification
- GCAN: Graph-aware Co-Attention Networks for Explainable Fake News Detection on Social Media
Workshops
Often mini-conferences of their own, workshops at ACL maintain high-quality proceedings and host fascinating invited talks. The lineup this year spanned a variety of topics, ranging from neural text generation and translation to educational applications and conversational AI. In the remainder of this post, I summarize the workshops I found most interesting and relevant to computational journalism.
This year's meeting of the workshop on Fact Extraction and VERification (FEVER), the third edition organized by Andreas Vlachos, James Thorne, Oana Cocarascu, Christos Christodoulopoulos and Arpit Mittal, included an excellent line-up of invited speakers and contributed work. The opening talk by Isabelle Augenstein on the importance of explainable fact-checking first took a deep dive into extracting supporting evidence for fact veracity and then moved on to work on claim understanding and what needs to be fact-checked. The overall sentiment among participants and speakers, reiterated by Yejin Choi, was that we need more evidence-based and explanatory approaches in FEVER. Among other fascinating talks, Philip Resnik presented work on framing and how it affects our perception of news and facts. Drawing on years of research on language framing, he showed that harmful content isn't always false information: sometimes more harm can be done with true information framed to serve a purpose. In his closing remarks, he called for a more "FEVER"ish approach to tackling our misinformation problems, meaning that we should go beyond labeling content true or fake and pursue more nuanced analyses of the phenomenon.
This ACL workshop was organized on a short timeline in response to the COVID-19 pandemic and featured many interesting peer-reviewed COVID-related papers, including "Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset: Preliminary Thoughts and Lessons Learned" by researchers at NYU and the University of Waterloo. The paper presents a neural search engine for COVID-19 literature, available at covidex.ai. Given the large number of submissions on such a short timeline, the workshop's proceedings deserve a closer look.
You can find the full list of submitted papers here.
With uncertainty caused by the pandemic and upcoming elections, it’s both overwhelming and exciting to be involved in computational journalism research. It is refreshing to see the rapid community response to the pandemic and the evolution of computational misinformation research.
Research on misinformation and fact-checking presented at ACL 2020 seems to be moving away from fully "automatic" misinformation detection and towards the specific questions journalists and fact-checkers actually need answered. Acknowledging that any potential solution to the current misinformation problem will require more than just "detection", this shift in focus reflects the NLP community's maturing understanding of the deeper, more important requirements for a working solution. Meedan has long advocated for an approach that augments and empowers human fact-checkers rather than trying to replace them, an orientation that is evident in our open-source fact-checking software, Check. There are many ways NLP can assist fact-checkers, and even ACL papers not specifically about misinformation, such as work on text similarity, summarization and cross-lingual alignment, are relevant to tasks fact-checkers need to perform.
This community growth would not have been possible without interdisciplinary collaborations and publication venues shared with journalism experts and researchers. As brought up both during the workshop and in their follow-up questionnaire, the FEVER organizers were interested in community feedback on future directions and next steps. Building on this fruitful interdisciplinary collaboration and the need for a forward plan, now seems like a good time for a stronger NLP and journalism alliance in planning our next moves against information disorders.
I would like to thank my PhD advisor Dr. Rada Mihalcea for providing me with financial support to attend the conference. My appreciation also goes to my mentor and manager Dr. Scott Hale, Director of Research at Meedan, and other fellow Meedanis for supporting my attendance at the conference during my summer internship at Meedan.
<p><a href="https://www.ashkankazemi.ir" title="Ashkan's personal website">Ashkan</a> is a natural language processing (NLP) intern at Meedan, contributing to research efforts in building fact-checking technology. He is also a PhD candidate at University of Michigan’s department of Computer Science and Engineering.</p>