The 2020 edition of the Association for Computational Linguistics (ACL) conference, originally planned to take place in Seattle, was held virtually from July 5th to 10th. ACL is among the top publication venues for natural language processing (NLP) researchers worldwide, and this year's event set many records: it was the first ACL ever held virtually, and it drew record numbers of volunteers, participants (both overall and across continents) and accepted papers.
ACL 2020 featured many papers addressing challenges in misinformation, fact-checking and computational journalism, with Microsoft Research Asia having three automated fact-checking papers in the main conference alone. Overall, most of these papers introduced new and important challenges to the community, and some proposed new models to solve or improve on existing tasks. Throughout the rest of this blog post, I'll try to highlight some of these efforts.
Main Conference Papers
This year, there were a number of papers addressing misinformation and fact-checking from an NLP perspective. Here are some of the papers that caught my attention:
This paper by the Qatar Computing Research Institute (QCRI) team led by Preslav Nakov addresses the claim-matching problem, which has been of interest to Meedan and the fact-checking community. Technology is needed to keep up with the ever-increasing volume of claims coming in to fact-checkers around the globe, and one important and effective solution is detecting duplicate and previously fact-checked claims. The paper presents two claim-matching datasets and proposes a mix of information retrieval and text-embedding-based models as potential solutions to the claim-matching problem.
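To make the retrieval side of claim matching concrete, here is a minimal sketch that ranks previously fact-checked claims by similarity to an incoming claim. It uses simple bag-of-words cosine similarity as a stand-in for the IR and embedding-based models the paper actually proposes, and the claims in the example database are invented for illustration:

```python
import math
from collections import Counter

def vectorize(text):
    """Lowercased bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def match_claim(query, fact_checked):
    """Return (score, claim) for the best-matching previously fact-checked claim."""
    qv = vectorize(query)
    return max((cosine(qv, vectorize(c)), c) for c in fact_checked)

# Hypothetical store of already fact-checked claims
database = [
    "drinking bleach cures the virus",
    "the moon landing was staged",
    "5g towers spread disease",
]
score, best = match_claim("can drinking bleach cure a virus", database)
```

In practice a dense sentence encoder would replace the bag-of-words vectors, but the retrieve-and-rank shape of the pipeline stays the same.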
Extracting supporting evidence from discussions around a claim is the premise of this paper by Isabelle Augenstein's group at the University of Copenhagen. Using a dataset of fact-checked statements from PolitiFact, the authors derive an explanation for each statement with a BERT-based sentence selection model. They show that the extracted explanations can help automated fact-checking models improve on veracity prediction by jointly learning to do both fact-checking and explanation extraction.
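The paper's selection step is a learned BERT-based model; as a rough, hypothetical stand-in, the sketch below scores each sentence of a fact-check article by word overlap with the claim and keeps the top-k as an extractive explanation. The claim and article sentences are invented:

```python
def jaccard(a, b):
    """Word-overlap (Jaccard) score between two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def extract_explanation(claim, sentences, k=2):
    """Keep the k sentences most similar to the claim, preserving article order."""
    keep = set(sorted(sentences, key=lambda s: jaccard(claim, s), reverse=True)[:k])
    return [s for s in sentences if s in keep]

# Invented fact-check article
claim = "the senator voted against the health bill"
article = [
    "The senator has served three terms.",
    "Records show the senator voted against the health bill in May.",
    "Weather was sunny that day.",
]
explanation = extract_explanation(claim, article, k=1)
```

A trained sentence-selection model replaces the overlap score with a learned relevance score, but the select-top-k structure is the same.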
Work by authors from Columbia University, Facebook AI and George Washington University provides resources and models for studying and understanding challenges in automated fact-checking. They address three types of challenges that confuse current automated fact-verification systems: multiple propositions (the need to synthesize evidence from various sources), temporal reasoning, and ambiguity and lexical variation. Their solutions largely use pointer networks, BERT and reinforcement learning algorithms. The code and resources for this extensive work are publicly available.
This short paper on identifying political claims in text proposes a method for making these models fairer to minority groups. The simple but effective reference-masking method, presented by University of Stuttgart researchers on a German newspaper dataset, shows that claim-detection models can be fairer to minorities without significant performance drops.
This demonstration paper by QCRI won an honorable mention award in the demo category. The paper builds on top of their prior work on propaganda analysis and provides statistics on different propaganda techniques used in media outlets. You can also submit your propaganda of choice and see the ways it is trying to convince you.
For a more detailed picture of computational journalism research at ACL 2020, I also recommend looking at the following papers:
- MSR Asia Papers on Automated Fact-Checking
- Understanding the Language of Political Agreement and Disagreement in Legislative Texts
- NSTM: Real-Time Query-Driven News Overview Composition at Bloomberg
- "Who said it, and Why?" Provenance for Natural Language Claims
- DTCA: Decision Tree-based Co-Attention Networks for Explainable Claim Verification
- GCAN: Graph-aware Co-Attention Networks for Explainable Fake News Detection on Social Media
Workshops
Often mini-conferences of their own, workshops at ACL maintain high-quality proceedings and host fascinating invited talks. The lineup this year spanned a variety of topics, ranging from neural text generation and translation to educational applications and conversational AI. In the remainder of this post, I summarize the workshops I found most interesting and relevant to computational journalism.
This year's meeting of the workshop on Fact Extraction and VERification (FEVER), the third edition organized by Andreas Vlachos, James Thorne, Oana Cocarascu, Christos Christodoulopoulos and Arpit Mittal, included an excellent line-up of invited speakers and contributed work. The opening talk by Isabelle Augenstein on the importance of explainable fact-checking first took a deep dive into extracting supporting evidence for fact veracity and then moved on to work on claim understanding and what needs to be fact-checked. The overall sentiment among participants and speakers, reiterated by Yejin Choi, was that we need more evidence-based and explanatory approaches in FEVER. Among other fascinating talks, Philip Resnik presented work on framing and how it affects our perception of news and facts. Drawing on years of research on language framing, he showed that harmful content isn't always false information: sometimes more harm can be done with true information framed to serve a purpose. In his closing remarks, he called for a more "FEVER"ish approach to tackling our misinformation problems, meaning that we should go beyond labeling content true or fake and pursue more nuanced analyses of the phenomenon.
This ACL workshop was organized on a short timeline in response to the COVID-19 pandemic and featured many interesting peer-reviewed COVID-related papers, including "Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset: Preliminary Thoughts and Lessons Learned" by researchers at NYU and the University of Waterloo. The paper presents a neural search engine for COVID-19 literature, available at covidex.ai. Given the large number of submissions on such a short timeline, the workshop's proceedings deserve a closer look.
You can find the full list of submitted papers here.
With uncertainty caused by the pandemic and upcoming elections, it’s both overwhelming and exciting to be involved in computational journalism research. It is refreshing to see the rapid community response to the pandemic and the evolution of computational misinformation research.
Research on misinformation and fact-checking presented at ACL 2020 seems to be moving away from fully "automatic" misinformation detection and towards the specific questions journalists and fact-checkers actually need answered. Acknowledging that any potential solution to the current misinformation problem will require more than just "detection", this shift in focus reflects the NLP community's maturing understanding of the deeper, more important requirements for a working solution. Meedan has long advocated for an approach that augments and empowers human fact-checkers rather than trying to replace them, an orientation that is evident in our open-source fact-checking software, Check. There are many ways NLP can assist fact-checkers, and even ACL papers not specifically about misinformation, such as work on text similarity, summarization and cross-lingual alignment, are relevant to tasks fact-checkers need to perform.
This community growth would not have been possible without interdisciplinary collaborations and publication venues shared with journalism experts and researchers. As brought up both during the workshop and in their follow-up questionnaire, the FEVER organizers were interested in community feedback on future directions and next steps. Building on this fruitful interdisciplinary collaboration and the need for a forward plan, now seems like a good time for a stronger NLP and journalism alliance in planning our next moves against information disorders.
I would like to thank my PhD advisor Dr. Rada Mihalcea for providing me with financial support to attend the conference. My appreciation also goes to my mentor and manager Dr. Scott Hale, Director of Research at Meedan, and other fellow Meedanis for supporting my attendance at the conference during my summer internship at Meedan.
<p><a href="https://www.ashkankazemi.ir" title="Ashkan's personal website">Ashkan</a> is a natural language processing (NLP) intern at Meedan, contributing to research efforts in building fact-checking technology. He is also a PhD candidate at University of Michigan’s department of Computer Science and Engineering.</p>