FacTeR-Check: Semi-automated fact-checking through semantic similarity and natural language inference

Bookmark (0)
Please login to bookmark Close

Our society produces and shares overwhelming amounts of information through Online Social Net-works (OSNs). Within this environment, misinformation and disinformation have proliferated, becom-ing a public safety concern in most countries. Allowing the public and professionals to efficiently find reliable evidence about the factual veracity of a claim is a crucial step to mitigate this harmful spread. To this end, we propose FacTeR-Check, a multilingual architecture for semi-automated fact-checking and hoaxes propagation analysis that can be used to implement applications designed for both the general public and for fact-checking organisations. FacTeR-Check implements three different modules relying on the XLM-RoBERTa Transformer architecture to evaluate semantic similarity, to calculate natural language inference and to build search queries through automatic keywords extraction and Named-Entity Recognition. The three modules have been validated using state-of-the-art benchmark datasets, exhibiting good performance in all of them. Besides, FacTeR-Check is employed to collect and label a dataset, called NLI19-SP, composed of more than 40,000 tweets supporting or denying 60 hoaxes related to COVID-19, released publicly. Finally, an analysis of the data collected in this dataset is provided, which allows to obtain a deep insight of how disinformation operated during the COVID-19 pandemic in Spanish-speaking countries.

​Our society produces and shares overwhelming amounts of information through Online Social Net-works (OSNs). Within this environment, misinformation and disinformation have proliferated, becom-ing a public safety concern in most countries. Allowing the public and professionals to efficiently find reliable evidence about the factual veracity of a claim is a crucial step to mitigate this harmful spread. To this end, we propose FacTeR-Check, a multilingual architecture for semi-automated fact-checking and hoaxes propagation analysis that can be used to implement applications designed for both the general public and for fact-checking organisations. FacTeR-Check implements three different modules relying on the XLM-RoBERTa Transformer architecture to evaluate semantic similarity, to calculate natural language inference and to build search queries through automatic keywords extraction and Named-Entity Recognition. The three modules have been validated using state-of-the-art benchmark datasets, exhibiting good performance in all of them. Besides, FacTeR-Check is employed to collect and label a dataset, called NLI19-SP, composed of more than 40,000 tweets supporting or denying 60 hoaxes related to COVID-19, released publicly. Finally, an analysis of the data collected in this dataset is provided, which allows to obtain a deep insight of how disinformation operated during the COVID-19 pandemic in Spanish-speaking countries. Read More