webLyzard Publications

Classifying News Media Coverage for Corruption Risks Management with Deep Learning and Web Intelligence

Weichselbraun, Albert and Hörler, Sandro and Hauser, Christian and Havelka, Anina (2020) Classifying News Media Coverage for Corruption Risks Management with Deep Learning and Web Intelligence. In: 10th International Conference on Web Intelligence, Mining and Semantics (WIMS 2020).

PDF (Classifying News Media Coverage for Corruption Risks Management with Deep Learning and Web Intelligence) - Accepted Version, 1MB

Abstract

A substantial number of international corporations have been affected by corruption. The research presented in this paper introduces the Integrity Risks Monitor, an analytics dashboard that applies Web Intelligence and Deep Learning to English- and German-language documents for (i) tracking and visualizing past corruption management gaps and their respective impacts, (ii) understanding present and past integrity issues, and (iii) supporting companies in analyzing news media to identify and mitigate integrity risks. The paper then discusses the design, implementation, training and evaluation of classification components capable of identifying English documents that cover the integrity topic of corruption. Domain experts compiled a gold standard dataset from Anglo-American media coverage of corruption cases, which was used to train and evaluate the classifier. The experiments draw upon popular text classification algorithms such as Naïve Bayes, Support Vector Machines (SVM) and Deep Learning architectures (LSTM, BiLSTM, CNN), combined with different word embeddings and document representations. They demonstrate that although classical machine learning approaches such as Naïve Bayes struggle with the diversity of the media coverage on corruption, state-of-the-art Deep Learning models perform sufficiently well in the project's context.
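To make the classification task concrete, the following is a minimal sketch of a multinomial Naïve Bayes baseline that labels a document as "corruption" or "other", assuming simple whitespace tokenization and add-one (Laplace) smoothing. It is not the authors' implementation: the class name NaiveBayes, the tokenize helper and the six training sentences are all hypothetical, invented purely for illustration, and the paper's actual gold standard dataset and feature representations are not reproduced here.

```python
import math
from collections import Counter, defaultdict

PUNCT = ".,;:!?\"'()"

def tokenize(text: str) -> list[str]:
    # Hypothetical, deliberately naive tokenizer: split on whitespace,
    # strip surrounding punctuation and lowercase each token.
    return [t.strip(PUNCT).lower() for t in text.split() if t.strip(PUNCT)]

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # document counts per class
        self.word_counts = defaultdict(Counter)      # token counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, doc, label):
        # log P(label) + sum of log P(token | label), up to a shared constant
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for tok in tokenize(doc):
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, doc):
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))

# Hypothetical toy training data (not from the paper's gold standard).
TRAIN_DOCS = [
    "prosecutors charged the executive with bribery of officials",
    "company paid kickbacks to win a government contract",
    "bribery investigation into procurement fraud widens",
    "quarterly earnings beat analyst expectations",
    "new smartphone model launched with improved camera",
    "central bank keeps interest rates unchanged",
]
TRAIN_LABELS = ["corruption"] * 3 + ["other"] * 3

clf = NaiveBayes().fit(TRAIN_DOCS, TRAIN_LABELS)
print(clf.predict("officials accused of bribery in contract award"))  # corruption
```

Because such a model scores documents only by class-conditional word frequencies, it can miss corruption coverage phrased in vocabulary unseen during training, which is consistent with the abstract's observation that Naïve Bayes struggles with the diversity of the media coverage.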

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Web Intelligence, Corruption Risk Management, Text Classification, Text Analytics, Deep Neural Networks, Word Embeddings
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Engineering, Science and Mathematics > School of Electronics and Computer Science
ID Code: 115
Deposited By: Dr Albert Weichselbraun
Deposited On: 21 Sep 2020 17:59
Last Modified: 21 Sep 2020 18:02
