Panning for gold: Comparative analysis of cross-platform approaches for automated detection of political content in textual data

Articles

authors

Makhortykh, Mykola
de León, Ernesto
Urman, Aleksandra
Gil Lopez, Teresa
Christner, Clara
Sydorova, Maryna
Adam, Silke
Maier, Michaela

published in

PLoS One Journal

publication date

November 2024

start page

e0312865

issue

11

volume

19

Digital Object Identifier (DOI)

https://doi.org/10.1371/journal.pone.0312865

full text

https://hdl.handle.net/10016/47193

International Standard Serial Number (ISSN)

1932-6203

abstract

To understand and measure political information consumption in the high-choice media environment, we need new methods to trace individual interactions with online content and novel techniques to analyse and detect politics-related information. In this paper, we report the results of a comparative analysis of the performance of automated content analysis techniques for detecting political content in the German language across different platforms. Using three validation datasets, we compare the performance of three groups of detection techniques relying on dictionaries, classic supervised machine learning, and deep learning. We also examine the impact of different modes of data preprocessing on the low-cost implementations of these techniques using a large set (n = 66) of models. Our results show the limited impact of preprocessing on model performance, with the best results for less noisy data being achieved by deep learning- and classic machine learning-based models, in contrast to the more robust performance of dictionary-based models on noisy data.

subjects

Information Science