Germanophone test datasets and how their performance is affected by different modes of text preprocessing. In addition to individual models, we examine the performance of 330 ensemble models that combine the above-mentioned approaches on the dataset with a particularly high level of noise. Our findings show that DL models combined with more computationally intensive forms of preprocessing perform best among the individual models, although their performance remains suboptimal on noisier datasets. While ensemble models yield some improvement for specific modes of preprocessing, their overall performance mostly remains on par with that of individual DL models, which underscores how challenging the computational detection of PRR content is.