Automated identification of network anomalies and their causes with interpretable machine learning: The CIAN methodology and TTrees implementation

authors

  • Moulay, Mohamed
  • Leiva, Rafael Garcia
  • Rojo Maroni, Pablo J.
  • Diez, Fernando
  • Mancuso, Vincenzo
  • Fernandez Anta, Antonio

publication date

  • July 2022

start page

  • 327

end page

  • 348

volume

  • 191

International Standard Serial Number (ISSN)

  • 0140-3664

Electronic International Standard Serial Number (EISSN)

  • 1873-703X

abstract

  • Leveraging machine learning (ML) for the detection of network problems dates back to handling call-dropping issues in telephony. However, troubleshooting cellular networks is still a manual task, assigned to experts who monitor the network around the clock. To help with this task, we present CIAN (from Causality Inference of Anomalies in Networks), a practical and interpretable ML methodology, which we implement in the form of a software tool named TTrees (from Troubleshooting Trees). We have designed CIAN to automate the identification of the causes of performance anomalies in cellular networks. Our methodology is unsupervised and combines multiple ML algorithms (e.g., decision trees and clustering) with Kolmogorov complexity-inspired data analysis tools that we have developed for this work. CIAN can be used with small volumes of data and trains quickly. Our experiments use diverse data sets obtained from measurements in operational commercial mobile networks. They show that the TTrees implementation of CIAN can automatically identify and accurately classify network anomalies (e.g., cases in which low network performance is not apparently justified by operational conditions) when trained with just a few hundred data samples. The resulting information hence enables precise troubleshooting actions. In particular, we showcase how TTrees can be flexibly used to monitor the performance of the TCP and QUIC protocols when they are adopted to serve mobile users.

subjects

  • Telecommunications

keywords

  • anomaly detection; feature selection; interpretable machine learning; troubleshooting