Machine-Learning based analysis and classification of Android malware signatures

publication date

  • August 2019

start page

  • 295

end page

  • 305

volume

  • 97

International Standard Serial Number (ISSN)

  • 0167-739X

Electronic International Standard Serial Number (EISSN)

  • 1872-7115

abstract

  • Multi-scanner Antivirus (AV) systems are often used for detecting Android malware since the same piece of software can be checked against multiple different AV engines. However, in many cases the same software application is flagged as malware by only a few AV engines, and the signatures provided often contradict each other, showing a clear lack of consensus between different AV engines. This work analyzes more than 80 thousand Android applications flagged as malware by at least one AV engine, with a total of almost 260 thousand malware signatures. In the analysis, we identify 41 different malware families and study their relationships and the relationships between the AV engines involved in such detections, showing that most malware cases belong to either Adware abuse or really dangerous Harmful applications, but some others are unspecified (or Unknown). With the help of Machine Learning and Graph Community Algorithms, we can further combine the different AV detections to classify such Unknown apps into either Adware or Harmful risks, reaching an F1-score above 0.84.

keywords

  • multi-scan antivirus; android malware; security; machine learning; malware classification; graph community algorithms