Blocklists constitute a widely-used Internet security mechanism to filter undesired network traffic based on IP/domain reputation and behavior. Many blocklists are distributed in open source form by threat intelligence providers who aggregate and process input from their own sensors, but also from third-party feeds or providers. Despite their wide adoption, many open-source blocklist providers lack clear documentation about their structure, curation process, contents, dynamics, and inter-relationships with other providers. In this paper, we perform a transparency and content analysis of 2,093 free and open source blocklists with the aim of exploring those questions. To that end, we perform a longitudinal 6-month crawling campaign yielding more than 13.5M unique records. This allows us to shed light on their nature, dynamics, inter-provider relationships, and transparency. Specifically, we discuss how the lack of consensus on distribution formats, blocklist labeling taxonomy, content focus, and temporal dynamics creates a complex ecosystem that complicates their combined crawling, aggregation and use. We also provide observations regarding their generally low overlap as well as acute differences in terms of liveness (i.e., how frequently records get indexed and removed from the list) and the lack of documentation about their data collection processes, nature and intended purpose. We conclude the paper with recommendations in terms of transparency, accountability, and standardization.
Classification
subjects
Telecommunications
keywords
functional areas; internet connectivity and internet access services; methods; network monitoring and measurements; security management; service management