Security of HyperLogLog (HLL) Cardinality Estimation: Vulnerabilities and Protection

Count distinct or cardinality estimates are widely used in network monitoring for security. They can be used, for example, to detect the malware spread, network scans, or a denial of service attack. There are many algorithms to estimate cardinality. Among those, HyperLogLog (HLL) has been one of the most widely adopted. HLL is simple, provides good cardinality estimates over a wide range of values, requires a small amount of memory, and allows merging of estimates from different sources. However, as HLL is increasingly used to detect attacks, it can itself become the target of attackers that want to avoid being detected. To the best of our knowledge, the security of HLL has not been studied before. In this letter, we take an initial step in its study by first exposing a vulnerability of HLL that allows an attacker to manipulate its estimate. This shows the importance of designing secure HLL implementations. In the second part of the letter, we propose an efficient protection technique to detect and avoid the HLL manipulation. The results presented strongly suggest that the security of HLL should be further studied given that it is widely adopted in many networking and computing applications.

Security of HyperLogLog (HLL) Cardinality Estimation: Vulnerabilities and Protection Articles