Avoiding Flow Size Overestimation in the Count-Min Sketch with Bloom Filter Constructions

The Count-Min sketch is the most popular data structure for flow size estimation, a basic measurement task required in many networks. Typically the number of potential flows is large, eliminating the possibility to maintain a counter per flow within memory of high access rate. The Count-Min sketch is probabilistic and relies on mapping each flow to multiple counters through hashing. This implies potential estimation error such that the size of a flow is overestimated when all flow counters are shared with other flows with observed traffic. Although the error in the estimation can be probabilistically bounded, many applications can benefit from accurate flow size estimation and the guarantee to completely avoid overestimation. We describe a design of the Count-Min sketch with accurate estimations whenever the number of flows with observed traffic follows a known bound, regardless of the identity of these particular flows. We make use of a concept of Bloom filters that avoid false positives and indicate the limitations of existing Bloom filter designs towards accurate size estimation. We suggest new Bloom filter constructions that allow scalability with the support for a larger number of flows and explain how these can imply the unique guarantee of accurate flow size estimation in the well known Count-Min sketch.

Avoiding Flow Size Overestimation in the Count-Min Sketch with Bloom Filter Constructions Articles