Electronic International Standard Serial Number (EISSN)
1558-2558
abstract
In many applications, there is a need to estimate the frequency of elements. For example, in networking to know the number of packets of each flow. This poses a challenge as the number of flows and packets per second can be very large and therefore an exact count would require a large amount of fast memory. In those cases, an alternative is to use data structures, commonly referred to as sketches, that provide an estimate of the frequency of elements using a much smaller amount of memory. For example, the Count Min Sketch (CMS) hashes each element to a few counters and returns as estimate the minimum value among them. The CMS in general overestimates the frequency of an element as other elements may also map to the same counters and increment them. In this letter, fingerprint counting, a scheme to reduce the counter overestimation is presented and evaluated. The main idea is to add a fingerprint to the counters and use it to check if consecutive increments to a counter belong to the same element. When they do not, the counters can be incremented by half a packet instead of a full packet thus reducing the overestimation. The evaluation results show that the proposed scheme is able to reduce the overestimation and improve the CMS accuracy. In more detail, the overestimation is reduced by more than 20% in many of the configurations tested reaching values over 50% in some cases. A scheme to encode the fingerprints in the counters that practically eliminates the additional memory required for the fingerprints is also presented. Therefore, the improvement in the accuracy is achieved with a negligible impact on the size of the memory needed to implement the CMS.