Electronic International Standard Serial Number (EISSN)
1872-941X
abstract
Frequency estimation is a common operation in big data processing. In many big data applications, computing the exact data frequency is not practical as it requires a large computational effort. Instead, a reasonably accurate estimate is commonly used. The Count Min Sketch (CMS) is a popular method for frequency estimation due to its simple implementation and low memory requirements. However, Single Event Transients (SETs) can affect the logic functions of the CMS. Based on our previous research, nearly half of the SETs will lead to CMS estimation errors in the worst case. Moreover, the most frequent elements are more likely to be underestimated when SETs occur, which is not acceptable in practice. Therefore, this paper proposes several fault-tolerant schemes to protect the CMS against SETs, especially to avoid underestimation. In particular, space redundancy-based schemes and partial time redundancy-based schemes are proposed, and the probabilities for underestimation and overestimation are analyzed theoretically. Experiments are performed to compare the reliability of the CMS protected by different schemes and validate the theoretical predictions. Finally, the selection of the best CMS parameters for practical applications is discussed with a comprehensive analysis of the overhead and performance of the different protection schemes.
Classification
keywords
count min sketch; fault tolerance; frequency estimation; single event transients