Designs for efficient low power cardinality and similarity sketches by Two-Step Hashing (TSH)

authors

Li, Jie
REVIRIEGO VASALLO, PEDRO
SHANSHAN, LIU
Xiao, Liyi
Lombardi, Fabrizio

published in

Integration Journal

publication date

November 2021

start page

246

end page

253

volume

81

Digital Object Identifier (DOI)

https://doi.org/10.1016/j.vlsi.2021.08.003

full text

https://www.scopus.com/inward/record.uri?eid=2-s2.0-85112294625

abstract

In many computing applications, there is a need to estimate different features of large data sets, such as the number of distinct elements or the similarity with other data sets. This can be efficiently implemented using probabilistic data structures known as sketches; for example, Hyperloglog is widely used for cardinality estimate and Minhash for similarity estimate. Many of these sketches rely on computing hash functions on the elements and counting the number of consecutive leading zeros (or ones) to identify the minimum (or maximum) hash value for data updating. This functionality usually accounts for the most demanding part of the sketch implementation; an efficient and low power hash computation and leading zero counting are therefore required for high performance. A novel but important observation is exploited in this paper; namely the trailing hash bits are rarely used in these sketches (e.g., the leading zeros counting stops once the first one is found, which usually occurs after a few bits). Based on this feature, a so-called Two-Step Hashing (TSH) scheme is proposed in this paper to significantly reduce the power consumption. Therefore rather than computing all hash bits (as in the traditional scheme), TSH computes a few leading hash bits in a first step and the trailing hash bits in the second step, that is only required when the leading bits are all zeros (or ones). Hence, in most cases only a fraction of the hash bits is computed, so significantly reducing the power dissipation of hash computations when they are used to check the number of consecutive leading zeros, or the minimum value. The use of TSH for sketches with both single and multiple pipeline implementations are considered; evaluation results based on HyperLogLog (HLL) as a sketch with a Cyclic Redundancy Check hash function show that TSH saves up to 75.9 % (78.7 %) power dissipation in a single (multiple) pipeline implementation.

Designs for efficient low power cardinality and similarity sketches by Two-Step Hashing (TSH) Articles

Overview

authors

published in

publication date

start page

end page

volume

Digital Object Identifier (DOI)

full text

abstract

Classification

subjects

keywords