Posted On: Oct 2, 2020
Amazon Redshift introduces support for natively storing and processing HyperLogLog (HLL) sketches. HyperLogLog is an algorithm that efficiently estimates the approximate number of distinct values (the cardinality) in a data set. An HLL sketch is a compact construct that encapsulates the information about the distinct values in the data set. You can use HLL sketches to achieve significant performance benefits for queries that compute approximate cardinality over large data sets, with an average relative error between 0.01% and 0.6%.
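For background, the estimator from the original HyperLogLog paper is shown below. Redshift's bias-corrected variant is not described in this announcement, so treat this as the textbook form rather than Redshift's exact formula. Each value is hashed, the first p bits of the hash select one of m = 2^p registers, and register M[j] keeps the largest 1-based position of the leftmost 1-bit seen in the remaining hash bits. V is the number of registers that are still zero. The paper's large-range correction, which matters only for 32-bit hashes, is omitted.

```latex
E = \alpha_m \, m^2 \left( \sum_{j=1}^{m} 2^{-M[j]} \right)^{-1},
\qquad \alpha_m \approx \frac{0.7213}{1 + 1.079/m} \quad (m \ge 128)

E^{*} =
\begin{cases}
  m \ln\!\left(\dfrac{m}{V}\right) & \text{if } E \le \tfrac{5}{2}\,m \text{ and } V > 0 \\[4pt]
  E & \text{otherwise}
\end{cases}
\qquad \text{standard error} \approx \frac{1.04}{\sqrt{m}}
```

For example, with m = 4096 registers the textbook standard error is about 1.04/64 ≈ 1.6%. The sketch size depends on m, not on how many distinct values were seen, which is where the low memory footprint comes from.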
Redshift provides a first-class data type, HLLSKETCH, and associated SQL functions to generate, persist, and combine HyperLogLog sketches. Amazon Redshift's HyperLogLog capability uses bias correction techniques and provides high accuracy with a low memory footprint. You can store HLL sketch values in a table column of type HLLSKETCH and operate on them with aggregate and scalar functions. You can use these functions to create HLL sketches, extract the cardinality of an HLL sketch, or combine multiple sketch values.
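To make the create, cardinality, and combine operations concrete, here is a minimal Python sketch of what they do to a sketch's state. It is loosely analogous to Redshift's HLL_CREATE_SKETCH, HLL_CARDINALITY, and HLL_COMBINE functions, but it is not Redshift's implementation. The precision P = 12, the SHA-1-based 64-bit hash, the dense list of registers, and the function names create_sketch, combine_sketches, and cardinality are all hypothetical. The estimate also uses the original paper's linear-counting fallback rather than Redshift's bias correction.

```python
import hashlib
import math
from typing import Iterable, List

P = 12                    # precision (hypothetical choice): 2**P registers
NUM_REGISTERS = 1 << P    # m = 4096 registers
BITS = 64 - P             # hash bits left after the register index


def create_sketch(values: Iterable) -> List[int]:
    """Hash each value and keep, per register, the max leftmost-1-bit position."""
    registers = [0] * NUM_REGISTERS
    for v in values:
        # 64-bit hash taken from SHA-1 (an assumption; any well-mixed hash works)
        x = int.from_bytes(hashlib.sha1(str(v).encode("utf-8")).digest()[:8], "big")
        idx = x >> BITS                       # first P bits pick the register
        rest = x & ((1 << BITS) - 1)          # remaining BITS bits
        rank = BITS - rest.bit_length() + 1   # 1-based position of leftmost 1-bit
        registers[idx] = max(registers[idx], rank)
    return registers


def combine_sketches(a: List[int], b: List[int]) -> List[int]:
    """Merge two sketches: element-wise maximum of their registers."""
    return [max(x, y) for x, y in zip(a, b)]


def cardinality(registers: List[int]) -> float:
    """Basic HLL estimate with a linear-counting fallback for small cardinalities."""
    m = len(registers)
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros > 0:
        return m * math.log(m / zeros)
    return raw


if __name__ == "__main__":
    jan = create_sketch(f"user{i}" for i in range(0, 6000))      # users 0..5999
    feb = create_sketch(f"user{i}" for i in range(4000, 10000))  # users 4000..9999
    both = combine_sketches(jan, feb)
    print(round(cardinality(both)))  # roughly 10000, not 6000 + 6000
```

Because merging is an element-wise maximum, combining the two monthly sketches yields exactly the sketch of the union, so users present in both months are not double-counted. This property is what lets persisted sketches be rolled up later without rescanning the raw data.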
Support for HLL sketches is included in Amazon Redshift release version 1.0.19097 and later. This functionality is available to new and existing customers at no additional cost. To get started and learn more, visit our documentation. Refer to the AWS Region Table for Amazon Redshift availability.