This module implements Cormode-Muthukrishnan CountMin sketches on integer values, implemented as a user-defined aggregate. It also provides scalar functions over the sketches to produce approximate counts, order statistics, and histograms.
cmsketch( col_name )
cmsketch
. cmsketch_count( cmsketch, p )
cmsketch_rangecount( cmsketch, m, n )
cmsketch_centile( cmsketch, k, count )
cmsketch_centile(cmsketch,50,count)
. cmsketch_median( cmsketch, count )
cmsketch_width_histogram( cmsketch, min, max, n )
cmsketch_depth_histogram( cmsketch, n )
CREATE TABLE data(class INT, a1 INT); INSERT INTO data SELECT 1,1 FROM generate_series(1,10000); INSERT INTO data SELECT 1,2 FROM generate_series(1,15000); INSERT INTO data SELECT 1,3 FROM generate_series(1,10000); INSERT INTO data SELECT 2,5 FROM generate_series(1,1000); INSERT INTO data SELECT 2,6 FROM generate_series(1,1000);
CREATE TABLE sketch_count AS SELECT class, cmsketch_count( cmsketch( a1 ), 2 ) FROM data GROUP BY data.class; SELECT * FROM sketch_count;Result:
class | cmsketch_count ------+---------------- 2 | 0 1 | 15000 (2 rows)
SELECT class, cmsketch_rangecount( cmsketch(a1), 3, 6 ) FROM data GROUP BY data.class;Result:
class | cmsketch_rangecount ------+--------------------- 2 | 2000 1 | 10000 (2 rows)
SELECT cmsketch_centile( cmsketch( a1 ), 90, count(*) ) FROM data;Result:
cmsketch_centile ----------------- 3 (1 row)
SELECT cmsketch_width_histogram( cmsketch( a1 ), 0, 10, 2 ) FROM data;Result:
cmsketch_width_histogram ----------------------------------- [[0L, 4L, 35000], [5L, 10L, 2000]] (1 row)
SELECT cmsketch_depth_histogram( cmsketch( a1 ), 2 ) FROM data;Result:
cmsketch_depth_histogram ---------------------------------------------------------------------- [[-9223372036854775807L, 1, 10000], [2, 9223372036854775807L, 27000]] (1 row)
[1] G. Cormode and S. Muthukrishnan. An improved data stream summary: The count-min sketch and its applications. LATIN 2004, J. Algorithm 55(1): 58-75 (2005) . http://dimacs.rutgers.edu/~graham/pubs/html/CormodeMuthukrishnan04CMLatin.html
[2] G. Cormode. Encyclopedia entry on 'Count-Min Sketch'. In L. Liu and M. T. Ozsu, editors, Encyclopedia of Database Systems, pages 511-516. Springer, 2009. http://dimacs.rutgers.edu/~graham/pubs/html/Cormode09b.html
File sketch.sql_in documenting the SQL functions.
Module grp_quantile for a different implementation of quantile function.