Counting Efficiently with Bounter pt. 2: CountMinSketch

Filip Štefaňák Machine Learning, Open Source 2 Comments

In my previous post on the new open source Python Bounter library we discussed how we can use its HashTable to quickly count approximate item frequencies in very large item sequences. Now we turn our attention to the second algorithm in Bounter, CountMinSketch (CMS), which is also optimized in C for top performance.

Counting Efficiently with Bounter pt. 1: HashTable

Filip Štefaňák Machine Learning, Open Source Leave a Comment

Have you heard about the new open source Bounter Python library in town? In case you can’t wait to use it in practice but are wary of its “frequency estimation”, and what kind of results you can expect, this series of blog posts will help you develop the right intuition. It is split into two parts, one for each of …