Last month, we ran a survey among Gensim users to get a better idea of what delights and annoys you. The ~7 minute survey was completed by 448 people. That’s a great juicy sample, big thanks to all who participated! Full detailed statistics here; in this post I’ll summarize what we found and what it means for Gensim.
We recently published a large-scale machine learning benchmark using word2vec, comparing several popular hardware providers and ML frameworks on pragmatic aspects such as cost, ease of use, stability, scalability and performance. Since that benchmark only looked at CPUs, we also ran an analogous ML benchmark focused on GPUs.
In my previous post on the new open source Python Bounter library we discussed how we can use its HashTable to quickly count approximate item frequencies in very large item sequences. Now we turn our attention to the second algorithm in Bounter, CountMinSketch (CMS), which is also optimized in C for top performance.
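For readers new to the idea, here is a minimal pure-Python sketch of how a Count-Min Sketch estimates frequencies. It is not Bounter's C implementation: the class name `TinyCountMinSketch`, the `width` and `depth` parameters and the choice of salted BLAKE2b hashing are all assumptions made for illustration.

```python
import hashlib


class TinyCountMinSketch:
    """Illustrative Count-Min Sketch (hypothetical; not Bounter's code)."""

    def __init__(self, width=1024, depth=4):
        self.width = width    # number of counters per row
        self.depth = depth    # number of independent hash rows
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # One salted hash per row, so each row maps the item to its own column.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode("utf-8"), digest_size=8, salt=bytes([row])
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, item, count=1):
        # Increment one counter in every row.
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the smallest counter
        # across rows is the tightest available upper bound.
        return min(self.table[row][col] for row, col in self._buckets(item))


sketch = TinyCountMinSketch()
for word in ["a", "b", "a", "c", "a"]:
    sketch.add(word)
```

The estimate never undercounts: `sketch.estimate("a")` is at least 3 here, and it exceeds 3 only if another item collides with "a" in every row. Memory stays fixed at `width * depth` counters no matter how many distinct items are seen.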
In my previous post I talked about Facets Dive – an excellent visualisation tool from Google PAIR for data scientists. Now that you have created beautiful interactive charts from your data analyses and machine learning experiments, you may want to share them with your non-technical colleagues or customers, simply and securely.
The Facets project by Google’s “People+AI Research Initiative” (PAIR) offers two open source visualization tools for data analytics – Facets Overview and Facets Dive. Today, we are going to look at Facets Dive and demonstrate how to use it for an interactive confusion matrix for a multiclass classification problem using Python, Pandas and Scikit-learn.