
Data analysis in Python: Interactive confusion matrix with Facets Dive, Pandas, Scikit-learn

Jan Pomikálek Machine Learning, Open Source

The Facets project by Google’s “People+AI Research Initiative” (PAIR) offers two open-source visualization tools for data analytics: Facets Overview and Facets Dive. Today, we are going to look at Facets Dive and demonstrate how to use it to build an interactive confusion matrix for a multiclass classification problem using Python, Pandas and Scikit-learn.
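As a rough sketch of the data-preparation side, assuming a toy three-class problem with made-up labels: scikit-learn's `confusion_matrix` and `pandas.crosstab` give the static matrix, while a per-example DataFrame exported as JSON records is the kind of input Facets Dive visualizes. Faceting those records by `actual` on one axis and `predicted` on the other in Dive should reproduce the confusion matrix, with each cell made of individual, inspectable examples. The column and variable names below are our own assumptions, not taken from the post.

```python
import json

import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical gold labels and classifier predictions for a 3-class problem.
y_true = ["cat", "dog", "bird", "cat", "dog", "bird", "cat", "dog"]
y_pred = ["cat", "dog", "cat", "cat", "bird", "bird", "dog", "dog"]
labels = ["bird", "cat", "dog"]

# Static confusion matrix: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=labels)

# The same matrix as a labelled pandas table (rows/columns sorted alphabetically).
cm_table = pd.crosstab(
    pd.Series(y_true, name="actual"),
    pd.Series(y_pred, name="predicted"),
)

# One row per example: this is what a faceted view lays out point by point.
df = pd.DataFrame({"actual": y_true, "predicted": y_pred})
df["correct"] = df["actual"] == df["predicted"]

# Serialize as a list of JSON records (assumed input format for Facets Dive).
records_json = df.to_json(orient="records")
records = json.loads(records_json)
```

The static matrix only shows counts; the point of the per-example records is that an interactive view can show which examples fall into each off-diagonal cell.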

Gensim switches to semantic versioning

Lev Konstantinovskiy gensim, Open Source

Starting with release 1.0.0, Gensim adopts semantic versioning. Time has flown by, and Gensim has reached maturity: it has been cited in nearly 500 academic papers and used commercially in dozens of companies, its community has organized many coding sprints and meetups, and it has generally withstood the test of time. Between the continued Gensim support by our parent company, rare-technologies.com, and our open Student ...
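Semantic versioning encodes compatibility in a MAJOR.MINOR.PATCH number: a MAJOR bump signals backward-incompatible API changes, a MINOR bump adds functionality compatibly, and a PATCH bump is for compatible bug fixes. A minimal sketch of what that means for a downstream project checking whether an installed release satisfies the one its code was written against (the helper names are hypothetical, and pre-release or build suffixes are not handled):

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse(version: str) -> Version:
    """Parse a plain 'MAJOR.MINOR.PATCH' string (no pre-release tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return Version(major, minor, patch)


def is_backward_compatible(installed: str, required: str) -> bool:
    """Code written against `required` should keep working with `installed`
    if both share the same MAJOR version and `installed` is not older."""
    inst, req = parse(installed), parse(required)
    # NamedTuple compares field by field, so this is a proper version ordering.
    return inst.major == req.major and inst >= req
```

For example, `is_backward_compatible("1.2.0", "1.0.0")` is true, while a jump to `2.0.0` is not. SemVer makes no stability promise for `0.y.z` releases, which this simplified check ignores.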

Topic Modelling with Latent Dirichlet Allocation: How to pre-process data and tune your model. New tutorial.

Ólavur Mortensen gensim, Machine Learning, Open Source, programming, Student Incubator

If you’ve learned how to train topic models in Gensim but aren’t getting satisfying results, we have a new tutorial on GitHub that will help you get on the right track. Primarily, you will learn how to pre-process text data for the LDA model. You will also get some tips about how to set the parameters …


Author-topic models: why I am working on a new implementation

Ólavur Mortensen gensim, Machine Learning, Open Source, programming, Student Incubator

Author-topic models promise to give data scientists a tool for simultaneously gaining insight into authorship and content, in terms of latent topics. The model is closely related to Latent Dirichlet Allocation (LDA). Basically, each author can be associated with multiple documents, and each document can be attributed to multiple authors. The model learns topic representations for each author, so that …
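The many-to-many relation between authors and documents is easy to represent as two dictionaries, one per direction. A minimal sketch with made-up author names and document ids, deriving the author-to-documents mapping from the document-to-authors one:

```python
from collections import defaultdict

# Hypothetical authorship data: each document id maps to its list of authors.
doc2author = {
    0: ["alice"],
    1: ["alice", "bob"],
    2: ["bob", "carol"],
    3: ["carol"],
}


def invert(mapping):
    """Turn {key: [values]} into {value: [keys]}, preserving encounter order."""
    inverted = defaultdict(list)
    for key, values in mapping.items():
        for value in values:
            inverted[value].append(key)
    return dict(inverted)


# Each author now points to every document they contributed to.
author2doc = invert(doc2author)
```

Here `author2doc["bob"]` is `[1, 2]`, and inverting again recovers the original mapping. As far as we know, Gensim's `AuthorTopicModel` accepts an `author2doc` mapping of this shape (author name to a list of document indices in the corpus); check the current Gensim documentation for the exact parameters, since this post describes work in progress.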


Three Sprints in India (To Say Nothing of PyCon)

Lev Konstantinovskiy gensim, Open Source, PyCon, Student Incubator

I was very happy to visit India this October to run three Gensim coding sprints, give workshops and attend the PyCon India conference. Many thanks to our Incubator programme student Devashish Deshpande for being my host. PyCon India was a very friendly event of 500 attendees, with workshops on Friday and conference talks on Saturday and Sunday. My favorite PyCon moment was the keynote …