Implementing Poincaré Embeddings

Jayant Jain gensim, Open Source

Over the last month or so, I have been implementing a model called Poincaré embeddings. The model comes from an interesting paper by Facebook AI Research, Poincaré Embeddings for Learning Hierarchical Representations [1]. This post describes the model at a relatively high level of abstraction, then details the technical challenges I faced while implementing it.

New API for pretrained NLP models and datasets in Gensim

Chaitali Saini Datasets, gensim, Open Source, Student Incubator

There’s no shortage of websites and repositories that aggregate various machine learning datasets and pre-trained models (Kaggle, UCI MLR, DeepDive, individual repos like GloVe, fastText, Quora, blogs…). The only problem is that they all use widely different formats, cover widely different use cases and go out of service with worrying regularity. For this reason, we decided to include datasets and models relevant …

Machine learning benchmarks: Hardware providers (part 1)

Shiva Manne Machine Learning, Open Source, Student Incubator

The rise of machine learning as a discipline brings new demands for number crunching and computing power. With hardware resources now cheap and easily accessible, one has to pick the right platform for running experiments and training models. Should you use Amazon’s AWS EC2 instances? Or go with IBM’s SoftLayer, Google’s Compute Engine, Microsoft’s Azure? How about a real …

The Mummy Effect: Bridging the gap between academia and industry (PyData keynote)

Radim Řehůřek Machine Learning, Open Source, Student Incubator

Last month, I gave a keynote at PyData Warsaw about the existing (and growing) gap between academia and industry, specifically when it comes to machine learning / data science. This is a topic close to my heart, since for 7 years now we’ve made our living in the no-man’s land where academia and industry collide. Between running our Student …

Counting Efficiently with Bounter pt. 1: HashTable

Filip Štefaňák Machine Learning, Open Source

Have you heard about the new open source Bounter Python library in town? If you can’t wait to use it in practice but are wary of its “frequency estimation” and of what kind of results you can expect, this series of blog posts will help you develop the right intuition. It is split into two parts, one for each of …

Data analysis in Python: Interactive confusion matrix with Facets Dive, Pandas, Scikit-learn

Jan Pomikálek Machine Learning, Open Source

The Facets project by Google’s “People+AI Research Initiative” (PAIR) offers two open source visualization tools for data analytics: Facets Overview and Facets Dive. Today, we are going to look at Facets Dive and demonstrate how to use it to build an interactive confusion matrix for a multiclass classification problem using Python, Pandas and Scikit-learn.

Chinmaya’s GSoC 2017 Summary: Integration with sklearn & Keras and implementing fastText

Chinmaya Pancholi gensim, Google Summer of Code, Student Incubator

This blog post summarizes the work that I did for Google Summer of Code 2017 with Gensim. My work during the summer was divided into two parts: integrating Gensim with scikit-learn & Keras, and adding a Python implementation of the fastText model to Gensim.

Gensim integration with scikit-learn and Keras

Gensim is a topic modelling and information extraction library which mainly serves unsupervised …