Machine learning benchmarks: Hardware providers (part 1)

Shiva Manne Machine Learning, Open Source, Student Incubator

The rise of machine learning as a discipline brings new demands for number crunching and computing power. With hardware resources now cheap and easily accessible, you have to pick the right platform for running experiments and training models. Should you use Amazon’s AWS EC2 instances? Or go with IBM’s SoftLayer, Google’s Compute Engine, or Microsoft’s Azure? How about a real …

The Mummy Effect: Bridging the gap between academia and industry (PyData keynote)

Radim Řehůřek Machine Learning, Open Source, Student Incubator

Last month, I gave a keynote at PyData Warsaw about the existing (and growing) gap between academia and industry, specifically when it comes to machine learning / data science. This is a topic close to my heart, since we’ve made our living in that no-man’s land where academia and industry collide for 7 years now. Between running our Student …

Counting Efficiently with Bounter pt. 1: HashTable

Filip Štefaňák Machine Learning, Open Source

Have you heard about the new open source Bounter Python library in town? If you can’t wait to use it in practice but are wary of its “frequency estimation” and unsure what kind of results to expect, this series of blog posts will help you develop the right intuition. It is split into two parts, one for each of …

Data analysis in Python: Interactive confusion matrix with Facets Dive, Pandas, Scikit-learn

Jan Pomikálek Machine Learning, Open Source

The Facets project by Google’s “People+AI Research Initiative” (PAIR) offers two open source visualization tools for data analytics – Facets Overview and Facets Dive. Today, we are going to look at Facets Dive and demonstrate how to use it for an interactive confusion matrix for a multiclass classification problem using Python, Pandas and Scikit-learn.
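As a rough sketch of the data preparation behind such a confusion matrix, the snippet below builds a per-example table of actual and predicted labels with Pandas and aggregates it with Scikit-learn. The labels are made up for illustration, the variable names are hypothetical, and Facets Dive itself is not called here; the JSON records at the end are simply one plausible format for handing the rows to a visualization front end.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Hypothetical labels for a 3-class problem (made up for illustration).
y_true = ["cat", "dog", "bird", "cat", "dog", "bird", "cat", "dog"]
y_pred = ["cat", "dog", "cat", "cat", "bird", "bird", "dog", "dog"]
labels = ["bird", "cat", "dog"]

# One row per example: the record-level table an interactive tool can facet on.
records = pd.DataFrame({"actual": y_true, "predicted": y_pred})
records["correct"] = records["actual"] == records["predicted"]

# Aggregate view: rows are actual classes, columns are predicted classes.
matrix = pd.DataFrame(
    confusion_matrix(y_true, y_pred, labels=labels),
    index=pd.Index(labels, name="actual"),
    columns=pd.Index(labels, name="predicted"),
)

# JSON records, one object per example (assumed hand-off format).
records_json = records.to_json(orient="records")

print(matrix)
```

Keeping one row per example, rather than only the aggregated counts, is what lets an interactive view drill down from a matrix cell to the individual misclassified items.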

Semantic Search Using a Fulltext Engine Presented at ACL 2017

Jaroslav Dostál Deep Learning, Machine Learning, ScaleText

Some of our consulting tasks keep repeating, hinting at a widespread pain point across our clients and industries. One of them is looking for meaningful nuggets of information in large unstructured document databases. How do you extract actionable insights and relationships from messy datasets, such as Customer Support records? How about financial reports, or job CVs? Are you still …

Topic Modelling with Latent Dirichlet Allocation: How to pre-process data and tune your model. New tutorial.

Ólavur Mortensen gensim, Machine Learning, Open Source, programming, Student Incubator

If you’ve learned how to train topic models in Gensim but aren’t getting satisfying results, we have a new tutorial on GitHub that will help you get on the right track. Primarily, you will learn about pre-processing text data for the LDA model. You will also get some tips about how to set the parameters …

Author-topic models: why I am working on a new implementation

Ólavur Mortensen gensim, Machine Learning, Open Source, programming, Student Incubator

Author-topic models promise to give data scientists a tool to simultaneously gain insight about authorship and content in terms of latent topics. The model is closely related to Latent Dirichlet Allocation (LDA). Basically, each author can be associated with multiple documents, and each document can be attributed to multiple authors. The model learns topic representations for each author, so that …
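The many-to-many relationship between authors and documents described above can be sketched with two plain dictionaries. This is only an illustration of the input data structure, not the model itself; the author names, document ids, and the helper `invert_author2doc` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping: each author is associated with several document ids.
author2doc = {
    "alice": [0, 1],
    "bob": [1, 2],
    "carol": [2],
}


def invert_author2doc(author2doc):
    """Build the reverse mapping: each document id -> list of its authors."""
    doc2author = defaultdict(list)
    for author, doc_ids in author2doc.items():
        for doc_id in doc_ids:
            doc2author[doc_id].append(author)
    return dict(doc2author)


doc2author = invert_author2doc(author2doc)

# Document 1 is shared by two authors; document 0 has a single author.
print(doc2author)
```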

Radim, Gensim and RaRe Technologies

Radim Řehůřek gensim, Machine Learning

We are racing through 2016 with so much on the front burner, yet it is a good time to pause for a quick update on the launch of my new machine learning company, RaRe Technologies. The Start of Something Exciting: I’ve heard from a few people who were confused when they received a recent newsletter from “RaRe Technologies”, when they signed up for …

Pycon 2016 and Gensim Sprint Recap

Lev Konstantinovskiy gensim, Machine Learning, PyCon

Our team was on site representing RaRe Technologies and Gensim at PyCon 2016, hosted in Portland, Oregon, from May 28th to June 5th. It was a packed, massive event with over 3000 attendees, featuring two days of focused tutorials, sponsor workshops and talks from some of the industry’s renowned experts. RaRe was a sponsor of the …