Gensim word2vec on CPU faster than Word2veckeras on GPU (Incubator Student Blog)

Šimon Pavlík gensim

Word2Vec became so popular mainly thanks to huge improvements in training speed producing high-quality words vectors of much higher dimensionality compared to then widely used neural network language models. Word2Vec is an unsupervised method that can process potentially huge amounts of data without the need for manual labeling. There is really no limit to size of a dataset that can …


FastText and Gensim word embeddings

Jayant Jain gensim

Facebook Research open sourced a great project recently – fastText, a fast (no surprise) and effective method to learn word representations and perform text classification. I was curious about comparing these embeddings to other commonly used embeddings, so word2vec seemed like the obvious choice, especially considering fastText embeddings are an extension of word2vec. The main goal of the Fast Text …


Does Python Stand a Chance in Today’s World of Data Science? [video]

Tony DiLoreto gensim, Machine Learning Leave a Comment

Earlier this summer, our director Radim Řehůřek, led a talk about the state of Python in today’s world of Data Science. Covered in the talk is how businesses are using Python for commercial success, Python vs Java, and an interesting comparison of the popular latent semantic analysis (SVD) and word2vec algorithms running on with different platforms: Spark MLlib, gensim, scikit-learn …


Doc2vec tutorial

Radim Rehurek gensim, programming 87 Comments

The latest gensim release of 0.10.3 has a new class named Doc2Vec. All credit for this class, which is an implementation of Quoc Le & Tomáš Mikolov: “Distributed Representations of Sentences and Documents”, as well as for this tutorial, goes to the illustrious Tim Emerick. Doc2vec (aka paragraph2vec, aka sentence embeddings) modifies the word2vec algorithm to unsupervised learning of continuous …