student incubator | RARE Technologies

Translation Matrix: how to connect “embeddings” in different languages?

Ji Xiaohong 2017-09-13 gensim, Student Incubator

This is a blog post by one of our Incubator students, Ji Xiaohong. Ji worked on the problem of aligning independently trained word embeddings (such as word2vec models), which is useful in applications such as machine translation or tracking how a single language evolves over time.

WordRank embedding: “crowned” is most similar to “king”, not word2vec’s “Canute”

Parul Sethi 2017-01-23 gensim, Student Incubator

Comparisons to Word2Vec and FastText with TensorBoard visualizations. With so many embedding models appearing recently, choosing one can be difficult. Should you simply go with those widely used in the NLP community, such as Word2Vec, or could some other model be more accurate for your use case? There are some evaluation metrics …

New Gensim feature: Author-topic modeling. LDA with metadata.

Ólavur Mortensen 2017-01-18 gensim

The author-topic model is an extension of Latent Dirichlet Allocation that allows data scientists to build topic representations of attached author labels. These author labels can represent any kind of discrete metadata attached to documents, for example, tags on posts on the web. In December of 2016, I wrote a blog post explaining that a Gensim implementation was on its …