Google Summer of Code 2017 – Week 1 of Integrating Gensim with scikit-learn and Keras

Chinmaya Pancholi gensim, Student Incubator

This is my first post as part of Google Summer of Code 2017, working with Gensim. I will be working on the project ‘Gensim integration with scikit-learn and Keras’ this summer. I stumbled upon Gensim while working on a project that used the Word2Vec model. I was looking for functionality to suggest words semantically similar to a given input word and Gensim’s …

Text Summarization in Python: Extractive vs. Abstractive techniques revisited

Pranay, Aman and Aayush gensim, Student Incubator, summarization

This blog post is a gentle introduction to text summarization and can serve as a practical summary of the current landscape. It describes how we, a team of three students in the RaRe Incubator programme, experimented with existing algorithms and Python tools in this domain. We compare modern extractive methods like LexRank, LSA, Luhn and Gensim’s existing TextRank summarization module …

WordRank embedding: “crowned” is most similar to “king”, not word2vec’s “Canute”

Parul Sethi gensim, Student Incubator

Comparisons to Word2Vec and FastText with TensorBoard visualizations. With so many embedding models appearing recently, choosing one can be difficult. Should you simply go with the ones widely used in the NLP community, such as Word2Vec, or is it possible that some other model could be more accurate for your use case? There are some evaluation metrics …

Topic Modelling with Latent Dirichlet Allocation: How to pre-process data and tune your model. New tutorial.

Ólavur Mortensen gensim, Machine Learning, Open Source, programming, Student Incubator

If you’ve learned how to train topic models in Gensim but aren’t able to get satisfying results, we have a new tutorial on GitHub that will help you get on the right track. Primarily, you will learn about pre-processing text data for the LDA model. You will also get some tips about how to set the parameters …

Author-topic models: why I am working on a new implementation

Ólavur Mortensen gensim, Machine Learning, Open Source, programming, Student Incubator

Author-topic models promise to give data scientists a tool to simultaneously gain insight about authorship and content in terms of latent topics. The model is closely related to Latent Dirichlet Allocation (LDA). Basically, each author can be associated with multiple documents, and each document can be attributed to multiple authors. The model learns topic representations for each author, so that …

Three Sprints in India (To Say Nothing of PyCon)

Lev Konstantinovskiy gensim, Open Source, PyCon, Student Incubator

I was very happy to visit India this October to run three Gensim coding sprints, give workshops and visit the PyCon India conference. Many thanks to our Incubator programme student Devashish Deshpande for being my host. PyCon India was a very friendly event of 500 attendees, with workshops on Friday and conference talks on Saturday and Sunday. My favorite PyCon moment was the keynote …

Dynamic NMF and Dynamic Topics

Bhargav Srinivasa Google Summer of Code 2016, Student Incubator

While hunting for a data set to try my DTM Python port, I came across this paper and this repository. The paper itself was quite an interesting read and analysed trends of topics in the European Parliament, but what caught my attention was the algorithm they used to perform this analysis – what they called Dynamic Non-Negative Matrix Factorisation (NMF). The …