Chinmaya’s Google Summer of Code 2017 Live-Blog: A Chronicle of Integrating Gensim with scikit-learn and Keras

Chinmaya Pancholi gensim, Student Incubator

20th June, 2017 During the last week, I continued working on scikit-learn wrappers for Gensim’s LDA (PR #1398), LSI (PR #1398), RandomProjections (PR #1395) and LDASeq (PR #1405) models. After several changes, including updating the wrapper class methods, adding unit tests for features such as model persistence, integration with sklearn’s Pipeline and raising NotFittedError, and fixing some of the older unit tests, these PRs have now been accepted and merged. 🙂 I also created PR …

Google Summer of Code 2017: Training and Topic Visualizations

Parul Sethi gensim, Student Incubator

21st June 2017 In the previous week, I worked on visualizing the document-topic distribution in the Tensorboard projector (PR #1396). It uses each document’s topic distribution as its embedding vector, so documents belonging to the same topics end up forming clusters. Now, in order to understand and interpret the themes of those topics, I used pyLDAvis to explore the …

Google Summer of Code 2017 – Performance improvement in Gensim and fastText

Prakhar Pratyush gensim, Student Incubator

June 21, 2017 In the last post, I mentioned trading memory for speed by applying the unicode-to-utf8 conversion (any2utf8) only before saving rather than on every incoming word. But memory turns out to be more critical here, so to handle this speed bottleneck we now apply the conversion to the entire sentence in one go (by using a delimiter), and …
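The delimiter idea in the excerpt above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code from the actual PR: plain `str.encode("utf8")` stands in for Gensim's `any2utf8`, the function names are hypothetical, and the delimiter is assumed never to occur inside a word.

```python
def to_utf8_per_word(words):
    """Baseline: one encode call per incoming word."""
    return [word.encode("utf8") for word in words]


def to_utf8_batched(words, delimiter=" "):
    """Encode a whole sentence with a single call, then split it back.

    Assumption: no word contains `delimiter`. Splitting the encoded bytes
    on an ASCII delimiter is safe, because UTF-8 multi-byte sequences
    never contain ASCII bytes.
    """
    if not words:
        # "".encode().split(...) would give [b""], so handle this case here.
        return []
    joined = delimiter.join(words)
    return joined.encode("utf8").split(delimiter.encode("utf8"))


sentence = ["naïve", "café", "gensim"]
# Both routes yield the same byte strings; the batched one encodes only once.
assert to_utf8_batched(sentence) == to_utf8_per_word(sentence)
```

Any speed gain would come from replacing many small encode calls with one larger call per sentence; how large it is in practice depends on sentence length and would need to be measured.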

Google Summer of Code 2017 – Week 1 of Integrating Gensim with scikit-learn and Keras

Chinmaya Pancholi gensim, Student Incubator

This is my first post as part of Google Summer of Code 2017 working with Gensim. I will be working on the project ‘Gensim integration with scikit-learn and Keras’ this summer. I stumbled upon Gensim while working on a project that used the Word2Vec model. I was looking for functionality to suggest words semantically similar to a given input word and Gensim’s …


Archive of RRP Podcast Episodes

Radim Řehůřek podcast

Subscribe with RSS, iTunes, YouTube, Stitcher, SoundCloud. Episode #3: Andy Müller on scikit-learn and open source. Andy, a core contributor to scikit-learn, shares his journey from academia to open source, his work at Amazon as a Machine Learning Scientist, and his return to his love: open source and scikit-learn. Episode #2: John D. Cook on independent ...

Text Summarization in Python: Extractive vs. Abstractive techniques revisited

Pranay, Aman and Aayush gensim, Student Incubator, summarization

This blog is a gentle introduction to text summarization and can serve as a practical summary of the current landscape. It describes how we, a team of three students in the RaRe Incubator programme, have experimented with existing algorithms and Python tools in this domain. We compare modern extractive methods like LexRank, LSA, Luhn and Gensim’s existing TextRank summarization module …


Gensim switches to semantic versioning

Lev Konstantinovskiy gensim, Open Source

Starting with release 1.0.0, Gensim adopts semantic versioning. The time went by in a flash, but Gensim has reached maturity. It has been cited in nearly 500 academic papers, used commercially in dozens of companies, been the focus of many coding sprints and meetups, and generally withstood the test of time. Between the continued Gensim support by our parent company, and our open Student ...