Semantic Search Using a Fulltext Engine Presented at ACL 2017

Some of our consulting tasks keep repeating, hinting at a widespread pain point across clients and industries. One of them is finding meaningful nuggets of information in large, unstructured document databases. How do you extract actionable insights and relationships from messy datasets, such as customer support records? What about financial reports, or job CVs? Are you still …

Chinmaya’s Google Summer of Code 2017 Live-Blog: a Chronicle of Integrating Gensim with scikit-learn and Keras

1st August, 2017 In the last two weeks, I mainly worked on updating and adding the sklearn API for Gensim models, and on updating the tests in shorttext. I couldn't post a blog entry last week because my college semester had just begun and I was travelling at the time. In PR #1473, I removed the BaseTransformer class and refactored the …

Parul’s Google Summer of Code 2017 Live-Blog: a chronicle of adding training and topic visualizations in gensim

2nd August 2017 PR-1484 is nearly complete. It adds the dendrogram visualization I discussed in my last post. I also added a parameter for text annotations on the upper hierarchy levels, so users can see the common and differing terms at the cluster heads, which are made up of the topics in the leaves. It also …


Google Summer of Code 2017 – Performance improvement in Gensim and fastText

July 20, 2017 This week, I mostly worked on implementing native unsupervised fastText (PR #1482) in gensim. It’s quite challenging: I had to dig into the fastText C++ code and read the research paper to understand how it works, and then figure out how it maps onto the existing word2vec code. After lots of discussion with mentors, we …

Google Summer of Code 2017 – Week 1 of Integrating Gensim with scikit-learn and Keras

This is my first post as part of Google Summer of Code 2017 working with Gensim. This summer I will be working on the project ‘Gensim integration with scikit-learn and Keras’. I stumbled upon Gensim while working on a project that used the Word2Vec model. I was looking for functionality to suggest words semantically similar to a given input word, and Gensim’s …
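The "suggest semantically similar words" feature mentioned above boils down to ranking vocabulary words by cosine similarity to the query word's vector; in Gensim this is exposed as `KeyedVectors.most_similar`. The sketch below is only a stdlib illustration of that idea, not Gensim's implementation, and the `toy_vectors` embeddings are hypothetical hand-picked values rather than trained Word2Vec vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=3):
    """Rank every other word by cosine similarity to `word` (illustrative only)."""
    query = vectors[word]
    scores = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Hypothetical toy 3-d embeddings; real Word2Vec vectors are learned from a corpus.
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "apple": [0.1, 0.2, 0.9],
    "banana": [0.15, 0.1, 0.95],
}

if __name__ == "__main__":
    print(most_similar("king", toy_vectors, topn=2))
```

With trained vectors, the equivalent Gensim call would be along the lines of `model.wv.most_similar("king", topn=2)`.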


Archive of RRP Podcast Episodes

Subscribe via RSS, iTunes, YouTube, Stitcher or SoundCloud.
Episode #3: Andy Müller on scikit-learn and open source. Andy, a core contributor to scikit-learn, shares his journey from academia to open source, his work at Amazon as a Machine Learning Scientist, and his return to his first love: open source and scikit-learn.
Episode #2: John D. Cook on independent ...

Text Summarization in Python: Extractive vs. Abstractive techniques revisited

This blog post is a gentle introduction to text summarization and can serve as a practical summary of the current landscape. It describes how we, a team of three students in the RaRe Incubator programme, experimented with existing algorithms and Python tools in this domain. We compare modern extractive methods like LexRank, LSA, Luhn and Gensim’s existing TextRank summarization module …
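To make "extractive" concrete: an extractive summarizer scores the existing sentences and copies the best ones verbatim, rather than generating new text. The sketch below is a simplified word-frequency scorer loosely in the spirit of Luhn's method. It is not the exact Luhn algorithm, nor LexRank, LSA or Gensim's TextRank, and the tiny `STOPWORDS` set and the regex sentence splitter are simplifying assumptions.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
             "for", "on", "with", "as", "are", "this", "that"}


def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, num_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        # Average document-level frequency of the sentence's content words.
        tokens = tokenize(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    doc = ("Gensim is a library for topic modelling. "
           "Gensim supports embeddings and topic models. "
           "The weather was nice today. "
           "Topic models in Gensim scale to large corpora.")
    print(summarize(doc, num_sentences=2))
```

Averaging rather than summing the frequencies keeps long sentences from winning purely by length; the off-topic weather sentence scores lowest and is dropped.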