While hunting for a data set to try my DTM Python port on, I came across this paper and this repository. The paper itself was quite an interesting read and analysed trends of topics in the European Parliament, but what caught my attention was the algorithm they used to perform this analysis – what they called Dynamic Non-Negative Matrix Factorisation (NMF). The …
Validating gensim’s topic coherence pipeline
Sorry for not posting in such a long while. It has been a turbulent few weeks, with some sharp twists and turns, emails flying back and forth, and a few pivots here and there. To validate the topic coherence pipeline in gensim, my plan was to work with the RTL-Wiki corpus and reproduce the results stated in the paper. …
The craziness that is Dynamic Topic Models
Every week, I’d end up having ‘fit DTM’ as my weekly goal. And I would try, converting the C++ GSL code line by line, only to have it fail miserably and fall back on me. (You can see my gripe about it in my live blog here.) The task in itself was quite straightforward – rewrite the Dynamic Topic Model code originally written by …
What is Topic Coherence?
What exactly is this topic coherence pipeline thing? Why is it even important? Moreover, what is the advantage of having this pipeline at all? In this post I will try to answer those questions in language that is as non-technical as possible. This is meant for the general reader as much as a technical one, so I will try to engage …
Radim, Gensim and RaRe Technologies
We are racing through 2016 with so much on the front burner, and yet it is timely to pause for a quick update on the launch of my new machine learning company, RaRe Technologies. The Start of Something Exciting: I’ve heard from a few people who were confused when they received a recent newsletter from “RaRe Technologies”, when they signed up for …
Devashish’s Student Incubator Live-Blog: a Chronicle of Implementing Topic Coherence Metrics in Gensim
10th August: PyCon Delhi. Planning to give some open space and lightning talks on gensim at PyCon India in September. Hopefully we’ll also be able to organize a sprint there. 1st August: Plugging in your own model. You can use the topic coherence pipeline to plug in your own topic model too. If you can extract the topics …
Bhargav’s Google Summer of Code 2016 Live-Blog: a Chronicle of Dynamic Topic Models
September 2nd, 2016: It’s celebration time – I’ve officially cleared Google Summer of Code 2016! 😀 😀 It’s been an absolutely awesome experience, I’ve had great mentors in Lev and Radim, and I’ve learned so, so much. You can see the result of my work here in this notebook tutorial – link. And you can follow the extra features which …
Understanding and Coding Dynamic Topic Models
Around a month into GSoC and into coding Dynamic Topic Models, there have been many challenges and experiences along the way. Before getting into the problems I faced, I’ll briefly describe what Dynamic Topic Models are. It would be helpful to first read my previous blog post, where I described Topic Models. You can also just do a quick Google …
PyCon 2016 and Gensim Sprint Recap
Our team was on site representing RaRe Technologies and Gensim at this year’s PyCon 2016, hosted in Portland, Oregon, from May 28th to June 5th. It was a packed, outright massive event of over 3,000 attendees, which included two days of focused tutorials, sponsor workshops and talks from some of the industry’s renowned experts. RaRe was a sponsor of the …
Topic Coherence API Project – Week 2
Hey everyone! Here’s a small reflection on what I had set out to do and how it panned out over the last month. My agenda for last month was to complete my normalization PR, finish my doc2vec/word2vec warning PR, code two modules required by the topic coherence API and resolve any other bugs which I encountered in the process. The …