My tryst with Gensim started when I was looking for ways to model the evolution of topics in Software Engineering research, and Dynamic Topic Models (DTM) were an obvious choice. While I initially worked with the original Blei DTM code, interpreting and organizing my results was a pain… and this is where Gensim came in.
The DTM wrapper was a breeze to use, and helped me finish my undergraduate project. But what excited me the most was the open source nature of it all, and I decided to give a little back to Gensim and the community by opening a few Pull Requests for the Dynamic Topic Model wrapper (#676, #679). You can read the Gensim DTM tutorial here!
Having got my feet wet with Dynamic Topic Models and their workings, I saw the next logical step as implementing them from scratch, in keeping with Gensim’s philosophy of being blazing fast and memory independent. I soon approached my to-be mentors on Gensim’s student mailing list, and started the process of applying for GSoC 2016 (you can see my proposal here).
While getting used to the code base, I tried my hand at implementing some popular similarity functions. The code is under review, and it’s been a great learning experience getting it to work, especially writing unit tests and good code that can be used in a variety of scenarios. You can see the PR here.
Since then, I have officially started my community bonding period and am wrapping up my pending PRs, as well as getting myself ready to tackle coding up Dynamic Topic Models. My mentors Lev and Radim have readily encouraged me to help on the mailing lists and with open issues (#683, for which I hope to put in a PR soon as well, and #703).
The Gensim team has been very helpful throughout this process, responding to mails and queries within a day, and even holding daily half-hour academic assistance video calls! It’s been great fun working so far, and I hope to do justice to Google Summer of Code 2016 by successfully completing the project.