I write these lines as I recover my voice from the Pioneers Festival 2013 in Vienna, a major event in the world of IT startups, investors and the media. I’m not used to talking so much, for so long 🙂
Gensim, the library for unsupervised machine learning that I started in late 2008, will be celebrating its fifth anniversary this November. Time to reminisce and mull over its successes and failures 🙂
I intend to keep this blog mostly technical. But since it’s the eve before parliamentary elections here in the Czech Republic, I feel a small politically-technical rant is in order.
The final instalment on optimizing word2vec in Python: how to make use of multicore machines. You may want to read Part One and Part Two first.
Last weekend, I ported Google’s word2vec into Python. The result was clean, concise, and readable code that plays well with other Python NLP packages. One problem remained: the performance was 20x slower than the original C code, even after all the obvious NumPy optimizations.
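To illustrate the kind of NumPy optimization mentioned above (a minimal hypothetical sketch, not gensim's actual word2vec code): the core of training is dot products between word vectors, and replacing a Python-level loop over vector components with a single vectorized call is the first obvious win.

```python
import numpy as np

def dot_loop(a, b):
    # Pure-Python loop: one interpreted iteration per vector component.
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

def dot_numpy(a, b):
    # Vectorized equivalent: a single call into NumPy's optimized C code.
    return float(np.dot(a, b))

a = np.random.rand(100).astype(np.float32)
b = np.random.rand(100).astype(np.float32)

# Both give the same result; the vectorized version is dramatically faster
# on realistic vector sizes.
assert abs(dot_loop(a, b) - dot_numpy(a, b)) < 1e-3
```

Even with such vectorization, per-call Python overhead around each small dot product remains, which is why the pure NumPy port still trailed the C original.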
Neural networks have been a bit of a punching bag historically: neither particularly fast, nor robust, nor accurate, nor open to introspection by humans curious to gain insights from them. But things have been changing lately, with deep learning becoming a hot topic in academia and producing spectacular results. I decided to check out one deep learning algorithm via gensim.
Update: blog went live on 8th September 2013, comments and suggestions welcome 🙂