Radim Řehůřek | RARE Technologies

Performance Shootout of Nearest Neighbours: Contestants

Radim Řehůřek 2013-12-08 gensim, programming 12 Comments

Continuing the benchmark of libraries for nearest-neighbour similarity search, part 2. What is the best software out there for similarity search in high-dimensional vector spaces? Document Similarity @ English Wikipedia: I’m not very fond of benchmarks on artificial datasets, and similarity search in particular is sensitive to actual data densities and distance profiles. Using fake “random Gaussian datasets” seemed …

Performance Shootout of Nearest Neighbours: Intro

Radim Řehůřek 2013-11-30 gensim, programming 1 Comment

Violent as the title sounds, I’ll actually be benchmarking software packages that implement nearest-neighbour search in high-dimensional vector spaces. Which approach is the fastest, the easiest to use, the best? No neighbours were harmed in writing this post.

Money, startups, fame, bullshit

Radim Řehůřek 2013-11-15 startups 3 Comments

I write these lines as I recover my voice from the Pioneers Festival 2013 in Vienna, a major event in the world of IT startups, investors and the media. I’m not used to talking so much, for so long 🙂

Five Years of Gensim

Radim Řehůřek 2013-10-28 gensim 6 Comments

Gensim, the machine learning library for unsupervised learning I started in late 2008, will be celebrating its fifth anniversary this November. Time to reminisce and mull over its successes and failures 🙂

Technology vs. politics, round 1

Radim Řehůřek 2013-10-25 politics

I intend to keep this blog mostly technical. But since it’s the eve of parliamentary elections here in the Czech Republic, I feel a small politically-technical rant is in order.

Parallelizing word2vec in Python

Radim Řehůřek 2013-10-04 gensim, programming 21 Comments

The final instalment on optimizing word2vec in Python: how to make use of multicore machines. You may want to read Part One and Part Two first.

Word2vec in Python, Part Two: Optimizing

Radim Řehůřek 2013-09-21 gensim, programming 46 Comments

Last weekend, I ported Google’s word2vec into Python. The result was clean, concise and readable code that plays well with other Python NLP packages. One problem remained: the performance was 20x slower than the original C code, even after all the obvious NumPy optimizations.

Deep learning with word2vec and gensim

Radim Řehůřek 2013-09-17 gensim, programming 33 Comments

Neural networks have been a bit of a punching bag historically: neither particularly fast, nor robust or accurate, nor open to introspection by humans curious to gain insights from them. But things have been changing lately, with deep learning becoming a hot topic in academia with spectacular results. I decided to check out one deep learning algorithm via gensim.

Site under construction

Radim Řehůřek 2013-08-30 Uncategorized

Update: the blog went live on 8th September 2013; comments and suggestions are welcome 🙂