Category Archive

Below you'll find a list of all posts that have been categorized as “programming”.

Performance Shootout of Nearest Neighbours: Querying

Radim Řehůřek 2014-01-12 gensim, programming 38 Comments

Previous posts explained the whys & whats of nearest-neighbour search, the available OSS libraries and Python wrappers. We converted the English Wikipedia to vector space, to be used as our testing dataset for retrieving “similar articles”. In this post, I finally get to some hard performance numbers, plus a live demo near the end.

Asymmetric LDA Priors, Christmas Edition

Radim Řehůřek 2013-12-21 gensim, programming 2 Comments

The end of the year is proving crazy busy as usual, but gensim acquired a cool new feature that I just had to blog about. Ben Trahan sent a patch that allows automatic tuning of Latent Dirichlet Allocation (LDA) hyperparameters in gensim. This means that an optimal, asymmetric alpha can now be trained directly from your data.

Performance Shootout of Nearest Neighbours: Contestants

Radim Řehůřek 2013-12-08 gensim, programming 12 Comments

Continuing the benchmark of libraries for nearest-neighbour similarity search, part 2. What is the best software out there for similarity search in high-dimensional vector spaces? Document Similarity @ English Wikipedia: I’m not very fond of benchmarks on artificial datasets, and similarity search in particular is sensitive to actual data densities and distance profiles. Using fake “random Gaussian datasets” seemed …

Performance Shootout of Nearest Neighbours: Intro

Radim Řehůřek 2013-11-30 gensim, programming 1 Comment

Violent as the title sounds, I’ll actually be benchmarking software packages that implement nearest-neighbour search in high-dimensional vector spaces. Which approach is the fastest, the easiest to use, the best? No neighbours were harmed in the writing of this post.

Parallelizing word2vec in Python

Radim Řehůřek 2013-10-04 gensim, programming 21 Comments

The final instalment on optimizing word2vec in Python: how to make use of multicore machines. You may want to read Part One and Part Two first.

Word2vec in Python, Part Two: Optimizing

Radim Řehůřek 2013-09-21 gensim, programming 46 Comments

Last weekend, I ported Google’s word2vec into Python. The result was clean, concise and readable code that plays well with other Python NLP packages. One problem remained: it ran 20x slower than the original C code, even after all the obvious NumPy optimizations.

Deep learning with word2vec and gensim

Radim Řehůřek 2013-09-17 gensim, programming 33 Comments

Neural networks have been a bit of a punching bag historically: neither particularly fast, nor robust or accurate, nor open to introspection by humans curious to gain insights from them. But things have been changing lately, with deep learning becoming a hot topic in academia with spectacular results. I decided to check out one deep learning algorithm via gensim.
