I never got round to writing a tutorial on how to use word2vec in gensim. It’s simple enough and the API docs are straightforward, but I know some people prefer more verbose formats. Let this post be a tutorial and a reference example.
Previous posts explained the whys & whats of nearest-neighbour search, the available OSS libraries and Python wrappers. We converted the English Wikipedia to vector space, to be used as our testing dataset for retrieving “similar articles”. In this post, I finally get to some hard performance numbers, plus a live demo near the end.