Docstrings in open source Python

Dmitry Berdov gensim, Open Source, Student Incubator

Hi everyone, my name is Dmitry Berdov. I’m a graduate student at the Ural Federal University, now working in QA test automation. I had no experience writing documentation before joining the RARE Incubator, where my task has been to refactor and improve the poor state of the Gensim docs. Now, after several months of shooting myself hard in the ...

RRP #4: Leonid Boytsov on kNN search and information retrieval

Radim Řehůřek podcast

Episode Summary: Leo Boytsov, a PhD researcher from the Language Technologies Institute of Carnegie Mellon University, talks about fast approximate search in modern information retrieval. We discuss the curse of dimensionality, hard-to-beat baselines and NMSLIB, Leo's super fast library for nearest-neighbour search. How does NMSLIB compare to Facebook's FAISS and Spotify's Annoy? Warning: very technical. Links & resources: NMSLIB: Leonid's ...

Machine learning mega-benchmark: GPU providers (part 2)

Shiva Manne Deep Learning, Machine Learning, Open Source 8 Comments

We recently published a large-scale machine learning benchmark using word2vec, comparing several popular hardware providers and ML frameworks on pragmatic aspects such as cost, ease of use, stability, scalability and performance. Since that benchmark only looked at CPUs, we also ran an analogous ML benchmark focused on GPUs.

Counting Efficiently with Bounter pt. 2: CountMinSketch

Filip Štefaňák Machine Learning, Open Source 2 Comments

In my previous post on Bounter, the new open source Python library, we discussed how to use its HashTable to quickly count approximate item frequencies in very large item sequences. Now we turn our attention to the second algorithm in Bounter, CountMinSketch (CMS), which is also optimized in C for top performance.

Implementing Poincaré Embeddings

Jayant Jain gensim, Open Source 3 Comments

I have been working on implementing a model called Poincaré embeddings over the last month or so. The model is from an interesting paper by Facebook AI Research – Poincaré Embeddings for Learning Hierarchical Representations [1]. This post describes the model at a relatively high level of abstraction, along with the detailed technical challenges I faced while implementing it.
Output for python -m gensim.downloader --info

New download API for pretrained NLP models and datasets in Gensim

Chaitali Saini Datasets, gensim, Open Source, Student Incubator 3 Comments

There’s no shortage of websites and repositories that aggregate various machine learning datasets and pre-trained models (Kaggle, UCI MLR, DeepDive, individual repos like GloVe, FastText, Quora, blogs, individual university pages…). The only problem is that they all use widely different formats, cover widely different use-cases and go out of service with worrying regularity. For this reason, we decided to include free …
graph of cloud hardware benchmark in USD

Machine learning benchmarks: Hardware providers (part 1)

Shiva Manne Machine Learning, Open Source, Student Incubator 12 Comments

The rise of machine learning as a discipline brings new demands for number crunching and computing power. With easily accessible and cheap hardware resources, one has to pick the right platform to run the experiments and model training on. Should you use Amazon’s AWS EC2 instances? Or go with IBM’s Softlayer, Google’s Compute Engine, Microsoft’s Azure? How about a real …