Radim Řehůřek | RARE Technologies

Export PII drill-down reports

Radim Řehůřek 2019-02-10 Personal Data Protection, PII Tools

In the latest February release (version 2.4.0), we combined Personal Data Analytics search with dynamic HTML report generation to make GDPR compliance and auditing easier.

Personal Data Analytics

Radim Řehůřek 2018-12-10 Deep Learning, Personal Data Protection, PII Tools

The latest 2.0 release of PII Tools brings a brand new SAR dashboard, allowing targeted personal data search, filtering and analytics.

Scanning Office 365 for sensitive PII information

Radim Řehůřek 2018-07-16 personal data, PII Tools, security

After we implemented scanning of on-prem Windows workstations, endpoints and file shares in PII Tools, the number one request has been to find personal, sensitive and intimate data inside Office 365 installations.

Gensim Survey 2018

Radim Řehůřek 2018-04-30 gensim, Machine Learning, Open Source

Last month, we ran a survey among Gensim users to get a better idea of what delights and annoys you. The ~7-minute survey was completed by 448 people. That’s a great juicy sample, big thanks to all who participated! Full detailed statistics here; in this post I’ll summarize what we found and what it means for Gensim.

RRP #4: Leonid Boytsov on kNN search and information retrieval

Radim Řehůřek 2018-03-12 podcast

Episode Summary: Leo Boytsov, a PhD researcher from the Language Technologies Institute of Carnegie Mellon University, talks about fast approximate search in modern information retrieval. We discuss the curse of dimensionality, hard-to-beat baselines and NMSLIB, Leo's super fast library for nearest-neighbour search. How does NMSLIB compare to Facebook's FAISS and Spotify's Annoy? Warning: very technical. Links & resources: NMSLIB: Leonid's ...

The Mummy Effect: Bridging the gap between academia and industry (PyData keynote)

Radim Řehůřek 2017-11-19 Machine Learning, Open Source, Student Incubator

Last month, I gave a keynote at PyData Warsaw about the existing (and growing) gap between academia and industry, specifically when it comes to machine learning / data science. This is a topic close to my heart, since for 7 years now we’ve made our living in that no-man’s land where academia and industry collide. Between running our Student …

RRP #3: Andy Müller on scikit-learn and open source

Radim Řehůřek 2017-04-23 podcast

Episode Summary: Andreas Müller talks about how he fell in love with scikit-learn and his continuous work there as the package maintainer. We also cover his work at Amazon and why he left to work on open source; his recent book on machine learning in Python; sustainability and future of sklearn in the "deep learning world", and his impressions of ...

Archive of RRP Podcast Episodes

Radim Řehůřek 2017-04-17 podcast

Subscribe with RSS, iTunes, YouTube, Stitcher, SoundCloud.

Episode #4: Leonid Boytsov on kNN search and information retrieval
Where Leo, a PhD researcher from the Language Technologies Institute of Carnegie Mellon University, talks about fast approximate search in modern information retrieval. How does his NMSLIB library compare to Facebook's FAISS and Spotify's Annoy?

Episode #3: Andy Müller on scikit-learn ...

RRP #2: John D. Cook on math consulting, Python and going solo

Radim Řehůřek 2017-03-16 podcast

Episode Summary: A few years ago I promised you a blog series on how to start your own consulting business in machine learning: getting set up, figuring out legal & intellectual property rights, finding consistent work, scoping in the face of research uncertainty, the project life cycle, mistakes to avoid... I gave a few talks on this topic but never ...

RRP #1: Tomáš Mikolov on word2vec and AI research at Microsoft, Google, Facebook

Radim Řehůřek 2017-02-09 podcast

Episode Summary: Today I sat down with Tomáš Mikolov, my fellow Czech countryman whom most of you will know through his work on word2vec. But Tomáš has many more interesting things to say besides word2vec (although we cover word2vec too!): his beginnings with 8-bit graphics and games, living in NY compared to California, AI research at Microsoft vs Google vs ...