Scanning Office 365 documents

Radim Řehůřek personal data, PII Tools, security

You wouldn’t know it if you’re coming from a data science background, but Microsoft products permeate the business world. Many documents are stored and shared within Windows environments, and not all of them are easy to discover or to interpret with respect to personal information and data privacy. After we implemented scanning of on-prem Windows devices, endpoints and file shares into PII …

RRP #4: Leonid Boytsov on kNN search and information retrieval

Radim Řehůřek podcast

Episode Summary: Leo Boytsov, a PhD researcher from the Language Technologies Institute of Carnegie Mellon University, talks about fast approximate search in modern information retrieval. We discuss the curse of dimensionality, hard-to-beat baselines and NMSLIB, Leo's super fast library for nearest-neighbour search. How does NMSLIB compare to Facebook's FAISS and Spotify's Annoy? Warning: very technical. Links & resources: NMSLIB: Leonid's ...

The Mummy Effect: Bridging the gap between academia and industry (PyData keynote)

Radim Řehůřek Machine Learning, Open Source, Student Incubator

Last month, I gave a keynote at PyData Warsaw about the existing (and growing) gap between academia and industry, specifically when it comes to machine learning / data science. This is a topic close to my heart, since for 7 years now we’ve made our living in that no-man’s land where academia and industry collide. Between running our Student …

Archive of RRP Podcast Episodes

Radim Řehůřek podcast

Subscribe with RSS, iTunes, YouTube, Stitcher, SoundCloud.

Episode #4: Leonid Boytsov on kNN search and information retrieval. Where Leo, a PhD researcher from the Language Technologies Institute of Carnegie Mellon University, talks about fast approximate search in modern information retrieval. How does his NMSLIB library compare to Facebook's FAISS and Spotify's Annoy? [full post]

Episode #3: Andy Müller on scikit-learn ...

RRP #1: Tomáš Mikolov on word2vec and AI research at Microsoft, Google, Facebook

Radim Řehůřek podcast

Episode Summary: Today I sat down with Tomáš Mikolov, my fellow Czech countryman whom most of you will know through his work on word2vec. But Tomáš has many more interesting things to say besides word2vec (although we cover word2vec too!): his beginnings with 8-bit graphics and games, living in NY compared to California, AI research at Microsoft vs Google vs ...

Radim, Gensim and RaRe Technologies

Radim Řehůřek gensim, Machine Learning

We are racing through 2016 with so much on the front burner, and yet it is timely to pause for a quick update on the launch of my new machine learning company, RaRe Technologies.

The Start of Something Exciting

I’ve heard from a few people who were confused when they received a recent newsletter from “RaRe Technologies”, when they signed up for …