Topic Modelling and Coloring Document Words

Bhargav Srinivasa gensim, Google Summer of Code 2016, Student Incubator

My second Google Summer of Code blog post is going to be a wee bit more technical – I’m going to briefly describe what topic models do, before linking to a tutorial I wrote which will teach you how to do some cool stuff with topic models and gensim. Very, very briefly – given a collection of documents, topic models …

2016 Student Data Science Programs with RaRe Technologies

Chris Lakatos gensim, Machine Learning, programming, Student Incubator 2 Comments

RaRe Technologies is deeply rooted in the open source community and we are always seeking out opportunities to dedicate our experience and time to the next generation of computer scientists. Often the first step is to connect ambitious students to the resources they need to truly make an impact with hands-on projects and mentorship. These up-and-coming students have …

RaRe Technologies Announces New Growth Team and PyCon Participation for 2016

Chris Lakatos Machine Learning

As the demand for solid Machine Learning software development increases, RaRe has realized a need for an equally solid internal growth team and recently added two new hires to the mix. Chris Lakatos has joined the company as the Director of Marketing, while Jeff Hoey has joined to head up Business Development. Each brings ample experience in technical fields …

2016 Student Incubator – Week 1 Implementing Topic Coherence Metrics in Gensim

Devashish Deshpande gensim, Student Incubator

Here’s my first post as part of the RaRe Technologies Incubator Programme! Over the course of this summer I will be working on (and hopefully improving) the functionality of gensim, an open source library for topic modelling. My interest in machine learning and natural language processing started when I took an online course on machine learning by BerkeleyX. I was …

Google Summer of Code 2016 – Week 1 on Dynamic Topic Models

Bhargav Srinivasa Google Summer of Code 2016, Student Incubator

It’s been around a month since I was selected to participate in Google Summer of Code 2016 with NumFOCUS and Gensim, and it’s been quite exhilarating. My tryst with Gensim started when I was looking for ways to model the evolution of topics in Software Engineering research, and Dynamic Topic Models was an obvious choice. While I initially worked with the original …

Go, Games, Strategy and Life: The Big Picture

Radim Řehůřek Go, Machine Learning 8 Comments

Everyone and their dog have shared their opinion on the recent Google AlphaGo commotion of AI beating Fan Hui, a pro player, and its upcoming match against Lee Sedol. As an avid Go player, as well as a machine learning practitioner with a long history of programming game AIs, I have a different perspective on what AlphaGo’s victories ultimately mean …

Does Python Stand a Chance in Today’s World of Data Science? [video]

Tony DiLoreto gensim, Machine Learning 4 Comments

Earlier this summer, our director, Radim Řehůřek, gave a talk about the state of Python in today’s world of Data Science. The talk covers how businesses are using Python for commercial success, Python vs Java, and an interesting comparison of the popular latent semantic analysis (SVD) and word2vec algorithms running on different platforms: Spark MLlib, gensim, scikit-learn …

Doc2vec tutorial

Radim Řehůřek gensim, programming 89 Comments

The latest gensim release, 0.10.3, has a new class named Doc2Vec. All credit for this class, which is an implementation of Quoc Le & Tomáš Mikolov: “Distributed Representations of Sentences and Documents”, as well as for this tutorial, goes to the illustrious Tim Emerick. Doc2vec (aka paragraph2vec, aka sentence embeddings) modifies the word2vec algorithm to unsupervised learning of continuous …