PyCon 2016 and Gensim Sprint Recap

Lev Konstantinovskiy | gensim, Machine Learning, PyCon

Our team was on site representing RaRe Technologies and Gensim at PyCon 2016, hosted in Portland, Oregon, from May 28th to June 5th. It was a packed, outright massive event of over 3000 attendees, with two days of focused tutorials, sponsor workshops and talks from some of the industry’s renowned experts. RaRe was a sponsor of the event, which means, yes, we had a handy booth with plenty of literature, plus friendly faces (and exceptional team members) like Gordon Mohr, Jeff Hoey and, of course, myself.

Image: Left to right - Gordon Mohr, Senior Trainer, Jeff Hoey, Business Development, and Lev Konstantinovskiy, Community Manager at Gensim representing RaRe Technologies at PyCon 2016.

Did we bump into one another during the event? Possibly at the pre-conference tutorials, the conference talks or the post-conference (Gensim!) sprint hosted by yours truly. I collected lots of stickers and awesome information, but I doubt it will all fit in this blog post. Let’s take a look at a quick recap of my favorite PyCon moments.

PyCon Tutorials were great

I was excited to attend the pre-conference tutorial day, where I sat in on one of the two Gensim tutorials. It was run by Ben Bengfort of District Data Labs, who provided very deep coverage of the important pre-processing steps with NLTK. As you may know, the performance of machine learning algorithms in Gensim really depends on what happened upstream in pre-processing.
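To make the point about upstream pre-processing concrete, here is a minimal sketch of the kind of clean-up I mean. It is not taken from the tutorial and it deliberately uses only the Python standard library: the `preprocess` helper and the tiny `STOP_WORDS` set are my own illustrative assumptions, standing in for the much richer tokenizers and stop-word lists that NLTK provides. The output is a list of token lists, which is the shape of tokenized text that Gensim generally works from.

```python
import re

# Tiny illustrative stop-word list (an assumption for this sketch;
# NLTK ships a far more complete one).
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to", "with"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens only, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


docs = [
    "The GIL is a lock in the Python interpreter.",
    "Topic modeling with Gensim starts with clean tokens!",
]

# One list of tokens per document.
texts = [preprocess(doc) for doc in docs]

for tokens in texts:
    print(tokens)
```

Even in a toy like this, every choice (lowercasing, what counts as a token, which words to drop) changes the vocabulary a model sees downstream, which is why the tutorial spent so much time on these steps.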

Video: Tony Ojeda, Benjamin Bengfort, Laura Lorenz – Natural Language Processing with NLTK and Gensim – PyCon 2016 [May 28, 2016 – YouTube]

Of course, the conference itself led to many interesting conversations about NLP, especially when people visited our sponsor booth (for RaRe Technologies and Gensim!) in the expo hall. So many great people walked up and took the time to talk with us, even thanking us for our work on Gensim. Others were curious and asked how they could use NLP in their business. A perfect conversation for RaRe indeed!

Although I was quite busy throughout the event, I made time to sit in on some of the most interesting talks. One covered the Gilectomy project, which focuses on removing the Global Interpreter Lock (GIL) from Python. Frankly, it should have been a keynote, but no one realized the project would make so much progress in six months! So it was assigned to a smaller room, and once the room filled beyond capacity, lots of people were turned away. As for the technology, I am okay with the current state of affairs (multi-threading confined to C extensions) and don’t quite see the point of having a complicated system of locks in Python as, say, Java has. Still, it was important to be there and hear the community debate the issue!

Video: Larry Hastings – Removing Python’s GIL: The Gilectomy – PyCon 2016 [May 29, 2016 – YouTube]

Now, let’s talk sprint!

My role at the conference shifted a few days in, when the sprints began and I ran the Gensim sprint! Thankfully I had a bit of coaching from Shauna Gordon-McKeon, the PyCon sprint organizer (thank you!), because this was my first time running one solo. She has a lot of experience and offered many useful tips. For example, she suggested making a “please interrupt me” sign so that people would feel okay about asking questions. It definitely worked and was helpful as the sprint days progressed.

Image: Let the sprints begin! PyCon 2016 Gensim coding sprint.

Image: PyCon 2016 Coding Sprints

Yes, I can confirm that our sprint days were very enjoyable (and productive!), especially as we ran an “Introduction to unit testing” tutorial at the “new to sprints” workshop. Our sprint was beginner friendly: no machine learning knowledge was required, only some previous experience with Python (for loops). We invited everyone to try out our tutorials and help correct any mistakes they found. As a result, all of the tutorials on our website have been reworked, all by the 20 amazing people who supported Gensim through our PyCon sprint.

Image: Check out that group! 20 people sprinted Gensim on the first day. PyCon 2016

Image: PyCon 2016 Gensim Sprint

The mix of people who attended our sprint really inspired me. Some were first-time contributors to open source, while others were very experienced. For example, Jes Ford, a very experienced maintainer of the astronomical data wrangling package Cluster-lensing, quickly reworked our tutorials into IPython notebooks. Another notable contributor is Karl Higley, the author of spark-neighbors, whose Spark package has moved our Annoy integration efforts forward. Hobson Lane, a veteran Gensim user from TalentPair, made numerous contributions. Special thanks to Sean Law from TD Ameritrade for creating our Doc2Vec tutorial. Andrew Mullins from Puppet Labs created a new Quick-start, soon to be live on our GitHub page. There were many more contributors; see the Release Notes for Gensim 0.13.0.

Image: PyCon 2016 Gensim Sprint including Sean Law from TD Ameritrade (right)

Image: Results from Gensim Sprint at PyCon 2016 - Release Notes for Gensim 0.13.0

Image: Gensim Sprint at PyCon 2016 – constant stream of GitHub notifications

Worth a thousand words

Of course I couldn’t close this post without taking a moment to share the amazing experience I had not only at PyCon, but in Portland itself. With a few extra minutes on hand, I took to the trails and experienced a zen-like hiking side-trip. With the event behind me and great work completed on Gensim, it was the perfect end to this awesome trip.

Image: Lev from RaRe Technologies and Community Manager for Gensim in Portland, Oregon hiking the trails.

Thank you to all of our coding sprinters for making our new release possible while also growing our community with your talent and support. Until next year!
