Our team was on site representing RaRe Technologies and Gensim at PyCon 2016, hosted in Portland, Oregon, from May 28th to June 5th. It was a packed, outright massive event with over 3000 attendees, and it included two days of focused tutorials, sponsor workshops and talks from some of the industry's renowned experts. RaRe was a sponsor of the event, which means, yes, we had a handy booth, literature in abundance and friendly faces (and exceptional team members) like Gordon Mohr, Jeff Hoey and, well, myself of course.
Perhaps we bumped into one another during the event, possibly at the pre-conference tutorials, the conference talks or the post-conference (Gensim!) sprint hosted by yours truly. I collected lots of stickers and awesome information, but I doubt it will all fit in this blog post. Let's take a look at a quick recap of my favorite PyCon moments.
PyCon Tutorials were great
I was excited to attend the pre-conference tutorial day, where I sat in on one of the two Gensim tutorials. It was run by Ben Bengfort of District Data Labs, who covered the important pre-processing steps with NLTK in great depth. As you may know, the performance of machine learning algorithms in Gensim depends heavily on what happens upstream in pre-processing.
Video: Tony Ojeda, Benjamin Bengfort, Laura Lorenz – Natural Language Processing with NLTK and Gensim – PyCon 2016 [May 28, 2016 – YouTube]
Of course, the conference itself offered up many interesting conversations about NLP, especially when people visited our sponsored booth (for RaRe Technologies and Gensim!) in the expo hall. So many great people walked up and took the time to talk with us, and some even thanked us for our work on Gensim. Others were curious about how they could use NLP in their business. The perfect conversation for RaRe indeed!
Although I was quite busy throughout the event, I made time to sit in on some of the most interesting talks. One covered the Gilectomy project, which focuses on removing the Global Interpreter Lock (GIL) from Python. Frankly, it should have been a keynote, but absolutely no one realized the project would make so much progress in six months! So it was assigned to a smaller room, and once the room reached capacity, lots of people were turned away. As for the technology itself, I am okay with the current state of affairs (true parallel multi-threading confined to C extensions) and don't quite see the point of having a complicated system of locks in Python as, say, Java has. Still, it was important to be there and hear the community debate the issue!
Video: Larry Hastings – Removing Python’s GIL: The Gilectomy – Pycon 2016 [May 29, 2016 – YouTube]
At last, let's talk sprint!
A few days in, my role at the conference shifted as the sprints began and I ran the Gensim sprint! Thankfully, I had a bit of coaching from Shauna Gordon-McKeon, the PyCon sprint organizer (thank you!), because this was my first time running a sprint solo. She has a lot of experience and offered many useful tips. For example, she suggested we make a "please interrupt me" sign so that people would feel okay about asking questions. It definitely worked and became more helpful as the sprint days progressed.
Yes, I can confirm that our sprint days were very enjoyable (and productive!), especially since we ran an "Introduction to unit testing" tutorial at the "new to sprints" workshop. Our sprint was beginner-friendly: no machine learning experience was required, only some previous experience with Python (for loops). We invited everyone to try out our tutorials and help correct any mistakes they found. As a result, all of the tutorials on our website have been reworked, all by the 20 amazing people who supported Gensim through our PyCon sprint.
The mix of people who attended our sprint really inspired me. Some were first-time contributors to open source, while others were very experienced. For example, Jes Ford, an experienced maintainer of the astronomical data-wrangling package Cluster-lensing, quickly reworked our tutorials into IPython notebooks. Another notable contributor is Karl Heiger, the author of spark-neighbors, whose Spark package has moved our Annoy integration efforts forward. Hobsons Lane, a veteran Gensim user from TalentPair, has made numerous contributions. Special thanks to Sean Law from TD Ameritrade for creating our Doc2Vec tutorial. Andrew Mullins from Puppet Labs has created a new quick-start guide, soon to be live on our GitHub page. There were many more contributors; see the Release Notes for Gensim 0.13.0.
Worth a thousand words
Of course, I couldn't close this post without taking a moment to share the amazing experience I had not only at PyCon, but also in Portland itself. With a few extra minutes to spare, I took to the trails for a zen-like hiking side trip. With the event behind me and great work completed on Gensim, it was the perfect end to this awesome trip.
Thank you to all of our coding sprinters for making our new release possible while also growing our community with your talent and support. Until next year!