Blog | RARE Technologies

Chinmaya’s Google Summer of Code 2017 Live-Blog : a Chronicle of Integrating Gensim with scikit-learn and Keras

Chinmaya Pancholi 2017-06-12 gensim, Student Incubator

2nd September, 2017 The final blogpost in the GSoC 2017 series, summarising all the work that I did this summer, can be found here. 15th August, 2017 During the last two weeks, I have been working primarily on adding a Python implementation of Facebook Research’s fastText model to Gensim. I was also simultaneously working on completing the tasks left for adding scikit-learn API for …

Parul’s Google Summer of Code 2017 Live-Blog : a chronicle of adding training and topic visualizations in gensim

Parul Sethi 2017-06-01 gensim, Student Incubator

19th August 2017 For the last phase of my project, I’ll be adding a visualization that attempts to overcome some of the limitations of existing topic model visualizations. Current visualizations focus more on topics or topic-term relations, leaving little scope to comprehensively explore the document entity. I’d work on an interface which would allow us to interactively …

Google Summer of Code 2017 – Performance improvement in Gensim and fastText

Prakhar Pratyush 2017-05-31 gensim, Student Incubator

July 20, 2017 This week, I’ve mostly worked on implementing native unsupervised fastText (PR #1482) in gensim. It’s quite challenging, as I had to look into the fastText C++ code and read the research paper to properly understand how it works, and then had to figure out its similarities with the word2vec code. After lots of discussion with mentors, we …

Google Summer of Code 2017 – Week 1 of Integrating Gensim with scikit-learn and Keras

Chinmaya Pancholi 2017-05-30 gensim, Student Incubator

This is my first post as part of Google Summer of Code 2017 working with Gensim. I will be working on the project ‘Gensim integration with scikit-learn and Keras’ this summer. I stumbled upon Gensim while working on a project which utilized the Word2Vec model. I was looking for functionality to suggest words semantically similar to a given input word and Gensim’s …

Dealing mergeytocin: how to run an open source sprint. Based on 8 gensim sprints in 5 countries in 12 months.

Lev Konstantinovskiy 2017-05-24 Open Source

In this blog I want to tell you what it takes to organize an open source coding sprint – find a venue, set an agenda and then actually run it.

RRP #3: Andy Müller on scikit-learn and open source

Radim Řehůřek 2017-04-23 podcast

Episode Summary: Andreas Müller talks about how he fell in love with scikit-learn and his continuous work there as the package maintainer. We also cover his work at Amazon and why he left to work on open source; his recent book on machine learning in Python; sustainability and future of sklearn in the "deep learning world", and his impressions of ...

Archive of RRP Podcast Episodes

Radim Řehůřek 2017-04-17 podcast

Subscribe with RSS, iTunes, YouTube, Stitcher, SoundCloud. Episode #4: Leonid Boytsov on kNN search and information retrieval. Leo, a PhD researcher from the Language Technologies Institute of Carnegie Mellon University, talks about fast approximate search in modern information retrieval. How does his NMSLIB library compare to Facebook's FAISS and Spotify's Annoy? Episode #3: Andy Müller on scikit-learn ...

Text Summarization in Python: Extractive vs. Abstractive techniques revisited

Pranay, Aman and Aayush 2017-04-05 gensim, Student Incubator, summarization

This blog is a gentle introduction to text summarization and can serve as a practical summary of the current landscape. It describes how we, a team of three students in the RaRe Incubator programme, have experimented with existing algorithms and Python tools in this domain. We compare modern extractive methods like LexRank, LSA, Luhn and Gensim’s existing TextRank summarization module …

RRP #2: John D. Cook on math consulting, Python and going solo

Radim Řehůřek 2017-03-16 podcast

Episode Summary: A few years ago I promised you a blog series on how to start your own consulting business in machine learning: getting set up, figuring out legal & intellectual property rights, finding consistent work, scoping in the face of research uncertainty, the project life cycle, mistakes to avoid... I gave a few talks on this topic but never ...

Gensim switches to semantic versioning

Lev Konstantinovskiy 2017-02-25 gensim, Open Source

Starting with release 1.0.0, Gensim adopts semantic versioning. Time has flown by, and Gensim has reached maturity. It's been cited in nearly 500 academic papers, used commercially in dozens of companies, featured in many coding sprints and meetups, and has generally withstood the test of time. Between the continued Gensim support by our parent company, rare-technologies.com, and our open Student ...