Translation Matrix: how to connect “embeddings” in different languages?

Ji Xiaohong gensim, Student Incubator

This is a blog post by one of our Incubator students, Ji Xiaohong. Ji worked on the problem of aligning differently trained word embeddings (such as word2vec), which is useful in applications such as machine translation or tracking language evolution within the same language. I was working on the Translation Matrix project, an idea originally proposed by Mikolov et al …
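The core idea behind a translation matrix can be illustrated with a small sketch: given vectors for word pairs known to correspond across two embedding spaces, fit a linear map from one space to the other by least squares, then look up the nearest neighbour of a mapped vector. This is only an illustration on synthetic data under those assumptions; the function names `fit_translation_matrix` and `translate` are hypothetical and are not Gensim's API.

```python
import numpy as np


def fit_translation_matrix(source_vecs, target_vecs):
    """Return W minimising ||source_vecs @ W - target_vecs|| (least squares)."""
    W, *_ = np.linalg.lstsq(source_vecs, target_vecs, rcond=None)
    return W


def translate(vec, W, target_vecs):
    """Map `vec` into the target space and return the row index of the
    most cosine-similar target vector."""
    mapped = vec @ W
    sims = (target_vecs @ mapped) / (
        np.linalg.norm(target_vecs, axis=1) * np.linalg.norm(mapped)
    )
    return int(np.argmax(sims))


# Synthetic "dictionary": 50 word pairs in 4 dimensions, where the target
# space is an exact linear transform of the source space (an assumption made
# purely so the example is self-checking; real embeddings are noisy).
rng = np.random.default_rng(0)
source = rng.normal(size=(50, 4))
true_map = rng.normal(size=(4, 4))
target = source @ true_map

W = fit_translation_matrix(source, target)
nearest = translate(source[3], W, target)  # index of the matching target word
```

With real embeddings the fit is only approximate, so the nearest neighbour is a candidate translation rather than a guaranteed match.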

Chinmaya’s GSoC 2017 Summary: Integration with sklearn & Keras and implementing fastText

Chinmaya Pancholi gensim, Google Summer of Code, Student Incubator

This blog post summarizes the work that I did for Google Summer of Code 2017 with Gensim. My work during the summer was divided into two parts: integrating Gensim with scikit-learn and Keras, and adding a Python implementation of the fastText model to Gensim. Gensim integration with scikit-learn and Keras: Gensim is a topic modelling and information extraction library which mainly serves unsupervised …

Semantic Search Using a Fulltext Engine Presented at ACL 2017

Jaroslav Dostál Deep Learning, Machine Learning, ScaleText

Some of our consulting tasks keep on repeating, hinting at a wide-spread pain point across our clients and industries. One of them is looking for meaningful nuggets of information in large unstructured document databases. How do you extract actionable insights and relationships from messy datasets, such as Customer Support records? How about financial reports, or job CVs? Are you still …

Chinmaya’s Google Summer of Code 2017 Live-Blog: a Chronicle of Integrating Gensim with scikit-learn and Keras

Chinmaya Pancholi gensim, Student Incubator

2nd September, 2017: The final blog post in the GSoC 2017 series, summarising all the work that I did this summer, can be found here. 15th August, 2017: During the last two weeks, I worked primarily on adding a Python implementation of Facebook Research’s fastText model to Gensim. I was also simultaneously working on completing the tasks left for adding scikit-learn API for …

Parul’s Google Summer of Code 2017 Live-Blog: a chronicle of adding training and topic visualizations in gensim

Parul Sethi gensim, Student Incubator

19th August 2017: For the last phase of my project, I’ll be adding a visualization that attempts to overcome some of the limitations of existing topic model visualizations. Current visualizations focus more on topics or topic-term relations, leaving little scope to comprehensively explore the document entity. I’ll work on an interface which would allow us to interactively …

Google Summer of Code 2017 – Performance improvement in Gensim and fastText

Prakhar Pratyush gensim, Student Incubator

July 20, 2017: This week, I’ve mostly worked on implementing native unsupervised fastText (PR #1482) in gensim. It’s quite challenging, as I had to look into the fastText C code and read the research paper to properly understand how it works, and then had to figure out the similarities with the word2vec code. After lots of discussion with mentors, we …

Google Summer of Code 2017 – Week 1 of Integrating Gensim with scikit-learn and Keras

Chinmaya Pancholi gensim, Student Incubator

This is my first post as part of Google Summer of Code 2017 working with Gensim. I will be working on the project ‘Gensim integration with scikit-learn and Keras’ this summer. I stumbled upon Gensim while working on a project which utilized the Word2Vec model. I was looking for functionality to suggest words semantically similar to a given input word and Gensim’s …