Semantic Search Using a Fulltext Engine Presented at ACL 2017

Jaroslav Dostál Deep Learning, Machine Learning, ScaleText

Some of our consulting tasks keep recurring, hinting at a widespread pain point across our clients and industries. One of them is finding meaningful nuggets of information in large unstructured document databases. How do you extract actionable insights and relationships from messy datasets, such as customer support records? How about financial reports, or job CVs? Are you still …

New Gensim feature: Author-topic modeling. LDA with metadata.

Ólavur Mortensen gensim

The author-topic model is an extension of Latent Dirichlet Allocation that allows data scientists to build topic representations of attached author labels. These author labels can represent any kind of discrete metadata attached to documents, for example, tags on posts on the web. In December of 2016, I wrote a blog post explaining that a Gensim implementation was on its …

Radim, Gensim and RaRe Technologies

Radim Řehůřek gensim, Machine Learning

Racing through 2016 with so much on the front burner, it is still a good time to pause for a quick update on the launch of my new machine learning company, RaRe Technologies. The Start of Something Exciting: I’ve heard from a few people who were confused when they received a recent newsletter from “RaRe Technologies”, when they signed up for …

Does Python Stand a Chance in Today’s World of Data Science? [video]

Tony DiLoreto gensim, Machine Learning

Earlier this summer, our director, Radim Řehůřek, gave a talk about the state of Python in today’s world of Data Science. The talk covers how businesses are using Python for commercial success, Python vs Java, and an interesting comparison of the popular latent semantic analysis (SVD) and word2vec algorithms running on different platforms: Spark MLlib, gensim, scikit-learn …