New Gensim feature: Author-topic modeling. LDA with metadata.

Ólavur Mortensen gensim


The author-topic model is an extension of Latent Dirichlet Allocation (LDA) that lets data scientists build topic representations of the author labels attached to documents. These labels can represent any kind of discrete metadata, for example tags on web posts.
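To make the idea of "tags as authors" concrete, here is a minimal sketch of how such labels could be arranged and passed to the model. It assumes Gensim's `AuthorTopicModel` class and its `author2doc` argument (a mapping from each label to the indices of the documents that carry it). The toy posts, the tags, and the helper names `build_author2doc` and `train_author_topic_model` are made up for illustration.

```python
from collections import defaultdict


def build_author2doc(doc_labels):
    """Map each label to the list of document indices that carry it."""
    author2doc = defaultdict(list)
    for doc_id, labels in enumerate(doc_labels):
        for label in labels:
            author2doc[label].append(doc_id)
    return dict(author2doc)


# Hypothetical toy data: tokenized posts and the tags attached to each one.
texts = [
    ["topic", "model", "inference"],
    ["python", "library", "release"],
    ["topic", "model", "python", "library"],
]
tags = [["nlp"], ["software"], ["nlp", "software"]]

# Each tag plays the role of an "author" of the posts it is attached to.
author2doc = build_author2doc(tags)


def train_author_topic_model(texts, author2doc, num_topics=2):
    """Train a small author-topic model (requires Gensim to be installed)."""
    # Imported here so the mapping code above runs without Gensim.
    from gensim.corpora import Dictionary
    from gensim.models import AuthorTopicModel

    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(text) for text in texts]
    return AuthorTopicModel(
        corpus=corpus,
        num_topics=num_topics,
        id2word=dictionary,
        author2doc=author2doc,
    )
```

With Gensim installed, calling `model = train_author_topic_model(texts, author2doc)` and then `model.get_author_topics("nlp")` should give the topic distribution learned for the `nlp` tag as a list of (topic id, probability) pairs. A corpus this small is only useful for checking that the pieces fit together, not for producing meaningful topics.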

In December 2016, I wrote a blog post explaining that a Gensim implementation was on its way. That post also discussed the shortcomings of existing author-topic model implementations. The implementation is now ready, and a tutorial in Jupyter Notebook format can be found at

If you run into any problems, or if you just want to discuss something concerning the model or the tutorial, please head over to the Gensim Google group.