Text summarization is one of the newest and most exciting fields in NLP, allowing developers to quickly find meaning and extract keywords and key phrases from documents. RaRe Technologies’ newest intern, Ólavur Mortensen, walks the user through the text summarization features in Gensim.
The Gensim implementation is based on the popular “TextRank” algorithm and was contributed recently by the good people from the Engineering Faculty of the University of Buenos Aires. This is the first of many publications from Ólavur, and we expect to continue our educational apprenticeship program with students like Ólavur to help them showcase their talents.
The following example was written in IPython Notebook (newly renamed “Jupyter”). Feel free to install the Gensim package and step through the tutorial. The source code for the notebook is available under gensim/docs/notebooks.
Are you also interested in sharpening your open source skills or contributing to open source projects? Get in touch using the contact form below.