Text Summarization with Gensim

Text summarization is one of the newest and most exciting fields in NLP, allowing developers to quickly find meaning in documents and extract their key words and phrases. RaRe Technologies’ newest intern, Ólavur Mortensen, walks the user through the text summarization features in Gensim.

The Gensim implementation is based on the popular “TextRank” algorithm and was contributed recently by the good people from the Engineering Faculty of the University of Buenos Aires. This is the first of many publications from Ólavur, and we expect to continue our educational apprenticeship program with students like Ólavur to help them showcase their talents.

The following example was written in IPython Notebook (newly renamed “Jupyter”). Feel free to install the Gensim package and step through the tutorial. The source code for the notebook is available under gensim/docs/notebooks.
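
For readers who want a feel for how a TextRank-style summarizer works before opening the notebook, here is a minimal pure-Python sketch. It is not Gensim’s implementation: the sentence splitter, the tokenizer, and the function names (split_sentences, tokenize, similarity, textrank_summary) are assumptions made for illustration, and the similarity measure is the simple word-overlap formula from the original TextRank paper.

```python
import math
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased word set; no stemming or stop-word removal in this sketch."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def similarity(a, b):
    """Word overlap normalized by the log of the sentence lengths."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # log(1) == 0 could make the denominator vanish
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def textrank_summary(text, top_n=2, damping=0.85, iterations=50):
    """Return the top_n highest-ranked sentences, in their original order."""
    sentences = split_sentences(text)
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    if n == 0:
        return []

    # Weighted, undirected sentence graph stored as an adjacency matrix.
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]

    # PageRank-style power iteration over the weighted graph.
    scores = [1.0] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) + damping * rank)
        scores = new_scores

    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]
```

Calling textrank_summary(text, top_n=2) on a short text returns the two sentences that share the most vocabulary with the rest of the text, which is the intuition behind graph-based extractive summarization.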

Are you also interested in sharpening your open source skills or contributing to open source projects? Get in touch using the contact form below.

Comments 23

turja chaudhuri

2016-12-08 at 10:36 am

Cannot install gensim inside anaconda python setup 2.7 even with pip. Any ideas?
Error StackTrace

File "C:\Users\turjac591\AppData\Local\Continuum\Anaconda2\lib\site-packages\gensim\__init__.py", line 6, in <module>
from gensim import parsing, matutils, interfaces, corpora, models, similarities, summarization

File "C:\Users\turjac591\AppData\Local\Continuum\Anaconda2\lib\site-packages\gensim\models\__init__.py", line 7, in <module>
from .coherencemodel import CoherenceModel

File "C:\Users\turjac591\AppData\Local\Continuum\Anaconda2\lib\site-packages\gensim\models\coherencemodel.py", line 23, in <module>
from gensim import interfaces

ImportError: cannot import name interfaces

Ólavur Mortensen (post author)

2016-12-08 at 11:04 am

Turja, I suggest you pose this question on the Gensim mailing list: https://groups.google.com/forum/#!forum/gensim

ravi

2017-01-09 at 9:54 am

Is multi-document summarization possible with this?

Ólavur Mortensen (post author)

2017-01-11 at 11:04 am

To summarize multiple documents, you could just concatenate the documents and run the algorithm on the result. There may be some caveats to that method, however; I haven’t tried it.

You should ask that question on the Gensim mailing list, and see if someone else has an idea about it (https://groups.google.com/forum/#!forum/gensim).
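
A minimal sketch of the concatenation idea, assuming the summarizer is any callable that takes one string and returns a summary string. The helper name summarize_documents and the blank-line separator are hypothetical choices made for illustration, not part of Gensim.

```python
def summarize_documents(documents, summarizer, separator="\n\n"):
    """Join several documents into one text and summarize the combined text.

    `summarizer` is any callable mapping a string to a summary string
    (an assumption of this sketch; it is passed in rather than imported).
    """
    # Drop empty or whitespace-only documents before joining.
    cleaned = [doc.strip() for doc in documents if doc and doc.strip()]
    if not cleaned:
        return ""
    combined = separator.join(cleaned)
    return summarizer(combined)
```

With a Gensim release older than 4.0, gensim.summarization.summarize could be passed as the summarizer; the summarization module was removed in Gensim 4.0. One likely caveat is that longer documents contribute more sentences to the combined text and so may dominate the summary.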

Muhammad Shams paracha

2019-02-19 at 9:52 am

Sir, I am from Pakistan and want to work on automatic Urdu text summarization, but research on the Urdu language is tough for me. If you have any recent and helpful papers, kindly send them to me. I want to develop an algorithm for Urdu text summarization. Please give me some text.

Pingback: Text Summarization | IMPULSE

Ashwin Perti

2017-06-24 at 10:39 am

Really a very nice and short Tutorial for Text Analytics

eva

2017-07-27 at 3:07 am

thanks for this. very helpful.

viswanath

2017-09-02 at 3:00 pm

thanks for this.

vinay

2017-09-28 at 10:32 am

Very helpful Thanks!!!

Mina

2017-10-03 at 11:55 am

Hi, thank you for your helpful tutorial. I want to modify the code so I can use it with another language. I can’t work out what “tags” is in “merge_syntactic_units” and what it is used for.

Mina SMZ

2017-10-04 at 7:05 am

Hi, this tutorial is very helpful, thank you. I have a question: what is keywords.py for, and where is it used?

grandmasterspock

2017-10-31 at 8:57 am

I’m having trouble using the keywords result as a list. Is there any special way to do that?

Ming

2017-11-01 at 9:26 am

Hey, sir, I have a question about the function “keywords”. How do I use the “words=” argument? I tried “words=3”, but the program tells me it is invalid syntax.
Please tell me how to do that.

Vicente Cifuentes

2017-11-14 at 7:59 am

Check out this approach to summarization and hierarchical keyword extraction: http://elcid.demon.nl/form.html

Harsh Shah

2018-02-01 at 9:14 am

Thank you for the tutorial. I am getting the following warnings:

2018-02-01 14:37:00,208 : WARNING : Input text is expected to have at least 10 sentences.
2018-02-01 14:37:00,212 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
2018-02-01 14:37:00,212 : INFO : built Dictionary(52 unique tokens: ['clearli', 'adult', 'chang', 'member', 'visit']...) from 4 documents (total 70 corpus positions)
2018-02-01 14:37:00,216 : WARNING : Input corpus is expected to have at least 10 documents.
2018-02-01 14:37:00,224 : WARNING : Couldn't get relevant sentences.

Can you please help me solve this? Or is it a limitation of this library? If so, can you please suggest another library similar to Gensim?

Thanks in advance!!

Pingback: Automatic Text Summarization with Python - Text Analytics Techniques

Ken

2018-04-16 at 1:09 am

Hi,

Thanks for the post. Just wondering if it’s possible to return the ranking of all sentences?

Hiren

2018-04-26 at 4:35 am

When I summarize the same text, the output summary changes every time. Why?

Pingback: A benchmark comparison of extractive summarisation systems - SKIM

Muhammad Shams paracha