Text Summarization with Gensim

Ólavur Mortensen | programming

Text summarization is one of the newest and most exciting fields in NLP, allowing developers to quickly find meaning and extract keywords and key phrases from documents. RaRe Technologies’ newest intern, Ólavur Mortensen, walks the user through the text summarization features in Gensim.

The Gensim implementation is based on the popular “TextRank” algorithm and was recently contributed by the good people from the Engineering Faculty of the University of Buenos Aires. This is the first of many publications from Ólavur, and we expect to continue our educational apprenticeship program with students like Ólavur to help them showcase their talents.
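For readers unfamiliar with TextRank, here is a minimal pure-Python sketch of the general idea: build a graph whose nodes are sentences, weight the edges by word overlap, and rank the nodes with a PageRank-style iteration. This is an illustration under simplifying assumptions, not the Gensim code. The function names, the naive sentence splitter, and the overlap measure on sets of words are all choices made for this sketch.

```python
import math
import re


def split_sentences(text):
    """Naive splitter (an assumption of this sketch): break after . ! or ?"""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Shared distinct words, normalized by the log of each sentence's size."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # avoids log(1) = 0 in the denominator
    return len(words_a & words_b) / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with a weighted PageRank-style power iteration."""
    n = len(sentences)
    weights = [[similarity(a, b) if i != j else 0.0
                for j, b in enumerate(sentences)]
               for i, a in enumerate(sentences)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n) if out_sums[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize_sketch(text, top_n=2):
    """Return the top_n highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]
```

A sentence that shares no words with any other receives only the baseline score `1 - damping`, so it is the last candidate for the summary.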

The following example was written in IPython Notebook (newly renamed “Jupyter”). Feel free to install the Gensim package and step through the tutorial. The source code for the notebook is available under gensim/docs/notebooks.
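The notebook itself is not reproduced in this text. As a rough orientation only, the sketch below shows how the two entry points might be called. It assumes Gensim 3.x, where the `gensim.summarization` module provides `summarize` and `keywords`; that module was removed in Gensim 4.0, so the import is guarded. The wrapper name `summarize_and_extract` and its parameters are hypothetical and exist only for this illustration.

```python
def summarize_and_extract(text, ratio=0.2, n_keywords=3):
    """Return (summary, keywords), or None if gensim.summarization is unavailable.

    Hypothetical helper for illustration; not part of Gensim.
    """
    try:
        # Available in Gensim 3.x; the module was removed in Gensim 4.0.
        from gensim.summarization import keywords, summarize
    except ImportError:
        return None

    # ratio is the fraction of sentences to keep; the result is one string.
    summary = summarize(text, ratio=ratio)

    # words is passed as a keyword argument inside the call;
    # split=True returns a list of keywords instead of one string.
    top_words = keywords(text, words=n_keywords, split=True)
    return summary, top_words
```

Passing `split=True` to `summarize` as well returns the summary as a list of sentences rather than a single string.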

Are you also interested in sharpening your open source skills or contributing to open source projects? Get in touch using the contact form below.

Comments 23

  1. turja chaudhuri

    Cannot install gensim inside an Anaconda Python 2.7 setup, even with pip. Any ideas?
    Error stack trace:

    File "C:\Users\turjac591\AppData\Local\Continuum\Anaconda2\lib\site-packages\gensim\__init__.py", line 6, in <module>
    from gensim import parsing, matutils, interfaces, corpora, models, similarities, summarization

    File "C:\Users\turjac591\AppData\Local\Continuum\Anaconda2\lib\site-packages\gensim\models\__init__.py", line 7, in <module>
    from .coherencemodel import CoherenceModel

    File "C:\Users\turjac591\AppData\Local\Continuum\Anaconda2\lib\site-packages\gensim\models\coherencemodel.py", line 23, in <module>
    from gensim import interfaces

    ImportError: cannot import name interfaces

  2. Post Author

  3. Ólavur Mortensen (Post Author)

    To summarize multiple documents, you could just concatenate the documents and run the algorithm on the result. There may be some caveats to that method, however; I haven’t tried it.

    You should ask that question on the Gensim mailing list, and see if someone else has an idea about it (https://groups.google.com/forum/#!forum/gensim).

    1. Muhammad Shams paracha

      Sir, I am from Pakistan and want to work on automatic Urdu text summarization, but research work on the Urdu language is tough for me. If you have a recent and helpful paper, kindly send it to me; I want to develop an algorithm for Urdu text summarization. Please give me some text.

  4. Pingback: Text Summarization | IMPULSE

  5. Norman Fisher

    Thank you for sharing the information.
    Many writing services promote themselves on the web, but not very many actually have qualified staff to provide their services. Most use the cheapest online writers they can find, even if those writers have poor language skills and are inexperienced. We, in any case, realize that the quality of the work we supply to our customers is directly related to the abilities of our writers; consequently, we use only the absolute best that we can find.

    Here’s the reason I chose this “website summarizer”.

    The answer is simple: they developed an algorithm that helps maintain the highest standard of summarizing, which includes several stages of data refinement. During the first stage, your article or piece of content undergoes a thorough analysis in order to identify the key information-bearing elements, such as who took the actions and when, where, why and how; other essential elements, such as details and background, are gathered and processed with the help of special formatting information. Only after the information is correctly arranged can the data be transformed, separated and assigned in the proper way to fulfil the main purpose of summarizing: shortening the description without losing the ideas and general meaning.

  6. Mina

    Hi, thank you for your helpful tutorial. I want to modify the code so I can use it in another language. I can’t work out what “tags” is in “merge_syntactic_units” or what it is used for.

  7. Mina SMZ

    Hi, thank you for this very helpful tutorial. I have a question: what is keywords.py used for, and where is it used?

  8. grandmasterspock

    I’m having trouble using the keywords result as a list. Is there any special way to do that?

  9. Ming

    Hey, sir, I have a question about the function “keywords”. How do I use the argument “words=”? I tried “words=3”, but the program tells me it is invalid syntax.
    Please tell me how to do that.

  10. Harsh Shah

    Thank you for the tutorial. I am getting the following warnings:

    2018-02-01 14:37:00,208 : WARNING : Input text is expected to have at least 10 sentences.
    2018-02-01 14:37:00,212 : INFO : adding document #0 to Dictionary(0 unique tokens: [])
    2018-02-01 14:37:00,212 : INFO : built Dictionary(52 unique tokens: ['clearli', 'adult', 'chang', 'member', 'visit']...) from 4 documents (total 70 corpus positions)
    2018-02-01 14:37:00,216 : WARNING : Input corpus is expected to have at least 10 documents.
    2018-02-01 14:37:00,224 : WARNING : Couldn't get relevant sentences.

    Can you please help me solve this? Or is this a limitation of the library? If so, can you please suggest another library similar to gensim?

    Thanks in advance!!

  11. Pingback: Automatic Text Summarization with Python - Text Analytics Techniques

  12. Pingback: A benchmark comparison of extractive summarisation systems - SKIM
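Two points raised in the comments above lend themselves to a short sketch: the author's suggestion to concatenate several documents before summarizing, and the warning that the input text is expected to have at least 10 sentences. The helper names below are hypothetical, the threshold is taken from the warning text quoted in the comments, and the sentence counter is a naive approximation that will not necessarily agree with Gensim's own sentence splitting.

```python
import re

# Taken from the warning quoted in the comments; an assumption of this sketch.
MIN_SENTENCES = 10


def count_sentences(text):
    """Count sentences with a naive split after '.', '!' or '?'."""
    return len([s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s])


def combine_documents(documents):
    """Join non-empty documents into one text, one document per line."""
    return "\n".join(doc.strip() for doc in documents if doc.strip())


def long_enough(text, minimum=MIN_SENTENCES):
    """Check whether the text reaches the minimum sentence count."""
    return count_sentences(text) >= minimum
```

Checking `long_enough(combine_documents(docs))` before calling a summarizer gives an early hint that the input may be too short to produce a useful summary. As the author notes, concatenation may have caveats, for example one long document dominating the combined text.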
