Experimental Analysis of Multiple Criteria for Extractive Multi-Document Text Summarization

publication date

  • February 2020

start page

  • 1

end page

  • 13

issue

  • 112904

volume

  • 140

International Standard Serial Number (ISSN)

  • 0957-4174

Electronic International Standard Serial Number (EISSN)

  • 1873-6793

abstract

  • Automatic text summarization methods are increasingly needed in different fields of knowledge. In the scientific literature, generic extractive multi-document text summarization can be formulated as an optimization problem that involves several criteria. Only two criteria, content coverage and redundancy reduction, have been considered simultaneously, whereas the others, relevance and coherence, have been considered separately. Therefore, there is a lack of studies comparing the performance of different criteria. For this reason, a comparative study of the different criteria suitable for generic extractive multi-document text summarization is performed here. All possible combinations of two, three, and four criteria have been considered within a multi-objective optimization context. Experiments have been carried out on datasets from the Document Understanding Conferences (DUC), and the combinations of objective functions have been compared and evaluated with Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. Redundancy reduction has been shown to be an indispensable criterion, whereas coherence is the least significant and least efficient one. The combination of content coverage, redundancy reduction, and relevance obtains the most balanced results in terms of average ROUGE and execution time.

subjects

  • Computer Science

keywords

  • multi-document summarization; multi-objective optimization; content coverage; redundancy reduction; relevance; coherence