Experimental Analysis of Multiple Criteria for Extractive Multi-Document Text Summarization

Automatic text summarization methods are increasingly needed in different fields of knowledge. In the scientific literature, generic extractive multi-document text summarization has been formulated as an optimization problem that involves several criteria. So far, only two criteria, content coverage and redundancy reduction, have been considered simultaneously, whereas the others, relevance and coherence, have been considered separately. Consequently, studies comparing the performance of the different criteria are lacking. For this reason, this work presents a comparative study of the criteria suitable for generic extractive multi-document text summarization. All possible combinations of two, three, and four criteria have been considered within a multi-objective optimization context. Experiments have been carried out on datasets from the Document Understanding Conferences (DUC), and the combinations of objective functions have been compared and evaluated with Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics. Redundancy reduction proved to be an indispensable criterion, whereas coherence was the least significant and least efficient criterion. The combination of content coverage, redundancy reduction, and relevance obtains the most balanced results in terms of average ROUGE and execution time.
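To make the size of the comparison concrete, the combinations of two, three, and four criteria can be enumerated as in the minimal sketch below. The four criterion names come from the abstract; the identifiers and the helper `criteria_combinations` are hypothetical and serve only as illustration, not as the authors' implementation. With four criteria there are 6 pairs, 4 triples, and 1 quadruple, i.e., 11 combinations in total.

```python
from itertools import combinations

# Criterion names taken from the abstract; the identifiers are illustrative.
CRITERIA = ("content_coverage", "redundancy_reduction", "relevance", "coherence")


def criteria_combinations(criteria=CRITERIA, min_size=2):
    """Return every subset of the criteria with at least min_size members."""
    combos = []
    for size in range(min_size, len(criteria) + 1):
        combos.extend(combinations(criteria, size))
    return combos


if __name__ == "__main__":
    # Each tuple would correspond to one set of objective functions
    # optimized simultaneously in a multi-objective setting.
    for combo in criteria_combinations():
        print(len(combo), combo)
```

Each resulting tuple stands for one multi-objective configuration to be run and scored; the optimizer and the objective functions themselves are outside the scope of this sketch.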
