Improving single document summarization in a multi-document environment

Huspi, S 2015, Improving single document summarization in a multi-document environment, Doctor of Philosophy (PhD), Computer Science and Information Technology, RMIT University.


Document type: Thesis
Collection: Theses

Title Improving single document summarization in a multi-document environment
Author(s) Huspi, S
Year 2015
Abstract Most automatic document summarization tools produce summaries in either a single-document or a multi-document environment. Recent work has shown that the two approaches can be combined: when summarizing a single document, its related documents can be found. These documents may share similar knowledge and contain beneficial information on the topic of the single document. The summary produced will therefore have sentences extracted from the local (single) document and make use of the additional knowledge from its surrounding (multi-) documents. This thesis discusses the methodology and experiments to build a generic, extractive summary for a single document that includes information from its neighbourhood documents. We also examine the evaluation and configuration of such systems.

There are three contributions of our work. First, we explore the robustness of the Affinity Graph algorithm in generating a summary for a local document. This experiment focused on two main tasks: using different means to identify the related documents, and summarizing the local document by including information from the related documents. Our findings supported the previous work on document summarization using the Affinity Graph. However, contrary to past suggestions that one configuration of settings was best, we found that no particular configuration gave a greater improvement than another. Second, we applied the Affinity Graph algorithm in a social media environment. Recent work in social media suggests that information from blogs and tweets contains parts of the web document that are considered interesting to the user. We assumed that this information could be used to select important sentences from the web document, and hypothesized that it would improve the summary of a single document.

Third, we compare the summaries generated using the Affinity Graph algorithm in two types of evaluation. The first evaluation uses ROUGE, a commonly used evaluation tool that measures the number of overlapping words between automated summaries and human-generated summaries. In the second evaluation, we studied the judgement of human users using a crowdsourcing platform. Here, we asked people to state which summary they preferred and to explain their reasons for preferring one summary to another. The ROUGE evaluation did not give significant results due to the small tweet-document dataset used in our experiments. However, our findings from the human judgement evaluation showed that users are more likely to choose the summaries generated using the expanded tweets than summaries generated from the local documents only. We conclude the thesis with a study of the user comments and a discussion of the use of the Affinity Graph to improve single document summarization. We also discuss the lessons learnt from the user preference evaluation using a crowdsourcing platform.
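
The word-overlap idea behind ROUGE, as described in the abstract, can be illustrated with a minimal sketch of unigram overlap (often called ROUGE-1). This is not the official ROUGE toolkit and not the evaluation setup used in the thesis. The function names `tokenize` and `rouge_1`, the lowercase whitespace tokenizer, and the example sentences are all assumptions made for illustration.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization. Real ROUGE setups
    # typically add stemming and optional stopword removal.
    return text.lower().split()


def rouge_1(candidate, reference):
    """Unigram overlap between an automated summary and a human summary."""
    cand_counts = Counter(tokenize(candidate))
    ref_counts = Counter(tokenize(reference))

    # Clipped overlap: each word counts at most as often as it appears
    # in both summaries (Counter intersection takes the minimum count).
    overlap = sum((cand_counts & ref_counts).values())

    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())

    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    if overlap:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


if __name__ == "__main__":
    automated = "the cat sat on the mat"
    human = "the cat lay on the mat"
    print(rouge_1(automated, human))
```

In this hypothetical example, five of the six reference words also appear in the automated summary, so recall and precision are both 5/6. With a small dataset, differences in such scores between systems can be hard to separate from noise, which is consistent with the abstract's remark on the ROUGE results.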
Degree Doctor of Philosophy (PhD)
Institution RMIT University
School, Department or Centre Computer Science and Information Technology
Subjects Information Retrieval and Web Search
Analysis of Algorithms and Complexity
Keyword(s) summary evaluation
crowdsourcing
graph-based algorithm
multi-document environment
single document summarization
Created: Fri, 07 Jul 2017, 09:36:56 EST by Denise Paciocco