
AUTHORS

John Ray Martinez (jbm332@drexel.edu), Jonathan Musni (jem472@drexel.edu), Juan Miguel Trinidad (jbt46@drexel.edu)

This research was conducted in fulfillment of the requirements for the Information Retrieval Systems course of the Master of Science in Data Science program at Drexel University College of Computing & Informatics.

INTRODUCTION

In recent years, the volume of text data from a variety of sources has grown rapidly. This explosion of text data has led to the problem of information overload. Today's generation, called the ‘Net generation’, learns through multitasking, performs activities simultaneously, and has a short attention span. Members of the ‘Net generation’ can perform several tasks at once and shift their attention quickly from one to another, but would probably be overwhelmed if asked to read a long report. Thus, more educators motivate them to engage with learning content by supplying shorter material in the curricula [1]. Given this information overload and the characteristics of the ‘Net generation’, automatic text summarization has become a necessity.

One tool for text summarization is the Python package sumy [2]. Its three most widely used models are LSA (latent semantic analysis), LexRank, and Luhn. LSA is an unsupervised summarization method that combines term frequency techniques with singular value decomposition. LexRank, also unsupervised, is a graph-based text summarizer inspired by the PageRank algorithm. Luhn, meanwhile, is a naive approach based on TF-IDF: it scores sentences by the frequency of their most important words and assigns higher weights to sentences occurring near the beginning of a document [3]. In this study, we investigate and evaluate the application of sumy models to the extractive summarization task using news articles and show that the results obtained with LSA are competitive with those of the other two algorithms.
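As a rough illustration of how the three models can be called, the sketch below wraps sumy's plain-text parser and summarizer classes in one helper. The names `SUMY_MODELS` and `summarize` are hypothetical and not part of sumy, and the stemmer and stop-word setup is one common configuration rather than necessarily the one used in this study. It assumes sumy and the NLTK sentence tokenizer data are installed.

```python
from importlib import import_module

# sumy module path and class name for each model named in the text.
SUMY_MODELS = {
    "LSA": ("sumy.summarizers.lsa", "LsaSummarizer"),
    "LexRank": ("sumy.summarizers.lex_rank", "LexRankSummarizer"),
    "Luhn": ("sumy.summarizers.luhn", "LuhnSummarizer"),
}


def summarize(text, model="LSA", sentences_count=6, language="english"):
    """Return the sentences extracted by one sumy model as a list of strings."""
    # Look the model up first so an unknown name fails with a KeyError.
    module_name, class_name = SUMY_MODELS[model]

    # Imported lazily so this module loads even when sumy is not installed.
    from sumy.nlp.stemmers import Stemmer
    from sumy.nlp.tokenizers import Tokenizer
    from sumy.parsers.plaintext import PlaintextParser
    from sumy.utils import get_stop_words

    summarizer_cls = getattr(import_module(module_name), class_name)
    summarizer = summarizer_cls(Stemmer(language))
    summarizer.stop_words = get_stop_words(language)

    parser = PlaintextParser.from_string(text, Tokenizer(language))
    return [str(sentence) for sentence in summarizer(parser.document, sentences_count)]
```

Calling `summarize(article_text, model="LexRank", sentences_count=6)` on an article string would return up to six extracted sentences.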

Furthermore, using the sumy extractive summarization techniques, we build and deploy a web application on Heroku that functions mainly as a text summarizer.

EXPERIMENTS

DATA DESCRIPTION

The dataset consists of 2,225 documents from the BBC news website, covering five topical areas: business, entertainment, politics, sport, and technology [4]. The version of the dataset prepared for extractive text summarization contains 510 BBC business news articles from 2004 to 2005. For each article, one summary is provided in the Summaries folder. In this study, the first 100 pairs of business news articles and their corresponding reference summaries were manually selected and used. These extractive summaries serve as reference summaries (gold standard) for evaluating the system summaries using ROUGE.

METHODOLOGY

'Algorithm 1'

We applied the three sumy methods to the sampled business news articles. Each algorithm extracts six sentences from each article to compose the summary. We then compared the three extractive summarization techniques experimentally. The performance of each technique was evaluated using variants of the ROUGE measure [5], a metric based on n-gram statistics that has been found to correlate highly with human evaluations [6]. Concretely, we used ROUGE-N with unigrams and bigrams (ROUGE-1 and ROUGE-2) and ROUGE-L. Each ROUGE variant has corresponding F1, precision, and recall scores. First, the value of the evaluation measure was calculated for each article. Next, we averaged those scores to arrive at consolidated recall and F1 scores for each ROUGE variant. Algorithm 1 shows the pseudo-code of the method implemented in this study.
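To make the scoring step concrete, here is a minimal, standard-library sketch of ROUGE-N, ROUGE-L, and the averaging over articles. It is a simplification and not the ROUGE package cited above: tokens are lowercased and split on whitespace with no stemming, and ROUGE-L is computed over the whole summary as a single token sequence. All function names are illustrative.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prf(overlap, system_total, reference_total):
    """Turn an overlap count into precision, recall, and F1."""
    precision = overlap / system_total if system_total else 0.0
    recall = overlap / reference_total if reference_total else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def rouge_n(system, reference, n=1):
    """ROUGE-N from the clipped n-gram overlap of system and reference."""
    sys_counts = ngrams(system.lower().split(), n)
    ref_counts = ngrams(reference.lower().split(), n)
    overlap = sum((sys_counts & ref_counts).values())  # min count per n-gram
    return prf(overlap, sum(sys_counts.values()), sum(ref_counts.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    previous = [0] * (len(b) + 1)
    for token_a in a:
        current = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def rouge_l(system, reference):
    """ROUGE-L from the longest common subsequence of the two token lists."""
    sys_tokens = system.lower().split()
    ref_tokens = reference.lower().split()
    return prf(lcs_length(sys_tokens, ref_tokens), len(sys_tokens), len(ref_tokens))


def average(scores, key):
    """Average one field (e.g. "recall") over a list of per-article scores."""
    return sum(score[key] for score in scores) / len(scores)
```

For the pair "the cat sat on the mat" (system) and "the cat was on the mat" (reference), five of the six reference unigrams and three of the five reference bigrams are matched, so ROUGE-1 recall is 5/6 and ROUGE-2 recall is 3/5.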

EXPERIMENTAL RESULTS AND DISCUSSION

We evaluate the three summarization techniques on a single-document summarization task using 100 news articles from the business section of the BBC dataset. For the evaluation to be meaningful, we report ROUGE recall as the standard measure and take output length into account [7]. For each article, each summarizer generates a six-sentence summary. The corresponding 100 human-created reference summaries are provided by BBC and used in the evaluation process. We compare the performance of the three summarization techniques with each other.

Table 1 shows the results obtained on this data set of 100 news articles for LSA and for the other two sumy summarizers in the single-document summarization task. LSA performs best on the news articles, followed by LexRank and then Luhn.

Table 1. Average recall (F1 in parentheses) on the test set of BBC business news articles for ROUGE-1, ROUGE-2, and ROUGE-L.

Model     ROUGE-1         ROUGE-2         ROUGE-L
LSA       0.867 (0.059)   0.617 (0.041)   0.841 (0.082)
Luhn      0.794 (0.052)   0.407 (0.031)   0.612 (0.045)
LexRank   0.844 (0.072)   0.576 (0.049)   0.807 (0.096)

Figure 1 compares the models using ROUGE recall as the performance metric. As shown, LSA has the best performance in the extractive summarization task on business news articles.

'ROUGE' Figure 1: ROUGE performance of algorithms.

IMPLEMENTED SYSTEM

In this section, we present the overall architecture of the implemented system and discuss its major features.

SYSTEM ARCHITECTURE

The overall architecture of the web application for single-document news summarization using the sumy models LSA, Luhn, and LexRank is shown in Figure 2. The three main phases are the back end, the front end, and deployment. To create the web application, we used Flask, a micro web framework written in Python. We styled the layout with Bootstrap. Finally, we deployed the models on Heroku. Figure 3 shows a detailed diagram of the system features.
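The back-end phase can be pictured with a small Flask sketch like the one below. The route name `/summarize`, the form fields `text` and `target_length`, the default of six sentences, and the helpers `parse_request` and `create_app` are all assumptions made for illustration; the actual I AM SAM code may be organized differently.

```python
def parse_request(form):
    """Validate the submitted form fields and return (text, target_length)."""
    text = (form.get("text") or "").strip()
    if not text:
        raise ValueError("no input text provided")
    try:
        # Default of 6 mirrors the six-sentence summaries in the experiments.
        target_length = int(form.get("target_length", 6))
    except (TypeError, ValueError):
        raise ValueError("target length must be a whole number") from None
    if target_length < 1:
        raise ValueError("target length must be at least 1")
    return text, target_length


def create_app(summarize):
    """Build the Flask app; `summarize(text, n)` is supplied by the back end."""
    # Imported lazily because Flask is a third-party dependency.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/summarize", methods=["POST"])
    def summarize_route():
        try:
            text, target_length = parse_request(request.form)
        except ValueError as error:
            return jsonify({"error": str(error)}), 400
        return jsonify({"summary": summarize(text, target_length)})

    return app
```

Keeping validation in a plain function separates it from the web framework, so it can be checked without starting a server. On Heroku, an app like this is typically started through a WSGI server such as gunicorn declared in a Procfile.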

'Architecture overview' Figure 2: System architecture overview.

SYSTEM FEATURES

The I AM SAM web app alleviates information overload by distilling important information using machine learning algorithms. It is an assistant that helps users manage their time by providing a text summary in seconds. In this project, we focus on plain text and URLs as inputs. More specifically, we consider the following features:

'System features' Figure 3: System Features.

Target Length

Target length is the number of sentences in the text summary. This feature lets the user enter the preferred summary length as a number of sentences.

Inputs

There are two possible inputs: a plain-text article or the URL of a page that contains the article. Figure 3 visualizes the flow for these two options.
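One way to route the two input types is sketched below. The helpers `classify_input` and `build_parser` are hypothetical, and treating only whitespace-free `http`/`https` strings as URLs is an assumption rather than the app's documented rule. `HtmlParser.from_url` and `PlaintextParser.from_string` are sumy's own parser constructors.

```python
from urllib.parse import urlparse


def classify_input(user_input):
    """Return ("url", value) or ("text", value) for a submitted string."""
    value = user_input.strip()
    # Anything empty or containing whitespace is treated as article text.
    if not value or any(ch.isspace() for ch in value):
        return "text", value
    try:
        parsed = urlparse(value)
    except ValueError:  # malformed URL-like strings fall back to text
        return "text", value
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url", value
    return "text", value


def build_parser(user_input, language="english"):
    """Create the matching sumy parser for a URL or a plain-text article."""
    # Imported lazily because sumy is a third-party dependency.
    from sumy.nlp.tokenizers import Tokenizer
    from sumy.parsers.html import HtmlParser
    from sumy.parsers.plaintext import PlaintextParser

    kind, value = classify_input(user_input)
    tokenizer = Tokenizer(language)
    if kind == "url":
        return HtmlParser.from_url(value, tokenizer)
    return PlaintextParser.from_string(value, tokenizer)
```

Either branch yields a parser whose `document` attribute can be passed to any of the three summarizers.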

Check Mode

The ‘check mode’ feature lets the user supply a reference summary. It is controlled by an ON/OFF toggle. It allows the user to check how good the generated summaries are with respect to the reference summary. It also shows the calculated ROUGE scores (F1, precision, recall) for each model's summary.

Best Summary Generator

Once check mode is ON, the system compares the summary output of each algorithm and reports the best model based on the ROUGE recall metric.
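The selection step amounts to taking the model with the highest recall. The sketch below assumes one recall value per model has already been computed against the user's reference summary; which ROUGE variant feeds it is not specified here, so that choice is left to the caller. The function name `best_model` is illustrative.

```python
def best_model(recall_by_model):
    """Return the name of the model with the highest ROUGE recall.

    On a tie, the model listed first in the dictionary wins.
    """
    if not recall_by_model:
        raise ValueError("no scores to compare")
    return max(recall_by_model, key=recall_by_model.get)


# Sample input only: the average ROUGE-1 recall values from Table 1.
sample_recall = {"LSA": 0.867, "Luhn": 0.794, "LexRank": 0.844}
print(best_model(sample_recall))  # prints LSA
```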

CONCLUSION AND FUTURE WORKS

We showed that the sumy implementation of LSA (latent semantic analysis) outperforms the other models, LexRank and Luhn, in the extractive summarization task on BBC business news articles. Furthermore, we implemented an automatic text summarization system called I AM SAM, deployed on Heroku, that can summarize a news article from a URL or plain text using the three sumy extractive summarization techniques.

As future work, we plan to extend the averaging procedure to all 510 articles in the business news folder. In addition, we will explore the other news categories: entertainment, politics, sport, and technology.

REFERENCES

[1] D. G. Oblinger and J. L. Oblinger. 2005. Educating the Net Generation. Educause. Retrieved August 19, 2020 from https://www.educause.edu/ir/library/PDF/pub7101.PDF.

[2] Mišo Belica. 2020. Module for automatic summarization of text documents and HTML pages. https://github.com/miso-belica/sumy.

[3] Mišo Belica. 2020. Summarization methods. https://github.com/miso-belica/sumy/blob/master/docs/summarizators.md.

[4] Derek Greene and Pádraig Cunningham. 2006. Practical Solutions to the Problem of Diagonal Dominance in Kernel Document Clustering. In Proc. 23rd International Conference on Machine learning (ICML’06). ACM Press, 377–384.

[5] C.Y. Lin. 2004. ROUGE: A Package for Automatic Evaluation of Summaries. In Text Summarization Branches Out: Proceedings of the ACL-04 Workshop. Association for Computational Linguistics: Barcelona, Spain, 74–81.

[6] C.Y. Lin and E.H. Hovy. 2003. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of Human Language Technology Conference (HLT-NAACL 2003). Association for Computational Linguistics: Edmonton, Canada.

[7] Courtney Napoles, Benjamin Van Durme, and Chris Callison-Burch. 2011. Evaluating sentence compression: Pitfalls and suggested remedies. In Proceedings of the Workshop on Monolingual Text-To-Text Generation. Association for Computational Linguistics: Portland, Oregon, 91–97.