Year : 2020  |  Volume : 13  |  Issue : 8  |  Page : 378-380

Using twitter and web news mining to predict COVID-19 outbreak

1 Information Technology, Islamic Azad University Branch of Kerman, Iran
2 Zoonoses Research Center, Jahrom University of Medical Sciences, Jahrom, Iran

Date of Submission: 19-Feb-2020
Date of Decision: 25-Feb-2020
Date of Acceptance: 26-Feb-2020
Date of Web Publication: 02-Mar-2020

Correspondence Address:
Vahid Rahmanian
Zoonoses Research Center, Jahrom University of Medical Sciences, Jahrom

Source of Support: None, Conflict of Interest: None

DOI: 10.4103/1995-7645.279651

How to cite this article:
Jahanbin K, Rahmanian V. Using twitter and web news mining to predict COVID-19 outbreak. Asian Pac J Trop Med 2020;13:378-80

On January 9, 2020, the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), formerly known as 2019-nCoV, was declared the causative agent in 15 of the 59 hospitalized patients in Wuhan, Hubei Province. This caused great concern: the new coronavirus has 70% genetic similarity to the SARS coronavirus and belongs to the subgenus Sarbecovirus. The virus was temporarily named 2019-nCoV[1], and the Coronavirus Study Group has designated it SARS-CoV-2[2].

By January 20, 2020, further positive cases had been reported from other countries, including Thailand, Japan, South Korea, and the United States of America, and transmission from patients to health care workers further complicated the situation[3].

Coronaviruses are zoonotic, meaning they are transmitted between animals and people, but their routes of transmission, animal reservoirs, prophylaxis, and precise clinical manifestations require more investigation. There is currently no vaccine or appropriate treatment for COVID-19, so a high index of clinical suspicion and asking patients with fever and respiratory symptoms about their travel and contact history play a critical role in the prevention and control of the disease[4].

Every day, websites and online social media produce large amounts of data in fields such as technology, medicine, history, political and social news, and the arts. Analyzing and classifying these data produces knowledge, and this task now attracts the attention of many researchers[5].

Web news mining is one of the most significant tools in social networking and a subfield of “Big Data” science. An automatic system based on web news mining can monitor, evaluate, and categorize news; in addition to managing news articles, it is also applied in the field of advisory systems[6].

Social networks fall into six groups as follows[7]: 1. Micro-blogging platforms, such as Twitter; 2. Blogging platforms, such as WordPress and Blogger; 3. Instant messaging apps, such as WhatsApp and Telegram; 4. Networking platforms, such as Facebook and LinkedIn; 5. Software collaboration platforms, such as GitHub; 6. Photo/video sharing platforms, such as Instagram and YouTube.

Twitter is a micro-blogging platform that has attracted researchers' attention because of its useful applications. The social network has over 320 million active subscribers and generates approximately 6 million tweets daily, containing instant news and comments. Owing to this wealth of easily accessible information, Twitter has extensive applications, such as predicting political processes, investigating the effectiveness of a product, and monitoring events pertaining to health and hygiene[8]. Approximately 23% of Twitter subscribers are adults, and a total of approximately 500 million tweets are broadcast each day[9].

In the model presented in this study, unstructured data on the novel coronavirus (2019-nCoV) are extracted from Twitter and then subjected to text cleaning (also called screening or filtering) and, finally, classification. Since the focus is on real-time processing, the model is implemented using a fuzzy rule-based evolutionary algorithm called Eclass1-MIMO.
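The cleaning and filtering step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the regular expressions and the keyword set (loosely based on the hashtags reported in this study) are assumptions made only for illustration.

```python
import re

# Hypothetical keyword set, loosely based on the hashtags reported in the study.
KEYWORDS = {"corona", "coronavirus", "ncov", "2019-ncov", "wuhan"}

URL_RE = re.compile(r"https?://\S+")        # web links
MENTION_RE = re.compile(r"@\w+")            # user mentions
NON_WORD_RE = re.compile(r"[^a-z0-9\-\s]")  # anything but letters, digits, hyphens, spaces


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, mentions, '#' signs and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")
    text = NON_WORD_RE.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace


def is_relevant(cleaned: str) -> bool:
    """Keep a cleaned tweet only if it contains at least one keyword."""
    return any(token in KEYWORDS for token in cleaned.split())


if __name__ == "__main__":
    raw = "RT @who: New #nCoV cases in #Wuhan! https://t.co/abc"
    cleaned = clean_tweet(raw)
    print(cleaned, is_relevant(cleaned))
```

A real pipeline would likely add language detection, stop-word removal and de-duplication of retweets; those steps are omitted here to keep the sketch short.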

One of the most effective ways to prevent and control epidemics is to monitor and track news and social networks for reports on the spread of infectious diseases. In this study, the FAMEC method was used to send alert messages to surveillance systems for the timely detection of COVID-19 outbreaks.

The FAMEC method has four main phases as follows:

1. Cleaning and integrating data and extracting vocabulary; 2. Web and tweet crawling; 3. Applying fuzzy rules and storing data using a fuzzy classifier; 4. Visualizing and sending messages.
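To give a feel for the fuzzy rule phase, the following Python sketch classifies a cleaned text by its degree of membership in a few fixed prototype rules. It is a deliberate simplification and not Eclass1-MIMO, which evolves its rules online as data arrive; the vocabulary, the prototypes, the Gaussian membership function and the `spread` parameter are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def term_vector(text, vocabulary):
    """Relative frequency of each vocabulary term in a cleaned text."""
    counts = Counter(text.split())
    total = sum(counts[t] for t in vocabulary) or 1  # avoid division by zero
    return [counts[t] / total for t in vocabulary]


def membership(x, prototype, spread=0.5):
    """Gaussian membership degree of vector x in the rule centred on prototype."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, prototype))
    return math.exp(-d2 / (2 * spread ** 2))


def classify(x, rules):
    """Return the label of the rule with the highest membership ('winner takes all')."""
    degrees = {label: membership(x, proto) for label, proto in rules.items()}
    best = max(degrees, key=degrees.get)
    return best, degrees


if __name__ == "__main__":
    # Hypothetical vocabulary and one hand-made prototype per class.
    vocabulary = ["outbreak", "cases", "football", "match"]
    rules = {
        "health": [0.5, 0.5, 0.0, 0.0],
        "sports": [0.0, 0.0, 0.5, 0.5],
    }
    x = term_vector("new outbreak cases reported", vocabulary)
    label, degrees = classify(x, rules)
    print(label, degrees)
```

In an evolving classifier the prototypes would be created and updated from the incoming stream rather than written by hand, which is what makes such methods suitable for real-time use.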

The visualization component of the suggested method aims to assist in real-time monitoring and tracking of the beginning and spread of outbreaks, which can greatly contribute to the effectiveness of public health surveillance systems in this area.

Initially, tweets about 2019-nCoV (COVID-19) posted between Dec. 31 2019 and Feb. 6 2020 were extracted from the Twitter social network and stored in the relevant database. The collected database contained 364 080 tweets from 179 534 users. These posts were retweeted or liked by 21 805 371 users and viewed 52 837 975 554 times. The main hashtags about the novel coronavirus were #corona, #ncov, #wuhan, #china, #2019-nCoV, #virus, #corona virus china, #coronavirus outbreak, wuhan virus.

[Figure 1] shows the results obtained from monitoring news related to the novel coronavirus (2019-nCoV) in the study period, based on 364 080 tweets from 179 534 users. The largest shares of tweets about the coronavirus came from the US (42.1%), China (13.0%), Italy (11.8%) and Australia (6.6%). This is consistent with the case reports obtained from the WHO[10]. In this study, a new method based on a fuzzy algorithm was applied to evolve the TSK for the mining, monitoring, storage and visualization of news and tweets about COVID-19. To execute the method, more than 364 080 cleaned and integrated tweets and news items were categorized using the Eclass1-MIMO method and finally displayed in real time on a world map.
Figure 1: Monitoring of geographical distribution of the tweets about COVID-19 between 31/12/2019 and 6/02/2020.
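Country shares such as those reported above can be derived from per-tweet location labels by simple counting. The Python sketch below assumes that each tweet has already been assigned a country label; the function name and the toy input are hypothetical and are not the study data.

```python
from collections import Counter


def country_shares(countries):
    """Percentage of tweets per country, sorted from largest to smallest share."""
    counts = Counter(countries)
    total = sum(counts.values())
    return {c: round(100 * n / total, 1) for c, n in counts.most_common()}


if __name__ == "__main__":
    # Toy labels for illustration only.
    labels = ["US"] * 5 + ["China"] * 3 + ["Italy"] * 2
    print(country_shares(labels))
```

Because many tweets carry no usable location, shares computed this way describe only the geolocated subset of the collection.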

In recent years, a significant number of researchers have worked on categorization, clustering, emotion and opinion analysis, and the development of recommenders based on social data, but most of these works have focused on either news websites or Twitter.

The evolving fuzzy algorithm with the Eclass1-MIMO method was used in the study of Iglesias et al. for classifying six areas of knowledge, health, technology, sports, arts and commerce[5]. Also, Jahanbin et al. used web news mining in infectious disease surveillance systems for the timely detection of epidemics[8].

The geographical origins of tweets posted about COVID-19 were found to be consistent with the formal WHO report on incident cases of COVID-19 during the study period. This reflects the efficacy of the suggested method for monitoring and tracking this infection. A limitation of the proposed method is that it cannot be used to monitor and track infectious diseases in regions with poor or no access to social networks such as Twitter and Facebook. Also, as only English-language tweets were processed in this study, the results may be affected by the processing language.

In conclusion, given the revolutionary development of social networks, web news mining of the networks used by each community can accurately identify the geographical and demographic distribution of their users. This is because these networks readily report statistical data on the comments, photos, videos, etc. posted about COVID-19. This helps to predict morbidity rates in each region and to draw the attention of policy-makers in health care systems to the purposeful implementation of educational programs in regions exposed to higher risk. Ultimately, this can help to reduce the incidence of cases and even mortality in communities.

Conflict of interest statement

The authors declare that there is no conflict of interest.

Acknowledgements

The authors would like to thank the instructors of the online course “Machine Learning for Data Science and Analytics” provided by Columbia University for giving us better insight into the area of data and text mining.

Authors’ contributions

VR and KJ conceived and designed the study. VR and KJ were responsible for literature search and screening. KJ was responsible for data collection and analyses. VR and KJ contributed to data interpretation. KJ drafted the manuscript and VR critically revised the manuscript.

References

1. Nishiura H, Jung SM, Linton NM, Kinoshita R, Yang Y, Hayashi K, et al. The extent of transmission of novel coronavirus in Wuhan, China, 2020. J Clin Med Jan. 24 2020. doi: 10.3390/jcm9020330.
2. Gorbalenya AE, Baker SC, Baric RS, de Groot RJ, Drosten C, Gulyaeva AA, et al. Severe acute respiratory syndrome-related coronavirus: The species and its viruses-a statement of the coronavirus study group. bioRxiv 2020.02.07.937862.
3. Majumder M, Mandl KD. Early transmissibility assessment of a novel coronavirus in Wuhan, China. SSRN Jan. 26 2020. doi: http://dx.doi.org/10.2139/ssrn.3524675.
4. World Health Organization. Coronavirus 2020. [Online]. [Accessed on 10 February 2020].
5. Iglesias JA, Tiemblo A, Ledezma A, Sanchis A. Web news mining in an evolving framework. Inf Fusion 2016; 28: 90-98.
6. Guellil I, Boukhalfa K. Social big data mining: A survey focused on opinion mining and sentiments analysis. In: Conference of ISPS 2015: 12th International Symposium on Programming and Systems. Algiers: IEEE; 2015. doi: 10.1109/ISPS.2015.7244976.
7. Ravindran SK, Garg V. Mastering social media mining with R. Mumbai: Packt Publishing Ltd; 2015.
8. Jahanbin K, Rahmanian F, Rahmanian V, Sotoodeh Jahromim A. Application of Twitter and web news mining in infectious disease surveillance systems and prospects for public health. GMS Hyg Infect Control 2019; 14: 1-12.
9. Duggan M, Ellison NB, Lampe C, Lenhart A, Madden M. Social media update 2014. Pew Res Center 2015; 19(9): 1-17.
10. World Health Organization. Novel coronavirus (2019-nCoV) situation reports. 2020. [Online]. Available from: emergencies/diseases/novel-coronavirus-2019/situation-reports/. [Accessed on 10 February 2020].
