ORIGINAL ARTICLE
Year: 2020  Volume: 13  Issue: 2  Page: 81-90

Predicting the number of visceral leishmaniasis cases in Kashgar, Xinjiang, China using the ARIMA-EGARCH model

Huling Li^{1}, Rongjiong Zheng^{2}, Qiang Zheng^{3}, Wei Jiang^{3}, Xueliang Zhang^{4}, Weiming Wang^{5}, Xing Feng^{1}, Kai Wang^{4}, Xiaobo Lu^{2}

^{1} College of Public Health, Xinjiang Medical University, Urumqi 830011, P.R. China
^{2} Department of Infectious Diseases, the First Affiliated Hospital of Xinjiang Medical University, Urumqi 830054, P.R. China
^{3} Xinjiang Center for Disease Control and Prevention, Urumqi 830002, P.R. China
^{4} Department of Medical Engineering and Technology, Xinjiang Medical University, Urumqi 830011, P.R. China
^{5} School of Mathematics Science, Huaiyin Normal University, Huaian 223300, P.R. China

Objective: To forecast visceral leishmaniasis cases using the autoregressive integrated moving average (ARIMA) model and the hybrid ARIMA-EGARCH model, providing a scientific basis for controlling the spread of visceral leishmaniasis in Kashgar Prefecture of Xinjiang, China.

Methods: The data used in this paper are monthly visceral leishmaniasis cases in Kashgar Prefecture of Xinjiang from 2004 to 2016. The data from 2004 to 2015 were used for estimation and selection of the best model, and the data from 2016 were used to validate the forecast. The time series started on 1 January 2004 and ended on 31 December 2016, comprising 1 790 cases reported in Kashgar Prefecture.

Results: For Xinjiang, the total number of reported cases was 2 187, and the male-to-female ratio was 1.42:1. Patients aged 0 to 10 years accounted for 82.72% of all reported cases, and the largest share of cases was detected among scattered children, who accounted for 68.82%. The monthly incidences fitted by the ARIMA (2, 1, 2) (1, 1, 1)_{12} model were consistent with the real data collected from 2004 to 2015.
However, the predicted cases did not agree with the observed case numbers, so we established a hybrid ARIMA-EGARCH model to fit the visceral leishmaniasis series. Finally, the ARIMA (2, 1, 2) (1, 1, 1)_{12}-EGARCH (1, 1) model gave a good estimation when dealing with volatility clustering in the data series.

Conclusions: The combined model was determined to be the best prediction model, with a root-mean-square error (RMSE) of 7.23% in the validation phase. This indicates that the model is valid and reasonable, can be used for short-term prediction of visceral leishmaniasis, and could be applied to the prevention and control of the disease.
1. Introduction

Leishmaniasis is a chronic infectious disease that can be divided into visceral leishmaniasis, also known as kala-azar or black fever, and cutaneous leishmaniasis[1],[2],[3],[4]. Visceral leishmaniasis is a vector-borne tropical infection caused by protozoans of the genus Leishmania. The transmission vector of the disease is the sandfly (Phlebotomus and Lutzomyia species). The symptoms of visceral leishmaniasis are fever, weight loss, splenomegaly, hepatomegaly, skin darkening and anemia[5]. An estimated 50 000-90 000 new cases of visceral leishmaniasis occur worldwide annually[6], 90% of which come from 6 countries: India, Bangladesh, Sudan, South Sudan, Brazil and Ethiopia[2]. By global estimates, the annual number of visceral leishmaniasis cases in Somalia ranks second, behind South Asia[8],[9], and South Asia is considered the most visceral leishmaniasis-prone region in the world[10].

In China, visceral leishmaniasis is prevalent in Xinjiang, Gansu, Sichuan and some midwestern provinces (autonomous regions). The main transmission types in Xinjiang are the wild-animal type and the human type. The canine type usually occurs in Gansu, Sichuan and other endemic areas. From 2005 to 2015, China reported a total of 3 994 visceral leishmaniasis cases, distributed across 27 provinces and 336 counties nationwide. The top three reporting provinces were Xinjiang, Gansu, and Sichuan, which accounted for 48.00%, 33.12%, and 14.17% of the total reported cases, respectively. Together, these three provinces accounted for 95.29% of all cases reported in the country[11]. Xinjiang is a key epidemic area of visceral leishmaniasis in China: the number of visceral leishmaniasis cases increased from 377 in 1980-1990 to 1 588 in 2004-2014.
By the end of 2014, 1 272 cases had been reported in Kashgar over 11 years, accounting for 80.10% of the total reported cases in Xinjiang[12],[13]. The Kashgar region in Xinjiang has become the area with the highest number of black fever patients and the highest incidence rate in the country. Within it, Kashi City, Shufu County and Jiashi County are main endemic areas of visceral leishmaniasis in China, and deaths caused by black fever continue to occur[14-16]. This is why we chose Kashgar Prefecture of Xinjiang as the research area.

In recent years, many studies have used various models for the modeling and forecasting of diseases. Among them, time series models play an important role in disease prediction. For instance, the ARIMA model in time series analysis has been employed to forecast leishmaniasis[17-19]. These three studies share one feature: each took the relationship between meteorological or other influencing factors and the disease into consideration in its models. However, studies that predict the trend of visceral leishmaniasis incidence in Xinjiang with time series models are scarce at present. Hence, this study adopts time series analysis to simulate visceral leishmaniasis cases in Kashgar Prefecture of Xinjiang, offering insights into the disease and measuring its occurrence by using the multiplicative seasonal ARIMA model and the Exponential Generalized Autoregressive Conditional Heteroscedasticity (EGARCH) technique. To the best of our knowledge, the use of a hybrid ARIMA-EGARCH model for modeling and forecasting visceral leishmaniasis in Xinjiang has not been reported before.

2. Materials and methods

2.1. Study area and data source

Kashgar Prefecture is located in the southwest of Xinjiang Uygur Autonomous Region, between latitudes 35°28' and 40°16'N and longitudes 71°39' and 79°52'E, with an area of about 162 000 square kilometers [Figure 1]. The monthly visceral leishmaniasis cases from 2004 to 2016 in Kashgar Prefecture were retrieved from the Xinjiang Uygur Autonomous Region Center for Disease Control and Prevention (CDC); age, sex, occupation and other demographic information were also collected and analyzed. The visceral leishmaniasis cases were divided into a development group (January 1, 2004, through December 31, 2015) and a validation group (January 1, 2016, through December 31, 2016) for model fitting and for testing the forecast, respectively.{Figure 1}

2.2. Descriptive data

The data cover 156 months, from January 2004 to December 2016, in Xinjiang Uygur Autonomous Region. The total number of visceral leishmaniasis cases reported throughout the period was 2 187, of which cases in Kashgar Prefecture accounted for 81.85% (1 790/2 187). A total of 58.62% were male and 41.38% were female, giving a sex ratio of 1.42:1 (1 282/905). The high-prevalence group was those aged 0-10 years, who accounted for 82.72% (1 809/2 187) of all reported cases. Cases reported in people aged 0-75 years decreased gradually with age, while the incidence among children aged 0-2 years was the highest, accounting for 77.34% (1 399/1 809) of cases in children under 10 years old and 63.97% (1 399/2 187) of all cases. With regard to occupation, the highest percentage of visceral leishmaniasis cases was detected in scattered children, accounting for 68.82% (1 505/2 187), followed by farmers (11.52%, 252/2 187), students (10.11%, 221/2 187) and kindergarten children (4.71%, 103/2 187).
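The shares quoted in this paragraph are simple proportions of the 2 187 reported cases, and they can be re-derived from the counts. The following is a minimal sketch in Python, offered only as an arithmetic check; the names `subgroup_counts` and `percent` are illustrative and do not come from the paper.

```python
TOTAL_CASES = 2187  # all visceral leishmaniasis cases reported in Xinjiang, 2004-2016

# Subgroup counts as quoted in the text.
subgroup_counts = {
    "Kashgar Prefecture": 1790,
    "male": 1282,
    "aged 0-10 years": 1809,
    "aged 0-2 years": 1399,
    "scattered children": 1505,
    "kindergarten children": 103,
}


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


shares = {name: percent(count, TOTAL_CASES) for name, count in subgroup_counts.items()}
sex_ratio = round(1282 / 905, 2)  # male cases divided by female cases

for name, share in shares.items():
    print(f"{name}: {share}%")
print(f"male-to-female ratio: {sex_ratio}:1")
```

The same helper can be applied to any count and denominator pair in the section, which makes transcription slips in the fractions easy to spot.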
In Kashgar Prefecture, the seriously endemic districts were mainly Payzawat County (60.89%, 1 090/1 790), Kashi City (17.82%, 319/1 790), Shufu County (9.61%, 172/1 790) and Marabishi County (4.86%, 87/1 790) [Figure 2]. Each of the remaining counties reported fewer than 30 cases in total. In 2008-2009 and 2014-2016, outbreaks of visceral leishmaniasis occurred in Payzawat County, Kashgar Prefecture, which resulted in a sharp increase in the number of people suffering from visceral leishmaniasis; at the same time, the number of patients peaked in Xinjiang.{Figure 2}

2.3. Time series analysis

2.3.1. ARIMA model

We first applied a descriptive epidemiology approach to depict the epidemic distribution of visceral leishmaniasis, comprising the spatial distribution and demographic characteristics (sex ratio, age group, and occupation). The ARIMA model was originally conceived for applications in economics and has since been widely used in infectious disease research for a variety of time-varying events; it is the most commonly used time series prediction model[20]. An ARIMA (p, d, q) model is determined by three parameters: p is the number of autoregressive (AR) terms, d is the number of times the series is differenced, and q is the number of moving average (MA) terms. Its seasonal form is usually termed the multiplicative seasonal ARIMA (p, d, q) (P, D, Q)_{S} model, an extension of the ARIMA model to seasonal time series. In this expression, the seasonal parameters are: P, the seasonal autoregressive order; D, the order of seasonal differencing; Q, the seasonal moving average order; and S, the length of the seasonal period, defined here as 12[21],[22].

Obtaining an optimal ARIMA model involves three iterative steps[23-25]: identification, estimation, and diagnostic checking. Prior to fitting the ARIMA model, the Augmented Dickey-Fuller (ADF) test was adopted to identify whether the series is stationary.
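When a stationarity test points to a trend or a seasonal pattern, differencing is the usual remedy. The following is a minimal sketch in Python of first-order and lag-12 seasonal differencing; the toy series is made up for illustration (a linear trend plus a repeating 12-month pattern) and is not the Kashgar data, and the function name `difference` is hypothetical.

```python
def difference(series, lag=1):
    """Return the lag-`lag` differences x[t] - x[t - lag] of a sequence."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Toy monthly series: linear trend (2 per month) plus a fixed seasonal pattern.
seasonal_pattern = [0, 1, 3, 6, 4, 2, 1, 0, 5, 9, 7, 3]
toy_series = [2 * t + seasonal_pattern[t % 12] for t in range(36)]

# D = 1: the seasonal difference at lag 12 removes the repeating pattern.
seasonally_differenced = difference(toy_series, lag=12)

# d = 1: the first difference removes what remains of the trend.
stationary = difference(seasonally_differenced, lag=1)

print(seasonally_differenced[:3])  # constant values: trend slope times 12
print(stationary[:3])              # zeros for this noise-free toy series
```

Each differencing step shortens the series by its lag, so 36 toy months leave 24 values after the seasonal difference and 23 after the additional first difference.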
Appropriate differencing can transform the series into a stationary one. Identification is the process of determining the seasonal and non-seasonal orders of a model using the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the transformed data. After the identification step, the maximum likelihood method was used for parameter estimation. In the diagnostic phase, the adequacy of the established model was verified with white noise tests, which check whether the residuals are independent and normally distributed. Several ARIMA models may be identified, so an optimum model has to be selected. The best-fitting model was determined by comparing the values of the Akaike Information Criterion (AIC) and the Schwarz Bayesian Criterion (SBC). Among models built with different lags, lower AIC and SBC values indicate a better model. In addition, we used the root-mean-square error (RMSE) to evaluate the general performance of each model. The RMSE between actual and predicted cases is calculated as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(X_{t}-\hat{X}_{t}\right)^{2}}$$

where $X_{t}$ is the observed value and $\hat{X}_{t}$ is the predicted value at time t, and N is the number of observations. Finally, the fitted ARIMA model was applied to predict the monthly visceral leishmaniasis incidences between January and December 2016.

2.3.2. GARCH family models

In the above model, the variance of the innovations (often referred to as volatility) is constant over time (homogeneity of variance). This assumption often proves too restrictive for real data, because features such as volatility clustering cannot be modeled under it. To address this problem, the Autoregressive Conditional Heteroscedasticity (ARCH) and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) models, developed by Engle and extended by Baillie and Nelson, have proven to be useful tools for capturing the momentum in conditional variance.
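To make the idea of a time-varying conditional variance concrete, the following is a minimal sketch in Python of the standard GARCH(1,1) variance recursion, sigma2[t] = omega + alpha * e[t-1]^2 + beta * sigma2[t-1]. It illustrates plain GARCH rather than the EGARCH variant fitted in this paper, and the parameter values, the residuals and the function name `garch11_variance` are all hypothetical.

```python
def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variances of a GARCH(1,1) process for given residuals.

    sigma2[t] = omega + alpha * residuals[t-1]**2 + beta * sigma2[t-1],
    started from the unconditional variance omega / (1 - alpha - beta).
    """
    if not 0 <= alpha + beta < 1:
        raise ValueError("alpha + beta must lie in [0, 1) for a finite variance")
    variances = [omega / (1 - alpha - beta)]
    for t in range(1, len(residuals)):
        variances.append(omega + alpha * residuals[t - 1] ** 2 + beta * variances[t - 1])
    return variances


# Hypothetical residuals: a quiet stretch, one large shock, then quiet again.
example_residuals = [0.0, 0.0, 5.0, 0.0, 0.0, 0.0]
example_variances = garch11_variance(example_residuals, omega=1.0, alpha=0.2, beta=0.5)

for t, variance in enumerate(example_variances):
    print(t, round(variance, 3))
```

In this toy run the variance jumps in the period after the shock and then decays gradually, which is the volatility clustering pattern that constant-variance ARIMA models cannot represent.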
GARCH models are commonly used in modeling financial time series that exhibit time-varying volatility clustering. They attempt to address volatility clustering in an innovations process. Volatility clustering occurs when an innovations process does not exhibit significant autocorrelation, but the variance of the process changes with time. If a series exhibits volatility clustering, past variances might be predictive of the current variance. The GARCH family includes several variants: the Integrated Generalized Autoregressive Conditional Heteroscedasticity (IGARCH) model, a restricted version of the GARCH model; the exponential GARCH (EGARCH) model, which accommodates the asymmetric effect of news; and the TGARCH (p, q) model proposed by Glosten et al., which can also handle the leverage effect[26],[27],[28],[29],[30],[31]. In recent years, more hybrid forecasting models that combine an ARIMA model with GARCH have been proposed to predict time series data in various fields because of their good performance.

This article used three criteria, namely the root-mean-square error (RMSE), AIC and SBC, to compare the performance of the ARIMA model and the ARIMA-GARCH family models in forecasting visceral leishmaniasis. The model with smaller values of these criteria was considered the better and more appropriate model. All analyses, modeling and data visualizations were carried out in R (version 3.4.1) and ArcGIS (version 10.4.1).

3. Results

3.1. Multiplicative seasonal ARIMA model

3.1.1. Data preprocessing

The trend of the monthly visceral leishmaniasis time series from January 2004 to December 2015 can be seen in [Figure 3]A. The ADF test indicated that the sequence was stationary (P=0.022). The seasonal characteristics of visceral leishmaniasis cases revealed that the data series had a seasonal cycle of 12 months.
Looking at the seasonal distribution of cases over the years, visceral leishmaniasis cases peaked between September and November, so most cases occurred in the autumn. To eliminate the effect of seasonal trends, we took the first difference and the first-order seasonal difference. The differenced series was stationary, as shown by [Figure 3]B and the Augmented Dickey-Fuller (ADF) test (P=0.01).{Figure 3}

3.1.2. Model identification

First, the orders of the ARIMA (p, d, q) (P, D, Q)_{12} model were examined. The autocorrelation function (ACF) and partial autocorrelation function (PACF) were used as an indication of order [Figure 4]. The ACF and PACF are effective tools for identifying pure AR (p) or MA (q) models. For a mixed ARMA model, however, the theoretical ACF and PACF have an infinite number of non-zero values, so using them to identify the model is extremely difficult. To facilitate the determination of the ARIMA model order, the differencing orders were fixed at d=1 and D=1, giving the initial model ARIMA (p, 1, q) (P, 1, Q)_{12}; each of the remaining parameters took the values 0, 1 and 2. The different combinations of orders led to 81 candidate ARIMA models for further analysis.{Figure 4}

3.1.3. Parameter estimation and model diagnostic

[Table 1] shows the parameter estimates and test results of the multiplicative seasonal ARIMA models that passed parameter verification. There were three plausible ARIMA models in this study: ARIMA (1, 1, 0) (2, 1, 0)_{12}, ARIMA (2, 1, 1) (1, 1, 1)_{12} and ARIMA (2, 1, 2) (1, 1, 1)_{12}. [Table 2] shows that ARIMA (2, 1, 2) (1, 1, 1)_{12} was the most suitable of the tested models because it had the smallest AIC, AICc and SBC (966.46, 967.37 and 986.58). The RMSE (8.62%) implied that the selected model was workable.
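The candidate grid described in the model identification step can be enumerated directly: with d and D fixed at 1 and each of p, q, P and Q taking the values 0, 1 or 2, there are 3^4 = 81 combinations. The following is a minimal sketch in Python; the tuple layout `((p, d, q), (P, D, Q, S))` is an illustrative convention chosen here, not something specified in the paper.

```python
from itertools import product

ORDER_VALUES = (0, 1, 2)  # values tried for p, q, P and Q
D_REGULAR = 1             # d: first difference
D_SEASONAL = 1            # D: first-order seasonal difference
SEASON_LENGTH = 12        # S: monthly data with a yearly cycle

# Every ARIMA(p, 1, q)(P, 1, Q)_12 specification in the search grid.
candidates = [
    ((p, D_REGULAR, q), (P, D_SEASONAL, Q, SEASON_LENGTH))
    for p, q, P, Q in product(ORDER_VALUES, repeat=4)
]

# The three models reported as plausible after parameter verification.
plausible_models = [
    ((1, 1, 0), (2, 1, 0, 12)),
    ((2, 1, 1), (1, 1, 1, 12)),
    ((2, 1, 2), (1, 1, 1, 12)),
]

print(len(candidates))
print(all(model in candidates for model in plausible_models))
```

In a full workflow each candidate would be fitted and ranked by AIC, AICc and SBC; that fitting step depends on the statistical package used and is left out of this sketch.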
The white noise diagnostic check of the residuals of the best model (χ^{2}=0.002, P=0.965) revealed that the ARIMA (2, 1, 2) (1, 1, 1)_{12} model had extracted the information contained in the visceral leishmaniasis time series, so the residual series was white noise.{Table 1}{Table 2}

3.1.4. Forecast analysis of model

The ARIMA (2, 1, 2) (1, 1, 1)_{12} model was used to simulate visceral leishmaniasis incidence (January 2004 to December 2015) and to forecast visceral leishmaniasis incidence (January 2016 to December 2016). The fitted and real numbers of cases are presented in [Figure 5], together with the predictions of the ARIMA (2, 1, 2) (1, 1, 1)_{12} model and their 95% confidence interval. As shown in the figure, the pattern of the fitted monthly values was very close to the actual visceral leishmaniasis cases. According to [Table 3], in the validation phase (January 2016 to December 2016) the predicted cases were similar to the actual cases in the first half of the year, and the actual values lay within the predicted 95% confidence interval (CI). In the following six months, however, the forecasted cases were not consistent with the observed cases, and the ARIMA model presented an RMSE of 48.53%. In this period, almost all observed numbers of visceral leishmaniasis cases fell outside the 95% confidence interval of the predicted values, which showed that the multiplicative seasonal ARIMA model failed to accurately predict the future trend in the number of visceral leishmaniasis cases. Therefore, to improve the predictive precision of the ARIMA model, we introduced the ARIMA-EGARCH model in the next section.{Table 3}{Figure 5}

3.2. Hybrid ARIMA-EGARCH model

In this section, to enhance the precision of the ARIMA (2, 1, 2) (1, 1, 1)_{12} model, we analyzed the residuals of the model. The ARIMA model was combined with different models from the GARCH family.
We chose the ARIMA-EGARCH model as the hybrid model.

3.2.1. Testing for ARCH effects

As introduced above, the ARIMA model was established on the assumption that the data were homoscedastic. To take changes in the variance of the visceral leishmaniasis data into consideration, we needed to model the heteroscedastic behavior of the data series, that is, the ARCH effect. We therefore tested the autocorrelation of the squared residual sequence. The McLeod-Li test yielded P values much smaller than the significance level (P<0.05), so the null hypothesis was rejected. This indicated a strong correlation between the squared residuals, that is, the sequence had conditional heteroskedasticity [Figure 6].{Figure 6}

3.2.2. Create hybrid ARIMA-EGARCH model

Since the model had conditional heteroskedasticity, a volatility equation was established for the residuals of the model. We chose the EGARCH model to explain the conditional heteroskedasticity. An EGARCH model of order (1, 1) is generally sufficient to describe most series; hence, we built the ARIMA (2, 1, 2) (1, 1, 1)_{12}-EGARCH (1, 1) model to improve the precision of the forecast. In the model diagnostics, the correlation of the squared standardized residuals of the combined model was weak ([Figure 7]A, [Figure 7]B), and there was no longer an ARCH effect. The combined model was therefore considered workable. [Table 4] presents the information criteria. In the fitting phase, the AIC, SBC and RMSE of the ARIMA (2, 1, 2) (1, 1, 1)_{12} model were 966.46, 986.58 and 8.62%, respectively, while those of the ARIMA (2, 1, 2) (1, 1, 1)_{12}-EGARCH (1, 1) model were 6.16, 6.39 and 11.07%, respectively. The RMSE values for the predicted data of the ARIMA (2, 1, 2) (1, 1, 1)_{12} model and the combined model were 48.53% and 7.23%, respectively.
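The RMSE used in these comparisons is the square root of the mean squared difference between observed and predicted values. The following is a minimal sketch in Python with made-up numbers rather than the Kashgar series; the function name `rmse` is illustrative, and because the paper does not state how its RMSE values were expressed as percentages, the sketch returns the error in the units of the counts.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equally long sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical monthly case counts and two hypothetical sets of forecasts.
toy_observed = [3, 5, 7, 9]
toy_forecast_a = [2, 5, 9, 9]   # errors: 1, 0, -2, 0
toy_forecast_b = [3, 5, 8, 9]   # errors: 0, 0, -1, 0

print(rmse(toy_observed, toy_forecast_a))
print(rmse(toy_observed, toy_forecast_b))
```

Under this criterion the forecast with the smaller RMSE is preferred, which is the comparison made between the ARIMA model and the combined model in the validation phase.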
The lower validation RMSE suggested that the combined model achieved a substantial improvement in performance. The ARIMA (2, 1, 2) (1, 1, 1)_{12}-EGARCH (1, 1) model gave a reasonable result in forecasting the visceral leishmaniasis series [Figure 8].{Figure 7}{Figure 8}{Table 4}

4. Discussion

Kashgar is located on the southern border of Xinjiang and is economically underdeveloped. It is the main epidemic area of visceral leishmaniasis in China. Although the country has basically controlled visceral leishmaniasis, the epidemic in Kashgar, Xinjiang has still not been fundamentally controlled. With the acceleration of development in the western region and the increase in the floating population, the control and prevention of visceral leishmaniasis in Xinjiang has become more complex and difficult[32]. Our study suggested that the seriously endemic areas were mainly Payzawat County, Kashi City, Shufu County and Marabishi County. In 2008-2009 and 2014-2016, epidemics of visceral leishmaniasis occurred in Payzawat County, Kashgar Prefecture, which led to a sharp increase in the number of people suffering from visceral leishmaniasis and to the highest peaks in the number of patients in Xinjiang. In other years, case numbers fluctuated stably. The visceral leishmaniasis cases in Xinjiang consisted mainly of scattered children and farmers. Among them, infants and children under 2 years old accounted for more than half of the total number of cases, and they were the susceptible age group for the natural epidemic type of visceral leishmaniasis in Xinjiang.

Our findings are similar to those on the epidemiological aspects of visceral leishmaniasis in Kaleybar and Khoda-Afarin districts, northwest of Iran[33]. In that study, 1 420 human samples (children under 12 years), 101 domestic dog (Canis familiaris) samples and 577 female sand fly samples were collected.
Sera of humans and dogs were tested using the direct agglutination test, and sand flies were identified at species level using the microscopic method. Furthermore, a structured questionnaire was applied to evaluate the correlation of the potential risk factors and the related clinical signs and symptoms with seropositivity in humans and dogs. That study showed that visceral leishmania infection is prevalent in rural areas of Kaleybar and Khoda-Afarin districts in East-Azerbaijan province; therefore, active detection and treatment of visceral leishmaniasis cases should not be neglected.

In this study, we used the ARIMA (p, d, q) (P, D, Q)_{12} model to analyze the visceral leishmaniasis data in Kashgar, China, and the visceral leishmaniasis cases were simulated by the ARIMA (2, 1, 2) (1, 1, 1)_{12} model. However, the forecast cases did not reflect the actual case numbers, and the RMSE for the validation phase was 48.53%. The predicted cases did not agree well with the observed visceral leishmaniasis trend, indicating that the multiplicative seasonal ARIMA model cannot accurately predict the future trend in the number of visceral leishmaniasis cases. To improve the predictive accuracy of the ARIMA (2, 1, 2) (1, 1, 1)_{12} model, the ARIMA (2, 1, 2) (1, 1, 1)_{12}-EGARCH (1, 1) model was further established to predict visceral leishmaniasis incidence; its RMSE, AIC and SBC were 11.07%, 6.16 and 6.39, respectively. In the validation phase, the RMSE of the ARIMA (2, 1, 2) (1, 1, 1)_{12}-EGARCH (1, 1) model was lower than that of the ARIMA (2, 1, 2) (1, 1, 1)_{12} model. Judged by the RMSE, AIC and SBC criteria, ARIMA-EGARCH was therefore the best model. Furthermore, seasonal index analysis showed that visceral leishmaniasis had significant seasonality: most cases occurred in autumn, with the peak between September and November.
A previous study[14] showed that being bitten outdoors and having neighbors with visceral leishmaniasis were the main risk factors for leishmania infection. Avoiding outdoor bites and distributing bed nets universally are efficient ways to prevent and control outbreaks of visceral leishmaniasis in local populations. The epidemic factors of visceral leishmaniasis are complex and include the influence of pathogens, vectors and the floating population, which increases the difficulty of prevention and control. Therefore, it is necessary to raise awareness that the prevention and control of visceral leishmaniasis is a long-term and arduous task.

Because of the deficiencies of ARIMA, diverse other models have also been developed, such as ARIMAX, support vector machines (SVM) and MaxEnt ecological niche modeling. In 2014, Zhang Xingyu et al.[22] conducted a comparative study of four typical time series methods for the prediction of nine infectious diseases, namely two decomposition methods (regression and exponential smoothing), the ARIMA model and a support vector machine based model. The differences in the principles and practices of these methods were compared. The results showed that no single method was completely superior, but the SVMs outperformed the ARIMA models and decomposition methods in most cases. A series of studies on leishmaniasis have been published in Iran[17],[18],[19], all of which used time series methods to analyze case numbers. For example, Sharafi M et al.[17] used the Seasonal Autoregressive Integrated Moving Average (SARIMA) model to predict the trend of cutaneous leishmaniasis and to assess the relationship between the disease trend and weather variables in the south of Fars province.
In another study, Nikonahad A et al.[18] used a time series model to investigate the relationship between environmental and meteorological variables and cutaneous leishmaniasis transmission, and to predict transmission in a region susceptible to the disease. Similarly, Tohidinik HR et al.[19] used the Seasonal Autoregressive Integrated Moving Average (SARIMA) model to forecast the occurrence of zoonotic cutaneous leishmaniasis and to evaluate the effect of climatic variables on disease incidence in the east of Fars province. The three leishmaniasis studies mentioned above share one point: variables such as earthquakes, rainfall, temperature, sunshine hours, and relative humidity might be related to the occurrence of the disease. The results of these studies indicated that introducing meteorological variables into the model could improve its accuracy.

Shiravand B et al.[34] obtained distribution data for the vector and reservoir hosts of zoonotic cutaneous leishmaniasis in Yazd province and used MaxEnt ecological niche modeling to predict environmental suitability. The BCC-CSM1-1(m) model and two climate change scenarios, RCP 4.5 and RCP 8.5, were used for climate projections at the 2030 and 2050 horizons. Under both scenarios in 2030 and 2050, the results of the jackknife test indicated that the mean temperature of the wettest quarter and the temperature annual range had the greatest effect on the model for the vector and the reservoir hosts, respectively. Their study demonstrated that climate conditions are the major determinants of the zoonotic cutaneous leishmaniasis incidence rate in Yazd Province. Understanding the role of environmental and bioclimatic factors in the occurrence of zoonotic cutaneous leishmaniasis can guide policymakers in creating and implementing more effective policies for prevention and control.
In Xinjiang, China, few reports have used the ARIMAX model to analyze the occurrence of visceral leishmaniasis. In the future, we can consider applying these methods to the study of visceral leishmaniasis and seek more exact models to predict its incidence in Xinjiang.

This study examined the visceral leishmaniasis cases in Kashgar Prefecture from 2004 to 2016 using time series methodology. After trying a series of models, we took ARIMA (2, 1, 2) (1, 1, 1)_{12} as the best ARIMA model because of its low AIC, AICc, SBC and RMSE. As the visceral leishmaniasis time series exhibits volatility, we modeled this volatility with ARIMA-EGARCH models, which were considered the most appropriate for fitting the series. In the future, other predictive models should be applied to the study of visceral leishmaniasis to provide a richer theoretical basis for its prevention and control.

Conflict of interest statement

We declare that we have no conflict of interest.

Acknowledgements

This research was supported by the National Natural Science Foundation of China (11961071, 61672013, and 81660333) and Huaian Key Laboratory for Infectious Diseases Control and Prevention (HAP201704).

Authors' contributions

Wang K and Lu XB contributed equally; they designed the work, participated in the critical revision of the article and contributed to the final version of the manuscript. Zheng RJ, Zheng Q and Jiang W collected the data. Li HL performed the data analyses, interpreted the results and wrote the initial draft of the paper. Zhang XL and Wang WM revised the manuscript. Feng X helped to perform the analysis with constructive discussions.

References


