A generalised SEIRD model with implicit social distancing mechanism: A Bayesian approach for the identification of the spread of COVID-19 with applications in Brazil and Rio de Janeiro state

ABSTRACT We develop a generalized Susceptible-Exposed-Infected-Removed-Dead (SEIRD) model that incorporates social distancing measures to describe the spread of COVID-19 in Brazil. We assume uncertain scenarios with limited testing capacity, lack of reliable data, under-reporting of cases, and a restricted testing policy. A Bayesian framework is proposed for identifying the model parameters and quantifying the uncertainty of the model outcomes. Through sensitivity analysis (SA), we identify the model parameter related to social distancing measures as one of the most influential. Different strategies for relaxing social distancing measures are then investigated to determine which are viable and less hazardous to the population. An abrupt relaxation of social distancing implemented after the peak of positively diagnosed cases can prolong the epidemic. A more severe scenario occurs if relaxation is implemented before there is evidence of epidemiological control, indicating the importance of choosing appropriately when to start the relaxation.


Introduction
At the end of 2019, the world was struck by news of an outbreak in China of a new coronavirus, SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2). The associated disease, named COVID-19 by the World Health Organization (WHO), rapidly spread around the world and is currently considered a pandemic. Coronaviruses belong to a group of viruses that are common in humans and are responsible for 15% to 30% of common cold cases (Mesel-Lemoine et al., 2012). SARS-CoV-2 has spread rapidly and has already affected a significant part of the world's population (Worldometer, 2020). As of April 8, 2021, there were 223 countries and territories with cases of COVID-19, accumulating about 132.7 million confirmed cases and 2.8 million deaths (World Health Organization, 2021). The within-host dynamics seem to be very heterogeneous, and the severity of the disease varies widely. Based on currently available information, elderly people and people of any age with comorbidities (e.g., obesity, hypertension, cardiovascular disease, and diabetes) are among those at highest risk of severe illness from COVID-19. In most locations, the number of COVID-19 cases requiring intensive care exceeds the available medical resources, worsening the situation. A great effort has been made worldwide to better understand the disease and possible mechanisms to control it. At least in the early stages of an epidemic, measures such as widespread testing and reducing social contacts are known to slow the spread of the disease and reduce case fatality (Koo et al., 2020). However, in the COVID-19 pandemic, countries often have to deal with limited testing capacity, which increases the uncertainty about how the disease behaves and is likely to complicate the management of policies to mitigate and control it.
One way to deal with this lack of information is through mathematical modelling, using diverse modelling techniques (see Bauch et al. (2005) for a review) that are, intrinsically, simplified representations of reality. An overview of applications and limitations of mathematical modelling for COVID-19 is presented in Wang (2020). In particular, population-based models that split individuals into classes (compartments) are widely used, and this is the approach followed in this work.
Mathematical models can help investigate different scenarios and devise alternative strategies to control the spread of a virus. The 2002-2003 SARS coronavirus epidemic generated several publications employing mathematical models to understand its propagation pattern and ultimately to infer control measures for the disease (Gumel et al., 2004; Ng et al., 2003; Zhang, 2007). Similar mathematical models were employed for other diseases, for example: (i) the 1918 influenza pandemic, using data from cities in the USA (Mills et al., 2004) and from England and Wales (He et al., 2013); (ii) the 2009 influenza pandemic, modelled in local regions of Japan (Saito et al., 2013); and (iii) the 2014-2016 Ebola epidemic, modelled for the West African countries with reported cases of the disease (Diaz et al., 2018; Mamo and Koya, 2015).
Regarding COVID-19, several papers modelling its dynamics have been published to date, mainly in the first half of 2020 (Else, 2020). The common lesson of these works points to the need for early and effective isolation of infected people and the importance of social distancing until vaccines are available to a large part of the population. Some papers that employ mathematical modelling split the target population into distinct compartments: infected, diagnosed, undiagnosed, and recovered (Shao et al., 2020), and some also consider an asymptomatic infected compartment whose individuals can spread the disease (Prem et al., 2020). It is also typical to consider a compartment of exposed (latent) individuals who carry the virus and subsequently become infectious (Hou et al., 2020). Importantly, social distancing can be represented either implicitly (Maier and Brockmann, 2020) or explicitly (Tang et al., 2020) among the compartments.
WHO has recognised that testing for COVID-19 is key to knowing how the virus spreads and to informing how to respond to it, although testing coverage remains low in most countries. In Brazil, for example, the initial testing policy included only severely ill people and healthcare practitioners. The under-reporting of infected cases was, and still is, a major problem, coupled with incomplete and inconsistent data and incomplete knowledge of the prevalence and progression of the disease (Marson, 2020). Likewise, the number of deaths may be underestimated for the same reasons. Overall, one of the main difficulties in developing predictive compartmental models of COVID-19 is the lack of reliable data to support parameter choices. We therefore propose a modelling framework to deal with uncertain scenarios. It includes a seven-compartment model that implicitly considers a social distancing policy that isolates individuals from the infection for a period of time. This strategy for representing social distancing is also employed by Maier and Brockmann (2020) in a generalisation of the standard SIR model. Here, we focus on understanding the model response to perturbations and uncertainties. We perform sensitivity analysis (SA) and quantify the uncertainties in the model outcomes, mainly the cumulative numbers of confirmed cases and deaths, and the effective reproduction number. We also investigate hypothetical scenarios of how the system, subject to uncertainties, responds to time-dependent social distancing measures. Our case studies include the whole of Brazil (BR) and the state of Rio de Janeiro (RJ), which was one of the first Brazilian states to recommend and implement social distancing measures, beginning on March 17, 2020. However, these measures did not last long, and RJ, like the whole country, began a progressive relaxation of social distancing.
Given the above, the objective of this work is to study, through mathematical modelling, the early spread of COVID-19 in Brazil and the state of Rio de Janeiro. The analysed data were obtained before vaccines and pharmacological treatments were developed, so as to depict specifically the early dynamics. Moreover, we employ uncertainty evaluation and SA because some traits of COVID-19 are still poorly known. We are not aware of any other work employing a similar framework that accounts for the disease uncertainties to evaluate qualitatively the social distancing measures for Brazil and Rio de Janeiro state. We hope these analyses can guide policymakers, particularly regarding social distancing measures, both for the COVID-19 pandemic and for eventual future epidemics with characteristics similar to COVID-19. Figure 1 provides a schematic workflow of the main steps of the proposed modelling framework for scenarios of the spread of COVID-19 that are subject to great uncertainties. The dashed block contains the developed compartmental model, called the forward model, which describes the spread of COVID-19, and the steps associated with model calibration. The model parameters to be calibrated, denoted by the vector $\theta = \{\theta_1, \ldots, \theta_k\}$, are estimated from the given experimental data $y$, which can be observable quantities such as the cumulative numbers of confirmed cases and deaths. A Bayesian inference approach is used for parameter estimation; it requires prior knowledge of the model parameters, $\pi_{\text{prior}}(\theta)$, and the definition of the likelihood function $\pi_{\text{like}}(y \mid \theta)$. Bayes' theorem combines $\pi_{\text{like}}(y \mid \theta)$ with $\pi_{\text{prior}}(\theta)$ to obtain the posterior probability distribution of the parameters $\theta_1, \ldots, \theta_k$. These calibrated parameters are gathered with the initial conditions and other predefined (biological) parameters into the vector $\Theta$, which is the input vector of the forward model.
Any mathematical or computational model can be used as the forward model to describe the phenomenon of interest. Here, we developed the SEAIRPD-Q (Susceptible-Exposed-Asymptomatic infected-Symptomatic infected-Removed-Positively diagnosed-Dead-Quarantine) compartmental model, which is described in Section 2.1. The input model factors $\Theta$ and their associated uncertainties are then propagated through the forward model, which allows the uncertainties in the model outputs to be quantified through probability distribution functions. Model outputs can be any state variable or related quantity of interest (QoI) at a given time or over a time range of interest. They ultimately depend on $t$ and $\Theta$, so we write $\mathrm{QoI} = \mathrm{QoI}(t; \Theta)$. The proposed framework also quantifies how uncertainties in the model factors affect the model outputs using a global SA method. In summary, our modelling framework enables us to (i) evaluate parameter uncertainties from the available data, (ii) investigate how uncertainties in the model factors affect model outputs, (iii) identify the most influential mechanisms related to the disease spread, and (iv) propagate parameter uncertainties through the model, obtaining probability distributions of the model outcomes. In this way, we integrate modelling techniques widely used in epidemiology with a Bayesian approach as a basis for statistical calibration and uncertainty quantification, encompassing relevant aspects of predictive computational science. All the steps shown in Figure 1 are described in detail in the following subsections.
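Step (iv), propagating parameter uncertainty through the forward model, can be illustrated with a minimal sketch. It assumes that the 95% intervals reported later are percentile bands of the ensemble of model runs (the exact summary statistic is not spelled out here), and both the forward model (a logistic curve) and the "posterior" draws are hypothetical stand-ins, not the SEAIRPD-Q model or its calibrated posterior.

```python
import numpy as np

def propagate(forward, samples, t):
    """Push each parameter sample through forward(t, theta) and summarise the
    QoI(t; Theta) ensemble by its 2.5%, 50% and 97.5% percentiles."""
    qoi = np.array([forward(t, theta) for theta in samples])  # shape (n_samples, len(t))
    lo, med, hi = np.percentile(qoi, [2.5, 50.0, 97.5], axis=0)
    return lo, med, hi

# Hypothetical stand-in for the forward model: a logistic cumulative curve.
def toy_forward(t, theta):
    final_size, growth_rate, t_mid = theta
    return final_size / (1.0 + np.exp(-growth_rate * (t - t_mid)))

# Hypothetical "posterior" draws, for illustration only.
rng = np.random.default_rng(42)
n = 500
samples = np.column_stack([
    rng.normal(1000.0, 50.0, n),   # final size
    rng.normal(0.10, 0.01, n),     # growth rate (1/day)
    rng.normal(60.0, 3.0, n),      # midpoint (days)
])
t = np.arange(0, 200)
lo, med, hi = propagate(toy_forward, samples, t)
```

Swapping `toy_forward` for a function that integrates the compartmental model and returns, say, $C(t)$ gives the same kind of band around the median trajectory.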

The SEAIRPD-Q model
We develop a generalised SEIRD model that includes protective social distancing measures, based on the following assumptions: (i) the analysis period is short enough that natural births and deaths can be disregarded; (ii) all positively diagnosed individuals, who are severely ill, are hospitalised; (iii) only symptomatic infected and positively tested (diagnosed/hospitalised) individuals may die from complications of the disease; (iv) hospitalised individuals are under treatment and remain isolated, so they are considered not infectious; (v) social distancing measures are restrictive, so isolated individuals are unlikely to be infected; and (vi) recovered/removed individuals acquire immunity. Our SEAIRPD-Q model has seven compartments, including the positively diagnosed (P) individuals, who are under medical treatment, and an infected class modelled as two separate compartments for individuals with and without symptoms, denoted by I and A, respectively. The removed compartment (R) includes the recovered individuals as well as those under social distancing measures. In this way, individuals isolated by social distancing measures are implicitly included in the compartment of those who are not subject to infection. The mathematical description (differential equations) of the SEAIRPD-Q model and its schematic representation are shown in Figure 2, while the model parameters and their meanings are listed in Table 1 and SM-A.1 (see the Supplementary Material (SM) for more details).

Figure 1. A schematic view of the proposed modelling framework. It is a general approach, with roots in Bayesian inference, which enables investigating scenarios subject to uncertainties.
Susceptible individuals become exposed to COVID-19 through contact with infected people. The rate at which this happens may depend on whether the infected individual has symptoms or not, and is denoted by β or μ, respectively. After an incubation period ($1/\sigma$), exposed individuals become infected. The fraction ρ of infected people who present symptoms can vary widely. Owing to the assumption of reduced testing capacity, asymptomatic individuals, as well as symptomatic infected individuals with mild symptoms, are unlikely to be hospitalised and therefore are not diagnosed. Those who present symptoms are either isolated or hospitalised. We assume that only symptomatic infected individuals with more severe symptoms are diagnosed quickly, at a rate $\epsilon_I$, and require hospitalisation. This assumption reflects the policy of testing only severely ill individuals.
Following the WHO guidelines on social distancing, most countries have adopted social distancing policies to some extent. A distinctive feature of our model is the assumption that susceptible, exposed, and infected individuals can be isolated at a removal rate ω, an assumption also made by Maier and Brockmann (2020). Thus, the removed compartment includes individuals who have recovered from the disease as well as those subjected to social isolation. Asymptomatic and symptomatic individuals can recover without medical treatment at rates $\gamma_A$ and $\gamma_I$, respectively, and hospitalised individuals recover at a rate $\gamma_P$. Given the model inputs $\Theta$, the cumulative numbers of confirmed cases and deaths from simulation day zero ($t_0$) to a desired time $t$ are obtained, respectively, as:
$$C(t) = \int_{t_0}^{t} \epsilon_I \, I(\tau)\, d\tau, \qquad D(t) = \int_{t_0}^{t} \big( d_I \, I(\tau) + d_P \, P(\tau) \big)\, d\tau. \qquad (1)$$
Here and henceforth, we purposely drop the dependence of the variables on $\Theta$ to simplify notation.
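The transitions described above can be sketched as a system of ODEs. This is a reconstruction from the verbal description only; the published equations are those in Figure 2 and may differ. In particular, it assumes frequency-dependent incidence (division by the total population N), that ω removes S, E, A and I, and that the cumulative confirmed cases C accumulate the diagnosis flow from I to P as in Equation (1). All parameter values are hypothetical. The system is solved with SciPy's `solve_ivp` using its LSODA method, which wraps the same ODEPACK routine mentioned in the calibration section; which SciPy interface the authors used is not stated.

```python
import numpy as np
from scipy.integrate import solve_ivp

def seairpd_q_rhs(t, y, beta, mu, sigma, rho, eps_I, gamma_A, gamma_I,
                  gamma_P, d_I, d_P, omega, N):
    """Sketch of SEAIRPD-Q dynamics; state y = [S, E, A, I, P, R, D, C]."""
    S, E, A, I, P, R, D, C = y
    force = (beta * I + mu * A) / N            # assumption: frequency-dependent incidence
    dS = -force * S - omega * S                # infection and removal by distancing
    dE = force * S - (sigma + omega) * E
    dA = (1.0 - rho) * sigma * E - (gamma_A + omega) * A
    dI = rho * sigma * E - (gamma_I + d_I + eps_I + omega) * I
    dP = eps_I * I - (gamma_P + d_P) * P       # diagnosed, hospitalised, not infectious
    dR = omega * (S + E + A + I) + gamma_A * A + gamma_I * I + gamma_P * P
    dD = d_I * I + d_P * P                     # deaths only from I and P
    dC = eps_I * I                             # auxiliary state: cumulative confirmed, Eq. (1)
    return [dS, dE, dA, dI, dP, dR, dD, dC]

# Hypothetical parameter values (per day), NOT the calibrated values of Table 1.
params = dict(beta=0.5, mu=0.5, sigma=1 / 5, rho=0.2, eps_I=0.1, gamma_A=1 / 10,
              gamma_I=1 / 10, gamma_P=1 / 14, d_I=0.01, d_P=0.02, omega=0.02, N=1e6)
y0 = [1e6 - 120.0, 100.0, 10.0, 10.0, 0.0, 0.0, 0.0, 0.0]

sol = solve_ivp(lambda t, y: seairpd_q_rhs(t, y, **params), (0.0, 200.0), y0,
                method="LSODA", t_eval=np.arange(0.0, 201.0), rtol=1e-8, atol=1e-6)
C, D = sol.y[7], sol.y[6]
```

Because every outflow of one compartment is an inflow of another, S + E + A + I + P + R + D stays equal to N, which is a cheap sanity check on any implementation of the system.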

Time-dependent social distancing policy modelling
To represent and understand the effects of social distancing policies on the COVID-19 epidemic, we propose a time-dependent removal rate, denoted by $\omega := \omega(t)$. Considering that $t_d$ is the time at which some social distancing relaxation policy is implemented, $\omega(t)$ is defined as the following continuous function:
$$\omega(t) = \begin{cases} \omega^{*}, & t < t_d, \\ \omega^{*} e^{-\lambda (t - t_d)}, & t \geq t_d, \end{cases} \qquad (2)$$
in which $\omega^{*}$ is a positive constant and $\lambda = \ln 2 \,(t_{1/2})^{-1}$ is the decay constant, with $t_{1/2}$ being the half-life of the social distancing release policy. In this way, $\omega(t)$ is continuous and decreases for $t > t_d$, with $\lambda$ regulating the decay speed. From an epidemiological point of view, we consider that social distancing policies started soon after the first registered cases, with a removal rate $\omega^{*}$. With Equation (2), we can relax these policies from an arbitrary time $t_d$ onwards, moving towards pre-pandemic levels of social interaction at a rate $\lambda$. Thus, changing $t_d$ and $\lambda$ makes it possible to model the starting day and the intensity of the relaxation of social distancing measures.
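Equation (2) translates directly into a vectorised function. In this sketch the function name and the example values ($\omega^{*} = 0.02$ per day, $t_d = 165$, $t_{1/2} = 15$ days) are hypothetical; clamping $t - t_d$ at zero reproduces the constant branch before $t_d$ without an explicit `if`.

```python
import numpy as np

def removal_rate(t, omega_star, t_d, t_half):
    """omega(t) of Eq. (2): equal to omega_star before t_d, then exponential
    decay with half-life t_half (decay constant lambda = ln 2 / t_half)."""
    lam = np.log(2.0) / t_half
    elapsed = np.maximum(np.asarray(t, dtype=float) - t_d, 0.0)  # zero for t <= t_d
    return omega_star * np.exp(-lam * elapsed)

# Hypothetical values: omega* = 0.02 per day, relaxation from day 165, half-life 15 days.
rates = removal_rate([100.0, 165.0, 180.0, 195.0], 0.02, 165.0, 15.0)
```

By construction the rate equals $\omega^{*}$ up to $t_d$, halves after one half-life and quarters after two, which is what makes $t_{1/2}$ a convenient knob for the relaxation speed.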

Evaluation of the reproduction number
Following the ideas introduced in Diekmann et al. (2010), we apply the next-generation matrix method to obtain $R_0$ for the SEAIRPD-Q model. First, we use a reduction process to arrive at a linearised infection subsystem around the disease-free steady state. Identifying the infected variables (E, I, and A) allows us to define the vector $x^T = [E, I, A]$, where the superscript $T$ denotes transposition. Accordingly, S, R, P, and D individuals are considered unable to transmit the disease. Denoting by $T$ and $\Sigma$ the transmission and transition matrices, respectively, the three-dimensional linearised infection subsystem is given by $\dot{x} = (T + \Sigma)x$. The basic reproduction number $R_0$ is the dominant eigenvalue of the next-generation matrix $K = -T\Sigma^{-1}$ (Diekmann et al., 2010), from which the closed-form expression of $R_0$ and the corresponding effective reproduction number $R(t)$ are obtained.
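The next-generation computation can be checked numerically. The sketch below builds $T$ and $\Sigma$ for $x = [E, I, A]$ under the same assumptions as the ODE sketch (frequency-dependent incidence, so $S/N = 1$ at the disease-free state, and ω acting on E, I and A); the entries are our reconstruction, not copied from the paper. Because only the first row of $T$ is non-zero, $K$ has a single non-zero eigenvalue, which leads to the hand-derived closed form
$R_0 = \frac{\sigma}{\sigma + \omega}\left[\frac{\rho\beta}{\gamma_I + d_I + \epsilon_I + \omega} + \frac{(1-\rho)\mu}{\gamma_A + \omega}\right]$
under these assumptions. Parameter values are hypothetical.

```python
import numpy as np

def r0_next_generation(beta, mu, sigma, rho, eps_I, gamma_A, gamma_I, d_I, omega):
    """R0 as the dominant eigenvalue of K = -T Sigma^{-1}, with x = [E, I, A]."""
    T = np.array([[0.0, beta, mu],      # new exposures created by I (beta) and A (mu)
                  [0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0]])
    Sigma = np.array([
        [-(sigma + omega), 0.0, 0.0],                           # leaving E
        [rho * sigma, -(gamma_I + d_I + eps_I + omega), 0.0],   # E -> I, leaving I
        [(1.0 - rho) * sigma, 0.0, -(gamma_A + omega)],         # E -> A, leaving A
    ])
    K = -T @ np.linalg.inv(Sigma)
    return float(np.max(np.abs(np.linalg.eigvals(K))))

def r0_closed_form(beta, mu, sigma, rho, eps_I, gamma_A, gamma_I, d_I, omega):
    """Closed form derived by hand from the same T and Sigma."""
    return sigma / (sigma + omega) * (rho * beta / (gamma_I + d_I + eps_I + omega)
                                      + (1.0 - rho) * mu / (gamma_A + omega))

# Hypothetical parameter values, for illustration only.
p = dict(beta=0.5, mu=0.5, sigma=1 / 5, rho=0.2, eps_I=0.1, gamma_A=1 / 10,
         gamma_I=1 / 10, d_I=0.01, omega=0.02)
r0_ngm = r0_next_generation(**p)
r0_exact = r0_closed_form(**p)
```

The closed form also makes the role of ω explicit: it appears in every denominator, so a larger removal rate lowers $R_0$ through all three infected compartments.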

Data
We investigated the model behaviour for BR and RJ, using the raw data on cumulative numbers of confirmed cases and deaths gathered by the official panel of the Brazilian Ministry of Health (Brazilian Health Ministry, 2020). Since the Brazilian COVID-19 testing policy is still mostly restricted to severe cases, model calibration aims at matching C and D, without considering the recovered outflow from P. We assessed epidemiological data covering 198 days for BR, from March 5, 2020, to September 18, 2020, and 193 days for RJ, from March 10, 2020, to September 18, 2020. The initial dates were chosen to satisfy a minimum requirement of at least five diagnosed individuals on the initial date. For completeness, all the data used are listed in Volpatto et al. (2020). The experimental data are denoted by the time series $y(t_i)$, $i = 1, \ldots, n$, where $n$ is the number of observed days. The observable quantities are the cumulative numbers of confirmed cases, $y^{(C)}(t_i)$, and deaths, $y^{(D)}(t_i)$. Thus, we have $y = \{y^{(C)}(t_i), y^{(D)}(t_i)\}_{i=1}^{n}$.
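The observation windows can be checked with the standard library. The sketch counts both endpoints and, assuming that simulation day 0 is the first data date, maps the simulation days of the active-case peaks reported in the Results to calendar dates; the helper names are hypothetical.

```python
from datetime import date, timedelta

def n_observed_days(start, end):
    """Number of daily observations from start to end, both inclusive."""
    return (end - start).days + 1

def sim_day_to_date(first_data_date, sim_day):
    """Calendar date of a simulation day, assuming day 0 = first data date."""
    return first_data_date + timedelta(days=sim_day)

start_br, start_rj, end = date(2020, 3, 5), date(2020, 3, 10), date(2020, 9, 18)
n_br = n_observed_days(start_br, end)
n_rj = n_observed_days(start_rj, end)
peak_br = sim_day_to_date(start_br, 145)   # BR peak of active cases (simulation day)
peak_rj = sim_day_to_date(start_rj, 115)   # RJ peak of active cases (simulation day)
```

Under this convention the peaks fall on July 28, 2020 (BR) and July 3, 2020 (RJ), matching the dates quoted in the conclusions, and their 25-day gap follows directly.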

Bayesian calibration
Model calibration is performed using a Bayesian approach to make the results consistent with the available observations of cumulative confirmed cases and deaths. Bayesian calibration lets us characterise the uncertainties of the input parameters and obtain their maximum a posteriori (MAP) estimates from the marginals of the posterior probability distribution $\pi_{\text{post}}$. Since our aim is to adjust a set of parameters $\theta$ given the observations $y$, Bayes' theorem states:
$$\pi_{\text{post}}(\theta \mid y) = \frac{\pi_{\text{like}}(y \mid \theta)\, \pi_{\text{prior}}(\theta)}{\pi_{\text{evid}}(y)},$$
where $\pi_{\text{prior}}$ is the prior probability distribution that represents the initial (a priori) knowledge of $\theta$, $\pi_{\text{like}}$ is the likelihood function, and $\pi_{\text{evid}}$ is the evidence (information) contained in the data (see Tarantola (2005) for a more detailed description). We assume a Gaussian likelihood function of the form:
$$\pi_{\text{like}}(y \mid \theta) = \prod_{i=1}^{n} \frac{1}{2\pi\,\sigma_C \sigma_D} \exp\left( -\frac{\big(y^{(C)}(t_i) - y^{(C)}_{\text{model}}(t_i)\big)^2}{2\sigma_C^2} - \frac{\big(y^{(D)}(t_i) - y^{(D)}_{\text{model}}(t_i)\big)^2}{2\sigma_D^2} \right),$$
where $y^{(C)}_{\text{model}}(t) = C(t)$ and $y^{(D)}_{\text{model}}(t) = D(t)$ are the simulated model outcomes for the cumulative numbers of confirmed cases and deaths, respectively. Gaussian noise is associated with each observable quantity, with variances $\sigma_C^2$ and $\sigma_D^2$ that account for the uncertainties in the observed data; they are treated as hyperparameters to be determined in the Bayesian calibration procedure. Due to data paucity and possible model identifiability issues, we calibrate the following model parameters: $\beta = \mu$, $\omega^{*}$, $d_I$, and $d_P$, together with $\sigma_C$ and $\sigma_D$. All other parameters are taken from the available literature and are listed in Table SM. The code was written in Python (version 3.7), using PyMC3 as the probabilistic programming framework (Salvatier et al., 2016).
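In practice the likelihood is evaluated on the log scale. The sketch below computes the log of an independent Gaussian likelihood for C and D with SciPy's `norm.logpdf`; independence across days and between C and D is the assumption behind the product form. The observations and model outputs are hypothetical numbers, not the Brazilian data. Inside a PyMC3 model the same term would typically be expressed by `pm.Normal` variables with `observed=` data, which is not shown here.

```python
import numpy as np
from scipy.stats import norm

def gaussian_log_likelihood(y_C, y_D, C_model, D_model, sigma_C, sigma_D):
    """log pi_like(y | theta) for independent Gaussian errors on C and D."""
    log_C = norm.logpdf(y_C, loc=C_model, scale=sigma_C)
    log_D = norm.logpdf(y_D, loc=D_model, scale=sigma_D)
    return float(np.sum(log_C) + np.sum(log_D))

# Hypothetical observations and model outputs for three days.
y_C = np.array([10.0, 25.0, 60.0])
C_model = np.array([12.0, 24.0, 58.0])
y_D = np.array([0.0, 1.0, 3.0])
D_model = np.array([0.5, 1.2, 2.6])
ll = gaussian_log_likelihood(y_C, y_D, C_model, D_model, sigma_C=5.0, sigma_D=1.0)
```

Keeping the normalising term $-\ln(\sqrt{2\pi}\,\sigma)$ matters here: because $\sigma_C$ and $\sigma_D$ are calibrated, dropping it would let the sampler inflate the noise variances without penalty.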
To take advantage of the parallel computing capabilities of PyMC3, all experiments were performed on a machine with the following configuration: Ubuntu 20.04 LTS, an Intel Core i9-9900KF octa-core CPU at 3.60 GHz, an NVIDIA GeForce RTX 2080 SUPER with 8 GB of GDDR6 memory (256-bit interface), and 32 GB of DDR4 DRAM. On this machine, our code takes on average 20 min to generate all the results. For Bayesian posterior sampling, we used the Cascading Adaptive Transitional Metropolis in Parallel (CATMIP) algorithm (Minson et al., 2013), a Transitional Markov Chain Monte Carlo method available in PyMC3. For reproducibility, code and scripts are publicly available in Volpatto et al. (2020). The SEAIRPD-Q system of seven ordinary differential equations is solved using the LSODA method (Petzold, 1983) from SciPy (Virtanen et al., 2020), which provides an interface to this routine from ODEPACK (Hindmarsh, 1983).

Sensitivity analysis
To quantify how changes in the input model factors $\Theta$ affect the QoIs, we apply a global SA method, the Elementary Effects method (Campolongo et al., 2007), as implemented in the SALib library (Herman and Usher, 2017). In contrast to local sensitivity methods, which evaluate how small changes around a single point affect a QoI (Van Voorn and Kooi, 2017), the Elementary Effects method evaluates the impact of all factors across the whole parametric space. Here, our QoIs are the effective reproduction number $R(t_j)$ and the quantity $\sqrt{C(t_j)^2 + D(t_j)^2}$, which combines the cumulative numbers of confirmed cases and deaths. The analysis is performed at a few time points $t_j$ throughout the disease evolution. Given the great uncertainty involved, we determine the sensitivity of these QoIs with respect to variations of $\Theta$. Specifically, we assume that each factor included in the SA is a random variable that represents the uncertainty about its value. After transforming the corresponding parametric space into dimensionless form and sampling $m$ initial points $\Theta^{(r)}$, $r = 1, \ldots, m$, the method builds $m$ trajectories of $v + 1$ points, varying one factor at a time at random, to screen the parametric space, where $v$ is the number of input factors considered in the SA. At every $i$th change of direction in the $r$th trajectory, an elementary effect is computed at each $t_j$, i.e.,
$$EE_i^{(r)}(t_j) = \frac{\mathrm{QoI}\big(t_j; \Theta^{(r)} + \Delta\, \mathbf{e}_i\big) - \mathrm{QoI}\big(t_j; \Theta^{(r)}\big)}{\Delta},$$
where $\mathbf{e}_i$ is the $i$th unit vector and $\Delta \in \{1/(p-1), \ldots, 1 - 1/(p-1)\}$ is the magnitude of the change of the $i$th factor in the $v$-dimensional $p$-level grid. Choosing $p$ even and $\Delta = p/[2(p-1)]$ is the usual recommendation to ensure equal sampling probability across the parametric space, and is the default in the SALib library (Herman and Usher, 2017). After computing the elementary effects, the process ends by calculating the global sensitivity indexes of each parameter.
Our analysis is restricted to the first-order sensitivity index. Factors with high scores are the most influential on the QoI, so the first-order sensitivity index values allow the parameters to be ranked in order of importance.
Here, we determine the sensitivity of these two QoIs with respect to variations of all model parameters and of the initial conditions of the exposed and infected (asymptomatic and symptomatic) populations, for a total of $v = 13$ factors. We assume that all of them are random variables following uniform distributions ranging ±50% around their fixed or MAP values.
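The mechanics of the Elementary Effects screening can be shown with a from-scratch sketch. It is a simplified variant, not SALib's implementation used by the authors: each trajectory starts at a random point of the $p$-level grid on $[0, 1]^v$ and changes each factor once, in random order, stepping by $+\Delta$ when that stays inside the unit interval and by $-\Delta$ otherwise (which keeps points on the grid when $p$ is even). It returns the mean absolute elementary effect, $\mu^{*}$, per factor; taking $\mu^{*}$ as the first-order index is our assumption. The linear test function is hypothetical and chosen because its elementary effects equal its coefficients exactly.

```python
import numpy as np

def morris_mu_star(f, v, m=40, p=4, seed=0):
    """Simplified Elementary Effects screening of f on [0, 1]^v.

    Builds m one-at-a-time trajectories of v + 1 points on a p-level grid
    (p even) with step Delta = p / (2 (p - 1)) and returns mu* = mean |EE_i|.
    """
    rng = np.random.default_rng(seed)
    delta = p / (2.0 * (p - 1))
    levels = np.arange(p) / (p - 1)               # grid {0, 1/(p-1), ..., 1}
    ee = np.zeros((m, v))
    for r in range(m):
        x = rng.choice(levels, size=v)            # random base point of trajectory r
        fx = f(x)
        for i in rng.permutation(v):              # change one factor at a time
            x_new = x.copy()
            # step up if it stays inside [0, 1], otherwise step down
            x_new[i] = x[i] + delta if x[i] + delta <= 1.0 + 1e-12 else x[i] - delta
            f_new = f(x_new)
            ee[r, i] = (f_new - fx) / (x_new[i] - x[i])
            x, fx = x_new, f_new
    return np.abs(ee).mean(axis=0)

# Hypothetical linear test function: factor 2 has no influence at all.
def linear_test(x):
    return 3.0 * x[0] + 0.5 * x[1] + 0.0 * x[2]

mu_star = morris_mu_star(linear_test, v=3, m=40, p=4)
```

To screen ±50% ranges around nominal values, a unit-cube point $x$ would be mapped to parameters as `nominal * (0.5 + x)` before calling the forward model and evaluating the QoI at each $t_j$.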

Results
To properly perform projections and further study different social distancing scenarios of interest, we calibrated the SEAIRPD-Q model to the data of BR and RJ (separately). We employed a Bayesian calibration of some model parameter values, with the remaining parameter values gathered from the literature. Model parameters are shown in Table 1, in which we also indicate the MAP estimates for the calibrated parameters and their corresponding prior distributions. The frequency histograms for the calibrated parameter posterior marginals and other model factors are presented in the SM. Of note, in Table SM-A.2 we state the hypotheses adopted to fix the values of some parameters and the values adopted for the initial conditions.
Assuming that the social distancing policies are maintained during the analysis period, Figure 3 shows the model fit and predictions for the COVID-19 pandemic in BR. Figure 3a shows that the peak of P individuals occurred on simulation day 145 (95% CI: 143-146), with around 638.8 thousand (95% CI: 630.6-647.1) people simultaneously infected (hereinafter, "active cases"). By the end of the simulation, the model predicts around 149.3 thousand (95% CI: 145.1-153.6) deaths and 4.392 million (95% CI: 4.306-4.482) cumulative cases. The $R_0$ calculated from the MAP estimates for BR is 3.09, and the time evolution of $R(t)$ is displayed in Figure 3b (black line). $R(t)$ fell below 1 around simulation day 135 (95% CI: 133-136), which indicates that the disease is under control (Van den Driessche and Watmough, 2002).
We also fitted the model to the available data for RJ. Again assuming that the social distancing policy is maintained, the projections for COVID-19 infection in RJ are shown in Figure 4. The model fit for RJ is considerably worse than for BR, probably because the data are highly noisy. Nevertheless, the model captured the peak of active cases, which occurred on simulation day 115 (95% CI: 114-116) with around 32.2 thousand (95% CI: 31.7-32.7) active cases. Also, around 16.6 thousand (95% CI: 16.3-17.0) deaths and 212.5 thousand (95% CI: 207.6-217.7) cumulative cases are expected by day 225, when these numbers stabilise. $R_0$ is 2.68 for RJ, and the time evolution of $R(t)$ is displayed in Figure 4b (black line). As in the country-wide scenario, if the social distancing measures are maintained in RJ, the disease is brought under control around simulation day 105 (95% CI: 103-107), when the effective reproduction number falls below one.
We now address the sensitivity of $\mathrm{QoI}_1 := R(t_j)$ and $\mathrm{QoI}_2 := \sqrt{C(t_j)^2 + D(t_j)^2}$. SA was performed for both scenarios (BR and RJ), considering all parameters and the initial conditions $E(0)$, $A(0)$, and $I(0)$. The settings $p = 4$ and $m = 40$ were used in all experiments.
The SA results for BR are depicted in Figure 5. The sensitivity indexes of the cumulative C and D ($\mathrm{QoI}_2$) are heavily influenced by the initial conditions in the early stages of the disease propagation, although this influence decreases as time evolves. In contrast, the initial conditions are not influential on $R(t)$ ($\mathrm{QoI}_1$) at any point of the simulation. The influence of the parameter related to social distancing (ω) on both QoIs is remarkable. More importantly, its influence increases over time, which emphasises the need for care when establishing social distancing relaxation measures. The proportion of infected individuals who have symptoms (ρ) is also one of the most influential parameters on both QoIs, although it is more relevant for $\mathrm{QoI}_2$. This points to the need for a more widespread testing policy. For the RJ scenario, Figure 6 reveals very similar scores. One noteworthy difference is the ranking of $E(0)$ and $I(0)$ for $\mathrm{QoI}_2$: the former is more influential in the BR scenario (Figure 5b), while the latter has a higher score for RJ (Figure 6b). Overall, the most influential parameter is the removal rate (ω), and its impact increases as the dynamics evolve. The second most influential parameter is the proportion of infected individuals with symptoms (ρ). We remark that the latter parameter is fixed (not calibrated) because of the limited testing capacity in BR.

A study case: How would different social distancing measures affect the disease spread?
The hypothesis of a constant rate ω at which susceptible, exposed, and infected individuals are removed due to social distancing measures is an idealisation of reality, because the actual value is probably dynamic and spatially heterogeneous. However, such an idealisation is useful to analyse qualitatively the effect of different disease control mechanisms. Moreover, the SA presented in the previous section indicated that variations in the value of ω significantly affect the quantities of interest. To investigate the qualitative effects of different social distancing relaxation scenarios implemented at day $t_d$, we now consider the time-dependent removal rate $\omega(t) = \omega^{*} e^{-\lambda (t - t_d)}$, $t \geq t_d$, as defined in Equation (2). The higher the value of $\lambda = \ln 2\,(t_{1/2})^{-1}$, the more quickly social distancing measures are reduced. We then consider three cases: (i) a sudden release from social distancing, modelled as a decay with half-life $t_{1/2} = 0.1$ days; (ii) a gradual release, with half-life $t_{1/2} = 15$ days; and (iii) the current social distancing policy, with a constant removal rate, which represents the original case study. For cases (i) and (ii), we selected $t_d = 165$ days, by which time the peak of active cases has already passed. Figure 7 shows the consequences of relaxing social distancing measures in terms of the model distributions of C and D for BR at $t = 397$ days. Additional results are displayed in the SM for both the BR and RJ scenarios. Remarkably, a sudden release can increase C, D, and their uncertainties, extending the crisis due to a slower decrease in active cases (see the SM for more details). This effect can also be seen in Figures 3b and 4b, where we show the time evolution of $R(t)$ for cases (i) and (ii) in comparison with the original scenario. Both the sudden and the gradual release policies make $R(t)$ decrease far more slowly, implying slower control of the disease.
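The two release scenarios differ only in $t_{1/2}$, and a short calculation shows how different they are. With $t_{1/2} = 0.1$ days, one day after $t_d$ only $2^{-10} \approx 0.1\%$ of $\omega^{*}$ remains, so the release is effectively a step; with $t_{1/2} = 15$ days, about $2^{-1/15} \approx 95\%$ remains after one day and 25% after 30 days. The helper name below is hypothetical.

```python
import numpy as np

def remaining_fraction(days_after_td, t_half):
    """omega(t_d + days) / omega* under the exponential release of Eq. (2)."""
    lam = np.log(2.0) / t_half            # decay constant, 1/day
    return float(np.exp(-lam * days_after_td))

for label, t_half in [("sudden", 0.1), ("gradual", 15.0)]:
    print(f"{label:8s} t_1/2 = {t_half:4.1f} d, lambda = {np.log(2.0) / t_half:.3f}/d, "
          f"omega/omega* after 1 d: {remaining_fraction(1.0, t_half):.4f}, "
          f"after 30 d: {remaining_fraction(30.0, t_half):.4f}")
```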
As a direct consequence, more confirmed cases and deaths may occur, worsening the health damage to the entire population. Thus, determining when and how to start relaxing social distancing measures requires great care, as inappropriate measures can prolong the health crisis for a very long period.
We also highlight the importance of the date on which the relaxation of social distancing measures starts, using the following less favourable scenario. If social distancing measures are relaxed 20 days before the peak of active cases for BR ($t_d = 125$ days), the spread of the disease can produce a critical scenario even when a smooth and gradual release strategy (half-life $t_{1/2} = 20$ days) is adopted, causing an increase in C and D and a drastically longer pandemic period, as shown in Figure 8. In this case, the peak of active cases would occur on simulation day 154 (95% CI: 152-158) with around 664.6 thousand (95% CI: 651.5-680.8) active cases, and D is expected to be around 270.7 thousand (95% CI: 255.1-288.9) at the end of the simulation, with C around 7.976 million (95% CI: 7.542-8.473). Note, however, that C and D had not yet stabilised by the end of the simulation, reflecting the more critical scenario mentioned earlier.

Discussion and conclusions
The present paper contributes a modelling framework to investigate the spread of COVID-19 and the impacts of different ways of relaxing social distancing measures in the presence of uncertainties. We applied the developed approach to model the COVID-19 dynamics in BR and RJ. RJ, a highly populated Brazilian state, was one of the first states to adopt mitigation actions, such as the suspension of classes, cancellation of events, and home isolation (Crokidakis, 2020a; Silva et al., 2020), although it did not implement a population-wide quarantine. According to the data released by the Brazilian Ministry of Health when the results of the present work were obtained, RJ was one of the most affected states, both in the number of registered cases of COVID-19 and in the number of deaths. For this reason, the present study analysed the spread in this state, as well as in the country as a whole, to assess the particular characteristics of the pandemic at these different spatial scales. Extensive research on the spread of COVID-19 in BR from multiple perspectives has been reported (e.g., Bastos and Cajueiro, 2020; Crokidakis, 2020a, 2020b; Reis et al., 2020; Schulz et al., 2020). The present work takes into account factors that are predominant in many underdeveloped countries, such as limited testing capacity and the policy of testing only severely ill, hospitalised individuals. We hope that the present modelling framework brings some insights or guidelines to public health officials and policymakers.
Due to data paucity, parameter identifiability is a major difficulty (Massonis et al., 2020). To overcome this issue, only four model parameters (and two hyperparameters) were calibrated using a Bayesian approach. The other parameters, as well as the model initial conditions, were set based on the available information on COVID-19 (see the SM for more details). Our simulations place the peaks of active cases on July 28, 2020 (95% CI: July 26-29) and July 3, 2020 (95% CI: July 2-4) for BR and RJ, respectively. This difference may be explained by the discrepancy in spatial scale and by the way the disease has spread across the different Brazilian locations, given, for instance, the different political measures taken and the social and demographic structure of each locality (Marson and Ortega, 2020). The social distancing measures implemented in RJ at the beginning of the pandemic seemed able to flatten the epidemic curve and postpone the peak of active cases (Crokidakis, 2020a), even though the RJ peak still occurred 25 days before the national peak. However, as in the country as a whole, these measures were not maintained for long, so the disease continued to spread, which may explain the growing numbers of cases and deaths. Indeed, since September 2020, when the results of the present paper were obtained, the spread of the disease has taken a different course. At the beginning of 2021, faced with the difficulty of maintaining mitigation measures, delays in vaccination, and the emergence of new SARS-CoV-2 variants, the Brazilian population witnessed an unprecedented escalation of the disease. These developments should be taken into account in future studies of the current state of the COVID-19 dynamics in BR.
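The 25-day gap between the two forecast peaks follows directly from the calendar dates quoted above. A quick check with Python's standard `datetime` module, using only the dates given in the text:

```python
from datetime import date

# Forecast peak dates of active cases quoted in the text.
peak_br = date(2020, 7, 28)  # Brazil (BR)
peak_rj = date(2020, 7, 3)   # Rio de Janeiro state (RJ)

gap_days = (peak_br - peak_rj).days
print(f"RJ peak precedes BR peak by {gap_days} days")  # RJ peak precedes BR peak by 25 days
```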
To evaluate the influence of uncertainties in hard-to-track populations, such as undiagnosed infected individuals (I and A) and those who carry the disease but are not yet able to transmit it (E), we performed a global SA to identify which model factors (parameters and initial conditions) play important roles at the various stages of the initial phase of the COVID-19 pandemic, for both BR and RJ. The analysis confirms that a proper understanding of how the disease spreads can provide insights that aid in designing containment measures to reduce R(t). In this sense, for both BR and RJ, the SA suggests that the most influential parameter in the long term is the removal rate parameter (ω).
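This section does not state which global SA estimator underlies these conclusions. Purely as a generic illustration of variance-based global SA, the sketch below estimates first-order Sobol indices with the Saltelli (2010) pick-freeze estimator. The function `toy_model` is a hypothetical stand-in for a model output such as D at a given day, and the inputs are assumed independent and uniform on [0, 1]; none of this reproduces the paper's actual SA setup.

```python
import random
import statistics


def toy_model(x1: float, x2: float) -> float:
    """Hypothetical stand-in for a model output (e.g. deaths at day T)."""
    return x1 + 2.0 * x2


def first_order_sobol(model, n_inputs: int, n_samples: int, seed: int = 0):
    """Saltelli (2010) pick-freeze estimator of first-order Sobol indices,
    assuming inputs independently uniform on [0, 1]."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    fA = [model(*row) for row in A]
    fB = [model(*row) for row in B]
    var_y = statistics.pvariance(fA + fB)  # total output variance
    indices = []
    for i in range(n_inputs):
        # A_B^i: rows of A with column i taken from the matching row of B
        fABi = [model(*(a[:i] + [b[i]] + a[i + 1:])) for a, b in zip(A, B)]
        v_i = sum(fb * (fabi - fa) for fb, fabi, fa in zip(fB, fABi, fA)) / n_samples
        indices.append(v_i / var_y)  # share of output variance due to input i alone
    return indices


S = first_order_sobol(toy_model, n_inputs=2, n_samples=50000)
print([round(s, 2) for s in S])  # analytical values for this toy model: S1 = 0.2, S2 = 0.8
```

For the additive toy model, Var(Y) = 1/12 + 4/12 = 5/12, so the exact indices are 1/5 and 4/5. A real application would replace `toy_model` with a full model run and sample each parameter from its own prior range.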
We have shown that the rate at which S, E, I, and A individuals are removed due to social distancing measures significantly affects R(t), C(t), and D(t). To study the qualitative effects of changes in social distancing policies, we modelled ω as a time-dependent exponential decay function (Equation (2)), which can represent different social distancing relaxation scenarios. Our analysis showed that a more abrupt relaxation of social distancing implemented after the peak of active cases is accompanied by a longer epidemic, with an increase of approximately 29% in, and much higher uncertainty about, the projected numbers of C and D at the end of the simulation. If relaxation is implemented before the peak, the consequences can be devastating, as indicated by our results. The hypothetical scenario with a slow and gradual release starting 20 days before the original peak indicates a delay of 9 days in the peak of active cases (simulation day 154 instead of day 145), with more than 7.9 million confirmed cases and about 271 thousand deaths accumulated over less than 13 months of the spread of the disease in BR. Our simulations highlight the importance of relaxing social distancing measures only under very careful monitoring.
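To illustrate qualitatively why the timing of relaxation matters, the sketch below integrates a deliberately simplified, hypothetical SEIRD-type model (no asymptomatic class A, forward Euler, made-up rates and population size) in which distancing removes S, E and I at rate ω(t). It reuses an assumed exponential-decay form for ω(t) that may differ from the paper's Equation (2); `simulate`, `omega` and all numeric values are illustrative assumptions, not the calibrated model.

```python
import math


def omega(t, omega0, t_d, t_half):
    """Hypothetical removal rate: constant omega0 until t_d, then exponential
    decay with half-life t_half. t_d=None means no relaxation at all."""
    if t_d is None or t < t_d:
        return omega0
    return omega0 * math.exp(-math.log(2.0) / t_half * (t - t_d))


def simulate(t_d, t_half=20.0, omega0=0.01, days=365, dt=0.1):
    """Forward-Euler integration of a simplified, hypothetical SEIRD model in
    which distancing removes S, E and I at rate omega(t). Recovered and
    distancing-removed individuals are not tracked explicitly.
    Returns (cumulative infections, cumulative deaths) at the final time."""
    N = 1_000_000.0
    beta, sigma, gamma, mu = 0.4, 1 / 5, 1 / 10, 0.002  # illustrative rates, 1/day
    S, E, I, D = N - 100.0, 0.0, 100.0, 0.0
    cum_inf = 100.0  # the 100 initially infectious individuals count as infections
    for step in range(round(days / dt)):
        w = omega(step * dt, omega0, t_d, t_half)
        new_exposed = beta * S * I / N  # infection flux S -> E
        new_infectious = sigma * E       # progression flux E -> I
        dS = -new_exposed - w * S
        dE = new_exposed - new_infectious - w * E
        dI = new_infectious - (gamma + mu + w) * I
        D += dt * mu * I                 # deaths use I from the current step
        cum_inf += dt * new_infectious
        S, E, I = S + dt * dS, E + dt * dE, I + dt * dI
    return cum_inf, D


scenarios = {"no relaxation": None, "relax from day 160": 160.0, "relax from day 80": 80.0}
for label, t_d in scenarios.items():
    infections, deaths = simulate(t_d)
    print(f"{label:>20}: infections = {infections:,.0f}, deaths = {deaths:,.0f}")
```

Because an earlier start keeps ω lower at every later time, relaxing from day 80 produces more cumulative infections and deaths than relaxing from day 160 in this toy setting. The absolute numbers depend entirely on the made-up rates and should not be compared with the results reported above.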
We note that the analysis performed in this paper should be viewed from a qualitative perspective. The conclusions for the hypothetical scenarios considered (with and without social distancing relaxation) are based on the data and knowledge available when the results were obtained. Model simplifications and the calibration procedure for the model parameters can explain potential quantitative discrepancies between our simulations and the observed data. Such simplifications include: (i) homogenisation of age, social, and spatial structure; (ii) fixing some model parameters at values obtained from the literature; (iii) keeping model parameters constant in time (except ω, when investigating the impacts of relaxing social distancing measures); and (iv) use of raw data. Moreover, the available data carry limited information due to the under-reporting of infected cases, since most cases reported in BR are of hospitalised individuals. These simplifications are inherent to the modelling procedure and should be viewed as part of the scientific process of understanding natural phenomena. Nevertheless, our framework could readily be employed at other locations (in Brazil or in other countries), because the general type of model we employed allows the modeller to re-estimate the parameters and adapt them to the specific case study (Oliveira et al., 2021).
Using the same modelling framework, the present work can be extended to consider additional phenomena such as spatial heterogeneity, parameters that vary in time and space, and under-reporting of actual infected and dead cases, among others. The effects of noise in the data can be mitigated through regularisation procedures, which have the potential to improve model calibration and forecasting (e.g., Abry et al. (2020), Velásquez and Lara (2020), and Zeroual et al. (2020)).
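The regularisation approaches cited above are considerably more sophisticated than plain smoothing. As a much simpler and commonly used first step for noisy daily counts (for example, weekly reporting dips), a centred moving average can be applied before calibration. The sketch below is a generic illustration with made-up data and a hypothetical helper `centred_moving_average`; it is not the method of the cited works.

```python
def centred_moving_average(values, window=7):
    """Centred moving average. The window shrinks at the series edges so the
    output has the same length as the input. `window` must be odd."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        segment = values[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Made-up daily counts with a weekly reporting dip (e.g. fewer reports at weekends).
daily = [120, 130, 125, 140, 135, 60, 55, 150, 160, 155, 170, 165, 70, 65]
print([round(x, 1) for x in centred_moving_average(daily)])
```

A 7-day window averages over one full weekly reporting cycle, which removes most of the weekday pattern at the cost of blurring sharp changes and of less reliable values near the series edges.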
It is important to highlight the impacts of social distancing relaxation on the control of the COVID-19 pandemic. In the investigated scenarios in which the relaxation of social distancing measures was implemented after the peak of infection, our results indicate that R(t) may subsequently decrease very slowly or even stagnate. A direct consequence is that the disease would need more time to be eradicated. Our analyses suggest that policies based on short-term social distancing are not enough to control the evolution of the pandemic, in line with studies of other locations (e.g., Pei et al. (2020)). Using data from Brazil at the beginning of the pandemic, Bastos and Cajueiro (2020) predicted an excessive number of infection cases when social distancing policies were relaxed before an estimated optimal date. Some authors argue that longer or even intermittent social distancing will be necessary to avoid recurrent outbreaks. Specifically, Kissler et al. (2020) examined a range of likely virus transmission scenarios until 2025 and assessed non-pharmaceutical interventions to mitigate the outbreak. They concluded that if the new coronavirus behaves in the same manner as similar viruses, the disease can be expected to return in the coming years, depending on the level and duration of immunity, an aspect that remains to be clarified. Extensive vaccination and the discovery of new treatments, together with extensive testing, could alleviate the need for severe social distancing measures to control the disease. Until then, the need to maintain social distancing measures, even intermittently, must be carefully addressed. It is noteworthy that reducing community transmission is crucial to prevent the emergence of new SARS-CoV-2 variants, and high levels of vaccination on a worldwide scale must be pursued. The present work may also be extended to design optimised vaccination strategies, as proposed in Libotte et al. (2020).
Another potential extension of the present paper is to consider the economic impact of the pandemic under the different social distancing strategies adopted by different countries. Our proposed model could be extended to couple the epidemiological and economic aspects, thereby providing a more complete perspective on the effects of the COVID-19 pandemic (Wang, 2020).