Evaluating the impact of COVID-19 on traffic congestion and safety skills using structural equation modeling (SEM) and Auto-Regressive Integrated Moving Average (ARIMA)

Abstract The current work presents a comparative analysis of traffic demand and driver safety skills before and after the control measures imposed during the COVID-19 epidemic, derives time-series change curves, and constructs a prediction model after determining the trend of traffic demand over time. From a data-analysis perspective, the paper draws some interesting conclusions about long-span, coarsely sampled studies. In terms of the study population, the paper focuses on the specificity of the global epidemic. Kuwait was selected as a case study. Traffic demand was analyzed using a Structural Equation Model (SEM) and an Auto-Regressive Integrated Moving Average (ARIMA) model, together with a safety-skills questionnaire, flow charts, and demographic variables. These methods were used to study the impact of COVID-19 on traffic congestion and safety skills, as well as to forecast future traffic volumes. Results showed that traffic congestion fell significantly during COVID-19 as a result of the preventive safety measures taken to control the spread of the virus. The reduced traffic volume was associated with a decrease in traffic violations and an increase in the safety skills and PM skills of drivers.


Introduction
The global outbreak of COVID-19 had devastating consequences for all countries worldwide. The first case of coronavirus disease was confirmed on 31st December 2019 in Wuhan, China. Within the first three months after the first case was confirmed, the disease was declared a pandemic, as millions of people had been affected and the number of deaths in countries like Italy, Spain, and the United States was rising exponentially. Therefore, governmental agencies all over the world implemented strategies to reduce the spread of the disease. These included banning public transportation and air transport, closing schools and other training institutions, closing shopping malls, and implementing curfews and partial lockdowns during specific hours, especially at night (National Bank of Kuwait, 2020). Although the strategies were effective and the rate of infection was significantly reduced, traffic flow regimes were substantially affected, with consequences for traffic accidents and congestion.
There are several models to quantify whether an area can be termed congested or not (Caves, 2004). Historically, the thresholds used to quantify congestion were space mean speed and free-flow speed: a congested area is one whose space mean speed is below 60-75% of the free-flow speed (Brennan et al., 2015; 2018; Gong & Fan, 2017; Remias et al., 2014). This model was abandoned as it was considered inefficient; it lacked sensitivity to land use, road geometry, speed limit, and other important factors needed to give accurate results, and was therefore termed a static-threshold model. A new approach was proposed that measures a congestion index, which has since been combined with Travel Time Reliability (TTR) measures to generate reliable thresholds for freeways and roads. Researchers (Bhouri & Kauppila, 2011; Dowling et al., 2015; Duddu et al., 2018; Martchouk et al., 2011; Mathew et al., 2020; Mehran & Nakamura, 2009) used TTR to monitor road congestion and to analyze the strategies used to reduce it; overall, they considered it their most preferred measure for studying road congestion.
With accurate means in place to measure the level of congestion before and during the COVID-19 pandemic, the literature gives insight into the change in congestion patterns. A few studies showed a direct link between traffic volume and road congestion. Traffic volume in South Korea was measured between January and March of 2020 and 2019 by Liu et al. (2020): traffic volume dropped by 17.3% as early as when the first case was confirmed, and the reduction grew from 23% to 26% in the following weeks as more COVID-19 cases were detected. In Somerville, Hudda et al. (2020) studied the traffic volumes of cars and trucks on I-93 between 27 March 2020 and 14 May 2020, and noticed that the suspension of various activities caused the traffic volume in this area to be almost half of what it had been in previous years. Another study, in Italy, showed a change in traffic patterns observed around three university campuses. Favale et al. (2020) carried out the study when two of the universities faced a total closure and the third was under partial closure; they observed a 90% reduction in traffic at the two fully closed universities, while the third still experienced a considerable amount of traffic.
Even though considerable reductions in traffic jams were noticed worldwide, road accidents continued to be a dominant threat: although their frequency fell noticeably, their severity levels increased seriously (Sekadakis et al., 2021; Vanlaar et al., 2021). This is mainly related to overspeeding (Qureshi et al., 2020) and improper driving behaviour during the pandemic (Vingilis et al., 2020). Shaik and Ahmed (2022) provided a concise overview of the effect of the pandemic on travel behavior and road safety in different countries worldwide.
Many modeling techniques in the literature were used to analyze traffic demand and road safety prior to the COVID-19 pandemic. Among these are Cellular Automata for traffic-congestion analysis (Larraga & Alvarez-Icaza, 2014), Auto-Regressive Integrated Moving Average (ARIMA) for predicting traffic accidents (Ghedira et al., 2018; Hassouna & Al-Sahili, 2020; J. Zhang et al., 2007) and traffic flow (Dong et al., 2009), k-NN for short-term traffic forecasting (Paul et al., 2017; L. Zhang et al., 2013), and deep-learning models for traffic-flow prediction (Essien et al., 2021). However, very few models were used to analyze traffic demand and highway safety during and after the pandemic. The current work fills this gap in the literature by utilizing Structural Equation Modeling (SEM) and ARIMA to evaluate the impact of COVID-19 on traffic congestion and safety skills in developing countries. The location of this work is also unique, namely Kuwait, for which there is no study in the literature covering this area despite its strategic location in the Gulf Cooperation Council (GCC) region (Jamal et al., 2019). This work presents a comparative analysis of traffic demand before and after control measures during the epidemic, acquires time-series change-data curves, and constructs a prediction model after determining the trend of traffic demand over time. From a data-analysis perspective, the paper draws some interesting conclusions about long-span, coarsely sampled studies. In terms of the study population, the paper focuses on the specificity of the global epidemic.

Study area and data collection
The study was conducted on Dasman intersection, located in Kuwait City, which joins three roads: Jabaer Al-Mubarak Street, Jassim Mohammad Albahar Street, and East Magwa Road. Figure 1(a) shows the exact location from where the time series data was collected by AlShamlan International company using inductive loops and Figure 1(b) shows a 2D location map of the intersection.
Traffic counts were collected by the AlShamlan International Trading contracting company (W.L.L) using inductive loops. Time-series data were obtained for each road joining the intersection from January 2019 until April 2021, covering the two periods before and during COVID-19.

Traffic congestion study
The first objective of the study was analyzed through time-series data of vehicle counts throughout the day from all roads joining the intersection. The traffic count of vehicles was observed for every hour of the day. Three specific days were chosen eleven days before the first case of COVID-19 was discovered in Kuwait to establish a baseline of traffic flow before COVID-19. Likewise, three specific days were chosen after the discovery of COVID-19, within a span of a month during which different measures were introduced to prevent the spread of COVID-19.
The obtained traffic-count dataset was compiled and arranged in two separate Excel spreadsheets. All of the analysis and reporting of results, based on one dataset at a time, was carried out in IBM SPSS version 26. The data was cleaned and tabulated on a per-hour basis throughout the day for the six chosen days, three of which fall before COVID-19 and the remaining three during COVID-19, when preventive measures were in force. Table 1 summarizes the details of the six days chosen for the study.
Next, in order to visualize the impact of the various preventive measures, the bar chart of Figure 2(a) was constructed to show the overall traffic demand for all six chosen days. Similarly, Figure 2(b) summarizes the overall traffic flow of the AM and PM peak hours and identifies the morning and evening peak hour, respectively. A clustered bar chart (Figure 2(c)) was then constructed summarizing the traffic flow in these morning and evening peak periods. Further, it is natural to expect that the traffic-demand trend should decline following the implementation of the measures to stop the spread of COVID-19. This supposition was investigated through Figure 2(d), which shows the hourly relative distributions for each day, obtained by taking the ratio of the hourly demand to the daily demand for each day.
Finally, the impact of the preventive measures taken to control the spread of COVID-19 in Kuwait was quantified. Table 2 summarizes the traffic-demand volume on all three studied roads. It also compares the reduction in traffic demand on the COVID-19 days, after certain preventive measures had been taken, with the baseline data, which is the average of the three days chosen before COVID-19 took over. The differences in traffic flow so obtained were tested for statistical significance at the 5% level to validate the results. Moreover, a similar summary of the reduction in traffic demand was obtained for the AM and PM peak hours, as shown in Table 2, also at the 5% significance level.
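The significance test described above can be sketched as a paired t-test on matched hourly counts. The following Python sketch uses synthetic placeholder numbers (not the paper's data; the paper's analysis was done in SPSS), and compares the statistic against the tabulated two-tailed 5% critical value for 23 degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical hourly counts for one day (24 values); placeholders,
# not the paper's data.
pre_covid = rng.normal(1200, 150, 24)                     # baseline hourly demand
during_covid = pre_covid * 0.55 + rng.normal(0, 50, 24)   # reduced demand

d = pre_covid - during_covid                              # paired hourly differences
t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(d.size))

# Two-tailed critical value of Student's t for df = 23 at the 5% level
t_crit = 2.069
significant = abs(t_stat) > t_crit
print(f"t = {t_stat:.2f}, significant at 5%: {significant}")
```

With a demand reduction of this magnitude, the paired differences dwarf their standard error, so the reduction is declared significant at the 5% level.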

Forecasting traffic volume using an Auto-Regressive Integrated Moving Average (ARIMA) model
• Preparatory Procedure for Forecast

Data import from Excel
The workflow towards obtaining a forecast of the traffic volume is briefly explained by the flowchart shown in Figure 3. We begin by noting that the discrete traffic-volume data is available at a sampling rate of one sample per 5 min for the East-Maqua Road and the Jaber-Al-Mubarak Road; these two roads have three lanes each. However, the sampling rate for the Inside Road is one sample per 60 min (lane-wise data is not available for the Inside Road; only the total traffic volume is given). In all three cases, the data spans a whole week. Accordingly, there are exactly 7 × 24 × 60 ÷ 5 = 2016 data-points in the data-series for the East-Maqua and Jaber-Al-Mubarak roads, whereas there are 7 × 24 × 60 ÷ 60 = 168 data-points in the data-series of the Inside Road (Jassim Al-Bahar Road).
The three raw-data series are denoted g_em^5, g_jm^5, and g_in^60, where the subscripts identify the road and the superscripts denote the sampling rate in minutes. It is emphasized that all three raw data-series span a period of 28 months, but contain information for only one single week of every month therein.
To import the data into Matlab, a separate data-import routine is written for each of the three roads. The lane-wise data-series for the East-Maqua Road and the Jaber-Al-Mubarak Road are vectorially added after the raw import to obtain the net traffic volume on these roads. Once the data is imported into the Matlab workspace, a .mat file is written to disk so that the information can be accessed quickly at a later point in time. This eliminates the need to execute the data-import routines at every call within the main program; the main program (Program_Main.m) simply loads the .mat files into memory at the beginning of every execution. The .mat file format (native to the Matlab kernel) is the fastest way to save and retrieve information in the Matlab workspace.
It is noted that the data-series g_em^5, g_jm^5, and g_in^60 are arranged as matrices, with the rows denoting the traffic volume over the span of a week monitored at the given sampling rate, and the columns representing the months. Thus, the data-series g_em^5 and g_jm^5 are matrices of size 2016 × 28, whereas the data-series g_in^60 is a matrix of size 168 × 28. The week-long discretized data for any month can be easily extracted by indexing the columns of the matrices. For example, the week-long traffic-volume data on the Jaber-Al-Mubarak road for the month of March 2020, g_jm^5(Mar20), containing 2016 entries, is accessed simply by indexing the 15th column of the matrix g_jm^5. The data import is carried out by the Matlab subroutine Data_Import_Main.m. This subroutine is called from the Excel-file database of each particular road, and is therefore not included in the main directory. The MAT files created by the data-import routines are copied to the main directory, where they are subsequently loaded into workspace memory via Program_Main.m.
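The matrix layout and column indexing described above can be sketched in Python/NumPy (the paper's implementation is in MATLAB; the values below are placeholders standing in for the imported counts):

```python
import numpy as np

POINTS_PER_WEEK = 7 * 24 * 60 // 5   # 2016 five-minute samples in a week
N_MONTHS = 28                        # Jan 2019 .. Apr 2021

# Placeholder volumes standing in for the imported counts g_jm^5
g_jm_5 = np.arange(POINTS_PER_WEEK * N_MONTHS).reshape(POINTS_PER_WEEK, N_MONTHS)

# March 2020 is the 15th month of the series (1-based), i.e. column index 14
mar20 = g_jm_5[:, 14]
print(mar20.shape)   # (2016,)
```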

Filling omissions in the inside road data
The data-import shows that the Inside Road data g_in^60 contains conspicuous omissions for the months of July and August 2020; the data series is identically zero for these two months. Such a significant omission of data can cause an erroneous forecast, hence the omission must be remedied by physical intuition coupled with statistical methods. The physical intuition refers to the conjecture that the data for the month of July should roughly resemble that for the month of June, whereas the data for August should resemble the data for September. In the absence of other relevant information, this seems a reasonable assumption to make. However, the data-series for consecutive months cannot be exactly alike; hence, we resort to statistical measures. The data-series for the months of July and August are obtained by adding Gaussian white noise (zero-mean, standard-normally distributed error) respectively to the June and September data-series, with the amplitude of the white noise set to 50% of the standard deviation, and the result rounded to the nearest integer. Mathematically, this is given by Equations 1 and 2:

g_in^60(Jul20) = round(g_in^60(Jun20) + 0.5 σ X(t))   (1)

g_in^60(Aug20) = round(g_in^60(Sep20) + 0.5 σ X(t))   (2)

where round(.) denotes the nearest-integer function (rounding half up), σ is the standard deviation of the reference series, and X(t) is the white noise. Once the data omission in the Inside Road data-series is filled, the data is segregated into respective months and years and plotted on a uniform scale, so that the monthly variation is properly visualized. The imported, omission-free weekly data for the East-Maqua, Jaber-Al-Mubarak and Jassim Al-Bahar roads is shown in Figures 4-6, respectively. The procedure outlined in this section is carried out by the Matlab subroutine Fill_Omissions_RTV60_IN.m. The raw-data plotting commands are contained in the subroutine Plot_Original_Data.m.
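The omission-filling step can be sketched in Python/NumPy; the June series below is a synthetic placeholder for g_in^60(Jun20), and half-up rounding is implemented as floor(x + 0.5):

```python
import numpy as np

rng = np.random.default_rng(1)
hours_per_week = 7 * 24                 # 168 hourly samples in a week

# Placeholder June series standing in for g_in^60(Jun20)
june = rng.integers(50, 300, hours_per_week).astype(float)

# Fill July by adding Gaussian white noise at 50% of the standard
# deviation of the reference series, then rounding half up
noise = rng.standard_normal(hours_per_week)
july = np.floor(june + 0.5 * june.std() * noise + 0.5)

print(july[:5])
```

The August series is filled the same way from the September data.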

Up sampling the inside road data (Jassim Al-Bahar Road)
As outlined in the previous sections, the data-series g_em^5, g_jm^5, and g_in^60 are not of the same size, since the latter is sampled at the much lower rate of one sample per 60 min. To obtain the net traffic volume at the junction of the three roads, the three series must be vectorially added. To this end, the data-series g_in^60 must first be resampled at the higher rate of one sample per 5 min, i.e. up-sampled. This can be accomplished crudely via a simple linear/spline/cubic interpolation. However, the interpolated data will not have the noise that characterizes a time-series of the higher sampling rate. Hence, Gaussian white noise (zero-mean, standard-normally distributed error) is added to the interpolated data to achieve a realistic up-sampled time-series. In this case, the amplitude of the white noise is somewhat arbitrarily set to 25% of the standard deviation, to achieve a level of noise similar to that present in the series g_em^5 and g_jm^5. Finally, the data series is rounded to the nearest integer.
For the purpose of interpolation, Matlab's interp1 command (1-d gridded interpolant) is used with the 'pchip' interpolation algorithm. The pchip interpolant is one of the family of shape-preserving piecewise cubic Hermite interpolants. The interpolated value at a query point is based on a shape-preserving piecewise cubic interpolation of the values at neighboring grid points; the resulting interpolation is shape-preserving, C^1 continuous (with continuous first-order derivatives), and visually appealing. Mathematically, the up-sampling procedure is given by Equation 3:

g_in^5(t) = round(interp(g_in^60, 12) + 0.25 σ X(t))   (3)

where interp(x, k) is the piecewise cubic Hermite interpolation of the series x at query points that are k times denser than the sampling rate of x. As before, round(.) denotes the nearest-integer function (rounding half up) and X(t) is the white noise. The up-sampled and intentionally white-noise-corrupted data g_in^5 for the Inside Road is shown in Figure 7. The up-sampling procedure outlined in this section is encoded in the Matlab subroutine Resample_RTV60_IN.m.
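A Python sketch of the same up-sampling step, using SciPy's shape-preserving pchip interpolant in place of Matlab's interp1(..., 'pchip'); the hourly series is a synthetic placeholder, and the last few 5-min points past the final hourly sample are extrapolated:

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

rng = np.random.default_rng(2)
t60 = np.arange(168)                       # hourly sample times over one week
g_in_60 = rng.integers(50, 300, 168).astype(float)

# Query points 12x denser: 2016 five-minute samples
t5 = np.arange(2016) / 12.0                # extends slightly past t60[-1]
interp = PchipInterpolator(t60, g_in_60)   # shape-preserving, C1 interpolant
upsampled = interp(t5)

# Add white noise at 25% of the series' standard deviation, then round half up
g_in_5 = np.floor(upsampled + 0.25 * g_in_60.std() * rng.standard_normal(t5.size) + 0.5)
print(g_in_5.size)
```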

Net traffic volume at the junction
The matrices g_em^5, g_jm^5 and g_in^5 representing the traffic volume on the three roads are now of the same size and hence can be vectorially added. The addition gives the net traffic volume at the junction as a discretized time-series at a sampling rate of one sample per 5 min, spanning a period of 28 months, with each month containing the discretized data for only one full week. The net traffic volume is given by Equation 4:

g_jun^5 = g_em^5 + g_jm^5 + g_in^5   (4)

The net traffic volume g_jun^5 is plotted on a uniform scale in Figure 8. Note that the y-scale of Figure 8 is 400 vehicles, whereas that of the earlier figures is 250 vehicles. The plotting commands are encoded in the Matlab subroutine Plot_Summed_Junction_Data.m.
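Equation 4 is an elementwise sum of equally sized matrices; a minimal Python/NumPy sketch with placeholder matrices:

```python
import numpy as np

rng = np.random.default_rng(3)
shape = (2016, 28)            # five-minute samples per week x months

# Placeholder road matrices standing in for g_em^5, g_jm^5, g_in^5
g_em_5 = rng.integers(0, 200, shape)
g_jm_5 = rng.integers(0, 200, shape)
g_in_5 = rng.integers(0, 200, shape)

# Equation (4): elementwise sum gives the net junction volume
g_jun_5 = g_em_5 + g_jm_5 + g_in_5
print(g_jun_5.shape)   # (2016, 28)
```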

Filling data omissions in the monthly data

The time (x) axis
The data-series g_jun^5 is not a single unbroken time series, but rather a collection of week-long observations for each month over a total period of 28 months. It is known that for any given week in any given month, the observations begin at 12:05 am Monday and end at 12:00 am Sunday. For brevity, it is assumed that the weekly observations begin on the first Monday of each month. It may also be assumed, without loss of generality, that the weekly observations repeat throughout that month. The task considered in this section is to fill up the rest of the time-period in each month (before 12:05 am on the first Monday and after 12:00 am on the first Sunday) based on the available week-long observations. The difficulty of the task is compounded by the fact that the rearranged data must be consistent with the 5-min timestamps for each month, and the month-long observations stitched together must in turn be consistent with the timestamps of each year. If the timestamps do not match, plotting along a date-time axis would be erroneous. It must be noted that this is neither a mathematical nor a statistical process, but rather an algorithmic one. To this end, Matlab's in-built date-time repository function calendarDuration is employed. The algorithmic implementation consists of computing the number of days in each month in excess of 28 days (denoted by the variable nExdays in the program). The 28-day span can be obtained by a four-fold repetition of the week-long observations. Based on the position of the first Monday in each month, the 28-day (four-week) block of observations is positioned, and the excess days are filled from the weekly data.
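A simplified Python sketch of the month-stitching step, using the standard-library calendar module in place of Matlab's date-time functions. It tiles the weekly series four-fold and fills the excess days from the start of the week; alignment to the first Monday of the month is omitted for brevity:

```python
import calendar
import numpy as np

def month_series(week, year, month):
    """Tile a week-long series to cover a whole month, keeping the
    result consistent with that month's actual length."""
    n_days = calendar.monthrange(year, month)[1]   # days in the month
    n_exdays = n_days - 28                         # days beyond four weeks
    samples_per_day = week.size // 7
    # four-fold repetition of the weekly data covers the first 28 days ...
    month_data = np.tile(week, 4)
    # ... and the excess days reuse the first n_exdays days of the week
    month_data = np.concatenate([month_data, week[: n_exdays * samples_per_day]])
    return month_data

week = np.arange(2016)                 # placeholder weekly 5-min series
mar = month_series(week, 2020, 3)      # March has 31 days -> 3 excess days
print(mar.size)                        # 8928 == 31 * 288
```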

Discrete time-date data.
The imported data is a discretized time-series. To seamlessly encode the time information in the data-series, Matlab's powerful family of functions representing discrete time-date data are used, namely datetime and datenum. This facilitates seamless zooming in on the time-series plots (up to seconds and beyond). For this purpose, a date-time vector denoted TIMEX is created that bears a unique (one-to-one) correspondence with the data-series y^5(t). In the absence of the coupled time information encoded in the variable TIMEX, the x-axis limits and grid for each plot would have to be created separately by specifying the x-y correspondence during each call to plot, and the correspondence would be lost at every zoom action.
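An equivalent date-time axis can be sketched in Python with NumPy's datetime64 type (the paper uses Matlab's datetime/datenum); the start date below is a hypothetical first Monday:

```python
import numpy as np

# A 5-minute date-time axis spanning one week, analogous to TIMEX
start = np.datetime64("2019-01-07T00:05")   # a hypothetical first Monday, 12:05 am
step = np.timedelta64(5, "m")
timex = np.arange(start, start + 2016 * step, step)

print(timex.size, timex[0], timex[-1])
```

Pairing such a vector with the data series gives every sample an absolute timestamp, so plots can be zoomed along a true calendar axis.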

Data smoothing with Hanning MA filter
The data series y^5(t) is not directly tractable in a forecasting environment, due to (a) the densely packed and (b) highly variable data. This is in turn because of the very high-fidelity sampling rate. A forecasting paradigm works by estimating the coefficients of a polynomial model that best fits the available data, so that this polynomial fit can then be extrapolated into the future to obtain a statistically reliable forecast. However, a disproportionately high-fidelity data-series creates two problems:
1. First, the estimation of the model coefficients becomes extremely computationally expensive due to the large sizes of the matrices involved in the solution. Even if a linear least-squares approach is adopted, the coefficient estimation involves solving a linear matrix problem of the type Ax = b. Unlike the large but sparse, banded matrices that arise in the Finite Element Method, the matrix A in the numerical minimization of the least-squares error is fully populated. Direct inversion or iterative solution of fully populated matrices of order 10^5 is beyond the scope of desktop computers, even with state-of-the-art memory.
2. Second, a forecast based on extremely high-fidelity data cannot offer any more insight than the original data itself. As shown in Figure 14, the original high-fidelity data y^5(t) (shown in blue) does not reveal any distinct trend in the time-series; the trend is buried in the high-frequency error oscillations about the non-stationary mean. A forecast with such data, even if it could be computed with the aid of super-computing clusters, would still hide the trend under the same amount of noise, which defeats the purpose of the forecast.
Hence, it is important to first extract the trend in the high-fidelity data, and then resample the trend at a much lower rate (the resampling is discussed in the next section), so that the forecasting effort is both (a) computationally efficient and (b) worthwhile in terms of the clear insights generated by the forecast. Smoothing can be done by various data-smoothing algorithms. A commonly used approach is the so-called 'moving average' (MA) window; see Figure 9 for the schematic. The basic idea behind MA window smoothing is to extract a weighted average of the data-points appearing in the window, as the window itself 'glides' through the data one data-point at a time.
Classic MA window smoothing always introduces an undesired lag between the original data-series and the smoothed data-series; the severity of the lag depends on the number of data-points appearing in the MA window (referred to as the 'window size'). Hence, a symmetric MA smoother was employed, which essentially de-lags the smoothed series by prepending and appending half-window-size samples drawn respectively from the data at the end and the beginning of the series. It is also helpful to limit the window size to an odd integer, to aid in obtaining a symmetric, de-lagged moving average, although this is not mandatory. A symmetric, centered, weighted moving average ȳ(t) is given by Equation 5:

ȳ(t) = Σ_{n=−S'}^{S'} w[n] y(t + n)   (5)

where w[n] is the chosen MA window function, and S' is the half-span defined by Equation 6:

S' = (S − 1)/2   (6)

where S is the (odd) window size. Several types of weighted averages w[n] exist within the general framework of MA window smoothing. A high-accuracy yet computationally efficient type of MA window is the so-called Hanning window (also known variously as the Hann window, Raised-Cosine window, Hann filter, or von-Hann filter), which, in its normalized centered form, is expressed as Equation 7:

w[n] = (1/W) · 0.5 (1 + cos(πn/S')),  n = −S', …, S'   (7)

where W is a normalization constant chosen so that the weights sum to unity.
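The symmetric, de-lagged Hanning MA smoother of Equations 5-7 can be sketched in Python/NumPy; the test signal below is a synthetic noisy sinusoid, and the wrap-around padding mirrors the prepend/append step described above:

```python
import numpy as np

def hann_smooth(y, window):
    """Symmetric, zero-lag Hanning moving average (window must be odd)."""
    assert window % 2 == 1
    s = (window - 1) // 2                        # half-span S'
    n = np.arange(-s, s + 1)
    w = 0.5 * (1 + np.cos(np.pi * n / s))        # centered Hann weights
    w /= w.sum()                                 # normalize to unit gain
    # prepend samples from the end and append samples from the beginning
    # so the smoothed output is de-lagged and the same length as the input
    padded = np.concatenate([y[-s:], y, y[:s]])
    return np.convolve(padded, w, mode="valid")

rng = np.random.default_rng(4)
t = np.linspace(0, 4 * np.pi, 500)
y = np.sin(t) + 0.3 * rng.standard_normal(t.size)
smoothed = hann_smooth(y, 21)
print(smoothed.size)   # 500, same length as the input
```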
The Hanning MA window smoother described above is encoded in the Matlab subroutine Apply_Hanning_MA_Window.m. The subroutine is in turn called by another subroutine, Get_and_Plot_Smoothed_TV_Data.m, which allows specifying an arbitrarily large (but odd) integer value of the MA window size and superimposes the smoothed data-series upon the original data-series. Three different levels of smoothing (obtained by using three different Hanning MA window sizes) are shown in Figures 10-12 at three different levels of zoom. In each figure, the Hanning MA window size varies from 145 data-points (≈12-h window) to 289 data-points (≈24-h window) and 1009 data-points (≈84-h, or biweekly, window).
It is clear from Figure 10 and its zoomed counterparts (Figures 11 and 12) that although the smaller window sizes perform significant smoothing at the zoomed-in scales, only the biweekly (84-h) window reveals a sufficiently resolved trend-line on the larger, zoomed-out scale of 28 months. Hence the smoothed data-series ȳ(t) obtained with the Hann window of size 1009 is chosen for further analysis and forecast.

Down sampling with Chebyshev low-pass filter
The smoothed data-series ȳ^5(t) obtained from the Hanning filter in the previous section is sufficiently smooth to reveal a trend in the data, but it is still sampled at the same high rate of one sample per 5 min. Such a high sampling rate is not useful for forecasting for reasons of computational efficiency (the high noise in the series having already been dealt with effectively by the MA smoother). Therefore, the smoothed series must be resampled at a much lower rate (that is, down-sampled) that preserves the essential information present in the original high-rate series but dispenses with the unnecessarily densely packed data-points.
The down-sampling could be done crudely by deleting elements of the original time-series at a specified rate and retaining only the elements in between. However, this produces a series which does not best preserve the shape of the original series. Hence, MATLAB's powerful family of low-pass filters, encoded in the in-built command decimate, was employed. Decimation reduces the original sample rate of a sequence to a lower rate; it is the opposite of interpolation. When called, decimate low-pass filters the input to guard against aliasing and down-samples the result. Among the variety of low-pass filters available in the repository to complement the decimate function, the Chebyshev filter was chosen, denoted in the Matlab syntax as cheby1. The down-sampling rate is set at one sample per 12 h. The low-pass filtered and down-sampled time-series, denoted ȳ^12h, is shown in Figure 13 at three different levels of zoom. In the bottom-most plot, the 12-h sampling rate is clearly discernible, while the middle and top plots convey the excellent shape-preserving properties of the low-pass Chebyshev filter.
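SciPy offers an analogous decimation routine; its default IIR anti-aliasing filter is an order-8 Chebyshev type I low-pass filter, matching the cheby1 choice described above. A sketch on a synthetic signal, downsampling by a factor of 10:

```python
import numpy as np
from scipy.signal import decimate

rng = np.random.default_rng(5)
t = np.arange(10000)
y = np.sin(2 * np.pi * t / 2000) + 0.05 * rng.standard_normal(t.size)

# Chebyshev type-I low-pass filtering (scipy's default IIR filter),
# then downsampling by 10; the anti-aliasing guard is built in
y_low = decimate(y, 10)
print(y_low.size)   # 1000
```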
As clearly noted above, the resampling and expansion operations retain only the trend of the data while removing some detail. Additional studies (Afrin & Yodo, 2022; Bagui & Li, 2021; Moniz et al., 2017; Morris & Yang, 2021) are available in the literature to demonstrate that such operations do not obscure the differing effects on traffic under different control measures. Since the study pays dedicated attention to the specificity of the epidemic event, citing such literature at the relevant points of the data treatment is important to compare the present findings with consistent or divergent results elsewhere, and to show that the chosen method is valid for the problem at hand.
• Forecasting with ARIMA

The data-series ȳ^12h(t) obtained in the previous section is ready for forecasting. In what follows, the superscript denoting the sampling rate and the bar denoting smoothing are dropped, and the series is simply referred to as y(t). Forecasting refers to extrapolating a time-series data up to a specified period in the future (known as the forecast horizon). There are two main classes of mathematical models available for forecasting: state-space models and the so-called ARIMA models, popularized by Box and Jenkins in the 1970s. The most widely used family of forecasting methods today outside of academia is the ARIMA model, which stands for Auto-Regressive Integrated Moving Average.

Mathematical background
As the name suggests, an ARIMA model is a combination of (a) an auto-regressive (AR) component, (b) an integrated (I) component (or, in the reverse sense, a differenced component), and (c) a moving average (MA) component. Each component refers to a certain type of polynomial. To understand the three components, it is helpful to first understand some of the basic terminology and notation related to time-series manipulation; this is discussed in Appendix A-1.

Polynomials in L. Discussed in Appendix A-1.

Auto-regressive AR(p) process. Discussed in Appendix A-1.

Moving average MA(q) process. Discussed in Appendix A-1.

Auto-regressive moving average ARMA(p,q) process. Discussed in Appendix A-1.

ARIMA(p,D,q) process. Discussed in Appendix A-1.

Overview of ARIMA in MATLAB
MATLAB's Econometric toolbox provides a rich array of functions for analyzing and modeling time-series data. It offers a wide range of visualizations and diagnostics for model selection, including tests for autocorrelation and heteroscedasticity, unit roots and stationarity, co-integration, causality, and structural change. The toolbox offers functionalities to estimate, simulate, and forecast time-variant systems using a variety of modeling frameworks, including regression, ARIMA, multiplicative and additive seasonal ARIMA, state-space, GARCH, multivariate and switching models. Additionally, the time-variant systems can also depend on a set of independent predictor variables (exogenous covariates). Within the ARIMA framework, the possible types of implementable models include AR, MA, ARMA, ARIMA, ARIMAX (ARIMA with exogenous covariates), SARIMA (seasonal ARIMA) and SARIMAX (SARIMA with exogenous covariates). All these models can be created and manipulated in an object-oriented framework using MATLAB's arima object. For example, calling arima(1,0,1) creates an ARMA(1,1) model, whereas calling arima(2,0,0) creates a purely autoregressive AR(2) model. Seasonal ARIMA models and ARIMA models with exogenous covariates require an extended syntax call using name-value pair arguments. Considering that predictor-variable data is not available, and seasonal changes in the traffic volume are not clearly identifiable in the 28-month-long time-series (possibly because of the large-scale disruption caused by the COVID-19 pandemic), the ARIMA model (without seasonality and exogenous covariates) is selected. The selection of the polynomial order of each component in the ARIMA model is discussed next.

Selecting the model
Since the present study uses an ARMA(p, q) model, model selection in this context refers to selecting the order p of the auto-regressive polynomial and the order q of the moving-average polynomial. It is noted that although an ARMA model is employed, the Matlab command used is still arima, since it is applicable to the entire family of ARIMA-type models; specifying D = 0 in arima automatically creates an ARMA model object. The selection of the polynomial orders is largely a matter of personal judgement. Stationarity tests can help up to a certain point in deciding the order of differencing D. For example, a time series with a roughly linear trend and no seasonal effects can use first-order differencing (D = 1). Similarly, sample ACF (auto-correlation function) and PACF (partial auto-correlation function) plots [4] can help in determining p and q. However, interpreting the sample ACFs and PACFs is itself an art rather than a precise science, and the personal judgement and past experience of the forecaster play a key role in determining p and q. Another way of selecting the model is by minimizing the so-called corrected Akaike Information Criterion (AICc), given in Equation 8:

AICc = AIC + 2(p + q + k + 1)(p + q + k + 2) / (T − p − q − k − 2)   (8)
$$\text{AICc} = \text{AIC} + \frac{2(p+q+k+1)(p+q+k+2)}{T - p - q - k - 2} \qquad (8)$$

where T is the number of observations in the time series, L is the likelihood of the data, and AIC is given in Equation 9:

$$\text{AIC} = -2\log(L) + 2(p+q+k+1) \qquad (9)$$

The parameter k is given by

$$k = \begin{cases} 1 & c \neq 0 \\ 0 & c = 0 \end{cases}$$

i.e. k indicates whether the constant c is included in the model. An alternative information criterion, the Bayesian Information Criterion (BIC), is also frequently used, and is given in Equation 10:

$$\text{BIC} = \text{AIC} + \big(\log(T) - 2\big)(p+q+k+1) \qquad (10)$$
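As a concrete illustration, the three information criteria in Equations 8-10 take only a few lines of code. The sketch below is in Python rather than the paper's MATLAB, purely for illustration; the function names are our own.

```python
import math

def aic(log_l, p, q, k):
    """Equation 9: AIC = -2*log(L) + 2*(p + q + k + 1)."""
    return -2.0 * log_l + 2.0 * (p + q + k + 1)

def aicc(log_l, p, q, k, T):
    """Equation 8: small-sample corrected AIC for T observations."""
    m = p + q + k + 1
    return aic(log_l, p, q, k) + (2.0 * m * (m + 1)) / (T - p - q - k - 2)

def bic(log_l, p, q, k, T):
    """Equation 10: BIC = AIC + (log(T) - 2)*(p + q + k + 1)."""
    return aic(log_l, p, q, k) + (math.log(T) - 2.0) * (p + q + k + 1)
```

For instance, for an ARMA(1,2) model with a constant (p = 1, q = 2, k = 1), log-likelihood -50, and T = 100 observations, the AICc exceeds the AIC by 2·5·6/94 ≈ 0.64; the correction vanishes as T grows.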
Rather than considering every possible combination of p and q, the selection can be done by a stepwise search that traverses the model space. Based on this algorithm, the AR and MA polynomial orders p and q are selected as 1 and 2, respectively. The preliminary calculations for the same were done by hand.
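The sample ACF mentioned above is straightforward to compute directly from the data (the PACF is usually derived from it via the Durbin-Levinson recursion, not shown). A minimal Python sketch, with our own function name:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()                      # center the series
    denom = np.dot(xc, xc)                 # lag-0 sum of squares
    return np.array([np.dot(xc[: len(x) - k], xc[k:]) / denom
                     for k in range(max_lag + 1)])
```

As a sanity check, a strictly alternating series has a lag-1 sample autocorrelation close to -1, which is the kind of signature a forecaster looks for when reading ACF plots.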

Estimation of the model coefficients
Once the model order has been identified (i.e. the values of p, D, and q are selected), the parameters c, φ1, …, φp and θ1, …, θq need to be estimated. When MATLAB estimates the ARIMA model, it uses Maximum Likelihood Estimation (MLE). This technique finds the values of the parameters which maximize the probability of obtaining the observed data. For ARIMA models with Gaussian errors, MLE corresponds to the linear least-squares method of numerical optimization. Nonlinear estimators like Newton's method and the (damped) Gauss-Newton method can also be used, but these are computationally expensive as well as cumbersome to implement algorithmically. If the model data-fit at time t is denoted as ŷt, the residual at time t is εt = yt − ŷt. When a time-series model is fitted to data yt (to obtain the best-fit ŷt by least squares, say), lagged terms in the model require initialization, usually with observations at the beginning of the sample. Also, to measure the quality of forecasts from the model, data at the end of the sample must be held out from estimation. Therefore, before analyzing the data, the time base is partitioned into three consecutive, disjoint intervals. The three time-base partitions for univariate autoregressive integrated moving average (ARIMA) models are the presample, estimation, and forecast periods:
1. Presample period - contains data used to initialize lagged values in the model. An ARIMA(p, D, q) model requires a presample period containing at least p + D observations (this is the property P of the arima model object; see Figure 14).
2. Estimation period - contains the observations to which the model parameters are fitted.
3. Forecast period - contains the holdout observations at the end of the sample against which the forecasts are compared.
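To make the partitioning and estimation ideas concrete, the sketch below simulates an AR(1) series as stand-in data, reserves a presample point and a forecast holdout, and estimates the parameters on the estimation period by linear least squares (to which Gaussian MLE reduces for a pure AR model). This is an illustrative Python sketch under those assumptions, not the paper's MATLAB workflow; all names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(1) series y_t = c + phi*y_{t-1} + e_t as stand-in data.
c_true, phi_true, T = 5.0, 0.7, 600
y = np.zeros(T)
for t in range(1, T):
    y[t] = c_true + phi_true * y[t - 1] + rng.normal(scale=1.0)

# Time-base partition: presample (p + D = 1 point here), estimation,
# and a forecast holdout at the end of the sample.
n_presample, n_forecast = 1, 100
est = slice(n_presample, T - n_forecast)

# Least-squares estimation of (c, phi): regress y_t on [1, y_{t-1}].
X = np.column_stack([np.ones(T)[est], y[np.arange(T)[est] - 1]])
beta, *_ = np.linalg.lstsq(X, y[est], rcond=None)
c_hat, phi_hat = beta
```

With ~500 estimation points, the least-squares estimates land close to the true (c, φ) = (5.0, 0.7); the held-out tail can then be compared against forecasts from the fitted model.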

Simulation and forecasting with ARIMA
Once the selected model's coefficients are estimated, the resulting polynomial model can be extrapolated beyond the fitted data into the future forecast horizon. MATLAB's forecast command is used to produce the forecast from the model estimated as described in the previous section; it also optionally returns the mean squared errors of the forecasts, from which the 95% confidence bounds can be generated. Simulation refers to generating individual random stochastic response paths using the Monte-Carlo simulation technique; MATLAB's simulate command is used for this purpose. If a sufficiently high number of simulations (>50) are run, then a reliable forecast can be made from the simulated paths using the mean of the simulations, with the approximate 95% confidence bounds calculated as the 2.5th, 50th (median), and 97.5th percentiles of the simulated response paths using the function prctile.
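The simulation-based bounds described above can be sketched as follows. This hypothetical Python example generates random ARMA(1,2) response paths and takes percentiles across them (the role played by prctile in MATLAB); the coefficients are illustrative placeholders, not the estimated values from Table 3.

```python
import numpy as np

def simulate_arma12(c, phi, theta1, theta2, sigma, horizon, n_paths, rng):
    """Monte-Carlo simulation of ARMA(1,2) paths:
    y_t = c + phi*y_{t-1} + e_t + theta1*e_{t-1} + theta2*e_{t-2}."""
    e = rng.normal(scale=sigma, size=(n_paths, horizon + 2))
    y = np.zeros((n_paths, horizon + 1))
    for t in range(1, horizon + 1):
        y[:, t] = (c + phi * y[:, t - 1]
                   + e[:, t + 1] + theta1 * e[:, t] + theta2 * e[:, t - 1])
    return y[:, 1:]  # drop the zero initial condition

rng = np.random.default_rng(1)
paths = simulate_arma12(c=2.0, phi=0.5, theta1=0.3, theta2=0.1,
                        sigma=1.0, horizon=120, n_paths=100, rng=rng)

# Approximate 95% bounds and median across the 100 simulated paths.
lower, median, upper = np.percentile(paths, [2.5, 50.0, 97.5], axis=0)
```

The mean or median of `paths` plays the role of the simulated forecast, and `lower`/`upper` the approximate 95% bounds plotted in Figure 14.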

Traffic congestion study
Figure 2(a) shows clearly that the total traffic demand decreased after the implementation of the different types of precautions, in the form of three specific measures chosen on particular days for this study, as compared to the typical days before COVID-19. Regarding the hourly distribution of traffic, represented in Figure 2(b), the hourly distribution in general remains the same, with low traffic volume between 12:00-6:00 am, after which traffic demand starts to increase between 7:00-9:00 am. The traffic demand then seems to remain steady throughout the rest of the day before it starts dropping from 6:00 pm onwards until midnight. The three days before COVID-19 experienced a very similar hourly trend and demand volume throughout the day. The figure also indicates that the morning peak period (AM peak) is between 7:00-8:00 am, and the evening peak period (PM peak) is between 8:00-9:00 pm.
Figure 2(c) shows that there is a continuous drop in PM peak demand after implementing the three measures.
However, the AM peak experienced an increase in demand after implementing Measure 2. The probable reason for this increase in the AM peak could be the closure of commercial and shopping centers (malls), with the exception of the central food markets, supplies, and pharmacies, which could have led people to purchase basic commodities in the morning hours. Another reason could be that, since public transportation was suspended under preventive Measure 2, individuals may have switched to private vehicles, which could also have increased the vehicle count.
Further, the ratios between hourly and daily demand for each of the six days are plotted against each hour of the day in Figure 2(d). It can be seen that the hourly relative distributions (in percentages) for the three typical days prior to the discovery of COVID-19 are almost identical regardless of the demand changes. However, in the case of the three days chosen after the preventive measures were implemented, the traffic flow shows a mostly declining trend, except at certain peak hours.
The traffic demand volume on all seven lanes of the studied intersection was used to compare the reduction in total traffic for each of the three measures relative to the baseline scenario (the average of the three typical pre-COVID-19 days), and also relative to each of the previously implemented measures (Measures 1 and 2, respectively). The results are summarized in Table 2, which shows that the daily traffic demand dropped consistently after implementing each of the three measures. Once Measure 1 was implemented, the traffic demand dropped by 31.9% across all seven lanes of the studied intersection. Measure 2 further dropped the traffic demand by an additional 10.3% compared to the demand after implementing Measure 1. The two measures combined (Measures 1 and 2) reduced the baseline traffic by 38.2%. Finally, once Measure 3 was implemented, it dropped the already reduced traffic (due to Measures 1 and 2) by an additional 4.1%. The three measures together contributed toward reducing the background traffic by 40.7%, which is more than one-third of the baseline demand.
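The baseline-relative reductions quoted above are simple percentage computations. The sketch below uses hypothetical daily volumes (not the raw counts of Table 2) chosen so that the baseline-relative figures mirror the reported 31.9%, 38.2%, and 40.7%; note that the "additional" drops quoted in the text may use a different reference volume.

```python
def pct_reduction(reference, current):
    """Percentage reduction of `current` relative to `reference`."""
    return 100.0 * (reference - current) / reference

# Hypothetical daily volumes: baseline and after Measures 1, 2, and 3.
baseline, m1, m2, m3 = 1000, 681, 618, 593

after_m1 = pct_reduction(baseline, m1)  # Measure 1 vs. baseline
after_m2 = pct_reduction(baseline, m2)  # Measures 1+2 vs. baseline
after_m3 = pct_reduction(baseline, m3)  # all three measures vs. baseline
```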
Finally, a similar computation was performed for the morning and evening peak periods (7:00-8:00 am for the AM peak and 8:00-9:00 pm for the PM peak), with the respective results summarized in Table 2. These results showed that the implementation of the three preventive measures reduced the morning peak traffic demand by approximately 49.6% and the evening peak demand by 66.7%.

The forecast results
The forecast horizon chosen for the present study spans a length of 146 data points (≈4 months), from 1 May 2021 to 31 August 2021. The forecast made using ARMA(1,2) is shown in Figure 14, and the associated estimated model parameters are shown in Table 3. In Figure 14, the solid grey line denotes the available (smoothed and downsampled) traffic volume. The solid and dotted green lines denote the forecast and the simulated mean (using 100 Monte-Carlo random path simulations). The solid and dotted blue lines represent the forecast and simulated upper bounds of the 95% confidence interval, whereas the red lines denote the corresponding lower bounds. The grey dots represent the outliers in the simulated paths that lie above the upper bound or below the lower bound of the simulated 95% confidence interval.

Limitations of the forecast
The limitations in the statistical significance of the ARIMA forecast in Figure 14 of the previous section stem from the scant size of the available data. It is known that the high fidelity of a given time series (as dictated by the sampling rate) does not, on its own, add value in terms of its forecasting potential.
As far as forecasting is concerned, the significantly more important attribute is the time span covered by the data. It is easy to see why a coarsely sampled time series (perhaps sampled once or twice a month) spanning several years or decades has far more long-term (of the order of years) predictive potential than a very finely sampled time series (say every five minutes, as is the case in the present study) spanning only several months. The sampling rate dictates the sensitivity of the short-term prediction, whereas the total time span dictates the accuracy of the long-term forecast. Furthermore, with the time series constructed from the available data in the present study, even short-term forecasts (say of the order of 3-4 months) are not of commendable statistical significance. This is because the concerns stemming from the scarcity of the data are further exacerbated by the presence of a structural change (change point, or break point) in the time series introduced by the COVID-19 pandemic. It is known that the presence of a change point very close (relative to the total time span) to the forecast horizon results in wildly erroneous model coefficients, because there is not enough data for the model estimation procedure (whether least squares or some other nonlinear optimization algorithm) to generate reliable coefficients. Hence, although the estimated model coefficients may provide an adequately good fit for the holdout data in the forecast period, the forecast itself is bound to have confidence intervals that are too far apart for the forecast to be practically usable. The further apart the confidence intervals are, the lower the statistical significance commanded by the forecast mean.

Conclusions
This study shows clearly that the total traffic demand decreased after the implementation of different types of precautions, in the form of three specific measures chosen on particular days for this study, as compared to the typical days before COVID-19. Results show that there is a continuous drop in PM peak demand after implementing the three measures. However, the AM peak experienced an increase in demand after implementing Measure 2. The probable reason for this increase in the AM peak could be the closure of commercial and shopping centers (malls), with the exception of the central food markets, supplies, and pharmacies, which could have led people to purchase basic commodities in the morning hours. The ratios between hourly and daily demand for each of the six days show that the hourly relative distributions (in percentages) for the three typical days prior to the discovery of COVID-19 are almost identical regardless of the demand changes. However, in the case of the three days chosen after the preventive measures were implemented, the traffic flow shows a mostly declining trend, except at certain peak hours.
The results show that the daily traffic demand dropped consistently after implementing each of the three measures. Once Measure 1 was implemented, the traffic demand dropped by 31.9% across all seven lanes of the studied intersection. Measure 2 further dropped the traffic demand by an additional 10.3% compared to the demand after implementing Measure 1. The two measures combined (Measures 1 and 2) reduced the baseline traffic by 38.2%. Finally, once Measure 3 was implemented, it dropped the already reduced traffic (due to Measures 1 and 2) by an additional 4.1%. The three measures together contributed toward reducing the background traffic by 40.7%, which is more than one-third of the baseline demand.
As far as ARIMA forecasting is concerned, the significantly more important attribute is the time span covered by the data. It is easy to see why a coarsely sampled time series (perhaps sampled once or twice a month) spanning several years or decades has far more long-term (of the order of years) predictive potential than a very finely sampled time series (say every five minutes, as is the case in the present study) spanning only several months. The sampling rate dictates the sensitivity of the short-term prediction, whereas the total time span dictates the accuracy of the long-term forecast. Furthermore, with the time series constructed from the available data in the present study, even short-term forecasts (say of the order of 3-4 months) are not of commendable statistical significance. This is because the concerns stemming from the scarcity of the data are further exacerbated by the presence of a structural change (change point, or break point) in the time series introduced by the COVID-19 pandemic. Hence, although the estimated model coefficients may provide an adequately good fit for the holdout data in the forecast period, the forecast itself is bound to have confidence intervals that are too far apart for the forecast to be practically usable.
The amount of data used in this article is small and the prediction model relatively simple, which limits the reference value of the results. Suggestions that could be considered include analyzing the traffic-volume data after the recovery from the epidemic, considering more influencing factors, or comparing additional prediction models. The results of the current analysis are macroscopic and broad, and the lack of a more detailed study of the mechanisms by which epidemic-related control measures affect traffic congestion makes it difficult to interpret similar data predicted by others in practice. This could be a future research objective.
Constant assessment of newly gained data points against the forecast traffic volume is recommended to track forecast accuracy. Forecasting in the presence of a change point close to the forecast horizon is an active research topic in statistics and numerical optimization. Dividing the time series at the change points, estimating a model piecewise for each portion, and then stitching the pieces back into a coherent whole using a variational principle, as in the area of Finite Elements, is also recommended for future studies.
The current paper focused more on the discussion of the advanced analysis methods at hand. The actual model parameters and measures of fit for the ARIMA fitting, in comparison to other techniques, will be the core of our forthcoming paper. We still need more data to include a regression alongside the time-series analysis that could directly account for the external stimuli affecting the traffic volumes, rather than just focusing on how the predictions differ from the observations.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
The author(s) reported there is no funding associated with the work featured in this article.

Figure 1 .
Figure 1. (a) Satellite view of the intersection used for traffic data collection. (b) 2D map of the location used for traffic data collection.
days after the closure of commercial and shopping centers (malls) with the exception of the central food markets, supplies, and pharmacies along with suspension of public transport.

Figure 2 .
Figure 2. (a) Total traffic demand for all intersections. (b) Hourly variation in traffic demand for all intersections. (c) Peak-hour traffic demand for each studied intersection. (d) Relative hourly-to-daily traffic demand distributions.

Figure 3 .
Figure 3. A flowchart of the workflow for obtaining the traffic-volume forecast from the available data.

Figure 4 .
Figure 4. Number of vehicles coming from East Maqua at a sampling rate of one per 5 min. The y-axis limit of each subplot denotes 250 vehicles (axis labels not shown for clarity). Traffic volume data at the junction courtesy of (1) General Department of Traffic - Ministry of Interior, Kuwait, and (2) traffic data from AlShamlan Corp.

Figure 5 .
Figure 5. Number of vehicles coming from Jaber Al-Mubarak at a sampling rate of one per 5 min. The y-axis limit of each subplot denotes 250 vehicles (axis labels not shown for clarity). Traffic volume data at the junction courtesy of (1) General Department of Traffic - Ministry of Interior, Kuwait, and (2) traffic data from AlShamlan Corp.

Figure 6 .
Figure 6. Number of vehicles coming from the Inside Road at a sampling rate of one per 60 min. The y-axis limit of each subplot denotes 250 vehicles (axis labels not shown for clarity). Traffic volume data at the junction courtesy of (1) General Department of Traffic - Ministry of Interior, Kuwait, and (2) traffic data from AlShamlan Corp.

Figure 7 .
Figure 7. Number of vehicles coming from the Inside Road. The y-axis limit of each subplot denotes 250 vehicles (axis labels not shown for clarity). The original series, sampled at one per 60 min, is resampled at a higher rate (one per 5 min) to match the sampling rate on the other two roads. A zero-mean Gaussian white noise of 50% standard deviation is added to the original data series.

Figure 8 .
Figure 8. Total number of vehicles at the junction. The y-axis limit of each subplot denotes 400 vehicles (axis labels not shown for clarity). Traffic volume data from all three roads are summed.

Figure 9 .
Figure 9. Schematic of data smoothing by a moving-average (MA) window.

Figure 10 .
Figure 10. Smoothed time series of the total number of vehicles at the junction using different Hanning MA window sizes. Top: 12-h window; middle: 24-h window; bottom: biweekly (84-h) window.

Figure 11 .
Figure 11. A zoomed-in view of the smoothed time series of the total number of vehicles at the junction using different Hanning MA window sizes. Top: 12-h window; middle: 24-h window; bottom: biweekly (84-h) window. Also see Figure 10.

Figure 12 .
Figure 12. A further zoomed-in view of the smoothed time series of the total number of vehicles at the junction using different Hanning MA window sizes. Top: 12-h window; middle: 24-h window; bottom: biweekly (84-h) window. Also see Figures 10 and 11.

Figure 13 .
Figure 13. Low-pass filtered, resampled total traffic volume at the junction at a sampling rate of one per 12 h. The low-pass filter cheby1 (Chebyshev Type I low-pass filter) called by decimate compensates for the delay introduced by the filter.

Figure 14 .
Figure 14. ARIMA(1,0,2) forecast of the traffic volume for a total period of ≈4 months. The first wave of the COVID-19 pandemic is shown as a red transparent patch. The right figure is a zoomed-in view of the forecast.
Stepwise search steps (continued from the "Selecting the model" section):
2. Variations on the current model are considered:
(a) vary p and/or q from the current model by ±1;
(b) include/exclude the constant c from the current model.
3. The best model considered so far (either the current model or one of these variations) becomes the new current model.
4. Repeat Step 2 until no lower AICc can be found.
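The stepwise steps above can be sketched in code. The following Python function is a generic illustration (the names are our own); `score` stands for any routine returning the AICc of an ARMA(p, q) model with or without the constant c.

```python
from itertools import product

def stepwise_search(score, p_max=5, q_max=5):
    """Stepwise traversal of the ARMA model space, minimizing score(p, q, c)."""
    # Step 1: start from a small set of candidate models and pick the best.
    starts = [(p, q, c) for (p, q) in [(0, 0), (1, 0), (0, 1), (1, 1)]
              for c in (True, False)]
    current = min(starts, key=lambda m: score(*m))
    best = score(*current)
    while True:
        p, q, c = current
        # Step 2: vary p and/or q by +/-1, and include/exclude the constant c.
        trials = [(p + dp, q + dq, cc)
                  for dp, dq in product((-1, 0, 1), repeat=2)
                  for cc in (True, False)
                  if 0 <= p + dp <= p_max and 0 <= q + dq <= q_max]
        # Step 3: the best model seen so far becomes the new current model.
        candidate = min(trials, key=lambda m: score(*m))
        if score(*candidate) < best:
            current, best = candidate, score(*candidate)
        else:
            # Step 4: stop when no variation lowers the AICc further.
            return current
```

With a toy score function whose minimum lies at (p, q) = (1, 2) with a constant included, the search converges there, mirroring the orders selected in the paper.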

Table 1 .
Details of the six chosen days for the current study.

Table 2 .
The highlighted values indicate the demand-reduction percentage for each action independently.

Table 3 .
Summary of the estimated non-seasonal, univariate ARMA(1,2) model for traffic volume, with a Gaussian conditional probability distribution.