Machine learning predicts bioaerosol trajectories in enclosed environments: Introducing a novel method

Abstract The COVID-19 pandemic has sparked global interest in understanding how bioaerosols are transmitted in enclosed environments. Swift and accurate calculations of particle trajectories are crucial for predicting the diffusion of bioaerosols, and machine learning can expedite these calculations and predictions. However, research on machine learning methods for calculating bioaerosol trajectories remains scarce. A bioaerosol trajectory is a time series with long intervals and delays between different positions, and certain machine learning models are well suited to handling time series data. Herein, we aimed to establish a new method, which we refer to as physics–machine learning (P–ML), that includes a machine learning model for calculating bioaerosol trajectories. To this end, we adopted a lightweight, single-layer long short-term memory (LSTM) model and trained it by supervised learning with the mean squared error as the loss function. Our findings indicate that, when turbulent diffusion is disregarded, the LSTM model can be trained on data generated by a motion equation. Furthermore, the model accurately predicted trajectories while exhibiting some degree of transferability. However, when the turbulent diffusion of bioaerosols was considered, the LSTM model could not be trained successfully on data generated by a motion equation combined with a turbulent diffusion model (the Discrete Random Walk model). To address this issue, we integrated the fluctuating velocity into the LSTM model input. Consequently, the predicted results were consistent with those of the motion equation. Our method exhibits considerable potential for expediting trajectory calculation and aiding early warning and rapid design in enclosed environments. Copyright © 2023 American Association for Aerosol Research


Introduction
With the global spread of the COVID-19 pandemic that originated in 2019, infectious diseases and public health emergencies have become increasingly prevalent in the twenty-first century. Research on airborne transmission associated with COVID-19 (Dai and Zhao 2022; Drossinos et al. 2022; D'Orazio, Bernardini, and Quagliarini 2021; Hao et al. 2020; Laxminarayan et al. 2020) has become increasingly important, as bioaerosols containing pathogens can spread over distances of >10 m (Morawska and Cao 2020), causing SARS-CoV-2 transmission (Busco et al. 2020; Drossinos and Stilianakis 2020; van Doremalen et al. 2020). Considering that over 85% of human life is spent in enclosed environments, the risk of infection in these spaces is a serious concern. The COVID-19 outbreaks in a Guangzhou restaurant and on a bus exemplify the potential long-distance transmission of bioaerosols within closed spaces (Ou et al. 2022; Li et al. 2021), a concern that extends to hospitals (Stockwell et al. 2019), stations (Madhwal et al. 2020), and offices (Lindsley et al. 2020; Zivich, Gancz, and Aiello 2018). Consequently, understanding the distribution of bioaerosols in enclosed environments has received considerable attention from researchers, managers, and designers interested in predicting the spread of infectious diseases (Dai and Zhao 2023; Abuhegazy et al. 2020; Mubareka et al. 2019).
At present, research on bioaerosol transmission in enclosed environments is primarily conducted through field experiments (Lou et al. 2021; Wang, Fu, and Chao 2021; Tang et al. 2020) and computational fluid dynamics (CFD) numerical methods (Buckley et al. 2012). Field experiments are time consuming and labor intensive, while CFD numerical methods provide detailed information on bioaerosol transmission under controlled conditions, offering convenience and speed. Therefore, CFD is often adopted for calculating bioaerosol transmission (Liu, Yin et al. 2022; Rockett et al. 2020; Meccariello and Gallo 2020; Ivorra et al. 2020; Enserink and Kupferschmidt 2020; Diwan et al. 2020; Chen 2020; Chaudhuri et al. 2020; Abuhegazy et al. 2020). The bioaerosol trajectory problem is divided into two parts: (1) obtaining the fluid phase by solving the Navier-Stokes equations using the Euler method and (2) treating bioaerosols as a discrete phase and solving the balance of forces acting on them using the Lagrange method, thereby tracking the bioaerosols through the computed flow field. However, the computational burden associated with these methods limits their practical application in bioaerosol trajectory prediction.
Machine learning techniques offer a way to reduce the computational burden, enabling the creation of data-driven models that predict small sections of a domain (Gan et al. 2022). The effectiveness of this combined approach has already been demonstrated in practical engineering applications (Christodoulou et al. 2020; Mohebbi Najm Abad et al. 2020), accurately capturing complex spatiotemporal behaviors even when only a small fraction of the domain is calculated with high precision (Christodoulou et al. 2020). Machine learning can also significantly reduce the computational cost of analysis, sometimes by over 100 times (Christodoulou et al. 2020). In the environmental domain, a multitude of machine learning models have been employed for pollutant emission prediction and risk assessment. For instance, Dai et al. utilized vector autoregression and the Kriging method to establish a spatial prediction approach for O3 mass concentration. Furthermore, several haze hazard risk assessment models were developed by combining an improved particle swarm optimization (IPSO) algorithm with the light gradient boosting machine (LightGBM) algorithm (Dai, Huang, Zeng, and Yu 2022). These machine learning techniques circumvent the need to delve into the intricate details and mechanisms of pollutant emissions. Instead, they model pollutant predictions directly, leading to a significant reduction in computational costs.
One type of data-driven technology is the recurrent neural network (RNN), a neural network used to process sequence data and extract temporal and semantic information from input data (Junwei et al. 2019). RNN-based deep learning models have been successfully applied in natural language processing and related fields, including speech recognition, image recognition, machine translation, and temporal analysis (Abiodun et al. 2018; Dutta et al. 2018). The long short-term memory (LSTM) network is a specific variant of the RNN that is well suited to analyzing and forecasting events in time series with intervals and delays (Hochreiter and Schmidhuber 1997). Unlike a standard RNN, which stores all information from the previous time step, an LSTM stores information selectively, which addresses the issues of vanishing and exploding gradients in long-sequence training (Sherstinsky 2020; Ye et al. 2019). In summary, LSTM exhibits better performance than an ordinary RNN for longer sequences. Since bioaerosol transmission is also a long-sequence process (Wan and Sapsis 2018), the LSTM model may be suitable for training and calculating bioaerosol trajectories.
In this study, we sought to establish a novel method that includes a machine learning model to track bioaerosol trajectories in the flow field. Three cases were explored in which supervised learning was used to train the LSTM model on multiple sets of bioaerosol spatial data generated by solving different types of motion equations for the bioaerosol. The results were analyzed and discussed in detail, and potential avenues for improvement were identified. The physics-machine learning (P-ML) model introduced in this article employs a streamlined LSTM architecture to forecast particle velocity while integrating fundamental physical principles to enable accurate prediction of bioaerosol trajectories. This approach incorporates multiple factors, including the forces acting on discrete particles, within the machine learning framework.

LSTM
LSTM is an RNN architecture extensively used in deep learning. This article utilized an LSTM model (additional information is included in Section 1.1 of the Supplementary Information).

Calculation framework of bioaerosol trajectory
The Euler-Lagrange method is the commonly used framework for calculating bioaerosol trajectories (Figure 1; Liu et al. 2020a). This method involves two steps. In the first step, the fluid phase is treated as a continuum, and the time-averaged Navier-Stokes equations, along with other governing equations, are solved (Ye and Zhang 2022). In the second step, the bioaerosol phase is treated as a discrete phase, and the transient bioaerosol trajectory is determined based on the flow field. Since bioaerosols constitute a small proportion, they have minimal impact on the flow field (Chaudhuri et al. 2020; Ivorra et al. 2020). In an enclosed environment, computing the flow field can therefore be considered a preprocessing step for calculating the bioaerosol trajectory (Figure 2). In this study, the Euler method was used in Step 1 to discretize the governing equation of the flow field onto a sequence of grid nodes. This technique is crucial for determining the flow properties within an enclosed space. The governing equation is represented as follows:

$$\frac{\partial(\rho\phi)}{\partial t} + \nabla\cdot\left(\rho\vec{V}\phi\right) = \nabla\cdot\left(\Gamma_{\phi}\nabla\phi\right) + S_{\phi} \qquad (1)$$

where $\rho$ is the air density, $\phi$ represents the velocity components, $\vec{V}$ is the air velocity vector, $\Gamma_{\phi}$ is the effective diffusion coefficient of $\phi$, and $S_{\phi}$ is the source term. Direct numerical simulation, large eddy simulation, and Reynolds-averaged Navier-Stokes (RANS) modeling are three approaches used to simulate indoor turbulent airflow. Among these, the RANS approach requires the lowest computational load and time. Specifically, the RNG k-ε turbulence model, one of the RANS models, was chosen in this study as it improves the accuracy of airflow prediction and is considered an effective option for modeling indoor airflow. ANSYS Fluent 2021R1 was employed to solve the differential equations. The computations utilized a least-squares scheme for gradient discretization, a second-order scheme for the pressure term, and the SIMPLE (semi-implicit method for pressure-linked equations) algorithm for pressure-velocity coupling. Convergence was achieved when the scaled velocity and continuity residuals approached $10^{-3}$ and the energy residuals reached $10^{-6}$.
Previous studies obtained the trajectory by solving the motion equation that accounts for various forces acting on individual bioaerosols. In this study, the calculation of trajectories in Step 2 adopted the LSTM model. A bioaerosol trajectory can be discretized in time (Figure 2), yielding a time series of particle positions with additional information, such as velocity. Since these events have long intervals and delays in the time series, LSTM is well suited to forecasting them. In long-sequence training, LSTM overcomes the challenges of vanishing and exploding gradients and yields superior performance (Si et al. 2020). The subsequent sections introduce the training and calculation procedures of this novel approach for bioaerosol trajectories.

Computational domain
The top supply and bottom return mode is commonly employed in enclosed environments (Li et al. 2012). For this study, a 2D computational domain and sample flow field were selected from the isothermal test case provided in IEA Annex 20 (Lemaire et al. 1993). Figure 3a illustrates the geometry and boundary conditions of Case 1. The 2D model had dimensions of 3 m × 9 m, with air being supplied through the top left corner and exhausted through the bottom right wall. The inlet velocity was set to 0.455 m/s, while the turbulent kinetic energy ($k_0$) and dissipation rate ($\varepsilon_0$) were $4.97 \times 10^{-4}$ m$^2$/s$^2$ and $6.59 \times 10^{-4}$ m$^2$/s$^3$, respectively. The gauge pressure at the outlet was set to 0 Pa. The bioaerosol source was located at coordinates (0, 2.832 m).
For the purpose of comparison and a detailed analysis of the proposed model, a second 2D computational domain (Case 2) was utilized. This 2D model had dimensions of 4 m × 8 m, as depicted in Figure 3b. The inlet was positioned at the top of a sidewall, while the outlet was located at the bottom of the same wall. Additional boundary conditions are also shown in Figure 3b. Bioaerosols were released from a point source at position (0, 3.5 m).
Case 3, as shown in Figure 3c, represented a room with dimensions of 8 m × 3 m × 4 m (L × W × H) and utilized a side-supply and side-return ventilation mode. The size of the air supply outlet was 0.5 m × 2 m, positioned 0.3 m from the ceiling and 0.3 m from the floor. In Case 3, the bioaerosol source was a surface source at the inlet, releasing 1500 particles at once.
In Case 1, all boundary temperatures were set to 20 °C, so the influence of temperature was neglected in the calculations. In Case 2 and Case 3, a moderate temperature gradient and limited change in flow density met the assumptions of the Boussinesq approximation. This approximation treats air as a low-velocity incompressible fluid and considers the effect of temperature changes on air density, resulting in a uniform release of convective heat from the surface. The walls surrounding the model were set to no-slip, and enhanced wall treatment was employed to address turbulence characteristics near the walls. The inlet type was specified as velocity-inlet, and the outlet type was defined as outflow.
The model settings did not consider the specifics of human breathing, and the bioaerosol source was assumed to be constant. Previous observations have indicated that the flow of human breathing has minimal impact on the overall flow field (Zhou et al. 2016). Consequently, several studies have treated bioaerosols emitted by human breathing as a constant source during bioaerosol emission simulations (Wang et al. 2022). Therefore, in this study, we did not incorporate the influence of human respiratory flow on the overall flow field, and we disregarded the effect of breathing rate on particle velocity.

Training set acquisition
The data-driven model requires training data. To obtain these data, the conventional force analysis method was employed: the motion equation of the bioaerosols was solved to generate the training set. The motion of an individual bioaerosol under the influence of various forces is governed by the differential Equation (2), in which $\vec{u}$ and $\vec{u}_p$ are the velocities of the airflow and the bioaerosol, respectively; $\rho$ and $\rho_p$ are the densities of the air and the particle, respectively; $\vec{g}$ is the gravitational acceleration; and $\vec{F}$ represents the additional forces acting on the bioaerosol. To solve the trajectory, Equation (2) is combined with Equation (3):

$$\frac{d\vec{x}_p}{dt} = \vec{u}_p \qquad (3)$$

where $\vec{x}_p$ is the position of the bioaerosol. The additional forces include the Brownian force, Basset force, Saffman's lift force, virtual mass force, pressure gradient force, and thermophoretic force. Research has shown that the Brownian force has a significant impact on particles smaller than 0.5 μm (Chang and Hu 2008). Moreover, the virtual mass force, Basset force, and pressure gradient force can be disregarded when the ratio of air density to particle density is small (Zhao et al. 2004). In this study, only Saffman's lift force and the thermophoretic force were considered in calculating the bioaerosol trajectory. It is worth noting that bioaerosol transmission also involves processes such as evaporation and heat transfer, which were not considered in this study. Including additional factors in trajectory calculations increases the complexity of the motion equation and slows computation. To overcome these limitations, machine learning methods were employed to facilitate rapid trajectory calculation.
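For orientation, a widely used form of the single-particle momentum balance in Lagrangian tracking is sketched below. It is an assumed form consistent with the symbols defined above, not necessarily the exact Equation (2) of this study. The drag coefficient per unit mass $F_D$, the air viscosity $\mu$, the particle diameter $d_p$, the drag coefficient $C_D$, and the particle Reynolds number $Re$ are symbols introduced here for illustration only.

```latex
\frac{d\vec{u}_p}{dt}
  = F_D\,(\vec{u} - \vec{u}_p)
  + \frac{\vec{g}\,(\rho_p - \rho)}{\rho_p}
  + \vec{F},
\qquad
F_D = \frac{18\mu}{\rho_p d_p^{2}}\,\frac{C_D\,Re}{24},
\qquad
Re = \frac{\rho\, d_p\, \lvert \vec{u}_p - \vec{u} \rvert}{\mu}
```

In the Stokes limit ($C_D = 24/Re$), $F_D$ reduces to the inverse of the particle relaxation time, $18\mu / (\rho_p d_p^{2})$.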
To represent typical bioaerosols, the particle diameter was set to $1 \times 10^{-6}$ m, which falls within the typical particle size range of bioaerosols generated by human physiological activities such as breathing, speaking, and coughing (Duguid 1946; Huang et al. 2022). Previous studies investigating bioaerosol distribution have utilized a particle density of 1000 kg/m$^3$ (Wang, Holmberg, and Sadrizadeh 2018; Chow and Wang 2012), and the present study adopted the same density. The mass flow rate of the particles was set to $1 \times 10^{-15}$ kg/s.
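To give a feel for these particle properties, the following sketch estimates the Stokes relaxation time and gravitational settling speed of a particle with the stated diameter and density. The air viscosity and density are assumed values (they are not given in the text), the function names are hypothetical, and the slip correction is neglected, so the numbers are indicative only.

```python
# Assumed air properties near 20 °C (not taken from the article).
MU_AIR = 1.8e-5   # dynamic viscosity, Pa·s (assumption)
RHO_AIR = 1.2     # density, kg/m^3 (assumption)
G = 9.81          # gravitational acceleration, m/s^2


def relaxation_time(d_p, rho_p, mu=MU_AIR):
    """Stokes relaxation time tau_p = rho_p * d_p**2 / (18 * mu), in seconds."""
    return rho_p * d_p ** 2 / (18.0 * mu)


def settling_velocity(d_p, rho_p, rho=RHO_AIR, mu=MU_AIR, g=G):
    """Terminal settling speed under Stokes drag with buoyancy, in m/s."""
    return (rho_p - rho) * g * d_p ** 2 / (18.0 * mu)


# Particle diameter and density as stated in the text.
tau_p = relaxation_time(1e-6, 1000.0)
v_s = settling_velocity(1e-6, 1000.0)
print(f"tau_p = {tau_p:.3e} s, v_s = {v_s:.3e} m/s")
```

Under these assumptions the relaxation time is on the order of microseconds and the settling speed is on the order of $10^{-5}$ m/s, far below the 0.455 m/s inlet velocity, which is consistent with trajectories that closely follow the airflow.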
The boundary conditions were set as follows: during the diffusion process, some bioaerosols were released through the outlet, thus the outlet was designated as the escape boundary condition. When bioaerosols encountered a rigid surface, such as a wall, they were deposited on the surface due to the lack of bounce and reflected energy, making the surrounding walls the trap boundary condition (Hinds 1999).
The Discrete Random Walk (DRW) model is commonly used to study bioaerosol diffusion under turbulent flow conditions and simulate the diffusion of bioaerosols (Liu et al. 2020b; Liu, Zhang, et al. 2021). In this study, the use of the DRW model for generating the training set was compared and discussed in the results and discussion section.
In the simulation process, bioaerosols were initially released from the emission source, and subsequently, the motion equation incorporating various forces acting on the bioaerosols was solved to obtain discrete-time information. This information encompassed the position and velocity of the bioaerosols, as well as the velocity of the flow field and relevant turbulence parameters (such as the turbulent kinetic energy and its dissipation rate for the RNG k-ε model). Subsequently, the derived bioaerosol trajectory information was processed and categorized according to discrete time intervals, which served as the dataset for training the LSTM model.
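One way the per-time-step records could be arranged into supervised samples is sketched below. It assumes that each trajectory is stored as an array with columns [v_px, v_py, v_fx, v_fy] at consecutive time steps and that a sliding window of k steps is used as the input; the function name `make_windows`, the array layout, and the default window length are illustrative assumptions rather than the authors' preprocessing code.

```python
import numpy as np


def make_windows(traj, k=8):
    """Split one trajectory into (input window, next-step target) pairs.

    traj: array of shape (T, 4) with columns [v_px, v_py, v_fx, v_fy]
          at consecutive time steps (hypothetical layout).
    Returns X with shape (T - k, k, 4) and Y with shape (T - k, 2).
    """
    traj = np.asarray(traj, dtype=float)
    n_samples = traj.shape[0] - k
    if n_samples <= 0:
        # Trajectory too short to form a single window.
        return np.empty((0, k, traj.shape[1])), np.empty((0, 2))
    X = np.stack([traj[i:i + k] for i in range(n_samples)])
    Y = traj[k:, :2]  # particle velocity at the step following each window
    return X, Y


# Synthetic trajectory used only to show the resulting shapes.
demo = np.arange(20 * 4, dtype=float).reshape(20, 4)
X, Y = make_windows(demo, k=8)
print(X.shape, Y.shape)
```

Windows from all trajectories would then be concatenated along the first axis to form the training set.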

Calculation of bioaerosol trajectory
In this study, a lightweight single-layer LSTM model with multidimensional input and output was constructed. The input comprised the velocity of the bioaerosol (two components) and the velocity of the flow field (two components), while the output was the predicted velocity of the bioaerosol (two components). The input at time $t_n$ is described in Equation (4):

$$y(t_n) = \left[v_{px}(t_n),\ v_{py}(t_n),\ v_{fx}(t_n),\ v_{fy}(t_n)\right] \qquad (4)$$

where $v_{px}$ is the particle velocity in the x direction; $v_{py}$ is the particle velocity in the y direction; $v_{fx}$ is the flow velocity in the x direction; and $v_{fy}$ is the flow velocity in the y direction. The output at time $t_n$ is described in Equation (5):

$$Y(t_n) = \left[v_{px}(t_n),\ v_{py}(t_n)\right] \qquad (5)$$

The series of inputs from $y(t_{n-k+1})$ to $y(t_n)$ forms the input set, and $Y(t_{n+1})$ is the output set, where $k$ is the time-step dimension of the input layer. Thus, the model is expressed as:

$$Y(t_{n+1}) = f_{LSTM}\left(y(t_n), \ldots, y(t_{n-k+1})\right) \qquad (6)$$

where $f_{LSTM}$ is the LSTM model of the particle trajectory problem. The parameters of the lightweight single-layer LSTM model are listed in Table 1. As mentioned earlier, the input had four dimensions: the velocities of the particle in the X and Y directions and the velocities of the flow field in the X and Y directions. The time-step size was 8, so each input comprised the four-dimensional inputs at the last 8 time points. The output consisted of the two-dimensional velocity of the particle in the X and Y directions at the given time point. The size of the hidden layer was set to 20.
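To make the size of such a network concrete, the sketch below implements the forward pass of a single-layer LSTM with 4 inputs, 20 hidden units, and a linear read-out to 2 outputs, using NumPy only. It is a minimal illustration with randomly initialized weights: the class name `TinyLSTM`, the initialization, the single bias vector per gate, and the linear output layer are assumptions, and no training is performed.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class TinyLSTM:
    """Single-layer LSTM followed by a linear read-out (forward pass only)."""

    def __init__(self, n_in=4, n_hidden=20, n_out=2, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(n_hidden)
        # One stacked weight matrix for the input, forget, cell, and output gates.
        self.W = rng.uniform(-scale, scale, (4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.W_out = rng.uniform(-scale, scale, (n_out, n_hidden))
        self.b_out = np.zeros(n_out)
        self.n_hidden = n_hidden

    def forward(self, window):
        """window: array (k, n_in); returns the predicted output, shape (n_out,)."""
        H = self.n_hidden
        h = np.zeros(H)  # hidden state
        c = np.zeros(H)  # cell state
        for y_t in np.asarray(window, dtype=float):
            z = self.W @ np.concatenate([y_t, h]) + self.b
            i = sigmoid(z[:H])            # input gate
            f = sigmoid(z[H:2 * H])       # forget gate
            g = np.tanh(z[2 * H:3 * H])   # candidate cell state
            o = sigmoid(z[3 * H:])        # output gate
            c = f * c + i * g
            h = o * np.tanh(c)
        return self.W_out @ h + self.b_out

    def n_parameters(self):
        return self.W.size + self.b.size + self.W_out.size + self.b_out.size


model = TinyLSTM()
window = np.zeros((8, 4))  # 8 time steps of [v_px, v_py, v_fx, v_fy]
print(model.forward(window).shape, model.n_parameters())
```

With this layout the model has roughly two thousand trainable parameters, which illustrates why such a network is inexpensive to evaluate at every time step of a trajectory.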

Training
The training process for the LSTM model comprises three steps: (1) preprocessing and importing the training set, (2) defining the parameters, including the dimensions of the hidden layer and other related parameters, and (3) selecting the loss function. For the cases examined in this research, the training sets for Case 1 and Case 2 consist of 200 bioaerosol trajectories each, whereas that for Case 3 comprises 1500 bioaerosol trajectories. The optimization algorithm employed was the Adam algorithm, with 500 training iterations conducted at a learning rate of 0.005. The loss function chosen for the LSTM model was the mean squared error (MSE) (Mumtaz et al. 2021). In this study, the MSE is the average of the squared differences between the predicted values and the target values, providing a measure of the model's accuracy. The loss function is calculated as follows:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2 \qquad (7)$$

where $\hat{y}_i$ and $y_i$ are the prediction and target values, respectively, and $n$ is the total number of data points. This evaluation parameter can also be replaced by the mean absolute percentage error (MAPE) or the coefficient of determination ($R^2$), but the training effectiveness may vary depending on the characteristics of the case.
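The three evaluation measures named above can be computed as in the following sketch. The function names and the small example arrays are illustrative assumptions; MAPE and $R^2$ are written in their common textbook forms, which may differ in detail from the definitions used in a particular software package.

```python
import numpy as np


def mse(y_pred, y_true):
    """Mean squared error between predictions and targets."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean((y_pred - y_true) ** 2))


def mape(y_pred, y_true):
    """Mean absolute percentage error in percent (undefined if a target is zero)."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs((y_pred - y_true) / y_true)) * 100.0)


def r2(y_pred, y_true):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Made-up values used only to exercise the functions.
y_true = [1.0, 2.0, 4.0]
y_pred = [1.5, 2.0, 3.0]
print(mse(y_pred, y_true), mape(y_pred, y_true), r2(y_pred, y_true))
```

Because particle velocities can be close to zero near walls, MAPE may become unstable there, which is one reason a squared-error loss can be the more robust choice.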

Trajectory calculation
The bioaerosol trajectory was computed using the trained LSTM model, with Equation (6) replacing Equation (2), the motion equation resulting from the forces acting on the bioaerosols. First, the bioaerosol velocity at the next time point was obtained from the trained LSTM model via Equation (6). Second, the bioaerosol position at the next time point was calculated using Equation (3). Third, the flow field velocity at the new position was determined using Equation (8), which maps the coordinate point $(x, y)$ of the bioaerosol to the flow velocity components $v_{fx}$ and $v_{fy}$ in the x and y directions. Finally, the new flow velocity and particle velocity were used as the four-dimensional input to calculate the next two-dimensional output. Starting from the initial input parameters, this iterative process was repeated until the complete trajectory had been calculated.
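The iterative procedure can be sketched as a short loop. In the sketch below, `predict_velocity` stands in for the trained LSTM and `flow_velocity` for the flow field lookup; both are hypothetical callables, the position update uses an explicit Euler step (an assumption about how Equation (3) is integrated), and boundary handling (trap and escape conditions) is omitted for brevity.

```python
import numpy as np


def rollout(predict_velocity, flow_velocity, x0, history, dt, n_steps):
    """Advance one particle with a learned velocity predictor.

    predict_velocity: callable mapping a (k, 4) window to the next particle velocity (2,).
    flow_velocity:    callable mapping a position (2,) to the local flow velocity (2,).
    history:          initial (k, 4) window of [v_px, v_py, v_fx, v_fy].
    Returns the visited positions as an array of shape (n_steps + 1, 2).
    """
    window = np.array(history, dtype=float)
    x = np.array(x0, dtype=float)
    positions = [x.copy()]
    for _ in range(n_steps):
        v_p = np.asarray(predict_velocity(window), dtype=float)  # next particle velocity
        x = x + v_p * dt                                          # explicit Euler position update
        v_f = np.asarray(flow_velocity(x), dtype=float)           # flow velocity at the new position
        # Slide the window: drop the oldest step, append the newest one.
        window = np.vstack([window[1:], np.concatenate([v_p, v_f])])
        positions.append(x.copy())
    return np.array(positions)


# Stand-ins for illustration: a uniform flow and a "predictor" that simply
# returns the most recent flow velocity (particle follows the air exactly).
uniform_flow = lambda pos: np.array([0.455, 0.0])
follow_flow = lambda w: w[-1, 2:]
history = np.tile([0.455, 0.0, 0.455, 0.0], (8, 1))
path = rollout(follow_flow, uniform_flow, (0.0, 2.832), history, dt=0.1, n_steps=10)
print(path[-1])
```

In practice the stand-ins would be replaced by the trained network and by an interpolation of the precomputed CFD flow field, and the loop would stop when the particle reaches a wall or the outlet.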

Visualization and evaluation
In this study, the Lagrange trajectory was calculated using a lightweight LSTM model. Since the bioaerosol concentration cannot be directly observed, a method was employed to determine its distribution (Zhao et al. 2008). To enhance the clarity of concentration visualization, the calculated trajectories were treated as continuous bioaerosol emissions. The concentration within a cell was computed along the trajectory using Equation (9) (Zhao et al. 2008):

$$C_j = \frac{M\sum_{i} dt_{(i,j)}}{V_j} \qquad (9)$$

where $C_j$ is the mean particle concentration in a cell, $M$ is the number flow rate of each trajectory and was set as $10^{-15}$, $V$ is the determined unit volume for concentration computation, $dt$ denotes the particle residence time, and the subscripts $(i,j)$ denote the ith track and jth unit, respectively (Zhang and Chen 2006).
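A residence-time-based concentration of this kind can be accumulated on a regular grid as sketched below. The sketch assumes that each trajectory is sampled at a constant time step, so that every sample contributes one time step of residence to the cell containing it; the function name, the grid edges, and the unit depth used to turn a 2D cell area into a volume are illustrative assumptions.

```python
import numpy as np


def cell_concentration(trajectories, dt, x_edges, y_edges, M=1.0, depth=1.0):
    """Concentration per cell, C_j = M * (total residence time in cell j) / V_j.

    trajectories: list of (T_i, 2) arrays of positions sampled every dt seconds.
    x_edges, y_edges: monotonically increasing cell edges of the grid.
    """
    x_edges = np.asarray(x_edges, dtype=float)
    y_edges = np.asarray(y_edges, dtype=float)
    residence = np.zeros((len(x_edges) - 1, len(y_edges) - 1))
    for traj in trajectories:
        traj = np.asarray(traj, dtype=float)
        counts, _, _ = np.histogram2d(traj[:, 0], traj[:, 1], bins=[x_edges, y_edges])
        residence += counts * dt  # each sample counts as dt seconds in its cell
    cell_volume = np.outer(np.diff(x_edges), np.diff(y_edges)) * depth
    return M * residence / cell_volume


# Tiny made-up example: three samples in the first cell, one in the second.
traj = np.array([[0.2, 0.5], [0.4, 0.5], [0.6, 0.5], [1.5, 0.5]])
C = cell_concentration([traj], dt=0.5, x_edges=[0.0, 1.0, 2.0], y_edges=[0.0, 1.0], M=2.0)
print(C)
```

A finer time step gives a better approximation of the true residence time in each cell, at the cost of more samples per trajectory.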

Results
In this study, grid independence validation, airflow field characteristics validation, and trajectory independence validation were conducted for the cases analyzed. Further information regarding these validations is provided in Section 1.2 of the Supplementary Information.

Research process
The application of the P-ML method was divided into two distinct scenarios. First, it was applied to a virtual scenario in which turbulent diffusion is not considered, as shown on the left side of Figure 4. In this phase, the equations of motion, excluding the DRW model, were used to determine the bioaerosol trajectories. These trajectories were then utilized as the training set for the LSTM model. Subsequently, the LSTM model was employed to predict bioaerosol trajectories without taking turbulent diffusion into account. The second part focused on calculating bioaerosol trajectories while considering turbulent diffusion, as shown on the right side of Figure 4. However, using the same method as in the first part did not yield satisfactory results. As a result, the LSTM model obtained in the first part was augmented with the DRW correction to predict bioaerosol trajectories considering turbulent diffusion.

The calculation results without the DRW model
Calculations were conducted using 200 trajectories for both the P-ML model and the motion equation model, and the results were compared. Figure 5 illustrates the trajectories and corresponding concentrations of bioaerosols in Case 1 obtained from both models over time, while Figure 6 presents the results for Case 2. The trajectories followed the flow field streamlines, with many of them either escaping near the exit or colliding with a wall during their movement. In both cases, the bioaerosol concentration accumulated near the walls, irrespective of whether it was calculated using the P-ML model or the motion equation model. Figure S5a displays the number of bioaerosols trapped on different walls. In Case 1, the LSTM model resulted in more bioaerosols hitting the lower wall and the left wall compared to the motion equation model, whereas in Case 2, more bioaerosols hit the left wall. The behavior of bioaerosols in the P-ML model aligns with physical laws. However, subtle differences were observed between the two models over time. Figure S5b shows the residence time in the flow field of the trajectories calculated by both the LSTM model and the motion equation model. Trajectories from the P-ML model exhibited greater dispersion, leading to a shorter residence time in the air (Figure S5b) and a higher likelihood of hitting the upper wall (Figure S5a). Despite these disparities, the results revealed consistent trajectories and concentrations over time for both models.

Transferability of the P-ML model
To assess the transferability of the P-ML model, the model trained on Case 1 was applied to Case 2. Figure 7 presents a comparison of the concentrations calculated by different models. Although the concentration obtained by the Case 1-trained P-ML model exhibited slight variations from the results of the Case 2-trained P-ML model and the motion equation model, the bioaerosols still followed the flow field and adhered to physical laws in the transfer case. While some trajectories were directly trapped on the right wall, overall behavior remained consistent. These findings indicate that the trained model possesses a certain degree of transferability.

Direct training of the LSTM model
In this study, the P-ML model was employed to learn the motion equation model incorporating the DRW model, which simulates particle dispersion caused by turbulence (Liu, Zhu, et al. 2022; Liu et al. 2020a). Figures S6 and S7 present a comparison of the concentration over time calculated by the LSTM model and the motion equation model for Case 1 and Case 2, respectively. The results revealed that the bioaerosols in the LSTM model accumulated in the middle of the model, forming a concentrated area, whereas the concentration distribution in the motion equation model was more dispersed.

Modification of the P-ML model
In the DRW model, the velocity of the airflow affecting the bioaerosol trajectories (represented by $\vec{u}$ in Equation (2)) consists of both the time-averaged velocity and the fluctuating velocity. The average velocity is determined using the Navier-Stokes equation in the DRW model, while the fluctuating velocity follows a Gaussian probability distribution. The formula for calculating the fluctuating velocity is as follows:

$$u'_a = \zeta\sqrt{\frac{2k}{D}} \qquad (10)$$

where $u'_a$ is the fluctuating velocity; $\zeta$ is a normally distributed random variable; $k$ is the turbulent kinetic energy; and $D$ is the dimension of the case. To incorporate the influence of turbulent kinetic energy as an input in the P-ML model, two additional input schemes were tested alongside the basic four input parameters (as described in Equation (4)). These schemes added either the turbulent kinetic energy ($k$) or the second-order average fluctuating velocity. Both schemes had five input dimensions and were applied in Case 1. Figure S8 compares the different input schemes in the P-ML model. However, no improvement was observed with either scheme, as a high-concentration area still formed in the middle of Case 1 for all input types. These results suggest that using the LSTM to learn the motion equation model with the DRW model was not successful, indicating that the key new input in the DRW model may be the normally distributed random variable $\zeta$.
Based on Equation (10), the original average flow velocity is adjusted when determining the trajectory. This approach was applied to both Case 1 and Case 2. Figure 8 compares the results of the modified P-ML model and the equation of motion model at 250 s. It can be observed that in both Case 1 and Case 2, the concentration in the modified P-ML model is more dispersed, and its distribution is closer to that of the equation of motion model.
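The adjustment of the flow velocity can be sketched as follows, assuming the common discrete random walk form in which each fluctuating velocity component equals a standard normal sample multiplied by the square root of 2k/D, with k the turbulent kinetic energy and D the number of spatial dimensions. The function names are hypothetical, and drawing a fresh random sample on every call is a simplification that ignores the eddy lifetime used in full DRW implementations.

```python
import numpy as np


def fluctuating_velocity(k, dim=2, rng=None):
    """Sample a fluctuating velocity vector u' = zeta * sqrt(2 * k / dim)."""
    rng = np.random.default_rng() if rng is None else rng
    zeta = rng.standard_normal(dim)  # one normal sample per component
    return zeta * np.sqrt(2.0 * k / dim)


def perturbed_flow_velocity(mean_velocity, k, rng=None):
    """Mean flow velocity plus a sampled fluctuation, used as the flow input."""
    mean_velocity = np.asarray(mean_velocity, dtype=float)
    return mean_velocity + fluctuating_velocity(k, dim=mean_velocity.size, rng=rng)


# Illustration with the inlet values of Case 1 (velocity 0.455 m/s, k = 4.97e-4 m^2/s^2).
rng = np.random.default_rng(0)
print(perturbed_flow_velocity([0.455, 0.0], 4.97e-4, rng))
```

In a trajectory loop, the perturbed velocity would replace the mean flow velocity in the last two entries of the four-dimensional input, so that the random variable enters the model explicitly instead of having to be learned.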

Calculation of a 3D case using the modified P-ML model
The modified P-ML model was employed to calculate the bioaerosol concentration of the three-dimensional model with the random walk model in Case 3. In Case 3, the inlet was set as a surface source of bioaerosols, emitting a total of 1500 trajectories. Consequently, the high concentration area is situated at the inlet level, and the bioaerosol concentration in the middle area of the model is low. Figure 9 illustrates the concentration of bioaerosols at the plane Z = 0 in Case 3. Consistent with the 2D results, the bioaerosol concentration calculated by the P-ML model closely approximates the results of the equation of motion model in terms of distribution, but with lower magnitudes in certain areas.

Discussion
In this study, the P-ML model, which incorporates a lightweight LSTM model, effectively produced results that aligned with the deterministic motion equation model, irrespective of whether it was used directly or after transfer. However, when applied to the motion equation model with the DRW model, the new method was unable to accurately predict the behavior of bioaerosols. This can be attributed to the presence of an independent random variable in the motion equation that disrupted the deterministic relationship, resulting in model failure. To address this issue, we improved the method by incorporating the random variable from Equation (10) into the model's input, and this approach yielded more satisfactory results.
The Euler-Lagrange method is widely used for calculating bioaerosol trajectories (Liu et al. 2020a), and the P-ML model proposed in this paper is likewise used to calculate such trajectories. For bioaerosols carrying bacteria and viruses that affect human health, the Euler-Lagrange method is commonly used because there is a higher demand for calculating their propagation trajectories (Liu, Yao et al. 2022). In contrast, when non-bioaerosol particles such as PM2.5 are studied, the propagation trajectory is often overlooked (Zhou et al. 2016), although our method can also be used to calculate PM2.5 trajectories. In the method based on Newton's law of motion, bioaerosols are treated as a discrete phase, and the force balance on each particle and its tracking through the flow field (Mesgarpour et al. 2021; Zhao et al. 2020) must be computed, which limits the calculation speed (Christodoulou et al. 2020; Liu et al. 2020b). The P-ML model proposed in this paper employs a lightweight LSTM to predict the particle velocity and achieves bioaerosol trajectory prediction in combination with physical laws. This approach incorporates various factors, including the forces acting on the discrete phase, into the machine learning model, thereby improving computational efficiency (Figure 10). In enclosed environments, preprocessing can be conducted to obtain the spatial flow field and the machine learning model. The subsequent computations are conducted exclusively with the machine learning model, obviating the need to solve specific motion equations.
This model could be adopted for rapid calculation of bioaerosol trajectories in one case or a class of similar cases, so training on large-scale data to obtain a universal model is not required. The enhanced speed of calculating trajectories is of great significance for mitigating the airborne transmission of COVID-19. When infected individuals are present, rapid predictions can be made to provide early warning.
Supervised machine learning models, such as the LSTM model, aim to establish a black box function that maps inputs to outputs (Burkart and Huber 2021; Gillingham 2016). In this study, the motion equation (Equation (2)) serves as a crucial physical model for calculating particle trajectories (Liu, Zhu et al. 2022). However, its solution only considers data from the most recent time point and does not incorporate sequence data from previous time points. Other lightweight machine learning models, such as the Support Vector Machine and Artificial Neural Network, can also be employed to learn Equation (2).
For more complex motion equations, such as the Maxey-Riley (MR) equation, which provides a more accurate model for small spherical particles in a viscous fluid in the Stokes regime (Wan et al. 2019; Maxey 1983), the LSTM model remains suitable, particularly for capturing the memory term. The MR equation includes an integral term, known as the history force, which accounts for the diffusion of vorticity around the particle throughout its lifetime (Jaganathan et al. 2023). The LSTM model is well suited to capturing this history force, making it an excellent choice for tackling more intricate motion equations (Dai, Huang, Zeng, and Zhou 2022; Wan and Sapsis 2018).
Figure 11 illustrates the relationship between actual trajectories, physical models, and empirical models. Physical models and empirical models are both employed to describe and interpret the physical phenomena underlying actual trajectories. Physical models provide a more comprehensive explanation of the phenomena, whereas empirical models offer faster and more convenient results without requiring a complete understanding. In fluid mechanics, various combination models, such as diffusion models, boundary models, and turbulence models, rely on experimental data (Ismail et al. 2020; Liu and Lin 2019; Steelant and Dick 2001; Carberry 1960). Similarly, in CFD, empirical models such as boundary layer models and laminar-turbulent transition models are widely employed (Menter et al. 2021; Gulhane and Sajana 2021; Grant 1997). The integration of physical and empirical models using machine learning techniques is also a significant approach. In this study, the training set was derived from a physical model, specifically a combination model that closely approximates the physical phenomena. However, instead of solely using data from an existing model to train a new model, it is also plausible to use experimental data to train empirical models or combination models with machine learning methods. The training database can be obtained through Particle Image Velocimetry.

Conclusions
In this study, we investigated a novel method for predicting bioaerosol trajectories using the P-ML model, which incorporates a lightweight 2D LSTM model. The key findings are as follows:
1. When the DRW model was not considered, the P-ML model produced results similar to those produced by the motion equation model with respect to bioaerosol trajectories and concentration. For Case 1 and Case 2, the P-ML model predicted mean residence times of 95.556 and 51.66 seconds for bioaerosols within the flow field, respectively, while remaining consistent with the governing physical principles.
2. The P-ML model exhibited some degree of transferability from Case 1 to Case 2 when the DRW model was not considered.
3. The motion equation model with the DRW model includes random variables, causing significant deviations in the P-ML model's calculated trajectories. However, incorporating random variables into the P-ML model significantly reduced these deviations.
This study provides a solid foundation and valuable insights for further exploration of LSTM and other machine learning methods in simulating bioaerosol trajectories.

Disclosure statement
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.