A dynamic mode decomposition based deep learning technique for prognostics

Remaining useful life is one of the key indicators for mechanical equipment health and condition-based maintenance requirements. In fact, the field of prognostics and health management is heavily reliant on remaining useful life estimation. The availability of industrial big data has enabled promising research efforts in prognostics and health management. Deep learning techniques have been widely adopted, and proven to be successful in big data prognostics applications. However, deep learning approaches are considered black box approaches with interpretation difficulties and loss of information due to high-level feature extraction resulting from layer-by-layer processing. Enriching the deep learning input with temporal features can increase the performance of deep learning based approaches. This paper aims to improve the performance of deep learning techniques by incorporating dynamic mode decomposition into the deep learning schemes for the purposes of remaining useful life estimation. The developed method is capable of accurately predicting the remaining useful life in a data driven manner without prior knowledge of system equations. The input temporal information and health state are enriched by using dynamic mode decomposition which produces dynamic modes that approximate the infinite Koopman operator modes. The modes contain coherent time dynamics of the processed system which contribute to producing a health indicator that is representative of the system degradation. These time dependent dynamics are important characteristics of the system’s health state. The degradation profile is incorporated into deep learning schemes that accurately predict the remaining useful life of the system. To validate the proposed model, two different experimental data repositories are used in this paper. The first one is a spiral bevel gear vibration dataset. The second one consists of turbofan engines vibration datasets. 
The validation results have shown improved remaining useful life estimation performance when dynamic mode decomposition technique is incorporated into the deep learning schemes presented in this paper.


Introduction
Remaining useful life (RUL) is one of the key indicators for mechanical systems health and condition-based maintenance requirements. In fact, the field of prognostics and health management (PHM) is heavily reliant on RUL estimation as an important parameter for condition-based maintenance (Qu et al., 2019). Effectively extracting features from big data for the purposes of accurate RUL prediction is one of the main challenges in PHM. In recent years, a variety of methods have been utilized to estimate the RUL of industrial equipment using available sensor data and machine learning algorithms. These methods can be categorized into shallow learning approaches and deep learning approaches. RUL prediction using shallow learning approaches includes the utilization of algorithms such as naïve Bayes (Ng et al., 2014), support vector regression (Benkedjouh et al., 2013; Dong et al., 2014; García Nieto et al., 2015), and regression trees (Tran et al., 2009). Shallow learning techniques can also include traditional neural networks that lack a deep hidden layer architecture. Examples of shallow neural networks include ensemble neural networks (Baraldi et al., 2013; Lim et al., 2014) and quantum neural networks (Cui et al., 2015).
In (Hsu & Jiang, 2018), LSTM was used to predict the RUL of aero-propulsion systems where the proposed method proved to be superior to shallow learning approaches as well as CNN.
One variant of SAE is the stacked denoising autoencoder (SDAE) which tackles the noise issue in collected data (Xia et al., 2018; Ma et al., 2018; Gao et al., 2017; Yan et al., 2018; He et al., 2018). It does so by taking a noisy version of the data and trying to reconstruct it into a denoised output that has an increased level of robustness. Robust output helps obtain solid features to be used for a reliable RUL prediction. The SDAE was used in (Xia et al., 2018) to classify the input signals of bearing data into various stages of degradation. The denoising property of the SDAE helped achieve a more accurate representation of the health stages. A regression shallow neural network was developed for each of the classified health stages. By smoothing the regression results from different models, the final RUL estimation was obtained.
DBN was enhanced by incorporating particle filters (PF) in (Niu et al., 2018) to predict the RUL of lithium ion batteries. PF is a Monte Carlo approach for system state estimation. It does so by combining the parameters of the state and the state evolution of the system. The method utilized DBN for offline training. In addition, a fault dynamic model (FDM) was achieved by the trained DBN. New particles in the PF process can be generated based on the FDM of the trained DBN. In the filtering step, the weights were updated when new measurements were presented. The hybrid DBN-PF method was used to estimate the fault state which was used for the purposes of RUL estimation. The number of neurons in the hidden layers was 30 and 20 for the first and second hidden layers, respectively, of the DBN structure that was combined with PF.
In (Ren et al., 2018a; Ren, Zhao, et al., 2018), a technique named spectrum principal energy vector was employed to produce eigenvectors that better resemble a typical CNN input. After that, CNN was applied to the eigenvectors for feature extraction. The structure of the CNN consisted of three convolutional layers, three average pooling layers, and a flattening layer. Dropout was also incorporated into the CNN structure for the purposes of controlling overfitting. The prediction results were then smoothed to estimate the RUL of bearings. The smoothing method used was based on linear regression.
In (Elforjani, 2016), fully connected dense networks were utilized for bearing RUL estimation. The number of hidden layers and the number of neurons were fine-tuned until the best structure was achieved. The best structure consisted of two hidden layers where the number of neurons was 5 and 8 for hidden layer 1 and 2, respectively. After that, linear regression was used to estimate the RUL.
The previously mentioned deep learning based approaches have proven to be superior to shallow learning based algorithms for accurate RUL estimation of industrial machines and components. It is worth mentioning that some of these deep learning based approaches were not originally developed for prognostics. For example, CNN was originally developed for computer vision applications and image processing as in (Vardhana et al., 2018). LSTM was prominent in sequence processing for handwriting and speech recognition applications as presented in (Graves et al., 2013). However, these deep learning based approaches have shown reliable performance in the field of PHM for predicting the RUL in a wide range of industrial applications.
The process of estimating the RUL of a system often requires extraction of features that help in the construction of health indicators (HI). A health indicator is useful in determining the current health state of a system in the time domain with relation to both past and future health states. When an HI is obtained, it provides insight into the degradation of the system over time with a monotonic representation of the health state. A health indicator can then be processed using one of the deep learning approaches mentioned above to estimate the RUL.
One of the limitations of deep learning based approaches is the lack of interpretability as they are considered, to some extent, black box approaches. Layer by layer, high level features are extracted from the input data which could account for loss of information. For this reason, it is intended to add a preprocessing step that enriches the input data with time coherent structures that take into consideration the temporal evolution of the system's health state. This preprocessing step consists, in part, of using the dynamic mode decomposition (DMD) of a Koopman operator.
In physics, the Koopman operator (Koopman & Neuman, 1931; Koopman & Neuman, 1932), also called the composition operator, is a linear operator that provides an infinite-dimensional representation of nonlinear dynamical systems with known equations. Infinite representations need to be approximated for obtaining health indicators for RUL prediction. A data driven approach is needed to approximate the infinite representations of the Koopman operator to facilitate the construction of health indicators. The dynamic mode decomposition is a data driven approach that approximates the eigenvalues and modes of the Koopman operator (Bagheri, 2013). DMD was first introduced in (Schmid & Sesterhenn, 2008). It is primarily used in the field of fluid dynamics (Schmid et al., 2011) to construct dynamic modes that are coherent structures of the fluid behavior in flow fields. DMD is connected to the Koopman operator through the similarity in temporal behavior between the time dependent modes of DMD and the infinite Koopman operator representations. DMD approximates the eigenmodes of the Koopman operator and outputs finite representations in a data driven manner which is ideal for constructing the health indicator for predicting the RUL. In this paper, a DMD based approach is used to construct the health indicator that serves as an input into various deep learning architectures to estimate the RUL of two different industrial applications.
In this paper, two applications are used to validate the DMD based deep learning approach for prognostics. The first application is the RUL prediction for the NASA spiral bevel gear univariate vibration data. Using DMD showed an improvement in RUL prediction. The DMD helped achieve a monotonic health indicator by obtaining coherent time dependent structure of the signal that served as input data for the deep learning based approach.
The second application is the RUL prediction with NASA C-MAPSS engine fleets simulation datasets. DMD helped compress multiple sensors and extract a time coherent mode that is representative of all sensors, which is a result of utilizing the dimensionality reduction property of DMD. The mode is considered a fused health indicator. This mode is later used as an additional feature to more accurately predict the RUL. In both applications, the DMD is used to improve the quality of the deep learning input for the purposes of enhancing the RUL prediction accuracy. This quality improvement stems from the DMD's ability to capture the time evolution of the system's health state. The proposed approach is outlined in more detail in the methodology section next.
The remainder of this paper is organized as follows: Section "The methodology" details the methodology of the proposed approach, Section "Case studies" provides an exhaustive presentation of two case studies, and Section "Conclusions" concludes the paper.

The methodology
The technologies involved in the proposed approach are dynamic mode decomposition and deep learning algorithms, which are explained in the first two subsections of the methodology section, respectively. Furthermore, the overall framework of the DMD based deep learning approach for prognostics is presented in the third methodology subsection. Software requirements for implementing the approach include Python 3 (Van Rossum & Drake, 2009), the Keras library (Chollet et al., 2015), TensorFlow (Abadi et al., 2016), the NumPy library (Harris et al., 2020), and SciPy.

Dynamic mode decomposition
In this paper, the dynamic mode decomposition (Tu et al., 2014) of a Koopman operator is used to obtain health indicators of the raw sensor signals. Consider a sequential set of data vectors $\{d_1, \ldots, d_n\}$, where each $d_k \in \mathbb{R}^n$. It is assumed that the data is created by linear dynamics $d_{k+1} = A d_k$ for some unknown matrix $A$. A continuous evolution $d(t)$ may also be sampled to generate $d_k$, where $d_k = d(k\Delta t)$, with the assumption that there exists a fixed sampling rate $\Delta t$. An operator $A$ is assumed to approximate the dynamics of a system when the data to which DMD is applied is generated by nonlinear dynamics. The DMD modes and eigenvalues are intended to approximate the eigenvectors and eigenvalues of $A$. The equations that outline the procedure of implementing the DMD are significant due to their data driven approximation of the Koopman operator to obtain the dynamic modes. In fact, the terms Koopman mode and dynamic mode are used interchangeably in the literature (Tu et al., 2014). The DMD procedure is explained as follows.
The data is arranged in snapshots ($d_i$) to form the appropriate DMD input by defining $X$ and $Y$:

$$X = [d_1, d_2, \ldots, d_{n-1}], \qquad Y = [d_2, d_3, \ldots, d_n] \tag{1}$$

where $n$ is the time index of when the last snapshot in the sequence is taken and $d_n$ is the last time snapshot. Here $d_i$ is a two-dimensional snapshot of the original data matrix at time $i$, $X$ is a matrix that contains snapshots of the data from time 1 to time $n-1$, and $Y$ is a matrix that contains snapshots of the data from time 2 to time $n$. By arranging the time snapshots of the data into $X$ and $Y$, the sequences of the Hankel shift matrix (Layman, 2001) are obtained. The two sequences $X$ and $Y$ represent a subset, the first two rows, of the Hankel shift matrix when it is applied to the original data. A Hankel shift matrix has constant skew-diagonals, so the first snapshot of $Y$ is the same as the second snapshot of $X$. If the original data matrix is of dimension $n \times q$, the resulting $X$ and $Y$ sequences are each of dimension $(n-1) \times q$. The Hankel shift matrix extends infinitely beyond the defining sequences $X$ and $Y$; however, only the first two rows are required to define them.
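The shifted pairing described above can be illustrated with a minimal NumPy sketch. It assumes, as in the text, that the original data matrix has one snapshot per row (dimension n × q); the function name `shifted_pairs` is hypothetical and not part of the original work.

```python
import numpy as np

def shifted_pairs(data):
    """Split an (n x q) data matrix into time-shifted sequences X and Y.

    Assumption: each row of `data` is one time snapshot.
    X holds snapshots 1..n-1 and Y holds snapshots 2..n.
    """
    data = np.asarray(data, dtype=float)
    if data.ndim != 2 or data.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two snapshots")
    return data[:-1], data[1:]

# Toy data: n = 4 snapshots of q = 3 measurements each.
data = np.arange(12.0).reshape(4, 3)
X, Y = shifted_pairs(data)
print(X.shape, Y.shape)
```

Both outputs have dimension (n − 1) × q, and the first snapshot of Y equals the second snapshot of X, which is the shift structure the text refers to.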
Singular value decomposition (SVD) is the factorization of a square or rectangular matrix into the product of three matrices (Rao et al., 2014). SVD-based DMD (Schmid, 2010) is used in this paper for its increased numerical stability when compared with the original DMD which is formulated in terms of a companion matrix (Schmid & Sesterhenn, 2008). The SVD-based DMD is now recognized as the defining DMD approach. The SVD-based DMD starts with computing the SVD as presented next.
Compute the SVD of $X$:

$$X = U \Sigma V^{*} \tag{2}$$

Define the matrix $\tilde{A}$ as follows:

$$\tilde{A} = U^{*} Y V \Sigma^{-1} \tag{3}$$

Calculate the eigenvalues and eigenvectors of $\tilde{A}$:

$$\tilde{A} w = \lambda w$$

where $\lambda$ is a DMD eigenvalue and $w$ is the corresponding eigenvector. The DMD modes associated with each eigenvalue are then calculated using Eqs. (4) and (5):

$$\varphi = \frac{1}{\lambda} Y V \Sigma^{-1} w \tag{4}$$

$$\varphi_p = U w \tag{5}$$

where $\varphi$ are the eigenmodes when the Hankel shift is not applied, in which case the Hankel shift operation is replaced with exact pairings of $X$ and $Y$ where $x_i = d_{i-1}$ and $y_i = d_i$, and $\varphi_p$ are the projected eigenmodes when the Hankel shift is applied. The eigenvectors of $\tilde{A}$ are lifted to the original space using the left singular vectors $U$ to calculate the projected DMD modes.
It is important to note that DMD differs from proper orthogonal decomposition (POD), used in applications such as principal component analysis, in three aspects: (1) DMD modes possess temporal behavior that is not found in POD's orthogonal modes, (2) DMD approximates the temporal dynamics while POD relies on a time-averaged spatial correlation (Schmid et al., 2009), and (3) DMD modes describe the time series dynamics, unlike POD, which reconstructs the input with modes ranked in terms of variance or energy (Tu et al., 2014). In this paper, the obtained dynamic modes are used to construct the health indicator for prognostics. Due to the use of the Hankel matrix, the time-shifted input contributes to constructing temporally dependent and coherent dynamic modes which are used to construct the health indicator as shown in the first case study. In addition, the dimensionality reduction property of DMD helps obtain a fused health indicator that is representative of the multivariate time series as seen in the second case study.
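As an illustration of the SVD-based procedure, the following NumPy sketch follows the standard formulation in Tu et al. (2014). It is not the authors' code: the function name `svd_dmd`, the `rank` truncation argument, and the toy linear system are assumptions made for this example. Snapshots are stored as columns here, the usual convention in the DMD literature, so row-wise data should be transposed first.

```python
import numpy as np

def svd_dmd(X, Y, rank):
    """SVD-based DMD sketch (snapshots as columns).

    Returns the DMD eigenvalues, the exact modes (1/lambda) Y V S^-1 w,
    and the projected modes U w. `rank` is an assumed truncation level.
    """
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]
    V = Vh.conj().T
    # Dividing by s scales each column, i.e. right-multiplies by S^-1.
    A_tilde = U.conj().T @ Y @ V / s
    eigvals, W = np.linalg.eig(A_tilde)
    exact_modes = (Y @ V / s) @ W / eigvals   # assumes nonzero eigenvalues
    projected_modes = U @ W
    return eigvals, exact_modes, projected_modes

# Toy check on a known linear system d_{k+1} = A d_k (hypothetical data).
A_true = np.array([[0.9, 0.2],
                   [0.0, 0.5]])
snapshots = [np.array([1.0, 1.0])]
for _ in range(9):
    snapshots.append(A_true @ snapshots[-1])
D = np.column_stack(snapshots)          # 2 x 10, one snapshot per column
X, Y = D[:, :-1], D[:, 1:]

eigvals, modes, modes_proj = svd_dmd(X, Y, rank=2)
print(np.sort(eigvals.real))            # close to the eigenvalues of A_true
```

For this noise-free toy system the recovered eigenvalues match those of `A_true` (0.5 and 0.9) up to numerical precision; with measured data the rank would have to be chosen for the application.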

Deep learning
Different types of deep learning based architectures can be used in combination with DMD for prognostics. The architectures include a dense deep neural network as well as a hybrid CNN-LSTM architecture.

Dense deep neural network
The dense deep neural network used in this paper is a fully connected neural network with multiple hidden layers (Maksimenko et al., 2018). It uses backpropagation to update weights at each hidden node. This type of layer was utilized as the main HI estimator for the first case study and as part of the hybrid architecture in the second case study. Figure 1 shows the basic architecture of a dense deep neural network.
Using backpropagation, a dense deep neural network updates the weights and biases iteratively to minimize the MSE with respect to predicted and actual output of each node in a given hidden layer.
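The weight update described above can be sketched in plain NumPy. This is a minimal illustration, assuming ReLU hidden layers, a linear output, full-batch gradient descent, and an MSE loss on the network output; the layer sizes, learning rate, and toy regression target are hypothetical and unrelated to the configurations used in the case studies.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(sizes):
    """He-style initialization for a fully connected network."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, k)), np.zeros(k))
            for m, k in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Return the activations of every layer (ReLU hidden, linear output)."""
    acts = [x]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(z if i == len(params) - 1 else np.maximum(z, 0.0))
    return acts

def train_step(params, x, y, lr):
    """One backpropagation step; returns new params and the MSE before it."""
    acts = forward(params, x)
    loss = float(np.mean((acts[-1] - y) ** 2))
    grad = 2.0 * (acts[-1] - y) / y.size        # dMSE / d(output)
    updated = []
    for i in reversed(range(len(params))):
        W, b = params[i]
        grad_W = acts[i].T @ grad
        grad_b = grad.sum(axis=0)
        if i > 0:
            # Propagate through the ReLU of the previous layer.
            grad = (grad @ W.T) * (acts[i] > 0)
        updated.append((W - lr * grad_W, b - lr * grad_b))
    return updated[::-1], loss

# Toy regression problem (hypothetical): fit y = x^2 on [-1, 1].
x = np.linspace(-1.0, 1.0, 64).reshape(-1, 1)
y = x ** 2
params = init_params([1, 16, 16, 1])
losses = []
for _ in range(500):
    params, loss = train_step(params, x, y, lr=0.01)
    losses.append(loss)
print(losses[0], losses[-1])
```

In practice the same loop is delegated to a library such as Keras; the sketch only makes the gradient flow through the hidden layers explicit.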

LSTM
LSTM is a powerful sequence processing deep neural network that utilizes memory cells (Kurata et al., 2017). It is a type of recurrent neural network (RNN). Figure 2 shows the basic structure of an RNN. An RNN processes input data at each timestep while sharing the calculated weights from previous timesteps.
A recurrent network, at any given timestep, has two inputs. The first input comes from the input layer while the second input comes from the hidden layer of the previous timestep as shown in Fig. 2, inspired by (Guo et al., 2017).
An RNN can be mathematically described as follows (Guo et al., 2017):

$$h_t = f(w_{hx} x_t + w_{hh} h_{t-1} + b_h) \tag{6}$$

$$y_t = f(w_{yh} h_t + b_y) \tag{7}$$

where $f(\cdot)$ is the activation function used, $w_{hx}$ is the weight matrix calculated between the input and hidden layers, $w_{hh}$ is the weight matrix between a hidden layer and its counterpart in the previous timestep, $w_{yh}$ is the weight matrix between the hidden and output layers, and the vectors $b_h$ and $b_y$ are biases of the hidden and output layers, respectively.
LSTM addresses the problem of vanishing and exploding gradients in traditional RNN. Unlike RNN, LSTM utilizes three gates which are the input, forget, and output gates. These gates allow an LSTM's memory cell to discard some input weights at each timestep and carry forward useful information through the output gate. Figure 3 illustrates the functionality of an LSTM memory cell.
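The recurrence can be made concrete with a short NumPy sketch. It assumes the standard form in which the hidden state depends on the current input and the previous hidden state; the output weight name `w_yh`, the tanh activation, and the random toy weights are assumptions made for illustration.

```python
import numpy as np

def rnn_forward(x_seq, w_hx, w_hh, b_h, w_yh, b_y, f=np.tanh):
    """Run a simple RNN over a sequence, sharing weights across timesteps."""
    h = np.zeros(w_hh.shape[0])
    outputs = []
    for x_t in x_seq:
        # Two inputs per timestep: the current sample and the previous hidden state.
        h = f(w_hx @ x_t + w_hh @ h + b_h)
        outputs.append(f(w_yh @ h + b_y))
    return np.array(outputs), h

# Toy dimensions (hypothetical): 5 timesteps, 3 inputs, 4 hidden units, 1 output.
rng = np.random.default_rng(1)
x_seq = rng.normal(size=(5, 3))
w_hx = rng.normal(size=(4, 3))
w_hh = rng.normal(size=(4, 4))
w_yh = rng.normal(size=(1, 4))
y_seq, h_last = rnn_forward(x_seq, w_hx, w_hh, np.zeros(4), w_yh, np.zeros(1))
print(y_seq.shape, h_last.shape)
```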
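The operations inside an LSTM cell can be described mathematically as follows (Guo et al., 2017):

$$g_t = \tanh(w_{gx} x_t + w_{gh} h_{t-1} + b_g) \tag{8}$$

$$i_t = \sigma(w_{ix} x_t + w_{ih} h_{t-1} + b_i) \tag{9}$$

$$f_t = \sigma(w_{fx} x_t + w_{fh} h_{t-1} + b_f) \tag{10}$$

$$o_t = \sigma(w_{ox} x_t + w_{oh} h_{t-1} + b_o) \tag{11}$$

where $g_t$, $i_t$, $f_t$, and $o_t$ are outputs of the input node, input gate, forget gate and output gate, respectively, and $\sigma$ is the sigmoid function. $w_{gx}$, $w_{ix}$, $w_{fx}$, and $w_{ox}$ are weights passed from input layer $x_t$ to hidden layer $h_t$ at time $t$. $w_{gh}$, $w_{ih}$, $w_{fh}$, and $w_{oh}$ are hidden layer weights between time $t$ and $t-1$. $b_g$, $b_i$, $b_f$, and $b_o$ are bias vectors of the input node, input gate, forget gate and output gate, respectively. $h_{t-1}$ is the output value of the hidden layer at the previous timestep.

$$s_t = g_t \otimes i_t + s_{t-1} \otimes f_t \tag{12}$$

$$h_t = \tanh(s_t) \otimes o_t \tag{13}$$

where $s_t$ and $s_{t-1}$ are the internal state at the current and previous timesteps, respectively, and $\otimes$ denotes element-wise multiplication.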
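A single LSTM cell update can be sketched in NumPy as follows. The sketch assumes the standard LSTM formulation (tanh for the input node and cell output, sigmoid for the three gates, element-wise products for the state update); the dictionary layout of the parameters and the all-zero toy weights are choices made only for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, s_prev, p):
    """One LSTM cell update; `p` maps names such as 'w_gx' to arrays."""
    g = np.tanh(p["w_gx"] @ x_t + p["w_gh"] @ h_prev + p["b_g"])   # input node
    i = sigmoid(p["w_ix"] @ x_t + p["w_ih"] @ h_prev + p["b_i"])   # input gate
    f = sigmoid(p["w_fx"] @ x_t + p["w_fh"] @ h_prev + p["b_f"])   # forget gate
    o = sigmoid(p["w_ox"] @ x_t + p["w_oh"] @ h_prev + p["b_o"])   # output gate
    s = g * i + s_prev * f          # internal state, element-wise products
    h = np.tanh(s) * o              # hidden output
    return h, s

# Toy cell with 2 inputs and 3 hidden units; all parameters set to zero so
# every gate evaluates to 0.5 and the input node to 0.
n_in, n_hid = 2, 3
p = {}
for name in "gifo":
    p[f"w_{name}x"] = np.zeros((n_hid, n_in))
    p[f"w_{name}h"] = np.zeros((n_hid, n_hid))
    p[f"b_{name}"] = np.zeros(n_hid)

h, s = lstm_step(np.ones(n_in), np.zeros(n_hid), np.ones(n_hid), p)
print(s, h)
```

With zero parameters the forget gate passes half of the previous state, so the new internal state is 0.5 in every unit, which makes the role of the gates easy to verify by hand.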

CNN
Convolutional neural networks have three dimensionality variations: 1-dimensional, 2-dimensional, and 3-dimensional CNN. A 1-dimensional CNN is used in this paper as part of a hybrid deep learning scheme where the output of the 1-dimensional CNN serves as the input of the LSTM, as explained in the case studies section. Figure 4 shows the basic structure of a 1-dimensional convolutional neural network. The 3D tensor results from stacking feature maps.

Fig. 4 One-dimensional CNN (input sequence, feature maps, 3D tensor)
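The sliding-window operation of a one-dimensional convolutional layer can be sketched in NumPy as follows. The sketch assumes a univariate input sequence, a single filter kernel of length `F_L`, and a stride of one; the function name and the toy values are hypothetical.

```python
import numpy as np

def conv1d_feature_map(x, w, b=0.0, phi=np.tanh):
    """Feature map of one filter: phi(w . window + b) for each window of x."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    F_L = w.size
    # Each row is one window of F_L consecutive samples (requires NumPy >= 1.20).
    windows = np.lib.stride_tricks.sliding_window_view(x, F_L)
    return phi(windows @ w + b)

# Toy example with an identity activation so the arithmetic is easy to follow.
x = [1.0, 2.0, 3.0, 4.0]
w = [1.0, -1.0]
feature_map = conv1d_feature_map(x, w, phi=lambda z: z)
print(feature_map)   # each window gives x_i - x_{i+1} = -1
```

Stacking the feature maps of several filters along a new axis gives the multi-channel tensor that is passed to the next layer.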
A one-dimensional CNN can be best described using the mathematical expression of each operation as follows. The one-dimensional sequential data input is

$$x = [x_1, x_2, \ldots, x_T] \tag{14}$$

where $T$ is the length of the input sequence. The convolution operation is

$$x_{i:i+F_L-1} = x_i \oplus x_{i+1} \oplus \cdots \oplus x_{i+F_L-1} \tag{15}$$

Equation (15) defines the concatenation vector representation $x_{i:i+F_L-1}$ that enters the dot product with the filter kernel $w \in \mathbb{R}^{F_L}$. In (15), $x_{i:i+F_L-1}$ is a window of $F_L$ sequential samples that starts at sample $i$, and $\oplus$ concatenates each sample into a longer embedding.
The final convolution operation is

$$z_i = \phi(w^{T} x_{i:i+F_L-1} + b) \tag{16}$$

where $*^{T}$ represents the transpose of matrix $*$, $b$ is the bias, and $\phi$ is a non-linear activation function. The feature map of the $j$th filter is

$$z_j = [z_1, z_2, \ldots, z_{T-F_L+1}] \tag{17}$$

where $z_i$ represents the feature learned by the filter kernel and $j$ represents the $j$th filter kernel.

The proposed approach
Figure 5 represents the overall framework of the proposed approach. The input into the proposed approach consists of raw sensor signals. The DMD is applied to the raw sensor signals to extract dynamic modes. The dynamic modes provide chronologically ordered states of the system. The obtained dynamic modes serve as the temporal health indicator of the system. Once the health indicator is obtained, the deep learning algorithm is executed to learn the trajectory of the health indicator and to obtain a model. The learning procedure is implemented by utilizing a specific network structure that is detailed in the case study section.
The obtained model is then used to predict the health state of the systems on the test set. The predicted health indicator is used to predict the RUL of the system, which is the main objective of the proposed approach. After that, the predicted RUL is compared to the true RUL. Performance metrics are then used to compare DMD-deep-learning approaches to deep-learning approaches.
The novelty of the proposed approach can be outlined through the following contributions:

1. A new DMD based deep learning technique has been developed to predict the RUL of industrial machines and equipment. The technique is designed by incorporating DMD into a variety of deep learning schemes. The developed method is capable of accurately predicting the RUL in a data driven manner without prior knowledge of system equations. The input temporal information and health state are enriched by using dynamic mode decomposition which produces dynamic modes that approximate the infinite Koopman operator modes. The DMD modes contain coherent time dynamics of the processed system which contribute to producing a health indicator that is representative of the system degradation. These time dependent dynamics are important characteristics of the system's health state. The degradation profile is incorporated into deep learning schemes that accurately predict the RUL of the system.
2. Local stationary system representations have been developed for RUL prediction. The dynamic modes obtained using DMD represent the local dynamical features of the system at each timestep. The deep learning scheme then processes those local features to obtain higher level global features. This transition from local to global representation using DMD and deep learning for prognostics is one of the contributions of this paper. As infinite iterations may be required to reach a solution and map the temporal degradation of a system using deep learning, DMD as a closed form solution enriches the input with temporal information of the subsystems that allows deep

Case studies
In this section, a detailed explanation of each of the two case studies is provided. After each case study is explained, the specific method is outlined, and the results obtained with the proposed approach are then visualized, compared, and discussed.

Spiral bevel gear data
The data was collected using a bevel gear test rig at the NASA Glenn Spiral Bevel Gear Test Facility. The data collected are vibration signals. More details about the procedures of the tests performed can be found in (Dempsey et al., 2002). There were 7 experiments performed at the gear test facility. The data used in this paper are from the last experiment, NGB1_CHK7.
Each acquisition has a duration of 1 s, the acquisition interval is 1 min, and the sampling frequency is 150 kHz. In NGB1_CHK7, increased damage begins on the right side of the bevel gear from 1 to 4 teeth. Figure 6 shows the schematic of the test rig used.
The gear has 36 teeth while the pinion has 12 teeth (Fig. 7). Figure 8 shows an image of a damaged spiral bevel gear. It is worth mentioning that oil debris mass (ODM) and vibration condition indicators were used to detect the pitting damage.

Spiral bevel gear method
The steps for implementing the DMD based deep learning approach are as follows.
Step 1: The DMD method is applied to the raw vibration signal and modified to obtain 30 modes $\varphi_p$.
Step 2: The standard deviation of the modes is calculated. For comparison, direct standard deviation is applied to the version where DMD is not applied.
Step 3: The moving average is calculated.
Step 4: The data is normalized using a minmax scaler defined by:

$$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}} \tag{18}$$

Step 5: The univariate timeseries problem is then converted into a supervised learning problem using time-shifted embedding.
Step 7: The original univariate timeseries is then used as the label for the supervised learning problem and concatenated with the embedding matrix d to create the final training matrix $F_{Gear}$.
Step 8: $F_{Gear}$ is split into $F_{TrainGear}$ and $F_{TestGear}$.
Step 9: A dense deep neural network is then applied to $F_{TrainGear}$ and a model is obtained.
Step 10: The obtained model is then used to predict the HI using $F_{TestGear}$. The network configuration is presented in Table 1.
Step 11: The RUL is then calculated using the following equation:

$$RUL_{Pred} = \frac{HI_{Pred}\,(RUL_{True})}{HI_{True}} \tag{19}$$
The basic flow structure for the spiral bevel gear approach is presented in Fig. 9. The step that includes applying the DMD on the data is removed from the proposed approach to validate the effectiveness of DMD and compare the results.
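The smoothing, normalization, embedding, and RUL steps listed above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the authors' implementation: the window length, the embedding size `d`, the layout of the embedding (each row holds `d` consecutive values and the label is the value `L` steps after the last of them), and all function names are hypothetical.

```python
import numpy as np

def moving_average(x, window):
    """Moving average over `window` consecutive samples (Step 3)."""
    return np.convolve(np.asarray(x, dtype=float),
                       np.ones(window) / window, mode="valid")

def minmax_scale(x):
    """Minmax normalization to [0, 1] (Step 4)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def time_shift_embedding(series, d, L):
    """Turn a univariate series into (features, labels) for L-step-ahead
    prediction. Assumed layout: row i = series[i : i + d],
    label i = series[i + d - 1 + L]."""
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - d - L + 1
    features = np.stack([series[i:i + d] for i in range(n_rows)])
    labels = series[d + L - 1:d + L - 1 + n_rows]
    return features, labels

def rul_from_hi(hi_pred, hi_true, rul_true):
    """RUL relation of Step 11: scale the true RUL by the HI ratio."""
    return hi_pred * rul_true / hi_true

# Toy health indicator (hypothetical values).
series = np.arange(10.0)
features, labels = time_shift_embedding(series, d=3, L=2)
F_gear = np.column_stack([features, labels])   # embedding plus label column
print(F_gear.shape)
print(moving_average([1, 2, 3, 4], window=2))
print(rul_from_hi(hi_pred=0.5, hi_true=1.0, rul_true=10.0))
```

The resulting matrix can then be split chronologically into training and testing parts before the dense network is fitted.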
The training/testing split was set to 90/10 percent of the data while the validation data is obtained by further splitting the 90% training data into a 90/10 training/validation input. The configuration in Table 1 [1,64,128], and finally the layer units were found empirically. The grid search was implemented on the L = 1 step ahead prediction, without incorporating DMD, and the obtained best hyperparameter settings were used for all three step-ahead prediction schemes to demonstrate the difference in performance when L increases. The same model parameters were used for both DMD-Dense and Dense across all L step ahead prediction schemes to demonstrate the effectiveness of incorporating DMD into the deep learning based approach. Randomly selected seeds were set to 123 and 2 for NumPy and TensorFlow, respectively, for reproducibility of the obtained results. Multiple ensembles of deep learning networks were built based on the exhaustive grid search that determines the best model. The model that minimizes the mean absolute percentage error is considered the best model, whose parameters are then used for all L steps.
For L = 10 and L = 15, an early stopping criterion was deployed to control the overfitting that results from the larger number of embedding d. Early stopping is set to 3 and is activated when the validation loss reaches equilibrium for 3 consecutive training iterations. Using backpropagation, the dense deep neural network updates the weights and biases at each epoch to minimize the mean squared error (MSE) with respect to predicted and actual output of each unit in each hidden layer. The weights are updated until the set number of training epochs is reached or the early stopping criterion is activated. The relatively bigger learning rate helps escape local minima. The convergence is then directed to the global optimum that minimizes the MSE after each epoch update.
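The early stopping rule can be illustrated with a framework-independent sketch. It assumes that the value 3 plays the role of a patience, that is, training stops once the validation loss has failed to improve for 3 consecutive epochs, which is similar in spirit to the `patience` argument of the Keras `EarlyStopping` callback. The function name and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the index of the epoch at which training would stop.

    Training stops when the validation loss has not improved by more than
    `min_delta` for `patience` consecutive epochs; otherwise the last
    epoch index is returned.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_losses) - 1

# Hypothetical validation curve: the loss plateaus after epoch 2.
val_losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.5]
stop_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch)
```

In this toy curve the best loss occurs at epoch 2, the next three epochs bring no improvement, and training stops at epoch 5 before the later value is ever reached.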

Spiral bevel gear RUL prediction results
Following the proposed approach in the methodology section, we obtain the RUL prediction for the spiral bevel gear. Figures 10 and 11 show the original vibration signal and the DMD based health indicator obtained to serve as an input for the deep dense network, respectively.
To demonstrate the effectiveness of incorporating the dynamic mode decomposition into the deep dense network approach, two methods were tested on the spiral bevel gear data. The first method, referred to as DMD-Dense, utilizes DMD to construct the health indicator and uses a dense deep neural network to predict the RUL. The second method, referred to as Dense, is identical to the first method except that the dynamic mode decomposition step is not implemented.
Root mean squared error (RMSE) and mean absolute percentage error (MAPE) were used to compare the performance:

$$RMSE = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (A_t - P_t)^2} \tag{20}$$

$$MAPE = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{A_t - P_t}{A_t} \right| \tag{21}$$

where $A_t$ is the actual value, $P_t$ is the predicted value, and $n$ is the number of predicted points. Table 2 shows the comparison results between the DMD-Dense and the Dense approaches. It is noted that after incorporating the DMD into the dense deep neural network approach, a consistent improvement in both RMSE and MAPE values is evident. Figures 12, 13, and 14 show the predicted RUL of 1, 10, and 15 steps ahead predictions, respectively. The prediction is shown against the true linear degradation of the RUL. The remaining useful life is calculated using (19) after obtaining the health indicator.
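Both metrics are straightforward to compute. The following NumPy sketch uses the standard definitions of RMSE and MAPE (with MAPE expressed in percent); the function names and sample values are illustrative only.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((a - p) ** 2)))

def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(100.0 * np.mean(np.abs((a - p) / a)))

# Hypothetical RUL values.
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(mape([100.0, 200.0], [110.0, 180.0]))
```

Because MAPE divides by the actual value, it is undefined where the true RUL is zero, so such points need to be excluded or handled separately.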
The DMD-Dense approach has shown improvement compared to the Dense approach. The dense deep neural network is identical in both approaches to demonstrate the effectiveness of incorporating the DMD approach to predict the remaining useful life of the spiral bevel gear. As anticipated, decreasing the number of L steps prediction demonstrates an improvement in the RUL estimation accuracy.
It can be seen from the previous figures that the proposed approach performs well in predicting the RUL relative to the decreasing number of L step predictions. At L = 10 and L = 15, the remaining useful life is predicted early to some extent when compared to the L = 1 step ahead prediction.

Turbofan engines data
The C-MAPSS engines datasets (Saxena & Goebel, 2008) include 4 engine fleets: FD001, FD002, FD003, and FD004. Each fleet's data include run-to-failure training data, unlabeled, abruptly ended test data, and true RUL. The training and testing data each contain 3 operational settings in addition to 21 sensor measurements. In this paper, a subset of operational settings and sensors were selected as shown in Table 3. Train and test trajectories differ across each of the 4 engine fleets as shown in Table 4.
The C-MAPSS dataset contains simulated vibration signals. The conditions and fault modes are summarized as follows: Dataset FD001 conditions: one (sea level), and fault modes: one (HPC degradation). Dataset FD002 conditions: six, and fault modes: one (HPC degradation). Dataset FD003 conditions: one (sea level), and fault modes: two (HPC degradation, fan degradation). Dataset FD004 conditions: six, and fault modes: two (HPC degradation, fan degradation). Figure 15 shows the airflow through a C-MAPSS engine.

Turbofan engines method
The turbofan engines case study contains 4 different datasets. The methodology presented next is applied to all 4 datasets identically. The DMD step is removed from the following steps to demonstrate its effectiveness and compare the results.
Step 1: Preprocessing and feature selection are implemented. The selected features are shown in Table 3.
Step 2: The training data is normalized using (18). Given that there exists a separate testing set of the C-MAPSS dataset, the minmax scaler is fit on the training set and then applied to the testing set.
Step 3: Selected features are preprocessed using DMD and a modified single dynamic mode $\varphi = Y V \Sigma^{-1} w$ is obtained, where $\varphi$ is a univariate timeseries that represents a fused health indicator of the system. When applying DMD here, $X$ and $Y$ consist of exact pairings of the same timeseries.
Step 4: φ is concatenated with the normalized selected features and time cycles to create the final training matrix F Train .
Step 5: Step 4 is repeated for the selected corresponding testing features to create F Test .
Step 6: A hybrid CNN-LSTM network is then used for training a model on F Train . The configuration is a continuation of the work in (Akkad & He, 2019). The network configuration is presented in Table 5.
Step 7: The trained model is then used to predict the RUL for each engine using F Test .
The basic flow structure for the C-MAPSS datasets approach is presented in Fig. 16. The steps of the turbofan engines methodology are repeated for each of the 4 engine fleet datasets to predict the RUL of all engines in each fleet. Table 5 presents the hyperparameter optimization resulting values that were used to train the hybrid deep learning algorithm. The same hyperparameters were used across all 4 engine fleets for consistency.
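The normalization and concatenation steps can be sketched in NumPy as follows. The sketch assumes that the scaler statistics are computed on the training features only and reused for the test features, and that the fused mode is already available as a one-dimensional array aligned with the rows; the column order of the final matrix, the function names, and the toy numbers are assumptions.

```python
import numpy as np

def fit_minmax(train):
    """Column-wise minimum and maximum of the training features."""
    train = np.asarray(train, dtype=float)
    return train.min(axis=0), train.max(axis=0)

def apply_minmax(data, lo, hi):
    """Scale with training statistics; test values may fall outside [0, 1].
    Constant training columns (hi == lo) would need separate handling."""
    return (np.asarray(data, dtype=float) - lo) / (hi - lo)

def build_matrix(phi, features, cycles):
    """Concatenate the fused mode, the scaled features, and the time cycles."""
    return np.column_stack([phi, features, cycles])

# Toy data (hypothetical): two features, two training rows, one test row.
train = np.array([[0.0, 10.0], [10.0, 20.0]])
test = np.array([[5.0, 25.0]])
lo, hi = fit_minmax(train)
train_scaled = apply_minmax(train, lo, hi)
test_scaled = apply_minmax(test, lo, hi)

F_test = build_matrix(phi=np.array([0.3]), features=test_scaled,
                      cycles=np.array([1.0]))
print(F_test)
```

Fitting the scaler on the training set alone avoids leaking test-set statistics into the model, which is why the second test feature scales to 1.5 rather than being clipped to 1.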
The configuration in Table 5 is used for both methods across all engine fleets to demonstrate the effectiveness of incorporating DMD into the deep learning based approach. For reproducibility of the obtained results, randomly selected seeds were set to 1337 for NumPy and 2 for TensorFlow. Multiple ensembles of deep learning networks were built based on an exhaustive grid search that determines the best model. The model that minimizes the mean absolute percentage error is considered the best model, and its parameters are then used for all engine fleets. The hybrid layer structure consists of a 1-D CNN layer followed by 3 LSTM layers with hyperbolic tangent (tanh) activation. A final dense layer is used to estimate the RUL. The testing data in the C-MAPSS datasets end abruptly, and the goal is to estimate how many cycles remain in each test engine's life before failure occurs. Each engine's remaining useful life was obtained in a supervised fashion.
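The layer structure described above can be sketched in Keras as follows. Only the layer order (one 1-D CNN layer, three LSTM layers with tanh activation, one dense output) and the two seed values come from the text. The window length, number of input features, filter count, kernel size, unit counts, convolution activation, optimizer, and loss are placeholder assumptions, since the Table 5 values are not reproduced here.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Seeds as reported in the text: 1337 for NumPy, 2 for TensorFlow.
np.random.seed(1337)
tf.random.set_seed(2)

WINDOW = 30       # hypothetical number of cycles per input window
N_FEATURES = 16   # hypothetical width of the training matrix

def build_cnn_lstm(window=WINDOW, n_features=N_FEATURES):
    """CNN-LSTM sketch; every size below is a placeholder, not Table 5."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(window, n_features)),
        # 1-D convolution acts as the first feature extractor.
        layers.Conv1D(filters=32, kernel_size=3, padding="same",
                      activation="relu"),
        # Three stacked LSTM layers with tanh activation.
        layers.LSTM(64, activation="tanh", return_sequences=True),
        layers.LSTM(64, activation="tanh", return_sequences=True),
        layers.LSTM(32, activation="tanh"),
        # Final dense layer outputs one RUL estimate per window.
        layers.Dense(1),
    ])
    # Loss and optimizer are assumptions; MAPE is tracked because the
    # text uses it to select the best model.
    model.compile(optimizer="adam", loss="mse", metrics=["mape"])
    return model

model = build_cnn_lstm()
model.summary()
```

The first two LSTM layers return full sequences so that the next LSTM layer receives a 3-D tensor, while the last one returns only its final state for the dense layer.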

Turbofan engines RUL prediction results
Following the approach used to process the C-MAPSS datasets, we predict the RUL for all engines within each of the 4 fleets. The prediction is implemented using the DMD-Hybrid approach, where the hybrid part consists of a CNN-LSTM deep neural network as mentioned in the methodology section.
For comparison purposes, the hybrid approach is also implemented without the DMD step in order to observe the performance improvement.
It is noted that both the RMSE and MAPE values improve when incorporating the DMD into the proposed approach.
Incorporation of the DMD into the hybrid deep learning approach increases the accuracy of predicting the RUL across all 4 engine fleets. A summary of the RUL estimation performance, calculated using (20) and (21), is shown through RMSE and MAPE values in Table 6.
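As a reference for how such metrics are typically computed, the sketch below implements RMSE and MAPE under their standard definitions. It is assumed, not confirmed, that (20) and (21) match these forms; the function names and the example values are illustrative only and are not results from the paper.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted RUL."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Undefined when any true RUL is zero, so such entries would need
    separate handling.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(100.0 * np.mean(np.abs((y_true - y_pred) / y_true)))

# Made-up RUL values for three engines, for illustration only.
true_rul = [100, 50, 20]
pred_rul = [90, 55, 22]
print(rmse(true_rul, pred_rul))  # sqrt(43), about 6.56 cycles
print(mape(true_rul, pred_rul))  # 10.0 percent
```

RMSE is expressed in cycles and weights large misses more heavily, whereas MAPE is relative to the true RUL, so the two metrics can rank models differently.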
As mentioned in the methodology section, the sensors are preprocessed and decomposed using the DMD method. A dynamic mode is then obtained and added to the original training matrix to compose the final training matrix.
The same process is then repeated for the testing matrix. After all training and testing signals are preprocessed for all the datasets, the final training matrix is used to train the hybrid deep learning model. The obtained model is then applied to the testing matrix to predict the RUL of all engines within each fleet. The DMD step of the approach is then eliminated to observe the effect on the RUL estimation metrics.
Figures 17, 18, 19, and 20 show the actual and predicted RUL for all engines at the end of each engine's test signal in datasets FD001, FD002, FD003, and FD004, respectively. The true RUL values are shown in blue while the predicted RUL values are shown in green. For the C-MAPSS datasets, the RUL prediction alternates between early and late prediction depending on the specific engine, while following the general trend of the true RUL in each of the FD001, FD002, FD003, and FD004 engine fleets. As shown in these figures, the DMD-Hybrid approach accurately predicts the remaining useful life of all engines in the 4 C-MAPSS engine fleet datasets.
The hybrid part of the approach consists of a 1-D convolutional layer, 3 long short term memory network layers, and a final dense layer for the supervised RUL prediction. The CNN layer acts as the first feature extractor of the network: it convolves the input signals and outputs a 3-D tensor that serves as the input for the first LSTM layer. The high level features extracted by the CNN layer are fed sequentially into the first LSTM layer as time dependent embeddings. The LSTM processes these embeddings using a tanh activation function. The input, forget, and output gates of the LSTM sub-network of this model help keep the gradient from exploding or vanishing. Finally, the dense layer outputs the predicted RUL values when the model is applied to the test set.
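For reference, the gate mechanism mentioned above can be written out using the standard LSTM cell equations. The notation is generic and is an assumption of this note rather than the paper's: x_t is the embedding received from the CNN layer at step t, h_t is the hidden state, c_t is the cell state, σ is the logistic sigmoid, and ⊙ is the element-wise product.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{(hidden state)}
\end{aligned}
```

The additive form of the cell state update is what lets gradient information pass across many time steps, since the forget gate scales the previous state instead of repeatedly multiplying it by a weight matrix.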
It is worth mentioning that incorporating the DMD into a deep learning scheme for prognostics is a continuation of the work presented in (Akkad, 2019). In summary, the dynamic mode decomposition consistently improved the deep learning RUL estimation performance on both gear and engine case studies. It was shown that the DMD-Deep-Learning approaches are scalable to big data applications for remaining useful life estimation.

Conclusions
In this paper, a dynamic mode decomposition based deep learning approach for prognostics was presented. In the proposed approach, the dynamic mode decomposition is incorporated into different deep learning schemes with the intent of improving the remaining useful life prediction performance. Raw sensor signals are processed using systematic approaches that highlight the improvement in remaining useful life prediction resulting from incorporating the dynamic mode decomposition. The results show that incorporating the dynamic mode decomposition into the deep learning based schemes improves the remaining useful life prediction performance. Two different deep learning algorithms were used for the final prediction of remaining useful life: a dense deep neural network for the first case study and a hybrid convolutional neural network-long short term memory network for the second.
To validate the proposed approach of incorporating dynamic mode decomposition into deep learning based schemes, two case studies were utilized to observe the performance improvement in remaining useful life prediction. The first case study included vibration data from a spiral bevel gear. The second case study included 4 different datasets, each of which contained simulated vibration sensor measurements from a variety of simulated turbofan engine fleets.
It is worth mentioning that the first case study consisted of univariate timeseries data while the second case study contained multiple sensor measurements. It was found that incorporating the dynamic mode decomposition improves the deep learning remaining useful life prediction performance for both case studies and across all testing datasets therein. The proposed methods demonstrated good generalization across all datasets used, and the dynamic mode decomposition based deep learning approach showed consistent improvement when compared to its deep learning counterpart.
For future research, it is important to consider the limitations of data driven approaches in real life applications. For instance, the spiral bevel gear remaining useful life is predicted in this paper using run-to-failure data. In practice, new data may become available for which the remaining useful life must be predicted for a gear with unknown failure time. Threshold setting on the health indicator of run-to-failure gears may be used to estimate the remaining useful life of gears whose failure time is unknown. This may be considered a similarity method in which degradation profiles are compared between timeseries data with known and unknown failure times. Asymmetry or imbalance can be a significant limitation when implementing similarity based methods. For instance, the available run-to-failure data may not be of a large enough sample size to produce a reliable model for predicting new data with unknown failure time. A possible solution to overcome such an obstacle is to employ resampling techniques to balance the known and unknown failure time data for remaining useful life prediction. Another possible approach is to create an ensemble of the training data that exposes the model to a wider range of data subsets with different behaviors, resulting in a more generalized model building procedure that better predicts incoming new data with unknown time to failure. One more consideration, related to data type, remains to be addressed: in future research, vibration signals with torque information as the defining characteristic of the timeseries data may also be used to validate the proposed approach.
For additional future research, a variety of considerations may also be addressed. One consideration is to expand the incorporation of physics based approaches to include techniques in addition to the dynamic mode decomposition. The purpose of this consideration is to further enrich the temporal information of the data and consequently to improve the accuracy of the remaining useful life prediction. Another consideration for future research is to improve upon the dynamic mode decomposition itself. This could be achieved by systematically updating the dynamic mode decomposition equations to fit special cases of processed data, although further development is needed to achieve this. A final consideration for future research could include the development of a comprehensive system that outlines the specifics of incorporating dynamic mode decomposition into deep learning schemes. This may be realized by considering a full integration of dynamic mode decomposition into deep learning layers and hyperparameter updates as a logical next step for physics based deep learning approaches.