Neural Network-Based Subduction Ground Motion Model and Its Application to New Zealand and the Andaman and Nicobar Islands

ABSTRACT A deep learning model is developed for the Next Generation Attenuation – Subduction database for predicting spectral accelerations and peak amplitude measures. The developed model satisfies the statistical criteria necessary for prediction. Standard deviations lie in 0.2864–0.3809, 0–0.2696, and 0.4514–0.7892, range for inter-event, -region, and intra-events, respectively. Transfer learning is applied to the New Zealand region. Probabilistic seismic hazard analysis is performed for the Andaman-Nicobar region and obtained a peak ground acceleration of 0.6–0.7 g and 0.4–0.5 g at the Andaman and the Nicobar Islands, respectively, for a 2475-year return period.


Introduction
The subduction process has created not only several natural wonders like the Himalayas and the Mariana Trench but also produced some of the catastrophic earthquakes and tsunamis (e.g. 2010 M8.81 Chile and 2011 M9.12 Tohoku earthquakes). These earthquakes result from the sudden burst of energy stored due to one plate thrusting over the other. Chile, Indonesia, and Japan are some of the very active subduction regions globally; thus, these regions record many earthquakes and experience significant hazards. The structures constructed in these regions must be adequately safe to prevent the loss of life and property. Developing a ground motion prediction equation (GMPE) is thus very important as it is not only the basis for developing the seismic codes but also valuable for site-specific hazard analysis.
In this section, subduction earthquakes are broadly discussed, followed by reviewing various GMPEs developed for subduction regions and concluding with the objective of the current work. Subduction earthquakes are divided into the interface and intra-slab categories. Interface events are primarily reverse mechanisms generally occurring at shallow depths below 40 km (Tichelaar and Ruff 1993) at the interface of the subducting and overriding plates (e.g. 2011 M9.12 Tohoku earthquake). In contrast, intra-slab events are primarily normal mechanisms occurring at large depths within the subducting ocean plates (e.g. 2014 M7.96 Rat Islands Earthquake). Indonesia, Japan (JP), Taiwan (TW), New Zealand (NZ), Central America and Mexico (CAM), South America (SA), Alaska (AK), Indo-Burma, the Andaman and Nicobar Islands, and Cascadia (CAS) regions are some of the subduction zones in the world. Youngs et al. (1997) was one the earliest global subduction model developed considering random effects with 476 records from AK, Chile, CAS, JP, Mexico, Peru, and the Solomon Islands. The model accounts for magnitude, distance, depth, and a few flags as input parameters. Major issues with this model were that data considered lacked near field records, and no distinction was made to recordings model are magnitude (M w ), Joyner-Boore distance (R JB ), depth to the rupture plane top (Z TOR ), shear wave velocity (V s30 ), and region flag. The output parameters are 5% damped spectral acceleration (PSA) values at 105 periods, peak amplitude measures: peak ground acceleration, velocity, and displacement (PGA, PGV, and PGD). The inter-event, inter-region, and intra-event residuals are determined. A detailed comparison with the developed NGA-Sub models is made by plotting output parameters against distance and periods. The applicability of the developed network to other regions (which have poor data or are not considered in training), such as New Zealand, is demonstrated using transfer learning.
Furthermore, the engineering use of the developed model is illustrated by computing the seismic hazard of Andaman-Nicobar (AN) regions. The AN region, a group of islands situated on the boundary of Indian, Australian, and Sunda plates, faces severe earthquake and tsunami threats. Near to these islands, several large and great earthquakes have occurred in the past. Among these, the 2004 Mw 9.2 Sumatra earthquake created a mega-tsunami across the Indian Ocean, causing large destructions and deaths in the neighboring areas (Singh et al. 2012). Hence seismic hazard estimations are required, which helps in mitigation and preparedness in reducing vulnerability at the islands. In the present study, the classical Cornell-McGuire probabilistic seismic hazard analysis (PSHA) (Cornell 1968;McGuire 1974) of the region is carried out using the developed prediction models for different regions in a logic tree framework. The hazard is computed on a grid of 0.1° × 0.1° spacing throughout the study region for B-C boundary type site conditions (Vs30 = 760 m/s). The hazard contours are presented in PGA and PSA at 0.2s and 1s for 475 year and 2475-year return periods. These hazard maps are helpful in estimating and designing the seismic risk of structures in the region.

Strong Motion Database
NGA-Sub database is the most extensive collection of strong motion records from the various subduction zones of the world: AK, CAS, CAM, JP, NZ, SA, and TW regions. The flatfile (last accessed July 21), which is developed from the database, has various input fields like magnitude, distance, depth, station properties, record quality review flags, site quality flags, fault details, instrument details, volcanic arc details, event classification to main and aftershocks and other details. Output fields are the PGA, PGV, PGD, and 5% damped Rot50 PSA values from 0.01 to 20 sec (Kishida et al. 2018;Bozorgnia and Stewart 2020). The flatfile contains 71340 records from 1880 events and 6433 stations corresponding mainly to the interface and intra-slab earthquakes. The magnitudes are in the range of 2.2 to 9.12, and R JB is 0 to 6504 km. Figure 1 shows the percentage of records, events, and stations from various zones available in the database. It is observed that Japan contributes the largest to the records (57%), followed by Taiwan (18%) and Japan has the maximum density of stations contributing to (35%) followed by Cascadia (18%). However, the maximum events are recorded in South America (45%), followed by New Zealand (15%). Despite the tremendous effort in developing this vast flatfile, several records have issues related to missing important input parameters and poor record quality, which must be screened out before developing GMPE. The following criteria are followed for data screening: (1) Several records that have negative and unusually high PGA values (> 10 g) are screened out.
Similarly, records from lower magnitudes (< 4) are also screened out to ensure enough records for analysis.
(2) Records that do not have metadata for input parameters like magnitude, distance (R JB ), hypocentral depth (Z hyp ), Z TOR , and shear wave velocity (V s30 ) are removed. (3) Since the database contains events that belong to other tectonic categories, records that belong to the interface, intraslab, or belong to lower double seismic zone in Japan (i.e. Intra-inter flag = 0, 1, or 5) are selected. (4) Further, to ensure that the records belong to corresponding tectonic categories, records with hypocentral depths below 40 km for interface events and less than 200 km for intraslab events are selected (Parker et al. 2020). (5) To limit the bias from farther distances, where the ground motions may not be higher than the recording noise levels, that is, non-triggered records, only records that satisfy R JB ≤ R max subjected to a maximum of 1000 km are selected (Parker et al. 2020). Here, R max is the maximum distance limit (Contreras et al. 2020). (6) Remove records having sensor depth > 2 m. (7) Remove records that are induced due to multiple sources (i.e. event flag = 1). (8) Remove records with late P-wave trigger flag. (9) Remove records that do not represent free-field conditions (i.e. GMX first letter = N, Z, and F). (10) Use records with high source review standards (i.e. source review flag = 0, 1, 2, or 4) to ensure high-quality records. (11) Remove records with the longest usable periods below 10 sec to ensure GMPE can be developed till 10 sec. (12) After applying the above criteria, screen those events which have less than three records.
The above criteria resulted in the final 9009 records from 189 events recorded at 2877 stations. However, on preliminary analysis, a few more records had maximum absolute residuals larger than 3.5 times the with-in event standard deviations. Those records are removed, and criteria 12 is again applied. The final considered database comprises 8752 records from 189 events recorded at 2812 stations. The screened data did not contain a single record of New Zealand as it had multiple issues: it has missing V s30 values, sensor depth, review flags such as late p-wave trigger, and source review have values of −999.

Database Distribution
The selected database has a moment magnitude value ranging from M4.7-9.12, R JB values in 0-995.044 km, V s30 values below 2229.8 m/s, and Z TOR values below 36.83 km for interface events; M4.1-8.28, R JB values in 0-995.685 km, V s30 values below 2100 m/s, and Z TOR values below 186.004 km for intra-slab events. The input parameter ranges for each region are given in Table 1. Figure 2 shows magnitude and distance distribution for the selected records for various subduction regions. It is observed that in the selected database, interface events have a higher number of large magnitude earthquakes compared to intra-slab events; consequently, most of the records lie in M6-8 range for interface events and the M5-7 range for intra-slab events. Interface events correspond to the reverse mechanism, whereas intra-slab events correspond to all the mechanisms: strike-slip, normal, and reverse. It is also noted that nearly 62% of the records correspond to the Japan region, followed by the Taiwan (14%) region.
Further to study the distribution of rupture depths for various events, Z TOR is plotted against the available records and events for the tectonic types shown in Fig. 3. In an interface type, most of the records correspond to four depths distributed over 40 km. However, most of the histogram depths have at least two events recorded. Similarly, most of the records in intra-slab events correspond to 50 km, with histogram depths having at least one event recorded. Figure 4 shows the availability of records for various shear wave velocities. Stations on stiff and dense soils make most of the interface events, whereas stations on dense soils make most of the intra-slab events. Very few records are available for stations on the rock. Overall, the screened data can be used to develop the GMPE.

Ground Motion Prediction Model
GMPE is to be developed using the screeded database in this section. The input parameters are M w , R JB , the logarithm of R JB , Z TOR , V s30 , and the subduction region flag (R). Region flag (R) has the  following values -AK:1, CAS:2, CAM:3, JP:4, SA:5, and TW:6. Output parameters are PSA (in the units of g) from 0.01-10 sec, PGA (in g), PGV (in cm/s), and PGD (in cm). Since the neural network is an alternative approach to the default regression, it is used here to develop GMPEs. By this input parameter choice, the developing GMPE model incorporates source, site, and path effects. Since vast high-quality data is available, GMPE could be developed using a deep neural network. The functional form of GMPE is shown below in equation 1.
Here, Y is the target vector given by log(PSA 0.01 ), log(PSA 0.02 ), . . ., log(PSA 10 ) with the natural logarithm of PSA values at all the 105 periods given in the flatfile from 0.01-10 sec, log(PGA),  log(PGV) and log(PGD). The input and output parameters are scaled using a z-score, as shown in equation 2.
Where X is the input/target vector, μ is its mean value, σ is its standard deviation, and X' is the normalized matrix whose values are given in Tables S1 and S2 (Supplementary), respectively. Periods 0, −1 and −2 corresponds to log(PGA), log(PGV) and log(PGD), respectively. Though a shallow feedforward network can approximate any function provided sufficiently large neurons are provided in the hidden layer, a more practical approach to universal approximator is to use deep networks (Hornik, Stinchcombe, and White 1989). Due to the abundance of data, deep neural networks can be explored to reduce network size and deduce better data features than shallow feedforward networks. Thus, two hidden layers are considered for the network. Training the data optimizes the predicted values to have the lowest residual. This residual can then be split into fixed and random effects. The training algorithm optimizes the model weights and biases during the training process to reduce the cost function. Adaptive Moment Estimation (ADAM) optimizer is an efficient algorithm applicable even for large data and model parameters. Since the training matrices are large when dealing with huge data, this algorithm requires less memory (Kingma and Ba 2014). Before the training, the algorithm must be tuned for the model's hyperparameters such as learning rate, drop factor and drop factor period, and hidden neurons numbers. Apart from these parameters, regularization is also considered a hyperparameter to prevent overfitting during the training. These parameters can be tuned using grid search, random search, Bayesian optimization, and evolutionary methods (e.g. genetic algorithm). Vemula et al. (2021) use the grid search technique to determine the model hyperparameters. Since the parameters to be determined are few in that case, grid search was not computationally expensive. On the other hand, a random search is computationally efficient than a grid search and is better for fewer parameters. However, these methods do not account for past search history to determine the later search hyperparameter values.
On the other hand, the Bayesian optimization using the Bayes theorem constructs a surrogate function by constructing a likelihood function from the considered cost function. This surrogate function is essentially a Gaussian process, and it operates on the surrogate function than on the cost function directly and performing only a few actual cost function evaluations, making the search very efficient. The search effectiveness comes in selecting the further evaluation candidates based on the expected improvement, an acquisition function (Frazier 2018). This technique is typically used when optimization parameters are below 20, and thus this optimization is used to determine the model hyperparameters.
The network is trained using MATLAB (MATLAB 2020) with mean squared error (MSE) as the objective function to be minimized. Early stopping criterion (when a validation error is not reducing further, while training error is reducing) is incorporated to prevent overtraining. The architecture of the developed model is shown in Fig. 5.
The transfer function chosen is an exponential linear unit (Clevert, Unterthiner, and Hochreiter 2015), given by equation 3, as it yielded lower MSE values than tanh and rectified linear unit functions. Data is randomly divided into training (70%), validation (15%), and testing (15%).
Several hyperparameter combinations having nearly similar MSE values are obtained. The model with the lowest AIC (Akaike 1969) and BIC (Schwarz 1978) value is selected, and the criteria are given by equation 4.
Where k is the number of training parameters, n is the available training samples, and SSE is the sum of squared errors for the training set. After the analysis, the best two networks have MSE values 0.51980 (network 1: N1) and 0.52036 (network 2: N2). However, training, validation, and testing MSE values are 0.5170, 0.5336, and 0.5193 for N1 and 0.5182, 0.5257, and 0.5249 for N2. Higher training and validation errors imply the underfitting of the model. As accuracy improves, training and validation errors decrease, but after the minimum value is achieved, further training primarily reduces the training error, implying overfitting and increasing the error between training and validation. In N1, training and testing errors are close, so underfitting is not an issue, but the validation error is high, so there could be a sampling issue. In N2, validation and testing errors are close and slightly higher than the training error. There is no potential underfit or overfit in both the models as training is Figure 5. Architecture of the developed model six input parameters, ten neurons in hidden layer 1, 12 neurons in hidden layer 2, and 108 predicted parameters in the output layer. stopped before overfitting. AIC and BIC criteria values are 3.3248, 5.4892 for N1 and 3.1275, and 4.9008 for N2. N2 model is better than N1 as there is no major MSE reduction, but N2 has fewer network parameters; thus, it is preferred over N1. The final selected network, N2, has hidden layer 1 and 2 sizes 10 and 12, respectively, learning rate drop factor 0.8677, L2 regularization 3.28e-4, learning rate drop period 200, and an initial learning rate of 0.0064. The obtained weights and biases of the network are given in the program (in supplementary). For the obtained network, mixed-effects are to be determined in the later section.

Fixed and Random Effects
In this section, mixed-effect residuals are determined for the trained network. Total residual (Δ, given by equation 5) is the difference of target (Y ers ) and predicted values in natural logarithm is split into inter-event (δB e ) for each event e, inter-region (δR r ) for each region r, and intra-event (δW es ) for each event e and station s. Residuals are given by equation 6 (Al Atik et al. 2010). It is assumed that all the residuals considered are independent with zero mean and follow Gaussian distribution. Since nearly half of the stations in the considered database do not have more than one record, station variability is not considered. Different approaches like single-stage regression, two-stage regression, linear mixedeffects methods are commonly adopted to compute event, site, and random effects. The first two methods involve determining mixed effects by maximizing log-likelihood iteratively. In the current work, the mixed-effects method is implemented in MATLAB, considering maximum likelihood estimation. The obtained mixed-effect values are tabulated in Table S3 (Supplementary) and for brevity shown in Table 2 at selected periods. The obtained mean values are zero for inter-event andregion terms, and thus no bias can be expected in these obtained values. A slight nonzero mean is obtained for the intra-event residual, indicating a minimal bias. The inter-event and -region standard deviations are lower than intra-event standard deviations as expected with inter-event std. (τ) in the range of 0.2864-0.4084, inter-region std. values are in the range of 0-0.2696 and intra-event std. (ϕ) are in the range of 0.451-0.7892 with peak values at around 0.1 sec matching other GMPEs developed for subduction earthquakes. Figure 6 shows the computed standard deviations. The obtained total standard deviation, sigma values are in a similar range as that of other GMPEs developed for this database. For the sigma (σ), inter-region is not considered and is given by equation 7.

Model Performance
In this section, the efficiency of the developed model is verified using various performance parameters proposed by Golbraikh and Tropsha (2002). The following are the criteria used: (1) A correlation coefficient (R) greater than 0.8 for all output variables indicates strong dependence of the developed output variables and the target.
(2) Performance parameter, PP, given by 1 À MSE=σ (σ is the standard deviation of the target) closer to unity indicates a better performance.
(3) The slope of k ¼ should lie in the range of 0.85-1.15 (4) Correlation coefficient squares R 2 o and R 0 2 o given by equations 8 and 9 should be close to 1 should be less than 0.1 Figure 6. Standard deviation plot for inter-event, inter-region, intra-event, and sigma of response spectrum plotted against period.
Here, Y and � Y are the log(target) and its mean; y and � y are the log(predicted value) and its mean values; Y r0 ¼ k0Y and y r0 ¼ ky.
Table S4 (Supplementary) shows the computed performance parameters that are discussed above and for brevity are shown in Table 3 at selected periods. It can be observed that the correlation coefficient value is greater than 0.8 at all the periods with a minimum value of 0.9408 at 0.11 sec period. Higher accuracy is obtained for larger periods with a maximum value of 0.9754. Following the trend of the correlation coefficient, PP values have a minimum of 0.8850 at 0.11 sec period and maximum values at higher periods with a peak value of 0.9513. Slopes are within the limits of 0.85 and 1.15, with k' closest to 1 and k in the range of 0.9422 to 0.9967. R 2 0 and R 0 2 0 are almost close to 1 with the range of 0.9958-0.9998. a and a' are slightly beyond 0.1 for the 0.042-0.32 second period and below 0.1 for the  remaining periods. Overall, the developed model satisfies most of the criteria and thus can be used to predict the GMPE.

Residual Plots
Residuals are plotted for inter-event, -region, and intra-event residuals against various input parameters to further analyze the model efficiency. Figures 7-9 show various residual (in log scale) plots of PSA at periods of interest (shown on the figure titles) and for PGA, PGV, and PGD. Inter-event residues are plotted against moment magnitudes, as shown in the top two rows of Fig. 7. Considering additional records per events would further improve the computed  residuals. It is observed that most of the residuals lie in the range of −1 to 1, with bins having zero mean values and lower standard deviations at lower magnitudes and slightly higher standard deviations at higher magnitudes. In contrast, at higher periods, bins have slightly higher standard deviations. Inter-region residues are plotted against regions considered in the bottom two rows in Fig. 7. Regions have lower residuals for central and south America and Japan. Alaska has a positive mean at most periods, whereas Cascadia and Taiwan have negative residuals, with Cascadia having a higher residual. Figure 8 shows the intra-event residuals plotted against distance in the top two rows and shear wave velocity in the bottom two rows.  Residuals for most of the events lie in the range of −2 to 2, with higher standard deviations for distances compared to shear wave velocity. Figure 9 shows inter-event, inter-region, and intraevent residuals for PGA, PGV, and PGD. It is observed that inter-event residuals have larger standard deviations at higher magnitudes and lower residuals at lower magnitudes. PGV and PGD value residuals do not depend on the region; PGA value residual is negative for Cascadia and positive for Alaska and does not depend on other regions. Intra-event residuals lie mainly within −2 to 2 regions when plotted against distance and shear wave velocity with slightly nonzero bin mean values at higher distances and shear wave velocity values. At all the periods, distances at around 500 km have the largest standard deviations. The lower standard deviation  is obtained for rocky and soft soils at lower periods, whereas higher periods have lower standard deviations to soft rocks.

Relative Importance
This section deals with the effect of each input on the predicted values. In this regard, each input is replaced with normalized (with zero mean and unit standard deviation) random Gaussian variables. Then the error is computed as the change in the actual predicted target and the new obtained target values. The variable with the largest MSE change is deemed with the highest importance. The process is repeated multiple times to obtain average values. The obtained MSE values of input parameters are: M w À 4.9, R JB À 0.29, log R JB ð Þ À 4.04, Z TOR À 0.74, log V s30 ð Þ À 0.27, and R À 0.38. It shows that the highest dependence is obtained for magnitude, followed by the logarithm of distance and focal depth.

Model Discussion
In this section, various aspects of the model selection and validation are discussed. It is noted that the developed NGA-Sub models have captured the distance and depth effects mainly through the rupture distance (R rup ). However, the current work captures the distance and depth effects by pairing R JB with Z TOR , so the physics is not violated. It is tested quantitatively by developing two models to determine the network with minimum MSE. R JB term for model 1 (M1) and R rup term for model 2 (M2) and magnitude, Z TOR , V s30 , and region flag are used as inputs. These two models are trained with the same database, similar hyperparameters, and similar neuron and hidden layers. The obtained MSE values of M1 and M2 are 0.4976 and 0.6266, respectively. Having observed qualitatively M1 network resulted in lower MSE, R JB and Z TOR are used as input parameters.
From the developed network as discussed above, Figs. 10, 11 and S1-S2 (Supplementary) are plotted for magnitude 8 and 9, 400 and 760 m/s shear wave velocities, and 20 km Z TOR value to interface events for the considered regions. Figures 12, 13 and S3-S4 (Supplementary) are plotted for magnitudes 7 and 8, 400 and 760 m/s shear wave velocities, and 60 and 70 km Z TOR values to intra-slab events for the considered regions. As magnitude increases, the target values increase; as the distance increases, the target values decrease; as shear wave velocity increases, the target values decrease for all the plots. The Taiwan region has the highest spectral and peak amplitude values for most of the plots, followed by the South America region; however, the Taiwan region has the largest attenuation. The South American region has the most prominent values at distances near 1000 km. The remaining regions behave closely with similar spectral for and peak amplitude values (PGA and PGV) for most of the inputs. Slightly different behavior is seen in the interface events for PGD at higher magnitudes with a clear separation between various regions, with the Alaska region having the lowest PGD values. A few predictions correspond to extrapolation and thus not well behaved. The predicted values are compared with KBCG20 (local), PSHAB20 (local), and SMK20 models, which are obtained from Gregor et al. (2020). The model is derived using R JB and Z TOR , and the obtained values correspond to R rup and Z hyp ; no conversion is used in the comparison, resulting in some error in the plotted values. The developed model follows a similar trend as the other models. Plots with lower magnitudes have an excellent agreement with the other models. However, there is a higher deviation for the 200 km than the 75 km distance plots for interface and intra-slab events due to higher standard deviations. Similarly, higher deviations are observed for 760 m/s plots than 400 m/s. Interface events are slightly better predicted than intra-slab events because the overall residual is lower for interface events than intra-slab events. Several regions do not have magnitude values till 9 for interface and till 8 for intra-slab events. So, the plots beyond their maximum available range would have errors as we are extrapolating. For instance: the CAS region has only magnitudes below 5 for interface events; thus, the obtained spectral curves are not smooth for higher magnitudes ( Figure S6 of Supplementary), and the TW region has poor predictions of spectral values at larger magnitudes (M > 7) for interface events because M7.12 was the largest magnitude available in the database ( Figure S9 of Supplementary). Similarly, CAS plots for intra-slab ( Figure S11 of Supplementary) have huge deviations at all the distances because R JB values differed significantly with R rup values, and Z TOR has a maximum value of 52 km. CAM region has higher deviations in intra-slab than the rest of the regions due to higher standard deviation. Also, since many output periods are used for training, the predicted PSA values are not perfectly smooth in some plots. Attenuation is significant for far-fields than for near-field distances. Overall, there is a good agreement with the developed models for most of the input range.

Comparison with New Zealand Database
In this section, the application of the developed network to different regions to determine the response spectrum is demonstrated. A comparison is made with the spectral values of the New Zealand database, which are obtained from the GeoNet during the period 1968 to 2016 (Kaiser et al. 2017;Van Houtte et al. 2017). The data for training corresponds to interface and intra-slab events of model 2 of SMBRA21: Vemula et al. (2021). Machine learning techniques are efficient when dealing with extensive data rather than fewer data, which is often the case. Thus, the starting point of the training net is a pre-trained network optimized to similar inputs. Training the data from the learned knowledge rather than from scratch is transfer learning (Pan and Yang 2009). The trained network on the global database is applied onto a smaller New Zealand GeoNet database using transfer learning. It is achieved by removing the final fully connected output layer to match the output size of the target response spectrum from the GeoNet and initialized it with random weights and bias. However, the remaining layers are initialized with the weights and biases from the developed network. Then the entire network is trained, following the similar method as in the previous sections.
The MSE of the training, validation, and testing are 0.3404, 0.3482, and 0.3465 (in log units), respectively, compared to the MSE value of 0.4460 (in log units) for the developed SMBRA21 model despite optimizing the initial weights with genetic algorithm. The network has neither under fitted nor overfitted. Dusky Sound earthquake is chosen to compare the developed model with the results from SMBRA21. MSE values of the current and SMBRA21 are 0.2850 and 0.3974, respectively, for Dusky Sound. Residuals are not computed as the data is severely biased, with nearly 70% of the records correspond to 8 events and 22 events have less than three records (out of 40 events). Figure 16 shows plots for near and far-field distances and soil and rock types along with values predicted from SMBRA21. The model developed from transfer learning is better predicted than from the SMBRA21 model. The higher MSE value in SMBRA21 is mainly due to the weights and bias optimized with the limited data available.
In contrast, better initial weights and biases from the pre-trained model are assigned; in a sense, knowledge is transferred from global to regional. The weights and biases are given in the supplementary GMPE New Zealand file. Transfer learning is a vast area that must be explored further to other regions where data is scarce or was not included in the initial model development.

Seismic Hazard Estimation of Andaman-Nicobar Islands
Another application of the developed model is to use in seismic hazard calculation of the subduction regions. In the present study, the developed GMPE is employed in a probabilistic seismic hazard framework to calculate the hazard of the Andaman-Nicobar (AN) region for typical B-C boundary type site conditions (Vs30 = 760 m/s). Figure 17 shows the seismotectonic of the study region. The Indian, Australian, and Sunda plates meet at this region, resulting in a triple plate junction, making it a seismically active subduction region (Mishra et al. 2007). The Andaman and Nicobar Islands lie along this plate boundary and are mainly subjected to interface earthquakes (Mukhopadhyay et al. 2018). The main fault structures in the region are the Sumatra-Andaman trench, West Andaman faults, and Sumatra faults, as shown in Fig. 17. The proximity to these major faults results in higher seismic risks to the islands, demanding hazard analysis for the seismic design. Several investigators have studied the seismic hazard and estimated the high earthquake threat in these regions (Kolathayar and Sitharam 2012;Ornthammarath 2018). Mishra et al. (2021) have estimated the hazard parameters for the AN region using the Gutenberg-Richter relation and extreme value theory. The study reports that, for the next 50 to 100-years, there is a high probability (>90%) for the occurrence of Mw 7.0 earthquake in the Nicobar and Sumatra regions. Hence, the region has a high seismic threat from the occurrence of large to great earthquakes in the near future. Now, it will be interesting to estimate the hazard for the region using the subduction zone attenuation models developed in the present study. The seismic hazard analysis is performed using the classical Cornell-McGuire PSHA procedure (Cornell 1968;McGuire 1974). The details of the hazard procedure and seismicity parameters are described in Sreejaya et al. (2021). A brief description of the same is presented here.
The national catalog of NDMA (2010) is modified and updated with earthquakes till December 31, 2019 to characterize the seismicity parameters for the region (Sreejaya et al. 2022). Seismic occurrence rates are obtained from the elliptical smoothening approach using the spatial distribution of earthquakes from the catalog and the known fault orientations (Frankel, 1995;Lapajne et al., 2003). It is shown in Fig. 16 for , m 0 = 4 after smoothing. Using these details, the mean annual exceedance rate at a site is computed from Equation 10. Where K is the number of sources, p M (m) is the probability density for the magnitude and P Y > y � jr; m ½ � is the conditional probability obtained from the developed attenuation models. Since the developed model is based on six subduction zone databases, a logic tree framework is used for these regions with equal branch weights, such that the epistemic uncertainty is accounted. Further, the hazard is computed at each grid point at 0.10 × 0.10 spacing over the study region using the adopted logic tree. The estimated hazard curve for a few selected cities in the region is shown in Fig. 18. It can be noted from the figure that the cities (Mayabunder, Rangat, and Port Blair) in the Andaman group of islands have higher ground motion intensity at larger return periods than the Nicobar Islands. Consequently, the Andaman Islands have higher seismic threats compared to the other regions in AN. The hazard curves are estimated at each of the grid points throughout the AN region. PGA and PSA at 0.2s and 1s are obtained from these hazard curves for 475 and 2475-year return periods at all grid points, and its spatial variation is shown in Fig. 19a,b. A maximum PGA of 0.6-0.7 g is observed at the Andaman Islands, whereas the maximum PGA for the Nicobar Islands is ~0.4-0.5 g for a 2475-year return period. PSA contours at 0.2s and 1s are shown in Figs. S15-S16 (Supplementary) for 475 and 2475-year return periods, respectively. From all these maps, it is evident that the Nicobar Islands are less prone to higher return periods of earthquakes than the Andaman Islands.
Apart from the PGA values, the commonly used design parameter is the design response spectra at a site. Hence, site-specific uniform hazard response spectra (UHRS) are obtained at the cities for 475 and 2475-year RP. UHRS are derived from the hazard curve, where the PSA at each natural period has an equal probability of exceedance. Figure 20 shows the UHRS obtained for the selected cities with the IS 1893 (2016) code spectra. According to the seismic zonation of IS1893(2016), the AN region is placed in zone V, having the highest level of seismic potential. It is noted that, for a period range of 0.01-0.4s, the code underestimates the hazard, whereas at the longer periods, IS 1893-2016 code gives a conservative estimate.
Further, the estimated hazard values are compared with the earlier hazard studies for the region. Kolathayar and Sitharam (2012) have investigated the hazard for AN using different source models for PSHA and reported a maximum PGA of 0.6-0.7 g for a 2475-year return period. Ornthammarath (2018) has performed seismic hazard estimation of offshore structures in the Andaman Sea using a gridded seismicity approach. They have reported a maximum PGA of 0.8-1 g for the Southern part of the Andaman region. Sreejaya et al. (2021) have estimated the hazard for India and adjoining regions using smoothed gridded seismicity approach and global subduction zone GMPEs. They have obtained a PGA of 0.6-0.8 g for the AN region. The estimates from the present study are in range with the hazard reported by investigators using the other subduction GMPEs from the literature. The slight differences observed in all these hazard studies can be attributed to the differences in methodology, seismogenic zones, and the GMPEs used for the study.

Conclusion
To briefly summarize the discussion, all the global subduction models are based on regression, so in the current work, a deep neural network is developed using a recently developed PEER-Sub database to six different subduction regions in the world. The data is processed to remove outliers and those records with missing inputs to ensure quality data is used for training. The data is normalized using a z-score. The input parameters of the network are M w , R JB , log(R JB ), Z TOR , log(V s30 ), and a regional flag. A total of 8752 records from 189 events and 2812 stations are considered for the training. The usage range of input parameters are: 4.7 ≤ M w ≤ 9.12, 1e-4 ≤ R JB ≤ 995.044 km, 0 ≤ Z TOR ≤ 36.83 km, and 94.7 ≤ V s30 ≤ 2229.8 m/s for interface events; 4.1≤ M w ≤ 8.28, 1e-4 ≤ R JB ≤ 995.685 km, 17.5 ≤ Z TOR ≤ 186.004 km, and 94.7 ≤ V s30 ≤ 2100 m/s for intra-slab events (details of each region are given in Table 1). The output parameters of the model are PSA at 105 periods, PGA, PGV, and PGD.
The developed deep network has two hidden layers, with the first layer having ten neurons and the second layer having 12 neurons. The network hyperparameters are selected using the Bayesian optimization technique. The network is trained to minimize the MSE value. The network that has the minimum MSE, AIC, and BIC values is finally chosen. Then the residual obtained is split into interevent, -region, and intra-event residuals using the mixed-effects model. They are assumed to be independent, with zero mean values, and follow Gaussian distribution. The obtained residuals are verified to be without any potential bias or trend. The standard deviations at the considered periods are within 0.2864-0.3809, 0-0.2696, and 0.4514-0.7892, for the inter-event, -region, intra-event, respectively. Events with at least three records per event are considered; better residuals could be obtained when a higher number of records per event are considered.
The obtained inter-event residuals are higher at higher magnitudes. Cascadia region has the higher inter-region deviation, while most of the regions are independent of it. The model has the highest dependence on the magnitude and logarithm of distance. Plots for different magnitudes, distances, depths, shear wave velocities, regions, and tectonic conditions are drawn for different periods and peak amplitude values. The obtained results are comparable with other models developed using this database. The CAM region has higher variation than other regions whereas, the CAS region has lower standard deviations than other regions. However, when observed in the intra-slab Figure S11 (Supplementary), there is a considerable variation due to uncorrected input parameters. Interface events have lower residuals compared to intra-slab events. The model has a lower error at lower magnitudes, and due to higher interevent residuals at higher magnitudes, residuals are high. Overall, the developed model has good agreement with other GMPEs with similar residuals and thus can be used for ground motion prediction.
The model does not account for the data from Indonesia, New Zealand, Indo-Burma, and other regions. A better training model could be developed with the additional data considering inter-site residuals to compute single sigma values. Fourier spectral model could be further developed. The developed network can be an initial starting point for regions lacking quality recorded data. It is demonstrated using the New Zealand database, and the developed model is better predicted than the SMBRA21 model. Furthermore, to demonstrate the engineering use of the model, seismic hazard curves and contour maps are developed for Andaman-Nicobar-Sumatra regions for typical B-C boundary type site conditions. The hazard is computed using a logic tree with equal branch weights to the predictions from the six subduction databases. For the 2475-year return period, the maximum PGA of 0.6-0.7 g and 0.4-0.5 g are observed at the Andaman Islands and the Nicobar Islands, respectively. For the majority of the cities, the IS 1893-2016 code underestimates the hazard at short periods (0.01-0.4s) and gives conservative estimates at long periods (0.4-5s). The hazard estimates from the study are useful for the design and structural risk assessment in the AN region.

Data and Resources
The NGA-Sub data is processed by PEER; New Zealand data is collected and processed by GeoNet. Neural network architecture is plotted in latex. The developed NGA-Sub MATLAB code and the transfer learning New Zealand MATLAB code are also submitted.

Disclosure Statement
No potential conflict of interest was reported by the author(s).