Joint inversion of teleseismic receiver functions and magnetotelluric data using a genetic algorithm: Are seismic velocities and electrical conductivities compatible?

Joint inversion of different kinds of geophysical data has the potential to improve model resolution, under the assumption that the different observations are sensitive to the same subsurface features. Here, we examine the compatibility of P‐wave teleseismic receiver functions and long‐period magnetotelluric (MT) observations, using joint inversion, to infer one‐dimensional lithospheric structure. We apply a genetic algorithm to invert teleseismic and MT data from the Slave craton; a region where previous independent analyses of these data have indicated correlated layering of the lithosphere. Examination of model resolution and parameter trade‐off suggests that the main features of this area, the Moho, Central Slave Mantle Conductor and the Lithosphere‐Asthenosphere boundary, are sensed to varying degrees by both methods. Thus, joint inversion of these two complementary data sets can be used to construct improved models of the lithosphere. Further studies will be needed to assess whether the approach can be applied globally.


Introduction
[2] In recent years joint inversion of multiple data sets has become increasingly popular. By taking advantage of different data sensitivities, joint inversion can improve resolution of subtle features, reduce the influence of noise, and limit the range of acceptable models. These studies generally fall into two categories: 1) joint inversion of data sets that are sensitive to the same physical parameters, e.g., seismic surface waves and receiver functions [Julia et al., 2000], and 2) inversions of data sets that are inherently sensitive to different physical parameters, such as electrical resistivity and seismic velocity Meju, 2003, 2007]. In the first case, the joint-inversion procedure is straightforward, at least in principle. The second class of joint-inversion problem is more challenging, since one set of observations may be sensitive to structures to which the other is largely insensitive and vice versa. An important step in joint inversion of disparate data sets is therefore an assessment of compatibility. Where the data are compatible, more information on the physical state of the Earth may be obtained, allowing for a superior and more robust geological interpretation.
[3] Here, we consider joint inversion of data from teleseismic and long-period magnetotelluric (MT) studies. These methods are mainly sensitive to seismic wave velocity and electrical resistivity, respectively. Although both are affected by temperature and pressure, seismic velocities in the lithosphere are determined mainly by bulk composition and mineralogy, whereas electrical conductivities are primarily controlled by variations in minor conductive constituents [e.g., . Despite these differences, at major lithological boundaries (e.g., the Moho) both velocity and resistivity are expected to change. For co-located measurements, we should thus observe some common features using both methods.
[4] Our study region is the Slave craton, northern Canada, where there is compelling evidence for strong layering in the lithospheric mantle [Bostock, 1997;Clowes et al., 2005: Jones et al., 2005. Based on individually constructed models, previous workers have postulated that seismic interfaces imaged by receiver functions correlate with conductivity changes in the lithosphere [Jones et al., 2003;Snyder et al., 2004]. These observations motivated our work, and our objective is to test these qualitative models using a formal approach, and thus to examine the compatibility for joint inversion of these distinct types of data as well as obtain a better understanding of the relationship between electrical and seismic structure of the lithosphere.

MT and Teleseismic Data
[5] The magnetotelluric (MT) data used in this study were acquired as part of the Lithoprobe SNORCLE project and additional programs by the Geological Survey of Canada [Jones et al., 2003]. To avoid ambiguity arising from spatial variability in the response, we selected MT sites located within 10 km of the broadband seismograph stations. The mantle-probing MT data are generally of high quality between periods of 1 -5,000 s and exhibit only weak multi-dimensionality, indicated by low maximum phase split in the inverted frequency range. Comparison with neighboring sites shows that the data are not affected by static-shift. To diminish the effect of two-dimensional structures, we use the Berdichevskiy invariant [Berdichevskiy and Dmitriev, 1976], the average of the off-diagonal elements of the impedance tensor, in the inversion.
[6] The teleseismic data for this study are from sites BOXN and EKTN, located in the central Slave craton (Figure 1). These stations are part of the Canadian POLARIS network [Eaton et al., 2005] and have been in continuous operation since 2001 and 2002, respectively. We calculated P-wave receiver functions (RFs) for 59 events recorded between 2002 and 2005 for a distance range between 40°a nd 90°. We tested several techniques and selected an iterative deconvolution approach [Ligorria and Ammon, 1999], which produced the clearest and most consistent results for our data. Modeling and inversion of receiver functions depends on epicentral distance and backazimuth, which vary from event to event. In particular, azimuthal anisotropy, which is characteristic of this region [Bostock, 1997], will produce time shifts that are not accounted for in our modeling procedure. Rather than stacking the RFs, which would attenuate events that are imperfectly aligned, we therefore selected a single ''best'' RF from the complete set of RFs based on signal-to-noise ratio (SNR) and consistency with other RFs. In comparison with a stacked RF, our single-event RF only samples the anisotropic structure for a single backazimuth. While this may result in a velocity bias, for the purpose of testing joint inversion we believe that a carefully selected single RF much better represents the overall structure and major interfaces.

Joint Inversion With Genetic Algorithms
[7] Our initial tests using a linearized approach, with a Levenberg-Marquardt joint optimization algorithm, gave disappointing results, showing that the final inversion model strongly depends on the starting model and the relative weighting of the data sets. This is indicative of multiple local minima on the error surface. For this reason we use an implementation of a genetic algorithm called NSGA-II [Deb et al., 2002] specifically designed for multiobjective inversion. While genetic algorithms are computationally more expensive than linearized methods, they have a number of advantages particularly for joint inversion. Genetic algorithms, like other stochastic techniques, can escape local minima and often produce superior results for problems with an unknown number of local minima [Goldberg, 1989].
[8] In contrast to linearized methods and some other genetic algorithms, NSGA-II yields a number of final models based on the criterion of Pareto-optimality, the socalled Pareto-optimal front [Deb et al., 2002]. Each model in the Pareto-optimal front is optimal with regard to a distinct relative weighting factor between the inverted data sets. The distribution of misfit values and number of final models illustrate the trade-off between minimizing the different objective functions [Dal Moro and Pipan, 2007]. If all objective functions possess a single common minimum, there should only be one single final model. In practice, noise and early termination of the algorithm often prevent this from occurring, and the final result is a cluster of models with similar misfit and model parameters. If the objective functions have different minima, or even describe competing criteria, a large number of models spanning a range of misfit values will be the result. One example is the joint minimization of data misfit and model roughness; for such problems the genetic algorithm will yield the L-curve [Schwarzbach et al., 2005] that is often used to estimate the optimum regularization parameter in traditional inversion.
[9] We use the additional information from the various output models to assess the compatibility of electrical and seismic structure in two ways. First, the overall shape of the trade-off curve indicates the extent to which the minimization of the misfit for each data set competes with the other. Second, examining the changes in misfit and its reflection in the models and simulated data reveals the resolving power of the models and indicates parts of the model where structures are sensed differently.

Inversion Setup
[10] We jointly invert the RF and MT data for onedimensional structure to a depth of $250 km. The connection between electrical and seismic models is established by the requirement of coincident layer interfaces, although a parameter change at the layer boundary is not required. Within each layer, resistivity and seismic velocity are uniform but mutually independent. The fitted parameters are therefore layer thickness t, logarithmic resistivity log (r el ), shear-wave velocity V s and density, r. In order to reduce the ambiguity in RF modeling [Ammon et al., 1990] we use a fixed crustal shear-wave velocity of 3.4 km/s from the Lith5.0 model for Canada [Perry et al., 2002]. We calculate P-wave velocities by multiplying V s with a 3. Synthetic waveforms are obtained using the modeling code of Randall [1989].
[11] We calculate the misfit between periods of 1 -2,000 s and 0.1 -2,000 s for the MT data at sites EKTN and BOXN, respectively, and a time interval of 1 s to 35 s for the receiver functions at both sites. The exclusion of highfrequency MT data avoids near-surface structures in the model that cannot be resolved by the receiver functions, whereas the later part of the receiver function data is sensitive to structures below the asthenosphere which are not a focus of this study. Furthermore at site EKTN the phase plunges to zero at frequencies higher than 3 Hz, indicating a problem with this part of the data.
[12] The number of layers in the inversion is a fixed parameter that determines the degrees of freedom for the inversion. If too many layers are used, the models are effectively decoupled, thus providing little information on the compatibility of the data, whereas too few layers are unduly restrictive. Using data misfit as a guide, we tested 7 -20 layers and selected 11 layers for the examples below. Inversion runs with fewer layers resulted in a considerably higher misfit for both MT and receiver function data, whereas increasing the number of layers did not reduce the misfit significantly.
[13] We ran the genetic algorithm with a population size, i.e., the number of models in each iteration, of 1000 for 100 iterations, resulting in 10,000 forward computations. Due to its stochastic nature, different runs of the GA inversion yield slightly varying results. We only present results from a single run for each setting, but comparison with other runs shows that the important features are robust between dif-ferent runs even if population size and number of generations are decreased a factor of two or less.

Results From the Slave Craton
[14] Figure 2 shows the joint inversion models and misfit trade-off for the two Slave craton sites and Figure 3 shows a comparison between observed and predicted data. The trade-off curve for site EKTN (Figure 2a) shows a pronounced L-shape with the model at the point of maximum curvature marked Model A. With respect to that model, all other retrieved models only achieve a minor improvement in misfit for one data set while the other data set deteriorates considerably. Model B exemplifies the change in model parameters when moving away from the optimum model to achieve a better receiver function fit. Overall the general features of both models are similar; the major difference is in the depth of the low-velocity, high-conductivity layer at a depth below 200 km. We interpret this layer as the lithosphere-asthenosphere boundary, which has been reported at depths of 210 km based on MT results [Jones et al., 2003], 190 km based on receiver function data [Snyder et al., 2004], and 200 ± 65 km from surface wave inversions [Chen et al., 2007]. Comparing the differences in predicted data for both RF and MT, we see that the expression of the LAB in the RF data is only weak, as can be expected for P-receiver functions [Yuan et al., 2006], but that the expression of the LAB in Model A seems to correspond to a more robust feature (see Figure 3 (bottom)). Considering that also the improvement in MT is considerable, we prefer Model A to explain both data sets and infer a LAB depth of 220 km.
[15] The other two major features of theses models, the Moho and the Central Slave Mantle Conductor (CSMC) Figure 2. Representative joint inversion models and trade-off curves for sites BOXN and EKTN. For site EKTN we show two representative models that illustrate the differences between different parts of the trade-off curve. For site BOXN we show the dependence of the trade-off curves on the location of the first conductor. Only for a crustal conductor a joint model can explain the two data sets. We plot the optimum model together with the surface wave results by Chen et al. [2007].
[ Jones et al., , 2003], a high conductivity, low velocity zone between 114 and 148 km depth, are consistent between both models. The similar expression of the CSMC in electrical and seismic data has been observed before in qualitative model comparisons [Snyder et al., 2004]. The models also indicate a conductive layer just below the seismic Moho at 35 km. Such a layer has been observed in the southwestern Slave craton and has been interpreted based on its electrical expression in the absence of a highly conductive lower crust . Resolution analysis, however, shows that we only have limited sensitivity to the position of that layer and we cannot exclude the possibility that the top of the interface is located in the lower crust.
[16] The MT data from site BOXN, in contrast, has good resolution in this depth range. Here we observe a different shape of the trade-off curve, when we require the conductor to reside in the mantle (Figure 2d). Now, we cannot find a single model with low misfit for both data sets any more. Instead we have to choose between minimizing either the RF or the MT data. Inspecting the dependence of structures on the misfit (not shown) shows that the major impacting factor is the depth to the Moho. While the MT data requires the conductive interface at 26 km, the RF requires it at 35 km. Allowing the top of the conductor to reside in the crust while leaving all other inversion parameters unchanged, results in a cluster of similar models with no significant misfit trade-off. We mark the optimum model from this cluster Model C in the trade-off curve and plot the electrical and seismic parameters in Figures 2e and 2f. The overall appearance of both models is similar to the models for site EKTN. If we identify the low velocity, high conductivity zone with the CSMC, its top is now at 85 km. This agrees with previous observations of a shallower CSMC to the south [Jones et al., 2003]. We also observe an additional conductor between Moho and CSMC. Due to its location between two strong conductors, it cannot be resolved and equivalently be replaced by a gradual change in conductivity.
[17] Interestingly we do not observe a conductive LAB in this model but only a reduction in seismic velocity at 200 km depth, slightly shallower than at site EKTN. It is also instructive to compare our model with the surface wave model by Chen et al. [2007]. There appears to be a constant offset of 0.2 km/s between the two models, but the relative velocity changes are remarkably similar. The overall similarity makes us confident that our joint inversion approach models the structure correctly.

Conclusions
[18] Joint inversion of receiver function and magnetotelluric data from the Slave craton indicates that the electrical and seismic structure at least partially coincide in this region. The CSMC is expressed in all considered models as a region of low velocity and high conductivity. This and the inferred location at depths of 80-120 km agrees with the result of previous studies. At site BOXN we can exclude the possibility that the highly conductive layer is located below the Moho, as observed by  in the southwestern Slave craton. At site EKTN poor data quality prevents such a clear conclusion, but joint inversion runs with a neighboring MT site suggest that the top of the conductor is located above the Moho as well. The lithosphere-asthenosphere boundary is only a minor feature in P-receiver functions, but we obtain a low velocity layer at depths of 200-220 km, which at site EKTN coincides with a high conductivity layer and which we interpret as the LAB. These results demonstrate that with our joint inversion approach we cannot only construct a joint model, but also identify cases where an interface is sensed differently by both data sets. This is an important advantage of our approach, as we can generally expect only a partial compatibility of electrical and seismic structures. Where we can make the link, e.g. the CSMC in our study, we obtain a more detailed picture of the structure of the lithosphere. A more detailed analysis will be necessary to link these observations to petrology of the crust and upper mantle. It remains to be seen whether correlated structures are a special feature of the Slave craton or a widespread property of the Earth's lithosphere.
[19] Acknowledgments. This research is funded by IRCSET Basic Research Grant SC/2003/237. Insightful discussions with Colin Brown and instructive reviews from Niklas Linde and an unknown reviewer, helped clarify some critical points. We thank the Geological Survey of Canada for making the POLARIS data publicly available. This work was made possible by the facilities of the Shared Hierarchical Academic Research Computing Network (SHARCNET). This is DIAS publication no. 193. Figure 3. Berdichevskiy and Dmitriev [1976] averaged (top) apparent resistivity and phase and (bottom) single event receiver function for site EKTN plotted with synthetic data from our preferred model (Model A). Both synthetic data sets reproduce the major features of the observed data. We also mark the expression of those features in the receiver function data.