A novel big-data perspective on earth system evolution

ABSTRACT The many components of the Earth System are linked through complex feedbacks that can be revealed in large compilations of diverse geochemical proxy data for paleoclimate and ocean conditions. In those archives, for example, temporal and spatial trends can be visualized as topologies (‘landscapes’) defined by areas of high-density data, corresponding to steady states (= stability basins) maintained by negative feedbacks. The boundaries between those stability basins, representing low-density data areas (potential tipping points), are crossed when external drivers are involved, moving the system to a new steady state. These external drivers are often associated with positive feedbacks. As a proof of concept, we produced a stability ‘landscape’ using an extensive set of published carbon (δ13C) and oxygen (δ18O) isotope data from sedimentary carbonates spanning the last 2.5 billion years. The superimposed C-O isotopic pathways show a preference for particular regions of the ‘landscape’ at different times in Earth history. Major excursions reflect positive loops often set into motion by external inputs (drivers) that can overwhelm the system, such as major volcanic and tectonic events and human-induced climate effects. Our approach can be applied to other proxy datasets, and multivariate statistical treatments, including machine-learning approaches, can potentially yield a robust, high-resolution ‘landscape’ of the Earth System through time. GRAPHICAL ABSTRACT


Introduction
The modern climate is changing rapidly, and the magnitude of change may soon cross a critical tipping point, resulting in an irreversible cascade of feedbacks within the Earth Surface System (Stommel 1961;Budyko 1968;Lenton et al. 2008;Lenton 2013;Livina et al. 2013;Scheffer et al. 2015;Brovkin et al. 2021).This cascade will likely move the Earth System, with feedbacks -both positive (e.g.increased temperature leading to clathrate destabilization; Bratton 1999) and negative (e.g.increases in temperature offset by increasing rates of CO 2 drawdown via rock weathering) -that can result in a new steady state.Significant perturbations of geological or biological origin (e.g.major Large Igneous Provinces (LIPs; Ernst et al. 2021) or time intervals with enhanced burial of organic carbon in the oceans (e.g.Lomagundi Event; Bekker, 2022) have occurred through history, and feedbacks have mediated their impacts on average global temperature, surface ocean pH, size and depth of oxygen-minimum zones, and other parameters of the atmosphere-ocean system.While studies of past climate change hold enormous promise in understanding the progression and future of the current climate crisis, these studies have in many cases fallen short of reaching their full potential largely because of the vastly different temporal and spatial scales of change (e.g.global vs. regional) and the limited data presently available for ancient systems (e.g.gaps in stratigraphic continuity and generally weak spatial and temporal resolution).Further, while many previous studies have focussed on developing sophisticated models for changes in the climate system, a relatively untapped niche remains for using large volumes of proxy data to image more directly the climate system.
We hypothesize that exceptionally large, multi-proxy datasets have important value because they can delineate multivariate clustering within the Earth System over time.These data clusters reflect stability fields, and the sparsity of data separating them reflect the confining influence of negative feedbacks.This approach allows the data to form a topological landscape of high-density data areas (HDDA = stability basins) that are separated by ridges or hills with low-density data areas (LDDA).We assign lower confidence to those clusters in parts of the landscape with sparser data.In the broadest mechanistic sense, we can interrogate these landscape features independent of age and duration to reveal processes operating across diverse time scales -a novel concept that is so far only realised with models.Additionally, by integrating a variety of geochemical proxy types, this approach has the potential to elucidate interrelationships among geological and biogeochemical processes that would otherwise go unappreciated.Herein, we present a proofof-concept analysis using published carbon (δ 13 C) and oxygen (δ 18 O) isotope data to demonstrate the feasibility and potential utility of this novel approach for a wide diversity of future studies.It is our hope that this first step will lead to many more across a wide range of parameter space, time scales, and essential Earth System questions.

Topology landscape of environmental proxies
The Earth System is complex and contains subsystems that operate semi-autonomously but are affected by feedbacks within the subsystem, as well as those that link one subsystem to another.Positive feedbacks have the capacity to result in non-linear responses to small changes in temperature, for example, that can in some cases lead to dramatic climatic change when tipping points are crossed.Negative feedbacks, in contrast, can stabilize the climate, maintaining long-term, steadystate conditions.However, when perturbations linked to first-order drivers are sufficiently strong -or positive and negative feedbacks becoming stronger or weaker, respectively -tipping points can be crossed, leading the system into a new steady state, expressed as a new stability basin in the data (e.g.Scheffer et al. 2001Scheffer et al. , 2012Scheffer et al. , 2015;;Scheffer and Carpenter 2003;Scheffer 2010;Ashwin et al. 2012;Lenton 2013).
The current or past state of the Earth System can be assessed using a wide array of geochemical proxies (Pavlov and Kasting 2002;Kump et al. 2005;Kuroda et al. 2007;Meyer and Kump 2008;Jenkyns 2010;Kidder and Worsley 2010;Chi Fru et al. 2016;Cui et al. 2021;Diamond et al. 2021;Bergman et al. 2021;Zhang et al. 2021).These proxies are wide ranging, including those based on elemental abundances, isotopic ratios, and mineralogies through time.Carbon isotope data of carbonates (δ 13 C), for example, can reflect the state of the global carbon cycle (Arthur et al. 1985;Eldrett et al. 2014;Karhu and Bekker 2020), while oxygen isotope data of carbonates (δ 18 O) can respond to global average temperatures and continental and marine ice volume (Galili et al. 2019;Crockford et al. 2019).Trace metal abundances and isotopic relationships, such as those for chromium, molybdenum, uranium, selenium, and thallium (Pearce et al. 2008;Frei et al. 2009;Planavsky et al. 2014;Stüeken et al. 2015;Kipp et al. 2017;Asael et al. 2018;Kendall et al. 2021;Chen et al. 2021), as well as cerium and iodine abundances (Hardisty et al. 2017;Lu et al. 2017;Liu et al. 2021), can track global redox conditions.Boron isotope data are thought to mirror trends in oceanic pH, while strontium, osmium, and lithium isotope data are used as proxies for continental weathering and oceanic crust generation (Du Vivier et al. 2014;Kuznetsov et al. 2017;Dickson et al. 2021).Nitrogen isotope data and phosphorous abundance (Bjerrum and Canfield 2002;Kipp et al. 2018) scale with nutrient availability.When measured in marine sedimentary rocks from outcrop or, given the challenges linked to weathering, ideally from drill core, these proxies can track local and/or global ocean chemistry particularly when viewed in independent temporal, palaeogeographic, and paleoenvironmental frameworks.These results can then be linked through numerical modelling to atmospheric and ocean chemistry.However, the influence of changes during lithification and diagenesis (e.g.Knauth and Kennedy 2009) are also critical for some proxies (e.g.δ 18 O and δ 13 C values of carbonates).
In this light, the stability basins of clustered data (high-density data areas; HDDA) separated by topological 'ridges', which are defined by a sparsity of data (lowdensity data areas; LDDA), create a visual 'landscape' in which boundary conditions and feedbacks of the Earth System govern the topology.A 'basin' in this context should not be confused with the common use of basin to refer to sedimentary or geographic/topographic features.Transitions from one stability basin to another were likely associated with cascades of positive feedbacks (Zuckerman and Jefferson 1996;Lenton 2013) often set into motion by external drivers overwhelming the stabilizing influence of negative feedbacks within the system.These external drivers, to name only a few, include major volcanic and tectonic events, meteorite impacts, evolutionary advances in life, and humaninduced climate change (Kidder andWorsley 2010, 2012;Mills et al. 2014;Bond and Grasby 2017;Ernst and Youbi 2017;Rampino et al. 2019).Accordingly, these events can be periodic, stochastic, or even onetime perturbations, and they can be locally restricted to the depositional site of interest or, conversely, can record significant global changes.
Patterns of system behaviour reflecting transitions among states of stability and across tipping points can be visualized using the simple ball and cup diagram shown in Figure 1 (Scheffer and Carpenter 2003;Beisner et al. 2003;Hughes et al. 2013;Scheffer et al. 2015;Steffen et al. 2016;Abel et al. 2016;Mao et al. 2016;Lucarini and Bódai 2017;van Nes and Scheffer 2017;Lamothe et al. 2019;Rodríguez-Sánchez et al. 2020;compare to Lucarini et al. 2012 for related caveats).Continuous data from a geochemical proxy can exhibit a range of patterns.For example, an excursion can move a system away from the baseline value, as shown in the Figure 1.Schematic characterization of earth system stability and tipping points using proxies such as δ 13 C and δ 18 O values.Column 1: Possible variation of a proxy over time.Columns 2 and 3: system stability over a range of values of the proxy.State of system is tracked by yellow balls.Column 4: stability basin in plot of two proxies.Intensity of grey shading corresponds to an increase in basin stability.Red dots show distribution of system states, and the limited number of dots between the topological stability basins indicates the limited time the system spends in transition between stability basins.Rows indicate various scenarios of system pathways through time.Row A: two excursions for an earth system proxy, one (no.1) that is insufficient to leave the stability basin and a second (no.2) that did cross a tipping point, thus leaving the original basin and falling into a second basin.Row B: two examples of oscillations of a proxy, one (no.1) where oscillations are small in magnitude and another (no.2) where the range of oscillation is larger and inferred to cover substantial portion of the sides of the stability basin.Row C: the values cluster in two groups, but have different interpretation based on the pathway: No. 1 shows oscillation back and forth between two end-members, and no. 2 is a steady-state with limited oscillations followed by a shift to another steady-state with limited oscillations.Row D: the proxy changes value transiently, yielding an ambiguous interpretation that is either (no. 1) a shift to another stability basin followed by a return to the original stability basin or (no.2) a system that moved to a transiently stable position on a wide topological high (a metastable state).
first column of panel A (Figure 1).In such a case, the proxy data reflect a geochemical system being forced away from steady-state conditions, represented by the ball moving upward in the ball and cup analogy, as shown in the second and third columns of Figure 1 at which point the system returns to the same steady state or settles into a new one.This difference reflects how the excursion is controlled: the strength and interaction of negative feedbacks relative to the magnitude of the driver and the strength of competing positive feedbacks.The second and third columns of Figure 1 show how the data in cases 1 and 2 would plot on a single proxy axis (i.e.independent of time).By adding a second set of proxy data to the plot in the fourth column, as measured in the same samples, the ball and cup analogy becomes a topological surface of basins (HDDA) and ridges/hills (LDDA).
Though proxy visualizations of this type could become complex, this simple concept can be extended to include as many discrete data types as are available for a given sample set.By integrating results, a multidimensional topological landscape of basins and ridges can be created.This approach can yield delineations of interrelationships and system-stabilizing forces.

Data
To demonstrate the potential of this approach we can generate a topological landscape for approximately 20000 published carbon and oxygen isotope values (δ 13 C and δ 18 O) from marine carbonates spanning from 2.5 Ga ago to the present.Our data are from the Knauth and Kennedy (2009) compilation augmented with the additional Neoproterozoic data compiled in Cox et al. (2016) and Paleoproterozoic data from Shields and Veizer (2002) and Bekker et al. (2006), among many other sources.In total, there are 119 datasets represented.The references for these are provided in the Supplementary files.
The basic context for the analysis is that significant clusters of data (HDDA) are used to define the stability basins based on the notion that the system steady state (in C-O isotopic space) will preferentially sit with C and O isotopic values corresponding to those topological basins.Ridges (marking boundaries between stability basins) would then be defined by areas of fewer data density (LDDA).We emphasize again that in this paper we are discussing topological basins, troughs, hills, and ridges, which are entirely different from sedimentary and geomorphological features with the same names.

Defining deepest portions of stability basins
Our case history using C and O isotope compositions is developed using a graphical approach outlined here.From Figure 1 (Row C, column 3) it can be seen that data clusters (marking stability basins) can be defined in two ways: (1) a grouping of data in C and O isotope space that is comprised of multiple sequential datapoints or (2) a grouping of data in C and O isotope space that is comprised of all data whether sequential or non-sequential.
Sequential in this context is defined as stratigraphically continuous data in a sedimentary sequence (i.e. in which the data represent a progression up section over some time interval).In this case, it is best to use data from a single stratigraphical section in defining data clusters rather than juxtaposing data from different stratigraphic sections of uncertain age correlation.Therefore, a key factor in our approach is defining C and O isotope clusters separately for each of the 119 datasets.
It is likely that the location of stability basins (if they exist) could be shifted by variation in the values of additional proxies for other controlling factors (analogous to the way that phase boundaries associated with petrogenetic processes depend not only on temperature and pressure but also on melt composition).Our strategy is to focus on locating the deepest parts of stability basins, which we hypothesize will be less variable under the influence of additional proxies.To achieve this, we focus on small clusters of data generally having 0.5 to 1‰ range in C and O isotope values.For each of the 119 datasets (see Supplementary File), we drew ellipses around the densest concentrations of data in order to locate the 'deepest' (most stable) parts of stability basins.These data ellipses from the 119 different datasets were then grouped on a single diagram (Figure 2).

Integrating C and O isotope data clusters into a topological surface
The next step is to generate a topological surface through the ellipse data.The ellipse diagram (Figure 2) was imported into ArcGIS, and to capture the variation in the density of ellipses we assigned grid values ranging from −15 (very stable) to + 15 (not stable).Inverse distance weighting (IDW) of this grid of values was used to produce the topology shown in Figures 3-5.It is apparent in Figure 2 that the available data are not homogeneously distributed and that some regions of topology are therefore better defined that others.

Topology landscape based on C and O isotope data
The resulting topology diagram is presented in Figure 3. Figure 4 highlights the main stability basins and ridges suggested for the topology landscape in δ 13 Cδ 18 O space.The dominant topological basin (A) shows an elongated trough-like distribution, with most variations in the δ 13 C data falling between −8 and+5‰ and co-varying with oxygen isotope data ranging from −6 to −1‰.This distribution corresponds to the main grouping of data that includes both the range in primary marine values and changes expected during diagenesis (Figure 4).On the side of basin A with the lowest δ 18 O values, there is another basin (B) and an elongated basin (C) separated from basin A by prominent ridges (1 and 2 in Figure 4(b)).Basins G and perhaps B and E are more elongated along the oxygen isotope axis.Basin F is less elongated and thus could reflect feedbacks impacting both δ 13 C and δ 18 O isotope data.

Superimposed C-O isotopic pathway through time
An important question is whether the topology is invariant over the entire studied time interval, that is, whether the stability basins shown in Figures 3-5 are relevant back to 2.5 Ga or whether they are relevant only for particular time intervals.To address this question in a preliminary way, we consider the pathway of the Earth System across this topology space through time.We plot the generalized pathway in the δ 13 C and δ 18 O space with the data spanning the period from 3800 to 445 Ma (Shields and Veizer 2002; Figure 5), recognizing that the extension beyond 2.5 Ga to 3.8 Ga represents a sparser dataset.There are some interesting patterns in this distribution (compare Figures 5 with 4).For example, Neoproterozoic data plot in topological basins A, B, and C, while Mesoproterozoic and Paleoproterozoic results sit in basins C and C and F, respectively.The lower δ 13 C end-member of the basin A is populated with the Phanerozoic (mainly Plio-Pleistocene) data.Basin E is populated with the data from the Phanerozoic and some Neoproterozoic.Basin F is mainly populated with the Paleoproterozoic data, and the basin G is dominated by the Neoproterozoic data.

Controls on stability basins and isotopic excursions
With our recognition of stability basins (clusters of C and O isotope data) within the general C-O data distribution, we need to consider the processes and drivers that control the location of these stability basins and the isotopic excursions.,000 analyses of marine carbonates (see supplementary file for sources).The plot was generated using a graphical approach where data for each selected sedimentary carbonate sequence from around the world were plotted from an excel spreadsheet and transferred to a graphical program where ellipses were drawn around the densest data concentrations (HDDA) and inferred to represent stability basins (black ellipses were based on groupings of more than three points, while green ellipses and blue ellipses were based on clusters of three and two points, respectively).The above-mentioned similarity of the Phanerozoic data with the Neoproterozoic data was previously noted by Knauth and Kennedy (2009).They inferred from this similarity in data distribution that processes responsible for variation in δ 13 C and δ 18 O values in the Phanerozoic were also active in the Neoproterozoic (see Figure 5).Based on the better understood Phanerozoic processes, they inferred the primary seawater composition over most of this interval was centred on δ 13 C values of 0 to +5‰ and δ 18 O values of 0 to+4‰.Early diagenetic processes yield deviations from this marine field towards δ 13 C values of −10‰ and δ 18 O values of −5 to −10‰.Subsequent late diagenetic alteration processes are inferred to shift δ 18 O to even lower values (Knauth and Kennedy 2009; Figure 4(d)).However, note that C isotope values can also be affected at high waterrock ratios (Banner et al. 1988;Banner and Hanson 1990), resulting first in a shift to lower δ 18 O values followed, at higher water-rock ratios, by a shift to lower and negative δ 13 C values.
Many of the major isotopic changes (excursions) during the Cambrian and Neoproterozoic (Figure 5 (Hurtgen et al. 2002) in Knauth and Kennedy (2009); #98 = Paleoproterozoic (Bekker et al. 2006) for the Fenwick Formation.For part D, the drivers are after Knauth and Kennedy (2009), apart for the LIP warming trend.
The nature of these excursions provides insight into the negative feedbacks that maintain topological basin stability, as well as the positive feedbacks and related external drivers that cause shifts (excursions) among stability basins.Some factors (summarized in Figure 4(d)) relevant to our trends are: (1) negative feedbacks that maintain the stability basins and (2) positive feedbacks and drivers that are related to transitions among stability basins representing potential tipping points.These relevant factors also include primary marine composition as well as shifts associated with early and late diagenesis.Also relevant is the role of triggers external to the system, such as emplacement of Large Igneous Provinces (LIPs) and meteorite impacts.The effect of LIPs can be complex and multidirectional with, for example, large release of CO 2 associated with mafic volcanism resulting in global warming, whereas felsic volcanism delivers large volumes of SO 2 , which results in sulphate aerosols and cooling (Ernst et al. 2021).At the same time, continental chemical weathering of LIPs that efficiently draws down CO 2 can also deliver P to the oceans, leading to blooms in primary organic productivity that further enhance CO 2 consumption (e.g.Cox et al. 2016).
These are only a few of the potential mechanistic underpinnings of our results presented in a very general way.Our principal goal here has been to demonstrate, through one representative application, a general topological approach that can be extrapolated to many other environmental proxy data types.These applications can provide a novel context for interpretation of environmental and climatic stability and the potential for crossing tipping points.In other words, our proof of concept is a demonstration that variations in proxy data density (e.g.C and O isotopic data; Figures 2-5) can be linked to local-and global-scale processes, allowing detailed interpretation of primary and secondary controls and relevance to climate models.Important controls include first-order, often linked drivers, such as tectonics, continental weathering, sea-level variations, and even ocean redox.

Conclusions
Modern Earth is experiencing dramatic changes in atmospheric carbon dioxide content and climate.Central to related concerns are the point at which Earth's modern climate would breach a tipping point that would shift the climate dramatically and perhaps rapidly.In order to understand the Earth's modern tipping points, it is instructive to explore the evolution of the Earth System through geological time.We have suggested an approach to understanding this system by using environmental proxies recorded in sedimentary rocks to characterize the topology of basins (marking stable conditions) bounded by ridges (marking tipping points between adjacent stability basins).Herein, we demonstrated the existence of such topological landscape using two proxies for which there are abundant data.Carbon and oxygen isotope values from carbonate rocks provide insights into marine conditions (temperature, biological productivity, and carbon sequestration) and subsequent processes of diagenesis (potentially affecting both δ 13 C and δ 18 O values).A dataset of~20,000 paired δ 13 C and δ 18 O values from 119 datasets was analysed to identify data clustering inferred to represent preferred system states (i.e.basins of stability), with the areas of fewer data representing ridges (and potential tipping points separating stability basins).The primary and secondary mechanisms intrinsic to this distribution of C and O isotope data, although only one example, are at the heart of many, if not most, first-order controls on the Earth System and, specifically, long-term patterns of co-evolving life and environments.

Future work
Given this initial promise suggested by our graphical approach to large carbon and oxygen isotope datasets, the next stage of this research is to provide a more robust mapping of the landscape of the Earth System using multivariate statistical assessment and data mining algorithms, including machine/deep learning, and other big-data analytical techniques (and testing for and addressing any potential systematic biases due to sparse data sets using Monte Carlo simulations and other approaches) across a diversity of geochemical proxies.These can include other surface proxies, such as those bearing on ocean redox evolution (trace metal abundances and isotope systematics) and pH, continental weathering, submarine volcanism, and nutrient availability, among many others -all with eyes on both past and future Earth Geologic age can be included as an additional variable to help test for topological invariance with time.Paleo-location information integrated with continental reconstructions can help distinguish effects due to palaeogeography.Subsequent studies should strive to better constrain topologies through inclusion of more data and additional proxy types with the goal of detailed analysis of specific feedbacks (both negative and positive) that apply to particular areas of the topology landscape, as well as to major short-duration perturbations (e.g.LIPs and meteorite impact) and long-term processes with often predictable time scales of operation (e.g.Milankovich forcing and plate tectonics).The ultimate end goal is to couple this qualitative understanding to sophisticated climate system models.

Figure 2 .
Figure2.Distribution of C and O isotope data from approximately 20,000 analyses of marine carbonates (see supplementary file for sources).The plot was generated using a graphical approach where data for each selected sedimentary carbonate sequence from around the world were plotted from an excel spreadsheet and transferred to a graphical program where ellipses were drawn around the densest data concentrations (HDDA) and inferred to represent stability basins (black ellipses were based on groupings of more than three points, while green ellipses and blue ellipses were based on clusters of three and two points, respectively).
(a)) represent variation in δ 13 C values; however, during the Mesoproterozoic (Figure 5(b)), δ 18 O variations were on a larger scale than those in δ 13 C values.During the Paleoproterozoic, variations were again mostly in δ 13 C values, but with a component of variation in δ 18 O, and in the sparse Archaean dataset the variation was mostly in δ 18 O values.In Figure 4(d), we highlight some specific excursions that crossed multiple stability basins.

Figure 5 .
Figure 5. Generalized secular C and O isotope trends superimposed on the topology of Figures 3 and 4. Smooth pathways for the mean global C and O isotopic data for 3800-445 Ma carbonates (data from Shields and Veizer 2002) are generated using Excel with a smoothed-path option.Blue and red paths are for δ 18 O and δ 13 C values for limestones and dolostones, respectively.A similarity of Phanerozoic data (not shown) with Neoproterozoic data was previously noted by Knauth and Kennedy (2009).