Geographically and temporally weighted principal component analysis: a new approach for exploring air pollution non-stationarity in China, 2015–2019

ABSTRACT In spatiotemporal applications, geographically weighted principal component analysis (GWPCA) is commonly adopted to describe spatial heterogeneity. However, time effects are ignored in GWPCA. In this study, the temporal effect was incorporated into GWPCA . Thus, an extended model, geographically and temporally weighted principal component analysis (GTWPCA), was developed to simultaneously explore spatial and temporal non-stationarity. The GTWPCA was implemented using a case study of air pollution in China. The results mainly show that GTWPC1 (the local component one in GTWPCA) corresponds to a ‘winning group’ with constantly varying ‘winning’ variables adapted to the spatiotemporal non-stationary characteristics of air pollution in China.


Introduction
Principal component analysis (PCA), the most common form of factor analysis, is categorized as a multivariate statistical technique (Ballabio 2015). Currently, PCA has been used in various fields, including biology (Peakall et al. 2006, Lall et al. 2018, chemistry (Smilde et al. 2003, Tang et al. 2018, and geography (Wang et al. 2012, Abdellaoui et al. 2013, Liu et al. 2019. In this study, we focus on its use within a geographical setting. Similarly, Sharma et al. used PCA to extract three components from 13 geomorphic variables (Sharma et al. 2014), characterizing the eight sub-watersheds of the Tons River in India. However, the study found that local differences between sub-basins were masked due to spatial and temporal effects.
Space and time are two fundamental dimensions that provide a framework for evolutionary relationships in nature (Fotheringham et al. 2015). According to Tobler's First Law of Geography stating that 'everything is related to everything else, but near things are more related than distant things', spatiotemporal heterogeneity is prevalent in spatial and temporal environments (Tobler 1970, Wu et al. 2014. However, as PCA is implemented at a global scale, it does not account for variations in the covariance structure of the data over space and time. Fortunately, geographically weighted (GW) models can solve spatial heterogeneity problems that cannot be handled by global models (Fotheringham 1998, Fotheringham et al. 2002, Huang and Leung 2002, Li et al. 2016, and as such are becoming one of the most widely used methods for understanding variations in space. Harris et al. explored the expansion of PCA with spatial effects in geographic processes to capture the spatial variation in data structure (Harris et al. 2011). GWPCA provides a detailed explanation of spatially varying relationships among variables in a multivariate dataset. Common practice indicates that the spatial characteristics of each component are identified by mapping the percentage of total variation for each component. More importantly, the main variables affecting spatial variation are captured by visualizing the 'winning' variables of the component loadings. Many applications have confirmed that GWPCA is a powerful tool for revealing the variation in the spatial structures of multivariate datasets (Tsutsumida et al. 2017, Vladyka et al. 2018, Fernández et al. 2018, Basu et al. 2020. Harris et al. revealed spatial diversity in the chemical structure of freshwater in Scotland using a freshwater chemistry dataset (Harris et al. 2015). Li et al. applied GWPCA to the dataset of provincial statistical yearbooks and detected that the regional economic development of Jiangsu Province exhibited characteristics of spatial inequality between northern and southern areas (Li et al. 2015).
Similarly, if we apply GWPCA to air pollution data, significant variations in air pollution may be found across different geographic locations. Nevertheless, such variations are typically assumed to be stationary over time. Air pollution is a dynamic and complex process that changes not only with geographic location, but also with time (Guo et al. 2017, Li et al. 2017a, Mohtar et al. 2018. Compared with PCA, GWPCA is advantageous in that it captures spatial heterogeneity by considering spatial effects, however it neglects important temporal effects that lead to missing temporal information. Hence, a novel approach to explore the spatial and temporal variation characteristics of air pollution would be to apply a GWPCA model, with the incorporation of temporal effects.
Air pollution is posing a threat to human health. Numerous studies have indicated that respiratory and cardiovascular diseases are caused and exacerbated by air pollution (Brunekreef and Holgate 2002, Boogaard et al. 2019, Hu and Guo 2020. China, as the largest developing country, has been actively seeking a balance between rapid economic development and air pollution. Currently, more and more researchers are joining the study of air pollution. Some studies have considered specific sources of air pollution emissions to investigate the effects of the emission sources of air pollutants on their concentrations. For example, Zhai et al. explored the relationship between PM2.5 emission sources (traffic, industrial plants, and ground surface dust, etc.) and PM2.5 concentrations by PCA-GWR model (Zhai et al. 2018). The relationship between SO 2 emission sources (industrial and household emissions) and SO 2 concentration was examined by (Yang et al. 2017). Other researchers have explored air pollution from the perspective of economic factors. Zhao et al. explored the impacts of gross domestic product, fossil fuel energy consumption, energy consumption intensity, and economic structure on SO 2 Emissions (Zhao et al. 2018). Dong et al. showed the relationship between economic development factors (energy structure, energy intensity, technological progress, urban population, etc.) on haze pollution . Additional studies have focused on modeling the spatial distribution of specific air pollutants or analyzing the potential relationships between different air pollutant emissions (Zhai et al. 2016, Li et al. 2017b, Dong et al. 2019. Nowadays, with the emergence of spatiotemporal derivation and modeling (Pu and Yoo 2020, Zheng et al. 2021, Liu and Dong 2021, the spatiotemporal characteristics of air pollution are gradually revealed. In this study, the concentrations of air pollutants are selected as the study object. The objective of this investigation is to extend the GWPCA model to issues involving the spatial and temporal non-stationarity of air pollution, thus enabling the simultaneous capture of spatial and temporal variability characteristics. First, time effects were incorporated into GWPCA to construct GTWPCA method. Second, to test the reliability of the GTWPCA model, Monte Carlo tests were used to evaluate whether local eigenvalues from GTWPCA varied significantly across space and time. Finally, the PCA, GWPCA, and GTWPCA models were compared with air pollution data of 292 Chinese cities from 2015 to 2019.
The remainder of this paper proceeds as follows. In Section 2, we introduce the theory of the GWPCA model and the incorporation of the temporal dimension. The details of the GTWPCA implementation are presented, including bandwidth and parameter selection and Monte Carlo tests. In Section 3, an overview of the dataset for this study is presented, followed by an analysis and comparison of the results from the different models. Section 4 discusses the advantages of the GTWPCA model, the reliability of its results, and the deficiencies in its interpretation. In Section 5, we summarize and draw conclusions.

GWPCA review
GWPCA is an extension of PCA to include geographic data, and is designed to account for potential spatial heterogeneity in the data (Harris et al. 2011). GWPCA uses a moving window weighting approach to search for principal components at local locations. When conducting a single PCA at local locations, all neighbouring observations are weighted according to the features of the chosen kernel weighting function, after which a standard PCA is applied locally to these weighted data. The size of the window defines the scale of the standard PCA operating at local locations, and is controlled by the bandwidth of the kernel function selected in GWPCA (Harris and Clarke 2015).
Specifically, if spatial location i has coordinates u; v ð Þ, the GWPCA introduces a vector of observed variables X i associated with the position Þ are the local mean vector and variance-covariance matrix, respectively. The local variancecovariance matrix is where X is a data matrix (with n-rows for the observations and m-columns for the variables), and w u i ;v i ð Þ refers to the diagonal matrix of geographic weights generated using a kernel function.
The geographic weight matrix indicates the different importance of each individual observation in the dataset for calculating the local principal component at location i. In general, the closer an observation is to i, the greater the weight. Therefore, each point i has an exclusive weight matrix. Typically, the GW method is specified using a distance decay kernel function. In this case study, to better capture the non-stationarity of the study area (Paez et al. 2002), we use a bi-square function: where the bandwidth refers to the geographic distance r, d ij is the distance between the spatial locations of the i th and j th rows in the data matrix, and W ij is the geographic weight at location j when calculating the local principal component at location i. The bandwidth can be specified either as a fixed distance or as a fixed number of local observations (i.e. an adaptive distance). In fact, adaptive distance is more appropriate for the irregular sample configuration of the study data (Harris and Clarke 2015), and it was thus adopted in our study.
By decomposing the local variance-covariance matrix to provide the local eigenvalues and eigenvectors, the geographically weighted principal components at the locations u i ; v i ð Þ can be expressed as follows: In GWPCA, the components are usually reported in descending order of their respective eigenvalues. When each local eigenvalue is divided by tr V u i ;v i ð Þ À � , the percentage of the total variance in the original data is returned for each component at location i. Hence, the first component contains the highest percentage of the total variance, and the percentage of the total variance decreases for the subsequent components.

GTWPCA methodology
At some scale, an assumption of multivariate normality is required for PCA and GWPCA. The scale for PCA operation is global, whereas GWPCA is local in geographic space (Harris et al. 2011). For GTWPCA, an assumption of multivariate normality is as well required, and this operational scale is localized in a spatiotemporal setting. Supposing that a vector X i of the observed variables at spatiotemporal location i has a multivariate normal distribution of the mean vector μ and the variance-covariance matrix �, it would be denoted as X i ,N μ; � ð Þ. In addition, if spatiotemporal location i has coordinates u; v; t ð Þ, then PCA with local spatiotemporal effects regarding u, v and t as conditions of X i , and considering μ and � as functions of u, v and t, namely, Þ. Since μ and � are functions of u, v and t, this means that each element of μ u; v; t ð Þ and � u; v; t ð Þ is also a function of u, v and t. Consequently, μ u; v; t ð Þ and � u; v; t ð Þ are the geographically and temporally weighted mean vector and variance-covariance matrix, respectively. The geographically and temporally weighted variance-covariance matrix is as follows: where X refers to the data matrix (n-row refers to data points, m-column refers to variables), and w u;v;t ð Þ is a diagonal matrix of spatiotemporal weights. According to a comprehensive elaboration of the spatiotemporal weight matrix by Huang et al. (2010), the observation data are assumed to be located in a threedimensional space-time coordinate system with relatively balanced-scale effects in time and space. The spatiotemporal distance d ST ij between point u i ; v i ; t i ð Þ and its surrounding point u j ; v j ; t j À � can be described by the linear combination of the spatial distance d S ij and the temporal distance d T ij , as shown in Formula 6. When drawing a sphere with radius r around a particular point i, the local principal component can be calculated using PCA only on the observations within this sphere.
The parameters α and β are used to balance the effects of the spatial and temporal distances. The spatiotemporal weight matrix is constructed through the spatiotemporal distance and the bi-square kernel function, as shown in Formula 7: The bandwidth is the spatiotemporal distance r ST (in this case, the bandwidth is also an adaptive distance), d ST ij refers to the distance between the spatiotemporal locations of the i th and j th rows in the data matrix. And W ST ij is the spatiotemporal weight at location j when calculating the local principal component at location i.
The decomposition of the geographically and temporally weighted variance-covariance matrix provides the geographically and temporally weighted eigenvalues and eigenvectors matrix. As such, the geographically and temporally weighted principal components at spatiotemporal location u i ; v i ; t i ð Þ can be written as The geographically and temporally weighted component scores at the spatiotemporal location u i ; v i ; t i ð Þ are found by post-multiplying the original data X by geographically and temporally weighted eigenvector matrices L u i ;v i ;t i ð Þ , as shown in Formula 9.
In GTWPCA, each local eigenvalue is divided by tr V u i ;v i ;t i ð Þ À � to find the percentage of the total variance for each component at location u i ; v i ; t i ð Þ. Moreover, the first component from the GTWPCA exhibited the highest percentage of the total variance.

Bandwidth and balanced parameter τ selection
The key procedure in GTWPCA is to determine the proper scale at which each localized PCA should operate. This operational scale is often determined by the kernel bandwidth r ST , and the spatiotemporal distance r ST is influenced by balanced parameter τ, which can be calculated by α=β. In most cases, this bandwidth is specified by the user (2012). In this research, we are guided by the automated procedure for bandwidth selection introduced in GWPCA (described by Harris et al), in a similar manner to the procedure for bandwidth selection by cross-validation in GWR (Zhao et al. 2018). In essence, both GWPCA and GTWPCA retained the properties of the PCA. The parameter τ determined the scale of GTWPCA operation, as it balanced the effects of different distances in space and time. A similar procedure was used in the GTWR model to select the optimal balanced parameter (Huang et al. 2010), and the same practice was followed in this study.
In general, to provide a better description of the selection of bandwidth and balanced parameter τ in GTWPCA, it is necessary to comprehend the properties of PCA (Harris et al. 2011). Supposing that the X matrix consists of m variables and n observations, there will be n vectors in the m-dimensional space (each observation is a vector). To reduce the dimension, a traditional method is to transform the correlated variables in the data matrix into uncorrelated components, and then retain the first q components, enabling the subspace to maintain a higher percentage of the total variance. A q-dimensional subspace is constructed from the first q loadings, and the component scores with components q + 1 to m represent the Euclidean distances along the axes of the corresponding orthogonal vectors to a q-dimensional linear subspace. As a result, components q + 1 to m represent the deviation from this q-dimensional subspace. Herein, we assume that L q denotes the loading matrix L, with only the first q-columns retained, and that L À q ð Þ denotes the loading matrix L with the first q-columns removed. The first q components are XL q , and the remaining components are XL À q ð Þ . Jolliffe (2002) proved that the best (least squares) approximation of X with rank q is XL q L T q , and the residual matrix given by S ¼ X À XL q L T q is equivalent to XL À q ð Þ L T À q ð Þ (Jolliffe 2002). Hence, with principal components, we aimed to find the minimum of the expression P ik X ½ � ik À S ½ � ik À � 2 with respect to S, where S is a matrix of rank q. In this way, the variance levels of the components of S could measure the 'goodness of fit' (GOF) of the projected subplans (Harris et al. 2011, Harris andClarke 2015). The GOF for the i th observation is where S ik is the score of the kth component of observation point i. The GOF of the entire dataset is For GTWPCA, the local principal components of the i th location represent a similar projection, but the loadings are locally defined. Thus, S must be determined to minimize the following equation. . W i refers to the spatio-temporal weight for location i. GOF statistics for GTWPCA can be defined in a similar way to those of PCA, except that S is defined by the spatiotemporal weight at each local location. In this way, the total GOF statistics provide a method for finding the optimal bandwidth and balanced parameter τ for GTWPCA, where 'Leave-One -Out' is typically used when calculating the terms of the statistic. Here, the total GOF statistics, applying 'Leave-One-Out', can be calculated for all possible bandwidths and balanced parameters τ. Furthermore, the optimal bandwidth and balanced parameter τ are related to the minimum total GOF value. More importantly, as in GWPCA, the number of components to retain for GTWPCA (i.e. the value of q) is decided upon a priori. The optimal bandwidth and balanced parameter τ cannot be determined if all m components are retained, as shown in Equation 10. Normally, if compared with global PCA and GWPCA, GTWPCA needs to retain the same number of components.

Randomization tests
Monte Carlo tests can be employed to evaluate whether local eigenvalues from GTWPCA vary significantly across space and time, and provide support for the use of GTWPCA. In doing so, the spatiotemporal locations were sequentially randomized in the variable dataset. After each randomization, GTWPCA was applied, and the standard deviation (SD) of a given local eigenvalue was calculated. Thereafter, the actual SDs of the same local eigenvalues were included in a sorted distribution of SDs. Finally, its position within this ranking distribution determined whether there were significant spatiotemporal variations in the selected local eigenvalue. A similar procedure was used in the GWPCA model to test whether local eigenvalues from GWPCA varied significantly across space (Harris et al. 2011). In the actual operation, the Monte Carlo result of the local first eigenvalue is usually used to measure the non-stationarity of local eigenvalues, as the first component dominates the other components (Harris and Clarke 2015).

Case study: air pollution in China
To examine the applicability of GTWPCA, a case study was conducted to explore the spatial and temporal variations of air pollution in 292 Chinese cities between 2015 and 2019. The spatial and temporal heterogeneities were first detected using Monte Carlo tests, and then three air pollution analyses were established using PCA, GWPCA, and GTWPCA. First, PCA was employed to analyze the global trends of air pollution without considering any spatial and temporal effects. Thereafter, when spatial heterogeneity was considered, GWPCA was carried out to show the spatial variation characteristics of air pollution. Finally, the non-stationary characteristics of air pollution across space and time were presented by applying our proposed GTWPCA, which simultaneously considered spatial and temporal effects. Figure S1 illustrates the entire process of the case study.

Study data
In this study, we screened 292 cities in China with complete data fields from the officially published air pollution dataset between 2015 and 2019. The geographical location of the cities is shown in Figure 1, indicating a distribution pattern of a greater number of cities in eastern regions than in western regions. The cities in Taiwan Province are not included in this survey. Some researchers have demonstrated an inextricable relationship between meteorological conditions and air pollution (Vanos et al. 2013, He et al. 2017. Therefore, we used eight variables containing six criteria of air pollutants and two meteorological statistics to measure the air pollution characteristics of the cities (Vanos et al. 2013, Ji et al. 2019. The six criteria air pollutants included particulate matter (PM2.5, PM10), carbon monoxide (CO), sulfur dioxide (SO 2 ), nitrogen dioxide (NO 2 ), and ozone (O 3 ), based on data from the China National Environmental Monitoring Centre (http:// www.cnemc.cn). The two selected meteorological statistics, namely the number of sunny days in a year, and the number of days in a year when wind is less than force 3, are commonly available from the China Meteorological Data Service Center (http://data.cma. cn). These eight variables are listed in Table 1.

Results of PCA, GWPCA and GTWPCA
To understand the output of any local model, it is instructive to build the global model first. PCA was applied for the first time using the air pollution dataset. The results are presented in Table 2. The results revealed that the first three components had eigenvalues greater than, or very close to 1, and these three components collectively accounted for 73.4% of data variation, with the first component representing 45.7%. Only the first three components were retained to reduce the dimensionality, with the key trends of data variation explained by the first component. From a global perspective, PC1 (the first component from PCA) represents PM10 and PM2.5, which reveals the general trend of air pollution in China, but fails to reliably characterize air pollution locally.
As a non-spatial statistical analysis method, PCA ignored the influence of geographical location on air pollution. Thus, we attempted to replace PCA with GWPCA to explore the spatial variation in air pollution. A Monte Carlo test was conducted to evaluate whether the local eigenvalues for each component varied significantly across space. As shown in Figure S2, the p-value of the true SD was insignificant (p = 0.001) regarding the  eigenvalues of the dominant first component. Thus, the null hypothesis of local eigenvalue stationarity was rejected. In the next step, the critical aspect of GWPCA implementation was to determine the scale at which it operated locally, namely, selection of the optimal bandwidth. To compare GWPCA with PCA, it was reasonable that GWPCA similarly retained three components. Thus, we found the optimal bandwidth of 706, using an adaptive bandwidth selection procedure ( Figure S3). Eigenvalues and loadings for each component at spatial locations from GWPCA were available. The results are visualized according to the method of Fotheringham et al. (2002). Figure 2(a,b) show the spatial distribution of the percentage of total variance for the first component, and the first three components combined, respectively. Spatial variations in the results of GWPCA are shown on both maps, with the majority of local variations explained by the first local component from GWPCA in the northern region. In addition, the percentage of total variance of the first three components combined was higher in northern regions than in southern regions.
To further explore the output of the GWPCA model, the 'winning' variables in each local component (the one with the highest absolute local loading) were mapped according to the guidance of common practice. For a better understanding of the relationship between air pollution and meteorological conditions, the visualization method was improved by replacing the 'winning' variable with the 'winning group'. In essence, the 'winning group' variables contained two 'winning' variables, namely, air pollutants and weather variables. Figure 2(c) shows the 'winning group' of the first component from GWPCA. In comparison with the spatial pattern in Figure 2(a), GWPCA clearly shows that various 'winning groups' dominated air pollution in different regions of China. For example, the 'winning group' (i.e. PM2.5 and SunD/WindD) played a significant role in air pollution in most of the southern region, while another 'winning group' (i.e. PM10 and SunD/WindD) dominated air pollution in the northwest.
In general, a multivariate glyph could be used locally to observe the spatial variation between the 'winning' variables in the 'winning group'. In the multivariate glyphs, the length of the spokes reacted to the magnitude of the local loading and its color responds to the sign (in this case, red indicated negative and blue indicated positive). The 'winning' variable for each local loading was always in the same place on the glyph as follows: PM2.5 at 0° (North); PM10 at 45° (Northeast); CO at 90° (East); NO 2 at 135° (Southeast), SO 2 at 180° (South), O 3 at 225° (Southwest), SunD at 270° (West) and WindD at 315° (Northwest). Figure 2(d) shows the multivariate glyph for the 'winning group' of the first component from GWPCA. A spatial preponderance of the 'winning group' glyphs of one color or another, or the color difference of the 'winning' variable symbols from the 'winning group', provided a general indication of air pollution in the geographic location. The northeastern and southeastern regions showed an increasing trend in air pollution, while other regions showed a stronger positive effect. It is worth noting that in a few cases throughout the eastern region, there were opposite effects between air pollutants and meteorology (represented as different colors in the 'winning group').
As mentioned above, the GWPCA results only account for the spatial pattern of air pollution. Therefore, GTWPCA was applied to provide a comprehensive understanding of air pollution characteristics across both space and time. Figure S4 displays the resulting distribution of the local eigenvalue SD for the first eigenvalue from GTWPCA. As the p-value from the Monte Carlo test for the local first eigenvalue was 0.001, the null hypothesis of local eigenvalue stationarity in space and time was significantly rejected at the 95% level (i.e. the first component dominates the others). GTWPCA retained the first three components in order to align with GWPCA and PCA. After which, the bandwidth and parameter were selected using a modified automated procedure. Figure S5 shows that the optimal adjusted bandwidth and balanced parameter were 706 and 15.7, respectively.
To provide a more understandable visualization of the GTWPCA output, the same mapping method of GWPCA was mimicked. For GTWPCA, the percentage of total variance and 'winning group' of the first component were mapped separately. Figure 3(a,b) present the spatiotemporal distribution of the percentage of total variance for the first, and the first three components combined, respectively. Both maps show marked variations in space and time. Specifically, most of the local variations from year to year can be explained by the first component (GTWPC1) from GTWPCA. This suggested that GTWPC1 offered a stronger explanatory power regarding local variations as time increased. Furthermore, the percentage of total variance explained in the first three components combined was higher in the north than in the south, and generally increased over time.
Based on a priori knowledge, the dominant first component (GTWPC1) describes the spatial and temporal patterns of air pollution. Figure 4 shows the 'winning group' of the local first component. Compared with the spatiotemporal pattern in Figure 3(a), GTWPCA indicates that various 'winning groups' dominated air pollution across space and time. Specifically, the 'winning group' (i.e. CO and SunD/ WindD) lost dominance in the air pollution of China year by year, while another 'winning group' (i.e. SO 2 and SunD/WindD) was increasing in dominance. Moreover, the 'winning group' (i.e. PM10 and SunD/WindD) had an unchanged dominance in the air pollution, and the 'winning group' (i.e. O 3 and SunD/WindD) had a substantial position since 2018. In order to develop deeper insights into the spatiotemporal information of the 'winning' variables in the 'winning group', a multivariate glyph was also used for mapping. Figure 5 shows the multivariate glyph for the 'winning group' of the first component from GTWPCA. A spatiotemporal advantage of the 'winning group' glyphs of one color or another, or the color changes of the 'winning' variables symbols from the 'winning group', offered new evidence of air pollution over space and time. Specifically, the situation of air pollution showed a year-on-year improvement (depicted in Figure 5 by the appearance of red symbols in an increasing number of 'winning group'). In addition to this, our findings indicated that the opposite effect of weather and air pollution gradually diminished, as depicted by the color of sign in the 'winning group' which became more consistent year by year.   . Multivariate glyphs of loadings for 'winning group' on GTWPC1 at all spatiotemporal locations. The variable is specified by the sign direction: PM2.5 is at 0° (North); PM10 is 45° (Northeast); CO is 90° (East); NO 2 is 135° (Southeast), SO 2 is 180° (South), O 3 is 225° (Southwest), SunD is 270° (West) and WindD is 315° (Northwest).

Discussion and conclusions
GTWPCA and GWPCA are localized versions of PCA, with GTWPCA being an improved method for GWPCA. In a complex spatiotemporal environment, both space and time have important influences on air pollution. PCA is a global model that extracts the principal components to analyze the overall trend of air pollution. However, this process does not consider space or time effects on air pollution. GWPCA considered the space effect and ignores the time effect, making the results incomplete. Therefore, we attempted to introduce time as a new dimension into the GWPCA model in order to establish a three-dimensional spatiotemporal coordinate system. In doing so, the spatial kernel function was replaced by the spatiotemporal kernel function constructed from the spatiotemporal distance and the bi-square function. The observations around the spatiotemporal locations were weighted according to the spatiotemporal kernel function, after which the local principal components were calculated for these weighted observations. This process involved both space and time effects; thus, indicating that GTWPCA can identify the spatial and temporal characteristics of air pollution.
In GTWPCA, a variant PCA is calculated at a local location, and as such, the results vary continuously over space and time, allowing them to be mapped. If there are m-variables in a dataset, the GTWPCA generates m-components, m-eigenvalues, m-sets of component loadings, and m-sets of component scores at each spatiotemporal position. As a result, visualizing and interpreting the copious output that results from its application poses some challenges. It is conventional to consider the first component with the largest eigenvalue, as it contributes to the greatest part of the data variance. Other components may be useful for obtaining more information, which is an area for further research. However, GTWPCA is a suitable exploration tool when there is a certain spatiotemporal heterogeneity in the structure of the multivariate dataset being studied.
Using China's air pollution data, our proposed GTWPCA verified the significant spatiotemporal variation in air pollution, without which, this non-stationarity would have gone unnoticed. The GTWPCA results indicated that the percentage of total variance explained by the first three components combined was higher in the north than in the south, and generally increased over time. Additionally, GTWPC1 corresponds to a 'winning group' with constantly changing 'winning' variables adapted to the spatiotemporal non-stationary characteristics of air pollution in China. This illustrates that under the influence of spatial and temporal effects, the major air pollutants vary over space and time. To be specific, the spatial performance is that the major air pollutants differ in different spatial locations. For instance, the major air pollutant in Xinjiang Province is PM10, while the major air pollutant in Yunnan Province is SO 2 . The temporal behavior shows that the major air pollutants are prone to shift over time. As an example: the major air pollutant in Guangdong switched from PM2.5 to SO 2 in 2017. Furthermore, the meteorological variable in the 'winning group' had a positive effect on spatiotemporal variation. As shown in Figure 5, the red color of sign in the 'winning group' is increasing year by year. In other words, the air pollution is improving year by year, which means that the sunny and windy meteorological conditions are favorable to control the air pollution.
These conclusions are of great significance for establishing environmental protection measures in future. Based on the research results of this paper, the following suggestions are provided.
(1) Due to the spatial variation of major air pollutants, cities should adopt targeted measures to reduce major air pollutants in the region. For example, Yunnan Province should prioritize actions to address PM2.5, while Xinjiang Province should reduce PM10 more first.
(2) The temporal shift of major air pollutants should be taken into account, otherwise it will be caught in a situation where the reduction of one major air pollutant is followed by the dominance of another air pollutant. A case in point, the major air pollutant in Guangdong Province shifted from PM2.5 to SO 2 in 2017.
(3) Measures to control air pollution should be carried out under favorable meteorological conditions. Because experimental results indicate a natural association between meteorological factors and air pollution. Moreover, sunny and windy meteorological conditions are favorable for air pollution control.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was supported by the National Natural Science Foundation of China [41701461,41801316].