Drivers of carbon dioxide emissions: an empirical investigation using hierarchical and non-hierarchical clustering methods

The mitigation of CO2 emissions requires a global effort with common but differentiated responsibilities. In this paper, we identify clusters of CO2 emissions across 72 countries. First, using the stochastic version of the IPAT and employing the dynamic common correlated effects technique, we identify three key determinants affecting CO2 emissions (non-renewables, population, and real GDP). In the second step, both hierarchical and non-hierarchical clustering methods are considered to identify the optimal number of clusters. We identify two to four clusters with different member countries, and in particular establish that in most cases, a 2-cluster solution appears to be optimal. The contents of clusters vary slightly according to the clustering methods for each period. The clustering results from using only the overall CO2 emissions indicate that the countries we consider form three clusters, with China and the USA each within a single member cluster. The remaining 70 countries form the third cluster. Our findings reflect the prominent roles of China and the USA in overall CO2 emissions. Analyses with sub-period and largest emitters reflect a different clustering structure. Some policy recommendations in setting emission reductions are made, considering different clusters across countries.


Introduction
Global warming and the associated environmental degradation are under constant scrutiny. The Intergovernmental Panel on Climate Change (IPCC) has released guidelines for the national greenhouse (GHG) inventory since 1995 and published several updated versions in the last three decades. The United Nations Framework Convention on Climate Change (UNFCCC) follows the IPCC guidelines in setting the national emissions inventory. Implementation depends on 'common but differentiated responsibility', based on the economic development, structural composition and emission growth factors of countries. At the same time, countries emitting high CO 2 emissions per capita may be grouped together and coordinate to set a similar reduction target.
Scientific research addresses the problem of climate change as a result of anthropogenic emissions. Since the first IPCC assessment report (Change 1990), there have been active efforts to design and adopt policies for controlling the emissions of pollutants. The effects of CO 2 emissions are evident globally; mitigation requires managing the global commons and maintaining a measure of international coordination among nations. Reducing emissions and maintaining economic activities at the same time need a balance between economic development and environmental policy. Analysing the drivers of greenhouse gas (GHG) emissions provides useful insights for carrying out climate negotiation agreements and mitigation actions. 1 Climate change is recognized as a problem on a global scale; however, not all countries have similar structures of production, consumption, and CO 2 emissions per capita. In order to identify similar countries (in a group), key factors that relate to the production of CO 2 emissions must be identified for the countries in groups. 2 There is a rich volume of literature identifying key factors causing an increasing level of CO 2 emissions. Recent research can be found in Apergis (2016); Sadorsky (2014) and Wang and Lin (2017). 3 In a recent survey on applied energy economics research relating energy consumption-CO 2 emissions to economic growth, Smyth and Narayan (2015) emphasised the over-use of time series or panel estimation techniques in identifying key non-energy variables relating to energy variables. Decoupling global emissions and economic growth has been discussed widely in the empirical literature. 4 In the world energy arena, there is increasing interest in mitigating CO 2 emissions, either convergence patterns are established across countries and regions, while there is a significant difference across countries or regions, or alternatively clustering, as a result of the disparities in their CO 2 emissions. The recent studies on convergence with CO 2 emissions are discussed in Aldy (2006); Criado and Grether 1 CO 2 emissions is considered here as the primary source of GHG. 2 We consider CO 2 emissions, as this emission is the primary mitigating factor for climate change in the global context. Other studies include Awaworyi Churchill et al. (2018); Cai et al. (2018); Churchill et al. (2018);Mikayilov et al. (2018); Mrabet et al. (2017); and Wu et al. (2018); Zhou et al. (2018). 3 Energy use and the associated pollution has been an important research issue (see for example, Anandalingam and Bhattacharya 1985;Li et al. 2012). 4 See Schandl et al. (2016) for further discussion.

3
Environmental and Ecological Statistics (2020) 27:1-40 (2011); Strazicich and List (2003), and others. The advantage of using clustering analysis can be identified by the fact that cluster analysis aims at sorting different countries into groups in a way that the degree of association between any two countries is maximal if they belong to the same group and minimal otherwise.
In this study, we provide a rational basis for clustering groups from 72 countries following their emissions characteristics over time. While there are studies in the literature that propose clustering of locations based on air-pollution data (D'Urso et al. 2013(D'Urso et al. , 2014(D'Urso et al. , 2017Lafuente-Rego et al. 2018;Vilar et al. 2018), there is a scant literature in energy economics in using clustering methods. In particular, Sul (2007, 2009) use a club convergence method, where they develop a new algorithm to identify clusters of convergent subgroups. Their algorithm is a data-driven method that avoids ex-ante sample separation. It can be used as a general panel method to cluster objects into groups with similar transition paths. Using this approach, Bhattacharya et al. (2020); Bhattacharya et al. (2018) identify clubs based on energy efficiency across states in Australia and India respectively.
To our knowledge, there is no study based on a clustering approach applied to countries where consideration of the effects of key factors contributing to CO 2 emissions is considered. We fill the research gap in this respect. In our analysis, we also exclude China and the USA who are the largest polluters in the world, in order to identify different clusters for which the influence of these two countries is minimised. Therefore, we believe our analysis is based on a robust econometric approach and some policy suggestions are indicated in the concluding section.
We use an extended version of IPAT identity developed by Ehrlich and Holdren (1971) to establish the major drivers of emissions for the clustering analysis. 5, 6 We follow three steps. Firstly, we analyse the findings with a full version of the CO 2 model, considering CO 2 emissions as features. 7 Secondly, we follow these clusters within the CO 2 trajectory considering significant variables. Finally, we attempt to provide some policy suggestions for reducing emissions where countries can act within a cluster in a coordinated manner. It implies that, as generally accepted by proponents of cluster-based policies, various countries should not try to create clusters starting from scratch.
Section 2 covers the most significant aspects of the theoretical model used and the empirical methodologies we implement to identify key factors; we describe the measures of variables and data sources in this section. In Sect. 3, we define the various aspects of the cluster analysis, while in Sect. 4, we analyse the findings from different clusters and the dynamics of these factors over time with changing CO 2 emissions. The concluding section covers our major findings with possible policy suggestions.

3 2 Model, data and estimated parameters
The IPAT identity was developed by Ehrlich and Holdren (1971) to consider the influence of human activity on the environment (Ehrlich and Holdren 1971). A simpler version of the identity is defined as Impact (I) = Population (P) × Affluence (A) × Technology (T). Following IPAT, environmental impact (measured in CO 2 emissions) is influenced by population growth, proxies of affluence, and technology. Since its development, the IPAT Model has been widely used in the literature to analyse the influence of economic and social factors on the environment. We use here the extended version of the model. The conventional IPAT equation is an accounting identity where the coefficients of all factors have fixed units of elasticity, and it is therefore restricted for empirical testing purposes. Dietz and Rosa (1994) relaxed the setting of fixed units of elasticity and modified it with a stochastic version of the IPAT (STIRPAT) model. A convenient calculation could be achieved through logarithmic dimensionless processing of variables and, accordingly, the basic analysis framework of the STIRPAT model can be expressed as follows: where CO 2 measures CO 2 emissions per capita in metric tons; ∝ i is the constant term; NRE is the non-renewable energy intensity; RNE is the renewable energy intensity 8 ; POP is population; RGDP is the real gross domestic product per capita (measured in constant 2010 US dollar); URB is the urban population as a share of total population; SER is the share of service sector value added (as a percentage of GDP); and IND is the share of industrial sector value added (constant US dollar). Parameters s are vectors of coefficient, where s = 1, …, 8; i is the index of countries; t is the time index; u it is the error term, f t is m × 1 vector of unobserved common effects with country-specific factor loadingsi and e it are the idiosyncratic errors. We consider the logarithmic version of this model for estimation purposes following Apergis and Payne (2010), and Bhattacharya et al. (2016). These variables are extracted from the World Development Indicators. The period of analysis covers 1970-2015.

Estimations and the selection of key determinants of CO 2 emissions
Over the last two decades, the popularity of panel time-series data (large N and T) has been enhanced by the availability of international macroeconomic databases and (1) the development of estimation, which extends first-generation panel time series techniques to allow for heterogeneity in the slope coefficients. Notable among these estimation techniques are the mean group OLS (MG), developed by Pesaran and Smith (1995), and the pooled mean group (PMG), developed by Pesaran et al. (1999). The MG estimator assumes that slope coefficients, error variances, and intercepts differ across countries. In contrast, the PMG estimator allows intercepts, error variances, and short-run coefficients to vary across countries while the long-run coefficients are constrained to be the same. The criticism of these panel time-series estimations is that they tend to generate inconsistent estimates in the presence of common shocks or common factors, i.e. in the presence of cross-sectional dependence across the panel.
The correlation of the residuals across panel units may arise from unobserved common factors in the panel, or a correlation between the predictors and the residuals may be present. Biased estimates arising from the unobserved common factors in the panel unit have garnered attention among researchers. In addressing this issue of unobservable common factors, recent research has focused on either slope homogeneity (see, Bai 2009;Moon and Weidner 2015), which yields inconsistent estimates, or slope heterogeneity and cross-sectional dependence. Notable among the latter developments was the introduction of the common correlated effects (CCE) estimator developed by Pesaran (2006). This estimator uses the cross-section averages of the variables to approximate the projection space of the unobserved factors while considering the heterogeneity of the slopes. The extension of the CCE technique to include weakly exogenous (endogenous independent) variables with the lags of the dependent variable is developed by Chudik and Pesaran (2015). They establish the dynamic common corrected effects (DCCE) estimator, which increases the overall efficiency of estimations.
In this paper, we employ the dynamic common correlated effects technique across a panel of countries. The estimation involves a full panel of 72 countries with a sufficient number of observations to conduct estimations. For the period of our study, these countries emit 82% of total global CO 2 emissions (based on all countrylevel data available in the World Bank dataset). 9 The panel also has a good mixture of developed and developing countries. The advantage of this estimation is that it allows the coefficients to be obtained for the entire sample and for each of the countries within the panel. Refer to Tables B1 and B2 in the Online Supplementary Material for the summary statistics of all the explanatory variables and the summary statistics of CO 2 emissions of the 72 countries, respectively.
Using the dynamic common correlated effects estimation, we account for common shocks across the countries. Two models are examined in this preliminary analysis. The first version of the model involves all the variables as stated in Eq. 1; in the second model, we employ only the significant variables observed from the first model.
We present the results of the regression analysis for the full version of the model as well as for other versions in Table 1. The results establish non-renewable energy 1 3 intensity (NRE), population (POP), and the real gross domestic product (RGDP) as significant in driving CO 2 emissions. Non-renewable energy sources causing CO 2 emissions is established as a significant positive factor in empirical literature; for example, in a recent study with OECD countries, Shafiei and Salim (2014) find that non-renewable energy consumption has a positive and statistically significant effect on CO 2 emissions in the long-run. This effect has been a major cause worldwide, which demands that countries change their energy-mix and introduce clean technology. Population has been found to be a significant factor in studies; e.g. Cole and Neumayer (2004) and Liddle (2015), among others. The nexus between CO 2 emissions and economic income has been explored in studies testing the Environmental Kuznets Curve (EKC) Hypothesis (see Ozturk and Acaravci, 2013 and references therein).
In Table 1, for Models (1) to (1′), variables are in levels while for Models (2) to (5), we consider variables in first differences to further account for non-stationarity. In particular, • Model (1) is the full version of the model in levels for the period 1970-2015 while. • Model (1′), consists only of significant variables in levels, for this period (except for the dynamic term which is not significant in this case, but is included in all models). The sub-period (viz., 1990-2015) is chosen because the implementation of renewable energy sources was initiated in most of the countries in the early 1990s. In determining clusters in Sect. 4, we consider only the significant variables (NRE, POP RGDP, and CO 2t−1 ) in identifying the statistical strength of forming clusters. The coefficients at the country level for the first differences are used for the cluster analysis, i.e., coefficients from Models (2)-(5).
In the USA, in recent years, CO 2 emissions are on the rise, while in contrast, China has been very proactive in meeting the target in 2015 Paris agreement. In our analysis, we exclude these largest polluters for robustness checks, namely, China and the USA are excluded from Models (3) and (5) so that when comparing the clustering results with those of Models (2) and (4), respectively, we can determine if China and the USA influence the cluster membership of all other countries.

Cluster analysis
Cluster analysis is an exploratory tool to identify groupings of similar entities. While different methods can produce difference results, we select the most likely results based on optimality measures. In this paper, we use several well-known clustering methods and assess the compatibility of results between the methods. Our research relates this approach to identify CO 2 clusters using economic analysis.
Cluster analysis is used extensively in various fields of study (such as Life Sciences, Behavioural and Social Sciences, Earth Sciences and Engineering Sciences). Caiado et al. (2015); Dai and Kuosmanen (2014); Everitt et al. (2011);Anderberg (2014); Hennig et al. (2015) and Manly and Alberto (2016) are useful references for understanding the nature of and concepts about the cluster analysis. We provide a brief introduction to cluster analysis for non-specialist readers, with particular focus on the methods we use in our analyses in Sect. 4.
Cluster analysis is a multivariate method which aims to group a set of objects or cases on the basis of a set of variables into a number of mutually exclusive groups based on distances between objects. In other words, it partitions the set of objects and cases. Amongst different methods, we focus here on two popular ones: hierarchical and non-hierarchical methods.
The advantage of using clustering analysis in this context can be identified by the fact that cluster analysis aims at sorting different countries into groups, in a way that the degree of association between any two countries is maximal if they belong to the same group and minimal otherwise. While there are other multivariate statistical methods such as multidimensional scaling that can also identify groupings, the outcomes of cluster analysis are much easier to convey graphically especially when there are more than two clusters. Other statistical methods such as discriminant analysis and the k-nearest neighbours approach require known groupings of existing cases before new cases can be classified.

Distance measures 10
The choice of distance measures can have an influence on the clustering results. There are many different distance measures but the most commonly known and used distance measure is the Euclidean distance. In general, if we have p variables X 1 , X 2 ,…, X p measured on a sample of n objects of cases, the observed data for case i can be denoted by x i1 , x i2 ,…, x ip and the observed data for subject j by x j1 , x j2 ,…, x jp . The Euclidean distance between these two objects is calculated as While it has been documented that the Euclidean distance has its short-comings (see for example, Shirkhorshidi et al. 2015), the way to overcome them is to ensure that the data is first normalised before determining the distances. Another distance measure is the Manhattan distance which is calculated as As well, in this case to overcome shortcomings, data must be first normalised before determining the distances.

Hierarchical methods
Hierarchical methods involve a series of (n−1) decisions (n equals the number of objects) that combine observations into a hierarchy or a tree-like structure (refer to Everitt et al. 2011). They can be either 'agglomerative', when groups are merged, or 'divisive', when one or more groups are split at each stage. More specifically, for agglomerative methods, each object starts out as its own cluster and is successively joined based on the distance between the objects. The two most similar clusters are then joined, and this continues until only a single cluster remains. For divisive methods, all objects start in a single cluster and are successively divided (first into two, then three, and so on) until each is a single-member cluster. We apply hierarchical agglomerative clustering, in particular, the minimum variance method of Ward Jr (1963) along with average and complete linkage methods; refer to Everitt et al. (2011) for more details about these methods.
Once hierarchical clustering analysis is carried out, we select the optimum cluster solution in the next step. The literature emphasises both informal (more subjective), and formal methods (based on probability) in selecting the optimum cluster solution. An informal method is to examine a diagram called a dendrogram by which the hierarchical procedure is represented graphically. The dendrogram illustrates which clusters are joined at each stage of the analysis and the distance between clusters at the time of joining. If there is a large jump in the distance between clusters from one stage to another, this implies that at the first stage, clusters that are relatively close together are joined. In the next stage, the clusters that are joined are relatively far apart. This implies that the optimum number of clusters may be the number of clusters present immediately before the large jump in distance occurred. Formal methods include the determination of coefficients based on distances such as the Euclidean or Manhattan distances. One of these methods, the silhouette method will be used. In our discussion, we report the hierarchical clustering results that are based on Euclidean distances.

Non-hierarchical methods
In contrast to hierarchical methods, non-hierarchical methods do not involve a treelike construction process. Instead, they assign objects into clusters once the number of cluster is specified. The first task is to identify starting points known as cluster seeds for each cluster. A cluster seed may be pre-specified, or objects may be randomly selected as seeds. We use two popular non-hierarchical methods, viz., the k-means and k-medoids.
The k-means method (Everitt et al. 2011) has been effective in producing good clustering results for many practical applications. The method operates through the following steps. Firstly, initial cluster centres (cluster seeds) are selected from among the objects, each of these objects then forms a cluster of one, and its centre is the value of the variables of that object. 11 Secondly, each object is assigned to its 'nearest' cluster defined in terms of the distance to the centroid. The centroids which are the means of the clusters are then determined. Finally, the distance from each object to each centroid is recalculated, and objects that are not in the cluster are moved to the clusters they are closest to. This process continues until the centroid remains relatively stable. In practice, this is achieved by repeating the algorithm a large number of times using software programs such as MATLAB and R.
Whereas for the k-mean method, the mean of all the objects serves as the centre or prototype of a cluster, the k-medoids method selects an observed data point as centre of each cluster. This observed data point in each cluster is referred to as medoid which serves as the prototype of the cluster whose average dissimilarity to all the objects in the cluster is minimal, that is, it is the most centrally located point in the cluster. The k-medoids method is more robust to outlier data compared to the k-means method. The most common algorithm for implementing the k-medoids method is partitioning around medoids (PAM). Refer to Kaufman and Rousseeuw (2009) for details about the algorithm.
In our discussion, we will report k-means and k-medoids clustering results that are based on Euclidean distances.

Identifying the optimal number of clusters
There are several methods that can be used to identify the optimal number of clusters. We consider the average silhouette coefficient method. The silhouette method is used to find the strength of clusters and is based on distances. The number of clusters that gives the largest average silhouette coefficient is considered to be the optimal cluster solution; however, average silhouette coefficient values of less than 0.5 imply a weak separation of clusters. Kaufman and Rousseeuw (2009) give a detailed discussion on silhouette coefficients and plots.

Compatibility of cluster solutions
Since we obtain cluster solutions using different methods, we would like to assess the compatibility of these solutions. In what follows, we use a measure, viz., the Adjusted Rand Index. The Rand Index developed by Rand (1971) which is a measure of similarity, allows the unique evaluation of hard (so-called crisp or non-fuzzy) clustering partitions. The Rand Index is defined as the number of pairs of objects that are either in the same group or in different groups in both partitions divided by the total number of pairs of objects. The Rand Index lies between 0 and 1. When two partitions agree perfectly, the Rand Index achieves the maximum value 1. A problem with the Rand index is that the expected value of the Rand Index between two random partitions is not a constant. This problem is corrected by the Adjusted Rand Index which assumes the generalized hyper-geometric distribution as the specification of randomness. The Adjusted Rand Index has the maximum value 1, and its expected value is 0 in the case of random clusters. A larger Adjusted Rand Index means a higher agreement between two partitions. Hubert and Arabie (1985) have a detailed discussion on this index.

Determination of clusters
To determine how countries group together based on significant explanatory variables driving CO 2 emissions in the model, we apply cluster analysis to the standardised coefficients (betas) of all significant explanatory variables i.e., non-renewable energy, population, the real GDP, and lagged CO 2 emissions, from various models in Table 1. Comparisons across various model specifications are made to identify if the study period and the largest emitters have any influence in forming different clusters. Comparisons are made between: • Models (2) and (3) to determine the impact the largest CO 2 emitters, China and the USA have on the grouping of the countries for the period 1970 to 2015. • Models (2) and (4) to determine whether there is a substantial difference with the contents of clusters for the full period  and sub-period (1990-2015), considering the full panel of countries. • Models (4) and (5) to determine the impact the largest CO 2 emitters have on the grouping of the countries for the sub-period . • Models (3) and (5) to determine if excluding China and the USA, results in a substantial difference with the contents of clusters for the full and sub-period of analysis.
We consider different models to identify if the presence of the largest emitters and various periods may have any influence in determining clusters.
The sub-period  is chosen because the implementation of renewable energy sources was initiated in most of the countries in the early 1990s.
In each case we implement the following steps: • Apply three hierarchical methods: Ward's, average and complete linkage methods. • Apply two non-hierarchical methods: k-means and k-medoids. • Identify the possible optimal number of clusters for each method using the average silhouette coefficient. • Assess the compatibility of the solutions across the four methods using the Adjusted Rand Index. Following that, we focus on the k-means method and one hierarchical method that is most compatible with the k-means method. • Profile the cluster solutions based on summary statistics of the total CO 2 emissions per capita; henceforth we will refer to these summary statistics as features. Profiling will be undertaken by comparing the distribution of each of the features across the identified number of clusters. This is identified by examining boxplots.
However, to determine whether significant drivers of CO 2 emissions can lead to more meaningful clusters compared to clusters formed from total CO 2 emissions only, we cluster only total CO 2 emissions based on the summary statistics (features) of the series for each of Models (2) to (5).

Model 2: All countries, 1970-2015
In this sub-section, we consider the total CO 2 emissions as well as the significant explanatory variables for all countries for the full period of our analysis.

Cluster analysis with only CO 2 features
For this purpose, we consider maximum, minimum, mean, variance, skewness, and kurtosis for the CO 2 per capita for each country. From the dendrogram of Ward's method in Figure A1, 12 we can identify five possible cluster solutions. The average silhouette coefficients for each method are shown in Table 2 from where it is apparent that the 3-cluster solution is optimal for all methods.
On assessing the compatibility of the 2-cluster and 3-cluster solutions from the Adjusted Rand Indices 13 (Table A5) across the three hierarchical methods and two non-hierarchical methods we observe 100% compatibility between all five methods  for the 2-and 3-cluster solutions. The results for the 3-cluster solution as shown in Table A1. 14 The findings are identical across five methods we consider here. It is noticeable that Cluster 2 consists of only China, Cluster 3 consists only the USA, and the remaining 70 countries belong to Cluster 1.

Profiling of k-means cluster solution
Based on the above analysis, we profile the 3-cluster solutions on each of the CO 2 features that were used to obtain the clusters. They are displayed in boxplots in Fig. 1. A boxplot summarises information about the distribution of a variable. It plots the summary statistics, median (Q2), represented by the horizontal line in box, the first quartile (Q1), third quartile (Q3) which is represented by the lower and upper edges of the box, the maximum and minimum values represented by horizontal lines on either side of the box, commonly referred to as the whiskers. The median indicates the level of the distribution, and the width of the box indicates the spread of the distribution. If the distribution has outliers, i.e., points away from the upper or lower edges of the box, then each such value will be represented by a cross (+). From the boxplots, it is clear that the distribution of each feature is quite different for each cluster. Clusters 2 and 3 have single members, Cluster 2 is with China only, Cluster 3 is with the USA, while Cluster 1 consists of all other countries. China has the highest maximum emissions over the 46 years between 1970 and 2015 while the USA has the highest minimum and mean emissions over the 46 years. China has the largest variation and relative to the mean this variation is 80% compared to that of 17% for the USA. 15,16 There are also distinct differences in the distribution for each of these countries following the skewness and kurtosis. Therefore, it is apparent that the overall CO 2 features yield meaningful separation for our panel of countries into three clusters.

Clustering with key significant variables contributing to CO 2 emissions
We now determine how the cluster solution based only on the features of CO 2 emissions compares to that of the cluster solution based on the coefficients of the key significant variables contributing to CO 2 emissions for the period 1970-2015. From the dendrogram of Ward's method in Figure A2, it seems possible to identify 2, 3, and 4 cluster solutions. The average silhouette coefficients for 2 to 4 clusters are given in Table 3. Since the 2-cluster solution produces the maximum average coefficient for all methods, it may be considered to be the optimal solution. On assessing the compatibility of the 2-cluster solution from the Adjusted Rand Indices across the five methods (Table A6), we can observe that for the 2-cluster solution, there is 100% compatibility between the average and complete linkage methods, and of the hierarchical methods, Ward's method is most compatible with the two non-hierarchical methods (68.7% and 68.4%). In addition, it is noticeable that there is very good compatibility between two non-hierarchical methods (78.3%).
The cluster solutions for the Ward and k-means methods are given in Table 4. We notice for the 2-cluster solution, the four countries, viz., Japan, Denmark, Sweden, and Sri Lanka switch clusters between these two methods. It does appear that the combined impact of non-renewables, population, real GDP, and lagged CO 2 emissions on CO 2 emissions tends to reasonably separate the countries into two clusters. Each cluster contains a mixture of developed and developing countries, but the Cluster 1 solution for each method contains a much higher percentage of developing countries than Cluster 2; for example, Ward's Method (Cluster 1: 92% of developing countries, Cluster 2: 29% of developing countries), k-means Method (Cluster 1: 90% of developing countries, Cluster 2: 33% of developing countries). In both cases, the largest emitters like China and the USA are in Cluster 1.
While the contents of each cluster in this case are not necessarily consistent across all the methods considered, as indicated in the compatibility percentages, these results are clearly different from just clustering of the features of the CO 2 emissions as indicated in Figure A1, indicating the usefulness of including the impact of the significant explanatory variables on CO 2 emissions. It is interesting to note that in this case, China and the USA are also in different clusters.

Profiling of k-means cluster solution
Based on the above analysis, we profile this 2-cluster solution on each of the CO 2 features. Boxplots of the 2-cluster solution are given in Fig. 2. We observe that except of a few outliers, the levels and spreads of distributions of the maximum, minimum, mean and variance of Cluster 1 are similar to that of Cluster 2, However, the level of the distribution of skewness of Cluster 1 is greater than that of Cluster 2 but the spread of this distribution for Cluster 2 is slightly larger than that of Cluster 1, whereas, the level of the distribution of kurtosis of Cluster 2 is greater than that of Cluster 1, but the spread of this distribution for Cluster 2 is slightly larger than that of Cluster 1. Similar observations were made when the 2-cluster solution from Ward's method was profiled.
Environmental and Ecological Statistics (2020) 27:1-40  In this part of the analysis to determine if China and the USA have an influence on cluster membership, we consider the total CO 2 emissions as well as the significant explanatory variables for all countries, except China and the USA for the period 1970 to 2015.

Cluster analysis with only CO 2 features
We consider maximum, minimum, mean, variance, skewness, and kurtosis for the CO 2 per capita for each country except China and the USA. From the dendrogram of Figure A3 based on Ward's method, we can identify five possible cluster solutions. The average silhouette coefficients for each method are shown in Table 5 where it is clear that the 2-cluster solution is optimal for all methods.  On assessing the compatibility of the 2-cluster solution from the Adjusted Rand Indices (Table A7) across the five methods, we observe that for the 2-cluster solution there is 100% compatibility between Ward's and the k-means methods, and between the average and complete linkage methods. We also observe very good compatibility between Ward's and the k-medoids methods (83.6%) and between the k-means and k-medoids methods (83.6%).
The results of the 2-cluster solution are presented in Table A2. The findings are identical across Ward's and the k-means methods. Cluster 2 contains France, India, Japan, and the United Kingdom; surprisingly, these four countries are forefront in the world in implementing nuclear energy resources as a major source of energy mix or until recent years (in the case of Japan). Cluster 1 includes all remaining 68 countries. Previously the inclusion of China and the USA, the two largest emitters of CO 2 , would have masked the grouping of the countries in Cluster 2.

Profiling of k-means cluster solution
We now profile the 2-cluster solutions on each of the CO 2 features that were used to obtain the clusters. They are displayed in boxplots in Fig. 3. The level of the distribution of the maximum, minimum, mean and variance of emissions for Cluster 2 is at a much higher level than that of Cluster 1, with the spread of the distribution of minimum, mean and variance being larger for Cluster 2 than for Cluster 1. The distribution of skewness of Cluster 2 has a slightly lower level than that of Cluster 1, but it has greater spread. The distribution of kurtosis of Cluster 2 has a higher level than that of

Clustering with key significant variables contributing to CO 2 emissions
We now determine how the cluster solution based only on the features of CO 2 emissions compares to that of the cluster solution based on the coefficients of the key significant variables contributing to CO 2 emissions for the period 1970-2015, when China and the USA are excluded from the analysis. From the dendrogram of Ward's method in Figure A4, it seems possible to identify 2, 3, and 4 cluster solutions. The average silhouette coefficients for 2 to 4 clusters are given in Table 6. Since the 2-cluster solution produces the maximum average coefficient for all methods, it may be considered to be the optimal solution. We now assess the compatibility of the 2-cluster solutions across the four methods. From the Adjusted Rand Indices (Table A8), we observe that for the 2-cluster solution, there is good compatibility between the average and complete linkage methods (85.7%), and of the hierarchical methods, Ward's method is the most compatibility with non-hierarchical methods (63% and 68%). There is also good compatibility between the k-means and k-medoids methods (83.3%).
The cluster solutions for the Ward and k-means methods are given in Table 7 from where we observe for that four countries switch clusters between these two methods. It does appear as was the case Model 2, that the combined impact of lag 1 of CO 2 emissions, non-renewables, population, and real GDP on CO 2 emissions for Model 3 tends to reasonably separate the countries into two clusters. Each cluster contains a mixture of developed and developing countries, but the Cluster 1 solution for each method contains a much higher percentage of developing countries than Cluster 2; Ward's Method (Cluster 1: 92%, Cluster 2: 24%), K-means Method (Cluster 1: 91%, Cluster 2: 29%). Cluster 1 has more countries with high emitters than Cluster 2. The sources of increasing CO 2 emissions in recent years occurred in developing countries, where population, urbanization, and economic growth have been the main driving forces as suggested in Nejat et al. (2015).
While the contents of each cluster in this case are not necessarily consistent across all the methods considered, these results are clearly different from just clustering of the features of the CO 2 emissions as observed in Table A3, indicating again the usefulness of including the impact of the significant explanatory variables on CO 2 emissions.

Profiling of k-means cluster solution
Boxplots of the 2-cluster solution are given in Fig. 4. We observe that the levels of distributions of the maximum, minimum and mean of Cluster 2 are slightly higher than that of Cluster 1, whereas the spread of these distribution is slightly wider for Cluster 2 than for Cluster 1. The level and spread of the distribution of the variance of both clusters are approximately the same. The level of the distribution of skewness of Cluster 1 is higher than that of Cluster 2 with the spread of this distribution being approximately the same for both clusters, whereas the level of the distribution of kurtosis of Cluster 2 is higher than that of Cluster 2 with the spread of this distribution being approximately the same for both clusters. Similar observations were made when the 2-cluster solution from Ward's method was profiled.

Comparison of Models 2 and 3
On comparing the Model 3 cluster solution ( Table 7) with that of Model 2 (Table 4), we observe that removing China and the USA from the analysis for the period 1970-2015, has not had much of an impact on the composition of the clusters. Two of the four countries that appear together in the 2-cluster solution of the CO 2 emissions appear together in each of the clusters in this Model 3 cluster solution (Table 7). However, the profiling of cluster solution for Model 3 indicates slight differences in the CO 2 feature levels and spreads between the clusters as is evident as compared to that of Model 2; refer to the boxplots for Models 2 and 3. When China and the USA were included in the analysis in Model 2 (members of Cluster 1 and 2 respectively), it appears that their extremely high CO 2 emissions have masked any difference between the levels of the distributions of the minimum, maximum and means CO 2 emissions between the two clusters (Fig. 2). This result implies that in formulating coordinated energy policies in combating emissions, the largest emitters should act together with other countries.

Model 4: all countries, 1990-2015
In this part of analysis, we consider the total CO 2 emissions as well as the significant explanatory variables according to Model 4 for the period 1990 to 2015. We wish to determine if there is any change in cluster composition for this later period compared to the entire period (1970 to 2015) we considered in previous section.

Cluster analysis with only CO 2 features
We consider maximum, minimum, mean, variance, skewness, and kurtosis for the total CO 2 emissions for each country for the period 1990-2015. From the dendrogram of Ward's method in Figure A5, we can identify five possible cluster solutions. The average silhouette coefficients for each method are shown in Table 8 from where it is clear that the 2-cluster solution is optimal for the k-means and Ward's methods, but the 3-cluster solution is optimal for the average, complete and k-medoids methods.
For the 2-cluster solution, we observe from Adjusted Rand Indices (Table A9) that there is 100% compatibility between every pair of methods whereas for the 3-cluster solution, there is 100% compatibility between Ward and k-means methods, Average and complete linkage methods, average linkage and k-medoids methods, complete linkage and k-medoids methods.
We will proceed with the 2-cluster solution which is identical for all methods. We observe from the 2-cluster solution Table A3 that China and the USA group together in one cluster while all the other countries do so in the other cluster. Also, if we consider a 3-cluster solution, China and USA still appear together in one cluster. It is interesting to note here that for this period 1990-2015, China and USA appear together in a cluster while for the period 1970-2015, they appear in separate clusters. Our findings appear to be consistent with the recent initiatives both countries have implemented both changing energy-mix and improving energy technology in combating emissions. Bhattacharya et al. (2015) and Ahmed et al. (2016) discuss

Profiling of k-means cluster solution
We now profile the 2-cluster solutions on each of the CO 2 features that were used to obtain the clusters. They are displayed in boxplots in Fig. 5. The level of the distribution of the maximum, minimum, mean and variance of emissions for Cluster 2 which contains China and the USA is at a much higher level than that of Cluster 1, with the spread of these distributions being much wider for Cluster 2 than for Cluster 1. The distribution of skewness of Cluster 2 is approximately at the same level as that of Cluster 1, but it has a greater spread. The distribution of kurtosis of Cluster 2 has a higher level than that of Cluster 1, but the spreads are not all that different. Similar observations were made when the 2-cluster solution from Ward's method was profiled.

Clustering with key significant variables contributing to CO 2 emissions
We now determine how the cluster solution based only on the features of CO 2 emissions compares to that of the cluster solution based on the coefficients of the key significant variables contributing to CO 2 emissions for the period 1990-2015.
From the dendrogram of Ward's method in Figure A6, it seems possible to identify 2, 3, and 4 cluster solutions. The average silhouette coefficients for 2 to 4 clusters are given in Table 9. Since the 2-cluster solution produces the maximum average coefficient for all methods, it may be considered to be the optimal solution. We now assess the compatibility of the 2-cluster solutions across the five methods. We observe from the Adjusted Rand Indices (Table A10) there is good compatibility between Ward's and complete linkage methods (0.831) and of the hierarchical methods, the complete linkage method is most compatibility with the k-means and k-medoids methods (78%, 77.1%).
The cluster solutions for the complete linkage and k-means methods are given in Table 10 from where we observe for the 2-cluster solution, the four countries, viz., Bangladesh, Italy, Spain, and Uruguay, switch clusters between these two methods. It does appear that the combined impact of lag 1 of CO 2 emissions, nonrenewables, population, and real GDP on CO 2 emissions tends to reasonably separate the countries into two clusters. Each cluster contains a mixture of developed and developing countries, but the Cluster 1 solution for each method contains a much higher percentage of developing countries than Cluster 2; Complete linkage Method (Cluster 1: 89%, Cluster 2: 33%), k-means method (Cluster 1: 92%, Cluster 2: 27%). Our findings are consistent across various models in finding Cluster 1 with a high percentage of developing countries compared to Cluster 2.
While the contents of each cluster in this case are not necessarily consistent across all the methods considered, as indicated in the compatibility percentages, these results are clearly different from just clustering of the features of the CO 2 emissions as observed in Table A5 indicating the usefulness of including the impact of the significant explanatory variables on CO 2 emissions. It is interesting to note that in this case, China and the USA are in different clusters.

Profiling of k-means cluster solution
We now profile these 2-cluster solutions on each of the CO 2 features. Boxplots of the 2-cluster solution are given in Fig. 6. We observe that except of a few outliers, the levels and spreads of distributions of each of the maximum, minimum, mean and variance of Cluster 1 are similar to that of that of Cluster 2, However, the level of the distribution of skewness of Cluster 1 is greater than that of Cluster 2 but the spread of this distribution fare similar for both clusters. Except for a few outliers, the level and spread of the distribution of kurtosis are quite similar. Similar observations were made when the 2-cluster solution from the complete linkage method was profiled.

Comparison of Models 2 and 4
Comparing the findings from Tables 10 and 4 allows us to determine if differences in period have significantly caused changes in the structure of the clusters. Comparing complete linkage 2-cluster solution, we   , while only one country Cameroon has disappeared from Cluster 1 when we shift the time from 1970-2015 to 1990-2015. We notice, the changes over time, has had some impact on the structure of the clusters.

Model 5: all countries except China and USA, 1990-2015
In this sub-section, we consider the total CO 2 emissions as well as the significant explanatory variables for all countries, except China and the USA for the period 1990 to 2015.

Cluster analysis with only CO 2 features
We now consider maximum, minimum, mean, variance, skewness, and kurtosis for the CO 2 for each country except China and the USA, for the period 1990-2015. From the dendrogram using Ward's method in Figure A7, we can identify four possible cluster solutions. The average silhouette coefficients for each method are shown in Table 11 from where we observe that the 2-cluster solution is optimal for the k-means, Ward and average linkage methods while the 3-cluster solution is optimal for the complete linkage and the k-medoids methods. The results of the compatibility analysis from the Adjusted Rand Indices (Table A11) between the three hierarchical methods and the two non-hierarchical methods show that there is 100% compatibility between Ward's and complete linkage method, Ward's and the k-means method, and between Complete linkage and k-means methods for the 2-cluster solution. For the 3-cluster solution there is an 81.8% compatibility between Ward's and the k-means methods and 100% compatibility between complete linkage and the k-medoids methods. For all other pairs, there is very poor compatibility. We will proceed with the 2-cluster solution which has better compatibility outcomes than the 3-cluster solution.
The results of the 2-cluster solution are presented in Table A4. The findings are identical across Ward's and the k-means methods. Cluster 2 contains France, India, Japan, and the United Kingdom, and Cluster 1 includes all remaining 68 countries. This solution is exactly the same cluster solution for the 1970-2015 period when China and the USA were excluded. As before it appears that the inclusion of China and the USA, the two highest emitters of CO 2 , has masked the grouping of the countries in Cluster 2. All of these four countries viz. France, India, Japan, and the United Kingdom have taken significant initiatives to move from non-renewable energy resources (particularly in Nuclear energy) during this time.

Profiling of k-means cluster solution
We now profile the 2-cluster solutions on each of the CO 2 features that were used to obtain the clusters. They are displayed in boxplots in Fig. 7. The level of the distribution the of maximum, minimum, mean and variance of emissions for Cluster 2 is at a much higher level than that of Cluster 1, with the spread of the distribution of minimum, mean and variance being larger for Cluster 2 than for Cluster 1. The distribution of skewness of Cluster 2 has a slightly lower level than that of Cluster 1, but it has greater spread. The distribution of kurtosis of Cluster 2 has a higher level and wider spread than that of Cluster 1. Similar patterns were observed when the 2-cluster solution from Ward's method was profiled.

Clustering with key significant variables contributing to CO 2 emissions
We will now determine how this cluster solution based only on the features of CO 2 emissions compares when the clusters are based on the coefficients of the key significant variables contributing to CO 2 emissions. We exclude China and USA from the analysis and the considered time period is 1990-2015. From the dendrogram of Ward's method in Figure A8 it seems possible to identify 2, 3, and 4 cluster solutions. The average silhouette coefficients for 2 to 4 clusters are given in Table 12. The 2-cluster solution produces the maximum average silhouette coefficient for all methods except the k-medoids method. The k-medoids method produces the maximum average silhouette coefficient for the 3-cluster solution but given all the other methods produces much lower average silhouette coefficient, we will proceed with the 2-cluster solution as the optimal solution. We now assess the compatibility of the 2-cluster solutions across the five methods.
From the Adjusted Rand Indices (Table A12) from observe that for the 2-cluster solution, there is poor compatibility between all of the pairs except for that between Ward's and the k-means methods (71.8%).
The cluster solutions for the Ward and k-means methods are given in Table 13. We notice that three countries switch clusters between these two methods. It does appear that the combined impact of lagged CO 2 emissions, non-renewables, population, and real GDP on CO 2 emissions for Model 5 tends to reasonably separate the countries into two clusters. Each cluster contains a mixture of developed and   developing countries, but the Cluster 1 solution for each method contains a much higher percentage of developing countries than Cluster 2; Ward's Method (Cluster 1: 90%, Cluster 2: 22%), K-means Method (Cluster 1: 92%, Cluster 2: 26%). These findings are similar to our previous observations. While the contents of each cluster in this case are not necessarily consistent across all the methods considered, as indicated in the compatibility percentages these results are clearly different from just clustering of the features of the CO 2 emissions as observed in Table A4, indicating again the usefulness of including the impact of the significant explanatory variables on CO 2 emissions.

Profiling of k-means cluster solution
We now profile the 2-cluster solutions on each of the CO 2 features. Boxplots of the 2-cluster solution are given in Fig. 8. We observe that the levels of distributions of the maximum, minimum, and mean of Cluster 2 are slightly higher than that of Cluster 1. The spread of the distribution the minimum and mean are slightly wider for Cluster 2 than for Cluster 1. Except for a few outliers, the level and spread of the distribution of the variance of both clusters are approximately the same. The level of the distribution of skewness of Cluster 1 is higher than that of Cluster 2 with the spread of this distribution being a bit narrower for Cluster. The level of the distribution of kurtosis of Cluster 2 is higher than that of Cluster 2 with the spread of this distribution being approximately the same for both clusters. Similar observations were made when the 2-cluster solution from Ward's method was profiled.

Comparison of Models 4 and 5
Here we compare the Model 5 cluster solution (Table 13) with that of Model 4 (Table 10), we observe that removing China and the USA from the analysis for the period 1970-2015, has not had much of an impact on the composition of the clusters. Three of the four countries that appear together in the 2-cluster solution of the CO 2 emissions appear together in Cluster 1 for this Model 5 cluster solution (Table 13). However, the profiling of this cluster solutions indicates slight differences in the CO 2 feature levels and spreads between the clusters as is evident when comparing the boxplots in Figs. 6 and 8. Clearly, when China and the USA were included in the analysis in Model 4, it appears that inclusion of China and the USA as the largest emitters may have masked any difference between the levels of the distributions of the minimum, maximum and means CO 2 emissions between the two clusters (in Figure A3).

Comparison of Models 3 and 5
Finally, we compare the Model 3 cluster solution (Table 7) with that of Model 5 (Table 13). Model 3 which covers the full period, while Model 5 is for 1990-2015. In both cases, we exclude China and the U.S.A. for analysis. For complete linkage 2-cluster solution, Chile, Cyprus, Sri Lanka, Sweden shifted from Cluster 1 to Cluster 2, while Pakistan, Uruguay, Bangladesh, Spain, Denmark, UK, and the Congo Democratic Republic shifted from Cluster 2 to Cluster 1. For the k-means 2-cluster solution, Congo Democratic Republic, Congo Republic, Mexico, Pakistan shifted from Cluster 2 to Cluster 1. These changes reflect the selection of period has a significant influence in cluster formation. The findings reflect the presence of China and the U.S.A has an influence on the structure of the cluster. Here, there is a significant shift of countries from Cluster 1 to Cluster 2 and vice versa when you exclude the largest emitters from the sample.

Clustering and de-carbonization
It seems reasonable to summarise that over the period 1970 to 2015, there are differences in the levels of CO 2 emissions between the identified clusters of countries considering combined effects of the three key determinants (non-renewables, population, and real GDP). While we have more than one solution in each case, the findings can guide us in forming clusters based on the relationship between CO 2 emissions and each of these key determinants influencing CO 2 emissions. The findings of the cluster analysis considering total CO 2 emissions alone revealed that the 72 countries separated into three distinct clusters, two single-membered countries with China and the USA and the other containing the remaining 70 countries within our panel. This is to be expected given that China and the USA are the largest CO 2 emitters, with differing maximum, minimum, and mean CO 2 emissions over the 46 years from 1970 to 2015. One clear observation from our analysis is that clustering separation is different when we consider key determinants of CO 2 emissions compared to the clustering separation we obtain only with CO 2 features.
Overall, each of these three key determinants-non-renewables, population, and real GDP play different roles across countries in forming clusters and their combined effects on CO 2 emissions are different across countries; therefore, clusters form with different members. Findings from our full panel with entire time reflect that China and the USA in isolation should take the lead role in reducing overall CO 2 emissions while the other countries in our panel follow suit. To test the validation of this claim we have excluded China and the U.S. and considered sample period as 1990-2015. Various specifications result in a different structure of clusters, albeit an optimum number of clusters remain the same.

Conclusions and policy implications
Our primary objective of this research is to apply cluster analysis in examining a stochastic version of the IPAT model to decompose per capita emissions with three major components: population, affluence, and technology. We establish stable groups of countries forming clusters; based on the factors we identify to influence CO 2 emissions. The three key factors we identified are non-renewables, population, and real GDP. We summarise the key messages from this research in the next few paragraphs.
Firstly, our findings reflect that these factors can influence the polluting behaviour of individual countries differently. Secondly, each cluster is characterized by the fact that its CO 2 emissions are, to a significant extent, linked to a particular explanatory variable. This also links the differentiation between each cluster from the others. Each cluster therefore, may have a distinct role in contributing towards overall CO 2 emissions, identified for specific clusters of countries. This finding indicates a need for distinct policies and actions for the various clusters.
In future work, we will consider the use of the Mahalanobis distance measure which takes into account the correlation between the clustering variables. We will also consider fuzzy clustering to determine overlapping clusters. This may give us some insight into which countries could belong to more than one cluster based on the C0 2 features and based on the variables contributing to CO 2 levels.
We have identified three key determinants, non-renewable energy, population, and real GDP, which play a significant role in increasing CO 2 emissions and forming the clusters. Economic growth will increase in many countries, particularly the Asian, African, and South American continents, in the coming decades. Therefore, economic policies combating population, and simultaneous reduction of the use of non-renewable energy sources, can mitigate CO 2 emissions both for individual countries and for the rest of the world. In this respect, countries within similar clustering may coordinate to implement policies in reducing emissions and enhance economic growth. With changes in economic, demographic, and energy policies, along with other countries, the role of China and the USA will be significant in forecasting the overall CO 2 emissions trajectory. Our analysis also emphasises the roles of time and the largest emitters in controlling future emissions. Future research should include the role of trade and other variables such as political and social factors in explaining CO 2 clusters. The detail input-output data are needed across a large panel of countries to incorporate this analysis.

3
Author contributions All authors contributed to the development of ideas and methods. JI collected the data and wrote the code in STATA for the generation output of the models; EAM wrote the code in MAT-LAB for the generation of the cluster analysis output; all authors contributed to the writing the paper and editing it to arrive at the final draft.