An empirical look at the Nature Index

In November 2014, the Nature Index (NI) was introduced by the Nature Publishing Group (NPG) (see http://www.natureindex.com). The NI comprises the primary research articles published in the past 12 months in a selection of reputable journals. Starting from two short comments on the NI (Haunschild & Bornmann, 2015a, 2015b), we undertake an empirical analysis of the NI using comprehensive country data. We investigate whether the huge effort of computing the NI is justified and whether the size-dependent NI indicators should be complemented by size-independent variants. The analysis uses data from the Max Planck Digital Library in-house database (which is based on Web of Science data) and from the NPG. In the first step of the analysis, we correlate the NI with other metrics that are simpler to generate than the NI. The resulting large correlation coefficients indicate that the NI produces results very similar to those of simpler solutions. In the second step of the analysis, relative and size-independent variants of the NI are generated, which should be presented by the NPG in addition. The size-dependent NI indicators favor large countries (or institutions), so top-performing small countries (or institutions) do not come into the picture.


Introduction
Currently, there are five major international university rankings worldwide: (a) the Academic Ranking of World Universities (ARWU), (b) the Times Higher Education World University Rankings (THE Rankings), (c) the QS World University Rankings, (d) the Leiden Ranking by the Centre for Science and Technology Studies (CWTS), and (e) the SCImago Institutions Ranking. Whereas the first three rankings use very different indicators to rank universities, the last two use a set of bibliometric indicators only. An overview of these (and other) university rankings can be found in several publications: Safón (2013), Dill and Soo (2005), Buela-Casal, Gutiérrez-Martínez, Bermúdez-Sánchez, and Vadillo-Muñoz (2007), Aguillo, Bar-Ilan, Levene, and Ortega (2010), and Rauhvargers (2011). These overviews describe, among other things, the different indicators used and compare the ranking results based on the different ranking methods.
In November 2014, the Nature Index (NI) was introduced (see http://www.natureindex.com) (Campbell & Grayson, 2014). According to Campbell and Grayson (2015), the Nature Publishing Group (NPG) "does not intend the Nature Index to be a ranking and have quite deliberately not referred to it as such anywhere" (p. 1831). However, the NI allows exactly this: a ranking of worldwide institutions and countries based on their publication output in selected journals (see http://www.natureindex.com/country-outputs and http://www.natureindex.com/institution-outputs). Thus, the NI should be discussed in relation to other possibilities of measuring the performance of institutions or countries. The aim of this paper is to study the NI empirically as a new ranking method and to suggest possible improvements. In a first step of the analysis, we correlate the NI with other metrics that are simpler to generate than the NI. In a second step, relative and size-independent variants of the NI are generated, which should be presented by the NPG in addition.

The Nature Index
The NI comprises the primary research articles published in the past 12 months in a selection of reputable journals. The list of 68 journals is the result of asking 68 panelists to name the journals (a maximum of n = 10) in which they would want to publish their best research articles. Two panel chairs signed off on the final list. A confirmation attempt was made by sending an online questionnaire to 100,000 scientists in the life, physical, and medical sciences, who were also asked to name their 10 most preferred journals. The panel chairs recorded a response rate of 2.8%. Overall, Campbell and Grayson (2014) reported "a high degree of convergence between the panel and survey outputs for the most popular journals" (p. S52). The NI contains three quantities: the raw article count (AC), the fractional count (FC), and the weighted fractional count (WFC). The AC is obtained by counting all primary research articles published in the past 12 months in the NI journals. The FC weights each primary research article according to the number of coauthors (e.g., if three scientists from the United States and two scientists from Japan published one paper in an NI journal, this paper is counted as 3/5 for the United States and 2/5 for Japan). The WFC is supposed to account for the fact that papers from journals in the field of astronomy and astrophysics are approximately five times as numerous as papers from other fields in the NI. Therefore, papers from astronomy and astrophysics are weighted with a coefficient of 0.2.
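To make the three counting schemes concrete, the following minimal Python sketch computes the per-country AC, FC, and WFC contributions of a single paper, using the US/Japan example above. The data structures and the astronomy flag are our own illustrative assumptions, not the NPG's actual implementation.

```python
from collections import defaultdict

ASTRO_WEIGHT = 0.2  # down-weighting applied to astronomy/astrophysics papers

def add_paper(totals, author_countries, is_astro=False):
    """Add one primary research article to the per-country AC, FC, and WFC."""
    n_authors = len(author_countries)
    for country in set(author_countries):
        share = author_countries.count(country) / n_authors  # author share
        totals[country]["AC"] += 1                           # whole count
        totals[country]["FC"] += share                       # fractional count
        totals[country]["WFC"] += share * (ASTRO_WEIGHT if is_astro else 1.0)

totals = defaultdict(lambda: {"AC": 0, "FC": 0.0, "WFC": 0.0})
# The example from the text: three US and two Japanese coauthors on one paper
add_paper(totals, ["US", "US", "US", "JP", "JP"])
print(totals["US"])  # {'AC': 1, 'FC': 0.6, 'WFC': 0.6}
print(totals["JP"])  # {'AC': 1, 'FC': 0.4, 'WFC': 0.4}
```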
Recently, we started the discussion about the NI with two comments (Haunschild & Bornmann, 2015a, 2015b). One of these discussion starters has sparked a reply (Campbell & Grayson, 2015).

Methods
On February 10, 2015, we saved the country tables from the NI for the publication period January 1, 2014 to December 31, 2014. In June 2015, the NPG published the NI supplement 2015, including a table with the top 100 countries (http://www.nature.com/nature/journal/v522/n7556_supp/full/522S34a.html). A comparison of both tables shows small deviations. For example, in February the United States had an AC = 26,631 and in June an AC = 26,638. The NPG seems to update its older data continuously. The deviations between the two points in time are so small for all countries that we decided to use the data from February. A further advantage of these data is that the FC is included (in the NI supplement table it is not).
Reference data for the full publication output were taken from the in-house database of the Max Planck Society (MPG), which is based on the Web of Science (WoS) and administered by the Max Planck Digital Library (MPDL). Because we have reliable data in the database on the country level but not on the institutional level, the study focuses on countries. The period of analysis is likewise January 1, 2014 to December 31, 2014.
In this study, only countries with at least n = 1,000 papers and an AC of at least 30 are considered. Although the study focuses on one year only (2014), we would like to produce results that can be generalized to neighboring years. In the case of small publication sets for a country, large annual variations of indicator values can be expected (Levy & Lemeshow, 2008). Thus, larger publication sets are needed to obtain results that might also be valid for neighboring years.

Results
In the next subsection, we compare the NI with other metrics that are simpler to generate. A costly metric such as the NI should not correlate very highly with simpler solutions; otherwise, the effort is not justified. In the subsection after that, we recommend complementing the size-dependent results of the NI with size-independent results. The Leiden Ranking (www.leidenranking.com), for example, provides both perspectives. Size-dependent results are mainly driven by the total publication output of a country or institution.

Comparison of NI With Other Metrics
The NPG made significant efforts to generate the NI. Two panels were constituted to select the reputable journals, and a comprehensive survey was performed to validate the panels' selection. These huge efforts are justified only if the NI does not correlate highly with simpler metrics; in the case of high correlations, one could question these efforts. For this study, we produced three country-level metrics that are relatively easy to generate.
1. Total number of papers (Np): Here, a country's number of papers with the WoS document type "article" is counted. For some journals, the NPG selects only some articles as primary research articles; this selection is not reproducible in an automated manner in our database. We therefore consistently obtain values for the AC that are too high, but we obtain a nearly perfect correlation (r² = 0.9985) between our AC and the official AC of the NPG.

2. AvgAC: We generated five different random NIs. From all journals in the WoS that published papers in 2014 (N = 12,102), 68 journals were randomly selected; this procedure was repeated five times, resulting in five random NIs with five different (random) ACs for each country. We computed the mean value over the five random ACs, which yields our (random) AvgAC (see the sketch after this list).

3. Q1_JIF: In the SCImago Institutions Ranking (SIR), an indicator is considered that reflects the reputation of the journals in which an institution has published. Q1 is the ratio of papers that an institution publishes in the most influential scholarly journals of the world, namely, those ranked in the first quartile (25%) of their subject areas (journal sets) as ordered by the SCImago Journal Rank (SJR) indicator (González-Pereira, Guerrero-Bote, & Moya-Anegón, 2010). The Q1 indicator is size-independent. To produce a size-dependent indicator for this study, which can be correlated with the other size-dependent indicators, we identified the papers of a country published in first-quartile journals. Different from the SIR, we used the Journal Impact Factor (JIF) instead of the SJR to select the journals belonging to the first quartile of their subject areas (Pudovkin & Garfield, 2004). Thus, we name the indicator Q1_JIF.

Table 1 presents the AC, FC, and WFC values of the NI for the year 2014. In addition, Np, Q1_JIF, and the (random) AvgAC are included. For every indicator, the corresponding rank numbers were generated. As the results in the table show, the indicators lead to the same or similar ranking positions for several countries. For example, the United States and China are at the top positions independent of the indicator used, and the United Kingdom takes the third or fourth position. However, it is also visible that the ranking positions of many countries differ to a larger extent. For example, Switzerland is at the 18th position if the countries are ranked according to the number of papers or the random AvgAC, but the country (significantly) improves its position if the official NI indicators or the Q1_JIF indicator is used.
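The random baseline in point 2 is easy to reproduce in principle. The following Python sketch mirrors the described procedure: sample 68 journals from the full journal list five times and average the resulting article counts per country. The `papers` records (journal name plus author-country list) and function names are our own illustrative assumptions, not the actual MPDL database schema.

```python
import random
from collections import defaultdict

def article_counts(papers, journal_subset):
    """Whole-count articles per country, restricted to a set of journals."""
    ac = defaultdict(int)
    for journal, countries in papers:
        if journal in journal_subset:
            for country in set(countries):
                ac[country] += 1
    return ac

def avg_random_ac(papers, all_journals, n_journals=68, n_draws=5, seed=2015):
    """Average AC over n_draws random selections of n_journals journals."""
    rng = random.Random(seed)
    sums = defaultdict(float)
    for _ in range(n_draws):
        subset = set(rng.sample(all_journals, n_journals))
        for country, count in article_counts(papers, subset).items():
            sums[country] += count
    return {country: total / n_draws for country, total in sums.items()}
```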
Since the rank columns in Table 1 do not offer a clear picture of the relationship between the different indicators, we calculated correlation coefficients.
Based on the indicator values in Table 1, we calculated Spearman's rank correlation coefficients. This coefficient is a descriptive statistical measure that represents the degree of relationship between two indicators. A positive correlation points to a monotonic increase: the increase in the value of one indicator is always accompanied by an increase in the value of the other indicator (Sheskin, 2007). The results are shown in the correlation matrix in Table 2. All correlation coefficients are at least r_s = 0.88. Interpreting the coefficients against the backdrop of Cohen (1988), we can conclude that they are much larger than typical and on a very high level (see also Kraemer et al., 2003).
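Because Spearman's r_s depends only on ranks, it can be computed directly from two indicator columns of a table like Table 1. A minimal sketch with placeholder values (not the real Table 1 data):

```python
from scipy.stats import spearmanr

# Placeholder indicator values for five countries (not the Table 1 data)
wfc = [18.6, 5.2, 3.9, 3.1, 2.8]    # e.g., WFC values
np_papers = [340, 91, 223, 89, 75]  # e.g., total papers Np (in thousands)

r_s, p_value = spearmanr(wfc, np_papers)
print(f"r_s = {r_s:.2f}")  # r_s = 0.90 for these placeholder values
```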
The most important NI indicator is the WFC; it is the indicator mainly used by the NPG to rank institutions or countries. The WFC shows the highest correlations with the other NI indicators (r_s = 0.98 for the AC and r_s = 0.99 for the FC). A similarly high correlation is found for the Q1_JIF indicator (r_s = 0.96). The correlations with the total number of papers and the (random) AvgAC are somewhat lower, with r_s = 0.91 and r_s = 0.88, respectively, but still much higher than one would expect.

Size-Independent and Size-Dependent NI Indicators
In the previous section, we have shown that the different NI variants correlate much higher than typical with the mere number of papers on the country level. This result indicates that the NI variants are size-dependent indicators. Thus, we recommend complementing the size-dependent NI indicators with size-independent NI indicators. In Haunschild and Bornmann (2015a, 2015b), we recommended this additional perspective and gave the following supporting example: The NI ranks the Chinese Academy of Sciences (CAS) before Harvard University, with 2,661 papers from CAS and 2,555 papers from Harvard in 2013. Considering the full publication output of both institutions in 2013 (31,428 for CAS and 17,836 for Harvard), we see that in relative terms Harvard (14% of its papers in the NI) ranks higher than CAS (8% of its papers in the NI).
The CWTS justifies its presentation of size-independent indicators (besides size-dependent indicators) in the Leiden Ranking as follows: "In the case of size-dependent indicators, universities with a larger publication output tend to perform better than universities with a smaller publication output. Size-independent indicators have been corrected for the size of the publication output of a university. So when size-independent indicators are used, both larger and smaller universities may perform well" (http://www.leidenranking.com/methodology/indicators). On the country level, many small countries have no way of publishing the same (or a larger) number of papers in NI journals as large countries. For example, the United States published n = 26,631 papers in journals considered in the NI (see Table 1), whereas Switzerland has a total publication output of n = 25,979. Because the papers published in NI journals cannot exceed the total publication output of a country, Switzerland could never reach the top position in the AC, FC, or WFC ranking.

Table 3 shows the size-independent AC values (RelAC) for the different countries, where each country's AC has been divided by the country's total number of papers and multiplied by 100 to obtain percentages:

RelAC = (AC / Np) × 100

For comparison, analogous size-independent values for the Q1_JIF (RelQ1_JIF) and AvgAC (RelAvgAC) indicators have been added. We could not calculate relative FC and WFC indicators because we are not able to reproduce this variant of fractional counting based on our in-house database. As the results in Table 3 show, the size-independent AC variant (expectedly) leads to top positions for smaller countries, such as Chile or Switzerland; the United Kingdom and the United States are at positions seven and eight. Similar results are visible for RelQ1_JIF and RelAvgAC: RelQ1_JIF leads to top positions for the Netherlands, Switzerland, and Singapore, and RelAvgAC puts Bulgaria, Chile, and Hungary in top positions. Whereas the top positions based on RelQ1_JIF are reasonable (the Netherlands, Switzerland, and Singapore are known as high-performing small countries), the results based on RelAvgAC (Bulgaria and Hungary) seem questionable.

Similar to the previous subsection, we calculated Spearman's rank correlation coefficients to compare the AC with the different size-independent indicators. The correlation between AC and RelAC is r_s = 0.76. Because this correlation coefficient is definitely lower than the coefficients in Table 2, the relative variant seems to be an informative additional indicator to the AC (Cohen, 1988; Kraemer et al., 2003). RelQ1_JIF correlates on a similar level with RelAC (r_s = 0.82) and definitely less with AC (r_s = 0.64). However, we obtain significantly lower coefficients for the correlations with RelAvgAC (between r_s = 0.06 and r_s = 0.23). As the top country positions for RelAvgAC already revealed, the generally low coefficients point to results that conform less with the other indicators (the size-dependent AC as well as the size-independent RelAC and RelQ1_JIF) and question the validity of this indicator.

The relative indicators RelAC and RelQ1_JIF certainly offer an important additional perspective on country performance beyond AC and Q1_JIF. There is also another kind of relative indicator that can be informative: indicators obtained by normalizing for the worldwide production, specifically AC divided by the sum of AC (RelSumAC) and Q1_JIF divided by the sum of Q1_JIF (RelSumQ1_JIF).
Here, AC and Q1_JIF, respectively, are divided by the sum over all countries and multiplied by 100 to obtain percentages:

RelSumAC = (AC / Σ AC) × 100 and RelSumQ1_JIF = (Q1_JIF / Σ Q1_JIF) × 100

These indicators can answer the following question: Among all the papers published in reputable journals worldwide (measured by AC or Q1_JIF), how many come from a specific country? Note that RelSumAC and RelSumQ1_JIF offer a relative perspective (relative to the world), but the indicators are size-dependent. Because ranks based on these indicators would lead to the same country positions as in Table 1 (AC and Q1_JIF), we present in Table 5 the indicator values only. As the results in the table show, the United States is the largest producer of papers published in reputable journals, with 23.87% (RelSumQ1_JIF) and 27.04% (RelSumAC).
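Both relative variants are straightforward to compute once the AC (or Q1_JIF) and total paper counts per country are available. A minimal sketch, with hypothetical input values for illustration only:

```python
def rel_ac(ac, np_total):
    """Size-independent RelAC: percentage of a country's output in NI journals."""
    return {country: 100.0 * ac[country] / np_total[country] for country in ac}

def rel_sum_ac(ac):
    """Size-dependent RelSumAC: a country's percentage of worldwide NI output."""
    world_total = sum(ac.values())
    return {country: 100.0 * value / world_total for country, value in ac.items()}

# Hypothetical values (Swiss AC is a made-up placeholder)
ac = {"US": 26631, "CH": 3000}
np_total = {"US": 340000, "CH": 25979}
print(rel_ac(ac, np_total))  # CH would publish ~11.5% of its output in NI journals
print(rel_sum_ac(ac))        # US holds ~89.9% of this two-country "world"
```

The same two functions apply unchanged to Q1_JIF values, yielding RelQ1_JIF and RelSumQ1_JIF.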

Discussion
According to Osterloh and Frey (2015), there are four reasons why "rankings are deemed to be necessary. First, it is argued that because of the high specialization of research and the lack of expertise in areas that are different from the own research field, it is efficient to rely on research rankings . . . Second, research rankings fuel competition among scholars, lead to more and better research, and promote what is called an 'entrepreneurial university' . . . Third, research rankings give the public a transparent picture of scholarly activity. They make scientific merits visible to people who have no special knowledge of the field like politicians, public officials, deans, university administrators, and journalists . . . Fourth, academic rankings make universities and departments more accountable for their use of public money" (Osterloh & Frey, 2015, p. 2). However, university rankings have always been heavily criticized (Huang, 2012; Mutz & Daniel, 2015; Schmoch, 2015). For example, it is criticized that different indicator values are weighted in a specific way to compute a sum score and that the weighting is not appropriately justified. The NI does not use a mix of different indicators to compute a sum score; it is based on the number of papers (fractionally counted) published in reputable journals. Thus, the NI provides a ranking of institutions and countries based on a small subset of bibliometric data only.

Starting from two short comments on the NI (Haunschild & Bornmann, 2015a, 2015b), we undertook an empirical analysis of the index using comprehensive country data. The analysis is based on data from the MPDL in-house database (which is based on WoS data). In a first step of the analysis, we correlated the NI with other metrics that are simpler to generate than the NI. The resulting very large correlation coefficients indicate that the NI produces results very similar to those of simpler solutions. Thus, the use of the NI is called into question by the empirical results. For example, the NI could be replaced by the Q1_JIF indicator, which is used in a similar form in the SIR and at excellencemapping.net (see Bornmann, Stefaner, de Moya Anegón, & Mutz, 2014) and also measures the amount of output published in high-quality journals.

In a second step of the analysis, two relative variants of the NI were generated: one variant is size-independent (RelAC and RelQ1_JIF) and one variant is size-dependent (RelSumAC and RelSumQ1_JIF). The size-dependent variant produces the same country ranks as the original AC and Q1_JIF. Therefore, we recommend that the relative, size-independent variants (RelAC and RelQ1_JIF) should additionally be presented by the NPG, along with fractionally counted versions RelFC and RelWFC. The size-dependent NI indicators favor large countries (or institutions), so top-performing small countries (or institutions) do not come into the picture.
According to Campbell and Grayson (2015), the "NPG actively seeks constructive feedback from the researcher community we serve, and our aim is to iterate and improve the Nature Index in response to such feedback" (p. 1831). We hope that our empirical results and recommendations are helpful for improving the NI.