The application of bibliometrics to research evaluation in the humanities and social sciences: An exploratory study using normalized Google Scholar data for the publications of a research institute

In the humanities and social sciences, bibliometric methods for the assessment of research performance are (so far) less common. This study uses a concrete example in an attempt to evaluate a research institute from the area of social sciences and humanities with the help of data from Google Scholar (GS). In order to use GS for a bibliometric study, we developed procedures for the normalization of citation impact, building on the procedures of classical bibliometrics. In order to test the convergent validity of the normalized citation impact scores, we calculated normalized scores for a subset of the publications based on data from the Web of Science (WoS) and Scopus. Even if scores calculated with the help of GS and the WoS/Scopus are not identical for the different publication types (considered here), they are so similar that they result in the same assessment of the institute investigated in this study: For example, the institute's papers whose journals are covered in the WoS are cited at about an average rate (compared with the other papers in the journals).


Introduction
In the classical core areas of the natural and life sciences (hard sciences), quantitative methods have become an integral part of research evaluation (Moed, 2005). In the humanities and social sciences (soft sciences), quantitative methods for the evaluation of research performance are (still) not as widespread. However, in times of limited research funding, evaluation pressure is rising in these disciplines as well, while the methodological preconditions for the application of quantitative methods are (still) not well developed.
In the natural and life sciences, bibliometrics in particular has established itself as a standard procedure for quantitative research evaluation. With respect to the selection of suitable data sources and indicators, as well as the realization of a bibliometric study, standards have been developed and applied. The most widely used databases are the Web of Science (WoS) from Thomson Reuters and Scopus from Elsevier. The WoS currently contains a core set of around 11,000 journals (WoS source journals); Scopus covers more than 20,000 journals. However, the WoS and Scopus are multidisciplinary databases that are biased towards the natural and life sciences.

Problems of Bibliometrics in the Humanities and Social Sciences
Bibliometrics on the basis of the WoS and Scopus is unsuitable for use in the humanities and social sciences, chiefly for the following two reasons: 1. A higher proportion of journals that are not included in the databases: Research topics in the humanities and social sciences are often nationally or regionally oriented. Thus, the corresponding publications appear in the relevant language and not in the (international) journals included in the WoS or Scopus (source journals) (Butler & Visser, 2006; Frandsen & Nicolaisen, 2008; Moed, 2005; Nederhof, 2006). The problem of insufficient coverage, particularly for the WoS, does not seem to be decreasing, at least in the case of the social sciences, but rather to be growing (Larsen & von Ins, 2010). 2. A larger share of book contributions and monographs: In the natural and life sciences, research results are mainly published as articles (papers) in specialist journals that are largely covered by the WoS and Scopus. However, this is not the case in some disciplines, such as computer science and materials science. In the humanities and social sciences, publication tends predominantly to be in the form of books or monographs, which are essentially excluded as database documents (source items) for the WoS or Scopus. Thus, typical publications in the humanities and social sciences are only insufficiently captured by these databases (Marx & Bornmann, 2015). Database providers are already including proceedings and monographs, although their coverage is still poor (Gorraiz, Purnell, & Glänzel, 2013; Torres-Salinas, Robinson-Garcia, Campanario, & López-Cózar, 2014).
Since bibliometrics based on the WoS and Scopus can hardly be applied to the social sciences and humanities, various projects have developed indicators for evaluation in these disciplines. For example, the project "Development and Testing Research Quality Criteria in the Humanities, with an emphasis on Literature Studies and Art History" of the universities of Zürich and Basel has the objective of developing quality criteria for research in selected subjects of the humanities (http://www.psh.ethz.ch/crus/index). But the indicators proposed in these projects are generally less practical than the indicators that are used in bibliometrics (Ochsner, Hug, & Daniel, 2012).
The meaningfulness of bibliometric data for research evaluation ultimately depends on the coverage of the publications in the databases selected (Chi, 2013). What is not covered by the databases also cannot be evaluated. The coverage of specialist literature in databases refers primarily to the publications that are recorded as database documents (source items) and made searchable; "nonsource" items are not considered (Butler & Visser, 2006; Chi, 2014). The different levels of coverage of the humanities and social sciences in relation to the natural and life sciences are reflected in the different shares of references (citations) of these publications that are recorded as database documents (i.e., as searchable publications in the WoS or Scopus) and correspondingly linked. The difference is especially marked in the social sciences and particularly in the humanities: Although publications in the social sciences contain, on average, even more references than natural science publications, only a third of these are recorded in the WoS as database documents (Marx & Bornmann, 2015). In the case of the humanities, the share of publications recorded in the WoS is lower still by far.

The Use of Google Scholar in Bibliometrics
Publications represent an important form of distribution of research results in most of the humanities and social sciences. In these publications, results are usually produced or discussed against the background of the research results of other scholars (i.e., citations are mandatory). Thus, the use of bibliometrics for research evaluation seems appropriate in these disciplines as well. Because of the fundamental limitations associated with the WoS and Scopus, Google Scholar (GS) has been proposed in the past as an alternative (or supplement). In comparison with other (especially subject-specific) databases (such as Chemical Abstracts, http://www.cas.org/), the use of GS has the decisive advantage of a broad coverage of the literature (Prins, Costas, van Leeuwen, & Wouters, 2014). The limitations to a core set of scientific journals mentioned in connection with the WoS and Scopus disappear. This results in a more comprehensive coverage not only of the publications to be evaluated, but also of citations by publications that have not appeared in core journals (Kousha & Thelwall, 2007; Kousha, Thelwall, & Rezaie, 2011). For disciplines such as computer science, GS provides a much more comprehensive and mostly more favorable picture than the WoS (Franceschet, 2010; Kousha, Thelwall, & Rezaie, 2010).
However, a range of publications has pointed to many weak points and deficiencies of GS, which should be taken into account in its use (Jacso, 2005, 2009, 2012). Some years ago, GS had the problem that certain publishers (such as the American Chemical Society [ACS]) denied GS access, which led to very incomplete results in the corresponding specialties, such as chemistry, and made the use of GS fundamentally questionable (Bornmann et al., 2009). But the situation has changed since then: The ACS publications are now also covered by GS. New studies show that GS at present covers scientific publications across the specialties so well that citation analyses now appear possible in disciplines beyond the natural and life sciences: "Finally, we argue that Google Scholar might provide a less biased comparison across disciplines than the Web of Science. The use of Google Scholar might therefore redress the traditionally disadvantaged position of the Social Sciences in citation analysis" (Harzing, p. 1057). In addition, GS seems to be growing continually (parallel to the increasing output of publications) and thus to be sufficiently stable over time: "Our data suggest that-after a period of significant expansion for Chemistry and Physics-Google Scholar coverage is now increasing at a stable rate" (Harzing, 2014, p. 565).
However, certain fundamental problems still remain: GS does not supply any information on data sources, document types and time ranges, or update frequencies. The citations continue to include questionable sources, such as research applications and presentations which should really not be regarded as citing documents (Meho & Yang, 2007). However, the main problem with the bibliometric use of GS is the identification and elimination of duplicates, both on the publication side as well as the side of the citations of these publications. The cause is the automatic generation of the data sets from the sources available on the Internet, which leads to heterogeneous bibliographic information on one and the same publication. The names of authors, journals, and title words may appear in a range of variants that have to be combined (Jacso, 2009). This combination can never be performed satisfactorily in a purely automatic way and requires (manual) postprocessing (Köpcke, Thor, & Rahm, 2010;Thor & Rahm, 2007).
The present study takes a concrete example in an attempt to evaluate a research institute from the area of social sciences and humanities with the help of data from GS. Here we follow the example of Prins et al. (2014), by using GS in a real-life assessment procedure. For this study we have consciously chosen an institute (researching into the foundations of language) which also publishes a large part of its output in journals that are evaluated for the WoS or Scopus. Our intention is to test the convergent validity of the GS results by comparing them with those based on the WoS and Scopus. If convergent validity is established (and if we arrive at similar results with GS and the WoS/Scopus), we would see that as support for the use of GS for research evaluation in the social sciences and humanities. For the first time in bibliometrics, this study undertakes a normalization of citation impact on the basis of GS data. This involves a comparison of the impact of publications appearing in journals, conference proceedings, and anthologies with the impact of a reference set compiled correspondingly (Pudovkin & Garfield, 2009). The special difficulties in calculating normalized indicators on the basis of GS are indicated in Prins et al. (2014).

Data Set
The present study includes the publications of a research institute from the year 2009. The institute published a total of 212 publications in that year. Somewhat less than half of the publications (40%) are journal papers (see Table 1). All publication types-apart from the PhD dissertations-are included in the citation analysis of the current study.

Normalization of Citation Impact: Journal Normalized Citation Scores
In order to be able to compare the citation impact of papers published in different publication years and subject categories with each other, a normalization of citation counts of papers is performed (Vinkler, 2010). One possibility for normalization consists in calculating the so-called journal normalized citation score (JNCS) for a unit (here: an institute), as follows: "The number of citations to each of the unit's publications is normalized by dividing it with the world average of citations to publications of the same document type, published the same year in the same journal. The indicator is the mean value of all the normalized citation counts for the unit's publications" (Rehn, Kronman, & Wadskog, 2007, p. 22). A JNCS of 1 means that the citation impact of the institute's papers corresponds to the average citation impact in the journals which published them. A score of more (less) than 1 means that the citation impact of the institute's papers lies above (below) the average in the journal.
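The calculation described above can be sketched as follows. This is an illustrative Python sketch, not code from the study; the function name, data structures, and toy numbers are all hypothetical.

```python
# Illustrative sketch of the journal normalized citation score (JNCS):
# each paper's citation count is divided by the average citation count
# of the papers in its journal (same document type and year), and the
# per-paper scores are then averaged for the unit.

def jncs(unit_papers, reference_sets):
    """unit_papers: list of (journal, citations) for the unit's papers.
    reference_sets: dict mapping each journal to the citation counts of
    all comparable papers in that journal (the "world" reference set)."""
    scores = []
    for journal, citations in unit_papers:
        ref = reference_sets[journal]
        world_avg = sum(ref) / len(ref)  # average citations in the journal
        scores.append(citations / world_avg)
    return sum(scores) / len(scores)

papers = [("Journal A", 10), ("Journal B", 2)]
refs = {"Journal A": [5, 10, 15], "Journal B": [1, 2, 3]}
print(jncs(papers, refs))  # (10/10 + 2/2) / 2 = 1.0, i.e., average impact
```

A JNCS of exactly 1, as in this toy example, corresponds to the interpretation given above: the unit's papers are cited at the average rate of their journals.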

Normalization of the Citation Impact of Conference Proceedings and Book Chapters
Since calculating a normalized impact is not only desirable for journal papers, but also for conference proceedings and book chapters, in this study we would like to propose a suitable normalization procedure for these publication types (Torres-Salinas et al., 2014): (a) The citation impact of a contribution to a conference should be measured in relation to the citation impact of the other contributions to the same conference. In other words: The citation impact of a contribution should be divided by the average citation impact of the other contributions to the same conference. In the following sections we refer to a score calculated in this way as a Conference Proceedings Normalized Citation Score (CPNCS). Since meeting abstracts are generally not included in bibliometric analyses, the normalization procedure only includes contributions that are published as full papers in the corresponding proceedings volumes. (b) The citation impact of a book chapter should be measured relative to the citation impact of the other book chapters in the book concerned. In other words: The citation impact of a certain chapter should be divided by the average citation impact of the other chapters in the same book. In the following, we refer to a score normalized in this way as a Book Chapter Normalized Citation Score (BCNCS).
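Both scores can be sketched with the same small function (illustrative Python, not the study's code). Following the wording above ("the other contributions"), the focal contribution is assumed to be excluded from its own reference set.

```python
# Illustrative sketch of CPNCS/BCNCS: the focal contribution's citation
# count divided by the average citation count of the other contributions
# in the same proceedings volume (CPNCS) or book (BCNCS).

def normalized_score(own_citations, other_citations):
    """other_citations: citation counts of the other full papers or
    chapters in the same volume (focal item excluded, per the text)."""
    reference_mean = sum(other_citations) / len(other_citations)
    return own_citations / reference_mean

# A chapter cited 6 times, where the other chapters in the book were
# cited 2, 4, and 6 times (mean 4), gets a BCNCS of 1.5.
print(normalized_score(6, [2, 4, 6]))  # 1.5
```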

Searching for Publications in GS
To search for publications in GS (those from the institute or for the construction of a reference set), the corresponding queries to GS were performed as follows: First, each publication was searched for by title in GS and (up to) 20 results recorded. Subsequently, a query was performed and up to 1,000 results recorded for each journal, conference, and book by name or title. The procedure described ensures a high probability that all relevant hits can be determined in GS, even if data errors exist in GS for certain publications (such as typos in the title). We extracted all hits in GS with their own (GS-internal) ID, since only these hits have an unambiguous reference. The ID allows us to perform comparative investigations in the future, in which changes in citation counts over time can be tracked (for the same publication set).
The GS hits obtained in this way were then aligned with the publications sought; that is, the similarity of the titles was determined between the publications and the GS hits. For this, the so-called trigram similarity (ASIM) was calculated, which determines the relative agreement of trigrams (i.e., three successive characters in the title). Our experience in the past has shown that an ASIM > .8 indicates with a high probability that the hit in GS corresponds to the publication originally sought (Thor & Rahm, 2007). It can additionally be checked whether the GS hit has the same publication year as the publication (here: 2009). We also manually checked a range of publications to see whether a hit really was the publication concerned from the journal, the proceedings volume, or the book. Here we concentrated on the typical problem cases where, for instance, a publication has a lot of GS hits (e.g., because it has a general title like "Editorial") or several publications have the same GS hits (e.g., because they have very similar titles). The procedure described is a heuristic proven over many years, which allows a very good assignment of GS hits to publications despite possible data quality problems. However, complete agreement between the publications sought and the hits can only be guaranteed by manual checking of every single GS hit, which is not practical with a large number of publications.
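The trigram similarity can be illustrated as follows. The text does not give the exact formula, so this minimal Python sketch assumes one common variant: the Dice coefficient over sets of character trigrams.

```python
# Illustrative sketch of a trigram title similarity (ASIM). Assumption:
# Dice coefficient over character-trigram sets; the paper's exact
# formula may differ.

def trigrams(s):
    """All three-character substrings of the lowercased title."""
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

def asim(title_a, title_b):
    """Relative agreement of the two titles' trigram sets, in [0, 1]."""
    a, b = trigrams(title_a), trigrams(title_b)
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

print(asim("Google Scholar", "google scholar"))  # 1.0 (identical titles)
```

An ASIM close to 1 means the GS hit and the sought publication almost certainly share the same title; the threshold of .8 used in the study tolerates small variants such as typos or added punctuation.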
The citation window for the impact scores in GS in this study covers a period from publication date to 2014.

Citation Impact of Journal Papers of the Institute That Are Covered in the WoS
For the calculation of the normalized citation impact, the corresponding reference set must be compiled for every article of the institute (n = 56). For this, searches were performed for all the articles in the journals in which the institute has published (n = 15,983). The search in GS produced an entry for a total of 15,691 articles. In other words: For the 15,983 articles that were sought in GS, the rate of hits was 98%. Table 2 shows the distribution of the articles sought and hit in GS across the various journals. If an article published in one of these journals could not be found in GS, it was excluded from the calculation of the citation impact for the reference set. Besides the number of articles for which data in GS were sought, and the number of articles with at least one hit in GS, Table 2 provides the total number of hits for the articles in GS: For many articles, not just one corresponding entry is found in GS, but several. As Table 3 shows, there was exactly one hit in GS for 11,859 articles (53%). For the remaining articles, there were between 2 (n = 2,442) and 20 (n = 1) hits. The comparable figures from Martín-Martín, Orduña-Malea, Ayllón, and Delgado López-Cózar (2014) show that the search strategy in this study (see the previous section, Searching for Publications in GS) allowed a reduction in the number of possible hits per publication: "83% of the documents in our sample have more than one version, whereas 40% have 6 or more versions, 19% have 10 or more versions, and 200 documents have more than 100 versions (0.1%)" (p. 35).
Since several entries in GS were found for around half of the articles, the question arises whether all entries, or which fraction of the entries, should be used for the calculation of the reference values. For example, around 90% of the hits in GS relate to the year 2009 (i.e., the year from which the publications of the institute come); about 10% of the hits relate to other years. We can assume with high probability that the hits from other years do not need to be taken into account in the calculation of the reference values. Figure 1 shows the average number of citations (arithmetic mean) from GS for articles (or their article hits in GS) that were published in 49 different journals. Also shown are the average numbers of citations for all article hits in a journal, only for article hits from 2009 with an ASIM > .8, as well as for articles from 2009 with an ASIM > .8 and a manual correction of the data. The figure is intended to clarify which restrictions in the subgroups lead to small or large changes in the citation rates. The figure shows that the average values derived from all the article hits differ markedly from the average values for the subgroups. For the derivation of reference values on the basis of journals, this indicates that the publication year of the hits should be taken into account. Consideration of further limitations, like the ASIM or the manual correction, hardly changes the average citation frequency at all: Across all journals, the citation rates differ on average by about one citation. However, the other limitations (besides the publication year) are still taken into account in the compilation of the reference values, so as to achieve the highest possible accuracy for the citation impact values.
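The restriction of hits described above can be sketched as a simple filter (illustrative Python; the field names and sample numbers are hypothetical, not data from the study).

```python
# Illustrative sketch: only GS hits from the publication year with a
# sufficiently similar title (ASIM > .8) enter the reference value,
# mirroring the subgroups compared in Figure 1.

def reference_mean_citations(hits, year=2009, min_asim=0.8):
    """hits: GS hits for the articles of one journal, each a dict with
    'year', 'asim' (trigram title similarity), and 'citations'.
    Returns the mean citation count of the retained hits, or None."""
    kept = [h["citations"] for h in hits
            if h["year"] == year and h["asim"] > min_asim]
    return sum(kept) / len(kept) if kept else None

hits = [
    {"year": 2009, "asim": 0.95, "citations": 10},
    {"year": 2009, "asim": 0.91, "citations": 6},
    {"year": 2011, "asim": 0.95, "citations": 40},  # wrong year: excluded
    {"year": 2009, "asim": 0.50, "citations": 3},   # poor title match: excluded
]
print(reference_mean_citations(hits))  # 8.0
```

As the toy data show, a few hits from other years or with poorly matching titles can distort the reference mean considerably, which is why the publication-year restriction matters most.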
For all 56 of the institute's articles, citations could be searched for in GS. Table 4 gives the number of hits for these articles in GS: For a total of 56 articles there were 80 hits. However, the number of hits could be reduced to 56 when only articles from the year 2009, with an ASIM > .8 and a manual correction are taken into account.
On the basis of the citations searched for in GS for the journals in which the staff of the institute have published their articles, we calculated the JNCS for each article (based on GS). In addition, we researched these scores in the in-house Max Planck Society (MPG) database, which is run by the Max Planck Digital Library (MPDL). This database contains the JNCSs on the basis of the WoS. Whereas the citation window for the GS scores covers the period from 2009 to 2014, the citation window for the WoS scores is from 2009 to 2013. If a comparison of the scores calculated with data from the two databases indicated similar scores for the institute's articles, we could conclude that GS may be used for bibliometric research evaluation in the area of the humanities and social sciences. Convergent validity of the results would indicate that GS comes to similar conclusions as the WoS, that is, the database that is applied as standard to research evaluation in the sciences. Figure 2 shows the JNCSs for the institute's 56 articles. The red line at JNCS = 1 in Figure 2 marks the citation impact of an article from the institute that corresponds to the average in the journal. As the results show, the scores differ more or less clearly. However, for most articles the two scores agree on whether they were cited above or below the average rate. Table 5 shows, for all of the institute's articles, the average citations and JNCSs derived on the basis of the WoS and GS. Whereas the average citation frequencies clearly differ between the WoS and GS, the JNCSs are similar. Thus, the JNCSs are convergently valid: They agree in indicating that the citation impact of the articles roughly corresponds to the average for a journal.

Citation Impact of Journal Papers of the Institute That Are Not Covered in the WoS (but in Scopus)
A total of 29 of the institute's papers were published in journals that are not covered by the WoS (but partly covered in Scopus). Analogously to the procedure in the previous section, JNCSs based on GS were also calculated for these papers. For this calculation we searched for the citations in GS not only for the 29 of the institute's papers, but also for all other papers in the journals in which the 29 papers appeared. As the figures in Table 6 show, a total of 2,628 papers were processed in GS, of which at least one entry was found for 2,404. Table 7 shows the number of hits for the papers sought in GS. The number of hits ranges between 1 (n = 1,593) and 11 (n = 2). Figure 3 shows the average number of GS citations for the papers published in the 27 journals in which the institute's papers have appeared. Shown here are the average numbers of citations for all paper hits in a journal, only for paper hits from 2009 with an ASIM > .8, as well as for papers from 2009 with an ASIM > .8 and a manual correction of the data. In agreement with the results reported in the previous section, it is very clear that the arithmetic means derived from all the paper hits, in particular, deviate from the other means. Because of these deviations, we only included papers in the calculation of the reference values (exactly as in the previous section) that were published in 2009, have an ASIM > .8, and were manually corrected.
A total of 25 of the institute's papers have a hit in GS; 21 of these papers were published in 2009, have an ASIM > .8, and were manually corrected. Table 8 shows, for these papers, the average number of citations in GS and the average JNCS GS.
For some of the institute's papers (n = 9), the citations in Scopus (Elsevier) could also be searched for, besides the citations in GS (see Table 8). In addition, for five of these nine papers a JNCS could be calculated. For four of the nine papers, the citations for all papers in the particular reference set were incomplete. Since in this study the normalized scores constructed on the basis of the WoS and Scopus are regarded as reference values that reflect the "true" normalized impact, attention was paid to the completeness of the publications in the reference set. For this reason, no JNCS was calculated for four of the institute's papers. As a comparison of the two JNCSs (GS and Scopus) in Table 8 shows, the scores are similar and differ from each other by about 0.2. The values become even more similar (Scopus = 1.42 and GS = 1.34) if the calculation of the mean JNCS GS only includes those papers (n = 5) that were also included in the mean JNCS Scopus.

Citation Impact of the Institute's Contributions in Conference Proceedings
Of the total of 39 contributions from the institute in conference proceedings, only four appeared in proceedings volumes that included full papers. The rest were published in volumes with abstracts. Because of the limited scope of abstracts (and the correspondingly lowered expected citation rates), abstracts (meeting abstracts) are generally excluded from bibliometric analyses (Moed, 2005). For the analysis in this study, there are thus only four contributions available for normalization. There are also citation counts for two contributions from the WoS.
The reference set for the four contributions consists in each case of the other contributions published in the proceedings of the same conference. We investigated a total of 100 contributions to the four conferences in GS (of which four were published by authors from the institute). As Table 9 shows, citations in GS could be found for 65 contributions.
From the figures in Table 9 it is clear that more than one hit was found in GS for a series of papers. As Table 10 shows, there were up to three hits for one and the same conference contribution.
As with the institute's papers that have appeared in journals (see the previous two sections), the question also arises for conference contributions which hits for a paper in GS should be included in the calculation of the citation rate for the reference set of a conference. Figure 4 shows the average number of citations for all paper hits for a conference, only for paper hits from 2009/10, only for paper hits from 2009/10 with an ASIM > .8, as well as for papers from 2009/10 with an ASIM > .8 and a manual correction of the data. Since the papers from a conference (which took place in 2009) were not published in 2009, but in 2010, both years were taken into account in the evaluation. A greater deviation from the other hit groups was particularly noticeable for "All papers." The results for the three other groups are similar or largely identical.
Of the institute's four conference papers, three could be found with one hit each in GS. The corresponding citation counts are shown in Table 11. Whereas one paper had no impact at all, the two other publications were cited 6 and 16 times, respectively. These citation counts were used to calculate the CPNCS GS for the three papers. For this, the citations were each divided by the mean number of citations for the conference papers in the reference set. The reference set used, following the procedure in the previous sections, comprised the respective citations from the paper hits from 2009/10 with an ASIM > .8 and manual correction of the data. As the normalized scores in Table 11 show, the two papers that produced a citation impact have much higher scores than the mean value of 1. Since all the papers from one conference whose GS numbers are in the table can also be investigated in the WoS, a comparison with the impact achieved there could be made for one paper. With scores of 1.97 (GS) and 2.22 (WoS), the paper has similar normalized values, which indicate about twice as great an impact as for the average conference paper.

Citation Impact of the Institute's Book Chapters
The analysis of the citation impact of the book chapters includes 71 of the institute's publications, which were published in a total of 40 books. As Table 12 shows, a hit in GS could be achieved for more than half of the chapters.
Thus, for example, of the 17 chapters in book 1, only two had at least one hit.
From the results shown in Table 13, it is also evident that many of the book chapters found achieved not one, but several hits in GS. This means that not only were relatively few chapters found in GS; the chapters found often had more than one hit (the latter indicates less accurate search results). Figure 5 shows the average number of citations from GS for chapters published in 39 books (for one book, citations for the chapters were not available in GS). Shown here are the average numbers of citations for all chapter hits in a book, only for chapter hits from 2009, only for chapter hits from 2009 with an ASIM > .8, as well as only for chapter hits from 2009 with an ASIM > .8 and a manual correction of the data. It was possible to find data in GS for 55 of the total of 71 of the institute's book chapters; for 48, GS also contains citation information. Many chapters had only one hit in GS, as Table 14 shows.
For a total of 34 of the institute's chapters it was possible to calculate a normalized citation score (BCNCS) for which only chapter hits for 2009, with an ASIM > .8 and a manual correction of the data were included in the evaluation. As the score in Table 15 shows, the institute's book chapters were cited about 20% more often than the other chapters in the books (BCNCS = 1.2).

Citation Impact of the Institute's Books
The institute published a total of 10 books in 2009. Of these, citations could be found in GS for 8 books (only two could be found in the Book Citation Index, BCI, of the WoS). The number of citations ranges between 0 and 72. Unfortunately, it is currently neither possible to investigate the books comprehensively in the WoS (the coverage of the BCI is too limited) nor to calculate normalized values for them. Since Torres-Salinas et al. (2014) have already proposed methods to calculate normalized citation impact values based on the BCI, these methods could be used in coming years, when the coverage of the BCI has improved. Furthermore, one could try to transfer these methods from the BCI to GS.

Discussion
Evaluation of research based on bibliometrics has one decisive advantage: In almost all disciplines, one focuses on the primary outcome of research (i.e., publications) and their usefulness for further research (i.e., citations). Since the application of the two most important bibliometric databases, the WoS and Scopus, is limited mainly to the natural and life sciences, we have presented in this study a method by which GS data can be applied to evaluation in the social sciences and humanities. As the list of Martín-Martín et al. (2014) shows, the most important sources for publications and their citations have now been evaluated by Google: "Google Scholar's crawlers sweep the entire academic web: the most well-known scholarly publishers ..." Survey results from 2014 show that GS and its derivatives are the most used products by scientists. More and more institutions and people are recommending that one put the URL for the GS Citations page in one's CV and on one's personal website. In snowball metrics, a global standard enabling cross-institutional comparisons that has been defined and agreed upon by higher education institutions (and Elsevier) (Colledge, 2014), the use of GS as a primary data source for bibliometric analyses is recommended (along with the WoS and Scopus). One argument for the use of GS is that people can easily evaluate departments and institutions if the GS Citations pages of the faculty are easily available. GS data could be used particularly when universities are evaluated, which generally cover a broad range of disciplines (Bornmann, de Moya Anegón, & Mutz, 2013): According to the results of Martín-Martín et al. (2014) on GS publications, around half of the highly cited documents cannot be found in the WoS, and almost 20% of the highly cited documents are books. In addition, the number of books, which are hardly evaluated for the WoS, has continually increased in recent years and has "become the most frequent document type in the last five years (2009-2013)" (p. 18).
However, GS today is not without disadvantages: its indicators are easy to manipulate (Delgado López-Cózar, Robinson-García, & Torres-Salinas, 2014), and its results and measures are transient and in many cases difficult to replicate stably. A comparison of two samples of 64,000 highly cited documents (May and October, 2014) showed that "14.7% of the 64,000 documents in the most recent sample were not also present in our earlier sample. Moreover, most of these new documents are placed in pretty low positions in Google Scholar's ranking of results" (Martín-Martín et al., 2014, p. 16). In order to be able to use GS in the evaluation of research in the humanities and social sciences as well, we have presented in this study procedures for the normalization of citation impact that are derived from the procedures of classical bibliometrics. With these suggestions we follow recommendations such as those formulated by Prins et al. (2014): "To use GS in the context of evaluation, various ways for benchmarking or field normalization have to be worked out, for instance on the basis of available journal data, to address the issues of research assessments" (p. 442). The normalization of citation impact proposed in this study can not only reduce the influence of errors in the GS data on impact measurement (errors in individual citation counts tend to average out in the aggregated, normalized scores), but also standardize the generally higher citation counts of GS in comparison with the WoS and Scopus: "In our sample, 91.6% of the documents have received more citations in GS than in the WoS. Only 3,079 documents (9.4%) have more citations according to the WoS than in GS" (Martín-Martín et al., 2014, p. 33).
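At its core, the normalization procedure amounts to dividing a paper's GS citation count by the average citation count of its reference set (all papers from the same journal, proceedings volume, or edited book). The following sketch, with entirely hypothetical citation counts and journal names, illustrates this mean-based normalization:

```python
from statistics import mean

# Hypothetical GS citation counts for all papers in two reference sets
# (e.g., all papers published in the same journal).
reference_sets = {
    "Journal A": [3, 0, 7, 12, 5, 1],
    "Journal B": [25, 40, 8, 15],
}

def normalized_score(citations: int, reference_set: list) -> float:
    """Mean-normalized citation score: the paper's citation count divided
    by the average citation count of its reference set. A score of 1.0
    means the paper is cited exactly at the average rate of the set."""
    return citations / mean(reference_set)

# A paper with 10 GS citations that appeared in "Journal A":
print(round(normalized_score(10, reference_sets["Journal A"]), 2))  # prints 2.14
```

Because the score is a ratio against the reference-set mean, random errors in individual GS citation counts influence the denominator only in aggregate, which is why they tend to average out.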
Even if we could not find citations in GS for all papers in a reference set (journals, conference proceedings, and edited books), the comparison of GS-normalized citation scores with WoS- or Scopus-normalized scores shows that reference sets based on GS data are still suitable for normalization. Even if the scores calculated with the help of GS and the WoS/Scopus are not identical for the different publication types, they are so similar that they result in the same assessment of the institute investigated in this study: the institute's papers whose journals are also covered in the WoS are cited at about an average rate (compared with the other papers in the journals). Whereas the papers whose journals are not covered in the WoS, and the book chapters, are cited about 20 to 40% above average, the conference papers are cited twice as often as one would expect for papers from a conference. When interpreting the result for the conference papers, it should be borne in mind that it is based on only four papers that appeared in proceedings volumes.
Finally, we would like to mention a limitation of our study that future studies should address: normalization on the basis of single journals is seldom undertaken in bibliometrics. An important reason is that this kind of normalization is disadvantageous for papers that have appeared in reputable (highly cited) journals: these journals have a high citation level, which makes it more difficult to outperform the journal average than is the case with less prestigious journals with a low citation level. Instead of journal-based normalization, the recommendation today is normalization on the basis of the papers of a research field, and this is also general practice (Vinkler, 2012). However, we applied journal-based normalization in this study because it requires less effort in the search for publications and citations in GS: for a research field, considerably more papers would have to be searched for to construct a reference set. Future research on the normalization of citation impact based on GS data should therefore concentrate on the use of the papers of a research field for the construction of a reference value.
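The disadvantage of journal-based normalization can be made concrete with a small sketch (all citation counts hypothetical): the same paper scores below average against its own highly cited journal, but above average against a field-based reference set that also includes less-cited journals:

```python
from statistics import mean

# Hypothetical citation counts: a highly cited journal and the wider field.
highly_cited_journal = [25, 40, 8, 15, 32]                 # journal mean: 24.0
field = highly_cited_journal + [3, 0, 7, 12, 5, 1, 2, 6]   # field mean: 12.0

citations = 18  # a paper that appeared in the highly cited journal

journal_score = citations / mean(highly_cited_journal)  # 18 / 24 = 0.75
field_score = citations / mean(field)                   # 18 / 12 = 1.50

# Against its own (highly cited) journal the paper appears below average;
# against the whole field it appears above average.
print(journal_score, field_score)  # prints 0.75 1.5
```

This is exactly the penalty described above: a high journal citation level raises the normalization baseline, so field-based reference sets give a fairer comparison across journals of different prestige.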