How open is communication science? Open-science principles in the field

ABSTRACT Open-science principles, such as open data, shared materials, and pre-registration, are expected to encourage a culture of replication in scientific research. Yet, with its topical and methodological heterogeneity, communication science has been described as falling short of such principles. We analyze the extent to which open-science principles were used in publications from 20 leading communication journals between 2010 and 2020, and compare the results to benchmarks from psychology. Results show that open-science principles were rarely used in communication science, with some variation across methods, but were applied more consistently in psychology papers. There was no relationship with scientific impact. This suggests a need for greater attention to open-science principles in communication science, while considering their appropriateness for different study designs.

While experimental research has been suggested to suffer from low replicability, HARKing, and p-hacking (Bishop, 2019; Dienlin et al., 2020; Matthes et al., 2015), it seems particularly easy for such a professionalized part of the field to implement means of openness and transparency due to its standardized routines and designs. Thus, the replication crisis could, particularly for experimental research, also be perceived as a 'credibility revolution' (Vazire, 2018), given that, rather than pursuing something new, it more thoroughly resembles "an extension of accepted […] approaches to research" (Bowman & Keene, 2018, p. 365).
Yet, surprisingly little is known about whether proposed open-science principles may already be in use or whether they are eventually capable of living up to the expected standards. That is, to the best of our knowledge, only one study has, in parallel to this study, empirically investigated communication science to identify how often empirical research data and materials have been shared (Markowitz et al., 2021). We argue that the applied methodology has led the authors of this study to underestimate the number of studies that adhere to open-science practices. That is, the authors followed a dictionary approach capable of capturing only papers that use the recent vocabulary surrounding the ongoing open-science discussion. Scholarship prior to the open-science movement is thus hard to capture. Moreover, we suggest that open-science practices are also constrained by methodological approaches, which implies that their usage should be analyzed in relation to the method employed.
The current study aims at filling these gaps by analyzing a total of 9516 peer-reviewed articles from 20 different journals in communication science published between 2010 and 2020. Based on a semi-automated content analysis, we follow a three-step approach. First, we quantify the variety of empirical methods applied, the practices of sharing data and materials, as well as the use of pre-registration in communication science. Second, we put these quantifications into the perspective of potential replicability by means of citation counts. Third, in order to provide a somewhat comparable benchmark, this study also looks at 4754 peer-reviewed articles from three major psychology journals over the same time period. In doing so, our study reflects the diverse nature of our discipline by providing a detailed account of adherence to open-science practices per methodological approach and thereby makes it possible to get a clearer picture of how open communication science really is.

Open science
Recent endeavors to implement open-science principles into everyday research have received a strong echo throughout a wide variety of disciplines, including communication science (Fox et al., 2021). The main requests include incentivizing, if not requiring, research data and materials to be shared and empirical investigations, particularly experimental research, to be pre-registered. 1 These endeavors are expected to prevent causes of low replicability, increase analytic robustness (Klein, Hardwicke, et al., 2018), yield more reusable tools, and ultimately benefit a field's striving for "collective and iterative knowledge" (Bowman & Spence, 2020, p. 7).
As these requests are important pillars of modern empirical research in that they embody science as a transparent, reliable, and collectively owned societal sector (Bowman & Keene, 2018; Nosek et al., 2015), various means have recently been discussed to have more researchers, and particularly journal publications, meet them.
First, a number of researchers have suggested the 'TOP Guidelines,' a list of possible open-science standards along with potential levels of enforcement for scientific journals to offer, incentivize, or even require varying degrees of open research. For example, if a journal were to enforce data sharing, it could motivate authors to state where data can be accessed (enforcement level 1), ask authors to upload data to a trusted repository (enforcement level 2), or require authors to provide data and analyses in a format for the journal to independently reproduce results before publication (enforcement level 3).
Second, several journals have started to raise awareness about including data-availability statements in manuscripts during the manuscript-submission process (i.e. enforcement level 1). Such statements should at least refer to and describe the means of access to any original or third-party data used in a reported study, thereby providing a bare minimum of data transparency (e.g. Oxford University Press, n.d.).
Third, some journals have started to encourage researchers to not only refer to data but to share it publicly through citable and trusted repositories (i.e. enforcement level 2). Thereby, future studies are also able to build upon previously published research data (Sansone et al., 2019). This, in turn, might increase the number of citations that papers adhering to open-science principles receive, as scholars can cite a study for reasons beyond its content (Markowitz et al., 2021).
Fourth, a few journals have implemented badges for openly shared data, openly shared research materials, and pre-registration that are attached visually to publications, as suggested in 2013 by the Open Science Foundation (Blohowiak et al., 2013). While badges are commonly used in psychology, where some journals introduced them as early as 2014, journals in communication science only started to employ open-science badges in 2019. In any case, the specific location where badges are shown varies heavily, with some journals depicting them prominently on the articles' websites and others only showing them at the end of articles' PDF documents (Fidler et al., 2018).
These efforts' effects, however, might be questionable (e.g. Fox et al., 2021). That is, the aforementioned efforts might help to prospectively inform a more transparent and open scientific environment, but they risk devaluing prior research in that awarding badges to new studies visually highlights these studies over older ones. Moreover, not every study, not every method, and not every research design is equally suitable for providing the same level of transparency, for example through open data. Also, pre-registration might be suitable for hypothesis-driven empirical research, while exploratory research naturally struggles to identify its analytical intentions a priori. In that, proposed open-science efforts embrace practices that match some but not other forms of communication scholarship.
Above and beyond openness and pre-registration, open-science principles aim at making research more robust and reliable by enabling replications on a larger and more systematic scale (Bishop, 2019; Dienlin et al., 2020). Replications are crucial to test the resilience of empirical findings, especially in inductive experimental research, and they are particularly important for studies sparking a lot of academic discourse. That is, heavily cited studies typically depict a discipline's key findings of their time (Aksnes et al., 2019). In other words, although scientific value, as expressed by the number of citations, is only one criterion of research quality, and other facets such as solidity and plausibility do not necessarily correlate with citation counts, research quality might in fact be increased through open and transparent research practices (Aksnes et al., 2019). As such, heavily cited studies are both influenced by what preoccupies societies at a given time and influence societal discourses through academic contributions. Recent communication-science examples, such as articles on social media, are thus "timely and with great relevance to our increasing digitized and mediated environment" (C. Chan & Grill, 2020, p. 20). Consequently, heavily cited studies require particular attention when it comes to replicability in order to test these findings' robustness and reliability.
Yet, citing is not only a feature of those publications which are apt to be replicated. Instead, the literature on citing behavior has suggested that citations are determined by a plethora of influences, namely motives of argumentation (e.g. referring to a specific claim), social alignment (e.g. positioning oneself in a discourse), mercantile alignment (e.g. crediting, self-citation), and data (e.g. citing in the realm of a meta-analysis) (Case & Higgins, 2000; Erikson & Erlandson, 2014). Recently, findings have also pointed out social biases within citation patterns, such as for gender (Wang et al., 2021). Studies have also been shown to profit from being published in high-impact journals, both due to a journal's halo effect and due to high-impact journals' selection mechanisms (Traag, 2021). Each of these motives adheres to one of two more general perspectives: a normative model of credit exchange through citations as currency for intellectual debts (argumentation, data), and a social-constructionist model where authors seek to position themselves in a field through the selective citation of others (social alignment, mercantile alignment) (Erikson & Erlandson, 2014, p. 626).
Importantly, both streams of thought have been shown to be related to open-science principles. That is, in a study of 85 empirical publications from medicine, Piwowar et al. (2007) found papers which also shared their data to be cited about 70% more frequently than those papers which did not share their data. While the authors did not look at possible motives, the narrow field under investigation and the congruence with prior findings led them to suggest that such "microarray data is indeed often reanalyzed" and that, in turn, these "re-analyses may spur enthusiasm and synergy around a specific research question, indirectly focusing publications and increasing the citation rate of all participants" (Piwowar et al., 2007, p. 3). Supporting evidence has also been accumulated for other fields by looking at open-access papers (Gargouri et al., 2010; Hajjem et al., 2005). Gargouri et al. (2010) expect the relationship between open accessibility and higher citation counts to be less causal and instead due to a 'self-selection bias,' in that authors may tend to open up their best publications, which might receive more citations anyhow. Adhering to open-science principles could thus indeed be expected to come along with more citations. Alternatively, Grand et al. (2012, p. 686) found open-science principles to "support consumers in developing the critical awareness and judgment that enables us to separate pseudo-science from real." Open science thereby contributes to a perception in which transparency and sharing data and materials is a trustworthy practice that increases the quality of research, which should ultimately also lead to higher citation counts.

Open communication science
When it comes to open communication science, empirical evidence is scarce. Recent calls for strengthened open-science principles have been fueled particularly by media psychologists in the field. While looking up to open-science endeavors in the field of psychology, communication science has been ascribed some space to catch up when it comes to replications, pre-registration, or sharing research data and materials (Domahidi et al., 2019; van Atteveldt et al., 2019, 2021). Informed by diagnoses from other disciplines, however, these claims lack empirical substantiation within communication science.
A rare exception is a recent study by Markowitz et al. (2021) that demonstrates that only 5.1% of analyzed papers use open-science vocabulary. While this study is a helpful starting point for researching how open communication science is, there are several reasons why additional research endeavors are necessary. First, the study does not differentiate between sharing data and materials. Yet, different preconditions affect whether data (e.g. anonymization) or materials (e.g. copyright) can be shared. More differentiation is thus useful to offer insights into where communication science has room for improvement. Second, open-science practices can be expected to depend on the method used (e.g. van Atteveldt et al., 2021). Markowitz et al. (2021), however, only provide seminal insights across all papers published in high-impact journals, independent of the applied method. Third, the dictionary-based approach used is likely to suffer from low recall and thus to underestimate the use of open-science practices. That is, while the authors convincingly demonstrate that papers with open-science badges use the 13 employed search terms more frequently than papers without badges, the terms appear to be skewed toward more recently used vocabulary. Yet, as sharing data and materials was considered good practice before the advent of open-science badges as well, terms other than the ones that make up the implemented dictionary might have been used to describe openness and might thus have been missed.
Arguably, the use of open-science practices might also be informed by a field's professionalization and standardized (and thus ready-to-share) routines. Yet, communication science might lack such professionalization and standardization due to the field's lack of a coherent epistemological core. That is, the lack of a shared epistemological core may also hinder the formation of shared research norms and standards, such as open-science principles. In that, communication science has been described as "too diverse, separated, and pulled in different directions to become a common intellectual enterprise" (Waisbord, 2019, p. 11), and the field's core was pragmatically characterized as "what communication scholars do; it gets presented at communication meetings and published in communication journals" (Waisbord, 2019, p. 124). In line with this attribution, empirical findings also show that "research in high-impact communication journals has at its core been characterized by a great topical variety" (Günther & Domahidi, 2017, p. 3064), which in itself can easily be understood as "a strength, a boon of communication studies, a logical, sensible approach to the study of specific questions" (Waisbord, 2019, p. 133). The same can be said for the field's methodological variety, which is also capable of bringing about a certain variance in open-science practices, as sharing data and materials is more applicable to some methods than to others. This does not only concern issues of privacy and anonymization of data but also questions of copyright (van Atteveldt et al., 2021). In sum, topical as well as methodological variety and the field's location at the intersection of several neighboring disciplines allow scholars to reach out to, and adhere to, neighboring sets of quality norms and standards.
That said, communication science has become more focused over the last 15-20 years (e.g. C. Chan & Grill, 2020; Puschmann & Pentzold, 2021; Song et al., 2020). While the field's blurry self-definition and its nods toward neighboring disciplines remain, particularly the rise of the internet has helped core subjects to emerge (e.g. social media). This becomes evident in various topics being addressed with much more nuance (Puschmann & Pentzold, 2021) as well as in greater linkage between the field's individual subjects and subdisciplines on certain issues (Song et al., 2020). Chan and Grill (2020) distinguish the field's developments in this regard into topics of high supply (i.e. many publications), high popularity (i.e. many citations), and high prestige (i.e. many citations from publications on other topics), thereby identifying the field's defining core topics (e.g. persuasion, social media) alongside highly visible yet less numerous (e.g. media psychology, journalism studies) and highly numerous yet less visible (e.g. interpersonal communication, health studies) topics. It remains an open question, however, how the increasingly strengthened focus of communication science affects the formation of a shared set of quality norms and standards. At the very least, Markowitz et al. (2021) suggest a gradual increase in the use of open-science vocabulary.
Another driver of the field's continuously emerging focus has been identified in the recent developments of computational communication science (CCS). That is, in providing new opportunities for data collection and analysis, CCS allows for revisiting established concepts from new perspectives, providing broader sets of data and evidence, as well as estimating more complex dynamics and relationships (Hilbert et al., 2019). In turn, these developments come with an even more heterogeneous methodological landscape. That is, while these opportunities have already been paired with calls for transparency (van Atteveldt & Peng, 2018), open-science principles are by no means to be taken for granted within CCS (Lazer et al., 2020). Particularly, the availability of large datasets is often limited by copyright and/or ownership liabilities, rendering data sharing hard to impossible and thus inhibiting both reusability and replicability (van Atteveldt et al., 2019, 2021).

Research questions
As discussed, the main goals of open science are to promote and incentivize transparency, openness, and replicability throughout scientific endeavors. Importantly, the need for an increasing adherence to open-science principles is rooted in the characterization of communication science as lacking replications and replicability. This latter characterization, however, is largely based on anecdotal evidence and has faced some criticism, for example for insufficiently considering potential harms for individual study participants (Fox et al., 2021). What is more, sharing data and materials is more applicable to some methods than to others, and sharing was considered good practice not only since the debate around open-science principles took off but also before that (Blohowiak et al., 2013; Kidwell et al., 2016). As such, not much is known about how often data and materials are being shared, whether and how much such practices vary depending on the applied method and research design, how this has changed along the argued shift of the field's thematic focus, and how many research designs are being pre-registered. In addition to Markowitz et al.'s (2021) suggestion that the usage of open-science vocabulary might be vanishingly low, getting a clearer picture of the status quo of our discipline in this regard is important as argumentative ground for discussions about potential changes. As such, our first set of (pre-registered) research questions asks:
RQ1a: How often have published studies and particularly experiments been pre-registered?
RQ1b: How often has research data been shared for published studies in general and per method particularly?
RQ1c: How often have research materials been shared for published studies in general and per method particularly?
Second, prior evidence suggests that open, transparent, and reproducible scientific practices lead to a larger number of citations (e.g. Piwowar et al., 2007). As mentioned above, this might partially be due to the fact that open-science practices also offer additional reasons to cite a paper (Markowitz et al., 2021) and that authors selectively choose to share data and materials for their better work more than for other studies (Gargouri et al., 2010). In addition, studies that adhere to open-science practices can also be replicated more easily and might thus be cited more often. In this regard, larger numbers of citations indicate a discipline's key findings of their time (Aksnes et al., 2019), thus highlighting the studies' relevance for being replicated. Being able to replicate studies with large academic impact is thereby pivotal for mapping the advance of a discipline. By sharing the used data or materials as well as by pre-registering, replications become easier and more feasible, which might thus lead to potentially higher citation counts. While Markowitz et al. (2021) suggest that open-science practices are not related to citation counts, we argue that their operationalization of open-science practices might be too broad and thus underestimate the number of papers that adhere to open-science principles. As such, an additional analysis of this relationship might be helpful.
Ultimately, we aim to unravel whether these relationships are detectable and whether they are constant across communication science and psychology, vis-à-vis the varying roles of open-science endeavors per discipline. Psychology, due to the field's largely homogeneous methodology, its professionalization, and its standardization, thereby serves as the ideal benchmark. Psychologists have also discussed the replication crisis for at least a decade now and have ever since advocated for a stronger incentivization of open-science principles.
RQ2a: How does (i) pre-registration, (ii) shared research data, and (iii) shared research materials relate to a study's number of citations in communication science?
RQ2b: How does (i) pre-registration, (ii) shared research data, and (iii) shared research materials relate to a study's number of citations in psychology?

Method
Unlike related endeavors from psychology by means of a survey (LeBel et al., 2018) and previous broad dictionary-based approaches from communication science (Markowitz et al., 2021), we base our study on a large corpus of publications for which we automatically classified the applied method as well as the use of open-science principles through machine learning. As such, we followed a three-step approach. First, we collected published papers and their citation counts for both communication science and psychology. Second, we coded a sample of these papers for their reported method, design, and their reporting of open data, open materials, and pre-registration. Third, we trained supervised models on these variables to predict the remainder of the sample.
With the exception of publications' full texts, all data, materials, and trained models have been made publicly available under https://osf.io/gdjyq/. This repository also includes an earlier version's pre-registration of the research design.

Articles
Articles were collected from a total of 20 journals from communication science 2 as well as three psychology journals 3 for comparative reasons. Journals were chosen on the basis of their impact factor following a two-fold logic: First, we aim to connect our study to results from previous research endeavors that used a similar sampling (e.g. Chan & Grill, 2020; Markowitz et al., 2021; Song et al., 2020). Second, we decided to analyze high-impact journals since impact still appears to be the best indicator of a publication's relevance to a discipline (see Günther & Domahidi, 2017).
For an article to end up in the data collection, it had to have a DOI assigned, represent original research, and be published in print between January 2010 and August 2020. 4 As a starting point, we were provided the corpus of a recent publication overview of communication science (Song et al., 2020); therein, the authors scraped full texts from papers that had been published between 2010 and 2019 in one of the 20 journals with the highest impact factors according to the Clarivate Journal Citation Report (previously ISI Web of Science). We were provided with a complete document-feature matrix, containing a total of 8819 articles, after lower-case conversion, the removal of stop words, punctuation, and symbols, as well as lemmatization. In contrast to the authors' original pre-processing, however, we were provided with the corpus before the removal of words that appear in more than 99.9% or less than 1% of the documents, which would have eliminated terms such as 'osf' or 'pre-registered.' We then manually collected more recent articles that were published in the aforementioned journals between January 2019 and August 2020, omitting online-first publications as we do not expect those to show strong variation in citations. We also employed scraping tools to collect published papers from "three important psychology journals: Psychological Science (PSCI), Journal of Personality and Social Psychology (JPSP), and Journal of Experimental Psychology: Learning, Memory, and Cognition (JEP:LMC)" (Open Science Collaboration, 2015). The same pre-processing steps were applied to this additional set of communication papers and the new set of psychology papers, including lower-case conversion, the removal of the exact same stop words, punctuation, and symbols, as well as lemmatization following the original corpus's authors (Song et al., 2020).
In total, we ended up with one document-feature matrix consisting of N = 14,270 unique documents (n = 9516 from communication science) and 451,089 unique words (features).

Citations
On 29 April 2021, we captured the number of citations, along with the DOI-registered publication year to overcome any potential scraping issues, from CrossRef using the respective API and the rcrossref package (Chamberlain et al., 2020). All articles were successfully identified (also given that we only used articles with a DOI and omitted online-first publications). A total of N = 743 publications (n = 563 from communication science) were never cited, though. On average, papers in communication science were cited M = 23.4 times (SD = 47.8; Md = 11), whereas papers in psychology were cited M = 39.2 times (SD = 76.4; Md = 19).
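For illustration, the following is a minimal sketch, not our exact collection script, of how such citation counts can be retrieved from the CrossRef API via rcrossref; the DOIs shown are placeholders, and column names follow current rcrossref conventions.

```r
library(rcrossref)

# Placeholder DOIs; in the study, the DOIs of all collected articles were used.
dois <- c("10.xxxx/example-article-1", "10.xxxx/example-article-2")

# cr_works() queries the CrossRef API; 'is.referenced.by.count' holds the
# number of citing works known to CrossRef at the time of the query.
works <- cr_works(dois = dois)$data

citations <- data.frame(
  doi   = works$doi,
  year  = substr(works$issued, 1, 4),
  cites = as.integer(works$is.referenced.by.count)
)
```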

Coding
To create a gold standard (i.e. a ground truth), we extracted a stratified random sample from the collected articles, keeping the ratios of articles constant across all years of publication and across both disciplines. Following recommendations that a gold standard contain at least 50 relevant cases per category (Manning et al., 2008), we coded a total of 1042 unique publications to arrive at a respectively large number of publications. Following established procedures in computational social science, the validity of the automated coding is judged against this sample of manually analyzed texts.
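A minimal sketch of such a stratified draw, using dplyr as one possible tool (an assumption; any grouping approach works) and a toy stand-in for the article metadata:

```r
library(dplyr)

# Toy stand-in for the article metadata (the real corpus holds N = 14,270 rows).
articles <- expand.grid(
  id         = 1:200,
  year       = 2010:2020,
  discipline = c("communication", "psychology")
)

# Keep the sampling fraction constant within every year-by-discipline stratum,
# so that the ratios of articles remain constant across strata.
gold_sample <- articles %>%
  group_by(discipline, year) %>%
  slice_sample(prop = 1042 / 14270) %>%
  ungroup()
```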
Given each article's DOI and thus access to its PDF, three human coders coded whether (1) data had been shared, whether (2) materials had been shared, and whether (3) the reported study had been pre-registered. For each article, the coders also coded whether results from (4) a content analysis, (5) an observational study, or (6) a survey were reported, whether (7) at least one of the reported studies followed an experimental design, and whether the article contained a (8) qualitative, (9) quantitative, or (10) computational study. None of the 10 categories were mutually exclusive, so that the coding scheme would also capture manuscripts reporting a content analysis and a survey or a combination of qualitative and quantitative methods. For pre-registration, the coders were instructed to accept both fully registered reports and partly pre-registered research designs, hypotheses, or research questions. Finally, coders were also asked to note any empirical approaches deviating from this coding scheme, but these notes did not necessitate an extension of the codebook.
All three coders coded a shared subsample of 101 articles to estimate intercoder reliability, as is common for content analyses. Intercoder reliability, reported as Krippendorff's alpha and evaluated against .667 as "the lowest conceivable limit" (Krippendorff, 2004, p. 429), was sufficient in all but one category, ranging from α = .67 (open data) to α = .96 (quantitative), despite some categories accounting for very small numbers of occurrences in this shared sample (e.g. pre-registration; n = 4; α = .75). The observational-study category, with α = .58, was below this common threshold and was thus eliminated from further use in this study. A detailed breakdown of all coded categories, their occurrences in the coded sample, as well as the intercoder reliabilities can be found in the supplementary materials (Table S1). The full codebook is available in the OSF repository.
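For reference, a minimal sketch of such a reliability check for one binary category, using the irr package (an assumption; any implementation of Krippendorff's alpha works) and toy codings:

```r
library(irr)

# Toy codings: rows = the three coders, columns = articles in the shared
# subsample (only eight shown here); 1 = category present, 0 = absent.
ratings <- rbind(
  coder1 = c(1, 0, 0, 1, 0, 1, 0, 0),
  coder2 = c(1, 0, 0, 1, 0, 0, 0, 0),
  coder3 = c(1, 0, 1, 1, 0, 1, 0, 0)
)

# Krippendorff's alpha for nominal data; values >= .667 were treated as
# "the lowest conceivable limit" (Krippendorff, 2004).
kripp.alpha(ratings, method = "nominal")$value
```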

Training
We trained different classifiers, separately for each of the remaining nine variables. That is, we optimized classifier performance both per open-science variable and per methodological variable through the use of different pre-processing steps, all building on the initial document-feature matrix (DFM) that already reflected the pre-processing steps of lower-case conversion, removal of stop words/punctuation/symbols, and lemmatization. This per-classifier optimization to adapt to the respective classification problem is common in machine learning, particularly in natural language processing (Manning et al., 2008). As such, all presented machine-learning endeavors resemble bag-of-words approaches based on single terms, which were reduced per category to smaller sets of words likely indicative of the respective category. Coded data were split per category into a weighted training set containing 80% of the coded publications and a weighted test set containing the remaining 20%. For every coded variable, every additional step of pre-processing as well as the algorithm and the settings used for training can be found in the supplementary materials (S2). The final versions of the nine models have also been shared in the online appendix.
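As a simplified (unweighted) illustration of the split, assuming a data frame 'gold' of manual codings with one row per coded publication:

```r
set.seed(1)  # for reproducibility of the illustrative split

n_gold    <- nrow(gold)                                    # 1042 coded articles
train_ids <- sort(sample(seq_len(n_gold), round(0.8 * n_gold)))
test_ids  <- setdiff(seq_len(n_gold), train_ids)
```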
Open-science variables
Open data. The final classifier to determine whether an article publicly shared data built on a DFM reduced to features following a two-step approach. First, features indicative of data (e.g. data, empirical), openness (e.g. open, accessible), or a sharing repository (e.g. osf, dataverse) were selected and upweighted by a factor of 10. This should help to identify all the relevant publications but is expected to also include several false positives. Therefore, second, the remainder of all features were reduced to their word stems before removing all terms appearing in more than 20% or less than 1% of all documents. This was to consider decisive patterns which are neither too common to be indicative of shared data, as the manual coding only pointed to 73 out of 1042 publications (7%) sharing their data, nor too uncommon to be relevant. These terms were weighted by a factor of one. In total, this procedure yielded 5761 features and a sparsity of .96. A binary multinomial Naive Bayes algorithm was applied, implemented through the respective quanteda functions in R (Benoit et al., 2018). Given the small number of positively coded publications in the training data, accuracy is reflected mainly in recall rather than precision, with the classifier performance being far from ideal yet much better than chance, as the area under the ROC curve (Robin et al., 2011) indicates (P = .24; R = .53; F1 = .33; ROC AUC = .92; accuracy = .86). Results on open data in this study thus need to be handled with great care. For our three main dependent variables, we performed additional spot checks of classified material to unravel what kind of studies were categorized as containing open science. We thereby analyzed 30 classified studies for each variable: 15 that were classified as containing an element of open science and 15 as not containing it. Most true positives classified by the open-data classifier contain links to an online repository where the dataset is stored. A second set of studies contains a link to a publicly available dataset that was not created or published by the authors. Finally, the spot check revealed that a substantial share of false positives contain references to a dataset from another study and/or other authors that is not available. Individual results are available in the OSF repository.
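The following sketch illustrates this kind of pipeline in quanteda; it is not our exact code, the keyword list is abbreviated, and 'dfm_all', 'gold', and the split indices from the sketch above are assumed to exist (textmodel_nb() is located in quanteda.textmodels in current package versions).

```r
library(quanteda)
library(quanteda.textmodels)

# Abbreviated, illustrative keyword list for data, openness, and repositories.
open_data_terms <- c("data", "empirical", "open", "accessible", "osf", "dataverse")
dfm_keys <- dfm_select(dfm_all, pattern = open_data_terms)

# Stem the remaining features and keep terms occurring in 1%-20% of documents.
dfm_rest <- dfm_all |>
  dfm_remove(open_data_terms) |>
  dfm_wordstem() |>
  dfm_trim(min_docfreq = 0.01, max_docfreq = 0.20, docfreq_type = "prop")

# Combine both feature sets and upweight the keyword features by a factor of 10.
dfm_model <- cbind(dfm_keys, dfm_rest)
dfm_model <- dfm_weight(dfm_model,
                        weights = setNames(rep(10, nfeat(dfm_keys)),
                                           featnames(dfm_keys)))

# Multinomial Naive Bayes on the training split, evaluated on the held-out 20%.
nb_fit <- textmodel_nb(dfm_model[train_ids, ],
                       y = gold$open_data[train_ids],
                       distribution = "multinomial")
pred <- predict(nb_fit, newdata = dfm_model[test_ids, ])
```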
Open materials. For open materials, a similar two-step reduction of features was applied. First, features indicative of the respective materials (e.g. scripts, stimuli, code), openness (e.g. open, accessible), or a sharing repository (e.g. osf, dataverse) were selected and upweighted by a factor of 10 to help identify relevant publications. Second, the remainder of all features were reduced to their word stems before removing all terms appearing in more than 20% or less than 1% of all documents to consider decisive patterns in line with the manual coding of 219 out of 1042 publications (21%) which shared materials. Again, these latter terms were weighted by a factor of one. The procedure yielded 5760 features and a sparsity of .96. As with open data, a binary multinomial Naive Bayes algorithm was applied for open materials. Performance was slightly better, though, with higher precision coming at the cost of some overall accuracy (P = .40; R = .56; F1 = .47; ROC AUC = .86; accuracy = .73).
A spot check of classified material revealed that mainly two types of documents were categorized as providing open materials. First, studies that provide a link to an online repository (e.g. OSF) that contains stimulus material or questionnaires. Second, interview-based studies that contain the posed questionnaire items either in the appendix or as a table in the main document. One source of misclassification originates from studies that only provide exemplary items and not the complete questionnaire in the text. Those studies were classified as containing open materials by the algorithm but not by human annotators.
Pre-registration. The classifier to determine whether an article reported pre-registering its endeavors yielded its best outcome with a DFM reduced solely to a specific set of vocabulary. As such, the DFM was filtered on features suggested by prior research (Park et al., 2018) and various journals' suggestions on how to formulate respective sections within data-availability statements ('*osf*,' '*aspredicted*,' 'pre*regist*,' 'registered*,' 'report,' 'plan*,' 'hypothes*s,' 'power,' 'analys*s'). The resulting DFM contained 374 features at a sparsity of .99. Across all categories, the number of positive cases in the training materials was smallest for pre-registered studies (only 21 out of 1021 cases were manually coded as pre-registered). A Support Vector Machine based on L2-regularized logistic regression (Fan et al., 2008) was used for training, resulting in satisfactory performance metrics (P = .75; R = .75; F1 = .75; ROC AUC = .76; accuracy = .99).
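A minimal sketch of this keyword filtering and an L2-regularized logistic regression, here via the LiblineaR package, which implements the library described by Fan et al. (2008); the objects from the previous steps are again assumed, and this is an illustration rather than our exact training setup.

```r
library(quanteda)
library(LiblineaR)

# Keyword patterns as described above (glob matching is quanteda's default).
prereg_terms <- c("*osf*", "*aspredicted*", "pre*regist*", "registered*",
                  "report", "plan*", "hypothes*s", "power", "analys*s")
dfm_prereg <- dfm_select(dfm_all, pattern = prereg_terms)

# type = 0 selects L2-regularized logistic regression in LIBLINEAR.
fit <- LiblineaR(data   = as.matrix(dfm_prereg[train_ids, ]),
                 target = gold$preregistered[train_ids],
                 type   = 0)

pred <- predict(fit, as.matrix(dfm_prereg[test_ids, ]))$predictions
```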
The spot check of the pre-registration classifier revealed that all 30 manually analyzed texts were classified correctly and that in all true positives the studies contained a link to the pre-registration on OSF.
Methodological variables
Content analysis. To estimate employed methods, a slightly different two-step route was taken.
First, available features were reduced to method and measure vocabulary typically employed in the respective approaches. For content analysis, such vocabulary subsumed the process of coding as well as the estimation of intercoder reliability ('content,' 'analysis,' 'content*analy,' 'coder*,' 'codebook,' 'int*cod*,' 'reliab*,' 'agreement,' 'classif*,' 'holst*,' 'cohen,' 'cohen*s,' 'krippendor*,' 'fleis*,' 'brennan*'). Second, as the frequency of such vocabulary, as opposed to its mere presence, should not be indicative of the employed method, the reduced DFM was transformed into a Boolean representation. In other words, a publication mentioning its codebook three times was represented in the DFM in the same way as a publication mentioning its codebook only once. The resulting DFM consisted of 198 features and a sparsity of .99. A logistic neural network with one hidden layer and five neurons, initial weights of .6, a decay of .005, and a maximum number of 5000 iterations was trained using the nnet package in R (Venables et al., 2002). The training built on 194 out of 1024 (19%) positively coded content-analytic publications. Performance, again, was not ideal yet much better than chance (P = .52; R = .54; F1 = .53; ROC AUC = .74; accuracy = .82).
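A minimal sketch of this Boolean representation and the network, using quanteda and nnet with the reported settings; the keyword list is abbreviated, the input objects are again assumed, and the target is assumed to be a 0/1 indicator.

```r
library(quanteda)
library(nnet)

# Abbreviated method vocabulary; transformed to presence/absence (Boolean).
ca_terms <- c("content", "analysis", "coder*", "codebook", "int*cod*",
              "reliab*", "agreement", "classif*", "krippendor*", "cohen*")
x <- convert(dfm_weight(dfm_select(dfm_all, pattern = ca_terms),
                        scheme = "boolean"),
             to = "data.frame")[, -1]        # drop the doc_id column

# One hidden layer, five neurons, initial weight range .6, decay .005,
# up to 5000 iterations (the settings reported in the text).
fit <- nnet(x = x[train_ids, ],
            y = gold$content_analysis[train_ids],
            size = 5, rang = 0.6, decay = 0.005, maxit = 5000)

pred <- as.numeric(predict(fit, x[test_ids, ]) > 0.5)
```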
Qualitative study. To estimate whether a publication reported a qualitative empirical study, a broad set of features, reduced to their word stems, was employed: the DFM was reduced to terms appearing in at least 10% of all documents yet not in more than 50% of all documents. This range subsumes the share of 168 out of 1042 (16%) positive manual codings. In total, 1313 features (sparsity = .78) were used for training a logistic neural network with one hidden layer and 10 neurons, initial weights of .5, a decay of .005, and a maximum number of 5000 iterations. Performance was suboptimal yet, overall, satisfactory (P = .61; R = .61; F1 = .61; ROC AUC = .48; accuracy = .88).
Quantitative study. Similar to classifying qualitative studies, features to train a classifier for quantitative studies were also reduced to their word stems before removing features appearing in less than 20% or more than 80% of all documents. This resembled the manual coding insofar as a majority of 772 out of 1042 publications (74%) were categorized as quantitative in nature. The resulting 838 features (sparsity = .60) were used to feed the training of another logistic neural network with one hidden layer and 10 neurons, initial weights of .5, a decay of .005, and a maximum number of 5000 iterations. This network performed best across all reported categories (P = .92; R = .91; F1 = .91; ROC AUC = .65; accuracy = .87).

Results
The first set of research questions asks for descriptive insights into how often published studies and particularly experiments have been pre-registered (RQ1a) and how often published studies have shared their data (RQ1b) and research materials (RQ1c), both in general and per applied method. Analyses for RQ1 are based on communication-science publications only.
Several findings are noteworthy (Figure 1). First, sharing materials is the most common of the three open-science practices in communication science, with roughly one in five publications referring to openly available materials. While sharing data is much less common, with only about 3-5% of publications doing so, pre-registration is practically non-existent in communication science. Second, although few in number, almost all pre-registered studies are experimental studies. In contrast, sharing materials is more common for studies not following an experimental design, while some publications which share data follow an experimental paradigm and others do not. Third, sharing materials has been rather common for surveys or interview studies (i.e. questionnaires, interview guidelines) as well as for content analyses (i.e. codebooks, coding instructions). A similar tendency can be identified for sharing data. Fourth, there are only minor differences in sharing practices between qualitative and quantitative research.
These findings stand in some contrast to results from psychology. As in communication science, sharing materials is more common than sharing data in psychology, albeit to a lesser extent. Sharing data, however, is much more frequent, with about one in four publications doing so. The majority of publications sharing either data, materials, or both follow an experimental design, which is not surprising given the specific selection of psychology journals. Finally, pre-registration is reported in an increasing share of publications. That is, among surveys or interview studies (the largest group of psychology publications in this sample), 24% of publications in 2020 were classified as reporting pre-registration, 61% of which were also classified as reporting an experimental study design.
The second set of research questions asked how the implementation of open-science principles relates to subsequent citations in either communication science (RQ2a) or psychology (RQ2b). Building on the descriptive overview (for journal-specific descriptives, see supplementary materials S5), we ran separate regression models in which we regressed citation counts on age (2020 minus the year of publication) as well as on dummy codes per journal, per classified method and study type, for whether a publication reported an experiment, and per open-science principle. Likelihood-ratio tests indicated the presence of overdispersion; instead of Poisson models (as suggested in our pre-registration), we hence employed negative binomial regression models (Table 1).
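A minimal sketch of one such pair of models, assuming a data frame of communication-science articles with illustrative variable names for the classified dummies:

```r
library(MASS)

# 'comm' is assumed to hold one row per communication-science article, with
# the raw citation count, the journal, and the classified dummy variables.
comm$age <- 2020 - comm$pub_year

m1 <- glm.nb(citations ~ age + journal, data = comm)
m2 <- glm.nb(citations ~ age + journal +
               content_analysis + survey + qualitative + quantitative +
               experiment + open_data + open_materials + preregistered,
             data = comm)

# Likelihood-ratio comparison of the baseline and the extended model.
anova(m1, m2)
```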
Findings show that age is one of the strongest and most consistent predictors of citations. Given this expected baseline, publication in various journals also positively affects subsequent citations, such as in both the Journal of Personality and Social Psychology and Psychological Science over the Journal of Experimental Psychology (i.e. the reference category), as well as particularly in the Journal of Computer-Mediated Communication, the Journal of Advertising, the Journal of Communication, and New Media & Society over Communication Monographs as the reference category. These predictors already account for totals of 40% (communication science) and 59% (psychology) of explained variance (using the pseudo-R² by Nakagawa & Schielzeth, 2013).
Although both second models explain the data significantly better, the particular methods, study types, the reporting of experimental studies, and open-science principles as additional dummy predictors do not explain much additional variance (now 41% for communication science and 60% for psychology). As such, open-science principles do not seem to be a determining factor for subsequent citations.

Post-hoc analysis
As per a reviewer's suggestion, a post-hoc analysis was introduced after pre-registration to capture potentially different patterns from before and after the debate around open-science principles took off. As such, we ran additional and separate regression models similar to those from before (i.e. Table 1) but reduced to two portions of both disciplines' datasets, representing publications from before 2014 and from 2014 onward. The split in 2014 was chosen because, by then, early-adopting journals in psychology had started to introduce open-science badges. The reduced datasets, however, led to overfitted models with convergence issues, so that we applied two additional changes to this post-hoc analysis: First, instead of introducing every journal into the model, we re-coded publications into those from higher-impact journals in both psychology (i.e. the Journal of Personality and Social Psychology and Psychological Science) and communication science (i.e. the Journal of Computer-Mediated Communication, the Journal of Advertising, the Journal of Communication, and New Media & Society) vis-à-vis publications in one of the other journals. Second, for the models with publications from before 2014, pre-registration had to be removed entirely from the dataset as no publication was classified as such.
Results then show (Table 2), again, that age and the corresponding journal are the strongest predictors of citation counts. The influence of method in these models, when compared to the models spanning the entire 10 years, is rather similar as well, with one notable exception: quantitative studies do not predict subsequent citation counts uniformly, as this effect disappeared in the second time period. With regard to open-science principles, not much has changed, though. One interesting aspect, however, is the explained variance, which is considerably larger for publications in/after 2014 than for older publications. One might interpret this cautiously as another example of the temporal dynamics of age and visibility on citations, as opposed to the open-science principles studied here.

Discussion
Descriptive results confirm the anecdotal evidence that communication science largely lacks open-science principles, which are expected to inform an environment of transparency, openness, and reusability. As only a fraction of studies provides open materials, while even fewer have provided open data or pre-registered their endeavors, it is safe to assume that an open communication science has not entered the mainstream of our discipline as of today. Still, the identified shares of studies adhering to open-science principles are substantially larger than suggested by prior work (Markowitz et al., 2021). That is, while this previous study found that only 5.1% of the analyzed studies employed open-science vocabulary (matching our categories of open data and open materials), our data indicate that about 20% of studies share their materials while only 3-5% share their data. As indicated above, this difference might stem from the dictionary-based approach used, which is likely to underestimate the reliance on open science. As such, our paper extends the perspective offered by Markowitz et al. and provides a more detailed account of the adherence to open-science practices by also differentiating across methodological approaches. Moreover, we compared the adherence to open-science practices in communication science to psychology as a benchmark discipline. Psychology, in turn, shows a constant and high share of studies that make their data and materials available, along with a recent and stark increase in pre-registrations. Given that the debate on open-science endeavors started much earlier in psychology than in communication science, the difference between the two disciplines could simply reflect a delay, with recent initiatives such as badges, conference themes, or special issues paving the way forward. However, the overall lack of adherence to open-science principles within communication science might also be due to a difference in research designs. While psychology, particularly in the journals analyzed here, is largely driven by experimental survey research, communication science builds more heavily on methods of content analysis.
Here, data sharing can be much more difficult, though, for example due to copyright protection, restrictive terms of service, or ownership liabilities (van Atteveldt et al., 2019, 2021). Various scholars have recently suggested measures to balance out these obstacles, for instance by publishing metadata, sharing small validation datasets, or providing remote access to servers that contain the original data (van Atteveldt et al., 2021). Indeed, in 2020 data sharing was highest, albeit small in absolute numbers, for content analyses.
In trying to unravel the relationships between the replicability of studies by means of open-science principles and their academic impact by means of citation counts (Aksnes et al., 2019), this study also seeks to contribute to a better understanding of a transparent and replicable scientific environment. In doing so, the results indicate that there is no strong relationship between adhering to open-science principles and the subsequent number of citations a study receives. Instead, age (as in the time lag common to citations) and journal selection strongly relate to the number of subsequent citations.
These findings also oppose previous studies from other disciplines suggesting that scholars adhere to open science especially in their strongest publications, which then, due to quality rather than mere availability, receive an increased number of citations (e.g. Gargouri et al., 2010). Instead, these insights presumably echo the complex nature of citation behavior and the previously mentioned variety of citation motives, between normativity (i.e. citing as currency for intellectual debts) and social constructionism (i.e. citing to position oneself) (Erikson & Erlandson, 2014). Especially communication science as a "post-discipline […] has been too diverse to succumb to a single vision of science or discipline" (Waisbord, 2019, p. 131), thus potentially asking more pointedly to show adherence to specific schools of thought and to position oneself in accordance with certain streams of research through more social-constructionist forms of citation behavior. As such, our results indicate that scientific impact by means of citation counts resembles a rather different facet of research quality than, for example, transparency, openness, and reusability. This seems unfortunate, given that the current academic environment is driven largely by means of scientific impact.

What follows is that if the communication-science community wants to value open-science principles more strongly, much remains to be done (see also Fox et al., 2021). Badges, as seen in the rather constant sharing patterns in psychology (Figure 2), have not made a big difference, given that levels of sharing data and research materials have been equally high before and after journals started to introduce them. This also becomes evident when depicting our results in two time periods (Table 2). What has presumably made a difference is the prominent and ongoing debate around a 'replication crisis,' which started to spark attention right before a visible increase in pre-registrations (e.g. McEwan et al., 2018). To that end, journals, funders, and scholars should take transparency, openness, and reusability much more into account when judging the soundness of research, such as in calls, during review processes, or when selecting sources to cite. However, they should do so with great care, given communication science's manifold landscape of empirical methods. This also echoes a previous call, again from psychology, proposing that reviewers should demand certain levels of transparency when it comes to the applied methodology, the used instruments, and the analyzed data (Morey et al., 2016). And while open-science principles cannot always be followed, as total openness might for example affect marginalized groups or researchers (Fox et al., 2021), it can be made a requirement that they are at least addressed, so that the reasons for abstaining from sharing data or materials are also made transparent.
As such, more debate about how research can be made transparent and open is necessary. To that end, this study adds empirical foundations for communication science. In addition, this study also makes a methodological argument that some, but not all, of the proposed predictors can be classified (semi-)automatically. Such automation could potentially also help journal editors to award badges in retrospect, so that older publications from before badges were introduced could be highlighted for sharing their data or materials. Yet, such automation endeavors are clearly dependent on the specific variables: pre-registration, for example, worked rather well, whereas the open-data classifier faced severe precision issues by falsely treating references to datasets from other studies and/or other authors as open data.
In that, our findings as well as all of this study's shared tools do not come without limitations. Given that not all of our annotations and training efforts yielded satisfactory outcomes, no classifications could be provided for observational studies and computational methods. The remainder of the trained models, then, reached satisfying levels of accuracy, recall, areas under the ROC curve, and, in most cases, also precision. Building on these classification performances, our classifiers were able to code acceptable rates of true positives and thus tend to underestimate rather than overestimate the occurrence of the coded categories. Moreover, our tools build on single-term bag-of-words training data. While this allows us to share publications' document-feature matrices, as a result, the context in which a term was used could not be considered. Future research could take this into account to train classifiers with n-grams or more holistic natural-language approaches, such as word embeddings. Ultimately, our study included publications only if they had already been included in a journal's issue. In doing so, we might have painted too dark a picture of the status quo in communication science. The discipline's discourse around open science has recently gained further traction, not least due to the 2020 ICA theme of open communication. Whether the current discourse has sustainable effects, however, not only remains to be seen but also remains to be implemented by each and every one of us.