Examining Sentiments and Popularity of Pro- and Anti-Vaccination Videos on YouTube

Vaccine misinformation on social media poses a significant obstacle to efforts to raise vaccine coverage rates. This research studies the interlinkages between pro- and anti-vaccine YouTube videos to help public health professionals explore new ways to reach anti-vaccine and vaccine-hesitant audiences. Using YouTube's API, we retrieved 9,489 recommended videos from 250 seed videos using keywords such as "vaccines" and its derivatives. We then manually identified 1,984 videos directly related to vaccination and categorized their vaccine sentiment as pro-vaccine, anti-vaccine, or neutral. Results show that 65.02% of the videos were anti-vaccine, only 20.87% were pro-vaccine, and 14.11% were neutral. Anti-vaccine videos were significantly more prevalent in the "News & Politics" and "People & Blogs" video categories, while pro-vaccine videos were more prevalent in the "Education" and "Science & Technology" categories. Results also showed that anti-vaccine videos have higher values of closeness centrality (p<0.05), suggesting that watching an anti-vaccine video will likely lead to more anti-vaccine video recommendations. Moreover, videos with a higher dislike/like ratio had higher odds of being pro-vaccine (OR=3.912), suggesting that pro-vaccine videos are more poorly received on YouTube than anti-vaccine videos. This study is the first to examine the network of vaccine-related videos on YouTube and their centralities. The results highlight some possible limitations of YouTube-based vaccination awareness campaigns and also emphasize the need to diversify how YouTube makes its recommendations to help viewers break out of the anti-vaccine "bubble."


INTRODUCTION
Vaccination is considered the greatest public health achievement of the 20th century by the Centers for Disease Control [1]; it is one of the safest public health interventions and effectively reduces mortality, disability, and poverty [2]. However, negative sentiment towards vaccines has become an increasing concern over the past 20 years. This sentiment is widely considered to originate from a now-retracted paper published in the influential medical journal The Lancet, whose authors falsely claimed a positive correlation between the Measles-Mumps-Rubella (MMR) vaccine and autism. The results of the paper were widely shared within an influential online autism community that sponsored American talk-show host Jenny McCarthy as its spokesperson in the early 2000s [3,4]. This led to a growing interest among researchers in examining media, celebrity culture, and the Internet to understand how and why misinformation spreads [5][6][7]. The following is a literature review of the socioeconomic and attitudinal variables that help us understand vaccine hesitancy from different paradigms.

Three Paradigms to Interpret Anti-Vaccination Movement
The anti-vaccination movement is first and foremost a health systems delivery issue. Under the evidence-based paradigm, which holds that evidence provides the basis for effective treatment, healthcare providers tend to see the anti-vaccine movement as a population-based "knowledge deficit" exacerbated by health information-seeking on the Internet. Literature produced under the evidence-based paradigm emphasizes the responsibility of healthcare providers to present facts to debunk myths, to take time to listen to and address concerned parents' doubts, and ultimately, to earn the trust of patients [8][9][10][11][12]. However, systematic reviews of 17 studies suggested limited effectiveness of educational interventions on vaccine coverage [13]. More recently, a randomized clinical trial indicated that pro-vaccine messages might even worsen misconceptions for people who are already anti-vaccine [14].
A second popular interpretation of the anti-vaccination movement comes from the social sciences and qualitative researchers in public health. Some propose to incorporate communications theory and marketing strategies to instill trust and convenience in the vaccine-hesitant [15], while others use social construction theories and postmodernism to explain that the anti-vaccination movement is contextual [6]. Under this paradigm, researchers observe that knowledge is perceived as situational and subjective; therefore, an individual's belief about vaccines is actively generated and interpreted to best suit their personal belief system, and values are often constructed through emotional appeal over evidence [16,17].
Moving along the same line of inquiry, public health professionals recognize that risk-negation increases vaccine-hesitancy [18]. Sociologists and public health scholars employ the concept of "risk society" [19] to explain that as civil society progresses, getting vaccine is beyond the control of technocracy or meritocracy [20][21][22]. Instead, curbing vaccine hesitancy is not a simple matter of "educate and vaccinate" health service announcements, but rather it requires a targeted and dialogical communication, tailored for the concerned individuals [23].
#SMSociety '17, July 28-30, 2017
Finally, many researchers have begun looking at the medium in which misinformation resides, namely, social media. Social media has challenged the models, theories, and processes underpinning decision-making theories [24]. Social media fosters an atmosphere of reciprocal support and mutuality [25], creates virtual networks that allow like-minded people to form clusters/cliques [26], and influences others through information diffusion in a scale-free environment [27]. A large-scale study of 253 million Facebook users has demonstrated that the ecology of social media allows information to propagate through weak ties more than strong ties [28]. Furthermore, comparing the cascade effect between Digg and Twitter, researchers have observed that Digg's dense network leads to a faster but limited spread of information, while Twitter's loose network leads to a slower but wider spread of information, confirming that network structures can profoundly influence reach [29,30].
In the past two decades, properties of network theory have been used extensively by social epidemiologists; for example, the term "social contagion" has been used to explain the spread of non-infectious conditions and behaviors such as obesity [31], depression, and smoking cessation [32]. Others have looked at the causal pathways of network consolidation on the diffusion of opinions, norms, and practices using web-based simulator games [33]. Even though social network analysis is in its infancy in public health, researchers are increasingly recognizing that a network's structural components are an important determinant of health disparities and health beliefs in communities [34][35][36]. Social media distinguishes itself from traditional health interventions by having scale-free properties that enable us to look at the centrality of the actors (nodes) that can influence opinions and norms [37].

Social Media and YouTube as an Untapped Opportunity for Public Health Campaigns
It is estimated that 71% of Internet users have searched for information about their health online [38]. For many, social media has also become one of the first places to look for health information, after going to a clinic or asking family and friends for advice [39,40]. As noted above, social media uniquely enables instant, free, dialogical interaction, through which communities are created [41]. In the context of public health, there are many possible applications for social media. For example, it can provide users with anonymous information and social support for health conditions that are often stigmatized, such as sexual and mental health [42]. Social media also creates an environment of "legitimate peripheral learning" for those who wish to participate only partially in learning about a health topic [41,43]. It also enables the exchange of health information without geographical constraints [44]. Each of these examples serves as a direct or indirect mechanism to empower the patient and the public.
Despite the benefits of seeking health information online, health care professionals, whose daily practice is rooted in the positivist paradigm, follow evidence-based practice. This ideology appears to be at odds with fast-paced social media platforms, which have no submission threshold: anyone can post and share vaccine information, and claims that offer no evidence, hypothesis testing, or peer review can still be widely distributed.
As one of the most popular social media platforms, YouTube plays a key role in disseminating public health information (and misinformation). YouTube dominates the current video-based social media landscape. As of 2016, there are 800 million active monthly users and 4 billion videos viewed on YouTube per day, making it one of the widest-reaching, most demographically diverse user-generated content platforms [45]. It was on YouTube that the North American anti-vaccine movement was able to upload and share conference recordings free of charge with a wider audience. Compared to the past, when tapes of proceedings had to be ordered by mail, YouTube allowed non-profit autism awareness groups to easily host instant discussions, comments, and information-sharing as early as 2005 [46].
In this research, we use a networked mindset to tackle vaccine hesitancy. We examined literature regarding public health promotion that specifically focuses on YouTube as a popular platform for information sharing, and from this research raise the question: What is the sentiment towards vaccines on YouTube, and are anti-vaccine videos more central, and thus more likely to be found, than pro-vaccine videos? Our objectives are to quantify the current sentiment towards vaccines on YouTube and then to explore differences and similarities in vaccine sentiment in relation to various video properties, such as video category, view count, and network properties.

RELATED WORK
In this section, we introduce two areas of research relevant to this paper. First, we discuss sentiment analysis of text, images, and videos on vaccination; second, we briefly review vaccine awareness interventions and outreach in the community and on social media.

Vaccine Sentiment Analysis
In response to the rise of anti-vaccine sentiment, the World Health Organization (WHO) organized a Strategic Advisory Group of Experts (SAGE) on Immunization to monitor vaccine confidence, vaccine complacency, and to assess accessibility of vaccines worldwide [47]. While traditional surveys are useful to gauge sentiment in global health settings [48], our online lives are increasingly influencing offline behaviors and decision-making [49].
Researchers have looked at Facebook and Twitter, where 30-35% of the content is anti-vaccine [50][51][52]; similar results were found on MySpace and blogs [53][54][55][56]. Regarding image-based social media platforms, the only research known to us is by Guidry et al., who observed that 75% of the vaccine sentiment on Pinterest is negative [57].
In video-based social media, early research showed that YouTube exhibited 32% negative vaccine sentiment in 2007 [46]; the prevalence of negative sentiment rose to 51.7% in 2013 [58] and 65.5% in 2017 [59]. Although the samples of the three studies cited above are somewhat different and the studies are not set up for direct comparison, the overall results suggest that negative sentiments regarding vaccination on YouTube are potentially on the rise.

Vaccine Promotion Strategies in the Community, on the Internet, and Social Media
A comprehensive review-of-reviews looked at 15 published literature reviews that analyzed hundreds of interventions addressing vaccine hesitancy. Results showed that it is inconclusive whether any interventions work to address vaccine refusal and vaccine hesitancy [60]. In the review, traditional interventions remain the most prominent: educational (in-person or in-patient), school-based programs, or distributing reminders and standing orders, which are predominantly patriarchal and grounded in the knowledge-deficit framework [4].
Due to the prominence of social media as a go-to source for health information and discussions, public health practitioners have developed and tried a wide array of interventions on social media to promote vaccine awareness. Some interventions include building informational websites, online portals, mobile applications, text reminders, mailing lists, and advertisements [61,62]. Most interventions were able to monitor changes in attitudes towards vaccines but rarely the actual uptake of vaccines [61]. Furthermore, only one study has used YouTube as a source of intervention to observe whether watching anti-vaccine videos can change perceptions towards vaccines. However, the test subjects were first-year medical students who were already predominantly pro-vaccine; thus, after watching anti-vaccine videos, the participants' views on vaccines did not change [63].

Retrieving Vaccine-related Videos
Our data retrieval was based on the "recommended videos" function on YouTube. YouTube relies on a number of parameters to suggest videos tailored to individuals, maximizing viewership, watch-time, and viewer retention. Their proprietary algorithm recommends videos based on related topics, co-viewing information, as well as information about users' watch and search history, favorited videos, likes, and subscriptions [64].
We used a program called Netvizz [65] to collect a YouTube video network. This network consists of nodes that are YouTube videos and connections that link "related" videos according to YouTube's proprietary algorithm. To build this network, the program starts with the collection of seed videos that correspond to our search query. In our case, the search query was as follows: Immun* OR vaccin* OR vaxx*. In total, we retrieved 250 seed videos that corresponded to our search criteria. The number of seed videos was based on the five iterations of data retrieval using Netvizz. During each iteration, the program retrieved 50 items that corresponded to our search query. Once the seed videos were retrieved, Netvizz then queried each video to retrieve their corresponding lists of "related" videos with a crawl depth of one layer and five iterations. The resulting network consists of 9,489 nodes.
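The crawl described above can be illustrated with a minimal sketch. This is not Netvizz's implementation: `fetch_related` and the `RELATED` table are hypothetical stand-ins for the platform's "related videos" lookup, and the regular expression is only an assumed equivalent of the wildcard query.

```python
import re
from collections import deque

# Hypothetical toy data standing in for the platform's "related videos" lookup.
RELATED = {
    "seed_a": ["v1", "v2"],
    "seed_b": ["v2", "v3"],
    "v1": ["seed_a"],
    "v2": ["v3"],
    "v3": [],
}

# Assumed regex equivalent of the wildcard query "Immun* OR vaccin* OR vaxx*".
QUERY = re.compile(r"\b(immun|vaccin|vaxx)\w*", re.IGNORECASE)


def matches_query(title):
    """Return True if a title contains a word starting with one of the stems."""
    return QUERY.search(title) is not None


def fetch_related(video_id):
    """Hypothetical lookup; a real crawler would query the platform here."""
    return RELATED.get(video_id, [])


def crawl(seeds, depth=1):
    """Breadth-first crawl that expands videos up to `depth` layers from the seeds.

    Returns the set of nodes (videos) and the set of directed edges
    (video -> related video).
    """
    nodes = set(seeds)
    edges = set()
    frontier = deque((seed, 0) for seed in seeds)
    expanded = set()
    while frontier:
        video, level = frontier.popleft()
        if level >= depth or video in expanded:
            continue
        expanded.add(video)
        for related in fetch_related(video):
            edges.add((video, related))
            if related not in nodes:
                nodes.add(related)
                frontier.append((related, level + 1))
    return nodes, edges


nodes, edges = crawl(["seed_a", "seed_b"], depth=1)
```

With a crawl depth of one, only the seeds are expanded, so videos discovered in the first layer appear as nodes but their own recommendations are not followed.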

Variables
In order to examine a potential relationship between the sentiment towards vaccination expressed in a video and the video's attributes, we identified seven independent variables, including six that were readily available for retrieval by Netvizz. These include comment count, view count, dislike count, like count, dislike/like ratio, and video categories. Video categories are predefined by YouTube and allow users to select which category their videos will be filed under. shows, and movies. The seventh variable was the sentiment of the vaccine-related videos. Since the seed videos were retrieved using English keywords, any non-English titles that might have been recommended by YouTube's "related" algorithm were ignored during this manual classification. To determine the sentiment towards vaccination (pro-vaccine, anti-vaccine, or neutral) expressed in a video, one of the authors, who specialized in public health, watched all 1,984 vaccine-related videos contained in our resulting network of videos.
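As an illustration of how these variables could be held together for analysis, the sketch below defines a hypothetical `VideoRecord` type. The field names are our own, and returning `None` when a video has no likes is an assumption, since the handling of that case is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoRecord:
    """One video and its attributes (field names are illustrative)."""
    video_id: str
    category: str
    view_count: int
    comment_count: int
    like_count: int
    dislike_count: int
    sentiment: Optional[str] = None  # "pro", "anti", "neutral", or None if not coded

    def dislike_like_ratio(self):
        """Dislikes divided by likes; None when there are no likes (assumed rule)."""
        if self.like_count == 0:
            return None
        return self.dislike_count / self.like_count


example = VideoRecord("abc123", "Education", 1000, 12, 40, 10, "pro")
ratio = example.dislike_like_ratio()
```

A ratio above 1 means a video received more dislikes than likes.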

Sentiment Analysis of Vaccine-related Videos
Given that YouTube's recommended videos are suggested based on an algorithm designed to optimize viewer retention and viewing time [65], prominent vaccine-related videos, and videos related to them, may have distinguishing features in terms of centrality measures. Thus, we explore whether network centrality measures of vaccine-related videos differ and whether they interact with the independent variables. Among the 9,489 videos collected for this study, only 1,984 videos had titles related to vaccines, immunization, and derivatives of these terms. Using this more targeted sample of 1,984 videos, we manually categorized their sentiment towards vaccination. We identified that 1,290 videos were negative (65.02%), 414 videos were positive (20.87%), and 280 videos were neutral (14.11%). This is the first time that a large-scale video-sentiment analysis on vaccination has been achieved.
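The reported percentages follow directly from the three counts, as the short check below shows. The dictionary keys are labels chosen for illustration.

```python
# Counts of manually coded videos, taken from the text above.
counts = {"anti": 1290, "pro": 414, "neutral": 280}

total = sum(counts.values())

# Share of each sentiment class, as a percentage rounded to two decimals.
shares = {label: round(100 * n / total, 2) for label, n in counts.items()}
```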

Measuring Network Centrality
To analyze the network of recommended videos and test whether there is a potential relationship between a video's network position (centrality) and its sentiment, we used Social Network Analysis (SNA). We considered the normalized values of five network measures: betweenness centrality, in-closeness centrality, out-closeness centrality, in-degree centrality, and out-degree centrality. These centrality measures were selected to understand the prominence of a video in the YouTube network.
Betweenness centrality was defined based on Freeman's betweenness measure, which counts the number of shortest paths between any two nodes that pass through a given node, such that videos with higher betweenness would theoretically have more "control" over the network [66][67][68]. Closeness centrality was calculated based on the inverse of the shortest-path distance between a given node and all other nodes in the network. In-closeness centrality considered paths from other videos to a given video, and out-closeness considered paths from a given video to other videos [69]. Finally, degree centrality was calculated based on the number of direct connections to and from other videos. Specifically, in-degree centrality measured the number of incoming direct connections to a video, and out-degree centrality measured the number of outgoing direct connections from a given video. High in-degree videos are videos that are recommended from many other videos, and high out-degree videos are those that lead to many other videos based on YouTube's recommendation algorithm.
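To make these definitions concrete, the following sketch computes the five measures on a four-node toy graph. It is a generic illustration rather than UCINET's procedure: values are left unnormalized except closeness, which is assumed here to be the number of reachable nodes divided by the sum of their shortest-path distances (software packages offer several variants), and betweenness uses Brandes' accumulation scheme for unweighted directed graphs.

```python
from collections import deque

# Toy directed graph: every node appears as a key, values are outgoing links.
adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}


def reverse(graph):
    """Return the graph with every edge direction flipped."""
    rev = {node: [] for node in graph}
    for u, targets in graph.items():
        for v in targets:
            rev[v].append(u)
    return rev


def degrees(graph):
    """Return (in_degree, out_degree) dictionaries."""
    out_deg = {n: len(targets) for n, targets in graph.items()}
    in_deg = {n: 0 for n in graph}
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    return in_deg, out_deg


def closeness(graph, node):
    """Reachable-node count divided by the sum of shortest distances from node."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def betweenness(graph):
    """Unnormalized betweenness for a directed, unweighted graph (Brandes)."""
    score = {n: 0.0 for n in graph}
    for s in graph:
        stack, dist = [], {s: 0}
        preds = {n: [] for n in graph}
        sigma = {n: 0 for n in graph}  # number of shortest paths from s
        sigma[s] = 1
        queue = deque([s])
        while queue:
            u = queue.popleft()
            stack.append(u)
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {n: 0.0 for n in graph}
        while stack:  # accumulate dependencies from the farthest nodes back
            w = stack.pop()
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


in_deg, out_deg = degrees(adj)
out_close_a = closeness(adj, "a")          # paths leaving "a"
in_close_d = closeness(reverse(adj), "d")  # paths arriving at "d"
between = betweenness(adj)
```

In this toy graph the two shortest paths from "a" to "d" run through "b" and "c" respectively, so each of those nodes receives half of the betweenness credit.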
We used UCINET 6.6 to calculate the network centrality measures of the whole network (9,489 nodes). To test for differences in centrality measures between pro-vaccine and anti-vaccine videos, we ran a t-test in UCINET using 10,000 permutations. Running the t-test in UCINET allowed node-level data to be permuted, accounting for the non-independence of videos in the network.
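The idea behind a permutation-based comparison of two groups can be sketched as follows. This is a generic one-tailed permutation test on the difference in means, not UCINET's exact routine, and the closeness values below are invented purely for illustration.

```python
import random
from statistics import mean


def permutation_test(group1, group2, n_permutations=10_000, seed=0):
    """One-tailed permutation test of whether mean(group1) > mean(group2).

    Group labels are reshuffled many times; the p-value is the share of
    shuffles whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(group1) - mean(group2)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n1]) - mean(pooled[n1:])
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical closeness scores for two small groups of videos.
anti = [0.42, 0.45, 0.47, 0.50, 0.52, 0.55]
pro = [0.30, 0.33, 0.35, 0.38, 0.40, 0.41]

observed_diff, p_value = permutation_test(anti, pro)
```

Because the reference distribution is built from the data themselves, the test does not rely on the independence assumption of the classical t-test.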
We also measured the effects of video attributes (e.g., video categories, view count, like count, dislike count, comment count) on vaccine sentiment using a chi-square test and binomial logistic regression in SPSS 23.

Vaccine Sentiment and Centrality
Using the negative sentiment videos as the control group, we ran a two-sample one-tailed t-test. Anti-vaccine (negative sentiment) videos were coded as Group 1 and pro-vaccine (positive sentiment) videos were coded as Group 2. Both anti-vaccine and pro-vaccine videos have the same mean of out-degree and in-degree centralities (p<0.00). See Table 1. At the same time, anti-vaccine videos have higher values of in-closeness (p<0.05) and out-closeness centrality (p<0.0005) but lower betweenness centrality (p<0.005). This suggests that anti-vaccine videos are easier to reach in the vaccine-related video network, such that if the viewer started with an anti-vaccine video, subsequent recommended videos are more likely to be anti-vaccine in nature. Anti-vaccine videos were more likely to be in the "News & Politics" and "People & Blogs" categories; some examples of popular videos belong to the "Alex Jones Show," and videos related to the premiere of "Vaxxed: From cover up to catastrophe." Pro-vaccine videos were more likely to be in the "Education" and "Science & Technology" categories. There was a statistically significant association between video categories and vaccine sentiment, χ²(7) = 43.678, p<0.005. The association was small (Cramér's V = 0.160). See Table 2.
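The relationship between the chi-square statistic and Cramér's V can be checked with a short sketch. Two assumptions are made here: the table is taken to be 8 categories by 2 sentiments (consistent with 7 degrees of freedom), and the sample size is taken to be the 1,290 anti-vaccine plus 414 pro-vaccine videos. The small contingency table at the end is invented to exercise the chi-square function.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V = sqrt(chi2 / (n * min(rows - 1, cols - 1)))."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))


# Reported statistic with the assumed sample size and table shape.
v_reported = cramers_v(43.678, 1290 + 414, n_rows=8, n_cols=2)

# Invented 2x2 table used only to demonstrate the chi-square computation.
toy_stat = chi_square([[10, 20], [20, 10]])
```

Under these assumptions the computed value rounds to the reported 0.160, which by common rules of thumb indicates a small effect.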

Vaccine Sentiment and Video Properties
A binomial logistic regression was performed to ascertain the effects of view count, comment count, dislike/like ratio, and like count on the likelihood that vaccine-related videos are positive or negative. We tested for linearity of the continuous variables (e.g., view count, like count, dislike count, comment count) via the Box-Tidwell procedure. A Bonferroni correction was applied using all nine terms of the model (including natural log transformations of continuous variables) to adjust the statistical significance to be accepted when p<0.0625. All continuous independent variables were found to be linearly related to the logit of vaccine sentiment.
To test for outliers, we conducted a case-wise diagnostic and identified two cases with studentized residuals of -7.302 and -10.642 standard deviations. After examining the data and their sentiment, we determined that they were not outliers, and they were kept in the analysis. The logistic regression model was statistically significant, χ²(5) = 100.485, p<0.0005. The model explained 9.3% of the variance in vaccine sentiment and correctly classified 78.2% of the cases. Of the five variables, dislike count, dislike/like ratio, like count, and view count were statistically significant (see Table 3). The larger the dislike/like ratio, the more a video is disliked in proportion to likes. The ratio is an indicator of content that viewers deem controversial or of poor quality. Interestingly, we found that videos with a higher dislike/like ratio have 3.912 times higher odds of being supportive of vaccines.
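To show how an odds ratio of this size should be read, the sketch below converts it to a logistic regression coefficient and confirms that a one-unit increase in the predictor multiplies the odds by that factor. The intercept of -1.0 is a hypothetical value chosen only for illustration; it is not a fitted parameter from this study.

```python
import math

REPORTED_ODDS_RATIO = 3.912  # odds ratio for the dislike/like ratio, from the text

# In logistic regression, the odds ratio is exp(coefficient).
coefficient = math.log(REPORTED_ODDS_RATIO)


def predicted_probability(intercept, coef, x):
    """Logistic model: P(pro-vaccine) for a predictor value x."""
    return 1.0 / (1.0 + math.exp(-(intercept + coef * x)))


def odds(probability):
    """Convert a probability into odds."""
    return probability / (1.0 - probability)


HYPOTHETICAL_INTERCEPT = -1.0  # illustrative assumption, not an estimate

p_at_0 = predicted_probability(HYPOTHETICAL_INTERCEPT, coefficient, 0.0)
p_at_1 = predicted_probability(HYPOTHETICAL_INTERCEPT, coefficient, 1.0)
odds_multiplier = odds(p_at_1) / odds(p_at_0)
```

Note that the odds ratio multiplies odds rather than probabilities, so the change in predicted probability depends on the baseline.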

Anti-vaccine Sentiment Dominates YouTube
Mounting evidence suggests that imagery and video-based platforms tend to generate more shares than text-based forms or URLs [70,71]. Our research is consistent with previous research, showing that YouTube has been a harbour for anti-vaccine sentiment, which has a higher prevalence of anti-vaccine sentiment than text-based social media platforms observed in the Americas, Southeast Asia, and Eastern Europe [48,72]. Furthermore, our research shows that pro-vaccine videos are significantly positively correlated with more dislikes than likes (dislike/like ratio).
To explain this phenomenon, we look to past research showing that negative vaccine sentiment spreads more easily than pro-vaccine messages on Twitter, and that pro-vaccine messages have even been observed to backfire and become subjects of attacks [73]. Another possible explanation for the prevailing anti-vaccine sentiment is related to the democratic nature of social media; with real-time free speech comes unverified information, misinformation, and conspiracy theories that are often not credible. Venkatraman et al.'s research observed that freedom of speech is positively correlated with beliefs that vaccines are linked to autism [74].
Our study also adds to the discussion of social media's structural influence on alternative narratives: threshold-free access and algorithms catering to user retention allow clickbait, false news, and alternative facts to be amplified by botnets and astroturfers [75,76]. Starbird and colleagues studied rumour networks in the U.S. on Twitter and identified the phenomenon of "alternative narratives," which was uniquely driven by the public's tendency to be manipulated by conspiracy theorists, broken epistemologies, and political disinformation (ibid). Whether these false beliefs have an impact on real-life events (e.g., measles outbreaks, US presidential electoral results, and the flat-earth movement) remains contentious. To an extent, any decision-making is inarguably driven by "affect, cognitive, and conative processes" [77]. Based on our findings, one may argue that recent systematic reviews that have reported suboptimal outcomes of health promotion online may need to factor in the structural properties of social media and the decision-making processes of the public. That is to say, the traditional (yet popular) fact-based, one-way information-sharing method needs a drastic makeover.

The High Closeness Centrality of Anti-Vaccine Videos in the Video Network
This research is the first to inspect the network centrality of pro- and anti-vaccine sentiment on YouTube. We have confirmed that anti-vaccine videos in the vaccine-related video network have higher in-closeness and out-closeness centrality. This suggests that once a YouTube user has watched a video with keywords related to negative sentiment, subsequent recommended videos are more likely to also be anti-vaccine. This result may indicate that searching for vaccine information online solidifies a person's sentiment towards vaccines depending on the first keyword that the user searches for, which serves to reinforce preconceived opinions of vaccines [78]. Therefore, we propose that popular anti-vaccine recommended videos should be actively monitored. Active monitoring may contribute to developing recommendation algorithms that more closely reflect the needs of the public, that is, to search for credible information and to easily access credible vaccine information online.
1 TensorFlow. URL: https://www.tensorflow.org
Another useful application of recognizing the higher closeness centrality of anti-vaccine videos is to eliminate the "bubble" effect [79]. In a high-closeness centrality network, pro-vaccine videos are likely overlooked if a user is stuck in the anti-vaccine sentiment bubble. This could potentially lead to a "majority illusion" [80], the theory that people believe that the opinions of "active others" represent the general sentiment on an issue. More vaccine-hesitancy research needs to be done on YouTube to disentangle whether YouTube's algorithm proliferates a lesser-held belief (anti-vaccine) dominated by those that occupy more active/central nodes, and whether this could lead inactive others (the vaccine hesitant) to change their beliefs.
Although researchers have yet to find that online health-seeking behaviors lead to negative health outcomes, social media's apomediary function (meaning the removal of gatekeepers of information) indicates that misinformation will be rampant and therefore requires novel approaches of active infoveillance [81].
One of the findings in our research shows that the video categories more likely to demonstrate anti-vaccine sentiment are "People & Blogs" and "News & Politics," which reflects the fact that people who are anti-vaccine post heavily in these two categories instead of "Education" or "Science & Technology," where most pro-vaccine videos are located. This research echoes the observation that social media's "openness, participatory, collaborative" functions have brought about apomediation. This means that intermediaries (e.g., physicians) are predicted to be removed from the vaccine-information seeking process and replaced by YouTube's content contributors (laymen or professionals), and as such, the experiences of the individual are placed above the opinions and expertise of the providers [82,83].

Limitations and Future Research
This research did not measure the length of the video, the upload year (i.e., how long the video has been online), whether it is cross-promoted content (such as movies and music videos), or whether it is a viral video cross-posted on other social media platforms (e.g., Instagram, Snapchat, etc.). These are variables that we would recommend looking at in the future. We would also recommend examining the actual content of the video aside from sentiment, adopting a method similar to text-based semantic analysis tools like LIWC to identify variables that influence vaccine hesitancy, such as "emotional appeal," "personal experience," "conspiracy," "vaccine safety," and "self-governance," as well as keywords that are frequently mentioned as reasons for hesitancy [6,47,54].
In terms of coding, the process of manually coding sentiments of audio-visual data is labour-intensive and time-consuming. To date, human coding is more accurate than current sentiment analysis software due to our ability to process visual information in conjunction with audio processing [84]. However, future research may benefit from open-source software like Google's TensorFlow 1, which provides complex neural networks that can aid in processing audio-visual content in a timely, iterative manner from multiple social media platforms [85].

CONCLUSION
This research confirms that YouTube's dominant sentiment towards vaccines is anti-vaccine (65.02%), which suggests a potentially growing prevalence of anti-vaccine sentiments on imagery/video-based social media platforms. An encouraging step towards addressing vaccine hesitancy from a network perspective includes active monitoring of vaccine sentiment on YouTube and other imagery/video-based social media platforms. In addition, public health institutions and technology companies, such as Google, would benefit from collective action by, for example, fine-tuning the recommendation system algorithm to reduce the centrality of anti-vaccine videos and videos containing misinformation at large. Moreover, YouTube's recommended videos are likely popular, captivating videos, unlike the traditional educational videos that dominate the YouTube pro-vaccine sphere. Adopting marketing and communications strategies to create engaging vaccine awareness content is another way of breaking the "misinformation bubble" without jeopardizing YouTube's viewership retention.
It would be useful to apply concepts from political research, such as the polarization of public opinion [86]-[88], to look for new ways to approach and understand the dichotomized vaccine sentiment, for instance, rethinking the anti-vaccine movement from an activist framework (e.g., the 99% movement, the Arab Spring movement, and the Euromaidan revolution in Ukraine). Returning to the concept of apomediation, our neighbours' opinion (or in this case, the online YouTube video network's opinion) is increasingly favored over a rigid CDC website. To embrace the unique functions of social media's openness is to embrace that we are experiencing multiple paradigmatic clashes. Living in a post-modern society, the public is free to interpret information and evidence away from science, away from facts, and closer to a level that is relatable to the general populace, one that appeals to emotions, risk-negation, and truth-seeking. As philosopher Bruno Latour noted, scientists addressing scientific evidence need to address "matters of concern," in which networks of people form opinions based not on evidence but on emotional appeal [89]. This research acknowledges that a "knowledge deficit" oriented intervention may be complemented by incorporating an in-depth understanding of the network in which misinformation resides.