Modeling User Arguments, Interactions, and Attributes for Stance Prediction in Online Debate Forums

Online debate forums are important social media for people to voice their opinions and debate with each other. Mining user stances or viewpoints from these forums has been a popular research topic. However, most current work does not address an important problem: for a specific issue, there may not be many users participating and expressing their opinions. Despite the sparsity of user stances, users may provide rich side information; for example, users may write arguments to back up their stances, interact with each other, and provide biographical information. In this work, we propose an integrated model to leverage side information. Our proposed method is a regression-based latent factor model which jointly models user arguments, interactions, and attributes. Our method can perform stance prediction for both warm-start and cold-start users. We demonstrate in experiments that our method has promising results on both micro-level and macro-level stance prediction.


Introduction
Online debate forums are important social media for people to voice their opinions and engage in debates with each other. Mining user stances and viewpoints from these forums has been a popular research area [1; 2; 3; 4; 5; 6]. One potential application is understanding public opinion; e.g., what are the popular stances on the Affordable Care Act, how do they associate with different subpopulations, and how are they changing over time? However, the participation rate of Internet users in online discussion forums relating to any particular debate may be low. For example, in our dataset collected from the debating website CreateDebate (http://www.createdebate.com), where users can explicitly state their stances on user-created debate topics, 278 people participated in the debate titled "Should guns be banned in America," while only 3 people participated in the debate "ObamaCare." As a result, if we consider all the registered users and existing debates on CreateDebate, only 0.4% of the full set of user stances are observed in the data. In this paper, we are interested in predicting a user's stance on a debate topic in which she has not participated.

(* School of Information Systems, Singapore Management University; † Language Technologies Institute, Carnegie Mellon University)
Note that this target user may have not expressed her stances before, i.e., the user may be a cold-start user. This problem is similar to item recommendation, where a user's purchase history is used to predict her preference for a new item. Collaborative filtering [7] is a technique commonly used to alleviate the data sparsity problem in item recommendation. Probabilistic matrix factorization (PMF) [8] has been shown to be an effective collaborative filtering algorithm, and we extend PMF in our work.
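To make the collaborative filtering idea concrete, the following toy sketch (not the paper's implementation; the data, dimensions, and learning rate are all made up) fits a PMF-style model by L2-regularized stochastic gradient descent on observed user-item entries, which is the usual MAP estimation view of PMF:

```python
import numpy as np

# Minimal PMF-style sketch: MAP estimation reduces to L2-regularized
# squared error over observed entries, fit by stochastic gradient descent.
rng = np.random.default_rng(0)
n_users, n_items, F = 6, 5, 3
# (user, item, rating) triples; everything else is unobserved.
observed = [(0, 0, 1.0), (0, 1, 0.0), (1, 0, 1.0), (2, 3, 0.0),
            (3, 2, 1.0), (4, 4, 0.0), (5, 1, 1.0)]
U = 0.1 * rng.standard_normal((n_users, F))
V = 0.1 * rng.standard_normal((n_items, F))

def loss(U, V, lam=0.01):
    err = sum((r - U[u] @ V[i]) ** 2 for u, i, r in observed)
    return err + lam * ((U ** 2).sum() + (V ** 2).sum())

lr, lam = 0.05, 0.01
before = loss(U, V)
for _ in range(200):
    for u, i, r in observed:
        e = r - U[u] @ V[i]
        U[u] += lr * (e * V[i] - lam * U[u])
        V[i] += lr * (e * U[u] - lam * V[i])
after = loss(U, V)
```

A missing entry is then predicted as the inner product `U[u] @ V[i]`, which is exactly the quantity the extensions in this paper build on.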
One notable problem with PMF is that it cannot perform stance prediction for users without any past stances. To alleviate this problem, we incorporate user attributes and user-generated content in online discussions. To be more specific, we incorporate three types of information that are prevalent in online discussions: user arguments that are used to back up their stances, user interactions from texts exchanged between users, and user attributes from their biographical information. Such rich information can help with stance prediction on both warm-start users and cold-start users and provide valuable user and issue profiling. Hence, we propose an integrated model that uses such information to provide insights into user behaviors and issues.
Using data from CreateDebate, we propose a unified model for the task of user stance prediction. Firstly, to incorporate user attributes from their biographies, we use a regression-based latent factorization method [9] to profile users. In this method, each user's latent factors are aggregated from factors associated with the user's attributes and user-specific deviations. This setting allows us to profile users who have no past stances, i.e., we can predict stances for cold-start users. We further introduce a novel binomial matrix factorization (BMF) model in the context of categorical ratings (i.e., user stances). This method extends the original PMF [8] model, which is designed for numerical ratings. Furthermore, users write arguments to support their stances, which provide textual cues to understand different topics involved in the issue. Like [10], we associate each latent factor dimension with a topic so as to produce an intuitive explanation for the hidden factors from BMF and improve stance prediction results. In addition, we find that incorporating features about a user's interaction network provides us with a way to infer relationships between users, which we can leverage to better predict user stances. To infer the model parameters, we employ Monte Carlo EM [11; 12] and adapt a fast inference method based on SparseLDA [13].
This work also aims to contribute to the problem of inferring public opinion from freely available social media text and metadata [14; 15; 16]. Such approaches have the potential to complement traditional surveys and polls. We focus on debate forums with rich user-contributed texts, opinions, and interactions on diverse topics. We formulate the task around predicting held-out user stances. Although online forums have been explored in the past for questions such as mining user stances [2; 3; 5; 6], detecting subgroups in online communities [17; 18; 19; 20], identifying user interactions [21; 22; 23; 24; 25], knowledge discovery [26], and detecting new and significant developments in science and technology [27], to the best of our knowledge, this is the first study on both micro-level and macro-level stance prediction in a debate forum leveraging rich user metadata. We find our methods tend to agree with polls from Gallup, even though the raw aggregated user stances differ from Gallup's results. This is promising, as the polls are based on survey data collected through a labor-intensive procedure, while our method could serve as a cheap and effective way to complement them.
Our contributions are as follows: • We propose a regression-based latent factor model which jointly models user arguments, interactions, and attributes for stance prediction.
• We study a fast inference method for the model.
• Our experiments show promising results on micro-level stance prediction for both warm-start and cold-start users.
• Our experiments show that our model performs well on macro-level stance prediction, suggesting the potential to complement traditional surveys and polls.

Problem Definition
In Table 1, we present an excerpt of user arguments from a debate page in CreateDebate. In CreateDebate, each debate issue i focuses on a particular debate question, for example, "Does God exist?" Each debate issue has defined stances, usually "Yes" and "No" stances for the issue. In addition, each issue i has a set of threaded arguments, where each argument can be an independent post or a reply to an earlier argument. Each argument is authored by a user u and explicitly carries her stance r_{u,i} on the particular issue; e.g., user A in our example takes the "Yes" stance. One user can write multiple arguments on an issue. We represent the text of the nth argument from user u on issue i using a bag of bigrams w_{u,i,n}. If the nth argument by user u is a reply/interaction post, the user u needs to specify whether she wants to "dispute," "support," or "clarify" the recipient post. We take advantage of this metadata using an interaction polarity l_{u,i,n} ∈ {positive, negative}, and assign l_{u,i,n} to be "negative" when the user's argument disputes an earlier post and "positive" otherwise.

In April 2013, we crawled all arguments of two-sided debates from all 14 categories on the CreateDebate website. For each participant, we also collect six types of attributes from her biographical information: party (e.g., republican, democrat), religion (e.g., Catholic, Christian), gender (e.g., male, female), status (e.g., single, married), education (e.g., in college, postgrad.), and country (e.g., U.S., Singapore). We leave other attributes, such as "age" and users' free-text self-descriptions, for future study. Table 2 displays some statistics on the CreateDebate corpus. We find that user-stance and interaction information are sparse in our data.
Our task is to predict a given user u's stance on a target issue i when the user has not expressed a stance on that issue. We refer to this as micro-level stance prediction. In § 5.2, we also consider macro-level stance prediction, where we estimate the percentage of users holding a certain stance for a particular issue i.

Model
We approach the problem using a probabilistic graphical model. The graphical representation of the model can be found in Figure 1.

Figure 1: Plate notation for our model, with panels for user arguments, user stance, user interaction, and user profiling. The dashed variables will be collapsed out during Gibbs sampling. ρ = {c_1, c_2, q_u, q_{u'}}, representing two parameters used in user interaction modeling (equation 3.4) and two biases specific to a user u and her recipient u'. v_u, v_{u'}, q_u, and q_{u'} are fixed by a regression-based latent factorization method, detailed in § 3.1. Hyperparameters are omitted for clarity.
The model is composed of four parts: i.) user profiling, which considers a regression-based latent factorization method to incorporate user attributes for profiling users; ii.) user stance, which contains a binomial matrix factorization method for modeling categorical stance data; iii.) user arguments, which incorporate textual cues in threaded posts; and iv.) user interaction, which integrates the positive and negative interaction attributes between users.
3.1 User Profiling. Inspired by the work in [9; 29; 30], we consider a regression-based latent factorization method for profiling users. Let f u ∈ R P ×1 denote user u's attributes. We use a binary representation, where each dimension of f u is set as 1 if the corresponding attribute is present in user u, and 0 otherwise. In this study, we consider user attributes in these six types: party, religion, gender, status, education, and country.
We model user latent factors v_u and bias q_u as (equation 3.1):

v_u = (1/Z_u) G f_u + δ_u,    q_u = (1/Z_u) g^⊤ f_u + b_u,

where G ∈ R^{F×P} and g ∈ R^{P×1} are regression coefficients shared across users, δ_u is a user-specific deviation, and Z_u refers to the number of feature types user u has provided, i.e., Z_u = Σ_p f_{u,p}. b_u is a user-specific bias. We pose zero-mean Gaussian priors on G, g, δ_u, and b_u.
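A plausible reading of this regression-based profile, v_u = (1/Z_u) G f_u + δ_u and q_u = (1/Z_u) g^⊤ f_u + b_u, can be sketched numerically as follows (G, g, and the attribute vector below are toy values, not learnt parameters):

```python
import numpy as np

# Sketch of the regression-based user profile: a user's latent factors
# are an average of per-attribute regression columns plus a deviation.
P, F = 4, 3                              # attribute dims, latent dims
f_u = np.array([1.0, 0.0, 1.0, 0.0])     # user provides attributes 0 and 2
Z_u = f_u.sum()                          # number of provided feature types
G = np.arange(F * P, dtype=float).reshape(F, P)  # toy regression matrix
g = np.ones(P)                           # toy bias regression vector
delta_u = np.zeros(F)                    # user-specific deviation
b_u = 0.0                                # user-specific bias

v_u = G @ f_u / Z_u + delta_u            # user latent factors
q_u = g @ f_u / Z_u + b_u                # user bias
```

Setting δ_u = 0 and b_u = 0, as here, is exactly what the cold-start prediction in § 5 does: the profile is driven entirely by attributes.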

User Stance.
To model issues, we embed a factor vector v_{i,s} ∈ R^{F×1} associated with each stance s of each issue i. The factor vector is drawn from zero-mean spherical Gaussian priors, where the hyperparameters σ_i² are issue-related variances. This differs from probabilistic matrix factorization in associating multiple factor vectors with a single issue. In this paper, each issue corresponds to two vectors denoting the support and oppose stances; more stances could also be incorporated here.
Every user u has an affinity score on each stance s of an issue i (equation 3.2):

a^s_{u,i} = v_u^⊤ v_{i,s} + q_u + q_{i,s},

where q_{i,s} is an issue-stance bias drawn from a zero-mean Gaussian prior N(0, σ_q²). Using a logit function, the probability of user u choosing stance s on issue i is

p(r_{u,i} = s) = exp(a^s_{u,i}) / Σ_{s'} exp(a^{s'}_{u,i}).

This approach models the categorical rating data (stance) in debates. It captures the intuition that a user chooses a stance based on her own preference over different stances. For example, in the debate "Do you support Obama or Romney in the presidential election?", a user tends to choose the stance "Obama" when her affinity score for "Obama" is higher than that for "Romney," i.e., a^{Obama}_{u,i} > a^{Romney}_{u,i}. We refer to this way of modeling user stance as binomial matrix factorization (BMF) as it extends probabilistic matrix factorization to two-sided stance data. This framework can be easily extended to more than two sides.
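Under this reading of the stance model (affinity a^s = v_u · v_{i,s} + q_u + q_{i,s}, with a softmax over stances), the stance probability can be computed as below; all vectors and biases are toy numbers, not learnt values:

```python
import numpy as np

# BMF stance-choice sketch: one affinity score per stance of an issue,
# converted to a categorical probability by a softmax.
v_u = np.array([0.5, -0.2, 1.0])                  # user factors
v_is = {"yes": np.array([1.0, 0.0, 0.5]),         # stance factor vectors
        "no":  np.array([-0.5, 0.3, 0.1])}
q_u, q_is = 0.1, {"yes": 0.0, "no": 0.2}          # user / issue-stance biases

a = {s: v_u @ v_is[s] + q_u + q_is[s] for s in v_is}   # affinities
z = sum(np.exp(val) for val in a.values())
p = {s: float(np.exp(val) / z) for s, val in a.items()}
```

Because the softmax is over the stances of one issue, adding a third stance only adds one more entry to `v_is` and `q_is`, which is the "more than two sides" extension mentioned above.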

User Arguments.
We use a latent Dirichlet allocation topic model [31] to reduce the dimensionality of the text, and combine text data with latent factors from the user stance matrix, grounding each dimension of the hidden factors using inferred topics. In particular, in our model, topics are in the same space as hidden factors, which is similar to the setting in [10].
We assume a stance-specific topic mixture θ_{i,s} for each stance s of an issue. The reason is that, for each issue, users with different stances tend to have different topic preferences [5]. θ_{i,s} denotes the relative log-odds of the different topics in issue i and stance s, encoding the distribution of topics that are likely to occur when arguing for that particular issue-stance. Specifically,

θ_{i,s,t} = exp(v_{i,s,t}) / Σ_{t'} exp(v_{i,s,t'}),

where v_{i,s} denotes the hidden factors associated with issue i and stance s.
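This softmax link between factors and topics (as reconstructed above; a scaling parameter as used in [10] is omitted for simplicity) is a one-liner in code, with a toy factor vector standing in for a learnt v_{i,s}:

```python
import numpy as np

# Topic mixture from issue-stance factors: each latent dimension doubles
# as a topic, and the softmax turns log-odds into a distribution.
v_is = np.array([2.0, 0.0, -1.0])            # toy factors, F = T = 3
theta = np.exp(v_is) / np.exp(v_is).sum()    # stance-specific topic mixture
```

The dimension with the largest factor value gets the largest topic weight, which is what makes the learnt topics an interpretation of the factors.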
The advantages of associating topic distributions with latent factors are two-fold. Firstly, the learnt topics provide an interpretation for the factors, as each latent factor dimension is associated with a topic-specific word distribution. Secondly, this helps to reduce ambiguity in the latent factors. In BMF, v_u and v_{i,s} can be replaced by jointly transformed versions (e.g., both negated) without changing their inner product. This means the factors may change considerably while leaving the underlying model unchanged. With this association, the θ learnt from texts will regularize the latent factors.
Moreover, we provide a fine-grained categorization of terms (bigrams), where we assume the terms in a user's argument are drawn from one of the following four term distributions.
• Background term distribution φ^B. These are words distributed fairly uniformly across many issues, for example, "united states," "no longer," and "things like."
• Issue-specific term distributions φ^I_i. These are words related to the debate issue, e.g., "God existence" and "believe God" for the issue "Do you believe in God?"
• Topical term distributions φ^T_t for each topic t (1 ≤ t ≤ T). For example, terms like "health care," "federal government," and "tax cuts" are closely related to the topic "health care" and thus tend to have high probabilities under this topic.
• Interaction term distributions φ^L_l for each type of interaction l. These are words related to "positive" and "negative" interactions, for example, "i agree" and "good point" for "positive" interactions, and "not agree" and "not like" for "negative" interactions. In our work, the interaction polarity of an argument is observed, and this information is fed into our model to learn those interaction terms. Recent work in [22] also considers modeling interaction terms; however, they assume the interaction polarity is not available and use a maxent component to guide the model to find interaction terms.

Additionally, we incorporate switching variables y to decide from which term distribution a bigram is drawn [32; 33].
The generative story of our model of user arguments is:
• Draw the switching variable type distribution π ∼ Dirichlet(γ).
• For each term w in the mth position of argument n from user u on issue i:
• Draw the switch y_{u,i,n,m} ∼ Discrete(π).
• Draw the term w_{u,i,n,m} from φ^B, φ^I_i, φ^T_z (with topic z drawn from θ_{i,s}), or φ^L_{l_{u,i,n}}, according to the value of y_{u,i,n,m}.
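The switching mechanism can be simulated with a toy sketch; the vocabulary, the four distributions, and the uniform switch distribution below are all made up for illustration and are not learnt values:

```python
import numpy as np

# Toy simulation of the switch: y picks which distribution (background /
# issue-specific / topical / interaction) each bigram is drawn from.
rng = np.random.default_rng(1)
vocab = ["united states", "believe god", "health care", "i agree"]
pi = np.array([0.25, 0.25, 0.25, 0.25])   # switch distribution
phi = {
    0: np.array([0.7, 0.1, 0.1, 0.1]),    # background, phi^B
    1: np.array([0.1, 0.7, 0.1, 0.1]),    # issue-specific, phi^I_i
    2: np.array([0.1, 0.1, 0.7, 0.1]),    # topical, phi^T_t
    3: np.array([0.1, 0.1, 0.1, 0.7]),    # interaction, phi^L_l
}

def draw_term(rng):
    y = rng.choice(4, p=pi)               # draw the switch
    w = rng.choice(4, p=phi[y])           # draw a term from that dist.
    return int(y), vocab[w]

samples = [draw_term(rng) for _ in range(500)]
```

During inference the switches are latent and sampled by Gibbs sampling; here they are sampled forward only to show the direction of the generative story.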

User Interaction.
In the CreateDebate data, user interactions are observed. As illustrated in Table 3, more positive user interactions are observed between users with the same stance, and more negative interactions between users with different stances. We thus use this to inform our model in stance prediction, i.e., to guide the model to predict a user's stance to be the same as that of users with whom she has positive interactions, and different from that of users with whom she has negative interactions. This motivates us to associate the similarity of users in the latent factor space with the polarity of user interactions. To measure the similarity of users in the latent factors, we simply use the inner product of user factors together with user biases. We leave other alternatives as future work. We then enforce a high probability of observing a positive interaction polarity for users who are highly similar. Specifically, let u' denote the recipient of user u's nth post in issue i; we sample the user interaction polarity l_{u,i,n} as (equation 3.4):

p(l_{u,i,n} = positive) = S(c_1 (v_u^⊤ v_{u'} + q_u + q_{u'}) + c_2),

where S(·) is the logistic function, c_1 ∼ N(1, σ²), which encourages a positive value, and c_2 is also sampled from a Gaussian distribution with zero mean, i.e., c_2 ∼ N(0, σ²); q_u and q_{u'} are user-specific biases sampled from a zero-mean Gaussian N(0, σ_q²); v_u, v_{u'}, q_u, and q_{u'} are fixed by equation 3.1.
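Under one plausible reading of this interaction model, p(l = positive) = S(c_1 (v_u · v_{u'} + q_u + q_{u'}) + c_2), similar users get a high probability of positive interaction and dissimilar users a low one. The values below are toy numbers:

```python
import numpy as np

# Interaction-polarity sketch: the logistic of a scaled user-user
# similarity (inner product plus biases) gives p(positive interaction).
def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

v_u  = np.array([1.0, 0.5])     # user factors
v_up = np.array([0.8, 0.4])     # similar recipient: positive inner product
v_un = np.array([-0.8, -0.4])   # opposed recipient: negative inner product
q_u = q_r = 0.0                 # user-specific biases
c1, c2 = 1.0, 0.0               # c1 near its prior mean of 1

p_pos_similar = logistic(c1 * (v_u @ v_up + q_u + q_r) + c2)
p_pos_opposed = logistic(c1 * (v_u @ v_un + q_u + q_r) + c2)
```

During learning this term pulls the factors of positively interacting users together and pushes negatively interacting users apart, which is how interactions inform stance prediction.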

Inference and Learning
Our goal is to learn the hidden factor vectors and topics of the textual content so as to accurately model user stances and maximize the probability of generating the textual content. Hence our objective is to minimize:

J = − Σ_{u,i,n} [ log p(r_{u,i} | ρ_{u,i,n}) + log p(l_{u,i,n} | ρ_{u,i,n}) + log p(ρ_{u,i,n} | Υ) + log p(w_{u,i,n} | l_{u,i,n}, v_{i,s}, Ω) ],

where u, i, and n are the user, issue, and argument indices respectively. ρ_{u,i,n} = {v_i, q_i, G, g, δ_u, δ_{u'}, b_u, b_{u'}, c_1, c_2} refers to the set of latent variables related to user u, the recipient u' of the nth post of user u, and issue i, and Υ is the set of Gaussian priors for all the variables in ρ_{u,i,n}. Ω denotes all the Dirichlet prior hyperparameters for φ. The first three terms denote the probability of generating user stances and interactions given the priors Υ, where the variables in ρ_{u,i,n} are optimized to minimize the objective function. The last term denotes the probability of observing the text conditioned on θ_{i,s} from the learnt vector v_{i,s}, the interaction l_{u,i,n}, and the Dirichlet priors Ω.
Exact inference under the posterior distribution is intractable. We use Monte Carlo EM [11; 12], an inference method that alternates between collapsed Gibbs sampling [34] and gradient descent, to estimate the parameters of the model. In the E-step, we perform Gibbs sampling for the variables {y, z}, fixing the values of ρ. In the M-step, we perform gradient descent to update the latent variables in ρ, fixing the values of {y, z}. Note that the E-step usually takes more time than the M-step. To speed up the E-step, we borrow the idea behind the inference process of SparseLDA [13] and find that it runs three times as fast as the original setting. For space, we leave all the detailed derivations to the supplementary pages. Note that to further scale up our model for very large data sets, one may resort to parallel frameworks like [35] and new sampling techniques like [36]. We leave this to future work.
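The control flow of such a Monte Carlo EM alternation can be sketched with a deliberately tiny stand-in model (a single Gaussian mean in place of ρ, and noisy copies of the data in place of the Gibbs-sampled {y, z}); this illustrates only the E-step/M-step loop, not the paper's actual sampler:

```python
import numpy as np

# Monte Carlo EM skeleton: alternate a sampling E-step (discrete latents,
# parameters fixed) with a gradient M-step (parameters, latents fixed).
rng = np.random.default_rng(2)
data = rng.normal(3.0, 1.0, size=200)
mu = 0.0                           # stands in for the parameters rho

for _ in range(50):                # outer Monte Carlo EM iterations
    # E-step stand-in: draw samples of the "latents" given current mu
    z = data + rng.normal(0.0, 0.1, size=data.shape)
    # M-step stand-in: gradient step on squared error to the samples
    grad = (mu - z).mean()
    mu -= 0.5 * grad
```

The same skeleton applies to the full model: the E-step would Gibbs-sample {y, z} and the M-step would take gradient steps on the objective J with respect to ρ.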
We ran 1,000 iterations of Monte Carlo EM. For the Gibbs sampling steps, we ran 400 iterations for burn-in and sampled every 10 iterations to reduce autocorrelation. We fixed the number of topics and the dimension of the latent factors at T = F = 20. (We considered values from 10, 20, . . . , 90, 100 and found the resulting topics to be more meaningful at around 20 by manual examination.) For our models and the competing baselines, we use grid search on a development set to select the model hyperparameters.

Experiments
Recall that our task is to predict a user's stance on an issue that she has not commented on. This problem setting is different from existing studies on stance prediction (e.g., [4; 2; 3]), where a user's arguments about an issue are observed but not her stance, and is hence not directly comparable. We design experiments to: (i.) quantitatively evaluate our model against baselines on the task of micro-level stance prediction for warm-start and cold-start users; (ii.) examine our model on macro-level stance prediction; (iii.) analyze the efficiency of our inference method; and (iv.) qualitatively examine the learnt term distributions of topics and issues. We leave the study of the efficiency of the inference method and the qualitative analysis to the supplementary pages.

Micro-Level Stance Prediction
We evaluate on the task of micro-level stance prediction, i.e., predicting user stances on a given issue using the learnt user and issue factor vectors and the user interaction network. We perform 10-fold cross-validation on our dataset. Each time, we hold out as a test set 10% of the observed user-issue pairs, i.e., observed user stances on issues. For each test set, if an issue does not appear in the training set (i.e., a cold-start issue), we put the corresponding pairs back into the training set. We split the remaining test pairs into two sets: one for warm-start users, who have past stances in the training set, and the other for cold-start users, who have no past stances in the training set. We remove all the text associated with user-issue pairs in the test set, and the prediction is based solely on the user and issue factor vectors learnt from the training set. This setting mimics a real-world scenario where a user has no prior stance on an issue, but the user has expressed stances on other issues and the issue has other users expressing their stances on it.
5.1.1 User Attributes. We first examine the importance of different user attributes for the stance prediction task. We use prediction accuracy (Acc.) to measure model performance: Acc. = (1/|R|) Σ_{u,i} I(r̂_{u,i} = r_{u,i}). We refer to our base model without any user attributes as BMF-AI, binomial matrix factorization with user arguments and interactions. We evaluate the results by first considering only one type of attribute at a time. Table 4 shows that, of those considered, only the attributes related to "ideology" (party and religion) are helpful for the task of stance prediction. Furthermore, if we incorporate both party and religion attributes into the model, Acc. rises to 0.712. In the following experiments, we only incorporate these two attributes.
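The accuracy metric above is simply the fraction of held-out user-issue pairs whose predicted stance matches the observed one; in code, with made-up predictions:

```python
# Accuracy over held-out user-issue pairs: mean of the indicator
# I(predicted stance == observed stance). Toy data for illustration.
predictions = {(0, 0): "yes", (0, 1): "no",  (1, 0): "yes", (2, 1): "no"}
gold        = {(0, 0): "yes", (0, 1): "yes", (1, 0): "yes", (2, 1): "no"}

acc = sum(predictions[k] == gold[k] for k in gold) / len(gold)
```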

Warm-start Users.
We evaluate the following competing models for comparison: • MB: majority baseline. For each test issue, we predict a user's stance based on the majority stance on the issue from the training data. This method performs well when the stances are unbalanced, i.e., when an issue has a dominant stance.
• PMF: probabilistic matrix factorization [8]. The original model is designed for numerical ratings. We randomly map one stance of an issue to 0 and the other to 1.
• BMF: binomial matrix factorization. This model differs from PMF in that it assumes a rating for each stance of an issue and draws a stance based on a logit function over stance-specific ratings.
• HFT: hidden factors as topics [10]. Based on PMF, this model further considers user arguments.

We summarize the results as follows. (i.) MB performs poorly compared to the other methods, since the stances here are fairly balanced. (ii.) Both collaborative filtering approaches (PMF and BMF) significantly outperform the simple baseline. (iii.) BMF-A (BMF with user arguments) significantly outperforms BMF at the 5% significance level, meaning that text is helpful in modeling user stances; with user arguments, we are able to bring together issues that are similar, as evidenced by similar topic distributions. Meanwhile, BMF-A outperforms HFT by a very small margin; its chief advantage is its extensibility to multiple stances, which we have not tested. (iv.) Modeling the user interactions further boosts performance, as shown by BMF-AI outperforming BMF-A at the 5% significance level. (v.) By incorporating user attributes, the resulting model BMF-AIA achieves the best performance, with significance. This demonstrates the effectiveness of an integrated model of user arguments, interactions, and attributes.

Cold-start Users.
For a cold-start user, although we do not observe her past stances, our model can still profile her using her attributes. Specifically, for a cold-start user u, we set the factor deviations δ_u = 0, so that v_u = (1/Z_u) G f_u. The user's predicted stance for an issue is arg max_s exp(v_u^⊤ v_{i,s}). Here G and v_{i,s} have been learnt from the training data.

Table 6: Micro-level stance prediction for cold-start users, averaged across ten folds. s.d. refers to standard deviation.
We compare our method with different types of attributes and a majority baseline. The results are presented in Table 6, which shows that our model significantly outperforms the majority baseline at the 1% significance level (McNemar's test). We also find that the religion attribute is slightly (but not significantly) more important than the party attribute in this task.
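The cold-start rule above (δ_u = 0, so v_u = (1/Z_u) G f_u, then pick arg max_s v_u · v_{i,s}) can be sketched end-to-end; G, the attribute vector, and the issue-stance vectors below are toy stand-ins for learnt values:

```python
import numpy as np

# Cold-start stance prediction: profile the user from attributes alone,
# then score each stance vector of the target issue by inner product.
f_u = np.array([1.0, 1.0, 0.0])        # binary attribute vector (P = 3)
Z_u = f_u.sum()
G = np.array([[0.5, 0.5, 0.0],         # toy regression matrix, F = 2
              [0.0, 1.0, 1.0]])
v_u = G @ f_u / Z_u                    # attribute-only user factors
v_is = {"yes": np.array([1.0, 0.0]),   # toy learnt issue-stance vectors
        "no":  np.array([-1.0, 0.5])}

pred = max(v_is, key=lambda s: v_u @ v_is[s])
```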

Macro-Level Stance Prediction.
Recall that only 0.4% of the full set of user stances are observed in the data. We consider the task of predicting all users' stances on all issues; the aggregate of these gives a macro-level stance prediction. Using our model, we can predict any user's stance on any issue in our data given all the learnt variables in ρ, according to equations 3.1 and 3.2. Specifically, we set r̂_{u,i} = arg max_s exp{a^s_{u,i}}, and the macro stance for an issue i is defined by the predicted counts n̂_{i,s} = Σ_u I(r̂_{u,i} = s). The demographics of participants in CreateDebate are not representative of the larger population, and hence we expect that these stance estimates do not match Gallup exactly. Moreover, no one (neither us nor Gallup) can be certain that these estimates are accurate, due to difficulties in measuring public opinion such as sampling biases, truthfulness of responses, the way questions are framed, etc. (Of course, experts like those at Gallup have invested a great deal in techniques for overcoming those challenges.) Nevertheless, understanding where and how the results diverge may give us a sense of how the CreateDebate population differs from the population polled by Gallup. We find there are multiple issues that are phrased differently but arguably mean the same thing, e.g., "Does God exist?" and "Is there a God?" We refer to a group of such similar issues as a high-level issue. We chose the seven high-level issues with the most arguments that have corresponding Gallup polls. We select Gallup polls whose (i.) poll date is closest to the CreateDebate data collection date (April 2013) and (ii.) poll question is similar to all the issues in the high-level issue. For example, the high-level issue "gun control" contains three related issues, "Gun Control: Should we have it?", "Should guns be banned?", and "Should guns be banned in America?", and it corresponds to [37].
For each issue i in CreateDebate, we know the stances of a small number of users, and we can compute the observed proportion of users choosing the majority stance. We can also predict the proportion of the majority stance across the entire CreateDebate population with equation 5.5 and normalizing: n̂_{i,s} / (n̂_{i,s} + n̂_{i,¬s}). Since we group similar issues together into high-level issues, stance proportions for similar issues are averaged to obtain the stance proportions presented in Table 7.
For these high-level issues, we identified Gallup poll results, from which we denote the ratios on the two sides of the issue as c_{i,s} and c_{i,¬s}. In Gallup polls, respondents are allowed to answer "no opinion," meaning we may have c_{i,s} + c_{i,¬s} < 1. We ignore this small subset of polled users and instead normalize the ratios to get, for stance s: g_{i,s} = c_{i,s} / (c_{i,s} + c_{i,¬s}). In Table 8, we present stance proportions predicted by Gallup polls, by our model, and by observed stances only. Treating the Gallup poll results as "ground truth," we define the error of model-predicted stance proportions as Σ_{i,s} |g_{i,s} − n̂_{i,s}/(n̂_{i,s} + n̂_{i,¬s})|, and the error of observable stance proportions as Σ_{i,s} |g_{i,s} − n_{i,s}/(n_{i,s} + n_{i,¬s})|. Overall, our results on macro-level stance prediction are encouraging. The mean absolute error of our predictions against Gallup is 0.07, much lower than the 0.12 obtained when using only observed user stances. By comparing with polls from Gallup, we find our methods tend to agree with, or bring the stance prediction results closer to, those polls. For example, the Gallup poll results for "2012 election," [39]
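The normalization and error computation above can be written out directly; the poll shares and predicted counts below are made-up numbers, not the paper's data:

```python
# Macro-level evaluation sketch: normalize away "no opinion" answers on
# the Gallup side, compute the model-predicted stance share, and take
# the absolute difference as the per-stance error.
c = {"s": 0.55, "not_s": 0.40}            # raw Gallup shares (5% no opinion)
g_s = c["s"] / (c["s"] + c["not_s"])      # normalized Gallup share for s

n_hat = {"s": 620, "not_s": 380}          # predicted stance counts
model_s = n_hat["s"] / (n_hat["s"] + n_hat["not_s"])

error = abs(g_s - model_s)                # per-stance absolute error
```

Averaging such per-stance errors over the high-level issues yields the mean absolute errors (0.07 for the model, 0.12 for observed stances only) reported above.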

Related Work
Existing approaches for stance prediction have focused on taking advantage of the availability of different sources of information. Topic modeling approaches are useful for distilling texts into low-dimensional topics and using them to predict user stances [44; 4; 1]. Somasundaran and Wiebe mined the web to augment existing data with learnt associations that are indicative of opinion stances in debates [2], and later focused on identifying stances in online debates by extracting useful linguistic features and making use of curated sentiment and argument lexicons [3]. Sentiment analysis has been used to infer interaction polarities and model the interplay between user interactions and stances [5; 6]. These methods require either rich textual content that is present for every user, or an external, curated data source, which is not readily available in our problem setting. Data sparseness also makes standard supervised learning unsuitable for this task (see, e.g., [45]).
In this work, we consider collaborative filtering to alleviate the data sparseness problem. Collaborative filtering techniques can be readily applied to our user stance-issue matrix (see [46] for a survey). In our problem, we specifically consider PMF methods (proposed in [8]), because of their successful use in real-world problems [47; 48; 49; 50]. PMF has also been extended to incorporate social data. For instance, [49] extended it with social network information to perform social recommendation. Similarly, there are studies on recommendation that incorporate trust network information [51; 52; 53]. Our model bears similarity to these approaches in that we also consider social information extracted from user interactions. However, we incorporate user arguments to ground our model and learn intuitive explanations for the dimensions of our latent factors. We also incorporate user attributes into the model to help overcome cold-start, a notable problem with PMF.
We have compared our findings with publicly available poll data in macro-level stance prediction. Although inferring user opinions in the wider population is not our primary goal, we believe our method could play a role in using publicly available social media data to infer user opinions in the larger population [14; 15; 16; 54]. Our work takes this in a different direction, by studying online debates which have well structured user arguments, opinions, and interactions on diverse topics, to seek to predict user stances on a wider variety of social topics. To the best of our knowledge, this is the first study using debate forums for such analysis.

Conclusion
In this work, we studied a novel stance prediction task in the online discussion forum CreateDebate, where the participation rate of users in any particular debate is low. We seek to predict user stances on a variety of topics; these methods might eventually complement traditional surveys. Our model brings together user arguments, interactions, and attributes into a collaborative filtering framework that exploits recently introduced fast inference methods. Experiments show promising results on both micro-level and macro-level stance prediction.