Beyond Purchase Intentions: Mining Behavioral Intentions of Social-Network Users

Abstract Advertisers, recommendation-system designers, and public-health campaigners are investing heavily in online targeting, focusing intently on social-network platforms because of their ability to identify unique subpopulations according to the users’ traits. A particularly informative trait is current behavioral intentions, which provide solid information about users’ future behaviors; yet, the ability to infer such intentions constitutes a significant risk to users’ privacy. An important task is therefore understanding to what extent we can infer behavioral intentions of social-network users solely using publicly-available data. In this article, we formulate intention inference as a time-series classification task and design novel Bayesian-network models that can capture the dynamically evolving nature of the human decision-making process by combining data and priors from multiple domains. We then extend our models to the more general case of attribute inference in the presence of scarce labeled data by introducing a new semi-supervised approach to user-modeling in social networks. We evaluate the performance of our models when used for the inference of five behavioral intentions using temporal, real-world social-network data.


Introduction
Advertisers, recommendation-system designers, and public-health campaigners are investing heavily in online targeting, focusing intently on social-network platforms because of their ability to identify unique subpopulations according to the users' traits. A particularly informative trait is current behavioral intentions, providing solid information about users' future behaviors. In this work, we explore the following question: Can we infer offline behavioral intentions of social-network users using information extracted solely from their public, online social-network profiles?

Motivation and objectives
To better illustrate the motivation for this work, we describe one specific use case of the proposed methodology: ad-targeting mechanisms. Oftentimes, an ad is designed to persuade an individual to behave in a certain way, that is, to take a particular action, such as buying a product, voting for a certain candidate, or receiving a vaccination. When designing such a mechanism, advertisers would like to make the best use of their funds by only paying for ads that are presented to individuals who are highly likely to take the desired action. The question that then arises is: how should one identify the target subpopulation? In recent years, advertisers came to understand that social-network accounts are the place to search for such subpopulations; not only do social-network users voluntarily provide private information about themselves regularly, but much of this information is also publicly available; that is, everyone can see it, not only the users' friends or closest network ties. Advertisers, therefore, can mine the social-network accounts of a user to better understand whether the user is likely to perform the desired action.
One course of action that an advertiser might take is extracting data that explicitly appears on the user's profile, such as posts or tweets written by the user, and mining new insights from it. For instance, if the user wrote a tweet about the fact that she has not yet decided whom to vote for, or the fact that she is looking for a new job, the advertiser can decide to present her with an ad for a certain candidate, or with an offer for a service that links job searchers with employers. However, what we choose to publicly and explicitly mention on our social-network accounts is only part of the story. A more thorough analysis of a user profile will use such explicit data extracted from social-network accounts to infer implicit, or latent, attributes that are known to be related to the desired behavior. For instance, by inferring the body image or personality traits of various users, attributes that are rarely explicitly revealed in social networks, an advertiser can decide to target users with a low body image or with a certain personality structure with ads for a new weight-loss pill or gym membership. The above approach has been researched in many previous articles, as we detail in Section 2. But is it enough? Can we do better?
A core observation is that attributes researched in prior work, such as personality traits, body image, and demographic attributes, are time-invariant: they hardly change over time. More importantly, they are not associated with, and do not directly lead to, any behavior. Hence, while inferring such attributes definitely assists in identifying the desired subpopulations, the degree of confidence linking the possession of those attributes to completing the desired behavior in the near future is no more than moderate at best (Hampshire & Hart, 1958; Sheeran, 2002). For these reasons, advertisers might want to take a more targeted approach and use different types of attributes: attributes that are domain-specific, time-varying (change over time), and known to be highly predictive of real-world behavior.
"Intentions are people's decisions to perform particular actions" (Sheeran, 2002). Behavioral intentions are dynamic in nature and oftentimes lead to a concrete, near-future behavior. Identifying users with intentions that are relevant to a given desired behavior thus has huge potential for improving ad-targeting mechanisms and assisting in identifying relevant target subpopulations. Going back to the examples presented above, assume that an advertiser would like to start an ad campaign for a new weight-loss pill. They can use self-reports, for instance, targeting users who declare that they are attempting to lose weight. Of course, the users identified in this way are only those who self-report their linkage to the desired behavior. The advertiser can also try to infer a latent attribute that is related to the product being advertised, such as low body image. However, many of these users either will not have an intention to lose weight despite having a low body image, or will have a goal to start a weight-loss regime at some point in the future, but not now. Both types of users are "associated" with the desired behavior, yet do not have any concrete intention to start a weight-loss regime at the moment the ad is published. The same applies to job-searching intentions. An individual might feel unsatisfied with her job, or even be unemployed; those two attributes can be either extracted directly from users' posts or inferred as latent attributes. But even if a user indeed has one of those traits, this does not necessarily mean that she has a concrete intention at this specific moment to start looking for a new job.
The goal of this article is to infer, using only publicly-available social-network data, whether a given user has a given intention at a given point in time. We use Bayesian networks and dynamic Bayesian networks for that purpose, and describe in detail their strengths for the specific task of intention inference in the following sections. The second goal of this article is to offer a new semi-supervised time-series classification (TSC) technique that, unlike other TSC techniques, can model various temporal relations between different features while at the same time working on small datasets without a long history.

Our contributions
The article makes the following contributions:

A new approach to modeling social-network users using dynamic Bayesian networks

We present a new approach to modeling social-network users and mining time-varying attributes using dynamic Bayesian networks. We provide a detailed overview of the design of the models, the training, and the inference process. We present a new semi-supervised training algorithm that is particularly suited to temporal datasets with a very limited amount of labeled data. We evaluate our models' performance when used for the inference of five dynamic attributes using temporal, real-world social-network data collected in multiple waves. This work is the first to take a DBN-based approach to the task of private-attribute inference in social networks and the first to offer a DBN-based representation of social-network users.

A unique focus on offline, time-varying behavioral intentions
Unlike other existing works that tackle the task of attribute inference, ours is the first work that aims at inferring offline, dynamic, non-politically-related behavioral intentions of social-network users (i.e., a user-centric approach) solely based on public social-network data. Other works either focus on online intentions or time-invariant preferences; use private data or data that is not obtained from social networks; or take an "object-centric" approach by trying to infer the intention associated with a single, standalone, and contextless social-network "object," such as a post or a tweet. Furthermore, some of the behavioral intentions that we consider in this article, such as borrowing intentions and job-searching intentions, have never been studied in any prior machine learning (ML) or social-network-related work.

A new multidisciplinary methodology for the inference of behavioral attributes
We introduce a novel, multidisciplinary methodology for the inference of behavioral attributes, such as decisions and intentions. We design modular Bayesian-network (BN) models that can capture the evolving nature of the human decision-making process by combining data and priors from multiple domains, such as behavioral psychology, sociology, and human-computer interaction (HCI). Our methodology handles common challenges in social-network mining, such as incomplete datasets, unlabeled data, and bidirectional influence between features and the target variable.
We present comprehensive experimental results conducted on real-world, temporal social-network datasets, which demonstrate our models' performance when used for the inference of five behavioral intentions: weight-loss intentions (WI), vaccination intentions (VI), travel-purchase intentions (PI), borrowing intentions (BI), and job-searching intentions (JI). Ours is the first work on intention inference that is not focused on a single intention but rather compares inference results obtained for multiple intentions from different domains.

Methods
We present here some of the core principles of the methods used in this article. We elaborate on each point in the next sections.
A cause-and-effect model: We model a behavioral intention as being caused by a set of causes and causing a set of effects. We use a Bayesian-network model to explicitly represent those causal relations. Furthermore, we also model causal relations between causes and effects of the same intention, as such knowledge may assist in the inference task (see Section 3).

A layered Bayesian-network model: Some of the causes and effects of an intention are latent variables; they cannot be extracted directly from social-network profiles. An example of such a variable is personality traits. To handle that, we use a layered Bayesian-network model that is built using the following chain of relations: observed network features → latent causes of intention → behavioral intention. Note that the observed network features are treated as the effects of the latent variable. More details, including concrete examples, can be found in Section 3.

A combination of priors and data: Another unique aspect of our methodology is the combination of prior knowledge and data for training our Bayesian-network models: for selecting the features we use; for selecting which relations between features will appear in the model; and for quantifying each relation's strength. Priors used in this work were obtained from prior behavioral-psychology, sociology, and HCI literature; literature that surveys the specific intentions that we consider in this article (such as Jappelli, 1990; Markey & Markey, 2005; Ottaviani & Vandone, 2011); and Census data. This prior information was combined with information extracted from our datasets using BN-structure-learning algorithms as well as a sophisticated, two-stage feature-selection technique we have designed (see Section 4).

A temporal model and a semi-supervised training method: We extend our static approach as detailed above to the temporal case, treating behavioral intentions as dynamic attributes. For this purpose, we use dynamic Bayesian networks and design novel training and inference algorithms. Our training algorithm is a temporal semi-supervised algorithm: a revised, weighted version of the Expectation-Maximization algorithm. Our approach aims at reducing the amount of labeled data needed for the inference task, as well as at combining different types of data obtained from multiple sources. For more details, see Section 6.

Dataset: Labeled data is obtained using surveys. To reduce the self-selection bias in data obtained using surveys, we also use unlabeled data (see Section 7), as well as self-reported labels. The thought behind this approach is that by combining data from different sources, the unique type of bias associated with each data-collection method will average out, leading to datasets that are less biased and more representative of the general population.
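To make the weighted-EM idea concrete, the sketch below applies it to a static naive-Bayes classifier with binary features: unlabeled examples contribute fractionally (via an `unlabeled_weight` factor of our own choosing) to the M-step counts, so scarce labels are not drowned out. This is a minimal illustration under our own assumptions, not the paper's temporal, DBN-based algorithm described in Section 6.

```python
import math

def train_weighted_em(labeled, unlabeled, unlabeled_weight=0.3, iters=20, alpha=1.0):
    """Semi-supervised naive Bayes trained with a weighted EM loop.

    labeled   : list of (features, label) pairs, binary features, label in {0, 1}
    unlabeled : list of feature vectors
    alpha     : Laplace-smoothing pseudocount
    """
    n_feat = len(labeled[0][0])
    resp = [0.5] * len(unlabeled)  # E-step responsibilities, initialised uniformly
    prior = theta = None
    for _ in range(iters):
        # M-step: weighted, smoothed counts
        cls_w = [alpha, alpha]
        feat_w = [[alpha] * n_feat, [alpha] * n_feat]
        for x, y in labeled:
            cls_w[y] += 1.0
            for j, v in enumerate(x):
                feat_w[y][j] += v
        for x, r in zip(unlabeled, resp):
            for y, w in ((0, 1 - r), (1, r)):
                cls_w[y] += unlabeled_weight * w
                for j, v in enumerate(x):
                    feat_w[y][j] += unlabeled_weight * w * v
        total = cls_w[0] + cls_w[1]
        prior = [cls_w[0] / total, cls_w[1] / total]
        theta = [[feat_w[y][j] / (cls_w[y] + alpha) for j in range(n_feat)]
                 for y in (0, 1)]
        # E-step: posterior P(y = 1 | x) for each unlabeled example
        for i, x in enumerate(unlabeled):
            logp = []
            for y in (0, 1):
                lp = math.log(prior[y])
                for j, v in enumerate(x):
                    p = theta[y][j]
                    lp += math.log(p if v else 1 - p)
                logp.append(lp)
            m = max(logp)
            e0, e1 = math.exp(logp[0] - m), math.exp(logp[1] - m)
            resp[i] = e1 / (e0 + e1)
    return prior, theta

def predict(prior, theta, x):
    """MAP class under the learned naive-Bayes parameters."""
    score = [math.log(prior[y]) + sum(
        math.log(theta[y][j] if v else 1 - theta[y][j]) for j, v in enumerate(x))
        for y in (0, 1)]
    return 1 if score[1] > score[0] else 0
```

The down-weighting is what distinguishes this from vanilla EM: with `unlabeled_weight = 1.0` the unlabeled pool can overwhelm a handful of survey-derived labels.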

A broader view
The ability to successfully infer time-varying behavioral intentions of social-network users has a broad societal impact.On the one hand, as extensively discussed in Subsection 1.1, this ability has immense potential for the design of better, more accurate recommendation systems, ad-targeting mechanisms, political campaigns, etc.
Knowledge of an individual's current intentions allows an advertiser or a campaigner to create significantly more effective targeting mechanisms: unlike other attributes, such as personality traits or demographic attributes, intentions are behavioral attributes; that is, they often lead to a behavior. In addition, inferring a given behavioral intention focuses the inference task on the single, specific domain of interest, unlike attributes such as personality traits or demographic attributes, which can surely assist in inferring a future behavior but are not domain-specific.
On the other hand, the ability to infer such intentions might have a detrimental effect on users' privacy. As extensively discussed in Saura et al. (2021), the ability to collect, analyze, and predict user behavior poses significant risks to users' privacy. Since intentions oftentimes lead to a near-future behavior, a method that enables a successful inference of intentions may entail risks similar to the ones detailed in Saura et al. (2021). Moreover, note that our approach is based on data extracted solely from public social-network data. This is a crucial point, as it demonstrates that not only can our social-network providers learn such sensitive information about us as their users, but essentially anyone with adequate computing power can learn it, even without an active data-sharing agreement with the social-network provider.
Dynamic Bayesian networks (Murphy, 2002) are an extension of static Bayesian networks that allow for a dynamic representation of temporal nodes and edges. The use of dynamic Bayesian networks, a relatively new modeling technique, in the existing literature is less common than the use of Bayesian networks. The main areas to which dynamic Bayesian networks have been applied thus far are safety monitoring and reliability analysis, disaster prediction, risk assessment, and biology. Dynamic Bayesian networks were used to model potential threats posed by dynamic network vulnerabilities (Frigault et al., 2008) and for predicting insider threats (Axelrad et al., 2013); for reliability evaluation and safety decision support (Amin et al., 2018; Li et al., 2017); for modeling gene-expression data (Dojer et al., 2006; Murphy & Mian, 1999); and for predicting disasters, such as car accidents, banking crises, and wildfires (Dabrowski et al., 2016; Khakzad, 2019; Sun & Sun, 2015). Some works also used dynamic Bayesian networks for user modeling. Examples include human driving behavior (Kumagai & Akamatsu, 2006), students' learning styles (Käser et al., 2017), and user stress and anxiety levels (Liao et al., 2005).

Attribute inference
The inference of private attributes using social-network data has been extensively researched. Inferring users' personality types was investigated in Golbeck et al. (2011) using regression models and Twitter/Facebook data. Youyou et al. (2015) showed that automatic inference methods that rely on Facebook likes achieve better prediction accuracy than that achieved by asking the users' friends. Staiano et al. (2012) used data gathered through smartphones, such as calls and texts; their results vary significantly across different personality dimensions. More recent works on inferring personality traits include Kleanthous et al. (2016) and McCain et al. (2016). A thorough review of the use of social-network data for predicting personality traits can be found in Marengo and Settanni (2019).
Demographic-attribute inference is another well-studied topic, with age and gender being the most researched attributes (Kulshrestha, 2021; Lampos et al., 2016; Schwartz et al., 2013). A related stream of research focuses on psychological and mental conditions. Depression is the most researched condition, followed by anxiety and stress (Guntuku, 2019; Mann et al., 2020; Zhang et al., 2021). Other works aim at assessing general life satisfaction or perceived quality of life (Bozkurt et al., 2020; Marengo et al., 2021).
The common denominator of all the above works is that they focus on attributes that are either static (their values rarely change), non-self-controlled, or both.
The inference of self-controlled attributes has also been extensively studied. However, such works focus on the inference of opinions and attitudes rather than behavioral attributes (Silva et al., 2021; Xi, 2020). While a substantial amount of work does study different types of behavioral attributes, their goals are different from ours. Such works study general correlations between network or linguistic features and a given behavioral intention (Kim, 2018; Luo et al., 2021), identify the prevalence of a certain behavior among the general population, or classify standalone social-network textual objects (a content-centric, rather than a user-centric, approach), such as tweets or posts. For example, while there exists a considerable amount of work on machine-learning techniques applied to social-network data for monitoring public health, none of these works aims at inferring the vaccination intent of a given social-network user at a specific point in time, solely using publicly-available social-network data. Rather, existing works analyze collective sentiment toward vaccinations (Cossard, 2020; Mitra et al., 2016), track the spread of infectious diseases (Lamb et al., 2016; Šćepanović, 2021), or classify stand-alone social-network objects according to the vaccination attitudes of the object's creator (Aramaki et al., 2011; Huang, 2017).
Inferring time-varying behavioral attributes using public social-network data has therefore hardly been researched, with two exceptions: voting intentions and online purchase intentions. There are several key differences between this work and prior ML work on PI. First, the majority of existing works examine general buying preferences rather than time-varying PIs (Zhang & Pennacchiotti, 2013). Other works try to infer the PI of stand-alone social-network objects (content-centric) rather than the PI of social-network users (user-centric), an approach that is inherently biased (Atouati et al., 2020; Gupta, 2014). The remaining works that do try to infer a user-centric, time-varying PI use data derived solely from e-commerce platforms (Mokryn et al., 2019). The closest work to ours is Lo et al. (2016), which infers the PI of Pinterest users using temporal features and a logistic-regression model. However, they only consider online purchases and do not differentiate between product categories.
A systematic literature review on intention mining is given in Rashid et al. (2021); note, however, that while that article discusses existing works on intention mining, it does not focus on behavioral-intention inference.

Information cascades
The formation of different types of information cascades (Acemoglu & Ozdaglar, 2011; Easley & Kleinberg, 2010; Shu et al., 2020; Zang et al., 2017) in social networks is a highly researched area that may assist in predicting various behaviors. Similar to this work, information cascades are dynamic in nature and are often modeled and analyzed using a Bayesian approach. However, as we stress in Section 3, this article is not concerned with inferring past behaviors, but rather with the intention to perform future behaviors. More importantly, this work does not use features that require analyzing the social-network profiles of the user's network ties. Such an approach is significantly more practical than one that requires analyzing the social-network profiles of each tie of the target user, as nowadays, for most social-network platforms, network information can only be obtained automatically by the social-network providers themselves. Finally, while information cascades and herd behavior might explain some behaviors (voting behavior, for instance), they are far less useful for inferring behaviors that are either not explicitly and publicly shared online or are known to be influenced primarily by the individual's personal traits.

Time-series classification
Time-series classification (not to be confused with time-series forecasting) is the task of assigning labels to an instance based on a set of ordered features. Univariate time-series classification, where the model's feature set is solely composed of prior labels, is a highly researched area. Popular univariate classifiers include the K-nearest-neighbors classifier (usually K = 1) with dynamic time warping as a distance measure, shapelet-based classifiers (Ye & Keogh, 2009), and interval-based classifiers (Deng et al., 2013).
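For concreteness, the 1-NN-with-DTW baseline mentioned above can be sketched as follows; function names are our own, and a production implementation would typically add a warping window and lower-bound pruning for speed:

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-time-warping distance
    between two univariate series (squared-difference local cost)."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

def knn_dtw_classify(train, query):
    """1-nearest-neighbor classifier: train is a list of (series, label) pairs."""
    return min(train, key=lambda sl: dtw_distance(sl[0], query))[1]
```

For example, `knn_dtw_classify([([1, 2, 3, 4], "up"), ([4, 3, 2, 1], "down")], [1, 2, 2, 3, 4])` returns `"up"`, since warping absorbs the repeated value.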
In real-world scenarios, however, time-series problems are multivariate: feature sets may potentially contain other features beyond prior labels. Much less consideration has been given to multivariate time-series classification (MTSC). The vast majority of MTSC models are essentially an ensemble of univariate classifiers, using a separate model for each feature. The shortcoming of such an approach is that the ensemble can only learn temporal relations within the same feature; it cannot learn temporal relations between different features across different time points.
Only a few MTSC models are not built as an ensemble of univariate time-series classifiers; almost all such models are built using neural networks. They include RNN-based models (Hüsken & Stagge, 2003) or variations of convolutional networks, such as ResNet (Wang et al., 2017). Such models are indeed shown to obtain good results on MTSC datasets. However, they all require data from many time points (i.e., long series) to achieve excellent results (Ruiz et al., 2021). To obtain such long series, one must either wait a long time before starting the classification of new data, or sample data at high sampling rates (examples of data sampled at high rates include sensor data, temperature data, motions, images, voice, etc.). This observation raises an important and interesting question: which model should we use in domains in which temporal data is sampled at relatively low rates (either because of the nature of the data or because of resource limitations), we cannot wait a long period of time to start the classification, and yet the dataset contains many features with important temporal relations between them? In this article, we propose one possible approach for handling such use cases.

Methodology
In this section, we present a general methodology for inferring behavioral intentions of social-network users, where a "behavioral intention" refers to a decision to perform a present or near-future behavior. Our methodology includes multiple components; in the next sections, we further elaborate on each component and show how it is used to infer the five behavioral intentions considered in this work.
Our methodology consists of multiple stages and is illustrated in Figure 1. As can be seen, multiple stages, such as feature identification, feature selection, and model selection, are performed using a combination of prior information and data; this approach enables us to handle multiple challenging traits of our datasets, such as scarce labeled data, a large number of missing values, and latent variables. In the paragraphs below we briefly list the main components of our methodology; a more detailed description of each component can be found in the supplementary material file.
1. Identify the best determinants of the formation of behavioral intentions. This step is performed using behavioral-psychology literature. A significant challenge, however, is the fact that the values of some of those determinants (e.g., personality traits) cannot be directly extracted from the user's social-network profiles. Such variables are referred to as "latent variables" and are presented in Layer 2, Figure 2. Therefore, we enrich the model with various observed network features, as detailed in step 2.
2. Extract observed network features. Observed variables are attributes that at least some users publicly reveal on their public social-network accounts. Those network features may assist both in inferring the target intention and in inferring each intention's latent determinants. This step is performed using a combination of existing HCI literature and feature-selection methods based solely on the data at hand. Such nodes are presented in Layer 3, Figure 2.
3. Identify the best determinants of the specific intentions that we consider in this work (Layer 2, Figure 2). This step is performed using a combination of existing HCI and psychology literature, feature-selection methods based solely on the data at hand, and existing literature from the specific domain of the target intention. For those determinants that are latent, repeat step 2.
4. Map features to network nodes.
5. Build a static Bayesian-network model: select the best features (feature selection) and the best model (model selection).
6. Extend the static models built in step 5 to the temporal case using dynamic Bayesian networks: select the best static and temporal features (feature selection); select the best model, including both static and temporal edges (model selection); and perform temporal training of the DBN.
One of the key ideas of our methodology is the representation of each behavioral intention as being caused by a set of causes and causing a set of effects. Each latent cause and each latent effect is also modeled in a similar manner. We continue recursively until we reach an observed node that contains a reasonable percentage of missing values (i.e., <30%). In addition, we model dependency relations between different causes and effects of the same target intention; this enables us to reduce the uncertainty in the model and better handle variables with many missing values.
As an example, let us consider weight-loss intentions. Body image is known to be a potential cause of weight-loss intentions (Markey & Markey, 2005). However, most users do not publicly reveal their body image on their social-network accounts, and therefore body image is considered a latent variable. In this case, we use the observed effects of body image, that is, observed network features such as the number of self-tagged photos, to create a chain of dependency relations between causes and effects, as shown in Figure 3.
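The reasoning such a chain supports can be sketched with a toy three-node network: a latent cause B (e.g., low body image) pointing both to the intention I and to an observed network feature F (e.g., posting many self-tagged photos). All probabilities below are illustrative numbers of our own, not values learned in this work; a minimal sketch assuming binary variables, marginalising B by enumeration:

```python
def infer_intention(p_b, p_i_given_b, p_f_given_b, f_observed):
    """Posterior P(I = 1 | F = f) in a three-node layered BN  B -> I,  B -> F.

    p_b          : P(B = 1), prior on the latent cause
    p_i_given_b  : [P(I = 1 | B = 0), P(I = 1 | B = 1)]
    p_f_given_b  : [P(F = 1 | B = 0), P(F = 1 | B = 1)]
    f_observed   : the observed value of the network feature (0 or 1)
    """
    num = den = 0.0
    for b in (0, 1):
        pb = p_b if b else 1 - p_b
        pf = p_f_given_b[b] if f_observed else 1 - p_f_given_b[b]
        joint = pb * pf                 # P(B = b, F = f)
        num += joint * p_i_given_b[b]   # accumulates P(B = b, F = f, I = 1)
        den += joint                    # accumulates P(B = b, F = f)
    return num / den
```

With, say, `p_b = 0.3`, `p_i_given_b = [0.05, 0.6]`, and `p_f_given_b = [0.5, 0.1]` (low-body-image users post fewer self-tagged photos), observing few photos (`f_observed = 0`) raises the posterior intention probability above its prior of 0.215, illustrating how an observed feature informs the intention only through the latent cause.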

Use of dynamic Bayesian networks
The choice of DBNs for our inference tasks was motivated by several properties of DBNs that specifically suit the nature of our inference task and of our datasets. First, DBNs can work directly on incomplete datasets, whereas other popular discriminative classifiers cannot work directly on datasets with missing values. Since our datasets contain a large number of missing values, corresponding to attributes that the user has not publicly revealed on one of her social-network accounts, choosing a model that can naturally handle missing values without the need for imputation is crucial for the model's performance. Furthermore, the probabilistic nature of DBNs allows us to incorporate unlabeled data in the training phase using algorithms such as the Expectation-Maximization algorithm, as explained in Section 6, to reduce various types of selection bias oftentimes encountered when using social-network datasets.
Another important property of DBNs is their ability to incorporate prior information in the inference process. This fact is of special importance for our inference task, as for at least some of the model's dependency relations there exist very strong priors obtained from existing statistics (Census data), as well as from the existing behavioral-psychology, social-psychology, and human-computer-interaction literature. Such priors allow us to work better on small and medium-sized datasets, noisy datasets, and incomplete datasets. Small and medium-sized datasets are common in many domains in which annotations are expensive or labels are obtained using procedures that require the subjects' active participation, such as clinical trials. Thus, developing data-mining methods that can work on such datasets and yet provide valuable insights is of utmost importance, yet this setting is significantly less researched than big-data-oriented models.
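One standard way such priors enter Bayesian-network parameter learning is as Dirichlet pseudocounts added to the observed counts; the sketch below shows the posterior-mean estimate of one CPT row under that scheme. It is a generic illustration of the mechanism, not the paper's exact estimation procedure:

```python
def estimate_cpt(counts, prior_pseudocounts):
    """Posterior-mean estimate of a CPT row under a Dirichlet prior.

    counts             : observed counts per outcome (from the dataset)
    prior_pseudocounts : Dirichlet pseudocounts encoding prior knowledge
                         (e.g., derived from Census statistics)
    The prior dominates when data is scarce; the data dominates as the
    sample grows, which is exactly the behavior wanted for small datasets.
    """
    totals = [c + p for c, p in zip(counts, prior_pseudocounts)]
    s = sum(totals)
    return [t / s for t in totals]
```

For instance, with observed counts `[2, 1]` and pseudocounts `[10, 10]`, the estimate stays close to the 50/50 prior; with counts `[2000, 1000]`, it approaches the empirical 2:1 ratio.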
Moreover, due to their ability to model temporal dependency structures, DBNs are highly flexible in terms of the types of queries they can serve, including inference, prediction, diagnosis, backward reasoning, and retroactive analysis. DBNs can handle mixed datasets (categorical and numeric features) and, unlike other temporal models, naturally handle multivariate temporal analysis (i.e., multiple attributes changing across different time dimensions).
Neither traditional discriminative classifiers, such as support vector machines (SVM) or decision-tree ensembles nor inherently temporal methods that are specifically adapted to time series classification can support all the above functionalities.

Features
We use a diverse set of features (Table 1). A value of a feature f for a given user u was included in u's record in our dataset only if u publicly revealed the value of f on one of the public portions of her online profiles; otherwise, f was deleted from u's record in the dataset and treated as a missing value. Hence, our datasets contain a large number of missing values; furthermore, latent attributes appeared only in the training set.
Figure 2 presents a diagram of our static intention-inference network model. As can be seen, it is built in a layered manner, with target intentions in the first layer, latent and partially-observed determinants of intentions in the second layer, and observed network features in the third layer. A summary of the features that we used is listed in Table 1. For some linguistic features, we considered user-generated content (i.e., posts) and non-user-generated content (i.e., hashtags) separately. For instance, LIWC-NUG and LIWC-UG features represent LIWC analysis applied to non-user-generated and user-generated textual features, respectively.
Due to space limitations, we will not elaborate in this section on all of the features that we use; a detailed description of them can be found in the supplementary material file. However, we do elaborate on the intention-specific features that we use. Note that as some of these features are latent (i.e., body image, impulsivity, etc.), we combined them in the model using the cause-and-effect approach described above. Intention-specific features include:

Weight-loss intentions: body image. Body-image dissatisfaction is a catalyst for weight-loss intentions, especially among women (Markey & Markey, 2005). The relationship between social-network use and body image is mediated through appearance comparison, which in turn can be inferred using various social-network features (Kim & Chock, 2015; Meier & Gray, 2014).
Borrowing intentions: optimism and impulsivity. General impulsivity is linked to higher levels of borrowing (Ottaviani & Vandone, 2011). Optimists not only believe that they will be able to pay back their debts more quickly, but are also less likely to be discouraged from applying due to fear of rejection. This phenomenon, referred to as "discouraged borrowing," may constitute a measure of financial self-efficacy (Jappelli, 1990).
Job-searching intentions: unemployment statistics and employment history. Employee perception of alternative employment opportunities is a crucial component of turnover intention (Mano-Negrin & Tzafrir, 2014). We capture this measure using an observed variable that represents an industry-specific unemployment rate. In addition, we hypothesize that the user's employment history can be used to infer current intentions, and introduce variables, such as average length of time at a job and length of time at the current job, which are inferred using network features. The resulting set of variables ("status") may constitute a measure of job-search self-efficacy.

Vaccination intentions: attitudes and habits. Many articles investigate the relation between anti-vaccination positions and other attitudes and habits. Using themes identified in previous work, we create a second keyword set containing keywords related to attitudes and habits that are known to be strongly linked to anti-vaccination attitudes. This set is treated similarly to other keyword-based features.
Travel-purchase intentions: level of concern about COVID-19. One of the industries hardest hit by COVID-19 is the travel industry, both due to regulations and due to people's fear of putting themselves at risk in crowded flights and hotels. Therefore, as our datasets were collected during the 2020 COVID-19 pandemic, the level of concern about COVID-19 appears to be a key determinant of travel-purchase intentions; it is inferred using demographic and linguistic features.

Feature selection and model selection
We designed a two-level, hybrid feature-selection method. Due to the high number of correlations between different features, we opted for a Bayesian-network-based feature-selection method rather than a univariate filter-based approach. However, performing feature selection using only a BN may lead to overfitting. In addition, the vast majority of Bayesian-network structure-learning algorithms require complete datasets. To combine the best of both worlds, we employed a hybrid feature-selection approach. First, a simple, univariate feature-selection method was applied to the subset of features for which we did not have strong prior information. For that purpose, we used a mutual-information-based feature-selection method and removed all the features that received a score below a certain threshold. The resulting features, as well as the set of latent/high-prior features, were the input for the second phase, which uses two structure-learning algorithms: Greedy Thick-Thinning and the PC algorithm (Cheng et al., 1997; Spirtes et al., 1993). This phase aimed to identify the best features using Markov blankets.
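The first-phase univariate filter can be sketched with scikit-learn's mutual-information estimator. The threshold value, function name, and data below are illustrative; the paper does not report the threshold it used.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def mi_filter(X, y, threshold=0.1):
    """First-phase filter: estimate the mutual information between each
    feature and the target, and drop features scoring below `threshold`.
    Returns the reduced matrix and the indices of the kept features.
    """
    scores = mutual_info_classif(X, y, random_state=0)
    keep = scores >= threshold
    return X[:, keep], np.flatnonzero(keep)

# Toy example: column 0 is perfectly informative, column 1 is noise.
rng = np.random.RandomState(0)
y = np.array([0, 1] * 100)
X = np.column_stack([y.astype(float), rng.rand(200)])
X_reduced, kept = mi_filter(X, y)
```

The surviving features would then be passed, together with the latent/high-prior features, to the structure-learning phase.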
A Markov blanket of a variable t is a minimal subset of variables conditioned on which all other variables are probabilistically independent of t. The Markov blanket of a BN node, MB(t), is the set of its parents, P(t); children, C(t); and spouses, U(t), as encoded by the structure of the BN. As shown in Koller and Sahami (1996), the Markov blanket of a given target variable is the theoretically optimal set of variables for predicting its value. However, simply considering all the features in the Markov blanket of the behavioral-intention node is unsatisfactory in our case, due to the existence of latent variables. A better strategy is therefore to first find an "approximated" Markov blanket of the target node, MB'(t), which includes the variables in the sets P(t), C(t), and U(t) as discussed above; then, identify the Markov blanket of each latent variable that is also a member of the target's approximated Markov blanket and include the features in the union of those blankets in our feature set (in addition, of course, to the features in MB'(t)). That is, our feature set is:

MB'(t) ∪ ( ⋃_{s ∈ S ∩ MB'(t)} MB(s) )

where S represents the set of latent variables in our model. The above strategy would probably have been sufficient if our datasets were complete. However, our datasets contain missing values, which had to be imputed before running the structure-learning algorithms. We note that while BNs can perform parameter learning and inference in the presence of missing values, most Bayesian-network structure-learning algorithms require complete datasets. Hence, for some variables, we consider an "extended" notion of a Markov blanket, which also includes certain variables that belong to the variable's second-degree Markov blanket. Specifically, if a given variable v represents an observed attribute with more than 50% missing values (m(v)) and for which we do not have a strong prior (p(v)), we consider a restricted notion of v's second-degree Markov blanket and add both its direct parents, P(v), and its direct children, C(v), to our feature set. Let F be our variable set before applying feature selection, and O the set F \ S. Our final feature set includes the following features:

MB'(t) ∪ ( ⋃_{s ∈ S ∩ MB'(t)} MB(s) ) ∪ ( ⋃_{v ∈ O : m(v) ∧ ¬p(v)} P(v) ∪ C(v) )
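The blanket construction can be sketched over a directed acyclic graph. The `networkx` representation and function names are our own, and the second-degree extension for high-missingness variables is omitted for brevity.

```python
import networkx as nx

def markov_blanket(g: nx.DiGraph, t):
    """Markov blanket of node t: its parents P(t), children C(t), and
    spouses U(t) (other parents of t's children)."""
    parents = set(g.predecessors(t))
    children = set(g.successors(t))
    spouses = {p for c in children for p in g.predecessors(c)} - {t}
    return parents | children | spouses

def approximated_feature_set(g: nx.DiGraph, target, latent: set):
    """Sketch of the strategy in the text: take MB'(target), then add
    the Markov blanket of every latent variable inside that blanket."""
    mb = markov_blanket(g, target)
    out = set(mb)
    for s in latent & mb:
        out |= markov_blanket(g, s)
    out.discard(target)  # the target itself is not a feature
    return out

# Toy DAG: a -> t, t -> c, b -> c, c -> d; b is latent.
g = nx.DiGraph([("a", "t"), ("t", "c"), ("b", "c"), ("c", "d")])
features = approximated_feature_set(g, "t", latent={"b"})
```

Here MB(t) = {a, c, b} (parent, child, spouse), and since the latent b lies in that blanket, b's own blanket is merged in as well.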

Model selection
The approach described above yields not only a feature set but also a BN structure, comprised of the nodes in the feature set and the edges connecting them. Some edges were corrected to reflect strong prior information. The balance between automatic structure-learning algorithms and the use of priors for structure elicitation, as well as the initial parameters for the structure-learning algorithms (when applicable), were validated using cross-validation. Note that while the information gathered from prior literature would probably have been sufficient to model most of the meaningful dependency relations between an intention and second-layer features, relations between third-layer features and other features, as well as between third-layer features and the target intentions, cannot be captured solely using priors, as those types of relations are not as extensively studied as the relations between behavioral intentions and second-layer features.

Dynamic models
In previous sections, we viewed intentions as static attributes, ignoring the inherently temporal nature of the human decision-making process. When treating intentions as dynamic attributes, using dynamic Bayesian networks (DBNs), we can model not only temporal relations between each behavioral intention and different features, but also static and temporal relations among the features themselves.
A DBN is a sequence of BNs. Each BN represents a time slice of the DBN, i ∈ T, corresponding to one instance of time. A DBN adds three components to a static BN: temporal variables, temporal edges, and temporal evidence. For instance, if a static BN contains the variables {X^j}_{j∈D}, a DBN contains variables that can take different values in different time slices, e.g., {X^j_i}_{j∈D, i∈T}, as well as temporal edges between them. Formally, a DBN is defined as a pair (B_0, B_t), where B_0 defines the prior P(X_1) and B_t is a two-slice temporal BN that defines P(X_i | X_{i−1}) by means of a directed acyclic graph, with each variable X^j_i conditioned on its parents Pa(X^j_i).

In the following subsections, we describe our temporal, user-centric approach to modeling social-network users using dynamic Bayesian networks, and then apply it to the inference of the five behavioral intentions discussed in previous sections. Our approach is particularly suited to the scenario in which data is temporal and labeled data is scarce, a very common scenario in social-network mining. We would like to combine different types of data in our training set, obtained from multiple sources, to reduce the bias resulting from the use of a single data source, such as self-reported labels, as shown in Algorithm 1. Furthermore, as labeled data is scarce, we would like to use, at each time point, no more than the minimum amount of labeled data needed to perform adequate training, given all the training cycles the DBN has already been through at previous time points; our semi-supervised training approach, illustrated in Algorithms 1 and 2, attempts to address both requirements.

Algorithm 1. Temporal Train-Infer(Intention I)

Each social-network user is modeled using a set of dynamic Bayesian networks. Specifically, let u be a social-network user, and let K be the set of u's dynamic attributes that we aim to infer. u is represented by the set {(D^{X_k}_u, C(X_k)) | k ∈ [|K|]}, where D^{X_r}_u corresponds to a DBN that aims to infer the attribute X_r of user u, and D^{X_r}_u[i] corresponds to the i'th slice of that DBN. C(X_r) refers to X_r's unique "sampling rate," the rate at which data for each of the DBN's attributes is sampled from u's public social-network accounts. The sampling rate C(X_r) associated with a DBN D^{X_r}_u should be determined according to the unique attribute to be inferred, X_r. For instance, if our target attributes are various behavioral intentions, the sampling rate of each intention's DBN should be determined according to the intention-behavior (IB) interval (Sheeran, 2002) of intention X_r; the shorter the IB interval of an intention X_r, the higher C(X_r) should be. We further elaborate on the relation between the IB interval of various intentions and our results in Section 8.
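A minimal sketch of the per-user representation {(D^{X_k}_u, C(X_k))} described above. The class layout, attribute names, and day-based sampling rates are illustrative assumptions; the DBN objects themselves are stubbed out.

```python
from dataclasses import dataclass, field

@dataclass
class UserModel:
    """One DBN handle per dynamic attribute, paired with its sampling
    rate C(X_k), here expressed in days between samples."""
    user_id: str
    dbns: dict = field(default_factory=dict)  # attribute -> (dbn, sampling_rate)

    def due_attributes(self, day: int):
        """Attributes whose public-profile data should be re-sampled
        at time `day`, per each attribute's own sampling rate."""
        return [a for a, (_, rate) in self.dbns.items() if day % rate == 0]

# Toy example: weight-loss intentions sampled weekly (short IB interval),
# vaccination intentions sampled monthly (long IB interval).
u = UserModel("user-1")
u.dbns["weight_loss"] = (object(), 7)
u.dbns["vaccination"] = (object(), 30)
```

The per-attribute rates capture the point made in the text: the shorter an intention's IB interval, the more frequently its DBN should be fed fresh samples.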

Algorithm 2. TrainNetwork
Input: A_l (labeled data), A_u (unlabeled data), T, S (threshold), C (number of classes), X (prior), X_ESS, i (time point)
Output: Θ, the final model's parameters
Require: an inference procedure InferNetwork(θ, A) that performs inference given the network's parameters θ and a test set A
[Only fragments of the pseudocode survive extraction; the recoverable lines include: q^n_i = max_{j∈C} P(c^n_j | M^k_i.θ); if PriorLabel(x^n_i) < S then x^n_t = GetLastLabeledRecord(x^n_i); v_i = e^{−λ_i(i−t)}; s^n_i = 0.5; end if; end for.]

Algorithm 1 presents a high-level overview of our temporal inference strategy for one target attribute I, a behavioral intention, and a set of target users U, creating a set of DBNs {(D^I_u, C(I)) | u ∈ U}. At each i ∈ C(I), we infer I^u_i, the behavioral intention I of user u at time i. A similar process can be followed for other time-varying attributes. In Algorithm 2, we show how to combine the unlabeled data obtained by Algorithm 1 in the training procedure.
At time i = 0, the DBN's parameters reflect existing priors obtained using domain-specific information. For i > 0 such that i ∈ C(I), we perform both training and inference. If i < c, training of {D^I_u | u ∈ U} is performed using both labeled and unlabeled data. Labeled data for i < c is obtained from two sources: D_LP, data obtained using methods that require users' active participation in the data-collection process (e.g., surveys); and D_LNP, data that is mined directly from a set J of social-network accounts whose associated users publicly publish their intention at time i, I_i. Each data record j ∈ D_LNP contains I^j_i as well as F^j_i, other publicly revealed attributes that serve as features for I. Unlabeled data D_UL is composed solely of current (t = i) and historic (t < i) values of publicly revealed attributes of a set A of randomly chosen users, which serve as features of I; that is, the set {F^a_t | a ∈ A, t ≤ i}. Such users do not publicly reveal their intentions.
If i ≥ c, we no longer need to obtain labeled data using methods that require the users' active participation: the inter-slice and intra-slice dependencies are already adequately trained, so large amounts of labeled data are not required. In this case, D_LP is partitioned into two sets, D_LP1 and D_LP2. D_LP1 contains participants from D_LP who publicly reveal I_i on their accounts. For those users, we have both historical features and labels and current (t = i) features and labels. Our labeled training data in this case is composed of D_LP1 as well as new labeled data mined directly from social-network accounts (D_LNP, as described above).
D_LP2 contains participants from D_LP who do not publicly reveal I_i on their accounts. For those users, we have historical features and labels, and current features, but no current labels. Hence, they are used as unlabeled data when i ≥ c. Note that c should be determined separately for each inferred attribute according to its own unique traits, such as the number of features.
Inference of I at time i for each user u is based on the user's current and historical sampled feature sets; the user's publicly revealed current and past intentions; and publicly revealed behaviors: a behavior at time t may suggest an associated intention at time t − 1 or t − 2, thus allowing us to retroactively update the network's parameters to reflect the new insights. Note that if an intention or behavior is not publicly revealed, it cannot be used for inference; hence the curly brackets on I_i and B_i in line 30.

Parameter learning
In Algorithm 2, we present a semi-supervised training procedure for learning the parameters of a given network structure using both labeled and unlabeled data. The algorithm utilizes a weighted version of the Expectation-Maximization (EM) algorithm that is specifically suited to the temporal inference structure described in Algorithm 1. For simplicity, we use a parameter, T, that specifies the number of iterations. In general, though, we would repeat the loop (line 9) while the network's parameters improve, as measured by the change in l(θ | A_l, A_u), the log probability of the labeled and unlabeled data. Note that the symbol M^T_i.θ refers to the set of parameters of model M^T_i. When building the classifier (line 27), one should carefully consider which unlabeled records are fed to the classifier as part of the training set. The first reasonable option is to include an unlabeled training sample in the final training set only if the BN was highly confident about its label, meaning that the probability of its most probable state (q^n) is above a certain threshold ("confidence scores"). Another option, which may lead to better generalization, is to avoid selecting only a small subset of unlabeled records for the final training set, and instead include all the unlabeled samples, weighted by a measure that represents the influence each unlabeled record should have on the final training set. This means that the unlabeled records are treated as fractional samples in the EM algorithm, according to a measure of our choice.
There are several ways to determine the weight of each unlabeled record. One way is to use the same confidence measure discussed above (q^n). However, at least in some cases, we have additional information that we can use to determine the weights.
For instance, we can view a sample x^n_i as a set of sets of temporal features {F^n_t | t ≤ i}, where each set of features, F^n_t, is sampled from user n's social-network accounts at a different point in time, t. If x^n_i is a labeled record, there exists a label for it at time i, l^n_i, in addition to its set of features. If the record is unlabeled, all we have for time i is the set of sampled features. However, while a current label is missing for an unlabeled sample, for some unlabeled samples x^n_i (D_LP2 in Algorithm 1, for instance) there exists at least one prior label l^n_t for some t < i. We can use such prior labels to obtain a better understanding of the unlabeled sample. One option is to evaluate the influence of prior labeled samples (which correspond to the same user n) on the model. Intuitively, if a sample was highly influential on the model at time t, it is likely to be influential at time i > t as well. Therefore, the second component of the weight of an unlabeled sample x^n_i is the level of influence of the most recent labeled sample that corresponds to the same user, x^n_t, t < i (GetLastLabeledRecord()), measured using a leave-one-out (LOO)-based measure, such as the conditional predictive ordinate (Gelfand et al., 1992; NormalizedInfluenceScore()).
Furthermore, the more recently a prior label was observed for an unlabeled sample, the more confident we are with regard to its influence on the model. For instance, suppose a survey was conducted at time t, thus obtaining labels for a subset of samples, X. At each consecutive point in time, we keep sampling the social-network data of the samples in X, thus obtaining temporal features, but not labels, for each sample. For those samples, the most recent label was obtained at time t; therefore, for all unlabeled samples, the label obtained from the survey is the one that we consider for measuring the sample's influence. Now, suppose that at time t_a > t, for a given sample x^{n1}_{t_a}, we obtain a new label, l^{n1}_{t_a}, updated with respect to time t_a; this can happen, for instance, if the user has decided to publicly reveal her intention (or any other time-varying behavioral attribute that we are trying to infer) on one of her social-network accounts at time t_a. In this case, at each consecutive point in time t_b > t_a > t, we will continue using x^n_t for the weight calculation of our unlabeled samples (x^n_{t_b}), except for the unlabeled sample x^{n1}_{t_b}, which now has a newer prior label obtained at time t_a > t; its weight calculation will therefore be done according to x^{n1}_{t_a}.

Ageing of labeled data
The more recent a prior labeled sample x^n_t is, the more confident we are that its influence on the model resembles the influence of the current unlabeled sample x^n_i. Therefore, we weight the influence of each labeled sample x^n_t by an exponential fading function that decays the influence of a labeled sample with time. Specifically, we assign each unlabeled sample the following weight:

w^n_i = q^n_i + e^{−λ_i(i−t)} · s^n_i    (4)

where i is the current time point of an unlabeled sample x^n_i, and t is the time point of a labeled sample x^n_t. From Equation 4, we can see that the weight of each unlabeled sample at time i is composed of its confidence score, q^n_i, as well as the influence of the most recent labeled sample corresponding to the same user n, s^n_i, weighted by a function of the labeled sample's recency. λ is a decay factor: the lower the value of λ, the higher the impact we assign to older labeled samples. The decay factor can change between time points; for instance, if at some time point we have a large number of labeled records, we might want to use a higher decay factor. Finally, note that if too much time has passed between the current point in time and the time of the most recent label acquisition (i.e., more than S in Algorithm 2), we drop the labeled-sample-influence component (s^n in Algorithm 2) and only consider the unlabeled-sample-confidence component (q^n in Algorithm 2).

Intention-specific models
We evaluate the performance of our approach using the five intentions considered in this article: weight-loss intentions, vaccination intentions, job-searching intentions, borrowing intentions, and travel-purchase intentions. For each intention, we build its own DBN, which leans on the algorithms (feature selection, parameter learning, etc.) and features developed in previous sections, both for the static case and for the dynamic case. Note that because we used a semi-supervised training approach, we were able to combine both labeled and unlabeled data in our training sets to reduce self-selection bias. Using the EM algorithm, we were also able to use the original, incomplete training dataset. We believed that since our test datasets include a large number of missing values, training the DBN on incomplete datasets would allow it to learn relations between missing and observed values of different features. Prior information was combined in the model using a Dirichlet prior. Figure 4 presents a high-level overview of our intention-specific DBNs (third-level features are omitted due to space limitations).
As can be seen, a temporal link is created between the variables that represent our target intentions in consecutive time slices. P(intention_{i+1} | intention_i, U) represents the intention's evolution over time, given changes in the other temporal variables in the network (U).
The interest-intention relation is an interesting one. First, we see that interest may serve as either a cause or an effect of different intentions. Second, interest seems to be a cyclic process, as can be concluded from P(WI_i | interest_i, U) and P(interest_{i+1} | WI_i), for example. Such a temporal relation might be attributed to the fact that interest in a certain topic assists in forming a behavioral intention related to that topic. After the intention has been formed, a new level of interest forms, aimed at understanding how to fulfill that intention. In addition, P(PI_{i+1} | interest_i, U) and P(interest_{i+1} | PI_{i+1}) show that both the prior interest level and the current interest level are important determinants of some intentions. Such historical data can assist in identifying a sudden increase in the user's interest level.
"Opinion" is another interesting variable; it is influenced by multiple factors, such as personality traits and demographics as demonstrated by VI's Pðopinion iþ1 jopinion i , education, age, personalityÞ: Note that this cpt also contains opinion i .This represents the fact that oftentimes, opinion is a self-propelling process: opinion at a given point in time, in addition to other factors, influences opinion at future points in time.A similar cpt is seen in "COVID-19 concern." In Section 8, we show our inference results [Lauritzen-Spiegelhalter junction tree algorithm (Lauritzen & Spiegelhalter, 1988)] when using a two-slice DBN and social-network data sampled twice.

Data collection
We designed and distributed a comprehensive survey, created and hosted using the Qualtrics survey platform. The first part of our survey contained questions about the participants' personal attributes, as discussed in Section 4. The second part contained the following statements, which users were asked to rank (as well as dummy statements about unrelated intentions): "I am planning to start a weight-loss regime within the next 1-4 weeks" and "I am currently trying to lose weight" (weight-loss intentions); "I am planning to look for a new job within the next 1-4 weeks" and "I am currently looking for a new job" (job-searching intentions); "I am planning to apply for a loan within the next 1-4 weeks" (borrowing intentions); "I received a flu vaccine this season," after which, depending on the answer, the following statement was presented for either the upcoming (2020-2021) flu season or the next season (2021-2022): "I am planning to get vaccinated against influenza this upcoming fall-winter/next year" (vaccination intentions); and "I am planning to make a travel-related purchase within the next 1-4 weeks" (travel-purchase intentions). All survey data was anonymized after collection. We informed participants that their responses would be used for academic research.

Datasets
Survey data was collected in two waves with a three-month lag. The training and test datasets include data obtained from Amazon Mechanical Turk (MTurk), Facebook, Instagram, and LinkedIn. MTurk participants were presented with two options: providing links to their social-network profiles (which they reported having in the screening step), or answering a series of questions about their social-network profiles; providing social-network links was optional and completely voluntary.
A well-known risk of using surveys to acquire labels is that participants may lie in the survey. Nevertheless, obtaining labels through surveys is the only viable option in cases where existing datasets cannot be used because none contains the labels of the attribute of interest. While some works use self-reported labels, which can be collected automatically, this approach is highly biased and populates the dataset only with those individuals who choose to actively reveal their label (their target intention, for instance) online.
Furthermore, surveys are a common, well-known, and well-respected methodology for obtaining labels in prior HCI and attribute-inference literature; almost two-thirds of the HCI and attribute-inference articles that are cited in this article used surveys as the primary means of label collection.
Lastly, we implemented several methods for identifying and excluding data from participants who answered unreliably.
We eliminated responses from participants who took the survey more than once. While Qualtrics offers a cookie-based option to block users from taking the survey twice, we have found it to be unreliable. We therefore took a conservative approach and discarded responses that came from the same IP address.
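The de-duplication step can be sketched with pandas. The text does not specify whether all repeated-IP responses are discarded or only later ones; the sketch below discards every response whose IP appears more than once, matching the conservative reading, and the column names are illustrative.

```python
import pandas as pd

def drop_repeat_respondents(responses: pd.DataFrame) -> pd.DataFrame:
    """Discard every response sharing an IP address with another
    response (conservative: none of the duplicates is kept)."""
    return responses.drop_duplicates(subset="ip_address", keep=False)

# Toy example: the two responses from 1.1.1.1 are both removed.
responses = pd.DataFrame({
    "ip_address": ["1.1.1.1", "2.2.2.2", "1.1.1.1"],
    "answer": [1, 2, 3],
})
clean = drop_repeat_respondents(responses)
```

Keeping `keep="first"` instead would retain one response per IP, a less conservative variant.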
To ensure the eligibility of participants, they had to complete a screening questionnaire before taking the survey. The screening questionnaire was carefully designed to prevent respondents from inferring the qualifications we were looking for and taking the survey without being eligible participants. First, we avoided yes/no questions regarding the qualifications we were looking for and used multiway questions instead, as yes/no questions tend to insinuate the "correct" answer needed to pass the screening questionnaire. Second, we disguised the real screening questions among dummy questions. For example, instead of directly including the question "Do you have a Facebook account?" in our screening questionnaire, we asked a series of identical multiway questions about news-consumption habits for each of several media platforms (TV, newspaper, radio, websites, Facebook). For example, the question "How often do you consume news via TV?" had the following five possible answers: more than 5 times a day; 3 times a day; once daily; never; I do not have a TV. In the question that dealt with Facebook, the last answer was replaced by "I do not have a Facebook account." In the survey itself, we likewise avoided yes/no questions (for the reasons described above) and used multiway or open questions instead. For many multiway questions, we included, in addition to the real statements we were interested in, dummy statements; for instance, in questions that asked for the user's intentions, we provided statements such as "I am planning to buy a car within the next 4 weeks" or "I am looking to move to a new apartment within the next 4 weeks" (dummy statements for car-purchase intentions and moving intentions).
In addition, we included control questions to ensure that respondents were providing reliable data; these were fairly straightforward questions that asked the same thing multiple times, in different parts of the survey, using slightly different terminology. We took a conservative approach and excluded participants who failed one or more of the control questions.
Our datasets include both labeled and unlabeled data; unlabeled data is particularly important when using multi-wave data, as a considerable number of participants dropped out after the first wave: of the 1,300 respondents who participated in our first-wave survey, only 803 participated in our second-wave survey (a 0.617 response rate). To both reduce non-response bias and create a larger training dataset, we combined labeled and unlabeled data records in our second-wave training set, where the unlabeled records belong to participants who dropped out (missing attributes were treated as missing values; see Section 6).
We considered two configurations. In the first, only the 150 unlabeled records that received the highest confidence scores (Section 6) were incorporated into the second-wave dataset. In this configuration, C1, our training datasets, D^1_j (first-wave data for intention j) and D^2_j (second-wave data for intention j), consist of 780 and 592 labeled and unlabeled data records, respectively. Our test datasets, D^3_j (first-wave data for intention j) and D^4_j (second-wave data for intention j), consist of 520 and 361 labeled data records, respectively.
In the second configuration, C2, we used a larger set of unlabeled data and combined an identical number of labeled and unlabeled records in our second-wave training set. Our training datasets, D^1_j and D^2_j, consist of 780 and 606 (303 labeled and 303 unlabeled) data records, respectively. Our test datasets, D^3_j and D^4_j, consist of 520 and 500 labeled data records, respectively. In this way, we were able both to assess the influence of unlabeled data on our inference task and to increase the size of the second-wave test set, so that D^3_j and D^4_j contain roughly the same number of records (520 and 500). The size of our datasets is similar to the datasets used in prior attribute-inference articles that collect their own data and in which labels are obtained using surveys or other means that require the users' active participation (unlike self-reported labels, a highly biased methodology). Examples include Al Zamal et al. (2012, p. 400), Conover (2011, p. 956), De Choudhury et al. (2013, p. 1583), Golbeck et al. (2011, p. 279), Guntuku (2019, p. 601), Jaidka et al. (2018, p. 523), Kristensen et al. (2017, p. 1216), Lampos et al. (2016, p. 1342), Lopez-Brau et al. (2020, p. 80), Mann et al. (2020, p. 221), Marengo et al. (2021, p. 603; 2022, p. 1094), and Staiano et al. (2012, p. 53). Moreover, many social-network platforms forbid the automatic scraping of social-network accounts. Thus, developing methods that can work on medium-sized datasets and yet provide good insights is important for understanding what social-network providers can learn about their users.

Experimental results
For a given intention j, we tested its DBN, DBN_j, using our datasets as follows. In the first stage (1), we trained DBN_j using D^1_j and tested it on D^3_j; only the first slice of the DBN was affected in this stage. In the second stage (2), we trained DBN_j using D^2_j (implicitly using D^1_j as well, due to the use of priors) and tested it on D^4_j, using evidence data from D^3_j as well. Hence, inference results in (2) were obtained based on data and parameters from two slices of the DBN. Note that the "evidence data" contains only historical values of publicly available features and does not include historical labels, as in most cases exact information on historical labels for all prior test sets will not be available in real time. As can be seen in Table 2, some of our datasets are highly imbalanced. Moreover, to simulate a real-world inference task, we only consider the public portion of each user's online social-network profiles. Therefore, each dataset contains a large number of missing values that correspond to attributes that the user has not publicly revealed on her social-network accounts (attributes with more than 50% missing values are shown in Table 5). These facts make the inference task highly challenging, both as a stand-alone task and compared to the inference tasks in prior attribute-inference works.
Table 3 provides a detailed summary of our results. We report Micro F1 and Macro F1 scores for each stage ((1) and (2)) and for each intention-specific DBN. Table 4 compares the average ROC AUC results of C1 and C2. In addition, we compare our average ROC AUC scores to those achieved by boosted decision trees (BDT) and support vector machines (SVM), as well as by two multivariate time-series classification models: Time-Series Forest (TSF) and a Residual Neural Network (ResNet). We implemented our Bayesian-network models using BayesFusion, a popular Bayesian-network-modeling software package. The models against which we compare our Bayesian-network results were benchmarked using scikit-learn (SVM, BDT) and pyts (multivariate TSF), while for ResNet we used the implementation from Wang et al. (2017).
As can be seen, different intentions achieved significantly different Micro F1 and Macro F1 scores. BI's score is the lowest, whereas WI's score is the highest. A possible explanation for BI's performance is that applying for a loan is an intention that is often not publicly shared on social networks. However, other intentions that are not publicly shared, such as JI, scored significantly better than BI. This can be attributed to the fact that we were able to find other strong predictors for JI that do not depend on user-generated content, whereas for BI we failed to do so.
When comparing Micro F1 and Macro F1 scores achieved in different stages ((1) and (2)) using the same DBN, we can see that the differences are more pronounced for PI and JI. This can be attributed to the underlying differences between intentions. As evidenced by our data, intentions such as WI and VI can be seen as "continuous intentions," in the sense that their intention-behavior (IB) interval, the period of time between intention formation and completion of the associated behavior, is longer than for other intentions; the persistence rate of such intentions is significantly higher than the rates reported for PI or JI. Another explanation for the varying differences is the different set of determinants of each intention. While the importance of some of those determinants stems from their intra-slice values (that is, their values at a given point in time), the importance of others is derived from a combination of intra-slice values and inter-slice change patterns. For instance, various features related to non-user-generated content serve as excellent predictors of PI in (2), but only as solid predictors in (1). In a similar manner, the change in different MISC features between (1) and (2) serves as an important predictor of JI in (2) [and, for obvious reasons, cannot be used for JI's inference in (1)].
Figure 5 compares our ROC AUC scores to those achieved by two types of static classifiers, BDT and SVM (RBF kernel), as well as two types of classifiers that are specifically adapted to MTSC tasks: multivariate TSF and ResNet. We should note, however, that a comparison between our DBN results and results achieved by other classifiers cannot capture all the unique aspects of the inference task, as several features supported by BNs do not generalize to other classifiers (especially discriminative classifiers) in a straightforward way, or at all. However, for the comparison to be as fair as possible, we took several steps: hyperparameters were tuned using grid search over a large grid covering at least seven options for each numeric hyperparameter; the grid search was configured to optimize the ROC AUC score, as this is the metric we chose for comparing the classifiers; and imputation of missing values was done using scikit-learn's IterativeImputer (with a random-forest regressor), a multivariate imputation method that is currently considered the best imputation method offered by scikit-learn that can work with mixed datasets. As seen in Figure 5, our models outperform both static models, SVM and BDT, on all five intentions, though the differences in results vary between intentions. A possible explanation is the varying number of dependencies among each intention's features, or the existence of dependencies that are of specific importance for each inference task. Another possible explanation is the varying number of missing values within the unique set of features of each intention, as well as the varying number of latent variables (personality, income, impulsivity, etc.). A particularly interesting observation when looking at Figure 5 is that the TSF models do not
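A minimal version of this benchmarking setup might look as follows; the grid and data here are illustrative placeholders, much smaller than those actually used:

```python
# Fair-comparison sketch: IterativeImputer with a random-forest regressor
# handles the many missing values, inside a pipeline tuned by a grid search
# that optimizes ROC AUC (synthetic data; tiny illustrative grid).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 6))
X[rng.random(X.shape) < 0.3] = np.nan  # ~30% missing, as in sparse profiles
y = rng.integers(0, 2, 120)

pipe = Pipeline([
    ("impute", IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=10, random_state=0),
        max_iter=3, random_state=0)),
    ("clf", SVC(kernel="rbf", random_state=0)),
])
grid = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10]}, scoring="roc_auc", cv=3)
grid.fit(X, y)
```

Keeping the imputer inside the pipeline matters: it is refit on each cross-validation fold, so no information leaks from validation rows into the imputation model.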
outperform the static classifiers, especially not the BDT; the TSF seems to perform consistently worse than BDT and DBN, and almost consistently worse than SVM. This may result from the fact that a multivariate TSF is essentially an ensemble of univariate TSFs, one per feature, with classification obtained by a majority vote; thus, temporal relations between features cannot be modeled. The ResNet results were also interesting; ResNet achieved the highest training score (lowest training error) on almost all the datasets; on some datasets, training scores even passed the 0.96 AUC mark; we never encountered such training scores when training our DBN models. However, ResNet did not perform equally well on the test sets, especially for intentions with a more dynamic nature and a shorter intention-behavior interval, such as BI or PI. Of course, a gap between the training error and the test error is to be expected, but ResNet's gap was much larger than the gap we saw when training all the other classifiers. It should be noted that we experimented with several settings to improve ResNet's generalization error, such as early stopping.
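The majority-vote structure described above can be sketched as follows, with generic random forests standing in for the interval-based univariate TSFs and purely synthetic data:

```python
# Why a multivariate TSF cannot model cross-feature temporal relations:
# each feature's time series gets its own univariate classifier, and the
# final label is a majority vote over the per-feature predictions, so no
# single model ever sees two features at once.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
n, n_features, n_timepoints = 150, 4, 10
X = rng.normal(size=(n, n_features, n_timepoints))
y = rng.integers(0, 2, n)

per_feature = []
for f in range(n_features):
    clf = RandomForestClassifier(n_estimators=20, random_state=0)
    clf.fit(X[:, f, :], y)  # sees only feature f's time series
    per_feature.append(clf)

votes = np.stack([clf.predict(X[:, f, :])
                  for f, clf in enumerate(per_feature)])
# Majority vote; with an even number of voters, ties break toward class 1.
majority = (votes.mean(axis=0) >= 0.5).astype(int)
```

A DBN, by contrast, can place an edge between feature A at slice t and feature B at slice t+1, which is exactly the kind of relation this ensemble cannot express.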
Looking at Table 4, it seems that using unlabeled data during training does not improve our results. However, bear in mind that because we used more unlabeled data in C2, we were able to use less labeled data in D^2_j (and for this reason we could use a bigger second-wave test set). The fact that results for training sets with 75% and 50% labeled records do not significantly differ is encouraging with regard to the role played by unlabeled data and its contribution to the inference task. However, we should be cautious when trying to generalize these results and apply them to other inference tasks, as our second-wave models were not only trained on D^2_j, but also implicitly trained using D^1_j (the first-wave training set), used as priors. The propagation of information from history into the present is unique to the temporal nature of our inference task and might reduce the negative influence of uninformative or mislabeled unlabeled records. For instance, we assume that the latent variable "personality," which was obtained using our survey and included in our training set, remains static over a three-month period; thus we were able to use the value of this latent variable in the second-wave training set for the relevant intentions even for those users who did not participate in the second-wave survey. Of course, when referring to "propagation of information from history into the present," we refer only to historical training data or publicly available data, and not to historical test-set labels or other means that are not available in real-world temporal inference tasks.
To better understand the influence of different types of features on the inference of our behavioral intentions, we performed a sensitivity analysis, the most common type of ablation study used in Bayesian-network models (Pearl, 1988). Sensitivity analysis investigates the effect of small changes in the network's parameters on the posterior probabilities of the target variables (Laskey, 1995). Parameters that are more sensitive (i.e., have a higher sensitivity score) affect the reasoning results more significantly; thus, sensitivity analysis can be used to identify the set of most determinant features with respect to a given target variable and a given network. Table 6 presents the three groups of attributes that received the highest average sensitivity score and the three groups of attributes that received the lowest average sensitivity score. Note that those results represent not only the average over all the models (when applicable), but also the average over the various components of the same CPT (we find the average to be more stable than the maximum sensitivity score).
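For intuition, sensitivity can be illustrated on a minimal two-node network, Intention → Feature, with hypothetical CPT values; the score approximates the derivative of the target posterior with respect to a single CPT entry:

```python
# Tiny sensitivity-analysis sketch (hypothetical CPT values, not the
# paper's networks): perturb one CPT entry and measure the change in the
# posterior P(Intention | Feature), the quantity sensitivity scores rank.

def posterior_intention(p_intention, p_feat_given_int, p_feat_given_no_int):
    """P(Intention=1 | Feature=1) by Bayes' rule."""
    num = p_feat_given_int * p_intention
    den = num + p_feat_given_no_int * (1 - p_intention)
    return num / den

base = posterior_intention(0.2, 0.7, 0.3)
eps = 0.01
# Finite-difference sensitivity to the CPT entry P(Feature=1 | Intention=1):
sens = (posterior_intention(0.2, 0.7 + eps, 0.3) - base) / eps
```

Repeating this for every entry of every CPT and averaging the scores per feature group yields a ranking of the kind summarized in Table 6.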
As can be seen in Table 6, the most determinant groups of features include KWS features, intention-specific features, and MISC features; the latter is important, as MISC features are usually much easier to collect and process than numeric or textual features, yet are oftentimes ignored in attribute-inference works. Emotions, on the other hand, were found to have little influence on all intentions. This finding contradicts existing strong priors (Martin et al., 2008; Mohiyeddini et al., 2009; Wetherick, 2002), which find emotions to be an important determinant of some intentions (mostly short-term ones). A possible explanation is the fact that emotions change rapidly, and therefore we were not able to correctly capture and model them with coarse-grained sampling rates. Surprisingly, LDA features were also among the features with the lowest influence with respect to all intentions (though they were moderately influential on personality-trait nodes). This can be attributed to our use of keyword features, which can be seen as "targeted" LDA features with predefined topics that are known to be highly related to each intention.

Discussion
In this work, we introduced a novel methodology for intention inference. We presented inference results for five behavioral intentions from different domains. We also modeled the problem of intention inference as a time-series classification problem using DBNs and a semi-supervised training approach.
In recent years, a lot of attention has been given to the "big dataset" problem: researching new algorithms and modeling techniques that can be applied to very big datasets. That is, undoubtedly, a highly important task. Yet very few works have focused their attention on models, methods, and techniques for handling small or medium-sized datasets. Such datasets are prevalent in multiple domains; perhaps the best example is the medical domain, in which labels are oftentimes expensive to obtain. This work aims at tackling the small-dataset problem and offers new solutions for handling small and medium-sized datasets using BNs. Furthermore, in the temporal case, the size of the dataset refers not only to the number of records in the dataset, but also to its length; that is, the number of time points available for each record (row). A short-length temporal dataset may arise either because the temporal data-collection process has just recently begun or because a low sampling rate is used. A low sampling rate might occur either due to a lack of resources, when each sampling (i.e., label, and perhaps feature, acquisition) is expensive, or due to the nature of the problem and domain. For instance, as we discuss in the next subsection, some intentions have very long intention-behavior (IB) intervals, and hence there is no added value in using high sampling rates. In this case, we may face the cold-start problem: when using common discriminative classifiers, there is a relatively long period at the beginning of the data-collection process during which we do not have enough data to make adequate predictions. In such cases, we found that DBNs clearly outperform more complex models, such as NNs. This is particularly true for intentions with a more dynamic nature. TSF, another model particularly suited to the temporal case, performed worse than other models that are not particularly adapted to the time-series case, such as BDT. One possible explanation
for the results obtained by the TSF is the fact that a TSF is not an inherently multivariate classifier; to work in the multivariate case, multiple univariate TSFs are chained together to produce a majority-vote ensemble. Thus, a TSF cannot model relations between different features over different time points.
Our findings demonstrate a key idea: when designing a model, one should carefully consider the traits of the unique inference task and of the unique dataset at hand. For instance, if we were to deal with very big, lengthy datasets, it is highly plausible that NNs would have outperformed DBNs. It is also plausible that we would have encountered complexity issues with BNs on very big datasets, though it should be noted that such problems can be tackled by a smart design of the BN that reduces its connectivity, such as the use of canonical gates or a heavier usage of intermediate nodes.

Comparison with existing results
The uniqueness of this work compared to prior work on intention mining is that it aims at inferring a given offline behavioral intention of a given individual at a given point in time, solely using publicly available social-network data. Our approach is user-centric, unlike other works that aim at classifying standalone social-network objects, such as posts or tweets; temporal, unlike other works that completely ignore the dynamic nature of intentions; and focused on offline intentions, unlike prior works that only consider online intentions. It therefore follows that a direct comparison between our results and results obtained by prior work is not particularly meaningful. Yet, to give some context to our results, we give a brief overview of how our results compare to results obtained by other attribute-inference works. We start by discussing prior attribute-inference works that aim at inferring the three most commonly researched classes of attributes: demographic attributes, personality traits, and emotions. Those attributes are also relevant to our work, as they were used in our models as latent attributes. Please note that all results are expressed using an accuracy-score metric unless mentioned otherwise.

Demographic-attribute inference

Age and gender are the most well-researched demographic attributes. Schwartz et al. (2013) used LIWC and open-vocabulary features to infer social-network users' age and gender based on their status updates (.91). Morgan-Lopez et al. (2017) applied multiple regression models to Twitter datasets to predict both age group and "life stage" (employee, student, etc.). As expected, the younger age group (20-) achieved the highest score (.94 precision, .94 recall). Chen et al.
(2015) also used a Twitter dataset; the dataset was annotated using MTurk and was then used to infer various demographic attributes, including gender, ethnicity, and age (.87, .78, .66). Liu and Ruths (2013) focused on the use of self-reported names in social-network accounts for improving gender inference (.87). Finally, Kulshrestha (2021) aimed at quantifying the extent to which various age and gender categories affect the ability to predict users' web-browsing behavior.
Other works aim at inferring a more diverse set of demographic attributes. For instance, Zhong et al. (2015) used location check-ins extracted from online customer reviews to infer demographic attributes such as education background (.9 AUC score), marital status (.31 F1-score), and even blood type (.3 F1-score). Huang et al. (2015) used both content-based features and network-based features to infer users' occupation, achieving a .79 accuracy score using only content-based features and a .81 accuracy score using both content-based and network-based features. Aletras and Chamberlain (2018) examined to what extent Twitter users' graph embeddings may assist in inferring income level and occupational class. Using a support vector machine trained on network features, they achieved a .64 accuracy score (10 occupation categories). Lampos et al. (2016) used Twitter data, including textual features, topics, and behavioral features, to infer users' income level, achieving .82 on a 2-class problem and .75 on a multi-class problem.

Personality inference

Segalin (2017) analyzed the effectiveness of different types of visual features, such as color choice in profile pictures, for personality inference (.69). Other articles provide more fine-grained results by separately listing the results obtained for each personality dimension (Big Five model). Pratama and Sarno (2015) compared the accuracy of multiple models and obtained the highest accuracy score using a Naive Bayes model. Openness scored the highest (.63), whereas extraversion and neuroticism scored the lowest (.57). Staiano et al. (2012) examined personality-classification results obtained using various feature sets. The best scores were obtained for openness (.77), whereas the lowest overall scores were obtained for neuroticism (.52). The results demonstrate the importance of the choice of features used in the models. Wald et al.
(2012) report even more fine-grained results, both by comparing several models and ranking individuals in terms of the Big Five dimensions, and by predicting which users will appear in the top or bottom five and ten percent of each dimension. Their results demonstrate the significant variance that exists when predicting different personality dimensions. For instance, using their best-achieving model, the low/high five percent were best predicted for extraversion (.64) and worst for agreeableness (.28). For the top ten percent, openness scored the highest (.74), significantly higher than the other dimensions. Those results are consistent with pioneering works in the personality-inference domain, such as Golbeck et al. (2011) and Youyou et al. (2015).

Emotion inference
The vast majority of works on emotion detection use the Ekman model and focus on six basic emotions: joy, fear, disgust, anger, sadness, and surprise. Wang et al. (2012) used multiple Twitter datasets as well as multiple feature sets, focusing on LIWC features; anger and joy scored the highest (.72 and .71 F1-score). Roberts et al. (2012) also focused on LIWC features, achieving the highest score on fear, followed by joy (.74, .67). Joy was the highest-scored emotion in Mohammad and Kiritchenko (2015) as well, along with fear (.62 and .5 F1-score), when examining the extent to which hashtags can serve as good labels for the accompanying tweets. Volkova and Bachrach (2016) tried to infer emotions and understand their relation to demographic attributes. They achieved the best scores on disgust and anger (.92, .8). Abdul-Mageed and Ungar (2017) extended the classification task to eight basic emotions, using recurrent neural networks applied to Twitter data and achieving impressive results both for common emotions, such as joy (.91), and for less common emotions, such as disgust (.82) and surprise (.86). From these results we can conclude that, similarly to the personality-inference task, emotion-inference results vary heavily according to the specific emotion to be inferred.
Prior work on intention inference primarily focuses on purchase intentions and voting intentions. Some works also research vaccination intentions; yet, such works apply a content-centric approach instead of a user-centric approach, where posts selected for inclusion in the dataset must contain at least one keyword from a given keyword set (Aramaki et al., 2011; Huang, 2017; .75, .82 F1-score). This is a highly biased methodology that does not generalize well, as the vast majority of users do not produce public social-network content that contains a specific keyword. Other works on vaccination intent in social networks focus on learning general opinions on vaccination, or on understanding general characteristics of people with anti-vaccination opinions, rather than taking a temporal approach and inferring whether a given social-network user intends to get vaccinated at a specific point in time (Cossard, 2020; Mitra et al., 2016).
Inference of purchase intentions is a highly researched topic. Yet, existing works either focus on general purchase preferences rather than a time-specific purchase intention (Yang et al., 2015; Zhang & Pennacchiotti, 2013; .79); are limited to online purchase intentions (Ding et al., 2015; Lo et al., 2016; .72 AUC score, .72); use private data, such as logs of e-commerce websites, rather than public social-network data (Mokryn et al., 2019; .78; Montgomery et al., 2004); or, as in the vaccination-intention case, take a content-centric approach rather than a user-centric approach (Atouati et al., 2020; .81 F1-score; Gupta, 2014; .89).
Voting intentions differ from the everyday-life type of intentions researched in this article in multiple respects. For instance, when trying to infer an individual's voting intention, we know that there exists a well-known, public deadline by which she has to make up her mind with regard to her intent (election day). The latter simplifies the inference task compared to everyday-life intention inference: an everyday-life intention can form at any point in time and decline at any point in time after its initial formation; then, after a certain period of time, the same intention may form again, and, unlike the case of voting intentions, no "public deadline" governs when those changes happen. Voting-intention inference was researched by Bansal and Srivastava (2019), Boutet et al. (2012), Idan and Feigenbaum (2019), Kassraie et al. (2017), and Kristensen et al. (2017).

Theoretical and practical implications
The results obtained for different intentions demonstrate an important principle: behavioral intentions should not be treated as a monolith for inference and prediction purposes. When trying to infer a behavioral intention, one should carefully consider the unique traits of the intention and its domain to understand how to best sample data, build a model, and train it. For instance, we have found that one such important trait is the intention-behavior interval of an intention. For many intentions, such information can be easily obtained from existing literature, especially the behavioral-psychology and sociology literature.
Another important implication of our method and our experiments is the encouragement to go beyond labeled data and incorporate unlabeled data in the training phase. In some domains, acquiring labeled data is a highly challenging task. The possibilities available to the researcher are using existing datasets; manual data collection (surveys, for instance), which requires a lot of time and resources; or using self-reported labels, a highly biased methodology. At the same time, unlabeled data is usually much easier to collect, yet it remains a largely untapped resource. Using a novel, revised version of the EM algorithm that is specifically suited to handling temporal data, we were able to obtain adequate results even when the labeled-unlabeled data ratio decreased from .75 to .5, as seen in Table 4. Based on our results, we believe that more research effort should be invested in finding novel semi-supervised techniques in general, and semi-supervised techniques that are specifically suited to the temporal case in particular, as such methods can save a significant amount of the time and resources dedicated to labeled-data collection.
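Our revised temporal EM is not reproduced here; as a generic illustration of how unlabeled records can enter training alongside a shrinking labeled fraction, scikit-learn's SelfTrainingClassifier accepts -1 as the "unlabeled" marker (synthetic data):

```python
# Generic semi-supervised baseline (NOT the paper's revised EM): hide half
# of the labels, mark them -1, and let self-training pseudo-label them.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)
rng = np.random.default_rng(0)
y_partial = y.copy()
mask = rng.random(len(y)) < 0.5   # hide ~50% of labels, as in our C2 setting
y_partial[mask] = -1              # -1 marks unlabeled records

model = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
model.fit(X, y_partial)           # trains on labeled + pseudo-labeled rows
acc = model.score(X, y)           # evaluated against the full true labels
```

Unlike the EM approach, self-training commits to hard pseudo-labels; an EM-style scheme instead keeps soft expected counts for the unlabeled records, which is what allows the weighting schemes discussed above.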

Conclusion, limitations, and future work
In this article, we presented a new methodology for intention inference using Bayesian networks. We then presented a new conceptual model of social-network users based on dynamic Bayesian networks and utilized it to build intention-specific DBN models that can capture the temporal nature of the human decision-making process, as well as handle common challenges in social-network research, such as incomplete datasets, unlabeled data, and bidirectional influence. The models' applicability to real-world inference tasks was evaluated using different types of intentions in different domains and multi-wave social-network data, achieving promising results despite the use of highly imbalanced, incomplete datasets.
Even though tremendous efforts have been put into this work, it has several limitations, some of which we hope to address in future work.
The first limitation is the use of surveys for obtaining ground truth (labels). Evidently, individuals can lie in the survey, leading to false results. Nevertheless, as extensively explained in Section 7, collecting labels through surveys was the only viable tool at our disposal: no existing public dataset contains the attributes needed for this work, and, on the other hand, label collection using self-reported labels is an extremely biased methodology which we believe does not and cannot generalize. Furthermore, as extensively discussed in Section 7, we applied multiple mechanisms to identify those individuals who did not answer the survey truthfully and excluded them from our dataset.
The second limitation is the "length" of our dataset: data was collected in two waves (with a three-month lag) due to limited time and resources. In future work, we plan to extend our research and sample data at more time points, as many as time and resources allow (respondents must be paid separately in each wave). It should be noted, however, that a short-length dataset reflects many real-world use cases in which the data collector is not the data owner and does not have the resources to perform data collection in many waves; thus, there is justification for exploring this scenario as well, as done in this article.
The third limitation is the fact that we used the same sampling rate for each intention. As discussed in previous sections, the optimal setting is to determine the sampling rate separately for each intention, according to its unique intention-behavior interval. We believe that such fine-grained sampling rates would yield better results and allow us to effectively incorporate features with a very strong temporal nature, such as emotions, which change rapidly; given the coarse-grained sampling rates used in this work, their contribution to our models was not significant.
In future work, we will explore the use of higher-order dynamic Bayesian networks and intention-specific sampling rates; new Bayesian-network-based feature-selection algorithms that are specifically suited to temporal settings, like the one described in this article; and additional weighting schemes for unlabeled records in semi-supervised Bayesian-network training algorithms. Another interesting and important research direction in the context of intention inference is behavior inference; that is, not only identifying behavioral intentions, but also predicting which intentions will eventually develop into behaviors. One challenge that might present itself when utilizing the results of such a prediction task for performing a certain action (for instance, presenting an ad to the user) is understanding to what extent the original intention led to the behavior, and to what extent the action taken as a result of the prediction (i.e., presenting the ad to the user) led to the behavior. The latter is another interesting question that we would like to explore in future work.

Notes
1. https://www.bayesfusion.com/
2. https://pyts.readthedocs.io/en/stable/

Figure 2. A diagram of our static intention-inference model.

Figure 4. A DBN representation of various intentions.

Table 1. Features used in this work (L: latent; O: observed; PO: partially observed).
[Algorithm listing garbled during extraction. The recoverable fragments describe a semi-supervised training procedure, TrainNetwork(labeled data, unlabeled data, prior, equivalent sample size (ESS)), applied at each time point i: features are extracted from each user's social-network accounts, and the parameters θ_i are trained from the labeled and unlabeled feature sets together with the previous parameters θ_{i−1}, yielding the inferred value of attribute I at time i for each user u ∈ U.]

Table 5. Attributes with more than 50% missing values. It should be noted that, though most Instagram accounts in our datasets were private, some NUMERIC features can still be extracted even for such accounts (followers/following ratio, number of posts, etc.).

Table 3. Results of the DBN models presented in this article.