Using a Virtual Workplace Environment to Reduce Implicit Gender Bias

Abstract
Implicit gender bias has costly and complex consequences for women in the workplace, with many women reporting gender microaggressions which result in them being overlooked or disrespected. We present an online desktop virtual environment that follows the story of a male or female self-avatar from the first-person perspective, who either experiences a positive or negative workplace scenario. The negative scenario included many examples from the taxonomy of gender microaggressions. Participants who experienced negative workplace experiences with a female self-avatar had significantly decreased levels of implicit gender bias compared to those who had a male self-avatar. There was evidence of empathy and perspective taking in the negative condition for the female self-avatar. Experiences of a positive workplace scenario showed no significant decreases in implicit gender bias regardless of self-avatar gender. We discuss the implications of these findings and make recommendations for virtual environment technologies and scenarios with respect to the reduction of implicit biases.


Introduction
There is evidence that a person can consciously believe that they would never willingly discriminate against others, but still unconsciously have implicit biases that can affect their behavior (Banaji & Greenwald, 2016; Baron & Banaji, 2006; Boysen & Vogel, 2008). These implicit biases can be defined as "bits of knowledge about social groups … [which] are stored in our brains because we encounter them so frequently in our cultural environments … [the biases] can influence our behavior towards members of particular social groups, but we remain oblivious to their influence" (Banaji & Greenwald, 2016, p. xii). As a society, we have come a long way in recognizing explicit biases; however, we are all unwitting victims of social conditioning that imbues us with biased implicit attitudes.
There has been a call to action from the human-computer interaction community on Feminist HCI (Bardzell, 2010), Queer(ing) HCI (Spiel et al., 2019), and Critical Race Theory in HCI (Ogbonnaya-Ogburu et al., 2020) to integrate concepts such as these into HCI research. In addition, the principle of gender equality is viewed as a topic of global importance (Fetterolf, 2015); however, 42% of women in the United States say they have faced gender discrimination in their workplace (Parker & Funk, 2017a). These numbers are much higher for women in STEM settings where men outnumber women (78%), women in computer jobs (74%), and women in STEM with a postgraduate degree (62%) (Parker & Funk, 2017b). Within these groups, there are also much higher reports that gender has made it hard for women to succeed at work and of sexual harassment at work (Parker & Funk, 2017b). Research has shown that implicit gender biases have complex and costly consequences for women in the workplace (Byron, 2007; Carli, 2001; Caruso et al., 2009; Eagly et al., 1992; Glick et al., 1988; Gutek, 1985; Heilman et al., 2004; Rudman & Glick, 1999; Rudman & Kilianski, 2000; Shackelford et al., 1996). Megan Smith, third Chief Technology Officer of the United States (2014-2017), called this the "death by 1000 cuts" (KVUE, 2016).
In this paper, we examine the use of desktop virtual environments to reduce implicit gender biases, through the adoption of self-avatars being placed in negative and positive workplace scenarios. While this was originally meant to be an in-person study with full body visuomotor synchrony akin to previous work (Banakou et al., 2016; Gonzalez-Liencres et al., 2020; Hamilton-Giachritsis et al., 2018; Lopez et al., 2019; Neyret et al., 2020; Peck et al., 2013), we pivoted to an online desktop virtual environment as a result of the COVID-19 pandemic. This had the added benefit of opening the study up to a much broader demographic of participants who would not normally have the resources or access to virtual reality equipment. We found that participants who experienced a negative workplace scenario using gender microaggressions as a female self-avatar had a significant decrease in implicit gender bias compared to a male self-avatar. We found no significant differences in implicit gender bias in a positive workplace environment. The objectives of this study, therefore, are: 2. Investigate the effects of embodiment and empathy in a desktop virtual environment with self-avatars. 3. Discuss the implications of the findings for reduction in implicit gender biases and make recommendations for future desktop virtual environment scenarios.

Related work
Implicit gender biases and gender microaggressions

Implicit gender biases
When it comes to implicit gender biases, women face complex and costly situations in the workplace and find themselves in the following double bind (Caruso et al., 2009): Women in the workplace are expected to be more sympathetic and nurturing than men (Gutek, 1985), but as a result, are not considered to have the suitable qualities to be managers (Rudman & Kilianski, 2000; Spence & Buckner, 2000). Therefore, it has been necessary for women to break away from gender stereotypes and to act "more like men" to be seriously considered for positions that require leadership skills (Glick et al., 1988; Rudman & Glick, 1999). However, research shows that women who display assertive and directive leadership qualities are evaluated more negatively than men with the same qualities (Eagly et al., 1992). Women who behave in a dominant manner or are extremely competent at their jobs lose likeability and influence (Carli, 2001; Shackelford et al., 1996). Women in supervisory roles may be penalized for not attending to others' emotions or expressing anger (Byron, 2007), for leading in a task-oriented versus a people-oriented style (Eagly et al., 1992), as well as for performing extremely well in traditionally male-dominated positions (Heilman et al., 2004), compared to the same behaviors from their male counterparts. Unfortunately, a simple awareness of one's implicit bias is not enough to change it (Banaji & Greenwald, 2016); however, there has been some evidence for the malleability of implicit biases across race, gender, and age biases (Blair, 2002; Blair et al., 2001; Brambilla et al., 2012; Dasgupta, 2013; Dasgupta & Asgari, 2004; Dasgupta & Greenwald, 2001), with excellent reviews by Blair (2002) and Dasgupta (2013). For example, subjects asked to imagine a strong woman as a mental exercise experienced reduced Implicit Association Test (IAT) male = strong associations (Blair et al., 2001).
Banaji and Greenwald (2016) think that these changes in implicit biases may stretch out and then return to their earlier configurations, but there is evidence to show that repeated exposure to bias changing situations can create sustained reduction in biases. Women college students showed increases in female = leader and female = math associations after exposure to women faculty members (Banaji & Greenwald, 2016).
Implicit biases seem to be intrinsically linked with microaggressions, which studies suggest are implicit in nature (Sue & Capodilupo, 2008). We turn to gender microaggressions and their consequences, specifically in the workplace.

Gender microaggressions
Microaggressions have been defined as "the brief and commonplace daily verbal, behavioral, and environmental indignities, whether intentional or unintentional, that communicate hostile, derogatory, or negative racial, gender, sexual-orientation, and religious slights and insults to the target person or group". Gender microaggressions devalue women and dismiss many of their accomplishments. They can occur frequently, thereby limiting women's effectiveness in professional environments (Banaji & Greenwald, 1995; Benokraitis, 1997). In the workplace, many women have reported being overlooked, disrespected, or dismissed by male colleagues. A classic example which is cited in the literature and which many women report (including some female authors on this paper) is that of a female employee contributing an idea during a meeting, which a male superior may not respond to or seemingly not hear. However, when a male coworker suggests the same idea, it is recognized and praised by colleagues (Sue, 2010). Other examples of gender microaggressions in the workplace include exclusion from formal and informal meetings, lack of effective mentorship compared to men, male mentors mistaking their interactions as a sexual invitation, and over 60% of women reporting sexual harassment at the workplace in the form of sexist jokes and unwanted sexual attention (Lyness & Thompson, 2000; Piotrkowski, 1998).
Taxonomies of gender microaggressions have been researched and identified (Capodilupo et al., 2010;Nadal et al., 2010;Sue, 2010;Sue & Capodilupo, 2008) into the following categories: Sexual Objectification, Second-Class Citizenship, Use of Sexist Language, Assumptions of Inferiority, Denial of the Reality of Sexism, Traditional Gender Role Assumptions, Invisibility, Denial of Individual Sexism, and Sexist Humor/Jokes. We further discuss the taxonomy of gender microaggressions in the Materials and Methods section.

Perspective taking
Perspective taking is the process of putting oneself into the shoes of another person and imagining the other person's situation and point of view. Perspective taking has been used in social psychology to reduce stereotype activation and reduce biases between groups (Galinsky, 2002;Galinsky & Ku, 2004;Galinsky & Moskowitz, 2000). Perspective taking has been shown to make the observer think that the target's behavior is due to situational, rather than dispositional, reasons (Regan & Totten, 1975). This is actually akin to how we explain our own behaviors (Jones & Nisbett, 1987). For example, we tend to give considerable weight to external or environmental (situational) causes, such as "I did badly in that presentation because my boss did not give me enough time to prepare." When observing other people we tend to place more emphasis on internal or personal (dispositional) causes for their behavior, such as "That person performed badly on the presentation because s/he is lazy." Thus, perspective taking allows us to consider others' situations and be more understanding or thoughtful about their behavior.
These findings tie in with results from Davis et al. (1996) that showed that perspective taking creates an overlap between the self and other, where the observer's thoughts and feelings about the target become more "self-like." The observer's two mental representations of the self and target come to share more features in common, creating a merging of the self and other. Importantly, Davis et al. (1996) found that this overlapping of self and other occurred implicitly (at an unconscious level) as it was unaffected by dividing the attention of the participants. Galinsky and Moskowitz (2000) demonstrated that perspective taking results in greater overlap between the self and the whole group to which the target belongs, and successfully decreased bias and stereotypes. Both Davis et al. (1996) and Galinsky and Moskowitz (2000) discuss that perspective taking occurs because it decreases the accessibility of stereotypes that normally occurs through stereotype activation. Stereotype activation occurs when stereotyped-group features such as gender and race activate automatic stereotypes about a social group (Bargh et al., 1996; Devine, 1989). By using perspective taking, the activation of the self-concept and one's self-schema overpowers the activation of the stereotype normally associated with that social group (Davis et al., 1996; Galinsky & Moskowitz, 2000).

Self-Avatars in virtual environments
Virtual environments allow users to inhabit an avatar of a different race (Banakou et al., 2016; Groom et al., 2009; Peck et al., 2013), gender (Gonzalez-Liencres et al., 2020; Lee et al., 2014; Lopez et al., 2019; Muller et al., 2017; Neyret et al., 2020; Seinfeld et al., 2018), age (Hamilton-Giachritsis et al., 2018; Yee & Bailenson, 2006), or even a different species (Ahn et al., 2016). First person perspective taking in virtual simulations can be used as a powerful tool to decrease implicit bias (Banakou et al., 2016; Gonzalez-Liencres et al., 2020; Peck et al., 2013), decrease negative stereotyping (Yee & Bailenson, 2006), increase empathy (Ahn et al., 2016; Hamilton-Giachritsis et al., 2018; Muller et al., 2017; Seinfeld et al., 2018), and decrease conflict in negotiations (Gehlbach et al., 2015). However, certain circumstances can also lead to increased implicit racial (Groom et al., 2009) or gender (Lopez et al., 2019) bias or even increased aggression towards women (Neyret et al., 2020). For example, taking the first person perspective of a dark-skinned avatar in neutral scenarios (standing in front of a mirror (Peck et al., 2013) and doing Tai Chi (Banakou et al., 2016)) led to a reduction in implicit racial bias. However, participants embodying a dark-skinned avatar in a job interview scenario actually had increases in implicit racial bias (Groom et al., 2009). Although there were technical differences in levels of embodiment between the studies, the fundamental difference was the scenario, i.e., that of a job interview, which is a negatively stereotyped situation for people of color. It is very likely that stereotype activation (Bargh et al., 1996) occurred, which is the activation of stereotypes about a group by a feature such as gender or race, overwhelming any of the positive effects of perspective taking (Groom et al., 2009).
Similarly, Lopez et al. (2019) found an increase in implicit gender bias when males embodied a female avatar. The scenario, carrying out Tai Chi in front of a mirror, was initially thought to be 'benign' due to previous work with race (Banakou et al., 2016). However, it is most likely that doing a sport was not a gender neutral scenario and most likely also caused stereotype activation. On the other hand, using negative scenarios that put men into the first person perspective of women experiencing domestic violence (Gonzalez-Liencres et al., 2020;Seinfeld et al., 2018) or sexual harassment (Neyret et al., 2020) has been shown to decrease implicit gender bias or increase empathy. Seinfeld et al. (2018) put male domestic violence offenders in the female avatar of a domestic abuse victim and found that offenders improved their ability to recognize fearful facial expressions on women. Gonzalez-Liencres et al. (2020) put non-offender men into the first person perspective scenario of a woman in intimate partner violence and found the level of decrease in implicit gender bias was correlated with the level of identification with the female avatar. Neyret et al. (2020) found that men who embodied a woman experiencing sexual harassment by a group of men at a bar were less likely than a control to give virtual electric shocks to a woman in a later Milgram obedience scenario. However, they also found that men who embodied the perspective of the men doing the sexual harassment actually doubled the number of virtual electric shocks they administered to a woman. It is clear that the virtual scenarios and the avatars that users embody and interact with are critical to the outcome of the experiments and have a real impact on the user cognitively and behaviorally.

Online virtual environments and crowdsourcing
Recent research has investigated crowdsourcing and conducting online VR experiments outside the laboratory (Huber & Gajos, 2020; Ma et al., 2018; Mottelson & Hornbaek, 2017; Saffo et al., 2020; Steed et al., 2016). Steed et al. (2016) carried out the first VR experiment in the wild using Google Cardboard and Samsung Gear and demonstrated that the presence of a self-avatar had a positive effect on self-report of embodiment. Ma et al. (2018) investigated crowdsourced VR experiments by recruiting Amazon Mechanical Turk (AMT) workers who owned a VR device. They focused on one headset device designed to be used with a Samsung mobile phone, which was owned by the majority of AMT workers who responded. The study was only able to replicate one type of VR illusion (Gonzalez-Franco & Lanier, 2017) out of three studies. This could have been due to the fact that the mobile device used to power the headset had performance limitations and did not deliver high-resolution, rapid rendering (Ma et al., 2018). Similar issues were faced by Mottelson and Hornbaek (2017), who found that performance metrics were mostly governed by the technology used. Saffo et al. (2020) investigated crowdsourcing on the online multiplayer social platform VRChat. The VRChat SDK ruled out virtual environments with complex interactions, which limited the experimenters to static stimuli, and there were limitations with participant recruitment (Saffo et al., 2020).
Huber and Gajos (2020) studied unpaid and unsupervised online virtual environments across different devices with 91% of their users working from desktop devices. They were able to replicate the results of two VR studies, and found no effect of device type on place illusion and embodiment illusion, suggesting that it is an equally effective mechanism independent of the device. These findings are backed up by other studies that have utilized desktop virtual environments such as Gehlbach et al. (2015) who recruited AMT workers and carried out perspective taking successfully. Peña et al. (2009) showed that negative priming of avatars can affect user attitude and cognition using desktop virtual settings.
While there is great potential for crowdsourcing VR studies, very few participants had devices at home that allowed hand movement tracking and no devices for full body tracking (Huber & Gajos, 2020;Ma et al., 2018;Saffo et al., 2020;Steed et al., 2016). There are also limitations in the demographics of participants who have the financial resources or interest in purchasing VR equipment with the majority of participants being male (Huber & Gajos, 2020;Ma et al., 2018;Mottelson & Hornbaek, 2017), White (Ma et al., 2018), from the USA (Huber & Gajos, 2020;Ma et al., 2018), and relatively young (Huber & Gajos, 2020;Ma et al., 2018).

Materials and methods
We pivoted away from the original methodology of a visuomotor synchrony VR in-lab study as a result of the COVID-19 pandemic. Due to the lack of full body tracking equipment in people's homes (Huber & Gajos, 2020;Ma et al., 2018), evidence that perspective taking in desktop virtual environments is possible (Gehlbach et al., 2015;Huber & Gajos, 2020;Peña et al., 2009), evidence that empathy can be increased effectively through desktop environments (Herrera et al., 2018), and the benefits of reaching a broader demographic who may not have the financial means to purchase VR equipment, we pivoted to an online desktop virtual environment.

Virtual environment scenarios
There were five scenes in each workplace scenario (Figure 1). The story was told from the first-person perspective of either a male (Kevin) or female (Kate) self-avatar, who interviews for a position at a company and gets the job (scene 1), has their first day at the office and meets their supervisor (scene 2), has a standup meeting with three male colleagues (scene 3), has a follow-up talk with their supervisor (scene 4), and after some time has passed has a chat with a same-sex colleague in the breakroom about their experiences in their job (scene 5). Prior to the first scene, the participant was presented with text on the screen giving some background on their self-avatar and details on the job interview. There was transition text prior to each scene to update the participant on the situation of their self-avatar and explain any passage of time. The full script of both experiments can be found in the Supplementary Material.

Experiment 1. For the negative scenario, we used the taxonomy of gender microaggressions (Capodilupo et al., 2010; Nadal et al., 2010; Sue, 2010; Sue & Capodilupo, 2008): Sexual Objectification, Second-Class Citizenship, Use of Sexist Language, Assumptions of Inferiority, Denial of the Reality of Sexism, Traditional Gender Role Assumptions, Invisibility, Denial of Individual Sexism, and Sexist Jokes. Table 1 shows examples of the gender microaggressions used in the negative workplace scenario and their categorizations. The full list of microaggressions used can be found in the Supplementary Material. The same experiences were created for both the female and male self-avatar.
In the negative scenario, the self-avatar is not given the position that they applied and are qualified for, their ideas are not listened to in the meeting, they are not invited to further meetings, their work is not acknowledged and praised, and they are not put up for promotion. The last scene allows the user to explicitly express the difficult experiences of the microaggressions to a sympathetic same-sex colleague who has also experienced similar treatment.
Experiment 2. In the positive scenario, none of the gender microaggressions were included. Instead, the self-avatar is given the position that they applied and are qualified for, their ideas are listened to in the meeting, they are invited to further meetings, their work is acknowledged and praised, and they are put up for promotion. The last breakroom scene allows the user to explicitly express how well the work has been going to a same-sex colleague who praises them.

Virtual environment interaction
Before the workplace scenario, participants were given time to get accustomed to their self-avatar in a room with a virtual mirror in front of and to the left of the self-avatar (Figure 2). Participants were asked to press keys to move the avatar's body and look around the room for several minutes akin to Lopez et al. (2019)'s introductory scene. This was followed by the virtual workplace scenario where each scene had a virtual mirror so that participants could see the reflection of their self-avatar during the interactions (Figure 1). Each scene contained engagement checks at the start where the participant was asked to confirm some visual element of the self-avatar, such as shirt color (Figure 2). This was also to ensure that the participant was aware of their self-avatar in the mirror to retain a sense of embodiment. In each of the workplace scenes, participants actively spoke the lines of their self-avatar and recorded their voice when prompted. Moving from immersive virtual environments to desktop virtual environments, we wanted to avoid a situation where participants would skim through a script. This way, they could listen to the dialogue and interact directly. Audio recordings have been used in virtual environments in the past (e.g., Yuksel et al., 2017). Figure 3 shows the recording interface and progressions from "Ready to Record" to "Recording" to "Transcribing" and either "Recorded" if successfully recorded or "Words not detected" otherwise.
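The progression of the recording interface can be read as a small state machine. The following is a minimal sketch of that reading, not the Unity implementation used in the study: the names `RecState`, `next_state`, and `run_prompt` are hypothetical, and treating an empty transcript as "Words not detected" and both end states as terminal are assumptions.

```python
from enum import Enum


class RecState(Enum):
    """Labels shown by the recording interface (taken from the text)."""
    READY = "Ready to Record"
    RECORDING = "Recording"
    TRANSCRIBING = "Transcribing"
    RECORDED = "Recorded"
    NOT_DETECTED = "Words not detected"


TERMINAL = (RecState.RECORDED, RecState.NOT_DETECTED)


def next_state(state, transcript=None):
    """Advance the interface by one step.

    `transcript` is only consulted when leaving TRANSCRIBING. Assumption:
    a missing or whitespace-only transcript means no words were detected.
    """
    if state is RecState.READY:
        return RecState.RECORDING
    if state is RecState.RECORDING:
        return RecState.TRANSCRIBING
    if state is RecState.TRANSCRIBING:
        if transcript and transcript.strip():
            return RecState.RECORDED
        return RecState.NOT_DETECTED
    # Assumption: RECORDED and NOT_DETECTED are terminal in this sketch.
    return state


def run_prompt(transcript):
    """Walk one prompt from READY to a terminal state; return the labels seen."""
    state = RecState.READY
    labels = [state.value]
    while state not in TERMINAL:
        state = next_state(state, transcript)
        labels.append(state.value)
    return labels
```

For example, `run_prompt("Sounds great. I'm glad to be here!")` walks through all four labels and ends at "Recorded", while `run_prompt("")` ends at "Words not detected".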

Experimental design
We conducted two experiments to address whether implicit gender bias could be reduced by experiencing negative (Experiment 1) and positive (Experiment 2) workplace scenarios in a desktop virtual environment. Each experiment was between-subjects and had two conditions: experiencing a female self-avatar versus a male self-avatar (control). Participants were recruited online using Prolific as several multi-part studies. The first study verified users were using the Google Chrome browser on a desktop and could hear and record audio. Users that passed the system checks were administered a gender-career Implicit Association Test (IAT) (preIAT) (Greenwald et al., 1998; Greenwald et al., 2003). After 48 h, users that successfully completed the preIAT were invited to take part in the virtual workplace simulation and follow-up post-simulation gender-career IAT (postIAT) and questionnaires. Participants were compensated $10/h for each study and awarded a $1 bonus for completing all parts of the study. The first two studies took around 20 minutes and the last study took around 1 h. The studies used Qualtrics to collect data. The system checks used built-in metadata to check for mobile devices and browsers, an embedded audio player widget from Google Drive to test the audio setup, and an embedded Unity WebGL widget from Simmer to record audio and display the workplace simulation. Counterbalancing was used to assign an equal number of participants to the study conditions based on gender. We used the IATGEN tool (Carpenter et al., 2019) to conduct the pre and postIAT in Qualtrics.
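One simple way to balance conditions by participant gender is to alternate the two self-avatar conditions within each gender group. The sketch below illustrates that idea only; the paper does not specify the assignment procedure, and the function name, condition labels, and deterministic alternation (rather than randomized blocks) are all assumptions.

```python
from collections import defaultdict
from itertools import cycle

# Hypothetical labels for the two between-subjects conditions.
CONDITIONS = ("female_avatar", "male_avatar")


def assign_conditions(participants):
    """Alternate conditions within each participant-gender group.

    `participants` is a list of (participant_id, gender) tuples in arrival
    order. Returns a dict mapping participant_id -> condition. Each gender
    group gets its own rotation, so group sizes per condition differ by at
    most one.
    """
    rotations = defaultdict(lambda: cycle(CONDITIONS))
    return {pid: next(rotations[gender]) for pid, gender in participants}
```

A real study would typically randomize the order within each block of two; the fixed alternation here just keeps the sketch deterministic.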
Before starting, participants were given time to adjust to their self-avatar in front of a virtual mirror. During the workplace scenario, participants actively spoke with the other avatars by recording their voice when prompted by their self-avatar's lines (Figures 3 and 4). Participants went through an audio and microphone check to ensure that they could hear the dialogue and record their voice before starting. We removed participants who failed engagement checks (questions about appearance of self-avatar and virtual environment and lack of transcribed audio).
We measured the difference in the pre and postIAT in the same way as previous work (Banakou et al., 2016; Lopez et al., 2019; Peck et al., 2013): ΔIAT = postIAT − preIAT. To measure levels of empathy we used questions originally derived from the Interpersonal Reactivity Index (IRI) (Davis, 1980), which measures four aspects of empathy, and which were adapted by Muller et al. (2017) and Herrera et al. (2018) to avatars in a virtual environment ('Empathy' category in Table 2). To measure levels of embodiment and body ownership of the self-avatar we used and adapted questions from Gonzalez-Franco and Peck (2018) and Slater, Usoh, and Steed (1994) ('Embodiment' category in Table 2). We used a control question similar to (Mottelson & Hornbaek, 2017; Steed et al., 2016) ('Outdoor Park') and disregarded participants who agreed that they were in an outdoor park. We also included two open-ended questions (Table 2).
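The pre/post difference score is a plain subtraction per participant, which can then be summarized per condition. The sketch below shows this under stated assumptions: the function names are hypothetical and any scores passed in are illustrative placeholders, not study data.

```python
from statistics import mean


def delta_iat(pre_iat, post_iat):
    """Difference score: postIAT minus preIAT.

    A negative value means the score moved away from the
    Male-Career / Female-Family direction after the simulation.
    """
    return post_iat - pre_iat


def mean_delta_by_condition(records):
    """Average the difference score per condition.

    `records` is an iterable of (condition, pre_iat, post_iat) tuples.
    """
    groups = {}
    for condition, pre, post in records:
        groups.setdefault(condition, []).append(delta_iat(pre, post))
    return {condition: mean(deltas) for condition, deltas in groups.items()}
```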
Ninety participants took part in the negative workplace scenario (Experiment 1) aged 18 to 63 (mean age of 32.0, SD of 11.2; 38 female, 50 male, 2 non-binary). Sixty-six took part in the positive workplace scenario (Experiment 2) aged 18 to 66 (mean age of 29.2, SD of 10.5; 35 female, 31 male). All of the participants were White in order to control for race, as carried out by (Banakou et al., 2016; Lopez et al., 2019; Peck et al., 2013). All participants had English as their first language to ensure that the interactive dialogues were understood at the same level.

Figure 3. The recording interface and progression from "Ready to Record" (left) to "Recording" (right top) to "Transcribing" (right middle) and either "Recorded" if successfully recorded or "Words not detected" (right bottom) otherwise. Prompt for participant: "Sounds great. I'm glad to be here!"

Virtual avatar creation, animation, and interaction
We used Unity 2019.2.7f2 to build the virtual environment and deployed the build via the WebGL platform, hosted on Simmer. We used Adobe Fuse CC to create the avatar models using stock faces for the third party avatars. For the male and female self-avatars, we used faces and body types that were piloted to demonstrate neutral physical attractiveness and likeability (please see Lopez et al. (2019)). The models were exported into Mixamo for rigging and some animation before being imported into Unity. We used voice recordings to act out the parts of the third party avatars and used LipSync Pro, a third-party Unity asset, to animate speech articulations performed by the avatars. We were not able to record full body motion capture as originally intended for avatar animation due to social distancing requirements; we therefore used a combination of Mixamo animations, FinalIK (an inverse kinematics third-party Unity asset), and LipSync Pro. This allowed us to supplement animations with custom movement not reflected in an animation. We used Microphone WebGL Library, a third-party Unity asset, to communicate with the microphone from the WebGL build. After a recording was completed, it was sent to Google's Speech-To-Text service. Google Cloud Speech Recognition Pro, a third-party Unity asset, was used to access Google's Cloud Speech REST API from within the Unity WebGL build.

IAT
The IAT is a measurement of implicit bias which relies on people making decisions without time for conscious introspection. It measures reaction times and accuracy in associating a target group with positive and negative qualities (Greenwald et al., 1998, 2003). In this paper, we used the Gender-Career IAT, which requires users to quickly sort words into the Female-Male and Career-Family categories. We used the IATGEN tool (Carpenter et al., 2019), which integrates the IAT with the Qualtrics survey software. A positive IAT score indicates a Male-Career and Female-Family association, a negative IAT score indicates a Male-Family and Female-Career association, while a zero score indicates no bias. The IATGEN tool uses the improved IAT algorithm by Greenwald et al. (2003) rather than the conventional algorithm by Greenwald et al. (1998) (please see Table 4 in Greenwald et al. (2003) for a comparison of the two algorithms). The new algorithm is less sensitive to prior IAT experience, particularly for postIAT-preIAT study designs (Greenwald et al., 2003). However, the effect of prior IAT experience is not completely eliminated by the new algorithm, and the order in which the association pairings are presented needs to be counterbalanced, as it was in this paper.
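To make the sign convention concrete, the sketch below computes a heavily simplified D-style score for one pair of blocks: the mean latency difference between the stereotype-incompatible pairing (Male-Family, Female-Career) and the compatible pairing (Male-Career, Female-Family), divided by the standard deviation of all retained trials. This is an illustration only and not the IATGEN implementation; it assumes a 10,000 ms latency cutoff and omits the error penalties, the averaging over practice and test block pairs, and the participant-level exclusions of the full Greenwald et al. (2003) procedure. The function name and block labels are hypothetical.

```python
from statistics import mean, stdev

# Assumed cutoff: trials slower than this are dropped before scoring.
MAX_LATENCY_MS = 10_000


def simplified_d_score(compatible_ms, incompatible_ms):
    """Simplified D-style score for one pair of IAT blocks.

    compatible_ms:   latencies (ms) for the Male-Career / Female-Family pairing
    incompatible_ms: latencies (ms) for the Male-Family / Female-Career pairing

    Positive result: slower responses in the incompatible block, i.e. a
    Male-Career / Female-Family association. Zero: no difference.
    """
    compat = [t for t in compatible_ms if t <= MAX_LATENCY_MS]
    incompat = [t for t in incompatible_ms if t <= MAX_LATENCY_MS]
    # Standard deviation over all retained trials from both blocks.
    pooled_sd = stdev(compat + incompat)
    return (mean(incompat) - mean(compat)) / pooled_sd
```

Swapping the two arguments flips the sign of the score, which mirrors the positive/negative interpretation described above.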

Results experiment 1: negative scenario
In both experiments, primary statistical results are reported using bootstrapped 95% confidence intervals for means and effect sizes, following recent calls for more transparent statistical reporting in human-computer interaction (Dragicevic, 2016). Traditional comparison metrics are included for context, such as Wilcoxon tests for the non-normally distributed IAT data.
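As an illustration of the reporting approach, the sketch below computes a percentile bootstrap 95% confidence interval for a mean using only the Python standard library. The paper does not state which bootstrap variant was used, so the percentile method, the number of resamples, the fixed seed, and the function name are all assumptions.

```python
import random


def bootstrap_ci_mean(values, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    Resamples `values` with replacement `n_resamples` times, computes the
    mean of each resample, and returns the (lower, upper) percentiles that
    bracket the central `level` share of those means.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)
    )
    alpha = (1 - level) / 2
    lo_idx = round(alpha * n_resamples)
    hi_idx = round((1 - alpha) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]
```

Calling `bootstrap_ci_mean(deltas)` on one condition's difference scores gives an interval for that condition's mean; an interval for the difference between two conditions could be built the same way by resampling each group and subtracting the resampled means.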

IAT data
As shown in Figure 5A, the ΔIAT was significantly lower (W = 458, p = 0.02196) in participants who experienced a female self-avatar than in participants who experienced a male self-avatar. This finding demonstrates that in the negative scenario participants who experienced a female self-avatar had a significant decrease in implicit gender bias compared to participants who experienced a male self-avatar.

Questionnaire data
Results showed that there was some embodiment in the negative condition (Virtual Body), particularly for the female self-avatar (Figure 6A). There was no sense of control over the movements (Movements) of the self-avatar (Figure 6B). Participants did feel that they were in an office environment (Figure 8B). The empathy questions showed that participants who experienced the negative workplace scenario really got involved with the feelings of their self-avatar (Figure 7A) and felt apprehensive and ill-at-ease by what was happening, particularly in the female self-avatar condition (Figure 7B). Interestingly, while almost all participants felt that the female self-avatar experienced gender discrimination, they did not feel that way about the male self-avatar (Figure 8D), although the male self-avatar experienced the same events and participants felt both avatars were equally not treated fairly (Figure 8C) nor listened to (Figure 8E). Participants did not feel attracted to the self-avatar (Figure 8A). This is beneficial as it avoids objectification of the self-avatar, which would obstruct perspective taking and the merging of the self and other.

Perspective taking
Sixteen out of 19 male participants and 13 out of 16 female participants commented on the fact that having a female self-avatar in the negative scenario helped them experience her point of view and/or that they were affected in some way. Comments such as the examples below illustrate this:
After the first or second question by the CEO I guessed what was going to be the theme of this simulation. I have personally witnessed gender discrimination against my colleagues and whilst I can empathise to a degree I do recognise its not something I'll ever experience. It was quite a clever simulation, and echoes what I've seen in the past and often the stories I hear from my partner. I'd be bitterly disappointed if I ever raise a son to behave like that in the workplace. [FN condition, male participant, DIAT = -0.0362599]
I think the virtual body was a really interesting idea to try and understand the viewpoint of someone being discriminated in a workplace situation. Although I was aware gender discrimination is a huge issue, experiencing it in person (albeit virtually) made me feel uncomfortable and forced me to consider my positionality. I really enjoyed the study and hope you find some useful results! [FN condition, male participant, DIAT = -0.1984775]
I did not enjoy the experience. I felt bad for my virtual body and what they (she) was experiencing. I was cringing at what that all male team would say next … Furious at that team. Why hire an experienced individual if they're just going to fob her off with menial tasks unrelated to the job role in the guise of 'taking one for the team'? Despicable work place sexism. It's none of that hiring manager's business if she decides to start a family and to put her on the spot like that, I'd have walked out. [FN condition, male participant, DIAT = -0.1030974]
Figure 6. Embodiment data questions all rated on a 5-point Likert scale from 1 (Strongly Disagree) to 5 (Strongly Agree). Assigned labels displayed in Table 2. The thick black horizontal lines are the means, the boxes are the interquartile ranges, and the whiskers extend to ±1.5 × IQR, or the range. Individual points are outliers.
Figure 7. Empathy questions from Davis (1980), adapted by Muller et al. (2017), all rated on a 5-point Likert scale from 1 (Strongly Disagree) to 5 (Strongly Agree). Assigned labels displayed in Table 2. The thick black horizontal lines are the means, the boxes are the interquartile ranges, and the whiskers extend to ±1.5 × IQR, or the range. Individual points are outliers.
Very interesting - super frustrating at times when in a real situation you might just want to call out the misogyny but the script wasn't so forthcoming. Of course, this type of reaction would differ depending on levels of privilege and seniority though. [FN condition, female participant, DIAT = -0.7445791]
It was very upsetting because this is a reality I have experienced and I was apprehensive from the beginning already knowing it was going to be a male dominated field. Good study! [FN condition, female participant, DIAT = -0.8908222]

Questionnaire data
As shown in Figure 6, participants did not feel embodied in the self-avatars in the positive scenario. They felt neutrally to moderately that they were in an office environment, but did not have empathetic feelings towards the self-avatar (Figure 7). Participants felt that their self-avatars were fairly treated (Figure 8C), listened to (Figure 8E), and did not experience gender discrimination (Figure 8D). They did not feel attracted to the self-avatar (Figure 8A).

Perspective taking
Three out of 12 male participants and seven out of 15 female participants commented on the fact that the positive scenario was "unrealistic" and/or they had a difficult time identifying with it.
One female participant was surprised that her female self-avatar did not experience gender discrimination in the workplace:
I expected the virtual body to experience discrimination and for her ideas to be dismissed as she was a female, but was pleasantly surprised by the fact that she was so well treated. I can see a lot of effort went into the simulation, so I commend everyone involved in creating the study. [FP condition, female participant, DIAT = -0.3266387]
This participant felt that the scenario was "too nice" to be reflective of real life:
It felt like the people inside the workplace were too nice. [FP condition, male participant, DIAT = 0.52409966]
There was also a sense of unreality from this female participant who had a male self-avatar:
I feel like it would be better if it were tailored to the gender we submitted. For example, as a woman I know that I'd face a lot more discrimination in this workplace than just being "the new guy," so it was hard to relate to the virtual character. [MP condition, female participant, DIAT = 0.05095463]
Table 2. The thick black horizontal lines are the means, the boxes are the interquartile ranges, and the whiskers extend to ±1.5 × IQR, or the range. Individual points are outliers.

Technology
There was important feedback on having more control over the self-avatar to increase embodiment:
Actual direct control could have made the virtual body feel more realistic than what amounts to random button presses. Even if we never moved Kate, giving mouse control to her head and allowing us to look around to a certain degree would have made the whole thing easier to "buy" into. [IAT data rejected by algorithm for being too fast]
There was also helpful feedback that the audio of the third party avatars and the audio recordings that the participants spoke out loud helped the level of immersion.

6. Discussion

Reduction of implicit gender bias and perspective taking
The findings showed that participants who experienced a negative workplace scenario with a female self-avatar had significantly decreased implicit gender bias compared to those with a male self-avatar (Figure 5A). To our knowledge, this is the first time that implicit gender bias, as measured by the IAT, has been decreased in a desktop virtual environment. It is also the first time that implicit gender bias has been reduced across both male and female participants in a virtual environment.
The results suggest that perspective taking, i.e., an overlap between the self and the other when being put into someone else's shoes, took place in those who experienced negative workplace experiences and gender microaggressions with a female self-avatar. One aspect of perspective taking is that the ascription of self-traits onto the other is not affected by dividing the cognitive attention of the observer and occurs implicitly at an unconscious level (Davis et al., 1996). This may be why perspective taking is a powerful tool against implicit biases, which are inherently unconscious and separate from an individual's conscious belief that they would never willingly discriminate (Banaji & Greenwald, 2016). Attempts to reduce biases through conscious means, such as thought suppression, have instead produced a paradoxical "rebound effect" in which the unwanted stereotypic thoughts about others return and increase in effect (Macrae et al., 1994; Wegner et al., 1987).
A reduction in implicit gender bias occurred across both male and female participants. This has hopeful, though slightly different, consequences for men and women. Sadly, out-groups often have implicit biases towards themselves and their own group due to internalization of social stereotypes, resulting in people of color, women, and members of the LGBTQ community often carrying implicit biases towards their own race, gender, and sexual orientation, respectively (Banaji & Greenwald, 2016), with evidence that this occurs even in children (Baron & Banaji, 2006; Newheiser & Olson, 2012). In women, research has shown that gender stereotypes such as "women are bad at math" can cause under-performance in math tests (Aronson et al., 1998; Shih et al., 1999; Steele, 1997), a phenomenon termed stereotype threat (Steele & Aronson, 1995). Implicit gender-science stereotypes have been shown to be a stronger predictor than actual math SAT scores of whether a woman would choose a science major (Smyth et al., 2009). These examples are not exhaustive, but they illustrate how women's own self-biases can cripple their professional decisions and performance. Technology such as that presented in this paper, which can reduce women's implicit bias towards their own gender, and perhaps most importantly towards themselves, is hugely important and has wide-reaching consequences beyond the human-computer interaction community.
This technology has equally important consequences for men. Some of the most powerful comments came from male participants who experienced the negative scenarios as a female self-avatar. Many men echoed that this gave them the opportunity to experience gender discrimination as a woman first-hand, and that while they were aware of the problem previously, the difference in positionality created a shift in their perspective. This highlights the importance of perspective taking, particularly for in-groups. While everyone uses stereotypes, some people are more likely to be stereotyped than others. Members of an in-group are much less likely to be victims of stereotyping and much more likely to see stereotyping as less of a problem (Banaji & Greenwald, 2016). Therefore, it is not surprising that experiencing the stereotype-based discriminatory behavior from the first-person perspective created a shift in men's view of gender bias.
While the negative scenario caused a significant difference in reduction of implicit gender bias between the male and female self-avatar conditions, there was no difference in the positive condition (Figure 5B). This could be, as Galinsky and Moskowitz (2000) put it, because "the probability of perspective taking increases when one has endured the same slings and arrows as the target person" (p. 709). It is possible that the negative scenario as a female self-avatar made it easier for perspective taking to occur. The comments from the negative scenario, particularly with the female self-avatar, demonstrated that participants were strongly affected by the experience. In contrast, many comments from the positive scenario described it as unrealistic relative to participants' expectations because of how well the experience went. Results showed that there was some embodiment in the negative condition (Virtual Body), particularly for the female self-avatar (Figure 6A). This was also the condition in which participants were involved with the feelings of their self-avatar (Figure 7A) and felt apprehensive and ill at ease about the scenario (Figure 7B). Interestingly, while almost all felt that the female self-avatar experienced gender discrimination, they did not feel that way about the male self-avatar (Figure 8D), who experienced the same events. The DIAT, questionnaire, and comments suggest that the participants were identifying with the female self-avatar in the negative scenario, perhaps because of the negative scenario. This is consistent with the literature on perspective taking, which finds that when exposed to a needy target (i.e., the female self-avatar in the negative condition) observers experience feelings of sympathy and compassion (Batson, 2014; Davis, 1983) and personal unease and distress (Betancourt, 1990; Schaller & Cialdini, 1988).

Recommendations for virtual scenarios
We make three recommendations for the development of the workplace scenarios with respect to the reduction of implicit gender biases: 1) Avoid stereotype activation, 2) Provide a backstory and situational descriptions for the self-avatar, and 3) Focus on the correct type of scenario (i.e., positive, negative, or neutral).
Avoiding the automaticity of stereotype activation (Bargh et al., 1996) was key in order to allow perspective taking to take place. Unintentional stereotype activation can occur in first-person perspective studies of race (Groom et al., 2009) and gender (Lopez et al., 2019; Neyret et al., 2020). It was important not to make the female avatar in the negative scenario look incompetent or unlikeable, even unintentionally, so as not to trigger negative stereotypes about women in the workplace. For example, in an earlier draft of the script, we had her supervisor unfairly discuss her job performance negatively. However, it was possible that negative feedback from a supervisor, albeit unfair, could have been construed as incompetence by the observer and triggered stereotype activation. Secondly, we carefully provided a backstory and information on the self-avatar through text, both as an introduction and as transitions between scenes. We were worried that the gender microaggressions might otherwise have been too implicit and gone unnoticed (particularly by male participants, who reported less gender discrimination than female participants). This is consistent with Gehlbach et al. (2015), who argued that when individuals are asked to take the perspective of another person, but are given little or no information about that person, there is actually no perspective to be taken. Lastly, the type of scenario (positive, negative, or neutral) is vital and can differ depending on the type of implicit bias that is being focused on. Lopez et al. (2019) found that implicit gender bias was actually increased in an apparently neutral scenario and suggested that this may be due to the difficulty of finding a neutral and 'benign' scenario when exploring gender due to the pervasive nature of gender roles and stereotypes.
In this paper, we investigated both positive and negative scenarios and found no significant difference in implicit gender bias in the positive scenario between female and male self-avatars. Our finding that a negative scenario significantly reduced implicit bias is congruent with Gonzalez-Liencres et al. (2020), in which male participants experienced a domestic violence situation. The evidence, therefore, suggests that negative scenarios may be particularly potent when trying to reduce implicit gender bias. It should be noted, however, that in the reduction of implicit racial bias, a neutral scenario was effective (Banakou et al., 2013; Peck et al., 2013) and an interview scene, which is negatively stereotyped for people of color, actually increased implicit racial bias (Groom et al., 2009). Therefore, the type of bias could determine the most effective type of virtual scenario. There is considerable room for further exploration of gender and other biases.

Technology of desktop virtual environment
Deploying the project as a WebGL build placed certain limitations on the fidelity of the virtual environment, when compared to a build for the standalone PC platform. This can be reflected in the lack of lighting options, due to WebGL only supporting non-directional Baked Global Illumination. A project deployed to the standalone PC platform would have allowed the virtual environment to have better lighting, including real-time lighting, which might have added to the fidelity of the virtual environment. Another factor which may impede fidelity is the performance of the build. WebGL is less performant than a build running natively, and that performance may be further hindered by the browser being used by the participant. Low performance may add unintended delays in loading, skipping of render frames, and unforeseen bugs which could lower the fidelity of the virtual environment.
Because the build was deployed on a web server, there is slight lag compared to running a build locally. In order for input to be processed from the client, it must be sent via an online connection to the server on which the build is hosted, processed by the server, and sent back to the client. These are extra steps which are absent when a build is run locally. Additionally, the speed of this communication is dependent on the Internet connection of the client and may have accounted for some of the "jerkiness" reported by some participants. This effect was much more noticeable when processing audio input from the participant, which was sent to Google's Cloud services where it was then processed by Google's Speech-to-Text API. Using a non-cloud-based speech-to-text solution could have lowered audio input lag and improved fidelity.
We had originally planned to use motion capture for all avatar animations but were not able to due to COVID-19 social distancing rules and therefore pivoted to a mixture of animations from Mixamo and Unity. Motion capture animation would certainly add to the realism of avatar movement in future iterations.
Participants reported no sense of control over the movements in either the negative or positive condition (Figure 6B), which was as expected in a desktop virtual environment. However, there are some areas for improvement in the fidelity and control of the technology. As part of keeping participants engaged, we had used random keys to press to continue, which may have also added to the lack of control. Participants reported enjoying the interactive mode of using their voice to speak the lines of the avatar and commented that this allowed them to become more involved in the experience. Moving forward, giving participants direct control over the movement of their avatar's head and/or body using key presses and mouse movements could increase the sense of embodiment (Figure 6B) and perspective taking. This would be consistent with the finding of Gonzalez-Liencres et al. (2020) that the level of identification with the female avatar correlated with the decrease in implicit gender bias (with full body tracking).

Reaching a broad audience
The quantitative and qualitative data point towards a reduction in implicit gender bias in the negative female self-avatar condition. The fact that this occurred with medium levels of embodiment in a desktop virtual environment is extremely promising and is congruent with the comparison by Herrera et al. (2018) of effects on empathy in immersive versus desktop virtual environments. The fidelity and control of the desktop virtual environment technology can be improved as discussed above, which we believe would further improve the sense of embodiment and, as a result, perspective taking. One of the most unexpected outcomes of being forced to pivot to a desktop virtual environment has been the promise that such a technology can offer to reduce implicit biases and to reach an incredibly wide audience: into people's homes, businesses, and government offices. While investigating these effects using visuomotor synchrony is certainly worthwhile, it does not necessarily behoove us to solely focus on full body tracking and abandon desktop virtual environments for the purposes of implicit bias reduction once COVID-19 is over.

Limitations and ethical concerns
In this paper, we did not look at the intersectionality of users' identities such as race and gender (Schlesinger et al., 2017; Wisniewski et al., 2018). All of the participants were White in order to control for race, as in Banakou et al. (2016), Lopez et al. (2019), and Peck et al. (2013). The intersection of sexism and racism is complex and certainly needs to be explored further, as do other intersectionalities (Schlesinger et al., 2017; Wisniewski et al., 2018).
Moreover, we only had 2 participants whose gender was non-binary (both in Experiment 1). All participants' data were used for the results as the analysis was divided by self-avatar gender, not participant gender. The non-binary participants did not make any comments regarding the gender experience of the virtual simulation. It is important to further investigate non-binary perspectives on gender portrayal and first-person perspective in virtual environments.
There have been concerns that negative first-person perspective experiences in VR can have traumatic effects on the very marginalized populations that they are supposed to represent (Cogburn et al., 2018; Nakamura, 2020). For example, One Dark Night (Emblematic, 2018) is a VR story that traces the sequence of events leading up to the shooting of Trayvon Martin by George Zimmerman. Ward (2016) describes her experience after watching: "But after I experienced 'One Dark Night' myself, I stayed to watch other users. Most of the Black people I spoke to after had the same reaction: 'Why would someone create this!?' White users, on the other hand, seemed to be able to get up, move on and go about enjoying the rest of festival. While this project may have been created to raise awareness around police brutality, it does so by putting the mental and emotional health of African-American users at risk." Nakamura (2020) also argues that movies like Clouds Over Sidra (see Note 13) invade the personal and private space of a vulnerable population such as refugees. Cogburn et al. (2018), creators of 1000 Cut Journey, ask the important questions of "What are the ethics around stories that are not our own?" and "Are the communities being represented also represented through the creation process?" (see Note 14). Bennett and Rosner (2019) warn of "the promise of empathy" using the example of disability and drawing from feminist theory, and describe how designers can unknowingly contribute to the priority of normalcy over disability. They suggest that we should prioritize what it takes to "be with" someone over what it takes to "be like" someone in order to be affected by the differences without reinforcing the differences. Reinforcing differences is precisely what can happen in VR simulations when stereotype activation occurs, which is why in our work we have been so careful to avoid it.
In that vein, it is important to acknowledge that this work stemmed from female authors and was based on female authors' own experiences, and that female authors led and were included in every step of the work.

Future work
As stated earlier it is important to investigate the intersectionality of users' identities with gender including race (Schlesinger et al., 2017;Wisniewski et al., 2018). Since race was controlled in the study akin to Lopez et al. (2019), all participants were White. This work should be tested again using avatars of different races and ethnicities. Along those lines, while excellent work has been done to reduce implicit racial bias in immersive virtual environments (Banakou et al., 2016;Groom et al., 2009;Peck et al., 2013), it would be beneficial to further explore this with desktop virtual environments. Another factor to investigate is the sustainability in reduction of the biases. Banakou et al. (2016) found a sustained reduction in implicit racial bias using full body tracking technology in an immersive virtual environment. Even if implicit biases return to a similar level later, there is evidence that repeated exposure to bias changing situations can create sustained reduction (Banaji & Greenwald, 2016).
It is also worthwhile to investigate the effects of desktop versus immersive virtual environments, as well as visuomotor synchrony and visuotactile synchrony, on the reduction of implicit biases. The more immersed and embodied the participants feel, the more likely perspective taking may become, as long as the virtual scenario does not cause stereotype activation. There are many directions that technology like this can go. Diversity training, such as unconscious bias training, has been found to be largely ineffective in quashing managerial bias (Dobbin & Kalev, 2013). This may relate to findings that conscious suppression of implicit biases has the opposite of the desired effect (Macrae et al., 1994; Wegner et al., 1987). Implicit biases exist in all types of workplace situations, including schools (McIntosh et al., 2014; Bergh et al., 2010), hospitals (Moskowitz et al., 2012; Penner et al., 2016), therapist offices (Boysen & Vogel, 2008), and law enforcement (Correll et al., 2007; Kang et al., 2012). A desktop virtual environment scenario that reduces implicit biases could have a global impact on diversity training.

Conclusion
Our findings demonstrate that participants who experienced a negative scenario in a desktop virtual workplace environment as a female self-avatar had significantly decreased implicit gender bias compared to participants who experienced a male self-avatar. There were no significant differences between self-avatar gender in the positive workplace scenario. There is some evidence for empathy and perspective taking through questionnaire data and participant comments. We recommend pursuing this exciting avenue of desktop virtual environments to reduce implicit gender bias due to its accessibility to a broad audience.
As the authors, we wish to acknowledge our own implicit biases and limitations as well as the limitations of this paper. It is impossible to fit all the complexities and intersectionality of identity and nuances of biases and prejudice into one paper. Our hope is that by taking steps towards understanding the effects of virtual environments on biases, we can start to use this medium for positive change, rather than activating or emphasizing stereotypes.

Notes
1. For the purposes of this paper, the term virtual environment (VE) includes fully immersive virtual environments as well as desktop virtual environments.
2. https://www.prolific.co
3. https://www.qualtrics.com
4. https://simmer.io/
5. A positive DIAT indicates an increase in implicit gender bias. A negative DIAT indicates a decrease in implicit gender bias.
6. https://docs.unity3d.com/Manual/webgl-graphics.html/
7. https://www.adobe.com/products/fuse.html
8. https://www.mixamo.com
9. https://assetstore.unity.com/packages/tools/animation/lipsync-pro-32117
10. https://assetstore.unity.com/packages/tools/animation/finalik-14290
11. https://assetstore.unity.com/packages/tools/inputmanagement/microphone-webgl-library-79989
12. https://assetstore.unity.com/packages/add-ons/machinelearning/google-cloud-speech-recognition-vr-armobile-desktop-pro-72625
13. The taxonomy of a movie like Clouds Over Sidra is not clear. It is footage of the real world with a 360 camera. It is classified as a VR film by IMDB.
14. Cited from supplementary video (Cogburn et al., 2018).