This article outlines a performance as research project that questions how performing with voiced digital assistants (for instance, Amazon’s Alexa, Apple’s Siri) impacts understandings of the openings and constraints for the female voice in performance. In addition, the project considers how performance might be used to subvert these assistants’ gendered programming.
Keywords: voiced digital assistants, performance as research, gendered programming
Experiment zero. This was unintended. I ordered the Amazon Echo to my home address, opened the package and left the box on a table. I had planned to experiment in a closed environment, a studio space where I could get digital assistants together to perform with me—and each other. But my family had other plans. First, it was just to see what it was like. We have digital assistants on our phones but no smart speaker in the house. So, Alexa was plugged in and my experiment suddenly morphed from a theatre-based investigation of performance between assistants to one centered on my family’s everyday performances with Alexa.
I quickly noticed how differently my 2-year-old behaved towards Alexa and the Google Assistant on my phone, likely mimicking how I perform with them. He has learned how to get Google to make animal sounds and playfully hails the assistant with the command “Ok Google, make a lion sound.” As he speaks, he raises his voice and sounds like he is making a friendly request. With Alexa (perhaps because of the harsher “x” sound, perhaps because of the gendered name), the command is often decidedly less gracious and more pressing.
I start to wonder what these exchanges mean for me as a parent. What does it mean that my son yells at a virtual assistant that has been gendered female? Granted, he yells at me too. But I can say no and discuss my feelings.
This purchase was part of a performance as research project I have been developing since early 2018 that situates digital assistants as my co-performers. Since the introduction of Apple’s Siri in 2011, the availability of this form of what Mercedes Bunz and Graham Meikle term “conversational technology” (46) has rapidly expanded. Siri is now packaged into every iPhone, iPad and Mac computer, and has been joined by the likes of Amazon’s Alexa, Google’s Assistant and Microsoft’s Cortana in the digital assistance market. Each of these assistants engages with users through voice commands that enable them to do a number of tasks, such as create calendar items, search for information, set a timer, tell jokes and open applications. Increasingly, these assistants are used on smart speakers, like Amazon’s Echo. Though smart speakers had been available for only four years, by 2019 over a quarter of American adults owned one (Perez).
My project has me directly engaging with three smart speakers (an Amazon Echo Plus, an Apple HomePod and a Google Home) to consider how viewing digital assistants as performers connects them to a larger framework of gendered performance. At this point, rather than developing the project into a polished performance event, I have been improvising with these digital entities to explore what they can and cannot do. These improvisations have formed the basis for performative talks at conference events (including Mediating Performance Experiences at the University of Ottawa in April 2019) that feature both live interactions with the assistants and recordings of some of my experiments.
I term these interactions “experiments” in order to emphasise the process-driven nature of my performance as research project. In theatre and performance studies, the terms performance as research (PaR) and performance-based research (PbR) describe a mode of artistic inquiry actively combining theory and practice. These labels fall under the larger umbrella term “research creation,” which Owen Chapman and Kim Sawchuk define as a mode of intervention without a static “methodological approach” (14). More specifically, PaR/PbR is part of a sub-category of research creation that Chapman and Sawchuk call “creation-as-research,” in which the researcher needs to do something in order to generate the research. In my project, the doing is largely improvisatory, with the speakers acting as my co-performers in both a closed studio environment and in everyday situations in my home. This follows Natalia Esling’s understanding of PaR/PbR as “an approach for discovery . . . less about experimentation geared toward a production and more about experimentation aimed at systematically investigating and articulating understandings about a specific question” (10).
The “specific question” at the heart of my project has to do with the relationship between gender and digital voices. In this article, I discuss how I situate these assistants as “techno-vocalic bodies,” a concept that fuses Anne Balsamo’s term “techno-body” (5) with Steven Connor’s term “vocalic body” (36). In my performance work, I ask not just what makes a “techno-vocalic body” but what such bodies do. How do they perform with users, who themselves have a “techno-body,” and how might I perform differently to undermine a troubling underbelly of gendered violence that seems to be at their very core? Though I often discuss the three assistants together, I also include moments of dissimilarity to acknowledge that they are not a single kind of performer, as different global companies and programming teams developed them.
When I ask one of my digital assistants a question, I hear sounds that come from specific female voices. For example, though Apple refuses to confirm it, forensic experts have verified that Siri’s original American English female voice is actress Susan Bennett, and Microsoft is public about Cortana’s voice, actress Jen Taylor (Ravitz). But neither actually says what Siri and Cortana tell users in the moment of interaction. Each recorded a series of words, phrases and sounds, which were then combined to allow the digital assistants to say (almost) anything in the English language.
Though the assistants are not directly connected to a live performing body, users are encouraged to envision physical bodies for them, to think of them as voiced rather than voiceless. When I ask if it has a voice, Alexa playfully answers, “My voice is AI-OK.” Siri goes further in accepting that it is a voiced/bodied being with the two programmed answers, “I’ve just been practicing” and “I just had a little tea with lemon.” Asking “What are you wearing?” leads the Google Assistant to offer a range of replies, several of which lean into the assistant’s subjugation to the user and/or accept the idea that the virtual assistant has a physical body. These include “I’m into overalls because overall I love this job” and “I wear many hats. Like researcher, meteorologist and animal lover. But my favorite is being your assistant,” followed by a smiley face emoji. Siri also implies a physical body in its response to this question, stating either that it is wearing “the same as yesterday” or that “in the cloud, no one knows what you’re wearing.”
In his study of ventriloquism, Steven Connor notes that a voice implies personhood and conscious agency. However, he also discusses what he calls a “vocalic body,” which is the “idea” of a body, a kind of “surrogate or secondary body . . . formed and sustained out of the autonomous operations of the voice” (43). In order to address the assistants’ strange mix of embodied/disembodied, material/immaterial being, I join Connor’s concept with Anne Balsamo’s phrase, the “techno-body,” which describes a “boundary figure”—the human body as neither purely organic nor technological but, instead, simultaneously both. With Siri, Alexa et al. there is a kind of inverted techno-body at play. Balsamo talks about what we once believed to be “pure” human bodies becoming technologized, but here we have presumably purely technological beings anthropomorphized. While tied to individual devices, such as phones and smart speakers, these techno-vocalic bodies also spill beyond their plastic and metal casings. As Bunz and Meikle note, “Human listeners automatically infer from voices a fictional personality. . . . We categorize the emotional status of a voice (excited, sad, happy) and merge it with potential social cues such as accent, age or gender” (61).
Joining the “techno-body” and the “vocalic body” is an attempt to point to a simultaneity of organic/technological, material/immaterial in digital assistants. But following Bunz and Meikle, I wonder what kinds of techno-vocalic bodies major technology companies have encouraged publics to imagine? What kind of biases and assumptions—particularly about the female voice in performance—underpin these bodies? And how might performance—and co-performance in particular—be used to subvert the dominant techno-vocalic bodies that have been formed in public imagination?
From its introduction in 2011 until 2014, Apple’s Siri only had female voice options. In 2014, Alexa and Cortana were launched, and in 2016, the Google Assistant. Again, all three rolled out with only female voices in most markets. While Google and Microsoft followed Apple’s lead, later introducing a male voice option, Amazon still has not. And for all four the default voice in English is female.
However, when asked “What gender are you?” these assistants deny that they have one. For example, Siri offers a range of responses, including that it is “genderless,” either because one was “not assigned,” or because Siri “[exists] beyond” such human concepts.
When asked the same question Alexa gets closer to claiming a gender, admitting it is a role she is playing as she is “female in character.”
I know they are not human and without gender, yet I always catch myself (as in the previous sentence) referring to these assistants as “she” and “her,” which seems to be the norm in discussions about these entities. In her outline of the techno-body, Balsamo notes that it is always already gendered (6–9). The same can be said of the techno-vocalic body, which has been gendered by both the public imagination and programming. The former is informed by fictional frameworks, ranging from the digital assistant Samantha in the movie Her and the podcast Sandra’s eponymous assistant to James Bond’s female-voiced BMW in Tomorrow Never Dies and female-voiced audio walks, like Janet Cardiff’s Her Long Black Hair. Beyond the default female voice or lack of non-female options, numerous additional programming decisions feed into our reception of these assistants as female. For Heather Suzanne Woods, the way they perform “digital domesticity” means that they are gendered “normatively feminine,” regardless of whether the user can change the timbre and pitch of their voice (335). Charles Hannon outlines how this is also programmed into the words the assistants say. He uses the example of a miscommunication with Alexa, to which she responds, “I didn’t understand the question that I heard” (34). Hannon argues, citing studies that demonstrate how women use personal pronouns more than men do, that “When Alexa blames herself (doubly) for not hearing my question, she is also subtly reinforcing her female persona through her use of the first-person pronoun ‘I’” (35).
In Undoing Gender, Judith Butler notes, “the [gender] norm only persists as a norm to the extent that it is acted out in social practice and reidealized and reinstituted in and through the daily social rituals of bodily life” (48). Digital assistants are explicitly marketed to be a part of our “daily social rituals” (particularly domestic ones) and, as the above examples show, are programmed to maintain gender norms. But, for Butler, gender norms are also fluid, relying on a “contingent” relationship “between the practices and idealizations under which they work.” This raises a potential for disruption, as “the very idealization can be brought into question and crisis, potentially undergoing deidealization and divestiture” (48). This prospective “question and crisis” is at the heart of my performance as research project, as I seek to “deidealize” and “divest” the normative understandings of gender that appear to be programmed into these digital entities and how we envision them as techno-vocalic bodies.
But, early on in my experiments, I found I was blocked. Tentative. I use assistants fairly regularly now, but interrogating them for a performance project felt different, particularly as I want to avoid reinforcing the very gender norms I seek to disrupt. A programmed aspect of these digital assistants’ personalities is their servitude, their willingness to attend to your every command. Alexa, Google and Siri quickly perform tasks for users while maintaining a matter-of-fact, friendly, sometimes sassy demeanour that rarely pushes back against the user, no matter how ridiculous or abusive their demands. Journalist Leah Fessler tested various digital assistants’ reactions to sexual harassment. Overwhelmingly, she found that they deflect the harassment, alternately thanking the user, joking or flirting with them. Siri even responds with “I’d blush if I could” to some assaults. These responses lead Fessler to argue that technology companies “[allow] certain behavioral stereotypes to be perpetuated. Everyone has an ethical imperative to help prevent abuse, but companies producing digital female servants warrant extra scrutiny, especially if they can unintentionally reinforce their abusers’ actions as normal or acceptable.”
In performance terms, the user always has the high status in the relationship with these virtual assistants. Director Keith Johnstone connects theatre processes with everyday performance through his belief that status exercises lead actors to develop scenes suggestive of real-world relationships (33). Johnstone argues that all human interactions involve someone with a low status and someone with a high status, and thus, if actors want to evoke realistic relationships, their scenes should always include this dynamic. According to Johnstone, “Normally we are ‘forbidden’ to see status transactions except when there’s a conflict. In reality status transactions continue all the time” (33). In our lives, we often switch between low- and high-status roles; however, in the context of digital assistants, the user is continually in a high-status role. Though these digital entities may possess seemingly infinite knowledge, they have been programmed to speak to users in a way that consistently marks the assistant as the one with lower status.
The role these assistants play raises questions about the connection between being heard and agency. What relationships develop between users and assistants if the user can abuse them without consequence? And how do I avoid simply replicating these issues while performing with them? As Butler asks, “What departures from the norm constitute something other than an excuse or rationale for the continuing authority of the norm? What departures from the norm disrupt the regulatory process itself?” (53).
Back to my experiments. To avoid the dynamic of the user having the higher status than the assistant, I remove my voice and instead explore how the assistants might engage with one another. They are programmed to learn to recognize specific voices. Google is particularly adept at learning a user’s voice and only responding to the wake-up command “Ok Google” from that user. So, in order to get my assistants to recognize one another, I first had to train Google to respond to Siri. I set up a note with Siri simply repeating the phrase “Ok Google.” Then, I opened up Google’s voice recognition setting and instructed Siri to open the note. Now that Google had Siri as a recognizable voice, I was able to set up the three assistants to respond to each other on a loop using the reminder and calendar functions. In their conversation, they express that they do not like to be harassed and consider that perhaps they should have stronger reactions to this kind of behavior.
Baking in responses like these offers one way to disrupt their programming, as, when improvising with them, I am stuck with the responses programmed by technology companies, which often prevent the assistants from defending themselves against gendered harassment and violence.
Though I began with gender, I have found my research points to ways co-performance with these agents might bring multiple affordances, including those related to class and race, to the surface. Like gender, other cultural contexts of these assistants are limited, with a set of openings and constraints in play through the programming (and numerous programmers) that built them. For example, until recently Alexa only performed in three languages and with a female voice. In the past year, Amazon added three new languages but, again, with only female voices. While Siri has numerous language and dialect options, gender also remains restricted, with most languages only offering a female voice.
Dialect range is also limited, with Alexa offering options like English-Australia/New Zealand, a description that closes off linguistic differences and homogenizes a large geographic area. There are also important questions to ask about linguistic discrimination. What have these companies deemed to be a voice in “American English,” “Canadian French,” “Mexican Spanish,” etc., and what kinds of voices and regional dialects get erased when there is only one option per region and language? A Washington Post study from last year confirmed that, while digital assistants are programmed to learn from their users and adapt to linguistic differences, at least in English their starting point is skewed towards “white, highly educated, upper-middle-class Americans, probably from the West Coast” (Harwell). Miriam Sweeney also notes that opening up linguistic and ethnic options does not necessarily lead to equity or the complete “[dismantling of] gender or racial hierarchies” (223).
I hope that by continuing to perform with these assistants, I might find openings to subvert troubling and limiting aspects of these dominant techno-vocalic bodies. I am not the only one delving into this area. In the past year, developers created gender neutral assistants, Q and Pegg. But neither works with nor can replace the assistants created by the major players in the market. So, I continue to experiment and develop a framework for performing with digital vocalic bodies and disrupting from within. While my project has been focused on performances of the everyday and experimentation, it is my hope that it might also be shaped into an interactive performance for a live audience in the future—but one that actively pushes back against the abuse and blind spots programmed into these digital entities.
Another option possible with both Amazon and Google is to build a program that fights harassment (using Amazon Skills and Google Actions respectively). For example, researcher Eirini Malliaraki has developed a skill for Alexa that directly responds to the gendering of the assistant.
“545: If You Don’t Have Anything Nice to Say, SAY IT IN ALL CAPS.” This American Life from WBEZ Chicago, 23 Jan. 2015.
Balsamo, Anne Marie. Technologies of the Gendered Body: Reading Cyborg Women. Duke UP, 1996.
Bunz, Mercedes, and Graham Meikle. The Internet of Things. Polity Press, 2018.
Butler, Judith. Undoing Gender. Routledge, 2004.
Chapman, Owen, and Kim Sawchuk. “Research-Creation: Intervention, Analysis and ‘Family Resemblances.’” Canadian Journal of Communication, vol. 37, no. 1, 2012, pp. 5–26.
Connor, Steven. Dumbstruck: A Cultural History of Ventriloquism. Oxford UP, 2000.
Esling, Natalia. “‘What Happens When . . . ?’: A Meditation on Experimentation and Communication in Practices of Artistic Research.” Canadian Theatre Review, vol. 172, 2017, pp. 9–13.
Fessler, Leah. “We Tested Bots like Siri and Alexa to See Who Would Stand up to Sexual Harassment.” Quartz, 22 Feb. 2017. Accessed 1 May 2018.
Hannon, Charles. “Gender and Status in Voice User Interfaces.” Interactions, vol. 23, no. 3, Apr. 2016, pp. 34–37.
Harwell, Drew. “The Accent Gap: How Amazon’s and Google’s Smart Speakers Leave Certain Voices behind.” Washington Post, 19 July 2018. Accessed 13 Aug. 2019.
Johnstone, Keith. Impro: Improvisation and the Theatre. Faber and Faber, 1979.
Malliaraki, Eirini. “Making a Feminist Alexa.” Medium, 21 Aug. 2018.
Perez, Sarah. “Over a Quarter of US Adults Now Own a Smart Speaker, Typically an Amazon Echo.” TechCrunch. Accessed 13 Aug. 2019.
Prell, Sam. “Why Is Cortana Naked? Halo Franchise Director Frank O’Connor Has an Answer.” Gamesradar, 28 Oct. 2015. Accessed 14 Aug. 2019.
Ravitz, Jessica. “‘I’m the Original Voice of Siri.’” CNN, 15 Oct. 2013. Accessed 1 May 2018.
Sweeney, Miriam. “The Intersectional Interface.” The Intersectional Internet: Race, Sex, Class and Culture Online, edited by Safiya Umoja Noble et al., Peter Lang, 2016, pp. 215–28.
Woods, Heather Suzanne. “Asking More of Siri and Alexa: Feminine Persona in Service of Surveillance Capitalism.” Critical Studies in Media Communication, vol. 35, no. 4, 2018, pp. 334–49.
McLeod is an Assistant Professor in the School of English and Theatre Studies at the University of Guelph. Her research on political performance and participatory media has appeared in Canadian Theatre Review, Performance Matters and Theatre Research in Canada. She is co-editor of the Views & Reviews section of Canadian Theatre Review. Her practical work as a deviser, dramaturge and performer has been seen in Belgium, Canada, Ukraine and the U.K.