Americans have trust issues with their voice assistants. The Washington Post reports that even though Americans have purchased voice-assisted smart speakers en masse over the past few years, they’re struggling to use them. People who own smart speakers say that voice assistants are finicky, frequently misinterpret instructions, and make it too hard to retrieve simple information such as names of songs to listen to or products to buy.

For example, the article cites the cases of computer science teacher Kate Compton, who struggled to get Alexa to agree to play the song “Despacito” (even though it’s an easy-to-find hit from 2017), and Brian Glick, founder of a Philadelphia-based software company, who asked Alexa to help him purchase a simple roll of toilet paper only to be forced to endure a litany of purchase options that taxed his patience and stole precious time from his day.

Why are these problems happening? This is a topic we’ve been blogging about for a few years now. We believe the issue is that voice assistants need to be designed with people at the center, and they need to solve problems where they can have the most impact, such as helping people who require assisted living. When voice assistants are designed with the wants and needs of people at the center, they will become more trustworthy and even lovable.

Stumbling Blocks with Voice Assistants

The Washington Post cites two main problems with voice assistants:

  • Unreliability. As the examples of Kate Compton and Brian Glick illustrate, too often voice assistants don’t perform tasks people ask them to do. They complicate simple requests by suggesting answers no one asked for, and they create friction for actions such as shopping that can be performed more easily with old-fashioned text-based website searches.
  • Tone. Users report unsettling experiences with voice assistants due to their inappropriate tone. Why does Google Assistant confidently say “sure!” before delivering a “bafflingly incorrect” response to a request, asks Kate Compton. Why is Alexa always bragging about her capabilities and asking if you’d like her to do extra tasks, TikTok creator @OfficiallyDivinity wonders in a video.

The combination of inappropriate tone and unreliability means that “Talking with them requires ‘emotional labor’ and ‘cognitive effort’ . . .”

This is certainly not a lovable experience. How to address it?

Design Voice-Based Products with People at the Center

For any product to be lovable – that is, adopted and used in a way that satisfies the owner – the product must be designed with both the rational and emotional needs of the person at the center. In other words, the product should be designed for functional and emotional trust.

  • Functional trust means that the product does what it’s supposed to do. This comes down to involving people in the design of an artificial-intelligence-fueled product, or keeping people in the loop throughout all phases of design. Oftentimes, the AI community talks about the need to make voice more accurate through advances in technology, and that’s certainly true. But it’s essential that product designers also employ techniques such as design thinking to test the reliability of the product with user input from inception onward.
  • Emotional trust means that people feel at ease when working with the product, and many of the people quoted in The Washington Post article clearly are not at ease with voice assistants. One way to build emotional trust is to be more mindful of the tone of the voice assistant. For the most part, humans understand how to modulate tone: someone coaching an adult to pursue an exercise regimen might effectively use a tough-love tone. But a child could require gentler encouragement—akin to what a grandparent can offer. Humans can make those adjustments, and for AI to be most useful, that flexibility and nuance in tone must be replicated somehow.

Tone also applies to people using voice assistants globally (although The Washington Post article did not touch on this issue). Cultures around the world have different hot buttons and parameters for good manners—and many of these parameters are driven by tone. One of the next frontiers of designing for trust is to adapt AI-based voice products for different people—from different age groups, from different cultures—a process otherwise known as AI localization. By training AI with local data, AI localization ensures that a product like Alexa will use the right tone in a specific market.

The payoff is big: smart, trustworthy products will encourage:

  • Repeatability: people will come back to your product.
  • Dependability: people will repeatedly engage with your product.
  • Relatability: people will feel so connected to your product they will form a trusted relationship.

The Washington Post did not mention another way to make voice more lovable: focus voice-based speakers on helping people who are more likely to need voice assistants to live their lives, such as the elderly and infirm. Here, voice could improve quality of life.

In fact, the Wall Street Journal recently reported that Amazon and Google are experimenting with features for their smart-home devices that more proactively assist people instead of waiting to be called on each time. Making smart speakers more proactive could make them more inclusive, too. Putting the onus on the owner to activate smart speakers creates more of a burden for people with disabilities and the elderly, especially for those who experience memory issues. There were 703 million persons aged 65 years or over in the world in 2019, and the number of older persons is projected to double to 1.5 billion in 2050. What if smart speakers were to provide healthcare-related prompts such as reminding someone when it’s time to take a prescribed medication? This kind of feature could be incredibly useful, especially for someone who needs to take multiple medications.

What Businesses Should Do

The key to making voice assistants more lovable is to design them with people at the center. There are tools to help businesses do that. For instance, at Moonshot, we use the Mindful AI Canvas to identify a person’s wants and needs at the outset of designing an AI product (such as one that uses a voice assistant).

We use the Mindful AI Canvas in a design approach known as FUEL to constantly keep the user’s emotional wants and needs at the center of both product design and roll-out. FUEL incorporates design thinking techniques such as design sprints to develop product prototypes quickly and cost-effectively via ongoing user feedback. 

We’ve used these tools for voice-based applications such as learning, as we discuss in this blog post.

Contact Moonshot to get started.


For Further Insight

Will People Trust Voice Assistants to Make Medical Appointments?

Making a Voice Assistant a Life Coach – and What That Means for AI

How Amazon and Google Might Make Voice More Inclusive

Why Fighting AI Bias Requires Mindful AI

What’s the Future of Voice?

Voice Uptake Is on the Rise

Removing Limitations to Create Access through Design

How to Create a Voice-Based Product with a Design Sprint

How to Design Trusted Customer Relationships for Digital Products

Introducing the Mindful AI Canvas

Why Encyclopædia Britannica’s “Guardians of History” is a Lovable Voice Experience