With the emergence of voice-first applications, developers and designers need to evolve to serve consumers. We are calling this new discipline Voice User Experience (VUX).

VUX requires understanding the consumer’s interest and their ability to interact hands-free and screen-free. There will be do’s and don’ts, rules of thumb, and lots of data-driven best practices (VoiceLabs’s specialty). Amazon has already started a list for developers on their Voice Design Best Practices page, but we wanted to provide real data from real use cases.

Here are a few key topics that are top of mind within our network of developers, and will be areas of future exploration and innovation:

  • Session Length
  • Session Interactions
  • Intent Threshold
  • Optimal Response Time

Session Length – How long do people want to converse?

Typically, a voice application has a specific set of goals. For example, Uber’s Alexa skill wants people to book rides. Chef Basil wants consumers to find that perfect recipe for dinner. The Bartender wants to make sure you make that cocktail just right.

Other applications are all about fun and engagement. For example, The Wayne Investigation by Warner Brothers wants kids, young adults and Batman fans to enjoy the experience. The real goals are for them to feel an affinity for Batman and to go watch a movie or buy merchandise.

Therefore, optimal session length is dependent on goals and vertical. What’s critical is to watch for deviations in session length, which are usually caused by onboarding issues or other user experience friction points.
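As a rough illustration of watching for deviations, here is a minimal Python sketch. It assumes you already export one typical session length (in seconds) per day. The function name, the 7-day window and the 30% tolerance are our own illustrative choices, not VoiceLabs settings.

```python
from statistics import median

def flag_session_length_deviations(daily_seconds, window=7, tolerance=0.3):
    """Return the indexes of days whose session length deviates from the
    median of the preceding `window` days by more than `tolerance`.

    `window` and `tolerance` are hypothetical defaults; tune them to your
    own goals and vertical.
    """
    flagged = []
    for i in range(window, len(daily_seconds)):
        baseline = median(daily_seconds[i - window:i])
        # Relative deviation from the trailing baseline.
        if baseline > 0 and abs(daily_seconds[i] - baseline) / baseline > tolerance:
            flagged.append(i)
    return flagged

# Example: a sudden drop on day 8 (for instance, after an onboarding change).
daily = [60, 62, 58, 61, 59, 60, 63, 61, 30, 60]
print(flag_session_length_deviations(daily))
```

A flagged day is only a prompt to go look at onboarding or other friction points, not a diagnosis in itself.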

[Screenshot: VoiceLabs Dashboard]

Session Interactions

This metric is new, and a big departure from measuring success via other interfaces. This is an area where web and mobile developers will have to shed previous practices.

With VUX, forms and lists are obsolete. You can’t give someone a list of ten choices and expect them to listen to all of them for three minutes, let alone make the right choice. People typically just give up and move on.

Instead, you need to train consumers to understand the commands they can say and help them speak freely (but in a defined way – I know, a little counter-intuitive). That way, people feel like they are talking naturally, and you are collecting the information you need to personalize the experience.

A great example of this is Chef Basil. Rather than listing 4,000 potential recipes one at a time, Chef Basil simply asks “What would you like to cook?”

A great example of what not to do is to port your mobile app directly to voice. For example, to buy tickets on Gametime’s mobile app I need to scroll through a list of events, then scroll through a list of tickets, then click through a few screens to find the actual price of the transaction. That won’t fly in VUX.

Web and mobile developers will need to learn this new VUX discipline to deliver a high-quality voice-first user experience.

Intent Threshold

This is probably the item where you go ‘huh?’ Intent is an Amazon Alexa-specific term: intents define the types of commands your application can interpret. For example, in the Domino’s Alexa skill, ‘Order’ is a defined intent. Intents are an integral part of voice development and an abstraction layer that makes building conversational experiences much more robust (deep-dive to come in another post!).

That said, it’s important not to get too carried away with Intents. There is a balancing act between the number of Intents and the ability to serve your users. You create too few Intents, your Voice app is boring. You create too many, you increase the risk of your application misunderstanding your users, and those users getting confused.

VoiceLabs is seeing that 8-15 Intents is the optimal range. That range will get higher over time as voice technology improves, and it will also vary by vertical. Hence, the optimal Intent Threshold, meaning the upper end of that range, is currently 15. Stay tuned for more data and updates on this metric.
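A quick way to keep an eye on this is to count the distinct Intents your application defines and compare that count to the range. The Python sketch below assumes you can list your Intent names as plain strings. The function name and the example Intent names are hypothetical, and the 8 and 15 defaults simply mirror the range quoted above.

```python
def check_intent_count(intent_names, low=8, high=15):
    """Classify the number of distinct Intents against a target range.

    `low` and `high` default to the 8-15 range quoted in this post;
    treat them as adjustable, not as fixed rules.
    """
    count = len(set(intent_names))
    if count < low:
        return "too few"
    if count > high:
        return "too many"
    return "in range"

# Hypothetical Intent names for a small ordering application.
intents = ["Order", "Reorder", "TrackOrder", "Help"]
print(check_intent_count(intents))
```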

[Screenshot: VoiceLabs Dashboard (2016)]

Optimal Response Time

This is a fancy way of saying “How long will they listen to Alexa before they start telling her to shut up?” To be clear, Alexa’s voice is okay, not great… she is no Morgan Freeman. Also, Amazon is COMMITTED to ‘Alexa everywhere,’ so don’t expect to swap out Alexa’s voice anytime soon (or ever).

Response Time is another balancing act: too terse a response causes confusion, while too long-winded a response causes users to tune out. That three-sentence response might feel like the best way to make sure there is no confusion, but it will cause consumers to yell “Alexa stop” and go do something else. The best way to monitor this is to look for unusually early Alexa “Stop” or “Cancel” events in your conversation paths.
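To make that concrete, here is a minimal Python sketch that measures how often a Stop or Cancel shows up early in a session. It assumes each session is logged as an ordered list of Intent names. AMAZON.StopIntent and AMAZON.CancelIntent are Alexa’s built-in Intents, while the function name, the other Intent names and the cutoff of two interactions are illustrative assumptions on our part.

```python
EXIT_INTENTS = {"AMAZON.StopIntent", "AMAZON.CancelIntent"}

def early_exit_rate(sessions, max_position=2):
    """Return the fraction of sessions in which a Stop or Cancel occurs
    within the first `max_position` interactions.

    `max_position` is a hypothetical cutoff for "unusually early".
    """
    if not sessions:
        return 0.0
    early = sum(
        1
        for path in sessions
        if any(intent in EXIT_INTENTS for intent in path[:max_position])
    )
    return early / len(sessions)

# Hypothetical conversation paths for a recipe application.
sessions = [
    ["FindRecipeIntent", "AMAZON.StopIntent"],
    ["FindRecipeIntent", "NextStepIntent", "NextStepIntent", "AMAZON.StopIntent"],
    ["AMAZON.CancelIntent"],
    ["FindRecipeIntent", "NextStepIntent"],
]
print(early_exit_rate(sessions))
```

If this rate climbs after you lengthen a response, that is a hint the response has become too long-winded.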

Most importantly, Voice User Experience (VUX) is the start of something big, and there is lots more learning to come. We are excited to see new innovation and new key metrics to track and optimize.

-Alex and Adam

Alex and Adam @Alpine.AI