A Conversational User Interface is an interface that enables people to communicate and interact with technology through voice-assisted services. When thoughtfully designed and executed, interaction with voice devices should feel like a natural conversation. Conversational interfaces leverage natural language processing and machine-learning algorithms to interpret voice and convert it to text. These invisible user interfaces are an emerging and driving force of the Internet of Things (IoT) and are poised to transform the next generation of consumer, smart-home, and enterprise applications. Amazon’s Alexa is one of the most popular voice-assisted services and powers stationary devices such as Amazon’s Echo and Dot. Google Assistant is another popular voice service, powering Google Home, while Apple’s iOS includes Siri, and Microsoft has Cortana.

Recently, SMITH created a demonstration for the 2017 Episerver Ascend conference that integrates an Azure-deployed Episerver storefront with Amazon’s Alexa, an Echo, and a tablet to create a Caffeinated Commerce experience. The application allows customers to experience a personalized conversation using voice, touch, and facial recognition to order coffee. Check out these posts to learn more about our Caffeinated Commerce experience and our thoughts on designing voice interfaces.

In this article, we will explore testing conversational interfaces using Amazon’s Alexa voice service and Echo device. Testing conversational interfaces focuses on validating conversation flows, intents, and outcomes. The potential problems encountered when testing voice services and stationary devices are not the same as those of mobile or web apps. It can be much more challenging! Moreover, if not tested properly, conversations can lead to unexpected and mischievous results.

Alexa Gets Hauled Downtown For Questioning

With technology’s rapid progression towards conversational interfaces and the advances in natural language processing and algorithms, the challenge remains how to effectively test these invisible user interfaces in a universe with limitless boundaries. Designing test coverage for conversational interfaces can be complex, as there could be an infinite number of conversation flows and conditions to verify. Moreover, the technology is constantly learning and evolving. How users interact with the technology, from nuances of voice, tone, phrasing, emotion, and ambiguity to external environmental conditions, should all be accounted for in test coverage.

As the detectives from the Late Show discovered, our intents may not always be interpreted the way we expect, and our friendly voice-assisted devices can easily lead us into a black hole.

Interactive Discovery

Mobile and web applications are assessed with a broad range of functional, integration, performance, and usability test practices. Interpreting these practices for conversational interfaces is not quite as clear. However, one thing is clear: testing conversational interfaces is a highly iterative and interactive discovery process. As logic patterns and scenarios are mapped out and reimagined through the design and creative life cycles, technical and manual testing becomes an inherent part of the workflow. Moreover, understanding the components and capabilities that invoke interaction with voice services is key when devising and executing test coverage.

Amazon’s Alexa

Conversational User Interfaces built on Amazon’s Alexa voice service are known as skills. Alexa Skills are composed of a web service and an interaction model. The Alexa Voice Service is Amazon’s intelligent voice recognition service that exposes skill capabilities to connected devices like the Echo.


The web service is the functionality that the skill will invoke. This web service can be an AWS Lambda function or any secured external web service, programmed in any language such as Node.js, Java, or .NET. Most importantly, the web service must adhere to the specification for the Alexa REST interface, exchanging requests and responses as JSON.

The interaction model is configured on the Amazon Developer portal using the Alexa Skills Kit. The Alexa Skills Kit is a set of APIs and tools to configure, build, and test skill capabilities. Skills are designed on the intent model which maps voice invocations and is composed of the following:

  • Wake Word: The wake word activates Alexa. For example, it could be Alexa, Echo, or Amazon.
  • Invocation Name: The invocation name is a trigger which tells Alexa to invoke a specific skill.
  • Intent: An intent is the action made by the user in the form of a request or response and is defined in the Intent Schema.
  • Utterances: Utterances are the phrases and words defined for the skill to invoke intents. In conversational interfaces, a user could express an intent in many ways through various utterances. For example, the phrase “Let me have …” could be expressed in several ways:

    Give me …

    I will have …

    I would like …

    I'll go with …

  • Slots: For each Intent, there can also be optional parameters known as slots, which are used by Alexa to capture user input. Built-in slot types include AMAZON.DATE, AMAZON.NUMBER, or AMAZON.DURATION. Custom slot types are used for lists of items that are not covered by one of Amazon’s built-in slot types.
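Putting these pieces together, an interaction model for a coffee-ordering skill might be sketched as follows. The intent name, slot names, and custom slot values here are hypothetical illustrations, not the actual Caffeinated Commerce schema.

```javascript
// Hypothetical interaction model for a coffee-ordering skill.
// This is configured in the Amazon Developer portal; slot names
// referenced inside sample utterances appear in braces.
const intentSchema = {
  intents: [
    {
      intent: 'OrderCoffeeIntent',
      slots: [
        { name: 'Quantity', type: 'AMAZON.NUMBER' }, // built-in slot type
        { name: 'Extra', type: 'LIST_OF_EXTRAS' }    // custom slot type
      ]
    }
  ]
};

// A custom slot type enumerates values not covered by built-in types.
const customSlotTypes = {
  LIST_OF_EXTRAS: ['extra shot', 'cream', 'milk', 'sugar']
};

// Sample utterances map spoken phrases to an intent.
const sampleUtterances = [
  'OrderCoffeeIntent give me {Quantity} {Extra}',
  'OrderCoffeeIntent i will have {Quantity} {Extra}',
  'OrderCoffeeIntent i would like {Quantity} {Extra}'
];
```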

The basic steps of a conversation start with the user waking the Alexa Voice Service through a connected device like the Echo and invoking a skill by its invocation name. Alexa interprets the request and routes it to the skill’s web service, where it is processed. A response is returned to Alexa, which then delivers the voice response to the user.
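For context, the request Alexa routes to the web service is a JSON document along these lines. All field values below are illustrative placeholders; only the envelope shape follows the Alexa request format.

```javascript
// Illustrative IntentRequest as routed to a skill's web service.
// The intent, slot names, and identifier values are hypothetical.
const intentRequest = {
  version: '1.0',
  session: { new: false, sessionId: 'SessionId.example' },
  request: {
    type: 'IntentRequest',
    requestId: 'EdwRequestId.example',
    intent: {
      name: 'OrderCoffeeIntent',
      slots: {
        // Alexa fills slot values from the user's spoken words.
        Quantity: { name: 'Quantity', value: '2' },
        Extra: { name: 'Extra', value: 'sugar' }
      }
    }
  }
};
```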


The following example invokes our Caffeinated Commerce skill and starts a personalized conversation with a user from the state of Virginia:



Testing Conversational User Interfaces is both a functional and a manual process conducted in an iterative workflow. Functional testing focuses on verifying the web services and processes, as well as integration points such as backend database systems, while manual testing encompasses the interaction model, the user experience, and the web services. For our Caffeinated Commerce demo, we tested our web service using Amazon’s Service Simulator and Voice Simulator, in addition to adding unit test coverage. Our manual test coverage focused on scenarios to verify all permutations of the interaction model, voice service, and Episerver storefront.

Some guidelines to consider for manual test coverage include:

Intents: Verify all intents can be properly triggered by voice commands from the words and phrases defined in the Intent Schema.

Utterances: Words and phrases can be used in many forms to express an intent, so it is very important to test all utterances using voice commands. When utterances are not recognized or are misinterpreted, the result is often a poor user experience.

Test variations of the utterances with different slot values and different phrases. In addition, testing utterances often leads to identifying additional words and phrases a user might use to express an intent.

Slots: Test built-in slot types such as AMAZON.DATE, AMAZON.NUMBER, or AMAZON.DURATION, to verify spoken words are converted to the proper data type format. When testing custom slots values, also verify incorrect values, permutations, and plural forms.
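As a sketch of what custom-slot testing exercises, a skill’s service often has to normalize spoken slot values, folding plural forms onto canonical values and rejecting anything outside the list so the skill can re-prompt. The function and value list below are hypothetical, not part of the Alexa APIs.

```javascript
// Hypothetical normalizer for a custom "extras" slot: maps plural
// spoken forms onto canonical values and returns null for values
// outside the list, so the skill can re-prompt the user.
const CANONICAL_EXTRAS = ['extra shot', 'cream', 'milk', 'sugar'];

function normalizeExtra(spokenValue) {
  if (!spokenValue) return null;
  let value = spokenValue.toLowerCase().trim();
  // Fold simple plurals: "sugars" -> "sugar", "extra shots" -> "extra shot".
  if (value.charAt(value.length - 1) === 's' &&
      CANONICAL_EXTRAS.indexOf(value) === -1) {
    value = value.slice(0, -1);
  }
  return CANONICAL_EXTRAS.indexOf(value) !== -1 ? value : null;
}
```

Test cases would then cover valid values, plural forms, and incorrect values, mirroring the guideline above.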


Content: Verify the skill’s overall creative content in the context of conversation flows to ensure the words and meaning are properly interpreted by the voice service.

Negative Testing: Test conversation flows that result in negative conditions to verify how gracefully they are handled. For example:

- When there is a negative response to a request

- When the voice service cannot interpret an ambiguous or incorrect request/response

- When there is an exception in the service

The following is an example of a negative response resulting in a retry scenario:

Alexa:   “Great. Would you like anything in your coffee? You can have an extra shot, cream, milk, or sugar. For example, say two extra shots, one milk.”

User:     “Mmm, I’ll have two sugars.”

Alexa:   “You would like two sugars. Is that correct?”

User:     “No.”

Alexa:   “What extras would you like in your coffee?”

User:     “One milk.”

Alexa:   “You would like one milk. Is that correct?”

User:     “No.”

Alexa:   “I’m sorry. It appears I am not understanding the extras you are requesting. Please take some time to review exactly what extras you would like and wake me up by saying, Alexa, open the coffee shop.”
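Behind a retry scenario like this, the skill typically keeps a counter in session state and bails out gracefully after a limit instead of looping forever. A minimal sketch, with hypothetical function and attribute names:

```javascript
// Sketch of bail-out logic for repeated failed confirmations.
// The session attributes carry a retry count; after MAX_RETRIES
// failures the skill apologizes and ends the session.
const MAX_RETRIES = 2;

function handleConfirmation(confirmed, sessionAttributes) {
  if (confirmed) {
    return { speech: 'Great, your order is confirmed.', endSession: true };
  }
  const retries = (sessionAttributes.retries || 0) + 1;
  sessionAttributes.retries = retries;
  if (retries >= MAX_RETRIES) {
    return {
      speech: 'I\u2019m sorry. It appears I am not understanding the ' +
        'extras you are requesting. Please wake me up again by saying, ' +
        'Alexa, open the coffee shop.',
      endSession: true
    };
  }
  // Still under the limit: re-prompt the user.
  return { speech: 'What extras would you like in your coffee?', endSession: false };
}
```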

Response Time: Test how the service behaves when a user does not respond to a request from the voice service in a timely manner. Alexa will only allow for one reprompt when a response is not made. For example:

Alexa:   “You would like one milk. Is that correct?”

User:     (no response)

Alexa:   “I am waiting on a confirmation on your extras.”

User:     (no response)

Alexa:   “I’m sorry. It appears I am not understanding the extras you are requesting. Please take some time to review exactly what extras you would like and wake me up by saying, Alexa, open the coffee shop.”

Environment Conditions: Test how well the voice service interprets interaction when there are external conditions such as background music, loud noises, or other distractions.

Help: Test how well the experience provides guidance from a usability perspective. For example, how well does the voice interface’s help guide and assist users throughout the experience? Is there a voice command a user can say to get assistance?

Alexa App: The Alexa App is a very useful tool when manually testing, helping troubleshoot and verify all interactions with Alexa. The app displays, in text, every request Alexa heard the user say and every response it returned.


Developer Tools

For functional testing, there are a number of developer tools available to test and simulate skills. These tools range from basic text-based interfaces for verifying JSON requests and responses to more advanced libraries.

The following is a list of tools and practices available to help test skills in development:

  • Service Simulator: The Alexa Skills Kit provides a Service Simulator that enables developers to test skills without an Alexa-enabled device. It lets you type in utterances and verify the JSON request sent to your endpoint along with the JSON response returned.




  • Voice Simulator: The Alexa Skills Kit provides a Voice Simulator that enables a means to hear the Alexa voice representation of the JSON response.


  • Alexa Skill Testing Tool: The Amazon Echo Simulator is a browser-based interface to Alexa, intended to allow developers working with the Alexa Skills Kit to test skills in development.
  • Alexa Conversation: Alexa Conversation is a framework for easily testing functionality by creating a conversation with a skill. This framework makes it easy to test a skill's outputs for a given intent in different ways.

    Alexa Conversation is built on top of Mocha and Node.js and supports tests built using Behavior Driven Development (BDD). For more information on Alexa Conversation visit the following link.

  • Bespoken: Bespoken allows you to develop against actual devices, such as the Echo, without redeploying updates. It lets Alexa communicate with your local machine through a proxy service or a live endpoint.
  • Alexa Skill Test: Alexa Skill Test is a tool that provides a live express server for local testing of Alexa Skills written in Node.js.
  • Behavior Driven Development (BDD): BDD is a functional process to provide automated test coverage for conversational interfaces. BDD uses frameworks like Cucumber to describe tests in terms of desired behavior in a simple and meaningful way, using the Gherkin language.
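For instance, a BDD scenario for a coffee-ordering conversation might read as follows. This is a hypothetical sketch in Gherkin, not a scenario from our actual test suite:

```gherkin
Feature: Ordering coffee by voice
  Scenario: User adds an extra to their coffee
    Given the coffee shop skill has been opened
    When the user says "I'll have two sugars"
    Then Alexa asks "You would like two sugars. Is that correct?"
```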

Final Thoughts

Conversational User Interfaces are just in their infancy. They offer the promise of a more authentic, intimate, and immediate experience than any technology before them. The potential value they can add to our lives, and the challenges they present, are only just being realized.

So, are you ready to grab a coffee and get started?

“Alexa, get a cappuccino from the coffee shop”

Edward Layeux is a Senior Quality Assurance Engineer at SMITH

Tags: Caffeinated Commerce, Alexa, QA, Experience Design