Episerver’s 2017 Ascend North America Conference has just wrapped up, so there’s finally time for a breather and to share with everyone what SMITH created as a demonstration. First, I’d like to thank Episerver for putting together a great conference in Las Vegas at The Cosmopolitan. It was a wonderful opportunity to exchange ideas with the best thinkers in the ecosystem.

One of the most exciting benefits of Episerver is its ability to support best-of-breed API connections. To begin exploring what can be done, we decided to integrate Episerver and Amazon’s Alexa AI interface.

This is the first of four post-Ascend blogs on our Caffeinated Commerce demo, which uses conversational commerce to enable personalized buying experiences. (You can read more about our thoughts on that trend here.) The focus of this post will be around how we integrated Alexa with the Episerver storefront. The next posts will explore the mechanics of the Episerver deployment itself, and the intricacies of designing a conversational user interface.

Our theme for the conference was Caffeinated Commerce, that extra shot of energy that pushes a good experience to a great one. To literally bring that idea to life, we set up a simulated coffee shop in our booth at Ascend.


The demo had customers enter their basic information for personalization on a tablet before Alexa guided them through a voice-enabled shopping experience tied directly into the tablet storefront, letting them select their coffee and add extras via voice commands.

The conversation and product recommendations were personalized according to the customer’s hometown, and the experience closed out by providing them with news from back home and some music inspired by their state. Customers walked away with a Caffeinated Commerce mug, a Starbucks Gift Card emailed to their phone, and, hopefully, a good glimpse into the buying experience of tomorrow.

While simple in execution, it’s a first step towards a multi-dimensional experience where users can transact through multiple devices and technologies interchangeably.

We chose Alexa as our first voice-enablement API because it appeared to be the quickest to get up and running with, and because we view it as the most powerful and accurate voice system on the market. While I can’t yet vouch for that claim across the whole market, I do have to say that Amazon’s Alexa did a great job.

Not only did Alexa understand people speaking with different accents, but noise was virtually a non-factor on the busy conference floor. We tested for noise in our prototype environment, but you never truly know how a system will perform until it’s “in the wild.”

Our biggest test was the Monday Night Mixer, where all conference attendees gathered in one room. The music was LOUD, and the adult beverages enjoyed by the crowd had the natural effect of making conversation a little louder than average. Alexa held up perfectly, understanding everyone’s intents very clearly.

What was the process of setting up our custom Alexa skill?

The starting point in our journey, and where a lot of our time was spent, was Amazon’s developer portal: developer.amazon.com. We used the developer account to log into the Echo device, essentially marrying the two platforms.

Once this is done, any skill in development will be automatically linked to the Echo. This development feature is great: not only does it allow you to test your skill without any extra configuration, it doesn’t require you to publish your skill to the outside world. The latter doesn’t come into play with a public production skill, but it’s great for internal skills that you don’t want the rest of the world to access.

Setting up a skill is straightforward with 5 setup tabs.

1. Skill Information
This is where the basic skill information is set up. The Application ID is auto-generated and critical when building your skill web service, which we’ll get to in the next blog post in this series.

The Name of the skill is how it appears in the Alexa app. The Invocation Name defines how Alexa knows to route requests to your custom skill. Our Invocation Name was SMITH, and the 2 basic ways to trigger our custom skill within the Alexa framework are by saying either “From SMITH” or “Ask SMITH.”

One feature to note is that you can support additional audio within your skill. In our case, we tied in location-based music when a user completes an order (a fan favorite) by enabling the audio player within our skill.


2. Interaction Model
This tab is where you define the entire interaction model of your skill. There are 3 sub-sections here, and it’s important to understand the differences.

First, there are Intents. Intents are essentially the anticipated actions or requests the skill needs to understand. We set up Intents to capture login, coffee choice, and extras. Amazon also has built-in intents that we leveraged. For example, AMAZON.YesIntent lets us capture multiple variations of how users can say “Yes.”
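For reference, here’s a sketch of what an intent schema along these lines could look like. The schema itself is JSON in the developer portal; it’s modeled below as a Python dict so it’s easy to inspect, and the AddExtra intent and the slot type names are illustrative stand-ins rather than our exact production schema.

```python
import json

# Sketch of an intent schema for a coffee-ordering skill. "SelectCoffee" and
# the built-in AMAZON intents match the examples in this post; "AddExtra" and
# the slot type names are illustrative.
intent_schema = {
    "intents": [
        {"intent": "SelectCoffee",
         "slots": [{"name": "CoffeeType", "type": "COFFEE_TYPES"}]},
        {"intent": "AddExtra",
         "slots": [{"name": "ExtraType", "type": "EXTRA_TYPES"}]},
        # Built-in intents: Alexa already knows the phrase variations.
        {"intent": "AMAZON.YesIntent"},
        {"intent": "AMAZON.NoIntent"},
    ]
}

print(json.dumps(intent_schema, indent=2))
```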

Utterances extend Intents, providing variations of how Intents are expressed in phrases. For example, we had an Intent for selecting a coffee called SelectCoffee. We defined multiple Utterances specifically for that Intent to ensure that Alexa could understand the range of ways a user might speak the phrase in natural language.

SelectCoffee is the Intent name we defined, and “give me a” is how a user will indicate to Alexa that they want a coffee. For example, SelectCoffee: “give me a {CoffeeType}” or SelectCoffee: “I decided on a {CoffeeType}.”

I know what you’re thinking: “What’s with CoffeeType surrounded by curly braces?” These are what Amazon calls Slot Types. Think of Slot Types as enumerated values: predictable words with pre-defined values. One Slot Type we used was for the actual catalog items (coffees), where we list all the possible coffee values within that one Slot Type (Cappuccino, Latte, etc.). We also defined additional Slot Types for the extras you could order with your coffee (Milk, Cream, Sugar, etc.).
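Put together, the pieces look something like the sketch below. The slot values are a sample rather than our full catalog, and `fill_slot` is a hypothetical helper included only to show how a spoken slot value drops into an utterance template; in reality, Alexa performs this matching for you and hands your service the resolved value.

```python
# Sample Slot Type definitions (values are examples, not the full catalog).
slot_types = {
    "COFFEE_TYPES": ["Cappuccino", "Latte", "Espresso", "Americano"],
    "EXTRA_TYPES": ["Milk", "Cream", "Sugar", "Vanilla"],
}

def fill_slot(utterance_template: str, spoken: dict) -> str:
    """Substitute spoken slot values into an utterance template."""
    return utterance_template.format(**spoken)

# "give me a {CoffeeType}" is one of the SelectCoffee utterances above.
print(fill_slot("give me a {CoffeeType}", {"CoffeeType": "Latte"}))
```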

3. Configuration
Configuration is fairly straightforward. Amazon prefers that developers use an AWS-hosted web service; however, since we’re on the Microsoft stack in this case, we pointed to an Azure-based web service.

The service endpoint is where we interpret the Intents Alexa passes along and react accordingly in order to process the customer order and generate the conversation. More details regarding our Intent processing service, how we integrated with Episerver, and other integration points will follow in the next blog posts.
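As a teaser of what that dispatch looks like, here’s a minimal sketch. Our production service is .NET on Azure; the Python below is purely illustrative, and the handler name and spoken responses are hypothetical.

```python
def handle_alexa_request(request: dict) -> dict:
    """Inspect an Alexa request envelope and return a response envelope."""
    req = request["request"]
    if req["type"] == "LaunchRequest":
        speech = "Welcome to Caffeinated Commerce. What can I get you?"
    elif req["type"] == "IntentRequest":
        name = req["intent"]["name"]
        if name == "SelectCoffee":
            coffee = req["intent"]["slots"]["CoffeeType"]["value"]
            speech = "One " + coffee + ", coming right up. Any extras?"
        elif name == "AMAZON.YesIntent":
            speech = "Great. What would you like to add?"
        else:
            speech = "Sorry, I didn't catch that."
    else:  # SessionEndedRequest and anything unexpected
        speech = "Goodbye."
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": False,
        },
    }
```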

We also linked our Amazon account so that every Intent passed to our service includes the logged-in user information. This allows us to have multiple Echo devices and sites running simultaneously, so multiple orders can be processed at the same time.
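A sketch of what keying orders by user can look like (the field paths follow the standard Alexa request envelope; `order_key` is a hypothetical helper, and the ID values are placeholders):

```python
def order_key(request: dict) -> str:
    """Identify the shopper: prefer the linked-account access token when
    present, otherwise fall back to the device's Alexa userId."""
    user = request["session"]["user"]
    return user.get("accessToken") or user["userId"]

# A linked session carries an access token alongside the device userId.
linked = {"session": {"user": {"userId": "amzn1.ask.account.EXAMPLE",
                               "accessToken": "Atza|EXAMPLE-TOKEN"}}}
print(order_key(linked))
```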


4. SSL Certificate
The SSL certificate has one option, and we chose “My development endpoint is a sub-domain of a domain that has a wildcard certificate from a certificate authority.” This is required in order to work with our Azure service.

5. Test
The Test tab lets us quickly exercise our web service. The best part of this tab is that you get the full JSON request and response, allowing you to create stubs for more thorough testing.
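For reference, an IntentRequest as it appears in the Test tab looks roughly like the envelope below (IDs shortened to placeholders). Saving a captured envelope like this as a fixture is exactly the kind of stub the tab makes easy to build:

```python
import json

# A trimmed IntentRequest envelope, as the Test tab would display it.
sample_request = json.loads("""
{
  "version": "1.0",
  "session": {"new": false, "user": {"userId": "amzn1.ask.account.EXAMPLE"}},
  "request": {
    "type": "IntentRequest",
    "intent": {
      "name": "SelectCoffee",
      "slots": {"CoffeeType": {"name": "CoffeeType", "value": "cappuccino"}}
    }
  }
}
""")

# The slot value is what the Intent-processing service reads off the request.
print(sample_request["request"]["intent"]["slots"]["CoffeeType"]["value"])
```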

That’s all for now.  In the next part of this series, I will dive into the Intent processing service, how we accept the requests, and how we integrate with Episerver itself. Another post will explore the intricacies of UX design with AI, in other words, no visible interface. 

Let us know if you'd like a deeper overview by reaching out! 

Chris Vafiadis is a Solutions Architect at SMITH

Tags: Episerver, ecommerce, Caffeinated Commerce, Alexa, Experience Design, Conversational Commerce, Technology