AWS Startups Blog

Bringing Art to Amazon Alexa on AWS Lambda

Guest post by Daniel Doubrovkine, CTO, Artsy


At a recent Artsy board meeting an investor asked, “You’ve shipped a tremendous amount of software in 2016 with a very small team. How did you do that?”

Indeed, in 2016 we built a new live auctions service and executed over 40 auctions for major auction houses, including Phillips and Heritage. We simultaneously grew our online publishing business to become the most-read art publication in the world. And we more than doubled the number of gallery partners on the platform, all while keeping the growth of our operational costs fairly moderate.

This progress is the result of combining great people, an exceptionally efficient organization, and a systematic approach to creating experiments with the breadth, depth, and virtually unlimited power of AWS. Practically, this means that we evaluate and adopt at least one major new framework with each non-critical project. We develop small ideas as opportunities to learn, and often graduate these into production workloads. For example, last year we tried and have now fully adopted Kubernetes with Amazon ECR. And today we’re exploring AWS Lambda for mission-critical applications, after first using it to load data from Amazon S3 into Amazon Redshift and then shipping an Alexa skill that turns the little voice-activated device into an art historian.

In this post, I walk you through developing, deploying, and monitoring a Node.js Lambda function that powers Artsy on Alexa. We implement an Alexa skill that runs on a development machine, deploy the code to AWS Lambda, and enable it on an Alexa device, the Amazon Echo. You can find the complete source code for this process on GitHub.

First, a bit of context about the Amazon Echo. The device contains a built-in speech recognizer for the wake word, so it’s always listening. After it hears the wake word, a blue light ring turns on, and it begins transmitting the user’s voice (called an utterance) to the Alexa platform that runs on AWS. The light ring indicates that Alexa is “listening.” The Alexa cloud service translates speech to text and runs it through a natural language system to identify an intent, such as “ask Artsy about”. The intent is sent to a skill (a Lambda function) that generates a directive to “speak” along with markup in SSML format, which is transformed into voice and sent in WAV format back to the device, to be played back to the user.
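Concretely, what flows back from the skill to the Alexa service is just JSON. Here is a minimal sketch of the response shape (the SSML text is illustrative; the fields our skill actually fills in show up again later in the automated test):

{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "SSML",
      "ssml": "<speak>American artist Norman Rockwell ...</speak>"
    },
    "shouldEndSession": true
  }
}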

To get started, you need an Amazon Apps and Services Developer account and access to AWS Lambda.

 

Designing an intent

 

To get Alexa to listen, you first design an intent. Each intent is identified by a name and a set of slots. The intents have to be simple and clear and use English language words or predefined vocabularies. I started with a simple “ask Artsy about an artist” intent, which takes an artist’s name as input:

{
   "intents": [
      {
         "intent": "AboutIntent",
         "slots": [
            {
               "name": "VALUE",
               "type": "NAME"
            }
         ]
      }
   ]
}

The only possible sample utterance of this intent is “about {VALUE}”. The “ask Artsy” portion is implied.

Alexa supports several built-in slot types, such as “AMAZON.DATE” or “AMAZON.NUMBER”. Because Alexa cannot understand artists’ names out of the box, we had to teach it by adding a custom, user-defined slot type to the skill interaction model, populated with about a thousand of the most popular artists’ names on Artsy.

Teaching Alexa through the interaction model
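For illustration, the exported interaction model boils down to a one-line sample utterance plus the list of custom slot values (the artist names below are examples; the real list holds about a thousand):

AboutIntent about {VALUE}

NAME (custom slot type values):
Norman Rockwell
Andy Warhol
Cindy Sherman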

 

Implementing a skill

 

Intents are the API of a skill, otherwise known as an Alexa app. Using the open-source alexa-app library from the alexa-js community makes implementing intents easy.

In the following code, we define a “launch” event that is invoked when the skill is launched by the user (for example, “Alexa, open Artsy”). The launch event is followed by the “about” intent that we described earlier:

var alexa = require('alexa-app');
var app = new alexa.app('artsy');

app.launch(function(req, res) {
    res
        // welcome message
        .say("Welcome to Artsy! Ask me about an artist.")
        // don't close the session, wait for user input (an artist name)
        // and provide a re-prompt in case the user says something meaningless
        .shouldEndSession(false, "Ask me about an artist. Say help if you need help or exit any time to exit.")
        // speak the response
        .send();
});

app.intent('AboutIntent', {
        "slots": {
            "VALUE": "NAME"
        },
        "utterances": [
            "about {-|VALUE}"
        ]
    },
    function(req, res) {
      // intent implementation goes here
    }
);

The skill expects a slot value, which is the artist’s name.

var value = req.slot('VALUE');

if (!value) {
  return res
    .say("Sorry, I didn't get that artist name.")
    // don't close the session, wait for user input again (an artist name)
    .shouldEndSession(false, "Ask me about an artist. Say help if you need help or exit any time to exit.");
} else {
  // asynchronously look up the artist in the Artsy API, read their bio
  // tell alexa-app that we're performing an asynchronous operation by returning false
  return false;
}

We use the Artsy API to implement the actual skill. You can refer to the complete source code for implementation details. There’s not much more to it.
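For a sense of what replaces the “intent implementation goes here” comment, here’s a rough sketch. The getArtistBio() helper is a hypothetical stand-in for the Artsy API call; the real lookup lives in the linked source code:

// asynchronously look up the artist's bio and speak it
// NOTE: getArtistBio() is a hypothetical helper wrapping the Artsy API
getArtistBio(value, function(err, bio) {
    if (err || !bio) {
        res.say("Sorry, I couldn't find anything about " + value + ".");
    } else {
        res.say(bio);
    }
    // complete the asynchronous response
    res.send();
});

// returning false tells alexa-app that the response is sent asynchronously
return false;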

 

Organizing code

 

Although the production version of our skill runs on Lambda, the development version runs in Express, using a wrapper called alexa-app-server. It automatically loads skills from subdirectories with the following directory structure:

+--- server.js                  // the alexa-app-server host for development
+--- package.json               // dependencies of the host
+--- project.json               // lambda settings for deployment with apex
+--- functions                  // all skills
     +--artsy                   // the artsy skill
        +--function.json        // describes the skill lambda function
        +--package.json         // dependencies of the skill
        +--index.js             // skill intent implementation
        +--schema.json          // exported skill intent schema
        +--utterances.txt       // exported skill utterances
        +--node_modules         // modules from npm install
+--- node_modules               // modules from npm install
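For reference, a skill’s “function.json” might look roughly like the following sketch (values are illustrative for the Node.js Lambda runtime of the time; the handler matches the exports.handle shown in the skill-modes section below):

{
  "description": "Artsy Alexa skill",
  "runtime": "nodejs4.3",
  "handler": "index.handle",
  "memory": 128,
  "timeout": 10
}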

The server also neatly exports the express.js server for automated testing:

var AlexaAppServer = require('alexa-app-server');

AlexaAppServer.start({
    port: 8080,
    app_dir: "functions",
    post: function(server) {
        module.exports = server.express;
    }
});

 

Skill modes

 

The skill is mounted standalone in AWS Lambda and runs under alexa-app-server in development. It decides what to do based on the ENV environment variable (process.env['ENV']), which Lambda supports natively:

if (process.env['ENV'] == 'lambda') {
    exports.handle = app.lambda(); // AWS Lambda
} else {
    // development mode
    // http://localhost:8080/alexa/artsy
    module.exports = app; 
}

 

Automated testing

 

A Mocha test can use the Alexa app server to make an HTTP request using intent data. It expects well-defined SSML output:

var chai = require('chai');
var expect = chai.expect;
chai.use(require('chai-string'));
chai.use(require('chai-http'));

var server = require('../server');

describe('artsy alexa', function() {
    it('tells me about Norman Rockwell', function(done) {
        var aboutIntentRequest = require('./AboutIntentRequest.json');
        chai.request(server)
            .post('/alexa/artsy')
            .send(aboutIntentRequest)
            .end(function(err, res) {
                expect(res.status).to.equal(200);
                var data = JSON.parse(res.text);
                expect(data.response.outputSpeech.type).to.equal('SSML');
                var ssml = data.response.outputSpeech.ssml;
                expect(ssml).to.startWith('<speak>American artist Norman Rockwell ');
                done();
            });
    });
});

 

Lambda deployment

 

The production version of the Alexa skill is a Lambda function without the development server parts.

We created an “alexa-artsy” function in AWS Lambda with a new AWS IAM role, “alexa-artsy”, and copied the role ARN into “project.json”. That file is used by Apex, a Lambda deployment tool (curl https://raw.githubusercontent.com/apex/apex/master/install.sh | sh), which we use along with awscli (brew install awscli). We also had to configure access to AWS (aws configure) the first time.
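With the role in place, “project.json” ends up looking roughly like this (the account ID and other values are placeholders; the ENV variable is what the skill checks to detect that it’s running in Lambda):

{
  "name": "alexa",
  "description": "Artsy Alexa skills",
  "runtime": "nodejs4.3",
  "memory": 128,
  "timeout": 10,
  "role": "arn:aws:iam::123456789012:role/alexa-artsy",
  "environment": {
    "ENV": "lambda"
  }
}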

The production Alexa skill running as a Lambda function

To connect the Lambda function with an Alexa skill, we added an Alexa Skills Kit trigger.

Adding Alexa skills kit trigger

We also configured the Service Endpoint in the Alexa Skills Kit configuration to point to our Lambda function.

Configuring Service Endpoint in the Alexa Skills Kit

To deploy the Lambda function, we use apex deploy and test the skill with apex invoke. This workflow creates a new function version every time, including a copy of any configured environment variables. A certified production version of the Alexa skill is tied to a specific version of the Lambda function, which is quite different from a typical server infrastructure: all versions of a given function remain available at all times and are accessible through the same ARN.

Logs didn’t appear in Amazon CloudWatch with the execution policy created by default. I had to give the IAM “alexa-artsy” role access to “arn:aws:logs:*:*:*” via an additional inline policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    }
  ]
}

The logs contain console.log output and are versioned, timestamped, and searchable in CloudWatch.

Alexa logs searchable in CloudWatch

 

Testing

 

You can use echosim.io to test your skill, or you can use an actual Echo device, such as an Echo Dot. Test skills appear automatically in the Alexa configuration attached to your account and are installed on all devices configured with it.

Testing Alexa skills

You can also enable the production version of the Artsy skill on your own device by asking Alexa to “enable Artsy”.

 

Conclusion

 

In this post, we showed how to combine familiar Node.js developer tools and libraries, such as express.js and Mocha, to build a development version of a complete system, and then pushed the functional parts of it to AWS Lambda. This model works very well and removes much of the busywork and headaches associated with typical server infrastructure. At Artsy, we now plan to look at larger systems that expose clearly defined APIs or perform tasks on demand, and attempt to decompose them into simpler moving parts that can be deployed and developed in a similar manner.

 

Find out more on the Artsy Engineering blog or follow me on Twitter.