AWS Blog

Announcing the Winners of the AWS Chatbot Challenge – Conversational, Intelligent Chatbots using Amazon Lex and AWS Lambda

A couple of months ago on the blog, I announced the AWS Chatbot Challenge in conjunction with Slack. The AWS Chatbot Challenge was an opportunity to build a unique chatbot that helped to solve a problem or that would add value for its prospective users. The mission was to build a conversational, natural language chatbot using Amazon Lex and leverage Lex’s integration with AWS Lambda to execute logic or data processing on the backend.

I know that you all have been waiting as anxiously as I was to hear who won the AWS Chatbot Challenge. Well, wait no longer: the winners of the AWS Chatbot Challenge have been decided.

May I have the Envelope Please? (The Trumpets sound)

The winners of the AWS Chatbot Challenge are:

  • First Place: BuildFax Counts by Joe Emison
  • Second Place: Hubsy by Andrew Riess, Andrew Puch, and John Wetzel
  • Third Place: PFMBot by Benny Leong and his team from MoneyLion
  • Large Organization Winner: ADP Payroll Innovation Bot by Eric Liu, Jiaxing Yan, and Fan Yang

 

Diving into the Winning Chatbot Projects

Let’s walk through the details of each winning project to see what made these chatbots distinctive and to learn more about the technologies used to implement each solution.

 

BuildFax Counts by Joe Emison

The BuildFax Counts bot was created as a real solution for the BuildFax company to decrease the amount of time it takes the sales and marketing teams to get answers on permits, or on properties with permits that meet certain criteria.

BuildFax, a company co-founded by bot developer Joe Emison, has the only national database of building permits, which updates data from approximately half of the United States on a monthly basis. To accommodate the many requests that come in from the sales and marketing team regarding permit information, BuildFax has a technical sales support team that fulfills requests sent to a ticketing system by manually writing SQL queries that run across the shards of the BuildFax databases. Because the internal sales support team receives a large number of requests and the queries are set up manually, it may take several days for the sales and marketing teams to receive an answer.

The BuildFax Counts chatbot solves this problem by taking the permit inquiry that the sales and marketing team would normally submit as a ticket and accepting it as input to the chatbot through Slack. Once the inquiry is submitted in Slack, a query executes and the results are returned immediately.

Joe built this solution by first creating a nightly export of the data in the BuildFax MySQL RDS database to CSV files stored in Amazon S3. From the exported CSV files, an Amazon Athena table was created in order to run quick and efficient queries on the data. He then used Amazon Lex to create a bot to handle the common questions and criteria that the sales and marketing teams may ask when seeking data from the BuildFax database, modeling the language on the BuildFax ticketing system. He added several different sample utterances and slot types, both custom and Lex-provided, in order to correctly parse every question and criteria combination that could be received from an inquiry. Using Lambda, Joe created a JavaScript Lambda function that receives information from the Lex intent and uses it to build a SQL statement that runs against the aforementioned Athena table using the AWS SDK for JavaScript in Node.js, returning the inquiry count result and the SQL statement used.
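
Joe’s function is written in JavaScript, but the pattern is easy to sketch in Python with boto3. Here is a rough, simplified version; the table name, slot names, and results bucket are placeholders rather than BuildFax’s actual implementation:

import time
import boto3

athena = boto3.client('athena')

def lambda_handler(event, context):
    # Lex passes the parsed slot values for the intent (slot names are hypothetical)
    slots = event['currentIntent']['slots']

    # In real code you would validate/escape the slot values before building SQL
    sql = ("SELECT COUNT(*) FROM permits "
           "WHERE permit_type = '{}' AND state = '{}'".format(
               slots.get('PermitType'), slots.get('State')))

    # Run the query against the Athena table built from the nightly CSV export
    query = athena.start_query_execution(
        QueryString=sql,
        ResultConfiguration={'OutputLocation': 's3://my-athena-results/'})
    qid = query['QueryExecutionId']

    # Poll until the query finishes (simplified; production code should bound this)
    while athena.get_query_execution(QueryExecutionId=qid)['QueryExecution']['Status']['State'] in ('QUEUED', 'RUNNING'):
        time.sleep(1)

    # Row 0 is the header, row 1 holds the count
    count = athena.get_query_results(QueryExecutionId=qid)['ResultSet']['Rows'][1]['Data'][0]['VarCharValue']

    # Return a Lex "Close" response containing the count and the SQL that was used
    return {
        'dialogAction': {
            'type': 'Close',
            'fulfillmentState': 'Fulfilled',
            'message': {'contentType': 'PlainText',
                        'content': 'Count: {} (query: {})'.format(count, sql)}
        }
    }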

The BuildFax Counts bot is used today by the BuildFax sales and marketing team to get immediate answers to inquiries that previously took up to a week to return results.

Not only is the BuildFax Counts bot our 1st place winner and a wonderful solution, but its creator, Joe Emison, is a great guy. Joe has opted to donate his prize (the $5,000 cash, the $2,500 in AWS Credits, and one re:Invent ticket) to the Black Girls Code organization. I must say, you rock, Joe, for helping these kids get access and exposure to technology.

 

Hubsy by Andrew Riess, Andrew Puch, and John Wetzel

The Hubsy bot was created to redefine and personalize the way users traditionally manage their HubSpot account. HubSpot is a SaaS system providing marketing, sales, and CRM software. Hubsy allows users of HubSpot to create and log engagements with customers, provide sales teams with deal status, and retrieve client contact information quickly. Hubsy uses Amazon Lex’s conversational interface to execute commands against the HubSpot API so that users can gain insights, store and retrieve data, and manage tasks directly from Facebook, Slack, or Alexa.

To implement the Hubsy chatbot, Andrew and the team used AWS Lambda to create a Node.js Lambda function that parses the user’s request and calls the HubSpot API, which either fulfills the initial request or returns to the user to ask for more information. Terraform was used to automatically set up and update Lambda, CloudWatch Logs, and IAM profiles. Amazon Lex was used to build the conversational piece of the bot, covering the utterances that a person on a sales team would likely say when seeking information from HubSpot. To integrate with Alexa, the Amazon Alexa skill builder was used to create an Alexa skill, which was tested on an Echo Dot. The Lambda function logs to CloudWatch Logs in order to debug different parts of the Lex intents. To validate the code before the Terraform deployment, ESLint was used to ensure the code was linted and proper development standards were followed.
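
The actual Hubsy function is written in Node.js, but to illustrate the Lex fulfillment contract it implements (fulfill the request or ask the user for more information), here is a minimal Python sketch; the intent, slot name, and HubSpot lookup are stand-ins rather than Hubsy’s real code:

def look_up_deal_status(deal_name):
    # Placeholder for the real HubSpot API call (endpoint and auth omitted)
    return 'closedwon'

def lambda_handler(event, context):
    intent = event['currentIntent']
    slots = intent['slots']

    # If we don't yet know which deal the user means, ask Lex to elicit the slot
    if not slots.get('DealName'):                      # slot name is hypothetical
        return {
            'dialogAction': {
                'type': 'ElicitSlot',
                'intentName': intent['name'],
                'slots': slots,
                'slotToElicit': 'DealName',
                'message': {'contentType': 'PlainText',
                            'content': 'Which deal would you like the status of?'}
            }
        }

    # Otherwise fulfill the request and close the conversation
    status = look_up_deal_status(slots['DealName'])
    return {
        'dialogAction': {
            'type': 'Close',
            'fulfillmentState': 'Fulfilled',
            'message': {'contentType': 'PlainText',
                        'content': 'The deal {} is currently {}.'.format(slots['DealName'], status)}
        }
    }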

 

PFMBot by Benny Leong and his team from MoneyLion

PFMBot, the Personal Finance Management Bot, is a bot for the MoneyLion finance group, which offers customers online financial products (loans, credit monitoring, and a free credit score service) to improve the financial health of its customers. Once a user signs up for an account on the MoneyLion app or website, the user has the option to link their bank accounts with the MoneyLion APIs. Once the bank account is linked, the user can log in to their MoneyLion account and start having a conversation with PFMBot based on their bank account information.

The PFMBot UI is a web interface built using JavaScript. The chatbot was created using Amazon Lex to build utterances based on the possible inquiries about the user’s MoneyLion bank account. PFMBot uses the Lex built-in AMAZON slot types, and parses and converts the values from the built-in slots before passing them to AWS Lambda. The AWS Lambda functions interacting with Amazon Lex are Java-based Lambda functions that call MoneyLion’s Java-based internal APIs running on Spring Boot. These APIs obtain account data and related bank account information from the MoneyLion MySQL database.

 

ADP Payroll Innovation Bot by Eric Liu, Jiaxing Yan, and Fan Yang

ADP PI (Payroll Innovation) bot is designed to help employees of ADP customers easily review their own payroll details and compare different payroll data by just asking the bot for results. The ADP PI Bot additionally offers issue reporting functionality for employees to report payroll issues and aids HR managers in quickly receiving and organizing any reported payroll issues.

The ADP Payroll Innovation bot is an ecosystem for ADP payroll consisting of two chatbots: the ADP PI Bot for external clients (employees and HR managers) and the ADP PI DevOps Bot for the internal ADP DevOps team.


The architecture for the ADP PI DevOps bot differs from that of the ADP PI bot because it is deployed internally at ADP. The ADP PI DevOps bot accepts input from both Slack and Alexa. When input comes in from Slack, Slack sends the request to Lex to process the utterance. Lex then calls the Lambda backend, which obtains ADP data from the ADP VPC running on AWS. When input comes in from Alexa, a Lambda function is called that also obtains data from the ADP VPC.

The architecture for the ADP PI bot has users entering requests and/or reporting issues via Slack. When requests or issues are entered via Slack, the Slack APIs communicate through Amazon API Gateway to AWS Lambda. The Lambda function either writes data into one of the Amazon DynamoDB tables used for recording and escalating issues, or sends the request to Lex. For escalated issues, DynamoDB integrates with Trello to keep HR managers abreast of them. Once the request data is sent from Lambda to Lex, Lex processes the utterance and calls another Lambda function that integrates with the ADP API to retrieve ADP data from within the ADP VPC, which runs on Amazon Virtual Private Cloud (VPC).
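
To illustrate the issue-recording path, here is a minimal boto3 sketch of writing a reported issue to DynamoDB; the table name and attributes are placeholders, not ADP’s actual schema:

import uuid
import datetime
import boto3

dynamodb = boto3.resource('dynamodb')
issues = dynamodb.Table('PayrollIssues')   # table name is a placeholder

def record_issue(employee_id, description):
    """Write a reported payroll issue to DynamoDB so it can be escalated later."""
    item = {
        'IssueId': str(uuid.uuid4()),
        'EmployeeId': employee_id,
        'Description': description,
        'ReportedAt': datetime.datetime.utcnow().isoformat(),
        'Status': 'OPEN'
    }
    issues.put_item(Item=item)
    return item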

Python and Node.js were the chosen languages for the development of the bots.

The ADP PI bot ecosystem has the following functional groupings:

Employee Functionality

  • Summarize Payrolls
  • Compare Payrolls
  • Escalate Issues
  • Evolve PI Bot

HR Manager Functionality

  • Bot Management
  • Audit and Feedback

DevOps Functionality

  • Reduce call volume in service centers (ADP PI Bot)
  • Track issues and generate reports (ADP PI Bot)
  • Monitor jobs for various environments (ADP PI DevOps Bot)
  • View job dashboards (ADP PI DevOps Bot)
  • Query job details (ADP PI DevOps Bot)

 

Summary

Let’s give all the winners of the AWS Chatbot Challenge hearty congratulations on their excellent projects.

You can review more details on the winning projects, as well as all of the submissions to the AWS Chatbot Challenge, at https://awschatbot2017.devpost.com/submissions. If you are curious about the details of the Chatbot Challenge contest, including resources, rules, prizes, and judges, you can review the original challenge website here: https://awschatbot2017.devpost.com/.

Hopefully, you are just as inspired as I am to build your own chatbot using Lex and Lambda. For more information, take a look at the Amazon Lex developer guide or the AWS AI blog post on Building Better Bots Using Amazon Lex (Part 1).

Chat with you soon!

Tara

New – SES Dedicated IP Pools

Today we released Dedicated IP Pools for Amazon Simple Email Service (SES). With dedicated IP pools, you can specify which dedicated IP addresses to use for sending different types of email. Dedicated IP pools let you use SES for different tasks; for instance, you can send transactional emails from one set of IPs and marketing emails from another.

If you’re not familiar with Amazon SES these concepts may not make much sense. We haven’t had the chance to cover SES on this blog since 2016, which is a shame, so I want to take a few steps back and talk about the service as a whole and some of the enhancements the team has made over the past year. If you just want the details on this new feature I strongly recommend reading the Amazon Simple Email Service Blog.

What is SES?

So, what is SES? If you’re a customer of Amazon.com you know that we send a lot of emails. Bought something? You get an email. Order shipped? You get an email. Over time, as both email volumes and types increased, Amazon.com needed to build an email platform that was flexible, scalable, reliable, and cost-effective. SES is the result of years of Amazon’s own work in dealing with email and maximizing deliverability.

In short: SES gives you the ability to send and receive many types of email with the monitoring and tools to ensure high deliverability.

Sending an email is easy; one simple API call:

import boto3
ses = boto3.client('ses')
ses.send_email(
    Source='randhunt@amazon.com',
    Destination={'ToAddresses': ['randhunt@amazon.com']},
    Message={
        'Subject': {'Data': 'Hello, World!'},
        'Body': {'Text': {'Data': 'Hello, World!'}}
    }
)

Receiving and reacting to emails is easy too. You can set up rule sets that forward received emails to Amazon Simple Storage Service (S3), Amazon Simple Notification Service (SNS), or AWS Lambda – you could even trigger an Amazon Lex bot through Lambda to communicate with your customers over email. SES is a powerful tool for building applications. The image below shows just a fraction of the capabilities:
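
As a rough sketch of the receiving side, here is how you might create a receipt rule with boto3; the rule set name, recipient address, bucket, and Lambda ARN are all placeholders:

import boto3

ses = boto3.client('ses')

# Create a rule set, add a rule to it, then make it the active rule set
ses.create_receipt_rule_set(RuleSetName='my-rule-set')
ses.create_receipt_rule(
    RuleSetName='my-rule-set',
    Rule={
        'Name': 'save-and-process',
        'Enabled': True,
        'Recipients': ['support@example.com'],
        'Actions': [
            # Store the raw message in S3...
            {'S3Action': {'BucketName': 'incoming-mail-bucket'}},
            # ...then invoke a Lambda function to react to it
            {'LambdaAction': {
                'FunctionArn': 'arn:aws:lambda:us-east-1:123456789012:function:process-email',
                'InvocationType': 'Event'}}
        ]
    }
)
ses.set_active_receipt_rule_set(RuleSetName='my-rule-set')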

Deliverability 101

Deliverability is the percentage of your emails that arrive in your recipients’ inboxes. Maintaining deliverability is a shared responsibility between AWS and the customer. AWS takes the fight against spam very seriously and works hard to make sure services aren’t abused. To learn more about deliverability I recommend the deliverability docs. For now, understand that deliverability is an important aspect of email campaigns and SES has many tools that enable a customer to manage their deliverability.

Dedicated IPs and Dedicated IP pools

When you’re starting out with SES, your emails are sent through a shared IP. That IP is responsible for sending mail on behalf of many customers, and AWS works to maintain appropriate volume and deliverability on each of those shared IPs. However, when you reach a sufficient volume, shared IPs may not be the right solution.

By creating dedicated IPs you’re able to fully control the reputation of those IPs. This makes it vastly easier to troubleshoot any deliverability or reputation issues. It’s also useful for many email certification programs which require a dedicated IP as a commitment to maintaining your email reputation. Using the shared IPs of the Amazon SES service is still the right move for many customers, but if you have sustained daily sending volume greater than hundreds of thousands of emails per day you might want to consider a dedicated IP. One caveat to be aware of: if you’re not sending a sufficient volume of email with a consistent pattern, a dedicated IP can actually hurt your reputation. Dedicated IPs are $24.95 per address per month at the time of this writing, but you can find out more at the pricing page.

Before you can use a Dedicated IP you need to “warm” it. You do this by gradually increasing the volume of emails you send through a new address. Each IP needs time to build a positive reputation. In March of this year SES released the ability to automatically warm your IPs over the course of 45 days. This feature is on by default for all new dedicated IPs.

Customers who send high volumes of email will typically have multiple dedicated IPs. Today the SES team released dedicated IP pools to make managing those IPs easier. Now when you send email you can specify a configuration set, which will route your email to an IP in a pool based on the pool’s association with that configuration set.
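
In practice, that’s just one extra parameter on the send call. Here’s a sketch that assumes a hypothetical configuration set named marketing, associated with a dedicated IP pool:

import boto3

ses = boto3.client('ses')

# 'marketing' is a hypothetical configuration set tied to a dedicated IP pool
ses.send_email(
    Source='randhunt@amazon.com',
    Destination={'ToAddresses': ['randhunt@amazon.com']},
    Message={
        'Subject': {'Data': 'August Newsletter'},
        'Body': {'Text': {'Data': 'Hello from the marketing pool!'}}
    },
    ConfigurationSetName='marketing'
)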

One of the other major benefits of this feature is that it allows customers who previously split their email sending across several AWS accounts (to manage their reputation for different types of email) to consolidate into a single account.

You can read the documentation and blog for more info.

AWS Online Tech Talks – August 2017

Welcome to mid-August, everyone–the season of beach days, family road trips, and an inbox full of “out of office” emails from your coworkers. Just in case spending time indoors has you feeling a bit blue, we’ve got a piping hot batch of AWS Online Tech Talks for you to check out. Kick up your feet, grab a glass of ice cold lemonade, and dive into our latest Tech Talks on Compute and DevOps.

August 2017 – Schedule

Noted below are the upcoming scheduled live, online technical sessions being held during the month of August. Make sure to register ahead of time so you won’t miss out on these free talks conducted by AWS subject matter experts.

Webinars featured this month are:

Thursday, August 17 – Compute

9:00 – 9:40 AM PDT: Deep Dive on Lambda@Edge.

Monday, August 28 – DevOps

10:30 – 11:10 AM PDT: Building Python Serverless Applications with AWS Chalice.

12:00 – 12:40 PM PDT: How to Deploy .NET Code to AWS from Within Visual Studio.

The AWS Online Tech Talks series covers a broad range of topics at varying technical levels. These sessions feature live demonstrations & customer examples led by AWS engineers and Solution Architects. Check out the AWS YouTube channel for more on-demand webinars on AWS technologies.

– Sara (Hello everyone, I’m a co-op from Northeastern University joining the team until December.)

New – VPC Endpoints for DynamoDB

Starting today, Amazon Virtual Private Cloud (VPC) Endpoints for Amazon DynamoDB are available in all public AWS regions. You can provision an endpoint right away using the AWS Management Console or the AWS Command Line Interface (CLI). There are no additional costs for a VPC Endpoint for DynamoDB.

Many AWS customers run their applications within an Amazon Virtual Private Cloud (VPC) for security or isolation reasons. Previously, if you wanted your EC2 instances in your VPC to be able to access DynamoDB, you had two options. You could use an Internet Gateway (with a NAT Gateway or by assigning your instances public IPs) or you could route all of your traffic to your local infrastructure via VPN or AWS Direct Connect and then back to DynamoDB. Both of these solutions had security and throughput implications, and it could be difficult to configure NACLs or security groups to restrict access to just DynamoDB. Here is a picture of the old infrastructure.

Creating an Endpoint

Let’s create a VPC Endpoint for DynamoDB. We can make sure our region supports the endpoint with the DescribeVpcEndpointServices API call.


aws ec2 describe-vpc-endpoint-services --region us-east-1
{
    "ServiceNames": [
        "com.amazonaws.us-east-1.dynamodb",
        "com.amazonaws.us-east-1.s3"
    ]
}

Great, so I know my region supports these endpoints and I know what my regional endpoint is. I can grab one of my VPCs and provision an endpoint with a quick call to the CLI or through the console. Let me show you how to use the console.

First I’ll navigate to the VPC console and select “Endpoints” in the sidebar. From there I’ll click “Create Endpoint” which brings me to this handy console.

You’ll notice the AWS Identity and Access Management (IAM) policy section for the endpoint. This supports all of the fine-grained access control that DynamoDB supports in regular IAM policies, and you can restrict access based on IAM policy conditions.

For now I’ll give full access to my instances within this VPC and click “Next Step”.

This brings me to a list of route tables in my VPC and asks me which of these route tables I want to assign my endpoint to. I’ll select one of them and click “Create Endpoint”.
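
If you’d rather script the whole thing, here’s a minimal boto3 sketch of the same operation; the VPC and route table IDs are placeholders:

import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')

# Create a gateway endpoint for DynamoDB and attach it to a route table
response = ec2.create_vpc_endpoint(
    VpcId='vpc-0123456789abcdef0',
    ServiceName='com.amazonaws.us-east-1.dynamodb',
    RouteTableIds=['rtb-0123456789abcdef0']
)
print(response['VpcEndpoint']['VpcEndpointId'])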

Keep in mind the note of warning in the console: if you have source restrictions to DynamoDB based on public IP addresses, the source IP of your instances accessing DynamoDB will now be their private IP addresses.

After adding the VPC Endpoint for DynamoDB to our VPC our infrastructure looks like this.

That’s it folks! It’s that easy. It’s provided at no cost. Go ahead and start using it today. If you need more details you can read the docs here.

AWS Partner Webinar Series – August 2017

We love bringing our customers helpful information and we have another cool series we are excited to tell you about. The AWS Partner Webinar Series is a selection of live and recorded presentations covering a broad range of topics at varying technical levels and scale. A little different from our AWS Online TechTalks, each AWS Partner Webinar is hosted by an AWS solutions architect and an AWS Competency Partner who has successfully helped customers evaluate and implement the tools, techniques, and technologies of AWS.

Check out this month’s webinars and let us know which ones you found the most helpful! All schedule times are shown in the Pacific Time (PDT) time zone.

Security Webinars

Sophos
Seeing More Clearly: ATLO Software Secures Online Training Solutions for Correctional Facilities with Sophos UTM on AWS
August 17th, 2017 | 10:00 AM PDT

F5
F5 on AWS: How MailControl Improved their Application Visibility and Security
August 23, 2017 | 10:00 AM PDT

Big Data Webinars

Tableau, Matillion, 47Lining, NorthBay
Unlock Insights and Reduce Costs by Modernizing Your Data Warehouse on AWS
August 22, 2017 | 10:00 AM PDT

Storage Webinars

StorReduce
How Globe Telecom does Primary Backups via StorReduce to the AWS Cloud
August 29, 2017 | 8:00 AM PDT

Commvault
Moving Forward Faster: How Monash University Automated Data Movement for 3500 Virtual Machines to AWS with Commvault
August 29, 2017 | 1:00 PM PDT

Dell EMC
Moving Forward Faster: Protect Your Workloads on AWS With Increased Scale and Performance
August 30, 2017 | 11:00 AM PDT

Druva
How Hatco Protects Against Ransomware with Druva on AWS
September 13, 2017 | 10:00 AM PDT

AWS Summit New York – Summary of Announcements

Whew – what a week! Tara, Randall, Ana, and I have been working around the clock to create blog posts for the announcements that we made at the AWS Summit in New York. Here’s a summary to help you to get started:

Amazon Macie – This new service helps you to discover, classify, and secure content at scale. Powered by machine learning and making use of Natural Language Processing (NLP), Macie looks for patterns and alerts you to suspicious behavior, and can help you with governance, compliance, and auditing. You can read Tara’s post to see how to put Macie to work; you select the buckets of interest, customize the classification settings, and review the results in the Macie Dashboard.

AWS Glue – Randall’s post (with deluxe animated GIFs) introduces you to this new extract, transform, and load (ETL) service. Glue is serverless and fully managed. As you can see from the post, Glue crawls your data, infers schemas, and generates ETL scripts in Python. You define jobs that move data from place to place, with a wide selection of transforms, each expressed as code and stored in human-readable form. Glue uses Development Endpoints and notebooks to provide you with a testing environment for the scripts you build. We also announced that Amazon Athena now integrates with AWS Glue, as do Apache Spark and Hive on Amazon EMR.

AWS Migration Hub – This new service will help you to migrate your application portfolio to AWS. My post outlines the major steps and shows you how the Migration Hub accelerates, tracks, and simplifies your migration effort. You can begin with a discovery step, or you can jump right in and migrate directly. Migration Hub integrates with tools from our migration partners and builds upon the Server Migration Service and the Database Migration Service.

CloudHSM Update – We made a major upgrade to AWS CloudHSM, making the benefits of hardware-based key management available to a wider audience. The service is offered on a pay-as-you-go basis, and is fully managed. It is open and standards compliant, with support for multiple APIs, programming languages, and cryptography extensions. CloudHSM is an integral part of AWS and can be accessed from the AWS Management Console, AWS Command Line Interface (CLI), and through API calls. Read my post to learn more and to see how to set up a CloudHSM cluster.

Managed Rules to Secure S3 Buckets – We added two new rules to AWS Config that will help you to secure your S3 buckets. The s3-bucket-public-write-prohibited rule identifies buckets that have public write access and the s3-bucket-public-read-prohibited rule identifies buckets that have global read access. As I noted in my post, you can run these rules in response to configuration changes or on a schedule. The rules make use of some leading-edge constraint solving techniques, as part of a larger effort to use automated formal reasoning about AWS.

CloudTrail for All Customers – Tara’s post revealed that AWS CloudTrail is now available and enabled by default for all AWS customers. As a bonus, Tara reviewed the principal benefits of CloudTrail and showed you how to review your event history and to deep-dive on a single event. She also showed you how to create a second trail, for use with Amazon CloudWatch Events.

Encryption of Data at Rest for EFS – When you create a new file system, you now have the option to select a key that will be used to encrypt the contents of the files on the file system. The encryption is done using an industry-standard AES-256 algorithm. My post shows you how to select a key and to verify that it is being used.

Watch the Keynote
My colleagues Adrian Cockcroft and Matt Wood talked about these services and others on the stage, and also invited some AWS customers to share their stories. Here’s the video:

Watch the Twitch Stream
My colleagues Ian Massingham, Tara Walker, and Abby Fuller hosted a live stream on the AWS Twitch channel:

Jeff;

 

Launch – Hello Amazon Macie: Automatically Discover, Classify, and Secure Content at Scale

When Jeff and I heard about this service, we both were curious about the meaning of the name Macie. Of course, Jeff, being a great researcher, looked up the name and found that Macie has two meanings. It has both French and English (UK) origins, is typically a girl’s name, and carries a couple of definitions. The first meaning of Macie that was found said that the name meant “weapon”. The second noted that the name was representative of a person who is bold, sporty, and sweet. In a way, these definitions are appropriate, as today I am happy to announce that we are launching Amazon Macie, a new security service that uses machine learning to help identify and protect sensitive data stored in AWS from breaches, data leaks, and unauthorized access, with Amazon Simple Storage Service (S3) being the initial data store. Therefore, I can imagine that Amazon Macie could be described as a bold weapon for AWS customers, providing a sweet service with a sporty user interface that helps to protect against malicious access of your data at rest. Whew, that was a mouthful, but I unbelievably got all the Macie descriptions out in a single sentence! Nevertheless, I am thrilled to share with you the power of the new Amazon Macie service.

Amazon Macie is a service powered by machine learning that can automatically discover and classify your data stored in Amazon S3. But Macie doesn’t stop there: once your data has been classified by Macie, it assigns each data item a business value, and then continuously monitors the data in order to detect any suspicious activity based upon access patterns. Key features of the Macie service include:

  • Data Security Automation: analyzes, classifies, and processes data to understand historical patterns, user authentications to data, data access locations, and times of access.
  • Data Security and Monitoring: actively monitors usage log data for detected anomalies, along with automatic resolution of reported issues through CloudWatch Events and Lambda.
  • Data Visibility for Proactive Loss Prevention: provides management visibility into details of stored data, while providing immediate protection without the need for manual customer input.
  • Data Research and Reporting: allows administrative configuration for reporting and alert management requirements.

How does Amazon Macie accomplish this, you ask?

Using machine learning algorithms for natural language processing (NLP), Macie can automate the classification of data in your S3 buckets. In addition, Amazon Macie takes advantage of predictive analytics algorithms that enable data access patterns to be dynamically analyzed. The learnings are then used to inform you and alert you of possible suspicious behavior. Macie also runs an engine specifically to detect common sources of personally identifiable information (PII) or sensitive personal information (SP). Macie takes advantage of AWS CloudTrail, continuously checking CloudTrail events for PUT requests in S3 buckets and automatically classifying new objects in near real time.

While Macie is a powerful tool to use for security and data protection in the AWS Cloud, it can also aid you with governance, compliance requirements, and/or audit standards. Many of you may already be aware of the EU’s most stringent privacy regulation to date, the General Data Protection Regulation (GDPR), which becomes enforceable on May 25, 2018. As Amazon Macie recognizes personally identifiable information (PII) and provides customers with dashboards and alerts, it will enable customers to comply with GDPR regulations around encryption and pseudonymization of data. When combined with Lambda queries, Macie becomes a powerful tool to help remediate GDPR concerns.

Tour of the Amazon Macie Service

Let’s take a tour of the service and look at Amazon Macie up close and personal.

First, I will log into the Macie console and start the process of setting up Macie, so that I can begin data classification and protection, by clicking the Get Started button.


As you can see, to enable the Amazon Macie service, I must have the appropriate IAM roles created for the service, and additionally I will need to have AWS CloudTrail enabled in my account.

I will create these roles and turn on the AWS CloudTrail service in my account. To make setting up Macie easier, you can take advantage of the sample CloudFormation template provided in the Macie User Guide, which will set up the required IAM roles and policies for you; you then only need to set up a trail as noted in the CloudTrail documentation.
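
If you prefer to script that last step, here’s a minimal boto3 sketch of creating and starting a trail; the trail and bucket names are placeholders, and the bucket needs a policy that allows CloudTrail to write to it:

import boto3

cloudtrail = boto3.client('cloudtrail')

# Create a multi-region trail that writes its logs to an existing S3 bucket
cloudtrail.create_trail(
    Name='macie-trail',
    S3BucketName='my-cloudtrail-logs-bucket',
    IsMultiRegionTrail=True
)

# Trails don't record events until logging is started
cloudtrail.start_logging(Name='macie-trail')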

If you have multiple AWS accounts, you should note that the account you use to enable the Macie service becomes the master account. You can integrate other accounts with the Macie service, but they will have the member account designation. Users from member accounts will need to use an IAM role to federate access to the master account in order to access the Macie console.

Now that my IAM roles are created and CloudTrail is enabled, I will click the Enable Macie button to start Macie’s data monitoring and protection.


Once Macie has finished starting up in your account, you will be brought to the service’s main screen, and any existing alerts in your account will be presented to you. Since I have just started the service, I currently have no alerts.


Since we are doing a tour of the Macie service, I will now integrate some of my S3 buckets with Macie. However, you do not have to specify any S3 buckets for Macie to start monitoring, since the service already uses the AWS CloudTrail Management API to analyze and process information. For this tour of Macie, I have decided to monitor some object-level API events from certain buckets in CloudTrail.

To integrate with S3, I will go to the Integrations tab of the Macie console. Once on the Integrations tab, I will see two options: Accounts and Services. The Accounts option is used to integrate member accounts with Macie and to set your data retention policy. Since I want to integrate specific S3 buckets with Macie, I’ll click the Services option to go to the Services tab.


When I integrate Macie with the S3 service, a trail and an S3 bucket will be created to store logs about S3 data events. To get started, I will use the Select an account drop-down to choose an account. Once my account is selected, the services available for integration are presented. I’ll select the Amazon S3 service by clicking the Add button.

Now I can select the buckets that I want Macie to analyze. Selecting the Review and Save button takes me to a screen where I confirm that I want object-level logging by clicking the Save button.

Next, on our Macie tour, let’s look at how we can customize data classification with Macie.

As we discussed, Macie will automatically monitor and classify your data. Once Macie identifies your data, it will classify your data objects by file and content type. Macie also uses a support vector machine (SVM) classifier to classify the content within S3 objects, in addition to the metadata of the file. In machine learning, support vector machines are supervised learning models with associated learning algorithms used for classification and regression analysis of data. Macie trained the SVM classifier using a corpus of data with varying content types, optimized to support accurate detection of data content, including even the source code you may write.

Macie will assign only one content type per data object or file; however, you have the ability to enable or disable content types and file extensions in order to include or exclude them from the classification Macie performs on these objects. Once Macie classifies the data, it assigns each object a risk level between 1 and 10, with 10 being the highest risk and 1 being the lowest.

To customize the classification of our data with Macie, I’ll go to the Settings tab. I am now presented with the choices available to enable or disable the Macie classification settings.


As an example during our tour of Macie, I will choose File extension and am presented with the list of file extensions that Macie tracks and uses for classification.

As a test, I’ll edit the apk file extension for Android application install files and disable monitoring of this file type by selecting No – disabled from the drop-down and clicking the Save button. Of course, I will turn this back on later, since I want to keep my entire collection of data files safe, including my Android development binaries.


One last thing I want to note about data classification using Macie: the service provides visibility into how your data objects are being classified and highlights the data assets that you have stored, showing how critical or important the information is for compliance, for personal data, and for your business.

Now that we have explored the data that Macie classifies and monitors, the last stop on our service tour is the Macie dashboard.

 

The Macie Dashboard provides us with a complete picture of all of the data and activity that has been gathered as Macie monitors and classifies our data. The dashboard displays Metrics and Views grouped by categories to provide different visual perspectives of your data. Within these dashboard screens, you can also go from a metric perspective directly to the Research tab to build and run queries based on the metric. These queries can be used to set up customized alerts for notification of any possible security issues or problems. We won’t have an opportunity to tour the Research or Alerts tabs, but you can find out more information about these features in the Macie user guide.

Turning back to the Dashboard, there are so many great resources in the Macie Dashboard that we will not be able to stop at each view, metric, and feature during our tour, so let me give you an overview of all the features of the dashboard that you can take advantage of.

Dashboard Metrics – monitored data grouped by the following categories:

  • High-risk S3 objects – data objects with risk levels of 8 through 10
  • Total event occurrences – total count of all event occurrences since Macie was enabled
  • Total user sessions – 5-minute snapshot of CloudTrail data

Dashboard Views – views to display various points of the monitored data and activity:

  • S3 objects for a selected time range
  • S3 objects
  • S3 objects by personally identifiable information (PII)
  • S3 objects by ACL
  • CloudTrail events and associated users
  • CloudTrail errors and associated users
  • Activity location
  • AWS CloudTrail events
  • Activity ISPs
  • AWS CloudTrail user identity types

Summary

Well, that concludes our tour of the new and exciting Amazon Macie service. Amazon Macie is a sensational new service that uses the power of machine learning and deep learning to aid you in securing, identifying, and protecting your data stored in Amazon S3. Using natural language processing (NLP) to automate data classification, Amazon Macie enables you to easily get started with high-accuracy classification and immediate protection of your data by simply enabling the service. The interactive dashboards give visibility into the where, what, who, and when of your information, allowing you to proactively analyze massive streams of data, data accesses, and API calls in your environment. Learn more about Amazon Macie by visiting the product page or the documentation in the Amazon Macie user guide.

Tara

New – Encryption of Data at Rest for Amazon Elastic File System (EFS)

We launched Amazon Elastic File System in production form a little over a year ago (see Amazon Elastic File System – Production Ready in Three Regions for more information). Later in the year we added On-Premises access via Direct Connect and made EFS available in the US East (Ohio) Region, following up this year with availability in the EU (Frankfurt) and Asia Pacific (Sydney) Regions.

Encryption at Rest
Today we are adding support for encryption of data at rest. When you create a new file system, you can select a key that will be used to encrypt the contents of the files that you store on the file system. The key can be a built-in key that is managed by AWS or a key that you created yourself using AWS Key Management Service (KMS). File metadata (file names, directory names, and directory contents) will be encrypted using a key managed by AWS. Both forms of encryption are implemented using an industry-standard AES-256 algorithm.

You can set this up in seconds when you create a new file system. You simply choose the built-in key (aws/elasticfilesystem) or one of your own:

EFS will take care of the rest! You can select the filesystem in the console to verify that it is encrypted as desired:

A cryptographic algorithm that meets the approval of FIPS 140-2 is used to encrypt data and metadata. The encryption is transparent and has a minimal effect on overall performance.
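
If you prefer the API to the console, here’s a minimal boto3 sketch of creating an encrypted file system; the KMS key ARN is a placeholder, and omitting KmsKeyId uses the built-in aws/elasticfilesystem key:

import boto3

efs = boto3.client('efs')

# Create a new file system with encryption of data at rest enabled
fs = efs.create_file_system(
    CreationToken='my-encrypted-fs',
    PerformanceMode='generalPurpose',
    Encrypted=True,
    KmsKeyId='arn:aws:kms:us-east-1:123456789012:key/11111111-2222-3333-4444-555555555555'
)
print(fs['FileSystemId'], fs['Encrypted'])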

You can use AWS Identity and Access Management (IAM) to control access to the Customer Master Key (CMK). The CMK must be enabled in order to grant access to the file system; disabling the key prevents it from being used to create new file systems and blocks access (after a period of time) to existing file systems that it protects. To learn more about your options, read Managing Access to Encrypted File Systems.

Available Now
Encryption of data at rest is available now in all regions where EFS is supported, at no additional charge.

Jeff;

 

Launch – AWS Glue Now Generally Available


Today we’re excited to announce the general availability of AWS Glue. Glue is a fully managed, serverless, and cloud-optimized extract, transform and load (ETL) service. Glue is different from other ETL services and platforms in a few very important ways.

First, Glue is “serverless” – you don’t need to provision or manage any resources and you only pay for resources when Glue is actively running. Second, Glue provides crawlers that can automatically detect and infer schemas from many data sources, data types, and across various types of partitions. It stores these generated schemas in a centralized Data Catalog for editing, versioning, querying, and analysis. Third, Glue can automatically generate ETL scripts (in Python!) to translate your data from your source formats to your target formats. Finally, Glue allows you to create development endpoints that allow your developers to use their favorite toolchains to construct their ETL scripts. Ok, let’s dive deep with an example.

In my job as a Developer Evangelist I spend a lot of time traveling and I thought it would be cool to play with some flight data. The Bureau of Transportation Statistics is kind enough to share all of this data for anyone to use here. We can easily download this data and put it in an Amazon Simple Storage Service (S3) bucket. This data will be the basis of our work today.

Crawlers

First, we need to create a Crawler for our flights data from S3. We’ll select Crawlers in the Glue console and follow the on-screen prompts from there. I’ll specify s3://crawler-public-us-east-1/flight/2016/csv/ as my first data source (we can add more later if needed). Next, we’ll create a database called flights and give our tables a prefix of flights as well.
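
For those who prefer the API to the console, here’s a hedged boto3 sketch of the same crawler setup; the IAM role name is a placeholder, and the role needs access to the source bucket and to Glue:

import boto3

glue = boto3.client('glue')

# Create the target database, then a crawler that populates it from S3
glue.create_database(DatabaseInput={'Name': 'flights'})
glue.create_crawler(
    Name='flights-crawler',
    Role='AWSGlueServiceRole-flights',     # role name is a placeholder
    DatabaseName='flights',
    TablePrefix='flights',
    Targets={'S3Targets': [{'Path': 's3://crawler-public-us-east-1/flight/2016/csv/'}]}
)
glue.start_crawler(Name='flights-crawler')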

The Crawler will go over our dataset, detect partitions through the various folders (in this case, months of the year), detect the schema, and build a table. We could add additional data sources and jobs to our crawler, or create separate crawlers that push data into the same database, but for now let’s look at the autogenerated schema.

I’m going to make a quick schema change to year, moving it from BIGINT to INT. Then I can compare the two versions of the schema if needed.

Now that we know how to correctly parse this data let’s go ahead and do some transforms.

ETL Jobs

Now we’ll navigate to the Jobs subconsole and click Add Job. We’ll follow the prompts from there, giving our job a name and selecting a data source and an S3 location for temporary files. Next we add our target by specifying “Create tables in your data target”, and we’ll specify an S3 location in Parquet format as our target.

After clicking Next, we’re at a screen showing the various mappings proposed by Glue. Now we can make manual column adjustments as needed – in this case we’re just going to use the X button to remove a few columns that we don’t need.

This brings us to my favorite part. This is what I absolutely love about Glue.

Glue generated a PySpark script to transform our data based on the information we’ve given it so far. On the left hand side we can see a diagram documenting the flow of the ETL job. On the top right we see a series of buttons that we can use to add annotated data sources and targets, transforms, spigots, and other features. This is the interface I get if I click on transform.

If we add any of these transforms or additional data sources, Glue will update the diagram on the left giving us a useful visualization of the flow of our data. We can also just write our own code into the console and have it run. We can add triggers to this job that fire on completion of another job, a schedule, or on demand. That way if we add more flight data we can reload this same data back into S3 in the format we need.
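
To give a flavor of what the generated script looks like, here’s a trimmed-down sketch in the same PySpark style; the catalog table name, column mappings, and target path are assumptions for this flights example rather than the exact script Glue produced:

import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
glueContext = GlueContext(SparkContext.getOrCreate())
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Read the crawled flights table from the Data Catalog (table name is assumed)
flights = glueContext.create_dynamic_frame.from_catalog(
    database='flights', table_name='flightscsv')

# Keep and rename just the columns we care about (mappings are illustrative)
mapped = ApplyMapping.apply(frame=flights, mappings=[
    ('year', 'int', 'year', 'int'),
    ('origin', 'string', 'origin', 'string'),
    ('dest', 'string', 'dest', 'string')
])

# Write the result to S3 in Parquet format (target path is a placeholder)
glueContext.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type='s3',
    connection_options={'path': 's3://my-flights-bucket/parquet/'},
    format='parquet')

job.commit()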

I could spend all day writing about the power and versatility of the jobs console but Glue still has more features I want to cover. So, while I might love the script editing console, I know many people prefer their own development environments, tools, and IDEs. Let’s figure out how we can use those with Glue.

Development Endpoints and Notebooks

A Development Endpoint is an environment used to develop and test our Glue scripts. If we navigate to “Dev endpoints” in the Glue console we can click “Add endpoint” in the top right to get started. Next we’ll select a VPC, a security group that references itself and then we wait for it to provision.


Once it’s provisioned we can create an Apache Zeppelin notebook server by going to actions and clicking create notebook server. We give our instance an IAM role and make sure it has permissions to talk to our data sources. Then we can either SSH into the server or connect to the notebook to interactively develop our script.

Pricing and Documentation

You can see detailed pricing information here. Glue crawlers, ETL jobs, and development endpoints are all billed in Data Processing Unit (DPU) hours, metered by the minute. Each DPU-hour costs $0.44 in us-east-1. A single DPU provides 4 vCPUs and 16 GB of memory.

We’ve only covered about half of the features that Glue has, so I want to encourage everyone who made it this far into the post to go read the documentation and service FAQs. Glue also has a rich and powerful API that allows you to do anything the console can do and more.

We’re also releasing two new projects today. The aws-glue-libs provide a set of utilities for connecting to and talking with Glue. The aws-glue-samples repo contains a set of example jobs.

I hope you find that using Glue reduces the time it takes to start doing things with your data. Look for another post from me on AWS Glue soon because I can’t stop playing with this new service.
Randall

AWS CloudHSM Update – Cost Effective Hardware Key Management at Cloud Scale for Sensitive & Regulated Workloads

Our customers run an incredible variety of mission-critical workloads on AWS, many of which process and store sensitive data. As detailed in our Overview of Security Processes document, AWS customers have access to an ever-growing set of options for encrypting and protecting this data. For example, Amazon Relational Database Service (RDS) supports encryption of data at rest and in transit, with options tailored for each supported database engine (MySQL, SQL Server, Oracle, MariaDB, PostgreSQL, and Aurora).

Many customers use AWS Key Management Service (KMS) to centralize their key management, with others taking advantage of the hardware-based key management, encryption, and decryption provided by AWS CloudHSM to meet stringent security and compliance requirements for their most sensitive data and regulated workloads (you can read my post, AWS CloudHSM – Secure Key Storage and Cryptographic Operations, to learn more about Hardware Security Modules, also known as HSMs).

Major CloudHSM Update
Today, building on what we have learned from our first-generation product, we are making a major update to CloudHSM, with a set of improvements designed to make the benefits of hardware-based key management available to a much wider audience while reducing the need for specialized operating expertise. Here’s a summary of the improvements:

Pay As You Go – CloudHSM is now offered under a pay-as-you-go model that is simpler and more cost-effective, with no up-front fees.

Fully Managed – CloudHSM is now a scalable managed service; provisioning, patching, high availability, and backups are all built-in and taken care of for you. Scheduled backups extract an encrypted image of your HSM from the hardware (using keys that only the HSM hardware itself knows) that can be restored only to identical HSM hardware owned by AWS. For durability, those backups are stored in Amazon Simple Storage Service (S3), and for an additional layer of security, encrypted again with server-side S3 encryption using an AWS KMS master key.

Open & Compatible  – CloudHSM is open and standards-compliant, with support for multiple APIs, programming languages, and cryptography extensions such as PKCS #11, Java Cryptography Extension (JCE), and Microsoft CryptoNG (CNG). The open nature of CloudHSM gives you more control and simplifies the process of moving keys (in encrypted form) from one CloudHSM to another, and also allows migration to and from other commercially available HSMs.

More Secure – CloudHSM Classic (the original model) supports the generation and use of keys that comply with FIPS 140-2 Level 2. We’re stepping that up a notch today with support for FIPS 140-2 Level 3, with security mechanisms that are designed to detect and respond to physical attempts to access or modify the HSM. Your keys are protected with exclusive, single-tenant access to tamper-resistant HSMs that appear within your Virtual Private Clouds (VPCs). CloudHSM supports quorum authentication for critical administrative and key management functions. This feature allows you to define a list of N possible identities that can access the functions, and then require at least M of them to authorize the action. It also supports multi-factor authentication using tokens that you provide.

AWS-Native – The updated CloudHSM is an integral part of AWS and plays well with other tools and services. You can create and manage a cluster of HSMs using the AWS Management Console, AWS Command Line Interface (CLI), or API calls.
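
On that last point, here’s a minimal boto3 sketch of creating a cluster and its first HSM through the cloudhsmv2 API; the subnet IDs and Availability Zone are placeholders:

import boto3

cloudhsmv2 = boto3.client('cloudhsmv2')

# Create the cluster across two subnets (use one subnet per Availability Zone)
cluster = cloudhsmv2.create_cluster(
    HsmType='hsm1.medium',
    SubnetIds=['subnet-0123456789abcdef0', 'subnet-0fedcba9876543210']
)
cluster_id = cluster['Cluster']['ClusterId']

# Add the first HSM to the cluster in one of those Availability Zones
cloudhsmv2.create_hsm(ClusterId=cluster_id, AvailabilityZone='us-east-1a')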

Diving In
You can create CloudHSM clusters that contain 1 to 32 HSMs, each in a separate Availability Zone in a particular AWS Region. Spreading HSMs across AZs gives you high availability (including built-in load balancing); adding more HSMs gives you additional throughput. The HSMs within a cluster are kept in sync: performing a task or operation on one HSM in a cluster automatically updates the others. Each HSM in a cluster has its own Elastic Network Interface (ENI).

All interaction with an HSM takes place via the AWS CloudHSM client. It runs on an EC2 instance and uses certificate-based mutual authentication to create secure (TLS) connections to the HSMs.

At the hardware level, each HSM includes hardware-enforced isolation of crypto operations and key storage. Each customer HSM runs on dedicated processor cores.

Setting Up a Cluster
Let’s set up a cluster using the CloudHSM Console:

I click on Create cluster to get started, select my desired VPC and the subnets within it (I can also create a new VPC and/or subnets if needed):

Then I review my settings and click on Create:

After a few minutes, my cluster exists, but is uninitialized:

Initialization simply means retrieving a certificate signing request (the Cluster CSR):

And then creating a private key and using it to sign the request (these commands were copied from the Initialize Cluster docs and I have omitted the output. Note that ID identifies the cluster):

$ openssl genrsa -out CustomerRoot.key 2048
$ openssl req -new -x509 -days 365 -key CustomerRoot.key -out CustomerRoot.crt
$ openssl x509 -req -days 365 -in ID_ClusterCsr.csr   \
                              -CA CustomerRoot.crt    \
                              -CAkey CustomerRoot.key \
                              -CAcreateserial         \
                              -out ID_CustomerHsmCertificate.crt

The next step is to apply the signed certificate to the cluster using the console or the CLI. After this has been done, the cluster can be activated by changing the password for the HSM’s administrative user, otherwise known as the Crypto Officer (CO).

Once the cluster has been created, initialized and activated, it can be used to protect data. Applications can use the APIs in AWS CloudHSM SDKs to manage keys, encrypt & decrypt objects, and more. The SDKs provide access to the CloudHSM client (running on the same instance as the application). The client, in turn, connects to the cluster across an encrypted connection.

Available Today
The new HSM is available today in the US East (Northern Virginia), US West (Oregon), US East (Ohio), and EU (Ireland) Regions, with more in the works. Pricing starts at $1.45 per HSM per hour.

Jeff;