The Internet of Things on AWS – Official Blog

Samsung Selects AWS IoT for Cloud Print with Help from ClearScale

Background

ClearScale was founded in 2011, with a focus on delivering best-of-breed cloud systems integration, application development, and managed services. We are an AWS Premier Consulting Partner with competencies in Migration, DevOps, Marketing & Commerce, Big Data, and Mobile. Our experienced engineers, architects, and developers are all certified or accredited by AWS.

We have a long track record of successful IoT projects and a proven ability to design and automate IoT platforms, build IoT applications, and create infrastructure on which connected devices can easily and securely interact with each other, gather and analyze data, and provide valuable insights to your business and customers.

Our firm is unique in that we offer a breadth of technical experience coupled with a holistic organizational view. This allows us to truly partner with our customers and translate complex business requirements into solid, scalable, cloud-optimized solutions. At ClearScale, we understand best practices for driving the maximum business value from cloud deployments.

Samsung partnered with our firm to launch a Cloud Solutions Platform for delivering robust infrastructure and printing solutions at cloud scale for any device from any location. In order to architect the device management component of the platform, we conducted a competitive analysis between AWS IoT and the incumbent solution, which was based on the Ejabberd messaging platform.

Because the goal of this effort was to give Samsung the most reliable printing services for its customer base, the analysis centered on a key item: the device management component. This component handles authentication and messaging between the devices (in this case, printers) and the cloud infrastructure. It also collects instrumentation data from the devices for later analysis, which allows Samsung to understand the health and utilization of each device, identify issues that require remote troubleshooting, and perform proactive maintenance.

Diagram: High-Level Application Overview

Defining the Test Rules

Working with Samsung, we defined a set of criteria for evaluating AWS IoT versus Ejabberd for the device management capability. The attributes were prioritized and weighted based on Samsung’s business requirements. While these key areas are applicable to any IoT evaluation, the scoring methodology may differ somewhat depending on the client’s specific use cases and requirements.

The analysis needed to address two major areas: functional testing and load testing. For the functional testing, we wanted to compare the Ejabberd solution to AWS IoT, evaluating each solution’s core capabilities, security posture, and the ubiquity of its technology. For the load testing, we needed to understand the availability, scalability, maintainability, performance, and reliability of each solution so that the metrics gathered in each area of concern could be applied to a scoring matrix, as shown below.

* A score was awarded for each quality attribute, with the total score being the sum of the scores for all quality attributes. The maximum total score for a solution was 100.

Functional Testing

Functional testing was performed first, with the goal of ensuring each system could fulfill the defined functional requirements; only then was the more expensive load testing performed. We deployed a small environment for Ejabberd and configured the AWS IoT service so that the two were functionally identical. Five functional tests were performed to validate the solutions, and both satisfied Samsung’s requirements without any issues.

Load Testing

Defining the Scenarios

Before comparing Ejabberd and AWS IoT, we needed to define the load testing criteria. We opted to run two distinct scenarios:

  1. Simulate peak load conditions
  2. Demonstrate system stability

The message rates were calculated from the following profile:

  • Consumer (2-3 jobs per week)
  • SMB (10-20 jobs per week)
  • Enterprise (150-300 jobs per week)
  • Proposed distribution: 50%, 30%, 20%
  • Total number of agents: 500,000

AVERAGE NUMBER OF MESSAGES PER SECOND

AvgMsgs = MsgsPerJob * NumOfAgents * JobsPerWeek / SecondsPerWeek

= 2 * 500,000 * 300 / (7 * 24 * 60 * 60)
= 496.032

Where:

  • MsgsPerJob = Number of messages resulting from each job (2; see note)
  • AvgJobs = Average number of jobs per second (NumOfAgents * JobsPerWeek / SecondsPerWeek)
  • NumOfAgents = Total number of agents (500,000)
  • JobsPerWeek = Number of jobs a week per one agent
  • SecondsPerWeek = Number of seconds in a week (7 * 24 * 60 * 60)

Note: Results are doubled due to SCP behavior. For each job, XoaCommMtgSrv sends a PING message to an Agent. After the Agent executes the job, XoaCommMtgSrv sends another PING message to XCSP Service.

MAXIMUM NUMBER OF MESSAGES PER SECOND

  • Percentage of jobs executed during busy hours: 90%
  • Number of busy hours per week: 10 (2 hours per day; 5 days per week)

MaxMsgs = MsgsPerJob * BusyHourJobs * NumOfAgents * JobsPerWeek / BusyHours

= 2 * 0.9 * 500,000 * 240 / 36,000
= 6,000

Where:

  • MsgsPerJob = Number of messages resulting from each job (2; see note)
  • BusyHourJobs = Percentage of jobs expected to be executed during busy hours (90% = 0.9)
  • NumOfAgents = Total Number of agents (500,000)
  • JobsPerWeek = Number of jobs a week per one agent
  • BusyHours = Number of seconds in busy hours a week (2 * 5 * 3600)
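
These two calculations can be reproduced with a few lines of Python. The figures below simply reuse the parameter values stated above (2 messages per job, 500,000 agents, and the 300 and 240 jobs-per-week values used in the original calculations); this is only a sanity check of the arithmetic, not part of the test tooling.

MSGS_PER_JOB = 2                      # each job produces two PING messages (see note)
NUM_AGENTS = 500_000                  # total number of agents
SECONDS_PER_WEEK = 7 * 24 * 60 * 60

# Average rate, using 300 jobs per week as in the calculation above
avg_msgs = MSGS_PER_JOB * NUM_AGENTS * 300 / SECONDS_PER_WEEK
print(f"Average messages/sec: {avg_msgs:.3f}")   # ~496.032

# Peak rate: 90% of jobs land in 10 busy hours per week (2 hours/day, 5 days/week)
BUSY_HOUR_JOBS = 0.9
BUSY_SECONDS = 2 * 5 * 3600
max_msgs = MSGS_PER_JOB * BUSY_HOUR_JOBS * NUM_AGENTS * 240 / BUSY_SECONDS
print(f"Maximum messages/sec: {max_msgs:.0f}")   # 6,000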

Load Generation

We selected Apache JMeter as our load generation engine. It is an extensible solution with which customized tests are easy to develop. The product is widely used and has strong community support.

“The Apache JMeter™ application is open source software, a 100% pure Java application designed to load test functional behavior and measure performance. Apache JMeter may be used to test performance on static/dynamic resources and dynamic web applications. It can be used to simulate a heavy load on a server, group of servers, network or object to test its strength or to analyze overall performance under different load types.”

Ejabberd and AWS IoT utilize different protocols, so we developed custom plugins for Apache JMeter (XMPP and MQTT, respectively). The plugins allowed us to create custom logging for deeper analysis, maintain persistent connections, and manage secure connections. Our goal was to have the load generation closely emulate the actual system functionality, including connection security and persistence. This included requests/messages from devices (Agents) as well as requests/responses from Samsung’s device management application (XoaCommMtgSrv).

By using an existing tool and extending its functionality, we reduced the overall time needed to develop the load generation code. The following custom JMeter plugins were created to provide capabilities required by the test methodology:

  • MQTT protocol plugin for JMeter – used for AWS IoT testing
  • XMPP protocol plugin for JMeter – used for Ejabberd testing

There are several reasons to use custom plugins:

  • The test model can more closely emulate the actual system
  • Emulate a small number of XoaCommMtgSrv servers and a very large number of Agents
  • Support persistent connections – not supported by existing plugins
  • Support secure connections – not supported by existing plugins

Custom logging

  • Distinguish XoaCommMtgSrv server actions from Agent actions
  • Associate a specific JMeter engine node with the XoaCommMtgSrv/Agent log messages
  • Capture job execution sequences and identify out-of-order job processing
  • Enable low level debugging
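
To illustrate the agent-side behavior these plugins emulate (a persistent, certificate-authenticated MQTT connection that subscribes to a command topic and acknowledges each job message), here is a minimal sketch using the paho-mqtt 1.x Python client. The endpoint, topic names, and file paths are placeholders for illustration only; they are not taken from the actual plugin or from Samsung’s application.

import ssl
import paho.mqtt.client as mqtt

# Placeholder values -- substitute a real AWS IoT endpoint, certificates, and topics.
ENDPOINT = "example-ats.iot.us-east-1.amazonaws.com"
AGENT_ID = "agent-000001"
CMD_TOPIC = f"agents/{AGENT_ID}/cmd"   # XoaCommMtgSrv -> Agent (hypothetical topic)
ACK_TOPIC = f"agents/{AGENT_ID}/ack"   # Agent -> XoaCommMtgSrv (hypothetical topic)

def on_connect(client, userdata, flags, rc):
    # Subscribe once the persistent connection is established.
    client.subscribe(CMD_TOPIC, qos=1)

def on_message(client, userdata, msg):
    # Acknowledge each job/PING message so round-trip latency can be measured.
    client.publish(ACK_TOPIC, payload=b"OK", qos=1)

client = mqtt.Client(client_id=AGENT_ID)
client.tls_set(ca_certs="AmazonRootCA1.pem",
               certfile=f"{AGENT_ID}.cert.pem",
               keyfile=f"{AGENT_ID}.private.key",
               tls_version=ssl.PROTOCOL_TLSv1_2)
client.on_connect = on_connect
client.on_message = on_message
client.connect(ENDPOINT, port=8883)
client.loop_forever()   # keep the connection open, matching the persistent-connection requirement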

The JMeter test plans for each solution have the same high-level behavior.

While testing the JMeter MQTT plugin, we determined that a single JMeter engine node was capable of emulating 8,000 agents without a performance bottleneck. In order to emulate 500,000 agents, as called for by the test methodology, we used 64 JMeter engine nodes for AWS IoT load generation.

While testing the JMeter XMPP plugin, we determined that a single JMeter engine node was capable of emulating 6,500 agents without a performance bottleneck. In order to emulate 500,000 agents, as called for by the test methodology, we used 80 JMeter engine nodes for Ejabberd load generation. This was an important step to ensure that the metrics were not skewed by limitations on the load generation side of the equation.
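
The engine-node counts follow from a simple division of the total agent count by the measured per-node capacity; the fleet sizes actually used (64 and 80) sit slightly above that minimum. The snippet below is only a sanity check of the arithmetic, using the per-node capacities quoted above.

import math

TOTAL_AGENTS = 500_000
AGENTS_PER_MQTT_NODE = 8_000   # measured capacity of one engine node with the MQTT plugin
AGENTS_PER_XMPP_NODE = 6_500   # measured capacity of one engine node with the XMPP plugin

print(math.ceil(TOTAL_AGENTS / AGENTS_PER_MQTT_NODE))   # 63 (64 nodes were used)
print(math.ceil(TOTAL_AGENTS / AGENTS_PER_XMPP_NODE))   # 77 (80 nodes were used)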

We deployed the JMeter management node and engine nodes on c4.xlarge EC2 instances. The JMeter cluster was deployed within a single Availability Zone (AZ) for simplicity.

Test Execution

Preparing to load test AWS IoT (the MQTT message broker) was a straightforward process. We configured the service, and AWS handled all of the resources and scaling behind the scenes. To properly simulate unique devices, we generated 512,000 client certificates and policy rules. These certificates and policies were required for clients to authenticate to the MQTT message broker provided by AWS IoT.
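
As a rough sketch, per-device credentials like these can be provisioned with boto3 as shown below. The region, naming scheme, and policy document are illustrative assumptions; the actual policies used in the test are not reproduced here.

import json
import boto3

iot = boto3.client("iot", region_name="us-east-1")   # region is an assumption

def provision_agent(agent_id: str) -> dict:
    """Create an X.509 certificate and attach a per-agent policy to it."""
    cert = iot.create_keys_and_certificate(setAsActive=True)

    # Illustrative policy allowing the agent to connect and exchange messages.
    # Real policies would be scoped down to the specific topics each agent needs.
    policy_doc = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["iot:Connect", "iot:Publish", "iot:Subscribe", "iot:Receive"],
            "Resource": "*",
        }],
    }
    policy_name = f"agent-policy-{agent_id}"
    iot.create_policy(policyName=policy_name, policyDocument=json.dumps(policy_doc))
    iot.attach_policy(policyName=policy_name, target=cert["certificateArn"])
    return cert   # contains certificatePem, keyPair, and certificateArn

# Example: provision a handful of test agents (the actual test created 512,000).
for i in range(5):
    provision_agent(f"agent-{i:06d}")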

Preparing the Ejabberd environment took a bit more effort; we needed to conduct single-node load tests to identify suitable instance sizes and the maximum capacity of each node. We elected to run the full load tests against two instance types and deployed two Ejabberd clusters (attached to MySQL on EC2): one using c4.2xlarge instances with 9 nodes and one using c4.4xlarge instances with 4 nodes. In order to replicate real-world scenarios, we provisioned an extra node per cluster for HA purposes.

For Stability and Busy Hours testing, the following configurations were used:

  • c4.2xlarge with 9 nodes
  • c4.4xlarge with 4 nodes

Table: Ejabberd Single Node Limits


The common bottleneck for both instance types was the authentication rate (“Auth Rate”). Supporting 1,500 auth/sec requires 3 c4.4xlarge instances; because of the high availability requirement, we added 1 extra instance for a total of 4 nodes in that cluster. We used the same formula to size the 9-node cluster of c4.2xlarge instances.
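
The sizing logic can be expressed as a small formula. The per-node authentication rates below are placeholder assumptions standing in for the measured single-node limits in the table above; only the required rate of 1,500 auth/sec and the one-node HA margin come from the analysis itself.

import math

REQUIRED_AUTH_RATE = 1500   # auth/sec the cluster must sustain
HA_SPARE_NODES = 1          # one extra node for high availability

def cluster_size(per_node_auth_rate: float) -> int:
    """Nodes needed to sustain the required auth rate, plus an HA spare."""
    return math.ceil(REQUIRED_AUTH_RATE / per_node_auth_rate) + HA_SPARE_NODES

# Placeholder per-node limits (assumed values, not the measured figures):
print(cluster_size(500))   # e.g. ~500 auth/sec per c4.4xlarge node -> 4 nodes
print(cluster_size(190))   # e.g. ~190 auth/sec per c4.2xlarge node -> 9 nodes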

We ran two iterations of the Peak test scenario and two iterations of the Stability test scenario in order to compare results. Between runs, we cleared the JMeter engines of previous test data and temporary files and restarted the instances, ensuring the load generation platform was clean and that results from one test run were not skewed by data from a previous run.

Test Results

AWS IOT

General Information

Both test cases for AWS IoT passed. The error rate was less than 0.01%.

Table: AWS IoT Load Test Results

The “Error Distribution” diagrams show the cumulative number of errors over time. The relationship is almost linear.

Stability Load Testing

Table: Stability Testing – Summary

Diagram: Stability Testing – Message Latency Histogram

Notes:

Histograms for all tests represent the distribution of message latency (the amount of time needed to send a message from a publisher to a subscriber). The measured values will differ from real-world values because the testing environment is located in the same Region as the services under test. In real-life scenarios, agents will be distributed globally, so Internet-related delays will apply.

The purpose of the histograms presented in this document is to show whether there are any delays related to buffering or overload (service degradation).

Diagram: Stability Testing – Error Distribution (Cumulative)

Busy Hour Load Testing

Table: Busy Hour Load Testing – Summary

Diagram: Busy Hour Load Testing – Message Latency Histogram

Diagram: Busy Hour Load Testing – Error Distribution (Cumulative)

Notes:

During the first test, 1,712 threads lost their connection (16-37 threads on each engine node) between 22:39:17 and 22:41:52 UTC. The threads reconnected to different AWS IoT endpoint IPs.

All threads reconnected successfully, but only after the message receive timeout. During this window, AWS IoT was dropping messages because no agents were subscribed to the topics, which cannot be considered an AWS IoT error.

We decided to normalize the first diagram by removing the data for that time period.

EJABBERD

General Information

Both the stability and busy hour load test cases for Ejabberd passed. The error rate was less than 0.01%.

Stability Load Testing

The test case was executed twice for each instance size and passed without errors.

Table: Stability Testing – Summary

Notes:

  • All tests finished successfully
  • Test #1 for c4.4xlarge was stopped because it ran over time; one message was not received due to the test being stopped

Diagram: Stability Testing – Message Latency Histogram (c4.2xlarge)

Diagram: Stability Testing – Message Latency Histogram (c4.4xlarge)

Diagram: Stability Testing – Error Distribution (c4.2xlarge)

Diagram: Stability Testing – Error Distribution (c4.4xlarge)

Busy Hour Load Testing

Table: Busy Hour Load Testing – Summary

Diagram: Busy Hour Load Testing – Message Latency (c4.2xlarge)

Diagram: Busy Hour Load Testing – Message Latency (c4.4xlarge)

Diagram: Busy Hour Load Testing – Error Distribution (c4.2xlarge)

Diagram: Busy Hour Load Testing – Error Distribution (c4.4xlarge)

Comparing Results

At the conclusion of the load testing, we found the following:

The analysis showed that both solutions could provide very comparable services for the load profile and use cases.

Cost Analysis

We conducted a cost comparison based on capital expenses (CAPEX) and operational expenses (OPEX). For this particular analysis, we defined CAPEX as the cost of development and deployment of the given solution. OPEX was defined as monthly/yearly infrastructure and maintenance costs. For ease of calculation, we did not include human resource and common organizational expenses in this exercise.

CAPEX costs are based on actual work, performed by ClearScale, for other clients to develop and deploy similar solutions.

Upon further review, it was apparent that the AWS IoT solution was extremely cost effective from a capital expenditure perspective. The large difference in CAPEX also indicated that AWS IoT would take less time to deploy.

Conclusion

The AWS IoT solution scored higher in Availability, Maintainability, and Cost. Ejabberd scored higher on Message Reliability, which carried the lowest weight and priority in our scoring matrix, based on the criteria and requirements provided by Samsung.

Table: AWS IoT Results Summary Table

Table: Ejabberd Results Summary Table

Samsung had two main questions it wanted to answer with this analysis:

  • “How does this affect our customers?” AWS IoT provides the availability, consistency, and security that deliver the best possible service. This enables Samsung to keep printers online and operational so that their customers can experience uninterrupted printing services.
  • “How does this affect our innovation?” (We can define innovation as the time a developer spends on creating new services.) As the level of effort required to set up our testing environments showed, the AWS IoT solution is much easier to deploy than the Ejabberd clusters. We did not have any overhead for performance tuning or system scaling, and AWS IoT requires no maintenance effort moving forward. The time and money saved can be redirected to creating new products and features for customers.

We were able to demonstrate to Samsung that AWS IoT was the better solution. By reviewing the test results and the comprehensive cost analysis, we delivered a solution that met the requirements set forth, was scalable and maintainable, and provided an improved customer experience by leveraging new and innovative technologies.

Learn more about ClearScale IoT