“We have a more flexible way to manage our logging solution using multiple types of Amazon EBS volumes, and that has helped us save more than 60 percent of the overall costs of running the solution.”
David Bernstein, Director, Operations Services Management

Zendesk builds software for better customer relationships. It empowers organizations to improve customer engagement and better understand their customers. More than 94,000 paid customer accounts in over 150 countries and territories use Zendesk products. Based in San Francisco, Zendesk has operations in the United States, Europe, Asia, Australia, and South America.

Several years ago, Zendesk moved its SaaS platform to the Amazon Web Services (AWS) Cloud. “The cloud was the best choice for us, because it matches the agile processes we increasingly have in place here,” says David Bernstein, director, operations services management for Zendesk. The company initially deployed its Elasticsearch, Logstash, and Kibana (ELK) big-data stack on dozens of Amazon Elastic Compute Cloud (Amazon EC2) I2 instances, using local instance storage to meet system requirements for memory and disk performance. Zendesk uses the ELK stack for logging in support of its DevOps development model.

While its AWS architecture was effective for several years, Zendesk eventually needed a better way to scale its ELK cluster. “We were growing fast as a company, so we needed to scale the cluster, but we were using built-in instance storage, and we always had to add more instances if we wanted more storage,” says Kyle House, a senior software engineer at Zendesk. “That meant our costs were rising, and we didn’t have an easy way to control our storage.”

Zendesk also needed to improve its data-encryption capabilities. “We had to write and maintain a lot of code to ensure all the data on disk was properly encrypted,” says House. “Maintaining encryption logic was a lot of work, and it was error prone.” Finally, Zendesk sought to increase its data-retention window. “We needed to keep 90 days of data, but 30 days was the maximum we would have been able to do with the cost,” says Bernstein.

As Zendesk began exploring a redesign of its ELK cluster, it noticed users were only accessing log data that was a few days old. “Data older than seven days was only used for reporting, and it didn’t require high performance,” House says. During this time, Amazon Elastic Block Store (Amazon EBS) launched two HDD-backed volume types, Throughput Optimized HDD (st1) and Cold HDD (sc1), which provide persistent block-level storage for use with Amazon EC2 instances. “With the announcement of Amazon EBS HDD-backed volumes, we saw we could have more options instead of paying for an SSD-cost model. We knew this would make it more effective for offloading hot data,” House says.

The organization decided to use a tiered storage model for its ELK cluster, using Amazon EBS General Purpose SSD (gp2) and Throughput Optimized HDD (st1) volumes for frequently accessed data and Cold HDD (sc1) volumes for infrequently accessed data. By attaching multiple volume types to a single instance, Zendesk gains higher performance by directing hot I/O to gp2 and st1 volumes and moving data onto cheaper sc1 storage once it is more than seven days old.
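The age-based tiering rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Zendesk's implementation: the function name `tier_for`, the tier labels, and the sample dates are hypothetical, and the sketch does not model how data is split between gp2 and st1 within the hot tier, because the article does not say.

```python
from datetime import date

# From the article: data older than seven days is rarely accessed.
HOT_WINDOW_DAYS = 7


def tier_for(log_date: date, today: date) -> str:
    """Return a hypothetical tier label for log data written on log_date."""
    age_days = (today - log_date).days
    if age_days > HOT_WINDOW_DAYS:
        # Older data is used only for reporting, so it can sit on Cold HDD.
        return "cold (sc1)"
    # Recent data serves most queries, so it stays on the faster volumes.
    return "hot (gp2/st1)"


if __name__ == "__main__":
    today = date(2024, 3, 15)  # arbitrary example date
    for log_date in (date(2024, 3, 14), date(2024, 3, 8), date(2024, 3, 1)):
        print(log_date, "->", tier_for(log_date, today))
```

Keeping the cutoff in a single constant means the hot window can be tuned without touching the rule itself.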

Based on the success of its initial ELK deployment, the Zendesk security operations (SecOps) organization then extended the scope of the Amazon EC2 and EBS tiered storage re-architecture to its Splunk-based security information and event management (SIEM) logging platform.

Zendesk can now easily scale its ELK cluster to meet growth. “Scaling our logging solution is fast and easy using Amazon EBS, especially because we’ve separated instance types from storage,” says House. “And on top of that, we have a very predictable pricing model as we scale.” By moving away from specialized Amazon EC2 instances and instance storage and taking advantage of a tiered model for its Amazon EBS volumes, Zendesk was able to significantly reduce costs. “We have a more flexible way to manage our logging solution using multiple types of Amazon EBS volumes, and that has helped us save more than 60 percent of the overall costs of running the solution,” says Bernstein. In addition, Zendesk has seen 50 percent monthly DevOps savings.

The Zendesk SecOps team is also realizing significant savings. “We were going to deploy Splunk based on the Splunk blueprints around specialized instances and ephemeral storage, but we ultimately decided to use the general instances and tiered EBS Volumes we used for our ELK stack,” says Bernstein. “As a result, we saw an 86 percent cost reduction.”

The company also greatly improved its encryption management by relying on Amazon EBS. “Data encryption is now managed by Amazon EBS, so our operational complexity related to encryption has been reduced tremendously,” says Bernstein. “We no longer have to manage and maintain custom code for encryption.”

Also, by decoupling storage allocation from Amazon EC2 instance types, Zendesk was able to extend its data-retention period. “We reduced costs while tripling our retention period using Amazon EBS,” says Bernstein. “By utilizing lower-cost EBS volumes for older data, we stretched it by 200 percent.”
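The two figures in the quote describe the same change: tripling a value is a 200 percent increase. A minimal check, assuming the baseline is the 30-day maximum mentioned earlier in the article and that tripling it gives the 90-day target (the helper name `percent_increase` is hypothetical):

```python
def percent_increase(old: float, new: float) -> float:
    """Return the relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


baseline_days = 30                 # earlier maximum retention cited in the article
extended_days = baseline_days * 3  # "tripling" the retention period

if __name__ == "__main__":
    print(extended_days, "days,", percent_increase(baseline_days, extended_days), "percent increase")
```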

The organization has also increased the availability and stability of its ELK cluster by taking advantage of multiple AWS Availability Zones. “By using multiple AWS Availability Zones, we’ve made our logging system and other applications more resilient and highly available,” says Bernstein. “As our company keeps growing, we have a lot of confidence in the scalability and reliability of our solutions, because of the AWS Cloud.”
