San Francisco-based Librato, a SolarWinds company, provides a real-time cloud-monitoring solution for tracking and understanding the metrics that impact businesses at all levels of the stack. Librato provides everything customers need to visualize, analyze, and receive alerts on the metrics that matter to them. The Librato platform accepts metrics from any source for real-time data aggregation and transformation.
The Librato monitoring platform runs on an Apache Cassandra distributed database system, which is critical to the company’s business. “We rely completely on Cassandra for supporting all customer monitoring data,” says Mike Heffner, director of data engineering at Librato. “We run hundreds of Cassandra instances across multiple Cassandra rings. Because it is our primary data store, we’re always striving to boost the performance, so customers can analyze their data faster.”
Librato also needs agility and scalability. As Heffner explains, “We require the ability to add compute and storage capacity on demand, because our business is growing fast.”
The company also sought to decrease the time required to repair its Cassandra database in the event of a failure. “We often relied on Cassandra to repair itself, but that took several hours, and we could only do one recovery operation at a time,” says Heffner. “We were looking for more operational simplicity overall.”
Since its founding in 2011, Librato has used Amazon Web Services (AWS) to run its monitoring platform. “We originally chose AWS for ease of management and the ability to easily add capacity on demand,” says Heffner. “Many of our customers run on AWS as well, so we built a better product by using the same environment they use.”
Librato initially used 160 Amazon Elastic Compute Cloud (Amazon EC2) I2 instances to support its Cassandra cluster. Because the organization wanted to improve Cassandra performance and scalability, it recently began moving its Cassandra data to C4 instances and Amazon Elastic Block Store (Amazon EBS), a service that offers persistent block-level storage volumes for use with Amazon EC2 instances. “We wanted a higher CPU-to-disk ratio for some of our workloads, and Amazon EBS gives us that in a very cost-effective solution,” says Heffner. C4 instances are the latest generation of compute-optimized instances, offering the highest-performing processors and lowest price-to-compute performance in Amazon EC2.
Librato uses Amazon EBS General Purpose SSD (gp2) volumes for data partitions and Throughput Optimized HDD (st1) volumes for its commit logs. Because several volume types can be attached to a single instance, Librato can match each volume type to a specific disk-access pattern. Overall, by optimizing EC2 instances and using different EBS volumes for different needs, Librato improves its price-to-performance ratio.
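As a rough illustration of that layout, the sketch below builds the kind of block-device mapping that EC2 accepts when launching an instance, pairing a gp2 volume for data with an st1 volume for the commit log. The device names, the volume sizes, and the helper `cassandra_block_devices` are assumptions made for illustration; the source does not state the values Librato actually uses.

```python
def cassandra_block_devices(data_gib=1000, commitlog_gib=500):
    """Return an EC2-style BlockDeviceMappings list for one Cassandra node.

    Sizes and device names are hypothetical placeholders, not Librato's values.
    """
    return [
        {   # Data partition: General Purpose SSD (gp2)
            "DeviceName": "/dev/sdf",
            "Ebs": {"VolumeType": "gp2", "VolumeSize": data_gib,
                    "DeleteOnTermination": False},
        },
        {   # Commit log (sequential appends): Throughput Optimized HDD (st1)
            "DeviceName": "/dev/sdg",
            "Ebs": {"VolumeType": "st1", "VolumeSize": commitlog_gib,
                    "DeleteOnTermination": False},
        },
    ]


mappings = cassandra_block_devices()
for m in mappings:
    print(m["DeviceName"], m["Ebs"]["VolumeType"], m["Ebs"]["VolumeSize"], "GiB")
```

Setting `DeleteOnTermination` to `False` in this sketch keeps the volumes alive after the instance is gone, which is what makes it possible to reattach them elsewhere.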
Using Amazon EBS volumes, Librato has seen considerable improvements in the performance of its Cassandra instances. “We have reduced write latencies across our Cassandra rings by up to 500 milliseconds simply by moving from I2 instances to C4 instances with Amazon EBS volumes,” says Heffner. “That has also contributed to reduced latency for customers when they interact with our API, giving them faster time to data analysis.”
By moving from 160 I2 2xlarge EC2 instances to 96 C4 4xlarge instances with Amazon EBS, Librato gains the flexibility to choose the instances best suited to each workload, helping the company reduce operational costs. “We are seeing at least a 35 percent savings over our previous architecture by using Amazon EBS volumes,” says Peter Norton, senior software engineer for Librato. “With those savings, we can funnel more resources toward new-feature development for the platform.”
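The fleet change can be checked with simple arithmetic. The sketch below uses the published vCPU counts for the two instance types (8 for i2.2xlarge, 16 for c4.4xlarge) together with the node counts quoted above. It does not try to reproduce the 35 percent savings figure, because the source gives no prices; the helper `fleet_vcpus` is a hypothetical name.

```python
# vCPUs per instance, taken from the EC2 instance type specifications.
VCPUS = {"i2.2xlarge": 8, "c4.4xlarge": 16}


def fleet_vcpus(instance_type, count):
    """Total vCPUs for `count` instances of one type."""
    return VCPUS[instance_type] * count


old_nodes, new_nodes = 160, 96
old_vcpus = fleet_vcpus("i2.2xlarge", old_nodes)
new_vcpus = fleet_vcpus("c4.4xlarge", new_nodes)

node_reduction = (old_nodes - new_nodes) / old_nodes
vcpu_change = new_vcpus / old_vcpus - 1

print(f"nodes: {old_nodes} -> {new_nodes} ({node_reduction:.0%} fewer)")
print(f"vCPUs: {old_vcpus} -> {new_vcpus} ({vcpu_change:+.0%})")
```

Under these figures the new fleet has 40 percent fewer nodes but about 20 percent more vCPUs in total, which is consistent with the higher CPU-to-disk ratio Heffner describes.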
Librato also has more agility and scalability using Amazon EBS, which helps the company scale to support business growth. “Adding new nodes and increasing capacity are fast and simple using Amazon EBS,” Norton says. “As our workload keeps growing, we are confident we can scale to support that growth.”
In addition, the company has increased its operational simplicity, which has helped reduce the time to repair its Cassandra environment. “We have a much faster mean time to repair using Amazon EBS,” says Heffner. “It used to take multiple hours to perform a database-recovery operation, but we can do it in minutes now. Using Amazon EBS volumes, we simply need to reattach the block storage to the replacement instance in the event of a failure, and then we can quickly bring the instance back into the Cassandra ring. This means we can also take database snapshots in a shorter time frame, making it easier for us to have data available in a recovery store.”
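The reattach step Heffner describes might look roughly like the sketch below. It assumes `ec2` is an object that behaves like a boto3 EC2 client (for example `boto3.client("ec2")`) and that the replacement instance runs in the same Availability Zone as the volumes, which EBS requires. The function name `move_volumes` is hypothetical, and the source does not show Librato's own tooling; pagination and error handling are left out.

```python
def move_volumes(ec2, failed_id, replacement_id):
    """Reattach every EBS volume from a failed node to its replacement.

    `ec2` is assumed to expose the boto3 EC2 client methods used below.
    Returns the (volume_id, device) pairs that were moved.
    """
    resp = ec2.describe_volumes(
        Filters=[{"Name": "attachment.instance-id", "Values": [failed_id]}]
    )
    moved = []
    for vol in resp["Volumes"]:
        vol_id = vol["VolumeId"]
        # Reuse the original device name so mount points stay the same.
        device = next(a["Device"] for a in vol["Attachments"]
                      if a["InstanceId"] == failed_id)
        # Force the detach because the failed instance may not respond.
        ec2.detach_volume(VolumeId=vol_id, InstanceId=failed_id, Force=True)
        ec2.get_waiter("volume_available").wait(VolumeIds=[vol_id])
        ec2.attach_volume(VolumeId=vol_id, InstanceId=replacement_id,
                          Device=device)
        ec2.get_waiter("volume_in_use").wait(VolumeIds=[vol_id])
        moved.append((vol_id, device))
    return moved
```

Because the data stays on the volume, the replacement node in this sketch only has to mount it and rejoin the ring, rather than stream a full copy of the data from its peers.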
Librato can also more easily upgrade its Amazon EBS volumes in the future. “We can essentially upgrade a Cassandra ring in place within a matter of hours,” Heffner says. “Previously, that would have taken days to weeks because we would have had to stream all the data off each node as we replaced it.” The organization is planning to enhance its AWS environment by eventually moving to the Amazon Aurora relational database engine and taking advantage of additional AWS services. Heffner says, “As our company continues to grow, we are looking forward to supporting that growth and improving our Cassandra performance with more of the new services Amazon is offering.”
Watch “Librato's Experience Running Cassandra Using EBS, ENIs, and VPC” from re:Invent 2016.