Splunk Inc. enables organizations to monitor, search, analyze, visualize and act on massive streams of real-time and historical machine data. More than 4,800 enterprises, universities, government agencies and service providers in more than 85 countries use Splunk software to gain operational intelligence that informs business and customer understanding, improves service and uptime, reduces cost and mitigates cyber-security risk.

The Splunk team needed to launch a service-based version of their software, called Splunk Storm, that customers could use without having to provision hardware, configure a server or install and manage the Splunk Enterprise software. “It wasn’t going to be easy to bootstrap that to our existing software,” says Alex Munk, Splunk Senior Product Manager. “We needed to adapt an enterprise product that ran on-premises and turn it into a service.”

The team also wanted to get started quickly, without making big investments in either staffing or infrastructure. “A huge capital expenditure might have been an anchor for Storm,” says Munk. “We recognized an immediate market opportunity to deliver Splunk as a service, so we needed something we could bring up quickly.”

According to Munk, the decision to use AWS was straightforward. “We decided from the beginning to build a service with the scalability and elasticity you get from the AWS Cloud,” Munk says.

Splunk partnered with Opscode, a Seattle, Washington-based company that provides infrastructure automation solutions to hundreds of customers worldwide. Opscode recommended pairing the AWS Cloud with its Hosted Chef SaaS offering, which automates everything from basic configuration to application updates.

Opscode and Splunk built Storm around AWS—specifically, Amazon Elastic Compute Cloud (Amazon EC2). “Storm wouldn’t function without AWS,” Munk says.

Lucas Welch, Opscode Senior Communications Manager, agrees. “We love AWS. Hosted Chef in the AWS environment helps the end user get the most out of AWS—deploying applications, updating, and managing servers in a very seamless, code-based way.”

The Splunk team has about 150 instances in production. “It’s a push-button operation,” Munk adds. “It is possible to bring up our service in one hour. If a hurricane hit and we lost all our instances, we believe we could recover the entire service with a single command, using AWS and Opscode.”

The Opscode team uses a plug-in that talks to the Amazon EC2 API, so the Splunk team can configure, manage and provision server instances with simple command-line operations.
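As a rough illustration of what provisioning instances through the Amazon EC2 API from a script can look like, the sketch below uses Python and the boto3 library rather than the Opscode plug-in the team actually used. The function names, the `role` tag, the AMI ID and the instance type are all hypothetical and are not taken from Splunk's setup.

```python
from typing import Any, Dict


def build_run_instances_request(
    role: str, count: int, ami_id: str, instance_type: str
) -> Dict[str, Any]:
    """Build keyword arguments for the EC2 RunInstances call.

    Every value passed in is a placeholder chosen for illustration.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": count,
        "MaxCount": count,
        # Tag each instance with a (hypothetical) role so it can be found later.
        "TagSpecifications": [
            {
                "ResourceType": "instance",
                "Tags": [{"Key": "role", "Value": role}],
            }
        ],
    }


def provision(role: str, count: int, ami_id: str, instance_type: str) -> Any:
    """Launch instances. Requires boto3 and valid AWS credentials."""
    import boto3  # imported here so the request builder works without boto3

    ec2 = boto3.client("ec2")
    return ec2.run_instances(
        **build_run_instances_request(role, count, ami_id, instance_type)
    )


if __name__ == "__main__":
    # Only builds and prints the request; nothing is launched.
    print(build_run_instances_request("indexer", 3, "ami-12345678", "m1.large"))
```

Keeping the request builder separate from the call that launches instances means the parameters can be inspected or reviewed before anything is created.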

Splunk Storm uses Amazon EC2 as its infrastructure, and also uses Amazon Elastic Block Store (Amazon EBS), Elastic Load Balancing (ELB), and AWS Identity and Access Management (IAM). The team uses ELB to handle data input traffic to its API and to load-balance its website traffic. Splunk uses Amazon EBS for much of its data storage, so that it can easily scale up the amount of storage it needs. The team also uses a logical volume manager, which adds resiliency to the storage layer.

The team also uses Hosted Chef for orchestration and infrastructure management, Zuora for billing, Salesforce.com for customer relationship management, and Dynect for DNS. They also use Pingdom to monitor their availability and PagerDuty to alert them to any issues.

AWS has made innovation more accessible, Munk says. “AWS is inexpensive enough that if you want to experiment with it, you can experiment without a lot of extra costs. You don’t have to buy a new machine to experiment.”

Splunk was able to avoid capital expenditures by hosting Splunk Storm in the cloud. By using AWS, the team also reduced operating expenditures for both compute and storage, compared to pricing from various other cloud service vendors. The AWS pay-as-you-go model allows Splunk to pay for machines by the hour, rather than by the month, so Splunk can meet increased demand easily without unnecessary additional operating expenditure.
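To make the difference between hourly and monthly billing concrete, here is a small worked example in Python. The rates, the 720-hour month and the function names are hypothetical values chosen for the arithmetic; they are not actual AWS or competitor prices.

```python
import math

HOURS_PER_MONTH = 720  # assumes a 30-day month


def hourly_billing_cents(hours_used: int, rate_cents_per_hour: int) -> int:
    """Cost when each hour of use is billed separately."""
    return hours_used * rate_cents_per_hour


def monthly_billing_cents(hours_used: int, rate_cents_per_month: int) -> int:
    """Cost when any started month is billed in full."""
    months_billed = math.ceil(hours_used / HOURS_PER_MONTH)
    return months_billed * rate_cents_per_month


if __name__ == "__main__":
    # A machine added for a three-day (72-hour) spike in demand,
    # at a hypothetical 25 cents per hour versus 18,000 cents per month.
    burst_hours = 72
    print("hourly: ", hourly_billing_cents(burst_hours, 25), "cents")
    print("monthly:", monthly_billing_cents(burst_hours, 18000), "cents")
```

Under these assumed rates, a short burst of extra capacity costs a fraction of a full month's charge, which is the effect the pay-as-you-go model is meant to deliver.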

The team also improved both availability and time to market. “We track uptime and service availability, and AWS has surpassed our expectations,” Munk says. “We can also spin up more capacity as we need it. It’s very efficient, particularly for teams who don’t have a lot of resources. It allows small teams to do far more than they would have been able to do otherwise.”

“Here at Splunk, what we’ve traditionally been good at is managing big data. AWS has made it possible to build and manage a large service,” Munk says.

For more information about building website applications on the AWS Cloud, see: http://aws.amazon.com/web-mobile-social.

For more information about how Opscode can help your company run on the AWS Cloud, see Opscode's listing in the AWS Partner Directory.