AWS Partner Network (APN) Blog

How to Reduce AWS Storage Costs for Splunk Deployments Using SmartStore

By Devendra Singh, Partner Solutions Architect at AWS
By Jae Jung, Global Strategic Alliances Sales Engineer – APAC at Splunk

It can be overwhelming for organizations to keep pace with the amount of data being generated by machines every day.

Forbes estimates that around 2.5 quintillion bytes of data is generated each day through sources such as the Internet of Things (IoT), websites, and IT services. This data is a rich source of meaningful information for organizations, but they need software vendors to develop tools that help them extract it.

Splunk is an AWS Partner Network (APN) Advanced Technology Partner with multiple AWS Competencies in key solution areas such as Data & Analytics, DevOps, and Security. Its popular big data platform has seen widespread adoption globally.

Across use cases ranging from cyber security and network operations to the expanding adoption of IoT and machine learning, Splunk software and cloud services enable customers to search, monitor, analyze, and visualize machine-generated big data.

In this post, we will introduce you to Splunk SmartStore and show how it helps customers reduce storage costs in a Splunk deployment on Amazon Web Services (AWS).

Solution Overview

Until recently, Splunk workloads on AWS were mirrors of their on-premises deployments, installed on an array of Amazon Elastic Compute Cloud (Amazon EC2) instances and attached Amazon Elastic Block Store (Amazon EBS) volumes.

These workloads rarely took advantage of additional AWS services such as Amazon Simple Storage Service (Amazon S3). But customers kept asking about Splunk’s compatibility with Amazon S3, given all of the redundancy and cost benefits built into the service.

This issue was solved with the release and refinement of SmartStore for Splunk Enterprise. SmartStore reduces total cost of ownership (TCO), efficiently reallocates infrastructure spend, and brings all of the benefits of S3 to Splunk deployments on AWS.

SmartStore for Splunk Enterprise

SmartStore finally decouples storage and compute at the indexer tier, which has traditionally had the highest infrastructure demands from both a cost and a performance standpoint.

In the past, search peers were provisioned with a fixed amount of storage and compute, and organizations had to consider joining additional peers to the cluster whenever compute was fully consumed, even if storage headroom remained, or vice versa.

With the separation of compute resources and storage onto Amazon S3, however, organizations can now organically scale search peers into Splunk indexer clusters to resolve compute constraints without unnecessary investment into storage.

SmartStore works by moving “warm” or “cold” buckets (i.e. Splunk containers of indexed data that is no longer being actively written) to Amazon S3 via API. The search peers can still operate in an indexer cluster, but each peer contains a cache manager that handles the writing and retrieval of buckets from S3, as required.
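The cache manager’s footprint on each indexer can be tuned in server.conf. The snippet below is a minimal sketch, assuming a hypothetical 500 GB local cache; the value is illustrative only, so size the cache for your own search patterns and see the Splunk documentation for the full list of cache manager settings.

# server.conf on each indexer (illustrative values only)
[cachemanager]
# Maximum disk space, in MB, that locally cached bucket copies may occupy
max_cache_size = 512000
# Evict the least recently used buckets first when the cache fills up
eviction_policy = lru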

SmartStore has been available since Splunk 7.2, and the recent release of Splunk 7.3 has enabled Data Model Acceleration (DMA) support for SmartStore-enabled indexes. This was a critical path for customers using products with heavy DMA usage, with Splunk Enterprise Security being the most obvious.

As SmartStore can be enabled on a per-index basis, customers can choose to use it for all of their data, or just a subset to start before migrating their indexer data completely.

Configuring SmartStore to Use Amazon S3

Since SmartStore leverages Amazon S3 for storage, users must begin the configuration by creating an S3 bucket with the appropriate permissions.

When configuring S3 buckets:

  • They must have read, write, and delete permissions (an example policy sketch follows this list).
  • If the indexers are running on Amazon EC2, provision the S3 bucket in the same region as the Amazon EC2 instances that use it.
  • As a best practice, use an AWS Identity and Access Management (IAM) role for S3 access when deploying indexers on AWS.
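As a sketch of the first requirement above, the following policy grants list, read, write, and delete access on a single bucket. The bucket name splunk-index-singapore (created in Step 1 below) and the policy name splunk-smartstore-s3-access are placeholders, and the exact set of actions your Splunk version expects may differ, so verify against the SmartStore documentation.

# Hypothetical policy granting SmartStore the S3 permissions it needs
cat > splunk-smartstore-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BucketLevelAccess",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::splunk-index-singapore"
    },
    {
      "Sid": "ObjectLevelAccess",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::splunk-index-singapore/*"
    }
  ]
}
EOF

# Register the policy so it can be attached to an IAM role or user later
aws iam create-policy \
  --policy-name splunk-smartstore-s3-access \
  --policy-document file://splunk-smartstore-policy.json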

Step 1: Create an Amazon S3 Bucket

You can create an Amazon S3 bucket from the AWS Management Console, or by using the following command line syntax:

aws s3api create-bucket --bucket <bucketname> --region <regionID> --create-bucket-configuration LocationConstraint=<regionID>

Example:

aws s3api create-bucket --bucket splunk-index-singapore --region ap-southeast-1 --create-bucket-configuration LocationConstraint=ap-southeast-1

Note that ap-southeast-1 is the identifier for the AWS Singapore Region. Also note that bucket names are globally unique, so you can’t reuse the splunk-index-singapore bucket name; choose a different bucket name for your deployment.
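Before moving on, you can confirm the bucket was created in the intended region; for the example above, the call should return ap-southeast-1.

# Verify the bucket's region (returns the bucket's LocationConstraint)
aws s3api get-bucket-location --bucket splunk-index-singapore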

Step 2: Configure Amazon S3 Access from Splunk Indexers

There are two approaches to configuring access to the Amazon S3 bucket:

  • Approach 1: Configure an IAM role with the required permissions to access your S3 bucket, and configure the Amazon EC2 instances that run the Splunk indexers to use that IAM role. This is the recommended approach and avoids sharing security credentials and access keys when configuring SmartStore (a CLI sketch follows this list).
  • Approach 2: This is not the recommended approach, but you can use an AWS access key to access the S3 bucket. If you don’t have access keys, create an IAM user (or use an existing user with the required S3 permissions) and generate an access key. For more information on how to generate AWS access keys, please see the documentation.
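For Approach 1, one possible sequence is sketched below, reusing the splunk-smartstore-s3-access policy from earlier. The role name, instance profile name, account ID, and instance ID are placeholders; if your indexer instances already have an instance profile, attach the policy to its role instead.

# Trust policy that lets EC2 instances assume the role
cat > ec2-trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Create the role and attach the S3 access policy created earlier
aws iam create-role \
  --role-name splunk-smartstore-role \
  --assume-role-policy-document file://ec2-trust-policy.json
aws iam attach-role-policy \
  --role-name splunk-smartstore-role \
  --policy-arn arn:aws:iam::<account-id>:policy/splunk-smartstore-s3-access

# Wrap the role in an instance profile and attach it to an indexer instance
aws iam create-instance-profile --instance-profile-name splunk-smartstore-profile
aws iam add-role-to-instance-profile \
  --instance-profile-name splunk-smartstore-profile \
  --role-name splunk-smartstore-role
aws ec2 associate-iam-instance-profile \
  --instance-id <indexer-instance-id> \
  --iam-instance-profile Name=splunk-smartstore-profile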

Step 3: Configure the Splunk Indexer to Use Amazon S3 by Editing the indexes.conf File

This example configures SmartStore indexes, using an Amazon S3 bucket as the remote object store.

The SmartStore-related settings are configured at the global level, which means all indexes are SmartStore-enabled and all use a single remote storage volume, named remote_store. This example also creates one new index called cs_index.

[default]
# Configure all indexes to use the SmartStore remote volume called
# "remote_store".
# Note: If you want only some of your indexes to use SmartStore,
# place this setting under the individual stanzas for each of the
# SmartStore indexes, rather than here.
remotePath = volume:remote_store/$_index_name
 
repFactor = auto
 
# Configure the remote volume
[volume:remote_store]
storageType = remote
 
# On the next line, the volume's path setting points to the remote storage location
# where indexes reside. Each SmartStore index resides directly below the location
# specified by the path setting. The <scheme> identifies a supported remote
# storage system type, such as S3. The <remote-location-specifier> is a
# string specific to the remote storage system that specifies the location
# of the indexes inside the remote system.
# This is an S3 example: "path = s3://mybucket/some/path".
 
path = s3://mybucket/some/path

# The following S3 settings are required only if you're using the access and secret
# keys. They are not needed if you are using AWS IAM roles.
 
remote.s3.access_key = <S3 access key>
remote.s3.secret_key = <S3 secret key>

# An example value for the endpoint below would be https://s3-us-west-2.amazonaws.com
remote.s3.endpoint = https://<S3 host>


# This example stanza configures a custom index, "cs_index".
[cs_index]
homePath = $SPLUNK_DB/cs_index/db
# SmartStore-enabled indexes do not use thawedPath or coldPath, but you must still specify them here.
coldPath = $SPLUNK_DB/cs_index/colddb
thawedPath = $SPLUNK_DB/cs_index/thaweddb

# Additional parameters that should be changed for SmartStore
# Splunk bucket sizes are reset to 750MB (auto) for efficient swapping
maxDataSize = auto

# hot to warm transition and data upload frequency
maxHotBuckets = 3
maxHotIdleSecs = 0
maxHotSpanSecs = 777600

# Per index cache preferences
hotlist_recency_secs = 86400
hotlist_bloom_filter_recency_hours = 360

Once the configuration is complete, Splunk indexers will be ready to use Amazon S3 to store warm and cold data.
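How you apply the indexes.conf change depends on your topology. On a standalone indexer, edit the file locally and restart Splunk; in an indexer cluster, the usual pattern is to place the file under the cluster master’s master-apps directory and push it to the peers as a configuration bundle. A minimal sketch, assuming a default $SPLUNK_HOME installation:

# Standalone indexer: restart after editing indexes.conf
$SPLUNK_HOME/bin/splunk restart

# Indexer cluster: place indexes.conf under
# $SPLUNK_HOME/etc/master-apps/_cluster/local/ on the cluster master,
# then push it to the peers as a configuration bundle
$SPLUNK_HOME/bin/splunk apply cluster-bundle --answer-yes
$SPLUNK_HOME/bin/splunk show cluster-bundle-status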

The key difference with SmartStore is that the remote Amazon S3 bucket becomes the location for master copies of warm buckets, while the indexer’s local storage is used to cache copies of warm buckets that are currently participating in a search or that have a high likelihood of participating in a future search.
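Once data starts rolling from hot to warm, you can spot-check that buckets are being uploaded by listing the remote store. The exact object layout varies by Splunk version, but uploaded buckets generally appear beneath each index name in the bucket.

# List a sample of the objects SmartStore has uploaded to the remote store
aws s3 ls --recursive s3://splunk-index-singapore/ | head -n 20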

Summary

In this post, we enabled our Splunk indexers to store data on Amazon S3, while being able to return search results in a performant way.

Amazon S3 offers highly resilient, highly secure, and highly available data storage in a cost-effective way. Organizations can use S3 to store Splunk data that has traditionally resided on persistent EBS volumes.

Additionally, Splunk Enterprise now supports the use of SmartStore for almost all of the major use cases solved with Splunk Enterprise, including enterprise security.

Many customers are using SmartStore to reduce the size of their Amazon EBS volumes by moving data to S3. This switch brings down storage costs, as S3 is less expensive than EBS.

For more information on SmartStore, please see the documentation.

Splunk – APN Partner Spotlight

Splunk is an AWS Competency Partner. Its software and cloud services enable customers to search, monitor, analyze, and visualize machine-generated big data from websites, applications, servers, networks, IoT, and mobile devices.

Contact Splunk | Solution Overview | AWS Marketplace

*Already worked with Splunk? Rate this Partner

*To review an APN Partner, you must be an AWS customer that has worked with them directly on a project.