I want to make sure that data I store in Amazon Elasticsearch Service resources is protected against accidental deletion, application or hardware failures, or outages. What are some best practices I should keep in mind while designing my infrastructure?

To improve the fault tolerance of your Amazon ES domain, keep the following things in mind:

Ensure you have up-to-date index snapshots

Configure automatic daily snapshots to have an up-to-date backup solution for your cluster. In the event your cluster needs to be restored, a recent snapshot can help expedite the process.

Manual index snapshots let you back up the data in your Amazon ES domain on demand and store the backups in an Amazon S3 bucket in your account. Besides serving as backups, manual snapshots can help you migrate data from one Amazon ES domain to another.
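As a rough illustration of what a manual snapshot involves, the sketch below builds the two requests that the Elasticsearch snapshot API expects: one to register an S3 repository and one to take a snapshot into it. The endpoint, repository name, bucket, and role ARN are placeholder values, not values from this article, and the helper function names are hypothetical. The sketch only constructs the requests; sending them requires signing with your AWS credentials, which is not shown here.

```python
import json


def build_repository_request(domain_endpoint, repo_name, bucket, region, role_arn):
    """Return (url, json_body) for a PUT that registers an S3 snapshot repository."""
    url = f"https://{domain_endpoint}/_snapshot/{repo_name}"
    body = {
        "type": "s3",
        "settings": {
            "bucket": bucket,      # S3 bucket that will hold the snapshots
            "region": region,      # Region of that bucket
            "role_arn": role_arn,  # IAM role Amazon ES assumes to write to S3
        },
    }
    return url, json.dumps(body)


def build_snapshot_url(domain_endpoint, repo_name, snapshot_name):
    """Return the URL for a PUT that takes a snapshot into the repository."""
    return f"https://{domain_endpoint}/_snapshot/{repo_name}/{snapshot_name}"


# Placeholder values for illustration only.
repo_url, repo_body = build_repository_request(
    "search-example.us-east-1.es.amazonaws.com",
    "my-manual-repo",
    "my-snapshot-bucket",
    "us-east-1",
    "arn:aws:iam::123456789012:role/ExampleSnapshotRole",
)
snapshot_url = build_snapshot_url(
    "search-example.us-east-1.es.amazonaws.com", "my-manual-repo", "snapshot-1"
)
print(repo_url)
print(repo_body)
print(snapshot_url)
```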

Monitor the status of your Amazon ES resources

The Monitoring pane of the Amazon ES console provides an overview of the relative health of Elasticsearch clusters.

You can also use Amazon CloudWatch to send automatic email notifications when the metrics for your domain reach thresholds that you define, so that you can change your domain in response to potential issues. For example, you can monitor the AutomatedSnapshotFailure CloudWatch metric to find out when an automated snapshot of your cluster fails.
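The alarm on AutomatedSnapshotFailure could be defined along the following lines. This is a minimal sketch: the function name, alarm name, period, and the SNS topic used for email delivery are assumptions chosen for illustration, and the dictionary is meant to be passed as keyword arguments to CloudWatch's PutMetricAlarm operation.

```python
def snapshot_failure_alarm(domain_name, account_id, sns_topic_arn):
    """Build PutMetricAlarm parameters for a failed automated snapshot.

    Amazon ES metrics are published in the AWS/ES namespace with the
    DomainName and ClientId (AWS account ID) dimensions.
    """
    return {
        "AlarmName": f"{domain_name}-automated-snapshot-failure",  # assumed name
        "Namespace": "AWS/ES",
        "MetricName": "AutomatedSnapshotFailure",
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "Statistic": "Maximum",
        "Period": 60,              # seconds; assumed evaluation window
        "EvaluationPeriods": 1,
        "Threshold": 1,            # the metric reports 1 when a snapshot fails
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS topic with an email subscription
    }


# Placeholder identifiers for illustration only.
params = snapshot_failure_alarm(
    "my-domain",
    "123456789012",
    "arn:aws:sns:us-east-1:123456789012:es-alerts",
)
print(params["AlarmName"])
```

With boto3 installed and credentials configured, the parameters could then be applied with `boto3.client("cloudwatch").put_metric_alarm(**params)`.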

Understand Amazon ES service limits

When planning your infrastructure, keep the Amazon ES service limits in mind, especially if you plan to scale up in response to the needs of your application or clients.

Use more than two nodes in a domain

It's a best practice to use more than two nodes to avoid issues such as split brain, where a network partition leaves the cluster with more than one master node. It's also best to have at least one replica for each index to avoid potential data loss. If you are not using dedicated master nodes, use three or more nodes.
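The arithmetic behind this recommendation can be sketched with the majority (quorum) rule that Elasticsearch uses for master election, where a master needs votes from more than half of the master-eligible nodes. The function names below are hypothetical, and how Amazon ES applies the rule internally is not described in this article.

```python
def quorum(master_eligible_nodes):
    """Smallest number of nodes that forms a strict majority."""
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes):
    """How many nodes can be lost while a majority is still reachable."""
    return master_eligible_nodes - quorum(master_eligible_nodes)


for n in (1, 2, 3, 5):
    print(f"{n} node(s): quorum={quorum(n)}, tolerated failures={tolerated_failures(n)}")
```

With two nodes the quorum is also two, so losing either node (or the link between them) leaves no majority. Three nodes is the smallest cluster that can lose a node and still elect a single master.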

Enable zone awareness for your domain

Enabling zone awareness distributes the nodes of an Elasticsearch cluster, along with its replica index shards, across multiple Availability Zones in the same Region.

Avoid using t2 instance types in production environments

For the best performance in your production environments, we recommend using m3 instances or larger.

Consider using dedicated master nodes

Dedicated master nodes can help guard against issues caused by overloaded data nodes. Dedicated master nodes are especially helpful when one or more of the following is true:

  • Your domain has 10 or more data nodes
  • Your domain has 500 or more shards
  • Your index mapping is complex, with many fields defined across types and indices
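The checklist above can be expressed as a small helper. This is only a sketch of the three criteria as listed; the function name is hypothetical, and because the article gives no numeric threshold for a "complex" mapping, that judgment is passed in as a flag.

```python
def should_use_dedicated_masters(data_nodes, shards, complex_mapping=False):
    """Return True if any of the listed criteria applies to the domain."""
    return data_nodes >= 10 or shards >= 500 or complex_mapping


# Example domains (made-up numbers for illustration).
print(should_use_dedicated_masters(data_nodes=4, shards=120))
print(should_use_dedicated_masters(data_nodes=12, shards=120))
print(should_use_dedicated_masters(data_nodes=4, shards=120, complex_mapping=True))
```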

Published: 2016-10-20

Updated: 2017-11-20