AWS Big Data Blog
Best practices for right-sizing Amazon OpenSearch Service domains
Amazon OpenSearch Service is a fully managed service for search, analytics, and observability workloads, helping you index, search, and analyze large datasets with ease. Making sure your OpenSearch Service domain is right-sized—balancing performance, scalability, and cost—is critical to maximizing its value. An over-provisioned domain wastes resources, whereas an under-provisioned one risks performance bottlenecks like high latency or write rejections.
In this post, we guide you through the steps to determine if your OpenSearch Service domain is right-sized, using AWS tools and best practices to optimize your configuration for workloads like log analytics, search, vector search, or synthetic data testing.
Why right-sizing your OpenSearch Service domain matters
Right-sizing your OpenSearch Service domain provides optimal performance, reliability, and cost-efficiency. An undersized domain leads to high CPU utilization, memory pressure, and query latency, whereas an oversized domain drives unnecessary spend and resource waste. By continuously matching domain resources to workload characteristics such as ingestion rate, query complexity, and data growth, you can maintain predictable performance without overpaying for unused capacity.
Beyond cost and performance, right-sizing facilitates architectural agility. It helps make sure your cluster scales smoothly during traffic spikes, meets SLA targets, and sustains stability under changing workloads. Regularly tuning resources to match actual demand optimizes infrastructure efficiency and supports long-term operational resilience.
Key Amazon CloudWatch metrics
OpenSearch Service publishes Amazon CloudWatch metrics that offer insights into various aspects of your domain’s performance. These metrics fall into 16 different categories, including cluster metrics, EBS volume metrics, and instance metrics. To determine whether your OpenSearch Service domain is misconfigured, monitor the common symptoms that indicate resizing or optimization may be necessary; they are typically caused by imbalances in resource allocation, workload demands, or configuration settings. The following table summarizes the key metrics to watch (an in-cluster cross-check follows the table):
| CloudWatch metric category | Key metrics and notes |
| --- | --- |
| CPU utilization | CPUUtilization: Average CPU usage across all data nodes. MasterCPUUtilization (for domains with dedicated primary nodes): Average CPU usage on primary nodes. |
| Memory utilization | JVMMemoryPressure: Percentage of heap memory used across data nodes. MasterJVMMemoryPressure (for dedicated primary nodes): Percentage of heap memory used on primary nodes. Note: With the Garbage First Garbage Collector (G1GC), the JVM may delay collections to optimize performance, so evaluate trends over time rather than single data points. Occasional spikes are normal during cluster state updates; sustained high memory pressure warrants scaling or tuning. |
| Storage | StorageUtilization: Percentage of storage space used. FreeStorageSpace: Free storage space remaining on each data node. |
| Node-level search and indexing performance (these are not per-request latencies or rates, but node-level values based on the shards assigned to each node) | SearchLatency: Average time for search requests. IndexingLatency: Average time for indexing requests. |
| Cluster health indicators | ClusterStatus.yellow: At least one replica shard is unassigned. ClusterStatus.red: At least one primary shard is unassigned. Nodes: Number of nodes in the cluster. |
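You can cross-check several of these signals from inside the cluster. For example, the cluster health API reports the status, node count, and unassigned shard count behind the ClusterStatus and Nodes metrics (append ?level=indices for per-index detail):
GET /_cluster/health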
Signs of under-provisioning
Under-provisioned domains struggle to handle workload demands, leading to performance degradation and cluster instability. Look for sustained resource pressure and operational errors that signal the cluster is running beyond its limits. For monitoring, you can set CloudWatch alarms to catch early signals of stress and prevent outages or degraded performance. The following are critical warning signs:
- High CPU utilization for data nodes (>80%) sustained over time (such as more than 10 minutes)
- High CPU utilization for primary nodes (>60%) sustained over time (such as more than 10 minutes)
- JVM memory pressure consistently high (>85%) for data and primary nodes
- Storage utilization reaching high levels (>85%)
- Increasing search latency with stable query patterns (increasing by 50% from baseline)
- Frequent cluster status yellow/red events
- Node failures under normal load conditions
When resources are constrained, the end-user experience suffers with slower searches, failed indexing, and system errors. The following are key performance impact indicators (the checks after this list show how to confirm them from inside the cluster):
- Search timeouts increasing
- Indexing delays growing
- Circuit breaker exceptions
- Rejected execution exceptions in logs
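To confirm rejections and circuit breaker pressure directly, the cat thread pool and node stats APIs expose the relevant counters. This is a minimal sketch; the thread pools and columns selected here are one reasonable choice rather than an exhaustive list:
GET /_cat/thread_pool/search,write?v&h=node_name,name,active,queue,rejected
GET /_nodes/stats/breaker
Steadily growing rejected counts or circuit breaker trips alongside the CloudWatch symptoms above are a strong signal to scale or tune before errors reach end users.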
Remediation recommendations
The following table summarizes CloudWatch metric symptoms, possible causes, and potential solutions. An example ISM cleanup policy and force merge request follow the table.
| CloudWatch metric symptom | Causes and solution |
| --- | --- |
| FreeStorageSpace drops below 20% | Storage pressure occurs when data volume outgrows local storage due to high ingestion, long retention without cleanup, or unbalanced shards. Lack of tiering (such as UltraWarm) further worsens capacity issues. Solution: Free up space by deleting unused indexes or automating cleanup with Index State Management (ISM), and use force merge on read-only indexes to reclaim storage. If pressure persists, scale vertically or horizontally, use UltraWarm or cold storage for older data, and adjust shard counts at rollover for better balance. |
| CPUUtilization and JVMMemoryPressure consistently >70% | High CPU or JVM pressure arises when instance sizes are too small or shard counts per node are excessive, leading to frequent GC pauses. An inefficient shard strategy, uneven distribution, and poorly optimized queries or mappings further spike memory usage under heavy workloads. Solution: Scale vertically to larger instances (such as from r6g.large to r6g.xlarge) or add nodes horizontally. Optimize shard counts relative to heap size, smooth out peak traffic, and use slow logs to pinpoint and tune resource-heavy queries. |
| SearchLatency or IndexingLatency spikes >500 milliseconds | Latency spikes often stem from resource contention such as high CPU/JVM pressure or GC pauses. Inefficient shard sizing, over-sharding, and overly complex queries (deep aggregations, frequent cache evictions) further increase overhead and slow down request processing. Solution: Reduce query latency by optimizing queries with profiling, tuning shard sizes (10–50 GB each), and avoiding over-sharding. Improve parallelism by scaling the cluster, adding replicas for read capacity, increasing cache through larger nodes, and setting appropriate query timeouts. |
| ThreadpoolRejected metrics indicate queued requests | Thread pool rejections occur when high concurrent request volumes overflow queues beyond capacity, especially on undersized nodes whose thread counts are limited by vCPUs. Sudden, unscaled traffic spikes further overwhelm pools, causing tasks to be dropped or delayed. Solution: Enforce shard balance across nodes, scale horizontally to boost thread capacity, and manage client load with retries and reduced concurrency. Monitor search queues, right-size instances for vCPUs, and cautiously tune thread pool settings to handle bursty workloads. |
| ThroughputThrottle or IopsThrottle reach 1 | I/O throttling arises when Amazon EBS or Amazon EC2 limits are exceeded, such as gp3’s 125 MBps baseline, or when burst credits are depleted by sustained spikes. Mismatched volume types and heavy operations like bulk indexing without optimized storage further amplify throughput bottlenecks. Solution: Upgrade to gp3 volumes with a higher baseline or provision extra IOPS, and consider I/O-optimized instances like the i3/i4 families while monitoring burst balance. For sustained workloads, scale nodes or schedule heavy operations during off-peak hours to avoid hitting throughput caps. |
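As a concrete example of the ISM cleanup suggested in the first row, the following is a minimal sketch of a policy that deletes matching indexes 30 days after creation. The policy name, index pattern, and retention period are illustrative assumptions; adjust them to your own retention requirements:
PUT /_plugins/_ism/policies/delete-old-logs
{
  "policy": {
    "description": "Example policy: delete log indexes after 30 days",
    "default_state": "hot",
    "states": [
      {
        "name": "hot",
        "actions": [],
        "transitions": [
          { "state_name": "delete", "conditions": { "min_index_age": "30d" } }
        ]
      },
      {
        "name": "delete",
        "actions": [ { "delete": {} } ],
        "transitions": []
      }
    ],
    "ism_template": [
      { "index_patterns": ["logs-*"], "priority": 100 }
    ]
  }
}
To reclaim space on indexes that no longer receive writes, a force merge reduces segment count (the index name here is a placeholder):
POST /logs-2025-01/_forcemerge?max_num_segments=1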
Signs of over-provisioning
Over-provisioned clusters show consistently low utilization across CPU, memory, and storage, suggesting resources far exceed workload demands. Identifying these inefficiencies helps reduce unnecessary spend without impacting performance. You can use CloudWatch alarms to track cluster health and cost-efficiency metrics over 2–4 weeks to confirm sustained underutilization:
- Low CPU utilization for data and primary nodes (<40%) sustained over time
- Low JVM memory pressure for data and primary nodes (<50%)
- Excessive free storage (>70% unused)
- Underutilized instance types for workload patterns
Monitor cluster indexing and search latencies continuously while downsizing; if the cluster is truly shedding unused capacity, these latencies should not increase. Reduce nodes one at a time and confirm latencies remain stable before removing the next node. By right-sizing instances, reducing node counts, and adopting cost-efficient storage options, you can align resources with actual usage. Optimizing shard allocation further supports balanced performance at a lower cost.
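Before removing a node, it helps to confirm from inside the cluster that resources are genuinely idle. The following is a minimal check with the cat nodes API; the column selection is one reasonable choice:
GET /_cat/nodes?v&h=name,cpu,heap.percent,ram.percent,disk.used_percent
Consistently low values across nodes, combined with the low CloudWatch readings above, support a gradual reduction in node count.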
Best practices for right-sizing
In this section, we discuss best practices for right-sizing.
Iterate and optimize
Right-sizing is an ongoing process, not a one-time exercise. As workloads evolve, continuously monitor CPU, JVM memory pressure, and storage utilization using CloudWatch to make sure they remain within healthy thresholds. Rising latency, queue buildup, or unassigned shards often signal capacity or configuration issues that require attention.
Regularly review slow logs, query latency, and ingestion trends to identify performance bottlenecks early. If search or indexing performance degrades, consider scaling, rebalancing shards, or adjusting retention policies. Periodic reviews of instance sizes and node count help align cost with demand, meeting latency targets (such as 200 milliseconds) while avoiding over-provisioning. Consistent iteration helps your OpenSearch Service domain remain performant and cost-efficient over time.
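Slow logs are emitted only after you set thresholds on the indexes you care about and, on OpenSearch Service, enable slow log publishing to CloudWatch Logs for the domain. The following is a minimal sketch; the index pattern and threshold values are illustrative assumptions:
PUT /logs-*/_settings
{
  "index.search.slowlog.threshold.query.warn": "5s",
  "index.search.slowlog.threshold.fetch.warn": "1s",
  "index.indexing.slowlog.threshold.index.warn": "10s"
}
Start with conservative thresholds so the logs capture genuine outliers rather than routine traffic.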
Establish baselines
Monitor for 2–4 weeks after initial deployment and document peak usage patterns and seasonal variations. Record performance during different workload types. Set appropriate CloudWatch alarm thresholds based on your baselines.
Regular review process
Conduct weekly metric reviews during initial optimization and monthly assessments for stable workloads. Conduct quarterly right-sizing exercises for cost optimization.
Scaling strategies
Consider the following scaling strategies:
- Vertical scaling (instance types) – Use larger instance types when performance constraints stem from CPU, memory, or JVM pressure, and overall data volume is within a single node’s capacity. Choose memory-optimized instances (such as r8g, r7g, or r7i) for heavy aggregation or indexing workloads. Use compute-optimized instances (c8g, c7g, or c7i) for CPU-bound workloads such as query-heavy or log-processing environments. Vertical scaling is ideal for smaller clusters or testing environments where simplicity and cost-efficiency are priorities.
- Horizontal scaling (node count) – Add more data nodes when storage, shard count, or query concurrency increases beyond what a single node can handle. Maintain an odd number of primary-eligible nodes (typically three or five) and use dedicated primary nodes for clusters with more than 10 data nodes. Deploy across three Availability Zones for high availability in production. Horizontal scaling is preferred for large, production-grade workloads requiring fault tolerance and sustained growth. Use _cat/allocation?v to verify shard distribution and node balance:
GET /_cat/allocation/node_name_1,node_name_2,node_name_3?v
Optimize storage configuration
Use the latest generation of Amazon EBS General Purpose (gp) volumes for improved performance and cost-efficiency compared to earlier versions. Monitor storage growth trends using ClusterUsedSpace and FreeStorageSpace metrics. Maintain data utilization below 50% of total storage capacity to allow for growth and snapshots.
Choose storage tiers based on performance and access patterns—for example, enable UltraWarm or cold storage for large, infrequently accessed datasets. Move older or compliance-related data to cost-efficient tiers (for analytics or WORM workloads) only after ensuring the data is immutable.
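If UltraWarm is enabled on your domain, migrating an index to warm storage is an explicit API call, and you can poll the migration status afterward. This is a minimal sketch; my-index is a placeholder, and the index should no longer receive writes before migration:
POST /_ultrawarm/migration/my-index/_warm
GET /_ultrawarm/migration/my-index/_status
In practice, the ISM warm_migration action can automate this transition based on index age.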
Use the _cat/indices?v API to monitor index sizes and refine retention or rollover policies accordingly:
GET /_cat/indices/index1,index2,index3?v
Analyze shard configuration
Shards directly affect performance and resource usage, so use a deliberate shard strategy. For indexes with heavy ingestion and search traffic, choose a shard count that is a multiple of the number of data nodes so that work is spread evenly across the cluster. We recommend keeping shard sizes between 10–30 GB for search workloads and up to 50 GB for log analytics workloads, and limiting each data node to fewer than 20 shards per GB of JVM heap.
Run _cat/shards?v to confirm even shard distribution and no unassigned shards. Evaluate over-sharding by checking for JVMMemoryPressure (>80%) or SearchLatency spikes (>200 milliseconds) caused by excessive shard coordination. Assess under-sharding if high IndexingLatency (>200 milliseconds) or a low SearchRate indicates limited parallelism. Use _cat/allocation?v to identify unbalanced shard sizes or hot spots on nodes:
GET /_cat/allocation/node_name_1,node_name_2,node_name_3?v
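Because shard counts are fixed at index creation (short of a reindex or shrink), an index template is the most reliable place to apply your shard strategy for new indexes. The following is a minimal sketch; the template name, index pattern, and shard and replica counts are illustrative assumptions to adapt to your node count and target shard size:
PUT /_index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index.number_of_shards": 6,
      "index.number_of_replicas": 1
    }
  }
}
After new indexes are created from the template, _cat/shards confirms the resulting distribution:
GET /_cat/shards/logs-*?v&h=index,shard,prirep,store,node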
Handling unexpected traffic spikes
Even well right-sized OpenSearch Service domains can face performance challenges during sudden workload surges, such as log bursts, search traffic peaks, or seasonal load patterns. To handle such unexpected spikes effectively, consider implementing the following best practices:
- Enable Auto-Tune – Automatically adjust cluster settings based on current usage and traffic patterns
- Distribute shards effectively – Avoid shard hotspots by using balanced shard allocation and index rollover policies (see the rollover sketch after this list)
- Pre-warm clusters for known events – For expected peak periods (end-of-month reports, marketing campaigns), temporarily scale up before the spike and scale down afterward
- Monitor with CloudWatch alarms – Set proactive alarms for CPU, JVM memory, and thread pool rejections to catch early stress indicators
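As a sketch of the rollover mentioned above, the following manual request rolls a write alias over to a fresh index when either condition is met. The alias name and conditions are illustrative assumptions, and an ISM rollover action can automate the same check:
POST /logs-write/_rollover
{
  "conditions": {
    "max_size": "50gb",
    "max_age": "1d"
  }
}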
Deploy CloudWatch alarms
A CloudWatch alarm performs an action when a metric breaches a specified threshold for a specified number of evaluation periods, so you can take remediation action proactively. Start with alarms on the thresholds discussed earlier in this post, such as CPUUtilization, JVMMemoryPressure, FreeStorageSpace, ClusterStatus.red, and thread pool rejections, and send notifications to an Amazon SNS topic so your operations team is alerted before users are affected.
Conclusion
Right-sizing is a continuous process of observing, analyzing, and optimizing. By using CloudWatch metrics, OpenSearch Dashboards, and best practices around shard sizing and workload profiling, you can keep your domain efficient, performant, and cost-effective. Monitoring key metrics, optimizing shards, and using tools like CloudWatch alarms, ISM, and Auto-Tune help you maintain a high-performing cluster without over-provisioning.
For more information about right-sizing OpenSearch Service domains, refer to Sizing Amazon OpenSearch Service domains.