My Amazon EMR job fails with an HTTP 503 "Slow Down" AmazonS3Exception

Last updated: 2019-11-21

My Amazon EMR job fails with an HTTP 503 "Slow Down" AmazonS3Exception:

java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Slow Down (Service: Amazon S3; Status Code: 503; Error Code: 503 Slow Down; Request ID: 2E8B8866BFF00645; S3 Extended Request ID: oGSeRdT4xSKtyZAcUe53LgUf1+I18dNXpL2+qZhFWhuciNOYpxX81bpFiTw2gum43GcOHR+UlJE=), S3 Extended Request ID: oGSeRdT4xSKtyZAcUe53LgUf1+I18dNXpL2+qZhFWhuciNOYpxX81bpFiTw2gum43GcOHR+UlJE=

Short Description

This error occurs when you exceed the Amazon Simple Storage Service (Amazon S3) request rate (3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix in a bucket). There are two ways to resolve this problem:

  • Reduce the number of Amazon S3 requests.
  • Add more prefixes to the S3 bucket.

Resolution

Before you can identify which requests are causing the problem, first configure Amazon CloudWatch request metrics.

Configure CloudWatch request metrics

To monitor Amazon S3 requests, enable CloudWatch request metrics for the bucket. Then, define a filter for the prefix. For a list of useful metrics to monitor, see Amazon S3 CloudWatch Request Metrics.
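
If you prefer to enable request metrics programmatically instead of through the Amazon S3 console, the following is a minimal sketch that uses the AWS SDK for Python (boto3). The bucket name, configuration ID, and prefix are placeholder values; substitute the bucket and prefix that your EMR job reads from or writes to.

  import boto3

  s3 = boto3.client("s3")

  # Enable a CloudWatch request-metrics configuration that is filtered to a
  # single prefix. Metrics for requests against that prefix are published to
  # CloudWatch in the AWS/S3 namespace.
  s3.put_bucket_metrics_configuration(
      Bucket="awsexamplebucket",           # placeholder bucket name
      Id="images-prefix-metrics",          # configuration ID (must match below)
      MetricsConfiguration={
          "Id": "images-prefix-metrics",
          "Filter": {"Prefix": "images/"}, # placeholder prefix to monitor
      },
  )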

After you enable metrics, use the data in the metrics to determine which of the following resolutions is best for your use case.

Reduce the number of Amazon S3 requests

  • If multiple concurrent jobs (Spark, Apache Hive, or s3-dist-cp) are reading from or writing to the same Amazon S3 prefix: Reduce the number of concurrent jobs. If you have configured cross-account access for Amazon S3, keep in mind that other accounts might also be submitting jobs against the prefix.
  • If the error happens when the job tries to write to the destination bucket: Reduce the parallelism of the job. For example, use the Spark .coalesce() or .repartition() operations to reduce the number of Spark output partitions before writing to Amazon S3 (see the PySpark sketch after this list). You can also reduce the number of cores per executor or reduce the number of executors.
  • If the error happens when the job tries to read from the source bucket: Reduce the number of files to reduce the number of Amazon S3 requests. For example, use s3-dist-cp to merge a large number of small files into a smaller number of larger files (see the s3-dist-cp sketch after this list).
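
To reduce write parallelism, here is a minimal PySpark sketch that coalesces the output to fewer partitions before writing to Amazon S3. The application name, bucket paths, and partition count are placeholder values; tune the partition count for your data volume and cluster size.

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.appName("reduce-s3-write-parallelism").getOrCreate()

  # Read the source data (placeholder path).
  df = spark.read.parquet("s3://awsexamplebucket/source/")

  # coalesce() lowers the number of output partitions without a full shuffle,
  # so fewer tasks write to the destination prefix at the same time.
  df.coalesce(64).write.mode("overwrite").parquet("s3://awsexamplebucket/destination/")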

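To merge many small input files before the main job reads them, one option is to submit an s3-dist-cp step to the cluster. The sketch below does that with boto3; the cluster ID, paths, grouping pattern, and target size are placeholder values.

  import boto3

  emr = boto3.client("emr")

  # Add an s3-dist-cp step that concatenates small objects into larger ones.
  emr.add_job_flow_steps(
      JobFlowId="j-XXXXXXXXXXXXX",  # placeholder EMR cluster ID
      Steps=[
          {
              "Name": "Merge small files with s3-dist-cp",
              "ActionOnFailure": "CONTINUE",
              "HadoopJarStep": {
                  "Jar": "command-runner.jar",
                  "Args": [
                      "s3-dist-cp",
                      "--src", "s3://awsexamplebucket/raw/",      # placeholder source
                      "--dest", "s3://awsexamplebucket/merged/",  # placeholder destination
                      "--groupBy", ".*(part)-.*",  # placeholder regex; the capture group decides which files are merged together
                      "--targetSize", "512",       # approximate output file size, in MiB
                  ],
              },
          }
      ],
  )
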
Add more prefixes to the S3 bucket

Another way to resolve "Slow Down" errors is to add more prefixes to the S3 bucket. There is no limit to the number of prefixes in a bucket, and the request rate limits apply to each prefix, not to the bucket as a whole. For example, if you create three prefixes in a bucket like this:

  • s3://awsexamplebucket/images
  • s3://awsexamplebucket/videos
  • s3://awsexamplebucket/documents

then you can make up to 10,500 write requests or 16,500 read requests per second to that bucket, provided that the requests are spread evenly across the three prefixes.
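
One way to spread a job's writes across multiple prefixes is to partition the output by a column so that each partition value becomes its own prefix under the destination path. A minimal PySpark sketch follows; the paths and the "category" column name are placeholder values.

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.appName("spread-writes-across-prefixes").getOrCreate()

  df = spark.read.parquet("s3://awsexamplebucket/source/")

  # partitionBy() writes each distinct value of "category" under its own prefix,
  # for example s3://awsexamplebucket/documents/category=reports/, so the
  # per-prefix request rate applies to each of those prefixes independently.
  df.write.partitionBy("category").mode("overwrite").parquet("s3://awsexamplebucket/documents/")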