How do I troubleshoot "Error Code: 503 Slow Down" on s3-dist-cp jobs in Amazon EMR?


My S3DistCp (s3-dist-cp) job on Amazon EMR fails because of Amazon Simple Storage Service (Amazon S3) throttling. I get an error message similar to the following:

mapreduce.Job: Task Id : attempt_xxxxxx_0012_r_000203_0, Status : FAILED
Error: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception: Slow Down (Service: Amazon S3; Status Code: 503; Error Code: 503 Slow Down; Request ID: D27E827C847A8304; S3 Extended Request ID: XWxtDsEZ40GLEoRnSIV6+HYNP2nZiG4MQddtNDR6GMRzlBmOZQ/LXlO5zojLQiy3r9aimZEvXzo=), S3 Extended Request ID: XWxtDsEZ40GLEoRnSIV6+HYNP2nZiG4MQddtNDR6GMRzlBmOZQ/LXlO5zojLQiy3r9aimZEvXzo=
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)

Short description

"Slow Down" errors occur when you exceed the Amazon S3 request rate (3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix in a bucket). This often happens when your data uses Apache Hive-style partitions. For example, the following Amazon S3 paths use the same prefix (/year=2019/). This means that the request limit is 3,500 write requests or 5,500 read requests per second.

s3://awsexamplebucket/year=2019/month=11/day=01/mydata.parquet
s3://awsexamplebucket/year=2019/month=11/day=02/mydata.parquet
s3://awsexamplebucket/year=2019/month=11/day=03/mydata.parquet
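
To see how many object paths share a prefix, a listing can be grouped by its leading path segment. The following is a minimal sketch, not an AWS tool: count_by_prefix is a hypothetical helper, and it assumes that the first key segment after the bucket name is the prefix of interest and that every key contains at least one "/" segment.

```shell
# Hypothetical helper: count S3 object paths per first-level prefix.
# Reads one s3:// URI per line on stdin and prints "<count> <bucket>/<prefix>/",
# with the most heavily shared prefix first.
count_by_prefix() {
  awk -F/ '/^s3:\/\// { counts[$3 "/" $4 "/"]++ }
           END { for (p in counts) print counts[p], p }' | sort -rn
}

# The three example paths above all fall under one prefix.
printf '%s\n' \
  's3://awsexamplebucket/year=2019/month=11/day=01/mydata.parquet' \
  's3://awsexamplebucket/year=2019/month=11/day=02/mydata.parquet' \
  's3://awsexamplebucket/year=2019/month=11/day=03/mydata.parquet' |
  count_by_prefix
# prints: 3 awsexamplebucket/year=2019/
```

A prefix with a count far above the others is a likely candidate for throttling when many tasks read from or write to it at the same time.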

If increasing the number of partitions isn't an option, reduce the number of reducer tasks or increase the EMR File System (EMRFS) retry limit to resolve Amazon S3 throttling errors.

Resolution

Use one of the following options to resolve throttling errors on s3-dist-cp jobs.

Reduce the number of reduce tasks

The mapreduce.job.reduces parameter sets the number of reduce tasks for the job. Amazon EMR automatically sets mapreduce.job.reduces based on the number of nodes in the cluster and the cluster's memory resources. Run the following command to confirm the default number of reduce tasks for jobs in your cluster:

$ hdfs getconf -confKey mapreduce.job.reduces

To set a new value for mapreduce.job.reduces, run a command similar to the following. This command sets the number of reduce tasks to 10.

$ s3-dist-cp -Dmapreduce.job.reduces=10 --src s3://awsexamplebucket/data/ --dest s3://awsexamplebucket2/output/
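
The two commands above can be combined so that the new value is derived from the cluster default. This is a rough sketch under stated assumptions: halve_reduces is a hypothetical helper, and halving is an arbitrary starting point for illustration, not an AWS recommendation. The copy step only runs on a node where the hdfs and s3-dist-cp commands are available.

```shell
# Hypothetical helper: halve a reduce task count, never going below 1.
# The halving step is an assumption for illustration only.
halve_reduces() {
  if [ "$1" -le 1 ]; then
    echo 1
  else
    echo $(( $1 / 2 ))
  fi
}

# Only runs on a cluster node where both commands exist.
if command -v hdfs >/dev/null 2>&1 && command -v s3-dist-cp >/dev/null 2>&1; then
  # Read the cluster default, then rerun the copy with half as many reduce tasks.
  current=$(hdfs getconf -confKey mapreduce.job.reduces)
  s3-dist-cp -Dmapreduce.job.reduces="$(halve_reduces "$current")" \
    --src s3://awsexamplebucket/data/ --dest s3://awsexamplebucket2/output/
fi
```

If the job still reports "Slow Down" errors, the same step can be repeated with a lower value, at the cost of a longer-running copy.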

Increase the EMRFS retry limit

By default, the EMRFS retry limit is set to 4. Run the following command to confirm the retry limit for your cluster:

$ hdfs getconf -confKey fs.s3.maxRetries

To increase the retry limit for a single s3-dist-cp job, run a command similar to the following. This command sets the retry limit to 20.

$ s3-dist-cp -Dfs.s3.maxRetries=20 --src s3://awsexamplebucket/data/ --dest s3://awsexamplebucket2/output/

To increase the retry limit on a new or running cluster:

New cluster: Add a configuration object similar to the following when you launch a cluster.
Running cluster: Use the following configuration object to override the cluster configuration for the instance group (Amazon EMR release versions 5.21.0 and later).

[
    {
      "Classification": "emrfs-site",
      "Properties": {
        "fs.s3.maxRetries": "20"
      }
    }
]
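
For a new cluster, one way to supply this configuration object is to save it to a file and pass it to the AWS CLI with the --configurations option. The following sketch writes the file and prints, rather than runs, a launch command. The file name, release label, instance type, and instance count are placeholder assumptions, so adjust them for your environment.

```shell
# Write the emrfs-site classification shown above to a file.
# The file name is an arbitrary choice for this sketch.
cat > emrfs-retries.json <<'EOF'
[
  {
    "Classification": "emrfs-site",
    "Properties": {
      "fs.s3.maxRetries": "20"
    }
  }
]
EOF

# Print, rather than run, a launch command that references the file.
# Release label, instance type, and instance count are placeholders.
echo aws emr create-cluster \
  --release-label emr-5.30.0 \
  --applications Name=Hadoop \
  --instance-type m5.xlarge --instance-count 3 \
  --use-default-roles \
  --configurations file://emrfs-retries.json
```

Remove the leading echo to launch the cluster after you review the printed command.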

When you increase the retry limit for the cluster, Spark and Hive applications can also use the new limit. You can also set the limit for a single session. Here's an example of a Spark shell session that sets the higher retry limit before it reads from and writes to Amazon S3:

scala> sc.hadoopConfiguration.set("fs.s3.maxRetries", "20")
scala> val source_df = spark.read.csv("s3://awsexamplebucket/data/")
scala> source_df.write.save("s3://awsexamplebucket2/output/")

Related information

Best practices design patterns: optimizing Amazon S3 performance

Why does my Spark or Hive job on Amazon EMR fail with an HTTP 503 "Slow Down" AmazonS3Exception?
