Python 3.4.3 is installed on my Amazon EMR cluster instances, but the default Python version used by Spark and other programs is Python 2.7.10. How do I change the default Python version to Python 3, and then run a PySpark job?

When you launch a new Amazon EMR cluster, set the PYSPARK_PYTHON environment variable to /usr/bin/python3 in the spark-env classification of your configuration file. Example:

[
  {
    "Classification": "spark-env",
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "PYSPARK_PYTHON": "/usr/bin/python3"
        }
      }
    ]
  }
]
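As a minimal sketch, you could supply this configuration when you create the cluster with the AWS CLI. The cluster name, release label, instance settings, key pair, and the myConfig.json file name below are all placeholders, not values from this article:

# Launch a cluster that applies the spark-env classification at creation time
aws emr create-cluster \
  --name "python3-spark-cluster" \
  --release-label emr-5.17.0 \
  --applications Name=Spark \
  --instance-type m4.large \
  --instance-count 3 \
  --use-default-roles \
  --ec2-attributes KeyName=myKey \
  --configurations file://myConfig.json

Because the classification is applied at creation time, every Spark job submitted to the cluster inherits PYSPARK_PYTHON without further changes.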

To change the default Python environment on a cluster that is already running, connect to the master node using SSH, and then run the following command:

sudo sed -i -e '$a\export PYSPARK_PYTHON=/usr/bin/python3' /etc/spark/conf/spark-env.sh
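As an optional sanity check (not part of the original procedure), you can confirm that the line was appended and that the target interpreter exists at the expected path:

# Show the line that sed appended to the Spark environment script
tail -n 1 /etc/spark/conf/spark-env.sh

# Confirm that the Python 3 interpreter exists at the expected path
/usr/bin/python3 --version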

Spark uses the new configuration for the next job that you submit.
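To verify, here is a minimal smoke test you could run on the master node; the file path and application name are hypothetical:

# Write a tiny PySpark job that reports the driver's interpreter version
cat > /tmp/version_check.py <<'EOF'
import sys
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("version-check").getOrCreate()
# Print the interpreter version that the driver is running under
print("Driver Python:", sys.version)
spark.stop()
EOF

# Submit the job; the output should report a 3.x interpreter version
spark-submit /tmp/version_check.py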



Published: 2016-10-26

Updated: 2018-10-16