I want to run SQL queries from a SQL client on my Amazon EMR cluster. How do I configure a Java Database Connectivity (JDBC) driver for Spark Thrift Server so I can do this?

Note: The following steps require the SQuirreL SQL client. Download and install SQuirreL SQL before proceeding.

1.    On the master node of your Amazon EMR cluster, run the following command to start Spark Thrift Server:

sudo /usr/lib/spark/sbin/start-thriftserver.sh
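To confirm that the server started, you can check that something is listening on its port. On EMR, Spark Thrift Server typically listens on port 10001 (the same port used for the SSH tunnel in step 7); this is a sketch that assumes standard networking tools are available on the master node:

```shell
# On the master node: verify a listener on port 10001 (EMR's usual Spark Thrift Server port).
sudo netstat -tlnp | grep 10001
# If netstat is unavailable, ss works the same way:
sudo ss -tlnp | grep 10001
```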

2.    Copy all the .jar files from the /usr/lib/spark/jars directory on the master node to your local machine.
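One way to copy the files is with scp from your local machine; this is a sketch that assumes the same key file and master public DNS name used for the SSH tunnel in step 7, and a hypothetical local destination directory:

```shell
# Hypothetical local destination directory; adjust as needed.
mkdir -p ~/spark-jars
# Copy every .jar from the master node's Spark jars directory to the local machine.
scp -i path-to-key-file "hadoop@master-public-dns-name:/usr/lib/spark/jars/*.jar" ~/spark-jars/
```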

3.    Open SQuirreL SQL and create a new driver.
For Name, enter Spark JDBC Driver
For Example URL, enter jdbc:hive2://localhost:10001

4.    On the Extra Class Path tab, choose Add.

5.    In the dialog box, navigate to the directory where you copied the .jar files in step 2, and then select all the files.

6.    In the Class Name field, enter org.apache.hive.jdbc.HiveDriver, and then choose OK.

7.    Run a command on your local machine similar to the following to set up an SSH tunnel using local port forwarding:

ssh -o ServerAliveInterval=10 -i path-to-key-file -N -L 10001:localhost:10001 hadoop@master-public-dns-name
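Before configuring the alias, you can verify that the tunnel is forwarding by probing the local port; this is a sketch assuming nc (netcat) is installed on your local machine:

```shell
# Succeeds only if the SSH tunnel from step 7 is up and forwarding to the Thrift server.
nc -z localhost 10001 && echo "port 10001 is reachable"
```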

8.    To connect to Spark Thrift Server, create a new alias in SQuirreL SQL.
For Name, enter Spark JDBC
For Driver, enter Spark JDBC Driver
For URL, enter jdbc:hive2://localhost:10001
For Username, enter hadoop

You should now be able to run queries from the SQuirreL SQL client.
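If queries fail in the client, you can sanity-check the server itself with Beeline, the JDBC command-line client bundled with Spark; this is a sketch that assumes the usual EMR install path and is run directly on the master node:

```shell
# On the master node: run a test query against Spark Thrift Server with Beeline.
/usr/lib/spark/bin/beeline -u jdbc:hive2://localhost:10001 -n hadoop -e "SHOW DATABASES;"
```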


Published: 2018-09-24