How do I set up a Spark SQL JDBC connection on Amazon EMR?

Last updated: 2020-09-14

I want to run SQL queries from a SQL client on my Amazon EMR cluster. How do I configure a Java Database Connectivity (JDBC) driver for Spark Thrift Server so I can do this?

Resolution

1.    Download and install SQuirrel SQL Client.

2.    Connect to the master node using SSH.

3.    On the master node, run the following command to start Spark Thrift Server:

sudo /usr/lib/spark/sbin/start-thriftserver.sh

4.    Copy all .jar files from the /usr/lib/spark/jars directory on the master node to your local machine.

5.    Open SQuirrel SQL Client and create a new driver:
For Name, enter Spark JDBC Driver.
For Example URL, enter jdbc:hive2://localhost:10001.

6.    On the Extra Class Path tab, choose Add.

7.    In the dialog box, navigate to the directory where you copied the .jar files in step 4, and then select all the files.

8.    In the Class Name field, enter org.apache.hive.jdbc.HiveDriver, and then choose OK.

9.    On your local machine, set up an SSH tunnel using local port forwarding:

ssh -o ServerAliveInterval=10 -i path-to-key-file -N -L 10001:localhost:10001 hadoop@master-public-dns-name

10.    To connect to the Spark Thrift Server, create a new alias in SQuirrel SQL Client:
For Name, enter Spark JDBC.
For Driver, enter Spark JDBC Driver.
For URL, enter jdbc:hive2://localhost:10001.
For Username, enter hadoop.

11.    Run queries from SQuirrel SQL Client.


Did this article help?


Do you need billing or technical support?