I want to export my Hive metastore from an Amazon EMR cluster to an Amazon Relational Database Service (Amazon RDS) MySQL DB instance and then launch a new cluster using the metastore. How do I do that?

1.    Connect to the master node of your Amazon EMR cluster.

2.    Run a mysqldump command to export the metastore to Amazon RDS.

3.    Connect to the Amazon RDS instance and run MySQL commands similar to the following. These commands allow Hive to access the metastore in Amazon RDS.

mysql -h your-rds-hostname.region.rds.amazonaws.com -P 3306 -u usernameForRDS -p
grant all privileges on hive.* to 'myUsername'@'%' identified by 'myPassword'; flush privileges;

4.    Run the following MySQL commands to verify that you can access the Hive tables on the Amazon RDS DB instance.

use hive;
show tables;
select * from TBLS;

5.    Create a JSON file similar to the following, using the name, username, and password of your Amazon RDS DB instance.

     "Classification": "hive-site",
     "Properties": {

6.    Upload the JSON file to an Amazon Simple Storage Service (Amazon S3) bucket.

7.    On the Amazon EMR console, choose Clusters from the navigation pane, and then choose Create Cluster.

8.    Choose Go to advanced options.

9.    Under Edit software settings, choose Load JSON from S3, and then navigate to the file that you uploaded in step #6.

10.   Finish creating the cluster.

11.   Run Hive commands similar to the following to verify that you can access the metastore from the new cluster.

show databases;
use default;
show tables;
describe my_hive_table;

Because the Hive metastore is now stored outside your Amazon EMR cluster, you can also access the metastore from other Amazon EMR clusters. You do not have to run CREATE EXTERNAL TABLE from each cluster.

Did this page help you? Yes | No

Back to the AWS Support Knowledge Center

Need help? Visit the AWS Support Center

Published: 2018-07-27