Why does my AWS Glue test connection fail?
Last updated: 2021-03-17
I want to troubleshoot a failed test connection in AWS Glue.
Check for the following common problems.
- Check connectivity to JDBC data stores: AWS Glue creates elastic network interfaces with private IP addresses in the connection's subnet. This means that AWS Glue can't use the public internet to connect to the data store. If the data store is outside the Amazon Virtual Private Cloud (Amazon VPC), such as an on-premises data store or an Amazon Relational Database Service (Amazon RDS) resource with a public hostname, then the subnet's route table must have a route to a NAT gateway in a public subnet. Otherwise, the connection times out. If the data store is in the Amazon VPC, then confirm that the connection's security groups and network access control list (network ACL) allow traffic to the data store.
- Check the connection's security groups: One of the security groups associated with the connection must have a self-referencing inbound rule that's open to all TCP ports. Similarly, one of the security groups must also be open to all outbound traffic. You can use a self-referencing rule to restrict outbound traffic to the Amazon VPC. For more information, see Setting up a VPC to connect to JDBC Data Stores.
- Check the number of free IP addresses: The number of free IP addresses in the subnet must be greater than the number of data processing units (DPUs) specified for the job. This allows AWS Glue to create elastic network interfaces in the specified subnet.
- Confirm that the subnet can access Amazon Simple Storage Service (Amazon S3): Provide an Amazon S3 endpoint or provide a route to a NAT gateway in your subnet's route table. For more information, see Error: Could not find S3 endpoint or NAT Gateway for subnetId in VPC.
- Check if you have an AWS KMS VPC endpoint: If your AWS Glue Data Catalog encrypts connection passwords, be sure that the subnet has a route to AWS KMS. For example, this route can be an AWS KMS VPC interface endpoint. For more information, see Connecting to AWS KMS through a VPC endpoint.
- Choose the correct IAM role: The AWS Identity and Access Management (IAM) role that you select for the test connection must have a trust relationship with AWS Glue. An easy way to do this is to choose a service role that trusts AWS Glue and has the AWSGlueServiceRole policy attached to it.
- If the connection password is encrypted with AWS Key Management Service (AWS KMS): Confirm that the connection's IAM role allows the kms:Decrypt action for the key. For more information, see Setting up encryption in AWS Glue.
- Check the connection logs: Logs from test connections are located in Amazon CloudWatch Logs under /aws-glue/testconnection/output. Check the logs for error messages.
- Check the SSL settings: If the data store requires SSL connectivity for the specified user, be sure to select Require SSL connection when you create the connection on the console. Don't select this option if the data store doesn't support SSL.
- Check the JDBC username and password: The user who is accessing the JDBC data store must have sufficient access permissions. For example, AWS Glue crawlers require SELECT permissions. A job that writes to a data store requires INSERT, UPDATE, and DELETE permissions.
- Check the JDBC URL syntax: Syntax requirements vary by database engine. For more information, see Adding an AWS Glue connection and review the examples under JDBC URL.
- Connection type: Be sure to choose the correct connection type. When you choose Amazon RDS or Amazon Redshift for Connection type, AWS Glue automatically populates the VPC, subnet, and security group.
- DNS problems: To rule out DNS issues, replace the hostname in the JDBC URL for the AWS Glue connection with the data store's public or private IP address. When you do this, you must clear Require SSL connection because you're no longer using a domain name.
- Incompatible driver: If the connection fails because of an incompatible driver, provide the correct driver as an extra JAR file in the job properties, along with the failed connection name. (When you specify the connection name as a job property, AWS Glue uses the connection's networking settings, such as the VPC and subnets.) Then, override the default AWS Glue data store drivers by manually creating the Apache Spark data frame using the JAR file that you provided in the job properties. After creating the data frame, you can optionally convert it into an AWS Glue dynamic frame. For more information, see fromDF.
- If the JDBC data store is publicly accessible: Connect to the data store using a SQL client, such as MySQL Workbench, and the JDBC URL. Or, launch an Amazon Elastic Compute Cloud (Amazon EC2) instance that has SSH access to the same subnet and security groups used for the connection. Then, connect to the instance using SSH and run the following commands to test connectivity.
$ dig hostname
$ nc -zv hostname port
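If dig or nc isn't installed on the instance, the same two checks can be approximated with the Python standard library. The following is a minimal sketch, not an official AWS tool: resolve_host and port_is_open are hypothetical helper names, and name resolution goes through the system resolver, so the output is less detailed than what dig prints.

```python
import socket


def resolve_host(hostname):
    """Return the sorted IP addresses that the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def port_is_open(hostname, port, timeout=5.0):
    """Return True if a TCP connection to hostname:port succeeds within timeout."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, call resolve_host("hostname") and port_is_open("hostname", port) with the host and port from the JDBC URL. A name that resolves but a port check that returns False after the full timeout usually points to a routing, security group, or network ACL problem rather than DNS.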
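The security group requirements in the list above (a self-referencing inbound rule for all TCP ports, plus an outbound rule) can be checked programmatically. The sketch below assumes that each security group is a dictionary shaped like an entry in the Amazon EC2 DescribeSecurityGroups response (GroupId, IpPermissions, IpPermissionsEgress). The function names are hypothetical, and treating an All TCP outbound rule as sufficient is an assumption of this sketch.

```python
def _covers_all_tcp(permission):
    """True if the rule covers every TCP port (All TCP or All traffic)."""
    protocol = permission.get("IpProtocol")
    if protocol == "-1":  # "-1" means all protocols and all ports
        return True
    return (protocol == "tcp"
            and permission.get("FromPort") == 0
            and permission.get("ToPort") == 65535)


def _references_group(permission, group_id):
    """True if the rule names group_id as its source or destination."""
    return any(pair.get("GroupId") == group_id
               for pair in permission.get("UserIdGroupPairs", []))


def has_self_referencing_inbound(group):
    """True if an inbound rule opens all TCP ports to the group itself."""
    return any(_covers_all_tcp(p) and _references_group(p, group["GroupId"])
               for p in group.get("IpPermissions", []))


def allows_outbound(group):
    """True if an outbound rule is open to 0.0.0.0/0 or to the group itself."""
    for p in group.get("IpPermissionsEgress", []):
        if not _covers_all_tcp(p):
            continue
        to_anywhere = any(r.get("CidrIp") == "0.0.0.0/0"
                          for r in p.get("IpRanges", []))
        if to_anywhere or _references_group(p, group["GroupId"]):
            return True
    return False


def connection_groups_ok(groups):
    """At least one group must satisfy each requirement; they can be different groups."""
    return (any(has_self_referencing_inbound(g) for g in groups)
            and any(allows_outbound(g) for g in groups))
```

Pass connection_groups_ok the list of security groups attached to the connection. A False result means that no group has the self-referencing inbound rule, no group has a suitable outbound rule, or both.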
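Because JDBC URL syntax varies by engine, it can help to confirm that the URL yields the host and port that you expect before you test network connectivity. The following sketch uses a hypothetical parse_jdbc_url helper with a deliberately loose pattern. It covers only the common jdbc:engine://host:port form and the Oracle thin form with @, and it doesn't validate engine-specific options. The hostnames in the usage example are placeholders.

```python
import re

# jdbc:<engine>[:<subprotocol>]://[@]<host>[:<port>][/<database> or ;<properties>]
_JDBC_PATTERN = re.compile(
    r"^jdbc:(?P<engine>[a-z0-9]+)(?::[a-z0-9]+)?://@?"
    r"(?P<host>[^:/;@\s]+)(?::(?P<port>\d+))?(?P<rest>[/;].*)?$",
    re.IGNORECASE,
)


def parse_jdbc_url(url):
    """Return (engine, host, port) from a JDBC URL; port is None if omitted."""
    match = _JDBC_PATTERN.match(url.strip())
    if match is None:
        raise ValueError(f"Unrecognized JDBC URL syntax: {url!r}")
    port = match.group("port")
    return (match.group("engine").lower(),
            match.group("host"),
            int(port) if port is not None else None)
```

For example, parse_jdbc_url("jdbc:mysql://db.example.com:3306/employee") returns ("mysql", "db.example.com", 3306). A ValueError suggests that the URL is missing the jdbc: prefix or the :// separator.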
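The IAM role requirement in the list above comes down to whether the role's trust policy lets the AWS Glue service principal assume the role. As a rough sketch, the hypothetical trusts_glue function below inspects a trust policy that has already been loaded as a dictionary, for example from the role's Trust relationships tab. It ignores Condition blocks, so treat a True result as a first check only.

```python
GLUE_PRINCIPAL = "glue.amazonaws.com"


def _as_list(value):
    """IAM policy fields can hold a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def trusts_glue(trust_policy):
    """True if an Allow statement lets the AWS Glue service call sts:AssumeRole."""
    for statement in _as_list(trust_policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in _as_list(statement.get("Action")):
            continue
        principal = statement.get("Principal")
        if not isinstance(principal, dict):
            continue  # this sketch only handles {"Service": ...} principals
        if GLUE_PRINCIPAL in _as_list(principal.get("Service")):
            return True
    return False
```

If trusts_glue returns False for the role that you selected, add glue.amazonaws.com as a trusted service in the role's trust relationship, or choose a different role.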