Overview
The Google BigQuery Connector for AWS Glue simplifies connecting AWS Glue jobs to Google BigQuery, both to extract data from BigQuery and to load data into it. The connector provides comprehensive access to BigQuery data, facilitating cloud ETL processes for operational reporting, backup and disaster recovery, data governance, and more.
Highlights
- Connect to Google BigQuery from AWS Glue jobs
- Simplify data extracts from Google BigQuery
- Simplify data loads to Google BigQuery
Details
Pricing
Vendor refund policy
No Refunds
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
Glue 3.0
- Amazon ECS
- Amazon EKS
Container image
Containers are lightweight, portable execution environments that wrap server application software in a filesystem that includes everything it needs to run. Container applications run on supported container runtimes and orchestration services, such as Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). Both eliminate the need for you to install and operate your own container orchestration software by managing and scheduling containers on a scalable cluster of virtual machines.
Version release notes
Google BigQuery Connector for AWS Glue 0.24.2.
- This version is built with spark-bigquery-connector 0.24.2.
- This version is compatible with AWS Glue 3.0, 2.0 and 1.0.
- This version supports both reads from and writes to Google BigQuery.
Additional details
Usage instructions
Subscribe to the product on AWS Marketplace and activate the Glue connector from AWS Glue Studio.
Prerequisites
- A Google Cloud account, specifically a service account with permissions to access Google BigQuery
- GCP credentials (service_account_json_file)
- GCS bucket (only for writes)
- BigQuery dataset (only for writes)
- An AWS Secrets Manager secret (you can create it by following the steps below)
Create a new secret for Google BigQuery in AWS Secrets Manager
We create a secret in AWS Secrets Manager to store the Google service account file contents as a base64-encoded string.
1. Download the service account credentials JSON file from Google Cloud.
2. Encode the file contents as a base64 string. You can use an online utility or a system command; on Linux and macOS, base64 [service_account_json_file] prints the file contents as a base64-encoded string.
3. On the Secrets Manager console, choose Store a new secret.
4. For Secret type, select Other type of secret.
5. Enter the key as credentials and the value as the base64-encoded string.
6. Leave the rest of the options at their defaults.
7. Choose Next.
8. Name the secret bigquery_credentials.
9. Follow through the rest of the steps to store the secret.
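If you prefer to script this step, the sketch below encodes the service account file and stores it with boto3. The file name is a placeholder; the secret name and the credentials key match the console steps above.

# Sketch: encode the Google service account JSON and store it in AWS Secrets Manager.
# "service_account.json" is a placeholder for your downloaded key file.
import base64
import json
import boto3

with open("service_account.json", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

secrets = boto3.client("secretsmanager")
secrets.create_secret(
    Name="bigquery_credentials",                       # secret name used later by the job
    SecretString=json.dumps({"credentials": encoded})  # key must be "credentials"
)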
Connection options
You can pass the following options to the connector.
- parentProject (required): The Google Cloud Project ID of the table
- dataset (required unless specified in table): The BigQuery dataset containing the table.
- table (required): The BigQuery table in the format [[project:]dataset.]table
- temporaryGcsBucket (required for writes): The GCS bucket that temporarily holds the data before it is loaded into BigQuery.
You can see other available options here: https://github.com/GoogleCloudDataproc/spark-bigquery-connector/tree/0.24.2
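For illustration, a Glue PySpark job script might read a BigQuery table through the activated connector roughly as follows. The project, dataset, table, and connection names are placeholders; adjust the connection_options to match your own connector configuration.

# Sketch of a read from Google BigQuery in a Glue job script (names are placeholders).
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())

bq_dyf = glueContext.create_dynamic_frame.from_options(
    connection_type="marketplace.spark",       # AWS Marketplace connector
    connection_options={
        "parentProject": "my-gcp-project",     # Google Cloud project ID
        "table": "my_dataset.my_table",        # [[project:]dataset.]table
        "connectionName": "bigquery",          # name of the activated Glue connection
    },
    transformation_ctx="bq_dyf",
)
print(bq_dyf.count())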
Spark configurations
The following Spark configurations are required only for writes to BigQuery.
- spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
- spark.hadoop.fs.gs.auth.service.account.enable=true
You also need to configure credentials using one of the following sets of configurations.
Credential file
- spark.hadoop.fs.gs.auth.service.account.json.keyfile=credentials.json
You need to upload credentials.json to your S3 bucket, and set the file path in Referenced files path.
Private key
- spark.hadoop.fs.gs.auth.service.account.email= [your-email-extracted-from-service_account_json_file]
- spark.hadoop.fs.gs.auth.service.account.private.key.id= [your-private-key-id-extracted-from-service_account_json_file]
- spark.hadoop.fs.gs.auth.service.account.private.key= [your-private-key-body-extracted-from-service_account_json_file]
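If you would rather not copy these values by hand, a small script like the one below can pull them from the service account file. The file name is a placeholder; client_email, private_key_id, and private_key are the standard fields of a Google service account JSON.

# Sketch: print the values needed for the private-key Spark configurations.
# "service_account.json" is a placeholder for your downloaded key file.
import json

with open("service_account.json") as f:
    sa = json.load(f)

print("spark.hadoop.fs.gs.auth.service.account.email=" + sa["client_email"])
print("spark.hadoop.fs.gs.auth.service.account.private.key.id=" + sa["private_key_id"])
print("spark.hadoop.fs.gs.auth.service.account.private.key=" + sa["private_key"])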
You can set these Spark configurations in one of the following ways.
- The --conf parameter in the Glue job parameters
- The job script, using SparkConf (as shown below)
from pyspark.conf import SparkConf

conf = SparkConf()
conf.set("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
conf.set("spark.hadoop.fs.gs.auth.service.account.enable", "true")
conf.set("spark.hadoop.google.cloud.auth.service.account.json.keyfile", "credentials.json")
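To show where this fits in a job, the sketch below builds the GlueContext from that SparkConf and writes a DynamicFrame to BigQuery. The sample frame, project, dataset, table, bucket, and connection names are placeholders; match the options to your own connector configuration.

# Sketch: apply the SparkConf above and write a DynamicFrame to BigQuery
# (all names below are placeholders).
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext

sc = SparkContext.getOrCreate(conf=conf)
glueContext = GlueContext(sc)

# A trivial DynamicFrame stands in for data prepared earlier in the job.
df = glueContext.spark_session.createDataFrame([(1, "example")], ["id", "name"])
output_dyf = DynamicFrame.fromDF(df, glueContext, "output_dyf")

glueContext.write_dynamic_frame.from_options(
    frame=output_dyf,
    connection_type="marketplace.spark",
    connection_options={
        "parentProject": "my-gcp-project",
        "table": "my_dataset.my_table",
        "temporaryGcsBucket": "my-gcs-bucket",   # required for writes
        "connectionName": "bigquery",
    },
)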
Support
Vendor support
Please allow 24 hours
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
Customer reviews
Aws
This is the coolest product ever and it's so useful and really amazing. I appreciate it, so have at it, guys.
No-fuss connectivity to BigQuery from AWS Glue
- We use it to bring in multiple gigabytes of GA data and process it with PySpark in AWS Glue.