
    TPC-DS connector for AWS Glue

    Generate the TPC-DS compliant datasets from AWS Glue.

    Overview

    The TPC-DS Glue connector enables Glue ETL Jobs to generate TPC-DS compliant datasets with your preferred scale. The generated datasets can be used for any benchmarking purpose in AWS Glue jobs, Amazon Athena, Amazon EMR, Amazon Redshift Spectrum, etc.

    Highlights

    • Generate randomized, TPC-DS compliant datasets directly from AWS Glue jobs using the connector.

    Details

    Delivery method

    Delivery option
    Glue 3.0
    Glue 1.0/2.0

    Latest version

    Operating system
    Linux

    Features and programs

    Financing for AWS Marketplace purchases

    AWS Marketplace now accepts line of credit payments through the PNC Vendor Finance program. This program is available to select AWS customers in the US, excluding NV, NC, ND, TN, & VT.

    Pricing

    TPC-DS connector for AWS Glue

    This product is free. Subscriptions have no end date and can be canceled anytime.

    Vendor refund policy

    We do not currently support refunds, but you can cancel at any time.

    Legal

    Vendor terms and conditions

    Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information


    Delivery details

    Glue 3.0

    Supported services:
    • Amazon ECS
    • Amazon EKS
    Container image

    Containers are lightweight, portable execution environments that wrap server application software in a filesystem that includes everything it needs to run. Container applications run on supported container runtimes and orchestration services, such as Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). Both eliminate the need for you to install and operate your own container orchestration software by managing and scheduling containers on a scalable cluster of virtual machines.

    Version release notes

    TPC-DS data generator for AWS Glue.

    Additional details

    Usage instructions

    Subscribe to the product from AWS Marketplace, then activate the Glue connector for Glue 3.0 from Glue Studio.

    What is the TPC-DS connector for AWS Glue?

    This connector generates TPC-DS compliant datasets without requiring any data source. After the Glue ETL job writes the datasets to a destination such as Amazon S3, you can use them for any benchmarking purpose in your workload.

    Refer to http://www.tpc.org/tpcds/ for more about TPC-DS and to http://tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v3.2.0.pdf for the TPC-DS specification.

    Connector options you need to set

    You can pass the following options to the connector.

    • table (required) - The name of the table to generate. See https://github.com/awslabs/aws-athena-query-federation/tree/master/athena-tpcds for the list of 25 available tables.
    • scale (optional, default is 1) - The size of the generated data, from 1 to 100000. A scale of 1 means that the data for all tables totals 1 GB. For example, a scale of 7 generates 7 GB of data across all tables. See section 3 of http://tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v3.2.0.pdf for a description of the scale factor.
    • numPartitions (optional, default is 1) - The maximum concurrency for generating table data in parallel. Set a value greater than 1 to generate data concurrently with Spark.
      • We recommend setting numPartitions based on the "Number of Workers" of your ETL job. The following formulas apply to Glue 2.0 and 3.0 and depend on the "Worker type" of your job:
        • G.1X - numPartitions = (Number of Workers - 1) * 4
        • G.2X - numPartitions = (Number of Workers - 1) * 8
      • For example, if your Glue 3.0 job uses the G.1X worker type with 10 workers, numPartitions is (10 - 1) * 4 = 36.
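
The numPartitions formulas above can be sketched as a small Python helper. This is an illustrative sketch only: the function name and the lookup table are hypothetical and not part of the connector, and clamping the result to at least 1 (the option's default) is an assumption added here.

```python
# Multipliers taken from the formulas in this listing (Glue 2.0 and 3.0).
# The dictionary and function names are hypothetical, for illustration only.
MULTIPLIER_BY_WORKER_TYPE = {"G.1X": 4, "G.2X": 8}


def recommended_num_partitions(worker_type: str, number_of_workers: int) -> int:
    """Return (Number of Workers - 1) * multiplier for the given worker type."""
    if worker_type not in MULTIPLIER_BY_WORKER_TYPE:
        raise ValueError(f"No formula is listed for worker type {worker_type!r}")
    if number_of_workers < 1:
        raise ValueError("number_of_workers must be at least 1")
    partitions = (number_of_workers - 1) * MULTIPLIER_BY_WORKER_TYPE[worker_type]
    # Assumption: never go below the option's default of 1.
    return max(1, partitions)


print(recommended_num_partitions("G.1X", 10))  # 36, matching the example above
```

The returned number is the value you would enter for the numPartitions connection option.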

    You can set up the connector in AWS Glue Studio by following the steps below.

    Using the TPC-DS connector for AWS Glue

    Setting up the TPC-DS connector takes three steps:

    1. Set up the TPC-DS custom connector and a related connection in the Glue Studio console.
    2. Create a job, setting the connector options and any necessary job parameters.
    3. Save and run the job.

    Step 1: Set up the TPC-DS connector and create a connection

    To set up the TPC-DS connector and create a connection for your job:

    1. Subscribe to the product and activate the connector in AWS Glue Studio from the top of this instruction page.
    2. Enter a connection name and choose "Create connection and activate connector". You can optionally add a description and "Network options". Leave "Connection access" empty.

    Step 2: Create a job

    To create a job from the connection you created in the previous step:

    1. Choose the connection, then choose "Create job".
    2. Select the node for your connection on the visual canvas.
    3. Add connection options and enter the necessary information. The table option is required; scale and numPartitions are optional. For example: table = customer, scale = 10, numPartitions = 30
    4. Enter job properties in the "Job details" tab, then choose "Save".
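
As a rough aid for preparing the connection options before entering them in Glue Studio, the sketch below assembles and checks the three documented options in Python. The function name is hypothetical and not part of the connector. The range check on scale follows the limits stated in this listing; representing the values as strings and requiring numPartitions to be at least 1 are assumptions made here.

```python
def build_connector_options(table: str, scale: int = 1, num_partitions: int = 1) -> dict:
    """Assemble the key/value pairs for the connector's documented options.

    Hypothetical helper: it only checks the values described in this listing
    and does not call any AWS API.
    """
    if not table:
        raise ValueError("table is required")
    if not 1 <= scale <= 100000:
        raise ValueError("scale must be between 1 and 100000")
    if num_partitions < 1:
        # Assumption: values below the default of 1 are not meaningful.
        raise ValueError("numPartitions must be at least 1")
    # Assumption: options are entered as plain key/value text, so use strings.
    return {
        "table": table,
        "scale": str(scale),
        "numPartitions": str(num_partitions),
    }


# The example values from step 3 above.
options = build_connector_options("customer", scale=10, num_partitions=30)
for key, value in options.items():
    print(f"{key} = {value}")
```

Because each job takes a single table option, generating several tables would mean building one set of options per table, for example by calling the helper once for each table name.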

    Step 3: Save and run the job

    After filling in all parameters and creating the connector job, run the job.

    Support

    Vendor support

    Please allow 24 hours for a response.

    AWS infrastructure support

    AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.

    Customer reviews

    Ratings and reviews

    0 ratings
    0 AWS reviews
    No customer reviews yet