AWS Big Data Blog

Introducing AWS Glue usage profiles for flexible cost control

AWS Glue is a serverless data integration service that enables you to run extract, transform, and load (ETL) workloads on your data at scale. One of the main advantages of using a cloud platform is its flexibility; you can provision compute resources when you actually need them. However, with this ease of creating resources comes a risk of spiraling cloud costs when those resources are left unmanaged or without guardrails. As a result, admins need to balance avoiding high infrastructure costs with allowing users to work without unnecessary friction.

To address that, today we are excited to announce the general availability of AWS Glue usage profiles. With AWS Glue usage profiles, admins can create different profiles for various classes of users within the account, such as developers, testers, and product teams. Each profile is a unique set of parameters that can be assigned to different types of users. For example, developers may need more workers and can have a higher number of maximum workers, whereas product teams may need fewer workers and a lower timeout or idle timeout value.

How AWS Glue usage profiles work

An AWS Glue usage profile is a resource identified by an Amazon Resource Name (ARN) for better governance of resources. Admins have the ability to create AWS Glue usage profiles and define default values to be used when a parameter value is not provided. For example, you can create an AWS Glue usage profile with the default number of workers set to 2. When you sign in to the AWS Glue console using the AWS Identity and Access Management (IAM) user associated with the usage profile and create a new job, the initial value configured for the number of workers shows as 2 instead of the service default of 10.

Additionally, you can specify a set of allowed values for validation when a user associated with this profile creates a resource. If the parameter is numeric, admins can define a range of allowed values by specifying minimum and maximum values, instead of a specific set. For example, you can create an AWS Glue usage profile that allows only G.1X worker types. When you sign in to the AWS Glue console using an IAM user associated with this usage profile and create a job with a G.2X worker type, saving it will result in a failure.
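The default-and-validation behavior described above can be modeled in a few lines of Python. This is a minimal sketch for illustration, not how AWS Glue implements it: the ProfileParameter class and its field names are hypothetical, and the worker range of 1 to 5 is made up for the example. The default of 2 workers and the G.1X-only worker type mirror the examples in this section.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ProfileParameter:
    """Hypothetical model of one usage profile parameter (not a Glue API type)."""

    name: str
    default: Optional[str] = None
    allowed: Optional[Sequence[str]] = None  # explicit set of allowed values
    minimum: Optional[int] = None            # inclusive lower bound (numeric only)
    maximum: Optional[int] = None            # inclusive upper bound (numeric only)

    def resolve(self, requested: Optional[str]) -> str:
        """Fall back to the default when no value is given, then validate."""
        value = requested if requested is not None else self.default
        if value is None:
            raise ValueError(f"{self.name}: no value provided and no default set")
        if self.allowed is not None and value not in self.allowed:
            raise ValueError(f"{self.name}: {value!r} is not an allowed value")
        if self.minimum is not None and int(value) < self.minimum:
            raise ValueError(f"{self.name}: {value} is below the minimum {self.minimum}")
        if self.maximum is not None and int(value) > self.maximum:
            raise ValueError(f"{self.name}: {value} is above the maximum {self.maximum}")
        return value


# Values taken from the examples in this section; the 1-5 range is illustrative.
workers = ProfileParameter("numberOfWorkers", default="2", minimum=1, maximum=5)
worker_type = ProfileParameter("workerType", default="G.1X", allowed=["G.1X"])

print(workers.resolve(None))        # no value supplied, so the profile default applies
print(worker_type.resolve("G.1X"))  # an allowed value passes through unchanged
```

Calling worker_type.resolve("G.2X") raises ValueError in this model, which corresponds to the failed save described above.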

Because an AWS Glue usage profile is a resource identified by an ARN, all the default IAM controls apply, including action-based, resource-based, and tag-based authorization. Admins update the IAM policy of users who create AWS Glue resources, granting them read permission on the profiles. This enables users to view the profiles. To let users apply a profile when making API calls to create AWS Glue resources, admins tag the user or role with glue:UsageProfile as the key and the profile name as the value. AWS Glue validates API requests such as CreateJob, UpdateJob, StartJobRun, and CreateSession against the values specified in the usage profile and raises appropriate exceptions.

In the following sections, we demonstrate how to create AWS Glue usage profiles and assign them to users, and then show the usage profiles in action.

Create AWS Glue usage profiles

To get started and create AWS Glue usage profiles, complete the following steps:

  1. On the AWS Glue console, choose Cost management in the navigation pane.

Let’s create your first usage profile for your developers.

  1. Choose Create usage profile.
  2. For Usage profile name, enter developer.
  3. Under Customize configurations for jobs, for Number of workers, for Default, enter 20.
  4. For Default worker type, choose G.1X.
  5. For Allowed worker types, choose G.1X, G.2X, G.4X, and G.8X.
  6. For Customize configurations for sessions, configure the same values.
  7. Choose Create usage profile.

Next, create another usage profile for your business analysts, who need fewer workers and a lower timeout or idle timeout value.

  1. Choose Create usage profile.
  2. For Usage profile name, enter analyst.
  3. Under Customize configurations for jobs, for Number of workers, for Default, enter 2. For Maximum, enter 5.
  4. For Default worker type, choose G.1X.
  5. For Allowed worker types, choose only G.1X.
  6. For Timeout, for Default, enter 60. For Maximum, enter 120.
  7. For Customize configurations for sessions, configure the same values.
  8. For Idle timeout, for Default, enter 10. For Maximum, enter 60.
  9. Choose Create usage profile.

You have successfully created two usage profiles.
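The same two profiles could also be described as request payloads for the CreateUsageProfile API instead of console steps. The following is a sketch under stated assumptions: the parameter keys (numberOfWorkers, workerType, timeout, idleTimeout) and the field names (DefaultValue, AllowedValues, MaxValue, JobConfiguration, SessionConfiguration) reflect our understanding of the request shape and should be checked against the API reference, and the helper function names are hypothetical. The values come from the console steps above.

```python
from typing import Any, Dict, Iterable, Optional


def config_object(default: Optional[object] = None,
                  allowed: Optional[Iterable[str]] = None,
                  minimum: Optional[int] = None,
                  maximum: Optional[int] = None) -> Dict[str, Any]:
    """Build one configuration entry, leaving out anything that is not set."""
    entry: Dict[str, Any] = {}
    if default is not None:
        entry["DefaultValue"] = str(default)
    if allowed is not None:
        entry["AllowedValues"] = list(allowed)
    if minimum is not None:
        entry["MinValue"] = str(minimum)
    if maximum is not None:
        entry["MaxValue"] = str(maximum)
    return entry


def developer_profile_request() -> Dict[str, Any]:
    """Profile for developers: 20 workers by default, four allowed worker types."""
    settings = {
        "numberOfWorkers": config_object(default=20),
        "workerType": config_object(
            default="G.1X", allowed=["G.1X", "G.2X", "G.4X", "G.8X"]),
    }
    return {
        "Name": "developer",
        "Configuration": {
            "JobConfiguration": settings,
            "SessionConfiguration": dict(settings),  # same values for sessions
        },
    }


def analyst_profile_request() -> Dict[str, Any]:
    """Profile for analysts: fewer workers, G.1X only, lower timeouts."""
    job = {
        "numberOfWorkers": config_object(default=2, maximum=5),
        "workerType": config_object(default="G.1X", allowed=["G.1X"]),
        "timeout": config_object(default=60, maximum=120),
    }
    session = dict(job)  # same values, plus the session-only idle timeout
    session["idleTimeout"] = config_object(default=10, maximum=60)
    return {
        "Name": "analyst",
        "Configuration": {"JobConfiguration": job, "SessionConfiguration": session},
    }


def create_profiles(glue_client: Any) -> None:
    """Send both requests; assumes the client exposes create_usage_profile."""
    for request in (developer_profile_request(), analyst_profile_request()):
        glue_client.create_usage_profile(**request)
```

With boto3 installed and credentials configured, passing boto3.client("glue") to create_profiles would send both requests. Building the payloads in separate functions keeps them easy to review or store in version control before anything is created.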

Assign usage profiles

Restrictions can only be applied to AWS Glue API calls made by IAM users or roles if the profile is assigned to them. There are two steps that the admin needs to take in order to assign a profile:

  • In IAM, create a tag named glue:UsageProfile on the user or role, with the name of the profile used as the tag value
  • The IAM policy assigned to the user or role needs to be updated to include the glue:GetUsageProfile IAM action permission to read the assigned profile

Follow these steps to create two new users, each assigned a different profile:

  1. On the IAM console, choose Users in the navigation pane.
  2. Choose Create user.
  3. For User name, enter blogDeveloper.
  4. Select Provide user access to the AWS Management Console and I want to create an IAM user.
  5. You can enter a custom password or let one be generated (in the latter case, select Show password so you can use it later to sign in).
  6. Choose Next.
  7. Attach the managed policies AWSGlueConsoleFullAccess and IAMReadOnlyAccess.
  8. Choose Next.
  9. Review the summary and complete the creation.
  10. Note the password for later, choose Return to users list, and then choose the user you just created.
  11. On the Permissions tab, for Add permissions, choose Create inline policy.
  12. In the policy editor, switch to JSON and enter the following policy, replacing the AWS Region, account ID, and usage profile name placeholders. For the usage profile name, use the value developer for the user blogDeveloper and analyst for the user blogAnalyst.
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "glue:GetUsageProfile"
          ],
          "Resource": [
            "arn:aws:glue:<aws region>:<account id>:usageProfile/<usage profile name>"
          ]
        },
        {
          "Effect": "Allow",
          "Action": [
            "iam:PassRole"
          ],
          "Resource": [
            "*"
          ],
          "Condition": {
            "StringLike": {
              "iam:PassedToService": [
                "glue.amazonaws.com"
              ]
            }
          }
        }
      ]
    }
  13. Name the policy GlueUsageProfilePermission and complete the creation.
  14. On the Tags tab, add a new tag with the name glue:UsageProfile and the value developer.

Repeat the steps to create a user named blogAnalyst, and replace the ARN in the policy with arn:aws:glue:<aws region>:<account id>:usageProfile/analyst. Make sure the Region and account ID are populated before updating the policy. For the tag value, specify analyst instead of developer.
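If you prefer to script the assignment, the inline policy and tag from the preceding steps can be generated per user. This is a minimal sketch: the function names are hypothetical, the Region and account ID in the usage note are placeholder values, and the IAM client is passed in by the caller. The policy body, policy name, and tag key are the ones used in the steps above.

```python
import json
from typing import Any, Dict


def usage_profile_policy(region: str, account_id: str, profile_name: str) -> Dict[str, Any]:
    """Return the inline policy from the steps above with placeholders filled in."""
    profile_arn = f"arn:aws:glue:{region}:{account_id}:usageProfile/{profile_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets the user read the profile assigned to them.
                "Effect": "Allow",
                "Action": ["glue:GetUsageProfile"],
                "Resource": [profile_arn],
            },
            {
                # Lets the user pass a role to AWS Glue only.
                "Effect": "Allow",
                "Action": ["iam:PassRole"],
                "Resource": ["*"],
                "Condition": {
                    "StringLike": {"iam:PassedToService": ["glue.amazonaws.com"]}
                },
            },
        ],
    }


def usage_profile_tag(profile_name: str) -> Dict[str, str]:
    """Return the tag that associates a user or role with a profile."""
    return {"Key": "glue:UsageProfile", "Value": profile_name}


def assign_profile_to_user(iam_client: Any, user_name: str, profile_name: str,
                           region: str, account_id: str) -> None:
    """Attach the inline policy and the tag to an existing IAM user."""
    policy = usage_profile_policy(region, account_id, profile_name)
    iam_client.put_user_policy(
        UserName=user_name,
        PolicyName="GlueUsageProfilePermission",
        PolicyDocument=json.dumps(policy),
    )
    iam_client.tag_user(UserName=user_name, Tags=[usage_profile_tag(profile_name)])
```

For example, assuming boto3 is installed, assign_profile_to_user(boto3.client("iam"), "blogAnalyst", "analyst", "us-east-1", "111122223333") would apply the analyst policy and tag; the Region and account ID shown here are placeholders to replace with your own.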

On the AWS Glue console, navigate to the developer usage profile. You can see that the status has been changed from Not assigned to Assigned.

Lastly, complete the following steps to create two IAM roles for AWS Glue jobs and sessions with the profile.

  1. Create two IAM roles for AWS Glue. Name them GlueServiceRole-developer and GlueServiceRole-analyst.
  2. Configure the following inline policies by replacing the Region, account ID, and usage profile name placeholders. For the usage profile name placeholder, use the value developer for the role GlueServiceRole-developer and analyst for the role GlueServiceRole-analyst.
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "glue:GetUsageProfile"
          ],
          "Resource": [
            "arn:aws:glue:<aws region>:<account id>:usageProfile/<usage profile name>"
          ]
        },
        {
          "Effect": "Allow",
          "Action": [
            "iam:PassRole"
          ],
          "Resource": [
            "*"
          ],
          "Condition": {
            "StringLike": {
              "iam:PassedToService": [
                "glue.amazonaws.com"
              ]
            }
          }
        }
      ]
    }
  3. On the Tags tab for the IAM role, add a new tag with the name glue:UsageProfile and the value developer for GlueServiceRole-developer and analyst for GlueServiceRole-analyst.

Usage profiles in action: Jobs

Now you have two users with different AWS Glue profiles assigned. Let’s test them and see the differences. First, let’s try the user blogDeveloper to see how the profile developer works.

  1. Open the AWS Glue console with the blogDeveloper user.
  2. Choose ETL jobs in the navigation pane and choose Script editor.
  3. Choose Create script.
  4. Choose the Job details tab.

The default value of Requested number of workers is 20, which corresponds to the default setting of the profile developer.

Next, let’s try the user blogAnalyst to see how the profile analyst works.

  1. Open the AWS Glue console with the blogAnalyst user.
  2. Choose ETL jobs in the navigation pane and choose Script editor.
  3. Choose Create script.
  4. Choose the Job details tab.

The default value of Requested number of workers is 2, which corresponds to the default setting of the profile analyst.

Additionally, the default value of Job timeout is 60, which corresponds to the default setting of the profile analyst.

  1. For Worker type, choose the dropdown menu.

Only G.1X is available; G.2X, G.4X, and G.8X are disabled. This is because the profile analyst allows only G.1X.

  1. For Requested number of workers, enter 20 to simulate invalid input.

You will see the warning message The maximum number of workers cannot exceed 5 for usage profile "analyst".

Now suppose the user blogAnalyst attempts to run a job for which the number of workers is set to 20. The maximum number of workers in the profile assigned to this user is 5, so when the user tries to run the job, it fails with an error.

In this example, we’ve demonstrated how usage profiles manage AWS Glue jobs based on the preconfigured values in the profiles.
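The same check applies when a job run is started through the API rather than the console. The following sketch shows one way a caller might surface the rejection. It assumes a boto3-style AWS Glue client is passed in, and the function name is hypothetical. The post does not state the exact exception type or message that AWS Glue returns, so the code catches the client's generic ClientError and simply reports whatever message comes back.

```python
from typing import Any, Optional, Tuple


def start_run_with_workers(glue_client: Any, job_name: str, workers: int,
                           worker_type: str = "G.1X") -> Tuple[Optional[str], Optional[str]]:
    """Start a job run with an explicit worker count.

    Returns (run_id, None) when the run starts, or (None, message) when the
    service rejects the request, for example because the worker count is
    outside the range allowed by the caller's usage profile.
    """
    try:
        response = glue_client.start_job_run(
            JobName=job_name,
            WorkerType=worker_type,
            NumberOfWorkers=workers,
        )
    except glue_client.exceptions.ClientError as err:
        # Assumption: the rejection arrives as a generic client error.
        return None, err.response["Error"]["Message"]
    return response["JobRunId"], None
```

Called with the credentials of blogAnalyst and workers set to 20, this would be expected to return an error message instead of a run ID, because 20 exceeds the profile maximum of 5.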

Usage profiles in action: Sessions

Next, continue using the user blogAnalyst and try the AWS Glue Studio notebook interface to see how interactive sessions work with usage profiles:

  1. Open the AWS Glue console with the blogAnalyst user.
  2. Choose ETL jobs in the navigation pane and choose Notebook.
  3. For IAM role, choose GlueServiceRole-analyst.
  4. Choose Create notebook.
  5. Wait for the notebook to be ready.

In the second notebook cell, %number_of_workers is set to 2, which corresponds to the default value of the profile analyst.

  1. Update %number_of_workers from 2 to 10 to simulate an invalid access pattern:
    %number_of_workers 10
  2. Run the cell.

You get an error message saying “Provided number of workers is not within the range [1, 5] in the analyst profile.”

This is because the given value of 10 exceeds the maximum number of workers set in the profile assigned to this user.

  1. Update %number_of_workers from 10 to 5 to simulate a valid access pattern:
    %number_of_workers 5
  2. Run the cell.

This time, the session has been successfully created.

Now you have observed how usage profiles manage AWS Glue interactive sessions based on the preconfigured values in the profiles.

Conclusion

This post demonstrated how AWS Glue usage profiles allow you to manage your AWS Glue resources with ease and flexibility.

With AWS Glue usage profiles, you can manage and control resources of different users in order to set your organization’s best practices and save costs. AWS Glue usage profiles serve as a guardrail to prevent unauthorized resource usage from occurring.

Try out the feature for yourself, and leave any feedback or questions in the comments.


About the Authors

Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He is responsible for building software artifacts to help customers. In his spare time, he enjoys cycling with his road bike.

Gonzalo Herreros is a Senior Big Data Architect on the AWS Glue team, with a background in machine learning and AI.

Keerthi Chadalavada is a Senior Software Development Engineer at AWS Glue. She is passionate about designing and building end-to-end solutions to address customer data integration and analytic needs.

Gal Heyne is a Product Manager for AWS Glue with a strong focus on AI/ML, data engineering, and BI. She is passionate about developing a deep understanding of customers’ business needs and collaborating with engineers to design easy-to-use data products.