
How to report Amazon AppStream 2.0 home folder use with Amazon Athena

Customers ask how to analyze Amazon AppStream 2.0 home folder usage so they can track related spend, manage usage, and administer AppStream 2.0 home folders. Customers have questions like:

“How much data is User1 using in AppStream 2.0 home folders?”

“What are the top 10 largest files being stored and who owns them?”

This blog post shows you how to use Amazon Athena, AWS Glue, and Amazon S3 inventory to create custom reports of AppStream 2.0 home folder usage.

Time to read: 5 minutes
Time to complete setup: 30 minutes
Cost to complete (estimated): $1
Learning level: Advanced (300)
Services used: Amazon AppStream 2.0, AWS Glue, Amazon Athena, Amazon S3

Overview of solution

Amazon AppStream 2.0 manages user content by using Amazon S3 buckets created in your account. The buckets are fully managed by the service, without any configuration from an administrator. For every AWS Region, AppStream 2.0 creates a bucket within your account. The hierarchy of the user’s home folders depends on how you launch a streaming session. The user folder structure is as follows:

/bucket/user/<auth type>/<user-id-SHA-256-hash>/

The hash is the lowercase SHA-256 hash hexadecimal string. It is generated from the UserID passed to the CreateStreamingURL API operation, or the NameID SAML attribute passed in the SAML federation request.
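
To make the folder naming concrete, here is a minimal sketch of how a user ID maps to a home folder prefix. It assumes the user ID is encoded as UTF-8 before hashing; the function names, the "federated" auth type value, and the sample user ID are illustrative and not taken from the service.

```python
import hashlib


def user_id_hash(user_id: str) -> str:
    """Return the lowercase SHA-256 hex digest of a user ID (UTF-8 is an assumption)."""
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()


def home_folder_prefix(auth_type: str, user_id: str) -> str:
    """Build the key prefix user/<auth type>/<hash>/ inside the home folder bucket."""
    return f"user/{auth_type}/{user_id_hash(user_id)}/"


if __name__ == "__main__":
    # Both arguments are placeholder values for illustration only.
    print(home_folder_prefix("federated", "user1@example.com"))
```

Because the hash is computed over the exact string, the same user ID with different casing or whitespace would map to a different folder.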

In this blog post, an AWS Lambda function processes the AppStream 2.0 usage reports and hashes each UserID. Amazon S3 inventory produces comma-separated values (CSV) reports that list the objects in the AppStream 2.0 home folder bucket. A second Lambda function processes those inventory reports. AWS Glue crawlers run nightly and update the AWS Glue Data Catalog, and you can then query the results using Amazon Athena. The following architecture diagram provides an example of the end state environment.

Architectural diagram showing the AWS Lambda functions processing objects from the Amazon S3 buckets and the AWS Glue Crawlers running on the resulting files.
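
The hashing stage of this pipeline can be sketched as follows. This is not the code deployed by the CloudFormation template. It is a simplified illustration that assumes the usage report has a header row with a user_id column, and it appends a hypothetical user_id_hash column so that report rows can later be matched to home folder prefixes.

```python
import csv
import hashlib
import io


def add_user_hash(report_csv: str, user_column: str = "user_id") -> str:
    """Return the CSV text with an extra user_id_hash column on every row."""
    reader = csv.DictReader(io.StringIO(report_csv))
    fieldnames = list(reader.fieldnames or []) + ["user_id_hash"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Lowercase SHA-256 hex digest of the user ID (UTF-8 is an assumption).
        digest = hashlib.sha256(row[user_column].encode("utf-8")).hexdigest()
        row["user_id_hash"] = digest
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = "user_id,stack_name\nUser1,demo-stack\n"
    print(add_user_hash(sample))
```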

Walkthrough

This walkthrough allows you to configure Amazon AppStream 2.0 home folder usage reporting with Amazon Athena.

Prerequisites:

Step 1: Create an Amazon S3 bucket

Create an Amazon S3 bucket for the Amazon S3 inventory reports and the processed AppStream 2.0 reporting data. This bucket is separate from the bucket that AppStream 2.0 automatically creates for usage reports. This blog post uses appstream-reporting. You need to choose your own unique name for the bucket.

  1. Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3
  2. Choose Create Bucket.
  3. In Bucket name, enter a DNS-compliant name for your bucket.
  4. In Region, select the AWS Region where you are running AppStream 2.0.
  5. Choose Create.
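
If you want to check a candidate bucket name before entering it in the console, a rough validator like the following can help. It covers only the core general purpose bucket naming rules (length, allowed characters, no adjacent periods, not an IP address) and is a sketch rather than a complete implementation of every S3 naming rule.

```python
import re

# 3 to 63 characters; lowercase letters, digits, dots, and hyphens;
# must start and end with a letter or digit.
_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IP_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Partial check of S3 bucket naming rules; reserved prefixes are not covered."""
    if not _NAME_RE.fullmatch(name):
        return False
    if ".." in name:
        return False
    if _IP_RE.fullmatch(name):
        return False
    return True


if __name__ == "__main__":
    for candidate in ("appstream-reporting", "AppStream-Reporting", "192.168.1.1"):
        print(candidate, is_valid_bucket_name(candidate))
```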

Step 2: Enable Amazon S3 inventory on the AppStream 2.0 home folder bucket

Amazon S3 inventory provides a flat file list of your objects on a daily or weekly basis. You can use it to audit and report on the status of your objects including object size and number of objects.

  1. Open the AppStream 2.0 console at https://console.aws.amazon.com/appstream2
  2. Choose a Stack that has home folders enabled, then choose the Storage tab.
  3. Next to S3 Bucket Name, choose the name to go to the S3 console.
  4. Choose the Management tab, then choose Inventory.
  5. Choose + Add New
    1. Inventory name: appstream-inventory-report
    2. Filters: keep blank
    3. Destination bucket: Choose the bucket you created in Step 1
    4. Destination prefix: as-home-folder-s3-inventory-report
    5. Frequency: Daily
    6. Output format: CSV
    7. Object Versions: Current version only
    8. Optional fields: Select all
  6. Choose Save.
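
For readers who prefer scripting, the console settings above correspond roughly to the following inventory configuration for the boto3 put_bucket_inventory_configuration call. This is a sketch: the helper names are hypothetical, the OptionalFields list is only a subset of what "Select all" enables, and when you configure inventory outside the console you may also need to add a destination bucket policy that lets Amazon S3 deliver the reports.

```python
def build_inventory_configuration(destination_bucket: str) -> dict:
    """Mirror the console settings from this step as an InventoryConfiguration dict."""
    return {
        "Id": "appstream-inventory-report",
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Schedule": {"Frequency": "Daily"},
        # Subset of the optional fields; extend this list as needed.
        "OptionalFields": ["Size", "LastModifiedDate", "StorageClass", "ETag"],
        "Destination": {
            "S3BucketDestination": {
                "Bucket": f"arn:aws:s3:::{destination_bucket}",
                "Format": "CSV",
                "Prefix": "as-home-folder-s3-inventory-report",
            }
        },
    }


def enable_inventory(s3_client, home_folder_bucket: str, destination_bucket: str) -> None:
    """Apply the configuration; s3_client is expected to be boto3.client('s3')."""
    config = build_inventory_configuration(destination_bucket)
    s3_client.put_bucket_inventory_configuration(
        Bucket=home_folder_bucket,
        Id=config["Id"],
        InventoryConfiguration=config,
    )
```

Passing the client in as an argument keeps the helper easy to exercise without AWS credentials.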

Step 3: Create AWS Lambda functions and AWS Glue crawlers

The AWS Lambda functions are used to enhance the data for the Amazon Athena queries. The CloudFormation template creates two functions. One function hashes the usernames from the Amazon AppStream 2.0 usage reports. The other function processes the S3 Inventory report to simplify queries.
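
To illustrate what simplifying the inventory report can mean, the sketch below splits an inventory object key into its auth type, user hash, and relative path. It is not the deployed function. It assumes keys in the CSV inventory are URL-encoded (hence the decoding step) and that home folder keys follow the user/<auth type>/<hash>/ layout described earlier; the output field names are hypothetical.

```python
from typing import Optional
from urllib.parse import unquote_plus


def parse_home_folder_key(encoded_key: str) -> Optional[dict]:
    """Split a home folder object key into auth type, user hash, and path.

    Returns None for keys that do not follow user/<auth type>/<hash>/...
    """
    key = unquote_plus(encoded_key)
    parts = key.split("/", 3)
    if len(parts) < 4 or parts[0] != "user":
        return None
    _, auth_type, user_hash, relative_path = parts
    return {"auth_type": auth_type, "user_id_hash": user_hash, "path": relative_path}


if __name__ == "__main__":
    # The hash segment is shortened here purely for readability.
    print(parse_home_folder_key("user/federated/abc123/My+Files/report%201.docx"))
```

Having the user hash in its own column is what lets a query join inventory rows to the hashed usernames from the usage reports.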

  1. Choose the quick create link to launch the CloudFormation stack.
  2. Check that the Region where you want to create the resources is selected in the upper right-hand corner of the screen.
  3. On the Create stack page, choose Next.
  4. For Parameters:
    1. AppStreamUsageHomeFolderReportsBucket: This is the name of the S3 bucket created in Step 1.
    2. BucketNameAppStreamUsageReports: This is the name of the S3 bucket created by AppStream 2.0 for usage reports enabled as a prerequisite. You can find the name by going to the AppStream 2.0 console. Choose Usage Reports on the left-hand side.
    3. DatabaseName: This can be kept as the default. If you changed the name of the AWS Glue database when launching the CloudFormation stack for AppStream 2.0 usage reports, modify this value.
    4. ScheduleExpression: You can modify when the AWS Glue crawler runs. By default, it is 23:00 UTC.
  5. On the Specify stack details page, choose Next.
  6. On the Configure stack options page, choose Next.
  7. On the Review Stack Name page, select I acknowledge that AWS CloudFormation might create IAM resources. Then, choose Create Stack.
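
If you launch the stack from a script instead of the quick create link, the parameters above need to be passed in the list-of-dicts shape that the CloudFormation API expects. The helpers below are a sketch under assumptions: the bucket names are placeholders, and the cron form shown for ScheduleExpression assumes the template accepts an AWS Glue style cron expression, which this post does not confirm.

```python
def to_cfn_parameters(values: dict) -> list:
    """Convert a plain dict into the Parameters list used by create_stack."""
    return [{"ParameterKey": key, "ParameterValue": value} for key, value in values.items()]


def daily_cron(hour_utc: int, minute: int = 0) -> str:
    """Build a cron expression that fires once a day at the given UTC time."""
    if not (0 <= hour_utc <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    return f"cron({minute} {hour_utc} * * ? *)"


if __name__ == "__main__":
    params = to_cfn_parameters(
        {
            # Placeholder bucket names; substitute your own.
            "AppStreamUsageHomeFolderReportsBucket": "appstream-reporting",
            "BucketNameAppStreamUsageReports": "appstream-logs-example",
            "ScheduleExpression": daily_cron(23),
        }
    )
    print(params)
```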

Step 4: Create Lambda Triggers

Since the Amazon S3 buckets are not managed by AWS CloudFormation, the triggers for the AWS Lambda functions must be set up manually.

Create the AWS Lambda trigger for the Amazon AppStream 2.0 usage reports:

  1. Open the Lambda console at https://console.aws.amazon.com/lambda
  2. Choose the function with the name <Stack Name>-AppStreamUsageCreateHash-<ID>
  3. In the designer section, choose + Add trigger.
  4. For Select a trigger, choose S3.
    1. Bucket: Choose the bucket that contains the AppStream 2.0 usage reports. The bucket name starts with appstream-logs-<region>
    2. Event type: PUT
    3. Prefix: sessions/
    4. Suffix: .csv
  5. Acknowledge the recursive invocation message. Choose Add.

Create the AWS Lambda trigger for the Amazon S3 Inventory reports:

  1. On the functions page, choose the function with the name <Stack Name>-AppStreamUsageSplitS3Inv-<ID>
  2. In the designer section, choose + Add trigger.
  3. For Select a trigger, choose S3.
    1. Bucket: This is the name of the bucket you created in Step 1
    2. Event type: PUT
    3. Prefix: as-home-folder-s3-inventory-report/appstream2-36fb080bb8-<region>-<account id>/appstream-inventory-report/data/
      1. You can use the Amazon S3 Console to verify your folder path and copy it from there.
  4. Acknowledge the recursive invocation message. Then, choose Add.

The AWS Lambda functions will now run once new objects are added to the Amazon S3 bucket. The AWS Glue crawlers run daily on the schedule specified in the AWS CloudFormation configuration.
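
The trigger settings above can also be expressed as an S3 notification configuration entry. The following sketch builds that entry and includes a small helper that mimics how prefix and suffix filters select object keys. The function names and the ARN are hypothetical. Note that the console also grants Amazon S3 permission to invoke the function, and that applying a notification configuration through the API replaces the bucket's existing notification settings, so treat this as an illustration rather than a drop-in script.

```python
from typing import Optional


def lambda_trigger(function_arn: str, prefix: str, suffix: Optional[str] = None) -> dict:
    """Build one LambdaFunctionConfigurations entry for PUT events."""
    rules = [{"Name": "prefix", "Value": prefix}]
    if suffix:
        rules.append({"Name": "suffix", "Value": suffix})
    return {
        "LambdaFunctionArn": function_arn,
        "Events": ["s3:ObjectCreated:Put"],
        "Filter": {"Key": {"FilterRules": rules}},
    }


def key_matches(key: str, prefix: str, suffix: str = "") -> bool:
    """Return True if an object key would pass the prefix and suffix filters."""
    return key.startswith(prefix) and key.endswith(suffix)


if __name__ == "__main__":
    # Placeholder ARN for illustration.
    arn = "arn:aws:lambda:us-east-1:111122223333:function:example-hash-function"
    print(lambda_trigger(arn, "sessions/", ".csv"))
    print(key_matches("sessions/2020/01/01/report.csv", "sessions/", ".csv"))
```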

Step 5: Test

If you have at least one session report from AppStream 2.0 and one Amazon S3 inventory CSV file, you can manually start the process. You may have to wait up to 48 hours for the initial Amazon S3 inventory report.

To manually start the AWS Lambda function to hash usernames:

  1. Open the Amazon S3 Console and navigate to the bucket with the usage reports.
  2. Open the sessions folder and navigate to the latest date.
  3. Download the latest object and upload it with the same name. This triggers the Amazon S3 hash Lambda function to run.
  4. You see a new object created under hash in the Amazon S3 bucket created in Step 1. The time taken depends on the file size.

To manually start the AWS Lambda function for the Amazon S3 inventory report:

  1. Open the Amazon S3 Console and navigate to the S3 bucket created in Step 1.
  2. In the as-home-folder-s3-inventory-report prefix, navigate to the data location specified in Step 4.
  3. Download the latest object and upload it with the same name. This triggers the Amazon S3 inventory Lambda function to run.
  4. You see a new object created under as-homefolder in the Amazon S3 bucket created in Step 1. The time taken depends on the file size.
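
The download and re-upload steps above can be scripted. The sketch below reads an object and writes it back under the same key, which produces a PUT event like a console upload does. The helper name is hypothetical and the client is expected to be a boto3 S3 client. An in-place copy is deliberately not used, because a copy emits a different event type than PUT and so may not match a trigger that listens only for PUT events.

```python
def reupload_object(s3_client, bucket: str, key: str) -> int:
    """Download an object and upload it again under the same key.

    Returns the number of bytes written. The whole object is held in
    memory, which is fine for small reports but not for very large files.
    """
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return len(body)
```

A call would look like reupload_object(boto3.client("s3"), "<bucket name>", "<object key>"), using the bucket and key you located in the console.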

To manually start the AWS Glue crawlers:

  1. Open the AWS Glue console at https://console.aws.amazon.com/glue
  2. On the left-hand side, choose Crawlers.
  3. Select the box next to appstream-usage-home-folder-inventory-report and choose Run crawler.
  4. Select the box next to appstream-usage-home-folder-user-hash and choose Run crawler.
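
As an alternative to the console, the two crawlers named above can be started with the AWS Glue start_crawler API. This is a minimal sketch: the helper is hypothetical, the client is expected to be boto3.client("glue"), and no handling is included for the error raised when a crawler is already running.

```python
CRAWLER_NAMES = (
    "appstream-usage-home-folder-inventory-report",
    "appstream-usage-home-folder-user-hash",
)


def start_crawlers(glue_client, names=CRAWLER_NAMES) -> list:
    """Start each crawler in turn and return the names that were started."""
    started = []
    for name in names:
        glue_client.start_crawler(Name=name)
        started.append(name)
    return started
```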

Step 6: Query with Amazon Athena

Now that the AWS Glue crawlers have updated the Data Catalog, you can use Amazon Athena to run queries on the data. This helps you answer questions like:

“How much data is User1 using in AppStream 2.0 home folders?”

“What are the top 10 largest files being stored and who owns them?”

  1. Open the Amazon Athena console at https://console.aws.amazon.com/athena
  2. If this is your first time visiting the Athena console, you’ll get a Getting Started page. Choose Get Started to open the Query Editor.
  3. Choose the link to set up a query result location in Amazon S3.
  4. In the Settings dialog box, enter the path to the bucket you created for your query results. You may choose to add a new folder and store the results in there.
  5. Choose Save.
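
The query result location configured above has a direct counterpart when you run queries programmatically: it is passed as the OutputLocation in the Athena start_query_execution call. The sketch below shows that shape. The helper name is hypothetical, the client is expected to be boto3.client("athena"), and the database name and S3 path are values you would supply.

```python
def run_query(athena_client, sql: str, database: str, output_location: str) -> str:
    """Submit a query to Athena and return its execution ID.

    output_location is an S3 URI such as s3://<bucket>/<folder>/ where
    Athena writes the result files.
    """
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

The call returns as soon as the query is queued, so a script would still need to poll for completion before reading results.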

The AWS CloudFormation template we launched contained two saved queries to get started.

  1. In the top bar, choose Saved queries.
  2. Choose AS2_homefolder_size_summary.
  3. Choose Run query. The following screenshot shows an example.
  4. After reviewing the results, choose Saved queries.
  5. Choose AS2_homefolder_largest_files and run the query.

The Amazon Athena console showing the total size of each user's home folder and the number of objects in the Amazon S3 bucket.
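
The text of the saved queries is not reproduced in this post, so the following is only a sketch of the kind of join and aggregation they perform. It runs against an in-memory SQLite database so you can try the logic locally. The table names (inventory, user_hash), the column names, and the sample rows are all hypothetical and will differ from the tables the AWS Glue crawlers create.

```python
import sqlite3

# Total size and object count per user, largest first.
SIZE_SUMMARY_SQL = """
SELECT u.user_id, COUNT(*) AS object_count, SUM(i.size) AS total_bytes
FROM inventory AS i
JOIN user_hash AS u ON u.user_id_hash = i.user_id_hash
GROUP BY u.user_id
ORDER BY total_bytes DESC
"""

# Ten largest objects and their owners.
LARGEST_FILES_SQL = """
SELECT u.user_id, i.path, i.size
FROM inventory AS i
JOIN user_hash AS u ON u.user_id_hash = i.user_id_hash
ORDER BY i.size DESC
LIMIT 10
"""


def demo():
    """Load sample rows and return (size summary, largest files)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_hash (user_id TEXT, user_id_hash TEXT)")
    conn.execute("CREATE TABLE inventory (user_id_hash TEXT, path TEXT, size INTEGER)")
    conn.executemany(
        "INSERT INTO user_hash VALUES (?, ?)",
        [("User1", "h1"), ("User2", "h2")],
    )
    conn.executemany(
        "INSERT INTO inventory VALUES (?, ?, ?)",
        [("h1", "a.docx", 100), ("h1", "b.zip", 5000), ("h2", "c.txt", 10)],
    )
    summary = conn.execute(SIZE_SUMMARY_SQL).fetchall()
    largest = conn.execute(LARGEST_FILES_SQL).fetchall()
    conn.close()
    return summary, largest


if __name__ == "__main__":
    for result in demo():
        print(result)
```

The sketch assumes one row per user in the mapping table. If the hashed usage report data contains one row per session, the mapping would need to be deduplicated first, otherwise the join would count each file once per session.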

Cleaning up

To avoid incurring future charges, delete the resources created in this blog.

  1. Delete the CloudFormation Stack you created in Step 3.
  2. Empty the S3 bucket you created in Step 1.
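
Emptying the bucket can also be scripted. The sketch below lists the objects page by page and deletes them in batches of up to 1,000 keys, which is the per-request limit of the S3 delete_objects call. The helper names are hypothetical and the client is expected to be boto3.client("s3"). It does not remove old object versions, so a bucket with versioning enabled would need additional handling.

```python
def delete_batches(keys, batch_size: int = 1000):
    """Yield Delete payloads containing at most batch_size keys each."""
    for start in range(0, len(keys), batch_size):
        chunk = keys[start:start + batch_size]
        yield {"Objects": [{"Key": key} for key in chunk], "Quiet": True}


def empty_bucket(s3_client, bucket: str) -> int:
    """Delete every current object in the bucket and return how many were removed."""
    paginator = s3_client.get_paginator("list_objects_v2")
    deleted = 0
    for page in paginator.paginate(Bucket=bucket):
        keys = [obj["Key"] for obj in page.get("Contents", [])]
        for batch in delete_batches(keys):
            s3_client.delete_objects(Bucket=bucket, Delete=batch)
            deleted += len(batch["Objects"])
    return deleted
```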

Conclusion

You can analyze AppStream 2.0 home folder usage to track related spend, manage usage, and administer home folders using Amazon Athena to create custom reports.

Combining the Amazon AppStream usage reports with home folder data allows you to answer complex business questions. You can use Amazon QuickSight to easily create and publish interactive dashboards with the walkthrough in this blog.

With AWS Glue, you pay only for the time your ETL job takes to run. With Amazon Athena, you pay only for the queries you run. This makes AppStream 2.0 reporting cost effective, with no upfront payments or long-term contracts.

Amazon AppStream 2.0 is a fully managed non-persistent application and desktop streaming service. You can centrally manage your desktop applications on AppStream 2.0 and securely deliver them to any computer. You can try sample applications at no cost.