Analyze web browsing activities for Amazon WorkSpaces Secure Browser

Customers in regulated industries such as the public sector, healthcare, and financial services often need to meet compliance and audit requirements. This involves storing, archiving, and analyzing users’ browsing trends, such as website visits and session times. Amazon WorkSpaces Secure Browser is a fully managed, remote enterprise browser that lets users access internal websites and software as a service (SaaS) applications while logging their browsing activity to Amazon Kinesis Data Streams. This post demonstrates how to capture browsing streams using Amazon Data Firehose, transform them with AWS Lambda, and store the data in Amazon Simple Storage Service (Amazon S3). Using Amazon Athena, business analysts can extract data with SQL queries and visualize the results in business intelligence dashboards such as Amazon QuickSight.

Solution Overview

Figure 1: Architecture

The solution is designed to capture, store, and analyze streaming data generated by users browsing with WorkSpaces Secure Browser. The data pipeline begins with the collection of browsing events such as ‘StartSession’, ‘VisitPage’, and ‘EndSession’ for each user. These events are captured by Amazon Kinesis Data Streams, buffered by Amazon Data Firehose until a configured size or time threshold is reached, and then passed to a Lambda function for processing. The processing step decodes, validates, transforms, and re-encodes the data, and handles records that fail. The goal of this step is to prepare the data for efficient storage and subsequent analysis.
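
The decode, transform, and re-encode step can be sketched as a Firehose transformation handler. This is a minimal illustration, not the function shipped in the CloudFormation template: the request and response shape (`records`, `recordId`, `result`, base64 `data`) follows the Amazon Data Firehose data transformation contract, while the payload field names, including `eventType`, and the `transform_record` helper are assumptions made for this example.

```python
import base64
import json


def transform_record(payload: dict) -> dict:
    """Keep only the fields needed for analysis (field names are assumptions)."""
    return {
        "userName": payload.get("userName"),
        "eventType": payload.get("eventType"),  # hypothetical field name
        "title": payload.get("title"),
        "utc_timestamp": payload.get("utc_timestamp"),
    }


def lambda_handler(event, context):
    """Decode each buffered record, transform it, and re-encode it for Firehose."""
    output = []
    for record in event["records"]:
        try:
            # Firehose delivers each record's data as a base64-encoded string.
            payload = json.loads(base64.b64decode(record["data"]))
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
            # Newline-delimited JSON keeps one event per line in the S3 object.
            line = json.dumps(transform_record(payload)) + "\n"
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(line.encode("utf-8")).decode("utf-8"),
            })
        except ValueError:
            # Invalid base64 or JSON: return the record unchanged and flag it.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Records marked `ProcessingFailed` are treated by Firehose as unsuccessfully processed, so malformed events do not end up mixed with the transformed data.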

Next, the transformed data is cataloged by AWS Glue Data Catalog, which makes the data visible, searchable, and queryable for users. Once the data is cataloged, business analysts can use Amazon Athena SQL for data analysis and visualization. The data is stored in Amazon S3, which provides highly available, scalable, durable, and secure storage. The data is organized using predefined prefixes and partitioned by date and time to ensure easy access and efficient querying. An AWS Glue crawler is scheduled to run every hour so that new partitions are added to the catalog. This setup lets analysts query the browsing data shortly after it arrives.
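
To illustrate what date and time partitioning looks like in practice, the sketch below builds a Hive-style S3 prefix from an event timestamp. Only the `year` partition column appears in this post's queries; the `month`, `day`, and `hour` keys, the base prefix, and the `partition_prefix` helper are assumptions for illustration and may differ from what the template configures.

```python
from datetime import datetime, timezone


def partition_prefix(base_prefix: str, event_time: datetime) -> str:
    """Build a Hive-style S3 prefix partitioned by UTC date and hour.

    event_time must be timezone-aware; it is converted to UTC first.
    The key names other than `year` are hypothetical.
    """
    t = event_time.astimezone(timezone.utc)
    return (
        f"{base_prefix.rstrip('/')}/"
        f"year={t:%Y}/month={t:%m}/day={t:%d}/hour={t:%H}/"
    )


# Example: an event logged at 14:30 UTC on 5 March 2024.
print(partition_prefix("securebrowser/", datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc)))
```

With a layout like this, a filter such as `year = '2024'` lets Athena read only the matching prefixes instead of scanning the whole bucket.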

Optionally, you can use Amazon QuickSight to visualize the data and derive trends. Amazon QuickSight provides integration with Amazon Athena and S3, enabling data analysts to leverage advanced capabilities like Natural Language Processing (NLP) and Generative BI.

This solution can be automatically deployed using an AWS CloudFormation template.

Pre-requisites

Launch the CloudFormation Stack

The launch wizard requires you to provide an Amazon Kinesis data stream name and an Amazon S3 bucket name. The other resources the solution needs, such as the AWS Lambda function, AWS Glue Data Catalog, AWS Glue crawler, and Athena workgroup, are created by the stack.

  1. Choose Launch Stack to open the CloudFormation console.
    Disclaimer – This AWS CloudFormation template is for demo and guidance only, not for production use. Please adapt and test it within your organization’s security framework to ensure compliance with your security requirements.
  2. Choose Next.
  3. Leave the parameters as default or make appropriate changes based on your requirements, then choose Next.
    Figure 2: CloudFormation stack parameters
  4. Review the details on the final page and select I acknowledge that AWS CloudFormation might create IAM resources.
  5. Choose Create.

The stack takes approximately 5 minutes to create, after which you can view it on the AWS CloudFormation console.

Testing the solution

  1. Navigate to the Amazon S3 console.
  2. Locate the folders and files created by Amazon Data Firehose. Ensure the S3 bucket contains browsing logs from the WorkSpaces Secure Browser sessions.
    Figure 3: Folders created by Amazon Firehose

    Logs are created in the S3 bucket only after users use the system and data reaches Amazon Data Firehose buffer intervals.

  3. The buffer size and intervals can be configured on the Amazon Data Firehose stream by the administrator.
    Figure 4: Buffer hints configuration
  4. Make sure the AWS Glue crawler has run at least once so that the AWS Glue Data Catalog is updated.
    Figure 5: First run of AWS Glue Crawler
  5. To query and download data, open the Amazon Athena query editor and choose the newly created workgroup. You can use the saved queries or create your own.
    Figure 6: Sample saved queries

    For example, the following query returns the number of browsing events, such as webpage visits, recorded for each user in 2024.

SELECT 
  userName,
  COUNT(*) AS event_count
FROM 
  "securebrowser_database"."securebrowser"
WHERE 
  year = '2024'
GROUP BY 
  userName
Figure 7: Amazon Athena query

The following query provides the list of webpages visited by a particular user, carlos_salazar@example.com, in 2024.

SELECT 
  userName, title, utc_timestamp
FROM 
  "securebrowser_database"."securebrowser"
WHERE
  userName = 'carlos_salazar@example.com'
  AND year = '2024'
Figure 8: Query result
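
The same queries can also be run programmatically rather than from the console. The sketch below assumes an Athena client created with `boto3.client("athena")` is passed in; the `run_query` helper and the workgroup name in the usage comment are hypothetical, and the workgroup is assumed to have a query result location configured. It starts the query, polls until it finishes, and returns the first page of results.

```python
import time

# Per-user event count for 2024, matching the sample query in this post.
QUERY = """
SELECT userName, COUNT(*) AS event_count
FROM "securebrowser_database"."securebrowser"
WHERE year = '2024'
GROUP BY userName
"""


def run_query(athena, query: str, workgroup: str, poll_seconds: float = 1.0) -> list:
    """Run a query in the given workgroup and return rows as lists of strings.

    `athena` is expected to behave like boto3.client("athena").
    Only the first page of results is returned.
    """
    execution_id = athena.start_query_execution(
        QueryString=query, WorkGroup=workgroup
    )["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=execution_id)[
            "QueryExecution"
        ]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "no reason given")
        raise RuntimeError(f"Query {status['State']}: {reason}")

    rows = athena.get_query_results(QueryExecutionId=execution_id)["ResultSet"]["Rows"]
    # For SELECT queries, the first row holds the column names.
    return [[col.get("VarCharValue") for col in row["Data"]] for row in rows]


# Usage (requires AWS credentials; the workgroup name is a placeholder):
#   import boto3
#   rows = run_query(boto3.client("athena"), QUERY, "your-workgroup-name")
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub in tests before pointing it at a real account.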

Optionally, you may use Amazon QuickSight by creating an Amazon Athena dataset and configuring your dashboards to visualize the data or query using natural language. Note that QuickSight integration is not provided as a part of the CloudFormation template.

Figure 9: Visualising data via QuickSight

Cleaning Up

To avoid additional resource charges, delete the CloudFormation stack you deployed. You may choose to delete or retain the data in Amazon S3.

Conclusion

Overall, this pipeline automates the process of ingesting, processing, storing, cataloging, and querying WorkSpaces Secure Browser browsing streams, ensuring a seamless and efficient workflow for data analysis.

About the Authors

Arun.P.C is a Senior Solutions Architect specializing in End User Computing in South East Asia, with more than ten years of experience in consulting and sales engineering in the region. He holds a bachelor’s degree in engineering and is the holder of three US patents. Connect with him on LinkedIn.
Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He is responsible for building software artifacts to help customers. In his spare time, he enjoys cycling with his new road bike.