This Guidance demonstrates how to trace and better understand your data lineage in Amazon QuickSight. It does this through a combination of AWS services that replace complex scripting with an AWS CloudFormation template. This allows you to visualize and analyze the usage and relationships of data sources and datasets. Previously, complex scripts were required to trace connections between these assets. QuickSight assets needed manual evaluation to validate migration. Manual checks of dashboards were also needed when evaluating changes in data schemas, filters, parameters, or visuals. This manual process did not scale well and risked production failures by missing impacted dashboards. With this new automated architecture, you can reduce the time spent tracing QuickSight data lineage from weeks to minutes.
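
To make the idea of tracing lineage concrete, here is a minimal sketch of the kind of impact question the dashboard is meant to answer: which dashboards depend on a given data source? This is not the Guidance's implementation. It assumes the relationships have already been exported into two plain mappings, and the names `dataset_sources`, `dashboard_datasets`, and `impacted_dashboards` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def impacted_dashboards(data_source, dataset_sources, dashboard_datasets):
    """Return the sorted names of dashboards that depend on data_source.

    dataset_sources:    dict mapping dataset name -> set of data source names
    dashboard_datasets: dict mapping dashboard name -> set of dataset names
    (Both mappings are hypothetical inputs, not a QuickSight API shape.)
    """
    # Invert dataset -> sources into source -> datasets for quick lookup.
    datasets_by_source = defaultdict(set)
    for dataset, sources in dataset_sources.items():
        for source in sources:
            datasets_by_source[source].add(dataset)

    affected_datasets = datasets_by_source.get(data_source, set())
    # A dashboard is impacted if it uses at least one affected dataset.
    return sorted(
        dashboard
        for dashboard, datasets in dashboard_datasets.items()
        if datasets & affected_datasets
    )


# Illustrative data only; these asset names are made up.
dataset_sources = {
    "sales_ds": {"redshift_prod"},
    "inventory_ds": {"s3_inventory"},
    "combined_ds": {"redshift_prod", "s3_inventory"},
}
dashboard_datasets = {
    "Revenue": {"sales_ds"},
    "Stock": {"inventory_ds"},
    "Executive": {"combined_ds", "sales_ds"},
}

print(impacted_dashboards("redshift_prod", dataset_sources, dashboard_datasets))
```

With the sample data, a change to `redshift_prod` reaches the Executive and Revenue dashboards but not Stock. This is the check that otherwise has to be done by hand before a schema change or migration.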

Architecture Diagram

[Architecture diagram description]

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

  • The AWS CloudFormation stack, combined with the Lambda function, enables logging of resource provisioning, errors, and user activity. This allows consistent measurement of operations and identification of improvements. Other services used in this Guidance include AWS CloudTrail and Amazon CloudWatch, which automatically capture the Lambda function's logs and errors. The provisioned Athena and QuickSight services also log user API calls in CloudTrail. The Lambda function logs errors such as API failures, throttling, or rate limits to CloudWatch.

    CloudFormation templates deploy the automated infrastructure, logging any failures to CloudWatch for review. If a resource fails provisioning, CloudFormation rolls back other resources.

    All the services in this architecture support configurable logging to CloudWatch or CloudTrail, allowing tracking and customization as needed for your operational requirements.

    Read the Operational Excellence whitepaper 
  • Access to data is secured through AWS Identity and Access Management (IAM) policies that grant permission only to authorized users. The Amazon S3 bucket has a policy that allows access solely to the IAM role used by QuickSight and Athena. Resources are private by default and can only be modified with IAM identity-based policies. The QuickSight Data Lineage Dashboard is initially private to one user, who can optionally share it with additional authorized users. No other identities are granted access to the data by this architecture, which helps keep the data secure.

    Read the Security whitepaper 
  • The AWS services in this Guidance are serverless and use managed AWS endpoints and DNS to support a highly available network topology, with AWS handling service failures and recovery automatically. Specifically, the CloudFormation stack automates provisioning and rolls back all resources if one fails. The stack provisions all required resources except the Athena tables, and on failure or deletion it deletes all of them except the QuickSight resources. CloudFormation provisioning events and errors are available in CloudTrail and CloudWatch.

    The Amazon S3 bucket stores recoverable QuickSight metadata, Athena provides high availability across Availability Zones, and QuickSight utilizes AWS reliability features.

    The Lambda function is stateless, using Amazon S3 for invocations; Lambda logs invocations and errors to CloudWatch.

    Finally, the serverless services scale automatically based on usage. Amazon S3 scales with data volume, Athena and QuickSight scale with usage, and additional Lambda invocations are triggered by QuickSight updates.

    Read the Reliability whitepaper 
  • The services chosen are purpose-built for this data lineage use case. First, QuickSight is a serverless service that integrates with Athena to query data in Amazon S3. The QuickSight dashboard allows you to gain insights about your QuickSight resources and data lineage. You can build additional Athena views or QuickSight datasets based on the Athena tables to query or visualize more information.

    Second, the Lambda function provides on-demand compute when QuickSight resources are created or updated.

    Third, the services deploy in the same Region to reduce latency and data transfer costs.

    Finally, the managed serverless services scale automatically based on usage, with scaling and maintenance handled by AWS. This optimized architecture allows you to focus on data lineage insights rather than performance management.

    Read the Performance Efficiency whitepaper 
  • This Guidance uses managed AWS services to eliminate maintenance overhead and the need for third-party licensing. Serverless, usage-based operation and reduced data transfer lower costs: the services deploy in the same Region to minimize data transfer charges, and QuickSight has no data transfer fees.

    The serverless services run only as needed. Athena runs queries only when the Data Lineage Dashboard is accessed. The Amazon S3 bucket contains only flat files of QuickSight metadata. Lambda runs once initially and then only when invoked by QuickSight resource updates through EventBridge.

    Read the Cost Optimization whitepaper 
  • The architecture uses AWS services that scale on demand. Athena runs queries only when users access the QuickSight datasets, and it scales automatically with usage. The Lambda function runs during initial setup, then only when QuickSight resources are created or modified. Data storage in Amazon S3 is cost-effective and scales automatically, and the data is accessed only when required. Because the serverless services provision compute only when invoked, they avoid continuous hardware allocation, which minimizes the environmental impact of the workload.

    Read the Sustainability whitepaper 
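
The bullets above describe a Lambda function that collects QuickSight metadata, logs API failures to CloudWatch, and stores flat files in Amazon S3 for Athena to query. The following is a minimal sketch of that collection step, not the Guidance's actual function. It assumes a client that behaves like `boto3.client("quicksight")` and uses its `list_data_sets` operation. The output record layout (`dataset_id`, `name`, `arn`) and the `FakeQuickSight` stand-in are hypothetical, introduced only so the sketch runs without AWS access.

```python
import json
import logging

logger = logging.getLogger(__name__)


def list_all_datasets(quicksight, account_id):
    """Collect every dataset summary, following NextToken pagination.

    `quicksight` is expected to behave like boto3.client("quicksight");
    it is passed in so the function can be exercised without AWS access.
    """
    summaries, token = [], None
    while True:
        kwargs = {"AwsAccountId": account_id}
        if token:
            kwargs["NextToken"] = token
        try:
            page = quicksight.list_data_sets(**kwargs)
        except Exception:
            # In Lambda, log output is delivered to CloudWatch Logs.
            logger.exception("list_data_sets failed for account %s", account_id)
            raise
        summaries.extend(page.get("DataSetSummaries", []))
        token = page.get("NextToken")
        if not token:
            return summaries


def to_json_lines(summaries):
    """Serialize summaries as newline-delimited JSON, one record per line.

    The field names are an assumed layout, not the Guidance's table schema.
    """
    return "\n".join(
        json.dumps({"dataset_id": s["DataSetId"], "name": s["Name"], "arn": s["Arn"]})
        for s in summaries
    )


class FakeQuickSight:
    """Stand-in client that returns two pages; for illustration only."""

    def list_data_sets(self, AwsAccountId, NextToken=None):
        if NextToken is None:
            return {
                "DataSetSummaries": [
                    {"DataSetId": "d1", "Name": "Sales", "Arn": "arn:example:d1"}
                ],
                "NextToken": "page2",
            }
        return {
            "DataSetSummaries": [
                {"DataSetId": "d2", "Name": "Stock", "Arn": "arn:example:d2"}
            ]
        }


body = to_json_lines(list_all_datasets(FakeQuickSight(), "111122223333"))
print(body)
```

In a deployed function, `body` would be written to the metadata bucket, for example with the S3 client's `put_object`. Newline-delimited JSON is one flat-file format that Athena can read through a JSON SerDe. The format the Guidance itself uses is not stated here.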

Implementation Resources

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.
